Why Bother?

© Copyright 2008, Paul Kislanko

I don't publish ratings because I think they are better than others that are readily available - that isn't in general likely, since there are some pretty good ones out there. I publish the ones I do because their algorithms are publicly known and they fit into the two main categories of advanced ratings: retrodictive and predictive.

Regardless of how "good" or "bad" any rating is (and there's no definition of goodness or badness that doesn't involve arbitrary, mostly subjective criteria), I can use correlations to the known ISR and ISOV ratings to determine whether any specific rating is more nearly "retrodictive" (correlates more closely with the ISR, such as MB) or "predictive" (correlates more closely with the ISOV, such as POM).
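
Here's a minimal sketch of that test in Python - the function name is mine and the rating vectors are placeholders for wherever you keep your data, so treat this as illustration, not my actual tooling:

    import numpy as np

    def classify_rating(candidate, isr, isov):
        # Compare a candidate rating's correlation with the ISR
        # (retrodictive benchmark) and the ISOV (predictive benchmark).
        # All three arguments are arrays over the same team ordering.
        r_isr = np.corrcoef(candidate, isr)[0, 1]
        r_isov = np.corrcoef(candidate, isov)[0, 1]
        kind = "retrodictive" if r_isr > r_isov else "predictive"
        return kind, r_isr, r_isov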

There's a category of retrodictive rankings that is not "advanced" - and it includes the one used by NCAA committees to select and seed tournaments. It was the Ratings Percentage Index's poor characterization of the baseball field that led Boyd to suggest the ISR, and me to look for a better "formula"-type retrodictive rating. I came up with a formula that avoids some of the RPI's problems, and for lack of imagination named it Percentage Index 2.


Before I define the PI2 (or RPI, even) we can see a qualitative difference in the graph of ratings by team, first in ascending order by PI2 values:

[Figure: RPI & PI2 by team]

and then by ascending RPI values:

[Figure: PI2 & RPI by team]

The vertical gridlines are at 50-team increments beginning with the lowest-rated team in each case, so the rightmost two vertical segments represent the top 93 teams by each rating. We can tell from the graphs that in general the PI2 gives a higher rating to good teams than the RPI and a lower rating to not-so-good teams.

The PI2 and RPI are both functions of the form

    ρ(team) = ƒ(WP_ρ(team), SOS_ρ(team))

where WP_ρ (yes, even the definition of winning percentage is rating-dependent) and SOS_ρ are defined by the rating in a specific way. Again, without knowing what the definitions are, we can draw qualitative conclusions by plotting ƒ_ρ(team) with WP_ρ(team) and SOS_ρ(team) on the same graph:

[Figure: WP × SOS for the RPI]

[Figure: WP × SOS for the PI2]
Note that the PI2 is "large" only when both the WP and SOS are, while the RPI is "large" if either the WP or the SOS is large. In the PI2 a winless team always has a zero value, while in the RPI a winless team still gets a nonzero value from its SOS term. In the PI2 it is the undefeated teams whose values are determined entirely by their SOS.

RPI = (WP_RPI + 3×SOS_RPI) ÷ 4
PI2 = √(WP × SOS_PI2)

It's the "+" in the RPI definition that lets either a high WP or a high SOS produce a good RPI, and the "×" in the PI2 that requires both WP and SOS to be high to get a high rating.
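
A minimal sketch of that qualitative difference - these are just the two combining rules above applied to already-computed WP and SOS values, with invented inputs:

    from math import sqrt

    def rpi(wp, sos):
        # Additive: a strong schedule alone buys a respectable RPI.
        return (wp + 3 * sos) / 4

    def pi2(wp, sos):
        # Multiplicative: both factors must be high, and a winless
        # team (wp = 0) is pinned to exactly zero.
        return sqrt(wp * sos)

    # A winless team with a brutal schedule:
    print(rpi(0.0, 0.700))   # 0.525 - nonzero, driven entirely by SOS
    print(pi2(0.0, 0.700))   # 0.0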

The RPI definition of Winning Percentage is somewhat problematic. Beginning with the 2004-05 season, game location has a large weight:
WP_RPI = (0.6×#Home_wins + #Neutral_wins + 1.4×#Road_wins) ÷ (0.6×(#Home_wins + #Road_losses) + #Neutral_wins + #Neutral_losses + 1.4×(#Road_wins + #Home_losses))
I say this definition is "problematic" because all of the data I've collected suggest the factors should be more like 0.84 and 1.16. In any case, PI2 uses the standard definition of winning percentage: #Wins ÷ (#Wins + #Losses), and does not include a home field advantage.
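
In code, the two winning-percentage definitions might look like this - a sketch in which the 0.6/1.4 factors come straight from the formula above, and the function names and everything else are mine:

    def wp_rpi(home_w, home_l, neutral_w, neutral_l, road_w, road_l,
               home_factor=0.6, road_factor=1.4):
        # Location-weighted WP used by the RPI since 2004-05. The
        # denominator weights the same games from the opponents' side:
        # a home loss is an opponent's road win, and vice versa.
        num = home_factor * home_w + neutral_w + road_factor * road_w
        den = (home_factor * (home_w + road_l) + neutral_w + neutral_l
               + road_factor * (road_w + home_l))
        return num / den if den else 0.0

    def wp_pi2(wins, losses):
        # PI2 uses the standard, location-blind winning percentage.
        return wins / (wins + losses) if (wins + losses) else 0.0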

Aside: The NCAA used several decades more data than I have, but their analysis did not account for the fact that most "pre-(conference)season" games match visitors that would have lost at any venue against hosts that would have won at any venue. This analysis is difficult without using predictive ratings, which for very sound reasons the NCAA doesn't use.

It's also problematic for two other reasons:

  • the extreme weights were intended to encourage better scheduling by the top teams - get them to play more road games against the teams likely to lose at any venue, to generate revenue and interest that help those teams get better. But because conference games are treated the same as games scheduled at the discretion of the team, none of the "big boys" need that incentive to get a high RPI. The formula would be better (though still not good) if conference games were treated as neutral-site games.
  • it turns out that the original RPI definition is measurably better at characterizing teams' results than the one adopted in 2004, and the PI2 is better than the original RPI according to the same measures.


SOS? What's that?

Formula-based systems typically define SOS as a function of Opponents' Winning Percentage (OWP) and Opponents' Opponents' WP (OOWP). Even for "simple" formula-based systems it turns out that these can have different definitions.

SOS_ρ = ƒ(OWP_ρ, OOWP_ρ)

RPI SOS components
Like the RPI itself, the RPI's SOS is a linear combination:
SOS_RPI = (2×OWP_RPI + OOWP_RPI) ÷ 3
OWP_RPI is the average of opponents' WP (the traditional kind), not including games against the team for which OWP is being calculated. So OWP_RPI(team) is an average of averages. OOWP_RPI(team) is just the average of (team)'s opponents' OWP values. This is not only an average of averages of averages; it also includes (team)'s WP in OOWP for all teams that have (team) as a common opponent, and it includes games between (team)'s opponents.
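
A sketch of those RPI components, using a toy data model of my own in which a season is just a list of (winner, loser) pairs; it's simplified in that repeat meetings with the same opponent are counted once, which may differ from the official calculation:

    from statistics import mean

    def opponents(games, t):
        # All teams t has played, from (winner, loser) pairs.
        return ({w for w, l in games if l == t}
                | {l for w, l in games if w == t})

    def record_excluding(games, t, excluded):
        # (wins, losses) for t, ignoring games against `excluded` teams.
        w = sum(1 for a, b in games if a == t and b not in excluded)
        l = sum(1 for a, b in games if b == t and a not in excluded)
        return w, l

    def owp_rpi(games, team):
        # Average each opponent's traditional WP, with games against
        # `team` removed first - an average of averages.
        pcts = []
        for o in opponents(games, team):
            w, l = record_excluding(games, o, {team})
            if w + l:
                pcts.append(w / (w + l))
        return mean(pcts) if pcts else 0.0

    def oowp_rpi(games, team):
        # Average of the opponents' OWP values - an average of averages
        # of averages, which (as noted above) lets (team)'s own WP and
        # games between (team)'s opponents leak in.
        vals = [owp_rpi(games, o) for o in opponents(games, team)]
        return mean(vals) if vals else 0.0

    def sos_rpi(games, team):
        return (2 * owp_rpi(games, team) + oowp_rpi(games, team)) / 3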

PI2 SOS components
The PI2's SOS is nearly the same as its OWP. Like the RPI's, the PI2's SOS is a linear combination of OWP and OOWP, but unlike the RPI's, the PI2's coefficients are different for every team, depending upon how the team's schedule fits into the games graph.
SOS_PI2 = (#OO×OWP_PI2 + #O×OOWP_PI2) ÷ (#O + #OO)

The basic idea is to weight OWP by how many OOs a team's O's achieved their OWP against, and OOWP by how many O's contributed wins over OOs.
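
As a one-function sketch (the counts and percentages here are invented for illustration):

    def sos_pi2(owp, oowp, n_o, n_oo):
        # Team-specific blend: n_o = number of opponents (Os),
        # n_oo = number of opponents' opponents (OOs).
        return (n_oo * owp + n_o * oowp) / (n_o + n_oo)

    # A team with 11 Os and 68 OOs leans heavily on its OWP:
    print(sos_pi2(0.580, 0.510, n_o=11, n_oo=68))  # ≈ 0.570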

There are some major differences compared to the RPI's definitions.

  1. OWP is calculated like a batting average - instead of averaging the averages we use:

    OWP_PI2 = ∑#Owins ÷ (∑#Owins + ∑#Olosses)

    where #Owins and #Olosses do not include games against the team for which we're calculating PI2.
  2. OOWP is calculated the same way as WP and OWP, with the important difference that games against (team) or any of (team)'s opponents are not included.

    OOWP_PI2 = ∑#OOwins ÷ (∑#OOwins + ∑#OOlosses)

It is item 2 that distinguishes the PI2. For every (team) every other team is either an opponent, an opponent's opponent, or "other." No O is considered an OO, and OOWPPI2 measures only how (team)'s OO's have done against "others."
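
Continuing the toy (winner, loser) model from the RPI sketch, the pooled PI2 components might look like the following. This is my interpretation of the exclusion rules in items 1 and 2; in particular, games between two OOs are kept here because item 2 only excludes an OO's games against (team) and (team)'s Os:

    def opponents(games, t):
        return ({w for w, l in games if l == t}
                | {l for w, l in games if w == t})

    def record_excluding(games, t, excluded):
        w = sum(1 for a, b in games if a == t and b not in excluded)
        l = sum(1 for a, b in games if b == t and a not in excluded)
        return w, l

    def owp_pi2(games, team):
        # Pooled ("batting average") OWP: total all opponents' wins and
        # losses, minus their games against `team`, then divide once.
        tw = tl = 0
        for o in opponents(games, team):
            w, l = record_excluding(games, o, {team})
            tw += w; tl += l
        return tw / (tw + tl) if tw + tl else 0.0

    def oowp_pi2(games, team):
        # Pooled OOWP over the OO set: per item 2, games an OO played
        # against `team` or any of `team`'s opponents are excluded.
        opps = opponents(games, team)
        oos = set()
        for o in opps:
            oos |= opponents(games, o)
        oos -= opps | {team}          # no O is counted as an OO
        excluded = opps | {team}
        tw = tl = 0
        for oo in oos:
            w, l = record_excluding(games, oo, excluded)
            tw += w; tl += l
        return tw / (tw + tl) if tw + tl else 0.0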