I don't publish ratings because I think they are better than others that are readily available - that isn't likely in general, since there are some pretty good ones out there. I publish the ones I do because their algorithms are publicly known and they fit into the two main categories of advanced ratings:
Regardless of how "good" or "bad" any rating is (and there's no definition of goodness or badness that doesn't involve arbitrary, mostly subjective criteria), I can use correlations with the known ISR and ISOV ratings to determine whether any specific rating is more nearly "retrodictive" (correlating more closely with the ISR, as MB does) or "predictive" (correlating more closely with the ISOV, as POM does).
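That classification step can be sketched with a plain Pearson correlation. The ratings below are invented numbers purely for illustration, and `pearson` is a hypothetical helper, not code from any of the rating systems:

```python
# Classify a rating as "retrodictive" or "predictive" by comparing its
# correlation with the ISR (retrodictive benchmark) against its
# correlation with the ISOV (predictive benchmark).

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical ratings of the same six teams under three systems:
isr        = [95.0, 90.0, 88.0, 80.0, 75.0, 70.0]  # retrodictive benchmark
isov       = [92.0, 93.0, 85.0, 78.0, 79.0, 68.0]  # predictive benchmark
new_rating = [94.0, 89.5, 87.0, 81.0, 74.0, 71.0]  # rating to classify

r_isr, r_isov = pearson(new_rating, isr), pearson(new_rating, isov)
label = "retrodictive" if r_isr > r_isov else "predictive"
print(f"corr vs ISR={r_isr:.3f}, vs ISOV={r_isov:.3f} -> {label}")
```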
There's a category of retrodictive rankings that is not "advanced" - and it includes the one the NCAA committees use to select and seed tournaments. It was the Ratings Percentage Index's poor characterization of the baseball field that led Boyd to suggest the ISR, and me to look for a better "formula"-type retrodictive rating. I came up with a formula that avoids some of the RPI's problems, and for lack of imagination named it Percentage Index 2.
The vertical gridlines are at 50-team increments beginning with the lowest-rated team in each case, so the rightmost two vertical segments represent the top 93 teams by each rating. We can tell from the graphs that in general the PI2 gives a higher rating to good teams than the RPI and a lower rating to not-so-good teams.
The PI2 and RPI are both functions of the form
RPI = 0.25 × WP + 0.75 × SOS, where SOS = (2 × OWP + OOWP) / 3
PI2 = WP × SOS
It's that "+" in the RPI definition that lets either a high WP or a high SOS alone produce a good RPI, and the "×" in the PI2 that requires both WP and SOS to be high to get a high rating.
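A toy sketch of that structural difference, assuming the RPI's 0.25/0.75 split between WP and SOS and using a bare product as a stand-in for the PI2: a team that is weak in WP but extreme in SOS can out-rate a more balanced team under the "+" but not under the "×". The team profiles are invented numbers.

```python
# Additive (RPI-shaped) vs. multiplicative (PI2-shaped) combination of
# winning percentage and strength of schedule.

def additive(wp, sos):        # RPI-shaped: a weighted "+"
    return 0.25 * wp + 0.75 * sos

def multiplicative(wp, sos):  # PI2-shaped: a "×"
    return wp * sos

balanced = (0.65, 0.62)  # decent WP against a decent schedule
lopsided = (0.45, 0.75)  # weak WP against a very strong schedule

for name, (wp, sos) in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, round(additive(wp, sos), 4), round(multiplicative(wp, sos), 4))
# The lopsided team wins the additive comparison but loses the
# multiplicative one.
```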
The RPI definition of Winning Percentage is somewhat problematic. Beginning with the 2004-05 season, game location has a large weight:
WP = (0.6 × home wins + 1.0 × neutral wins + 1.4 × road wins) / [(0.6 × home wins + 1.0 × neutral wins + 1.4 × road wins) + (1.4 × home losses + 1.0 × neutral losses + 0.6 × road losses)]
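A sketch of that location-weighted winning percentage, using the 2004-05 weights (road wins and home losses count 1.4, home wins and road losses count 0.6, neutral-site results count 1.0); the game-list data structure is an invented convenience:

```python
# RPI-style location-weighted winning percentage (2004-05 weights).

WIN_WEIGHT  = {"home": 0.6, "neutral": 1.0, "road": 1.4}
LOSS_WEIGHT = {"home": 1.4, "neutral": 1.0, "road": 0.6}

def weighted_wp(games):
    """games: list of (location, won) tuples for one team."""
    wins   = sum(WIN_WEIGHT[loc]  for loc, won in games if won)
    losses = sum(LOSS_WEIGHT[loc] for loc, won in games if not won)
    return wins / (wins + losses)

# A 2-2 team whose wins both came on the road rates above .500:
games = [("road", True), ("road", True), ("home", False), ("neutral", False)]
print(round(weighted_wp(games), 3))
```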
Aside: The NCAA used several decades more data than I have, but their analysis did not account for the fact that most "pre-(conference)season" games have teams that would have lost on any court visiting teams that would have won on any court. That analysis is difficult without using predictive ratings, which for very sound reasons the NCAA doesn't use. The location weighting is also problematic for two other reasons:
- the extreme weights were intended to encourage better scheduling by the top teams - get them to play more road games against the teams likely to lose at any venue, to generate revenue and interest that help those teams get better. But because conference games are treated the same as games scheduled at the discretion of the team, none of the "big boys" need that incentive to get a high RPI. The formula would be better (though still not good) if conference games were treated as neutral-site games.
- it turns out that the original RPI definition is measurably better at characterizing teams' results than the one adopted in 2004, and the PI2 is better than the original RPI according to the same measures.
SOS(PI2) = (N(OO) × OWP(PI2) + N(O) × OOWP(PI2)) / (N(OO) + N(O))
where N(OO) is the number of OOs the team's Os achieved their OWP against, and N(O) is the number of Os that contributed wins over OOs.
The basic idea is to weight OWP by how many OOs a team's Os achieved their OWP against, and OOWP by how many Os contributed wins over OOs.
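The weighting idea above can be sketched as a count-weighted blend. This is not the exact PI2 formula; the function name, the example counts, and the final normalization are all illustrative assumptions:

```python
# Count-weighted blend of OWP and OOWP: n_oo is how many OOs stand
# behind the opponents' OWP, and n_o is how many opponents contributed
# wins over OOs. The component with more teams behind it dominates.

def weighted_sos(owp, n_oo, oowp, n_o):
    return (n_oo * owp + n_o * oowp) / (n_oo + n_o)

# With many OOs behind the OWP term, OWP dominates the blend:
print(round(weighted_sos(owp=0.60, n_oo=120, oowp=0.48, n_o=30), 3))
```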
There are some major differences compared to the RPI's definitions. The one that most distinguishes the PI2 is how teams are classified: for every (team), every other team is either an opponent, an opponent's opponent, or "other." No O is considered an OO, and OOWP(PI2) measures only how (team)'s OOs have done against "others."
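A minimal sketch of that three-way classification, assuming a schedule represented as a dict mapping each team to the set of teams it played (the team names and schedule are invented):

```python
# Partition all other teams, relative to one team, into opponents (O),
# opponents' opponents (OO), and "other" - no team lands in more than
# one class, and no O is counted as an OO.

def classify(team, schedule):
    """schedule: dict mapping each team to the set of teams it played."""
    opponents = set(schedule[team])
    oos = set()
    for opp in opponents:
        oos |= schedule[opp]
    oos -= opponents | {team}           # no O (or the team itself) is an OO
    others = set(schedule) - opponents - oos - {team}
    return opponents, oos, others

schedule = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
    "F": set(),
}
o, oo, other = classify("A", schedule)
print(sorted(o), sorted(oo), sorted(other))
```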