|
Post by The Bofa on the Sofa on Dec 7, 2010 16:51:17 GMT -5
3) It's not surprising that it's hard to distinguish the superiority of one system over another. Well, maybe it is at first glance, but when you think about it, you can see why various systems seem roughly comparable in how well they predict the outcome of matches. Heck, I even wonder sometimes how far off it would be if one relied solely on won-loss records (I vaguely remember someone coming up with numbers on that).

I've done things like that. It's not great, though it is better than using straight home/road. But keep in mind, the objective doesn't have to be prediction. If prediction is your goal, you wouldn't want to use Elo even if it were better than RPI; you should use full Pablo. Then again, I brought up predictive capability for a specific reason: I know that is a place where Elo does not perform well. We could, however, look at postdiction. There I suspect Elo does very well and will probably outperform even Pablo, especially if you take out the time component.

So this is where I said it depends on your objective. If you want a method that best reflects results in terms of wins and losses, we can come up with that. If you want predictability, we can do that, too. If you want predictability with a few caveats, we can do that as well. The problem is that all of these are justifiable objectives. If the NCAA said, "When selecting the teams, we want to pick the teams that have done the best over the course of the season, considering who they beat and who they lost to," I don't think anyone would complain. Then again, no one would object if they said, "We will pick the teams we think are best, those we think have the best chances to win." Or they could say, "We want the best teams, the teams with the best chances to win, determined by looking at their full-season performance and only at wins and losses." These are different objectives, and they would therefore involve different standards.
And depending on which you pick, RPI could be the best tool.

A few years back, I played around with creating a ranking with the sole goal of maximizing the percentage of already-played matches whose outcomes it reflected correctly. I got it, but the results would not make many people happy. It had a very annoying feature: it tended to rank teams ahead of the teams they beat. That sounds good, but it doesn't work. Let me give you an example. Take a team that has played the following ranked opponents, with these outcomes:

#200: L
#150: W
#100: L
#75: W
#25: W

Now, where would you rank this team? Well, let's see: they have a great win over #25, but a terrible loss to #200. Given their results, we might put them at #90. You'd have to be generous to rank them better than #75, so I wouldn't. Call them #90. However, if we do that, only 1 of the 5 matches is reflected properly. If our goal is to maximize the percentage of correctly reflected results, we would instead rank them something like #20. That's crazy, of course. The lesson is that we have to be very clear about defining our criteria. An Elo approach works much better for this, but it does not give the best "postdiction" results that are possible.
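The example above can be checked directly. The following is a minimal sketch (not p-dub's actual method; the scoring rule and rank range are assumptions for illustration) that counts how many of the five results a candidate rank "postdicts" correctly, i.e. the better-ranked team won:

```python
# Hypothetical opponents-and-outcomes list from the example above:
# (opponent's rank, result), lower rank number = better team.
results = [(200, "L"), (150, "W"), (100, "L"), (75, "W"), (25, "W")]

def postdiction_score(rank, results):
    """Count matches where the ranking agrees with the actual outcome."""
    correct = 0
    for opp_rank, outcome in results:
        favored = rank < opp_rank          # we are favored if ranked better
        won = outcome == "W"
        if favored == won:                 # favorite won, or underdog lost
            correct += 1
    return correct

# The "eyeball" rank of #90 reflects only 1 of the 5 matches properly.
eyeball = postdiction_score(90, results)

# Searching all ranks, the maximum score is 3 of 5, and every rank
# better than #25 ties for it -- consistent with "rank them about #20."
best_score = max(postdiction_score(r, results) for r in range(1, 251))
best_ranks = [r for r in range(1, 251)
              if postdiction_score(r, results) == best_score]
```

So the postdiction-maximizing objective really does push this team above all five opponents, losses to #200 and #100 notwithstanding.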
|
|
|
Post by The Bofa on the Sofa on Dec 7, 2010 16:58:54 GMT -5
Makes no sense. After the preliminary games, everyone plays in their conferences. Teams in weak conferences will earn higher RPIs than those in the PAC-10, Big-12, and Big-10.
Which is really funny, because the top RPI conference was the Big Ten. The main thing that kept the top Big Ten team from being top in RPI was Penn St's losses to Purdue and Indiana. Purdue, for example, had the worst W/L record of any seeded team. Then again, they had among the toughest schedules (by any measure) of any team in the country, and they were "rewarded" for it by being seeded despite the less successful record. That's the whole goal of RPI, and if there were any way to "view" RPI, that is what I would say it is: W/L percentage put into the context of the quality of the opponents.
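That "W/L percentage in context" reading maps directly onto how RPI is computed. Below is a minimal sketch of the standard NCAA RPI recipe (25% your winning percentage, 50% your opponents', 25% your opponents' opponents'); real implementations add sport-specific tweaks such as home/road weighting and exclude games against the team itself from OWP, which are omitted here for brevity:

```python
def win_pct(record):
    """record is a (wins, losses) tuple."""
    wins, losses = record
    total = wins + losses
    return wins / total if total else 0.0

def rpi(team, records, schedules):
    """Simplified RPI.

    records:   team -> (wins, losses)
    schedules: team -> list of opponents played
    """
    wp = win_pct(records[team])
    opps = schedules[team]
    # OWP: average of opponents' winning percentages
    # (the official version excludes each opponent's games vs. `team`)
    owp = sum(win_pct(records[o]) for o in opps) / len(opps)
    # OOWP: average of opponents' opponents' winning percentages
    oowp = sum(
        sum(win_pct(records[oo]) for oo in schedules[o]) / len(schedules[o])
        for o in opps
    ) / len(opps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp
```

Because 75% of the number comes from opponents' results, a tough schedule lifts a team like Purdue even with a mediocre record, which is exactly the behavior described above.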
|
|
|
Post by karplets on Dec 8, 2010 10:55:58 GMT -5
A few years back, I played around with creating a ranking with the sole goal of maximizing the correct percentage of wins and losses in matches that had been played. I got it, but the results would not make many people happy. ... Take a team who has played the following ranked opponents with the corresponding outcomes 200 L 150 W 100 L 75 W 25 W ... If our goal is to maximize winning percentage, though, we would do that by ranking them something like 20. That's crazy of course.

This could get too technical or obscure for this thread, but by maximizing the winning pct for one team like this (by "over-rating" them), wouldn't it start throwing off the winning pct of other teams and the accuracy of the ratings as a whole? I mean, this team's now-inflated rating would be used in determining the ratings of its opponents, tending to inflate their ratings to where they would be expected to win matches they weren't expected to win before. I can see how doing this for only one team might not result in a measurable degradation of the ratings as a whole, but applied as an algorithm to all teams, I'd think it would mess things up to an extent you could measure. Then again, maybe not.

Then again, it was just a hypothetical example to show how trying to define objectives a priori can lead to bizarre outcomes. Which is maybe why the NCAA doesn't try to define a priori the objectives the RPI is supposed to achieve. Which is why, in general, perhaps, defining objectives a priori isn't the best way to go about making decisions in the real world. (Speaking of obscure, this brings to mind a couple of things an existentialist might say: action precedes essence.)
(Crud, I can't remember the other one. But it has something to do with choice having a primary role in philosophy, I guess as opposed to being simply the conclusion you reach at the end of a philosophical argument.)

Keep it simple. Choose one, Pablo or Elo, and then lay out its virtues, the combination of objectives it appears to meet. Because they both have virtues that RPI does not have. It's possible RPI has virtues that neither Pablo nor Elo has -- but damned if we can figure out what they are.
|
|
|
Post by karplets on Dec 8, 2010 11:09:50 GMT -5
3) It's not surprising that it's hard to distinguish the superiority of one system over another. Well, maybe it is at first glance but when you think of it, you can see why various systems seem roughly comparable in how well they predict the outcome of matches...

Why is it hard to demonstrate the superiority of one rating system over another with numbers that hit you over the head? Think about it: how many matches are there between teams with a large gap in talent, skill, etc.? A lot. In other words, there are a lot of games between teams with a big difference in ratings no matter what rating system you use: RPI, Pablo, Massey, whatever. Any system is going to get the vast majority of those games right.

Then there are true upsets. That's sports, and no rating system can be expected to predict all or even most of them. (It's debatable whether that's even the point of a rating system, although of course a system would look genius if it could actually do that.) So the difference in "success" between various rating systems is bound to be pretty small.

Maybe the difference will be larger if you narrow the field in a few ways: for instance, to games between more closely rated teams, or to games between conferences and regions. (In fact, someone at BigSoccer has compared some systems that way. The difference is still small, but he does find RPI to be inferior when it comes to inter-regional games.) But anyone expecting one system to hit them over the head with its superiority is going to be disappointed.
|
|
|
Post by pogoball on Dec 8, 2010 13:57:41 GMT -5
3) It's not surprising that it's hard to distinguish the superiority of one system over another... But anyone expecting one system to hit them over the head with its superiority is going to be disappointed.

Great conversation! The NCAA uses RPI across the board as a rating system. (How) do they modify it for each sport? What group or person in the NCAA would be the one to appeal to for a change? Is there anyone who could answer the basic question that p-dub has been asking for years: "What is the purpose of RPI -- what is it supposed to measure?"
|
|
|
Post by karplets on Dec 9, 2010 1:01:32 GMT -5
Great conversation! The NCAA uses RPI across the board as a rating system. (How) do they modify it for each sport? What group or person in the NCAA would be the one to appeal to for a change? Is there anyone that could answer the basic question that p-dub has been asking for years: "What is the purpose of RPI -- what is it supposed to measure?"

Well, RPI, Pablo, and ratings talk has kind of taken over this thread for now. Sorry about that to some of the others, but p-dub is saying some very valuable things, so I thought it was good to carry on for a bit.

What is RPI supposed to measure? Well, again, p-dub has shown how it can be very tricky, at the very least, to define the proper objective for a rating system. The more cynical answer would be that the NCAA doesn't want to be pinned down to an answer, because then another method might be shown to be superior to RPI according to those objectives. But I just think it's very tricky, maybe trickier than actually designing the rating system itself. For example, is it possible that one method could have slightly better overall correlation with the actual game results but much higher variance for a handful of teams? (If so, the objective would have to be stated in terms of deviation and not just overall correlation.) I'm not a math guy, so I don't know, but that wasn't just an abstract musing on my part. I wonder if RPI achieves about the same overall correlation as, say, a chess-rating-like system (an Elo system) yet is prone to having a handful of teams drastically off, or at least relatively drastically off (um, Duke? Washington, which comes in at #37 in the RPI?). Is that possible? I don't know.

But I think the qualitative arguments against RPI are strong. What we know about how the better teams in mid-major conferences give an unnatural boost to the RPI of teams that play them: I think that's a qualitative argument against RPI. So is the fact that RPI seems to invite manipulation of the ratings through strategic scheduling.
I don't see how either a Pablo or an Elo rating can be manipulated the same way (by scheduling the Libertys and College of Charlestons*). For that reason alone, the RPI should be jettisoned in favor of something like Pablo or an Elo.

* I know women's soccer way better than v-ball, so I'm not sure whether Liberty and C of C are the best examples of what I'm talking about: teams with fattened-up won-loss percentages because they are the best teams in relatively weak conferences.
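The scheduling-manipulation point has a clean numerical basis in Elo-style systems. A minimal sketch of the conventional Elo update (the logistic expected-score formula with the usual 400-point scale and a K-factor of 32; these constants are standard conventions, not anything from this thread) shows why beating a much weaker team moves a rating barely at all:

```python
def expected(r_a, r_b):
    """Expected score for a team rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """New rating for team A; score_a is 1 for a win, 0 for a loss."""
    return r_a + k * (score_a - expected(r_a, r_b))

# Beating a team rated 600 points lower is already ~97% expected,
# so the winner gains about 1 point...
gain_weak = elo_update(1800, 1200, 1) - 1800
# ...while beating an equal is a 50/50 proposition and gains 16.
gain_peer = elo_update(1800, 1800, 1) - 1800
```

In RPI, by contrast, every padded win raises your winning percentage and feeds into your opponents' OWP; in an Elo scheme the expected-score term discounts easy wins to almost nothing, so there is no schedule to game.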
|
|
|
Post by jake on Dec 9, 2010 15:19:17 GMT -5
No system should penalize a team (by dropping its RPI) for winning a game against another team, no matter how good or bad that opponent may be!
This must be the BIGGEST fault in the RPI System.
If you need an example, try Cal Poly (RPI #56) vs. UC Riverside (RPI #200+). After winning this conference match, Cal Poly's RPI dropped to #64.
At worst, a win should leave a team's RPI unchanged.
OUT!
|
|
|
Post by The Bofa on the Sofa on Dec 9, 2010 15:41:16 GMT -5
No system should penalize (RPI dropping) a team for winning a game against another team,...no matter how good or bad that opponent may be!!!! Pablo will do that, too.
|
|
|
Post by mikegarrison on Dec 9, 2010 15:49:23 GMT -5
No system should penalize (RPI dropping) a team for winning a game against another team,...no matter how good or bad that opponent may be!!!! Pablo will do that, too. Even if you win by a large margin? Or only if you win a close match?
|
|
|
Post by The Bofa on the Sofa on Dec 9, 2010 15:53:37 GMT -5
Even if you win by a large margin? Or only if you win a close match? In the combined model, it can be either. It works better that way overall (although I wouldn't be surprised if there were a systematic variation).
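One way a margin-aware system can drop a winner's rating is if the match result is scored fractionally and the win is less decisive than expected. The sketch below is a hypothetical illustration, not Pablo's actual model: it reuses the conventional Elo expected-score curve and maps a volleyball set score onto [0, 1]:

```python
def expected(r_a, r_b):
    """Expected fractional score for a team rated r_a vs. one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def margin_score(sets_won, sets_lost):
    """Map a set score to [0, 1]: a 3-0 sweep is 1.0, a 3-2 win is 0.6."""
    return sets_won / (sets_won + sets_lost)

def update(r_a, r_b, sets_won, sets_lost, k=32):
    """Rating change driven by margin relative to expectation."""
    return r_a + k * (margin_score(sets_won, sets_lost) - expected(r_a, r_b))

# Against a team rated 400 points lower, ~0.91 is expected.
# A 3-2 squeaker scores only 0.6, so the winner's rating falls;
# a 3-0 sweep scores 1.0 and the rating still rises slightly.
after_squeaker = update(1800, 1400, 3, 2)
after_sweep = update(1800, 1400, 3, 0)
```

Under a model of this shape, whether a win hurts you depends on the gap between the margin and the expectation, which matches the "it can be either" answer above.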
|
|
|
Post by lonewolf on Dec 9, 2010 16:30:15 GMT -5
-I think that some loss has to be reflected in playing a much weaker team, to adjust the rating for quality of opponents and to discourage playing only much weaker competition.
-P-dub....love the UNI/Hawaii fun fact.
-The basketball championship selection has been brought up several times over the last few weeks, so I thought I would post the two most pertinent sections.
From the Mission Statement:
From the Championship Structure:
|
|
|
Post by karplets on Dec 10, 2010 10:24:16 GMT -5
-I think that some loss has to be reflected in playing a much weaker team to adjust the rating for quality of opponents and discourage playing only much weaker competition.

It would be consistent for Pablo to penalize teams for not winning as decisively as they "should" against much weaker teams. It doesn't quite make sense to me if Pablo imposes an almost automatic ratings loss for playing a much weaker team no matter how decisively you win, but I'm sure the net rating loss would be small enough that the effect would be negligible.

It's probably discouragement enough that, with systems like Pablo or an Elo rating, you don't get a boost to your rating for beating much weaker teams. And for the overall accuracy of the ratings, at least the other teams that play you don't benefit from these "meaningless" wins, since your rating doesn't go up because of them. Unlike RPI, where those extra wins on your record help your opponents' RPI.
|
|
|
Post by The Bofa on the Sofa on Dec 10, 2010 12:07:47 GMT -5
-I think that some loss has to be reflected in playing a much weaker team to adjust the rating for quality of opponents and discourage playing only much weaker competition. It would be consistent that Pablo penalizes teams for not winning as decisively as they "should" against much weaker teams. Now it doesn't quite make sense to me if there's almost an automatic ratings loss in Pablo for playing a much weaker team no matter how decisively you win, but I'm sure the net rating loss would be pretty small so the effect would be negligible.

For a single match? Certainly. However, if you have a very large number of them, it can add up. To an extent, we see this with teams like UNI and Hawaii: they are dragged down by the quality of their conference schedules.

I mentioned that I think it is a systematic issue, and this is what I mean. I know that, overall, teams with much weaker opponents tend to underperform if you simply ignore those matches (assuming they won them). There are different ways to interpret that, however. It could be that playing such weak competition does not properly prepare you for future matches against stronger opponents; if that were the case, it wouldn't matter where those matches occurred. However, I suspect (and this is just my guess) that the bigger issue is one of self-selection: teams that aren't as good tend to schedule much weaker opponents. Then again, maybe not. But I think that's the reason it tends to be better to include the effect.

I've never actually looked to see which matches flip between the two models. That could, in principle, shed more light on why it matters. Then again, I've never been able to figure out how to account for it even if I knew, so I haven't bothered.
|
|
|
Post by redbeard2008 on Dec 10, 2010 12:49:39 GMT -5
...if there were any way to "view" RPI, that is what I would say it is: it's W/L percentage put into context of the quality of the opponents.

Perhaps a better way of saying it is that RPI correlates a team's W/L percentage with the W/L percentages of its opponents and its opponents' opponents. The actual "quality" of the wins (or losses) is irrelevant to RPI, it seems to me, other than that they are wins (or losses). Now, correcting for home/neutral/road wins and losses could add a qualitative element, in that better teams would presumably be more able to win on opponents' courts, while worse teams would be less able to win even on their home courts.
|
|