|
Post by karplets on Dec 5, 2011 12:37:18 GMT -5
Caution about the updated Massey ratings if anyone ventures over to their website www.masseyratings.com/rate.php?lg=cvol (and clicks on Division I in the menu list): while it supposedly reflects games through Saturday, it's missing a bunch of games. In general, the data input for the Massey Ratings appears less reliable for the post-season.
Still, incomplete as it is, it shows us a few interesting things. For instance, both Cal's and Washington's ratings have gone down around 12 points (converted to the Albyn Jones scale) even though their games haven't been recorded -- so that's a result of the change in ratings of their opponents, for example Stanford, whose games have been entered, including the loss to Michigan. Stanford's rating dropped almost 30 points, from 1865 to 1836 -- and may drop further still as the other Pac 12 games are entered. (USC's, Stanford's and Oregon's games are entered but UCLA's, Cal's and Washington's are not, for instance. Ohio State's win over Tenn is entered but not its victory over Middle Tenn. Minnesota's games are not entered.)
Generally I'm less interested in the revised post-playoff ratings and mainly concerned with the ratings at the end of the regular season -- to compare with the official RPI that's used by the Committee to determine placement in the tournament. But here I'm interested to see how much of a difference these results will make, considering the rather diametrical fortunes of the Big10 and the Pac 12.
Regarding the reliability of the Massey ratings generally -- and the data input specifically:
1) Take the Massey ratings with a grain of salt for that reason alone. I don't know if the Massey Ratings gets any money at all for rating sports like college volleyball or soccer, and they're just one sport among many for them; so just how determined are they to get the data 100% correct?
2) Having said that, I believe the regular season women's soccer data is 99.9% reliable because they get their data from NC-soccer, which is a really dedicated fan website. However, NC-soccer's interest is in simulating the RPI (much like Rich Kern does for VB), so they don't update their database with playoff results, meaning Massey is dependent on another source for that data.
3) I don't know who the Massey Ratings relies on for volleyball data -- but again, I noticed right away that playoff games were missing, while the regular season data seems virtually complete and correct.
4) I wouldn't judge them too harshly over this. Like I said, I doubt they get much (if any) money for it. Just be aware of it and recognize it's not an inherent flaw with Massey or an Elo-type of rating. If they were ever paid to do this in an official capacity, they could take extra steps to make sure the data input is complete and correct.
|
|
|
Post by karplets on Dec 5, 2011 16:11:32 GMT -5
Over at BigSoccer, I believe I tied for first in the tournament bracket contest using the Massey / AJ ratings.
I don't want to make too much of that -- it was a bit of a fluke that it came out that way (I'll explain later). And RPI did well, too: if Duke had won the championship game instead of Stanford, then the RPI entry made by cpthomas would've tied for 1st, 1 point ahead of my Massey/AJ entry.
So as far as that goes (bracket prediction) - both an Elo rating and RPI did well.
As for problems with the seeding that partly (but not entirely) show up in the bracket results -- the seeding of the SEC teams was frequently criticized and seemed unjustified from a Massey/Elo standpoint. And in fact the SEC, with 8 teams selected for the tournament, had crashed out by the 2nd round.
Ah, Justice. Sometimes.
|
|
|
Post by The Bofa on the Sofa on Dec 5, 2011 16:28:42 GMT -5
Quoting karplets: "Over at BigSoccer, I believe I tied for first in the tournament bracket contest using the Massey / AJ ratings."
That's actually pretty odd, isn't it? As the PtW folks will tell you, the thing you gotta do to beat Pablo is pick the upsets, so that's what you try to do. Now, most of the time folks are more unsuccessful than successful at doing that, as expected: if 20% of matches are going to be upsets and you choose 5 upsets for a 25-match set, on average only 1 of those will be right; similarly, you get 16 right of the other 20 by picking with Pablo, so the average outcome is Pablo gets 20 right and you get 17. The way to beat Pablo is to hope that the upsets you pick are the ones Pablo gets wrong. It's not easy to do, but with a big enough sample of pickers, it is not surprising when a couple pull it off each week.
The same should go for brackets. If you try to pick the upsets off of Massey/RPI/whatever, it's a long-term losing strategy, but in a single season there should be at least a few who can beat it. Either it was a really messed-up year, or there just aren't enough folks participating.
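The 25-match arithmetic above can be checked with a quick simulation. This is a hedged sketch: "Pablo" here is just a stand-in picker that always takes the favorite, upsets are assumed independent at a flat 20% rate, and the human's 5 upset picks are random -- none of which is claimed about the real Pablo system.

```python
import random

def one_set(n_matches=25, upset_rate=0.20, n_upset_picks=5, rng=None):
    """One 25-match set: 'Pablo' (a stand-in) picks every favorite; the human
    picks n_upset_picks matches at random and calls them upsets.
    Returns (pablo_correct, human_correct)."""
    rng = rng or random.Random()
    outcomes = [rng.random() < upset_rate for _ in range(n_matches)]  # True = upset
    upset_picks = set(rng.sample(range(n_matches), n_upset_picks))
    pablo = sum(not u for u in outcomes)
    human = sum((i in upset_picks) == u for i, u in enumerate(outcomes))
    return pablo, human

rng = random.Random(42)
trials = 50_000
pablo_total = human_total = human_wins = 0
for _ in range(trials):
    p, h = one_set(rng=rng)
    pablo_total += p
    human_total += h
    human_wins += h > p
print(pablo_total / trials)  # close to 20
print(human_total / trials)  # close to 17
print(human_wins / trials)   # a small fraction -- the occasional lucky picker
```

The human beats Pablo only when at least 3 of his 5 upset picks land, which for independent 20% upsets happens under 6% of the time -- consistent with "a couple pull it off each week" once the pool of pickers is large.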
|
|
|
Post by c4ndlelight on Dec 5, 2011 16:31:16 GMT -5
For some reason I've always thought of soccer -- primarily due to the existence of ties, the low rate of scoring, and the ability of red cards to completely throw a match -- as one of the sports with the highest fluke percentage and the most difficult to predict (with an ensuing high rate of upsets). I wouldn't be surprised if that made it harder for individuals to predict upsets.
|
|
|
Post by The Bofa on the Sofa on Dec 5, 2011 16:37:07 GMT -5
You would think so, but that would basically affect the base rate, not the distribution so much. BTW, it's because soccer's scoring distribution is so low that I wouldn't bother doing something like the Pablo rankings, which depend on statistical sampling of points. That's why an Elo version is probably best. I do wonder: what is the upset rate, given the typical distribution of matches in the soccer world? Note that in women's tennis I was finding a 35-40% upset rate, but that was due to the standard deviation of individual performance.
|
|
|
Post by karplets on Dec 10, 2011 17:03:11 GMT -5
Here are the win probabilities for the quarterfinals using the converted Massey ratings.
(H) = homecourt for the favored team; (N) = neutral; (A) = favored team plays away
Next to last number is the rating differential adjusted for homecourt. The win probability is the last number and is in bold.
Illinois (A) 1955...... Florida 1857.... 38 / 0.560
Texas (N) 1954...... UCLA 1904..... 50 / 0.586
Iowa St (N) 1908...... Florida St 1829..... 79 / 0.627
USC (N) 1979.... Pepperdine 1849.... 130 / 0.711
All the matches are close - even Pepperdine has a very significant chance of an upset according to the converted Massey/Elo ratings.
*** In 56 matches so far in the tournament, the average expected win probability for the higher-rated team has been (.772), corresponding to an expected result of 43 wins for the higher-rated side. The actual result so far is 44 wins. The average deviation from the expected win probability is 1.3%
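The thread doesn't spell out how rating differentials are converted to win probabilities, so here's a minimal sketch assuming the standard logistic Elo expectancy on a 400-point scale, plus the 60-point homecourt adjustment karplets mentions later in the thread. The `win_prob` and `adjusted_diff` helpers are hypothetical names; with this assumed formula the Illinois/Florida number comes out near, but not exactly at, the quoted 0.560, so the actual Albyn Jones conversion evidently uses a somewhat different spread.

```python
def win_prob(diff: float) -> float:
    """Expected win probability for the higher-rated team, given the rating
    differential already adjusted for homecourt. Standard logistic Elo
    expectancy on a 400-point scale (an assumption -- the Albyn Jones
    conversion used in the thread may differ)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def adjusted_diff(fav: float, dog: float, site: str, hca: float = 60.0) -> float:
    """Raw differential adjusted for site: (H) favorite at home,
    (A) favorite away, (N) neutral. 60-point HCA per the thread."""
    diff = fav - dog
    if site == "H":
        diff += hca
    elif site == "A":
        diff -= hca
    return diff

# Illinois (away, 1955) vs Florida (1857): 98 - 60 = 38 points
d = adjusted_diff(1955, 1857, "A")
print(d, round(win_prob(d), 3))  # 38 and roughly 0.55, near the quoted 0.560
```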
|
|
|
Post by karplets on Dec 12, 2011 1:05:14 GMT -5
The Massey/Elo ratings have done superbly this tournament. No, that doesn't mean they've picked more winners than anyone else -- they've picked 46 out of 60 winners, and many of you can do that well or better. But the results have correlated very well with the expected win probabilities in two key ways.
First, the average expected win probability for the higher-rated teams was (.762). That means 46 out of 60 of the favored teams should've won. That's exactly what happened.
Second, and just as important - the win percentage should go up as the rating differential increases. That's the basic premise of an Elo system and other non-Elo systems that share this basic Elo concept (Pablo, probably, fits this definition). In a small sample such as the tournament, this doesn't always fit as smoothly as it should when you look at all games played. But in this case, the fit is almost perfect.
For instance, in the 18 games where the rating differential (adjusted for homecourt) was between 0 and 100, the average differential was 47 points and the average expected win probability was (.577). In 18 games that means the higher-rated team should win 10 games (10.39 rounded to the nearest game). The actual result was 11, or a win percentage of (.611)
In the 17 games where the rating differential was between 50 and 150, the average differential was 104 points meaning the average expected win probability was just over (.667). That means the higher-rated teams should win 11 out of 17 (11.34 rounded to the nearest game). The actual result was 11 wins, a win percentage of (.647)
As the rating differential goes up, the expected win percentage of course goes up. And in this tournament so did the actual win percentage. Again, I don't claim that it's always this good in a small sample. It usually isn't. But this time around the results really demonstrate the correlation between rating differential and win percentage.
range followed by avg rating differential / avg expected win pct / actual results and pct.
0-100.... 47 pts... avg expected win pct (.577)... actual 11/18 (.611)
50-150... 104 pts... avg expected win pct (.667)... actual 11/17 (.647)
100-200... 151 pts... avg expected win pct (.737)... actual 13/18 (.722)
150-250... 187 pts... avg expected win pct (.780)... actual 10/13 (.769)
200-300... 263 pts... avg expected win pct (.859)... actual 7/9 (.778)
>250... 426 pts... avg expected win pct (.931)... actual 20/21 (.952)
This is a powerful aspect to the system that the RPI simply does not possess and cannot possess.
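For anyone who wants to reproduce this kind of bucketed comparison, here's a minimal calibration-check sketch. It assumes the standard logistic Elo expectancy on a 400-point scale (the actual Massey-to-Albyn-Jones conversion isn't given in the thread, so the expected percentages won't match the post's exactly), and the sample game list is hypothetical:

```python
def win_prob(diff):
    # Standard logistic Elo expectancy, 400-point scale -- an assumption;
    # the conversion used in the thread may use a different spread.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def calibration(games, bins):
    """games: list of (adjusted_diff, favorite_won) pairs.
    bins: list of (lo, hi) differential ranges (overlapping bins allowed,
    as in the post's table). Returns one row per non-empty bin:
    (lo, hi, avg diff, expected win pct, actual win pct, n games)."""
    rows = []
    for lo, hi in bins:
        sub = [(d, w) for d, w in games if lo <= d <= hi]
        if not sub:
            continue
        n = len(sub)
        avg_d = sum(d for d, _ in sub) / n
        exp_p = sum(win_prob(d) for d, _ in sub) / n
        act_p = sum(w for _, w in sub) / n
        rows.append((lo, hi, round(avg_d), round(exp_p, 3), round(act_p, 3), n))
    return rows

# Hypothetical sample data -- the thread's raw game list isn't reproduced here
games = [(38, True), (50, True), (79, False), (130, True), (187, True), (426, True)]
for row in calibration(games, [(0, 100), (100, 200), (200, 300)]):
    print(row)
```

Overlapping bins, as in the post's table, fall out naturally because each bin simply re-filters the full game list.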
|
|
|
Post by karplets on Dec 14, 2011 12:25:55 GMT -5
Expected win probabilities for the semifinals:
USC (rating 1979) win probability (.535) over Illinois (1955)
UCLA (rating 1904) win probability (.627) over Florida St (1829)
The games are on a neutral court (although naturally you wonder if Florida St gets a bit of a boost since they're playing in their home state).
USC / Illinois is virtually a tossup in the Massey ratings. UCLA is favored over Florida St, but the ratings give Florida St a very good chance at an upset.
In the PTW contest I have Illinois advancing to the final even though USC has the higher rating. That's because, with the regionals at Hawaii, Hawaii was slightly favored to advance to this stage over USC despite having a slightly lower rating (1936 vs 1979).
I've been using a figure of 60 rating points for HCA (homecourt advantage).
|
|
|
Post by karplets on Dec 14, 2011 12:49:23 GMT -5
One note re Pablo vs Massey/Elo (not that it's really a contest) -
I rather expected Pablo, at some point, to "outpredict" the Massey ratings in something having to do with UNI, a team everyone seems to think is overrated by the Committee. But UNI is a case where the Massey ratings and RPI come close to agreeing.
So to nobody's surprise, it seems, UNI was knocked out by Florida. Pablo 1 / Massey 0 on that one.
But then I was surprised to find Pablo also rated Florida higher than Illinois so when Illinois advanced past Florida, Pablo gave one back to Massey/Elo.
Regarding UNI, I wondered whether an Elo system overrates them and whether I could see clearly why. Obviously they compile a sterling record playing in a weaker conference than most of the other top teams. That's always a danger for these systems -- there's inherently greater uncertainty about a rating obtained that way. But I will say that my preliminary opinion is that UNI deserves the rating it has (or had) in Massey, and I would seed them on the basis of the Massey rating rather than, say, on the basis of a poll like the AVCA poll.
|
|
|
Post by karplets on Dec 16, 2011 1:19:50 GMT -5
Updated polls (PtW format from the PTW thread at the top of the forum)
Pablo 45-17
RPI 43-19
AVCA 44-16
RichKern 42-17
Massey/Elo (modified/Albyn Jones scale): 44-18
Record with the actual matchups as played in the tournament: 47/62
The interesting thing is that among the RPI, Pablo and Massey, the RPI was the only one that would "predict" Illinois over USC in the head-to-head matchup. (The Massey/Elo bracket had Illinois advancing to the final, but only because it predicted Hawaii as the semi-final opponent.)
RPI: Illinois #1, USC #8 Pablo: USC #1, Illinois #10 Massey: USC #1, Illinois #2
On the other hand, RPI flubs the head-to-head between UCLA and Florida St:
RPI: FSU #10, UCLA # 15 Pablo: UCLA #13, FSU #23 Massey: UCLA #10, FSU #17
|
|
|
Post by karplets on Dec 18, 2011 13:45:29 GMT -5
With the tournament over, I think a few observations are in order regarding the various ranking/rating systems -- primarily Pablo, RPI, Elo systems (such as, presumably, Massey 1), and human polls such as the AVCA.
1) As simple predictive tools, in a small sample such as the tournament, there's often not much difference between them -- as Bofa on the Sofa pointed out previously.
2) Pablo very likely is somewhat superior as a predictive tool on aggregate. But this superiority is slight and there are results it gets "wrong" that others don't. It's not as simple as getting the same ones right as other systems and then getting one or two more right.
3) Elo-like systems such as Massey, though apparently less familiar to the VolleyTalk crowd, perform about on a par with the other methods (Massey came out in between Pablo and RPI this time around, and that's likely to be generally true -- better at prediction than RPI but not as good as Pablo), and they have certain advantages which should make their use more widespread.
4) RPI does reasonably well measured this way, as a simple predictive test.
4a) We could do worse than use the RPI. More precisely, the Committee could do worse.
4b) RPI still has severe problems beyond those which show up in the simple predictive test -- objections which really can't be met and which were evident in the tournament (such as the Hawaii/USC 3rd-round matchup).
5) Although I don't pay particularly close attention to polls, if I'm not mistaken the tournament results show they aren't particularly more reliable than the statistical ranking/rating methods and are subject to their own set of objections (bias, etc)
6) Prediction isn't everything. Although probability-based systems (Pablo and Massey/Elo both fit this category) are likely to be superior to both RPI and polls in this regard, their advantages are not limited to this. RPI, too, has certain advantages of its own which need to be considered (for instance, its relative transparency -- the formula is published and open to all, and hence it's less of a "black box").
In summary? The tournament results are consistent with the superiority of both Pablo and Elo-like systems over RPI (and probably polls), but the evidence isn't overwhelming. Then again, we can't expect this particular evidence to be.
|
|
|
Post by pogoball on Dec 18, 2011 15:52:56 GMT -5
Outstanding. Thank you for your contributions and keeping us up-to-date on these comparisons.
|
|
|
Post by karplets on Dec 18, 2011 16:16:30 GMT -5
You're welcome. It's a little wonky, but Elo-type ratings can be quite attractive to fans of a game -- witness their ready acceptance and popularity among chessplayers, and, from what I understand, among a lot of online game players who use similar rating systems.
Where the converted Massey / Elo ratings were rather stunningly successful (maybe deceptively so) was, as I mentioned last week, in the results of groups of games arranged by rating differential. In theory, games between teams rated, for instance, 100 points apart on the Albyn Jones scale I use do not, and should not, result in the favored team winning all the time. It's worth repeating, I think: if the "predictions" were always "right" then something is wrong; the ratings are off.
Let's hope I can format a table a little better. In this tournament, the results rather stunningly follow the expected results based on rating differential. Let me stress -- it's not usually this good. It's a bit of a fluke that it's this good this time. But what the heck, it illustrates the basic point nicely.
range | avg diff | expected pct | games played | won | actual pct
0-100 | 48 | 0.577 | 21 | 12 | 0.571
50-150 | 99 | 0.659 | 19 | 12 | 0.632
100-200 | 151 | 0.739 | 18 | 13 | 0.722
150-250 | 187 | 0.783 | 13 | 10 | 0.769
200-300 | 263 | 0.858 | 9 | 7 | 0.778
250+ | 426 | 0.950 | 21 | 20 | 0.952
Ooh, baby, it looks pretty good in the preview. I think I'll quit while I'm ahead, post this and make more comments later... Wait, I botched a row... I'll be back... There, better.
|
|
|
Post by karplets on Dec 18, 2011 20:34:09 GMT -5
Brief comment about the table above.
1) This is something on an entirely different level from what RPI is capable of.
2) To me, it gives us some confidence that the ratings have some validity and meaning -- that it means something when teams are 100 points apart, or 200 points apart, or whatever.
3) In particular, to me, it should give us some confidence that the ratings have some validity when measuring teams that play different schedules against opponents of different strength.
Let's take the example of a Team A that plays opponents all rated right about 1600 and has a winning pct of (.667), and a Team B that plays opponents all rated around 1500 with a winning pct of (.800).
Of course it isn't perfect, but in this example we can have some confidence that Team A and Team B are about equal in strength, and that will be reflected in their ratings, which would both come out around 1700.
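That example can be made concrete with a "performance rating" calculation, the textbook way an Elo-style framework infers a rating from results against known opposition. The sketch below assumes the standard logistic 400-point expectancy rather than the exact Massey/Albyn Jones math, so both teams land in the low 1700s rather than exactly at 1700:

```python
import math

def perf_rating(opp_avg: float, win_pct: float) -> float:
    """Implied ('performance') rating from the average opponent rating and
    the win percentage, inverting the logistic Elo expectancy on a 400-point
    scale. Textbook Elo construction -- not necessarily the exact
    Massey / Albyn Jones formula used in the thread."""
    if not 0.0 < win_pct < 1.0:
        raise ValueError("win_pct must be strictly between 0 and 1")
    return opp_avg - 400.0 * math.log10(1.0 / win_pct - 1.0)

# Team A: .667 against ~1600 opposition; Team B: .800 against ~1500 opposition
print(round(perf_rating(1600, 2 / 3)))  # 1720
print(round(perf_rating(1500, 0.8)))    # 1741
```

The two implied ratings (1720 and 1741) aren't identical, but they're close, which is the point of the example: a better record against a weaker schedule can imply about the same strength as a worse record against a tougher one.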
|
|
|
Post by pogoball on Dec 22, 2011 0:43:00 GMT -5
I'm surprised it does this well. When we consider that volleyball has almost no out-of-conference matches after September, I would think that would introduce variance into the ratings.
|
|