Post by ay2013 on Oct 28, 2014 15:26:47 GMT -5
That cumulative difference of 68 points among 13 teams is an average of about 5 places. That seems really significant when looking at the top 25, but 5 places in the context of 300+ teams is only a roughly 2% margin of difference. It should be noted that one of the largest differential in this sample is Pitt, an eastern team, with 7 points better in Pablo than rpi, as well as a few other non-western teams that fare better in Pablo than RPI--which does provide some examples against the "West gets screwed theme." Not a fan of the use of the methodology behind RPI, but the NCAA is never going to use a methodology it doesn't control (Pablo, Massey, etc). And the main significance of using RPI to complete the field for the tournament is for those teams on the bubble of the RPI threshold for making the tournament (teams with RPI somewhere in the 40 to 50 range). Since no one outside the top 25 rankings in RPI or the AVCA Poll has ever come close to winning a National Championship, I think the outrage over RPI is just a bit over-blown. If winning the championship were the only concern for each school, then you might be right. But for many programs, making the tournament is a big deal. And for many teams who do make the tourney, seeding and placement play a big roll in how far the team goes. In this respect, the outrage over the major fallacies of the RPI is not overblown. But why should the NCAA be concerned about what each individual school gets out of the tournament? The NCAA's job is to crown a champion, and let the team visit the sitting President.
Post by ay2013 on Oct 28, 2014 15:28:24 GMT -5
> absolutely i love stanford but i'm ok with them not being number 1 on everything... actually i would love to see the current team play against the 2007 team, i think the 2007 team would win in 4.

I'd take 2007 Stanford in 3, with one competitive set.
Post by alpacaone on Oct 28, 2014 15:34:33 GMT -5
I feel there is a rather significant flaw in the title of this thread. "Rankings," like seedings, reward success, not potential. PSU, no matter how great I believe they are, did not find success against the current Pablo #4, #11, and #17. To rate them, or even rank their potential, is fair; however, it certainly shouldn't be viewed in the same light as a success-based seeding or ranking, regardless of what statistical evidence is derived.
Post by Cruz'n on Oct 28, 2014 15:40:46 GMT -5
> If winning the championship were the only concern for each school, then you might be right. But for many programs, making the tournament is a big deal. And for many teams who do make the tourney, seeding and placement play a big role in how far the team goes. In this respect, the outrage over the major fallacies of the RPI is not overblown.
>
> But why should the NCAA be concerned about what each individual school gets out of the tournament? The NCAA's job is to crown a champion, and let the team visit the sitting President.

But the RPI's purpose is NOT crowning a champion, but fielding the tournament.
Post by chipNdink on Oct 28, 2014 15:59:19 GMT -5
> I feel there is a rather significant flaw in the title of this thread. "Rankings," like seedings, reward success, not potential. PSU, no matter how great I believe they are, did not find success against the current Pablo #4, #11, and #17. To rate them, or even rank their potential, is fair; however, it certainly shouldn't be viewed in the same light as a success-based seeding or ranking, regardless of what statistical evidence is derived.

Yes, we should call Pablo numbers "ratings" (like chess ratings) rather than rankings. The chess champion, or the winner of any tournament, is not always (and often not) the highest-rated player.
Post by mikegarrison on Oct 28, 2014 16:03:08 GMT -5
> I feel there is a rather significant flaw in the title of this thread. "Rankings," like seedings, reward success, not potential. PSU, no matter how great I believe they are, did not find success against the current Pablo #4, #11, and #17. To rate them, or even rank their potential, is fair; however, it certainly shouldn't be viewed in the same light as a success-based seeding or ranking, regardless of what statistical evidence is derived.

Ranking = "ordered list, based on rating"
Rating = "numerical score"

It's that simple. You are assigning some baggage to the word "ranking" (seedings, etc.) that is not actually inherent in the concept of ranking. Pablo, whether looked at by ratings or rankings, is not designed to reflect the past success of the team. It is designed to (using your terminology) model their potential for future success. Since past success really does serve as a pretty valid predictor of future success, it's not too surprising that the two tend to correlate. But there is no reason it is invalid to make a list ranked by Pablo score. The ratings are more valuable than the rankings, but that doesn't mean the rankings have zero value.
Post by Cruz'n on Oct 28, 2014 16:14:59 GMT -5
> I believe Pablo is designed to be more of a predictor, i.e. measuring which teams are better and more likely to win. I'd consider Pablo a "rating" vs. a ranking; it ends up with an ordering of teams according to which team its formula says is more likely to win (on a neutral court). Two different things. Should Stanford be rated above Penn State? Not necessarily. Should Stanford be ranked above Penn State? Absolutely yes. Two completely acceptable results given their performance.
>
> Who's better, Penn St or Stanford? There will be those who point to the fact that Stanford beat Penn St, and that settles it. But it doesn't, not for me, and not objectively. Because there's more to it. Not all wins are equal. Take Stanford's win over Penn St, in Maples, where Penn St scored 51.5% of the points. What would happen if they played again? We can't know, of course, but we can look at what has happened in other circumstances like this, where the home team won despite scoring only 48.5% of the points. I have the database to do it. A team that wins at home despite scoring 48-49% of the points wins only 19% of the time if the two teams play again on the other team's home court. Shoot, even if they play again on their own home court, they only win 25% of the time. I was actually shocked at how lopsided this is. Admittedly, there aren't a ton of examples to base it on (50 to 80, depending on the circumstances), but still, that is really, really lopsided. Now, if the road team can pull off the 5-set win, they are more likely than not to win if they play again. Interestingly, it's been about 55% whether they turn around and play at home or play again on the road. Could be some distortion due to small numbers, but the trend is consistent. Even when both matches are on neutral sites, 40% of the time the team that lost the first match wins the second. Very clearly, this is what Pablo is seeing when it comes to Penn St.
>
> I keep telling you all this because it's true. If you want to know how many wins and losses a team has, look at their wins and losses. But if you want to know how good a team is, look at the number of points they are scoring, because that's where that information is. Keep in mind, this is not just a Pablo bias, and I'm not just making this up. Sure, I started Pablo with this type of idea as the underlying model, but as the data above show, it's borne out by what happens on the court. See my comment yesterday in the 3210 thread. If you know how many points were scored, then knowing who won tells you very little about who is better. (It's not completely worthless -- there is a tiny, tiny premium in winning, to the tune of increasing the expected winning percentage by maybe 2% in the region close to zero. What this means is that if two teams score the exact same number of points, the winning team has something like a 52% chance of winning a second match; the tipping point looks to be about 49.5% of the points.)

I appreciate all your analysis and input. You do a great service to the volleyball community. I imagine that if you keep at this for another ten years or more, you will continue to make tweaks. The system isn't perfect, and can always improve. Still, I believe yours is the best for predicting winners, and when looking at the whole list of teams, yours is the most accurate for ranking them. In the example you used above of Penn St and Stanford, you appear to base much of your prediction on their prior meeting and the fact that Penn St outscored Stanford. Do you not consider opponents in common? Interestingly enough, Penn St also outscored both Nebraska (+3) and Illinois (+2) in those losses. Stanford was -2 against Illinois and +22 against Nebraska. Does that get factored in? Perhaps in Penn St's case, outscoring opponents even in a loss can be attributed to having the best server in the country, who tends to score in long runs at times. If this happens in even one set, that set can quickly get out of hand.
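The conditional lookup described in the quoted post (among prior matches where the home team won while being outscored, how often did that team win the rematch?) can be sketched as follows. This is a minimal illustration, not Pablo's actual code; the record fields (`point_share`, `won_first`, `won_rematch`) are hypothetical.

```python
# Sketch of the database query described above: among first matches where
# the home team scored 48-49% of the points and won, how often did that
# same team also win the rematch? Field names are illustrative only.

def rematch_win_rate(matches, lo=0.48, hi=0.49, won_first=True):
    """matches: dicts with 'point_share' (home team's share of points in
    match 1), 'won_first' (home team won match 1), and 'won_rematch'
    (same team won match 2). Returns the rematch win rate, or None."""
    pool = [m for m in matches
            if lo <= m["point_share"] <= hi and m["won_first"] == won_first]
    if not pool:
        return None
    return sum(m["won_rematch"] for m in pool) / len(pool)

# Toy data echoing the thread's finding: win at home while outscored,
# and you usually lose the rematch.
toy = [
    {"point_share": 0.485, "won_first": True, "won_rematch": False},
    {"point_share": 0.487, "won_first": True, "won_rematch": False},
    {"point_share": 0.483, "won_first": True, "won_rematch": True},
    {"point_share": 0.510, "won_first": True, "won_rematch": True},
]
print(rematch_win_rate(toy))  # 1 of 3 qualifying matches
```

The last record falls outside the 48-49% band, so only three matches qualify and the rate is 1/3.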
Post by The Bofa on the Sofa on Oct 28, 2014 16:21:43 GMT -5
> (interesting case of small numbers: the record of teams that win in 4 at home while scoring 48-49% of the points is 0-6)
>
> To get outscored in a four-set match means you had one really bad set. That means you are vulnerable to something. It's not terribly surprising that the vulnerability shows up as losses in the future. All that being said, I strongly suspect that the distinction between teams that won at home and teams that lost at home is just random error. I don't see any plausible causal mechanism there.

It's 6 matches. That's why I called it "small numbers."
Post by mikegarrison on Oct 28, 2014 16:29:13 GMT -5
> To get outscored in a four-set match means you had one really bad set. That means you are vulnerable to something. It's not terribly surprising that the vulnerability shows up as losses in the future. All that being said, I strongly suspect that the distinction between teams that won at home and teams that lost at home is just random error. I don't see any plausible causal mechanism there.
>
> It's 6 matches. That's why I called it "small numbers."

Right, but I meant the entire dataset concerning teams that won or lost with a minority of the points at home. I completely believe that if the home team scored fewer points than the visitors in the first matchup, then they are likely to lose the second matchup. But the data indicating that teams in that situation who won the first match are MORE likely to lose the second one than teams that lost the first match? My suspicion is that this is statistical fuzz rather than a significant finding. I would want a causal mechanism and independent replication before I was willing to believe that part of it.
Post by bluepenquin on Oct 28, 2014 16:32:24 GMT -5
> It's even more bizarre. The team that scores 48.5% and wins at home is even more likely to lose the second match than the team that scores 48.5% and loses! I've revised things a bit by looking only at 5-set matches, but here are the numbers for teams that score 48-49% of the points at home.
>
> If they WIN the first match:
> - Win 2nd match on the road: 22%
> - Win 2nd match at home again: 26%
> - Win 2nd match on a neutral site: 43% (only 20 examples)
>
> If they LOSE the first match:
> - Win 2nd match on the road: 26%
> - Win 2nd match at home again: 29%
> - Win 2nd match on a neutral site: 37% (41 matches, so better)
>
> I won't claim that the difference between winning and losing the first match is statistically significant, but it certainly does not support any hypothesis that winning makes you better in the second meeting. (Interesting case of small numbers: the record of teams that win in 4 at home while scoring 48-49% of the points is 0-6.)

My guess: there is a bias in the sample. Better teams are more likely than worse teams to lose close 5-set matches in which they outscore the opponent. Put another way, clearly worse teams rarely lose 5-set matches where they outscore their opponent, or at least not at the same rate as good teams do. The fact that a team that narrowly outscores another is more likely to win the next time they play makes perfect sense to me. However, I suspect the sample is loaded with considerably more superior teams among those that lost the first match. This could be tested by looking at the sample and seeing how often the team that lost had the higher Pablo rating before the first match; I am going to guess it was much higher than 50%.
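The hedge about statistical significance can be checked with a standard two-proportion z-test. A minimal sketch: the sample sizes below are stand-ins based on the "50 to 80 examples" mentioned earlier in the thread, not the actual counts, so treat the result as illustrative.

```python
from math import sqrt, erf

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided two-proportion z-test using the pooled standard error.
    Returns (z statistic, p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 22% of ~50 first-match winners vs 26% of ~80 first-match losers
# winning the rematch (sample sizes are assumptions, not the real counts)
z, p = two_proportion_z(0.22, 50, 0.26, 80)
print(z, p)  # |z| well under 1.96, so nowhere near significant at 5%
```

With samples this small, a 22% vs 26% gap is far from significant, which is consistent with the poster's reluctance to claim significance.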
Post by ay2013 on Oct 28, 2014 18:02:39 GMT -5
> But why should the NCAA be concerned about what each individual school gets out of the tournament? The NCAA's job is to crown a champion, and let the team visit the sitting President.
>
> But the RPI's purpose is NOT crowning a champion, but fielding the tournament.

No, the NCAA Seeding Committee's purpose is to field the tournament; the RPI's purpose is to provide an SOS rating based on certain inputs.
Post by redbeard2008 on Oct 28, 2014 18:07:42 GMT -5
The purpose is to provide something to blame other than the committee. It does that quite admirably.
Post by The Bofa on the Sofa on Oct 28, 2014 18:25:29 GMT -5
> My guess: there is a bias in the sample. Better teams are more likely than worse teams to lose close 5-set matches in which they outscore the opponent.

That's a really interesting assertion, but personally, I think it is just as much bull%*$# as the claim that good teams have the "ability to win the close ones." But I do give you credit for being contrarian.
Post by The Bofa on the Sofa on Oct 28, 2014 18:28:08 GMT -5
> It's 6 matches. That's why I called it "small numbers."
>
> Right, but I meant the entire dataset concerning teams that won or lost with a minority of the points at home. I completely believe that if the home team scored fewer points than the visitors in the first matchup, then they are likely to lose the second matchup. But the data indicating that teams in that situation who won the first match are MORE likely to lose the second one than teams that lost the first match? My suspicion is that this is statistical fuzz rather than a significant finding. I would want a causal mechanism and independent replication before I was willing to believe that part of it.

As I mentioned, I don't know if it's statistically significant. However, the fact that the pattern showed up both for home/away and home/neutral is interesting. Let me check away/home and away/away.
Post by The Bofa on the Sofa on Oct 28, 2014 18:36:11 GMT -5
> I feel there is a rather significant flaw in the title of this thread. "Rankings," like seedings, reward success, not potential. PSU, no matter how great I believe they are, did not find success against the current Pablo #4, #11, and #17. To rate them, or even rank their potential, is fair; however, it certainly shouldn't be viewed in the same light as a success-based seeding or ranking, regardless of what statistical evidence is derived.
>
> Ranking = "ordered list, based on rating"
> Rating = "numerical score"
>
> It's that simple. You are assigning some baggage to the word "ranking" (seedings, etc.) that is not actually inherent in the concept of ranking.

In fact, Pablo itself has nothing to do with rankings. It calculates ratings, and relative ratings at that. The ratings I give the teams are determined by anchoring everything to the median, which is set at 5000. From there, I rank the teams based on their Pablo ratings. Teams can be ranked according to anything you want. You can rank them by average shoe size if you want.
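The rating/ranking distinction fits in a couple of lines of code: a ranking is just the order induced by sorting on some rating. The numbers below are made up for illustration, not actual Pablo ratings.

```python
# A rating is a numerical score; a ranking is the ordered list it induces.
ratings = {"Penn St": 7153, "Stanford": 7101, "Texas": 6988}  # made-up values

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)  # ['Penn St', 'Stanford', 'Texas']
```

Swap in shoe sizes for the values and the same line produces a (perfectly valid, if useless) ranking by shoe size, which is exactly the point being made.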