Post by hebrooks87 on Nov 24, 2014 10:28:55 GMT -5
If you "ignore" UNLV's win against CSU, then what's the point of this exercise? I've said all along that this type of approach gives results that look odd. However, I'd actually like to hear a discussion of the actual methodology and foundations, beyond "it don't look right."
Quoting an earlier post: "Unfortunately I wouldn't have a clue how to properly discuss your methodology on this, any more than I could the Pablo Rating (this is way over my head). I can only trust the results - this system is (much) better at 'predicting' who won matches already played, and Pablo is very successful at predicting who will win matches going forward. This one is based only on wins and losses (like RPI), while Pablo uses points in its calculation. So maybe through examples I could better understand. A team like Oklahoma is significantly worse here than in RPI. A thought would be that the win over Texas would do some considerable good in a system like this - whereas RPI doesn't care at all where the wins come from. Colorado comes in much better vs. RPI - and with more improvement than other Pac-12 teams. LSU is a little worse. I bring those 3 teams up to better understand - Oklahoma and Colorado have what seems like a huge upset win on their resumes, yet the two teams go in different directions from RPI. LSU has a huge upset loss - but it doesn't appear to have much if any negative impact compared to RPI, especially considering that just about every SEC team does worse in this system. So I'm curious - does this system better reward the high-and-low teams (Oklahoma, LSU, Colorado), or does it reward the more consistent team (very few upset wins and losses)? I am guessing that the answer is 'it depends'? Also - at this point in the season there is just very little movement in RPI and Pablo, especially at the top. Would this system act similarly?"

The real issue with Oklahoma isn't "ignoring" the win over Texas, because it's also ignoring the loss at TCU. The critical match is the loss to SMU. The optimization has used the HCA very nicely with OU, putting them close enough to Kansas, CSUN, and UALR to have the home wins over them and the road loss at SMU correctly classified. Unless I've missed something, there are two OU misclassifications (the win at Texas and the loss at TCU). The closeness in the overall rankings of Kansas, CSUN, UALR, and SMU pretty much means OU is locked there. If OU moves up at all relative to SMU, it adds a misclassification. Colorado gets the Washington and Oregon results right with the HCA, but CU has several misclassifications (4, I think).

In general, minimizing misclassification costs involves assumptions about the cost of different kinds of errors. In my professional life, the trade-off between false alarms and missed detections of tornadoes in the warning process is implicitly taken into account by assuming the cost of a missed detection is much larger than the cost of a false alarm (most medical tests work the same way). Here, the assumption is that the costs are the same for all kinds of errors, so that minimizing cost is identical to minimizing the number of errors. (The exception comes in the Harvard example given earlier by the sofa-sitting Bofa, which I think is equivalent to assuming different costs for at least some errors.) That's not a bad default assumption in this case. I can imagine developing a way to weight errors differently, but it would likely require a lot of pseudo-objective inputs.

To me, one of the points of this kind of exercise is that it highlights the differences inherent in different ranking processes. Ranking processes are designed, implicitly or explicitly, with assumptions about what's important, and they provide some sort of view of an unknown truth. For the most part, all remotely reasonable systems agree in broad brush. The departures from that agreement tell us something about the impacts of those assumptions.
RPI, in women's volleyball, unintentionally rewards major-conference teams in the east compared to the west. (It's my understanding that our friends in women's soccer have similar issues with RPI.) Using a win/loss-only system, compared to a points-based system in the Pablo structure, moves a small number of teams a large distance.

Bofa, as you know, I am a lazy theoretician for the most part. It would be interesting to see a list of the number of misclassifications of each sign for each team, and a comparison of the ratings from the full system and the W/L-only system. I know I could put the tables together myself, but you probably already have them sitting around in some form. :-) Want to send them to me?
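Since the equal-costs assumption is what makes "minimize cost" collapse into "minimize errors," here is a minimal Python sketch of that point. This is my own toy, not code from any of the systems discussed; the teams, ratings, and cost function are invented.

```python
# Toy cost-weighted misclassification count. A match is "misclassified"
# when the team rated lower won it.

def misclassification_cost(matches, ratings, cost_fn=None):
    """matches: list of (winner, loser); ratings: dict name -> rating.

    With cost_fn=None every error costs 1, so minimizing cost is the
    same as minimizing the error count. A cost_fn mimics the
    tornado-warning trade-off, where some errors are charged more.
    """
    total = 0.0
    for winner, loser in matches:
        if ratings[winner] < ratings[loser]:
            total += 1.0 if cost_fn is None else cost_fn(winner, loser)
    return total

# A rock-paper-scissors cycle, so at least one error is unavoidable.
matches = [("A", "B"), ("B", "C"), ("C", "A")]
ratings = {"A": 3, "B": 2, "C": 1}

print(misclassification_cost(matches, ratings))  # 1.0 with equal costs
# Charging errors that involve team A double (cf. the Harvard example):
print(misclassification_cost(matches, ratings,
                             lambda w, l: 2.0 if "A" in (w, l) else 1.0))  # 2.0
```

With equal costs the ordering above is as good as any; once errors involving team A cost double, an optimizer would prefer the ordering C > A > B, whose one unavoidable error involves only B and C.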
Post by mikegarrison on Nov 24, 2014 10:39:56 GMT -5
hebrooks87 wrote: "In general, minimizing misclassification costs involves assumptions about the cost of different kinds of errors. [...] Here, the assumption is that the costs are the same for all kinds of errors, so that minimizing cost is identical to minimizing the number of errors."

Another way to put that is that you can minimize the number of errors, or you can minimize the overall magnitude of error (usually RMS), or you can minimize the number of extremely large errors, or.... You have to make choices with this kind of thing.

And minimizing errors doesn't mean eliminating them. ANY ranking system is going to have individual teams that you can pick out and call an error. But so what? If you get 300 teams right, 20 teams pretty close, and 10 teams wrong, that's damn good.

I like Pablo better than this, though. I like judging a team on its strength at scoring points better than on its W/L record, because I think it tells us more useful information. But even with Pablo, there will be some number of teams that are probably misranked 1:1 because it was necessary to minimize the total error across all 330+ teams.
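mikegarrison's three options can be checked on a toy set of numbers (invented): the same errors score differently under an error count, an RMS, and a count of extremely large errors, so which ratings come out "best" depends on which objective you minimize.

```python
import math

# Signed rating errors for a handful of matches (numbers invented).
errors = [0, 50, 400, 400, 800]

n_wrong = sum(1 for e in errors if e != 0)                 # number of errors
rms = math.sqrt(sum(e * e for e in errors) / len(errors))  # overall magnitude
n_huge = sum(1 for e in errors if abs(e) > 500)            # extremely large errors

print(n_wrong, round(rms, 1), n_huge)  # 4 438.7 1
```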
Post by hebrooks87 on Nov 24, 2014 10:57:50 GMT -5
mikegarrison wrote: "I like judging a team on its strength at scoring points better than on its W/L record, because I think it tells us more useful information. But even with Pablo, there will be some number of teams that are probably misranked 1:1 because it was necessary to minimize the total error across all 330+ teams."

As I think about this more, one of the issues is structural. Minimizing errors with a dichotomous identification of dichotomous events (win/lose) is a more unstable process than minimizing errors with a probabilistic forecast (or identification) of a dichotomous event. The points-based predictor (which I also prefer) allows for smoother predictions and results: a team can almost win (or lose) and get some credit (or fault) for that result.
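The structural point can be sketched in a few lines (my framing, not hebrooks87's): score a win/loss outcome with a hard yes/no call versus a probability. The hard call jumps a full unit as the forecast crosses 50%, while a probabilistic (Brier-type) score moves smoothly, which is what lets an almost-win earn partial credit.

```python
# Hard dichotomous call vs. probabilistic score for a win/loss outcome.

def zero_one_loss(prob_win, won):
    call = prob_win >= 0.5  # dichotomous identification
    return 0.0 if call == won else 1.0

def brier(prob_win, won):
    return (prob_win - (1.0 if won else 0.0)) ** 2

for p in (0.45, 0.55, 0.95):
    print(p, zero_one_loss(p, won=True), brier(p, won=True))
# 0.45 -> loss 1.0, Brier ~0.30; 0.55 -> loss 0.0, Brier ~0.20;
# 0.95 -> loss 0.0, Brier ~0.0025
```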
Post by The Bofa on the Sofa on Nov 24, 2014 13:34:50 GMT -5
hebrooks87 wrote: "To me, one of the points of this kind of exercise is that it highlights the differences inherent in different ranking processes. [...] The departures from that agreement tell us something about the impacts of those assumptions."

To me, the point of this kind of exercise is to demonstrate the folly of the "the rankings should at least reflect who won or lost" comments I hear, either as an objection to Pablo or as a justification for RPI (at least the concept of it). We can do that. We can create rankings that do an amazing job of reflecting who won or lost - at least, much better than the ones we use. And for good reason: THIS is what you get when you focus on reflecting simple wins and losses as well as possible. You get a lot of crazy results.

I can send you the set of rankings that I used to create that assessment table - that's all I have. Figuring out the number of misclassifications for each team is a little bit harder.

I'm doing an HCA-free method now. It's a lot harder than including an HCA (success right now is something like 87.1%; it's a difference of fewer than 10 matches, though).
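For concreteness, here is a rough sketch of the success metric being maximized, with and without a single home-court constant. This is my reconstruction, not Bofa's code: the HCA value of 159 and the OU results come from the thread, while SMU's rating and the rest of the scaffolding are invented.

```python
# Fraction of matches where the higher "effective" rating won; the home
# team gets a constant bump. An HCA-free method just sets hca=0.

def fraction_correct(matches, ratings, hca=0.0):
    """matches: list of (home, away, home_won) triples."""
    correct = 0
    for home, away, home_won in matches:
        predicted_home_win = ratings[home] + hca > ratings[away]
        correct += predicted_home_win == home_won
    return correct / len(matches)

matches = [("OU", "Kansas", True),   # OU's home win over Kansas
           ("SMU", "OU", True),      # OU's road loss at SMU
           ("Texas", "OU", False)]   # OU's road win at Texas
ratings = {"OU": 6170, "Kansas": 6165, "Texas": 6940, "SMU": 6100}

print(fraction_correct(matches, ratings, hca=0))    # ~0.33
print(fraction_correct(matches, ratings, hca=159))  # ~0.67: the SMU loss now
# classifies correctly, while the win at Texas remains an error
```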
Post by The Bofa on the Sofa on Nov 24, 2014 14:40:19 GMT -5
mikegarrison wrote: "Another way to put that is that you can minimize the number of errors, or you can minimize the overall magnitude of error (usually RMS), or you can minimize the number of extremely large errors, or...."

This is why something like Massey (or the W/L version of Pablo) is a much better way to do this. Under RMS, you would rather have two matches with errors of 400 points each than one match with an error of 0 and another with an error of 800. In this approach, if you have a win over #20 and a loss to #140, it's better to be ranked either above #20 or below #140. In the Pablo W/L approach, you'd put them at about #80, even if you don't account for points. In the full Pablo approach, you'd put them between the two, closer to whichever team they had the closer match against.

When I did the old W/L approach (and it's not hard, I just have to do another run, which takes time), the number of correct results was maybe 2% higher than full Pablo, so up to something like 84% instead of 82%. Definitely better, but not up to the Ultimate Ranking System, which got up to 88.5%.
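Bofa's RMS numbers check out directly:

```python
import math

# Under RMS, two 400-point misses beat one perfect match plus one
# 800-point miss.
def rms(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(rms([400, 400]))  # 400.0
print(rms([0, 800]))    # ~565.7, so RMS prefers the two medium misses
```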
Post by pogoball on Nov 25, 2014 0:40:51 GMT -5
The Bofa on the Sofa wrote: "And for good reason: THIS is what you get when you focus on reflecting simple wins and losses as well as possible. You get a lot of crazy results."

To make sure I understand: when you say crazy results, are you saying that when the URS is wrong, it can be more wrong than, say, Pablo? That is, it will outperform Pablo in accuracy (88 to 82), but it may have a winner ranked considerably lower than a loser when it is wrong, whereas Pablo may have fewer accurate predictions, but when it is wrong, it is predictably wrong (the teams will be ranked closer together)?
Post by The Bofa on the Sofa on Nov 25, 2014 8:57:37 GMT -5
pogoball wrote: "...it will outperform Pablo in accuracy (88 to 82), but it may have a winner ranked considerably lower than a loser when it is wrong, whereas Pablo may have fewer accurate predictions, but when it is wrong, it is predictably wrong (the teams will be ranked closer together)?"

For Pablo, let's focus on the W/L version (since full Pablo is a completely different concept and not really comparable). The answer to the above is basically yes, but it goes both ways: you will also be ranked closer than you should be to teams you beat. A team that beats #20 and loses to #120 is going to be put just ahead of #20, the team it beat, but way ahead of #120, the team it lost to (or just behind #120 and way behind #20). However, if you look at those two matches and what they tell you, you'd think the team belongs somewhere between #20 and #120. But that's not what the URS is going to do. That's why I say it's going to give us crazy results. The problem is that once it becomes apparent to the system that the team needs to be ranked above #120, that outcome doesn't matter any more.

Or look at it this way: given two extreme results, one at each end, instead of taking the average like Massey or Pablo, the URS is going to pick one extreme, treat it as legit, and ignore the other. The reason this is legitimate is that there is a 50/50 chance of whether it goes with the extreme high or the extreme low, so across the overall mix half go up, half go down, and the superposition lands somewhere in between - but for an individual team, you don't get the shades of gray.
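A toy count (mine, with ranks standing in for ratings) shows why the optimizer heads for an extreme in the beat-#20, lost-to-#120 case: every rank in the middle leaves two misclassified results, while either extreme leaves only one.

```python
# Team X beat the #20 team and lost to the #120 team (smaller rank =
# better). Count X's misclassified results at each candidate rank.

def n_errors(rank_x, beat_rank=20, lost_rank=120):
    e = 0
    if rank_x > beat_rank:   # ranked below a team it beat
        e += 1
    if rank_x < lost_rank:   # ranked above a team it lost to
        e += 1
    return e

for r in (10, 80, 130):
    print(r, n_errors(r))  # 10 -> 1, 80 -> 2, 130 -> 1
```

The Massey/Pablo-style middle answer (around #80) is the worst spot by raw error count, and which extreme the optimizer lands on is essentially a coin flip.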
Post by The Bofa on the Sofa on Nov 25, 2014 15:00:11 GMT -5
Variations on the theme, Part 1: No HCA

 1 Stanford 7360
 2 Texas 6940
 3 Washington 6895
 4 Wisconsin 6890
 5 Illinois 6865
 5 North Carolina 6865
 7 Florida State 6765
 8 Nebraska 6745
 9 Penn State 6740
10 Oregon 6555
10 Colorado 6555
10 Arizona 6555
13 Florida 6540
14 Colorado State 6465
15 BYU 6460
15 Loyola Marymount 6460
17 UCLA 6455
18 Kentucky 6370
19 Pittsburgh 6365
19 Santa Clara 6365
19 Duke 6365
22 Miami-FL 6310
23 Iowa State 6270
24 Virginia 6245
24 Louisville 6245
24 Oregon State 6245
24 Texas A&M 6245
24 Arizona State 6245
29 USC 6240
30 Kansas State 6215
31 Oklahoma 6170
32 Kansas 6165
32 Utah 6165
34 Purdue 6135
34 Ohio 6135
34 Minnesota 6135
34 Ohio State 6135
38 Long Beach State 6125
39 West Virginia 6095
40 Seton Hall 6090
40 Creighton 6090
42 Alabama 6040
43 Marquette 5980
44 Cal State Northridge 5950
44 San Diego 5950
44 LSU 5950
44 Hawaii 5950
48 UCF 5945
49 LIU Brooklyn 5930
50 Michigan State 5925
50 Pacific 5925
Post by The Bofa on the Sofa on Nov 25, 2014 15:01:01 GMT -5
Variations on the theme, Part 2: With HCA = 159, but with more recent matches weighted more heavily

 1 Stanford (stan) 7255
 2 Wisconsin (wi) 6885
 3 Texas (tx) 6835
 4 Washington (wa) 6800
 5 Illinois (il) 6720
 6 North Carolina (nc) 6710
 7 Colorado (co) 6640
 8 Florida State (fs) 6605
 9 Penn State (ps) 6560
10 Duke (duke) 6450
10 Oregon (or) 6450
12 Florida (fl) 6445
13 Arizona (az) 6420
14 Nebraska (nebr) 6405
15 Loyola Marymount (lm) 6355
16 Colorado State (csu) 6285
17 BYU (byu) 6280
18 UCLA (ucla) 6255
19 Kansas State (ks) 6235
20 Santa Clara (sanc) 6195
21 Iowa State (is) 6190
22 Oregon State (ors) 6185
23 USC (usc) 6160
24 Kentucky (ky) 6155
25 Texas A&M (txam) 6130
25 UNLV (nvlv) 6130
27 Arizona State (asu) 6105
28 Utah (utah) 6100
28 Miami-FL (miam) 6100
30 Long Beach State (lbs) 6085
31 Kansas (k) 6080
32 Creighton (crei) 6070
33 San Diego (sd) 6030
34 Oklahoma (ok) 6025
35 LIU Brooklyn (li) 5975
36 Alabama (al) 5970
37 Ohio (ou) 5965
38 Minnesota (mn) 5960
39 Seton Hall (seth) 5955
40 Washington State (was) 5945
41 LSU (lsu) 5920
42 Purdue (purd) 5915
43 Marquette (marq) 5910
43 Pittsburgh (pitt) 5910
45 Hawaii (hi) 5895
46 Illinois State (ils) 5890
46 Ohio State (osu) 5890
48 Pacific (pac) 5870
48 Northern Illinois (nil) 5870
50 Cal State Northridge (csn) 5860
50 Arkansas-Little Rock (alr) 5860
Post by The Bofa on the Sofa on Nov 25, 2014 17:07:54 GMT -5
Assessment (model, correct, fraction correct):

RPI Raw      3728   0.823139766
RPI HCA      3761   0.830426143
Full Pablo   3734   0.824464562
UPR          3971   0.876793994
UPR No HCA   3826   0.844778097
UPR Time     3957   0.873702804
So, not surprisingly, we sacrifice a bit when emphasizing the later part of the season, but the performance is still good. You can get a lot better with an HCA.
Post by bluepenquin on Nov 25, 2014 17:25:50 GMT -5
The Bofa on the Sofa wrote: "So, not surprisingly, we sacrifice a bit when emphasizing the later part of the season, but the performance is still good. You can get a lot better with an HCA."

Also not surprised - I think time is usually overstated before one does the analysis. I've never liked "who is hot" or "what was their record in the last 10 matches." Wonder how much RPI Raw would improve with an HCA (3.1%)?
Post by The Bofa on the Sofa on Nov 25, 2014 17:36:56 GMT -5
bluepenquin wrote: "Wonder how much RPI Raw would improve with an HCA (3.1%)?"

The RPI Raw is RKPI straight from the horse's mouth. RPI HCA is RKPI with an empirical HCA added in to maximize the success (.0291).

As for the time aspect: I've looked at this in terms of predictability of outcomes. There is very little time dependence through the season until basically November, at which point it ramps up significantly. The time function I use in Pablo is something like a half-life of 200 days from the beginning of the season to November (a match played 200 days ago counts half of what one played now does), but a half-life of 56 days starting Nov 1. This is based on analysis of past years' results. Note that even the 56-day half-life starting in November is still longer than what I used to use, which was 48 days for the whole season. But an analysis of results shows that teams don't change nearly that much during the year.

In the NFL, however, they do. The half-life for an NFL outcome is something like 4 weeks. That means that if two teams play at the beginning of the season and again at the end, who won the first meeting tells you almost nothing by the time they play again. What matters is the home field.
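The time weighting Bofa describes can be written down directly. The two half-lives (200 days, then 56 days starting Nov 1) are from his post; exactly how the two regimes combine at the breakpoint is my assumption.

```python
# Weight of a past match under a piecewise half-life.

def match_weight(days_ago, days_since_nov1):
    fast = min(days_ago, max(days_since_nov1, 0))  # days aged at the 56-day rate
    slow = days_ago - fast                         # days aged at the 200-day rate
    return 0.5 ** (fast / 56.0) * 0.5 ** (slow / 200.0)

print(match_weight(200, 0))  # 0.5 -- a 200-day-old match counts half
print(match_weight(56, 56))  # 0.5 -- after Nov 1 the discounting speeds up
```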
Post by mikegarrison on Nov 25, 2014 17:37:29 GMT -5
bluepenquin wrote: "Also not surprised - I think time is usually overstated before one does the analysis. [...]"

Well, I'm not surprised either, but not for the same reason you are. If you design the ranking system to reflect a certain input (time-weighted wins) and then measure it against something else (raw wins), then of course it probably won't score as high as something that used the same input you are measuring against. It's like wanting to rank-order everybody in a room by height when all I have is a scale: I can weigh them all and figure the taller people usually weigh more, but that's not going to give me as good a ranking by height as using actual height as my input.

What's interesting is the HCA. Bofa, when you say that with HCA you get a better fit to the input than without it, does that mean you match 87% of the raw wins/losses with your HCA-adjusted methodology, or does it mean you match 87% of the HCA-adjusted wins/losses with your HCA-adjusted methodology?
Post by gogophers on Nov 25, 2014 17:52:49 GMT -5
I won't pretend to understand any of this, but maybe this is a good time to ask a question about HCA. Pablo assumes, if I understand it correctly, that the HCA is the same for all teams playing at home. But I would think that certain teams, by virtue of location (e.g., mountain schools) or attendance, would have markedly better HCAs than other schools. I can understand the difficulty of crafting an HCA specific to a team. But - and here, finally, is the question - does using an HCA that is, out of necessity, the same for everyone create wrong predictions to a significant degree?
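One hypothetical way to pose gogophers' question in code (this is not how Pablo works, as far as the thread says; everything here is invented): give each team its own home bump, falling back to a single global constant.

```python
# Per-team home-court advantage, defaulting to a global value.

def predict_home_win(ratings, home, away, hca_by_team=None, global_hca=159.0):
    bump = (hca_by_team or {}).get(home, global_hca)
    return ratings[home] + bump > ratings[away]
```

The usual catch is sample size: a team plays only around 15 home matches a season, so per-team estimates would be noisy, and in practice you would shrink them toward the global value rather than trust them outright.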
Post by bluepenquin on Nov 25, 2014 17:59:02 GMT -5
mikegarrison wrote: "If you design the ranking system to reflect a certain input (time-weighted wins) and then measure it against something else (raw wins), then of course it probably won't score as high as something that used the same input you are measuring against. [...]"

Kind of realized my error once I posted - I was thinking in terms of time and predicting forward, not fitting backwards. But then, after thinking about it, I don't really understand how the formula could work for time going backwards. If it's done the way you describe, then the time weighting would not be necessary, and I'm then surprised that the two are that close.