-
I should also clarify: if you want a sense of what these numbers actually "mean" (in reference to something like chess), the current multiplier we're using for all ratings is
-
Could you clarify how you would incorporate osutrack data, and what about players who do not have any data on the website? Also, I think adjusting ratings can be done, but at a less frequent rate, such as yearly. Admittedly, I'm not too experienced with rating systems such as this one, but I don't think ratings will deviate far enough from the median. Maybe use a metric to automatically adjust ratings according to percentile, as in the sketch below? Although that could be flawed if, again, the playerbase changes too rapidly.
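To make the percentile idea a bit more concrete, here is a minimal sketch (in Python) of what a yearly recentering pass could look like. Everything here is an assumption for illustration -- the anchor median of `15`, the constant-shift approach, and the function names are all hypothetical, not anything the system actually does.

```python
import statistics

# Hypothetical yearly recentering pass: shift every rating by a constant
# so that the population median returns to a fixed anchor value.
# TARGET_MEDIAN = 15.0 is an assumed anchor, chosen only for illustration.

TARGET_MEDIAN = 15.0

def recenter(ratings: dict[str, float]) -> dict[str, float]:
    """Shift all ratings uniformly so their median equals TARGET_MEDIAN."""
    offset = TARGET_MEDIAN - statistics.median(ratings.values())
    return {player: rating + offset for player, rating in ratings.items()}

# Example: the median has drifted up to 17, so everyone shifts down by 2.
print(recenter({"a": 10.0, "b": 17.0, "c": 24.0}))
# {'a': 8.0, 'b': 15.0, 'c': 22.0}
```

Note that a uniform shift preserves all rating gaps, which is also why it inherits the caveat above: if the playerbase itself changes, a moving median is not necessarily a sign that individual ratings drifted.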
-
I think the range of 225-1350 makes sense for starting ratings. Ideally, a change in initial rating should not affect someone's rating after multiple tournaments -- it might be worth ignoring osutrack data completely just to stay consistent across all players.
-
Okay, here's one piece of systematic data so we can point to it later. Here are the rough star ratings for the Quarterfinals pools in each of the rank ranges' world cups (which I think is an okay benchmark for the typical difficulty of the rank range at the time -- choosing QF here because that's near the "average" pool that people would play):
These numbers are stratified enough that I think we don't actually need any further correction to "old player initial ratings" other than the decay that we're already doing (though I think how exactly decay is calculated could still use some discussion later). If anyone thinks there's a problem with how I'm thinking about this / wants to gather their own data, feel free to add it in a reply here!
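On the decay point: since the thread doesn't pin down how decay is calculated, here is one common scheme from Bayesian rating systems, purely as a reference for that later discussion -- inactivity inflates a player's volatility rather than directly cutting their rating. The constants below are hypothetical, not taken from the actual implementation.

```python
# One common decay scheme (not necessarily what this system does):
# an inactive player's rating stays put, but their volatility grows for
# each rating period they sit out, capped at the starting volatility.

INITIAL_VOLATILITY = 5.0   # matches the starting sigma described in this thread
DECAY_PER_PERIOD = 0.5     # hypothetical volatility gain per inactive period

def decayed_volatility(sigma: float, inactive_periods: int) -> float:
    """Grow sigma with inactivity, never exceeding the initial volatility."""
    return min(INITIAL_VOLATILITY, sigma + DECAY_PER_PERIOD * inactive_periods)

# A player who had settled at sigma = 2 drifts back toward full uncertainty:
print([decayed_volatility(2.0, n) for n in range(8)])
# [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.0]
```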
-
Starting a discussion for this to keep some better records as we build this in over the next few weeks. Thoughts very welcome!
Overall, the point of starting people out at different ratings is that a tournament player's starting rank is significant prior knowledge that should be taken into account when we have a Bayesian rating algorithm. While everyone begins with some rating volatility, it is best to assume up front that a generic rank 1000 player beats a generic rank 100000 player a significant fraction of the time; this makes the "mixing" of ratings converge faster.
At the current moment, we only have a formula coded in to fit osu!std. Ignoring the multiplier scaling, players start off with a standard deviation (volatility) of `5` and a rating of `45 - 3.2 ln(rank)`, clipped from below and above at `5` and `30` (a minimal sketch of this initialization is at the end of this post). The reasons for this choice are the following:

- Across the players in our data, `ln(rank)` was essentially normally distributed. Thus a formula like this would give a starting distribution of ratings which is close to bell-shaped.
- Furthermore, the Plackett-Luce model documentation recommends an initialization of `sigma = mu/3`, and indeed a rank 12000 player starts at a rating around `15`.
- The resulting spread of starting ratings (typical values around `20` and `22`) looked good after processing data. (If our spread had been too wide, we would have seen high initial ratings drop quickly and low initial ratings climb significantly, because we were too overconfident in high-ranked players' performance.)

However, there are a few problems we'll want to address as we continue thinking about this:
Any thoughts on any of these points would be appreciated -- I'm making this a single discussion point because all of these questions are sort of connected and answers to any of them would be useful for others.
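For concreteness, here is a minimal sketch (in Python) of the initialization described above. The constants -- `45`, `3.2`, the `5`/`30` clips, and the flat starting volatility of `5` -- come straight from this post; the function names and example ranks are just for illustration, not the processor's actual API.

```python
import math

def initial_rating(rank: int) -> float:
    """Starting rating for osu!std: 45 - 3.2 ln(rank), clipped to [5, 30].

    Multiplier scaling is ignored here, as in the description above.
    """
    return min(30.0, max(5.0, 45.0 - 3.2 * math.log(rank)))

# Flat starting volatility. This agrees with the recommended sigma = mu/3
# at the "typical" starting rating of 15 (roughly a rank 12000 player).
INITIAL_VOLATILITY = 5.0

for rank in (100, 1_000, 12_000, 100_000):
    print(f"rank {rank:>7}: mu = {initial_rating(rank):5.2f}, sigma = {INITIAL_VOLATILITY}")
# rank     100: mu = 30.00, sigma = 5.0
# rank    1000: mu = 22.90, sigma = 5.0
# rank   12000: mu = 14.94, sigma = 5.0
# rank  100000: mu =  8.16, sigma = 5.0
```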