Calibration of current and new AIs #74
Hi, I have played against the Calibrated Rank AI in KaTrain version 1.2 six times now, at various strengths (10, 11, 12, 14 kyu), and they seem quite good. I used to play against the "P-Pick" settings in version 1.1, but those would always outplay me in the middlegame and then go insane in the endgame. The Calibrated Rank AI is definitely more consistent: less dominant in the middlegame, less crazy in the endgame. More like a human. The only thing that still feels funny or "artificial" to me now (but I am a weak player) is that during the fuseki the AI tenukis a whole lot. I know players like me have the opposite problem, failing to tenuki enough, but the AI really jumps around. I am sure there will always be minor details and fine-tuning, but you have achieved a whole lot here. The Calibrated Rank AI represents, as nearly as I can tell, both a strong step forward and a pretty good approximation of what a "real" go game ought to feel like.
Yes, this is due to the pick-based algorithm inherently being a bit tenuki-happy, more so if there are fewer moves to choose from. I've played the 0-4 kyu ones a bit and they feel more balanced in that respect. Nevertheless, the AI sees the opening as having a large number of decent moves, as seen by e.g. opening on tengen in a teaching game.
Hi, could you point me to where in the program the code is that calls the "save an SGF file" operations? I would like to see if I can make the Save command write two files: one the normal SGF, and one a specific file with just the details that I want, formatted in a particular way, that I can read into a different program.
Set it to the latest 20b net for a day to see what happens; the stronger bots run away with their rank a bit. SGFs are in the sgf_20b directory. @bale-go
Interesting. It seems that as you get closer to the rank of the pure policy network, the correctness of the move rank estimation becomes really important.
With the 15b net, bots with ranks weaker than 18k weren't stable at all, if I remember correctly. Maybe now it would be possible to have consistent bots for beginners, using the 20b net?
I do not think much would change at the weaker rank region.
I think at higher strengths you have a lot more chance to hit the top move / near the top move several times in a row and play out a consistent strategy. The lower ones pick like 8 moves? You're kind of stuck choosing the 'least bad' move then, which the policy was definitely not trained for.
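The pick-based selection being discussed can be sketched roughly as follows. This is a minimal illustration, not KaTrain's actual implementation; the function name and the dict-based policy representation are assumptions:

```python
import random

def pick_move(policy, n_picks=8):
    """Sample n_picks candidate moves uniformly from the legal moves,
    then play the one with the highest policy value. With only a few
    picks, the bot is often stuck choosing the 'least bad' of its
    samples, which the policy network was not trained for."""
    legal = [m for m, p in policy.items() if p >= 0]  # -1 marks illegal moves
    picks = random.sample(legal, min(n_picks, len(legal)))
    return max(picks, key=lambda m: policy[m])
```

With a small `n_picks` the sampled pool rarely contains any of the policy's top choices, which is one way to see why weak pick-based bots tenuki so often: the locally urgent move is simply not in the pool.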
True, that's why I still think that if the bot was able to see the whole board with 1 visit per candidate move and chose one that would fit a given rank, that would be ideal.
I'm having a great time playing against Calibrated Rank. I modified the three files that you named, so that when I save an SGF, the normal one is saved and also a pipe-delimited text file that I can read into Excel. The text file shows the moves, points lost, score, etc. A couple of questions about that:

(a) On a game that I won, my average points lost per move was 0.79, and on another game (that I lost) it was 1.10. Don't those seem low? (My opponent was Calibrated Rank at 13 kyu, and we seem evenly matched currently. Shouldn't we be losing more points per move on average?)

(b) Another thing is just housekeeping. Within the text files, when you go from a given score and then add/subtract the impact of a player's move, you should get the next printed score, and so on. But the arithmetic on that is only approximate in the generated file; it fluctuates, and we're not just talking about rounding very small differences. Do you know what might be making the arithmetic from move to move less than exact?
Yeah, mean point loss seems too low for those ranks. I'm not sure what you mean by (b).
Looking over the game where my mean point loss was just 0.79, it was a pretty "smooth" game. Sometimes at my level, though, there will be a big move that both sides don't notice for a long time, causing every move to lose several points until finally somebody plays the big move. In a case like that our mean point loss per move would be way higher. Maybe the 0.79 is partly justified because neither player had any large mess-ups like that.

Concerning (b), I didn't express it right, and now I see more clearly what's up. In the file game_node.py you have an SGF file show "Estimated point loss" for a move whenever "if sgf and points_lost > 0.5" is true. I kept that, but for my second output file I wanted an "Estimated point loss" no matter what, so I changed that condition to "if sgf and points_lost != 999.000". Not sure if any of this is of interest or helpful, but I'm mentioning it just in case there's something useful there.
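The kind of pipe-delimited export described above could be sketched like this; the function and field names are illustrative, not KaTrain's actual attributes:

```python
def moves_to_pipe_text(moves):
    """Format per-move records as pipe-delimited lines that Excel can
    import directly. Each record is a dict with hypothetical keys:
    'move' (GTP coordinate), 'points_lost', and 'score'."""
    header = "move|points_lost|score"
    rows = [f"{m['move']}|{m['points_lost']:.2f}|{m['score']:.1f}" for m in moves]
    return "\n".join([header] + rows)
```

Writing the resulting string to a second file next to the SGF at save time would reproduce the workflow described, without touching the normal SGF output.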
I seriously doubt you can divide a fit on 4 points of noisy data into 2 regions. Happy to see it reasonably consistent though.
I used the user data to estimate the change in the outlier-free move rank. The new bot uses the overall kyu rank estimation from the tested calibrated rank bot (the first part of the equation), but I modified the shape of the curve to mimic human play better (starting with 0.3116). The equation should scale to various board sizes, although it was only tested on 19x19.
How do you relate the outlier-free mean to the n_moves + override that you give the bot?
Outlier-free mean (OFM): OFM = 0.0630149 + 0.762399 * NLMB/NMSK. If you are interested, I can upload the data (100,000 runs for each (NMSK, NLMB) pair to get the OFM) that I used for the symbolic regression.
I don't understand NMSK, but as long as you took this into account, let's give it a spin on OGS.
May I ask what "outlier free mean" means? Does it mean the mean, ignoring a certain arbitrary percent of the highest and lowest values? Or does it ignore any values that are more than, say, 3 standard deviations from the mean? Something like that?
@sanderland In other words, the p:pick algorithm sees only NMSK moves out of the total NLMB. @SimonLewis7407 It ignores the best and worst 20%.
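Based on that description, the outlier-free mean is a trimmed mean; a minimal sketch, where the 20% trim fraction follows the comment above:

```python
def outlier_free_mean(values, trim=0.2):
    """Drop the lowest and highest `trim` fraction of the sorted values,
    then average what remains. Unlike the median, the result is not
    restricted to the values present in the data, so it can distinguish
    e.g. a mean move rank of 2.6 from 3.4."""
    vals = sorted(values)
    k = int(len(vals) * trim)
    kept = vals[k:len(vals) - k] if len(vals) > 2 * k else vals
    return sum(kept) / len(kept)
```

This sits between the mean (trim=0, sensitive to outliers) and the median (trim approaching 0.5), which matches the "compromise between mean and median" description below.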
@SimonLewis7407 It's some compromise between mean and median that @bale-go likes to use. I think possibly just using the median could be better, insofar as it's not a new invention and is very close to this anyway.
Yes, the median was my first choice too. The problem with it is that it can only be an integer. At higher ranks there is a significant difference between a mean move rank of 2.6 and 3.4, but the median would give 3.0 to both. The complexity has definitely increased. I checked the equations with random sampling of the input variables and they behaved well.
A lot of games end with >5d rank estimates due to that.
I suggest we cap our rank estimate at 4d and just show a '4d+' label if the top end is that.
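One way to sketch the proposed cap; the kyu-to-dan conversion convention used here (0 kyu = 1 dan, negative kyu counting further into dan) is an assumption for illustration:

```python
def rank_label(kyu_rank, cap_dan=4):
    """Format a rank estimate, showing 'Nd+' once it reaches the cap.
    Convention assumed: kyu_rank <= 0 is treated as dan,
    with 0 -> 1d, -1 -> 2d, and so on."""
    if kyu_rank > 0:
        return f"{round(kyu_rank)}k"
    dan = 1 - round(kyu_rank)
    return f"{cap_dan}d+" if dan >= cap_dan else f"{dan}d"
```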
That looks great!
Nice!
All released - the strength models for all the AIs are a bit rushed, so probably there's quite a bit of room for improvement.
@sanderland Where in the code are you able to prevent the calibrated bot from making illegal moves for the selected rule set? The mobile version is great, but being unable to exclude illegal moves for non-Tromp-Taylor rulesets from the pool of possible moves might be making it just a hair weaker than your version. Thanks!
The policy should be -1 for illegal moves.
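Given that convention, filtering the candidate pool so an illegal move can never be picked might look like the following sketch; the list-based policy layout is an assumption:

```python
def mask_illegal(policy):
    """The policy marks illegal moves with -1; zero those entries out
    and renormalize the rest, so that any sampling over the returned
    distribution can never select an illegal move."""
    masked = [max(p, 0.0) for p in policy]
    total = sum(masked)
    return [p / total for p in masked] if total > 0 else masked
```

The renormalization step keeps the remaining probabilities summing to 1, so pick-based selection over the masked policy behaves the same as before except that illegal moves have zero weight.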
Simplified still needs this, but compute is too low to do it.
Make an AI that mimics human play and is not just policy-based.