Calibration of current and new AIs #74
Hi, I have played against the Calibrated Rank AI in KaTrain version 1.2 six times now, at various strengths (10, 11, 12, 14 kyu), and they seem quite good. I used to play against the "P-Pick" settings in version 1.1, but those would always outplay me in the middlegame and then go insane in the endgame. The Calibrated Rank AI is definitely more consistent: less dominant in the middlegame, less crazy in the endgame. More like a human. The only thing that still feels funny or "artificial" to me now (but I am a weak player) is that during the fuseki the AI tenukis a whole lot. I know players like me have the opposite problem, failing to tenuki enough, but the AI really jumps around. I am sure there will always be minor details and fine-tuning, but you have achieved a whole lot here. The Calibrated Rank AI represents, as nearly as I can tell, both a strong step forward and a pretty good approximation of what a "real" go game ought to feel like.
Yes, this is due to the pick-based algorithm inherently being a bit tenuki-happy, more so if there are fewer moves to choose from. I've played the 0-4 kyu ones a bit and they feel more balanced in that respect. Nevertheless, the AI sees the opening as having a large number of decent moves, as seen by e.g. opening on tengen in a teaching game.
Hi, could you point me to where in the program the code is that calls the "save an SGF file" operations? I would like to see if I can make the Save command write two files: one the normal SGF, and one a specific file with just the details that I want, formatted in a particular way, that I can read into a different program.
Set it to the latest 20b net for a day to see what happens; the stronger bots run away with their rank a bit. SGFs are in the sgf_20b directory. @bale-go
Interesting. It seems that as you get closer to the rank of the pure policy network, the correctness of the move rank estimation becomes really important.
With the 15b net, bots with ranks weaker than 18k weren't stable at all, if I remember correctly. Maybe now it would be possible to have consistent bots for beginners, using the 20b net?
I do not think much would change at the weaker rank region.
I think at higher strengths you have a lot more chance to hit the top move / near the top move several times in a row and play out a consistent strategy. The lower ones pick like 8 moves? You're kind of stuck choosing the 'least bad' move then, which the policy was definitely not trained for.
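The pick-based selection being discussed can be sketched roughly as follows. This is a minimal illustration, not KaTrain's actual implementation; the function name and the dict-based policy representation are assumptions:

```python
import random

def pick_move(policy, n_picks=8):
    """Sample n_picks candidate moves uniformly from the legal moves,
    then play the one with the highest policy value. With only a few
    picks, the bot is often stuck choosing the 'least bad' of its
    samples, which the policy network was not trained for."""
    legal = [m for m, p in policy.items() if p >= 0]  # -1 marks illegal moves
    picks = random.sample(legal, min(n_picks, len(legal)))
    return max(picks, key=lambda m: policy[m])
```

With a small `n_picks` the sampled pool rarely contains any of the policy's top choices, which is one way to see why weak pick-based bots tenuki so often: the locally urgent move is simply not in the pool.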
True, that's why I still think that if the bot was able to see the whole board with 1 visit per candidate move and chose one that would fit a given rank, that would be ideal.
I'm having a great time playing against Calibrated Rank. I modified the three files that you named, so that when I save an SGF, the normal one is saved and also a pipe-delimited text file that I can read into Excel. The text file shows the moves, points lost, score, etc. A couple of questions about that:

(a) On a game that I won, my average points lost per move was 0.79, and on another game (that I lost) it was 1.10. Don't those seem low? (My opponent was Calibrated Rank at 13 kyu, and we seem evenly matched currently. Shouldn't we be losing more points per move on average?)

(b) Another thing is just housekeeping. Within the text files, when you go from a given score and then add/subtract the impact of a player's move, you should get the next printed score, and so on. But the arithmetic on that is only approximate in the generated file; it fluctuates, and we're not just talking about rounding very small differences. Do you know what might be making the arithmetic from move to move less than exact?
Yeah, mean point loss seems too low for those ranks. I'm not sure what you mean by (b).
Looking over the game where my mean point loss was just 0.79, it was a pretty "smooth" game. Sometimes at my level, though, there will be a big move that both sides don't notice for a long time, causing every move to lose several points until finally somebody plays the big move. In a case like that our mean point loss per move would be way higher. Maybe the 0.79 is partly justified because neither player had any large mess-ups like that.

Concerning (b), I didn't express it right, and now I see more clearly what's up. In the file game_node.py you have an SGF file show "Estimated point loss" for a move whenever "if sgf and points_lost > 0.5" is true. I kept that, but for my second output file I wanted an "Estimated point loss" no matter what, so I changed that condition to "if sgf and points_lost != 999.000". Not sure if any of this is of interest or helpful, but I'm mentioning it just in case there's something useful there.
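The kind of pipe-delimited export described above could be sketched like this; the function and field names are illustrative, not KaTrain's actual attributes:

```python
def moves_to_pipe_text(moves):
    """Format per-move records as pipe-delimited lines that Excel can
    import directly. Each record is a dict with hypothetical keys:
    'move' (GTP coordinate), 'points_lost', and 'score'."""
    header = "move|points_lost|score"
    rows = [f"{m['move']}|{m['points_lost']:.2f}|{m['score']:.1f}" for m in moves]
    return "\n".join([header] + rows)
```

Writing the resulting string to a second file next to the SGF at save time would reproduce the workflow described, without touching the normal SGF output.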
I seriously doubt you can divide a fit on 4 points of noisy data into 2 regions. Happy to see it reasonably consistent though.
I used the user data to estimate the change in the outlier-free move rank. The new bot uses the overall kyu rank estimation from the tested calibrated rank bot (the first part of the equation), but I modified the shape of the curve to mimic human play better (starting with 0.3116). The equation should scale to various board sizes, although it was only tested on 19x19.
How do you relate the outlier-free mean to the n_moves + override that you give the bot?
Outlier-free mean (OFM): OFM = 0.0630149 + 0.762399 * NLMB/NMSK. If you are interested, I can upload the data (100,000 runs for each (NMSK, NLMB) pair to get the OFM) that I used for the symbolic regression.
I don't understand NMSK, but as long as you took this into account, let's give it a spin on OGS.
May I ask what "outlier free mean" means? Does it mean the mean, ignoring a certain arbitrary percent of the highest and lowest values? Or does it ignore any values that are more than, say, 3 standard deviations from the mean? Something like that?
@sanderland In other words, the p:pick algorithm sees only NMSK moves out of the total NLMB. @SimonLewis7407 It ignores the best and worst 20%.
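Based on that description, the outlier-free mean is a trimmed mean; a minimal sketch, where the 20% trim fraction follows the comment above:

```python
def outlier_free_mean(values, trim=0.2):
    """Drop the lowest and highest `trim` fraction of the sorted values,
    then average what remains. Unlike the median, the result is not
    restricted to the values present in the data, so it can distinguish
    e.g. a mean move rank of 2.6 from 3.4."""
    vals = sorted(values)
    k = int(len(vals) * trim)
    kept = vals[k:len(vals) - k] if len(vals) > 2 * k else vals
    return sum(kept) / len(kept)
```

This sits between the mean (trim=0, sensitive to outliers) and the median (trim approaching 0.5), which matches the "compromise between mean and median" description below.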
@SimonLewis7407 It's some compromise between mean and median that @bale-go likes to use. I think possibly just using the median could be better, insofar as it's not a new invention and is very close to this anyway.
Yes, the median was my first choice too. The problem with it is that it can only be an integer. At higher ranks there is a significant difference between a mean move rank of 2.6 and 3.4, but the median would give 3.0 to both. The complexity has definitely increased. I checked the equations with random sampling of the input variables and they behaved well.
A lot of games end with >5d rank estimates due to that.
I suggest we cap our rank estimate at 4d and just show a '4d+' label if the top end is that.
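One way to sketch the proposed cap; the kyu-to-dan conversion convention used here (0 kyu = 1 dan, negative kyu counting further into dan) is an assumption for illustration:

```python
def rank_label(kyu_rank, cap_dan=4):
    """Format a rank estimate, showing 'Nd+' once it reaches the cap.
    Convention assumed: kyu_rank <= 0 is treated as dan,
    with 0 -> 1d, -1 -> 2d, and so on."""
    if kyu_rank > 0:
        return f"{round(kyu_rank)}k"
    dan = 1 - round(kyu_rank)
    return f"{cap_dan}d+" if dan >= cap_dan else f"{dan}d"
```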
That looks great!
Nice!
All released - the strength models for all the AIs are a bit rushed, so probably there's quite a bit of room for improvement.
@sanderland Where in the code are you able to prevent the calibrated bot from making illegal moves for the selected rule set? The mobile version is great, but being unable to exclude illegal moves for non-Tromp-Taylor rulesets from the pool of possible moves might be making it just a hair weaker than your version. Thanks!
The policy should be -1 for illegal moves.
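Given that convention, filtering the candidate pool so an illegal move can never be picked might look like the following sketch; the list-based policy layout is an assumption:

```python
def mask_illegal(policy):
    """The policy marks illegal moves with -1; zero those entries out
    and renormalize the rest, so that any sampling over the returned
    distribution can never select an illegal move."""
    masked = [max(p, 0.0) for p in policy]
    total = sum(masked)
    return [p / total for p in masked] if total > 0 else masked
```

The renormalization step keeps the remaining probabilities summing to 1, so pick-based selection over the masked policy behaves the same as before except that illegal moves have zero weight.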
Simplified still needs this, but compute is too low to do it.
Make an AI that mimics human play and is not just policy-based.