
sample N games at one time in replay_buffer #117

Conversation

mokemokechicken
Contributor

When there are many games (over about 2000) in the replay_buffer and a large batch_size (for example, over 256), the CPU usage of ReplayBuffer is nearly 100% in my environment.
The slow get_batch() makes training slow.

Sampling the games at one time reduces the CPU usage to around 50% and keeps training fast.

@logar16

logar16 commented Jan 15, 2021 via email

@mokemokechicken
Contributor Author

Hi @logar16,
thank you for your reply!

> Can you give metrics comparing before and after values?

Ok, I will show the metrics.

> How does this change affect the overall speed of training?

In the current implementation, sample_game() is called batch_size times in one get_batch().
Inside sample_game(), this code may be slow when len(self.buffer) is large:

            game_probs = numpy.array(
                [game_history.game_priority for game_history in self.buffer.values()],
                dtype="float32",
            )

In the patched implementation, this code runs only once per get_batch().
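The idea can be sketched roughly like this (a minimal sketch, not the exact patch; `sample_n_games`, the `buffer` layout, and the return shape are assumptions based on the snippet above):

```python
import numpy


def sample_n_games(buffer, n_games, rng=numpy.random):
    """Draw n_games games at once, weighted by game_priority.

    `buffer` maps game_id -> game_history; `game_priority` is assumed
    to be a positive float on each game_history, as in the snippet above.
    """
    game_ids = list(buffer.keys())
    # Built once per get_batch() instead of batch_size times.
    game_probs = numpy.array(
        [game_history.game_priority for game_history in buffer.values()],
        dtype="float32",
    )
    # Normalize in float64 so numpy.random.choice's sum-to-1 check passes.
    game_probs = game_probs.astype("float64")
    game_probs /= game_probs.sum()
    # One vectorized draw replaces batch_size separate sample_game() calls.
    selected = rng.choice(len(game_ids), n_games, p=game_probs)
    return [(game_ids[i], buffer[game_ids[i]], game_probs[i]) for i in selected]
```

Building `game_probs` is O(len(buffer)), so doing it once per batch instead of once per sample removes a factor of batch_size from that cost.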


This is not conclusive evidence, but the graph below shows how training slows down over time.
In the patched implementation, this slowdown no longer occurs.

[image: graph of training speed slowing down over time]

Later, I will report a comparison of the two implementations under the same conditions.

@mokemokechicken
Copy link
Contributor Author

I ran connect4 with the following environment and config.

  • Environment
    • GeForce GTX 1080
    • 8 CPUs, 64 GB memory
  • Config (changed values only)
    • num_simulations = 2 # for fast game generation
    • batch_size = 256
    • num_unroll_steps = 5
    • replay_buffer_size = 10000 # (original)

Red is before, blue is after.
After about 1k-1.5k LogSteps, the red run's training became slower (by about 30-50% in this case).
The ReplayBuffer size at around 1k-1.5k LogSteps is about 2k-3k.

[image: graph comparing training and self-play speed, before (red) vs. after (blue)]

※ I don't know why both self-plays became slower after 1.5k~2k LogSteps...

@ahainaut
Copy link
Collaborator

ahainaut commented Feb 9, 2021

@mokemokechicken
Thank you for this great feature!
It looks good to me.

@ahainaut ahainaut merged commit 97e4931 into werner-duvaud:master Feb 9, 2021
egafni pushed a commit to egafni/muzero-general that referenced this pull request Apr 15, 2021
…ple_n_games_at_one_time_in_get_batch

sample N games at one time in replay_buffer
EpicLiem pushed a commit to EpicLiem/muzero-general-chess-archive that referenced this pull request Feb 4, 2023
…ple_n_games_at_one_time_in_get_batch

sample N games at one time in replay_buffer