Reproducible results with the RandomAgent #251 (pyrddlgym-project)

GMMDMDIDEMS · 2024-02-23T13:45:49Z


I am trying to achieve reproducible results with the RandomAgent. As I understand it, specifying a seed should give me the same results, e.g.:

agent = RandomAgent(
    action_space=env.action_space, num_actions=env.max_allowed_actions, seed=42
)

However, if I change the number of episodes, I get different results. Shouldn't the results be identical in every episode?
What is the noop_values argument in the RandomAgent used for?

What is the purpose of the env seed that can be passed to the evaluate(...) method?

Answered by mike-gimelfarb

mike-gimelfarb (Maintainer) · 2024-02-24T00:44:58Z

Hi,

Please find my answers below:

However, if I change the number of episodes, I get different results. Shouldn't the results be identical in every episode?
Not quite. The seed provides a starting point for an (infinite) sequence of random numbers. That is, each episode may draw different random numbers from this sequence, but the sequence itself, and therefore the outcome, should be identical each time you run the experiment (if you find this is not working, let us know). The run_gym examples all fix the seed, if I recall correctly. To get reproducible results with the random agent, you need to fix both the environment seed and the random policy seed, since each uses its own RNG, as the example shows.
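To make the two points above concrete, here is a minimal toy sketch in plain Python. It does not use pyRDDLGym: `ToyEnv`, `ToyRandomAgent`, and `run_experiment` are hypothetical stand-ins, and the only assumption is that the environment and the random policy each own a separate RNG that is seeded once when the experiment starts.

```python
import random


class ToyEnv:
    """Hypothetical stochastic environment with its own RNG (not a pyRDDLGym class)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)  # seeded once, when the experiment starts
        self.state = 0

    def reset(self):
        # Resets the state only; the RNG keeps advancing through its sequence.
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action + self.rng.choice([-1, 0, 1])  # noisy transition
        return self.state, float(self.state)  # (next state, reward)


class ToyRandomAgent:
    """Hypothetical random policy with its own, separate RNG."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample_action(self):
        return self.rng.randint(0, 2)


def run_experiment(num_episodes, env_seed, policy_seed, horizon=5):
    """Seed both RNGs once, then run several episodes back to back."""
    env = ToyEnv(seed=env_seed)
    agent = ToyRandomAgent(seed=policy_seed)
    returns = []
    for _ in range(num_episodes):
        env.reset()
        total = 0.0
        for _ in range(horizon):
            _, reward = env.step(agent.sample_action())
            total += reward
        returns.append(total)
    return returns


first = run_experiment(3, env_seed=42, policy_seed=42)
print(first == run_experiment(3, env_seed=42, policy_seed=42))  # True: rerun is identical
print(run_experiment(5, env_seed=42, policy_seed=42)[:3] == first)  # True: same first 3 episodes
print(first)  # per-episode returns; these generally differ from one another
```

In this toy, rerunning with the same pair of seeds reproduces the whole list of returns, and running more episodes only appends new ones; individual episodes differ because each one continues further along its RNG's sequence. Leaving either seed as `None` seeds that RNG from the operating system (or the clock), so reruns would no longer match.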

What is the purpose of the env seed that can be passed to the evaluate(...) method?
In relation to the previous question, this seed is applied to the environment at the start of each episode, rather than once at the start of the experiment, so you will get the behaviour you asked about (the random policy currently has no such option; you would need to reset its seed manually in each episode). I do not remember why this option was added, since I do not see a use case for generating the same outcome many times, so I am thinking of removing it.
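A minimal toy sketch of the per-episode behaviour described above, again without pyRDDLGym: `run_episode` and `evaluate_reseeding_each_episode` are hypothetical helpers. Re-seeding `env_rng` only stands in for the per-episode environment seed, while the policy RNG is re-seeded by hand, as the answer suggests.

```python
import random


def run_episode(env_rng, policy_rng, horizon=5):
    """One episode of a hypothetical toy problem driven by two separate RNGs."""
    state, total = 0, 0.0
    for _ in range(horizon):
        action = policy_rng.randint(0, 2)             # random policy
        state += action + env_rng.choice([-1, 0, 1])  # stochastic environment
        total += state
    return total


def evaluate_reseeding_each_episode(num_episodes, env_seed, policy_seed):
    """Hypothetical helper: re-seed both RNGs at the start of every episode."""
    env_rng, policy_rng = random.Random(), random.Random()
    returns = []
    for _ in range(num_episodes):
        env_rng.seed(env_seed)        # analogue of the per-episode environment seed
        policy_rng.seed(policy_seed)  # the policy has to be re-seeded manually
        returns.append(run_episode(env_rng, policy_rng))
    return returns


returns = evaluate_reseeding_each_episode(4, env_seed=42, policy_seed=42)
print(len(set(returns)) == 1)  # True: every episode replays the same trajectory
```

Re-seeding only `env_rng` would not be enough here: the policy RNG would keep advancing, so the episodes would generally still differ.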

What is the noop_values argument in the RandomAgent used for?
Recently, we added the option for the random agent to work when the vectorized flag is set to True in the environment. This was necessary because the gym spaces are different in that case and the usual sampling procedure does not work. However, looking at the code now, I don't see where the field is actually used in the policy, so it seems it can be removed. I'm tagging @ataitler in this post so he can confirm whether we can remove the field or whether it is used externally somewhere.

GMMDMDIDEMS (Author) · Feb 24, 2024

Thanks for the explanation. I had misunderstood what happens internally and thought the seed was the same in each episode. My idea was to iterate over different seeds to make sure the results are not sensitive to the choice of a particular seed, but, as you describe, this already happens by default when the agent runs for multiple episodes. That makes sense.
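For comparison, a minimal toy sketch of both ways of checking sensitivity to the seed: an explicit sweep over seeds, and a single seeded run with many episodes. `run_experiment` is hypothetical, and deriving the policy seed as `seed + 1_000` is an arbitrary assumption made only to give the policy its own stream.

```python
import random
import statistics


def run_experiment(num_episodes, seed, horizon=5):
    """Hypothetical toy experiment; env and policy RNGs are both derived from `seed`."""
    env_rng = random.Random(seed)
    policy_rng = random.Random(seed + 1_000)  # separate stream for the policy (an assumption)
    returns = []
    for _ in range(num_episodes):
        state, total = 0, 0.0
        for _ in range(horizon):
            state += policy_rng.randint(0, 2) + env_rng.choice([-1, 0, 1])
            total += state
        returns.append(total)
    return returns


# Option 1: an explicit sweep, one experiment per seed.
seed_means = [statistics.mean(run_experiment(10, s)) for s in range(5)]
print(statistics.mean(seed_means), statistics.stdev(seed_means))

# Option 2: one seed, many episodes; each episode already draws fresh randomness.
pooled = run_experiment(50, seed=0)
print(statistics.mean(pooled), statistics.stdev(pooled))
```

Both summaries estimate how much returns vary with the random draws; the sweep additionally shows how much the mean itself moves from one seed to another.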
