BUG: Atari environments do not default to 108K frames (27K steps) per episode #1234

Open · hexonfox opened this issue Aug 1, 2024 · 0 comments · May be fixed by #1235
hexonfox commented Aug 1, 2024

After testing the PPO algorithm across 56 Atari environments, I noticed a discrepancy in some of them. In particular, the mean rewards attained differed from those attained by the PPO implementations in Stable Baselines3 and CleanRL in nine environments. The table below shows these nine environments; five trials were conducted for each (implementation, environment) pair, and an environment-wise one-way ANOVA was then conducted to determine the effect of implementation source on mean reward. With respect to Baselines (not the 108K variant), the implementation means are significantly different.

[Screenshot: table of mean rewards and ANOVA results for the nine environments]
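For reference, the sketch below (not from the repository) shows how such an environment-wise one-way ANOVA over five trials per implementation could be computed with SciPy; the reward values are hypothetical placeholders, not the results reported above.

```python
# Hedged sketch: one-way ANOVA over five trials per implementation for one environment.
# The reward arrays are hypothetical placeholders, not the reported values.
from scipy.stats import f_oneway

baselines_rewards = [120.0, 118.5, 121.2, 119.8, 120.4]  # 5 trials (hypothetical)
sb3_rewards       = [135.1, 133.9, 136.2, 134.7, 135.5]  # 5 trials (hypothetical)
cleanrl_rewards   = [134.8, 136.0, 133.5, 135.9, 134.1]  # 5 trials (hypothetical)

f_stat, p_value = f_oneway(baselines_rewards, sb3_rewards, cleanrl_rewards)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p => implementation means differ
```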

In the figure below, the training curves are aggregated from five trials, with the shaded regions indicating the minimum, maximum, and mean. The y-axis represents the mean reward and the x-axis represents the number of frames (40 million frames in total). The curves for Baselines, Stable Baselines3, and CleanRL are shown in purple, orange, and red, respectively (the blue and green curves can be ignored). Baselines' curves differ significantly from those of CleanRL and Stable Baselines3, consistent with the table above.

[Screenshot: training curves (mean with min/max band over five trials) for the nine environments, 40M frames]
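As an aside, a minimal sketch of how such curves could be aggregated (mean line with a min/max band over five trials) is shown below; the data are placeholders and NumPy/matplotlib are assumed to be available.

```python
# Hedged sketch: aggregate five training curves into a mean line with a
# shaded min/max band, as in the figure above. Data are placeholders.
import numpy as np
import matplotlib.pyplot as plt

frames = np.linspace(0, 40e6, 200)              # x-axis: frames (up to 40M)
trials = np.cumsum(np.random.rand(5, 200), 1)   # 5 trials of placeholder rewards

plt.plot(frames, trials.mean(axis=0), color="purple", label="Baselines (mean)")
plt.fill_between(frames, trials.min(axis=0), trials.max(axis=0),
                 color="purple", alpha=0.2)     # shaded min/max region
plt.xlabel("frames")
plt.ylabel("mean reward")
plt.legend()
plt.show()
```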

After manually debugging the code, I located the inconsistency. The environment was not conforming to the ALE specification of 108K frames per episode for the v4 variant, which is the default variant used by this repository and most DRL libraries (e.g., CleanRL and Stable Baselines3). After setting max_episode_steps in the make_atari function to 27K steps (108K frames), the implementations became consistent in three out of the nine environments, as seen in the table above and the figure below.

[Screenshot: training curves after setting max_episode_steps to 27K (108K frames)]

I will create a pull request that sets the default number of frames per episode to 108K (27K steps), with minimal changes to the original codebase so that it does not affect other components. However, I believe there might still be other inconsistencies, since six environments still differ significantly between the implementations. Any suggestions on the possible causes of these inconsistencies would be much appreciated. In case the pull request is not accepted, I have also included the fix below for those wanting to train on Atari environments :)

```python
# one-line change in baselines/baselines/common/atari_wrappers.py:
# default to a 27K-step (108K-frame) episode limit
def make_atari(env_id, max_episode_steps=27000):
```
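To sanity-check the change, a rollout like the sketch below (not part of the patch; it assumes the classic gym step API used by Baselines) should show that no episode exceeds 27,000 agent steps:

```python
# Hedged sketch: confirm the episode limit after the fix.
# Assumes the classic gym API (step() -> obs, reward, done, info).
from baselines.common.atari_wrappers import make_atari

env = make_atari("AtlantisNoFrameskip-v4", max_episode_steps=27000)
env.reset()
steps, done = 0, False
while not done:
    _, _, done, _ = env.step(env.action_space.sample())
    steps += 1
print("episode length (steps):", steps)  # should never exceed 27000
```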

Run Command To Replicate:

```
python -m baselines.run --alg=ppo2 --env=AtlantisNoFrameskip-v4 --seed 0 --num_timesteps 10e6 --network cnn --num_env 8
```