ReturnAverage and NonzeroRewardsAverage in logs become nan after a period of time. #207

bryanyuan1 opened this issue Jul 9, 2021 · 0 comments
Hi. I am new to rlpyt, and I ran the dqn_async_gpu example. I looked at the logs under /data/local, trying to find the episode rewards. I suspect that NonzeroRewardsAverage or ReturnAverage might be the episode rewards.

However, both columns show values of -21 in the very first rows, and then, after about 1,000,000 timesteps, both columns suddenly show 'nan' in all following rows.

I find this confusing. As another issue mentioned, nan means there is no entry. It makes sense that the reward might be nan at the beginning, when there are not yet enough samples to learn from, but why is there no reward entry in the log after training for a decent amount of time?
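If it is relevant, my understanding (an assumption, not checked against the rlpyt source) is that these columns are averages over the trajectories that completed during a logging interval, so an interval with no completed episodes would produce nan, e.g.:

import numpy as np

# Hypothetical list of returns from trajectories that completed during one
# logging interval; if no episode finished in the interval, it is empty.
completed_returns = []

# Averaging an empty list yields nan (numpy warns "Mean of empty slice"),
# which would then appear as 'nan' in the csv column.
print(np.mean(completed_returns))  # nan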

Here is the code that I ran:

from rlpyt.utils.launching.affinity import make_affinity
from rlpyt.samplers.async_.gpu_sampler import AsyncGpuSampler
from rlpyt.envs.atari.atari_env import AtariEnv, AtariTrajInfo
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.async_rl import AsyncRlEval
from rlpyt.utils.logging.context import logger_context


def build_and_train(game="asterix", run_ID=0):
    # Change these inputs to match local machine and desired parallelism.
    affinity = make_affinity(
        run_slot=0,
        n_cpu_core=16,  # Use 16 cores across all experiments.
        n_gpu=8,  # Use 8 gpus across all experiments.
        gpu_per_run=6,
        sample_gpu_per_run=2,
        async_sample=True,
        optim_sample_share_gpu=False,
        # hyperthread_offset=24,  # If machine has 24 cores.
        n_socket=1,  # Presume CPU socket affinity to lower/upper half GPUs.
        # gpu_per_run=2,  # How many GPUs to parallelize one run across.
        # cpu_per_run=1,
    )

    sampler = AsyncGpuSampler(
        EnvCls=AtariEnv,
        TrajInfoCls=AtariTrajInfo,
        env_kwargs=dict(game=game),
        batch_T=5,
        batch_B=36,
        max_decorrelation_steps=100,
        eval_env_kwargs=dict(game=game),
        eval_n_envs=2,
        eval_max_steps=int(10e3),
        eval_max_trajectories=4,
    )
    algo = DQN(
        replay_ratio=8,
        min_steps_learn=1e4,
        replay_size=int(1e5)
    )
    agent = AtariDqnAgent()
    runner = AsyncRlEval(
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=5e7,
        log_interval_steps=1e4,
        affinity=affinity,
    )
    config = dict(game=game)
    name = "async_dqn_" + game
    log_dir = "async_dqn"
    with logger_context(log_dir, run_ID, name, config):
        runner.train()
        
if __name__ == "__main__":
    build_and_train(
        game='pong',
        run_ID=0,
    )

Here is a link to the .csv log file generated by the run: https://docs.google.com/spreadsheets/d/1reG3pavGxveoFufgzAEp5RhLSKilNr3MKq-NILeqJaI/edit?usp=sharing
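For reference, this is roughly how the csv can be inspected to find where the columns switch to nan (a sketch only; the file path is a placeholder for wherever logger_context wrote the csv under data/local):

import pandas as pd

# Placeholder path: point this at the csv produced by the run.
df = pd.read_csv("path/to/progress.csv")

# The two columns in question.
cols = ["ReturnAverage", "NonzeroRewardsAverage"]
print(df[cols].head())

# Index of the first row where ReturnAverage is nan, if any.
mask = df["ReturnAverage"].isna()
print("first nan row:", mask.idxmax() if mask.any() else None)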

Thank you!
