Hi. I am new to rlpyt and I ran the dqn_async_gpu example. I looked at the logs under /data/local and tried to find the episode rewards; I suspect that NonzeroRewardsAverage or ReturnAverage are the columns that hold them.
However, both columns show -21 in the very first rows, and then, after about 1,000,000 timesteps, both columns suddenly show 'nan' in every following row.
I find this confusing. As another issue mentioned, nan means there is no entry. It makes sense that the return might be nan at the very beginning, when there are not enough samples to learn from, but why is there no reward entry in the log after training for a decent amount of time?
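To illustrate what I mean by "no entry" (a minimal sketch of my own understanding, not rlpyt's actual logging code): if no episode finishes during a logging window, there are no returns to average, and averaging an empty collection gives nan.

import numpy as np

# Hypothetical: returns of the episodes completed during one logging window.
returns_this_window = []  # suppose no episode completed in this window

# Averaging an empty collection yields nan (numpy also emits a RuntimeWarning).
average_return = np.mean(returns_this_window)
print(average_return)  # nan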
Here is the code that I ran:
from rlpyt.utils.launching.affinity import make_affinity
from rlpyt.samplers.async_.gpu_sampler import AsyncGpuSampler
from rlpyt.envs.atari.atari_env import AtariEnv, AtariTrajInfo
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.async_rl import AsyncRlEval
from rlpyt.utils.logging.context import logger_context


def build_and_train(game="asterix", run_ID=0):
    # Change these inputs to match local machine and desired parallelism.
    affinity = make_affinity(
        run_slot=0,
        n_cpu_core=16,  # Use 16 cores across all experiments.
        n_gpu=8,  # Use 8 gpus across all experiments.
        gpu_per_run=6,
        sample_gpu_per_run=2,
        async_sample=True,
        optim_sample_share_gpu=False,
        # hyperthread_offset=24,  # If machine has 24 cores.
        n_socket=1,  # Presume CPU socket affinity to lower/upper half GPUs.
        # gpu_per_run=2,  # How many GPUs to parallelize one run across.
        # cpu_per_run=1,
    )
    sampler = AsyncGpuSampler(  # Sampling runs asynchronously from optimization.
        EnvCls=AtariEnv,
        TrajInfoCls=AtariTrajInfo,
        env_kwargs=dict(game=game),
        batch_T=5,
        batch_B=36,
        max_decorrelation_steps=100,
        eval_env_kwargs=dict(game=game),
        eval_n_envs=2,
        eval_max_steps=int(10e3),
        eval_max_trajectories=4,
    )
    algo = DQN(  # DQN hyperparameters for this run.
        replay_ratio=8,
        min_steps_learn=1e4,
        replay_size=int(1e5),
    )
    agent = AtariDqnAgent()
    runner = AsyncRlEval(  # Runner with periodic offline evaluation.
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=5e7,
        log_interval_steps=1e4,
        affinity=affinity,
    )
    config = dict(game=game)
    name = "async_dqn_" + game
    log_dir = "async_dqn"
    with logger_context(log_dir, run_ID, name, config):
        runner.train()


build_and_train(
    game='pong',
    run_ID=0,
)
Here is a link to the .csv log file generated by the run: https://docs.google.com/spreadsheets/d/1reG3pavGxveoFufgzAEp5RhLSKilNr3MKq-NILeqJaI/edit?usp=sharing
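For reference, this is roughly how I am reading the columns from that file (a minimal snippet; the file name is just what my run produced, shortened here, and the column names are taken from the csv header):

import pandas as pd

# Load the progress csv written by the logger under data/local/...
df = pd.read_csv("progress.csv")

# The two columns I suspect hold the episode rewards.
print(df[["ReturnAverage", "NonzeroRewardsAverage"]].head(20))

# Count how many logged rows are nan.
print("nan rows in ReturnAverage:", df["ReturnAverage"].isna().sum())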
Thank you!