Hi. I am new to rlpyt and I ran the dqn_async_gpu example. I looked at the logs under /data/local and tried to find the episode rewards; I suspect that NonzeroRewardsAverage or ReturnAverage are the columns that hold them.
However, both columns show -21 in the very first rows, and then, after about 1,000,000 timesteps, both columns suddenly show 'nan' in every following row.
I find this confusing. As another issue mentioned, nan means there is no entry. It makes sense that the return might be nan at the very beginning, when there are not enough samples to learn from, but why is there no reward entry in the log after training for a decent amount of time?
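To illustrate what I mean by "no entry" (a minimal sketch of my own understanding, not rlpyt's actual logging code): if no episode finishes during a logging window, there are no returns to average, and averaging an empty collection gives nan.

import numpy as np

# Hypothetical: returns of the episodes completed during one logging window.
returns_this_window = []  # suppose no episode completed in this window

# Averaging an empty collection yields nan (numpy also emits a RuntimeWarning).
average_return = np.mean(returns_this_window)
print(average_return)  # nan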
Here is the code that I ran:
from rlpyt.utils.launching.affinity import make_affinity
from rlpyt.samplers.async_.gpu_sampler import AsyncGpuSampler
from rlpyt.envs.atari.atari_env import AtariEnv, AtariTrajInfo
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.atari.atari_dqn_agent import AtariDqnAgent
from rlpyt.runners.async_rl import AsyncRlEval
from rlpyt.utils.logging.context import logger_context


def build_and_train(game="asterix", run_ID=0):
    # Change these inputs to match local machine and desired parallelism.
    affinity = make_affinity(
        run_slot=0,
        n_cpu_core=16,  # Use 16 cores across all experiments.
        n_gpu=8,  # Use 8 gpus across all experiments.
        gpu_per_run=6,
        sample_gpu_per_run=2,
        async_sample=True,
        optim_sample_share_gpu=False,
        # hyperthread_offset=24,  # If machine has 24 cores.
        n_socket=1,  # Presume CPU socket affinity to lower/upper half GPUs.
        # gpu_per_run=2,  # How many GPUs to parallelize one run across.
        # cpu_per_run=1,
    )
    sampler = AsyncGpuSampler(  # Sampling runs asynchronously from optimization.
        EnvCls=AtariEnv,
        TrajInfoCls=AtariTrajInfo,
        env_kwargs=dict(game=game),
        batch_T=5,
        batch_B=36,
        max_decorrelation_steps=100,
        eval_env_kwargs=dict(game=game),
        eval_n_envs=2,
        eval_max_steps=int(10e3),
        eval_max_trajectories=4,
    )
    algo = DQN(  # DQN hyperparameters for this run.
        replay_ratio=8,
        min_steps_learn=1e4,
        replay_size=int(1e5),
    )
    agent = AtariDqnAgent()
    runner = AsyncRlEval(  # Runner with periodic offline evaluation.
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=5e7,
        log_interval_steps=1e4,
        affinity=affinity,
    )
    config = dict(game=game)
    name = "async_dqn_" + game
    log_dir = "async_dqn"
    with logger_context(log_dir, run_ID, name, config):
        runner.train()


build_and_train(
    game='pong',
    run_ID=0,
)
Here is a link to the .csv log file generated by the run: https://docs.google.com/spreadsheets/d/1reG3pavGxveoFufgzAEp5RhLSKilNr3MKq-NILeqJaI/edit?usp=sharing
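For reference, this is roughly how I am reading the columns from that file (a minimal snippet; the file name is just what my run produced, shortened here, and the column names are taken from the csv header):

import pandas as pd

# Load the progress csv written by the logger under data/local/...
df = pd.read_csv("progress.csv")

# The two columns I suspect hold the episode rewards.
print(df[["ReturnAverage", "NonzeroRewardsAverage"]].head(20))

# Count how many logged rows are nan.
print("nan rows in ReturnAverage:", df["ReturnAverage"].isna().sum())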
Thank you!