ray[RLlib]: Windows fatal exception: access violation #24955

Closed
Peter-P779 opened this issue May 19, 2022 · 16 comments
Labels
bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), QS (Quantsight triage label), rllib (RLlib related issues), windows

Comments

@Peter-P779

Peter-P779 commented May 19, 2022

What happened + What you expected to happen

Expectation: CartPole training runs to completion.
What happens: Windows fatal exception: access violation

D:\ML\test_RLlib\TF_Env\Scripts\python.exe D:/ML/test_RLlib/test/main.py
2022-05-19 10:49:33,916	INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py:455: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-05-19 10:49:36,775	WARNING trial_runner.py:1489 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
2022-05-19 10:49:36,900	INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00000
(pid=1516) 
(DQNTrainer pid=7004) 2022-05-19 10:49:43,322	INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=7004) 2022-05-19 10:49:43,322	INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=22604) 
(pid=10440) 
(pid=22456) 
(RolloutWorker pid=23604) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=18268) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=15504) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=23604) 2022-05-19 10:49:49,852	WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=18268) 2022-05-19 10:49:49,864	WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=15504) 2022-05-19 10:49:49,846	WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=23604) 2022-05-19 10:49:49,938	DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=23604) 2022-05-19 10:49:49,938	DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000020B927EB100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=23604) 2022-05-19 10:49:49,953	DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=18268) 2022-05-19 10:49:49,938	DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=18268) 2022-05-19 10:49:49,938	DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002AB85A7A100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=18268) 2022-05-19 10:49:49,953	DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=15504) 2022-05-19 10:49:49,938	DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=15504) 2022-05-19 10:49:49,938	DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001A0229FA100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=15504) 2022-05-19 10:49:49,953	DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=23604) 2022-05-19 10:49:50,623	INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,692	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,693	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,693	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,694	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,694	INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,695	DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=23604) 
(RolloutWorker pid=23604) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=23604) 
(RolloutWorker pid=18268) 2022-05-19 10:49:50,627	INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,697	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,697	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698	INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,617	INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,687	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,688	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,688	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,689	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,689	INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=23604) 2022-05-19 10:49:51,114	DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=23604) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=23604) 
(DQNTrainer pid=7004) 2022-05-19 10:49:51,371	INFO worker_set.py:154 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2)), '__env__': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2))}
(RolloutWorker pid=23604) 2022-05-19 10:49:51,360	DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000020B9A35B340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=18268) 2022-05-19 10:49:51,368	DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000002AB95C6A340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=15504) 2022-05-19 10:49:51,364	DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001A032B8B340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437	DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437	DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002747AA78130>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437	DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(DQNTrainer pid=7004) 2022-05-19 10:49:51,922	INFO tf_policy.py:166 -- TFPolicy (worker=local) running on CPU.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,978	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,979	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,979	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,980	INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,980	INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,981	DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'agent_index': <tf.Tensor 'default_policy/agent_index:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'eps_id': <tf.Tensor 'default_policy/eps_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'prev_actions': <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'prev_rewards': <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   't': <tf.Tensor 'default_policy/t:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'unroll_id': <tf.Tensor 'default_policy/unroll_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) 2022-05-19 10:49:52,371	DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(DQNTrainer pid=7004) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) 2022-05-19 10:49:52,579	INFO rollout_worker.py:1727 -- Built policy map: {}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,579	INFO rollout_worker.py:1728 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002747AA78130>}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,580	INFO rollout_worker.py:666 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x000002747C501FA0>}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,580	DEBUG rollout_worker.py:779 -- Created rollout worker with env None (None), policies {}
== Status ==
Current time: 2022-05-19 10:49:52 (running for 00:00:15.84)
Memory usage on this node: 14.6/15.8 GiB: ***LOW MEMORY*** less than 10% of the memory on this node is available for use. This can cause unexpected crashes. Consider reducing the memory used by your application or reducing the Ray object store size by setting `object_store_memory` when calling `ray.init`.
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 4.0/12 CPUs, 0/1 GPUs, 0.0/2.27 GiB heap, 0.0/1.14 GiB objects
Result logdir: D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36
Number of trials: 3/3 (2 PENDING, 1 RUNNING)
+------------------------------------+----------+----------------+----------+-------------+
| Trial name                         | status   | loc            |    gamma |          lr |
|------------------------------------+----------+----------------+----------+-------------|
| DQNTrainer_CartPole-v0_9caae_00000 | RUNNING  | 127.0.0.1:7004 | 0.934952 | 0.000708551 |
| DQNTrainer_CartPole-v0_9caae_00001 | PENDING  |                | 0.976634 | 0.000561509 |
| DQNTrainer_CartPole-v0_9caae_00002 | PENDING  |                | 0.940114 | 0.000492675 |
+------------------------------------+----------+----------------+----------+-------------+


2022-05-19 10:49:52,620	INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00001
(DQNTrainer pid=7004) 2022-05-19 10:49:52,605	WARNING util.py:60 -- Install gputil for GPU system monitoring.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,659	WARNING trainer.py:1083 -- Worker crashed during call to `step_attempt()`. To try to continue training without the failed worker, set `ignore_worker_failures=True`.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,664	ERROR worker.py:92 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.par_iter_next() (pid=18268, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002AB859C89D0>)
(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) During handling of the above exception, another exception occurred:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) ray::RolloutWorker.par_iter_next() (pid=18268, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002AB859C89D0>)
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 656, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 697, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 663, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 667, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 614, in ray._raylet.execute_task.function_executor
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701, in actor_method_executor
(DQNTrainer pid=7004)     return method(__ray_actor, *args, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1186, in par_iter_next
(DQNTrainer pid=7004)     return next(self.local_it)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 404, in gen_rollouts
(DQNTrainer pid=7004)     yield self.sample()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 815, in sample
(DQNTrainer pid=7004)     batches = [self.input_reader.next()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 116, in next
(DQNTrainer pid=7004)     batches = [self.get_data()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 289, in get_data
(DQNTrainer pid=7004)     item = next(self._env_runner)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 668, in _env_runner
(DQNTrainer pid=7004)     unfiltered_obs, rewards, dones, infos, off_policy_actions = base_env.poll()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 291, in poll
(DQNTrainer pid=7004)     self.new_obs = self.vector_env.vector_reset()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in vector_reset
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in <listcomp>
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 56, in reset
(DQNTrainer pid=7004)     self._after_reset(observation)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 241, in _after_reset
(DQNTrainer pid=7004)     self.reset_video_recorder()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 267, in reset_video_recorder
(DQNTrainer pid=7004)     self.video_recorder.capture_frame()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 132, in capture_frame
(DQNTrainer pid=7004)     frame = self.env.render(mode=render_mode)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\core.py", line 295, in render
(DQNTrainer pid=7004)     return self.env.render(mode, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\cartpole.py", line 179, in render
(DQNTrainer pid=7004)     from gym.envs.classic_control import rendering
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\rendering.py", line 17, in <module>
(DQNTrainer pid=7004)     raise ImportError(
(DQNTrainer pid=7004) ImportError: 
(DQNTrainer pid=7004)     Cannot import pyglet.
(DQNTrainer pid=7004)     HINT: you can install pyglet directly via 'pip install pyglet'.
(DQNTrainer pid=7004)     But if you really just want to install all Gym dependencies and not have to think about it,
(DQNTrainer pid=7004)     'pip install -e .[all]' or 'pip install gym[all]' will do it.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,664	ERROR worker.py:92 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.par_iter_next() (pid=15504, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001A0229489D0>)
(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) During handling of the above exception, another exception occurred:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) ray::RolloutWorker.par_iter_next() (pid=15504, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001A0229489D0>)
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 656, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 697, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 663, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 667, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 614, in ray._raylet.execute_task.function_executor
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701, in actor_method_executor
(DQNTrainer pid=7004)     return method(__ray_actor, *args, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1186, in par_iter_next
(DQNTrainer pid=7004)     return next(self.local_it)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 404, in gen_rollouts
(DQNTrainer pid=7004)     yield self.sample()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 815, in sample
(DQNTrainer pid=7004)     batches = [self.input_reader.next()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 116, in next
(DQNTrainer pid=7004)     batches = [self.get_data()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 289, in get_data
(DQNTrainer pid=7004)     item = next(self._env_runner)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 668, in _env_runner
(DQNTrainer pid=7004)     unfiltered_obs, rewards, dones, infos, off_policy_actions = base_env.poll()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 291, in poll
(DQNTrainer pid=7004)     self.new_obs = self.vector_env.vector_reset()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in vector_reset
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in <listcomp>
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 56, in reset
(DQNTrainer pid=7004)     self._after_reset(observation)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 241, in _after_reset
(DQNTrainer pid=7004)     self.reset_video_recorder()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 267, in reset_video_recorder
(DQNTrainer pid=7004)     self.video_recorder.capture_frame()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 132, in capture_frame
(DQNTrainer pid=7004)     frame = self.env.render(mode=render_mode)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\core.py", line 295, in render
(RolloutWorker pid=23604) 2022-05-19 10:49:52,649	INFO rollout_worker.py:809 -- Generating sample batch of size 4
(RolloutWorker pid=23604) 2022-05-19 10:49:52,650	DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=18268) 2022-05-19 10:49:52,650	DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=15504) 2022-05-19 10:49:52,650	DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(DQNTrainer pid=7004)     return self.env.render(mode, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\cartpole.py", line 179, in render
(DQNTrainer pid=7004)     from gym.envs.classic_control import rendering
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\rendering.py", line 17, in <module>
(DQNTrainer pid=7004)     raise ImportError(
(DQNTrainer pid=7004) ImportError: 
(DQNTrainer pid=7004)     Cannot import pyglet.
(DQNTrainer pid=7004)     HINT: you can install pyglet directly via 'pip install pyglet'.
(DQNTrainer pid=7004)     But if you really just want to install all Gym dependencies and not have to think about it,
(DQNTrainer pid=7004)     'pip install -e .[all]' or 'pip install gym[all]' will do it.
(pid=14788) 
(DQNTrainer pid=11332) 2022-05-19 10:49:57,976	INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=11332) 2022-05-19 10:49:57,976	INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=11748) 
(pid=13188) 
(pid=16516) 
(pid=) [2022-05-19 10:50:04,680 E 9068 23852] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(bundle_reservation_check_func pid=21264) 
(bundle_reservation_check_func pid=10124) 
(bundle_reservation_check_func pid=15300) 
(pid=2700) 
(RolloutWorker pid=23604) 
(RolloutWorker pid=18268) 
(RolloutWorker pid=15504) 
(DQNTrainer pid=7004) 
(RolloutWorker pid=13824) Stack (most recent call first):
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=600) Stack (most recent call first):
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=12964) Stack (most recent call first):
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-19 10:50:06,397	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:06,397	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:06,412	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,241	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=14 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,272	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=15 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,303	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=13 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,881	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=12 --runtime-env-hash=213246870
2022-05-19 10:50:33,288	WARNING worker.py:1382 -- The node with node id: d581586b7c7e0633fb90264635b2f193775bd304c5a049fce7f81e2a and ip: 127.0.0.1 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.
2022-05-19 10:50:33,303	WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #2...
== Status ==
Current time: 2022-05-19 10:50:33 (running for 00:00:56.53)
Memory usage on this node: 12.3/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/2.27 GiB heap, 0.0/1.14 GiB objects
Result logdir: D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36
Number of trials: 3/3 (1 PENDING, 2 RUNNING)
+------------------------------------+----------+----------------+----------+-------------+
| Trial name                         | status   | loc            |    gamma |          lr |
|------------------------------------+----------+----------------+----------+-------------|
| DQNTrainer_CartPole-v0_9caae_00000 | RUNNING  | 127.0.0.1:7004 | 0.934952 | 0.000708551 |
| DQNTrainer_CartPole-v0_9caae_00001 | RUNNING  |                | 0.976634 | 0.000561509 |
| DQNTrainer_CartPole-v0_9caae_00002 | PENDING  |                | 0.940114 | 0.000492675 |
+------------------------------------+----------+----------------+----------+-------------+


(DQNTrainer pid=11332) Stack (most recent call first):
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-19 10:50:33,366	INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=213246870
2022-05-19 10:50:33,803	WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #3...
2022-05-19 10:50:34,303	WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #4...
2022-05-19 10:50:34,819	WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #5...
2022-05-19 10:50:35,319	WARNING resource_updater.py:64 -- Cluster resources cannot be detected or are 0. You can resume this experiment by passing in `resume=True` to `run`.
2022-05-19 10:50:35,319	WARNING util.py:171 -- The `on_step_begin` operation took 2.016 s, which may be a performance bottleneck.
2022-05-19 10:50:35,319	INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00002
Windows fatal exception: access violation


Process finished with exit code -1073741819 (0xC0000005)

Versions / Dependencies

ray, version 1.12.0
Python 3.9.12
gym 0.21.0

pip install ray
pip install "ray[rllib]" tensorflow torch
pip install ray[default]
pip install ray[tune]
pip install gym

Reproduction script

import ray
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer
from ray.tune.schedulers import PopulationBasedTraining
import gym
import random

# Trainer config: 3 rollout workers, env recording enabled, CPU-only, static-graph TF.
config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "record_env": True,
    "num_gpus": 0,
    "framework": "tf",
}

if __name__ == "__main__":

    # PBT scheduler mutating the learning rate and gamma.
    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=7200,
        resample_probability=0.25,
        hyperparam_mutations={
            "lr": lambda: random.uniform(1e-3, 5e-5),
            "gamma": lambda: random.uniform(0.90, 0.99),
        },
    )
    import tensorflow as tf

    ray.init()

    tune.run(DQNTrainer, scheduler=pbt,
             config=config,
             num_samples=3,
             metric="episode_reward_mean",
             mode="max",
             local_dir="./results",
             sync_config=tune.SyncConfig(syncer=None),
             checkpoint_freq=500,
             keep_checkpoints_num=20)

    ray.shutdown()
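For debugging, the warnings in the log above name a few switches that may be worth toggling; below is a minimal, unverified sketch applying them, assuming those keys exist in this Ray/RLlib version (`disable_env_checking` and `ignore_worker_failures` are taken from the warning text, and the `object_store_memory` value is only an example, not a confirmed fix).

import ray
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer

# Debug variant of the config above; the extra keys come from the log warnings
# and are assumptions, not a confirmed fix.
debug_config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "record_env": False,             # skip gym's Monitor/video-recorder (pyglet) path
    "num_gpus": 0,
    "framework": "tf",
    "disable_env_checking": True,    # per the rollout_worker.py:498 warning
    "ignore_worker_failures": True,  # per the trainer.py:1083 warning
}

if __name__ == "__main__":
    # The low-memory status message suggests capping the object store via ray.init;
    # 1 GiB here is an arbitrary example value.
    ray.init(object_store_memory=1 * 1024**3)
    tune.run(DQNTrainer, config=debug_config, stop={"training_iteration": 1})
    ray.shutdown()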

Issue Severity

High: It blocks me from completing my task.

Peter-P779 added the bug and triage labels on May 19, 2022
@czgdp1807
Contributor

Let me look into this.

@czgdp1807
Contributor

czgdp1807 commented May 20, 2022

I ran this script and it keeps running without any issues. It opens a bunch of windows with some animations. See the attached screenshot.

My hardware: 8 CPUs and 16 GB of RAM on an Azure Windows VM.

[Screenshot: 2022-05-20 at 1:12:56 PM]

@Peter-P779
Author

Peter-P779 commented May 20, 2022

So the error can't be reproduced on your machine. On my machine the windows with the carts also open, and then I get the error. Is there a log or anything else I can send for further analysis? The same program runs perfectly on WSL with Ubuntu, though.

Laptop Dell G3 15:

Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz 2.59 GHz (12 Core)
16.0 GB RAM
NVIDIA GeForce RTX 2060

OS: Windows 11 Home
Version: 21H2

@czgdp1807
Contributor

I see. Note that I don't have any GPU on my Azure Windows VM. It's Windows 10 Pro 20H2.

@Peter-P779
Author

So there might be a serious problem once the cluster gets updated?

@gjoliver
Member

A random observation: the error message seems to say:

(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
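
The traceback earlier in the log reaches this import via gym's Monitor wrapper: `record_env=True` makes the video recorder call `env.render()`, which in gym 0.21's classic-control renderer imports pyglet. A quick way to check that path outside of Ray is a standalone render call; this is only an illustrative sketch, assuming the gym 0.21 listed above.

import gym

# Standalone check of the same render path the traceback goes through:
# gym 0.21's classic-control renderer imports pyglet on the first render call.
env = gym.make("CartPole-v0")
env.reset()
try:
    env.render(mode="rgb_array")  # same call gym's video recorder makes
    print("pyglet rendering works")
except ImportError as exc:
    print(f"rendering unavailable: {exc}")  # e.g. missing 'pip install pyglet'
finally:
    env.close()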

gjoliver added the P1, rllib, windows, and QS labels and removed the triage label on May 20, 2022
@Peter-P779
Author

Peter-P779 commented May 20, 2022

Yeah, but that wasn't the reason for the error. I originally got the error with VizDoom; the CartPole script is just a simpler setup for this report, which is why I didn't notice the missing package.

Here is the updated console output after installing the package:

D:\ML\test_RLlib>call TF_Env/Scripts/activate
2022-05-20 22:50:02,463 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py:455: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-05-20 22:50:06,949 WARNING trial_runner.py:1489 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
2022-05-20 22:50:07,066 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00000
(DQNTrainer pid=19664) 2022-05-20 22:50:13,392  INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=19664) 2022-05-20 22:50:13,393  INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(RolloutWorker pid=16968) 2022-05-20 22:50:19,353       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16968) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=9892) 2022-05-20 22:50:19,416        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=9892) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=6604) 2022-05-20 22:50:19,420        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=6604) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=16968) 2022-05-20 22:50:21,101       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000016C16DC81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=9892) 2022-05-20 22:50:21,101        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000010B135F81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=6604) 2022-05-20 22:50:21,101        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000019D8A1081C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16968) 2022-05-20 22:50:21,696       INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,704        INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,704        INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,780        DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,776        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:22,197        DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,455  INFO worker_set.py:154 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2)), '__env__': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2))}
(RolloutWorker pid=16968) 2022-05-20 22:50:22,440       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000016C1E6D9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=9892) 2022-05-20 22:50:22,440        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000010B1B1A9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=6604) 2022-05-20 22:50:22,424        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000019D91B5A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,996  INFO tf_policy.py:166 -- TFPolicy (worker=local) running on CPU.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,050  DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'agent_index': <tf.Tensor 'default_policy/agent_index:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'eps_id': <tf.Tensor 'default_policy/eps_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'prev_actions': <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'prev_rewards': <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   't': <tf.Tensor 'default_policy/t:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'unroll_id': <tf.Tensor 'default_policy/unroll_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,416  DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:1727 -- Built policy map: {}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:1728 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:666 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x000001D974F91310>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  DEBUG rollout_worker.py:779 -- Created rollout worker with env None (None), policies {}
== Status ==
Current time: 2022-05-20 22:50:23 (running for 00:00:16.72)
Memory usage on this node: 10.6/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 4.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (2 PENDING, 1 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name                         | status   | loc             |    gamma |          lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING  | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | PENDING  |                 | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING  |                 | 0.922938 | 0.00096638  |
+------------------------------------+----------+-----------------+----------+-------------+


2022-05-20 22:50:23,650 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00001
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634  INFO trainable.py:152 -- Trainable.setup took 10.243 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634  WARNING util.py:60 -- Install gputil for GPU system monitoring.
(RolloutWorker pid=16968) 2022-05-20 22:50:23,681       DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681        INFO rollout_worker.py:809 -- Generating sample batch of size 4
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681        DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=6604) 2022-05-20 22:50:23,681        DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:672 -- Raw obs from env: { 0: { 'agent0': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:673 -- Info return from env: {0: {'agent0': None}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:908 -- Preprocessed obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,893        INFO sampler.py:913 -- Filtered obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,894        INFO sampler.py:1143 -- Inputs to compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
(RolloutWorker pid=9892)                                   'env_id': 0,
(RolloutWorker pid=9892)                                   'info': None,
(RolloutWorker pid=9892)                                   'obs': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029),
(RolloutWorker pid=9892)                                   'prev_action': None,
(RolloutWorker pid=9892)                                   'prev_reward': None,
(RolloutWorker pid=9892)                                   'rnn_state': None},
(RolloutWorker pid=9892)                         'type': 'PolicyEvalData'}]}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,895        INFO tf_run_builder.py:98 -- Executing TF run without tracing. To dump TF timeline traces to disk, set the TF_TIMELINE_DIR environment variable.
(RolloutWorker pid=9892) 2022-05-20 22:50:24,982        INFO sampler.py:1169 -- Outputs of compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': ( np.ndarray((1,), dtype=int64, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)                       [],
(RolloutWorker pid=9892)                       { 'action_dist_inputs': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003),
(RolloutWorker pid=9892)                         'action_logp': np.ndarray((1,), dtype=float32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)                         'action_prob': np.ndarray((1,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)                         'q_values': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003)})}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:25,352  INFO replay_buffer.py:47 -- Estimated max memory usage for replay buffer is 0.00305 GB (50000.0 batches of size 1, 61 bytes each), available system memory is 16.929984512 GB
(RolloutWorker pid=9892) 2022-05-20 22:50:25,340        INFO simple_list_collector.py:904 -- Trajectory fragment after postprocess_trajectory():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'agent0': { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892)               'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892)               'infos': np.ndarray((4,), dtype=object, head={}),
(RolloutWorker pid=9892)               'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892)               'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892)               'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)               'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:25,341        INFO rollout_worker.py:854 -- Completed sample batch:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892)   'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892)   'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892)   'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892)   'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)   'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}
(RolloutWorker pid=9892)
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174  INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174  INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=) [2022-05-20 22:50:34,744 E 16452 19288] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(DQNTrainer pid=19664) Stack (most recent call first):
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 362 in get_objects
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1803 in get
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105 in wrapper
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 492 in base_iterator
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1108 in build_union
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 2174 in _exec_plan_or_training_iteration_fn
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1155 in step_attempt
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1074 in step
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\trainable.py", line 349 in train
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701 in actor_method_executor
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:34,963  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=12 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,277  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=13 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,371  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=14 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,434  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=15 --runtime-env-hash=2135802228
(RolloutWorker pid=16900) 2022-05-20 22:50:39,099       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16900) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=13860) 2022-05-20 22:50:39,108       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=13860) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=8672) 2022-05-20 22:50:39,072        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=8672) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001B2406E91C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16900) 2022-05-20 22:50:39,942       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002451DAD81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001752E4491C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16900) 2022-05-20 22:50:40,593       INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=13860) 2022-05-20 22:50:40,593       INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,609        INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,672        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:41,144       DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) 2022-05-20 22:50:41,393       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001B24812A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=16900) Stack (most recent call first):
(RolloutWorker pid=16900)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=16900)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=13860) 2022-05-20 22:50:41,408       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000024525549400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=13860) Stack (most recent call first):
(RolloutWorker pid=13860)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=13860)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=8672) 2022-05-20 22:50:41,408        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001753E589400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=8672) Stack (most recent call first):
(RolloutWorker pid=8672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=8672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:41,503  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,534  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,566  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=2135802228
2022-05-20 22:51:03,817 WARNING worker.py:1382 -- The node with node id: 208e7e234a5d9af609995e90f0035f9db3b57f2130560403fe34704d and ip: 127.0.0.1 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.
== Status ==
Current time: 2022-05-20 22:51:03 (running for 00:00:56.88)
Memory usage on this node: 8.5/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (1 PENDING, 2 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name                         | status   | loc             |    gamma |          lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING  | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | RUNNING  |                 | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING  |                 | 0.922938 | 0.00096638  |
+------------------------------------+----------+-----------------+----------+-------------+


2022-05-20 22:51:03,824 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #2...
(DQNTrainer pid=13672) Stack (most recent call first):
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:51:03,915  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=2135802228
2022-05-20 22:51:04,338 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #3...
2022-05-20 22:51:04,848 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #4...
2022-05-20 22:51:05,351 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #5...
2022-05-20 22:51:05,855 WARNING resource_updater.py:64 -- Cluster resources cannot be detected or are 0. You can resume this experiment by passing in `resume=True` to `run`.
2022-05-20 22:51:05,855 WARNING util.py:171 -- The `on_step_begin` operation took 2.033 s, which may be a performance bottleneck.
2022-05-20 22:51:05,855 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00002
Windows fatal exception: access violation

@mattip
Contributor

mattip commented Jun 8, 2022

I could not reproduce this with the latest ray HEAD. I did need to remove the `"record_env": True` parameter, since that option has been removed. Could you try again with the latest nightly?
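
For reference, here is a minimal sketch of the kind of tune.run() call used in this issue, with the removed `record_env` option simply dropped. The config values below are illustrative placeholders, not the exact settings from the original reproduction script:

import ray
from ray import tune

ray.init()

# Illustrative DQN config for CartPole; "record_env" is intentionally
# omitted because the option no longer exists on recent nightlies.
config = {
    "env": "CartPole-v0",
    "framework": "tf",
    "num_workers": 3,
    "gamma": 0.95,
    "lr": 5e-4,
}

tune.run(
    "DQN",                              # RLlib's registered DQN trainable
    config=config,
    stop={"training_iteration": 20},
    local_dir="./results",
)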

@Peter-P779
Author

Peter-P779 commented Jun 10, 2022

With the nightly version, all 3 parallel tune runs start. The access violation no longer occurs, but another, unspecific error does: an actor dies unexpectedly.
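
The warning in the console output below points at two trainer config flags as a possible workaround. Only those two keys (`ignore_worker_failures`, `recreate_failed_workers`) come from the log and the config dump further down; the rest of this snippet is an illustrative sketch:

# Fault-tolerance flags suggested by the warning in the log below.
# Only the two flags are taken from the log; other values are placeholders.
config = {
    "env": "CartPole-v0",
    "framework": "tf",
    "num_workers": 3,
    # Continue training even if a rollout worker dies:
    "ignore_worker_failures": True,
    # Or ask RLlib to try restarting crashed workers instead:
    "recreate_failed_workers": True,
}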

Console output:
(DQNTrainer pid=12200) 2022-06-10 14:14:38,704  WARNING trainer.py:546 -- Worker crashed during call to `step_attempt()`. To try to continue training without failed worker(s), set `ignore_worker_failures=True`. To try to recover the failed worker(s), set `recreate_failed_workers=True`.
(DQNTrainer pid=11888) Stack (most recent call first):
(DQNTrainer pid=11888)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=11888)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop
(DQNTrainer pid=11888)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in <module>
(RolloutWorker pid=19976) Stack (most recent call first):
(RolloutWorker pid=19976)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop
(RolloutWorker pid=19976)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in <module>
(DQNTrainer pid=12200) Stack (most recent call first):
(DQNTrainer pid=12200)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=12200)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop
(DQNTrainer pid=12200)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in <module>
2022-06-10 14:14:40,376 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00000: Error processing event.
NoneType: None
Result for DQNTrainer_CartPole-v0_abcc0_00000:
  agent_timesteps_total: 20160
  counters:
    last_target_update_ts: 20160
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  custom_metrics: {}
  date: 2022-06-10_14-14-36
  done: false
  episode_len_mean: 120.44
  episode_media: {}
  episode_reward_max: 198.0
  episode_reward_mean: 120.44
  episode_reward_min: 17.0
  episodes_this_iter: 6
  episodes_total: 545
  experiment_id: 6cc0969c65b54734be7464e33b9e7b11
  experiment_tag: '0'
  hostname: DESKTOP-IH6PS6N
  info:
    last_target_update_ts: 20160
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_lr: 0.00015024440654087812
          max_q: 20.960540771484375
          mean_q: 17.269336700439453
          mean_td_error: 1.2344892024993896
          min_q: -1.7945809364318848
          model: {}
        num_agent_steps_trained: 32.0
        td_error:
        - 0.1963062286376953
        - 13.694271087646484
        - -0.1513042449951172
        - -0.4782733917236328
        - 17.277982711791992
        - -0.45970726013183594
        - -1.119187355041504
        - 0.2422313690185547
        - -0.3445110321044922
        - -0.04111480712890625
        - -0.02532196044921875
        - -1.5336990356445312
        - -0.1737194061279297
        - -0.32820892333984375
        - 0.20307254791259766
        - -2.7945809364318848
        - 0.06833648681640625
        - 0.0260009765625
        - 0.1797008514404297
        - -0.18329429626464844
        - -0.07255172729492188
        - -1.2686529159545898
        - 0.1986217498779297
        - 0.15063190460205078
        - 16.994535446166992
        - -0.0013408660888671875
        - 0.14139938354492188
        - -0.039340972900390625
        - -0.1665172576904297
        - 0.0407257080078125
        - -0.04569053649902344
        - -0.6831436157226562
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  num_agent_steps_sampled: 20160
  num_agent_steps_trained: 51104
  num_env_steps_sampled: 20160
  num_env_steps_sampled_this_iter: 1008
  num_env_steps_trained: 51104
  num_env_steps_trained_this_iter: 2688
  num_healthy_workers: 3
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 70.95
    ram_util_percent: 94.5
  pid: 11888
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.10525665388206647
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.07798196456066409
    mean_inference_ms: 1.1943969157090608
    mean_raw_obs_processing_ms: 0.22432358224892415
  sampler_results:
    custom_metrics: {}
    episode_len_mean: 120.44
    episode_media: {}
    episode_reward_max: 198.0
    episode_reward_mean: 120.44
    episode_reward_min: 17.0
    episodes_this_iter: 6
    hist_stats:
      episode_lengths:
      - 103
      - 31
      - 51
      - 41
      - 58
      - 94
      - 72
      - 53
      - 50
      - 34
      - 56
      - 102
      - 48
      - 17
      - 38
      - 58
      - 44
      - 84
      - 140
      - 87
      - 94
      - 174
      - 112
      - 110
      - 178
      - 135
      - 129
      - 137
      - 101
      - 158
      - 127
      - 109
      - 138
      - 153
      - 115
      - 170
      - 124
      - 127
      - 124
      - 136
      - 153
      - 168
      - 97
      - 142
      - 133
      - 165
      - 148
      - 136
      - 120
      - 117
      - 96
      - 93
      - 129
      - 113
      - 124
      - 123
      - 86
      - 129
      - 105
      - 115
      - 138
      - 106
      - 127
      - 113
      - 144
      - 128
      - 168
      - 107
      - 118
      - 143
      - 153
      - 112
      - 159
      - 148
      - 187
      - 173
      - 175
      - 166
      - 198
      - 148
      - 154
      - 131
      - 135
      - 132
      - 114
      - 125
      - 166
      - 99
      - 121
      - 110
      - 100
      - 119
      - 132
      - 145
      - 137
      - 163
      - 134
      - 164
      - 172
      - 176
      episode_reward:
      - 103.0
      - 31.0
      - 51.0
      - 41.0
      - 58.0
      - 94.0
      - 72.0
      - 53.0
      - 50.0
      - 34.0
      - 56.0
      - 102.0
      - 48.0
      - 17.0
      - 38.0
      - 58.0
      - 44.0
      - 84.0
      - 140.0
      - 87.0
      - 94.0
      - 174.0
      - 112.0
      - 110.0
      - 178.0
      - 135.0
      - 129.0
      - 137.0
      - 101.0
      - 158.0
      - 127.0
      - 109.0
      - 138.0
      - 153.0
      - 115.0
      - 170.0
      - 124.0
      - 127.0
      - 124.0
      - 136.0
      - 153.0
      - 168.0
      - 97.0
      - 142.0
      - 133.0
      - 165.0
      - 148.0
      - 136.0
      - 120.0
      - 117.0
      - 96.0
      - 93.0
      - 129.0
      - 113.0
      - 124.0
      - 123.0
      - 86.0
      - 129.0
      - 105.0
      - 115.0
      - 138.0
      - 106.0
      - 127.0
      - 113.0
      - 144.0
      - 128.0
      - 168.0
      - 107.0
      - 118.0
      - 143.0
      - 153.0
      - 112.0
      - 159.0
      - 148.0
      - 187.0
      - 173.0
      - 175.0
      - 166.0
      - 198.0
      - 148.0
      - 154.0
      - 131.0
      - 135.0
      - 132.0
      - 114.0
      - 125.0
      - 166.0
      - 99.0
      - 121.0
      - 110.0
      - 100.0
      - 119.0
      - 132.0
      - 145.0
      - 137.0
      - 163.0
      - 134.0
      - 164.0
      - 172.0
      - 176.0
    off_policy_estimator: {}
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.10525665388206647
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 0.07798196456066409
      mean_inference_ms: 1.1943969157090608
      mean_raw_obs_processing_ms: 0.22432358224892415
  time_since_restore: 51.58517932891846
  time_this_iter_s: 2.7187509536743164
  time_total_s: 51.58517932891846
  timers:
    learn_throughput: 20485.939
    learn_time_ms: 1.562
    load_throughput: 20482.188
    load_time_ms: 1.562
    synch_weights_time_ms: 0.0
    training_iteration_time_ms: 31.25
  timestamp: 1654863276
  timesteps_since_restore: 0
  timesteps_total: 20160
  training_iteration: 20
  trial_id: abcc0_00000
  warmup_time: 9.709909439086914

== Status ==
Current time: 2022-06-10 14:14:40 (running for 00:01:39.80)
Memory usage on this node: 10.6/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/4.04 GiB heap, 0.0/2.02 GiB objects
Current best trial: abcc0_00002 with episode_reward_mean=145.05 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [<class 'ray.rllib.offline.estimators.importance_sampling.ImportanceSampling'>, <class 'ray.rllib.offline.estimators.weighted_importance_sampling.WeightedImportanceSampling'>], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 
'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': False, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [<class 'ray.rllib.offline.estimators.importance_sampling.ImportanceSampling'>, <class 'ray.rllib.offline.estimators.weighted_importance_sampling.WeightedImportanceSampling'>], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 
'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': <class 'ray.rllib.utils.replay_buffers.multi_agent_prioritized_replay_buffer.MultiAgentPrioritizedReplayBuffer'>, 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=<class 'ray.rllib.policy.tf_policy_template.DQNTFPolicy'>, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': <class 'ray.rllib.utils.replay_buffers.multi_agent_prioritized_replay_buffer.MultiAgentPrioritizedReplayBuffer'>, 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 
'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=<class 'ray.rllib.policy.tf_policy_template.DQNTFPolicy'>, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00
Number of trials: 3/3 (1 ERROR, 2 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
| Trial name                         | status   | loc             |    gamma |          lr |   iter |   total time (s) |    ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------|
| DQNTrainer_CartPole-v0_abcc0_00001 | RUNNING  | 127.0.0.1:22368 | 0.921279 | 0.000754346 |     20 |          51.5196 | 20160 |   140.57 |                  200 |                   24 |             140.57 |
| DQNTrainer_CartPole-v0_abcc0_00002 | RUNNING  | 127.0.0.1:12200 | 0.957989 | 0.000355912 |     20 |          51.8039 | 20160 |   145.05 |                  200 |                    9 |             145.05 |
| DQNTrainer_CartPole-v0_abcc0_00000 | ERROR    | 127.0.0.1:11888 | 0.948385 | 0.000150244 |     20 |          51.5852 | 20160 |   120.44 |                  198 |                   17 |             120.44 |
+------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
Number of errored trials: 1
+------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+
| Trial name                         |   # failures | error file                                                                                                                 |
|------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------|
| DQNTrainer_CartPole-v0_abcc0_00000 |            1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00000_0_2022-06-10_14-13-00\error.txt |
+------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+

2022-06-10 14:14:40,391 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00002: Error processing event.
NoneType: None
Result for DQNTrainer_CartPole-v0_abcc0_00002:
  agent_timesteps_total: 20160
  counters:
    last_target_update_ts: 20160
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  custom_metrics: {}
  date: 2022-06-10_14-14-37
  done: false
  episode_len_mean: 145.05
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 145.05
  episode_reward_min: 9.0
  episodes_this_iter: 5
  episodes_total: 314
  experiment_id: e41e93a18aeb40159aac256448355baf
  experiment_tag: '2'
  hostname: DESKTOP-IH6PS6N
  info:
    last_target_update_ts: 20160
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_lr: 0.0003559115866664797
          max_q: 23.171512603759766
          mean_q: 21.19548797607422
          mean_td_error: 0.6148348450660706
          min_q: 13.149581909179688
          model: {}
        num_agent_steps_trained: 32.0
        td_error:
        - -0.05719184875488281
        - -1.0752925872802734
        - 0.172882080078125
        - 0.6037330627441406
        - 0.6031627655029297
        - 0.41985321044921875
        - -0.424652099609375
        - -0.17944717407226562
        - 0.47750282287597656
        - 14.227084159851074
        - 0.08045196533203125
        - 0.4776134490966797
        - 0.36383819580078125
        - 0.32493019104003906
        - 0.5207614898681641
        - -0.18519973754882812
        - 0.48108482360839844
        - 0.5255012512207031
        - 0.18582534790039062
        - -0.45144081115722656
        - 0.4896736145019531
        - -0.18197059631347656
        - -0.5692501068115234
        - 1.8208446502685547
        - -1.270355224609375
        - -0.2646903991699219
        - 0.10660362243652344
        - 0.5325069427490234
        - 1.428915023803711
        - 0.564300537109375
        - -0.07600593566894531
        - 0.003143310546875
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  num_agent_steps_sampled: 20160
  num_agent_steps_trained: 51104
  num_env_steps_sampled: 20160
  num_env_steps_sampled_this_iter: 1008
  num_env_steps_trained: 51104
  num_env_steps_trained_this_iter: 2688
  num_healthy_workers: 3
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 73.325
    ram_util_percent: 94.55000000000001
  pid: 12200
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.10668920705483345
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.09492990508066096
    mean_inference_ms: 1.2676988860450573
    mean_raw_obs_processing_ms: 0.22440369556604942
  sampler_results:
    custom_metrics: {}
    episode_len_mean: 145.05
    episode_media: {}
    episode_reward_max: 200.0
    episode_reward_mean: 145.05
    episode_reward_min: 9.0
    episodes_this_iter: 5
    hist_stats:
      episode_lengths:
      - 32
      - 9
      - 15
      - 200
      - 55
      - 72
      - 120
      - 42
      - 22
      - 16
      - 32
      - 64
      - 53
      - 106
      - 24
      - 102
      - 183
      - 160
      - 27
      - 200
      - 145
      - 200
      - 200
      - 200
      - 81
      - 200
      - 200
      - 200
      - 200
      - 200
      - 200
      - 200
      - 199
      - 147
      - 179
      - 200
      - 200
      - 146
      - 200
      - 150
      - 200
      - 200
      - 200
      - 191
      - 200
      - 200
      - 200
      - 171
      - 200
      - 200
      - 200
      - 162
      - 177
      - 105
      - 71
      - 47
      - 101
      - 90
      - 49
      - 87
      - 100
      - 54
      - 85
      - 108
      - 153
      - 132
      - 179
      - 176
      - 138
      - 144
      - 148
      - 152
      - 155
      - 145
      - 140
      - 152
      - 183
      - 182
      - 118
      - 139
      - 136
      - 138
      - 135
      - 139
      - 139
      - 200
      - 184
      - 183
      - 123
      - 136
      - 200
      - 200
      - 200
      - 200
      - 177
      - 200
      - 200
      - 200
      - 200
      - 200
      episode_reward:
      - 32.0
      - 9.0
      - 15.0
      - 200.0
      - 55.0
      - 72.0
      - 120.0
      - 42.0
      - 22.0
      - 16.0
      - 32.0
      - 64.0
      - 53.0
      - 106.0
      - 24.0
      - 102.0
      - 183.0
      - 160.0
      - 27.0
      - 200.0
      - 145.0
      - 200.0
      - 200.0
      - 200.0
      - 81.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 199.0
      - 147.0
      - 179.0
      - 200.0
      - 200.0
      - 146.0
      - 200.0
      - 150.0
      - 200.0
      - 200.0
      - 200.0
      - 191.0
      - 200.0
      - 200.0
      - 200.0
      - 171.0
      - 200.0
      - 200.0
      - 200.0
      - 162.0
      - 177.0
      - 105.0
      - 71.0
      - 47.0
      - 101.0
      - 90.0
      - 49.0
      - 87.0
      - 100.0
      - 54.0
      - 85.0
      - 108.0
      - 153.0
      - 132.0
      - 179.0
      - 176.0
      - 138.0
      - 144.0
      - 148.0
      - 152.0
      - 155.0
      - 145.0
      - 140.0
      - 152.0
      - 183.0
      - 182.0
      - 118.0
      - 139.0
      - 136.0
      - 138.0
      - 135.0
      - 139.0
      - 139.0
      - 200.0
      - 184.0
      - 183.0
      - 123.0
      - 136.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 177.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
      - 200.0
    off_policy_estimator: {}
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.10668920705483345
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 0.09492990508066096
      mean_inference_ms: 1.2676988860450573
      mean_raw_obs_processing_ms: 0.22440369556604942
  time_since_restore: 51.803911447525024
  time_this_iter_s: 2.546875476837158
  time_total_s: 51.803911447525024
  timers:
    learn_throughput: 0.0
    learn_time_ms: 0.0
    load_throughput: 0.0
    load_time_ms: 0.0
    synch_weights_time_ms: 6.25
    training_iteration_time_ms: 31.25
  timestamp: 1654863277
  timesteps_since_restore: 0
  timesteps_total: 20160
  training_iteration: 20
  trial_id: abcc0_00002
  warmup_time: 8.938966274261475

(DQNTrainer pid=22368) 2022-06-10 14:14:40,797  WARNING trainer.py:546 -- Worker crashed during call to `step_attempt()`. To try to continue training without failed worker(s), set `ignore_worker_failures=True`. To try to recover the failed worker(s), set `recreate_failed_workers=True`.
(DQNTrainer pid=22368) Stack (most recent call first):
(DQNTrainer pid=22368)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=22368)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop
(DQNTrainer pid=22368)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in <module>
(pid=) [2022-06-10 14:14:40,799 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-10 14:14:40,799 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-10 14:14:40,923 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-10 14:14:41,933 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
2022-06-10 14:14:42,386 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00001: Error processing event.
NoneType: None
Result for DQNTrainer_CartPole-v0_abcc0_00001:
  agent_timesteps_total: 20160
  counters:
    last_target_update_ts: 20160
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  custom_metrics: {}
  date: 2022-06-10_14-14-36
  done: false
  episode_len_mean: 140.57
  episode_media: {}
  episode_reward_max: 200.0
  episode_reward_mean: 140.57
  episode_reward_min: 24.0
  episodes_this_iter: 9
  episodes_total: 331
  experiment_id: 654971ccf0804e15806b20a8999dfda9
  experiment_tag: '1'
  hostname: DESKTOP-IH6PS6N
  info:
    last_target_update_ts: 20160
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_lr: 0.0007543457322753966
          max_q: 13.48817253112793
          mean_q: 11.894736289978027
          mean_td_error: 0.01804572343826294
          min_q: 5.995281219482422
          model: {}
        num_agent_steps_trained: 32.0
        td_error:
        - -0.1672649383544922
        - -0.21295738220214844
        - 0.2727012634277344
        - -0.26985740661621094
        - -0.10014915466308594
        - 0.18021106719970703
        - 0.34715843200683594
        - -0.5478029251098633
        - 0.2668333053588867
        - -0.15447235107421875
        - -0.3227243423461914
        - -0.45029163360595703
        - -0.30243873596191406
        - 0.016755104064941406
        - 0.1908550262451172
        - -0.5886564254760742
        - -0.06766986846923828
        - 0.34069156646728516
        - 0.33641910552978516
        - -0.7434234619140625
        - -0.23517131805419922
        - 0.060619354248046875
        - -0.6940898895263672
        - 4.995281219482422
        - -0.7546262741088867
        - -0.08367347717285156
        - 0.02113056182861328
        - 0.0528717041015625
        - 0.2747077941894531
        - -0.10827159881591797
        - -0.8170671463012695
        - -0.15816402435302734
    num_agent_steps_sampled: 20160
    num_agent_steps_trained: 51104
    num_env_steps_sampled: 20160
    num_env_steps_trained: 51104
    num_target_updates: 39
  iterations_since_restore: 20
  node_ip: 127.0.0.1
  num_agent_steps_sampled: 20160
  num_agent_steps_trained: 51104
  num_env_steps_sampled: 20160
  num_env_steps_sampled_this_iter: 1008
  num_env_steps_trained: 51104
  num_env_steps_trained_this_iter: 2688
  num_healthy_workers: 3
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 71.125
    ram_util_percent: 94.5
  pid: 22368
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.12795337718083277
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.08253337092390917
    mean_inference_ms: 1.1938984049080479
    mean_raw_obs_processing_ms: 0.21474479833075882
  sampler_results:
    custom_metrics: {}
    episode_len_mean: 140.57
    episode_media: {}
    episode_reward_max: 200.0
    episode_reward_mean: 140.57
    episode_reward_min: 24.0
    episodes_this_iter: 9
    hist_stats:
      episode_lengths:
      - 31
      - 35
      - 51
      - 90
      - 53
      - 170
      - 28
      - 27
      - 92
      - 63
      - 156
      - 79
      - 60
      - 27
      - 99
      - 24
      - 158
      - 97
      - 91
      - 115
      - 200
      - 200
      - 65
      - 200
      - 137
      - 165
      - 200
      - 155
      - 111
      - 153
      - 77
      - 129
      - 180
      - 117
      - 128
      - 130
      - 87
      - 101
      - 121
      - 200
      - 153
      - 105
      - 147
      - 158
      - 189
      - 132
      - 127
      - 139
      - 112
      - 123
      - 189
      - 163
      - 200
      - 169
      - 200
      - 178
      - 200
      - 200
      - 178
      - 200
      - 178
      - 200
      - 112
      - 200
      - 168
      - 167
      - 143
      - 100
      - 200
      - 193
      - 200
      - 200
      - 186
      - 164
      - 177
      - 200
      - 159
      - 195
      - 200
      - 172
      - 200
      - 200
      - 143
      - 200
      - 154
      - 200
      - 135
      - 138
      - 135
      - 156
      - 139
      - 97
      - 125
      - 121
      - 200
      - 108
      - 113
      - 125
      - 83
      - 107
      episode_reward:
      - 31.0
      - 35.0
      - 51.0
      - 90.0
      - 53.0
      - 170.0
      - 28.0
      - 27.0
      - 92.0
      - 63.0
      - 156.0
      - 79.0
      - 60.0
      - 27.0
      - 99.0
      - 24.0
      - 158.0
      - 97.0
      - 91.0
      - 115.0
      - 200.0
      - 200.0
      - 65.0
      - 200.0
      - 137.0
      - 165.0
      - 200.0
      - 155.0
      - 111.0
      - 153.0
      - 77.0
      - 129.0
      - 180.0
      - 117.0
      - 128.0
      - 130.0
      - 87.0
      - 101.0
      - 121.0
      - 200.0
      - 153.0
      - 105.0
      - 147.0
      - 158.0
      - 189.0
      - 132.0
      - 127.0
      - 139.0
      - 112.0
      - 123.0
      - 189.0
      - 163.0
      - 200.0
      - 169.0
      - 200.0
      - 178.0
      - 200.0
      - 200.0
      - 178.0
      - 200.0
      - 178.0
      - 200.0
      - 112.0
      - 200.0
      - 168.0
      - 167.0
      - 143.0
      - 100.0
      - 200.0
      - 193.0
      - 200.0
      - 200.0
      - 186.0
      - 164.0
      - 177.0
      - 200.0
      - 159.0
      - 195.0
      - 200.0
      - 172.0
      - 200.0
      - 200.0
      - 143.0
      - 200.0
      - 154.0
      - 200.0
      - 135.0
      - 138.0
      - 135.0
      - 156.0
      - 139.0
      - 97.0
      - 125.0
      - 121.0
      - 200.0
      - 108.0
      - 113.0
      - 125.0
      - 83.0
      - 107.0
    off_policy_estimator: {}
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.12795337718083277
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 0.08253337092390917
      mean_inference_ms: 1.1938984049080479
      mean_raw_obs_processing_ms: 0.21474479833075882
  time_since_restore: 51.519564390182495
  time_this_iter_s: 2.7031259536743164
  time_total_s: 51.519564390182495
  timers:
    learn_throughput: 10240.078
    learn_time_ms: 3.125
    load_throughput: 0.0
    load_time_ms: 0.0
    synch_weights_time_ms: 4.688
    training_iteration_time_ms: 29.687
  timestamp: 1654863276
  timesteps_since_restore: 0
  timesteps_total: 20160
  training_iteration: 20
  trial_id: abcc0_00001
  warmup_time: 8.743526935577393

== Status ==
Current time: 2022-06-10 14:14:42 (running for 00:01:41.87)
Memory usage on this node: 9.1/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 0/12 CPUs, 0/1 GPUs, 0.0/4.04 GiB heap, 0.0/2.02 GiB objects
Current best trial: abcc0_00002 with episode_reward_mean=145.05 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [<class 'ray.rllib.offline.estimators.importance_sampling.ImportanceSampling'>, <class 'ray.rllib.offline.estimators.weighted_importance_sampling.WeightedImportanceSampling'>], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 
'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': False, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [<class 'ray.rllib.offline.estimators.importance_sampling.ImportanceSampling'>, <class 'ray.rllib.offline.estimators.weighted_importance_sampling.WeightedImportanceSampling'>], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 
'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': <class 'ray.rllib.utils.replay_buffers.multi_agent_prioritized_replay_buffer.MultiAgentPrioritizedReplayBuffer'>, 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=<class 'ray.rllib.policy.tf_policy_template.DQNTFPolicy'>, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': <class 'ray.rllib.utils.replay_buffers.multi_agent_prioritized_replay_buffer.MultiAgentPrioritizedReplayBuffer'>, 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 
'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=<class 'ray.rllib.policy.tf_policy_template.DQNTFPolicy'>, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': <class 'ray.rllib.agents.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00
Number of trials: 3/3 (3 ERROR)
+------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
| Trial name                         | status   | loc             |    gamma |          lr |   iter |   total time (s) |    ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------|
| DQNTrainer_CartPole-v0_abcc0_00000 | ERROR    | 127.0.0.1:11888 | 0.948385 | 0.000150244 |     20 |          51.5852 | 20160 |   120.44 |                  198 |                   17 |             120.44 |
| DQNTrainer_CartPole-v0_abcc0_00001 | ERROR    | 127.0.0.1:22368 | 0.921279 | 0.000754346 |     20 |          51.5196 | 20160 |   140.57 |                  200 |                   24 |             140.57 |
| DQNTrainer_CartPole-v0_abcc0_00002 | ERROR    | 127.0.0.1:12200 | 0.957989 | 0.000355912 |     20 |          51.8039 | 20160 |   145.05 |                  200 |                    9 |             145.05 |
+------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
Number of errored trials: 3
+------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+
| Trial name                         |   # failures | error file                                                                                                                 |
|------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------|
| DQNTrainer_CartPole-v0_abcc0_00000 |            1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00000_0_2022-06-10_14-13-00\error.txt |
| DQNTrainer_CartPole-v0_abcc0_00001 |            1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00001_1_2022-06-10_14-13-16\error.txt |
| DQNTrainer_CartPole-v0_abcc0_00002 |            1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00002_2_2022-06-10_14-13-30\error.txt |
+------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+

2022-06-10 14:14:42,464 ERROR ray_trial_executor.py:107 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\ray_trial_executor.py", line 98, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1845, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
        class_name: DQNTrainer
        actor_id: 597648098ad48add1a4d5fd001000000
        pid: 11888
        namespace: 5da0e462-e686-4c27-bc17-342ca89eed52
        ip: 127.0.0.1
The actor is dead because because all references to the actor were removed.

(pid=) [2022-06-10 14:14:42,933 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
(pid=) [2022-06-10 14:14:43,949 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details:
Traceback (most recent call last):
  File "D:\ML\test_RLlib\test\main.py", line 38, in <module>
    tune.run(DQNTrainer, scheduler=pbt,
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py", line 746, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [DQNTrainer_CartPole-v0_abcc0_00000, DQNTrainer_CartPole-v0_abcc0_00001, DQNTrainer_CartPole-v0_abcc0_00002])

The error file of each worker is the same:

Failure # 1 (occurred at 2022-06-10_14-14-40)
Traceback (most recent call last):
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\ray_trial_executor.py", line 934, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1845, in get
    raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

Edit: (mattip) put the error log into a <details> block to hide it
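
The `trainer.py` warning in the log above names two config flags that control how RLlib reacts when a rollout worker crashes. A minimal sketch of setting them, assuming the same config dict as the reproduction script below and the flag names exactly as they appear in the warning and in the dumped config (this only changes failure handling; it does not address the underlying access violation):

```python
# Sketch only: the failure-handling flags named in the trainer.py warning,
# added to the DQN/Tune config. Flag names are taken from the warning text
# and the config dump above; in that dump both default to False.
config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "num_gpus": 0,
    "framework": "tf",
    "ignore_worker_failures": True,     # keep training without a crashed worker
    "recreate_failed_workers": True,    # or try to restart crashed workers
}
```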

@mattip
Contributor

mattip commented Jun 12, 2022

@Peter-P779 did you change anything in the script or install instructions? Which nightly did you use?

@Peter-P779
Author

Peter-P779 commented Jun 12, 2022

I didn't change anything in the script except deleting `"record_env": True`. I loaded the environment and executed the following commands:

pip uninstall -y ray
pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-win_amd64.whl

The version is the Windows Python 3.9 nightly:
D:\ML\test_RLlib>ray --version
ray, version 2.0.0.dev0
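
For cross-checking which build a given interpreter actually imports, a quick hypothetical check (not part of the original report; `ray.__version__` and `sys.version` are standard attributes):

```python
# Hypothetical sanity check: print the Ray build and the exact Python patch
# version inside the same virtual environment that runs the training script.
import sys

import ray

print("ray:", ray.__version__)
print("python:", sys.version)
```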

@adlerjan

Hello all,

I can reproduce the crash on my Windows desktop on both the current nightly and the PyPI release.

I stumbled over this issue while investigating an unexpected crash using only Ray Core, which occurs exclusively on my home desktop.
On other systems (work notebook, high-performance cluster, Linux notebook) Ray works like a charm.
It is exactly the same thing: after ~90 s of runtime the system crashes with the identical message, sometimes with an access violation error on top at the end.

@mattip
Contributor

mattip commented Nov 15, 2022

TL;DR: I could not reproduce. If someone can still reproduce this, please report what you did using the comment below as a template, starting from a vanilla Python installation.

And in too much detail:

Here is the script I used

Modified script
import ray
from ray import tune
from ray.rllib.algorithms.dqn.dqn import DQN as DQNTrainer
from ray.tune.schedulers import PopulationBasedTraining
import gym
import random

config = {
    "env":"CartPole-v0",
    "num_workers":3,
    "num_gpus": 0,
    "framework":"tf",
    }

if __name__ == "__main__":

    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=7200,
        resample_probability=0.25,
        hyperparam_mutations={
            "lr": lambda: random.uniform(1e-3, 5e-5),
            "gamma": lambda: random.uniform(0.90, 0.99),
        },

    )

    import tensorflow as tf
    ray.init()

    tune.run(DQNTrainer, scheduler=pbt,
             config=config,
             num_samples=3,
             metric="episode_reward_mean",
             mode="max",
             local_dir="./results",
             sync_config=tune.SyncConfig(syncer=None),
             checkpoint_freq=500,
             keep_checkpoints_num=20)

    ray.shutdown()

Here is what I did

CPython39\python.exe -m venv d:\temp\issue24955
d:\temp\issue24955\Scripts\activate
>python -c "import sys; print(sys.version)"
3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
>pip install "ray==2.1,0" "ray[rllib]==2.1.0" "ray[default]==2.1.0" 
>pip install "ray[tune]==2.1.0" "gym==0.23.1" "tensorflow==2.10.0"
>pip install pygame gpuutils pywin32
REM copy script to d:\temp\issue24955.py
> python d:\temp\issue24955.py

I then get a number of diagnostic messages on startup with hints to improve the script.

Startup messages

```
2022-11-15 15:32:12,844 INFO worker.py:1519 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
d:\temp\issue24955\lib\site-packages\ray\tune\tune.py:523: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-11-15 15:32:14,520 WARNING trial_runner.py:1604 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO algorithm.py:2303 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO simple_q.py:307 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO algorithm.py:457 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(RolloutWorker pid=1228) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=1228)   logger.warn(
(RolloutWorker pid=10248) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=10248)   logger.warn(
(RolloutWorker pid=6740) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=6740)   logger.warn(
(RolloutWorker pid=1228) 2022-11-15 15:32:24,547 WARNING env.py:159 -- Your env reset() method appears to take 'seed' or 'return_info' arguments. Note that these are not yet supported in RLlib. Seeding will take place using 'env.seed()' and the info dict will not be returned from reset.
== Status ==
```

The script runs, and I can see the resource usage on the dashboard. There are 8 RolloutWorker actors and 2 DQN actors. The processes seem to take up to 14.3GB of RAM. The script runs for much more than 90 seconds: I stopped it after ~10 minutes by pressing CTRL-C, and it stopped cleanly:

2022-11-15 15:42:37,640 ERROR tune.py:773 -- Trials did not complete: [DQN_CartPole-v0_eaa22_00000, DQN_CartPole-v0_eaa22_00001, DQN_CartPole-v0_eaa22_00002]
2022-11-15 15:42:37,640 INFO tune.py:777 -- Total run time: 623.14 seconds (622.83 seconds for the tuning loop).
2022-11-15 15:42:37,640 WARNING tune.py:783 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing `resume=True` to `tune.run()`

@adlerjan

Hey, sorry for the somewhat unspecific response. It has been a while, but I remember from cross-examination of my working and non-working systems that the issue only occurred with a specific Python 3.9 patch version. Switching to a previous patch resolved my problems completely.

@mattip
Contributor

mattip commented Nov 15, 2022

Perhaps your machine has 16 GB of RAM, which is enough on Linux but not sufficient on Windows to run this experiment.
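
If memory pressure is the suspect, one way to shrink the experiment's footprint is to run fewer concurrent workers and trials. A minimal sketch, reusing the config keys from the reproduction script above (the actual savings are not measured in this thread):

```python
# Sketch only: a lighter variant of the reproduction config for machines with
# little free RAM on Windows. Fewer rollout workers and fewer parallel PBT
# samples mean fewer TensorFlow processes resident at the same time.
config = {
    "env": "CartPole-v0",
    "num_workers": 1,   # down from 3 rollout workers per trial
    "num_gpus": 0,
    "framework": "tf",
}

# tune.run(DQNTrainer, config=config, num_samples=1, ...)  # one trial at a time
```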

@richardliaw
Contributor

Closing this, as we seem to lack a reproduction; it may be related to Python versioning.
