[rllib]test dqn failed (OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized) #5109

Closed
Ray-0403 opened this issue Jul 3, 2019 · 3 comments
Labels
stale: The issue is stale. It will be closed within 7 days unless there is further conversation.

Comments


Ray-0403 commented Jul 3, 2019

I ran the command: rllib train --run DQN --env CartPole-v0
and got the following output:
WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2019-07-03 22:06:29,131 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-03_22-06-29_129981_91756/logs.
2019-07-03 22:06:29,241 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:35461 to respond...
2019-07-03 22:06:29,355 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:24787 to respond...
2019-07-03 22:06:29,359 INFO services.py:806 -- Starting Redis shard with 1.72 GB max memory.
2019-07-03 22:06:29,374 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-07-03_22-06-29_129981_91756/logs.
2019-07-03 22:06:29,376 INFO services.py:1446 -- Starting the Plasma object store with 2.58 GB memory using /tmp.
2019-07-03 22:06:29,975 INFO tune.py:61 -- Tip: to resume incomplete experiments, pass resume='prompt' or resume=True to run()
2019-07-03 22:06:29,976 INFO tune.py:233 -- Starting a new experiment.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 4.3/8.6 GB

2019-07-03 22:06:29,991 WARNING signature.py:108 -- The function with_updates has a **kwargs argument, which is currently not supported.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 1/4 CPUs, 0/0 GPUs
Memory usage on this node: 4.3/8.6 GB
Result logdir: /Users/bluecharles/ray_results/default
Number of trials: 1 ({'RUNNING': 1})
RUNNING trials:

  • DQN_CartPole-v0_0: RUNNING

(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/compat.py:175: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) non-resource variables are not supported in the long term
(pid=91767) 2019-07-03 22:06:33,871 INFO rollout_worker.py:301 -- Creating policy evaluation worker 0 on CPU (please ignore any CUDA init errors)
(pid=91767) 2019-07-03 22:06:33.872850: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) Colocations handled automatically by placer.
(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/ray/rllib/models/fcnet.py:37: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) Use keras.layers.dense instead.
(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/util/decorator_utils.py:145: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) Use tf.GraphKeys.GLOBAL_VARIABLES instead.
(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/ray/rllib/agents/dqn/dqn_policy.py:297: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) Use tf.random.categorical instead.
(pid=91767) 2019-07-03 22:06:34,306 INFO dynamic_tf_policy.py:313 -- Initializing loss function with dummy input:
(pid=91767)
(pid=91767) { 'actions': <tf.Tensor 'default_policy/actions:0' shape=(?,) dtype=int64>,
(pid=91767) 'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=bool>,
(pid=91767) 'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(pid=91767) 'obs': <tf.Tensor 'default_policy/observation:0' shape=(?, 4) dtype=float32>,
(pid=91767) 'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(pid=91767) 'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(pid=91767) 'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(pid=91767)
(pid=91767) WARNING:tensorflow:From /anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
(pid=91767) Instructions for updating:
(pid=91767) Use tf.cast instead.
(pid=91767) 2019-07-03 22:06:36,195 INFO rollout_worker.py:719 -- Built policy map: {'default_policy': <ray.rllib.policy.tf_policy_template.DQNTFPolicy object at 0x1ce300edd8>}
(pid=91767) 2019-07-03 22:06:36,195 INFO rollout_worker.py:720 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x1ce300eac8>}
(pid=91767) 2019-07-03 22:06:36,195 INFO rollout_worker.py:333 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x1ce2ff97f0>}
(pid=91767) 2019-07-03 22:06:36,263 INFO rollout_worker.py:428 -- Generating sample batch of size 4
(pid=91767) 2019-07-03 22:06:36,264 INFO sampler.py:308 -- Raw obs from env: { 0: { 'agent0': np.ndarray((4,), dtype=float64, min=-0.034, max=0.034, mean=0.0)}}
(pid=91767) 2019-07-03 22:06:36,264 INFO sampler.py:309 -- Info return from env: {0: {'agent0': None}}
(pid=91767) 2019-07-03 22:06:36,265 INFO sampler.py:407 -- Preprocessed obs: np.ndarray((4,), dtype=float64, min=-0.034, max=0.034, mean=0.0)
(pid=91767) 2019-07-03 22:06:36,265 INFO sampler.py:411 -- Filtered obs: np.ndarray((4,), dtype=float64, min=-0.034, max=0.034, mean=0.0)
(pid=91767) 2019-07-03 22:06:36,265 INFO sampler.py:525 -- Inputs to compute_actions():
(pid=91767)
(pid=91767) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
(pid=91767) 'env_id': 0,
(pid=91767) 'info': None,
(pid=91767) 'obs': np.ndarray((4,), dtype=float64, min=-0.034, max=0.034, mean=0.0),
(pid=91767) 'prev_action': np.ndarray((), dtype=int64, min=0.0, max=0.0, mean=0.0),
(pid=91767) 'prev_reward': 0.0,
(pid=91767) 'rnn_state': []},
(pid=91767) 'type': 'PolicyEvalData'}]}
(pid=91767)
(pid=91767) 2019-07-03 22:06:36,266 INFO tf_run_builder.py:92 -- Executing TF run without tracing. To dump TF timeline traces to disk, set the TF_TIMELINE_DIR environment variable.
2019-07-03 22:06:36,541 ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
result = ray.get(trial_future[0])
File "/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 2195, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
2019-07-03 22:06:36,545 ERROR worker.py:1672 -- A worker died or was killed while executing task 1a1196143f2285f42cb7bbd936d59382.
2019-07-03 22:06:36,546 INFO ray_trial_executor.py:187 -- Destroying actor for trial DQN_CartPole-v0_0. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 4.4/8.6 GB
Result logdir: /Users/bluecharles/ray_results/default
Number of trials: 1 ({'ERROR': 1})
ERROR trials:

  • DQN_CartPole-v0_0: ERROR, 1 failures: /Users/bluecharles/ray_results/default/DQN_CartPole-v0_0_2019-07-03_22-06-291xjgsv8c/error_2019-07-03_22-06-36.txt

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs
Memory usage on this node: 4.4/8.6 GB
Result logdir: /Users/bluecharles/ray_results/default
Number of trials: 1 ({'ERROR': 1})
ERROR trials:

  • DQN_CartPole-v0_0: ERROR, 1 failures: /Users/bluecharles/ray_results/default/DQN_CartPole-v0_0_2019-07-03_22-06-291xjgsv8c/error_2019-07-03_22-06-36.txt

Traceback (most recent call last):
File "/anaconda3/bin/rllib", line 10, in <module>
sys.exit(cli())
File "/anaconda3/lib/python3.7/site-packages/ray/rllib/scripts.py", line 38, in cli
train.run(options, train_parser)
File "/anaconda3/lib/python3.7/site-packages/ray/rllib/train.py", line 147, in run
resume=args.resume)
File "/anaconda3/lib/python3.7/site-packages/ray/tune/tune.py", line 333, in run_experiments
raise_on_failed_trial=raise_on_failed_trial)
File "/anaconda3/lib/python3.7/site-packages/ray/tune/tune.py", line 273, in run
raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [DQN_CartPole-v0_0])
(pid=91767) OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
(pid=91767) OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
(pid=91767) Fatal Python error: Aborted
(pid=91767)

Can anyone help me?
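The OMP hint in the log above says that more than one copy of the OpenMP runtime was loaded into the worker process. One way to check whether an environment ships several copies is to scan it for files with that name. The following is a minimal sketch, not something posted in this thread: the function name `find_openmp_runtimes` and the default list of file names are assumptions chosen for illustration, and finding two files on disk only suggests, and does not prove, that both get loaded at run time.

```python
import os


def find_openmp_runtimes(root, names=("libiomp5.dylib", "libiomp5.so", "libomp.dylib")):
    """Return the sorted paths under `root` whose file name is in `names`.

    `names` is an assumed list of common OpenMP runtime file names;
    adjust it for the platform being inspected.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in names:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Calling `find_openmp_runtimes(sys.prefix)` (after `import sys`) would list the candidates inside the active Python installation, which for this report is the `/anaconda3` tree seen in the traceback paths.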

Ray-0403 changed the title from "test dqn failed" to "[rllib]test dqn failed" on Jul 3, 2019
Ray-0403 changed the title from "[rllib]test dqn failed" to "[rllib]test dqn failed (OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized" on Jul 3, 2019
Ray-0403 changed the title from "[rllib]test dqn failed (OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized" to "[rllib]test dqn failed (OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized)" on Jul 3, 2019

stale bot commented Nov 13, 2020

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label on Nov 13, 2020

stale bot commented Nov 27, 2020

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this as completed Nov 27, 2020
@RocketRider
Contributor

Did you find any solution?
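No fix was posted in this thread. The only workaround named anywhere in it is the one from the OMP hint in the original log: setting the environment variable KMP_DUPLICATE_LIB_OK=TRUE, which the hint itself calls unsafe, unsupported, and liable to cause crashes or silently incorrect results. The sketch below shows one way that variable could be passed to the command from the report. The helper name `env_with_omp_workaround` is hypothetical, and it is an untested assumption that the Ray worker processes started by `rllib train` inherit the environment of the launching process.

```python
import os


def env_with_omp_workaround(base_env=None):
    """Return a copy of the environment with KMP_DUPLICATE_LIB_OK=TRUE set.

    The variable name and value come from the OMP hint in the log;
    the hint describes this workaround as unsafe and unsupported.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


# The command from the original report.
RLLIB_COMMAND = ["rllib", "train", "--run", "DQN", "--env", "CartPole-v0"]

# To launch it with the modified environment (commented out so that
# importing this sketch has no side effects):
# import subprocess
# subprocess.run(RLLIB_COMMAND, env=env_with_omp_workaround(), check=True)
```

The approach the hint recommends instead is to make sure only a single OpenMP runtime is linked into the process, so the variable is at most a stopgap for getting a test run through.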
