No metrics logged when using wandb integrations of sb3 #212

XiaobenLi00 · 2024-08-21T08:14:10Z

When I run sb3_job_script https://github.com/MyoHub/myosuite/blob/main/myosuite/agents/sb3_job_script.py with the wandb integration, no warning or error is reported, but no metrics are logged on the wandb website.
I tried another framework, e.g., torchrl_job_script https://github.com/MyoHub/myosuite/blob/main/myosuite/agents/torchrl_job_script.py, and there the metrics are logged.
I am not sure where the problem comes from. Could you help me figure it out?

XiaobenLi00 · 2024-08-21T13:33:17Z

I also noticed that during training, log information does not seem to be printed or saved.

XiaobenLi00 · 2024-08-21T13:47:51Z

I suspect there is a problem with how the logger is used. I am a little confused and look forward to your help.

jamesheald · 2024-08-21T20:44:47Z

Hey @XiaobenLi00. Thanks for the question.

I defined the tensorboard_log directory to be within the wandb folder, and that fixed things for me. I now see metrics within the wandb app where I didn't before.

Specifically, when instantiating the PPO (or SAC) model in sb3_job_script.py, I included tensorboard_log=f"wandb/{run.id}" as an argument.
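For reference, a minimal sketch of this setup is below. It assumes the standard wandb SB3 integration (`wandb.init(sync_tensorboard=True)` plus `wandb.integration.sb3.WandbCallback`); the project name, environment id and timestep budget are placeholders, not values taken from sb3_job_script.py. The line that matters is `tensorboard_log`: SB3 only writes TensorBoard event files when it is set, and `sync_tensorboard` can only upload what SB3 actually writes.

```python
def tensorboard_log_dir(run_id: str, wandb_dir: str = "wandb") -> str:
    """TensorBoard directory placed inside the wandb folder, e.g. 'wandb/<run id>'."""
    return f"{wandb_dir}/{run_id}"


def train_with_wandb(env_id: str = "myoElbowPose1D6MRandom-v0",
                     total_timesteps: int = 1_000_000,
                     project: str = "myosuite-sb3"):
    # Heavy imports live inside the function so the helper above works without them.
    import wandb
    from stable_baselines3 import PPO
    from wandb.integration.sb3 import WandbCallback
    from myosuite.utils import gym

    # sync_tensorboard=True makes wandb upload whatever SB3 writes to TensorBoard.
    run = wandb.init(project=project, sync_tensorboard=True)
    env = gym.make(env_id)
    # Without tensorboard_log, SB3 writes no TensorBoard events, so there is nothing to sync.
    model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=tensorboard_log_dir(run.id))
    model.learn(total_timesteps=total_timesteps, callback=WandbCallback())
    run.finish()
    return model
```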

Does this work for you?

XiaobenLi00 · 2024-08-22T01:32:24Z

@jamesheald Thanks a lot for your reply, this seems to work for me.
By the way, I want to save env videos during training, but neither EvalCallback nor monitor_gym=True seems to work. I also tried VideoRecorderCallback from https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html, and it also fails with the error 'NoneType' object has no attribute 'transpose'. I have no idea what causes this and may need your help.

XiaobenLi00 · 2024-08-22T01:33:37Z

I am also wondering whether it is OK to call model.set_logger after model.learn. Would it cause problems?

XiaobenLi00 · 2024-08-22T01:37:09Z

I also find that the number of steps is small. How should I adjust this?

jamesheald · 2024-08-22T19:51:23Z

> @jamesheald Thanks a lot for your reply, this seems work for me. By the way, I want to save env videos during learning. But it seems neither EvalCallback nor monitor_gym=True works. I also tried VideoRecorderCallback from https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html and it also fails and issue an error 'NoneType' object has no attribute 'transpose'. I have no idea about this and may need your help.

The monitor_gym flag automatically logs videos that are generated, but videos need to be generated by another process, such as VideoRecorderCallback (which you tried) or an environment wrapper like VecVideoRecorder. The problem is that both of these methods expect the gym environment to have an env.render() function, and the myosuite gym environments don't have this function; in myosuite, env.mj_render() is used for onscreen rendering and env.sim.renderer.render_offscreen() is used for offscreen rendering. You may be able to create a gym wrapper that allows you to use VideoRecorderCallback or VecVideoRecorder. Alternatively you can manually save videos, as shown in the tutorial here.
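If you go the manual route, a rough sketch is below. It assumes the environment exposes `env.unwrapped.sim.renderer.render_offscreen(width=..., height=..., camera_id=...)` as described above, that `policy` is any callable mapping an observation to an action, and that scikit-video (with ffmpeg) is installed for writing the file. `collect_frames` and `save_video` are hypothetical helpers, not myosuite APIs.

```python
def collect_frames(env, policy, n_steps=200, width=400, height=400, camera_id=1):
    """Roll out `policy` and grab one offscreen frame after each step."""
    frames = []
    obs = env.reset()
    if isinstance(obs, tuple):  # gymnasium-style reset returns (obs, info)
        obs = obs[0]
    renderer = env.unwrapped.sim.renderer
    for _ in range(n_steps):
        out = env.step(policy(obs))
        obs = out[0]
        frames.append(renderer.render_offscreen(width=width, height=height, camera_id=camera_id))
        # 5-tuple: (obs, reward, terminated, truncated, info); 4-tuple: (obs, reward, done, info)
        done = (out[2] or out[3]) if len(out) == 5 else out[2]
        if done:
            break
    return frames


def save_video(frames, path="rollout.mp4"):
    # Assumes scikit-video and ffmpeg are available; imported lazily.
    import numpy as np
    import skvideo.io

    skvideo.io.vwrite(path, np.asarray(frames), outputdict={"-pix_fmt": "yuv420p"})
```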

> I also find that the num of steps is small, how should I to adjust this?

If you're using the provided config files, https://github.com/MyoHub/myosuite/tree/main/myosuite/agents/config, you can adjust total_timesteps there, or alternatively you can pass it directly as an argument to model.learn, for example:

```python
model.learn(total_timesteps=1_000_000)
```

XiaobenLi00 · 2024-08-23T02:23:29Z

@jamesheald Thank you very much indeed for your answer!

I now understand how to record videos by defining a wrapper, and I will try it.

I do have another question, although it is not closely related to this issue.

I am wondering what "rollout" means in the learning process. I can see it in the log output, but I am not sure what it actually refers to.

In my experiment, I use:

```python
callback += [EvalCallback(max(job_data.eval_freq // job_data.n_env, 1), eval_env)]
callback += [InfoCallback()]
callback += [FallbackCheckpoint(max(job_data.restore_checkpoint_freq // job_data.n_env, 1))]
callback += [CheckpointCallback(save_freq=max(job_data.save_freq // job_data.n_env, 1), save_path=f'logs/',
                                name_prefix='rl_models')]
# callback += [VideoRecorderCallback(eval_env, render_freq=max(job_data.render_freq // job_data.n_env, 1))]
```

and the config:

```yaml
# PPO.learn function
total_timesteps: 2000000
log_interval: 10000

render_freq: 100000
eval_freq: 100000
restore_checkpoint_freq: 100000
save_freq: 100000
```

Then I got logs in wandb like this:

I am wondering why there are only 50 steps in the chart and how to adjust this number; it does not seem to be related to the parameters in the config file. What if I want much denser time intervals?

cherylwang20 · 2024-08-24T20:56:42Z

Hi,

You should change the x-axis to global_step so that it is synced with the training steps. I suggest reading the SB3 training documentation for more information on each parameter: https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html
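As a rough guide to how these parameters set the spacing of logged points: assuming SB3's standard on-policy `learn()` loop, each iteration collects `n_steps * n_envs` timesteps, and `rollout/*` metrics are dumped when `iteration % log_interval == 0`. In other words, for PPO `log_interval` counts rollout iterations, not timesteps. The helper names below are hypothetical.

```python
import math


def timesteps_between_dumps(n_steps: int, n_envs: int, log_interval: int) -> int:
    """Timesteps separating two iteration-based log dumps in an on-policy run."""
    return n_steps * n_envs * log_interval


def dumps_per_run(total_timesteps: int, n_steps: int, n_envs: int, log_interval: int) -> int:
    """Estimate how many times the on-policy learn() loop dumps rollout/* logs."""
    steps_per_iteration = n_steps * n_envs  # gathered by one collect_rollouts() call
    # The loop keeps going while num_timesteps < total_timesteps, hence the ceiling.
    iterations = math.ceil(total_timesteps / steps_per_iteration)
    return iterations // log_interval  # a dump happens when iteration % log_interval == 0
```

With `log_interval: 10000` as in the config above, the iteration-based dump would essentially never fire in a 2M-step PPO run, so any points that do appear may come from other code paths that call `logger.dump`, such as an evaluation callback.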

XiaobenLi00 · 2024-08-25T00:50:21Z

@jamesheald @cherylwang20

Thanks a lot for your kind help!

XiaobenLi00 · 2024-09-02T15:19:34Z

Hi @cherylwang20 @jamesheald

Thanks for your kind help.

I read the SB3 docs and tried to follow your suggestions, but I still have problems with the logging.

When logging to TensorBoard, self.logger.dump(step=self.num_timesteps) is called after the self.logger.record calls:

```python
self.logger.record("time/iterations", iteration, exclude="tensorboard")
if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
    self.logger.record("rollout/ep_rew_mean", safe_mean([ep_info["r"] for ep_info in self.ep_info_buffer]))
    self.logger.record("rollout/ep_len_mean", safe_mean([ep_info["l"] for ep_info in self.ep_info_buffer]))
self.logger.record("time/fps", fps)
self.logger.record("time/time_elapsed", int(time_elapsed), exclude="tensorboard")
self.logger.record("time/total_timesteps", self.num_timesteps, exclude="tensorboard")
if len(self.ep_success_buffer) > 0:
    self.logger.record("rollout/success_rate", safe_mean(self.ep_success_buffer))
self.logger.dump(step=self.num_timesteps)
```

but in the wandb plots, the x-axis is just the step count rather than self.num_timesteps.

Do you have any suggestions?

BTW, it seems that the info is only logged once per rollout, so the interval between points is too large. How can I get denser logging?
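One way to get denser points is a custom callback that records the episode statistics and calls `logger.dump()` on its own schedule. A minimal sketch, assuming SB3's `BaseCallback`, `model.ep_info_buffer` and `safe_mean` (the latter two appear in the snippet above); `DenseLoggerCallback`, `should_dump` and `every` are hypothetical names. Lowering `log_interval` (or `n_steps`) is the simpler alternative.

```python
def should_dump(num_timesteps: int, last_dump: int, every: int) -> bool:
    """True once at least `every` timesteps have passed since the last dump."""
    return num_timesteps - last_dump >= every


def make_dense_logger_callback(every: int = 10_000):
    # Imported lazily so should_dump() can be used without SB3 installed.
    from stable_baselines3.common.callbacks import BaseCallback
    from stable_baselines3.common.utils import safe_mean

    class DenseLoggerCallback(BaseCallback):
        """Hypothetical callback: records episode stats and dumps every `every` timesteps."""

        def __init__(self):
            super().__init__()
            self._last_dump = 0

        def _on_step(self) -> bool:
            if should_dump(self.num_timesteps, self._last_dump, every):
                self._last_dump = self.num_timesteps
                buf = self.model.ep_info_buffer
                if buf is not None and len(buf) > 0:
                    self.logger.record("rollout/ep_rew_mean", safe_mean([ep["r"] for ep in buf]))
                    self.logger.record("rollout/ep_len_mean", safe_mean([ep["l"] for ep in buf]))
                # Step = num_timesteps, so points line up with global_step in wandb.
                self.logger.dump(step=self.num_timesteps)
            return True  # returning False would stop training

    return DenseLoggerCallback()
```

Usage would be along the lines of `model.learn(total_timesteps=..., callback=make_dense_logger_callback(every=5_000))`.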

Looking forward to your suggestions!

jamesheald · 2025-02-02T15:50:50Z

For rendering videos, this simple environment wrapper makes VideoRecorderCallback and VecVideoRecorder compatible with myosuite environments:

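```python
from myosuite.utils import gym

class RenderWrapper(gym.Wrapper):
    # Tells SB3's video tools that render() returns an RGB array
    render_mode = 'rgb_array'

    def __init__(self, env):
        super().__init__(env)

    def render(self):
        # myosuite envs have no env.render(); grab an offscreen frame from the simulator's renderer
        return self.env.unwrapped.sim.renderer.render_offscreen(width=400, height=400, camera_id=1)

env_id = 'myoElbowPose1D6MRandom-v0'
env = gym.make(env_id)
env = RenderWrapper(env)
```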
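With that wrapper in place, wiring it into `VecVideoRecorder` might look like the sketch below. It assumes SB3's `DummyVecEnv` and `VecVideoRecorder(venv, video_folder, record_video_trigger, video_length)` signatures; `make_video_env` and `record_every` are hypothetical helpers, and `wrapper_cls` is meant to be the `RenderWrapper` above. Per the earlier comment, videos produced this way can then be picked up by wandb when `monitor_gym=True` is set.

```python
def record_every(n: int):
    """Trigger for VecVideoRecorder: start a clip every `n` vectorized steps."""
    return lambda step: step % n == 0


def make_video_env(env_id, wrapper_cls, video_folder="videos", every=10_000, length=200):
    """Wrap one myosuite env so SB3 records `length`-step clips every `every` steps."""
    # Imported lazily so record_every() can be used without these packages.
    from myosuite.utils import gym
    from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder

    venv = DummyVecEnv([lambda: wrapper_cls(gym.make(env_id))])
    return VecVideoRecorder(
        venv,
        video_folder,
        record_video_trigger=record_every(every),
        video_length=length,
    )
```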

jamesheald mentioned this issue on Aug 22, 2024: reset() function and render_mode problem #210 (Closed)

cherylwang20 closed this as completed Aug 24, 2024

XiaobenLi00 mentioned this issue on Sep 4, 2024: Record videos during training #221 (Open)

jamesheald mentioned this issue on Oct 7, 2024: tensorboard directory added in the wand folder so that metrics get lo… #256 (Merged)
