Have you compared the SSRL with PPO? #9

Open
hdadong opened this issue Oct 28, 2024 · 2 comments

Comments


hdadong commented Oct 28, 2024

Have you compared SSRL with PPO? I found the following code in your repository. How does their performance compare?

def make_ppo_networks(cfg: DictConfig, saved_policies_dir: Path,
                      env: RlwamEnv, ppo_params_path: Path = None):
    # load pre-trained PPO policy parameters from disk
    if ppo_params_path is not None:
        path = ppo_params_path
    else:
        # todo
        path = saved_policies_dir / 'go1_ppo_policy.pkl'
    with open(path, 'rb') as f:
        params = dill.load(f)

    # create the policy network
    normalize = lambda x, y: x
    if cfg.common.normalize_observations:
        normalize = running_statistics.normalize
    ppo_network = ppo_networks.make_ppo_networks(
        env.observation_size * cfg.contact_generate.obs_history_length,
        env.action_size,
        preprocess_observations_fn=normalize,
        policy_hidden_layer_sizes=((cfg.actor_network.hidden_size,)
                                   * cfg.actor_network.hidden_layers)
    )
    # build the inference function used to run the loaded policy
    make_policy = ppo_networks.make_inference_fn(ppo_network)

    return params, make_policy
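
For context, this snippet appears to build on Brax's PPO utilities. The imports below are my guess at what it relies on, based on the names used; treat them as an assumption rather than the file's actual header (RlwamEnv and the config object come from this repository itself):

from pathlib import Path

import dill
from omegaconf import DictConfig  # Hydra/OmegaConf config object
from brax.training.acme import running_statistics  # observation normalization
from brax.training.agents.ppo import networks as ppo_networks  # PPO network factory
# RlwamEnv is the repository's own environment class; its import path is not shown here.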
jake-levy (Member) commented:

I've used PPO with a similar Go1 environment for another project, but I haven't used it with the go1_go_fast env yet. The code you found is an artifact from that other project. Based on my results in that environment, I suspect PPO's performance would be similar to or worse than SAC's. Let me know if you'd like me to add PPO functionality; I have the code in another private repo.
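
Until that lands, anyone wanting a rough point of comparison could run Brax's stock PPO trainer on a stand-in environment. The sketch below uses Brax's built-in 'ant' task and purely illustrative hyperparameters; neither the environment nor the settings come from this repository:

from brax import envs
from brax.training.agents.ppo import train as ppo_train

# Stand-in environment: swap in the SSRL Go1 environment constructed by the repo's own code.
env = envs.get_environment('ant')

# Hyperparameters are illustrative, not tuned for Go1.
make_inference_fn, params, _metrics = ppo_train.train(
    environment=env,
    num_timesteps=50_000_000,
    episode_length=1000,
    num_envs=2048,
    unroll_length=20,
    batch_size=1024,
    num_minibatches=32,
    num_updates_per_batch=4,
    learning_rate=3e-4,
    entropy_cost=1e-2,
    discounting=0.97,
    normalize_observations=True,
    reward_scaling=1.0,
    seed=0,
)

# The returned factory yields a policy function usable for evaluation.
policy = make_inference_fn(params)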


hdadong commented Oct 31, 2024

Having a PPO implementation as a baseline would be fantastic! Most reinforcement learning algorithms for robots (particularly humanoid robots) are built on Isaac Gym's PPO framework. When adapting an algorithm to a humanoid model in SSRL, the first step is typically to transfer the reward function from Isaac Gym and test it with PPO, given PPO's advantages in parallelism and fast convergence. A PPO baseline would therefore be immensely helpful. Thank you for considering this!
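
As a concrete illustration of what "transferring the reward function" means here, a typical Isaac Gym / legged_gym-style velocity-tracking term ported to a JAX-based environment step might look like the sketch below; the function name and the sigma value are illustrative, not taken from either codebase:

import jax.numpy as jnp

def tracking_lin_vel_reward(base_lin_vel_xy, command_xy, sigma=0.25):
    # Exponential-kernel reward for matching the commanded planar base velocity,
    # mirroring the common legged_gym tracking term.
    vel_error = jnp.sum(jnp.square(command_xy - base_lin_vel_xy))
    return jnp.exp(-vel_error / sigma)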
