Issue on Ant-v2 expert data and Humanoid-v2 random seed experiments #7

Hi! Thank you very much for sharing your paper and source code! I am new to inverse RL and would like to apply your method to a robot soon.
**About Ant-v2**
1. I found that the reward at every step in your Ant-v2 expert data is 1. Why is the reward set like this? Also, how do I run SQIL correctly with your code?
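For reference, SQIL (Reddy et al.) ignores the environment reward. It labels every expert transition with reward 1 and every transition the agent collects with reward 0, and it trains on batches drawn half from each buffer. A constant reward of 1 in the expert file would be consistent with that convention, but that this is the reason for the Ant-v2 data is only a guess. The sketch below shows the labeling in plain Python. `Transition` and `sample_sqil_batch` are hypothetical names and are not functions from the repository.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    obs: Tuple[float, ...]
    action: Tuple[float, ...]
    next_obs: Tuple[float, ...]
    done: bool
    reward: float = 0.0  # whatever was stored on disk; SQIL overwrites it


def sample_sqil_batch(expert: Sequence[Transition],
                      agent: Sequence[Transition],
                      batch_size: int,
                      rng: random.Random) -> List[Transition]:
    """Draw a balanced batch (hypothetical helper, SQIL-style labeling):
    half expert transitions relabeled with reward 1.0,
    half agent transitions relabeled with reward 0.0."""
    half = batch_size // 2
    # Sample with replacement from each buffer, then overwrite the stored reward.
    expert_part = [replace(t, reward=1.0) for t in rng.choices(expert, k=half)]
    agent_part = [replace(t, reward=0.0)
                  for t in rng.choices(agent, k=batch_size - half)]
    return expert_part + agent_part


if __name__ == "__main__":
    rng = random.Random(0)
    expert = [Transition((0.0,), (1.0,), (0.1,), False, reward=1.0)]
    agent = [Transition((0.5,), (-1.0,), (0.4,), False, reward=3.2)]
    batch = sample_sqil_batch(expert, agent, 4, rng)
    print([t.reward for t in batch])  # [1.0, 1.0, 0.0, 0.0]
```

Note that the agent transition's stored reward (3.2 in the example) is discarded; under this scheme only the source of a transition (expert or agent) determines its reward.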

**About random seeds**

1. The results differ a lot across random seeds in the Humanoid experiments, and some runs only reach around 1500 points. Is this because training runs for only 50000 learning steps, or because only 1 expert demonstration is used?

I ran the following command once for each seed (`seed=0` through `seed=5`): `python train_iq.py env=humanoid agent=sac expert.demos=1 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0/1/2/3/4/5 agent.init_temp=1`
![seed](https://user-images.githubusercontent.com/62631419/191644824-4a97f52b-4da7-45e5-b49f-44a5f1cf7d27.png)
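To make a seed comparison like this easier to reproduce, here is a small Python sketch that builds the `train_iq.py` command from this issue once per seed (all other overrides unchanged) and summarizes the final returns across seeds. `build_commands`, `run_sweep`, and `summarize` are hypothetical helpers, not part of the repository. The returns passed to `summarize` must be filled in from your own training logs, because no log format is assumed here.

```python
import statistics
import subprocess
from typing import Dict, List, Sequence

# Overrides copied from the command in this issue; only `seed` varies per run.
BASE_OVERRIDES = [
    "env=humanoid", "agent=sac", "expert.demos=1", "method.loss=v0",
    "method.regularize=True", "agent.actor_lr=3e-05", "agent.init_temp=1",
]


def build_commands(seeds: Sequence[int]) -> List[List[str]]:
    """Return one train_iq.py invocation (as an argv list) per seed."""
    return [["python", "train_iq.py", *BASE_OVERRIDES, f"seed={s}"]
            for s in seeds]


def run_sweep(seeds: Sequence[int], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in build_commands(seeds):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


def summarize(final_returns: Dict[int, float]) -> Dict[str, float]:
    """Mean, sample standard deviation, min, and max of per-seed returns."""
    values = list(final_returns.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    run_sweep(range(6))  # prints six commands, seed=0 ... seed=5
```

`dry_run=True` only prints the commands, so they can be checked before launching six long training jobs. Reporting mean, standard deviation, and min/max per configuration makes it easier to tell whether a change (for example, more learning steps or more demos) reduces the spread between seeds.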
Your work is very valuable, and I look forward to your help in clearing up these questions.