
About dataset #1

Closed · penghao-wu opened this issue on Aug 8, 2022 · 16 comments

@penghao-wu

Thanks for sharing your great work! I was wondering whether it would be possible to share the zip file of the frames sampled from the YouTube videos? Downloading all of the videos is a bit time- and space-consuming.
Thank you in advance.

@zqh0253
Collaborator

zqh0253 commented Aug 9, 2022

Hi, thanks for your interest in our work. You can find the frames here: OneDrive link.
After downloading, you can run

cat sega* > frames.zip

to get the zip file.
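If `cat` is not available (for example on Windows), a minimal Python equivalent might look like this, assuming the downloaded parts are named `sega*` and sit in the current directory:

```python
# Reassemble the split archive, equivalent to `cat sega* > frames.zip`.
import glob
import zipfile

with open("frames.zip", "wb") as out:
    for part in sorted(glob.glob("sega*")):
        with open(part, "rb") as f:
            out.write(f.read())

# Sanity check that the reassembled archive is a valid zip.
print(zipfile.is_zipfile("frames.zip"))
```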

@penghao-wu
Author

Thank you very much! Another question about the downstream tasks: do you also use the DI-drive engine for imitation learning and collect data with the default settings? Also, which Carla version is used for training and evaluation?

@zqh0253
Collaborator

zqh0253 commented Aug 9, 2022

Yes, I use the DI-drive engine for imitation learning, and the Carla version I use is 0.9.9.4.

Starting from the default settings, I change a few things, including the camera setting (to match the pretrained image resolution).

@penghao-wu
Author

Thanks. Could you provide the camera settings you used, including the size, position, and FOV? Also, could you share more details about the data-collection and evaluation suites? For example, do you use the default 'FullTown01-v1' suite and weather to collect data, and which evaluation suite and weather are used (straight / one turn / navigation / navigation with dynamics)? That would be very helpful.
Thanks a lot.

@zqh0253
Collaborator

zqh0253 commented Aug 10, 2022

The camera setting I use is dict(size=[320, 180], position=[2.0, 0.0, 1.4], rotation=[0, 0, 0], fov=100). I use the default FullTown01-v1 suite to collect data and FullTown02-v2 for evaluation.
Note that these settings are based on an old version of DI-drive (commit ID: f532c9e9a6b26386a933049c1754ca5262d76e0a).
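For reference, the same setting written out as a plain Python dict with annotated fields (the field meanings follow the usual Carla camera convention and are my reading of them, not something documented in this thread):

```python
# Camera setting as quoted above. How it is wired into the DI-drive config
# may differ between versions, so treat this as a sketch of the values only.
camera_setting = dict(
    size=[320, 180],           # rendered image width x height in pixels
    position=[2.0, 0.0, 1.4],  # offset from the ego vehicle in meters (forward, lateral, up)
    rotation=[0, 0, 0],        # camera rotation in degrees (no tilt)
    fov=100,                   # horizontal field of view in degrees
)
```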

@penghao-wu
Author

Thanks for your help. I appreciate it a lot.

@penghao-wu
Author

penghao-wu commented Aug 10, 2022

Sorry to bother you again. I have a few questions about the training to confirm:

  • Why do we have a rindex of 6510787 in the dataset? Is it used to fix the index problem in dir-65?
  • There are 8,642,040 frames in total, and if we sample them at an interval of 10, there would be about 0.8M samples, which does not match 1.3M.
  • For pre-training, do you take the checkpoint with the best accuracy on the 30% test set, or the last-epoch one?
  • For imitation learning training, do you use the last-epoch (epoch 100) checkpoint for evaluation?

Thanks in advance.

@zqh0253
Collaborator

zqh0253 commented Aug 10, 2022

Why do we have a rindex of 6510787 in the dataset?

This number is used to fix a bug in the index. That part of the data is not ready yet, and I am still working on it.

There are 8,642,040 frames in total, and if we sample them at an interval of 10, there would be about 0.8M samples, which does not match 1.3M.

The current 0.8M frames are a small portion of the full set of YouTube frames I experimented with. I picked the video clips whose visual appearance is close to Carla to form these 0.8M frames. This does not affect the pretraining quality on Carla downstream tasks. I will consider uploading the whole dataset in the future.

For pre-training, do you take the checkpoint with the best accuracy on the 30% test set, or the last-epoch one?

I use the last epoch. I found the test accuracy stable during training, so I simply pick the last checkpoint.

For imitation learning training, do you use the last-epoch (epoch 100) checkpoint for evaluation?

This one is a little tricky. Due to IL's distribution-shift problem, I found that the test performance varies greatly between epochs. It is also hard to decide which epoch to pick, since we do not have a validation environment. So for each pretrained weight, I evaluate the (i*10)-th checkpoints for i = 3, ..., 10 and report the highest success rate.
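In pseudocode, that selection protocol looks roughly like this (a sketch only; `evaluate_success_rate` is a hypothetical helper standing in for a full evaluation run on the test suite, and the checkpoint naming is assumed):

```python
# Sketch of the checkpoint-selection protocol described above:
# evaluate the epoch-30, 40, ..., 100 checkpoints and keep the best one.
def pick_best_checkpoint(evaluate_success_rate):
    results = {}
    for i in range(3, 11):
        epoch = i * 10
        ckpt_path = f"checkpoints/epoch_{epoch}.pth"  # hypothetical naming scheme
        results[epoch] = evaluate_success_rate(ckpt_path)
    best_epoch = max(results, key=results.get)
    return best_epoch, results[best_epoch]
```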

@penghao-wu
Author

penghao-wu commented Aug 12, 2022

Hi, I followed your instructions and trained an agent (pretrained on ImageNet) using 4K data. However, its success rate evaluated on the FullTown02-v2 suite is 37.3 ± 3.1, which is higher than the 21.3 ± 7.5 reported in the paper. Am I missing any details, or are other modifications needed? What are the possible reasons, in your opinion? I am using Carla 0.9.9.4 and the same DI-drive version as you. I sample 10% of the data uniformly (e.g. data_4K = data_40K[::10]); is this the same way you do it, or should I choose data_4K = data_40K[:4000]?

Besides, do you plan to release the pre-calculated steering values for the uploaded 80K frames, or the code for the inverse dynamics model? If not, could you please share more details about the model structure so that I can implement and train it myself?

Also, since DI-drive only contains a PPO model with BEV input, could you provide your model file or model details for PPO training?

Thanks a lot!

@zqh0253
Collaborator

zqh0253 commented Aug 15, 2022

Hi,

I sample 10% of the data uniformly (e.g. data_4K = data_40K[::10]); is this the same way you do it, or should I choose data_4K = data_40K[:4000]?

I reduce the dataset size at the trajectory level, i.e. data_40K[:4000]. Since adjacent frames are redundant, reducing at the trajectory level creates a harder problem than reducing at the frame level (data_40K[::10]). So I think the performance gap you report is expected.
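To make the distinction concrete, a sketch assuming the 40K frames are stored in trajectory order (consecutive frames come from the same episode):

```python
# Two ways of shrinking a 40K-frame dataset to 4K frames.
data_40K = list(range(40000))  # stand-in for the full, trajectory-ordered frame list

# Frame-level reduction: every 10th frame, so every trajectory is still
# represented, just more sparsely.
data_4K_frame_level = data_40K[::10]

# Trajectory-level reduction: only the first 4000 frames, i.e. only the
# first few trajectories, giving much narrower coverage and a harder task.
data_4K_trajectory_level = data_40K[:4000]
```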

Do you plan to release the pre-calculated steering values?

Yes, I am working on this part and will release it in the future. Stay tuned.

PPO training.

I did not experiment much with the PPO model design. A ResNet-34 backbone is used to extract the visual feature. The feature then goes through an MLP and is concatenated with the velocity to form the encoder output.
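A minimal PyTorch sketch of an encoder along those lines (the hidden size and other details are my assumptions, not the exact configuration used in the paper):

```python
# Sketch of the described PPO encoder: ResNet-34 visual features -> MLP,
# concatenated with the scalar speed. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision

class PPOEncoder(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet34(weights=None)
        backbone.fc = nn.Identity()  # keep the 512-d pooled feature
        self.backbone = backbone
        self.mlp = nn.Sequential(nn.Linear(512, feat_dim), nn.ReLU())

    def forward(self, image, speed):
        # image: (B, 3, H, W) RGB observation, speed: (B, 1) ego velocity
        feat = self.mlp(self.backbone(image))
        return torch.cat([feat, speed], dim=1)  # (B, feat_dim + 1)
```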

@zqh0253
Collaborator

zqh0253 commented Aug 15, 2022

Let's keep this issue focused on the dataset. If you have further questions about training, feel free to open a new one.

zqh0253 closed this as completed on Aug 15, 2022
@SiyuanHuang95

Hi, thanks for sharing your great work!

  • In this issue, you mentioned that you picked the clips with a close visual appearance to Carla, so what are the criteria for visual-appearance similarity? Does picking these 0.8M frames bring better performance, or does it just save training cost?
  • Did you follow the default suite of DI-Drive for the IL Carla dataset generation?

Best,

@zqh0253
Collaborator

zqh0253 commented Dec 1, 2022

Hi, thanks for your interest in our work.

  1. There are no clear criteria; I simply removed some driving videos with extreme weather to save training cost. More carefully designed measures would help, for example, calculating the feature distance between Carla frames and the frames of a particular video and then sorting all the videos by that distance (see the sketch below).
  2. Yes, I follow the default suite of DI-Drive for IL dataset generation, except for the settings mentioned earlier in this thread.
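One possible sketch of that ranking idea, using an ImageNet-pretrained backbone as a generic feature extractor (this is not the procedure used in the paper; the model choice and distance are illustrative assumptions):

```python
# Rank YouTube videos by how close their mean visual feature is to Carla frames.
import torch
import torchvision

model = torchvision.models.resnet34(weights="IMAGENET1K_V1")
model.fc = torch.nn.Identity()  # use the 512-d pooled feature
model.eval()

@torch.no_grad()
def mean_feature(frames):
    # frames: (N, 3, H, W) tensor of preprocessed images
    return model(frames).mean(dim=0)

def rank_videos(carla_frames, videos):
    # videos: dict mapping video id -> (N, 3, H, W) tensor of its frames
    carla_feat = mean_feature(carla_frames)
    dists = {vid: torch.norm(mean_feature(f) - carla_feat).item()
             for vid, f in videos.items()}
    return sorted(dists, key=dists.get)  # video ids, closest to Carla first
```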

@SiyuanHuang95

SiyuanHuang95 commented Dec 1, 2022 via email

@zqh0253
Collaborator

zqh0253 commented Dec 7, 2022

  1. We didn't conduct experiments comparing different dataset sizes. Indeed, with more diversity the pre-trained model could be even stronger.
  2. The newer versions of DI-drive do not support Carla 0.9.9, and that is why we used an old version.

@SiyuanHuang95

  1. Okay, thanks.
  2. Thanks for the information.
