Questions on evaluation experiments in nuScenes validation dataset #32
Hi @Fengtao22, thanks for your interest!
Yes, I did some filtering to omit invalid samples. I just published the complete processing script in #30, which results in 5369 validation samples.
You are right, the nuScenes samples are 2 seconds long, and our model predicts 25 frames at 10 Hz. However, nuScenes (including sweeps) is logged at 12 Hz. Therefore, to align with the model input, we take one initial frame plus 2 seconds of video (24 frames at 12 Hz) to build a sequence of 25 frames.
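For illustration, here is a minimal sketch of that alignment (my own index arithmetic, not code from the released repository; `sweep_paths` is a hypothetical list of a scene's 12 Hz image paths):

```python
# Sketch: assemble a 25-frame model input from a 12 Hz image stream.
# 1 initial frame + 2 s * 12 Hz = 1 + 24 = 25 frames.
SWEEP_HZ = 12
CLIP_SECONDS = 2
NUM_FRAMES = 1 + CLIP_SECONDS * SWEEP_HZ  # 25

def build_sequence(sweep_paths, start_idx):
    """Return 25 consecutive sweep paths starting at start_idx,
    or None if fewer than 2 seconds of video remain in the scene."""
    end_idx = start_idx + NUM_FRAMES
    if end_idx > len(sweep_paths):
        return None  # not enough future frames; such samples are filtered out
    return sweep_paths[start_idx:end_idx]
```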
Sorry for the confusion, here is how I get each point in Figure 10:
Thus, for each data point in Figure 10, I go through 1500 samples from the Waymo validation set. For each sample, I generate a random trajectory with the same deviation and perform reward estimation. In short, it takes 1500 x 5 x 10 = 75000 denoising steps to get each data point. The most expensive figure I have ever drawn!
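In pseudocode, one data point of Figure 10 might look like the sketch below. This is my own illustration: `sample_trajectory_with_deviation` and `estimate_reward` are hypothetical placeholders for the paper's components, and the 5 x 10 = 50 denoising steps per sample are assumed to happen inside `estimate_reward`.

```python
import numpy as np

NUM_SAMPLES = 1500  # Waymo validation samples per data point

def figure10_data_point(waymo_samples, deviation,
                        sample_trajectory_with_deviation, estimate_reward):
    """Average reward over 1500 samples for one fixed trajectory deviation.

    estimate_reward(sample, traj) is assumed to run 5 x 10 = 50 denoising
    steps internally, giving 1500 x 5 x 10 = 75000 steps per data point.
    """
    rewards = []
    for sample in waymo_samples[:NUM_SAMPLES]:
        traj = sample_trajectory_with_deviation(sample, deviation)
        rewards.append(estimate_reward(sample, traj))
    return float(np.mean(rewards))
```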
Thanks for your detailed response! When you calculate the L2 errors, do you accumulate the norm over the four future points or average it over the four values?
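For reference, the two readings differ only by a factor of four; a quick sketch of my own (not from the repo):

```python
import numpy as np

def l2_errors(pred, gt):
    """pred, gt: arrays of shape (4, 2) -- four future (x, y) waypoints."""
    per_point = np.linalg.norm(np.asarray(pred) - np.asarray(gt), axis=1)
    return per_point.sum(), per_point.mean()  # accumulated vs. averaged norm
```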
Hi, first of all, thanks for open-sourcing your work! I have three questions about your video generation on the nuScenes validation dataset:
In your nuScenes_val.json file, there are 5369 samples in total (each sample contains 25 frames). This number does not match the number of validation samples in nuScenes (6019 frames). Is it because you filter out the frames that do not have 2 seconds of future frames? I did a calculation: there are 5951 unique validation samples in your json file, and it seems 17 video clips among the 150 scenes do not have future frames (17 * 4 = 6019 - 5951).
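A minimal sketch of such a filter using the nuScenes devkit's `next` links (my own illustration of the check I have in mind; the actual filtering is in the script from #30, and restricting to the validation split is omitted here for brevity):

```python
from nuscenes.nuscenes import NuScenes

def has_two_seconds_of_future(nusc, sample_token, needed=4):
    """Keyframes are at 2 Hz, so 2 seconds of future = 4 subsequent samples."""
    token = sample_token
    for _ in range(needed):
        token = nusc.get('sample', token)['next']
        if token == '':  # reached the end of the scene
            return False
    return True

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes')
kept = [s['token'] for s in nusc.sample
        if has_two_seconds_of_future(nusc, s['token'])]
```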
In your nuScenes_val.json file, the provided traj contains 10 elements, which I believe are the future 2 seconds of ego trajectory points, including the current position, expressed with respect to the first frame of that sample. In other words, I read the traj list as (x, y) pairs at 0.5-second intervals: [x_t, y_t, x_{t+0.5}, y_{t+0.5}, x_{t+1.0}, y_{t+1.0}, x_{t+1.5}, y_{t+1.5}, x_{t+2.0}, y_{t+2.0}]. Correct me if I am wrong. Why not use 2.5 seconds of future predictions, since the default frame number is 25 and the frequency is 10 Hz?
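If that reading is correct, the list could be reshaped into timestamped waypoints like this (a sketch based on my assumption about the layout, not on the released code):

```python
# Sketch: interpret the 10-element `traj` list as five (x, y) waypoints
# at t = 0.0, 0.5, 1.0, 1.5, 2.0 s relative to the sample's first frame.
def parse_traj(traj):
    assert len(traj) == 10
    waypoints = [(traj[i], traj[i + 1]) for i in range(0, 10, 2)]
    times = [0.0, 0.5, 1.0, 1.5, 2.0]
    return list(zip(times, waypoints))
```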
To get the L2-vs-reward relationship, how did you calculate the L2 error (using the ground-truth future trajectory against a randomly sampled trajectory)? How many samples (25 frames per sample) do you use to get the average reward for a fixed L2 error (in Figure 10, left, of your paper)?
Thanks!