
Questions on supervised training part code and performance #2

Closed · Kin-Zhang opened this issue Jul 29, 2023 · 16 comments

@Kin-Zhang

Thanks for your work and for open-sourcing the code.

While reading the following code, I'm wondering why there is an assert requiring the Z index to be 0. Since the voxelization is 3D, shouldn't Z be allowed to be nonzero?

```python
def forward_single(self, before_pseudoimage: torch.Tensor,
                   after_pseudoimage: torch.Tensor,
                   point_offsets: torch.Tensor,
                   voxel_coords: torch.Tensor) -> torch.Tensor:
    voxel_coords = voxel_coords.long()
    assert (voxel_coords[:, 0] == 0).all(), "Z index must be 0"
    # Voxel coords are Z, Y, X, and the pseudoimage is Channel, Y, X.
    # I have confirmed via visualization that these coordinates are correct.
    after_voxel_vectors = after_pseudoimage[:, voxel_coords[:, 1],
                                            voxel_coords[:, 2]].T
    before_voxel_vectors = before_pseudoimage[:, voxel_coords[:, 1],
                                              voxel_coords[:, 2]].T
    concatenated_vectors = torch.cat(
        [before_voxel_vectors, after_voxel_vectors, point_offsets], dim=1)
    flow = self.decoder(concatenated_vectors)
    return flow
```

@Kin-Zhang
Author

I see — I found the config that sets the Z voxel size to cover the entire Z range, so the Z dimension has length 1 and the index is always 0. But why do it that way? Is a 2D grid better?

@kylevedder
Owner

The student is the FastFlow3D model, which uses a PointPillars feature encoder that turns everything into a 2D bird's-eye-view pseudoimage.

The voxelization function I am using is provided by MMCV; it is more general and can be used for true 3D voxelization (e.g. for SECOND / VoxelNet). I set the minimum and maximum point height in the config as you referenced, and the point clouds are cropped accordingly, so every point should land in a single very tall voxel (a point pillar, hence the name PointPillars) to form the pseudoimage. I added the referenced assert to validate this assumption when doing the voxelization for FastFlow3D; if any point had a Z index other than zero, the assumption would be violated and the assert should trigger.
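
A minimal sketch of this pillar trick, assuming MMCV's `Voxelization` op (the range and voxel sizes below are illustrative, not the repo's actual config values; the op may require a CUDA build of mmcv):

```python
import torch
from mmcv.ops import Voxelization

# point_cloud_range is (x_min, y_min, z_min, x_max, y_max, z_max)
point_cloud_range = [-51.2, -51.2, -3.0, 51.2, 51.2, 3.0]
voxelizer = Voxelization(
    voxel_size=[0.2, 0.2, 6.0],  # Z size = z_max - z_min => a single Z bin
    point_cloud_range=point_cloud_range,
    max_num_points=20,
)

points = torch.rand(1000, 3) * 6 - 3  # toy points inside the crop range
voxels, coords, num_points = voxelizer(points)
# coords come back as (Z, Y, X); with one Z bin, the Z index is always 0,
# which is exactly what the assert in forward_single checks.
assert (coords[:, 0] == 0).all()
```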

@Kin-Zhang
Author

Thanks for your reply. 🥰

@Kin-Zhang changed the title from "why z index must be 0?" to "Questions on supervised training part code and performance" on Jul 30, 2023
@Kin-Zhang Kin-Zhang reopened this Jul 30, 2023
@kylevedder
Owner

Download the pre-trained model weights and run the evaluation on them to ensure that everything else is set up correctly.

If you're able to reproduce those test numbers, then there's something going on with the training run that we can dig into further.

@Kin-Zhang
Author

Kin-Zhang commented Aug 1, 2023

Thanks for your help! I appreciate it.

By the way, I saw there is a new ZeroFlow evaluation on the leaderboard:

  1. What does XL mean?
  2. Is the leaderboard result still from the ZeroFlow pipeline (which would be odd, since it beats the NSFP teacher), or is it FastFlow3D supervised with GT flow?

@kylevedder
Owner

ZeroFlow XL is the ZeroFlow pipeline with two changes:

  1. We use twice as much data, pulling unlabeled point clouds from the Argoverse 2 LiDAR dataset (I updated GETTING_STARTED.md with details)
  2. We use an enlarged student model: we quadruple the pseudoimage area (512x512 to 1024x1024, by halving the pillar side length in each dimension), we double the size of the point embedding vector (thereby making each layer of the UNet twice as wide), and we add another layer to the UNet; see the sketch below. To contextualize the size of the model change: the normal student model weights are 79MB, while the XL model weights are 1.3GB. I've pushed the new UNet backbone to the repo.
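
Here is a sketch of the two scalings side by side; the key names and the base values are assumptions for exposition, not the repo's actual config keys:

```python
# Hypothetical config sketch contrasting the base student with ZeroFlow XL.
base_student = dict(
    pseudoimage_dims=(512, 512),    # X, Y pillar grid
    pillar_size=(0.4, 0.4),         # meters per pillar side (illustrative)
    point_feature_dim=32,           # per-point embedding size (illustrative)
    unet_layers=4,
)
xl_student = dict(
    pseudoimage_dims=(1024, 1024),  # 4x the pseudoimage area
    pillar_size=(0.2, 0.2),         # pillar side halved in each dimension
    point_feature_dim=64,           # doubled embedding => each UNet layer 2x wider
    unet_layers=5,                  # one extra UNet layer
)
```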

To be clear, as with ZeroFlow, ZeroFlow XL uses zero human labels. Our results come simply from using more unlabeled data and adding more parameters! We are able to beat the teacher model's performance and achieve state-of-the-art on the AV2 test set because our model has seen enough diverse data and is expressive enough to learn to distinguish noise from signal in the teacher labels.

On Sunday, when we got this result, I tweeted this cool updated graph showing we are doing better than the scaling laws fit to the normal student model predicted:

[Tweet image: updated scaling-law plot]

To further drive home this point, here are the raw results from our submissions to the AV2 Scene Flow test split:

If you look at the linked results, our XL model outperforms the teacher across all three categories of the Threeway EPE, but makes particularly large gains in the static foreground category. This means our model has learned to recognize a lack of motion better than NSFP can represent it: the model has seen enough data to know that, in expectation, static objects should have zero flow, even with a bit of noise in the teacher labels, while also extracting, in expectation, what correct movement vectors look like for moving objects.
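
For reference, a hedged sketch of how Threeway EPE is computed, following the three-bucket breakdown described above (the function and argument names are mine):

```python
import numpy as np

def threeway_epe(pred_flow, gt_flow, is_foreground, is_dynamic):
    """pred_flow, gt_flow: (N, 3) meters; is_foreground, is_dynamic: (N,) bool."""
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)  # per-point endpoint error
    buckets = [
        is_foreground & is_dynamic,   # foreground dynamic (moving objects)
        is_foreground & ~is_dynamic,  # foreground static (e.g. parked cars)
        ~is_foreground,               # background
    ]
    # Average within each bucket first, then across buckets, so the (rare)
    # dynamic points are not drowned out by the many static points.
    return np.mean([epe[mask].mean() for mask in buckets if mask.any()])
```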

I also have more good news! I'm an idiot and forgot to run the XL model with our Speed Scaling feature enabled (Equation 5 from the paper), and I stopped training this model after only 5 epochs (akin to seeing ~10 epochs' worth of frames). This means the XL model is undertrained and missing a feature that provides free Foreground Dynamic EPE improvements (which substantially improve Threeway EPE). We are training a new XL model with these features enabled, and for more epochs, so we should hopefully get even better performance from our new student model.

@Kin-Zhang
Author

Kin-Zhang commented Aug 1, 2023

Thank you so much for sharing these! Looking forward to your updates.

@Kin-Zhang
Author

One more question 😊: does the XL dataset also go through the ZeroFlow paper's pipeline, i.e. running NSFP to produce pseudo labels on the LiDAR dataset first? Or is there a new pipeline for that?

@kylevedder
Owner

That's correct: I used NSFP to pseudolabel the Argoverse 2 LiDAR dataset subset. We have a large SLURM cluster with a bunch of old 2080 Tis, so the pseudolabeling only took a few days because I could parallelize across them all.

I used data_prep_scripts/split_nsfp_jobs_sbatch.py to set up and launch these jobs for both the Sensor and LiDAR subset pseudolabeling.
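
A hypothetical sketch of what such a job-splitting script does — shard the sequences, write one sbatch file per shard, and submit them (the `nsfp_label.py` entry point and its flags are made up for illustration):

```python
import subprocess

sequences = [f"seq_{i:04d}" for i in range(700)]  # toy sequence IDs
num_jobs = 100                                    # one job per GPU slot

for job_idx in range(num_jobs):
    shard = sequences[job_idx::num_jobs]  # round-robin shard of sequences
    script = "\n".join([
        "#!/bin/bash",
        "#SBATCH --gres=gpu:1",
        "#SBATCH --time=48:00:00",
        f"python nsfp_label.py --sequences {' '.join(shard)}",  # hypothetical
    ])
    path = f"job_{job_idx}.sbatch"
    with open(path, "w") as f:
        f.write(script + "\n")
    subprocess.run(["sbatch", path], check=True)  # queue the job on SLURM
```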

@Kin-Zhang
Author

@kylevedder sorry to bother you again, but could you specify which model weights produced each of these three results (or, for Base ZeroFlow, which single model)?

  - Base ZeroFlow: Threeway EPE of 0.0814
  - NSFP: Threeway EPE of 0.0684
  - ZeroFlow XL: Threeway EPE of 0.0578

@kylevedder
Owner

As we discuss in the paper, our reported Threeway EPE for ZeroFlow is an average of three runs.

These weights are the ones highlighted in the weight repo README:

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run2

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run3

NSFP doesn't have trained weights; it's a test-time optimization method.

We have not uploaded the ZeroFlow XL weights; they are too large (1.2GB) and require me to set up Git LFS.

@yanconglin

yanconglin commented Oct 23, 2023

Hi kylevedder,

Could you please let me know the number of samples in the processed Argo/Waymo datasets, for the train/val/test splits respectively? There seem to be several versions of Waymo scene flow datasets, such as PCAccumulation (ECCV 2022). The strategies to calculate GT flows are similar, but I wonder if there is a difference in scale. I cannot find any info in the paper or the supplementary.

It also seems ego-motion compensation was used when creating the scene flow dataset, as mentioned in the supplementary. Could you please share results WITHOUT ego-motion compensation, if any? So far, my results show NSFP performs worse on dynamic objects when using ego-motion compensation on Waymo, and I'm not sure to what extent this impacts the distillation. Any insights from your side? Thank you!

@kylevedder
Owner

Dataset Details

For Argoverse2, I read the dataset straight from disk as downloaded, sans the minor folder rearrangement I discuss in the GETTING_STARTED.md.

For Waymo Open, the exact dataset version and labels are detailed in my GETTING_STARTED.md -- we use 1.4.2 with the standard flow labels provided on the Waymo Open website. We preprocess the data from the annoying .proto format into easy-to-read .pkl files, removing the ground plane from the point clouds along the way. For details, please read the preprocessing scripts discussed in the Getting Started; they are pretty easy to read, and frankly I do not remember all the nuances of my preprocessing.

ZeroFlow without ego motion compensation

We do not have any results for ZeroFlow / FastFlow3D without ego motion compensation. In principle we can train our feedforward model without ego compensation, but it's reasonable to assume decent-quality ego compensation is available at test time on modern service robot / autonomous vehicle stacks. Chodosh et al. 2023 makes a fairly compelling case that ego compensation is broadly useful, so we decided to use it.
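
To illustrate what ego compensation means here, a minimal sketch, assuming each sweep comes with a sensor-to-world pose: express sweep t's points in sweep t+1's sensor frame, so static world points map onto themselves and their residual flow is zero.

```python
import numpy as np

def compensate_ego_motion(points_t, pose_t, pose_t1):
    """points_t: (N, 3); pose_t, pose_t1: (4, 4) sensor-to-world transforms."""
    homog = np.hstack([points_t, np.ones((len(points_t), 1))])  # (N, 4)
    world = homog @ pose_t.T                  # sensor_t -> world
    in_t1 = world @ np.linalg.inv(pose_t1).T  # world -> sensor_{t+1}
    return in_t1[:, :3]
```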

NSFP without ego motion compensation

I don't have direct head-to-head NSFP results with and without ego compensation. In my early work using NSFP on Argoverse2, I saw that Threeway EPE was better with compensation (which makes sense: it's an easier problem), and we ran with that on Waymo.

How much worse is NSFP on the dynamic bin? Do you have more details on what kinds of dynamic objects it performs worse on, and when? Are you doing ground removal? (This is basically mandatory to get NSFP to work; otherwise it fits a bunch of zero vectors to the lidar returns on the ground.)
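
As a toy illustration of why (a crude fixed-height filter; real pipelines estimate an actual ground plane):

```python
import numpy as np

def remove_ground(points, ground_z=-1.6):
    """points: (N, 3) in the sensor frame; drop returns at or below ground height."""
    # The many static ground returns otherwise dominate NSFP's objective,
    # pulling the fitted flow field toward zero everywhere.
    return points[points[:, 2] > ground_z]
```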

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the authors' own implementation of NSFP, but we use our own data loaders. The NSFP authors actually reached out to me to discuss dataloader details because our NSFP implementation (listed as the baseline NSFP implementation on the Argoverse2 Scene Flow leaderboard) significantly outperformed their own. Their entry is NP (NSFP); my NSFP implementation is the Host_67820_Team NSFP entry (the challenge organizers asked me to send them results for a strong baseline).

@kylevedder
Owner

kylevedder commented Oct 24, 2023 via email


@Kin-Zhang
Author

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the author implementation of NSFP, but we use our own data loaders.

I found the same when I tried to reproduce the FastFlow3D result (ZeroFlow's student architecture, here trained with supervision). However, since I used the official dataloader inside av2-api, my scores come out somewhat worse. I will try to figure out why in the following days.

Thanks to @kylevedder, who mentioned one of the reasons over email (in case someone after me has the same problem, I attach his words here):

If your numbers are significantly worse, then something is wrong. If that's the case, my first guess is that you trained on a single GPU using the given config, which is set up to train on 4 GPUs simultaneously and thus has the per-GPU batch size set to 16 instead of 64.
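
In other words, a sketch of the arithmetic (variable names are mine):

```python
# The released config assumes 4 GPUs, so per-GPU batch 16 => effective batch 64.
# On a single GPU, match the effective batch via batch size or grad accumulation.
num_gpus = 1
per_gpu_batch = 16
accumulate_grad_batches = 4  # 1 GPU * 16 per batch * 4 steps == 64
effective_batch = num_gpus * per_gpu_batch * accumulate_grad_batches
assert effective_batch == 64
```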

Thanks again to Kyle!
