
Questions on supervised training part code and performance #2

Closed · Kin-Zhang opened this issue Jul 29, 2023 · 16 comments

@Kin-Zhang

Thanks for your work and for open-sourcing the code.

While reading the following code, I'm wondering why there is an assert requiring the Z index to be 0. Since the voxelization is 3D, shouldn't Z be allowed to be nonzero?

```python
def forward_single(self, before_pseudoimage: torch.Tensor,
                   after_pseudoimage: torch.Tensor,
                   point_offsets: torch.Tensor,
                   voxel_coords: torch.Tensor) -> torch.Tensor:
    voxel_coords = voxel_coords.long()
    assert (voxel_coords[:, 0] == 0).all(), "Z index must be 0"
    # Voxel coords are Z, Y, X, and the pseudoimage is Channel, Y, X.
    # I have confirmed via visualization that these coordinates are correct.
    after_voxel_vectors = after_pseudoimage[:, voxel_coords[:, 1],
                                            voxel_coords[:, 2]].T
    before_voxel_vectors = before_pseudoimage[:, voxel_coords[:, 1],
                                              voxel_coords[:, 2]].T
    concatenated_vectors = torch.cat(
        [before_voxel_vectors, after_voxel_vectors, point_offsets], dim=1)
    flow = self.decoder(concatenated_vectors)
    return flow
```

@Kin-Zhang
Author

I see — I found the config that sets the Z voxel size to cover the entire Z range, so the Z dimension has length 1 and the index is always 0. But why do it that way? Is a 2D grid better?

@kylevedder
Owner

The student is the FastFlow3D model, which uses a PointPillars feature encoder that turns everything into a 2D bird's-eye-view pseudoimage.

The voxelization function I am using is provided by MMCV; it is more general and can be used for true 3D voxelization (e.g. for SECOND / VoxelNet). I set the minimum and maximum point height in the config as you referenced, and the point clouds are cropped accordingly, so every point should land in a single very tall voxel (a point pillar, hence the name PointPillars) to form the pseudoimage. I added the referenced assert to validate this assumption when doing the voxelization for FastFlow3D; if any point had a Z index other than zero, the assumption would be violated and the assert should trigger.
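
A minimal sketch of this pillar trick, assuming MMCV's `Voxelization` op (the range and voxel sizes below are illustrative, not the repo's actual config values; the op may require a CUDA build of mmcv):

```python
import torch
from mmcv.ops import Voxelization

# point_cloud_range is (x_min, y_min, z_min, x_max, y_max, z_max)
point_cloud_range = [-51.2, -51.2, -3.0, 51.2, 51.2, 3.0]
voxelizer = Voxelization(
    voxel_size=[0.2, 0.2, 6.0],  # Z size = z_max - z_min => a single Z bin
    point_cloud_range=point_cloud_range,
    max_num_points=20,
)

points = torch.rand(1000, 3) * 6 - 3  # toy points inside the crop range
voxels, coords, num_points = voxelizer(points)
# coords come back as (Z, Y, X); with one Z bin, the Z index is always 0,
# which is exactly what the assert in forward_single checks.
assert (coords[:, 0] == 0).all()
```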

@Kin-Zhang
Author

Thanks for your reply. 🥰

@Kin-Zhang changed the title from "why z index must be 0?" to "Questions on supervised training part code and performance" on Jul 30, 2023
@Kin-Zhang Kin-Zhang reopened this Jul 30, 2023
@kylevedder
Owner

Download the pre-trained model weights and run the evaluation on them to ensure that everything else is set up correctly.

If you're able to reproduce those test numbers, then there's something going on with the training run that we can dig into further.

@Kin-Zhang
Author

Kin-Zhang commented Aug 1, 2023

Thanks for your help! I appreciate it.

By the way, I saw there is a new ZeroFlow evaluation on the leaderboard:

  1. What does XL mean?
  2. Is the leaderboard result still from the ZeroFlow pipeline (which would be odd, since it beats the NSFP teacher), or is it FastFlow3D supervised with GT flow?

@kylevedder
Owner

ZeroFlow XL is the ZeroFlow pipeline with two changes:

  1. We use twice as much data, pulling unlabeled point clouds from the Argoverse 2 LiDAR dataset (I updated GETTING_STARTED.md with details)
  2. We use an enlarged student model: we quadruple the pseudoimage area (512x512 to 1024x1024, by halving the pillar side length in each dimension), we double the size of the point embedding vector (thereby making each layer of the UNet twice as wide), and we add another layer to the UNet; see the sketch below. To contextualize the size of the model change: the normal student model weights are 79MB, while the XL model weights are 1.3GB. I've pushed the new UNet backbone to the repo.
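
Here is a sketch of the two scalings side by side; the key names and the base values are assumptions for exposition, not the repo's actual config keys:

```python
# Hypothetical config sketch contrasting the base student with ZeroFlow XL.
base_student = dict(
    pseudoimage_dims=(512, 512),    # X, Y pillar grid
    pillar_size=(0.4, 0.4),         # meters per pillar side (illustrative)
    point_feature_dim=32,           # per-point embedding size (illustrative)
    unet_layers=4,
)
xl_student = dict(
    pseudoimage_dims=(1024, 1024),  # 4x the pseudoimage area
    pillar_size=(0.2, 0.2),         # pillar side halved in each dimension
    point_feature_dim=64,           # doubled embedding => each UNet layer 2x wider
    unet_layers=5,                  # one extra UNet layer
)
```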

To be clear, as with ZeroFlow, ZeroFlow XL uses zero human labels. Our results come simply from using more unlabeled data and adding more parameters! We are able to beat the teacher model's performance and achieve state-of-the-art on the AV2 test set because our model has seen enough diverse data and is expressive enough to learn to distinguish noise from signal in the teacher labels.

On Sunday, when we got this result, I tweeted this cool updated graph showing we are doing better than the scaling laws fit to the normal student model predicted:

[Tweet image: updated scaling-law plot]

To further drive home this point, here are the raw results from our submissions to the AV2 Scene Flow test split:

If you look at the linked results, our XL model outperforms the teacher across all three categories of the Threeway EPE, but makes particularly large gains in the static foreground category. This means our model has learned to recognize a lack of motion better than NSFP can represent it: the model has seen enough data to know that, in expectation, static objects should have zero flow, even with a bit of noise in the teacher labels, while also extracting, in expectation, what correct movement vectors look like for moving objects.
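
For reference, a hedged sketch of how Threeway EPE is computed, following the three-bucket breakdown described above (the function and argument names are mine):

```python
import numpy as np

def threeway_epe(pred_flow, gt_flow, is_foreground, is_dynamic):
    """pred_flow, gt_flow: (N, 3) meters; is_foreground, is_dynamic: (N,) bool."""
    epe = np.linalg.norm(pred_flow - gt_flow, axis=1)  # per-point endpoint error
    buckets = [
        is_foreground & is_dynamic,   # foreground dynamic (moving objects)
        is_foreground & ~is_dynamic,  # foreground static (e.g. parked cars)
        ~is_foreground,               # background
    ]
    # Average within each bucket first, then across buckets, so the (rare)
    # dynamic points are not drowned out by the many static points.
    return np.mean([epe[mask].mean() for mask in buckets if mask.any()])
```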

I also have more good news! I'm an idiot and forgot to run the XL model with our Speed Scaling feature enabled (Equation 5 from the paper), and I stopped training this model after only 5 epochs (akin to seeing ~10 epochs' worth of frames). This means the XL model is undertrained and missing a feature that provides free Foreground Dynamic EPE improvements (which substantially improve Threeway EPE). We are training a new XL model with these features enabled, and for more epochs, so we should hopefully get even better performance from our new student model.

@Kin-Zhang
Author

Kin-Zhang commented Aug 1, 2023

Thank you so much for sharing these! Looking forward to your updates.

@Kin-Zhang
Author

One more question 😊: does the XL dataset also go through the ZeroFlow paper's pipeline, i.e. running NSFP to produce pseudo labels on the LiDAR dataset first? Or is there a new pipeline for that?

@kylevedder
Owner

That's correct: I used NSFP to pseudolabel the Argoverse 2 LiDAR dataset subset. We have a large SLURM cluster with a bunch of old 2080 Tis, so the pseudolabeling only took a few days because I could parallelize across them all.

I used data_prep_scripts/split_nsfp_jobs_sbatch.py to set up and launch these jobs for both the Sensor and LiDAR subset pseudolabeling.
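
A hypothetical sketch of what such a job-splitting script does — shard the sequences, write one sbatch file per shard, and submit them (the `nsfp_label.py` entry point and its flags are made up for illustration):

```python
import subprocess

sequences = [f"seq_{i:04d}" for i in range(700)]  # toy sequence IDs
num_jobs = 100                                    # one job per GPU slot

for job_idx in range(num_jobs):
    shard = sequences[job_idx::num_jobs]  # round-robin shard of sequences
    script = "\n".join([
        "#!/bin/bash",
        "#SBATCH --gres=gpu:1",
        "#SBATCH --time=48:00:00",
        f"python nsfp_label.py --sequences {' '.join(shard)}",  # hypothetical
    ])
    path = f"job_{job_idx}.sbatch"
    with open(path, "w") as f:
        f.write(script + "\n")
    subprocess.run(["sbatch", path], check=True)  # queue the job on SLURM
```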

@Kin-Zhang
Author

@kylevedder sorry to bother you again, but could you specify which model weights produced each of these three results (or, for Base ZeroFlow, which single model)?

  - Base ZeroFlow: Threeway EPE of 0.0814
  - NSFP: Threeway EPE of 0.0684
  - ZeroFlow XL: Threeway EPE of 0.0578

@kylevedder
Owner

As we discuss in the paper, our reported Threeway EPE for ZeroFlow is an average of three runs.

These weights are the ones highlighted in the weight repo README:

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run2

https://github.com/kylevedder/zeroflow_weights/tree/master/argo/nsfp_distilatation_speed_scaled_updated_run3

NSFP doesn't have trained weights; it's a test-time optimization method.

We have not uploaded the ZeroFlow XL weights; they are too large (1.2GB) and require me to set up Git LFS.

@yanconglin

yanconglin commented Oct 23, 2023

Hi kylevedder,

Could you please let me know the number of samples in the processed Argo/Waymo datasets, for the train/val/test splits respectively? There seem to be several versions of Waymo scene flow datasets, such as PCAccumulation (ECCV 2022). The strategies to calculate GT flows are similar, but I wonder if there is a difference in scale. I cannot find any info in the paper or the supplementary.

It also seems ego-motion compensation was used when creating the scene flow dataset, as mentioned in the supplementary. Could you please share results WITHOUT ego-motion compensation, if any? So far, my results show NSFP performs worse on dynamic objects when using ego-motion compensation on Waymo, and I'm not sure to what extent this impacts the distillation. Any insights from your side? Thank you!

@kylevedder
Owner

Dataset Details

For Argoverse2, I read the dataset straight from disk as downloaded, sans the minor folder rearrangement I discuss in the GETTING_STARTED.md.

For Waymo Open, the exact dataset version and labels are detailed in my GETTING_STARTED.md -- we use 1.4.2 with the standard flow labels provided on the Waymo Open website. We preprocess the data from the annoying .proto format into easy-to-read .pkl files, removing the ground plane from the point clouds along the way. For details, please read the preprocessing scripts discussed in the Getting Started; they are pretty easy to read, and frankly I do not remember all the nuances of my preprocessing.

ZeroFlow without ego motion compensation

We do not have any results for ZeroFlow / FastFlow3D without ego motion compensation. In principle we can train our feedforward model without ego compensation, but it's reasonable to assume decent-quality ego compensation is available at test time on modern service robot / autonomous vehicle stacks. Chodosh et al. 2023 makes a fairly compelling case that ego compensation is broadly useful, so we decided to use it.
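
To illustrate what ego compensation means here, a minimal sketch, assuming each sweep comes with a sensor-to-world pose: express sweep t's points in sweep t+1's sensor frame, so static world points map onto themselves and their residual flow is zero.

```python
import numpy as np

def compensate_ego_motion(points_t, pose_t, pose_t1):
    """points_t: (N, 3); pose_t, pose_t1: (4, 4) sensor-to-world transforms."""
    homog = np.hstack([points_t, np.ones((len(points_t), 1))])  # (N, 4)
    world = homog @ pose_t.T                  # sensor_t -> world
    in_t1 = world @ np.linalg.inv(pose_t1).T  # world -> sensor_{t+1}
    return in_t1[:, :3]
```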

NSFP without ego motion compensation

I don't have direct head-to-head NSFP results with and without ego compensation. In my early work using NSFP on Argoverse2, I saw that Threeway EPE was better with compensation (which makes sense: it's an easier problem), and we ran with that on Waymo.

How much worse is NSFP on the dynamic bin? Do you have more details on what kinds of dynamic objects it performs worse on, and when? Are you doing ground removal? (This is basically mandatory to get NSFP to work; otherwise it fits a bunch of zero vectors to the lidar returns on the ground.)
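
As a toy illustration of why (a crude fixed-height filter; real pipelines estimate an actual ground plane):

```python
import numpy as np

def remove_ground(points, ground_z=-1.6):
    """points: (N, 3) in the sensor frame; drop returns at or below ground height."""
    # The many static ground returns otherwise dominate NSFP's objective,
    # pulling the fitted flow field toward zero everywhere.
    return points[points[:, 2] > ground_z]
```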

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the authors' own implementation of NSFP, but we use our own data loaders. The NSFP authors actually reached out to me to discuss dataloader details because our NSFP implementation (listed as the baseline NSFP implementation on the Argoverse2 Scene Flow leaderboard) significantly outperformed their own. Their entry is NP (NSFP); my NSFP implementation is the Host_67820_Team NSFP entry (the challenge organizers asked me to send them results for a strong baseline).

@kylevedder
Owner

kylevedder commented Oct 24, 2023 via email


@Kin-Zhang
Author

I also found that NSFP performance is very dependent upon dataloading details -- ZeroFlow's implementation integrates the author implementation of NSFP, but we use our own data loaders.

I found the same when I tried to reproduce the FastFlow3D result (ZeroFlow's student architecture, here trained with supervision). However, since I used the official dataloader inside av2-api, my scores come out somewhat worse. I will try to figure out why in the following days.

Thanks to @kylevedder, who mentioned one of the reasons over email (in case someone after me has the same problem, I attach his words here):

If your numbers are significantly worse, then something is wrong. If that's the case, my first guess is that you trained on a single GPU using the given config, which is set up to train on 4 GPUs simultaneously and thus has the per-GPU batch size set to 16 instead of 64.
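
In other words, a sketch of the arithmetic (variable names are mine):

```python
# The released config assumes 4 GPUs, so per-GPU batch 16 => effective batch 64.
# On a single GPU, match the effective batch via batch size or grad accumulation.
num_gpus = 1
per_gpu_batch = 16
accumulate_grad_batches = 4  # 1 GPU * 16 per batch * 4 steps == 64
effective_batch = num_gpus * per_gpu_batch * accumulate_grad_batches
assert effective_batch == 64
```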

Thanks again to Kyle!
