Reproducing pose estimation results on subset of panoptic datasets #7

Open
LadnerJonas opened this issue Oct 1, 2024 · 6 comments
LadnerJonas commented Oct 1, 2024

Currently, I am trying to apply this framework to our 4D-OR dataset (https://github.com/egeozsoy/4D-OR, TU Munich, Germany). After setting up the corresponding dataset files and adapting the projection logic to our camera calibration, we are having trouble getting the pose net training to improve the pose detection. Since poses are harder to estimate on our dataset, we tried to reproduce the strong pose estimation results on a subset of the Panoptic dataset.

I freshly cloned this repository, adapted the paths, and otherwise only changed the selected training and validation sequences to:
TRAIN_LIST = ["160224_haggling1"]
VAL_LIST = ["160906_pizza1"]

We used the default configuration (without a pre-trained backbone) for the backbone, root net, and pose net training steps. We did not use the optional fine-tuning.

Unfortunately, the results with the provided configuration were not as good as expected. The human root joints are detected fairly accurately, but the pose estimation training does not seem to work. After the full training, the debug images still look like this:
[Image: 3D debug output from the last epoch, train_2300_3d.png (image 2300 of roughly 2,500)]
[Image: predicted heatmap, train_00002300_view_2_hm_pred.png]
[Image: ground truth, train_00002300_view_2_gt.jpg]

@keqizero In case you need more information, I am happy to provide it.
Thank you!

keqizero (collaborator) commented Oct 1, 2024

Hi, thank you for the subset experiment.

Personally, I have never tried training with only one video. As a quick experiment, I used the pretrained backbone and root net models (as provided in this repo) to train a pose net model from scratch, using only the pseudo 2D poses of the "160224_haggling1" sequence, and then evaluated it on 4 videos. I attached the training log of the first 3 epochs along with some visualization examples. The result looks okay.

[Images: validation_002_00000000_3d, validation_002_00000322_3d]

[Attachment: cam5_posenet_2024-10-01-12-03_train.log]

Could you compare my training log with yours to see what may be causing the problem? I hope this helps.

LadnerJonas (author) commented Oct 1, 2024

Thank you for your response.

Our configuration (printed at the start) is exactly the same, apart from batch_size / GPU count.
Here is my training log:
[Attachment: training-log.txt]

The loss seems to decrease much more slowly, and it stops improving after epoch 0.
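To compare two training logs on this point, one option is to extract the mean loss per epoch from each log and find the last epoch that still improved on the best loss so far. The sketch below assumes the per-epoch losses have already been parsed into lists (the log format is not reproduced here); `last_improving_epoch` and its `min_delta` threshold are hypothetical names, and the numbers in the usage example are toy values, not results from either run.

```python
from typing import Sequence


def last_improving_epoch(epoch_losses: Sequence[float], min_delta: float = 1e-3) -> int:
    """Return the index of the last epoch whose loss beat the best earlier loss by more than min_delta.

    Hypothetical helper for comparing runs; epoch_losses[i] is the mean training loss of epoch i.
    """
    if not epoch_losses:
        raise ValueError("epoch_losses must not be empty")
    best = epoch_losses[0]
    last = 0
    for epoch, loss in enumerate(epoch_losses[1:], start=1):
        # Only count an epoch as an improvement if it beats the running best by min_delta.
        if loss < best - min_delta:
            best = loss
            last = epoch
    return last


# Toy values only: one run that keeps improving, one that stalls after epoch 0.
improving = [0.90, 0.60, 0.45, 0.40]
stalled = [0.90, 0.95, 0.91, 0.92]
print(last_improving_epoch(improving))  # 3
print(last_improving_epoch(stalled))    # 0
```

A run that returns 0 here matches the "no improvement after epoch 0" symptom; the `min_delta` threshold keeps tiny fluctuations from counting as progress.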

Could you double-check whether the code in the GitHub repository is up to date with your local changes?

Can you also share your heatmaps and attention maps (from the evaluation)? As can be seen above, only the root joints are detected when using the repository code.

Do I have to do anything besides adapting the selected training/evaluation datasets? To my understanding, since the 2D pseudo poses are already stored and read in the two Panoptic dataset files (lib/dataset/panoptic_(ssv).py), nothing else has to be done?

keqizero (collaborator) commented Oct 1, 2024

I think the root cause is the underfitting of the backbone.

I looked at your log and saw that the root net's performance is surprisingly low, so I ran a quick experiment, training the root net for 1 epoch on only one video, and obtained a much better result (log attached). This could explain why your pose net does not even converge: the backbone is not trained well enough to detect joints.
[Attachment: cam5_rootnet_2024-10-01-16-45_train.log]

I would suggest replacing your backbone with mine and then training the root net and pose net again, to see whether you get similar performance.

FYI, the repo code is up to date. My heatmaps are attached below. For subset training, you don't need to do anything other than filter out the other 8 videos (which is what I did).
[Image: validation_00000200_view_1_hm_pred]
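Filtering the sequence lists, as described above, can be sketched as follows. This is a minimal sketch, assuming the dataset code iterates over a plain Python list of sequence names; `select_sequences` is a hypothetical helper, and `full_train_list` is a hypothetical list in which only "160224_haggling1" comes from this thread.

```python
from typing import Iterable, List


def select_sequences(available: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Keep only the wanted sequence names, preserving the order of `available`.

    Hypothetical helper: raises if a wanted name is missing, so a typo in the
    subset does not silently produce an empty dataset.
    """
    available = list(available)
    wanted = set(wanted)
    missing = wanted.difference(available)
    if missing:
        raise ValueError(f"unknown sequences: {sorted(missing)}")
    return [name for name in available if name in wanted]


# Hypothetical full list; only "160224_haggling1" is taken from this thread.
full_train_list = ["160224_haggling1", "sequence_b", "sequence_c"]
TRAIN_LIST = select_sequences(full_train_list, ["160224_haggling1"])
print(TRAIN_LIST)  # ['160224_haggling1']
```

Raising on unknown names is a deliberate choice: a misspelled sequence would otherwise yield an empty training set without any error.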

LadnerJonas (author) commented:

Thank you for your response. In the meantime, I was able to reproduce it with comparable results.

LadnerJonas (author) commented:

My initial problem, and the reason I wanted to reproduce the estimation results on the Panoptic data in the first place, was caused by these lines:

self.min_x, self.max_x = grid1Dx.min() + 2500, grid1Dx.max() - 2000
self.min_y, self.max_y = grid1Dy.min() + 1500, grid1Dy.max() - 1500
self.min_z, self.max_z = grid1Dz.min() + 250, grid1Dz.max() - 300
These offsets were incompatible with our own dataset/camera configuration.
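The effect of these offsets can be checked numerically. Assuming the min_*/max_* values restrict the 3D region the model considers, the offsets shrink that region inward from the edges of the full grid, so a person standing near the edge of a differently calibrated scene can fall outside it. A minimal sketch, assuming the 1D grids are evenly spaced coordinates in millimetres built from a space size and center with `np.linspace`; the size, center, and bin values are illustrative, `make_grid_1d`, `cropped_bounds`, and `inside` are hypothetical helpers, and only the offsets are copied from the lines above.

```python
import numpy as np

# Offsets copied from the Panoptic-specific lines above: (low margin, high margin) per axis, in mm.
PANOPTIC_MARGINS = {"x": (2500, 2000), "y": (1500, 1500), "z": (250, 300)}
AXES = ("x", "y", "z")


def make_grid_1d(size: float, center: float, bins: int) -> np.ndarray:
    """Evenly spaced coordinates spanning `size` around `center` (illustrative grid construction)."""
    return np.linspace(-size / 2, size / 2, bins) + center


def cropped_bounds(grids: dict, margins: dict) -> dict:
    """Apply min + low / max - high per axis, mirroring the min_*/max_* assignments above."""
    return {
        axis: (float(grids[axis].min() + margins[axis][0]), float(grids[axis].max() - margins[axis][1]))
        for axis in AXES
    }


def inside(point, bounds: dict) -> bool:
    """True if an (x, y, z) point lies within the bounds on every axis."""
    return all(bounds[axis][0] <= value <= bounds[axis][1] for axis, value in zip(AXES, point))


# Illustrative space size / center (mm) and bin counts; not taken from this repository's config.
grids = {
    "x": make_grid_1d(8000, 0, 80),
    "y": make_grid_1d(8000, -500, 80),
    "z": make_grid_1d(2000, 800, 20),
}
full = cropped_bounds(grids, {"x": (0, 0), "y": (0, 0), "z": (0, 0)})
panoptic = cropped_bounds(grids, PANOPTIC_MARGINS)
print(panoptic["x"])  # (-1500.0, 2000.0)

root = (3000.0, 0.0, 900.0)  # a hypothetical person near the edge of the full volume
print(inside(root, full), inside(root, panoptic))  # True False
```

With these illustrative numbers, the x range shrinks from 8 m to 3.5 m, which shows how offsets tuned for one capture dome can exclude people in another room.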

keqizero (collaborator) commented Oct 14, 2024

Yes, you are right. These parameters are specific to the Panoptic dataset, similar to the 3D space size.
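For a different capture setup such as 4D-OR, one way to avoid hand-tuned offsets is to derive the bounds from the dataset's own ground-truth 3D positions plus a safety padding. This is a hypothetical approach sketched under that assumption, not something the repository does; `bounds_from_points` and the default padding are assumptions, and the points in the example are toy values, not real 4D-OR data.

```python
import numpy as np


def bounds_from_points(points, padding: float = 500.0):
    """Per-axis (min, max) of N x 3 ground-truth positions, widened by `padding` (same units, e.g. mm).

    Hypothetical alternative to hard-coded offsets: the resulting box could stand in for the
    min_x/max_x, min_y/max_y, min_z/max_z values for a new camera setup.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("expected an array of shape (N, 3)")
    low = points.min(axis=0) - padding
    high = points.max(axis=0) + padding
    return [(float(lo), float(hi)) for lo, hi in zip(low, high)]


# Toy root positions in mm (not real 4D-OR data).
roots = np.array([[-1200.0, 300.0, 900.0], [800.0, -400.0, 1000.0], [150.0, 0.0, 950.0]])
print(bounds_from_points(roots, padding=500.0))
# [(-1700.0, 1300.0), (-900.0, 800.0), (400.0, 1500.0)]
```

The padding should be large enough to cover people who move beyond the annotated extent; the resulting box would still need to lie within the 3D space size configured for the grid.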
