
Questions about the results and executions #4

Open
andisantos opened this issue Sep 12, 2022 · 1 comment

Comments

@andisantos

Hello! Great paper, congrats!
I'm trying to use your work in my master's research and have some questions:

  1. Could you release the results of every split for PACS, OfficeHome, and DomainNet? My results are close to yours on average (1~2% difference at most), but I want to see whether any individual split differs by much more.

  2. I was having a problem with the dataloader: sometimes (at random) it gets stuck trying to fetch the next batch and I have to terminate the execution and start again. Have you seen this kind of problem? I found some issues saying it may be associated with parallelism when n_workers > 0. When I set n_workers = 0 it worked, but for some datasets like DomainNet it took 9897 minutes to run the Sketch split, which is a lot. (The loader settings I'm testing are sketched after this list.)

  3. Do you have the execution times for each of your run.sh examples? I would like to compare them with the times I'm getting.

  4. Have you tried updating torch, torchvision, and other libraries to evaluate whether the results change? With the newer version of PyTorch (1.12.0) I could run with parallelism, while with the paper's version I was getting the random dataloader hang, but the results change a little:
    Results in torch 1.10: div 1.0367 +/- 0.0536 | cor 0.0002 +/- 0.0003
    Results in torch 1.12: div 0.9695 +/- 0.0159 | cor 0.0000 +/- 0.0000
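
To make question 2 concrete, here is roughly the loader setup I've been testing; the dataset and batch size below are just stand-ins to make the snippet runnable, not the repository's actual configuration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data so the snippet runs on its own; in practice this is the
# PACS / OfficeHome / DomainNet dataset object built by the repo's scripts.
dataset = TensorDataset(torch.randn(256, 3, 224, 224),
                        torch.randint(0, 7, (256,)))

loader = DataLoader(
    dataset,
    batch_size=64,      # placeholder value, not the paper's setting
    shuffle=True,
    num_workers=0,      # 0 avoids the hang for me, but is very slow on DomainNet
    pin_memory=True,
)

for images, labels in loader:
    pass                # training / evaluation step would go here
```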

I'm running on an NVIDIA RTX 5000 (16 GiB), using the Docker images of PyTorch 1.10.2 and 1.12.0.
For each version, I installed the matching torchvision release following the instructions on their GitHub: https://github.com/pytorch/vision#installation

I know it's a lot of information; I'd be really happy to discuss it and hope to hear from you soon.
Thank you!

@m-Just
Owner

m-Just commented Sep 16, 2022

Hi, thanks for your interest in our work!

  1. I think a 1~2% average difference at most is normal. I'm afraid I could not find the per-split results.
  2. I rarely see this kind of problem. My guess is that you ran out of memory, but it could also be another issue.
    DomainNet is a very large dataset compared with PACS, etc., so it takes much longer to run (especially with n_workers = 0).
  3. I can only give you some rough numbers. I was using V100 GPUs, usually 4 but sometimes 8 of them in parallel for each dataset, with abundant CPUs and memory. Colored MNIST, CelebA, and NICO were quite fast, usually taking less than an hour. PACS, OfficeHome, and TerraIncognita took 1-2 hours. ImageNet variants took 3-4 hours. Camelyon took about 10 hours. DomainNet took the longest, about 1-2 days.
  4. The results will sometimes change with different versions of libraries. I'm not sure if the change you observed is due to different versions, though.
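
For point 4, here is a minimal seeding sketch (standard PyTorch practice, not code from this repository) that you could run at the start of each experiment to rule out randomness when comparing the two library versions:

```python
import random

import numpy as np
import torch
import torchvision

# Report the library versions so runs on torch 1.10 vs. 1.12 can be told apart.
print("torch", torch.__version__, "| torchvision", torchvision.__version__)

# Fix the usual seeds and ask cuDNN for deterministic kernels before comparing numbers.
seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```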

Hope this clarifies things a bit.
