About training time #22

Open
daihu-ye opened this issue Aug 22, 2021 · 5 comments

Comments

@daihu-ye

Hello! I tried to train the model on ImageNet with epochs=1, and it still takes 11 hours to finish training. I would like to know how long it takes to train AANets on ImageNet (N=5/10/25).

@yaoyao-liu
Owner

Thanks for your interest in our work.

Even if you set the number of epochs to one, some steps (e.g., herding) still take a lot of time. On a single V100 GPU, it takes around 3-4 days to run the experiments on ImageNet using the default setting.
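
For context, herding here presumably refers to iCaRL-style exemplar selection: for every old class, features are extracted for all of its training images and exemplars are picked greedily so that their running mean stays close to the class mean. A minimal NumPy sketch of that selection step (not the repository's exact code; it assumes features is an (n, d) array of already-extracted, L2-normalized features for one class):

```python
import numpy as np

def herding_selection(features, m):
    """Greedily pick m exemplar indices whose running feature mean
    best approximates the class mean (iCaRL-style herding)."""
    class_mean = features.mean(axis=0)
    selected, running_sum = [], np.zeros(features.shape[1])
    for k in range(1, m + 1):
        # Mean the exemplar set would have if each candidate were added next.
        would_be_means = (running_sum[None, :] + features) / k      # (n, d)
        dists = np.linalg.norm(would_be_means - class_mean[None, :], axis=1)
        dists[selected] = np.inf   # never pick the same image twice
        idx = int(np.argmin(dists))
        selected.append(idx)
        running_sum += features[idx]
    return selected
```

The loop itself is cheap; the expensive part is the forward passes needed to compute features for every image of every class, which is why a single-epoch run can still take hours.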

@daihu-ye
Author

I have been running the experiments on ImageNet for 2 days and the first incremental phase is still not finished. My GPU is a 3090, and a single epoch takes 25 min. It seems it will take more than 3-4 days, maybe even a week, to get the results. I don't know what's wrong.
I tried to increase the batch size, but 192 was too big for the 3090. Also, if N=5 takes 3-4 days, what about N=25? How did you finish all the experiments? Even with four 3090s I can't get the results on ImageNet, which is quite frustrating. Running the experiments just takes too much time.

@yaoyao-liu
Owner

According to the default setting, we use half of all the data in the first phase, so it takes more time than the other phases.

I didn't record the running time for the different settings on ImageNet, so I cannot tell you the exact time for each experiment. I agree that running experiments on ImageNet takes a lot of time. If there is anything I can help with, please let me know.
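
For a concrete sense of the phase sizes, here is the arithmetic under the usual protocol this comment describes (half of the 1000 ImageNet classes in the base phase, the rest split evenly over the N incremental phases; the exact split is an assumption, so please check the repo's arguments):

```python
total_classes = 1000
base_classes = total_classes // 2              # first (largest) phase
for N in (5, 10, 25):
    per_phase = (total_classes - base_classes) // N
    print(f"N={N:2d}: base phase = {base_classes} classes, "
          f"then {N} phases of {per_phase} classes each")
# N= 5: base phase = 500 classes, then 5 phases of 100 classes each
# N=10: base phase = 500 classes, then 10 phases of 50 classes each
# N=25: base phase = 500 classes, then 25 phases of 20 classes each
```

So the base phase trains on roughly half of ImageNet regardless of N, which is why it dominates the wall-clock time.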

@daihu-ye
Author

daihu-ye commented Aug 26, 2021

It seems there is a bug in your code when training on ImageNet. In the function gen_balanced_loader, the line self.balancedset.imgs = self.trainset.samples = current_train_imgs should be self.balancedset.imgs = self.balancedset.samples = current_train_imgs, right?
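
For anyone hitting the same issue, this is roughly what the corrected assignment looks like in context (a minimal sketch, not the repository's actual gen_balanced_loader; it assumes self.balancedset is a torchvision ImageFolder-style dataset and current_train_imgs is a list of (path, label) pairs):

```python
import torch

def gen_balanced_loader(self, current_train_imgs, batch_size=128):
    # Update only the balanced dataset. torchvision's ImageFolder keeps
    # both .samples and .imgs, so both should point to the new list.
    self.balancedset.samples = current_train_imgs
    self.balancedset.imgs = current_train_imgs
    # The buggy line overwrote self.trainset.samples instead, silently
    # redefining what self.trainloader iterates over.
    return torch.utils.data.DataLoader(
        self.balancedset, batch_size=batch_size,
        shuffle=True, num_workers=4)
```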

@yaoyao-liu
Owner

> Also, in the function gen_balanced_loader, the line self.balancedset.imgs = self.trainset.samples = current_train_imgs overwrites self.trainset.samples, which will cause the trainloader to change as well.

Thanks for correcting this bug. I have updated the code.
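
To illustrate why the original line was harmful, here is a small standalone example (not the repo's code): a DataLoader keeps a reference to its dataset object, so reassigning trainset.samples later changes what the already-created trainloader yields.

```python
from torch.utils.data import DataLoader, Dataset

class ListDataset(Dataset):
    """Tiny stand-in for an ImageFolder-style dataset with a .samples list."""
    def __init__(self, samples):
        self.samples = samples
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

trainset = ListDataset(list(range(10)))
trainloader = DataLoader(trainset, batch_size=5)
print([batch.tolist() for batch in trainloader])   # batches over 0..9

# The buggy assignment effectively did this to the training set:
trainset.samples = list(range(100, 105))
print([batch.tolist() for batch in trainloader])   # now yields 100..104
```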
