[seq2seq] memory regression #9261
Yes, we really should take a stab at better speed and memory regression testing. Big New Year's resolution!
This specific commit introduced the regression:
There is a second problem, the same as above but with apex: training hangs about 5% in, with a spinning CPU (not OOMing); I had to kill it. I checked before this PR: no hanging. Full command:
(It OOMs some time later into training, but there is no hanging.)
So both problems seem to be related to label smoothing. @sgugger has been testing hypotheses, and this one worked:
edit: @sgugger says that this code wasn't right, so we currently don't have a solution yet; we will keep experimenting.
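For reference, the label-smoothing objective being debugged combines the plain NLL with a uniform smoothing term over the vocabulary. A minimal pure-Python sketch of that formula (the function name is hypothetical; this is not the Trainer's actual implementation):

```python
import math

def label_smoothed_nll(log_probs, target, eps=0.1):
    """Label-smoothed negative log-likelihood for a single token.

    log_probs: list of log-probabilities over the vocabulary.
    target: index of the gold token.
    With eps=0.0 this reduces to plain NLL.
    """
    nll = -log_probs[target]
    # Smoothing term: average NLL over the entire vocabulary,
    # i.e. the loss against a uniform target distribution.
    smooth = -sum(log_probs) / len(log_probs)
    return (1.0 - eps) * nll + eps * smooth

# Tiny example with a 3-word vocabulary.
probs = [0.7, 0.2, 0.1]
log_probs = [math.log(p) for p in probs]
plain = label_smoothed_nll(log_probs, 0, eps=0.0)     # plain NLL, -log(0.7)
smoothed = label_smoothed_nll(log_probs, 0, eps=0.1)  # slightly larger
```

The memory question is about how the smoothing term is materialized over the full (batch, sequence, vocab) log-prob tensor, not about the formula itself.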
Hi.
Well, I don't think it's related, other than both using up more RAM ;) This regression happened in a very recent change, but you're using a much older transformers version. I will follow up in the issue you linked to.
#9241 introduced a memory regression - found out via git bisect.
Before this PR was merged I was able to run with BS=12; now only BS=8, with:
We really need to go back to that issue of memory benchmarks in CI and figure out how to make it happen.
The problem is that I started working on it some months back, but gave up since each GPU gave different numbers...
For details please see: #6045
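One way a CI memory check could tolerate the per-GPU variance mentioned above is to compare the measured peak against a stored baseline with a relative tolerance rather than an exact number. A minimal sketch (the helper name and tolerance are assumptions, not an existing transformers utility):

```python
# Sketch of a tolerance-based memory-regression check for CI.
# A relative tolerance absorbs per-GPU variance: different GPUs report
# different absolute peaks, but a genuine regression (e.g. forcing BS=12
# down to BS=8) should exceed any reasonable tolerance on all of them.
# In a real test, `measured_bytes` would come from wrapping one training
# step with torch.cuda.reset_peak_memory_stats() and reading
# torch.cuda.max_memory_allocated() afterwards.

def check_memory_regression(measured_bytes, baseline_bytes, rel_tol=0.10):
    """Return True if measured peak memory is within rel_tol of the baseline."""
    return measured_bytes <= baseline_bytes * (1.0 + rel_tol)

# Example: a peak 4% above baseline passes; 20% above baseline fails.
ok = check_memory_regression(10_400_000_000, 10_000_000_000)
regressed = check_memory_regression(12_000_000_000, 10_000_000_000)
```

Baselines would still need to be recorded per GPU model, but the tolerance keeps the check from flaking on minor allocator noise.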
edit: we should also make sure that
--label_smoothing 0.1 --fp16 --fp16_backend apex
works: #9261 (comment)
@patrickvonplaten, should we figure this out in the new year?