[model_utils] very slow model instantiation #9205
Comments
Doesn't that script also load and preprocess the data? From what you're reporting, I don't interpret this as "transformers takes a long time to load the model" (since the line that does that takes the same time as a torch load) but as "stuff that happens in that script before the model loading takes a lot of time" (which is probably data preprocessing + the 3s to import transformers if TF is in your env). Or am I missing something?
Perhaps my first post is confusing: what I did was bracket just the model-loading call with timers, so all the other stuff isn't being measured - only that call.
Ah, I understand better. I don't think your comparison is fair: from_pretrained does a lot more than torch.load - it also has to build the model before the weights can be loaded into it.
I removed the 2nd part that was showing the same issue from a different angle, as it appears to just confuse and isn't contributing to understanding the issue at hand. There is just the one measurement left now.
OK, somehow I made a mistake and was taking the snapshot of the start time in the wrong place, so the instantiation got attributed to the wrong call.
So it's setting up the model that takes so long, just as you said. Can this somehow be sped up? I was integrating deepspeed and re-running the same command repeatedly, and 23 extra secs of waiting just to discover that something is off was very painful for debugging - all the failures happened at much later stages. I worked around it by switching to a tiny model, but even that takes some secs. Can we think of a way to make an image and load it rather than rebuilding the model from scratch? So we torch.load the weights, but also cache the model image itself and load it too, rather than create it anew. It seems so wasteful and slow if I'm not debugging the model creation but, say, tuning up something in the trainer, and I want the other parts to load blazingly fast and get me to the point of interest quickly. What would be the best way to approach such a need?
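A rough sketch of that caching idea (just an illustration, not an existing transformers feature - the cache path and helper name are made up, and whole-module pickling via torch.save is the assumption):

```python
import os
import torch
from transformers import AutoModelForSeq2SeqLM

MODEL_NAME = "sshleifer/distill-mbart-en-ro-12-4"
CACHE_PATH = "/tmp/mbart_instance.pt"  # hypothetical local cache of the built model

def load_model_cached():
    if os.path.exists(CACHE_PATH):
        # Unpickle the fully constructed model object: this skips config parsing,
        # module construction and weight init entirely.
        # (On recent torch you may need weights_only=False to unpickle a module.)
        return torch.load(CACHE_PATH)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    # Pickle the whole nn.Module (weights included), not just its state_dict.
    torch.save(model, CACHE_PATH)
    return model
```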
So, profiling the model instantiation code, it can be seen that almost all of the time goes into initializing the weights.

We are completely wasting time doing init_weights, since we immediately replace those weights with the pretrained ones (with the exception of any layers that aren't in the checkpoint). Chances are that model init needs to be made context-aware, so that it doesn't init weights which are about to be replaced. Thoughts? That would make from_pretrained much faster.

The profiling was done along these lines:
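A minimal sketch of such a profiling run (my reconstruction, not necessarily the exact snippet; it assumes cProfile and the checkpoint from this issue):

```python
import cProfile
import pstats

from transformers import AutoConfig, AutoModelForSeq2SeqLM

MODEL_NAME = "sshleifer/distill-mbart-en-ro-12-4"
config = AutoConfig.from_pretrained(MODEL_NAME)

# Profile only the model construction - the cls(config, ...) part of
# from_pretrained - without loading any pretrained weights.
profiler = cProfile.Profile()
profiler.enable()
model = AutoModelForSeq2SeqLM.from_config(config)
profiler.disable()

# In this issue, the weight-init functions dominated the cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```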
If we see a significant gain in loading time, maybe it's worth exploring a way to only apply the weight init to the layers that actually need it. Maybe a flow like:

```python
model = cls(config, init_weights=False, *model_args, **model_kwargs)  # don't call init_weights, but initialize all weights to zero because it's much faster
# load weights into model and get missing layers
# init missing layers
```
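To make that concrete, here is a hypothetical sketch of how the loading path could use it (from_pretrained_fast and the init_weights=False flag are made up; load_state_dict(strict=False) and _init_weights are existing APIs):

```python
import torch

def from_pretrained_fast(cls, config, checkpoint_path, *model_args, **model_kwargs):
    # Hypothetical flag: build the modules but skip the model-wide
    # self.apply(self._init_weights) pass that from_pretrained currently triggers.
    model = cls(config, *model_args, init_weights=False, **model_kwargs)

    # Load the pretrained weights; strict=False reports which keys were missing.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)

    # Run the proper init only on submodules whose parameters were NOT in the
    # checkpoint (e.g. a freshly added task head).
    missing_modules = {key.rsplit(".", 1)[0] for key in missing_keys}
    for name, module in model.named_modules():
        if name in missing_modules:
            model._init_weights(module)
    return model
```

This is only a sketch of the direction; a real change would also have to handle weight tying, pruned heads, device placement, etc.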
Yeah, Patrick's suggestion is probably the best, though I'm not sure it can easily be achieved in the current API. Note that this is only a one-time slowdown at the beginning of training, so I don't think this should be high priority.
I totally get that it's not high priority, since most people don't care about a slow start when they then run non-stop for hours - it only affects people who need a quick start, which is the case when debugging something, or, as I suggested, the demo function on the model pages, which takes a really long time to load. In the case of BART, its deterministic segments do their init internally, so it's enough to just monkeypatch the weight init as a proof of concept:
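The patch isn't reproduced verbatim here, but it was along these lines (a crude sketch of the idea: keep the cheap parts of init_weights and skip only the model-wide random init):

```python
from transformers import PreTrainedModel

def no_init_weights(self):
    # Skip the expensive self.apply(self._init_weights) pass; keep the cheap
    # bookkeeping that init_weights normally also does.
    if self.config.pruned_heads:
        self.prune_heads(self.config.pruned_heads)
    self.tie_weights()

# Crude, process-wide monkeypatch - proof of concept only.
PreTrainedModel.init_weights = no_init_weights
```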
With that patch, the finetune_trainer command I was running goes from 25 secs to 8 secs, and the instantiation itself from 22 secs to 5 secs. There are a few other places that do their own init and would need the same treatment. I quickly checked that the core functions normally - same scores - well, I did just one finetune_trainer run. One way is to solve this as @patrickvonplaten suggested, and I'm also thinking of changing the design a bit, so that each model has a normal init path and a separate one for when the weights are about to be loaded from a pretrained checkpoint.
I don't see how you could have an __init__ that skips the weight init without breaking the regular use case of building a model from just a config. The only way I see through it is to allow from_pretrained to tell the model to skip that init somehow. Again, lots of headaches and possibilities for errors for an end result that doesn't strike me as high priority.
It doesn't take 25 seconds on a tiny model, only a big one. So I'd suggest debugging on a tiny model :-)
Thank you both for entertaining possible approaches and suggesting that you are not quite seeing a smooth solution. I just don't know enough about all of it, so I'm surely missing cases I haven't thought of, but somehow in my mind it looks simple. The devil is in the details.
Unfortunately the tiny model approach doesn't work for debugging OOM in deepspeed, as its configuration correlates with the model size. I guess that's not special to deepspeed at all. So the tiny model trick works for checking mechanics (i.e. that the code compiles), but isn't helpful for OOM debugging.
@patrickvonplaten, @sgugger, @LysandreJik - could we please revisit this? Working on making t5-11b train was painful - it was taking a really long time to init the model, just to drop those weights and replace them with the pre-trained ones. Transformers is mainly about pre-trained models, so perhaps this can be made configurable somehow? We know when a pretrained model is being loaded, so why not propagate that information and let the model know it's being loaded in pre-trained mode, so that it can skip any weight inits that are going to be replaced anyway? And while we are at it, I don't suppose there is a way to involve more than one CPU core in loading the model? I guess that would be a question for pytorch. Thank you!
I'm happy to add such a feature. It should be feasible to only initialize those layers that are not in the saved state_dict.
Indeed, this would be a welcome feature; big models aren't going away.
This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions. If you think this still needs to be addressed please comment on this thread.
@patrickvonplaten, I should probably work on it - since it doesn't seem like you will have time any time soon.
It's on my to-do list, but I still don't think I'll be able to take a look within the next 2-3 weeks - sorry :-/ If you find some time for this, it would be great!
Hello @AyeshaSarwar, could you please use the forum https://discuss.huggingface.co/ instead for such questions? We don't support Flask compatibility in transformers. Thanks!
I'm in the same boat as @stas00. I understand that the code needs to maintain wide compatibility across an ocean of models, but people need a working workaround before an elegant solution becomes reality. I believe that as huggingface slowly graduates from the pure research field, more and more people are being hurt by the tremendous model initialization time.
@DeXtmL, this thread is 2 years old - the particular problem I raised in this Issue was solved a long time ago. The model is no longer being init'ed twice. If you feel something is still slow, please start a new Issue. Thank you.
For some reason I'm noticing a very slow model instantiation time. For example, to load sshleifer/distill-mbart-en-ro-12-4 it takes far longer to instantiate the model (on the order of 20+ secs) than it takes torch.load to read its weights.
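For reference, the kind of bracketing behind that observation looks roughly like this (my own sketch rather than the exact timing code; hf_hub_download is used here only to get a local weights file to feed torch.load):

```python
import time
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelForSeq2SeqLM

MODEL_NAME = "sshleifer/distill-mbart-en-ro-12-4"
config = AutoConfig.from_pretrained(MODEL_NAME)
weights_file = hf_hub_download(MODEL_NAME, filename="pytorch_model.bin")

t0 = time.perf_counter()
model = AutoModelForSeq2SeqLM.from_config(config)          # build the model: cls(config, ...)
t1 = time.perf_counter()
state_dict = torch.load(weights_file, map_location="cpu")  # just read the weights
t2 = time.perf_counter()

print(f"instantiate model:  {t1 - t0:.1f}s")
print(f"torch.load weights: {t2 - t1:.1f}s")
```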
If I'm not changing how the model is created and want to quickly fast-forward to the area I'm debugging, how could these slow parts be cached instead of being rebuilt anew again and again?
But it also looks like we are doing a completely wasteful init_weights operation, whose results immediately get overwritten with the pretrained model weights (#9205 (comment)) (for the pre-trained model use case).
(I initially made a mistake and thought that it was torch.load that had the issue, but it's cls(config, *model_args, **model_kwargs) - thank you, @sgugger - so this post has been edited to reflect reality. If you're joining later, you can skip the comments up to #9205 (comment) and continue from there.)

@patrickvonplaten, @sgugger, @LysandreJik