-
Notifications
You must be signed in to change notification settings - Fork 27k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port IDEFICS to tensorflow #26870
Port IDEFICS to tensorflow #26870
Conversation
@VictorSanh just an fyi - I am hoping this model is TF portable.. |
also ccing @Rocketknight1 |
🤗👀 |
Hey @a8nova! I'm the TF maintainer around here, and this sounds like a great idea. Feel free to ping me if you encounter any difficulties. In general when porting code to TF you will first want to do the 'boilerplate'. Take a look at #22970 to see how this works! All of these steps are fairly quick:
After this, you can start actually porting the code in
self.layer_norm1 = tf.keras.layers.LayerNormalization(name="layer_norm1")
We've actually had some success using GPT-4 to draft a port of the modeling code, so let me know if you'd like me to do that and add the GPT port of After all this, the last step is to make changes to any processing code and tests needed to also support the TF model. It's a long list, but it's doable! |
Thank you @Rocketknight1 for the detailed guide. I have 1,2 & 3 done already, i just updated the PR. I will continue to work on the rest. Regarding the GPT-4 generated draft, I already started doing some of the work, if you think the generated draft is easier to port to TF, please add it here and I can continue working from that ( I saw a comment about "auto-translation" in modeling_tf_sam.py and I was wondering about the details.. :) A few questions:
Thanks for all the help! |
Hi @a8nova, to answer the questions:
I'll work on generating GPT-4 translations for you, and post them here when they're available! Since you've already started on |
@a8nova I added the three files with |
Thank you @Rocketknight1, yes that makes sense, taking a closer look, idefics is 2 pre-trained models combined together, so vision.py is a core part. I will take a look at the auto-translated files! |
Hello @Rocketknight1 - Sorry I went MIA, I have been occupied with my 9-5. I just made some progress. I have the tests running, I am running into a weird error, I will attach a file below, any ideas? Also, regarding: |
Hi @a8nova! Firstly the error: The problem is that TF models don't let you assign to Secondly, regarding |
Thank you @Rocketknight1! |
Hi @Rocketknight1, I have a few questions:
Thanks in advance! |
Hi @a8nova, let's see... For 1, we usually add an argument like For 2, I think the problem there is that def freeze_text_layers(self, module_exceptions=[]):
for module in [self.decoder_layers, self.norm]:
freeze_model(module, module_exceptions=module_exceptions) As a result, it tries to iterate over For 3, |
Thank you @Rocketknight1 for the detailed explanation and help! Moving on to other errors.. |
Sorry to bother you again @Rocketknight1, Do i need to worry about gradient checkpointing in the TF training tests? I am asking because for TFIdeficsForVisionText2Text there is I found gradient_checkpointing_enable() inside modeling_utils.py, i don't see one inside modeling_tf_utils.py, can you guide me if my observation is correct and if I need to add all the gradient_checkpointing_* routines in modeling_tf_utils.py? Thank you! |
@a8nova No, you can skip gradient checkpointing in TF ports! |
Hi @Rocketknight1, how do i run the integration tests? For example
|
Hi @a8nova, you're right - we used to have a Anyway, as a result of this the integration tests for TF models can be quite different from the integration tests for PT models - I'd recommend copying tests from another TF model instead, rather than trying to exactly copy the PT tests. You may also have to drop some tests entirely - the full IDEFICS tests use 4-bit quantization in PT, which isn't supported for TF, and as a result our CI machines may not be able to run it in TF at all! |
Thanks @Rocketknight1. I have a few follow up questions. There are some test failures I don't understand.
Does the integration test make sense?
How do i pass
full error in file attached There are a few other errors I don't understand but some look related to this. Thanks for the help @Rocketknight1 ! |
Hi @a8nova, sorry for the delay! Firstly, for def __init__(self, return_tensors=None):
self.return_tensors = return_tensors
def __call__(self, return_tensors=None):
if return_tensors = None:
return_tensors = self.return_tensors In other words, you can supply it as an argument to Also, I took a look in the attached file but I can't see the error - can you double-check the contents? |
whoops I attached the wrong one, i have attached the correct one below. thank you @Rocketknight1! |
No probs - let me know if it recurs! |
Thank you @Rocketknight1!. There is another test failure I could use your help on. For |
Hi @a8nova, the models aren't intended to run with Also, the thing you did where you transpose to |
Hi @Rocketknight1, Happy holidays! I have a few more questions. I have less test failures now but I can't wait to get this model working end-to-end. I am using the integration test as a start for fully testing the tensorflow implmentation. EDIT: Update on 1 (Jan 7th): I have figured out the issue, it was due to a bad reshape, I am able to run the model end-to-end now using tiny-random-idefics. I will run it with idefics-9b next! (I still have some follow up questions coming up as there is some weirdness i don't understand)
|
Hi @a8nova, sorry for the Christmas-related delay. Huge congratulations on getting the tiny-random model working though, that indicates you're probably quite close to getting the whole thing working! Firstly, I'd suggest that you probably do not need to convert the weights. The reason for this is that our Secondly, dummy inputs are much less important than they used to be, since we're moving to adding explicit Finally, I notice a lot of the test failures are caused by the TF test runner trying to Still, I think this PR is getting close now - I should be able to reply much more quickly now the holidays are over, so please let me know if you encounter any other difficulties! |
… will be reverted" This reverts commit 998cc38.
… will be reverted" This reverts commit 1c695ac.
IIRC test_save_load was also failing on the CI but not on my local box, it might be easier to debug that on the CI first than the cross tests
This reverts commit 8eafc8e.
Maybe this will help me repro this weird bug
The issue seems to be with save_pretrained(), when I looked at the model saved from the CI test failure it is basically empty and has no weights. `self.save_weights(..)` seems to be failing in save_pretrained but needs more debugging
This reverts commit 9d0d307.
…ebugging" This reverts commit 774b6b7.
The CI failures are gone after my latest rebase, no idea why but I was still saving the model to my hub on HF and the tf_model.h5 file now has everything.
Hi @Rocketknight1! After rebasing, the CI failures for the pt_tf cross test and Some other pt tf equivalence test not related to this PR started failing but went away:
I am seeing some non-deterministic behavior with the tests, above test failure only happened once and I haven't seen it since. I added the logging of the disk usage detail, none of the tests were failing, in fact all tests pass in the latest commit. I also tried running the whole CI command for tests_tf on my local box but I wasn't able to repo. |
Hi @a8nova, looking at that failure, it seems like the error was very small ( The main thing is that the IDEFICS PR looks good - do you want me to do a final review now? |
A final review sounds good, I am so ready to get this merged in! thank you @Rocketknight1! |
Final review complete, and everything looks good! Thanks again for the commitment and the patience with getting this PR in - hopefully people will get a lot of benefit from TF-IDEFICS as a result of your work here! |
Thank you @Rocketknight1 for your assistance throughout this PR, i am really happy this got merged in. I think most of all I was the one to benefit from this as I have learned so much about transformers. I will focus on finalizing the TF-Gemma port next! |
What does this PR do?
This PR ports IDEFICS to tensorflow
Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.