
Port IDEFICS to tensorflow #26870

Merged
119 commits merged on May 13, 2024

Conversation

a8nova
Contributor

a8nova commented Oct 17, 2023

What does this PR do?

This PR ports IDEFICS to TensorFlow.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

a8nova marked this pull request as draft on October 17, 2023 at 13:17
a8nova changed the title from "Initial commit" to "Port IDEFICS to tensorflow" on Oct 17, 2023
@a8nova
Contributor Author

a8nova commented Oct 20, 2023

@VictorSanh just an FYI - I am hoping this model is TF-portable.

@ArthurZucker
Collaborator

also ccing @Rocketknight1

@VictorSanh
Contributor

🤗👀

@Rocketknight1
Member

Rocketknight1 commented Oct 20, 2023

Hey @a8nova! I'm the TF maintainer around here, and this sounds like a great idea. Feel free to ping me if you encounter any difficulties. In general when porting code to TF you will first want to do the 'boilerplate'. Take a look at #22970 to see how this works! All of these steps are fairly quick:

  1. Copy the modeling_idefics.py file to modeling_tf_idefics.py
  2. Rename all the classes e.g. IdeficsForXXX -> TFIdeficsForXXX
  3. Add the relevant imports in src/transformers/models/idefics/__init__.py and /src/transformers/__init__.py
  4. Add the TF classes to modeling_tf_auto.py
  5. Add the TF classes in docs/source/en/model_doc/

After this, you can start actually porting the code in modeling_tf_idefics.py. In general, you can just replace torch.xxx() ops with tf.xxx(), and layers from torch.nn with layers from tf.keras.layers. When creating layers, subclass from tf.keras.layers.Layer instead of nn.Module. Many of the ops are exactly identical, or only have minor name changes. There are a few key changes to watch out for, though:

  1. The TF forward() method is called call(). NOT __call__()!
  2. TF layers usually don't have an input_dim argument. This value is inferred when they are built with their first input.
  3. When creating a layer (any class that is subclassed from tf.keras.layers.Layer, which includes TF built-in layers like Dense), pass the attribute name as the name argument. This ensures the TF weights layout lines up with the PyTorch state dict, like so:
self.layer_norm1 = tf.keras.layers.LayerNormalization(name="layer_norm1")
  4. In-place modifications of tensors are prohibited in TF (and JAX!). Most neural net code doesn't do this anyway, because it creates problems in backprop. If you need to do it, you can use something like tf.where() or tf.scatter_nd_update() to create a new tensor with the updates instead. This can be tricky, let me know if you need help! (See the sketch after this list.)
  5. For... reasons... the base stem of the model is contained in TFIdeficsBaseLayer and TFIdeficsModel is just a wrapper around this. Derived classes all use TFIdeficsBaseLayer and don't create any TFIdeficsModel layers. This is different from Torch models, where the model stem is contained in IdeficsModel and the derived classes use it as a layer.
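To make those points concrete, here is a minimal sketch of a hypothetical ported layer (a toy block, not taken from the IDEFICS code) that follows the conventions above:

import tensorflow as tf

# Toy example illustrating the conventions: Layer subclass, call() instead of forward(),
# no input_dim, name kwargs matching the PyTorch attribute names, and no in-place updates.
class TFToyBlock(tf.keras.layers.Layer):
    def __init__(self, hidden_size, **kwargs):
        super().__init__(**kwargs)
        # No input_dim: the input size is inferred when the layer sees its first input.
        # The `name` kwarg mirrors the PyTorch attribute name so weight cross-loading lines up.
        self.fc1 = tf.keras.layers.Dense(hidden_size, name="fc1")
        self.layer_norm = tf.keras.layers.LayerNormalization(name="layer_norm")

    # `call`, not `forward` (and not `__call__`).
    def call(self, hidden_states, attention_mask=None):
        hidden_states = self.layer_norm(self.fc1(hidden_states))
        if attention_mask is not None:
            # No in-place assignment; build a new tensor with tf.where instead.
            hidden_states = tf.where(attention_mask[..., None] > 0, hidden_states, tf.zeros_like(hidden_states))
        return hidden_states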

We've actually had some success using GPT-4 to draft a port of the modeling code, so let me know if you'd like me to do that and add the GPT port of modeling_tf_idefics.py to this PR as a starting point!

After all this, the last step is to make changes to any processing code and tests needed to also support the TF model. It's a long list, but it's doable!

@a8nova
Contributor Author

a8nova commented Oct 20, 2023

Thank you @Rocketknight1 for the detailed guide. I have 1,2 & 3 done already, i just updated the PR. I will continue to work on the rest.

Regarding the GPT-4 generated draft: I already started doing some of the work, but if you think the generated draft is easier to port to TF, please add it here and I can continue working from that. (I saw a comment about "auto-translation" in modeling_tf_sam.py and I was wondering about the details... :)

A few questions:

  1. What about all the torch.nn code inside perceiver.py and vision.py - does it also need TF porting? (My goal is to port the inference code first; if this isn't needed for inference, then maybe I can come back to it.)
  2. For modeling_tf_auto.py, what is this code doing? It is not clear to me where to add the TF IDEFICS versions, since I don't understand that file.

Thanks for all the help!

@Rocketknight1
Member

Hi @a8nova, to answer the questions:

  1. You'll probably need to convert those files too - the IdeficsVisionTransformer in vision.py seems to be a core part of the model. You might be able to skip perceiver.py initially, as it's only used in some model configs, but we should probably do it somewhere as part of this PR!
  2. That code is there because SAM was the first ForMaskGeneration model we added in TF. For Idefics it's easier, because you're not adding a whole new "category" of model like that. If you look in modeling_auto.py you'll see there are two lines for IdeficsModel and IdeficsForVisionText2Text - all you need to do is make copies of those lines in modeling_tf_auto.py for TFIdeficsModel and TFIdeficsForVisionText2Text.
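For the second point, a rough sketch of what the additions to modeling_tf_auto.py might look like (the exact mapping dicts are whichever ones hold the two PyTorch entries in modeling_auto.py; treat the names below as illustrative):

from collections import OrderedDict

# Illustrative sketch - mirror the two PyTorch lines, swapping in the TF class names.
TF_MODEL_MAPPING_NAMES = OrderedDict(
    [
        # ... existing entries ...
        ("idefics", "TFIdeficsModel"),
    ]
)
# plus the analogous line for TFIdeficsForVisionText2Text in whichever mapping
# contains IdeficsForVisionText2Text on the PyTorch side.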

I'll work on generating GPT-4 translations for you, and post them here when they're available! Since you've already started on modeling_tf_idefics.py I won't overwrite it. You can just copy pieces from it when you need to.

@Rocketknight1
Member

Rocketknight1 commented Oct 20, 2023

@a8nova I added the three files with _autotranslate.py endings! Note that there are likely to be issues (e.g. forgetting the name kwarg when initializing layers even though I told it to)

@a8nova
Contributor Author

a8nova commented Oct 21, 2023

Thank you @Rocketknight1 - yes, that makes sense. Taking a closer look, IDEFICS is two pre-trained models combined, so vision.py is a core part.

I will take a look at the auto-translated files!

@a8nova
Contributor Author

a8nova commented Nov 16, 2023

Hello @Rocketknight1 - sorry I went MIA, I have been occupied with my 9-to-5. I just made some progress: I have the tests running, but I am running into a weird error. I will attach a file below - any ideas?
error.txt

Also, regarding:
"For... reasons... the base stem of the model is contained in TFIdeficsBaseLayer and TFIdeficsModel is just a wrapper around this. Derived classes all use TFIdeficsBaseLayer and don't create any TFIdeficsModel layers. This is different from Torch models, where the model stem is contained in IdeficsModel and the derived classes use it as a layer."
how come I don't see similar stuff for other TF implementations? Is this specific to IDEFICS?

@Rocketknight1
Member

Hi @a8nova!

Firstly the error: The problem is that TF models don't let you assign to self.layers, because TensorFlow reserves that as a special keyword. What you should do is replace self.layers with something else, maybe self.modules or self.decoder_layers or something. However, you should keep the name kwarg as layers_{i}. We match TF layers to PT layers when doing weight cross-loading using the name attributes of layers, not the actual name like self.layers, so as long as you keep the argument the same then weight cross-loading should work.
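For example, something like this (a sketch; TFIdeficsDecoderLayer is the assumed TF class name for the decoder layer):

# Sketch: avoid the reserved `self.layers` attribute but keep the `name` kwarg unchanged,
# since weight cross-loading matches on layer names, not on Python attribute names.
self.decoder_layers = [
    TFIdeficsDecoderLayer(config, name=f"layers_{i}") for i in range(config.num_hidden_layers)
]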

Secondly, regarding TFIdeficsBaseLayer, that was actually just a typo on my part - it's actually called TFIdeficsMainLayer! If you check any of our other TF models, you'll see they have a MainLayer class like TFBertMainLayer that contains the model stem.

@a8nova
Contributor Author

a8nova commented Nov 17, 2023

Thank you @Rocketknight1!

@a8nova
Contributor Author

a8nova commented Nov 21, 2023

Hi @Rocketknight1, I have a few questions:

  1. For processing_idefics.py, how do you suggest I handle both PyTorch and TF? Right now I have it hacked to only contain TF code (just to unblock me).
  2. I am getting a weird error from freeze_model. I am probably doing something wrong, but I am not sure what; the error is AttributeError: 'TFIdeficsRMSNorm' object has no attribute 'layers' (full stack trace attached). Any ideas?
  3. There is also an ALL_LAYERNORM_LAYERS in the PyTorch code, which I added in commit 7e0a351 - does this look right to you?

Thanks in advance!

error.txt

@Rocketknight1
Member

Hi @a8nova, let's see...

For 1, we usually add an argument like return_tensors which can take values like tf, pt, etc. You can take a look at e.g. models/sam/processing_sam.py for an example - the main thing is that you should guard any imports of tf or torch behind something like is_torch_available() to make sure that the code doesn't crash for people who only have one of them installed!
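For instance, the guarded-imports part might look roughly like this (a simplified sketch, not the actual processing_idefics.py code; the convert_images helper is hypothetical):

from transformers.utils import is_tf_available, is_torch_available

if is_tf_available():
    import tensorflow as tf

if is_torch_available():
    import torch

def convert_images(images, return_tensors="pt"):
    # Only touch the framework the caller asked for; the guarded imports above mean
    # this file can still be imported when only one of the frameworks is installed.
    if return_tensors == "pt":
        return torch.stack(images)
    elif return_tensors == "tf":
        return tf.stack(images)
    raise ValueError(f"Unsupported return_tensors value: {return_tensors}")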

For 2, I think the problem there is that freeze_text_layers iterates over multiple layers that include the normalization layer self.norm:

def freeze_text_layers(self, module_exceptions=[]):
    for module in [self.decoder_layers, self.norm]:
        freeze_model(module, module_exceptions=module_exceptions)

As a result, it tries to iterate over self.norm.layers, which doesn't exist because self.norm is just a LayerNormalization layer, not a model with additional sub-layers. I'll suggest a change in freeze_model that should help.
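One possible shape for that change (a rough sketch, assuming the module_exceptions strings follow the PyTorch version, e.g. "LayerNorm"):

import tensorflow as tf

def freeze_model(model, module_exceptions=None):
    # Map exception names to the corresponding Keras layer classes (assumed naming).
    mapping = {
        "LayerNorm": tf.keras.layers.LayerNormalization,
        "Linear": tf.keras.layers.Dense,
        "Embedding": tf.keras.layers.Embedding,
    }
    exceptions = tuple(mapping[name] for name in (module_exceptions or []) if name in mapping)
    # A leaf layer such as a single LayerNormalization has no `.layers`, so fall back to itself.
    sub_layers = model.layers if hasattr(model, "layers") else [model]
    for layer in sub_layers:
        if exceptions and isinstance(layer, exceptions):
            layer.trainable = True  # keep the excepted layer types trainable
        else:
            layer.trainable = False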

For 3, ALL_LAYERNORM_LAYERS is a value mainly used by the HuggingFace Trainer, which is only used for PyTorch. When training with TF, we just use Keras instead to get the same functionality. You can just skip/remove it!

@a8nova
Contributor Author

a8nova commented Nov 22, 2023

Thank you @Rocketknight1 for the detailed explanation and help! Moving on to other errors..

@a8nova
Contributor Author

a8nova commented Nov 22, 2023

Sorry to bother you again @Rocketknight1 - do I need to worry about gradient checkpointing in the TF training tests?

I am asking because for TFIdeficsForVisionText2Text there are test_training and test_training_gradient_checkpointing, and they call model.train() and model.gradient_checkpointing_enable(), which fail with AttributeError.

I found gradient_checkpointing_enable() inside modeling_utils.py, but I don't see one inside modeling_tf_utils.py. Can you tell me whether my observation is correct and whether I need to add all the gradient_checkpointing_* routines to modeling_tf_utils.py?

Thank you!

@Rocketknight1
Member

@a8nova No, you can skip gradient checkpointing in TF ports!

@a8nova
Contributor Author

a8nova commented Nov 29, 2023

Hi @Rocketknight1, how do I run the integration tests? For example, IdeficsModelIntegrationTest.

test_training for TF is failing due to model.train(). I see it is defined in the Trainer class in trainer.py and trainer_tf.py. I don't think trainer_tf.py is used anywhere - or is it? How do you suggest I resolve this? Thanks!

@Rocketknight1
Member

Hi @a8nova, you're right - we used to have a TFTrainer class but it's now deprecated. We recommend just training our TF models using the Keras API like model.fit(), and that means methods like model.train() do not really exist for them. I wrote a blogpost about this here if you're curious, but it's mostly not that relevant to this PR!

Anyway, as a result of this the integration tests for TF models can be quite different from the integration tests for PT models - I'd recommend copying tests from another TF model instead, rather than trying to exactly copy the PT tests. You may also have to drop some tests entirely - the full IDEFICS tests use 4-bit quantization in PT, which isn't supported for TF, and as a result our CI machines may not be able to run it in TF at all!
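For context, the Keras-style replacement for model.train()/Trainer looks roughly like this (a generic sketch, not IDEFICS-specific; train_dataset is a placeholder tf.data.Dataset you would build from your own data):

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# No model.train()/model.eval() mode switching in TF - Keras drives the training loop.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5))  # HF TF models can compute their loss internally
model.fit(train_dataset, epochs=3)  # train_dataset: placeholder tf.data.Dataset of model inputs with labels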

@a8nova
Contributor Author

a8nova commented Dec 3, 2023

Thanks @Rocketknight1. I have a few follow-up questions; there are some test failures I don't understand.

  1. For image_processing_idefics.py, I made changes in this commit to pass return_tensors to preprocess. I am testing it via the integration tests; I will share the diff I needed to get it to run below.

Does the integration test make sense?

diff --git a/tests/models/idefics/test_modeling_tf_idefics.py b/tests/models/idefics/test_modeling_tf_idefics.py
index f9bcec579..50eee25d6 100644
--- a/tests/models/idefics/test_modeling_tf_idefics.py
+++ b/tests/models/idefics/test_modeling_tf_idefics.py
@@ -454,7 +454,7 @@ class TFIdeficsForVisionText2TextTest(TFIdeficsModelTest, unittest.TestCase):
 
 @require_tf
 @require_vision
-class IdeficsModelIntegrationTest(TestCasePlus):
+class TFIdeficsModelIntegrationTest(TestCasePlus):
     @cached_property
     def default_processor(self):
         return (
@@ -463,8 +463,6 @@ class IdeficsModelIntegrationTest(TestCasePlus):
             else None
         )
 
-    @require_bitsandbytes
-    @slow
     def test_inference_natural_language_visual_reasoning(self):
         cat_image_path = self.tests_dir / "fixtures/tests_samples/COCO/000000039769.png"
         cats_image_obj = Image.open(cat_image_path)  # 2 cats
@@ -494,9 +492,7 @@ class IdeficsModelIntegrationTest(TestCasePlus):
             load_in_4bit=True,
             bnb_4bit_compute_dtype="float16",
         )
-        model = IdeficsForVisionText2Text.from_pretrained(
-            "HuggingFaceM4/idefics-9b", quantization_config=quantization_config, device_map="auto"
-        )
+        model = TFIdeficsForVisionText2Text.from_pretrained("HuggingFaceM4/idefics-9b")
         processor = self.default_processor
         inputs = processor(prompts, return_tensors="tf")
         generated_ids = model.generate(**inputs, max_length=100)

How do I pass return_tensors? I adapted other code where return_tensors for the image processing code is passed only to the preprocess code, and I am not sure how to pass it here.

  2. I have attached an error log for another failure I am getting. It is coming from TFIdeficsDecoupledLinear:
    E assert <transformers.models.idefics.modeling_tf_idefics.TFIdeficsDecoupledLinear object at 0x7d4a9ac41540> is None,

The full error is in the attached file.

There are a few other errors I don't understand but some look related to this. Thanks for the help @Rocketknight1 !

@Rocketknight1
Member

Hi @a8nova, sorry for the delay!

Firstly, for return_tensors, generally our processors handle it like this:

def __init__(self, return_tensors=None):
    # Default framework for outputs (e.g. "tf" or "pt"), set once when the processor is created
    self.return_tensors = return_tensors

def __call__(self, return_tensors=None):
    # Per-call override; fall back to the default from __init__ when not supplied
    if return_tensors is None:
        return_tensors = self.return_tensors

In other words, you can supply it as an argument to __call__, but if you don't supply that argument, then it uses the default value that's set in the __init__.

Also, I took a look in the attached file but I can't see the error - can you double-check the contents?

@a8nova
Contributor Author

a8nova commented Dec 6, 2023

Whoops, I attached the wrong one; I have attached the correct one below. Thank you @Rocketknight1!
Edit: I can't seem to repro that error now; I will look into it again over the weekend.

@Rocketknight1
Member

No probs - let me know if it recurs!

@a8nova
Contributor Author

a8nova commented Dec 9, 2023

Thank you @Rocketknight1! There is another test failure I could use your help on. For test_resize_token_embeddings, the input pixel_values is CHW without the batch dimension N - which code is responsible for resizing this kind of input? Right now it crashes when it tries to build the dummy weights; I have attached the full error. The vision_tf.py code is still a little buggy. In the TFIdeficsVisionEmbeddings forward pass, the first thing I do is change pixel_values to channels-last format so it runs on CPU. What do I do when I get this kind of input? Thank you!

test_resize_tokens_embeddings_error.txt

@Rocketknight1
Member

Hi @a8nova, the models aren't intended to run with CHW input - they should always receive NCHW! The test_resize_token_embeddings test is mostly designed to work with text models - you might need to override it for IDEFICS, but it's also not a critical test, so you can just skip it instead!
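If you do skip it, the usual pattern is to override the inherited common test with a skip decorator, roughly like this (a sketch; the skip reason is illustrative):

import unittest

# Sketch: override the inherited common test in the model's test class and skip it.
class TFIdeficsModelTest(unittest.TestCase):  # in the real file this also inherits the common TF tester mixin
    @unittest.skip(reason="Token embedding resizing does not apply cleanly to IDEFICS inputs")
    def test_resize_token_embeddings(self):
        pass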

Also, the thing you did where you transpose to NHWC / channels_last is normal for TF models, because TF convolutions don't work on CPU in channels_first mode. Be careful that you transpose back after convolutional layers/blocks, though, or else the dimensions in the rest of the code will be wrong!
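That transpose-in/transpose-out pattern looks roughly like this (a simplified sketch, not the actual vision_tf.py code):

import tensorflow as tf

class TFToyConvBlock(tf.keras.layers.Layer):
    def __init__(self, out_channels, **kwargs):
        super().__init__(**kwargs)
        # Keras convolutions default to channels_last (NHWC), which also runs on CPU.
        self.conv = tf.keras.layers.Conv2D(out_channels, kernel_size=3, padding="same", name="conv")

    def call(self, pixel_values):
        # Incoming tensors follow the PyTorch NCHW layout; move channels last for the conv...
        pixel_values = tf.transpose(pixel_values, perm=[0, 2, 3, 1])
        hidden_states = self.conv(pixel_values)
        # ...then move channels back so the rest of the ported code sees the layout it expects.
        return tf.transpose(hidden_states, perm=[0, 3, 1, 2])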

@a8nova
Contributor Author

a8nova commented Dec 27, 2023

Hi @Rocketknight1, happy holidays! I have a few more questions. I have fewer test failures now, and I can't wait to get this model working end-to-end. I am using the integration test as a start for fully testing the TensorFlow implementation.

EDIT (Jan 7th), update on 1: I have figured out the issue - it was due to a bad reshape. I am able to run the model end-to-end now using tiny-random-idefics, and I will run it with idefics-9b next! (I still have some follow-up questions coming up, as there is some weirdness I don't understand.)

1. For the integration test, I have the following (this is from TFIdeficsModelIntegrationTest, slightly modified as below; I understand this test will fail with tiny-random-idefics, but I am using it to help me flush out other bugs):

        model = TFIdeficsForVisionText2Text.from_pretrained("HuggingFaceM4/tiny-random-idefics", from_pt=True)
        processor = self.default_processor
        inputs = processor(prompts, return_tensors="tf")
        generated_ids = model.generate(**inputs, max_length=100)
        generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)

Right now this fails when calling from_pretrained: it goes into the forward pass with an invalid size for pixel_values. This goes back to what I asked last time, where I said the input is missing the batch dimension for some tests, but it looks like the problem is actually in my code. What I am observing is that processing_idefics.py::__call__ returns the correct size for pixel_values from the call to BatchFeatures(), where pixel_values is a 5-d tensor of [2, 2, 3, 30, 30], but the first forward pass inside modeling_idefics and vision_tf.py sees pixel_values of [3, 30, 30].
Do you have suggestions as to how this might happen?

  2. I tried converting the PyTorch weights to TF but failed with some error - do I need to get the TensorFlow implementation working before I can convert the weights?

  3. Is dummy_inputs still needed for TF implementations? Like modeling_tf_bert.py#L916

@Rocketknight1
Member

Rocketknight1 commented Jan 8, 2024

Hi @a8nova, sorry for the Christmas-related delay. Huge congratulations on getting the tiny-random model working though, that indicates you're probably quite close to getting the whole thing working!

Firstly, I'd suggest that you probably do not need to convert the weights. The reason for this is that our idefics checkpoints all have safetensors weights already, and those weights can be loaded by any framework - TF, PyTorch or JAX. Once the TF code is working, you should just be able to load those repos with from_pretrained with no additional changes to the repo.

Secondly, dummy inputs are much less important than they used to be, since we're moving to adding explicit build() methods on all of our TF models. That's something we'll have to add to this PR as well, but let's leave it to the end - I can help with it, and it shouldn't be too painful.

Finally, I notice a lot of the test failures are caused by the TF test runner trying to import torch - the reason for this is that you're importing ModelOutput from modeling_outputs.py, but this is a torch file - try importing it from modeling_tf_outputs instead!
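Concretely, that usually means swapping an import like the first line below for one of the TF-side (or framework-neutral) equivalents; the exact class names depend on what the model returns, so treat these as illustrative:

# Before: modeling_outputs.py is a torch-only file, so importing from it drags torch in.
from ...modeling_outputs import ModelOutput

# After: use the TF output classes, or the framework-neutral ModelOutput from utils.
from ...modeling_tf_outputs import TFBaseModelOutputWithPast
from ...utils import ModelOutput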

Still, I think this PR is getting close now - I should be able to reply much more quickly now the holidays are over, so please let me know if you encounter any other difficulties!

a8nova added 16 commits May 11, 2024 19:49

  • IIRC test_save_load was also failing on the CI but not on my local box; it might be easier to debug that on the CI first than the cross tests
  • Maybe this will help me repro this weird bug
  • The issue seems to be with save_pretrained(): when I looked at the model saved from the CI test failure it is basically empty and has no weights. `self.save_weights(..)` seems to be failing in save_pretrained but needs more debugging
  • The CI failures are gone after my latest rebase, no idea why, but I was still saving the model to my hub on HF and the tf_model.h5 file now has everything
@a8nova
Contributor Author

a8nova commented May 12, 2024

Hi @Rocketknight1! After rebasing, the pt_tf cross test and test_save_load are now passing on CI. If you look at the model size from the test, you can see the size now matches the one from my local run, and you can also see that tf_model.h5 has all the model weights. Now, I have no idea why it started passing, which is suspicious, so I think it would be great to get someone who set up the Docker CI involved.

Some other PT-TF equivalence test not related to this PR started failing, but the failure went away:

FAILED tests/models/swiftformer/test_modeling_swiftformer.py::SwiftFormerModelTest::test_pt_tf_model_equivalence - AssertionError: 1.6331993e-05 not less than or equal to 1e-05 : outputs.hidden_states_1: Difference between PyTorch and TF is 1.6331992810592055e-05 (>= 1e-05).
============= 1 failed, 292 passed, 11 skipped in 97.42s (0:01:37) ============

I am seeing some non-deterministic behavior with the tests; the above test failure only happened once and I haven't seen it since. I added the disk-usage logging detail, and none of the tests were failing - in fact, all tests pass in the latest commit.

I also tried running the whole CI command for tests_tf on my local box, but I wasn't able to repro.

@Rocketknight1
Member

Hi @a8nova, looking at that failure, it seems like the error was very small (1.633e-05). At that scale, it's possible just for random numerical differences between runs to create intermittent/flaky errors like that. In general, we just slightly increase the threshold when that happens - we'll do that in another PR if it becomes a recurring problem, but you can ignore it for now!

The main thing is that the IDEFICS PR looks good - do you want me to do a final review now?

@a8nova
Contributor Author

a8nova commented May 13, 2024

A final review sounds good - I am so ready to get this merged in! Thank you @Rocketknight1!

@Rocketknight1
Member

Final review complete, and everything looks good! Thanks again for the commitment and the patience with getting this PR in - hopefully people will get a lot of benefit from TF-IDEFICS as a result of your work here!

Rocketknight1 merged commit 9430635 into huggingface:main on May 13, 2024
24 checks passed
@a8nova
Contributor Author

a8nova commented May 13, 2024

Thank you @Rocketknight1 for your assistance throughout this PR - I am really happy this got merged in.

I think, most of all, I was the one who benefited from this, as I have learned so much about transformers. I will focus on finalizing the TF-Gemma port next!
