Add TensorFlow implementation of EfficientFormer #22620
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
(force-pushed from fb3da1f to 1a25b23)
Hi @D-Roberts, just letting you know the TF team at Hugging Face is aware of this and definitely interested in the port! Please ping me or @gante whenever it's ready for review, or if you run into any issues while porting.
(force-pushed from 500de37 to 1d3c82d)
(force-pushed from 4a64767 to 6f8c787)
(force-pushed from b2ac9bb to 2cc5a5e)
@Rocketknight1 @gante This PR is now ready for review.
Overall this looks like an incredibly solid port! I think this might be the best handling of a complex PR like this that I've ever seen from someone not on the Hugging Face payroll. Most issues I raised are just comments or nits, but manipulating `self.ab` during the forward pass and the layer names in the encoder are two that could potentially be breaking. Let me know what you think of the proposed solutions there, but I think this PR should be ready to merge very soon.
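For context, the concern is the TF anti-pattern of assigning a tensor to a layer attribute inside the forward pass. A minimal sketch of the problem and the suggested fix; the layer name, shapes, and indexing here are assumptions for illustration, not the PR's actual EfficientFormer code:

```python
import tensorflow as tf


class AttentionBiasDemo(tf.keras.layers.Layer):
    """Illustrative layer only; names and shapes are assumed, not the PR's code."""

    def build(self, input_shape):
        num_heads, num_points = 8, 49  # placeholder sizes
        self.attention_biases = self.add_weight(
            name="attention_biases", shape=(num_heads, num_points)
        )
        # Placeholder indices; the real model derives these from relative positions.
        self.attention_bias_idxs = tf.range(num_points)
        super().build(input_shape)

    def call(self, attention_scores):
        # Anti-pattern: assigning inside the forward pass, e.g.
        #     self.ab = tf.gather(self.attention_biases, self.attention_bias_idxs, axis=1)
        # mutates layer state during tracing and can break tf.function/graph mode.
        # Preferred: keep the gathered biases as a local tensor.
        ab = tf.gather(self.attention_biases, self.attention_bias_idxs, axis=1)
        return attention_scores + ab
```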
[4 resolved review threads on src/transformers/models/efficientformer/modeling_tf_efficientformer.py]
cc @amyeroberts for core maintainer review as well
Thanks for adding this model!
Overall a really nice, clean PR, super easy to review 🤗 There are a few places where the architecture implementation deviates from the standard pattern, but this seems to come from the PT model.
In general, just a few comments before we're good to merge (see the sketch after this list):
- As @Rocketknight1 highlighted, the logic for `self.ab` is very non-canonical and potentially breaking for TF, so let's go for the local-variable `ab = tf.gather(...)` logic.
- The `serving_output` logic should be updated so that hidden_states and attentions are conditionally returned based on the config settings, with a comment added about why they're not converted to tensors (different shapes) as in other vision models.
- Switching to the NHWC format should just happen once in the main layer.
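A hedged sketch of what the suggested `serving_output` could look like; the output class and attribute names are assumptions modeled on other Hugging Face TF vision models, not this PR's final code:

```python
from transformers.modeling_tf_outputs import TFBaseModelOutputWithPooling


def serving_output(self, output: TFBaseModelOutputWithPooling) -> TFBaseModelOutputWithPooling:
    # hidden_states/attentions come from stages with different spatial shapes,
    # so (unlike in other vision models) they stay as tuples instead of being
    # converted to stacked tensors.
    hidden_states = output.hidden_states if self.config.output_hidden_states else None
    attentions = output.attentions if self.config.output_attentions else None
    return TFBaseModelOutputWithPooling(
        last_hidden_state=output.last_hidden_state,
        pooler_output=output.pooler_output,
        hidden_states=hidden_states,
        attentions=attentions,
    )
```

For the last point, the data-format switch would be a single NCHW-to-NHWC transpose, e.g. `tf.transpose(pixel_values, perm=(0, 2, 3, 1))`, done once in the main layer rather than per block.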
[1 resolved review thread on src/transformers/models/efficientformer/configuration_efficientformer.py]
[3 resolved review threads on tests/models/efficientformer/test_modeling_tf_efficientformer.py]
[6 resolved review threads on src/transformers/models/efficientformer/modeling_tf_efficientformer.py]
(force-pushed from 4fd39d8 to dbd5e73)
@Rocketknight1 @amyeroberts I addressed your comments and also submitted two PRs for the l1 and l3 weights (and tagged Rocketknight1). Let me know what's next!
@D-Roberts - that's great! For the CI - it seems there is an issue with your CircleCI permissions, as the tests won't run.
(force-pushed from 092502e to 02c7ad6)
@amyeroberts Thanks for pointing out the CircleCI fix. It appears that one doc test, which (rightly) can't find TF weights, is failing for now. I added back the …
@D-Roberts Just to let you know, we've reached out to the team at Snap to ask them to merge your PRs on the EfficientFormer checkpoints. Sorry for the delay!
@D-Roberts the checkpoint PRs should be merged now. Thank you to @alanspike for the quick response!
(force-pushed from 02c7ad6 to d442667)
@amyeroberts @Rocketknight1 All local tests pass with the new TF weights. The CI still has the documentation test failing; the PT version also predicts 281, which maps to `label_281` in the config.
@D-Roberts I think it's fine to swap those tests for just checking the actual argmax index rather than the …
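A sketch of that style of check; the variable names (`outputs`, `model`) and the surrounding setup are assumed for illustration:

```python
import tensorflow as tf

# `outputs` is assumed to be the result of a classification forward pass.
logits = outputs.logits
predicted_class_idx = int(tf.math.argmax(logits, axis=-1)[0])
# With placeholder labels like "label_281" in the config, asserting on the
# argmax index is more robust than asserting on the label string:
assert predicted_class_idx == 281
print("Predicted class:", model.config.id2label[predicted_class_idx])
```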
(force-pushed from b188b39 to 9d30773)
(force-pushed from b17b4cd to a2b9995)
@Rocketknight1 All green again. :)
* Add tf code for efficientformer
* Fix return dict bug - return last hidden state after last stage
* Fix corresponding return dict bug
* Override test tol
* Change default values of training to False
* Set training to default False X3
* Rm axis from ln
* Set init in dense projection
* Rm debug stuff
* Make style; all tests pass.
* Modify year to 2023
* Fix attention biases codes
* Update the shape list logic
* Add a batch norm eps config
* Remove extract comments in test files
* Add conditional attn and hidden states return for serving output
* Change channel dim checking logic
* Add exception for withteacher model in training mode
* Revert layer count for now
* Add layer count for conditional layer naming
* Transpose for conv happens only in main layer
* Make tests smaller
* Make style
* Update doc
* Rm from_pt
* Change to actual expect image class label
* Remove stray print in tests
* Update image processor test
* Remove the old serving output logic
* Make style
* Make style
* Complete test
@sgugger @amyeroberts @Rocketknight1 I was wondering - when do you plan a transformers release that includes this code?
@D-Roberts We release roughly once a month and are planning on releasing 4.30 later this week. If you need it right now, it's possible to install from source to have the …
What does this PR do?
Adds a TensorFlow implementation of EfficientFormer.
Ran tests (CPU-only, all pass) with:
NVIDIA_TF32_OVERRIDE=1 RUN_SLOW=1 RUN_PT_TF_CROSS_TESTS=1 py.test -vv -rA tests/models/efficientformer/test_modeling_tf_efficientformer.py
Double-checked the PT and TF architecture code against the "EfficientFormer: Vision Transformers at MobileNet Speed" paper.
Verified shapes and diffs in hidden states on an example image:
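The original verification snippet and its printed output aren't preserved above; a minimal sketch of this kind of PT/TF hidden-state cross-check (the checkpoint id and comparison details are assumptions) could look like:

```python
import numpy as np
import requests
import torch
from PIL import Image
from transformers import (
    EfficientFormerForImageClassification,
    EfficientFormerImageProcessor,
    TFEfficientFormerForImageClassification,
)

ckpt = "snap-research/efficientformer-l1-300"  # assumed checkpoint id
processor = EfficientFormerImageProcessor.from_pretrained(ckpt)
pt_model = EfficientFormerForImageClassification.from_pretrained(ckpt)
tf_model = TFEfficientFormerForImageClassification.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    pt_out = pt_model(**processor(images=image, return_tensors="pt"), output_hidden_states=True)
tf_out = tf_model(**processor(images=image, return_tensors="tf"), output_hidden_states=True)

# Print shapes and max absolute differences, stage by stage.
for pt_h, tf_h in zip(pt_out.hidden_states, tf_out.hidden_states):
    diff = np.abs(pt_h.numpy() - tf_h.numpy()).max()
    print(tuple(pt_h.shape), tuple(tf_h.shape), f"max abs diff: {diff:.2e}")
```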
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.