[TF Longformer] Improve Speed for TF Longformer #6447
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #6447      +/-   ##
==========================================
+ Coverage   78.96%   79.81%    +0.84%
==========================================
  Files         157      157
  Lines       28486    28479        -7
==========================================
+ Hits        22495    22730      +235
+ Misses       5991     5749      -242
```
Continue to review full report at Codecov.
Force-pushed from 8c8c5a0 to 85bed01.
```python
# is index masked or global attention
is_index_masked = tf.math.less(attention_mask, 0)
is_index_global_attn = tf.math.greater(attention_mask, 0)
```
These operations should be done once and not be repeated for every layer -> unnecessary slow-down.
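A minimal sketch of the idea, assuming a simplified model with a stack of attention layers (`TFSimplifiedModel` and its attributes are illustrative, not the actual transformers code): compute the boolean index tensors once in the top-level `call` and pass them down, instead of re-deriving them inside every layer.

```python
import tensorflow as tf

class TFSimplifiedModel(tf.keras.layers.Layer):
    # Illustrative sketch, not the actual transformers implementation.
    def __init__(self, attn_layers, **kwargs):
        super().__init__(**kwargs)
        self.attn_layers = attn_layers

    def call(self, hidden_states, attention_mask):
        # Compute the boolean masks once, outside the layer loop ...
        is_index_masked = tf.math.less(attention_mask, 0)
        is_index_global_attn = tf.math.greater(attention_mask, 0)

        for layer in self.attn_layers:
            # ... and pass them down, so every layer reuses the same
            # tensors instead of recomputing them from attention_mask.
            hidden_states = layer(
                hidden_states,
                attention_mask=attention_mask,
                is_index_masked=is_index_masked,
                is_index_global_attn=is_index_global_attn,
            )
        return hidden_states
```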
```diff
@@ -170,24 +168,20 @@ def call(
         # normalize query
         query_vectors /= tf.math.sqrt(tf.constant(self.head_dim, dtype=tf.dtypes.float32))

         query_vectors = tf.transpose(
```
Get rid of `tf.transpose`, as it allocates a new tensor upon each call and should be avoided.
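A hedged sketch of one way to drop an explicit transpose (shapes and names here are illustrative, not the exact change in this PR): fold the layout permutation into the contraction with `tf.einsum`, so no transposed copy has to be materialized first.

```python
import tensorflow as tf

# Illustrative shapes: (batch, seq_len, num_heads, head_dim)
q = tf.random.normal((2, 16, 4, 8))
k = tf.random.normal((2, 16, 4, 8))

# Pattern to avoid: each tf.transpose allocates a new tensor.
q_t = tf.transpose(q, perm=(0, 2, 1, 3))  # (batch, heads, seq, dim)
k_t = tf.transpose(k, perm=(0, 2, 3, 1))  # (batch, heads, dim, seq)
scores_transpose = tf.matmul(q_t, k_t)    # (batch, heads, seq, seq)

# Alternative: let einsum contract over the original layout directly,
# fusing the permutation into the matmul instead of materializing it.
scores_einsum = tf.einsum("bqhd,bkhd->bhqk", q, k)

tf.debugging.assert_near(scores_transpose, scores_einsum)
```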
```diff
@@ -134,6 +125,19 @@ def test_save_load(self):
         self.assert_outputs_same(after_outputs, outputs)

     def test_graph_mode(self):
```
Add a function that tests every model in graph mode
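A minimal sketch of such a test, assuming the usual transformers test-mixin layout (`self.model_tester` and `self.all_model_classes` are assumptions about the surrounding test class): wrap the model call in `tf.function` so each model is traced and executed as a graph rather than eagerly.

```python
import tensorflow as tf

def test_graph_mode(self):
    # Sketch: model_tester and all_model_classes are assumed to exist
    # on the common TF test mixin.
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    for model_class in self.all_model_classes:
        model = model_class(config)

        @tf.function
        def run_in_graph_mode():
            # Tracing this call surfaces ops that only work eagerly.
            return model(inputs_dict)

        outputs = run_in_graph_mode()
        self.assertIsNotNone(outputs)
```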
Speed Benchmarking:
Running this command on the master branch:
on this env:
gives:
On this branch the speed is improved to:
So we can see an improvement of ca. 3%, which is actually not that much... I guess it's interesting to see what effect removing some unnecessary tensor allocations has.
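The exact command and environment details were not captured above; as a hedged sketch, a comparable inference measurement can be set up with the library's benchmarking utilities from that era (`TensorFlowBenchmark` / `TensorFlowBenchmarkArguments`; the model name and size values below are illustrative, not the ones used for this PR's numbers).

```python
from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

# Illustrative configuration; batch sizes and sequence lengths are
# assumptions, not the benchmark settings used in this PR.
args = TensorFlowBenchmarkArguments(
    models=["allenai/longformer-base-4096"],
    batch_sizes=[1],
    sequence_lengths=[512, 1024],
    inference=True,
    memory=False,
)
benchmark = TensorFlowBenchmark(args)
results = benchmark.run()
print(results)
```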
LGTM, though I'm no expert in TF :-)
Force-pushed from 316206c to ae3bbe2.
Great, LGTM! Nice test, should be pretty similar to the Keras/compile tests!
* add tf graph compile tests
* fix conflict
* remove more tf transpose statements
* fix conflicts
* fix comment typos
* move function to class function
* fix black
* fix black
* make style
This PR:

* Removes unnecessary `tf.transpose()` statements (in contrast to PyTorch, `tf.transpose()` allocates a new tensor and thus should be avoided). This also cleans up the code IMO.
* Adds a graph-mode compile test for every model.

=> These changes lead to a speed-up of 1.03x, which is actually not that much... more details in the benchmark comment.

After a lot of digging, TF XLA will not be very easy to implement, as a lot of kernels that are heavily used in this model, such as `tf.where`, are not implemented for XLA (yet). So TF Longformer on TPU will sadly not work for the moment @ibeltagy.

Conclusion

For me the PR was also a good exercise to see whether TF can be significantly sped up by removing unnecessary tensor allocations. It seems like it's not really worth it to go through all the TF models if the improvement in speed is only around 2-3%.
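As a hedged illustration of the XLA limitation (not taken from the PR itself): the single-argument form of `tf.where` returns the indices of all `True` entries, so its output shape depends on the data at runtime, while XLA compilation generally requires static shapes. One way to probe whether an op lowers to XLA is to compile a tiny function containing it; whether this actually fails depends on the TF/XLA version.

```python
import tensorflow as tf

# jit_compile is the current flag name; TF releases contemporary with
# this PR spelled it experimental_compile.
@tf.function(jit_compile=True)
def nonzero_indices(mask):
    # Single-argument tf.where: the output shape (num_true, rank) is
    # data-dependent, which XLA generally cannot compile.
    return tf.where(mask)

mask = tf.constant([True, False, True])
try:
    print(nonzero_indices(mask))
except Exception as err:  # behavior varies across TF/XLA versions
    print("XLA could not compile the data-dependent shape:", err)
```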