Added Adam FP32 JIT assembly kernel #39158
@Silv3S please review this PR |
I see that you've implemented Adam on CPU, but it doesn't cover cases that use AdamW. I hope AdamW on CPU can also be implemented to provide higher performance. Thanks. |
Hi @haohongxiang, AdamW can also be implemented, but that will probably be done in a separate PR in the future. |
@pawelpiotrowicz , @tsocha please help with review |
Excellent contribution! LGTM |
OK, thx. |
LGTM
@Aganlengzi could you please start your review? |
LGTM
LGTM |
Sorry to inform you that the CIs for 9e033d8 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually. |
LGTM
PR types
New features
PR changes
OPs
Describe
Added Adam FP32 JIT assembly kernel. This feature was requested by #39005.
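For readers unfamiliar with the update this kernel accelerates, here is a minimal scalar sketch of one textbook Adam step in Python. It is not the code from this PR: the function name `adam_step`, the flat-list layout, and the default hyperparameters are assumptions chosen for illustration, and the actual kernel may arrange the bias correction differently.

```python
import math


def adam_step(params, grads, m, v, step,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update in place over flat lists of floats.

    Hypothetical reference helper, not the kernel added in this PR.
    `step` is the 1-based iteration count used for bias correction.
    """
    bias1 = 1.0 - beta1 ** step  # bias correction for the first moment
    bias2 = 1.0 - beta2 ** step  # bias correction for the second moment
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and squared gradient.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat = m[i] / bias1
        v_hat = v[i] / bias2
        # Element-wise parameter update.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Usage: a single step from zero-initialized moments.
params = [1.0, -2.0]
grads = [0.5, -0.25]
m = [0.0, 0.0]
v = [0.0, 0.0]
adam_step(params, grads, m, v, step=1)
```

Every element is updated independently of the others, which is the property that makes this loop a natural candidate for a vectorized kernel.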
All benchmarks were run with the VGG training script "test_image_classification.py", over 100 batches of 128 images each.
Benchmarks were run on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz.
Performance comparison (figures): Adam JIT FP32, 20 threads; Adam native FP32, 20 threads; Adam JIT FP32, 1 thread; Adam native FP32, 1 thread.