ICP: gcm-avx: Support architectures lacking the MOVBE instruction #10029
I opened this mainly as a reference for comparing against #10025. It has not been tested yet, which I'll do once time permits. |
Codecov Report
@@ Coverage Diff @@
## master #10029 +/- ##
==========================================
- Coverage 79% 61% -18%
==========================================
Files 386 365 -21
Lines 122448 118117 -4331
==========================================
- Hits 96998 71748 -25250
- Misses 25450 46369 +20919
|
Looks good. Obviously in #10025 I was going to great lengths to avoid duplicating the assembly, but that's hardly worth the effort since we don't expect to have to update the code often, if ever. |
Yes, and since we have the mostly unmodified original Now I wish I knew why the tests are failing. I suspect accessing the global variable |
Looks better now, the ZTS |
With unmodified master@392556f0ef I can reproduce the ztest crashes, so let's wait until it stabilizes again. @adamdmoss Could you test whether these changes work for you? I can't access my testing VMs right now, so I have no way to test on affected CPUs. |
@AttilaFueloep master@392556f0ef should be stable. I haven't observed any increase in ztest failures in other PRs. Definitely nothing like the number of failures observed here. I know you're busy but you may want to double check you really can reproduce the issue with a clean build of master. |
@behlendorf Yes, you're right, master@392556f0ef is fine. There must have been an error on my part. Sorry for the noise. |
@adamdmoss Please don't try this out yet! Although ZTS passes, there seems to be a serious bug triggered by ztest. |
Once I had the chance to run this through |
The Ubuntu 16.04 failure seems unrelated, but I don't know what to make of the Fedora failure. Everything else seems fine. I manually verified that the correct code paths are taken on an Ivy Bridge and a Skylake virtual CPU. Rebasing on master. |
Force-pushed from 9a7fb36 to 445cde1.
Feel free to ignore the Fedora failure. It appears to be caused by some updated packages; we haven't yet run down exactly what they changed. |
Follow-up for openzfs#9749
There are several x86_64 microarchitectures which support all features needed by the accelerated GCM implementation except the MOVBE instruction. Those are mainly Intel Sandy Bridge and Ivy Bridge and AMD Bulldozer, Piledriver, and Steamroller.
By using MOVBE only if available and replacing it with a MOV followed by a BSWAP if not, those architectures now benefit from the new GCM routines, and performance is considerably better than with the original implementation.
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Force-pushed from 445cde1 to ecc9dee.
Rebased on master in the hope that the test failures go away. |
Looks better now. @behlendorf @adamdmoss I think this is ready for review and testing. |
The test CI failures here were unrelated. This looks good, I'll queue this up for some additional local testing.
Codecov Report
@@ Coverage Diff @@
## master #10029 +/- ##
========================================
+ Coverage 79% 80% +<1%
========================================
Files 385 385
Lines 122385 122390 +5
========================================
+ Hits 97235 97480 +245
+ Misses 25150 24910 -240
|
@adamdmoss would you mind reviewing this and approving the PR if everything looks good to you? |
sorry, behind on github reqs - will take a look! |
Looks good.
Code looks good and tests well in casual-but-nontrivial testing on my non-movbe machine. |
@adamdmoss thanks for the quick feedback. This has held up well in my testing as well, I'll get it merged. |
There are several x86_64 microarchitectures which support all features needed by the accelerated GCM implementation except the MOVBE instruction. Those are mainly Intel Sandy Bridge and Ivy Bridge and AMD Bulldozer, Piledriver, and Steamroller.
By using MOVBE only if available and replacing it with a MOV followed by a BSWAP if not, those architectures now benefit from the new GCM routines, and performance is considerably better than with the original implementation.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749
Closes openzfs#10029
Motivation and Context
Follow-up for #9749
There are several x86_64 microarchitectures which support all features
needed by the accelerated GCM implementation except the MOVBE
instruction. Those are mainly Intel Sandy Bridge and Ivy Bridge
and AMD Bulldozer, Piledriver, and Steamroller.
Description
By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines, and performance is considerably better
than with the original implementation.
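To illustrate the idea described above, here is a minimal C sketch of a big-endian 64-bit load that uses MOVBE when the CPU reports it and falls back to a plain load followed by a byte swap otherwise. This is not the code from this PR, which is hand-written assembly; the helper names `cpu_has_movbe`, `load_be64_movbe`, `load_be64_bswap`, and `load_be64` are hypothetical. The sketch assumes GCC or Clang (for `<cpuid.h>`, `__builtin_bswap64`, and extended inline assembly) and a little-endian host.

```c
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: returns 1 if CPUID leaf 1 reports MOVBE
 * (ECX bit 22), 0 otherwise or on non-x86_64 builds.
 */
static int cpu_has_movbe(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 22) & 1u);
#else
    return 0;
#endif
}

/*
 * Fallback path: a plain load followed by a byte swap. On x86_64,
 * compilers typically emit MOV + BSWAP for this pattern.
 * Assumes a little-endian host.
 */
static uint64_t load_be64_bswap(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return __builtin_bswap64(v);
}

#if defined(__x86_64__)
/* MOVBE path: load and byte swap in a single instruction. */
static uint64_t load_be64_movbe(const void *p)
{
    uint64_t v;

    __asm__("movbe %1, %0" : "=r"(v) : "m"(*(const uint64_t *)p));
    return v;
}
#endif

/* Use MOVBE only if available, otherwise MOV followed by BSWAP. */
static uint64_t load_be64(const void *p)
{
#if defined(__x86_64__)
    if (cpu_has_movbe())
        return load_be64_movbe(p);
#endif
    return load_be64_bswap(p);
}
```

Both paths return the same value for the same input, so a caller only sees a performance difference. A real implementation would likely query CPUID once at initialization and cache the result rather than checking on every load.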
Signed-off-by: Attila Fülöp <attila@fueloep.org>