Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICP: gcm-avx: Support architectures lacking the MOVBE instruction #10029

Merged
merged 1 commit into from
Mar 17, 2020

Conversation

AttilaFueloep
Copy link
Contributor

@AttilaFueloep AttilaFueloep commented Feb 19, 2020

Motivation and Context

Follow-up for #9749

There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

Description

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Signed-off-by: Attila Fülöp attila@fueloep.org

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@AttilaFueloep AttilaFueloep changed the title ICP: gcm-avx: Support architectures lacking the MOVBE instruction WIP: ICP: gcm-avx: Support architectures lacking the MOVBE instruction Feb 19, 2020
@AttilaFueloep
Copy link
Contributor Author

I opened this mainly as a reference for comparing against #10025. It has not been tested yet, which I'll do once time permits.

@codecov
Copy link

codecov bot commented Feb 19, 2020

Codecov Report

Merging #10029 into master will decrease coverage by 18%.
The diff coverage is 86%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10029      +/-   ##
==========================================
- Coverage      79%      61%     -18%     
==========================================
  Files         386      365      -21     
  Lines      122448   118117    -4331     
==========================================
- Hits        96998    71748   -25250     
- Misses      25450    46369   +20919
Flag Coverage Δ
#kernel 62% <86%> (-17%) ⬇️
#user 49% <86%> (-18%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 392556f...e8b691f. Read the comment docs.

@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Feb 19, 2020
@adamdmoss
Copy link
Contributor

Looks good. Obviously in #10025 I was going to lengths to avoid duplicating the assembly but that's hardly worth the effort since we don't expect to have to update the code often / ever.

@AttilaFueloep
Copy link
Contributor Author

Yes, and since we have the mostly unmodified original _aesni_ctr32_ghash_6x it shouldn't be hard to apply potential changes to the copied and modified version as well.

Now I wish I'd now why the tests are failing. I suspect accessing the global variable gcm_avx_can_use_movbe doesn't work as I expected. Sadly I've no time to delve into this currently.

@AttilaFueloep
Copy link
Contributor Author

Looks better now, the ZTS mmp/mmp_active_import failure seems unrelated. I still have to look into the ztest failures.

@AttilaFueloep
Copy link
Contributor Author

With unmodified master@392556f0ef I can reproduce the ztest crashes, so let's wait until it stabilizes again.

@adamdmoss Could you test if this changes work for you? I can't access my testing VMs right now, so have no way to test on affected CPUs.

@behlendorf
Copy link
Contributor

@AttilaFueloep master@392556f0ef should be stable. I haven't observed any increase in ztest failures in other PRs. Definitely nothing like the number of failures observed here. I know you're busy but you may want to double check you really can reproduce the issue with a clean build of master.

@AttilaFueloep
Copy link
Contributor Author

@behlendorf Yes, you're right, master@392556f0ef is fine. There must have been an error on my part. Sorry for the noise.

@AttilaFueloep
Copy link
Contributor Author

@adamdmoss Please don't try this out yet! Although ZTS passes, there seems to be a sincere bug triggered by ztest.

@AttilaFueloep
Copy link
Contributor Author

Once I had the chance to run this through gdb the failure was obvious. If the test pass now, I'll squash and rebase and it should be ready for review.

@AttilaFueloep
Copy link
Contributor Author

The Ubuntu 16.04 failure seems unrelated but I don't know what to make out of the Fedora failure. Everything else seems fine. I manually verified that the correct code paths are taken on a Ivy Bridge and Skylake virtual CPU. Rebasing on master.

@AttilaFueloep AttilaFueloep changed the title WIP: ICP: gcm-avx: Support architectures lacking the MOVBE instruction ICP: gcm-avx: Support architectures lacking the MOVBE instruction Mar 6, 2020
@AttilaFueloep AttilaFueloep force-pushed the icp-gcm-movbe-optional branch from 9a7fb36 to 445cde1 Compare March 6, 2020 23:40
@behlendorf
Copy link
Contributor

Feel free to ignore the Fedora failure. It appears to be caused by some updated packages, we haven't yet run down exactly what they changed.

Follow-up for openzfs#9749

There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Signed-off-by: Attila Fülöp <attila@fueloep.org>
@AttilaFueloep AttilaFueloep force-pushed the icp-gcm-movbe-optional branch from 445cde1 to ecc9dee Compare March 13, 2020 02:08
@AttilaFueloep
Copy link
Contributor Author

Rebased on master in the hope that the test failures go away.

@AttilaFueloep
Copy link
Contributor Author

Looks better now.

@behlendorf @adamdmoss I think this is ready for review and testing.

@behlendorf behlendorf added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Mar 13, 2020
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test CI failures here were unrelated. This looks good, I'll queue this up for some additional local testing.

@codecov-io
Copy link

codecov-io commented Mar 14, 2020

Codecov Report

Merging #10029 into master will increase coverage by <1%.
The diff coverage is 85%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #10029    +/-   ##
========================================
+ Coverage      79%      80%   +<1%     
========================================
  Files         385      385            
  Lines      122385   122390     +5     
========================================
+ Hits        97235    97480   +245     
+ Misses      25150    24910   -240
Flag Coverage Δ
#kernel 80% <46%> (ø) ⬆️
#user 67% <85%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fdd610...ecc9dee. Read the comment docs.

@behlendorf
Copy link
Contributor

@adamdmoss would you mind reviewing this and approving the PR if everything looks good to you.

@adamdmoss
Copy link
Contributor

sorry, behind on github reqs - will take a look!

Copy link
Contributor

@adamdmoss adamdmoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@adamdmoss
Copy link
Contributor

Code looks good and tests well in casual-but-nontrivial testing on my non-movbe machine.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 16, 2020
@behlendorf
Copy link
Contributor

@adamdmoss thanks for the quick feedback. This has held up well in my testing as well, I'll get it merged.

@behlendorf behlendorf merged commit 5b3b795 into openzfs:master Mar 17, 2020
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 15, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 22, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 22, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 23, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Apr 28, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
winny- pushed a commit to winny-/zfs that referenced this pull request May 1, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
tonyhutter pushed a commit that referenced this pull request May 12, 2020
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup #9749 
Closes #10029
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
There are a couple of x86_64 architectures which support all needed
features to make the accelerated GCM implementation work but the
MOVBE instruction. Those are mainly Intel Sandy- and Ivy-Bridge
and AMD Bulldozer, Piledriver, and Steamroller.

By using MOVBE only if available and replacing it with a MOV
followed by a BSWAP if not, those architectures now benefit from
the new GCM routines and performance is considerably better
compared to the original implementation.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Followup openzfs#9749 
Closes openzfs#10029
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants