Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aes: autodetection support for AES-NI #208

Merged
merged 1 commit into from
Dec 2, 2020
Merged

aes: autodetection support for AES-NI #208

merged 1 commit into from
Dec 2, 2020

Conversation

tarcieri
Copy link
Member

Closes #25.

Adds an off-by-default autodetect feature which the cpuid-bool crate to detect whether AES-NI is available when compiling on i686/x86_64 architectures.

@tarcieri tarcieri added the aes AES implementations label Nov 26, 2020
@tarcieri tarcieri requested a review from newpavlov November 26, 2020 14:59
@tarcieri
Copy link
Member Author

Preliminary benchmarks show whatever effect this may have on performance is well within the variance of noise.

Perhaps criterion would give better data but otherwise this appears to have negligible impact on performance.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use union instead of enum. The new version of cpuid-bool now supports initialization tokens, which makes the tag in AES types redundant (i.e. the tag will be stored in a static variable inside module created by the new cpuid-bool macro).

cpuid_bool::new(aes_cpuid, "aes");

union Inner {
    ni: crate::ni::Aes128,
    soft: crate::soft::Aes128,
}

pub struct Aes128 {
    inner: Inner,
    token: aes_cpuid::InitToken,
}

fn new(key: &GenericArray<u8, $key_size>) -> Self {
    let (token, val) = aes_cpuid::init_get();
    let inner = if val {
        Inner { ni: crate::ni::Aes128::new(key) }
    } else {
        Inner { soft: crate::soft::Aes128::new(key) }
    };
    Self { token, 
}

fn encrypt_block(&self, block: &mut Block) {
    if self.token.get() {
        unsafe { self.inner.ni.encrypt_block(block) }
    } else {
        unsafe { self.inner.soft.encrypt_block(block) }
    }
}

.github/workflows/aes.yml Outdated Show resolved Hide resolved
aes/Cargo.toml Outdated Show resolved Hide resolved
aes/Cargo.toml Outdated Show resolved Hide resolved
aes/src/lib.rs Outdated Show resolved Hide resolved
aes/src/soft/ctr.rs Outdated Show resolved Hide resolved
@tarcieri
Copy link
Member Author

@newpavlov I'm not sure that enabling detection without the corresponding target feature being enabled accomplishes anything. See the issue @kazcw opened here:

cryptocorrosion/cryptocorrosion#7

LLVM has a soft implementation for aesenc for non-AESNI x86-64 CPUs and will gladly use it

If this is true, trying to use AES-NI without the corresponding target feature enabled will use a software fallback provided by LLVM instead.

As a separate issue, I need to test how enabling/disabling AES-NI in UEFI/BIOS affects CPUID.

@tarcieri
Copy link
Member Author

As a separate issue, I need to test how enabling/disabling AES-NI in UEFI/BIOS affects CPUID.

I confirmed disabling AES-NI in UEFI removes the flag from CPUID.

However this is a notable example of how even on the exact same physical CPU, the flag may or may not be present depending on environmental conditions.

@tarcieri
Copy link
Member Author

tarcieri commented Nov 29, 2020

Well, this isn't good. I just ran the test suite against this PR in its current state with the autodetect feature enabled on an AES-NI capable host which explicitly has AES-NI disabled in UEFI/BIOS and:

  process didn't exit successfully: `/home/tony/block-ciphers/target/debug/deps/aes-eeda3af739ccdef0` (signal: 4, SIGILL: illegal instruction)

Will investigate where things are going wrong...

@tarcieri
Copy link
Member Author

@newpavlov as far as I can tell the SIGILL is a bug in cpuid_bool, which seems to be returning true even when AES-NI is disabled, i.e.:

$ lscpu | grep aes
$

(I confirmed the flag was present before disabling AES)

I also double checked this is occurring in the autodetection code in this PR, both via a simple println-style method:

running 1 test
checking CPUID for AES-NI...
YES AES-NI!

...and double checking using lldb.

I can try seeing if the problem is solved by using the is_x86_feature_detected macro instead...

@tarcieri
Copy link
Member Author

tarcieri commented Nov 29, 2020

I confirmed using the is_x86_feature_detected macro fixes the problem and the tests pass:

running 1 test
checking CPUID for AES-NI...
NO AES-NI!!!
checking CPUID for AES-NI...
NO AES-NI!!!
checking CPUID for AES-NI...
NO AES-NI!!!
checking CPUID for AES-NI...
NO AES-NI!!!
...
test aes128_test ... ok

@kazcw
Copy link

kazcw commented Nov 29, 2020

@newpavlov I'm not sure that enabling detection without the corresponding target feature being enabled accomplishes anything. See the issue @kazcw opened here:

cryptocorrosion/cryptocorrosion#7

LLVM has a soft implementation for aesenc for non-AESNI x86-64 CPUs and will gladly use it

If this is true, trying to use AES-NI without the corresponding target feature enabled will use a software fallback provided by LLVM instead.

@tarcieri I was wrong about the soft fallback in cryptocorrosion/cryptocorrosion#7 -- in the followup comment in that thread I go into what was actually causing the effect I observed.

@tarcieri
Copy link
Member Author

@kazcw okay thanks!

I guess it also make sense that if cpuid-bool is eagerly omitting the fallback code based on target features, it would always select AES-NI based on their presence.

Will try removing the target_feature-based gating then and see if that helps.

@tarcieri
Copy link
Member Author

Okay, I confirmed that without supplying any target features, cpuid-bool detects AES-NI is disabled and the tests pass.

Let me flip AES-NI back on and determine that it is properly detected when enabled and the performance is what is expected.

@tarcieri
Copy link
Member Author

@newpavlov okay, seems everything is working the way you expected.

Let me at least remove the target_feature-based gating and push that up.

After that we can then look at inverting the autodetect feature into a force-soft feature and making cpuid-bool gating based on an i686/x86_64 target, and then upgrading cpuid-bool.

@newpavlov
Copy link
Member

Note that with runtime detection we should add #[target_feature(enable = "..")] to the code dependent on extensions (and this code must be accessible only after runtime check), otherwise LLVM will not inline intrinsics, which will result in drastically reduced performance.

@tarcieri
Copy link
Member Author

tarcieri commented Nov 29, 2020

@newpavlov this is somewhat problematic:

error[E0658]: unions with non-`Copy` fields are unstable

I can try marking the inner structs non-Copy for now, but I'm a bit wary about that as part of a public API (particularly for aes::soft)

Edit: I guess the aes::ni types are already Copy 🤷‍♂️

@tarcieri tarcieri force-pushed the aes/autodetect branch 4 times, most recently from d85a29f to b1468ba Compare November 29, 2020 18:44
@newpavlov
Copy link
Member

Since ni and soft types, as well as the unions will not be part of the public API it should not be a problem, no? Though we will have to be careful with CTR types, since they have &mut self methods.

@roblabla
Copy link
Contributor

I'm curious what's the use-case for force-soft? It seems odd to me that anyone would want this - if people require using a soft-float aes implementation, they can use the aes-soft crate?

@newpavlov
Copy link
Member

@roblabla
Currently we plan to deprecate aes-soft and aesni in favor of the merged aes crate.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release cpuid-bool v0.2

Done.

Decide if enum is acceptable for Aes*Ctr types, and if not, potentially add Copy (and at least Clone) impls to ctr::Ctr128

Lack of the Clone impls is certainly an oversight, but I don't think we should add Copy. The main reason is existence of the methods which take &mut self, adding Copy would make the types too error-prone for my taste.

The best solution for now probably would be to vendor CTR implementation into the aes crate until the non-Copy unions get stabilized. Luckily the soft-CTR code is not that complex and big.

Does anyone knows if there are CPUs which have AES-NI, but not SSSE3?

aes/src/autodetect.rs Outdated Show resolved Hide resolved
@tarcieri
Copy link
Member Author

tarcieri commented Dec 1, 2020

The best solution for now probably would be to vendor CTR implementation into the aes crate until the non-Copy unions get stabilized.

Wouldn't it be simpler to just stick with an enum for now? It bloats the size of an Aes*Ctr instance marginally, but to me that seems preferable to code duplication.

Note that the ctr crate still has an open PR to change the counter behavior (RustCrypto/stream-ciphers#75), so if we copy and paste it, that becomes three things that need updated (the ctr crate, a copy-and-paste version of it in aes, and the SSSE3-optimized implementation)

@tarcieri
Copy link
Member Author

tarcieri commented Dec 1, 2020

Alright, this is strange: just tried to re-enable cross-based tests and it's now getting linking errors. I confirmed this isn't an environmental issue (#213), so it seems like something legitimately broke the build.

Note: this specifically occurs when force-soft is enabled

ARM64

https://github.com/RustCrypto/block-ciphers/pull/208/checks?check_run_id=1480139183

  = note: /usr/bin/ld: /home/runner/work/block-ciphers/block-ciphers/target/aarch64-unknown-linux-gnu/release/deps/aes-90b6e8d35dd18b1f.aes.bh0y2qek-cgu.0.rcgu.o: Relocations in generic ELF (EM: 183)
          /home/runner/work/block-ciphers/block-ciphers/target/aarch64-unknown-linux-gnu/release/deps/aes-90b6e8d35dd18b1f.aes.bh0y2qek-cgu.0.rcgu.o: error adding symbols: File in wrong format
          collect2: error: ld returned 1 exit status

PPC32

https://github.com/RustCrypto/block-ciphers/pull/208/checks?check_run_id=1480139204

= note: /usr/bin/ld: cannot find Scrt1.o: No such file or directory
          /usr/bin/ld: cannot find crti.o: No such file or directory
          /usr/bin/ld: /home/runner/work/block-ciphers/block-ciphers/target/powerpc-unknown-linux-gnu/release/deps/aes-6dc3112c80dcf8ff.aes.7b1gqfkf-cgu.0.rcgu.o: Relocations in generic ELF (EM: 20)
          /usr/bin/ld: /home/runner/work/block-ciphers/block-ciphers/target/powerpc-unknown-linux-gnu/release/deps/aes-6dc3112c80dcf8ff.aes.7b1gqfkf-cgu.0.rcgu.o: Relocations in generic ELF (EM: 20)
          /usr/bin/ld: /home/runner/work/block-ciphers/block-ciphers/target/powerpc-unknown-linux-gnu/release/deps/aes-6dc3112c80dcf8ff.aes.7b1gqfkf-cgu.0.rcgu.o: Relocations in generic ELF (EM: 20)
          /usr/bin/ld: /home/runner/work/block-ciphers/block-ciphers/target/powerpc-unknown-linux-gnu/release/deps/aes-6dc3112c80dcf8ff.aes.7b1gqfkf-cgu.0.rcgu.o: Relocations in generic ELF (EM: 20)
          /home/runner/work/block-ciphers/block-ciphers/target/powerpc-unknown-linux-gnu/release/deps/aes-6dc3112c80dcf8ff.aes.7b1gqfkf-cgu.0.rcgu.o: error adding symbols: File in wrong format
          collect2: error: ld returned 1 exit status

tarcieri added a commit that referenced this pull request Dec 1, 2020
It's breaking cross-based builds. See:

#208 (comment)
tarcieri added a commit that referenced this pull request Dec 1, 2020
It's breaking cross-based builds. See:

#208 (comment)
tarcieri added a commit that referenced this pull request Dec 1, 2020
It's breaking cross-based builds. See:

#208 (comment)
@tarcieri tarcieri force-pushed the aes/autodetect branch 2 times, most recently from 61ce5e4 to 13f2968 Compare December 1, 2020 13:58
@tarcieri
Copy link
Member Author

tarcieri commented Dec 1, 2020

Okay, managed to get the test suite green again with the cross tests running. Strangely they still fail with --all-features, though I guess that wasn't tested before.

Will try to get force-soft working again.

@tarcieri tarcieri force-pushed the aes/autodetect branch 4 times, most recently from ac003cc to 7582d0f Compare December 1, 2020 14:25
@tarcieri
Copy link
Member Author

tarcieri commented Dec 1, 2020

Seems force-soft was fine and I just screwed up the CI config.

I added it back and cross-based tests are now passing.

Does anyone knows if there are CPUs which have AES-NI, but not SSSE3?

AES-NI was introduced in Westmere which also had SSSE3, so I think it's fairly safe to assume any AES-NI capable CPU also has SSSE3.

@newpavlov do you see anywhere that needs to be annotated with target_feature(enable = "aes") that isn't already?

I did a quick spot check and it seems like relevant functions are either annotated with that or #[inline(always)], which I assume accomplishes the same thing?

On i686/x86_64 platforms, uses the `cpuid-bool` crate to detect at
runtime whether AES-NI is available.

This eliminates the need to specify `target_feature=+aes` when compiling
the crate in order to take advantage of AES-NI.
@tarcieri
Copy link
Member Author

tarcieri commented Dec 1, 2020

I'd say this is now ready to merge, so long as the enum-based solution for Aes*Ctr is at least temporarily acceptable until we can make it work with union somehow.

It seems like it might be worth waiting for non-Copy union types to stabilize, although I'm not sure what the plan/timeline is for that.

Copy link
Member

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disadvantage of the enum-based approach is that we will not be able to remove the fallback branches by enabling the necessary target features via RUSTFLAGS. But I guess it can be a good enough starting point and we can revisit it after deciding on what CTR flavors we want to expose in the ctr crate.

do you see anywhere that needs to be annotated with target_feature(enable = "aes") that isn't already?

It looks like you've missed annotations for CTR types and some load/store intrinsics stay outside of the enabled functions. With some code shuffling AES-128 parallel encryption improved on my PC from 8 GB/s to 10.6 GB/s. I will try to submit a separate PR with improvements, so I think we can merge this PR as is.

BlockCipher, BlockDecrypt, BlockEncrypt, NewBlockCipher,
};

cpuid_bool::new!(aes_cpuid, "aes");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We either need to add ssse3 to this list or use a separate aes_ssse3_cpuid module as you did before. Also it may be worth to a comment that SSE2 is implied by AES-NI, so we don't need to check for it separately. Or we could simply add sse2 to the list to be completely thorough. :)

Copy link
Member Author

@tarcieri tarcieri Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comments above, as far as I can tell every CPU with AES-NI has both SSE2 and SSSE3.

AES-NI was introduced in the Westmere architecture, which also has SSSE3. Unless there's some strange AMD/other CPU that has AES-NI but not SSSE3, I don't think it will be a problem.

If you're worried though, I can add back aes_ssse3_cpuid (perhaps as a separate PR).

Copy link
Member

@newpavlov newpavlov Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per reference we can omit sse2 check if we have verified that aes or ssse3 is enabled, but it does not indicate any implicit dependency between aes and ssse3. So even though in practice there is probably no such CPU, for our code to be correct we have to check both those features. The check is cheap enough and will be executed only once, so I think it's worth to be extra-safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok, sorry just merged but I can follow up with this.

@tarcieri
Copy link
Member Author

tarcieri commented Dec 2, 2020

@newpavlov I took a look into work on supporting union types with non-Copy fields in rustc. I think this is the relevant tracking issue?

rust-lang/rust#55149

And this PR appears to stabilize what we'd need, namely:

rust-lang/rust#77547

It will be possible to have a union with field type ManuallyDrop for any T.

So I think this problem may get addressed soon, after which we can migrate the enum to a union without having to worry about Copy workarounds.

@tarcieri tarcieri merged commit 61cd5de into master Dec 2, 2020
@tarcieri tarcieri deleted the aes/autodetect branch December 2, 2020 15:59
@tarcieri
Copy link
Member Author

tarcieri commented Dec 2, 2020

I just double checked: ManuallyDrop will be stable in 1.49 🎉

@tarcieri tarcieri mentioned this pull request Apr 29, 2021
tarcieri added a commit that referenced this pull request May 9, 2021
It appears that `fixslice64.rs` was accidentally overwritten with
`fixslice32.rs` in #208.

This restores the 64-bit implementation to what it was prior to that PR.

Closes #246.
tarcieri added a commit that referenced this pull request May 9, 2021
It appears that `fixslice64.rs` was accidentally overwritten with
`fixslice32.rs` in #208.

This restores the 64-bit implementation to what it was prior to that PR.

Closes #246.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aes AES implementations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AES implementation should be chosen at runtime rather than compile time
4 participants