Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64 implementation #90

Merged
merged 6 commits into from
Jul 13, 2021
Merged

AArch64 implementation #90

merged 6 commits into from
Jul 13, 2021

Conversation

liamwhite
Copy link
Contributor

Fixes #84. Not particularly polished.

Passes all tests:

running 42 tests
test fallback_hash::tests::test_conversion ... ok
test aes_hash::tests::test_conversion ... ok
test aes_hash::tests::test_sanity ... ok
test fallback_hash::tests::test_hash ... ok
test hash_quality_test::aes_tests::aes_all_bytes_matter ... ok
test aes_hash::tests::test_hash ... ok
test hash_quality_test::aes_tests::aes_finish_is_consistant ... ok
test hash_map::test::test_borrow ... ok
test hash_quality_test::aes_tests::aes_input_affect_every_byte ... ok
test hash_quality_test::aes_tests::aes_keys_change_output ... ok
test hash_quality_test::aes_tests::aes_keys_affect_every_byte ... ok
test hash_quality_test::aes_tests::aes_single_key_bit_flip ... ok
test hash_quality_test::aes_tests::test_single_bit_in_byte ... ok
test hash_quality_test::fallback_tests::fallback_all_bytes_matter ... ok
test hash_quality_test::fallback_tests::fallback_finish_is_consistant ... ok
test hash_quality_test::fallback_tests::fallback_input_affect_every_byte ... ok
test hash_quality_test::fallback_tests::fallback_keys_affect_every_byte ... ok
test hash_quality_test::fallback_tests::fallback_keys_change_output ... ok
test hash_quality_test::aes_tests::aes_single_bit_flip ... ok
test hash_quality_test::aes_tests::aes_test_no_pair_collisions ... ok
test hash_quality_test::fallback_tests::fallback_single_key_bit_flip ... ok
test hash_quality_test::fallback_tests::fallback_single_bit_flip ... ok
test operations::test::test_shuffle_contains_each_value ... ok
test hash_quality_test::fallback_tests::fallback_test_no_pair_collisions ... ok
test hash_quality_test::aes_tests::aes_padding_doesnot_collide ... ok
test operations::test::test_shuffle_moves_high_bits ... ok
test operations::test::test_shuffle_moves_every_value ... ok
test random_state::test::test_unique ... ok
test random_state::test::test_not_pi ... ok
test random_state::test::test_with_seeds_const ... ok
test specialize::test::test_input_processed ... ok
test specialize::test::test_ref_independent ... ok
test specialize::test::test_specialized_invoked ... ok
test test::test_ahasher_construction ... ok
test test::test_builder ... ok
test test::test_default_builder ... ok
test test::test_conversion ... ok
test hash_quality_test::fallback_tests::fallback_padding_doesnot_collide ... ok
test test::test_non_zero ... ok
test test::test_non_zero_specialized ... ok
test hash_quality_test::fallback_tests::fallback_test_no_full_collisions ... ok
test hash_quality_test::aes_tests::ase_test_no_full_collisions ... ok

test result: ok. 42 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.04s

Benchmark (Qualcomm Snapdragon 730G):

aeshash/u8              time:   [3.2561 ns 3.2563 ns 3.2564 ns]                        
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
aeshash/u16             time:   [3.2557 ns 3.2560 ns 3.2564 ns]                         
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe
aeshash/u32             time:   [3.2558 ns 3.2560 ns 3.2562 ns]                         
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
aeshash/u64             time:   [3.2557 ns 3.2571 ns 3.2599 ns]                         
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe
aeshash/u128            time:   [940.45 ps 944.50 ps 948.68 ps]                          
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string          time:   [199.92 ns 200.29 ns 200.65 ns]                           
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

... as compared to fallback implementation on 730G:

fallback/u8             time:   [3.2571 ns 3.2574 ns 3.2576 ns]                         
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe
fallback/u16            time:   [3.2569 ns 3.2570 ns 3.2572 ns]                          
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
fallback/u32            time:   [3.2572 ns 3.2575 ns 3.2577 ns]                          
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
fallback/u64            time:   [3.2573 ns 3.2591 ns 3.2630 ns]                          
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
fallback/u128           time:   [913.23 ps 914.34 ps 915.56 ps]                           
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
fallback/string         time:   [346.18 ns 346.20 ns 346.22 ns]                            
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

(main timing difference appears in string benchmark)

@eadf
Copy link

eadf commented Jun 1, 2021

All tests pass on M1 Mac.

aeshash/u8              time:   [581.65 ps 581.77 ps 581.90 ps]                        
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low mild
  9 (9.00%) high mild
  7 (7.00%) high severe
aeshash/u16             time:   [581.67 ps 581.80 ps 581.95 ps]                         
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
aeshash/u32             time:   [583.43 ps 585.03 ps 587.65 ps]                         
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
aeshash/u64             time:   [585.17 ps 587.13 ps 589.45 ps]                         
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe
aeshash/u128            time:   [373.79 ps 374.13 ps 374.53 ps]                         
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
  14 (14.00%) high severe
aeshash/string          time:   [90.148 ns 90.166 ns 90.190 ns]                           
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

Edit: fallback comparison (thanks liamwhite)

fallback/u8             time:   [581.93 ps 582.21 ps 582.55 ps]                         
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe
fallback/u16            time:   [582.17 ps 582.45 ps 582.78 ps]                          
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/u32            time:   [582.06 ps 582.35 ps 582.70 ps]                          
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
fallback/u64            time:   [581.95 ps 582.20 ps 582.50 ps]                          
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
fallback/u128           time:   [373.65 ps 373.77 ps 373.91 ps]                          
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe
fallback/string         time:   [108.99 ns 109.06 ns 109.14 ns]                            
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

I'm getting a few warning: unused import:... warnings while compiling. But didn't investigate If any of those are added by this PR.

@liamwhite
Copy link
Contributor Author

@eadf comment this line to test the fallback implementation

bench_ahash,

@eadf
Copy link

eadf commented Jun 2, 2021

M1 Mac: passes all tests and aes benches, but only on +nightly (because of the #![feature(stdsimd)] in lib.rs)

@tkaitchuck tkaitchuck self-requested a review June 7, 2021 05:49
@tkaitchuck
Copy link
Owner

The #![feature(stdsimd)] will have to be put behind a nightly flag to get the build to pass.

There also is a second issue:
On x86 the aes instruction is defined strangely, it first does the sub-bytes, shift rows, and mix columns and then xors the key at the end. (This means the first round of the encryption requires a manual xor, and all the intermediate rounds use one instruction to complete the round and start the next). This is actually very unfortunate for aHash because it is much more convenient if the xor happens first, because it means an additional round is required. Currently the code is designed to work with this x86 way of doing things, so doing it the other way or arm invalidates my analysis of the correctness of the algorithm.

I will look through the code and see if there is any problem with this approach as-is. However this is also an opportunity, it ought to be possible to design a version which on arm invokes one less call to aes, but produces the same quality output.

In the meantime, can either of you run the smhasher test against it as a sanity test? There is a shell script checked into the repo in the smhasher directory. If you run clone_smhasher.sh from that directory, and then build the ahash-cbindings and run the install script
you should be able to just go into the smhasher directory and run ./build.sh && ./SMHasher ahash64

@tkaitchuck
Copy link
Owner

tkaitchuck commented Jun 7, 2021

In fact there is a problem caused by the ordering:
https://github.com/tkaitchuck/aHash/blob/master/src/aes_hash.rs#L86

Currently the length of a slice is added in at the beginning under the assumption that the first aes round will mix it BEFORE xoring with the first block of user data. On ARM this would not be the case, making it possible to force a collision with a 50% chance between two slices which are identical but differ by one of them being longer by one byte (which has a value of 0).

The most direct solution would be on ARM to have an alternative implementation of add_in_length:

        self.enc = aesenc(self.enc, length as u128);

This way the length is mixed in instead of simply adding it. If we want to avoid the additional aes call it would be possible to move the adding of the length to the end of write instead of the beginning. But in my experience performing additional instruction after the if statements which are common between branches causes sub-optimal assembly to be generated. You can try it and benchmark which is better.

@liamwhite
Copy link
Contributor Author

I wrote a test:

fn test_length_extension<T: Hasher>(hasher: impl Fn() -> T) {
    for i in 0..=255u8 {
        let v1 = hash(&[i], &hasher);
        let v2 = hash(&[i, 0], &hasher);
        assert!(v1 != v2, "i: {} -> v1 = {:x}, v2 = {:x}", i, v1, v2);
    }
}

mod aes_tests {
    #[test]
    fn aes_length_extension() {
        test_length_extension(|| AHasher::test_with_keys(0, 0));
    }
}

It passes on aarch64:

test hash_quality_test::aes_tests::aes_length_extension ... ok

(note that it fails the fallback test)

How should I change it to get the failure mode you describe?

@tkaitchuck
Copy link
Owner

The fallback is failing because you are calling test_with_keys instead of new_with_keys.
The code there is not quite what I intended. I'll post a test which I suspect may fail.

@tkaitchuck
Copy link
Owner

I was wrong above. There is no issue. The code is robust against what I was proposing there.
If we can fix the build issue this should be good to go.

@eadf
Copy link

eadf commented Jun 14, 2021

The name of the "nightly" feature feels wrong. If it was automatically enabled when using the +nightly toolkit it would be ok, but it isn't. To use it we have to build with +nightly and select the feature "nightly".

Wouldn't it be more descriptive to name it "stdsimd" - i.e. a feature that only can be used in nightly?

P.S. I just checked that #![feature(stdsimd)] and #[cfg(feature = "stdsimd"))] can use the same name.

@novacrazy
Copy link

Quite a few crates use a nightly flag, including hashbrown.

@eadf
Copy link

eadf commented Jun 14, 2021

Guess it comes down to the intent of the feature flag. If there are more (future) features that could make use of the same flag (like hash brown) it makes sense with a name like "nightly". For an isolated feature it would be better with a more descriptive name, imho.
That said, "stdsimd", probably isn't the best name for this flag.

And yes: M1 mac completes all tests on +stable and +nightly.
On +stable the aes bench complains about 'aes must be enabled' (as expected). But fallback bench works.
On +nightly aes and fallback benches works as before (if the flag is set accordingly).

@tkaitchuck
Copy link
Owner

Detection of nightly can be handled in the build without requiring users to specify a flag. See:
https://github.com/tkaitchuck/aHash/blob/master/build.rs#L8

Cargo.toml Outdated Show resolved Hide resolved
build.rs Outdated Show resolved Hide resolved
Copy link
Owner

@tkaitchuck tkaitchuck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if we could get the SMHasher results also.

Cargo.toml Outdated Show resolved Hide resolved
@eadf
Copy link

eadf commented Jun 20, 2021

M1 Mac, regarding SMHasher:

The patch 0001-Add-support-for-aHash.patch used by clone_smhasher.sh seems to be out of date, it does not apply at all.

git clone ... && cd smhasher && git apply ../0001-Add-support-for-aHash.patch

error: patch failed: CMakeLists.txt:638
error: CMakeLists.txt: patch does not apply
error: patch failed: Hashes.h:17
error: Hashes.h: patch does not apply
error: ahash.h: already exists in working directory
error: patch failed: main.cpp:302
error: main.cpp: patch does not apply

@tkaitchuck
Copy link
Owner

tkaitchuck commented Jun 25, 2021

I see the patch does not apply because smhasher directly added support for aHash!
So it should be possible to run directly.

@tkaitchuck
Copy link
Owner

@eadf What is the status of SMHasher? I don't quite follow from the linked PR.

@eadf
Copy link

eadf commented Jul 7, 2021

It does not build on a M1, the cmake script fails to identify the architecture and the compiler says that it does not support -march 'native'. But I don't have the time nor cmake knowledge to fix that.
However, I could help run a few tests if someone w/o access to a M1 wants to give it a go.

@tkaitchuck
Copy link
Owner

In that case I am going to merge this, and open an issue to track it.

@tkaitchuck tkaitchuck merged commit 8781819 into tkaitchuck:master Jul 13, 2021
@tkaitchuck tkaitchuck mentioned this pull request Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AES on ARM?
4 participants