Split source files to support Base Implementation + SIMD implementation #461

noloader · 2017-08-17T16:31:43Z

This pull request has been on JW's testing branch for some time. It takes the "single source" and breaks it into a "base implementation" which is standard C++, and a "SIMD implementation" which includes different ISAs.

The base implementation can be built with minimal or no flags, for example -march=i686 or -march=x86-64. The SIMD implementation is built with architecture-specific flags, like -msha or -march=armv8-a+crypto when compiling sha-simd.cpp.

The GNUmakefile will add the flags automatically. Other build systems, like CMake, may need to be modified. It's OK to specify -march=native, and in fact the makefile still uses this strategy unless the user disables it. However, the makefile always adds the architecture-specific flag to a source file when needed to ensure the source file is compiled correctly.
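The split can be sketched in a single file for illustration. In the layout described above the two halves would live in separate files, and only the SIMD file would receive the -msse4.2 flag. The function names (CRC32C_Base, CRC32C_SSE42, HasSSE42, CRC32C) are hypothetical and are not taken from the library, and the runtime check assumes a GCC-compatible compiler.

```cpp
// Single-file sketch of a base + SIMD split with runtime dispatch.
// All function names are hypothetical, chosen for this illustration only.
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE4_2__)
# include <nmmintrin.h>   // _mm_crc32_u8
#endif

// "Base" half: standard C++, builds with minimal or no flags.
std::uint32_t CRC32C_Base(const unsigned char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)   // reflected Castagnoli polynomial
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// "SIMD" half: only compiled when the ISA flag (e.g. -msse4.2) is supplied.
#if defined(__SSE4_2__)
std::uint32_t CRC32C_SSE42(const unsigned char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
        crc = _mm_crc32_u8(crc, data[i]);
    return ~crc;
}
#endif

// Runtime check. Assumption: a GCC-compatible compiler; otherwise report false.
bool HasSSE42()
{
#if defined(__SSE4_2__) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return false;
#endif
}

// Dispatcher in the base half: use the SIMD routine only if it was built
// and the running CPU reports the feature.
std::uint32_t CRC32C(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
#if defined(__SSE4_2__)
    if (HasSSE42())
        return CRC32C_SSE42(data, len);
#endif
    return CRC32C_Base(data, len);
}
```

Built without any flags, the dispatcher always takes the base path. Built with -msse4.2, it selects the intrinsic path when the CPU supports it, and both paths return the same value.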

We avoided GCC function multi-versioning because we needed to support further back than GCC 6 for x86 and GCC 7 for ARM. Most of my ARM gadgets have GCC 4.8 and 4.9, so we wanted support for them out of the box. We also needed to support other compilers, like SunCC.

Also see https://groups.google.com/forum/#!topic/cryptopp-users/-1fZCx8JSRE

We only need to base it on the compiler in config.h. config.h activates the code path guarded by HasNEON(). The source file that actually provides the NEON implementation will be compiled with -mfpu=neon or -march=armv8-a. Since we are providing the specialized implementation in a sequestered source file (and not a header file), we can probably avoid defines like CRYPTOPP_ARM_NEON_AVAILABLE altogether.

cryptlib.lib(aria.obj) : error LNK2001: unresolved external symbol "unsigned int const * const CryptoPP::ARIATab::X2" (?X2@ARIATab@CryptoPP@@3QBIB) [C:\projects\cryptopp\cryptest.vcxproj]
cryptlib.lib(aria-simd.obj) : error LNK2001: unresolved external symbol "unsigned int const * const CryptoPP::ARIATab::X2" (?X2@ARIATab@CryptoPP@@3QBIB) [C:\projects\cryptopp\cryptest.vcxproj]
...

Clang causes too many problems. Early versions of the compiler simply crash. Later versions still have trouble with Intel ASM and still produce incorrect results on occasion. Additionally, we have to special-case the integrated assembler. It's making a mess of the code and causing self-test failures.

When converting to split sources, we separated ReverseHashBufferIfNeeded from the Intel CLMUL and ARM PMULL operations. The problem is that they are linked: the only time a buffer needs reversing is when CLMUL or PMULL is in effect. However, we made GCM_ReverseHashBufferIfNeeded_CLMUL and GCM_ReverseHashBufferIfNeeded_PMULL available whenever SSSE3 or NEON was available, which was incorrect. They should only be used when CLMUL or PMULL is being used.
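A minimal sketch of the corrected condition follows. The CpuFeatures struct, the helper names, and the plain 16-byte reversal are assumptions made for illustration; the library's actual routines and buffer layout may differ.

```cpp
// Sketch: the reversal is keyed on carryless multiply being in use,
// not on SSSE3 (or NEON) merely being available.
// Names and the 16-byte full reversal are assumptions for illustration.
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using HashBuffer = std::array<std::uint8_t, 16>;

// Hypothetical feature flags, as a runtime detection step might report them.
struct CpuFeatures {
    bool ssse3;   // byte shuffle available
    bool clmul;   // carryless multiply available
};

// The multiplier that requires a reversed buffer is only selected with CLMUL.
bool UsesCarrylessMultiply(const CpuFeatures& cpu)
{
    return cpu.clmul;
}

void ReverseHashBufferIfNeeded(HashBuffer& buf, const CpuFeatures& cpu)
{
    // Sketch assumption: a CLMUL code path also relies on SSSE3 for shuffles.
    assert(!cpu.clmul || cpu.ssse3);

    // The mistaken version would test cpu.ssse3 here.
    if (UsesCarrylessMultiply(cpu))
        std::reverse(buf.begin(), buf.end());
}
```

With SSSE3 present but CLMUL absent, the buffer is left untouched, which is the case the earlier condition got wrong.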

Formerly the ARM code favored CPU probes with SIGILLs. We've found it's inefficient on most platforms and dangerous on Apple platforms. This commit splits feature detection into CPU_QueryXXX(), which asks the OS if a feature is present, and CPU_ProbeXXX(), which uses SIGILLs. The detection code calls CPU_QueryXXX() first and falls back to CPU_ProbeXXX() only as a last resort.
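The query-then-probe ordering can be sketched as below. FeatureDetector and DetectFeature are hypothetical names, and the two callbacks stand in for an OS query and a SIGILL-based probe. Treating a false query result as "not confirmed, so probe" is an assumption of this sketch.

```cpp
// Sketch of query-first, probe-as-last-resort feature detection.
// FeatureDetector and DetectFeature are hypothetical names.
#include <cassert>
#include <functional>

struct FeatureDetector {
    std::function<bool()> query;   // stand-in for asking the OS
    std::function<bool()> probe;   // stand-in for executing the instruction
                                   // under a SIGILL handler
};

// Short-circuit evaluation guarantees the probe never runs
// when the OS query already confirmed the feature.
bool DetectFeature(const FeatureDetector& detector)
{
    assert(detector.query && detector.probe);
    return detector.query() || detector.probe();
}
```

Because `||` short-circuits, the costly or risky probe is skipped on any platform where the OS can answer directly.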

In the bigger picture, the code to use inline ASM when intrinsics are not available still needs to be checked in. It's a big change since we moved into SSE4, AVX and SHA. Design changes are still being evaluated, and it's still being tested.

… in headers may not be 16-byte aligned because the architecture switch is present on the SIMD file, and not the base file. 16-byte alignment is the default on most systems nowadays, so we sidestepped alignment problems on all platforms except 32-bit Solaris. We need 16-byte alignment for all Intel compatibles since the late 1990s, which is nearly every processor in the class. In the worst case, a processor that lacks SSE2 gets an aligned SecBlock anyway. The last processors we saw without these features were the 486 and early Pentiums, around 1996. Even low-end processors like Intel Atoms and VIA have SSE2+SSSE3. Also see "Enable 16-byte alignment full-time for i386 and x86_64?" (https://groups.google.com/forum/#!topic/cryptopp-users/ubp-gFC1BJI) for a discussion.
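To make the idea of full-time 16-byte alignment concrete, here is a small sketch of a buffer that is always 16-byte aligned regardless of the flags its translation unit was compiled with. AlignedBytes16 and IsAligned16 are hypothetical names; this is not the library's SecBlock or its allocator, only an illustration of the property being discussed.

```cpp
// Sketch: a byte buffer that is always 16-byte aligned, independent of
// per-file architecture flags. Not the library's SecBlock; names are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

class AlignedBytes16 {
public:
    explicit AlignedBytes16(std::size_t size)
        : m_storage(size + 15), m_size(size), m_ptr(nullptr)
    {
        // Over-allocate by 15 bytes, then advance to the next 16-byte boundary.
        void* p = m_storage.data();
        std::size_t space = m_storage.size();
        m_ptr = static_cast<unsigned char*>(std::align(16, size, p, space));
        assert(m_ptr != nullptr);
    }

    // The aligned pointer refers into m_storage, so copying is disabled.
    AlignedBytes16(const AlignedBytes16&) = delete;
    AlignedBytes16& operator=(const AlignedBytes16&) = delete;

    unsigned char* data() { return m_ptr; }
    std::size_t size() const { return m_size; }

private:
    std::vector<unsigned char> m_storage;
    std::size_t m_size;
    unsigned char* m_ptr;
};

// True when the address is a multiple of 16.
bool IsAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The cost on a processor without SSE2 is at most 15 wasted bytes per buffer, which matches the worst case described above.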

noloader added 30 commits July 29, 2017 00:24

Cut-in CRC test for SSE4.2 and ARMv8a

fe9e21d

Also see https://groups.google.com/forum/#!topic/cryptopp-users/-1fZCx8JSRE

Fix define/include

368f344

Move CRC32 probe code from cpu.cpp to crc-simd.cpp

3e74968

Cut-in SHA for Intel and ARMv8a

d5a6d8f

Update TestScripts/cryptest.sh. Rename X86_SHA256_HashBlocks → SHA256…

fd4c754

…_HashBlocks_SSE2

Remove duplicate test from cryptest.sh

61691dd

Add ARIA, BLAKE2 and SHA support for ARMv7, ARMv8 and Intel

8338d90

Removed stray XXX in blake2-simd.cpp

4b51ead

Fixed ARMv7a and NEON detection. Initial cut-in of GCM

b4f6882

Fix ARIA under SSSE3

5e9e228

Fix ARIA under SSSE3

24fa16d

Add ariatab.cpp

6576bc3

Fix GCM under SSSE3 and CLMUL

a495018

Fix ARM build under Windows Phone

1fdd08d

Fix Aarch64 build. Cleanup Windows build

a846232

Fix Aarch64 build. Cleanup Windows build

2b9319c

Fix Intel SHA code path activation

205e116

Update comments

48f46bb

Add GCM_SetKeyWithoutResync_PMULL

6145d52

Const-ify hashKey

eafdae9

Fix ARMv7

51cff62

Update test script

9159992

Cleanup ARMv7 and ARMv8

9d8a892

Fix missing GCM_ReverseHashBufferIfNeeded_NEON under NEON

e06c156

Consistently use _ARMV8 as Aarch32/Aarch64 function suffix

249a5ed

Sync with Upstream master

475232a

Sync with Upstream master

2a17350

Initial Rijndael cut-in

87e7b85

noloader added 18 commits August 16, 2017 11:21

Suppress C4251 and C4275 warnings in project files (Issue 412)

b0baf7c

Sync with Upstream master

745edc3

Add SHACAL2 optimizations

edad2cc

Thanks to Botan for providing these

Sync with Upstream master

86ff697

Sync with Upstream master

df178bd

Update comments

371ec39

Sync with Upstream master

fb5e731

Sync with upstream master

e4cadb5

Fix runtime check for GCM_ReverseHashBufferIfNeeded_PMULL

8bbcad3

Sync with Upstream master

1cc963f

Remove ios-tv from allow_failures

d04bcf1

Reorder cpu features

1fd5b7a

Update CPU code for Aarch32 and Aarch64

d31c991

Update CpuId to take leaf function number

68c7726

Sync with Upstream master

24e1d30

noloader merged commit e2c377e into weidai11:master Aug 17, 2017

This was referenced Aug 17, 2017

Add Carryless Multiply intrinsics when __PCLMUL__ is not defined #430

Closed

Add AES intrinsics when __AES__ is not defined #429

Closed

Add CRC intrinsics when __CRC__ is not defined #428

Closed

Add SHA intrinsics when __SHA__ is not defined #427

Closed

mouse07410 pushed a commit to mouse07410/cryptopp that referenced this pull request Aug 18, 2017

Split source files to support Base Implementation + SIMD implementati…

5272744

…on (GH weidai11#461) Split source files to support Base Implementation + SIMD implementation

noloader mentioned this pull request Aug 21, 2017

Random crashes on different computers because option -march=native is active 'by default' #380

Closed

noloader added a commit that referenced this pull request Aug 24, 2017

Support Base Implementation + SIMD implementation on Solaris (PR #461)

5c6a32b

noloader added a commit that referenced this pull request Aug 25, 2017

Support Base Implementation + SIMD implementation in cryptest.nmake …

2651de2

…(PR #461)

noloader mentioned this pull request Sep 20, 2017

Unit-test: fatal error in "GoodSubscription": signal: illegal opcode; address of failing instruction: 0x00409d78 monero-project/kovri#699

Closed
