Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split source files to support Base Implementation + SIMD implementation #461

Merged
merged 223 commits into from
Aug 17, 2017

Conversation

noloader
Copy link
Collaborator

@noloader noloader commented Aug 17, 2017

This pull request has been on JW's testing branch for some time. It takes the "single source" and breaks it into a "base implementation" which is standard C++, and a "SIMD implementation" which includes different ISAs.

The base implementation can be built with minimal or no flags. For example, -march=i686 or -march=x86-64. The SIMD implementation will be built with architecture specific flags, like -msha or -march=armv8-a+crypto when compiling sha-simd.cpp.

The GNUmakefile will add the flags automatically. Other build systems, like Cmake may need to be modified. Its OK to specify -march=native, and in fact the makefile still uses this strategy unless the user disables it. However, the makefile always adds the architecture specific flag to a source file when needed to ensure a source file is compiled correctly.

We avoided GCC function multi-versioning because we needed to support further back than GCC 6 for x86 and GCC 7 for ARM. Most of my ARM gadgets have GCC 4.8 and 4.9, so we wanted support for them out of the box. We also needed to support other compilers, like SunCC.

noloader added 30 commits July 29, 2017 00:24
We only need to base it on the compiler in config.h. config.h activates the code path guarded by HasNEON(). The source file that actially provides the NEON implementation will be compiled with -fpu=neon or -march=armv8-a.
Since we are providing the specialized implementation in a sequestered source file (and not a header file), we can probably avoid the defines like CRYPTOPP_ARM_NEON_AVAILABLE altogether.
cryptlib.lib(aria.obj) : error LNK2001: unresolved external symbol "unsigned int const * const CryptoPP::ARIATab::X2" (?X2@ARIATab@CryptoPP@@3QBIB) [C:\projects\cryptopp\cryptest.vcxproj]
cryptlib.lib(aria-simd.obj) : error LNK2001: unresolved external symbol "unsigned int const * const CryptoPP::ARIATab::X2" (?X2@ARIATab@CryptoPP@@3QBIB) [C:\projects\cryptopp\cryptest.vcxproj]
...
Thanks to Botan for providing these
Clang causes too many problems. Early versions of the compiler simply crashes. Later versions of the compiler still have trouble with Intel ASM and still produce incorrect results on occassion. Additionally, we have to special case the integrated assemvler. Its making a mess of the code and causing self test failures
When converting to split-sources, we disgorged ReverseHashBufferIfNeeded from Intel CLMUL and ARM PMULL operations. The problem is, they are linked. The only time a buffer needs reversing is when CLMUL or PMULL is in effect.
However, we made GCM_ReverseHashBufferIfNeeded_CLMUL and GCM_ReverseHashBufferIfNeeded_PMULL available wheneever SSSE3 or NEON was available, which was incorrect. They should only be used when CLMUL or PMULL is being used
Formerly the ARM code favored CPU probes with SIGILLs. We've found its ineffiient on most platforms and dangerous on Apple platforms. This commit splits feature probes into CPU_QueryXXX(), which asks the OS if a feature is present. The detection code then falls back to CPU_ProbeXXX() using SIGILLs as a last resort.
@noloader noloader merged commit e2c377e into weidai11:master Aug 17, 2017
noloader referenced this pull request Aug 17, 2017
In the bigger picture, the code to use inline ASM when intrinsics are not available still needs to be checked-in. Its a big change since we moved into SSE4, AVX and SHA. Design changes are still being evaluated, and its still being tested.
mouse07410 pushed a commit to mouse07410/cryptopp that referenced this pull request Aug 18, 2017
…on (GH weidai11#461)

Split source files to support Base Implementation + SIMD implementation
noloader referenced this pull request Aug 28, 2017
… in headers may not be 16-byte aligned because the architecture switch is present on the simd file, and not the base file.

16-byte aligned is the default for most systems nowadays, so we side stepped alignment problems on all platforms except 32-bit Solaris. We need the 16-byte alignment for all Intel compatibles since the late 1990s, which is nearly all processors in the class.

The worst case is, if a processor lacks SSE2, then it gets an aligned SecBlock anyways. The last time we saw processors without the features was 486 and early Pentiums, and that was 1996 or so. Even low-end processors like Intel Atoms and VIA have SSE2+SSSE3.

Also see "Enable 16-byte alignment full-time for i386 and x86_64?" (https://groups.google.com/forum/#!topic/cryptopp-users/ubp-gFC1BJI) for a discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant