Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile failures using ARM and ARM64 with Microsoft tools #776

Closed
noloader opened this issue Jan 3, 2019 · 5 comments
Closed

Compile failures using ARM and ARM64 with Microsoft tools #776

noloader opened this issue Jan 3, 2019 · 5 comments

Comments

@noloader
Copy link
Collaborator

noloader commented Jan 3, 2019

This is an open-ended report to track changes for Microsoft ARM and ARM64 compiles. Microsoft recently released their ARM64 compiler (part of Visual Studio 15.9), so we can now test a compile and link. I'm trying to get hold of an ASUS TP370QL for testing so we can actually run the test vectors.

Here's the first issue. A typical initialization is shown below. The problem is, it is using GCC extensions:

const uint32x4_t CTRS[3] = {
    {1,0,0,0}, {2,0,0,0}, {3,0,0,0}
};

And a compile results in:

cl.exe /nologo /W4 /D_MBCS /Zi /TP /GR /EHsc /DNDEBUG /D_NDEBUG /Oi /Oy /O2 /MT
/FI sdkddkver.h /FI winapifamily.h /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP /c c
hacha_simd.cpp

chacha_simd.cpp(306) : error C2078: too many initializers
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 12.0
\VC\BIN\x86_ARM\cl.exe"' : return code '0x2'
Stop.

Peter Cordes (@pcordes) provided a workaround at Error C2078 when initializing uint32x4_t on ARM? on Stack Overflow. It requires macros, but it avoids other problems with the compiler:

#if (CRYPTOPP_ARM_NEON_AVAILABLE)
# if defined(_MSC_VER)
#  define PACK32x4(w,x,y,z) { ((w) + (uint64_t(x) << 32)), ((y) + (uint64_t(z) << 32)) }
# else
#  define PACK32x4(w,x,y,z) { (w), (x), (y), (z) }
# endif
#endif  // Microsoft workaround
@pcordes
Copy link

pcordes commented Jan 3, 2019

Worth mentioning that this packing order is for little-endian ARM. If anybody ever uses ARM in big-endian mode, you'd need to left-shift w and y to the MSB of the int64, instead of x and z.

And BTW, this works because MSVC declares __n128 (aka uint32x4_t and all other 128-bit vector types) with the .n128_u64 field as the first member of the union. Probably not a bad idea to have a comment in there, because it looks insane.

@noloader
Copy link
Collaborator Author

noloader commented Jan 3, 2019

And from speck128_simd.cpp (simon128_simd.cpp has a similar failure):

cl.exe /nologo /W4 /D_MBCS /Zi /TP /GR /EHsc /DNDEBUG /D_NDEBUG /Oi /Oy /O2 /MT
/FI sdkddkver.h /FI winapifamily.h /DWINAPI_FAMILY=WINAPI_FAMILY_PHONE_APP /c
speck128_simd.cpp

speck128_simd.cpp(139) : error C3861: 'vld1q_dup_u64': identifier not found
speck128_simd.cpp(167) : error C3861: 'vld1q_dup_u64': identifier not found
speck128_simd.cpp(204) : error C3861: 'vld1q_dup_u64': identifier not found
speck128_simd.cpp(232) : error C3861: 'vld1q_dup_u64': identifier not found
NMAKE : warning U4010: 'speck128_simd.obj' : build failed; /K specified, continu
ing ...

According to arm_neon.h on GitHub, Microsoft provides vld1q_dup_u32 and lower, but not vld1q_dup_u64. Sigh...

#define vld1q_dup_u8(pcD)    ( __neon_Q1Adr( 0xf4a00c2f, __uint8ToN64_c(pcD)) )
#define vld1q_dup_u16(pcD)   ( __neon_Q1Adr( 0xf4a00c6f, __uint16ToN64_c(pcD)) )
#define vld1q_dup_u32(pcD)   ( __neon_Q1Adr( 0xf4a00caf, __uint32ToN64_c(pcD)) )

It looks like we need to provide it (kind of):

// Missing from Microsoft's implementation???
#if defined(_MSC_VER) && !defined(_M_ARM64)
inline uint64x2_t vld1q_dup_u64(const uint64_t* ptr)
{
	return vmovq_n_u64(*ptr);
}
#endif

@noloader
Copy link
Collaborator Author

noloader commented Jan 4, 2019

And another one for ARM64:

cl.exe /nologo /W4 /wd4231 /wd4511 /wd4156 /D_MBCS /Zi /TP /GR /EHsc /DNDEBUG /D
_NDEBUG /Oi /Oy /O2 /MT /FI sdkddkver.h /FI winapifamily.h /DWINAPI_FAMILY=WINAP
I_FAMILY_PHONE_APP /c integer.cpp
integer.cpp
integer.cpp(963): error C3861: '__emulu': identifier not found
integer.cpp(1262): error C3861: '__emulu': identifier not found
integer.cpp(1271): error C3861: '__emulu': identifier not found
integer.cpp(1280): error C3861: '__emulu': identifier not found
...

@noloader
Copy link
Collaborator Author

noloader commented Jan 4, 2019

And another one for ARM64:

        cl.exe /nologo /W4 /wd4231 /wd4511 /wd4156 /D_MBCS /Zi /TP /GR /EHsc /DN
DEBUG /D_NDEBUG /Oi /Oy /O2 /MT /FI sdkddkver.h /FI winapifamily.h /DWINAPI_FAMI
LY=WINAPI_FAMILY_PHONE_APP /c gcm_simd.cpp
gcm_simd.cpp
gcm_simd.cpp(129): error C2664: '__n128 neon_pmull_64(__n64,__n64)': cannot conv
ert argument 1 from 'unsigned __int64' to '__n64'
gcm_simd.cpp(131): note: No constructor could take the source type, or construct
or overload resolution was ambiguous
...

@noloader noloader closed this as completed Jan 5, 2019
@noloader noloader changed the title Compile failures using ARM NEON with Microsoft tools Compile failures using ARM and ARM64 with Microsoft tools Jan 5, 2019
@noloader
Copy link
Collaborator Author

noloader commented Jan 5, 2019

We cleared the issues resulting from the ARM64 compile. cryptest.exe compiled and linked.

I don't have a test machine to transfer the binary and run the test suite, so I am not sure if things will actually work. I'm trying to get hold of an ASUS TP370QL for testing so we can actually run the test vectors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants