Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of autoconf check for endianness #770

Closed

Conversation

real-or-random
Copy link
Contributor

This does not fix any particular issue but it's preferable to not
rely on autoconf. This avoids endianness mess for users on BE hosts
if they use their build without autoconf.

This does not fix any particular issue but it's preferable to not
rely on autoconf. This avoids endianness mess for users on BE hosts
if they use their build without autoconf.
@real-or-random
Copy link
Contributor Author

I guess we should merge #696 first.

@gmaxwell
Copy link
Contributor

9% slowdown in hash_rfc6979_hmac_sha256 on i.MX6 (arm7; gcc 4.9.2). Not needing autotools for target endianness detection sounds good but a slowdown on on a popular LE embedded target does not. endian.h perhaps?

@real-or-random
Copy link
Contributor Author

i.MX6 (arm7; gcc 4.9.2).

Hm, I had looked at the ASM for recent compilers:
https://godbolt.org/z/ndjvKs
Note that there's a large difference between GCC 4 and newer versions, with and without -mcpu=cortex-a9.

Maybe I'm ignorant but my response would be that I don't care about the performance with an age-old GCC 4. Is GCC 4.9 common for this target?

@real-or-random
Copy link
Contributor Author

Hashing performance will even less be a concern with a resolution of #702

@gmaxwell
Copy link
Contributor

It's pretty normal to have older (or even awful) compilers on embedded platforms, esp arm because devices usually depend on a vendor kernel that stops getting upgraded to new major versions and hangs around with old operating systems. On novena, the latest supported OS is debian jessie. The situation for toolchains you might find setup for hardware wallets and whatnot isn't that different a story.

If there is a choice between being fast on older compilers or new, sure pick new. Or between something much harder to maintain and a small slowdown on older compilers, sure. Bit 9% is pretty big, and a byteswap macro is about as idiomatic as you get.

Small embedded devices don't usually have hashing hardware so 702 doesn't really apply, unless you think that it's okay to just provide objectively worse stuff because the user can fix this libraries mistakes?

Besides, it appears to be a similar similar story elsewhere, if less dramatic. GCC 10.1.1 on PPC64LE:

Before: hash_rfc6979_hmac_sha256: min 5.96us / avg 5.96us / max 5.97us
After: hash_rfc6979_hmac_sha256: min 6.10us / avg 6.10us / max 6.10us

@real-or-random
Copy link
Contributor Author

real-or-random commented Jul 23, 2020

@gmaxwell You mentioned endian.h but this seems to be a portability mess, too. Or which header do you refer to?

Here's another idea: What about sticking to autoconf but also let it define a macro if it figures out little-endian. Then we can #error out at least (or try __BYTE_ORDER__ as a fallback, which is at least available on GCC/clang, and #error if that fails too). That's not very ergonomic but at least it errs on the side of caution, whereas we currently simply assume LE if autoconf was not used. Creating wrong hashes could end up terribly, e.g., if you create a schnorr sig with a user-specified nonce function, you could end up signing different hashes with the same nonce.

For enhanced safety, we could even add a cheap check in the context creation that checks at runtime if the endianness matches what it thinks it has been compiled for. Reminds me of my suggestion in #702.

@gmaxwell
Copy link
Contributor

Sounds okay to me. Also, erroring on a non-configuration makes sense. Portability is kinda complicated-- in the sense that a lot of things are in theory non-portable but in practice are fine (and a few things are in theory portable but in practice are not). I think things like BYTE_ORDER are reasonably widely supported. Anyone building for a really weird platform is going to be able to follow a clearly stated #error.

@real-or-random
Copy link
Contributor Author

Portability is kinda complicated-- in the sense that a lot of things are in theory non-portable but in practice are fine (and a few things are in theory portable but in practice are not).

I was specifically referring to endian.h not being available on Mac OS for example, which is a practical issue.

I think things like BYTE_ORDER are reasonably widely supported.

By the way, I found this hell yesterday: :D
https://gist.github.com/jtbr/7a43e6281e6cca353b33ee501421860c#file-endianness-h-L39

I guess the combination of __BYTE_ORDER__ (just this none, none of the weird stuff behind the link) and #error is good enough.

@elichai
Copy link
Contributor

elichai commented Jul 23, 2020

What do you think about something like this? https://godbolt.org/z/hfYa7K
It still has the conditional you were trying to avoid, but it doesn't rely on autotools.

@real-or-random
Copy link
Contributor Author

What do you think about something like this? https://godbolt.org/z/hfYa7K
It still has the conditional you were trying to avoid, but it doesn't rely on autotools.

Oh letting constant folding optimize the run-time away is really a neat idea. GCC 4.9 seems to do better here.

GCC 4.4.7 x86 and ARM GCC 4.5.4 without -mcpu=cortex-a9 still don't get this, but this seems good enough. Hey, even MSVC manages to optimize this... Or what do you think @gmaxwell ?

@gmaxwell
Copy link
Contributor

That looks like an extremely non-ideomatic way to do a byteswap and I wouldn't be particularly confident that some random compiler would handle it correctly, or that the optimization wouldn't randomly break when a butterfly flapped its wings in the amazon.

@elichai
Copy link
Contributor

elichai commented Jul 24, 2020

That looks like an extremely non-ideomatic way to do a byteswap and I wouldn't be particularly confident that some random compiler would handle it correctly, or that the optimization wouldn't randomly break when a butterfly flapped its wings in the amazon.

Yeah it's quite sketchy hehe, I first saw something like this in NIST submissions, it was way worse and had UB, it was something along these lines:

typedef enum {BIG, LITTLE, MIDDLE} Endianess;

Endianess Endian() {
    unsigned char arr[8];
    *(unsigned int*)arr = 0x01020304;
    if (arr[0] == 0x04) {
        return LITTLE;
    } else if (arr[0] == 0x01) {
        return BIG;
    } else {
        return MIDDLE;
    }
}

@sipa
Copy link
Contributor

sipa commented Jul 24, 2020

FWIW, it seems that literally every compiler on godbolt (including the obscure ones outside of clang/gcc/icc/msvc, and ones for less common platforms) supports __BYTE_ORDER (and it equalling either __BIG_ENDIAN or __LITTLE_ENDIAN).

Perhaps it's reasonable to just use that in non-autotools environments?

@real-or-random
Copy link
Contributor Author

real-or-random commented Jul 24, 2020

FWIW, it seems that literally every compiler on godbolt (including the obscure ones outside of clang/gcc/icc/msvc, and ones for less common platforms) supports __BYTE_ORDER (and it equalling either __BIG_ENDIAN or __LITTLE_ENDIAN).

Perhaps it's reasonable to just use that in non-autotools environments?

Nice, then let's just do this and #error if it's not defined. I'll update the PR.

@real-or-random
Copy link
Contributor Author

FWIW, it seems that literally every compiler on godbolt (including the obscure ones outside of clang/gcc/icc/msvc, and ones for less common platforms) supports __BYTE_ORDER (and it equalling either __BIG_ENDIAN or __LITTLE_ENDIAN).

I guess you meant __BYTE_ORDER__, that's apparently defined almost everywhere. A notable exception is MSVC of course... But we can test for this with _MSC_VER and it then offers _WIN32, which is "Defined as 1 when the compilation target is 32-bit ARM, 64-bit ARM, x86, or x64. Otherwise, undefined." https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=vs-2019

@real-or-random
Copy link
Contributor Author

Superseded by #787.

real-or-random added a commit that referenced this pull request Aug 13, 2020
…ianness

0dccf98 Use preprocessor macros instead of autoconf to detect endianness (Tim Ruffing)

Pull request description:

  This does not fix any particular issue but it's preferable to not
  rely on autoconf. This avoids endianness mess for users on BE hosts
  if they use their build without autoconf.

  The macros are carefully written to err on the side of the caution,
  e.g., we #error if the user manually configures a different endianness
  than what we detect.

  Supersedes #770 .

ACKs for top commit:
  sipa:
    ACK 0dccf98
  gmaxwell:
    ACK 0dccf98

Tree-SHA512: 6779458de5cb6eaef2ac37f9d4b8fa6c9b299f58f6e5b72f2b0d7e36c12ea06074e483acfb85085a147e0f4b51cd67d897f61a67250ec1cea284a0f7680eb2e8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants