Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename memczero #829

Closed
roconnor-blockstream opened this issue Oct 6, 2020 · 9 comments · Fixed by #835
Closed

Rename memczero #829

roconnor-blockstream opened this issue Oct 6, 2020 · 9 comments · Fixed by #835

Comments

@roconnor-blockstream
Copy link
Contributor

The list of reserved identifiers in C includes all functions names beginning with mem followed by a lowercase letter. We are using the reserved function name memczero, which is undefined behaviour.

@gmaxwell
Copy link
Contributor

gmaxwell commented Oct 6, 2020

ACK fixing this.

I would gauge that this is extremely unlikely to ever cause issues (esp, not silent issues); but the function name is a free choice so better to be pedantically correct.

Ideally the code would be checked by static analysis which is anal enough to catch things like this (some of which might be less benign than this one) and keeping this wrong would just cause the tool to produce uninteresting reports.

Reserved wildcard issues have been fixed before. E.g. #479

@jonasnick jonasnick changed the title memczero Rename memczero Oct 7, 2020
@elichai
Copy link
Contributor

elichai commented Oct 7, 2020

wtf, as much as UB in C is interesting this is crazy, I thought only __ prefix is UB. sigh.

@real-or-random
Copy link
Contributor

real-or-random commented Oct 7, 2020

If I understand C89 correctly, this is not an issue because memczero is static and as such has internal linkage and is not an "external name".

But the "mem..." identifiers are reserved as external names:
4.13: "All external names described below are reserved no matter what headers are included by the program. (...)"

3.1.2 defines "external name" to be "an identifier that has external linkage"
and
3.1.2.2: "If the declaration of an identifier for an object or a function has file scope and contains the storage-class specifier static, the identifier has internal linkage."

4.1.2: "If the program defines an external identifier with the same name as a reserved external identifier, even in a semantically equivalent form, the behavior is undefined."

The language is pretty imprecise though. C99 and C11 make this clearer by explicitly saying which identifiers are reserved in which contexts. The result is the same in our case.

C99, 7.3.1: "All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage."

That's enough language lawyering for today, even for my standards. No pun intended.

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Oct 7, 2020

C99, 7.3.1: "Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included." (emphasis mine)

My reading of this was that declaring the memczero function in any file that includes the <string.h> header is UB. But I find this wording hard to read, and I may be mistaken.

@real-or-random
Copy link
Contributor

Oh yes, I think you're right!

I guess let's just fix it instead of debating the standard (even though the latter may be more entertaining). :D

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Oct 7, 2020

Should we expand the scope of this issue?

https://github.com/bitcoin-core/secp256k1/blob/master/src/ecmult_impl.h#L71

Beginning a macro with E followed by a digit or an uppercase letter, ... undefined behaviour?

To be fair, I don't think we include <errno.h>.

@gmaxwell
Copy link
Contributor

gmaxwell commented Oct 8, 2020

wtf, as much as UB in C is interesting this is crazy, I thought only __ prefix is UB. sigh.

All UB really means is that the standard doesn't make any promises about what happens when your program does it.

mem* is reserved, so maybe a later standard/compiler makes a "memczero" a built-in function which happens to take the right number of arguments, and .. uh.. zeros all memory accessible to C. This would make you unhappy. But (1) they're unlikely to use that specific identifier, (2) if they did it would probably trigger warnings or failures to compile.

The whole thing where other kinds of UB causes extremely weird program behaviour is a product of optimization passes. E.g. Some optimization to unroll a loop can't work unless it knows that the loop counter can't wrap, and if it can't unroll it can't vectorize. Fortunately the counter is signed and so it can't wrap by definition. Compose a bunch of optimizations and you get stuff like the compiler automatically removing blocks of code because it "proved" that it could never get executed, much to the programmer's surprise. Without the optimizations the compiler's performance would be much poorer (and it is-- go compile this library with gcc 2.95, it's visibly slower).

Some identifier language lawyering is extremely unlikely to result in weirdo behaviour through any path like that, only through a genuine clash. But being pedantically wrong, even if is unlikely doesn't matter, makes it harder to use pedantic static analysis tools-- and also potentially sends an implicit signal to contributors that the authors here don't care about being pedantically correct. I'd even go a step further and say that if multiple experienced contributors think something might be verboten it should be avoided at least if avoiding it is essentially free: easy to review, harmless to performance, harmless to api compatibility, almost certain to not introduce bugs, like "pick a different function name". :) If we thought it was wrong, someone else will later.

real-or-random added a commit to real-or-random/secp256k1 that referenced this issue Oct 19, 2020
This add a simple static checker based on clang-query, which is a
tool that could be described as a "clever grep" for abstract syntax
trees (ASTs).

As an initial proof of usefulness, this commit adds these checks:
  - No uses of floating point types.
  - No use of certain reserved identifiers (e.g., "mem(...)", bitcoin-core#829).
  - No use of memcmp (bitcoin-core#823).

The checks are easily extensible.

The main purpose is to run the checker on CI, and this commit adds
the checker to the Travis CI script.

This currently requires clang-query version at least 10. (However,
it's not required not compile with clang version 10, or with clang
at all. Just the compiler flags must be compatible with clang.)
Clang-query simply uses the clang compiler as a backend for
generating the AST.

In order to determine the compile in which the code is supposed to be
compiled (e.g., compiler flags such as -D defines and include paths),
it reads a compilation database in JSON format. There are multiple ways
to generate this database. The easiest way to obtain such a database is
to use a tool that intercepts the make process and build the database.

On Travis CI, we currently use "bear" for this purpose. It's a natural
choice because there is an Ubuntu package for it. If you want to run
this locally, bear is a good choice but other tools such as compiledb
(Python) are available.
real-or-random added a commit to real-or-random/secp256k1 that referenced this issue Oct 22, 2020
This add a simple static checker based on clang-query, which is a
tool that could be described as a "clever grep" for abstract syntax
trees (ASTs).

As an initial proof of usefulness, this commit adds these checks:
  - No uses of floating point types.
  - No use of certain reserved identifiers (e.g., "mem(...)", bitcoin-core#829).
  - No use of memcmp (bitcoin-core#823).

The checks are easily extensible.

The main purpose is to run the checker on CI, and this commit adds
the checker to the Travis CI script.

This currently requires clang-query version at least 10. (However,
it's not required not compile with clang version 10, or with clang
at all. Just the compiler flags must be compatible with clang.)
Clang-query simply uses the clang compiler as a backend for
generating the AST.

In order to determine the compile in which the code is supposed to be
compiled (e.g., compiler flags such as -D defines and include paths),
it reads a compilation database in JSON format. There are multiple ways
to generate this database. The easiest way to obtain such a database is
to use a tool that intercepts the make process and build the database.

On Travis CI, we currently use "bear" for this purpose. It's a natural
choice because there is an Ubuntu package for it. If you want to run
this locally, bear is a good choice but other tools such as compiledb
(Python) are available.
@sipa sipa closed this as completed in e89278f Nov 4, 2020
sipa added a commit that referenced this issue Nov 4, 2020
…ify_t

1f4dd03 Typedef (u)int128_t only when they're not provided by the compiler (Tim Ruffing)
e89278f Don't use reserved identifiers memczero and benchmark_verify_t (Tim Ruffing)

Pull request description:

  As identified in #829 and #833. Fixes #829.

  Since we touch this anyway, this commit additionally makes the
  identifiers in the benchmark files a little bit more consistent.

  This is necessary before we can merge #833. I preferred a separate PR because it makes it easier to see the results of Travis in #833.

ACKs for top commit:
  sipa:
    utACK 1f4dd03
  jonasnick:
    ACK 1f4dd03

Tree-SHA512: c0ec92798f3c94f3ef6ac69b3f0f39a39257a32be9d9a068832cece1ebe64c89848b70e44652fc397004b8b240883ac4bc0c8f95abbe4ba4b028de120e6734bf
@roconnor-blockstream
Copy link
Contributor Author

Should I open a new issue for macros beginning with E?

@real-or-random
Copy link
Contributor

Should I open a new issue for macros beginning with E?

I think a PR would also work. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants