Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix undefined left shift of negative value. #255

Closed
wants to merge 1 commit into from

Conversation

bryant
Copy link

@bryant bryant commented Oct 7, 2015

Left-shifting signed negative values is undefined behavior. From 6.5.7 of http://open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf:

If E1 has a signed type and nonnegative value,
and E1 × 2 E2 is representable in the result type,
then that is the resulting value; otherwise, the
behavior is undefined.

This patch also has the pleasant side effect of permitting build with clang.

@bobot
Copy link
Contributor

bobot commented Oct 7, 2015

Well done. Recently I had to do the same fix in order to verify a C stub with Frama-c Value (ie Value found this undefined behavior), but I forgot to send the fix. Thank you.

Just by curiosity what is the message given by Clang?

@damiendoligez
Copy link
Member

OK for merging.

@gasche
Copy link
Member

gasche commented Oct 8, 2015

@damiendoligez just a reminder, I would be interested in knowing the fate of #243 (I have done a few merges on top of it, that I thus haven't committed to trunk yet).

@damiendoligez
Copy link
Member

@gasche #243 is ready, I'll merge it today.

@chrismamo1
Copy link
Contributor

While shifting by a negative value is technically undefined, most CPU's don't even consider the sign when shifting. For instance, when performing a shift operation on a 32-bit integer, Intel chips only consider the five least significant bits in the second operand. For example, on a 32-bit Intel machine

x << (-1)

is equivalent to

x << 31

because their binary representations are identical in the five least-significant bits.

I doubt that very much OCaml code exploits this (I would hope none) but this update could quietly break any code that does.

@bryant
Copy link
Author

bryant commented Oct 8, 2015

@chrismamo1 'tis the value that is being shifted that is negative that is the problem.

@chrismamo1
Copy link
Contributor

@bryant Why should this pose a problem? Could you provide an example of this reducing in undesirable behaviour which is fixed by this PR?

@bobot
Copy link
Contributor

bobot commented Oct 10, 2015

@chrismamo1 the problem is just compliance with the definition of the C language. Most compilers will compile the code as it is fixed. However a compiler could assume that the undefined behaviors defined in C don't occur, in this case that there is no negative value.

@bryant
Copy link
Author

bryant commented Oct 12, 2015

@bobot Here's how it was discovered:

$ clang --version
clang version 3.7.0 (git://github.com/llvm-mirror/clang 40b68b4c02b9d9e1e4138815747adf5589496240) (git://github.com/llvm-mirror/llvm 17611b180b77406671cf0819a452dd128ce7481c)
Target: x86_64-unknown-linux-gnu
Thread model: posix

$ git log --pretty=oneline -1 | cat
c59bce11a5528d78d2177660a14d7f0a489b1407 amend commit 16486:  - add missing `./` in ocamldoc invocation  - change ifeq syntax style

$ ./configure -prefix `pwd` -no-curses -cc "clang -O3" && make world.opt

[...]

compare.c:273:12: error: shifting a negative signed value is undefined
      [-Werror,-Wshift-negative-value]
    return Val_int(LESS);
           ^~~~~~~~~~~~~
./caml/mlvalues.h:75:20: note: expanded from macro 'Val_int'
#define Val_int(x) Val_long(x)
                   ^~~~~~~~~~~
./caml/mlvalues.h:71:39: note: expanded from macro 'Val_long'
#define Val_long(x)     (((intnat)(x) << 1) + 1)
                          ~~~~~~~~~~~ ^
1 error generated.
make[2]: *** [compare.o] Error 1
make[2]: Leaving directory `/tmp/ocaml/byterun'
make[1]: *** [coldstart] Error 2
make[1]: Leaving directory `/tmp/3rd/ocaml'
make: *** [world.opt] Error 2

@chrismamo1 here are a few links that expound on the perils of ub:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
http://blog.regehr.org/archives/213
https://randomascii.wordpress.com/2014/05/19/undefined-behavior-can-format-your-drive/

@bryant bryant closed this Oct 12, 2015
@bryant bryant reopened this Oct 12, 2015
@damiendoligez
Copy link
Member

OK for merging.

@gasche
Copy link
Member

gasche commented Oct 17, 2015

Merged in trunk, thanks for the patch.

@bryant , I wanted to contact you to know under which name you would like to appear in the changelog, but the email adress you used in your commit (bryant@defrag.in) is invalid.

@gasche gasche closed this Oct 17, 2015
@chrismamo1
Copy link
Contributor

@bryant thanks for the links!

@gasche
Copy link
Member

gasche commented Dec 6, 2018

yallop/ocaml-ctypes#583 is a fix to ctypes that was required by the present change (cc @yallop, @fdopen). Should we have used

(intnat) ((uintnat)((intnat)x) << 1)

instead of the current

(intnat) ((uintnat)(x) << 1)

?

@yallop
Copy link
Member

yallop commented Dec 6, 2018

In general I think it might be best to document the expected types of the various macro arguments, and leave it to the caller to ensure that arguments have the correct types.

The problem with macros that don't cast their arguments to the expected types is that code like the linked ctypes PR can behave unexpectedly. That problem can be fixed by coercing the argument in the caller.

The problem with macros that cast their arguments to the expected types is that they'll accept all kinds of nonsense, and the problem can't be fixed in the caller.

For example, Long_val is currently defined as follows:

#define Long_val(x) ((x)>> 1)

and calling Long_val("foo") will result in a compiler error. If instead Long_val were defined like this:

#define Long_val(x) ((value)(x)>> 1)

then Long_val("foo") would be silently accepted.

Ideally we'd have function semantics: i.e.

inline intnat Long_val(value x) { return x >> 1; }

which would give us the best of both worlds: functions ensure that arguments are of the correct type, while still rejecting undesirable coercions (like string-to-integer) that a cast would silently allow through.

@gasche
Copy link
Member

gasche commented Dec 6, 2018

I wonder if we would actually observe a performance degradation with the compilers we care about if we used inline functions instead of macros for these conversions.

(Note: if there was a conditional define to use inline functions instead of macros, it could also be used as a development help, for example in Debug mode, even if a production build uses macros instead.)

mshinwell pushed a commit to mshinwell/ocaml that referenced this pull request Sep 2, 2020
Revert Gc interface changes for memprof tracker
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
* Fix ahrefs' testimonial

* Unify ahrefs' description
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants