Undefined behaviour in main and 3.11 #96678
Can you post a summary of "arithmetic with NULL pointers" here? Thanks |
@kumaraditya303 In
/* Pack other positional arguments into the *args argument */
if (co->co_flags & CO_VARARGS) {
    PyObject *u = NULL;
    assert(args != NULL);
    u = _PyTuple_FromArraySteal(args + n, argcount - n);
    if (u == NULL) {
        goto fail_post_positional;
    }
    assert(localsplus[total_args] == NULL);
    localsplus[total_args] = u;
}
There's also an example in |
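As a minimal sketch of the "arithmetic with NULL pointers" problem: in ISO C, an expression like `args + n` is undefined when `args` is NULL, even for `n == 0`. One way to avoid it is to branch before the addition. The helper name `offset_or_null` is hypothetical and not part of CPython; it only illustrates the guard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not CPython code): returns base + n, but never
   performs arithmetic on a null pointer. The null case is handled
   before any addition takes place. */
static int *offset_or_null(int *base, size_t n)
{
    if (base == NULL) {
        /* A NULL array can only be "empty", so only offset 0 makes sense. */
        assert(n == 0);
        return NULL;
    }
    return base + n;
}
```

A caller with an empty argument array would get NULL back instead of forming `NULL + 0`; whether that fits the real call site is an assumption of this sketch.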
#96672 is also vaguely related. |
cc @pablogsal This seems to be introduced in #89419 |
I am a bit surprised by this because we have a UBSan buildbot that has not detected anything: |
These are the parameters, just in case we want to compare: |
@pablogsal I added some more info about what compiler I am using. |
Can you see if you reproduce these warnings using the exact flags the buildbot is using? |
I'll try that later today. |
@pablogsal Is https://buildbot.python.org/all/#/builders/964/builds/319/steps/5/logs/stdio supposed to show lots of stuff like this?:
|
Most of these are expected because some tests segfault on purpose (like faulthandler) and others are incompatible with the sanitizer (like some ctypes ones). Those always run in subprocesses, so as long as they don't happen in the main process, they won't make the test suite fail. |
Notice that some of the frames involve |
When I run with the same config as the build bot, I also seem to be getting only the |
#!/bin/bash
set -e
set -u
set -o pipefail
set -x
dir="builds/debug-main-no-pymalloc"
here="$(realpath .)"
mkdir --parent "${dir}"
pushd "${dir}"
export CC="clang"
export LD="clang"
export CFLAGS="-fno-sanitize-recover"
nice "${here}/configure" \
--with-undefined-behavior-sanitizer \
--with-pymalloc=no \
--with-assertions \
--with-address-sanitizer \
--with-trace-refs \
--with-pydebug
nice make -j8
nice make test
When I run the above build-script on 3.11, I don't even get to
(See #96569 for that problem.) |
That hints that this only happens when we added either |
Here's my full build script that I run from the root of the repository and it gives all the UB warnings I made assertions for:
#!/bin/bash
set -e
set -u
set -o pipefail
set -x
dir="builds/debug-main-no-pymalloc-do-recover-no-leaks"
here="$(realpath .)"
mkdir --parent "${dir}"
pushd "${dir}"
export CC="clang"
export LD="clang"
export ASAN_OPTIONS=detect_leaks=0
export CFLAGS="-fsanitize-recover"
nice "${here}/configure" \
--with-undefined-behavior-sanitizer \
--with-pymalloc=no \
--with-assertions \
--with-address-sanitizer \
--with-trace-refs \
--with-pydebug
nice make -j8
nice make test
Curiously, (I only put Btw, this config doesn't seem to be producing the |
I think the C compiler optimizes away the undefined behavior in non-debug builds, which is why the non-debug UBSan run passes. |
Are the things this points out new, or just stuff we haven't ever noticed until now because we don't do our sanitizer runs --with-pydebug or --with-trace-refs? If these aren't new, I'd defer this blocker to a bugfix. |
That's what I am trying to understand. It seems that this only triggers with --with-pydebug or --with-trace-refs, but this could point to a real problem that just happens not to show up in release mode, by chance or because we are lucky. In any case, this doesn't seem to happen on 3.10. |
@gpshead It's a few different instances of UB. I suspect some have been around for a bit, and the one in (But I'd need to check the history for that, when I'm back at my pc.) |
(cherry picked from commit 50a70a0) Co-authored-by: Mark Shannon <mark@hotpy.org>
Btw, I suspect the problem in Please check whether the other instances of UB still left deserve to be release blockers? |
I am removing the release-blocker tag for now was the error in |
@pablogsal The conclusion that the release should not be blocked might very well be the right course of action, but I am a bit confused by your reasoning? Release mode wouldn't detect any undefined behaviour at all, as it doesn't run with any sanitizer nor assertions? Or would it? At most you would get a failed test, if you are really lucky?
Eg left shifting a negative number doesn't suddenly become defined in C, just because we are building with optimizations turned on? (Nor does left shifting a signed long 1 by 63 places?)
If I remember right from my cursory look at the git history, the remaining instances of UB that I detected are in files that haven't changed recently. So we can just pray that all the compilers for all the platforms we release for keep producing code that works alright in the new release, too.
(And that might sound a bit dismissive, but I think it's probably actually an ok strategy, and that's why I suggested reviewing the release-blocker status. I just want us to be aware that our strategy is 'hope' in this case.) |
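To make the two shift cases concrete, here is a minimal sketch of how they can be written without undefined behaviour, by shifting in an unsigned type. The helper names are hypothetical, and the sketch assumes a target where `int64_t` and `uint64_t` exist; it is not taken from the CPython sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Left-shifting a negative signed value is undefined in ISO C.
   Shifting its unsigned bit pattern is well-defined (modulo 2^64). */
static uint64_t shl_bits(int64_t value, unsigned shift)
{
    assert(shift < 64);  /* shifting by >= the type width is also undefined */
    return (uint64_t)value << shift;
}

/* With a 64-bit long, 1L << 63 is not representable in the signed type.
   The unsigned form has no such problem. */
static uint64_t top_bit(void)
{
    return UINT64_C(1) << 63;
}
```

The results are returned as unsigned values on purpose: converting them back to a signed type would be implementation-defined for values above `INT64_MAX`.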
My reasoning is that we have an undefined behaviour sanitiser buildbot that checks release mode and has not found any problem, so the risk of any of this leaking into release mode, although non-zero, is likely low. The builder is here: https://buildbot.python.org/all/#/builders/964
So unless someone explains to me why failures that are only detected in debug mode can also happen in release mode, and why the buildbot is not picking them up, I am going to delay any fixes to 3.11.1. |
As a practical test, we can take the asserts I added in the PRs mentioned in the text of the issue, convert them into something that triggers in release mode (eg a normal I don't know why the sanitizer doesn't pick any of this up. At least it doesn't pick any of this up when the build bot runs it, neither in release mode nor in the debug mode. I ran my sanitizer locally with different flags. See above. (I'm only on my phone right now, so can't run these experiments until later today.) |
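One possible shape for such a release-mode check is sketched below. `RELEASE_CHECK` and `skip_items` are hypothetical names invented for this illustration; the only point is that, unlike `assert()`, the macro is not compiled out when `NDEBUG` is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro: reports and aborts even in builds with -DNDEBUG. */
#define RELEASE_CHECK(cond)                                      \
    do {                                                         \
        if (!(cond)) {                                           \
            fprintf(stderr, "%s:%d: check failed: %s\n",         \
                    __FILE__, __LINE__, #cond);                  \
            abort();                                             \
        }                                                        \
    } while (0)

/* Hypothetical example mirroring the `args != NULL` assertion quoted
   earlier in the thread. */
static const int *skip_items(const int *items, size_t n)
{
    assert(items != NULL);         /* vanishes under -DNDEBUG */
    RELEASE_CHECK(items != NULL);  /* still enforced under -DNDEBUG */
    return items + n;
}
```

If a check like this fires in an optimized build, the condition is being violated at runtime there too; if it never fires, that only shows the tests did not reach the violating path.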
Answering that question is very important, as it may reveal very interesting things, such as problems in the running builder, or what the difference in behaviour is between your builds and the builder, and whether the differences matter.
The sanitizer only runs in release mode in the build bots |
Oh, thanks. I must have misunderstood something. Of course, our general point about not actually picking up many instances of UB still stands. Btw, the config that you linked to that had the sanitizer enabled didn't enable optimisations as far as I can tell, hence I didn't see it as release mode. |
@pablogsal @markshannon @kumaraditya303 Good news! I found the root cause of why my debug build with all sanitisers turned on showed undefined behaviour, while the build bot didn't. Most invocations of From clang's documentation about
GCC is a bit more verbose, but still rather vague:
It turns out that arithmetic on the null-pointer also becomes defined with If the above is true, then all the instances of UB that I found are not actually UB in the effective C-dialect we are using for release builds. Now we have (at least) two options:
The former is the minimal change and safe. The latter enables a lot of C compiler optimizations, and my experiments running the sanitisers suggest that we are already nearly I say we should do the former straight away, and in the longer term consider implementing the latter (after running benchmarks and weighing the pros and cons etc). |
(cherry picked from commit 6ba686d) Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
Automerge-Triggered-By: GH:pablogsal (cherry picked from commit 81e36f3) Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
I am closing this issue for now, because thanks to |
I ran the sanitizers again, and found a few more instances of undefined behaviour, mostly around bit-shifting of signed integers and arithmetic with NULL pointers.
I put some asserts to demonstrate the undefined behaviour into pull requests for main (matthiasgoergens#18) and 3.11 (matthiasgoergens#19).
More information about my environment: