Skip to content

False positive msan error with strtol and -std=c2x or _GNU_SOURCE #64946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tavianator opened this issue Aug 24, 2023 · 4 comments
Closed

False positive msan error with strtol and -std=c2x or _GNU_SOURCE #64946

tavianator opened this issue Aug 24, 2023 · 4 comments
Labels
compiler-rt:msan Memory sanitizer false-positive Warning fires when it should not

Comments

@tavianator
Copy link
Contributor

A recent glibc update added a new strtol() implementation to support the 0b prefix from C23. This new symbol seems to be missing an interceptor:

tavianator@graphene $ cat foo.c
#include <assert.h>
#include <stdlib.h>

int main(void) {
        char *end;
        long l = strtol("42", &end, 0);
        assert(l == 42);
        assert(*end == '\0');
        return 0;
}
tavianator@graphene $ clang -fsanitize=memory foo.c -o foo && ./foo
tavianator@graphene $ clang -std=c2x -fsanitize=memory foo.c -o foo && ./foo
==58541==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55994bbe00da in main (/home/tavianator/code/bfs/foo+0xb50da) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9)
    #1 0x7f1a12c27ccf  (/usr/lib/libc.so.6+0x27ccf) (BuildId: 316d0d3666387f0e8fb98773f51aa1801027c5ab)
    #2 0x7f1a12c27d89 in __libc_start_main (/usr/lib/libc.so.6+0x27d89) (BuildId: 316d0d3666387f0e8fb98773f51aa1801027c5ab)
    #3 0x55994bb4d134 in _start (/home/tavianator/code/bfs/foo+0x22134) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/tavianator/code/bfs/foo+0xb50da) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9) in main
Exiting
tavianator@graphene $ clang -D_GNU_SOURCE -fsanitize=memory foo.c -o foo && ./foo
==58556==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5643ee5720da in main (/home/tavianator/code/bfs/foo+0xb50da) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9)
    #1 0x7f302fc27ccf  (/usr/lib/libc.so.6+0x27ccf) (BuildId: 316d0d3666387f0e8fb98773f51aa1801027c5ab)
    #2 0x7f302fc27d89 in __libc_start_main (/usr/lib/libc.so.6+0x27d89) (BuildId: 316d0d3666387f0e8fb98773f51aa1801027c5ab)
    #3 0x5643ee4df134 in _start (/home/tavianator/code/bfs/foo+0x22134) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/tavianator/code/bfs/foo+0xb50da) (BuildId: 43c18b79db0838f533698c3e83e9322a882086d9) in main
Exiting
tavianator@graphene $ nm -u foo | grep strtol
                 U __isoc23_strtol@GLIBC_2.38
tavianator@graphene $ clang --version
clang version 15.0.7
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
tavianator@graphene $ /usr/lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.38.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.2.1 20230801.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 4.4.0
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
@MaskRay
Copy link
Member

MaskRay commented Aug 26, 2023

Patch: https://reviews.llvm.org/D158943 [sanitizer] Intercept glibc 2.38 __isoc23_* functions

@tavianator It'd be nice if you can verify whether the patch helps.

curl -s https://raw.githubusercontent.com/chromium/chromium/main/tools/clang/scripts/update.py | python3 - --output-dir=~/Stable

# Use https://github.com/MaskRay/llvm-project/tree/san-c2x
cmake -GNinja -Sllvm -Bout/release -DCMAKE_CXX_COMPILER=$HOME/Stable/bin/clang++ -DCMAKE_C_COMPILER=$HOME/Stable/bin/clang  -DLLVM_ENABLE_PROJECTS='clang;lld;compiler-rt' -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DCLANG_ENABLE_ARCMT=off -DCLANG_ENABLE_STATIC_ANALYZER=off
ninja -C out/release clang asan msan compiler-rt-headers
# Use out/release/bin/clang

@tavianator
Copy link
Contributor Author

@MaskRay I can confirm that MaskRay@9e305a0 fixes it for me, thanks!

@EugeneZelenko
Copy link
Contributor

@MaskRay: Should it be backported to 17.x?

llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 28, 2023
`strtol("0b1", 0, 0)` can be (pre-C23) 0 or (C23) 1.
`sscanf("0b10", "%i", &x)` is similar. glibc 2.38 introduced
`__isoc23_strtol` and `__isoc23_scanf` family functions for binary
compatibility.

When `_ISOC2X_SOURCE` is defined (implied by `_GNU_SOURCE`) or
`__STDC_VERSION__ > 201710L`, `__GLIBC_USE_ISOC2X` is defined to 1 and
these `__isoc23_*` symbols are used.

Add `__isoc23_` versions for the following interceptors:

* sanitizer_common_interceptors.inc implements strtoimax/strtoumax.
  Remove incorrect FIXME about google/sanitizers#321
* asan_interceptors.cpp implements just strtol and strtoll. The default
  `replace_str` mode checks `nptr` is readable and `endptr` is writable.
  atoi reuses the existing strtol interceptor.
* msan_interceptors.cpp implements strtol family functions and their
  `_l` versions. Tested by lib/msan/tests/msan_test.cpp
* sanitizer_common_interceptors.inc implements scanf family functions.

The strtol family functions are spreaded, which is not great, but the
patch (intended for release/17.x) does not attempt to address the issue.

Add symbols to lib/sanitizer_common/symbolizer/scripts/global_symbols.txt to
support both glibc pre-2.38 and 2.38.

When build bots migrate to glibc 2.38+, we will lose test coverage for
non-isoc23 versions since the existing C++ unittests imply `_GNU_SOURCE`.
Add test/sanitizer_common/TestCases/{strtol.c,scanf.c}.
They catch msan false positive in the absence of the interceptors.

Fix llvm/llvm-project#64388
Fix llvm/llvm-project#64946

Link: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00010.html
("The GNU C Library version 2.38 is now available")

Reviewed By: #sanitizers, vitalybuka, mgorny

Differential Revision: https://reviews.llvm.org/D158943

(cherry picked from commit ad7e2501000da2494860f06a306dfe8c08cc07c3)
@MaskRay
Copy link
Member

MaskRay commented Aug 28, 2023

Filed release/17.x backport request on #64388

@MaskRay: Should it be backported to 17.x?

Yes! My commit message mentioned

The strtol family functions are spreaded, which is not great, but the
patch (intended for release/17.x) does not attempt to address the issue.

@EugeneZelenko EugeneZelenko removed this from the LLVM 17.0.X Release milestone Aug 28, 2023
llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 28, 2023
`strtol("0b1", 0, 0)` can be (pre-C23) 0 or (C23) 1.
`sscanf("0b10", "%i", &x)` is similar. glibc 2.38 introduced
`__isoc23_strtol` and `__isoc23_scanf` family functions for binary
compatibility.

When `_ISOC2X_SOURCE` is defined (implied by `_GNU_SOURCE`) or
`__STDC_VERSION__ > 201710L`, `__GLIBC_USE_ISOC2X` is defined to 1 and
these `__isoc23_*` symbols are used.

Add `__isoc23_` versions for the following interceptors:

* sanitizer_common_interceptors.inc implements strtoimax/strtoumax.
  Remove incorrect FIXME about google/sanitizers#321
* asan_interceptors.cpp implements just strtol and strtoll. The default
  `replace_str` mode checks `nptr` is readable and `endptr` is writable.
  atoi reuses the existing strtol interceptor.
* msan_interceptors.cpp implements strtol family functions and their
  `_l` versions. Tested by lib/msan/tests/msan_test.cpp
* sanitizer_common_interceptors.inc implements scanf family functions.

The strtol family functions are spreaded, which is not great, but the
patch (intended for release/17.x) does not attempt to address the issue.

Add symbols to lib/sanitizer_common/symbolizer/scripts/global_symbols.txt to
support both glibc pre-2.38 and 2.38.

When build bots migrate to glibc 2.38+, we will lose test coverage for
non-isoc23 versions since the existing C++ unittests imply `_GNU_SOURCE`.
Add test/sanitizer_common/TestCases/{strtol.c,scanf.c}.
They catch msan false positive in the absence of the interceptors.

Fix llvm/llvm-project#64388
Fix llvm/llvm-project#64946

Link: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00010.html
("The GNU C Library version 2.38 is now available")

Reviewed By: #sanitizers, vitalybuka, mgorny

Differential Revision: https://reviews.llvm.org/D158943

(cherry picked from commit ad7e2501000da2494860f06a306dfe8c08cc07c3)
llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 29, 2023
`strtol("0b1", 0, 0)` can be (pre-C23) 0 or (C23) 1.
`sscanf("0b10", "%i", &x)` is similar. glibc 2.38 introduced
`__isoc23_strtol` and `__isoc23_scanf` family functions for binary
compatibility.

When `_ISOC2X_SOURCE` is defined (implied by `_GNU_SOURCE`) or
`__STDC_VERSION__ > 201710L`, `__GLIBC_USE_ISOC2X` is defined to 1 and
these `__isoc23_*` symbols are used.

Add `__isoc23_` versions for the following interceptors:

* sanitizer_common_interceptors.inc implements strtoimax/strtoumax.
  Remove incorrect FIXME about google/sanitizers#321
* asan_interceptors.cpp implements just strtol and strtoll. The default
  `replace_str` mode checks `nptr` is readable and `endptr` is writable.
  atoi reuses the existing strtol interceptor.
* msan_interceptors.cpp implements strtol family functions and their
  `_l` versions. Tested by lib/msan/tests/msan_test.cpp
* sanitizer_common_interceptors.inc implements scanf family functions.

The strtol family functions are spreaded, which is not great, but the
patch (intended for release/17.x) does not attempt to address the issue.

Add symbols to lib/sanitizer_common/symbolizer/scripts/global_symbols.txt to
support both glibc pre-2.38 and 2.38.

When build bots migrate to glibc 2.38+, we will lose test coverage for
non-isoc23 versions since the existing C++ unittests imply `_GNU_SOURCE`.
Add test/sanitizer_common/TestCases/{strtol.c,scanf.c}.
They catch msan false positive in the absence of the interceptors.

Fix llvm/llvm-project#64388
Fix llvm/llvm-project#64946

Link: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00010.html
("The GNU C Library version 2.38 is now available")

Reviewed By: #sanitizers, vitalybuka, mgorny

Differential Revision: https://reviews.llvm.org/D158943

(cherry picked from commit ad7e2501000da2494860f06a306dfe8c08cc07c3)
tru pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 30, 2023
`strtol("0b1", 0, 0)` can be (pre-C23) 0 or (C23) 1.
`sscanf("0b10", "%i", &x)` is similar. glibc 2.38 introduced
`__isoc23_strtol` and `__isoc23_scanf` family functions for binary
compatibility.

When `_ISOC2X_SOURCE` is defined (implied by `_GNU_SOURCE`) or
`__STDC_VERSION__ > 201710L`, `__GLIBC_USE_ISOC2X` is defined to 1 and
these `__isoc23_*` symbols are used.

Add `__isoc23_` versions for the following interceptors:

* sanitizer_common_interceptors.inc implements strtoimax/strtoumax.
  Remove incorrect FIXME about google/sanitizers#321
* asan_interceptors.cpp implements just strtol and strtoll. The default
  `replace_str` mode checks `nptr` is readable and `endptr` is writable.
  atoi reuses the existing strtol interceptor.
* msan_interceptors.cpp implements strtol family functions and their
  `_l` versions. Tested by lib/msan/tests/msan_test.cpp
* sanitizer_common_interceptors.inc implements scanf family functions.

The strtol family functions are spreaded, which is not great, but the
patch (intended for release/17.x) does not attempt to address the issue.

Add symbols to lib/sanitizer_common/symbolizer/scripts/global_symbols.txt to
support both glibc pre-2.38 and 2.38.

When build bots migrate to glibc 2.38+, we will lose test coverage for
non-isoc23 versions since the existing C++ unittests imply `_GNU_SOURCE`.
Add test/sanitizer_common/TestCases/{strtol.c,scanf.c}.
They catch msan false positive in the absence of the interceptors.

Fix llvm/llvm-project#64388
Fix llvm/llvm-project#64946

Link: https://lists.gnu.org/archive/html/info-gnu/2023-07/msg00010.html
("The GNU C Library version 2.38 is now available")

Reviewed By: #sanitizers, vitalybuka, mgorny

Differential Revision: https://reviews.llvm.org/D158943

(cherry picked from commit ad7e2501000da2494860f06a306dfe8c08cc07c3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt:msan Memory sanitizer false-positive Warning fires when it should not
Projects
Status: Needs Triage
Development

No branches or pull requests

3 participants