[analyzer] TaintPropagation checker strlen() should not propagate #66086
@llvm/pr-subscribers-clang-static-analyzer-1
Changes
strlen(..) call should not propagate taintedness, buf = malloc(strlen(tainted_txt) + 1); // false warning This pattern can lead to a denial of service attack only when the attacker can directly specify the size of the allocated area as an arbitrarily large number (e.g. the value is converted from a user-provided string). Later, we could reintroduce strlen() as a taint-propagating function, taking care not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block). The change has been evaluated on the following open source projects:
In all cases the lost reports originate from copying untrusted environment variables into another buffer. There are 2 types of lost false positive reports:
-- 4 Files Affected:
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
 ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
 ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
 ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
 ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
 ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 3dcb45c0b110383..9df69c1ad1b525e 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -694,8 +694,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strnlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strtol"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoll"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoul"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 663836836d3db67..1eb926f25f9a778 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,6 +10,7 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
+int atoi( const char* str );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -54,11 +55,11 @@ void taintDiagnosticVLA(void) {
 // propagating through variables and expressions
 char *taintDiagnosticPropagation(){
   char *pathbuf;
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note@-1 {{Taint propagated to the return value}}
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
                  // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)); // expected-warning{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note@-1{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note@-2 {{Taint propagated to the return value}}
     return pathbuf;
@@ -71,12 +72,12 @@ char *taintDiagnosticPropagation(){
 char *taintDiagnosticPropagation2(){
   char *pathbuf;
   char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note@-1 {{Taint propagated to the return value}}
   char *user_env=getenv("USER_ENV_VAR");//unrelated taint source
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
                  // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)+1); // expected-warning{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note@-1{{Untrusted data is used to specify the buffer size}}
                                                 // expected-note@-2 {{Taint propagated to the return value}}
     return pathbuf;
diff --git a/clang/test/Analysis/taint-generic.c b/clang/test/Analysis/taint-generic.c
index b7906d201e4fad3..a614453c63af671 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -915,24 +915,6 @@ void testStrndupa(size_t n) {
   clang_analyzer_isTainted_charp(result); // expected-warning {{YES}}
 }
 
-size_t strlen(const char *s);
-void testStrlen() {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strlen(s);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
-size_t strnlen(const char *s, size_t maxlen);
-void testStrnlen(size_t maxlen) {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strnlen(s, maxlen);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
 long strtol(const char *restrict nptr, char **restrict endptr, int base);
 long long strtoll(const char *restrict nptr, char **restrict endptr, int base);
 unsigned long int strtoul(const char *nptr, char **endptr, int base);
This is a very straightforward change and the results on open source projects clearly show that it's an improvement over the status quo.
Also note that the buildkite/github-pull-request (build#1418) failure is completely irrelevant: it complains about spaces found at the end of two lines in the file |
I can understand the frustration of the FPs. However, propagating taint there is the right thing to do. I haven't looked at the content of the patch (yet), nor at the diffs. I'll try to have a deeper look tomorrow. |
EDIT: I actually proposed that a couple days ago in #66074 (wcslen) XD |
I finished the review of this PR. By looking at the disappeared reports you attached, I'm convinced that the Consequently, I agree with the raised problems, but I disagree with the approach. By the same token, I think we should be able to separately enable/disable diagnostics on the GenericTaintChecker (including that they should not sink execution paths if they are disabled); but that's a different subject. |
clang/test/Analysis/taint-generic.c (Outdated)
size_t strlen(const char *s);
void testStrlen() {
  char s[10];
  scanf("%9s", s);

  size_t result = strlen(s);
  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
}

size_t strnlen(const char *s, size_t maxlen);
void testStrnlen(size_t maxlen) {
  char s[10];
  scanf("%9s", s);

  size_t result = strnlen(s, maxlen);
  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
}
In general, I oppose removing FN tests. They are good at documenting intent, if for nothing else.
It might be even better to add comments there about why we think it's okay and intentional to not propagate taint there. Also, adding a PR link would give the possibility to look deeper to understand the why.
If we remove the malloc(..) as the taint sink, we would lose some true positive findings where the size of the allocated
The above example is prone to a denial of service attack, as the attacker just specifies an arbitrarily large number for which a buffer will be allocated. The attacker needs far fewer resources to specify a large number than the receiver needs to allocate a large chunk of memory. On the other hand, when we have code like this:
Here we should not warn, as the size passed to malloc is the size of an already allocated buffer. The resources the attacker invests to provide the large string and the resources the server spends allocating another buffer to hold that string are symmetrical, so this is not prone to a DoS attack. A more sophisticated, longer-term solution could be to add a flag to the taint info (or introduce a taint type) recording that the tainted value originated from an existing buffer size, and then specify the malloc sink so that it does not warn in that case. I know we cannot do this now, but the taint analysis could be extended in this direction. Back to this solution. So for me either solution would work: Which one would you prefer? |
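The asymmetric case described in this comment can be sketched in C as follows. This is an illustration only: `alloc_from_size_text` and `MAX_REQUEST_BYTES` are hypothetical names that do not appear in the patch, and the sketch makes no claim about whether the analyzer treats the range check as removing taint.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical cap, chosen only for this illustration. */
#define MAX_REQUEST_BYTES (1ul << 20)

/* The size is parsed from attacker-controlled text, so a ten-byte input
 * such as "4000000000" could request gigabytes. The unchecked form of
 * this pattern is malloc(atoi(size_text)). Here the parsed value is
 * range-checked before it reaches malloc. */
void *alloc_from_size_text(const char *size_text) {
    char *end = NULL;
    errno = 0;
    unsigned long n = strtoul(size_text, &end, 10);
    if (errno != 0 || end == size_text || *end != '\0')
        return NULL; /* not a clean decimal number */
    if (n == 0 || n > MAX_REQUEST_BYTES)
        return NULL; /* outside the accepted range */
    return malloc((size_t)n);
}
```

A call such as `alloc_from_size_text("64")` returns a 64-byte buffer, while `"4000000000"` or `"abc"` yields `NULL`. The cost to the attacker (a short string) no longer determines the cost to the receiver.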
You're right that theoretically speaking the By the way, could you show an OOBV2 true positive that involves taint propagated by
I agree that I'd be happy to accept a patch that (1) ensures that |
TBH I would prefer (b). I see removing the whole
I'm sure I've seen it in the Juliet suite. I believe something similar must have been the motivation for me proposing
Yea, upperbounding Let's discuss then how much benefit warning on tainted malloc allocations provide. |
Putting an upper bound on As a very clear example, this function There are also other more complicated false positives from vim and postgres. Based on these I'd say that propagating taint in |
Yes, I've also seen similar cases around the result of
Let's see how it works out in practice. I won't object to this change. Do you also plan to partially revert |
yes, I will do that. |
Request another round of review once you are happy with the content and addressed the open comments. |
9c7674c to f8997b1
As I'm not a maintainer, I could not push to your branch. |
A strlen(..) call should not propagate taintedness, because doing so produces many false positive findings. It is a common pattern to copy user-provided input to another buffer. In these cases we always get warnings about tainted data used as the malloc parameter: buf = malloc(strlen(tainted_txt) + 1); // false warning This pattern can lead to a denial of service attack only when the attacker can directly specify the size of the allocated area as an arbitrarily large number (e.g. the value is converted from a user-provided string). Later, we could reintroduce strlen() as a taint-propagating function, taking care not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block).
f8997b1 to 889c886
Thanks for the suggestions. I squashed it. |
I hope in the future we will have a more satisfying solution.
LGTM.
A strlen(..) call should not propagate taintedness, because doing so produces many false positive findings. It is a common pattern to copy user-provided input to another buffer. In these cases we always get warnings about tainted data used as the malloc parameter:
buf = malloc(strlen(tainted_txt) + 1); // false warning
This pattern can lead to a denial of service attack only when the attacker can directly specify the size of the allocated area as an arbitrarily large number (e.g. the value is converted from a user-provided string).
Later, we could reintroduce strlen() as a taint-propagating function, taking care not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block).
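The copy pattern described above might look like the following in a complete function. This is a minimal sketch, not code from the patch or from any of the evaluated projects; `copy_env_value` is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the value of an environment variable into a fresh heap buffer.
 * Returns NULL if the variable is unset or the allocation fails.
 * The caller owns the returned buffer and must free() it. */
char *copy_env_value(const char *name) {
    const char *tainted_txt = getenv(name); /* untrusted source */
    if (tainted_txt == NULL)
        return NULL;
    /* The requested size is bounded by a string that already exists in
     * memory; this is the malloc(strlen(...) + 1) shape from the
     * description. */
    size_t len = strlen(tainted_txt);
    char *buf = malloc(len + 1);
    if (buf != NULL)
        memcpy(buf, tainted_txt, len + 1); /* includes the terminating NUL */
    return buf;
}
```

Here the allocation can never exceed the size of data the process already holds, which is the sense in which the value cannot be "arbitrarily large".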
The change has been evaluated on the following open source projects:
memcached: 1 lost false positive
tmux: 0 lost reports
twin: 3 lost false positives
vim: 1 lost false positive
openssl: 0 lost reports
sqlite: 2 lost false positives
ffmpeg: 0 lost reports
postgresql: 3 lost false positives
tinyxml: 0 lost reports
libwebm: 0 lost reports
xerces: 0 lost reports
In all cases the lost reports originate from copying untrusted environment variables into another buffer.
There are 2 types of lost false positive reports:
Where the warning is emitted at the malloc call by the TaintPropagation checker:
len = strlen(portnumber_filename)+4+1;
temp_portnumber_filename = malloc(len);
Where pointers are set based on the length of the tainted string and the warning is emitted by the ArrayOutofBoundsv2 checker.
For example this case.
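The linked case is not reproduced in this text. As a hypothetical illustration of the second type, the sketch below derives a pointer from the length of a string and then writes through it; `strip_trailing_slashes` is an invented name and is not taken from any of the evaluated projects.

```c
#include <stddef.h>
#include <string.h>

/* Removes trailing '/' characters in place and returns the new length. */
size_t strip_trailing_slashes(char *path) {
    size_t len = strlen(path);
    char *end = path + len; /* pointer derived from the string length */
    while (end > path && end[-1] == '/')
        *--end = '\0';      /* stays inside [path, path + len) */
    return (size_t)(end - path);
}
```

If `path` holds untrusted data and `strlen` propagates taint, `end` becomes a tainted offset into the buffer, even though it can never point past the terminating NUL of a string that already exists.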