Commit 2ea4f20

Merge pull request llvm#3982 from haoNoQ/static-analyzer-cherrypicks-25
Static analyzer cherrypicks 25
2 parents eee7d51 + 3e8146f commit 2ea4f20

107 files changed: +7111 -1576 lines changed

clang/docs/analyzer/checkers.rst

+82
@@ -2333,6 +2333,88 @@ A data is tainted when it comes from an unreliable source.
 alpha.unix
 ^^^^^^^^^^^
 
+.. _alpha-unix-StdCLibraryFunctionArgs:
+
+alpha.unix.StdCLibraryFunctionArgs (C)
+""""""""""""""""""""""""""""""""""""""
+Check for calls of standard library functions that violate predefined argument
+constraints. For example, it is stated in the C standard that for the ``int
+isalnum(int ch)`` function the behavior is undefined if the value of ``ch`` is
+not representable as unsigned char and is not equal to ``EOF``.
+
+.. code-block:: c
+
+  void test_alnum_concrete(int v) {
+    int ret = isalnum(256); // \
+    // warning: Function argument constraint is not satisfied
+    (void)ret;
+  }
+
+If the argument's value is unknown then the value is assumed to hold the proper value range.
+
+.. code-block:: c
+
+  #define EOF -1
+  int test_alnum_symbolic(int x) {
+    int ret = isalnum(x);
+    // after the call, ret is assumed to be in the range [-1, 255]
+
+    if (ret > 255)      // impossible (infeasible branch)
+      if (x == 0)
+        return ret / x; // division by zero is not reported
+    return ret;
+  }
+
+If the user disables the checker then the argument violation warning is
+suppressed. However, the assumption about the argument is still modeled. This
+is because exploring an execution path that already contains undefined behavior
+is not valuable.
+
+There are different kinds of constraints modeled: range constraint, not null
+constraint, buffer size constraint. A **range constraint** requires the
+argument's value to be in a specific range, see ``isalnum`` as an example above.
+A **not null constraint** requires the pointer argument to be non-null.
+
+A **buffer size** constraint specifies the minimum size of the buffer
+argument. The size might be a known constant. For example, ``asctime_r`` requires
+that the buffer argument's size must be greater than or equal to ``26`` bytes. In
+other cases, the size is denoted by another argument or as a multiplication of
+two arguments.
+For instance, ``size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)``.
+Here, ``ptr`` is the buffer, and its minimum size is ``size * nmemb``.
+
+.. code-block:: c
+
+  void buffer_size_constraint_violation(FILE *file) {
+    enum { BUFFER_SIZE = 1024 };
+    wchar_t wbuf[BUFFER_SIZE];
+
+    const size_t size = sizeof(*wbuf);   // 4
+    const size_t nitems = sizeof(wbuf);  // 4096
+
+    // Below we receive a warning because the 3rd parameter should be the
+    // number of elements to read, not the size in bytes. This case is a known
+    // vulnerability described by the ARR38-C SEI-CERT rule.
+    fread(wbuf, size, nitems, file);
+  }
+
+**Limitations**
+
+The checker is in alpha because the reports cannot provide notes about the
+values of the arguments. Without this information it is hard to confirm if the
+constraint is indeed violated. For example, consider the above case for
+``fread``. We display in the warning message that the size of the 1st arg
+should be equal to or less than the value of the 2nd arg times the 3rd arg.
+However, we fail to display the concrete values (``4`` and ``4096``) for those
+arguments.
+
+**Parameters**
+
+The checker models functions (and emits diagnostics) from the C standard by
+default. The ``ModelPOSIX`` option enables the checker to model (and emit
+diagnostics) for functions that are defined in the POSIX standard. This option
+is disabled by default.
+
 .. _alpha-unix-BlockInCriticalSection:
 
 alpha.unix.BlockInCriticalSection (C)
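
Note on the `fread` example documented above: the buffer size constraint is satisfied when the third argument is the element count rather than the byte size. The following standalone C++ sketch is not part of the commit; `readWide` and `elementCount` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cwchar>

// Hypothetical helper (not from the commit): number of elements in a
// fixed-size array, as opposed to sizeof(array), which is the size in bytes.
template <typename T, std::size_t N>
constexpr std::size_t elementCount(const T (&)[N]) {
  return N;
}

// Hypothetical helper (not from the commit): reads up to `count` wide
// characters into `buf`. The third argument of fread is the number of
// elements, so size * nmemb stays within the buffer when `count` is the
// real element count of `buf`.
std::size_t readWide(wchar_t *buf, std::size_t count, std::FILE *file) {
  // Mirrors the "not null" constraints on the pointer arguments.
  assert(buf != nullptr && file != nullptr);
  return std::fread(buf, sizeof(*buf), count, file);
}
```

A call such as `readWide(wbuf, elementCount(wbuf), file)` passes 1024 for the documented buffer, whereas `sizeof(wbuf)` would pass the byte size and request more data than the buffer can hold.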

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td

+1-1
@@ -557,7 +557,7 @@ def StdCLibraryFunctionArgsChecker : Checker<"StdCLibraryFunctionArgs">,
            "or is EOF.">,
   Dependencies<[StdCLibraryFunctionsChecker]>,
   WeakDependencies<[CallAndMessageChecker, NonNullParamChecker, StreamChecker]>,
-  Documentation<NotDocumented>;
+  Documentation<HasAlphaDocumentation>;
 
 } // end "alpha.unix"
 

clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h

+5-5
@@ -32,7 +32,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     std::string Str;
     llvm::raw_string_ostream OS(Str);
     S->printPretty(OS, nullptr, PrintingPolicy(ACtx.getLangOpts()));
-    return OS.str();
+    return Str;
   }
 
   bool isThisObject(const SymbolicRegion *R) {
@@ -69,7 +69,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     std::string Str;
     llvm::raw_string_ostream OS(Str);
     OS << "concrete memory address '" << I << "'";
-    return OS.str();
+    return Str;
   }
 
   std::string VisitNonLocSymbolVal(nonloc::SymbolVal V) {
@@ -82,7 +82,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     llvm::raw_string_ostream OS(Str);
     OS << (I.isSigned() ? "signed " : "unsigned ") << I.getBitWidth()
        << "-bit integer '" << I << "'";
-    return OS.str();
+    return Str;
   }
 
   std::string VisitNonLocLazyCompoundVal(nonloc::LazyCompoundVal V) {
@@ -123,7 +123,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     OS << "(" << Visit(S->getLHS()) << ") "
        << std::string(BinaryOperator::getOpcodeStr(S->getOpcode())) << " "
        << S->getRHS();
-    return OS.str();
+    return Str;
   }
 
   // TODO: IntSymExpr doesn't appear in practice.
@@ -177,7 +177,7 @@ class SValExplainer : public FullSValVisitor<SValExplainer, std::string> {
     else
       OS << "'" << Visit(R->getIndex()) << "'";
     OS << " of " + Visit(R->getSuperRegion());
-    return OS.str();
+    return Str;
   }
 
   std::string VisitNonParamVarRegion(const NonParamVarRegion *R) {

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

+17
@@ -320,6 +320,11 @@ ANALYZER_OPTION(bool, ShouldDisplayCheckerNameForText, "display-checker-name",
                 "Display the checker name for textual outputs",
                 true)
 
+ANALYZER_OPTION(bool, ShouldSupportSymbolicIntegerCasts,
+                "support-symbolic-integer-casts",
+                "Produce cast symbols for integral types.",
+                false)
+
 ANALYZER_OPTION(
     bool, ShouldConsiderSingleElementArraysAsFlexibleArrayMembers,
     "consider-single-element-arrays-as-flexible-array-members",
@@ -336,6 +341,18 @@ ANALYZER_OPTION(
     "might be modeled by the analyzer to never return NULL.",
     false)
 
+ANALYZER_OPTION(
+    bool, ShouldIgnoreBisonGeneratedFiles, "ignore-bison-generated-files",
+    "If enabled, any files containing the \"/* A Bison parser, made by\" "
+    "won't be analyzed.",
+    true)
+
+ANALYZER_OPTION(
+    bool, ShouldIgnoreFlexGeneratedFiles, "ignore-flex-generated-files",
+    "If enabled, any files containing the \"/* A lexical scanner generated by "
+    "flex\" won't be analyzed.",
+    true)
+
 //===----------------------------------------------------------------------===//
 // Unsigned analyzer options.
 //===----------------------------------------------------------------------===//
@@ -0,0 +1,179 @@
+//===- CallDescription.h - function/method call matching --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file This file defines a generic mechanism for matching function and
+/// method calls of C, C++, and Objective-C languages. Instances of these
+/// classes are frequently used together with the CallEvent classes.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_CALLDESCRIPTION_H
+#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_CALLDESCRIPTION_H
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/Optional.h"
+#include "llvm/Support/Compiler.h"
+#include <vector>
+
+namespace clang {
+class IdentifierInfo;
+} // namespace clang
+
+namespace clang {
+namespace ento {
+
+enum CallDescriptionFlags : unsigned {
+  CDF_None = 0,
+
+  /// Describes a C standard function that is sometimes implemented as a macro
+  /// that expands to a compiler builtin with some __builtin prefix.
+  /// The builtin may as well have a few extra arguments on top of the requested
+  /// number of arguments.
+  CDF_MaybeBuiltin = 1 << 0,
+};
+
+/// This class represents a description of a function call using the number of
+/// arguments and the name of the function.
+class CallDescription {
+  friend class CallEvent;
+  using MaybeCount = Optional<unsigned>;
+
+  mutable Optional<const IdentifierInfo *> II;
+  // The list of the qualified names used to identify the specified CallEvent,
+  // e.g. "{a, b}" represent the qualified names, like "a::b".
+  std::vector<std::string> QualifiedName;
+  MaybeCount RequiredArgs;
+  MaybeCount RequiredParams;
+  int Flags;
+
+public:
+  /// Constructs a CallDescription object.
+  ///
+  /// @param QualifiedName The list of the name qualifiers of the function that
+  /// will be matched. The user is allowed to skip any of the qualifiers.
+  /// For example, {"std", "basic_string", "c_str"} would match both
+  /// std::basic_string<...>::c_str() and std::__1::basic_string<...>::c_str().
+  ///
+  /// @param RequiredArgs The number of arguments that is expected to match a
+  /// call. Omit this parameter to match every occurrence of a call with a given
+  /// name regardless of the number of arguments.
+  CallDescription(CallDescriptionFlags Flags,
+                  ArrayRef<const char *> QualifiedName,
+                  MaybeCount RequiredArgs = None,
+                  MaybeCount RequiredParams = None);
+
+  /// Construct a CallDescription with default flags.
+  CallDescription(ArrayRef<const char *> QualifiedName,
+                  MaybeCount RequiredArgs = None,
+                  MaybeCount RequiredParams = None);
+
+  CallDescription(std::nullptr_t) = delete;
+
+  /// Get the name of the function that this object matches.
+  StringRef getFunctionName() const { return QualifiedName.back(); }
+
+  /// Get the qualified name parts in reversed order.
+  /// E.g. { "std", "vector", "data" } -> "vector", "std"
+  auto begin_qualified_name_parts() const {
+    return std::next(QualifiedName.rbegin());
+  }
+  auto end_qualified_name_parts() const { return QualifiedName.rend(); }
+
+  /// It's false, if and only if we expect a single identifier, such as
+  /// `getenv`. It's true for `std::swap`, or `my::detail::container::data`.
+  bool hasQualifiedNameParts() const { return QualifiedName.size() > 1; }
+
+  /// @name Matching CallDescriptions against a CallEvent
+  /// @{
+
+  /// Returns true if the CallEvent is a call to a function that matches
+  /// the CallDescription.
+  ///
+  /// \note This function is not intended to be used to match Obj-C method
+  /// calls.
+  bool matches(const CallEvent &Call) const;
+
+  /// Returns true if the CallEvent matches any of the CallDescriptions
+  /// supplied.
+  ///
+  /// \note This function is not intended to be used to match Obj-C method
+  /// calls.
+  friend bool matchesAny(const CallEvent &Call, const CallDescription &CD1) {
+    return CD1.matches(Call);
+  }
+
+  /// \copydoc clang::ento::matchesAny(const CallEvent &, const CallDescription &)
+  template <typename... Ts>
+  friend bool matchesAny(const CallEvent &Call, const CallDescription &CD1,
+                         const Ts &...CDs) {
+    return CD1.matches(Call) || matchesAny(Call, CDs...);
+  }
+  /// @}
+};
+
+/// An immutable map from CallDescriptions to arbitrary data. Provides a unified
+/// way for checkers to react to function calls.
+template <typename T> class CallDescriptionMap {
+  friend class CallDescriptionSet;
+
+  // Some call descriptions aren't easily hashable (e.g., the ones with qualified
+  // names in which some sections are omitted), so let's put them
+  // in a simple vector and use linear lookup.
+  // TODO: Implement an actual map for fast lookup for "hashable" call
+  // descriptions (e.g., the ones for C functions that just match the name).
+  std::vector<std::pair<CallDescription, T>> LinearMap;
+
+public:
+  CallDescriptionMap(
+      std::initializer_list<std::pair<CallDescription, T>> &&List)
+      : LinearMap(List) {}
+
+  template <typename InputIt>
+  CallDescriptionMap(InputIt First, InputIt Last) : LinearMap(First, Last) {}
+
+  ~CallDescriptionMap() = default;
+
+  // These maps are usually stored once per checker, so let's make sure
+  // we don't do redundant copies.
+  CallDescriptionMap(const CallDescriptionMap &) = delete;
+  CallDescriptionMap &operator=(const CallDescription &) = delete;
+
+  CallDescriptionMap(CallDescriptionMap &&) = default;
+  CallDescriptionMap &operator=(CallDescriptionMap &&) = default;
+
+  LLVM_NODISCARD const T *lookup(const CallEvent &Call) const {
+    // Slow path: linear lookup.
+    // TODO: Implement some sort of fast path.
+    for (const std::pair<CallDescription, T> &I : LinearMap)
+      if (I.first.matches(Call))
+        return &I.second;
+
+    return nullptr;
+  }
+};
+
+/// An immutable set of CallDescriptions.
+/// Checkers can efficiently decide if a given CallEvent matches any
+/// CallDescription in the set.
+class CallDescriptionSet {
+  CallDescriptionMap<bool /*unused*/> Impl = {};
+
+public:
+  CallDescriptionSet(std::initializer_list<CallDescription> &&List);
+
+  CallDescriptionSet(const CallDescriptionSet &) = delete;
+  CallDescriptionSet &operator=(const CallDescription &) = delete;
+
+  LLVM_NODISCARD bool contains(const CallEvent &Call) const;
+};
+
+} // namespace ento
+} // namespace clang
+
+#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_CALLDESCRIPTION_H
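
Note on the qualified-name comment in the header above: it states that qualifiers may be skipped, so `{"std", "basic_string", "c_str"}` also matches `std::__1::basic_string<...>::c_str()`. The sketch below is a simplified standalone model of that idea and is not the clang implementation. `matchesQualifiedName` is a hypothetical function, both names are assumed to be plain lists of parts (outermost scope first, function name last), and argument counts and flags are not modeled.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified model (not the clang implementation).
// `Described` plays the role of CallDescription::QualifiedName and `Actual`
// is the fully qualified name of the callee, e.g.
// {"std", "__1", "basic_string", "c_str"}.
bool matchesQualifiedName(const std::vector<std::string> &Described,
                          const std::vector<std::string> &Actual) {
  if (Described.empty() || Actual.empty())
    return false;

  // The function name itself (the last part) must match exactly.
  if (Described.back() != Actual.back())
    return false;

  // Walk the described qualifiers innermost-first, like the reversed
  // iteration offered by begin_qualified_name_parts(). Each one must be
  // found among the remaining enclosing scopes of the callee; scopes that
  // the description omits (such as an inline namespace) are skipped.
  auto ActualIt = std::next(Actual.rbegin());
  for (auto DescIt = std::next(Described.rbegin()); DescIt != Described.rend();
       ++DescIt) {
    while (ActualIt != Actual.rend() && *ActualIt != *DescIt)
      ++ActualIt;
    if (ActualIt == Actual.rend())
      return false;
    ++ActualIt;
  }
  return true;
}
```

In this model the described parts must appear in the callee's name in the same order, which is why a description with omitted sections is awkward to hash and a linear lookup is the simple choice.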
