Commit 22e9024

authored Jan 13, 2025
[IR] Introduce captures attribute (#116990)
This introduces the `captures` attribute as described in:
https://discourse.llvm.org/t/rfc-improvements-to-capture-tracking/81420

This initial patch only introduces the IR/bitcode support for the attribute and its in-memory representation as `CaptureInfo`. This will be followed by a patch to upgrade and remove the `nocapture` attribute, and then by actual inference/analysis support.

Based on the RFC feedback, I've used a syntax similar to the `memory` attribute, though the only "location" that can be specified is `ret`.

I've added some pretty extensive documentation to LangRef on the semantics. One non-obvious bit here is that using ptrtoint will not result in a "return-only" capture, even if the ptrtoint result is only used in the return value. Without this requirement we wouldn't be able to continue ordinary capture analysis on the return value.
1 parent d6f7f2a commit 22e9024

19 files changed

+580
-10
lines changed

‎llvm/docs/LangRef.rst

+127-9
Original file line number | Diff line number | Diff line change
@@ -1397,6 +1397,42 @@ Currently, only the following parameter attributes are defined:
13971397
function, returning a pointer to allocated storage disjoint from the
13981398
storage for any other object accessible to the caller.
13991399

1400+
``captures(...)``
1401+
This attribute restricts the ways in which the callee may capture the
1402+
pointer. This is not a valid attribute for return values. This attribute
1403+
applies only to the particular copy of the pointer passed in this argument.
1404+
1405+
The arguments of ``captures`` are a list of captured pointer components,
1406+
which may be ``none``, or a combination of:
1407+
1408+
- ``address``: The integral address of the pointer.
1409+
- ``address_is_null`` (subset of ``address``): Whether the address is null.
1410+
- ``provenance``: The ability to access the pointer for both read and write
1411+
after the function returns.
1412+
- ``read_provenance`` (subset of ``provenance``): The ability to access the
1413+
pointer only for reads after the function returns.
1414+
1415+
Additionally, it is possible to specify that some components are only
1416+
captured in certain locations. Currently only the return value (``ret``)
1417+
and other (default) locations are supported.
1418+
1419+
The :ref:`pointer capture section <pointercapture>` discusses these semantics
1420+
in more detail.
1421+
1422+
Some examples of how to use the attribute:
1423+
1424+
- ``captures(none)``: Pointer not captured.
1425+
- ``captures(address, provenance)``: Equivalent to omitting the attribute.
1426+
- ``captures(address)``: Address may be captured, but not provenance.
1427+
- ``captures(address_is_null)``: Only captures whether the address is null.
1428+
- ``captures(address, read_provenance)``: Both address and provenance
1429+
captured, but only for read-only access.
1430+
- ``captures(ret: address, provenance)``: Pointer captured through return
1431+
value only.
1432+
- ``captures(address_is_null, ret: address, provenance)``: The whole pointer
1433+
is captured through the return value, and additionally whether the pointer
1434+
is null is captured in some other way.
1435+
14001436
.. _nocapture:
14011437

14021438
``nocapture``
@@ -3339,10 +3375,92 @@ Pointer Capture
33393375
---------------
33403376

33413377
Given a function call and a pointer that is passed as an argument or stored in
3342-
the memory before the call, a pointer is *captured* by the call if it makes a
3343-
copy of any part of the pointer that outlives the call.
3344-
To be precise, a pointer is captured if one or more of the following conditions
3345-
hold:
3378+
memory before the call, the call may capture two components of the pointer:
3379+
3380+
* The address of the pointer, which is its integral value. This also includes
3381+
parts of the address or any information about the address, including the
3382+
fact that it does not equal one specific value. We further distinguish
3383+
whether only the fact that the address is/isn't null is captured.
3384+
* The provenance of the pointer, which is the ability to perform memory
3385+
accesses through the pointer, in the sense of the :ref:`pointer aliasing
3386+
rules <pointeraliasing>`. We further distinguish whether only read accesses
3387+
are allowed, or both reads and writes.
3388+
3389+
For example, the following function captures the address of ``%a``, because
3390+
it is compared to a pointer, leaking information about the identity of the
3391+
pointer:
3392+
3393+
.. code-block:: llvm
3394+
3395+
@glb = global i8 0
3396+
3397+
define i1 @f(ptr %a) {
3398+
%c = icmp eq ptr %a, @glb
3399+
ret i1 %c
3400+
}
3401+
3402+
The function does not capture the provenance of the pointer, because the
3403+
``icmp`` instruction only operates on the pointer address. The following
3404+
function captures both the address and provenance of the pointer, as both
3405+
may be read from ``@glb`` after the function returns:
3406+
3407+
.. code-block:: llvm
3408+
3409+
@glb = global ptr null
3410+
3411+
define void @f(ptr %a) {
3412+
store ptr %a, ptr @glb
3413+
ret void
3414+
}
3415+
3416+
The following function captures *neither* the address nor the provenance of
3417+
the pointer:
3418+
3419+
.. code-block:: llvm
3420+
3421+
define i32 @f(ptr %a) {
3422+
%v = load i32, ptr %a
3423+
ret i32 %v
3424+
}
3425+
3426+
While address capture includes uses of the address within the body of the
3427+
function, provenance capture refers exclusively to the ability to perform
3428+
accesses *after* the function returns. Memory accesses within the function
3429+
itself are not considered pointer captures.
3430+
3431+
We can further say that the capture only occurs through a specific location.
3432+
In the following example, the pointer (both address and provenance) is captured
3433+
through the return value only:
3434+
3435+
.. code-block:: llvm
3436+
3437+
define ptr @f(ptr %a) {
3438+
%gep = getelementptr i8, ptr %a, i64 4
3439+
ret ptr %gep
3440+
}
3441+
3442+
However, we always consider direct inspection of the pointer address
3443+
(e.g. using ``ptrtoint``) to be location-independent. The following example
3444+
is *not* considered a return-only capture, even though the ``ptrtoint``
3445+
ultimately only contributes to the return value:
3446+
3447+
.. code-block:: llvm
3448+
3449+
@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]
3450+
3451+
define ptr @f(ptr %a) {
3452+
%a.addr = ptrtoint ptr %a to i64
3453+
%mask = and i64 %a.addr, 3
3454+
%gep = getelementptr i8, ptr @lookup, i64 %mask
3455+
ret ptr %gep
3456+
}
3457+
3458+
This definition is chosen to allow capture analysis to continue with the return
3459+
value in the usual fashion.
3460+
3461+
The following describes possible ways to capture a pointer in more detail,
3462+
where unqualified uses of the word "capture" refer to capturing both address
3463+
and provenance.
33463464

33473465
1. The call stores any bit of the pointer carrying information into a place,
33483466
and the stored bits can be read from the place by the caller after this call
@@ -3381,30 +3499,30 @@ hold:
33813499
@lock = global i1 true
33823500

33833501
define void @f(ptr %a) {
3384-
store ptr %a, ptr* @glb
3502+
store ptr %a, ptr @glb
33853503
store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
33863504
store ptr null, ptr @glb
33873505
ret void
33883506
}
33893507

3390-
3. The call's behavior depends on any bit of the pointer carrying information.
3508+
3. The call's behavior depends on any bit of the pointer carrying information
3509+
(address capture only).
33913510

33923511
.. code-block:: llvm
33933512

33943513
@glb = global i8 0
33953514

33963515
define void @f(ptr %a) {
33973516
%c = icmp eq ptr %a, @glb
3398-
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; escapes %a
3517+
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
33993518
BB_EXIT:
34003519
call void @exit()
34013520
unreachable
34023521
BB_CONTINUE:
34033522
ret void
34043523
}
34053524

3406-
4. The pointer is used in a volatile access as its address.
3407-
3525+
4. The pointer is used as the pointer operand of a volatile access.
34083526

34093527
.. _volatile:
34103528
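The subset relations that the LangRef text above describes (``address_is_null`` within ``address``, ``read_provenance`` within ``provenance``) can be illustrated with a small standalone sketch. It is not part of the patch: the namespace `capture_sketch` and the helpers `isSubsetOf` and `checkLangRefExamples` are hypothetical, and plain `uint8_t` constants stand in for the `CaptureComponents` enum so that the snippet compiles without LLVM headers. The bit values and the predicate bodies mirror those added to `ModRef.h` in this commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone mirror of the component bits from this patch.
// Not the LLVM header: plain constants replace the bitmask enum.
namespace capture_sketch {

constexpr uint8_t None = 0;
constexpr uint8_t AddressIsNull = 1 << 0;
// Capturing the address implies capturing whether it is null.
constexpr uint8_t Address = (1 << 1) | AddressIsNull;
constexpr uint8_t ReadProvenance = 1 << 2;
// Full provenance implies read-only provenance.
constexpr uint8_t Provenance = (1 << 3) | ReadProvenance;
constexpr uint8_t All = Address | Provenance;

// True if every bit set in Sub is also set in Super.
constexpr bool isSubsetOf(uint8_t Sub, uint8_t Super) {
  return (Sub & Super) == Sub;
}

// Same bodies as the predicates in the patch, on raw bits.
constexpr bool capturesAddress(uint8_t CC) { return (CC & Address) != None; }
constexpr bool capturesAddressIsNullOnly(uint8_t CC) {
  return (CC & Address) == AddressIsNull;
}
constexpr bool capturesReadProvenanceOnly(uint8_t CC) {
  return (CC & Provenance) == ReadProvenance;
}
constexpr bool capturesFullProvenance(uint8_t CC) {
  return (CC & Provenance) == Provenance;
}

// Walks through a few of the attribute examples listed in LangRef.
inline void checkLangRefExamples() {
  assert(isSubsetOf(AddressIsNull, Address));
  assert(isSubsetOf(ReadProvenance, Provenance));
  // captures(address): address may be captured, but not provenance.
  assert(capturesAddress(Address) && !capturesFullProvenance(Address));
  // captures(address, read_provenance): provenance for reads only.
  assert(capturesReadProvenanceOnly(Address | ReadProvenance));
  // captures(address_is_null): only the null-ness of the address.
  assert(capturesAddressIsNullOnly(AddressIsNull));
}

} // namespace capture_sketch
```

One detail worth noting in this mirror: `capturesAddress` is also true when only the `AddressIsNull` bit is set, because the predicate tests for any overlap with the `Address` mask. Callers that need to tell the two cases apart would combine it with `capturesAddressIsNullOnly`.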

‎llvm/include/llvm/AsmParser/LLParser.h

+1
@@ -379,6 +379,7 @@ namespace llvm {
379379
bool inAttrGrp, LocTy &BuiltinLoc);
380380
bool parseRangeAttr(AttrBuilder &B);
381381
bool parseInitializesAttr(AttrBuilder &B);
382+
bool parseCapturesAttr(AttrBuilder &B);
382383
bool parseRequiredTypeAttr(AttrBuilder &B, lltok::Kind AttrToken,
383384
Attribute::AttrKind AttrKind);
384385

‎llvm/include/llvm/AsmParser/LLToken.h

+6
@@ -207,6 +207,12 @@ enum Kind {
207207
kw_inaccessiblememonly,
208208
kw_inaccessiblemem_or_argmemonly,
209209

210+
// Captures attribute:
211+
kw_address,
212+
kw_address_is_null,
213+
kw_provenance,
214+
kw_read_provenance,
215+
210216
// nofpclass attribute:
211217
kw_all,
212218
kw_nan,

‎llvm/include/llvm/Bitcode/LLVMBitCodes.h

+1
@@ -788,6 +788,7 @@ enum AttributeKindCodes {
788788
ATTR_KIND_NO_EXT = 99,
789789
ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
790790
ATTR_KIND_SANITIZE_TYPE = 101,
791+
ATTR_KIND_CAPTURES = 102,
791792
};
792793

793794
enum ComdatSelectionKindCodes {

‎llvm/include/llvm/IR/Attributes.h

+7
@@ -284,6 +284,9 @@ class Attribute {
284284
/// Returns memory effects.
285285
MemoryEffects getMemoryEffects() const;
286286

287+
/// Returns information from captures attribute.
288+
CaptureInfo getCaptureInfo() const;
289+
287290
/// Return the FPClassTest for nofpclass
288291
FPClassTest getNoFPClass() const;
289292

@@ -436,6 +439,7 @@ class AttributeSet {
436439
UWTableKind getUWTableKind() const;
437440
AllocFnKind getAllocKind() const;
438441
MemoryEffects getMemoryEffects() const;
442+
CaptureInfo getCaptureInfo() const;
439443
FPClassTest getNoFPClass() const;
440444
std::string getAsString(bool InAttrGrp = false) const;
441445

@@ -1260,6 +1264,9 @@ class AttrBuilder {
12601264
/// Add memory effect attribute.
12611265
AttrBuilder &addMemoryAttr(MemoryEffects ME);
12621266

1267+
/// Add captures attribute.
1268+
AttrBuilder &addCapturesAttr(CaptureInfo CI);
1269+
12631270
// Add nofpclass attribute
12641271
AttrBuilder &addNoFPClassAttr(FPClassTest NoFPClassMask);
12651272

‎llvm/include/llvm/IR/Attributes.td

+3
@@ -183,6 +183,9 @@ def NoCallback : EnumAttr<"nocallback", IntersectAnd, [FnAttr]>;
183183
/// Function creates no aliases of pointer.
184184
def NoCapture : EnumAttr<"nocapture", IntersectAnd, [ParamAttr]>;
185185

186+
/// Specify how the pointer may be captured.
187+
def Captures : IntAttr<"captures", IntersectCustom, [ParamAttr]>;
188+
186189
/// Function is not a source of divergence.
187190
def NoDivergenceSource : EnumAttr<"nodivergencesource", IntersectAnd, [FnAttr]>;
188191

‎llvm/include/llvm/Support/ModRef.h

+101
@@ -273,6 +273,107 @@ raw_ostream &operator<<(raw_ostream &OS, MemoryEffects RMRB);
273273
// Legacy alias.
274274
using FunctionModRefBehavior = MemoryEffects;
275275

276+
/// Components of the pointer that may be captured.
277+
enum class CaptureComponents : uint8_t {
278+
None = 0,
279+
AddressIsNull = (1 << 0),
280+
Address = (1 << 1) | AddressIsNull,
281+
ReadProvenance = (1 << 2),
282+
Provenance = (1 << 3) | ReadProvenance,
283+
All = Address | Provenance,
284+
LLVM_MARK_AS_BITMASK_ENUM(Provenance),
285+
};
286+
287+
inline bool capturesNothing(CaptureComponents CC) {
288+
return CC == CaptureComponents::None;
289+
}
290+
291+
inline bool capturesAnything(CaptureComponents CC) {
292+
return CC != CaptureComponents::None;
293+
}
294+
295+
inline bool capturesAddressIsNullOnly(CaptureComponents CC) {
296+
return (CC & CaptureComponents::Address) == CaptureComponents::AddressIsNull;
297+
}
298+
299+
inline bool capturesAddress(CaptureComponents CC) {
300+
return (CC & CaptureComponents::Address) != CaptureComponents::None;
301+
}
302+
303+
inline bool capturesReadProvenanceOnly(CaptureComponents CC) {
304+
return (CC & CaptureComponents::Provenance) ==
305+
CaptureComponents::ReadProvenance;
306+
}
307+
308+
inline bool capturesFullProvenance(CaptureComponents CC) {
309+
return (CC & CaptureComponents::Provenance) == CaptureComponents::Provenance;
310+
}
311+
312+
raw_ostream &operator<<(raw_ostream &OS, CaptureComponents CC);
313+
314+
/// Represents which components of the pointer may be captured in which
315+
/// location. This represents the captures(...) attribute in IR.
316+
///
317+
/// For more information on the precise semantics see LangRef.
318+
class CaptureInfo {
319+
CaptureComponents OtherComponents;
320+
CaptureComponents RetComponents;
321+
322+
public:
323+
CaptureInfo(CaptureComponents OtherComponents,
324+
CaptureComponents RetComponents)
325+
: OtherComponents(OtherComponents), RetComponents(RetComponents) {}
326+
327+
CaptureInfo(CaptureComponents Components)
328+
: OtherComponents(Components), RetComponents(Components) {}
329+
330+
/// Create CaptureInfo that may capture all components of the pointer.
331+
static CaptureInfo all() { return CaptureInfo(CaptureComponents::All); }
332+
333+
/// Get components potentially captured by the return value.
334+
CaptureComponents getRetComponents() const { return RetComponents; }
335+
336+
/// Get components potentially captured through locations other than the
337+
/// return value.
338+
CaptureComponents getOtherComponents() const { return OtherComponents; }
339+
340+
/// Get the potentially captured components of the pointer (regardless of
341+
/// location).
342+
operator CaptureComponents() const { return OtherComponents | RetComponents; }
343+
344+
bool operator==(CaptureInfo Other) const {
345+
return OtherComponents == Other.OtherComponents &&
346+
RetComponents == Other.RetComponents;
347+
}
348+
349+
bool operator!=(CaptureInfo Other) const { return !(*this == Other); }
350+
351+
/// Compute union of CaptureInfos.
352+
CaptureInfo operator|(CaptureInfo Other) const {
353+
return CaptureInfo(OtherComponents | Other.OtherComponents,
354+
RetComponents | Other.RetComponents);
355+
}
356+
357+
/// Compute intersection of CaptureInfos.
358+
CaptureInfo operator&(CaptureInfo Other) const {
359+
return CaptureInfo(OtherComponents & Other.OtherComponents,
360+
RetComponents & Other.RetComponents);
361+
}
362+
363+
static CaptureInfo createFromIntValue(uint32_t Data) {
364+
return CaptureInfo(CaptureComponents(Data >> 4),
365+
CaptureComponents(Data & 0xf));
366+
}
367+
368+
/// Convert CaptureInfo into an encoded integer value (used by captures
369+
/// attribute).
370+
uint32_t toIntValue() const {
371+
return (uint32_t(OtherComponents) << 4) | uint32_t(RetComponents);
372+
}
373+
};
374+
375+
raw_ostream &operator<<(raw_ostream &OS, CaptureInfo Info);
376+
276377
} // namespace llvm
277378

278379
#endif
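
The integer encoding used by `toIntValue()` and `createFromIntValue()` above packs the two component sets into one byte: the "other" components go in the high nibble and the `ret` components in the low nibble. The following standalone sketch reproduces that packing without LLVM headers. The namespace `capture_info_sketch`, the `Info` struct, and the named example values are hypothetical. The mapping from attribute spellings to component pairs in the comments is an assumption: it takes unqualified components to apply to both locations, as the single-argument `CaptureInfo` constructor suggests, and is not taken from the parser.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone mirror of CaptureInfo's integer encoding.
namespace capture_info_sketch {

constexpr uint8_t None = 0;
constexpr uint8_t AddressIsNull = 1 << 0;
constexpr uint8_t Address = (1 << 1) | AddressIsNull;
constexpr uint8_t ReadProvenance = 1 << 2;
constexpr uint8_t Provenance = (1 << 3) | ReadProvenance;
constexpr uint8_t All = Address | Provenance;

struct Info {
  uint8_t Other; // captured through locations other than the return value
  uint8_t Ret;   // captured through the return value

  // Same packing as toIntValue() in the patch: Other high, Ret low.
  constexpr uint32_t toIntValue() const {
    return (uint32_t(Other) << 4) | uint32_t(Ret);
  }

  // Inverse of toIntValue(), as in createFromIntValue().
  static constexpr Info fromIntValue(uint32_t Data) {
    return Info{uint8_t(Data >> 4), uint8_t(Data & 0xf)};
  }

  // Components captured regardless of location (union of both sets).
  constexpr uint8_t anyLocation() const { return uint8_t(Other | Ret); }
};

constexpr bool operator==(Info A, Info B) {
  return A.Other == B.Other && A.Ret == B.Ret;
}

// Assumed mapping for captures(ret: address, provenance).
constexpr Info ReturnOnly{None, All};
// Assumed mapping for captures(address_is_null, ret: address, provenance).
constexpr Info NullElsewhere{AddressIsNull, All};
// Assumed mapping for captures(address).
constexpr Info AddressOnly{Address, Address};

static_assert(ReturnOnly.toIntValue() == 0x0f, "ret nibble only");
static_assert(NullElsewhere.toIntValue() == 0x1f, "null bit in high nibble");
static_assert(AddressOnly.toIntValue() == 0x33, "same set in both nibbles");

// Encoding followed by decoding yields the original pair.
inline void checkRoundTrip() {
  assert(Info::fromIntValue(ReturnOnly.toIntValue()) == ReturnOnly);
  assert(Info::fromIntValue(NullElsewhere.toIntValue()) == NullElsewhere);
  assert(Info::fromIntValue(AddressOnly.toIntValue()) == AddressOnly);
}

} // namespace capture_info_sketch
```

Because each component set currently needs only four bits, the two nibbles do not overlap. Adding a fifth component bit would require widening the shift and mask used here.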

‎llvm/lib/AsmParser/LLLexer.cpp

+4
@@ -704,6 +704,10 @@ lltok::Kind LLLexer::LexIdentifier() {
704704
KEYWORD(argmemonly);
705705
KEYWORD(inaccessiblememonly);
706706
KEYWORD(inaccessiblemem_or_argmemonly);
707+
KEYWORD(address_is_null);
708+
KEYWORD(address);
709+
KEYWORD(provenance);
710+
KEYWORD(read_provenance);
707711

708712
// nofpclass attribute
709713
KEYWORD(all);
