[IR] Introduce captures attribute #116990

Merged
merged 7 commits into from
Jan 13, 2025
136 changes: 127 additions & 9 deletions llvm/docs/LangRef.rst
@@ -1397,6 +1397,42 @@ Currently, only the following parameter attributes are defined:
function, returning a pointer to allocated storage disjoint from the
storage for any other object accessible to the caller.

``captures(...)``
This attribute restricts the ways in which the callee may capture the
pointer. It is not a valid attribute for return values. It applies only to
the particular copy of the pointer passed in this argument.

The argument of ``captures`` is a list of captured pointer components,
which may be ``none``, or a combination of:

- ``address``: The integral address of the pointer.
- ``address_is_null`` (subset of ``address``): Whether the address is null.
- ``provenance``: The ability to access the pointer for both read and write
after the function returns.
- ``read_provenance`` (subset of ``provenance``): The ability to access the
pointer only for reads after the function returns.

Review comment (Collaborator): If I'm understanding correctly, it's not possible to have something like captures(address_is_null, ret: address, provenance)? Would something like that make sense?

Reply from @nikic (contributor author, Nov 27, 2024): Yes, that's currently not supported. This is just to reduce complexity because I don't think it would buy us much in practice right now. But I could change this to be more memory-like and track the captured components independently for the "return" and "other" locations.

Follow-up from the contributor author: I pushed a change to track locations separately, so captures(address_is_null, ret: address, provenance) is now supported.

Additionally, it is possible to specify that some components are only
captured in certain locations. Currently only the return value (``ret``)
and other (default) locations are supported.

The :ref:`pointer capture section <pointercapture>` discusses these semantics
in more detail.

Some examples of how to use the attribute:

- ``captures(none)``: Pointer not captured.
- ``captures(address, provenance)``: Equivalent to omitting the attribute.
- ``captures(address)``: Address may be captured, but not provenance.
- ``captures(address_is_null)``: Only captures whether the address is null.
- ``captures(address, read_provenance)``: Both address and provenance
captured, but only for read-only access.
- ``captures(ret: address, provenance)``: Pointer captured through return
value only.
- ``captures(address_is_null, ret: address, provenance)``: The whole pointer
is captured through the return value, and additionally whether the pointer
is null is captured in some other way.
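To illustrate how the argument list maps onto the two locations, here is a minimal, hypothetical C++ sketch. It is not ``LLParser::parseCapturesAttr``; it only mirrors the bit values of ``CaptureComponents`` from this PR's ``ModRef.h``. It assumes that components listed before ``ret:`` apply to the other (non-return) locations, that components after ``ret:`` apply to the return value, and that without ``ret:`` the return value gets the same components as the other locations (as the single-argument ``CaptureInfo`` constructor does).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Standalone copy of the CaptureComponents bit layout from this PR's
// ModRef.h (not the LLVM type; no LLVM headers are needed).
enum class CC : uint8_t {
  None = 0,
  AddressIsNull = 1 << 0,
  Address = (1 << 1) | AddressIsNull,
  ReadProvenance = 1 << 2,
  Provenance = (1 << 3) | ReadProvenance,
};

// Hypothetical result: component masks for non-return ("other") locations
// and for the return value, like the two fields of CaptureInfo.
struct ParsedCaptures {
  uint8_t Other;
  uint8_t Ret;
};

// Strip leading/trailing spaces and tabs.
static std::string trim(const std::string &S) {
  size_t B = S.find_first_not_of(" \t");
  if (B == std::string::npos)
    return "";
  size_t E = S.find_last_not_of(" \t");
  return S.substr(B, E - B + 1);
}

// Map one component keyword to its bits; std::nullopt for unknown keywords.
static std::optional<uint8_t> componentBits(const std::string &Kw) {
  if (Kw == "none") return uint8_t(CC::None);
  if (Kw == "address_is_null") return uint8_t(CC::AddressIsNull);
  if (Kw == "address") return uint8_t(CC::Address);
  if (Kw == "read_provenance") return uint8_t(CC::ReadProvenance);
  if (Kw == "provenance") return uint8_t(CC::Provenance);
  return std::nullopt;
}

// Parse the text between the parentheses of captures(...), e.g.
// "address_is_null, ret: address, provenance". Hypothetical helper.
std::optional<ParsedCaptures> parseCapturesArgs(const std::string &Args) {
  uint8_t Other = 0;
  std::optional<uint8_t> Ret; // empty until "ret:" is seen
  std::stringstream SS(Args);
  std::string Tok;
  while (std::getline(SS, Tok, ',')) {
    Tok = trim(Tok);
    if (Tok.rfind("ret:", 0) == 0) { // token starts with "ret:"
      if (Ret)
        return std::nullopt;         // duplicate "ret:" location
      Ret = uint8_t(0);
      Tok = trim(Tok.substr(4));
    }
    std::optional<uint8_t> Bits = componentBits(Tok);
    if (!Bits)
      return std::nullopt;           // unknown component keyword
    if (Ret)
      *Ret |= *Bits;                 // after "ret:": return-value components
    else
      Other |= *Bits;                // before "ret:": other locations
  }
  // Without "ret:", the return value captures the same components.
  return ParsedCaptures{Other, Ret.value_or(Other)};
}
```

For example, ``parseCapturesArgs("address_is_null, ret: address, provenance")`` yields ``Other == 1`` (address_is_null) and ``Ret == 15`` (address and provenance). For brevity, this sketch does not diagnose mixing ``none`` with other components.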

.. _nocapture:

``nocapture``
@@ -3339,10 +3375,92 @@ Pointer Capture
---------------

Given a function call and a pointer that is passed as an argument or stored in
memory before the call, the call may capture two components of the pointer:

* The address of the pointer, which is its integral value. This also includes
parts of the address or any information about the address, including the
fact that it does not equal one specific value. We further distinguish
whether only the fact that the address is/isn't null is captured.
* The provenance of the pointer, which is the ability to perform memory
accesses through the pointer, in the sense of the :ref:`pointer aliasing
rules <pointeraliasing>`. We further distinguish whether only read accesses
are allowed, or both reads and writes.
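The "only null-ness" and "only reads" refinements above are what make a bit encoding work: each weaker component's bits are contained in the stronger component's bits, as in the ``CaptureComponents`` enum added to ``ModRef.h`` in this PR. The following C++ sketch reuses those bit values as plain constants. The function names are hypothetical; the last three follow the logic of the ``capturesAddressIsNullOnly``, ``capturesReadProvenanceOnly`` and ``capturesFullProvenance`` helpers in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Bit values copied from CaptureComponents in this PR's ModRef.h, as plain
// constants so the sketch compiles without LLVM.
constexpr uint8_t AddressIsNull = 1 << 0;                  // 0b0001
constexpr uint8_t Address = (1 << 1) | AddressIsNull;      // 0b0011
constexpr uint8_t ReadProvenance = 1 << 2;                 // 0b0100
constexpr uint8_t Provenance = (1 << 3) | ReadProvenance;  // 0b1100
constexpr uint8_t All = Address | Provenance;              // 0b1111

// Each weaker component is a bit-subset of the stronger one, so "A captures
// no more than B" is a subset test on the masks.
constexpr bool capturesAtMost(uint8_t A, uint8_t B) { return (A & ~B) == 0; }

// Only the null-ness of the address is captured (no other address bits).
constexpr bool addressIsNullOnly(uint8_t C) {
  return (C & Address) == AddressIsNull;
}

// Provenance is captured, but only for reads.
constexpr bool readProvenanceOnly(uint8_t C) {
  return (C & Provenance) == ReadProvenance;
}

// Provenance is captured for both reads and writes.
constexpr bool fullProvenance(uint8_t C) {
  return (C & Provenance) == Provenance;
}

static_assert(capturesAtMost(AddressIsNull, Address), "subset of address");
static_assert(capturesAtMost(ReadProvenance, Provenance), "subset of provenance");
static_assert(!capturesAtMost(Provenance, ReadProvenance), "not a subset");
```

With this layout, no separate "implies" table is needed: for instance, any mask that includes full provenance automatically includes read provenance.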

For example, the following function captures the address of ``%a``, because
it is compared against another pointer, leaking information about the identity
of the pointer:

.. code-block:: llvm

@glb = global i8 0

define i1 @f(ptr %a) {
%c = icmp eq ptr %a, @glb
ret i1 %c
}

The function does not capture the provenance of the pointer, because the
``icmp`` instruction only operates on the pointer address. The following
function captures both the address and provenance of the pointer, as both
may be read from ``@glb`` after the function returns:

.. code-block:: llvm

@glb = global ptr null

define void @f(ptr %a) {
store ptr %a, ptr @glb
ret void
}

The following function captures *neither* the address nor the provenance of
the pointer:

.. code-block:: llvm

define i32 @f(ptr %a) {
%v = load i32, ptr %a
ret i32 %v
}

While address capture includes uses of the address within the body of the
function, provenance capture refers exclusively to the ability to perform
accesses *after* the function returns. Memory accesses within the function
itself are not considered pointer captures.

Review comment (Member): Does this mean ptrtoint unconditionally captures provenance? I'd like to be able to avoid this for CHERI where it would only be an address capture.

But I guess we avoid this by making sure to always use our local llvm.cheri.cap.address.get intrinsic instead of ptrtoint (although ptrtoint is more helpful for optimizations/known bits).

Reply (contributor author): Currently LLVM considers ptrtoint to always capture provenance, but I do plan to introduce a variant of it that only captures the address. This is also needed to better model strict provenance in Rust, and should also be helpful to represent pointer differences (though those might benefit from a separate intrinsic, really).

We can further say that the capture only occurs through a specific location.
In the following example, the pointer (both address and provenance) is captured
through the return value only:

.. code-block:: llvm

define ptr @f(ptr %a) {
%gep = getelementptr i8, ptr %a, i64 4
ret ptr %gep
}

However, we always consider direct inspection of the pointer address
(e.g. using ``ptrtoint``) to be location-independent. The following example
is *not* considered a return-only capture, even though the ``ptrtoint``
ultimately only contributes to the return value:

.. code-block:: llvm

@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]

define ptr @f(ptr %a) {
%a.addr = ptrtoint ptr %a to i64
%mask = and i64 %a.addr, 3
%gep = getelementptr i8, ptr @lookup, i64 %mask
ret ptr %gep
}

This definition is chosen to allow capture analysis to continue with the return
value in the usual fashion.

The following describes possible ways to capture a pointer in more detail,
where unqualified uses of the word "capture" refer to capturing both address
and provenance.

1. The call stores any bit of the pointer carrying information into a place,
and the stored bits can be read from the place by the caller after this call
@@ -3381,30 +3499,30 @@ hold:
@lock = global i1 true

define void @f(ptr %a) {
store ptr %a, ptr @glb
store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
store ptr null, ptr @glb
ret void
}

3. The call's behavior depends on any bit of the pointer carrying information
(address capture only).

.. code-block:: llvm

@glb = global i8 0

define void @f(ptr %a) {
%c = icmp eq ptr %a, @glb
br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
BB_EXIT:
call void @exit()
unreachable
BB_CONTINUE:
ret void
}

4. The pointer is used as the pointer operand of a volatile access.

.. _volatile:

1 change: 1 addition & 0 deletions llvm/include/llvm/AsmParser/LLParser.h
@@ -379,6 +379,7 @@ namespace llvm {
bool inAttrGrp, LocTy &BuiltinLoc);
bool parseRangeAttr(AttrBuilder &B);
bool parseInitializesAttr(AttrBuilder &B);
bool parseCapturesAttr(AttrBuilder &B);
bool parseRequiredTypeAttr(AttrBuilder &B, lltok::Kind AttrToken,
Attribute::AttrKind AttrKind);

6 changes: 6 additions & 0 deletions llvm/include/llvm/AsmParser/LLToken.h
@@ -207,6 +207,12 @@ enum Kind {
kw_inaccessiblememonly,
kw_inaccessiblemem_or_argmemonly,

// Captures attribute:
kw_address,
kw_address_is_null,
kw_provenance,
kw_read_provenance,

// nofpclass attribute:
kw_all,
kw_nan,
1 change: 1 addition & 0 deletions llvm/include/llvm/Bitcode/LLVMBitCodes.h
@@ -788,6 +788,7 @@ enum AttributeKindCodes {
ATTR_KIND_NO_EXT = 99,
ATTR_KIND_NO_DIVERGENCE_SOURCE = 100,
ATTR_KIND_SANITIZE_TYPE = 101,
ATTR_KIND_CAPTURES = 102,
};

enum ComdatSelectionKindCodes {
7 changes: 7 additions & 0 deletions llvm/include/llvm/IR/Attributes.h
@@ -284,6 +284,9 @@ class Attribute {
/// Returns memory effects.
MemoryEffects getMemoryEffects() const;

/// Returns information from captures attribute.
CaptureInfo getCaptureInfo() const;

/// Return the FPClassTest for nofpclass
FPClassTest getNoFPClass() const;

@@ -436,6 +439,7 @@ class AttributeSet {
UWTableKind getUWTableKind() const;
AllocFnKind getAllocKind() const;
MemoryEffects getMemoryEffects() const;
CaptureInfo getCaptureInfo() const;
FPClassTest getNoFPClass() const;
std::string getAsString(bool InAttrGrp = false) const;

@@ -1260,6 +1264,9 @@ class AttrBuilder {
/// Add memory effect attribute.
AttrBuilder &addMemoryAttr(MemoryEffects ME);

/// Add captures attribute.
AttrBuilder &addCapturesAttr(CaptureInfo CI);

// Add nofpclass attribute
AttrBuilder &addNoFPClassAttr(FPClassTest NoFPClassMask);

3 changes: 3 additions & 0 deletions llvm/include/llvm/IR/Attributes.td
@@ -183,6 +183,9 @@ def NoCallback : EnumAttr<"nocallback", IntersectAnd, [FnAttr]>;
/// Function creates no aliases of pointer.
def NoCapture : EnumAttr<"nocapture", IntersectAnd, [ParamAttr]>;

/// Specify how the pointer may be captured.
def Captures : IntAttr<"captures", IntersectCustom, [ParamAttr]>;

/// Function is not a source of divergence.
def NoDivergenceSource : EnumAttr<"nodivergencesource", IntersectAnd, [FnAttr]>;

101 changes: 101 additions & 0 deletions llvm/include/llvm/Support/ModRef.h
@@ -273,6 +273,107 @@ raw_ostream &operator<<(raw_ostream &OS, MemoryEffects RMRB);
// Legacy alias.
using FunctionModRefBehavior = MemoryEffects;

/// Components of the pointer that may be captured.
enum class CaptureComponents : uint8_t {
None = 0,
AddressIsNull = (1 << 0),
Address = (1 << 1) | AddressIsNull,
ReadProvenance = (1 << 2),
Provenance = (1 << 3) | ReadProvenance,
All = Address | Provenance,
LLVM_MARK_AS_BITMASK_ENUM(Provenance),
};

inline bool capturesNothing(CaptureComponents CC) {
return CC == CaptureComponents::None;
}

Review comment (Member): I'm not sure what the following commits look like but maybe it would be better for IDE code completion to have a struct wrapper with these as member functions?

Reply (contributor author): I haven't worked on the following commits yet, so I expect the API here to change a good bit once I get around to the inference + CaptureTracking changes.

I could wrap this in a struct, but then I'd have to reimplement the functionality that BitMaskEnum provides :(

inline bool capturesAnything(CaptureComponents CC) {
return CC != CaptureComponents::None;
}

inline bool capturesAddressIsNullOnly(CaptureComponents CC) {
return (CC & CaptureComponents::Address) == CaptureComponents::AddressIsNull;
}

inline bool capturesAddress(CaptureComponents CC) {
return (CC & CaptureComponents::Address) != CaptureComponents::None;
}

inline bool capturesReadProvenanceOnly(CaptureComponents CC) {
return (CC & CaptureComponents::Provenance) ==
CaptureComponents::ReadProvenance;
}

inline bool capturesFullProvenance(CaptureComponents CC) {
return (CC & CaptureComponents::Provenance) == CaptureComponents::Provenance;
}

raw_ostream &operator<<(raw_ostream &OS, CaptureComponents CC);

/// Represents which components of the pointer may be captured in which
/// location. This represents the captures(...) attribute in IR.
///
/// For more information on the precise semantics see LangRef.
class CaptureInfo {
CaptureComponents OtherComponents;
CaptureComponents RetComponents;

public:
CaptureInfo(CaptureComponents OtherComponents,
CaptureComponents RetComponents)
: OtherComponents(OtherComponents), RetComponents(RetComponents) {}

CaptureInfo(CaptureComponents Components)
: OtherComponents(Components), RetComponents(Components) {}

/// Create CaptureInfo that may capture all components of the pointer.
static CaptureInfo all() { return CaptureInfo(CaptureComponents::All); }

/// Get components potentially captured by the return value.
CaptureComponents getRetComponents() const { return RetComponents; }

/// Get components potentially captured through locations other than the
/// return value.
CaptureComponents getOtherComponents() const { return OtherComponents; }

/// Get the potentially captured components of the pointer (regardless of
/// location).
operator CaptureComponents() const { return OtherComponents | RetComponents; }

bool operator==(CaptureInfo Other) const {
return OtherComponents == Other.OtherComponents &&
RetComponents == Other.RetComponents;
}

bool operator!=(CaptureInfo Other) const { return !(*this == Other); }

/// Compute union of CaptureInfos.
CaptureInfo operator|(CaptureInfo Other) const {
return CaptureInfo(OtherComponents | Other.OtherComponents,
RetComponents | Other.RetComponents);
}

/// Compute intersection of CaptureInfos.
CaptureInfo operator&(CaptureInfo Other) const {
return CaptureInfo(OtherComponents & Other.OtherComponents,
RetComponents & Other.RetComponents);
}

static CaptureInfo createFromIntValue(uint32_t Data) {
return CaptureInfo(CaptureComponents(Data >> 4),
CaptureComponents(Data & 0xf));
}

/// Convert CaptureInfo into an encoded integer value (used by captures
/// attribute).
uint32_t toIntValue() const {
return (uint32_t(OtherComponents) << 4) | uint32_t(RetComponents);
}
};

raw_ostream &operator<<(raw_ostream &OS, CaptureInfo Info);

} // namespace llvm

#endif
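The packing used by ``CaptureInfo::toIntValue()`` and ``CaptureInfo::createFromIntValue()`` above stores the "other" components in bits 4 to 7 and the return-value components in bits 0 to 3. The following self-contained C++ sketch replicates that layout with a hypothetical ``CaptureInfoSketch`` struct, using raw ``uint8_t`` masks (address_is_null = 1, address = 3, read_provenance = 4, provenance = 12) instead of the LLVM enum, to show the round trip and the per-location union and intersection.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CaptureInfo: two 4-bit component masks.
struct CaptureInfoSketch {
  uint8_t Other; // components captured through non-return locations
  uint8_t Ret;   // components captured through the return value

  // Same packing as CaptureInfo::toIntValue(): Other in bits 4..7,
  // Ret in bits 0..3.
  uint32_t toIntValue() const {
    return (uint32_t(Other) << 4) | uint32_t(Ret);
  }

  // Inverse of toIntValue(), as in CaptureInfo::createFromIntValue().
  static CaptureInfoSketch fromIntValue(uint32_t Data) {
    return {uint8_t(Data >> 4), uint8_t(Data & 0xf)};
  }

  // Components captured in any location (what the implicit conversion
  // to CaptureComponents computes).
  uint8_t anyLocation() const { return Other | Ret; }

  // Union and intersection are computed independently per location.
  CaptureInfoSketch unionWith(CaptureInfoSketch O) const {
    return {uint8_t(Other | O.Other), uint8_t(Ret | O.Ret)};
  }
  CaptureInfoSketch intersectWith(CaptureInfoSketch O) const {
    return {uint8_t(Other & O.Other), uint8_t(Ret & O.Ret)};
  }

  bool operator==(CaptureInfoSketch O) const {
    return Other == O.Other && Ret == O.Ret;
  }
};
```

For instance, ``captures(address_is_null, ret: address, provenance)`` corresponds to ``Other = 1`` and ``Ret = 15`` in this encoding, which packs to ``0x1F``. Keeping the two locations in separate nibbles is what lets union and intersection stay simple bitwise operations.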
4 changes: 4 additions & 0 deletions llvm/lib/AsmParser/LLLexer.cpp
@@ -704,6 +704,10 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(argmemonly);
KEYWORD(inaccessiblememonly);
KEYWORD(inaccessiblemem_or_argmemonly);
KEYWORD(address_is_null);
KEYWORD(address);
KEYWORD(provenance);
KEYWORD(read_provenance);

// nofpclass attribute
KEYWORD(all);