[memprof] Use linear IDs for Frames and call stacks #93740
Conversation
With this patch, we stop using on-disk hash tables for Frames and call stacks. Instead, we'll write out all the Frames as a flat array while maintaining mappings from FrameIds to the indexes into the array. Then we serialize call stacks in terms of those indexes.

Likewise, we'll write out all the call stacks as another flat array while maintaining mappings from CallStackIds to the indexes into the call stack array. One minor difference from Frames is that the indexes into the call stack array are not contiguous because call stacks are variable-length objects. Then we serialize IndexedMemProfRecords in terms of the indexes into the call stack array.

Now, we describe each call stack with 32-bit indexes into the Frame array (as opposed to the 64-bit FrameIds in Version 2). The use of the smaller type cuts down the profile file size by about 40% relative to Version 2. The departure from the on-disk hash tables contributes a little bit to the savings, too.

For now, IndexedMemProfRecords refer to call stacks with 64-bit indexes into the call stack array. As a follow-up, I'll change that to uint32_t, including necessary updates to RecordWriterTrait.
@llvm/pr-subscribers-pgo

Author: Kazu Hirata (kazutakahirata)

Full diff: https://github.com/llvm/llvm-project/pull/93740.diff — 5 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index 8d475fb048624..1c3d4c795f631 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -659,6 +659,10 @@ class IndexedMemProfReader {
std::unique_ptr<MemProfFrameHashTable> MemProfFrameTable;
/// MemProf call stack data on-disk indexed via call stack id.
std::unique_ptr<MemProfCallStackHashTable> MemProfCallStackTable;
+ /// The starting address of the frame array.
+ const unsigned char *FrameBase = nullptr;
+ /// The starting address of the call stack array.
+ const unsigned char *CallStackBase = nullptr;
Error deserializeV012(const unsigned char *Start, const unsigned char *Ptr,
uint64_t FirstWord, memprof::IndexedVersion Version);
diff --git a/llvm/include/llvm/ProfileData/MemProf.h b/llvm/include/llvm/ProfileData/MemProf.h
index d44a2d1e2fb11..34b295f792a22 100644
--- a/llvm/include/llvm/ProfileData/MemProf.h
+++ b/llvm/include/llvm/ProfileData/MemProf.h
@@ -418,7 +418,9 @@ struct IndexedMemProfRecord {
// Serializes the memprof records in \p Records to the ostream \p OS based
// on the schema provided in \p Schema.
void serialize(const MemProfSchema &Schema, raw_ostream &OS,
- IndexedVersion Version);
+ IndexedVersion Version,
+ llvm::DenseMap<memprof::CallStackId, uint32_t>
+ *MemProfCallStackIndexes = nullptr);
// Deserializes memprof records from the Buffer.
static IndexedMemProfRecord deserialize(const MemProfSchema &Schema,
@@ -557,11 +559,17 @@ class RecordWriterTrait {
// The MemProf version to use for the serialization.
IndexedVersion Version;
+ // Mappings from CallStackId to the indexes into the call stack array.
+ llvm::DenseMap<memprof::CallStackId, uint32_t> *MemProfCallStackIndexes;
+
public:
// We do not support the default constructor, which does not set Version.
RecordWriterTrait() = delete;
- RecordWriterTrait(const MemProfSchema *Schema, IndexedVersion V)
- : Schema(Schema), Version(V) {}
+ RecordWriterTrait(
+ const MemProfSchema *Schema, IndexedVersion V,
+ llvm::DenseMap<memprof::CallStackId, uint32_t> *MemProfCallStackIndexes)
+ : Schema(Schema), Version(V),
+ MemProfCallStackIndexes(MemProfCallStackIndexes) {}
static hash_value_type ComputeHash(key_type_ref K) { return K; }
@@ -586,7 +594,7 @@ class RecordWriterTrait {
void EmitData(raw_ostream &Out, key_type_ref /*Unused*/, data_type_ref V,
offset_type /*Unused*/) {
assert(Schema != nullptr && "MemProf schema is not initialized!");
- V.serialize(*Schema, Out, Version);
+ V.serialize(*Schema, Out, Version, MemProfCallStackIndexes);
// Clear the IndexedMemProfRecord which results in clearing/freeing its
// vectors of allocs and callsites. This is owned by the associated on-disk
// hash table, but unused after this point. See also the comment added to
@@ -835,6 +843,50 @@ template <typename MapTy> struct CallStackIdConverter {
}
};
+// A function object that returns a Frame stored at a given index into the Frame
+// array in the profile.
+struct LinearFrameIdConverter {
+ const unsigned char *FrameBase;
+
+ LinearFrameIdConverter() = delete;
+ LinearFrameIdConverter(const unsigned char *FrameBase)
+ : FrameBase(FrameBase) {}
+
+ Frame operator()(uint32_t LinearId) {
+ uint64_t Offset = static_cast<uint64_t>(LinearId) * Frame::serializedSize();
+ return Frame::deserialize(FrameBase + Offset);
+ }
+};
+
+// A function object that returns a call stack stored at a given index into the
+// call stack array in the profile.
+struct LinearCallStackIdConverter {
+ const unsigned char *CallStackBase;
+ std::function<Frame(uint32_t)> FrameIdToFrame;
+
+ LinearCallStackIdConverter() = delete;
+ LinearCallStackIdConverter(const unsigned char *CallStackBase,
+ std::function<Frame(uint32_t)> FrameIdToFrame)
+ : CallStackBase(CallStackBase), FrameIdToFrame(FrameIdToFrame) {}
+
+ llvm::SmallVector<Frame> operator()(uint32_t LinearCSId) {
+ llvm::SmallVector<Frame> Frames;
+
+ const unsigned char *Ptr =
+ CallStackBase + static_cast<uint64_t>(LinearCSId) * sizeof(uint32_t);
+ uint32_t NumFrames =
+ support::endian::readNext<uint32_t, llvm::endianness::little>(Ptr);
+ Frames.reserve(NumFrames);
+ for (; NumFrames; --NumFrames) {
+ uint32_t Elem =
+ support::endian::readNext<uint32_t, llvm::endianness::little>(Ptr);
+ Frames.push_back(FrameIdToFrame(Elem));
+ }
+
+ return Frames;
+ }
+};
+
struct IndexedMemProfData {
// A map to hold memprof data per function. The lower 64 bits obtained from
// the md5 hash of the function name is used to index into the map.
diff --git a/llvm/lib/ProfileData/InstrProfReader.cpp b/llvm/lib/ProfileData/InstrProfReader.cpp
index 1b36ca1a733a2..1e66716294689 100644
--- a/llvm/lib/ProfileData/InstrProfReader.cpp
+++ b/llvm/lib/ProfileData/InstrProfReader.cpp
@@ -1261,16 +1261,10 @@ Error IndexedMemProfReader::deserializeV012(const unsigned char *Start,
Error IndexedMemProfReader::deserializeV3(const unsigned char *Start,
const unsigned char *Ptr,
memprof::IndexedVersion Version) {
- // The value returned from FrameTableGenerator.Emit.
- const uint64_t FrameTableOffset =
- support::endian::readNext<uint64_t, llvm::endianness::little>(Ptr);
// The offset in the stream right before invoking
// CallStackTableGenerator.Emit.
const uint64_t CallStackPayloadOffset =
support::endian::readNext<uint64_t, llvm::endianness::little>(Ptr);
- // The value returned from CallStackTableGenerator.Emit.
- const uint64_t CallStackTableOffset =
- support::endian::readNext<uint64_t, llvm::endianness::little>(Ptr);
// The offset in the stream right before invoking RecordTableGenerator.Emit.
const uint64_t RecordPayloadOffset =
support::endian::readNext<uint64_t, llvm::endianness::little>(Ptr);
@@ -1284,16 +1278,8 @@ Error IndexedMemProfReader::deserializeV3(const unsigned char *Start,
return SchemaOr.takeError();
Schema = SchemaOr.get();
- // Initialize the frame table reader with the payload and bucket offsets.
- MemProfFrameTable.reset(MemProfFrameHashTable::Create(
- /*Buckets=*/Start + FrameTableOffset,
- /*Payload=*/Ptr,
- /*Base=*/Start));
-
- MemProfCallStackTable.reset(MemProfCallStackHashTable::Create(
- /*Buckets=*/Start + CallStackTableOffset,
- /*Payload=*/Start + CallStackPayloadOffset,
- /*Base=*/Start));
+ FrameBase = Ptr;
+ CallStackBase = Start + CallStackPayloadOffset;
// Now initialize the table reader with a pointer into data buffer.
MemProfRecordTable.reset(MemProfRecordHashTable::Create(
@@ -1605,6 +1591,16 @@ getMemProfRecordV2(const memprof::IndexedMemProfRecord &IndexedRecord,
return Record;
}
+static Expected<memprof::MemProfRecord>
+getMemProfRecordV3(const memprof::IndexedMemProfRecord &IndexedRecord,
+ const unsigned char *FrameBase,
+ const unsigned char *CallStackBase) {
+ memprof::LinearFrameIdConverter FrameIdConv(FrameBase);
+ memprof::LinearCallStackIdConverter CSIdConv(CallStackBase, FrameIdConv);
+ memprof::MemProfRecord Record = IndexedRecord.toMemProfRecord(CSIdConv);
+ return Record;
+}
+
Expected<memprof::MemProfRecord>
IndexedMemProfReader::getMemProfRecord(const uint64_t FuncNameHash) const {
// TODO: Add memprof specific errors.
@@ -1626,11 +1622,17 @@ IndexedMemProfReader::getMemProfRecord(const uint64_t FuncNameHash) const {
"MemProfCallStackTable must not be available");
return getMemProfRecordV0(IndexedRecord, *MemProfFrameTable);
case memprof::Version2:
- case memprof::Version3:
assert(MemProfFrameTable && "MemProfFrameTable must be available");
assert(MemProfCallStackTable && "MemProfCallStackTable must be available");
return getMemProfRecordV2(IndexedRecord, *MemProfFrameTable,
*MemProfCallStackTable);
+ case memprof::Version3:
+ assert(!MemProfFrameTable && "MemProfFrameTable must not be available");
+ assert(!MemProfCallStackTable &&
+ "MemProfCallStackTable must not be available");
+ assert(FrameBase && "FrameBase must be available");
+ assert(CallStackBase && "CallStackBase must be available");
+ return getMemProfRecordV3(IndexedRecord, FrameBase, CallStackBase);
}
return make_error<InstrProfError>(
diff --git a/llvm/lib/ProfileData/InstrProfWriter.cpp b/llvm/lib/ProfileData/InstrProfWriter.cpp
index 7e0c9a159d932..301ecb3b7ceb3 100644
--- a/llvm/lib/ProfileData/InstrProfWriter.cpp
+++ b/llvm/lib/ProfileData/InstrProfWriter.cpp
@@ -57,6 +57,7 @@ class ProfOStream {
uint64_t tell() { return OS.tell(); }
void write(uint64_t V) { LE.write<uint64_t>(V); }
+ void write32(uint32_t V) { LE.write<uint32_t>(V); }
void writeByte(uint8_t V) { LE.write<uint8_t>(V); }
// \c patch can only be called when all data is written and flushed.
@@ -452,8 +453,11 @@ static uint64_t writeMemProfRecords(
ProfOStream &OS,
llvm::MapVector<GlobalValue::GUID, memprof::IndexedMemProfRecord>
&MemProfRecordData,
- memprof::MemProfSchema *Schema, memprof::IndexedVersion Version) {
- memprof::RecordWriterTrait RecordWriter(Schema, Version);
+ memprof::MemProfSchema *Schema, memprof::IndexedVersion Version,
+ llvm::DenseMap<memprof::CallStackId, uint32_t> *MemProfCallStackIndexes =
+ nullptr) {
+ memprof::RecordWriterTrait RecordWriter(Schema, Version,
+ MemProfCallStackIndexes);
OnDiskChainedHashTableGenerator<memprof::RecordWriterTrait>
RecordTableGenerator;
for (auto &[GUID, Record] : MemProfRecordData) {
@@ -485,6 +489,42 @@ static uint64_t writeMemProfFrames(
return FrameTableGenerator.Emit(OS.OS);
}
+// Serialize MemProfFrameData. Return the mapping from FrameIds to their
+// indexes within the frame array.
+static llvm::DenseMap<memprof::FrameId, uint32_t> writeMemProfFrameArray(
+ ProfOStream &OS,
+ llvm::MapVector<memprof::FrameId, memprof::Frame> &MemProfFrameData) {
+ // Mappings from FrameIds to array indexes.
+ llvm::DenseMap<memprof::FrameId, uint32_t> MemProfFrameIndexes;
+
+ // Sort the FrameIDs for stability.
+ std::vector<memprof::FrameId> FrameIdOrder;
+ FrameIdOrder.reserve(MemProfFrameData.size());
+ for (const auto &KV : MemProfFrameData)
+ FrameIdOrder.push_back(KV.first);
+ assert(MemProfFrameData.size() == FrameIdOrder.size());
+ llvm::sort(FrameIdOrder);
+
+ // Serialize all frames while creating mappings from linear IDs to FrameIds.
+ uint64_t Index = 0;
+ MemProfFrameIndexes.reserve(FrameIdOrder.size());
+ for (memprof::FrameId Id : FrameIdOrder) {
+ auto Iter = MemProfFrameData.find(Id);
+ assert(Iter != MemProfFrameData.end());
+ const memprof::Frame &F = Iter->second;
+ F.serialize(OS.OS);
+ MemProfFrameIndexes.insert({Id, Index});
+ ++Index;
+ }
+ assert(MemProfFrameData.size() == Index);
+ assert(MemProfFrameData.size() == MemProfFrameIndexes.size());
+
+ // Release the memory of this MapVector as it is no longer needed.
+ MemProfFrameData.clear();
+
+ return MemProfFrameIndexes;
+}
+
static uint64_t writeMemProfCallStacks(
ProfOStream &OS,
llvm::MapVector<memprof::CallStackId, llvm::SmallVector<memprof::FrameId>>
@@ -499,6 +539,33 @@ static uint64_t writeMemProfCallStacks(
return CallStackTableGenerator.Emit(OS.OS);
}
+static llvm::DenseMap<memprof::CallStackId, uint32_t>
+writeMemProfCallStackArray(
+ ProfOStream &OS,
+ llvm::MapVector<memprof::CallStackId, llvm::SmallVector<memprof::FrameId>>
+ &MemProfCallStackData,
+ llvm::DenseMap<memprof::FrameId, uint32_t> &MemProfFrameIndexes) {
+ llvm::DenseMap<memprof::CallStackId, uint32_t> MemProfCallStackIndexes;
+
+ MemProfCallStackIndexes.reserve(MemProfCallStackData.size());
+ uint64_t CallStackBase = OS.tell();
+ for (const auto &[CSId, CallStack] : MemProfCallStackData) {
+ uint64_t CallStackIndex = (OS.tell() - CallStackBase) / sizeof(uint32_t);
+ MemProfCallStackIndexes.insert({CSId, CallStackIndex});
+ const llvm::SmallVector<memprof::FrameId> CS = CallStack;
+ OS.write32(CS.size());
+ for (const auto F : CS) {
+ assert(MemProfFrameIndexes.contains(F));
+ OS.write32(MemProfFrameIndexes[F]);
+ }
+ }
+
+ // Release the memory of this vector as it is no longer needed.
+ MemProfCallStackData.clear();
+
+ return MemProfCallStackIndexes;
+}
+
// Write out MemProf Version0 as follows:
// uint64_t RecordTableOffset = RecordTableGenerator.Emit
// uint64_t FramePayloadOffset = Offset for the frame payload
@@ -619,9 +686,7 @@ static Error writeMemProfV2(ProfOStream &OS,
// Write out MemProf Version3 as follows:
// uint64_t Version
-// uint64_t FrameTableOffset = FrameTableGenerator.Emit
// uint64_t CallStackPayloadOffset = Offset for the call stack payload
-// uint64_t CallStackTableOffset = CallStackTableGenerator.Emit
// uint64_t RecordPayloadOffset = Offset for the record payload
// uint64_t RecordTableOffset = RecordTableGenerator.Emit
// uint64_t Num schema entries
@@ -637,9 +702,7 @@ static Error writeMemProfV3(ProfOStream &OS,
bool MemProfFullSchema) {
OS.write(memprof::Version3);
uint64_t HeaderUpdatePos = OS.tell();
- OS.write(0ULL); // Reserve space for the memprof frame table offset.
OS.write(0ULL); // Reserve space for the memprof call stack payload offset.
- OS.write(0ULL); // Reserve space for the memprof call stack table offset.
OS.write(0ULL); // Reserve space for the memprof record payload offset.
OS.write(0ULL); // Reserve space for the memprof record table offset.
@@ -648,19 +711,23 @@ static Error writeMemProfV3(ProfOStream &OS,
Schema = memprof::getFullSchema();
writeMemProfSchema(OS, Schema);
- uint64_t FrameTableOffset = writeMemProfFrames(OS, MemProfData.FrameData);
+ llvm::DenseMap<memprof::FrameId, uint32_t> MemProfFrameIndexes =
+ writeMemProfFrameArray(OS, MemProfData.FrameData);
uint64_t CallStackPayloadOffset = OS.tell();
- uint64_t CallStackTableOffset =
- writeMemProfCallStacks(OS, MemProfData.CallStackData);
+ llvm::DenseMap<memprof::CallStackId, uint32_t> MemProfCallStackIndexes =
+ writeMemProfCallStackArray(OS, MemProfData.CallStackData,
+ MemProfFrameIndexes);
uint64_t RecordPayloadOffset = OS.tell();
- uint64_t RecordTableOffset = writeMemProfRecords(OS, MemProfData.RecordData,
- &Schema, memprof::Version3);
+ uint64_t RecordTableOffset =
+ writeMemProfRecords(OS, MemProfData.RecordData, &Schema,
+ memprof::Version3, &MemProfCallStackIndexes);
uint64_t Header[] = {
- FrameTableOffset, CallStackPayloadOffset, CallStackTableOffset,
- RecordPayloadOffset, RecordTableOffset,
+ CallStackPayloadOffset,
+ RecordPayloadOffset,
+ RecordTableOffset,
};
OS.patch({{HeaderUpdatePos, Header, std::size(Header)}});
diff --git a/llvm/lib/ProfileData/MemProf.cpp b/llvm/lib/ProfileData/MemProf.cpp
index 2f0e53736c82e..854b35b6924e1 100644
--- a/llvm/lib/ProfileData/MemProf.cpp
+++ b/llvm/lib/ProfileData/MemProf.cpp
@@ -143,17 +143,43 @@ static void serializeV2(const IndexedMemProfRecord &Record,
LE.write<CallStackId>(CSId);
}
-void IndexedMemProfRecord::serialize(const MemProfSchema &Schema,
- raw_ostream &OS, IndexedVersion Version) {
+static void
+serializeV3(const IndexedMemProfRecord &Record, const MemProfSchema &Schema,
+ raw_ostream &OS,
+ llvm::DenseMap<CallStackId, uint32_t> &MemProfCallStackIndexes) {
+ using namespace support;
+
+ endian::Writer LE(OS, llvm::endianness::little);
+
+ LE.write<uint64_t>(Record.AllocSites.size());
+ for (const IndexedAllocationInfo &N : Record.AllocSites) {
+ assert(MemProfCallStackIndexes.contains(N.CSId));
+ LE.write<uint64_t>(MemProfCallStackIndexes[N.CSId]);
+ N.Info.serialize(Schema, OS);
+ }
+
+ // Related contexts.
+ LE.write<uint64_t>(Record.CallSiteIds.size());
+ for (const auto &CSId : Record.CallSiteIds) {
+ assert(MemProfCallStackIndexes.contains(CSId));
+ LE.write<uint64_t>(MemProfCallStackIndexes[CSId]);
+ }
+}
+
+void IndexedMemProfRecord::serialize(
+ const MemProfSchema &Schema, raw_ostream &OS, IndexedVersion Version,
+ llvm::DenseMap<CallStackId, uint32_t> *MemProfCallStackIndexes) {
switch (Version) {
case Version0:
case Version1:
serializeV0(*this, Schema, OS);
return;
case Version2:
- case Version3:
serializeV2(*this, Schema, OS);
return;
+ case Version3:
+ serializeV3(*this, Schema, OS, *MemProfCallStackIndexes);
+ return;
}
llvm_unreachable("unsupported MemProf version");
}
lgtm but I have a suggestion for a possible time efficiency improvement below.
    uint64_t Index = 0;
    MemProfFrameIndexes.reserve(FrameIdOrder.size());
    for (memprof::FrameId Id : FrameIdOrder) {
      auto Iter = MemProfFrameData.find(Id);
If this loop has non-trivial time overhead, another possibility would be to convert FrameIdOrder into a vector of (frame id, frame) pairs and sort it by frame id. That avoids the extra loop of map lookups here.
Good point! Fixed in the latest iteration.
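The reviewer's suggestion can be sketched as follows. This is a hedged illustration, not the actual patch: `std::unordered_map` and a `std::string` payload stand in for LLVM's `MapVector` and `memprof::Frame`, and `assignFrameIndexes` is a hypothetical helper name. The point is that sorting (id, payload) pairs once replaces the sort-ids-then-look-each-up-again pattern.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using FrameId = uint64_t;

// Assign stable linear indexes to frames by copying (FrameId, payload)
// pairs into a vector and sorting once by id. This avoids a second pass
// of per-id map lookups after sorting the ids alone.
std::unordered_map<FrameId, uint32_t>
assignFrameIndexes(const std::unordered_map<FrameId, std::string> &Frames) {
  std::vector<std::pair<FrameId, std::string>> Order(Frames.begin(),
                                                     Frames.end());
  std::sort(Order.begin(), Order.end(),
            [](const auto &L, const auto &R) { return L.first < R.first; });

  std::unordered_map<FrameId, uint32_t> Indexes;
  Indexes.reserve(Order.size());
  uint32_t Index = 0;
  for (const auto &[Id, Payload] : Order) {
    (void)Payload; // The real code would serialize the frame payload here.
    Indexes.emplace(Id, Index++);
  }
  return Indexes;
}
```

Since the ids are sorted before indexes are assigned, the resulting mapping is deterministic regardless of the hash map's iteration order, which is what the "sort for stability" comment in the patch is after.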