Experiment with a hybrid bitfield + range encoding for Span / DefId. #53560
Labels:
- A-incr-comp (Area: Incremental compilation)
- C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
- I-compilemem (Issue: Problems and improvements with respect to memory usage during compilation.)
- T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
Roughly, if you have a "container" (file/crate/etc.) and sequential indices in it, you can:
- use `(container_index, intra_container_index)` pairs (but that takes 2x space)
- map every container into one global index space, which needs a binary search to find the container again (`Span` does this currently, where the files are effectively "concatenated")

An improvement on all of those is to choose an arbitrary chunk size (e.g. 2^17 = 128kB for files), and then split each container into a number of chunks (ideally just 1 in the common case). You can then use bitfields for `(chunk, intra_chunk_index)` (e.g. 15 and 17 bits of `u32`).

The difference is that to translate `chunk` to `container`, we don't need to use binary search: the number of chunks is several orders of magnitude smaller than the index space as a whole, so we can use arrays. That is, `chunk -> container` can be an array, and if there is per-container data that would be accessed through `chunk`, we can optimize that by building a `chunk -> Rc<ContainerData>` array.

Translating `intra_chunk_index` to `intra_container_index` is similarly easy: if you can look up per-container data, you can subtract its overall start (if each container is a contiguous range of chunks).

Another reason this might be useful is translating (a unified) `DefId` or `Span` between crates or between incremental (re)compilation sessions: we can have a bitset of changed chunks. If a chunk is unchanged, the index is identical; otherwise we can do an intra-chunk/container binary search over changed ranges (or just use a map of changes).

We can grow the number of indices within the last chunk of a container, and if we run out of space, we can relocate the container's chunks without significant cost. Alternatively, another tradeoff we can make is to let a container's chunks be fragmented (non-contiguous).
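A minimal Rust sketch of the encoding described above, assuming the 15/17-bit split of a `u32` and containers laid out on contiguous runs of chunks. `ChunkedIndex`, `ChunkTable`, `ContainerData` and `remap` are hypothetical names for illustration, not rustc types; the bitset of changed chunks is assumed to be a `&[u64]` of words, and the "map of changes" is modeled as a plain `HashMap` from old to new packed index.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical split: low 17 bits = intra-chunk index, high 15 bits = chunk.
const CHUNK_BITS: u32 = 17;
const CHUNK_SIZE: u32 = 1 << CHUNK_BITS; // 2^17
const MAX_CHUNKS: u32 = 1 << (32 - CHUNK_BITS); // 2^15

/// Hypothetical per-container data (e.g. a file or a crate).
#[derive(Debug)]
struct ContainerData {
    id: u32,
    first_chunk: u32, // each container occupies a contiguous run of chunks
    len: u32,
}

/// A `(chunk, intra_chunk_index)` pair packed into one `u32`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ChunkedIndex(u32);

impl ChunkedIndex {
    fn new(chunk: u32, intra_chunk_index: u32) -> Self {
        assert!(chunk < MAX_CHUNKS && intra_chunk_index < CHUNK_SIZE);
        ChunkedIndex((chunk << CHUNK_BITS) | intra_chunk_index)
    }

    fn chunk(self) -> u32 {
        self.0 >> CHUNK_BITS
    }
}

/// The `chunk -> Rc<ContainerData>` array: lookup is plain indexing, no binary search.
struct ChunkTable {
    chunk_to_container: Vec<Rc<ContainerData>>,
}

impl ChunkTable {
    /// Lays containers of the given lengths out on consecutive chunks (at least one each).
    fn build(lens: &[u32]) -> Self {
        let mut chunk_to_container: Vec<Rc<ContainerData>> = Vec::new();
        for (id, &len) in lens.iter().enumerate() {
            let n_chunks = (len / CHUNK_SIZE + u32::from(len % CHUNK_SIZE != 0)).max(1);
            let first_chunk = chunk_to_container.len() as u32;
            let data = Rc::new(ContainerData { id: id as u32, first_chunk, len });
            for _ in 0..n_chunks {
                chunk_to_container.push(Rc::clone(&data));
            }
        }
        assert!(chunk_to_container.len() as u32 <= MAX_CHUNKS, "index space exhausted");
        ChunkTable { chunk_to_container }
    }

    /// Translates a packed index into `(container id, intra_container_index)`.
    fn translate(&self, idx: ChunkedIndex) -> (u32, u32) {
        let data = &self.chunk_to_container[idx.chunk() as usize];
        // Subtract the container's overall start; valid only because its chunks are contiguous.
        let intra = idx.0 - (data.first_chunk << CHUNK_BITS);
        debug_assert!(intra <= data.len);
        (data.id, intra)
    }
}

/// Maps an index from a previous session: indices in unchanged chunks are identical,
/// indices in changed chunks go through the map of changes (`None` if removed).
fn remap(idx: ChunkedIndex, changed_chunks: &[u64], changes: &HashMap<u32, u32>) -> Option<ChunkedIndex> {
    let c = idx.chunk() as usize;
    let changed = changed_chunks.get(c / 64).map_or(false, |&w| (w >> (c % 64)) & 1 == 1);
    if !changed {
        return Some(idx);
    }
    changes.get(&idx.0).copied().map(ChunkedIndex)
}
```

The `chunk -> container` step in `translate` is a single array index, which is what keeping the chunk count small buys; growing a container's last chunk and relocating chunks are left out of this sketch.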
The first step in experimenting with this would have to be to take `Span` and round up the start/end of each file's range to a multiple of a power of 2 (e.g. 2^17, though an optimal value would require gathering some real-world file-size statistics). This way we can see whether there's a negative performance impact from having unused gaps in the index space; everything else should be an improvement.
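A simplified model of that experiment, assuming the suggested 2^17 alignment: rounding each file's start up to a chunk boundary makes a `pos >> 17` table lookup give the same answer as a binary search over file starts. `PaddedSourceMap`, `align_up` and the one position reserved after each file are hypothetical choices for this sketch, not rustc's `SourceMap` API; `wasted` counts the unused gaps whose cost the experiment is meant to measure.

```rust
// Hypothetical alignment: one chunk = 2^17 positions of index space.
const ALIGN_BITS: u32 = 17;
const ALIGN: u32 = 1 << ALIGN_BITS;

/// Rounds `pos` up to the next multiple of `ALIGN` (valid because `ALIGN` is a power of 2).
fn align_up(pos: u32) -> u32 {
    (pos + (ALIGN - 1)) & !(ALIGN - 1)
}

/// Simplified stand-in for a source map with padded file ranges.
struct PaddedSourceMap {
    starts: Vec<u32>,        // start position of each file, always a multiple of ALIGN
    file_by_chunk: Vec<u32>, // chunk -> file index
    wasted: u32,             // index space not covered by file contents
}

impl PaddedSourceMap {
    fn new(lens: &[u32]) -> Self {
        let mut map = PaddedSourceMap { starts: Vec::new(), file_by_chunk: Vec::new(), wasted: 0 };
        let mut next = 0;
        for (file, &len) in lens.iter().enumerate() {
            let start = next;
            // `+ 1` keeps the end position (one past the last byte) inside this file's
            // chunks and gives empty files a chunk of their own.
            next = align_up(start + len + 1);
            map.wasted += next - (start + len);
            map.starts.push(start);
            for _ in (start >> ALIGN_BITS)..(next >> ALIGN_BITS) {
                map.file_by_chunk.push(file as u32);
            }
        }
        map
    }

    /// Binary search over file starts (works with or without padding).
    fn lookup_binary(&self, pos: u32) -> usize {
        self.starts.partition_point(|&s| s <= pos) - 1
    }

    /// Padded layout only: shift the position down to its chunk and index the table.
    fn lookup_chunk(&self, pos: u32) -> usize {
        self.file_by_chunk[(pos >> ALIGN_BITS) as usize] as usize
    }
}
```

With a 2^17 alignment, a `u32` position space has room for only 2^15 chunks, so every small file costs a whole chunk; that is why the alignment should be picked from real-world file-size statistics.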
We can also try to replace the binary searches used to find the `SourceFile` a `Span` is from.

cc @nikomatsakis @michaelwoerister