
Compress most of spans to 32 bits #44646

Merged Sep 25, 2017 (1 commit)

Conversation

@petrochenkov (Contributor) commented Sep 17, 2017

@petrochenkov changed the title from "Compress spans" to "Compress most of spans to 32 bits", Sep 17, 2017
@petrochenkov (author) commented Sep 17, 2017

TODO: Measure difference in actual memory consumption, especially on large crates.
82.68% of spans are "inlined" in the compiler and standard library crates, but that's an indirect measure and it needs to be verified on something else.

TODO2: Remove accidental RLS update

@@ -61,7 +60,7 @@ pub enum ProfileQueriesMsg {
    /// end a task
    TaskEnd,
    /// begin a new query
-   QueryBegin(Span, QueryMsg),
+   QueryBegin(QueryMsg),
@petrochenkov (author) Sep 17, 2017

Span is not Send/Sync if it uses the thread-local interner, and profile queries are sent to other threads, so I had to temporarily remove spans from them.
I'll restore this later; queries will have to use SpanData instead of Span (I hoped to keep SpanData private, but it looks like there's no better choice :( ).
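The Send/Sync constraint comes from the interner being thread-local: an interned Span is just an index that is only meaningful on the thread that created it. A minimal sketch of the Span-as-index vs. SpanData-as-plain-data split (invented types, not rustc's actual definitions):

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

// Plain "expanded" representation: safe to pass around by value.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct SpanData {
    pub lo: u32,   // stands in for BytePos
    pub hi: u32,
    pub ctxt: u32, // stands in for SyntaxContext
}

thread_local! {
    // Per-thread interner for spans that don't fit an inline encoding.
    static INTERNER: RefCell<Vec<SpanData>> = RefCell::new(Vec::new());
}

// `Span` is just an index into the thread-local interner. The raw-pointer
// PhantomData opts the type out of Send/Sync, because the index would be
// meaningless on another thread.
#[derive(Clone, Copy)]
pub struct Span {
    idx: u32,
    _not_send_sync: PhantomData<*const ()>,
}

impl Span {
    pub fn intern(data: SpanData) -> Span {
        INTERNER.with(|i| {
            let mut i = i.borrow_mut();
            i.push(data);
            Span { idx: i.len() as u32 - 1, _not_send_sync: PhantomData }
        })
    }

    pub fn data(self) -> SpanData {
        INTERNER.with(|i| i.borrow()[self.idx as usize])
    }
}
```

Anything that crosses a thread boundary (like the profile-query channel) then has to carry SpanData rather than Span, which is why SpanData could not stay private.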

}
_ => unreachable!()
};
SpanData { lo: BytePos(base), hi: BytePos(base + len), ctxt: SyntaxContext(ctxt) }
@petrochenkov (author) Sep 17, 2017

If encoding/decoding with a 2-bit tag looks too error-prone/unmaintainable (or turns out to be too slow), then I can redo this with a 1-bit tag instead; it will reduce the percentage of inlined spans from 82.68% to 80.01% (on rustc/libstd data).
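For illustration, a tag-based encoding might look like the sketch below. The bit layout here is invented (the actual formats and thresholds in the PR differ); the point is only the mechanism: pack base/len/ctxt into 32 bits when they fit, otherwise fall back to the interner.

```rust
// Invented layout: 2-bit tag in the low bits, then base:16, len:8, ctxt:6.
const TAG_MASK: u32 = 0b11;
const TAG_INLINE: u32 = 0b00;

/// Try to pack a span into 32 bits; `None` means "use the interner".
fn try_encode_inline(base: u32, len: u32, ctxt: u32) -> Option<u32> {
    if base < (1 << 16) && len < (1 << 8) && ctxt < (1 << 6) {
        Some(TAG_INLINE | (base << 2) | (len << 18) | (ctxt << 26))
    } else {
        None
    }
}

/// Decode an inline-tagged value back into (base, len, ctxt).
fn decode_inline(enc: u32) -> (u32, u32, u32) {
    debug_assert_eq!(enc & TAG_MASK, TAG_INLINE);
    ((enc >> 2) & 0xFFFF, (enc >> 18) & 0xFF, (enc >> 26) & 0x3F)
}
```

A 1-bit tag halves the number of formats to maintain at the cost of one payload bit, which is the trade-off weighed above.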

}

#[allow(deprecated)]
pub const DUMMY_SP: Span = Span { lo: BytePos(0), hi: BytePos(0), ctxt: NO_EXPANSION };
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
@petrochenkov (author) Sep 17, 2017

@michaelwoerister
I'm not sure whether Span can derive Hash or not (for incremental, etc.).
Is hashing the 32-bit "index" enough for interned spans, or must the actual data be hashed?

@arielb1 (Contributor)

Incremental uses HashStable anyway.

@arielb1 (Contributor)

i.e. this impl:

impl<'a, 'gcx, 'tcx> HashStable<StableHashingContext<'a, 'gcx, 'tcx>> for Span {

@michaelwoerister (Member)

Yes, as @arielb1 says. We keep hashing for incr. comp. separate because it's often a lot more expensive than what we need for a regular hash table, and it often requires additional contextual information. So you don't have to worry about it.

You might want to provide a specialized implementation that hashes just the one 32-bit value instead of the four bytes, though (unless we do that anyway). But it's probably not worth the trouble.

@arielb1 (Contributor) commented Sep 17, 2017

Why are you using a [u8; 4]? Is that to make it dealigned and allow slipping through smaller holes? Can't you use a #[repr(packed)] u32 and gain safety?

@petrochenkov (author)

@arielb1

Why are you using a [u8; 4]? Is that to make it dealigned and allow slipping through smaller holes?

Yes.

Can't you use a #[repr(packed)] u32 and gain safety?

Are reads from packed structs guaranteed to perform unaligned loads correctly?
Does #27060 happen only if a reference is taken, passed somewhere, and then dereferenced?
I wasn't sure, so I avoided packed. If packed works, I'd certainly prefer it to the union trick.
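The [u8; 4] trick referred to above can be sketched like this (illustrative types, not the PR's code): storing the payload as bytes gives the type alignment 1, so an enclosing struct can pack it into holes that a 4-aligned u32 would force padding around.

```rust
// Alignment-1 wrapper around a 32-bit payload.
struct SpanBytes([u8; 4]);

impl SpanBytes {
    fn new(v: u32) -> SpanBytes { SpanBytes(v.to_le_bytes()) }
    fn get(&self) -> u32 { u32::from_le_bytes(self.0) }
}

// With a u32 field the struct is padded out to 8 bytes;
// with [u8; 4] it is exactly 5 bytes.
struct WithU32 { tag: u8, span: u32 }
struct WithBytes { tag: u8, span: SpanBytes }
```

The cost of the byte-array version is that every access goes through an explicit encode/decode step instead of a plain field read.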

@Mark-Simulacrum (Member)

We can run this through perf.rlo's benchmarks when ready, just run a try build and ping me on completion.

@arielb1 (Contributor) commented Sep 17, 2017

Are reads from packed structs guaranteed to perform unaligned loads correctly?
Does #27060 happen only if a reference is taken, passed somewhere, and then dereferenced?

That's right - otherwise packed structs would be quite useless.

And #27060 (which I'm working on fixing right now) only means that some code that should be unsafe is allowed, so if you break it in any way that should be easy to fix.
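In other words, by-value reads of packed fields are compiled as unaligned loads and are fine; only references into packed fields are hazardous. A small sketch (invented type, not the PR's code):

```rust
#[repr(packed)]
struct PackedSpan {
    tag: u8,
    val: u32, // starts at offset 1, i.e. misaligned
}

fn read_val(p: &PackedSpan) -> u32 {
    // By-value field copy: the compiler emits a correct unaligned load.
    p.val
    // `&p.val` would create a misaligned reference; that is the case
    // rust-lang#27060 is about, and later Rust rejects it outright.
}
```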

@bors commented Sep 17, 2017

☔ The latest upstream changes (presumably #44654) made this pull request unmergeable. Please resolve the merge conflicts.

@carols10cents added the S-waiting-on-author label (Status: This is awaiting some action (such as code changes or more information) from the author.), Sep 18, 2017
@michaelwoerister (Member)

This looks awesome, @petrochenkov!
Like @arielb1, I'd also be interested in the performance impact.

@petrochenkov (author)

Updated.
@bors try (for performance testing)

@bors commented Sep 18, 2017

⌛ Trying commit f069c88 with merge 1125280...

bors added a commit that referenced this pull request Sep 18, 2017
@bors commented Sep 19, 2017

☀️ Test successful - status-travis
State: approved= try=True

@michaelwoerister (Member)

@arielb1, I tried typing the merge commit hashes into perf.rlo compare but that didn't seem to work for me. Did you have a different method for performance comparison in mind?

@petrochenkov (author)

> We can run this through perf.rlo's benchmarks when ready, just run a try build and ping me on completion.

ping @Mark-Simulacrum

@llogiq (Contributor) commented Sep 19, 2017

I notice that the interner uses a default HashMap. Perhaps we can further speed it up using an optimized hash, given that the SpanData to hash is really small?

@petrochenkov (author)

I looked through the performance data a bit.

What we try to optimize is memory, i.e. max-rss.
The situation looks pretty good: most tests use less memory.
Three regressions are regex-0.1.80@020-incr-from-scr..., regex-0.1.80@080-SparseSet, and syntex-0.42.2@000-base. I'm not sure how to interpret this: on one hand, regex and syntex are large crates, so they can hit the limit for lo and start actively using the interner; on the other hand, many other tests using regex and syntex don't show regressions.

On the speed side there are more regressions. I don't know exactly what the tests measure (e.g. how cpu-clock and cycles-u differ), but the speed problem can be mitigated by 1) using a 1-bit tag, which makes encoding/decoding much simpler, 2) using a faster hash function in the interner as @llogiq suggested, and 3) using span.data() + span_data.lo + span_data.hi more actively when you need e.g. span.lo() and span.hi() in the same place.
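Point 3 is about avoiding repeated decoding. A toy sketch (with an invented 16/16-bit layout and a decode counter just to make the effect visible):

```rust
use std::cell::Cell;

thread_local! {
    static DECODES: Cell<u32> = Cell::new(0);
}

#[derive(Clone, Copy)]
struct Span(u32); // invented layout: lo in the low 16 bits, hi in the high 16

#[derive(Clone, Copy)]
struct SpanData { lo: u32, hi: u32 }

impl Span {
    fn data(self) -> SpanData {
        DECODES.with(|c| c.set(c.get() + 1)); // count decodes for the demo
        SpanData { lo: self.0 & 0xFFFF, hi: self.0 >> 16 }
    }
    fn lo(self) -> u32 { self.data().lo }
    fn hi(self) -> u32 { self.data().hi }
}

// Decodes twice: each accessor re-derives the whole SpanData.
fn len_slow(s: Span) -> u32 { s.hi() - s.lo() }

// Decodes once: call .data() and reuse its fields.
fn len_fast(s: Span) -> u32 {
    let d = s.data();
    d.hi - d.lo
}
```

With interned spans the difference is larger still, since each accessor call would also pay for an interner lookup.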

@michaelwoerister (Member)

Many of the tests showing regressions are ones with incremental compilation activated. Incremental compilation has to expand all spans to file:line:col during change detection, so it will do more decoding than other cases. That might explain why these tests show up. That being said, in the future we will probably change how exactly spans are treated by change detection, so I don't think we should give too much weight to the incr. comp. tests right now.

Great find, @llogiq! This should definitely use rustc_data_structures::fx::FxHashMap.
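FxHashMap is std's HashMap with rustc's fast, non-DoS-resistant FxHasher plugged in via BuildHasherDefault. A self-contained stand-in with the same shape (using a minimal FNV-1a hasher instead of the actual Fx function, which lives in rustc_data_structures):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Tiny non-cryptographic hasher (FNV-1a). Like FxHasher, it trades
// HashDoS resistance for speed, which is fine for compiler-internal
// maps keyed by small values such as SpanData.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Fnv1a { Fnv1a(0xcbf2_9ce4_8422_2325) } // FNV offset basis
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 { self.0 }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100_0000_01b3); // FNV prime
        }
    }
}

// Same pattern as rustc_data_structures::fx::FxHashMap.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```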

@petrochenkov (author)

@bors try
Let's try 1-bit tag before doing other optimizations.

@bors commented Sep 20, 2017

🔒 Merge conflict

@petrochenkov (author)

@bors try

@bors commented Sep 20, 2017

⌛ Trying commit cb1158f with merge 39d2aa2...

bors added a commit that referenced this pull request Sep 20, 2017
@petrochenkov (author)

@michaelwoerister
Updated table

| setup | max-rss (average) | cpu-clock (average) |
| --- | --- | --- |
| 32-bit span, 2-bit tag | -0.73% | +0.16% |
| 32-bit span, 1-bit tag | -0.70% | -1.05% |
| 64-bit span, 1-bit tag | -0.66% | -0.77% |

"32-bit span, 1-bit tag" still seems to be better

@michaelwoerister (Member)

Alright, thanks for checking! 32 bits with 1 bit tag seems to be a good choice indeed.

Maybe encode() and decode() should be marked as #[inline] just so their body is available in downstream crates.

If you make the interner use an FxHashMap this is good to go :)

// option. This file may not be copied, modified, or distributed
// except according to those terms.

// Spans are encoded using 2-bit tag and 4 different encoding formats for each tag.
Review comment (Contributor):

this comment has rotted.

@petrochenkov (author)

@michaelwoerister
Updated.

@petrochenkov added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) and removed S-waiting-on-author, Sep 23, 2017
@michaelwoerister (Member)

@bors r+

Thanks, @petrochenkov! Looking forward to seeing the results :)

@bors commented Sep 25, 2017

📌 Commit 52251cd has been approved by michaelwoerister

@bors commented Sep 25, 2017

⌛ Testing commit 52251cd with merge dcb4378...

@bors commented Sep 25, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing dcb4378 to master...

@bors merged commit 52251cd into rust-lang:master, Sep 25, 2017
@alexcrichton (Member)

Looks like this may have improved the memory of the tuple-stress benchmark by 5%!

kennytm added a commit to kennytm/rust that referenced this pull request Nov 1, 2017
Optimize some span operations

Do not decode span data twice/thrice/etc unnecessarily.
Applied to stable hashing and all methods in `impl Span`.

Follow up to rust-lang#44646
r? @michaelwoerister
glaubitz added a commit to glaubitz/rust that referenced this pull request Nov 1, 2017
Due to the limitation that proper alignment of the compiler-generated
impls for #[derive(...)] on #[repr(packed)] structs is not guaranteed
(rust-lang#39696), the change in rust-lang#44646 to compress Spans
results in the compiler generating code with unaligned access.

Until rust-lang#39696 has been fixed, the issue can be worked around by
not using the packed attribute on sparc64 and sparcv9 on the
Span struct.

Fixes: rust-lang#45509
@petrochenkov deleted the scompress branch, June 5, 2019