
Don't create a malloc'd block each time we start tracing. #168

Merged
merged 1 commit into ykjit:master from per_thread_malloc_block on Dec 6, 2020

Conversation

@ltratt (Contributor) commented Dec 5, 2020

When we detect a possibly closed loop in the end user's program, we need to determine whether the current thread is the one tracing it or not. We previously allocated a chunk of memory when we started tracing and used its address as a temporary thread ID.

Ideally we would use the system-provided thread ID to determine this, but that turns out to be a non-portable minefield. pthreads provides the pthread_t ID type, but that's notionally opaque (on OpenBSD, as far as I can tell from a quick search, it's really an opaque struct about which no guarantees are provided, though some programs such as Chrome do seem to rely on it being equivalent to a usize). Some platforms (at least Linux and OpenBSD) also use the PID type pid_t for threads, and that's guaranteed to be a signed integer, though seemingly not of a guaranteed size (on both platforms it's actually a C int, so 32 bits in practice). Rust provides a ThreadId type and an unstable as_u64 function (rust-lang/rust#67939), but that provides no guarantees about how much of the 64-bit integer space is used.

The challenge we have is that whatever ID we use must fit into (numbits(usize) - numbits(PHASE_TAGS)) bits, i.e. we have 62 bits for the ID on a 64-bit system. If we rely on system thread IDs, it feels like we're storing up a portability time bomb that might explode one day in the dim and distant future. At least for now, the (minimal) extra performance we might get from that doesn't seem worth the danger.

This commit is thus a relatively simple change. Rather than allocating a malloc'd block every time we start tracing a thread, which means we have to be quite careful about freeing memory, we allocate a single block per thread on thread construction. This simplifies the code slightly and, since the block's address is aligned, we can feel fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted, allocate this block lazily the first time we trace, but that feels like an unnecessary optimisation at this point.
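The scheme described above can be sketched in a few lines of Rust (a minimal illustration, not the actual ykrt code; `ThreadIdBlock`, `PHASE_TAG_BITS`, and the 2-bit tag width are hypothetical names chosen to match the "62 bits for the ID on a 64-bit system" arithmetic):

```rust
// Hypothetical sketch: use the address of a per-thread heap allocation as a
// thread ID whose low bits are guaranteed free for phase-tag values.

const PHASE_TAG_BITS: usize = 2; // assumption: 2 tag bits => 62 ID bits on 64-bit
const PHASE_TAG_MASK: usize = (1 << PHASE_TAG_BITS) - 1;

/// Allocated once, on thread construction; freed automatically when dropped.
struct ThreadIdBlock(Box<usize>);

impl ThreadIdBlock {
    fn new() -> Self {
        ThreadIdBlock(Box::new(0))
    }

    /// The block's address, used as a temporary thread ID.
    fn id(&self) -> usize {
        &*self.0 as *const usize as usize
    }
}

fn main() {
    let block = ThreadIdBlock::new();
    let id = block.id();
    // Because the allocation is word-aligned, the low PHASE_TAG_BITS are zero,
    // so a tag can be OR'd in without losing any ID information.
    assert_eq!(id & PHASE_TAG_MASK, 0);
    let tagged = id | 0b01; // pack a phase tag into the low bits
    assert_eq!(tagged & !PHASE_TAG_MASK, id); // the ID survives tagging
}
```

Tying the block's lifetime to the thread object also gives the "we have to be quite careful about freeing memory" simplification for free: the allocation is released exactly once, when the thread's state is dropped.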

This is based on a suggestion from @vext01 and a comment from @bjorn3.

ykrt/src/mt.rs Outdated
@@ -316,6 +311,9 @@ impl MTThread {
/// The innards of a meta-tracer thread.
struct MTThreadInner {
mt: MT,
/// A value that uniquely identifies a thread and that is guaranteed not to overlap with
/// PHASE_TAG. For simplicities sake this is a pointer to a malloc'd chunk of memory.
Contributor:

Let's distil some of the wisdom of the PR description into this comment.

Namely, we can briefly mention that most system/language provided thread IDs are non-portable and subject to change. Then it's more accurate to say "For portability, future-proofing and simplicities sake..."

Also I think "overlap" is an odd term to use here. What does it mean to overlap? I think I kind of know what you mean, but it'd be less vague if we again, took some inspiration from the PR description. For example:

whatever ID we use it must fit into (numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a 64-bit system.

This explains really well, especially if you mention alignment guarantees of our pointer.

Contributor Author:

I'd prefer not to make the comment too long -- see if e65104e works for you.

@vext01 (Contributor) commented Dec 5, 2020

As a side note, I was reading about the 48-bit address space yesterday. The top bits of a pointer on x86_64 (bits 48-63) are "reserved", meaning that you can't use them for your own purposes, but you can throw them away and reconstruct them later: the bits are expected to be either all zeroes or all ones according to the rules of sign extension.

So in theory, we have even more bits available to us in the "pack" if we compress the pointers in this way, however, this too may be a ticking time bomb. Hasn't there been talk of expanding the 48-bit address space already?
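The compression being alluded to can be sketched as follows (a toy illustration of the sign-extension rule, assuming the current 48-bit x86_64 canonical-address scheme; the example addresses are made up):

```rust
// Truncate a canonical x86_64 address to 48 bits, then reconstruct it.
// Canonical addresses have bits 48..63 equal to bit 47, so the top 16 bits
// carry no information and can be rebuilt by sign extension.

fn compress(addr: u64) -> u64 {
    addr & ((1u64 << 48) - 1) // keep only the low 48 bits
}

fn decompress(packed: u64) -> u64 {
    // Shift bit 47 up to bit 63, then arithmetic-shift back down:
    // this copies bit 47 into bits 48..63.
    ((packed << 16) as i64 >> 16) as u64
}

fn main() {
    let low_half: u64 = 0x0000_7fff_dead_beef; // bit 47 clear: zero-extended form
    let high_half: u64 = 0xffff_8000_0000_1234; // bit 47 set: one-extended form
    assert_eq!(decompress(compress(low_half)), low_half);
    assert_eq!(decompress(compress(high_half)), high_half);
}
```

As the thread notes, this only holds while addresses stay 48-bit canonical; hardware with larger virtual address spaces (e.g. 5-level paging) widens the canonical region and shrinks the reclaimable bits.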

@ltratt (Contributor, Author) commented Dec 5, 2020

Hasn't there been talk of expanding the 48-bit address space already?

Yes, I believe upcoming chips will use 56-bits.

@ltratt (Contributor, Author) commented Dec 5, 2020

@vext01 See if 2ece7cd helps.

@vext01 (Contributor) commented Dec 5, 2020

Good, but one question.

that chunk is aligned to a machine word (or greater)

Why "or greater"? Could it be 9-byte aligned?

@bjorn3 (Collaborator) commented Dec 5, 2020

It could be 16 byte aligned, 32 byte, ... If you generate random 8 byte aligned addresses, some will also be aligned to a higher amount.

Alignment is always a power of two, 9 byte alignment is impossible.
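Both halves of this point are easy to demonstrate (a small sketch; `is_aligned` is a hypothetical helper, and the `Layout` check shows that Rust's allocator API itself rejects non-power-of-two alignments):

```rust
use std::alloc::Layout;

/// True if `addr` is a multiple of `align`, where `align` is a power of two.
fn is_aligned(addr: usize, align: usize) -> bool {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    addr & (align - 1) == 0
}

fn main() {
    // Any 16- or 32-byte-aligned address is, by definition, 8-byte aligned.
    assert!(is_aligned(32, 8));
    assert!(is_aligned(32, 16));
    assert!(!is_aligned(12, 8));
    // 9-byte alignment is impossible: Layout refuses to construct it.
    assert!(Layout::from_size_align(1, 9).is_err());
    assert!(Layout::from_size_align(1, 8).is_ok());
}
```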

@ltratt (Contributor, Author) commented Dec 6, 2020

@bjorn3 has hit the nail on the head.

OK to squash?

@vext01 (Contributor) commented Dec 6, 2020

It could be 16 byte aligned, 32 byte...

Yes, but my point is, if we are asking for 8-byte aligned memory, any higher-aligned pointer that happens to come back (16, 32, ...), is by definition also 8-byte aligned. So to say 8-byte aligned "or greater" is redundant and IMHO actually casts doubt on what is meant.

(My question about 9-byte alignment was intended to be rhetorical to emphasise how "or greater" could be misinterpreted. Alas, it didn't work :P)

So I think it'd be clearer to drop "or greater", and I am probably guilty of bikeshedding of the highest order at this point!

@ltratt (Contributor, Author) commented Dec 6, 2020

I'm going to be awkward and say that I don't think many people will interpret "greater than 8" as "9" :) It's more to make clear that some platforms will only give you multiples of 16 even if you ask for multiples of 8. Anyway, if you would still like the comment change, I'm willing to do it. Let me know!

@vext01 (Contributor) commented Dec 6, 2020

It's not a mole hill I'll die on 😆

Please squash (optionally amending the comment at the same time, if you see fit).

It's more to make clear that some platforms will only give you multiples of 16.

Now that I didn't know.

@ltratt (Contributor, Author) commented Dec 6, 2020

OK, I've removed "or greater" -- I see your point. Force pushed.

@bors bot (Contributor) commented Dec 6, 2020

Build succeeded:

@bors bors bot merged commit 9e8c82a into ykjit:master Dec 6, 2020
@ltratt ltratt deleted the per_thread_malloc_block branch December 14, 2020 18:21