
Rewrite the local_data implementation #15399

Closed · wants to merge 4 commits

Conversation

@lilyball (Contributor) commented Jul 4, 2014

This was motivated by a desire to remove allocation in the common
pattern of

    let old = key.replace(None);
    do_something();
    key.replace(old);

This also switched the map representation from a Vec to a TreeMap. A Vec
may be reasonable if there are only a couple of TLD keys, but a TreeMap
provides better behavior as the number of keys increases.

Like the Vec, this TreeMap implementation does not shrink the container
when a value is removed. Unlike Vec, this TreeMap implementation cannot
reuse an empty node for a different key. Therefore any key that has been
inserted into the TLD at least once will continue to take up space in
the map until the task ends. The expectation is that the majority of
keys inserted into TLD will hold a value for most of the rest of the
task's lifetime. If this assumption is wrong,
there are two reasonable ways to fix this that could be implemented in
the future:

1. Provide an API call to either remove a specific key from the TLD and
   destruct its node (e.g. `remove()`), or instead to explicitly clean
   up all currently-empty nodes in the map (e.g. `compact()`). This is
   simple, but requires the user to explicitly call it.
2. Keep track of the number of empty nodes in the map, and when the map
   is mutated (via `replace()`) and the number of empty nodes passes
   some threshold, compact it automatically. Alternatively, whenever a
   new key is inserted that hasn't been used before, compact the map at
   that point.
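Option 2 could be sketched roughly like this. This is a hypothetical sketch, not the PR's code: `TldMap`, `COMPACT_THRESHOLD`, and the `String` payload are all made up for illustration, with a std `BTreeMap` standing in for `TreeMap`.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the TLD map: a `None` slot is an "empty node" that
/// is kept around so the key keeps its position.
struct TldMap {
    map: BTreeMap<usize, Option<String>>,
    empty_nodes: usize,
}

/// Assumed tuning constant, not from the PR.
const COMPACT_THRESHOLD: usize = 8;

impl TldMap {
    fn new() -> TldMap {
        TldMap { map: BTreeMap::new(), empty_nodes: 0 }
    }

    /// Replace the value for `key`, returning the old one; once enough
    /// empty nodes accumulate, compact automatically (option 2).
    fn replace(&mut self, key: usize, value: Option<String>) -> Option<String> {
        if value.is_none() && !self.map.contains_key(&key) {
            return None; // never materialize a node just to clear it
        }
        let was_empty_node = matches!(self.map.get(&key), Some(None));
        let inserting = value.is_some();
        let old = self.map.insert(key, value).flatten();
        if inserting {
            if was_empty_node {
                self.empty_nodes -= 1;
            }
        } else if old.is_some() {
            self.empty_nodes += 1;
            if self.empty_nodes >= COMPACT_THRESHOLD {
                self.compact();
            }
        }
        old
    }

    /// Drop all currently-empty nodes (option 1's explicit `compact()`).
    fn compact(&mut self) {
        self.map.retain(|_, v| v.is_some());
        self.empty_nodes = 0;
    }
}

fn main() {
    let mut tld = TldMap::new();
    assert_eq!(tld.replace(1, Some("a".to_string())), None);
    assert_eq!(tld.replace(1, None), Some("a".to_string()));
    assert_eq!(tld.map.len(), 1); // empty node kept, not removed
    tld.compact();
    assert_eq!(tld.map.len(), 0);
}
```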

Benchmarks:

I ran 3 benchmarks. `tld_replace_none` just replaces the TLD key with `None`
repeatedly. `tld_replace_some` replaces it with `Some` repeatedly. And
`tld_replace_none_some` simulates the common behavior of replacing with
`None`, then replacing with the previous value again (which was a `Some`).

Old implementation:

test tld_replace_none      ... bench:        20 ns/iter (+/- 0)
test tld_replace_none_some ... bench:        77 ns/iter (+/- 4)
test tld_replace_some      ... bench:        57 ns/iter (+/- 2)

New implementation:

test tld_replace_none      ... bench:        11 ns/iter (+/- 0)
test tld_replace_none_some ... bench:        23 ns/iter (+/- 0)
test tld_replace_some      ... bench:        12 ns/iter (+/- 0)

/// let y = x.clone();
/// asesrt_eq!(x.try_unwrap(), Err(Rc::new(4u)));
/// ```
#[inline]
Review comment (Member):

I don't think this inline is necessary (and similarly for the other new methods here): these functions are generic anyway and so are in crate metadata. Do they make a measurable difference?

Review comment (Contributor Author):

I made it inline largely because make_unique() is inline (and of course deref() is #[inline(always)]). I thought it was a reasonable precedent to default these new methods to being inlined as well.

Also argh, that comment example has a typo!

@bluss (Member) commented Jul 4, 2014

Very cool!

There should be a slight unease with adding methods to Rc though -- anything that implements Deref will have confusing (for users) conflicts between its methods and the contained value's methods.

cc @thestinger maybe wants to know about the Rc changes.
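The confusion bluss describes is easy to demonstrate with a toy `Deref` wrapper (hypothetical types, not `Rc` itself): an inherent method on the pointer is found before auto-deref, so it silently shadows a same-named method on the pointee.

```rust
use std::ops::Deref;

struct Ptr<T>(T);

impl<T> Deref for Ptr<T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0 }
}

struct Inner;

impl Inner {
    fn describe(&self) -> &'static str { "inner" }
}

impl<T> Ptr<T> {
    // An inherent method with the same name as one on the pointee:
    // method resolution finds this one first, so the pointee's
    // `describe` is silently shadowed.
    fn describe(&self) -> &'static str { "pointer" }
}

fn main() {
    let p = Ptr(Inner);
    assert_eq!(p.describe(), "pointer");  // the pointer's method wins
    assert_eq!((*p).describe(), "inner"); // explicit deref reaches the pointee
}
```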

@lilyball (Contributor Author) commented Jul 4, 2014

@blake2-ppc Yeah, that's my concern as well, but the ability to deal with a unique Rc is important here (otherwise we'd have to reimplement reference counting in our own custom type, which I'd rather avoid).

One option is to turn them into functions inside alloc::rc, but Rc already has the precedent of having a number of methods implemented on it directly (including a set of undocumented ones, like inner(), strong(), etc).

@lilyball (Contributor Author) commented Jul 4, 2014

At @huonw's request I put Vec back and benchmarked, and here's what I get:

test local_data::tests::bench_1000_keys_replace_last ... bench:       803 ns/iter (+/- 31)
test local_data::tests::bench_100_keys_replace_last  ... bench:       102 ns/iter (+/- 6)
test local_data::tests::bench_replace_none           ... bench:        14 ns/iter (+/- 0)
test local_data::tests::bench_replace_none_some      ... bench:        38 ns/iter (+/- 1)
test local_data::tests::bench_replace_some           ... bench:        20 ns/iter (+/- 1)

I'll have the numbers for 100/1000 for TreeMap soon, but last time I ran them it was 29ns/iter and 38ns/iter, so there's a pretty big difference here.

@bluss (Member) commented Jul 4, 2014

Well ok that's some inspiration -- .inner() is a trait method that Rc<T> implements -- can't the other new methods be trait methods as well, so that the "confusion" of extra methods on smart pointers is opt-in?

@lilyball (Contributor Author) commented Jul 4, 2014

@blake2-ppc Good idea. Rc still has downgrade() and make_unique() though.

Is using a trait for this better than just using top-level functions in the rc module though?

@sfackler (Member) commented Jul 4, 2014

@kballard one possible alternative to avoid the deref issue is to make is_unique etc static methods on Rc. So you'd do Rc::is_unique(&foo) instead of foo.is_unique().
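For illustration, the call style sfackler proposes looks like this with today's real `Rc::strong_count` associated function (not one of the methods proposed in this PR):

```rust
use std::rc::Rc;

fn main() {
    let x = Rc::new(4u32);
    // Associated-function call style: `Rc::f(&x)` rather than `x.f()`,
    // so the name can never collide with a method on the pointee.
    assert_eq!(Rc::strong_count(&x), 1);

    let y = Rc::clone(&x);
    assert_eq!(Rc::strong_count(&x), 2);

    drop(y);
    assert_eq!(Rc::strong_count(&x), 1); // unique again
}
```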

@lilyball (Contributor Author) commented Jul 4, 2014

Here are the benchmarks again using TreeMap:

test local_data::tests::bench_1000_keys_replace_last ... bench:        38 ns/iter (+/- 2)
test local_data::tests::bench_100_keys_replace_last  ... bench:        28 ns/iter (+/- 1)
test local_data::tests::bench_replace_none           ... bench:        14 ns/iter (+/- 0)
test local_data::tests::bench_replace_none_some      ... bench:        35 ns/iter (+/- 1)
test local_data::tests::bench_replace_some           ... bench:        19 ns/iter (+/- 0)

Note the 100 and 1000 keys behaviors are vastly better than the Vec approach.

@lilyball (Contributor Author) commented Jul 4, 2014

@sfackler I hadn't considered static methods. That's better than free functions in the module, which is what I was thinking of doing.

@lilyball (Contributor Author) commented Jul 4, 2014

Updated. r?

@lilyball (Contributor Author) commented Jul 4, 2014

And again. Fixed the &data nit, added 2 more benchmarks.

@alexcrichton (Member) commented:
I need to take some more time to read this more thoroughly, but adding functions to core modules such as `alloc::rc` should not be taken lightly. Regardless of whether the functions are methods or standalone functions, they need thorough discussion before being added.

@lilyball (Contributor Author) commented Jul 5, 2014

@alexcrichton I am not at all surprised you want to discuss this first, and that's fine (as long as it doesn't end up stalling out, like other PRs where you've wanted to discuss first).

The alternative to adding the try_get_mut() API to rc is to reimplement a brand new reference-counting type inside of local_data, and I really wanted to avoid doing that.

`is_unique()` and `try_unwrap()` are not actually necessary for this PR, but I felt `is_unique()` was a logical thing to expose given both `try_get_mut()` and `make_unique()`. I was originally going to use `try_unwrap()`; when I decided I wasn't going to use it after all, I left it in because, again, I think it's a potentially useful API in light of the other APIs that already allow for manipulation of unique values.

@pcwalton (Contributor) commented Jul 5, 2014

Previously we rejected the Arc equivalent of try_unwrap(), but I think it's OK for Rc because the deadlock issues are not there. Still, it's a sharp tool and we should describe in the docs that you should be careful when using it.

@lilyball (Contributor Author) commented Jul 5, 2014

@pcwalton What sort of warnings are you thinking about adding to the docs? It's a bit of a specialized tool and you need to be careful about when you assume it will or won't be able to work, but it's not really dangerous in any way. I'm happy to add wording suggesting caution, but I don't know what to actually say that makes sense. "You should be careful using this because...."

@alexcrichton (Member) commented:
Taking a step back, I'm not sure that a shared ownership model is one that we would like to live with moving into the future. Ideally, I would imagine that task-local-data looks something like this:

  1. Each crate knows the exact set of types and TLD keys it contains. Each TLD key is assigned an offset so they don't clash.
  2. Each crate has a slot for a TLD index (specific to this crate).
    • Any rlib created will have an undefined symbol for this index
    • Any dylib will be initialized through some other means. This may be done dynamically for a dynamically loaded crate.
  3. When compiling a crate, all dependencies are assigned unique integers and the table layout for TLD is set up.
  4. When a TLD key (crate, id) is requested, the following steps are taken:
    1. Access OS TLS to get the local rust task
    2. If the TLD table is null, allocate one
    3. Load table[crate]
    4. If this table is null, allocate one
    5. Look at sub_table[id]

With a design such as this, all TLD lookups/sets are O(1), unlike the O(log n) in this PR and the O(n) previously. TLD is far less valuable if it has any complexity other than O(1) for the most part.

Basically, in an ideal world, there's no space for this shared ownership model. I think it's a great idea to reduce allocations, but I think it's important to allow for references into the actual TLS data rather than handing out Rc references, which forever forces most insertions to allocate. Some of these allocations can be removed, as you've found in this PR, but I would expect that for hardcore usage this still doesn't quite fit the bill due to the non-O(1) lookup.

I know that @pcwalton has wanted to use TLS for many high performance use cases in servo, but has been unable to due to the current design.
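The two-level lookup described in this comment can be sketched as follows. This is a hypothetical modern sketch: `TldTable`, the `usize` ids, and the `Box<dyn Any>` payload are illustrative assumptions, not part of any real design.

```rust
use std::any::Any;

/// Two-level TLD table: outer index = crate id, inner index = key id
/// within that crate. Both ids are assumed to be assigned at link time.
struct TldTable {
    crates: Vec<Option<Vec<Option<Box<dyn Any>>>>>,
}

impl TldTable {
    fn new() -> TldTable {
        TldTable { crates: Vec::new() }
    }

    /// O(1) amortized: two indexed loads, plus lazy allocation of any
    /// missing level on first touch.
    fn slot(&mut self, krate: usize, id: usize) -> &mut Option<Box<dyn Any>> {
        if self.crates.len() <= krate {
            self.crates.resize_with(krate + 1, || None);
        }
        let sub = self.crates[krate].get_or_insert_with(Vec::new);
        if sub.len() <= id {
            sub.resize_with(id + 1, || None);
        }
        &mut sub[id]
    }
}

fn main() {
    let mut tld = TldTable::new();
    *tld.slot(2, 5) = Some(Box::new(42u32));
    let v = tld.slot(2, 5).as_ref().and_then(|b| b.downcast_ref::<u32>());
    assert_eq!(v, Some(&42u32));
    assert!(tld.slot(0, 0).is_none()); // untouched slots read as empty
}
```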

@lilyball (Contributor Author) commented Jul 7, 2014

@alexcrichton I agree that an O(1) approach would be ideal, but as long as we're handing out references to values stored inline in TLD (instead of separately-allocated as I'm doing here), we need to effectively be a RefCell and dynamically fail on mutation if there are any outstanding loans. There's simply no way to model TLD in the type system to allow for borrowck to deal with it.

Now of course RefCell exists for a reason, and perhaps the dynamic failure is ok, but I chose to get rid of dynamic failure because as long as each TLD value is allocated, dynamic failure is not necessary.
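The dynamic failure being discussed is just `RefCell`'s behavior; a minimal sketch:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(5u32);

    let loan = cell.borrow(); // an outstanding shared loan
    // Mutation must fail dynamically while the loan is live:
    assert!(cell.try_borrow_mut().is_err());
    drop(loan);

    // Once the loan is gone, mutation succeeds.
    *cell.borrow_mut() = 6;
    assert_eq!(*cell.borrow(), 6);
}
```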

If you feel strongly about this, and think that your described TLD scheme will actually be implemented some day (because it sure isn't being implemented today), then I can reintroduce dynamic failure. The only benefit to doing that right now is to preserve API compatibility with this proposed future TLD rewrite (and that's assuming there's no API breakage from this rewrite otherwise). Fundamentally, it would just involve doing `.ok().unwrap()` within the `replace()` method and making the return value `Option<T>` again.

FWIW, it's possible to get O(1) lookup and still preserve the shared ownership model. The time complexity does not come from the need for allocations but rather from the fact that we do not have a known offset into TLD for each key, and therefore need to perform some kind of search. And the allocation with this implementation only happens 1. on first insert of a key, or 2. when setting a key value that is not uniquely owned. As long as clients of the TLD API avoid shared ownership, replacing values re-uses the existing allocation. The way this handles shared ownership is compatible with an O(1) TLD scheme like you described, so keeping this API does not preclude fixing lookup. So the only real reason to switch back to a dynamic failure model is if you believe that the allocation will have an important performance impact on TLD (you already have several indirections in your scheme, so adding one more to read the value out of the allocation does not seem necessarily problematic).


I don't know how your scheme is going to handle dynamically loading dylibs at runtime that define new TLD keys. Any existing tasks won't have space in their allocated TLD table for the keys defined by the runtime-loaded dylib. And any resolution to this will involve reallocating the tables somehow, which will break any references to contained values anyway. AFAICT supporting this scenario[1] will require keeping allocation for each value, which brings us back to the shared-ownership model.

[1] and I think we have to. It's a bit prohibitive to say that e.g. syntax extensions cannot use TLD, because while the syntax extension may not use it directly, it may use a library that itself uses TLD (that the syntax extension author isn't even aware of).
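The allocation-reuse claim above can be illustrated with today's `Rc::get_mut` (a sketch of the idea, not this PR's implementation):

```rust
use std::rc::Rc;

/// Write in place when we are the sole owner; only allocate a fresh
/// box when someone else still shares the old value.
fn replace_value(slot: &mut Rc<u32>, new: u32) {
    match Rc::get_mut(slot) {
        Some(v) => *v = new,          // unique: reuse the allocation
        None => *slot = Rc::new(new), // shared: old value stays intact
    }
}

fn main() {
    let mut slot = Rc::new(1u32);
    let addr = Rc::as_ptr(&slot);
    replace_value(&mut slot, 2);
    assert_eq!(*slot, 2);
    assert_eq!(Rc::as_ptr(&slot), addr); // same allocation reused

    let shared = Rc::clone(&slot);
    replace_value(&mut slot, 3);
    assert_eq!(*slot, 3);
    assert_eq!(*shared, 2); // the sharer still sees the old value
}
```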

@huonw (Member) commented Jul 7, 2014

(A hashmap would give (slightly more expensive) O(1) lookups, although the current library organisation doesn't allow that yet, and @kballard had some segfaulting issues when he tried, iirc.)

@lilyball (Contributor Author) commented Jul 7, 2014

@huonw Yeah I think my segfaulting issues stemmed from changing the implementation of local_data in the tests, without recompiling everything from scratch. And since it can't be compiled from scratch (because hashmap lives in std), that makes it a bit of a non-starter.

@lilyball (Contributor Author) commented:
@alexcrichton How do you want to move forward on this? I think this PR needs to land in some form, because it improves the performance of local_data and that's useful. Do you have a strong objection to landing it with the shared ownership model? As already documented, the shared ownership model can still be preserved in a hypothetical future O(1) lookup implementation. Alternatively I could modify the PR to reintroduce the failure-on-replace() approach that breaks shared ownership, but I would prefer not to do that without a compelling reason.

@alexcrichton (Member) commented:
I personally feel that this is not the right approach for TLS. The primitive that TLS is providing is not shared ownership, but rather task-local-storage. The old get_mut was removed due to that being better provided by RefCell, and I feel that forcing Rc falls in that category as well. It's also less clear to me how this can be extended into the future if a shared ownership model is part of the API.

I'm fine with landing the addition of set and the move to a TreeMap, but I don't think that Rc is the way to go with backing TLS-storage.

@lilyball (Contributor Author) commented:
Note: I have a pending rewrite of this PR that I'm working on that addresses @alexcrichton's issues. I hope to find the time to complete it this coming week.

The correct terminology is Task-Local Data, or TLD. Task-Local Storage,
or TLS, is the old terminology that was abandoned because of the
confusion with Thread-Local Storage (TLS).
@lilyball (Contributor Author) commented:
Rewritten. r? @alexcrichton

I ditched Rc in favor of a mini re-implementation, because Rc doesn't provide any way to deallocate the box without dropping the contained value, and I wanted to be able to contain uninitialized values without using Option.

@@ -66,33 +69,51 @@ pub type Key<T> = &'static KeyValue<T>;
#[allow(missing_doc)]
pub enum KeyValue<T> { Key }

#[doc(hidden)]
Review comment (Member):

Could you be sure to confirm that this doesn't pop up in the documentation by accident? In theory `grep -R LocalData doc/` should turn up nothing. I forget if this was for a broken link in the past or some weird rustdoc bug which has since been fixed.

Review comment (Contributor Author):

Aww crud, it does indeed show up. It's not public, it has no business showing up. What is rustdoc thinking?

Review comment (Contributor Author):

Actually, the LocalData trait isn't even used anymore, it should just be removed.

let ClearKey(ref key) = *self;
key.clear();
}
}
Review comment (Member):

What's going on here?

Review comment (Contributor Author):

Helper type to expunge no-longer-used keys from the map, so that the tests and benchmarks don't interfere with each other (each test and benchmark uses brand-new keys, which, as explained earlier, won't get removed from the map, and this affects the benchmark measurements). I'll add a comment.

Review comment (Member):

Tests are run in their own tasks, so I thought they wouldn't affect other tasks or benchmarks?

Review comment (Contributor Author):

Hrm, actually, that's a good point. Benchmarks definitely need to clear the keys (because they're run multiple times) but regular tests do get their own task. I recall measuring a difference in the benchmarks when I added key-clearing to these tests, but it seems like that shouldn't happen. I'll test it again.

Review comment (Contributor Author):

Tested again, looks like you're right, the keys from the tests aren't impacting the benchmarks. Which is good, because that's how it should be.

@lilyball (Contributor Author) commented:

Updated. I believe I've addressed all of the feedback. r? @alexcrichton

Some(newValue) => {
map.insert(keyval, TLDValue::new(newValue));
}
None => {}
}
Review comment (Member):

Could the (None, None) case have a return None, the (None, Some(data)) case returns data from the match, and then the (Some(slot), data) case starts with return match (refcount, data) { ... }?
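The suggested shape, sketched against stand-in types (a plain `BTreeMap<u32, String>` here, not the PR's refcounted nodes, so this simply removes entries rather than keeping empty ones):

```rust
use std::collections::BTreeMap;

// Stand-in for the PR's replace(): one match over (existing slot, new
// data), with each arm returning directly instead of falling through.
fn replace(map: &mut BTreeMap<u32, String>, key: u32, data: Option<String>) -> Option<String> {
    match (map.remove(&key), data) {
        (None, None) => None,
        (None, Some(new)) => {
            map.insert(key, new);
            None
        }
        (Some(old), new) => {
            if let Some(new) = new {
                map.insert(key, new);
            }
            Some(old)
        }
    }
}

fn main() {
    let mut map = BTreeMap::new();
    assert_eq!(replace(&mut map, 1, Some("a".to_string())), None);
    assert_eq!(replace(&mut map, 1, Some("b".to_string())), Some("a".to_string()));
    assert_eq!(replace(&mut map, 1, None), Some("b".to_string()));
    assert_eq!(replace(&mut map, 1, None), None);
}
```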

@alexcrichton (Member) commented:

Just a few minor comments, but otherwise looks good to me! r=me

@forticulous commented:
@kballard You ditched Rc for this PR but would you still want to add those methods for try_unique and is_unique in a separate PR?

@lilyball (Contributor Author) commented:

@forticulous I submitted that 1.5 days ago as #16101.

@forticulous commented:
@kballard Excellent, I was hoping that didn't get lost

@lilyball (Contributor Author) commented:

Comments addressed. Thanks for the review, @alexcrichton

@lilyball (Contributor Author) commented:

Just realized this is no longer a breaking change.

lilyball added 3 commits July 31, 2014 13:14
Errors can be printed with `{}`; printing with `{:?}` does not work very
well.

Not actually related to this PR, but it came up when running the tests
and now is as good a time to fix it as any.
bors added a commit that referenced this pull request Jul 31, 2014
@bors closed this Aug 1, 2014
@lilyball deleted the rewrite_local_data branch August 1, 2014 02:04