
Rewrite the local_data implementation #15399

Closed · wants to merge 4 commits

Conversation

@lilyball (Contributor) commented Jul 4, 2014

This was motivated by a desire to remove allocation in the common
pattern of

    let old = key.replace(None);
    do_something();
    key.replace(old);

This also switched the map representation from a Vec to a TreeMap. A Vec
may be reasonable if there are only a couple of TLD keys, but a TreeMap
provides better behavior as the number of keys increases.

Like the Vec, this TreeMap implementation does not shrink the container
when a value is removed. Unlike Vec, this TreeMap implementation cannot
reuse an empty node for a different key. Therefore any key that has been
inserted into the TLD at least once will continue to take up space in
the map until the task ends. The expectation is that the majority of
keys inserted into TLD will hold a value for most of the rest of the
task's lifetime. If this assumption is wrong,
there are two reasonable ways to fix this that could be implemented in
the future:

1. Provide an API call to either remove a specific key from the TLD and
   destruct its node (e.g. `remove()`), or instead to explicitly clean
   up all currently-empty nodes in the map (e.g. `compact()`). This is
   simple, but requires the user to explicitly call it.
2. Keep track of the number of empty nodes in the map, and when the map
   is mutated (via `replace()`) and the number of empty nodes passes
   some threshold, compact it automatically. Alternatively, whenever a
   new key is inserted that hasn't been used before, compact the map at
   that point.
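Option 2 could be sketched roughly like this. This is a hypothetical sketch, not the PR's code: `TldMap`, `COMPACT_THRESHOLD`, and the `String` payload are all made up for illustration, with a std `BTreeMap` standing in for `TreeMap`.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the TLD map: a `None` slot is an "empty node" that
/// is kept around so the key keeps its position.
struct TldMap {
    map: BTreeMap<usize, Option<String>>,
    empty_nodes: usize,
}

/// Assumed tuning constant, not from the PR.
const COMPACT_THRESHOLD: usize = 8;

impl TldMap {
    fn new() -> TldMap {
        TldMap { map: BTreeMap::new(), empty_nodes: 0 }
    }

    /// Replace the value for `key`, returning the old one; once enough
    /// empty nodes accumulate, compact automatically (option 2).
    fn replace(&mut self, key: usize, value: Option<String>) -> Option<String> {
        if value.is_none() && !self.map.contains_key(&key) {
            return None; // never materialize a node just to clear it
        }
        let was_empty_node = matches!(self.map.get(&key), Some(None));
        let inserting = value.is_some();
        let old = self.map.insert(key, value).flatten();
        if inserting {
            if was_empty_node {
                self.empty_nodes -= 1;
            }
        } else if old.is_some() {
            self.empty_nodes += 1;
            if self.empty_nodes >= COMPACT_THRESHOLD {
                self.compact();
            }
        }
        old
    }

    /// Drop all currently-empty nodes (option 1's explicit `compact()`).
    fn compact(&mut self) {
        self.map.retain(|_, v| v.is_some());
        self.empty_nodes = 0;
    }
}

fn main() {
    let mut tld = TldMap::new();
    assert_eq!(tld.replace(1, Some("a".to_string())), None);
    assert_eq!(tld.replace(1, None), Some("a".to_string()));
    assert_eq!(tld.map.len(), 1); // empty node kept, not removed
    tld.compact();
    assert_eq!(tld.map.len(), 0);
}
```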

Benchmarks:

I ran 3 benchmarks. `tld_replace_none` just replaces the TLD key with `None`
repeatedly. `tld_replace_some` replaces it with `Some` repeatedly. And
`tld_replace_none_some` simulates the common behavior of replacing with
`None`, then replacing with the previous value again (which was a `Some`).

Old implementation:

test tld_replace_none      ... bench:        20 ns/iter (+/- 0)
test tld_replace_none_some ... bench:        77 ns/iter (+/- 4)
test tld_replace_some      ... bench:        57 ns/iter (+/- 2)

New implementation:

test tld_replace_none      ... bench:        11 ns/iter (+/- 0)
test tld_replace_none_some ... bench:        23 ns/iter (+/- 0)
test tld_replace_some      ... bench:        12 ns/iter (+/- 0)

/// let y = x.clone();
/// asesrt_eq!(x.try_unwrap(), Err(Rc::new(4u)));
/// ```
#[inline]
Review comment (Member):

I don't think this inline is necessary (and similarly for the other new methods here): these functions are generic anyway and so are in crate metadata. Do they make a measurable difference?

Review comment (Contributor Author):

I made it inline largely because make_unique() is inline (and of course deref() is #[inline(always)]). I thought it was a reasonable precedent to default these new methods to being inlined as well.

Also argh, that comment example has a typo!

@bluss (Member) commented Jul 4, 2014

Very cool!

There should be a slight unease with adding methods to Rc though -- anything that implements Deref will have confusing (for users) conflicts between its methods and the contained value's methods.

cc @thestinger maybe wants to know about the Rc changes.
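The confusion bluss describes is easy to demonstrate with a toy `Deref` wrapper (hypothetical types, not `Rc` itself): an inherent method on the pointer is found before auto-deref, so it silently shadows a same-named method on the pointee.

```rust
use std::ops::Deref;

struct Ptr<T>(T);

impl<T> Deref for Ptr<T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0 }
}

struct Inner;

impl Inner {
    fn describe(&self) -> &'static str { "inner" }
}

impl<T> Ptr<T> {
    // An inherent method with the same name as one on the pointee:
    // method resolution finds this one first, so the pointee's
    // `describe` is silently shadowed.
    fn describe(&self) -> &'static str { "pointer" }
}

fn main() {
    let p = Ptr(Inner);
    assert_eq!(p.describe(), "pointer");  // the pointer's method wins
    assert_eq!((*p).describe(), "inner"); // explicit deref reaches the pointee
}
```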

@lilyball (Contributor Author) commented Jul 4, 2014

@blake2-ppc Yeah, that's my concern as well, but the ability to deal with a unique Rc is important here (otherwise we'd have to reimplement reference counting in our own custom type, which I'd rather avoid).

One option is to turn them into functions inside alloc::rc, but Rc already has the precedent of having a number of methods implemented on it directly (including a set of undocumented ones, like inner(), strong(), etc).

@lilyball (Contributor Author) commented Jul 4, 2014

At @huonw's request I put Vec back and benchmarked, and here's what I get:

test local_data::tests::bench_1000_keys_replace_last ... bench:       803 ns/iter (+/- 31)
test local_data::tests::bench_100_keys_replace_last  ... bench:       102 ns/iter (+/- 6)
test local_data::tests::bench_replace_none           ... bench:        14 ns/iter (+/- 0)
test local_data::tests::bench_replace_none_some      ... bench:        38 ns/iter (+/- 1)
test local_data::tests::bench_replace_some           ... bench:        20 ns/iter (+/- 1)

I'll have the numbers for 100/1000 for TreeMap soon, but last time I ran them it was 29ns/iter and 38ns/iter, so there's a pretty big difference here.

@bluss (Member) commented Jul 4, 2014

Well ok that's some inspiration -- .inner() is a trait method that Rc<T> implements -- can't the other new methods be trait methods as well, so that the "confusion" of extra methods on smart pointers is opt-in?

@lilyball (Contributor Author) commented Jul 4, 2014

@blake2-ppc Good idea. Rc still has downgrade() and make_unique() though.

Is using a trait for this better than just using top-level functions in the rc module though?

@sfackler (Member) commented Jul 4, 2014

@kballard one possible alternative to avoid the deref issue is to make is_unique etc static methods on Rc. So you'd do Rc::is_unique(&foo) instead of foo.is_unique().
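For illustration, the call style sfackler proposes looks like this with today's real `Rc::strong_count` associated function (not one of the methods proposed in this PR):

```rust
use std::rc::Rc;

fn main() {
    let x = Rc::new(4u32);
    // Associated-function call style: `Rc::f(&x)` rather than `x.f()`,
    // so the name can never collide with a method on the pointee.
    assert_eq!(Rc::strong_count(&x), 1);

    let y = Rc::clone(&x);
    assert_eq!(Rc::strong_count(&x), 2);

    drop(y);
    assert_eq!(Rc::strong_count(&x), 1); // unique again
}
```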

@lilyball (Contributor Author) commented Jul 4, 2014

Here are the benchmarks again using TreeMap:

test local_data::tests::bench_1000_keys_replace_last ... bench:        38 ns/iter (+/- 2)
test local_data::tests::bench_100_keys_replace_last  ... bench:        28 ns/iter (+/- 1)
test local_data::tests::bench_replace_none           ... bench:        14 ns/iter (+/- 0)
test local_data::tests::bench_replace_none_some      ... bench:        35 ns/iter (+/- 1)
test local_data::tests::bench_replace_some           ... bench:        19 ns/iter (+/- 0)

Note the 100 and 1000 keys behaviors are vastly better than the Vec approach.

@lilyball (Contributor Author) commented Jul 4, 2014

@sfackler I hadn't considered static methods. That's better than free functions in the module, which is what I was thinking of doing.

@lilyball (Contributor Author) commented Jul 4, 2014

Updated. r?

@lilyball (Contributor Author) commented Jul 4, 2014

And again. Fixed the &data nit, added 2 more benchmarks.

@alexcrichton (Member) commented:
I need to take some more time to read this more thoroughly, but adding functions to core modules such as `alloc::rc` should not be taken lightly. Regardless of whether the functions are methods or standalone functions, they need thorough discussion before being added.

@lilyball (Contributor Author) commented Jul 5, 2014

@alexcrichton I am not at all surprised you want to discuss this first, and that's fine (as long as it doesn't end up stalling out, like other PRs where you've wanted to discuss first).

The alternative to adding the try_get_mut() API to rc is to reimplement a brand new reference-counting type inside of local_data, and I really wanted to avoid doing that.

`is_unique()` and `try_unwrap()` are not actually necessary for this PR, but I felt `is_unique()` was a logical thing to expose given both `try_get_mut()` and `make_unique()`. I was originally going to use `try_unwrap()`; when I decided I wasn't going to use it after all, I left it in because, again, I think it's a potentially useful API in light of the other APIs that already allow for manipulation of unique values.

@pcwalton (Contributor) commented Jul 5, 2014

Previously we rejected the Arc equivalent of try_unwrap(), but I think it's OK for Rc because the deadlock issues are not there. Still, it's a sharp tool and we should describe in the docs that you should be careful when using it.

@lilyball (Contributor Author) commented Jul 5, 2014

@pcwalton What sort of warnings are you thinking about adding to the docs? It's a bit of a specialized tool and you need to be careful about when you assume it will or won't be able to work, but it's not really dangerous in any way. I'm happy to add wording suggesting caution, but I don't know what to actually say that makes sense. "You should be careful using this because...."

@alexcrichton (Member) commented:
Taking a step back, I'm not sure that a shared ownership model is one that we would like to live with moving into the future. Ideally, I would imagine that task-local-data looks something like this:

  1. Each crate knows the exact set of types and TLD keys it contains. Each TLD key is assigned an offset so they don't clash.
  2. Each crate has a slot for a TLD index (specific to this crate).
    • Any rlib created will have an undefined symbol for this index
    • Any dylib will be initialized through some other means. This may be done dynamically for a dynamically loaded crate.
  3. When compiling a crate, all dependencies are assigned unique integers and the table layout for TLD is set up.
  4. When a TLD key (crate, id) is requested, the following steps are taken:
    1. Access OS TLS to get the local rust task
    2. If the TLD table is null, allocate one
    3. Load table[crate]
    4. If this table is null, allocate one
    5. Look at sub_table[id]

With a design such as this, all TLD lookups/sets are O(1), unlike the O(log n) in this PR and the O(n) previously. TLD is far less valuable if it has any complexity other than O(1) for the most part.

Basically, in an ideal world, there's no space for this shared ownership model. I think it's a great idea to reduce allocations, but I think it's important to allow for references into the actual TLS data rather than handing out Rc references, which forever forces most insertions to allocate. Some of these allocations can be removed, as you've found in this PR, but I would expect that for hardcore usage this still doesn't quite fit the bill due to the non-O(1) lookup.

I know that @pcwalton has wanted to use TLS for many high performance use cases in servo, but has been unable to due to the current design.
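The two-level lookup described in this comment can be sketched as follows. This is a hypothetical modern sketch: `TldTable`, the `usize` ids, and the `Box<dyn Any>` payload are illustrative assumptions, not part of any real design.

```rust
use std::any::Any;

/// Two-level TLD table: outer index = crate id, inner index = key id
/// within that crate. Both ids are assumed to be assigned at link time.
struct TldTable {
    crates: Vec<Option<Vec<Option<Box<dyn Any>>>>>,
}

impl TldTable {
    fn new() -> TldTable {
        TldTable { crates: Vec::new() }
    }

    /// O(1) amortized: two indexed loads, plus lazy allocation of any
    /// missing level on first touch.
    fn slot(&mut self, krate: usize, id: usize) -> &mut Option<Box<dyn Any>> {
        if self.crates.len() <= krate {
            self.crates.resize_with(krate + 1, || None);
        }
        let sub = self.crates[krate].get_or_insert_with(Vec::new);
        if sub.len() <= id {
            sub.resize_with(id + 1, || None);
        }
        &mut sub[id]
    }
}

fn main() {
    let mut tld = TldTable::new();
    *tld.slot(2, 5) = Some(Box::new(42u32));
    let v = tld.slot(2, 5).as_ref().and_then(|b| b.downcast_ref::<u32>());
    assert_eq!(v, Some(&42u32));
    assert!(tld.slot(0, 0).is_none()); // untouched slots read as empty
}
```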

@lilyball (Contributor Author) commented Jul 7, 2014

@alexcrichton I agree that an O(1) approach would be ideal, but as long as we're handing out references to values stored inline in TLD (instead of separately-allocated as I'm doing here), we need to effectively be a RefCell and dynamically fail on mutation if there are any outstanding loans. There's simply no way to model TLD in the type system to allow for borrowck to deal with it.

Now of course RefCell exists for a reason, and perhaps the dynamic failure is ok, but I chose to get rid of dynamic failure because as long as each TLD value is allocated, dynamic failure is not necessary.
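The dynamic failure being discussed is just `RefCell`'s behavior; a minimal sketch:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(5u32);

    let loan = cell.borrow(); // an outstanding shared loan
    // Mutation must fail dynamically while the loan is live:
    assert!(cell.try_borrow_mut().is_err());
    drop(loan);

    // Once the loan is gone, mutation succeeds.
    *cell.borrow_mut() = 6;
    assert_eq!(*cell.borrow(), 6);
}
```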

If you feel strongly about this, and think that your described TLD scheme will actually be implemented some day (because it sure isn't being implemented today), then I can reintroduce dynamic failure. The only benefit to doing that right now is to preserve API compatibility with this proposed future TLD rewrite (and that's assuming there's no API breakage from this rewrite otherwise). Fundamentally, it would just involve doing `.ok().unwrap()` within the `replace()` method and making the return value `Option<T>` again.

FWIW, it's possible to get O(1) lookup and still preserve the shared ownership model. The time complexity does not come from the need for allocations but rather from the fact that we do not have a known offset into TLD for each key, and therefore need to perform some kind of search. And the allocation with this implementation only happens 1. on first insert of a key, or 2. when setting a key value that is not uniquely owned. As long as clients of the TLD API avoid shared ownership, replacing values re-uses the existing allocation. The way this handles shared ownership is compatible with an O(1) TLD scheme like you described, so keeping this API does not preclude fixing lookup. So the only real reason to switch back to a dynamic failure model is if you believe that the allocation will have an important performance impact on TLD (you already have several indirections in your scheme, so adding one more to read the value out of the allocation does not seem necessarily problematic).


I don't know how your scheme is going to handle dynamically loading dylibs at runtime that define new TLD keys. Any existing tasks won't have space in their allocated TLD table for the keys defined by the runtime-loaded dylib. And any resolution to this will involve reallocating the tables somehow, which will break any references to contained values anyway. AFAICT supporting this scenario[1] will require keeping allocation for each value, which brings us back to the shared-ownership model.

[1] and I think we have to. It's a bit prohibitive to say that e.g. syntax extensions cannot use TLD, because while the syntax extension may not use it directly, it may use a library that itself uses TLD (that the syntax extension author isn't even aware of).
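The allocation-reuse claim above can be illustrated with today's `Rc::get_mut` (a sketch of the idea, not this PR's implementation):

```rust
use std::rc::Rc;

/// Write in place when we are the sole owner; only allocate a fresh
/// box when someone else still shares the old value.
fn replace_value(slot: &mut Rc<u32>, new: u32) {
    match Rc::get_mut(slot) {
        Some(v) => *v = new,          // unique: reuse the allocation
        None => *slot = Rc::new(new), // shared: old value stays intact
    }
}

fn main() {
    let mut slot = Rc::new(1u32);
    let addr = Rc::as_ptr(&slot);
    replace_value(&mut slot, 2);
    assert_eq!(*slot, 2);
    assert_eq!(Rc::as_ptr(&slot), addr); // same allocation reused

    let shared = Rc::clone(&slot);
    replace_value(&mut slot, 3);
    assert_eq!(*slot, 3);
    assert_eq!(*shared, 2); // the sharer still sees the old value
}
```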

@huonw (Member) commented Jul 7, 2014

(A hashmap would give (slightly more expensive) O(1) lookups, although the current library organisation doesn't allow that yet, and @kballard had some segfaulting issues when he tried, iirc.)

@lilyball (Contributor Author) commented Jul 7, 2014

@huonw Yeah I think my segfaulting issues stemmed from changing the implementation of local_data in the tests, without recompiling everything from scratch. And since it can't be compiled from scratch (because hashmap lives in std), that makes it a bit of a non-starter.

@lilyball (Contributor Author) commented:
@alexcrichton How do you want to move forward on this? I think this PR needs to land in some form, because it improves the performance of local_data and that's useful. Do you have a strong objection to landing it with the shared ownership model? As already documented, the shared ownership model can still be preserved in a hypothetical future O(1) lookup implementation. Alternatively I could modify the PR to reintroduce the failure-on-replace() approach that breaks shared ownership, but I would prefer not to do that without a compelling reason.

@alexcrichton (Member) commented:
I personally feel that this is not the right approach for TLS. The primitive that TLS is providing is not shared ownership, but rather task-local-storage. The old get_mut was removed due to that being better provided by RefCell, and I feel that forcing Rc falls in that category as well. It's also less clear to me how this can be extended into the future if a shared ownership model is part of the API.

I'm fine with landing the addition of set and the move to a TreeMap, but I don't think that Rc is the way to go with backing TLS-storage.

@lilyball (Contributor Author) commented:
Note: I have a pending rewrite of this PR that I'm working on that addresses @alexcrichton's issues. I hope to find the time to complete it this coming week.

The correct terminology is Task-Local Data, or TLD. Task-Local Storage,
or TLS, is the old terminology that was abandoned because of the
confusion with Thread-Local Storage (TLS).
@lilyball (Contributor Author) commented:
Rewritten. r? @alexcrichton

I ditched Rc in favor of a mini re-implementation, because Rc doesn't provide any way to deallocate the box without dropping the contained value, and I wanted to be able to contain uninitialized values without using Option.

@@ -66,33 +69,51 @@ pub type Key<T> = &'static KeyValue<T>;
#[allow(missing_doc)]
pub enum KeyValue<T> { Key }

#[doc(hidden)]
Review comment (Member):

Could you be sure to confirm that this doesn't pop up in the documentation by accident? In theory `grep -R LocalData doc/` should turn up nothing. I forget if this was for a broken link in the past or some weird rustdoc bug which has since been fixed.

Review comment (Contributor Author):

Aww crud, it does indeed show up. It's not public, it has no business showing up. What is rustdoc thinking?

Review comment (Contributor Author):

Actually, the LocalData trait isn't even used anymore, it should just be removed.

let ClearKey(ref key) = *self;
key.clear();
}
}
Review comment (Member):

What's going on here?

Review comment (Contributor Author):

Helper type to expunge no-longer-used keys from the map, so that the tests and benchmarks don't interfere with each other (each test and benchmark uses brand-new keys, which, as explained earlier, won't get removed from the map, and this affects the benchmark measurements). I'll add a comment.

Review comment (Member):

Tests are run in their own tasks, so I thought they wouldn't affect other tasks or benchmarks?

Review comment (Contributor Author):

Hrm, actually, that's a good point. Benchmarks definitely need to clear the keys (because they're run multiple times) but regular tests do get their own task. I recall measuring a difference in the benchmarks when I added key-clearing to these tests, but it seems like that shouldn't happen. I'll test it again.

Review comment (Contributor Author):

Tested again, looks like you're right, the keys from the tests aren't impacting the benchmarks. Which is good, because that's how it should be.

@lilyball (Contributor Author) commented:

Updated. I believe I've addressed all of the feedback. r? @alexcrichton

Some(newValue) => {
map.insert(keyval, TLDValue::new(newValue));
}
None => {}
}
Review comment (Member):

Could the (None, None) case have a return None, the (None, Some(data)) case returns data from the match, and then the (Some(slot), data) case starts with return match (refcount, data) { ... }?
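The suggested shape, sketched against stand-in types (a plain `BTreeMap<u32, String>` here, not the PR's refcounted nodes, so this simply removes entries rather than keeping empty ones):

```rust
use std::collections::BTreeMap;

// Stand-in for the PR's replace(): one match over (existing slot, new
// data), with each arm returning directly instead of falling through.
fn replace(map: &mut BTreeMap<u32, String>, key: u32, data: Option<String>) -> Option<String> {
    match (map.remove(&key), data) {
        (None, None) => None,
        (None, Some(new)) => {
            map.insert(key, new);
            None
        }
        (Some(old), new) => {
            if let Some(new) = new {
                map.insert(key, new);
            }
            Some(old)
        }
    }
}

fn main() {
    let mut map = BTreeMap::new();
    assert_eq!(replace(&mut map, 1, Some("a".to_string())), None);
    assert_eq!(replace(&mut map, 1, Some("b".to_string())), Some("a".to_string()));
    assert_eq!(replace(&mut map, 1, None), Some("b".to_string()));
    assert_eq!(replace(&mut map, 1, None), None);
}
```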

@alexcrichton (Member) commented:

Just a few minor comments, but otherwise looks good to me! r=me

@forticulous commented:
@kballard You ditched Rc for this PR but would you still want to add those methods for try_unique and is_unique in a separate PR?

@lilyball (Contributor Author) commented:

@forticulous I submitted that 1.5 days ago as #16101.

@forticulous commented:
@kballard Excellent, I was hoping that didn't get lost

@lilyball (Contributor Author) commented:

Comments addressed. Thanks for the review, @alexcrichton

@lilyball (Contributor Author) commented:

Just realized this is no longer a breaking change.

lilyball added 3 commits July 31, 2014 13:14
Errors can be printed with `{}`; printing with `{:?}` does not work very
well.

Not actually related to this PR, but it came up when running the tests
and now is as good a time to fix it as any.
bors added a commit that referenced this pull request Jul 31, 2014
@bors closed this Aug 1, 2014
@lilyball deleted the rewrite_local_data branch August 1, 2014 02:04