Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlatMap, a vector map for extra #9653

Closed
wants to merge 1 commit into from

Conversation

toffaletti
Copy link
Contributor

I don't expect this to be merged as-is, but I'd like some eyes on what I've done so far. I'd like to implement a FlatSet in this file also to have as much feature parity with hashmap.rs as possible.

None => {
let v = not_found(&k, a);
self2.data.push((k,v));
match self2.data.mut_rev_iter().next() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be a better way to do this, but I struggled to find one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this compile?

    match self.find_mut(&k) {
        Some(val) => {
            found(&k, val, a);
            return val
        }
        None => ()
    }
    let v = not_found(&k, a);
    self.data.push((k,v));
    match self.data.mut_rev_iter().next() {
        Some(&(_, ref mut val)) => val,
        None => unreachable!(),
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, but I was referring to the match self.data.mut_rev_iter().next() line, wanting to find a better way to do that.

src/libextra/flatmap.rs:71:8: 71:17 error: cannot borrow `(*self).data` as mutable more than once at a time
src/libextra/flatmap.rs:71         self.data.push((k,v));
                                   ^~~~~~~~~
src/libextra/flatmap.rs:63:14: 63:19 note: second borrow of `(*self).data` as mutable occurs here
src/libextra/flatmap.rs:63         match self.find_mut(&k) {
                                         ^~~~~
src/libextra/flatmap.rs:72:14: 72:23 error: cannot borrow `(*(*self).data)[]` as mutable more than once at a time
src/libextra/flatmap.rs:72         match self.data.mut_rev_iter().next() {
                                         ^~~~~~~~~
src/libextra/flatmap.rs:63:14: 63:19 note: second borrow of `(*(*self).data)[]` as mutable occurs here
src/libextra/flatmap.rs:63         match self.find_mut(&k) {

@sfackler
Copy link
Member

sfackler commented Oct 1, 2013

extra::smallintmap should probably be deleted if this goes through.

@toffaletti
Copy link
Contributor Author

Maybe. It looks like smallintmap's keys are the indexes into the vector, so it can be even smaller than FlatMap under the right conditions.

@alexcrichton
Copy link
Member

This is some nice code, thanks! That being said, I'm not sure how many more containers libextra needs at the moment. This would be an excellent candidate for the "package incubator" type situation that we envision for libextra, but I'm not certain that this needs to be merged at this time.

I could be wrong though, does this have a good use case that I'm not seeing? It seems that this has O(n) on almost all operations which seems undesirable? I could be overlooking something though.

@thestinger
Copy link
Contributor

@sfackler: smallintmap has O(1) insertion, deletion and "search" - isn't this O(n)?

@sfackler
Copy link
Member

sfackler commented Oct 1, 2013

Oh yeah, nevermind.

@toffaletti
Copy link
Contributor Author

@alexcrichton I haven't benchmarked this implementation, but this type of container is generally faster than a hash map for small data sets like say http headers.

I'm not taking advantage of sorting yet, but if you have sorted keys this type of container can be great for large immutable data sets too where the overhead of holes in a hash map becomes costly.

Prior art:
http://www.boost.org/doc/libs/1_54_0/doc/html/boost/container/flat_map.html

@toffaletti
Copy link
Contributor Author

smallintmap is great for things like file descriptor to waiter mappings where you're fine possibly trading a little memory or insertion speed for a lot of speed on lookup.

@alexcrichton
Copy link
Member

Oh interesting! This would actually be an excellent type to use in TLS currently as well...

I'm always a fan of more cool stuff in rust, and others may want to weigh in on this as well, but this feels like it should wait for our "incubator" to manifest itself.

@toffaletti
Copy link
Contributor Author

I added some benchmarks and modified the code to keep the vector sorted.

Before sorting:

test bench::flatmap_find_10 ... bench: 395 ns/iter (+/- 8)
test bench::flatmap_find_50 ... bench: 1065 ns/iter (+/- 30)
test bench::flatmap_insert_10 ... bench: 2865 ns/iter (+/- 224)
test bench::flatmap_insert_50 ... bench: 37811 ns/iter (+/- 889)

test bench::hashmap_find_10 ... bench: 446 ns/iter (+/- 21)
test bench::hashmap_find_50 ... bench: 482 ns/iter (+/- 56)
test bench::hashmap_insert_10 ... bench: 4943 ns/iter (+/- 148)
test bench::hashmap_insert_50 ... bench: 34150 ns/iter (+/- 1005)

After sorting:

test bench::flatmap_find_10 ... bench: 166 ns/iter (+/- 7)
test bench::flatmap_find_50 ... bench: 218 ns/iter (+/- 20)
test bench::flatmap_insert_10 ... bench: 3161 ns/iter (+/- 251)
test bench::flatmap_insert_50 ... bench: 17743 ns/iter (+/- 696)

test bench::hashmap_find_10 ... bench: 447 ns/iter (+/- 24)
test bench::hashmap_find_50 ... bench: 480 ns/iter (+/- 23)
test bench::hashmap_insert_10 ... bench: 4956 ns/iter (+/- 277)
test bench::hashmap_insert_50 ... bench: 34484 ns/iter (+/- 5944)

@huonw
Copy link
Member

huonw commented Oct 1, 2013

Is it worth having the extra restriction of Ord? Someone can use treemap if they have that extra structure. (It's a trade-off between speed and generality.)

@toffaletti
Copy link
Contributor Author

Good point, TotalOrd is the only trait needed. Fixed.

@huonw
Copy link
Member

huonw commented Oct 1, 2013

Erm, that's not a relaxation: fewer types are TotalOrd than Ord!

I actually meant this should be totally unordered, since we have other maps that cover that case. I.e. this should be a "pure" association list, which is the simplest map, and the only one that works for types that only implement Eq.

@erickt
Copy link
Contributor

erickt commented Oct 1, 2013

@toffaletti: awesome! I kept meaning to add something like this, but never got around to it. Two comments:

  • Is FlatMap a common name for this? I've always heard of this pattern being called an association list. For example, here's OCaml implementation. Perhaps this should be named AssocMap?
  • Should we have a copyless from_owned_vec(v: ~[(K,V)]) -> FlatMap<K,V> and .into_owned_vec(self) -> ~[(K,V)]?

@toffaletti
Copy link
Contributor Author

@huonw I see, I thought you just meant it was silly I had Eq+Ord+TotalOrd. I did originally implement it without ordering because I hadn't gotten around to writing the sorting code yet. Perhaps it is my lack of imagination, but I have a hard time coming up with a use case for FlatMap where the keys couldn't be ordered. An important property of FlatMap is that it is fast and the ordering is a part of that. It has a few advantages over TreeMap like less memory usage, fewer allocations, and faster lookups.

@erickt boost calls it flat_map. The wikipedia link you posted says an association list is a linked list. flat map is a sorted vector, more like an array backed binary heap.

FlatMap exploits how amazingly fast modern hardware is at accessing contiguous memory. It is memory efficient and faster than you might expect.

@toffaletti
Copy link
Contributor Author

I turned the benchmark code into a macro and added TreeMap, but it seems to be misbehaving. I randomly see benchmarks take 0 ns/iter. I couldn't find a bug reporting that problem, can anyone verify? I've made the imports in flatmap.rs so you can just rust test flatmap.rs --bench. Thanks for all the help and feedback everyone.

@alexcrichton
Copy link
Member

If you see 0 ns/iter it often means that your iterations are taking too long, I believe the current threshold is 1ms for the maximum amount of time your iterations should take.

@toffaletti
Copy link
Contributor Author

I made https://github.com/toffaletti/rust/tree/flatmap-used to test some real-world usage of FlatMap by replacing libextra usages of TreeMap with FlatMap. It exposed some issues with FlatMap that I've fixed to make stage2-check pass.

One thing worth mentioning is the changes ended up being much farther reaching than I'd hoped or expected because many things using the json module were depending on it using TreeMap to represent json::Object which seems like a code smell.

I'm not sure how to quantify this for merging, mostly it was just a fun and educational experiment for me. I think @alexcrichton is probably right saying it can wait.

@alexcrichton
Copy link
Member

As I mentioned on #9816, if we don't end up merging this, I would highly recommend putting this in a rustpkg repo on http://hiho.io/rust-ci/, because I'm sure others will find this useful!

@brson
Copy link
Contributor

brson commented Oct 16, 2013

I too think this is a good data structure, but that we should not put it in libextra since it is soon to be dissolved and moved to various other places, and this probably doesn't belong in std. It would be nice to have a discussion of what data structures we need in std, and whether we should have a single, well-supported external package for more advanced or specialized data structures like this one.

@brson
Copy link
Contributor

brson commented Oct 16, 2013

Closing for the above reasons.

@brson brson closed this Oct 16, 2013
flip1995 pushed a commit to flip1995/rust that referenced this pull request Oct 20, 2022
fix `box-default` linting `no_std` non-boxes

This fixes rust-lang#9653 by doing the check against the `Box` type correctly even if `Box` isn't there, as in `no_std` code. Thanks to `@lukas-code` for opening the issue and supplying a reproducer!

---

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants