-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metabug: Libcollections TODO #18009
Comments
cc me |
I've added this mostly verbatim from the reddit thread I originally posted this to after prompting from several developers. Some things were added. This is still pretty heavily biased towards things I've poked at. Maybe TrieSet is a disaster. Maybe LruCache is a nightmare. No clue. |
I'd suggest using this issue to coordinate work on some of these subitems. When a particular item needs additional discussion, you should spin off and link to an actual subissue for it. Also, just a note that most of the TODO items here are not strictly needed for stabilization, which is about API commitments; a lot of these are important suggestions for implementation and documentation improvements. |
Some of these actually have issues. Working on collecting those up now. |
Looking into 'BitV's internal representation is a Vec of uints. It should almost certainly be a Vec of u32's, or some other non-machine-sized type.' Seems like you already fixed it 4 days ago @gankro ;) (it simply wasn't merged yet) |
@gamazeps Huh? BitV still uses uints. I did start up a branch expecting it to be a quick CTRL+H job, but got too distracted in the bigger "needs a full review" issue as I went over all the changes to make sure I wasn't messing something up. In particular, reading over the code, it's really hard to tell locally what should be a uint, and what should be a u32 (is that value a word, or an index?). I've been pretty scatter-brained/sick lately, and don't have enough time/energy to ensure I'm not screwing the whole API up. I've been sticking to writing/design stuff for the past-while since it's easier for me to squeeze in short bursts when I'm not doing school stuff. Also fever-dream prose is just hard to read, not a bug. If you want to pick up and finish the work on my branch, feel free! |
Ok ok ^^ So I'll do that (but I'm not experimented enough to do the code review of the whole BitV for now). |
Yeah, that can wait. @apoelstra has volunteered to help review any BitV PRs, but is also a bit swamped. |
@gankro Have you thought a lot about how to fix DList? I'd be interested in iterating on it. |
@reem I actually started up some work on this 2 months ago when I didn't understand Rust as well. It's mixed in with my Cursor work on this branch: https://github.com/Gankro/rust/compare/dlist-seek The first commit is the refactor work, though it also features more sweeping reform in anticipation of introducing cursors: Gankra@4e259d2 At the time, I thought it was hopeless, but I realize now that it should mostly "just work" if you manually mark the RawLink's NoCopy. Skimming over it again now, I also came to the conclusion that the fact that we make frequently make invalid RawLinks transiently, so that makes dereffing them unsafe. That... might be valid? This kind of goes hand-in-hand with some unsafe code conventions I've been mulling over in my mind. Specifically, is internal code reasonable to mark safe if assumes pre-conditions? That is, if it assumes it's only called when the structure is in a "valid" state, and not a transient "invalid" one? |
Ideally, I think that I think that we should discuss Zippers and Cursors separately, and should probably make an RFC to introduce them. I'd be happy to chat on IRC or here about what they should like. |
Yeah, then we need to waste cycles ensuring links are always valid, or we need to mark way more of DList straight-up |
I think that (ideally) no operations on DList should let you leave it in an unsafe state. |
All of public operations are legit, it's jut trivial to make/deref a dangling pointer using the current internal API. Presumably Node or RawLink could be beefed up with more logic so that this doesn't happen. e.g. put all the unsafe link-mutating code on Node with a safe interface for DList to safely use. Actually I think I mostly did some of this with the |
Hmmm. For internal code I think the distinction is less clear. I'm attempting a refactor now, and my goal is to move as much unsafe as possible into safe abstractions, but we'll see how it goes. I'll take a look at your refactor. We should spin this off into a subissue. Danger: One of the main things I'm doing is introducing the invariant to RawLink that the internal pointer is always valid, then keeping it behind an Option. I think this makes things a bit simpler. |
If you are going to delve deeper into it on IRC, I'm interested ;) |
Remember, I wrote it at the start of August, and I first started learning Rust in July 😅 (I see you, old transmute hack I loved) |
The u32 for the BitV seems to be fine (running benchs to measure). Looking into "RingBuf should not be a Vec of Options." now :) |
I've got a slimmer BTree |
You're all amazing ❤️ |
@gankro I'd love to take a crack at discussing what certain data structures are best for. (I'm also still looking at refactoring DList). Where do you see that going/what format do you think would be best for it? |
@reem You can check out my BTree docs: http://doc.rust-lang.org/std/collections/struct.BTreeMap.html and the high-levels docs I added to libcollections: http://doc.rust-lang.org/std/collections/index.html for the sort of things I had in mind. That said, I haven't thought about it too hard, so if there's something you think should or shouldn't be done that isn't or is, feel free to pitch in ideas. I'm not a documentation expert. |
CC #18424 collections reform has passed! Lots of work to do in there too! |
…=bstrie I've implemented the new collection views API for TrieMap. I more or less followed the approach set out by @gankro in BTreeMap, by using a `SearchStack`. There's quite a bit of unsafe code, but I've wrapped it safely where I think is appropriate. I've added tests to ensure everything works, and performance seems quite good. ``` test trie::bench_map::bench_find ... bench: 67879 ns/iter (+/- 4192) test trie::bench_map::bench_find_entry ... bench: 186814 ns/iter (+/- 18748) test trie::bench_map::bench_insert_large ... bench: 716612 ns/iter (+/- 160121) test trie::bench_map::bench_insert_large_entry ... bench: 851219 ns/iter (+/- 20331) test trie::bench_map::bench_remove ... bench: 838856 ns/iter (+/- 27998) test trie::bench_map::bench_remove_entry ... bench: 981711 ns/iter (+/- 53046) ``` Using an entry is slow compared to a plain find, but is only ~15% slower for inserts and removes, which is where this API is most useful. I'm tempted to remove the standalone `remove` function in favour of an entry-based approach (to cut down on complexity). I've added some more comments to the general part of the code-base, which will hopefully help the next person looking over this. I moved the three key structures to the top of the file so that the nesting structure is clearly visible, and renamed `Child<T>` to `TrieNode<T>` and `TrieNode<T>` to `InternalNode<T>` to improve clarity. If these changes are creeping, I'm happy to revert them. Let me know if my use of `fail!` is ok, I was a little unsure of how specific to be. Some of the data-structures have various invariants that shouldn't be broken, so using `fail!` seemed appropriate. ## Still to do * Modernise iterators (make them double-ended). * Make the keys generic, or rename this data-structure (see: #14902). * Possibly move this code out of libcollections. [Searching Github for TrieMap turns up very few real results.][triemap-search] Related issues: #18009 and #17320 [triemap-search]: https://github.com/search?utf8=%E2%9C%93&q=TrieMap+language%3ARust&type=Code&ref=searchresults
Fix for task in Metabug #18009 (Rebased version of #18170) This changes much of about how RingBuf functions. `lo`, `nelts` are replaced by a more traditional `head` and`tail`. The `Vec<Option<T>>` is replaced by a bare pointer that is managed by the `RingBuf` itself. This also expects the ring buffer to always be size that is a power of 2. This change also includes a number of new tests to cover the some areas that could be of concern with manual memory management. The benchmarks have been reworked since the old ones were benchmarking of the Ring buffers growth rather then the actual test. The unit test suite have been expanded, and exposed some bugs in `fn get()` and `fn get_mut()` ## Benchmark **Before:** ``` test ring_buf::tests::bench_grow_1025 ... bench: 8919 ns/iter (+/- 87) test ring_buf::tests::bench_iter_1000 ... bench: 924 ns/iter (+/- 28) test ring_buf::tests::bench_mut_iter_1000 ... bench: 918 ns/iter (+/- 6) test ring_buf::tests::bench_new ... bench: 15 ns/iter (+/- 0) test ring_buf::tests::bench_pop_100 ... bench: 294 ns/iter (+/- 9) test ring_buf::tests::bench_pop_front_100 ... bench: 948 ns/iter (+/- 32) test ring_buf::tests::bench_push_back_100 ... bench: 291 ns/iter (+/- 16) test ring_buf::tests::bench_push_front_100 ... bench: 311 ns/iter (+/- 27 ``` **After:** ``` test ring_buf::tests::bench_grow_1025 ... bench: 2209 ns/iter (+/- 169) test ring_buf::tests::bench_iter_1000 ... bench: 534 ns/iter (+/- 27) test ring_buf::tests::bench_mut_iter_1000 ... bench: 515 ns/iter (+/- 28) test ring_buf::tests::bench_new ... bench: 11 ns/iter (+/- 0) test ring_buf::tests::bench_pop_100 ... bench: 170 ns/iter (+/- 5) test ring_buf::tests::bench_pop_front_100 ... bench: 171 ns/iter (+/- 11) test ring_buf::tests::bench_push_back_100 ... bench: 172 ns/iter (+/- 13) test ring_buf::tests::bench_push_front_100 ... bench: 158 ns/iter (+/- 12) ```
...ated buffer. Before: test btree::map::bench::find_rand_100 ... bench: 29 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 83 ns/iter (+/- 6) test btree::map::bench::find_seq_100 ... bench: 30 ns/iter (+/- 1) test btree::map::bench::find_seq_10_000 ... bench: 50 ns/iter (+/- 3) test btree::map::bench::insert_rand_100 ... bench: 186 ns/iter (+/- 30) test btree::map::bench::insert_rand_10_000 ... bench: 377 ns/iter (+/- 8) test btree::map::bench::insert_seq_100 ... bench: 299 ns/iter (+/- 10) test btree::map::bench::insert_seq_10_000 ... bench: 368 ns/iter (+/- 12) test btree::map::bench::iter_1000 ... bench: 20956 ns/iter (+/- 479) test btree::map::bench::iter_100000 ... bench: 2060899 ns/iter (+/- 44325) test btree::map::bench::iter_20 ... bench: 560 ns/iter (+/- 63) After: test btree::map::bench::find_rand_100 ... bench: 28 ns/iter (+/- 2) test btree::map::bench::find_rand_10_000 ... bench: 74 ns/iter (+/- 3) test btree::map::bench::find_seq_100 ... bench: 31 ns/iter (+/- 0) test btree::map::bench::find_seq_10_000 ... bench: 46 ns/iter (+/- 0) test btree::map::bench::insert_rand_100 ... bench: 141 ns/iter (+/- 1) test btree::map::bench::insert_rand_10_000 ... bench: 273 ns/iter (+/- 12) test btree::map::bench::insert_seq_100 ... bench: 255 ns/iter (+/- 17) test btree::map::bench::insert_seq_10_000 ... bench: 340 ns/iter (+/- 3) test btree::map::bench::iter_1000 ... bench: 21193 ns/iter (+/- 1958) test btree::map::bench::iter_100000 ... bench: 2203599 ns/iter (+/- 100491) test btree::map::bench::iter_20 ... bench: 614 ns/iter (+/- 110) This code could probably be a fair bit cleaner, but it works. Part of #18009.
TreeMap and TrieMap are on the way out, so I've marked all their issues as "resolved". A lot of the other issues have prototypes in collect-rs (where TreeMap and TrieMap live now). Also @gereeter's excellent efforts have resolved a lot of the BTree issues! @csherratt's work has resolved the RingBuf issue! A lot of other stuff is in rust-lang/rfcs#509 |
it looks like this bug may be done now? It seems like this was mostly "do work for 1.0". @gankro is that true? |
Eh, it's good enough. |
(most of the outstanding issues have their own stuff somewhere in the tracker these days) |
In the original reddit thread multi indexed collections were mentioned. Was this included? Where can i find it? I'm refering to functionality which is also available in other languages: http://www.boost.org/doc/libs/1_62_0/libs/multi_index/doc/index.html It's useful to model data structures as can be done in relational databases (some documentation linked above explains it). |
fix: do not assume rustup is installed in xtask codegen take 2 7d9e4fcc07e5de94e37b73436147cdbbaa35dbdc broke this on rustup toolchains, the `cmd` command is trying to be too smart here
As part of the stabilization process, we'd like our standard collection implementations to be good. Maybe even great. Wouldn't that be nice? The overall quality of each implementation is pretty varied, partially just because the language has dramatically shifted out from underneath them. Vec and HashMap are in a good place because they're The Big Dogs of data structuring, and have gotten substantial attention as a result. Other less-used collections are in worse shape, and need some serious attention.
If you're interested in working on a Rust project, or collections in particular, here's a big list of things to do experiment with on libcollections.
buffer, in the same vein as HashMap's RawTable.
Vec<*mut Node<T>>
-based search-stacks of BTreeMap, TreeMap, and TrieMap with a stack-allocated[*mut Node<T>, ..uint::BITS]
(TreeMap might need more, it's an AA-tree, dunno how tall they can get). Note that a preliminary experiment on BTree gave very poor results, though it has the least to gain from such a design from it's current design (tracks depth, can usewith_capacity
.peek_next
andremove_next
methods for soundly removing elements from the list during iteration. At least Vec, RingBuf, and HashMap could all support this soundly, I believe. Vec, RingBuf, and DList could also supportinsert_next
. Special iterator-like objects dedicated to these tasks might be better ideas, though (cursors, zippers, etc).split(&mut self, at: Index) -> Self
on some collections.The text was updated successfully, but these errors were encountered: