Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Something Something Persistent Collections Something Something? #14

Closed
Gankra opened this issue Nov 30, 2014 · 15 comments
Closed

Something Something Persistent Collections Something Something? #14

Gankra opened this issue Nov 30, 2014 · 15 comments

Comments

@Gankra
Copy link
Owner

Gankra commented Nov 30, 2014

cc @reem @matthieum @jurily

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

For some very bad attempted ports of Haskell's Data.* see: https://github.com/reem/adamantium

I've been meaning to do something actually interesting with it, given the awesome naming potential (adamantium, for immutable data structures, since it doesn't break, get it?, get it...?) but time gets away from me.

I think there are two possibilities when it comes to writing immutable structures in Rust - you can either write them to be ephemeral, abusing rust's ownership model to ensure the old versions can't be used, or persistent, using reference counting. I think that the ephemeral model, while kinda cool, is actually not very useful in Rust, since the operations would have to take self, and therefore might as well just do mutation and be faster.

@ghost
Copy link

ghost commented Nov 30, 2014

I'm stuck on #4363. tl;dr rustc doesn't notice that the properties (notably, the size) of Foo<T> don't depend on T, so it ends up in an infinite recursion trying to resolve everything. In my case FingerTree<T> has a field FingerTree<Node<T>>, and no amount of Box/Rc/*mut T helps.

Haven't had the time to figure out the unsafe countermeasures yet.

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

@Jurily I think the problem in #4363 actually has nothing to do with rustc trying to calculate the size of Foo<T>, but with trying to recursively figure out the full type of all the subfields and ending up just creating larger and larger nestings of Node<Node<Node<Node<Node<T>>>>>>.

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

I just tested this through GHC:

data Tree a b = Node (Tree (a, a) b) b
              | Bin

and it works, so I imagine that rustc could be extended to make it work too.

@ghost
Copy link

ghost commented Nov 30, 2014

Haskell is lazy, so it can compile weird things without problems:

fix :: (a -> a) -> a
fix f = f (fix f)

OCaml would be a better comparison, and they have finger trees too:

 type ('a, 'm) fg =
| Nil (* not called Empty as in the paper to avoid a name
* clash with the exception Empty *)
| Single of 'a
| Deep of 'm * ('a, 'm) digit * (('a, 'm) node, 'm) fg * ('a, 'm) digit

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

I think we would like to match all of the major use cases for the collections in libcollections, which has:

  • Vec
  • RingBuf
  • DList
  • BTreeMap/TreeMap/TrieMap/HashMap

There's some more, but they are mostly Set variants of the above Maps or much more niche structures like BitV, EnumSet, VecMap and LruCache.

Vec can't really be replicated persistently at equivalent speed, though that's sort of the story of persistent collections generally. We can replicate it a bit with a persistent vector like clojure's, which also covers the RingBuf use case since insertions anywhere are the same speed.

@gankro are you at all familiar with making a BTree persistent, because that could be another route to go for cases where the keys aren't Hash but can be Ord.

DList's most important use is for re-ordering, like is done in the implementation of an LruCache - it's not possible (or at least I have no idea how) to fully emulate a Dlist persistently and efficiently because you can reach any node from any other node, which basically necessitates a copy of the whole structure - but we can emulate this one feature. I haven't heard about any structures that are especially good for this - maybe you guys could help.

All the maps can be replaced by persistent look-a-likes, either HAMT or just persistent trees, with respective set implementations.

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

Also a persistent Rope would be cool.

@tbu-
Copy link
Contributor

tbu- commented Nov 30, 2014

@reeem What use is a persistent rope? Isn't a rope optimized for change?

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

@tbu- Really it's just a rope using refcounting and with no exposed destructive operations.

@Gankra
Copy link
Owner Author

Gankra commented Nov 30, 2014

I am only familiar with persistence techniques at a high-level. Path-copying vs fat-nodes for trees. Stuff like that. I know scala's collections library has some very interesting ideas for persistent collections, though.

Knowing what problems these collections solve would be helpful for determining which collections would be more/less interesting to poke at. Unfortunately persistent stuff isn't very high on my priority list, as I'm currently focusing on experimenting with stuff that might affect existing std APIs (cursors/iterators/comparators). I'm hoping someone else can take point on this.

@gereeter
Copy link
Collaborator

First, my ideal envisioned API for a persistent collection is identical to the API for a non-persistent collection. However, persistent collections would have extremely cheap Clone implementations. This would allow them to be drop-in replacements for non-persistent structures while also not allocating and garbage collecting unneeded "snapshots".

Second, this can be done completely generically for tree-like structures:

trait RcLike<T>: Deref<T> {
    fn build(val: T) -> Self;
    fn is_unique(&self) -> bool;
    fn make_unique(&mut self) -> &mut T;
}
impl<T: Clone> RcLike<T> for Rc<T> { /* ... */ }
impl<T: Clone> RcLike<T> for Arc<T> { /* ... */ }
impl<T> RcLike<T> for T {
    fn build(val: T) -> Self { val}
    fn is_unique(&self) -> bool { true }
    fn make_unique(&mut self) -> &mut T { self }
}

struct Node<K, V> { /* ... */ }
struct BTreeMap<K, V, NodeRef> { /* ... */ }

impl<K, V, NodeRef: Deref<Node<K, V>>> {
    fn get(&self, key: K) -> Option<&V> { /* ... */ }
    // ...
}

impl<K, V, NodeRef: RcLike<Node<K, V>>> BTreeMap<K, V, NodeRef> {
    fn insert(&mut self, key: K, val: V) { /* ... */ }
    // ...
}

By exposing the internal node type, we can circumvent the need for HKTs, still writing a single data structure that works just as well whether the links are made through Box, Rc, Arc, or Gc.

@reem @Jurily The problems you've been having with finger trees don't have to do with laziness or infinitely sized data types but with polymorphic recursion. To see what this is and why Rust can't accept this (even if the ICE is fixed), here is a simpler example:

fn poly_rec<T>(val: T) {
    poly_rec(&val); // note the &
}

fn main() {
    poly_rec(1u);
}

When compiling generic functions, Rust wants to be zero cost and so produces a new (monomorphised) version of the generic function for every single collection of types that the function is called with. This works excellently for speeding up the program, but doesn't work for functions like poly_rec. Once I call poly_rec on a uint, it then has to compile it for &uint, then &&uint, then &&&uint, and so on. In this case, rustc correctly catches this, erroring with "reached the recursion limit during monomorphization". The ICE results from polymorphic recursion in a data structure, but even if the compiler could accept that structure, you wouldn't be able to write any functions processing a finger tree.

As a side note, Haskell avoids this problem simply by not doing monomorphisation - abstractions are not zero-cost. While finger trees work and work well in Haskell, they actually end up doing a bunch of virtual calls. As such, I think that a proper implementation of them in Rust could be significantly faster than Haskell's.

@Gankra
Copy link
Owner Author

Gankra commented Nov 30, 2014

@gereeter so you're proposing fat-nodes (which I believe is faster than path-copying)?

@reem
Copy link
Collaborator

reem commented Nov 30, 2014

I really don't like that the type of a concrete BTree would be BTreeMap<uint, String, Arc<Node<uint, String>>>> but I recognize the need to work around the lack of HKT.

Also, the impls you provided, when their bounds are fixed (Rc and Arc need no bounds and Send + Sync, respectively) conflict, as Rc<T> implements Clone, so would also qualify for the <T: Clone> RcLike<T> for T impl.

@ghost
Copy link

ghost commented Dec 1, 2014

@gereeter No, those are two separate issues. I've had to hide the inner tree behind an opaque pointer to even get to the monomorphization error.

@Gankra
Copy link
Owner Author

Gankra commented Jan 24, 2015

So I tried to rework ImmutSList to "look" like a normal stack per @gereeter's musings. push works fine, but pop is a problem. You can only get the value out if you're the unique owner of the current Node. Otherwise all you can do is proceed forward. I considered returning None in that case, but that seems a bit... weird.

@Gankra Gankra closed this as completed Mar 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants