Something Something Persistent Collections Something Something? #14

Gankra · 2014-11-30T01:54:47Z

reem · 2014-11-30T05:01:08Z

For some very bad attempted ports of Haskell's Data.* see: https://github.com/reem/adamantium

I've been meaning to do something actually interesting with it, given the awesome naming potential (adamantium, for immutable data structures, since it doesn't break, get it?, get it...?) but time gets away from me.

I think there are two possibilities when it comes to writing immutable structures in Rust - you can either write them to be ephemeral, abusing rust's ownership model to ensure the old versions can't be used, or persistent, using reference counting. I think that the ephemeral model, while kinda cool, is actually not very useful in Rust, since the operations would have to take self, and therefore might as well just do mutation and be faster.

ghost · 2014-11-30T05:38:20Z

I'm stuck on #4363. tl;dr rustc doesn't notice that the properties (notably, the size) of Foo<T> don't depend on T, so it ends up in an infinite recursion trying to resolve everything. In my case FingerTree<T> has a field FingerTree<Node<T>>, and no amount of Box/Rc/*mut T helps.

Haven't had the time to figure out the unsafe countermeasures yet.

reem · 2014-11-30T05:40:29Z

@Jurily I think the problem in #4363 actually has nothing to do with rustc trying to calculate the size of Foo<T>, but with trying to recursively figure out the full type of all the subfields and ending up just creating larger and larger nestings of Node<Node<Node<Node<Node<T>>>>>>.

reem · 2014-11-30T06:22:24Z

I just tested this through GHC:

data Tree a b = Node (Tree (a, a) b) b
              | Bin

and it works, so I imagine that rustc could be extended to make it work too.

ghost · 2014-11-30T07:15:56Z

Haskell is lazy, so it can compile weird things without problems:

fix :: (a -> a) -> a
fix f = f (fix f)

OCaml would be a better comparison, and they have finger trees too:

 type ('a, 'm) fg =
| Nil (* not called Empty as in the paper to avoid a name
* clash with the exception Empty *)
| Single of 'a
| Deep of 'm * ('a, 'm) digit * (('a, 'm) node, 'm) fg * ('a, 'm) digit

reem · 2014-11-30T08:45:17Z

I think we would like to match all of the major use cases for the collections in libcollections, which has:

Vec
RingBuf
DList
BTreeMap/TreeMap/TrieMap/HashMap

There's some more, but they are mostly Set variants of the above Maps or much more niche structures like BitV, EnumSet, VecMap and LruCache.

Vec can't really be replicated persistently at equivalent speed, though that's sort of the story of persistent collections generally. We can replicate it a bit with a persistent vector like clojure's, which also covers the RingBuf use case since insertions anywhere are the same speed.

@gankro are you at all familiar with making a BTree persistent, because that could be another route to go for cases where the keys aren't Hash but can be Ord.

DList's most important use is for re-ordering, like is done in the implementation of an LruCache - it's not possible (or at least I have no idea how) to fully emulate a Dlist persistently and efficiently because you can reach any node from any other node, which basically necessitates a copy of the whole structure - but we can emulate this one feature. I haven't heard about any structures that are especially good for this - maybe you guys could help.

All the maps can be replaced by persistent look-a-likes, either HAMT or just persistent trees, with respective set implementations.

reem · 2014-11-30T08:45:45Z

Also a persistent Rope would be cool.

tbu- · 2014-11-30T10:55:51Z

@reeem What use is a persistent rope? Isn't a rope optimized for change?

reem · 2014-11-30T11:05:05Z

@tbu- Really it's just a rope using refcounting and with no exposed destructive operations.

Gankra · 2014-11-30T15:39:43Z

I am only familiar with persistence techniques at a high-level. Path-copying vs fat-nodes for trees. Stuff like that. I know scala's collections library has some very interesting ideas for persistent collections, though.

Knowing what problems these collections solve would be helpful for determining which collections would be more/less interesting to poke at. Unfortunately persistent stuff isn't very high on my priority list, as I'm currently focusing on experimenting with stuff that might affect existing std APIs (cursors/iterators/comparators). I'm hoping someone else can take point on this.

gereeter · 2014-11-30T22:12:31Z

First, my ideal envisioned API for a persistent collection is identical to the API for a non-persistent collection. However, persistent collections would have extremely cheap Clone implementations. This would allow them to be drop-in replacements for non-persistent structures while also not allocating and garbage collecting unneeded "snapshots".

Second, this can be done completely generically for tree-like structures:

trait RcLike<T>: Deref<T> {
    fn build(val: T) -> Self;
    fn is_unique(&self) -> bool;
    fn make_unique(&mut self) -> &mut T;
}
impl<T: Clone> RcLike<T> for Rc<T> { /* ... */ }
impl<T: Clone> RcLike<T> for Arc<T> { /* ... */ }
impl<T> RcLike<T> for T {
    fn build(val: T) -> Self { val}
    fn is_unique(&self) -> bool { true }
    fn make_unique(&mut self) -> &mut T { self }
}

struct Node<K, V> { /* ... */ }
struct BTreeMap<K, V, NodeRef> { /* ... */ }

impl<K, V, NodeRef: Deref<Node<K, V>>> {
    fn get(&self, key: K) -> Option<&V> { /* ... */ }
    // ...
}

impl<K, V, NodeRef: RcLike<Node<K, V>>> BTreeMap<K, V, NodeRef> {
    fn insert(&mut self, key: K, val: V) { /* ... */ }
    // ...
}

By exposing the internal node type, we can circumvent the need for HKTs, still writing a single data structure that works just as well whether the links are made through Box, Rc, Arc, or Gc.

@reem @Jurily The problems you've been having with finger trees don't have to do with laziness or infinitely sized data types but with polymorphic recursion. To see what this is and why Rust can't accept this (even if the ICE is fixed), here is a simpler example:

fn poly_rec<T>(val: T) {
    poly_rec(&val); // note the &
}

fn main() {
    poly_rec(1u);
}

When compiling generic functions, Rust wants to be zero cost and so produces a new (monomorphised) version of the generic function for every single collection of types that the function is called with. This works excellently for speeding up the program, but doesn't work for functions like poly_rec. Once I call poly_rec on a uint, it then has to compile it for &uint, then &&uint, then &&&uint, and so on. In this case, rustc correctly catches this, erroring with "reached the recursion limit during monomorphization". The ICE results from polymorphic recursion in a data structure, but even if the compiler could accept that structure, you wouldn't be able to write any functions processing a finger tree.

As a side note, Haskell avoids this problem simply by not doing monomorphisation - abstractions are not zero-cost. While finger trees work and work well in Haskell, they actually end up doing a bunch of virtual calls. As such, I think that a proper implementation of them in Rust could be significantly faster than Haskell's.

Gankra · 2014-11-30T22:19:08Z

@gereeter so you're proposing fat-nodes (which I believe is faster than path-copying)?

reem · 2014-11-30T22:20:32Z

I really don't like that the type of a concrete BTree would be BTreeMap<uint, String, Arc<Node<uint, String>>>> but I recognize the need to work around the lack of HKT.

Also, the impls you provided, when their bounds are fixed (Rc and Arc need no bounds and Send + Sync, respectively) conflict, as Rc<T> implements Clone, so would also qualify for the <T: Clone> RcLike<T> for T impl.

ghost · 2014-12-01T02:22:11Z

@gereeter No, those are two separate issues. I've had to hide the inner tree behind an opaque pointer to even get to the monomorphization error.

Gankra · 2015-01-24T05:14:43Z

So I tried to rework ImmutSList to "look" like a normal stack per @gereeter's musings. push works fine, but pop is a problem. You can only get the value out if you're the unique owner of the current Node. Otherwise all you can do is proceed forward. I considered returning None in that case, but that seems a bit... weird.

Gankra closed this as completed Mar 18, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Something Something Persistent Collections Something Something? #14

Something Something Persistent Collections Something Something? #14

Gankra commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Nov 30, 2014

reem commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Nov 30, 2014

reem commented Nov 30, 2014

reem commented Nov 30, 2014

tbu- commented Nov 30, 2014

reem commented Nov 30, 2014

Gankra commented Nov 30, 2014

gereeter commented Nov 30, 2014

Gankra commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Dec 1, 2014

Gankra commented Jan 24, 2015

Something Something Persistent Collections Something Something? #14

Something Something Persistent Collections Something Something? #14

Comments

Gankra commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Nov 30, 2014

reem commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Nov 30, 2014

reem commented Nov 30, 2014

reem commented Nov 30, 2014

tbu- commented Nov 30, 2014

reem commented Nov 30, 2014

Gankra commented Nov 30, 2014

gereeter commented Nov 30, 2014

Gankra commented Nov 30, 2014

reem commented Nov 30, 2014

ghost commented Dec 1, 2014

Gankra commented Jan 24, 2015