GROOVY-6944: Add immutable collections #482
base: master
"Persistent" might be less confusing than "Immutable" (although "persistent" has other connotations in other contexts).
We might ping @blackdrag about this. The idea was to have pcollections integrated into Groovy 3. A premature inclusion might increase the chances of breaking changes between 2 and 3. But it's indeed an idea we have had in mind for a long time 👍
There is an argument among Scala people about the terms "persistent" and "immutable" here. I really don't know which term is better, but should I use "persistent"?
Persistent usually means you have a way to roll back to a previous state, while immutable doesn't guarantee this.
Indeed, "persistent" means that former states persist even when you add other elements, which actually creates a new (immutable) collection. I prefer "persistent".
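To make the distinction in the comments above concrete, here is a minimal sketch of a persistent list in Java. `PList` is a hypothetical class written only for illustration; it is not a PCollections or Groovy type. Adding an element returns a new version, and the earlier version stays valid and is shared as the tail of the new one.

```java
// Hypothetical minimal persistent list; PList is an illustration, not a PCollections type.
final class PList<T> {
    private static final PList<?> EMPTY = new PList<>(null, null, 0);

    final T head;         // most recently added element (unused in the empty list)
    final PList<T> tail;  // the previous version, shared rather than copied
    final int size;

    private PList(T head, PList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> PList<T> empty() {
        return (PList<T>) EMPTY;
    }

    // Returns a new version; the receiver is never modified.
    PList<T> plus(T value) {
        return new PList<>(value, this, size + 1);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (PList<T> node = this; node.size > 0; node = node.tail) {
            sb.append(node.head);
            if (node.tail.size > 0) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }
}
```

With `v1 = PList.<Integer>empty().plus(1).plus(2)` and `v2 = v1.plus(3)`, both versions remain readable afterwards, and `v2.tail` is the very same object as `v1`.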
I thought about this problem for a day. Haskell and Scala use "immutable", but Clojure uses "persistent". There is If Groovy uses "persistent" for this data structure, Therefore, to avoid confusion, I think Groovy should also use "immutable" for this data structure,
I updated the patch.
I know the confusion with "disk persistence", as that's what I was referring to when saying it had other "connotations". We'll have to make a call on that, perhaps discussing it on the mailing list for broader attention and feedback. On another topic, I see you deprecated the asImmutable() methods, and created asUnmodifiable() ones. Shouldn't (or should) asImmutable() now use those immutable / persistent collections instead?
OK, I joined dev@groovy.codehaus.org and user@groovy.codehaus.org. Java's unmodifiable views are far faster than immutable and persistent collections. I think Groovy should keep the method as By the way, http://beta.groovy-lang.org/ is very cool!
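The difference between an unmodifiable view and a truly immutable collection, which underlies the `asImmutable()` / `asUnmodifiable()` question above, can be shown with plain JDK calls. This is a small sketch assuming Java 10 or later (for `List.copyOf`); the class and method names are made up for the example. The view is cheap to create because it only wraps the backing list, but it still reflects later changes to that list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ViewVersusCopy {
    /** Returns {view size, copy size} after the backing list has grown. */
    static int[] sizesAfterMutation() {
        List<String> backing = new ArrayList<>(List.of("a", "b"));
        List<String> view = Collections.unmodifiableList(backing); // read-only wrapper, no copying
        List<String> copy = List.copyOf(backing);                  // independent snapshot, copies elements
        backing.add("c");                                          // visible through the view only
        return new int[] { view.size(), copy.size() };
    }

    /** The view itself rejects writes even though its contents can still change. */
    static boolean viewRejectsWrites() {
        List<String> view = Collections.unmodifiableList(new ArrayList<>(List.of("a")));
        try {
            view.add("x");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

Here `sizesAfterMutation()` yields a view size of 3 and a copy size of 2, which is why "unmodifiable" and "immutable" are worth keeping as separate names.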
I summarized the usage of the terms.
I also updated the patch and added these.
Now I can write code the same way for four types of objects.
I found that to support the following code correctly,
I updated the patch and solved the above problem. First, I added
Then I added
Then I added Now,
Where to start... First of all, I maintain pcollections now, but it is not originally my project. I am free to make changes to it though. In that spirit I would prefer changing pcollections to work better with Groovy over just copying the sources, even if only to avoid having to maintain two codebases. I guess I could get friendly with Immutable instead of Persistent.
For DGM integration, though, I see a tough task. Creating a persistent collection is usually not cheap. You want to reuse as much as possible to avoid having to create a new thing all the time. But if you look at why createSimilarCollection is used, it is exactly because we do that bad thing of creating a new container and rebuilding it from scratch, instead of calling for example minusAll or plusAll... it really depends on the case. That's also why Java 8 has Streams: they allow internal iteration and operations specialized for the type's inner structure, thanks to lazy evaluation. Think for example of using grep on a tree-like structure. Instead of collecting the valid values and then creating a new tree from them, you could go through the tree and mark the branches that will be cut off, and later create a new tree much faster. All in all, I think we should first think about changing pcollections itself.
Oh, one more thing... speed. Ultimately a Java unmodifiable list can be backed by a simple array. There is nothing faster than an array for such a collection. But my tests show that, for example, a HAMT can compete with a HashMap for get operations. It is slower to create from scratch of course, but if I use existing HAMTs to create a new one it can still be far superior to a HashMap. The point where unmodifiable is not enough anymore is multithreading. Here you will want either one of the concurrent structures Java offers, or something immutable.
And here I see the biggest advantage of structures like those in the pcollections library, because with them you can more easily create concurrent systems without using any memory barriers, or at least you can dose them very exactly.
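One way to read the multithreading point above is that immutable nodes let all synchronization be confined to a single atomic reference. The following is a sketch of that idea using a Treiber-style stack, a well-known lock-free technique that is swapped in here for illustration; it is not taken from PCollections, and `SharedStack` is a hypothetical name. Each node is immutable once published, so readers need no locks, and writers only retry one `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class: a lock-free stack built from immutable nodes.
final class SharedStack<T> {
    // Immutable node: both fields are final, so a published node never changes.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // The only mutable state: a reference to the current immutable version.
    private final AtomicReference<Node<T>> top = new AtomicReference<>(null);

    void push(T value) {
        Node<T> current;
        Node<T> updated;
        do {
            current = top.get();
            updated = new Node<>(value, current); // shares the whole old version as its tail
        } while (!top.compareAndSet(current, updated)); // retry if another thread won the race
    }

    int size() {
        int count = 0;
        for (Node<T> node = top.get(); node != null; node = node.next) {
            count++;
        }
        return count;
    }

    /** Four threads push 1000 elements each; returns the final size. */
    static int demo() {
        SharedStack<Integer> stack = new SharedStack<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    stack.push(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stack.size();
    }
}
```

No push is lost under contention, so `demo()` returns 4000; the atomic reference is the one place where the cost of synchronization is paid.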
OK, I will recreate the patch.
Actually, many methods have to create a whole List from scratch.
I read this Clojure implementation, Switching from a binary tree to a hash array mapped trie is just an implementation problem, How should the Groovy immutable API look? I was thinking about the instance creation API, and I found there are four options. Option 1 (current patch implementation)
Option 2 (I'm now thinking this is best)
Option 3
Option 4
Is there any other option? Option 2 breaks compatibility a little bit,
Yes, but why first create an ArrayList and then convert it to the immutable list, possibly with a different base type (createSimilar doesn't recreate the exact type)? To avoid that, I was thinking I should add an empty() method to the base interface of the pcollections classes. That way something like createSimilarCollection is not required at all.
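The `empty()` idea above can be sketched as follows. All names here (`MiniPCollection`, `MiniPStack`, `Filters`) are hypothetical and exist only for this example; the real PCollections interfaces may differ. The point is that a generic operation can start from `source.empty()` and therefore stays in the same concrete type, with no intermediate `ArrayList` and no conversion step.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical base interface with the proposed empty() method.
interface MiniPCollection<E> extends Iterable<E> {
    MiniPCollection<E> empty();          // an empty collection of the same concrete type
    MiniPCollection<E> plus(E element);  // returns a new collection, receiver unchanged
    int size();
}

// Hypothetical implementation: an immutable stack (newest element first).
final class MiniPStack<E> implements MiniPCollection<E> {
    private final E head;
    private final MiniPStack<E> tail;
    private final int size;

    private MiniPStack(E head, MiniPStack<E> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <E> MiniPStack<E> create() {
        return new MiniPStack<>(null, null, 0);
    }

    @Override public MiniPStack<E> empty() { return create(); }
    @Override public MiniPStack<E> plus(E element) { return new MiniPStack<>(element, this, size + 1); }
    @Override public int size() { return size; }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private MiniPStack<E> node = MiniPStack.this;

            @Override public boolean hasNext() { return node.size > 0; }

            @Override public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                E value = node.head;
                node = node.tail;
                return value;
            }
        };
    }
}

final class Filters {
    // Works for any MiniPCollection without knowing or guessing its concrete type.
    static <E> MiniPCollection<E> filter(MiniPCollection<E> source, Predicate<? super E> keep) {
        MiniPCollection<E> result = source.empty();
        for (E element : source) {
            if (keep.test(element)) {
                result = result.plus(element);
            }
        }
        return result;
    }
}
```

This sketch still rebuilds the result element by element, so it addresses only the intermediate-container problem; reusing existing structure, as discussed earlier in the thread, would need type-specific operations.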
Yes, using some kind of not too big "chunk" is in general a good idea: it lets you reuse structures while keeping things small when you have to create something new, and the array overhead is small. The typical linked list approach, for example, requires some kind of node object for every element I add. That alone already causes a big overhead, since object creation is slow on the JVM. Things get worse if you have to change the last element in a singly linked list, because then you have to recreate every node. Using "chunks", all but the last chunk can be reused. And the actual memory behavior is better than ArrayList: if I add too many elements to an ArrayList, I have to create a new internal array, and that alone may already exceed the available memory. For the chunk-based list, I create a new chunk and maybe a new root array, but that array will stay much smaller.
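The chunk idea described above can be sketched with a two-level array. `ChunkedList` and its chunk size of 4 are assumptions chosen for readability, not the layout PCollections or Clojure actually use. Appending copies only the small root array and, at most, the last chunk; every full chunk is shared between the old and the new version.

```java
import java.util.Arrays;

// Hypothetical append-only persistent list made of fixed-size chunks.
final class ChunkedList<T> {
    static final int CHUNK = 4; // deliberately tiny so sharing is easy to observe

    private final Object[][] chunks; // root array: one reference per chunk
    private final int size;

    private ChunkedList(Object[][] chunks, int size) {
        this.chunks = chunks;
        this.size = size;
    }

    static <T> ChunkedList<T> empty() {
        return new ChunkedList<>(new Object[0][], 0);
    }

    int size() {
        return size;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (T) chunks[index / CHUNK][index % CHUNK];
    }

    // Returns a new version; the receiver and all of its full chunks are left untouched.
    ChunkedList<T> plus(T value) {
        int last = size / CHUNK;
        int offset = size % CHUNK;
        Object[][] root;
        if (offset == 0) {
            // All existing chunks are full: copy the root (references only) and add a new chunk.
            root = Arrays.copyOf(chunks, chunks.length + 1);
            root[last] = new Object[] { value };
        } else {
            // Copy the root and only the last, partially filled chunk.
            root = chunks.clone();
            root[last] = Arrays.copyOf(chunks[last], offset + 1);
            root[last][offset] = value;
        }
        return new ChunkedList<>(root, size + 1);
    }

    // True if both versions point at the same chunk array (structural sharing).
    boolean sharesChunkWith(ChunkedList<T> other, int chunkIndex) {
        return chunks[chunkIndex] == other.chunks[chunkIndex];
    }
}
```

For a list of nine elements, appending a tenth shares chunks 0 and 1 with the previous version and copies only the one-element chunk 2, while the previous version still reports nine elements.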
I have a HAMT implementation here. I am going to commit it to pcollections as soon as I am done with my tests and clear about the actual performance. Anyway... my comment was directed only at the claim that persistent collections have bad performance. As for how the Groovy immutable API should be... Option 2 looks best to me.
I switched to using PCollection and updated the patch. As it is not in the Maven central repository, I put the jar file in the lib directory. I also implemented option 2, but
OK, I will wait for it and won't implement the HAMT.
Immutable collections are convenient for backtracking algorithms.
The implementation is based on Blackdrag's PCollections. http://pcollections.org/
Since PCollections is MIT-licensed, it can be merged into Groovy, which is Apache-licensed.
I modified the API, wrote unit tests, and fixed bugs.
What I changed:
1. All the public class names start with `Immutable`.
2. All the implementation classes are package scope and marked `Serializable`.
3. The `Empty` class is renamed to `ImmutableCollections`.
4. `ImmutableCollections` can not only create empty collections but also convert an `Iterable` to an immutable collection. `HashTreePBag`, `HashTreePMap` and `HashTreePSet`, which are static convenience classes, are removed.
5. `PVector` is renamed to `ImmutableList`.
6. `POrderedSet` is renamed to `ImmutableListSet`. The name `ListSet` comes from Scala. I didn't use the name `OrderedSet` because it does not have a comparator.
7. `minus()` of PCollections can remove an element and can also remove the element at a specified position, but this API creates bugs when using `ImmutableList<Integer>`. Therefore I changed the positional variant to `minusAt()`. For consistency, I also renamed the positional `plus()` to `plusAt()`, and `with()` is renamed to `replaceAt()`.
8. Removed the `PSequence` interface. `ImmutableStack` extends `ImmutableList` like `Stack` extends `List`.
9. A new instance can be created by `[] as ImmutableList`.
10. The `Iterator` of `ImmutableList` and `ImmutableStack` is the same as the original, but the `ListIterator` is backed by `Object[]`. I think this is faster than the original.
11. `DefaultGroovyMethods.plus()`, `minus()` and `multiply()` can handle `ImmutableCollections`.
12. `minus(no argument)` of `ImmutableQueue` is renamed to `tail()`.
13. Renamed `Collection.asImmutable()` to `Collection.asUnmodifiable()`, and deprecated the original method.
14. Implemented `Object[] as Queue` and `Object[] as Stack`.
15. Changed `ImmutableQueue` to `ImmutableDeque`.
16. Changed the scope of `ImmutableStack` to package scope. Use `ImmutableDeque` instead.
17. Changed the scope of `ImmutableBag` to package scope. Use `ImmutableList` instead.
18. Added `minusAt()` and `replaceAt()` to `List` and `Object[]`.
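The bug class behind the `minus()` to `minusAt()` rename in item 7 has a direct analogue in the JDK: `java.util.List<Integer>` has both `remove(int index)` and `remove(Object o)`, and an `int` argument silently selects the index overload. The sketch below uses only standard JDK calls (Java 9 or later for `List.of`); the class name is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class RemoveAmbiguity {
    /** An int argument picks remove(int index): the element at position 1 goes away. */
    static List<Integer> removeWithIntArgument() {
        List<Integer> list = new ArrayList<>(List.of(3, 2, 1));
        list.remove(1);
        return list;
    }

    /** A boxed argument picks remove(Object): the value 1 goes away. */
    static List<Integer> removeWithBoxedArgument() {
        List<Integer> list = new ArrayList<>(List.of(3, 2, 1));
        list.remove(Integer.valueOf(1));
        return list;
    }
}
```

Starting from `[3, 2, 1]`, the first call leaves `[3, 1]` and the second leaves `[3, 2]`. Giving the positional operation its own name, as the patch does with `minusAt()`, removes this ambiguity at the call site.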
Add immutable collections.
Immutable collections are convenient for backtracking algorithms.
The implementation is based on Blackdrag's PCollections.
http://pcollections.org/
As the license of PCollections is MIT license,
it can be merged to Groovy, which is Apache license.
I modified API, and wrote unit tests, and fixed bugs.
What I changed.
1. All the public class names start with `Immutable`.
2. All the implementation classes are package scope and marked `Serializable`.
3. `Empty` class is renamed to `ImmutableCollections`.
4. `ImmutableCollections` can create not only empty collections but also convert from an `Iterable` to an immutable collection. `HashTreePBag`, `HashTreePMap` and `HashTreePSet`, which are static convenience classes, are removed.
5. `PVector` is renamed to `ImmutableList`.
6. `POrderedSet` is renamed to `ImmutableListSet`. The name `ListSet` comes from Scala. I didn't use the name `OrderedSet` because it does not have a comparator.
7. `minus()` of PCollections can remove an element and also can remove a specified position element, but this API creates bugs on using `ImmutableList<Integer>`. Therefore I changed one to `minusAt()`. For consistency, I also renamed to `plusAt()`. And `with()` is renamed to `replaceAt()`.
8. Removed `PSequence` interface. `ImmutableStack` extends `ImmutableList` like `Stack` extends `List`.
9. A new instance can be created by `[] as ImmutableList`.
10. `Iterator` of `ImmutableList` and `ImmutableStack` are same as the original, but `ListIterator` is backed by `Object[]`. I think this is faster than the original.
11. Added `getAt()` to `ImmutableList`, `ImmutableStack` and `ImmutableListSet`.
12. `DefaultGroovyMethods.plus()`, `minus()`, `multiply()` can handle `ImmutableCollections`.
13. `minus(no argument)` of `ImmutableQueue` is renamed to `tail()`.
14. Renamed `Collection.asImmutable()` to `Collection.asUnmodifiable()`, and deprecated the original method.
15. Implemented `Object[] as Queue` and `Object[] as Stack`.
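The description's opening claim, that immutable collections are convenient for backtracking, can be illustrated with a small search. This is a sketch under stated assumptions: `Path` is a hypothetical immutable cons list standing in for an immutable collection, the input values are assumed to be positive, and none of these names come from the patch. Because extending a path creates a new version, each branch keeps its own state and no "undo" step is needed when the search backs up.

```java
import java.util.ArrayList;
import java.util.List;

final class Backtracking {
    // Hypothetical immutable path: each node shares the rest of the path with its parent.
    private static final class Path {
        final int head;
        final Path tail; // null marks the end of the path

        Path(int head, Path tail) {
            this.head = head;
            this.tail = tail;
        }

        List<Integer> toList() {
            List<Integer> out = new ArrayList<>();
            for (Path p = this; p != null; p = p.tail) {
                out.add(0, p.head); // prepend so the result is in selection order
            }
            return out;
        }
    }

    /** Finds every subset of positive items that sums to target, in input order. */
    static List<List<Integer>> subsetsWithSum(int[] items, int target) {
        List<List<Integer>> found = new ArrayList<>();
        search(items, 0, target, null, found);
        return found;
    }

    private static void search(int[] items, int index, int remaining, Path chosen,
                               List<List<Integer>> found) {
        if (remaining == 0) {
            found.add(chosen == null ? new ArrayList<>() : chosen.toList());
            return;
        }
        if (remaining < 0 || index == items.length) {
            return; // dead end: nothing to undo, the caller's path was never modified
        }
        // Branch 1: take the item. A new Path is created; `chosen` itself is untouched.
        search(items, index + 1, remaining - items[index], new Path(items[index], chosen), found);
        // Branch 2: skip the item. The same `chosen` is reused as-is.
        search(items, index + 1, remaining, chosen, found);
    }
}
```

For the items 1, 2, 3, 4 and a target of 5, the search finds `[1, 4]` and `[2, 3]`. With a mutable list, the "take" branch would have to remove the item again before the "skip" branch could run.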