What started as a collection of efficient mutable and immutable data structures based on Phil Bagwell's bitmap trie concept became an overall reimplementation of Clojure's core datastructures and some of its base concepts specifically with performance in mind. This means, for instance, that the library prefers iterators over sequences in many situations. There are also new functional primitives developed from my experience processing data and working with Clojure over the last 10 years.
My hope is this library becomes a platform to experiment and develop new and/or better functional datastructures and algorithmic primitives.
Here are a few concepts to keep in mind -
The mutable hashmap and vector implementations allow in-place, instantaneous conversion to their persistent counterparts. This allows you to build a dataset using the sometimes much faster mutable primitives (`.compute` and friends, for instance, in the hashmap case) and then return data to the rest of the program in persistent form. Using this method, for example, `frequencies` is quite a bit faster while still returning a persistent datastructure.
Along those lines, construction of a persistent vector from an object array is very fast, so it is very efficient to first build an object-array-list (which is much faster to build) and then construct a persistent vector from it.
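Here is a minimal sketch of both patterns, assuming the ham-fisted.api namespace is aliased as api (as in the examples further below) and that api/mut-map, api/persistent!, api/object-array-list, and api/vec behave as described above - check the api docs for exact signatures:

```clojure
(require '[ham-fisted.api :as api])
(import '[java.util.function BiFunction])

;; Build a map mutably via java.util.Map.compute, then convert it
;; in place to a persistent map before handing it back.
(defn word-counts [words]
  (let [m (api/mut-map)]
    (doseq [w words]
      (.compute ^java.util.Map m w
                (reify BiFunction
                  (apply [_ _k v] (if v (inc (long v)) 1)))))
    (api/persistent! m)))

;; Build an object-array-list (fast to append to), then construct a
;; persistent vector from it.
(let [l (api/object-array-list)]
  (doseq [x (range 100)]
    (.add ^java.util.List l x))
  (api/vec l))
```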
There are many more primitive operations than are listed below - please take a moment to scan the api docs. Some standouts are described below.
Aside from being a reimplementation of hashmaps and persistent vectors, this library also introduces a few new algorithms, namely `map-union`, `map-intersection`, and `map-difference`. These are implemented at the trie level so they avoid rehashing any keys and use the structure of the hashmap to boost performance. This means `merge` and `merge-with` are much faster, especially if you have larger maps. It also means you can design novel set-boolean operations, as you provide the value-resolution operator for the map values.
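For example, a minimal sketch assuming map-union takes a two-argument value-resolution function followed by the two maps (wrap the function in a java.util.function.BiFunction if the api requires one; plain Clojure map literals are used here for brevity, while the trie-level fast path applies to hamf's own maps):

```clojure
;; Union two maps, resolving colliding keys by summing their values.
(api/map-union + {:a 1 :b 2} {:b 3 :c 4})
;; => {:a 1, :b 5, :c 4}
```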
Because the hamf hashmaps have fast unions, you can now design systems where, for instance, each thread builds up a separate hashmap and the results are unioned back together in the main thread in a map-reduce style design. This type of design was the original target of the union system.
These systems are substantially faster if the objects have a high cost to hash and do not cache their hashcode, but this is rare in Clojure systems as persistent vectors, maps, and keywords cache their hash values. Strings, however, are an example of something for which these primitives (and things like `frequencies`) will perform substantially better.
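A sketch of that map-reduce style design, assuming api/frequencies and api/map-union as above; the chunking via partition-all and the use of futures are purely illustrative:

```clojure
;; Each future builds its own frequency map; the per-chunk maps are
;; then unioned back together on the calling thread.
(defn parallel-frequencies [chunks]
  (->> chunks
       (mapv #(future (api/frequencies %)))  ;; kick off work per chunk
       (map deref)                           ;; wait for the per-chunk maps
       (reduce #(api/map-union + %1 %2))))   ;; union, summing counts

;; Strings do not cache their hashcodes, so this is the kind of input
;; where the union-based design pays off.
(parallel-frequencies (partition-all 10000 (map str (range 100000))))
```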
Float and double values are only allowed to cast to long if they are finite. Boolean values cast to long or double are 0 for false and 1 for true. Any nonzero finite number cast to boolean is true, 0 is false, and non-finite numbers are errors. nil cast to a floating point number is NaN, and NaN cast to an object is NaN. If objects are not numbers, then nil is false and non-nil is true. Any undefined cast falls back to `clojure.lang.RT.xCast`, where `x` denotes the target type. Reading data from containers leads to unchecked casts, while writing data to containers leads to checked casts.
- `update-vals` - far faster than map->map pathways if you want to update every value in the map but leave the keys unchanged.
- `group-by-reduce` - performs a reduction during the group-by. This avoids keeping a large map of the complete intermediate values, which can be both faster and more memory efficient.
- `mapmap` - a techascent favorite, equivalent to:
  (->> (map map-fn src-map) (remove nil?) (into {}))
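A hypothetical usage consistent with the equivalence above - map-fn receives each map entry and returns a new key-value pair, or nil to drop the entry:

```clojure
;; Drop entries with nil values and increment the rest.
(api/mapmap (fn [[k v]] (when v [k (inc v)]))
            {:a 1 :b nil :c 3})
;; => {:a 2, :c 4}
```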
I have an entire topic on Reductions - give it a read and then send me an email with your thoughts :-).
These parallelization primitives allow users to pass in their own ForkJoin pools, so they can be used for blocking tasks, although they are set up by default for cpu-bound operations. Concat operations can parallelize reductions over non-finite or non-parallelizable containers using the default :seq-wise :cat-parallelization option.
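As a sketch of what these primitives look like, assuming preduce takes an init-value function, a reducing function, a merge function, and the collection (an options map supporting a custom ForkJoin pool and the :cat-parallelization setting can also be supplied - check the api docs for the exact argument order and option names):

```clojure
;; Parallel sum: each task starts from its own init value, reduces its
;; chunk with the reducing fn, and the partial results are merged with +.
(api/preduce (constantly 0)
             (fn [acc v] (+ (long acc) (long v)))
             +
             (api/vec (range 100000)))
;; => 4999950000
```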
- Any array, including any primitive array, can be converted to an indexed operator with efficient sort, reduce, etc. implementations using the lazy-noncaching namespace's `->random-access` operator. This allows you to pass arrays as-is to the rest of your Clojure program without conversion to a persistent vector - something that is both not particularly efficient and explodes the data size.
All random access containers, be they vectors, lists, or array lists, support nth and ifn interfaces that take -1 to index from the end of the container. For performance reasons, the implementation of List.get does not -
ham-fisted.persistent-vector-test> ((api/vec (range 10)) -1)
9
ham-fisted.persistent-vector-test> (nth (api/vec (range 10)) -1)
9
ham-fisted.persistent-vector-test> (.get (api/vec (range 10)) -1)
Execution error (IndexOutOfBoundsException) at ham_fisted.ChunkedList/indexCheck (ChunkedList.java:210).
Index underflow: -1
ham-fisted.persistent-vector-test> ((api/->random-access (api/int-array (range 10))) -1)
9
ham-fisted.persistent-vector-test> (nth (api/->random-access (api/int-array (range 10))) -1)
9
Similarly, `last` is constant time for any list implementation deriving from java.util.RandomAccess.
- The lazy-noncaching namespace contains very efficient implementations of map, filter, concat, and repeatedly which perform as well as or better than the eduction variants, without chunking and without requiring you to convert your code from naive Clojure to transducer form. The drawback is that they are lazy and noncaching, so for instance (repeatedly 10 rand) will produce 10 new random values every time it is evaluated. Furthermore, map will produce a random-access return value if passed all random-access inputs, thus preserving the random-access property of the input (see the sketch after this list).
- The lazy-caching namespace contains inefficient implementations that do in fact cache - it appears that Clojure's base implementation is very good, or at least good enough that I haven't come up with anything better. Potentially the decision to use chunking is the best optimization available here.
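A small sketch of the random-access-preserving and noncaching behavior, assuming the namespace is ham-fisted.lazy-noncaching (aliased here as lznc):

```clojure
(require '[ham-fisted.lazy-noncaching :as lznc])

;; map over a random-access input - the result preserves random access,
;; so indexed operations like nth remain cheap.
(def v (lznc/map inc (api/vec (range 10))))
(instance? java.util.RandomAccess v) ;; => true
(nth v 9)                            ;; => 10

;; lazy-noncaching means re-evaluation on every pass:
(def r (lznc/repeatedly 3 rand))
(= (vec r) (vec r)) ;; => false - new random values each evaluation
```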
The best way to contribute is to fund me through github sponsors linked to the right or to engage TechAscent - we are always looking for new interesting projects and partners.
Aside from that, as mentioned earlier, my hope is that this library becomes a platform that enables experimentation with various functional primitives and overall optimized ways of doing the type of programming that the Clojure community enjoys. Don't hesitate to file issues and PRs - I am happy to accept both.
If you want to work on the library you need to enable the :dev alias.
Lies, damn lies, and benchmarks - you can run the benchmarks with ./scripts/benchmark.
Results will be printed to the console and saved to the results directory, prefixed by the commit, your machine name, and the JDK version.
Results are printed normalized to the base time of either clojure.core (clj) or java.util (java). One interesting thing here is how much better JDK-17 generally is for many of these tests than JDK-8.
Here are some example timings taken on my laptop, plugged in, with an external cooling supply (frozen peas) applied to the bottom of the machine. An interesting side note is that I often get better timings for specific benchmarks when running from the REPL than from the benchmark script - perhaps due to the machine's heat management systems.
:test | :n-elems | :java | :clj | :eduction | :hamf | :norm-factor-μs |
---|---|---|---|---|---|---|
:assoc-in | 5 | 1.0 | 0.646 | 0.245 | ||
:assoc-in-nil | 5 | 1.0 | 0.371 | 0.120 | ||
:concatv | 100 | 1.0 | 0.099 | 9.827 | ||
:frequencies | 10000 | 1.0 | 0.412 | 966.154 | ||
:get-in | 5 | 1.0 | 0.564 | 0.124 | ||
:group-by | 10000 | 1.0 | 0.333 | 1414.480 | ||
:group-by-reduce | 10000 | 1.0 | 0.313 | 1408.028 | ||
:hashmap-access | 10000 | 0.700 | 1.0 | 0.989 | 549.468 | |
:hashmap-access | 10 | 0.837 | 1.0 | 0.826 | 0.400 | |
:hashmap-cons-obj-ary | 4 | 1.0 | 0.355 | 0.392 | ||
:hashmap-cons-obj-ary | 10 | 1.0 | 0.584 | 0.864 | ||
:hashmap-cons-obj-ary | 1000 | 1.0 | 0.541 | 124.811 | ||
:hashmap-construction | 10000 | 0.563 | 1.0 | 0.923 | 1331.130 | |
:hashmap-construction | 10 | 0.240 | 1.0 | 0.357 | 2.337 | |
:hashmap-reduce | 10000 | 0.792 | 1.0 | 0.860 | 316.433 | |
:hashmap-reduce | 10 | 0.703 | 1.0 | 0.735 | 0.360 | |
:int-list | 20000 | 1.000 | 1.147 | 467.994 | ||
:mapmap | 1000 | 1.0 | 0.276 | 275.786 | ||
:object-array | 20000 | 1.0 | 0.240 | 1560.975 | ||
:object-list | 20000 | 1.000 | 0.987 | 518.878 | ||
:sequence-summation | 20000 | 1.0 | 0.29 | 0.409 | 1380.496 | |
:shuffle | 10000 | 1.0 | 0.353 | 329.709 | ||
:sort | 10000 | 1.0 | 0.337 | 2418.307 | ||
:sort-doubles | 10000 | 1.0 | 0.374 | 2272.143 | ||
:sort-ints | 10000 | 1.0 | 0.291 | 2514.779 | ||
:union | 10 | 0.155 | 1.0 | 0.088 | 1.785 | |
:union | 10000 | 0.275 | 1.0 | 0.174 | 1664.823 | |
:union-disj | 10 | 0.156 | 1.0 | 0.085 | 1.798 | |
:union-disj | 10000 | 0.279 | 1.0 | 0.178 | 1641.344 | |
:union-reduce | 10 | 0.139 | 1.0 | 0.220 | 23.954 | |
:union-reduce | 10000 | 0.100 | 1.0 | 0.159 | 41261.663 | |
:update-in | 5 | 1.0 | 1.153 | 0.276 | ||
:update-in-nil | 5 | 1.0 | 0.276 | 0.158 | ||
:update-values | 1000 | 1.0 | 0.090 | 158.994 | ||
:vector-access | 10 | 1.568 | 1.0 | 1.008 | 77.945 | |
:vector-access | 10000 | 0.957 | 1.0 | 1.027 | 125.778 | |
:vector-cons-obj-array | 10 | 1.184 | 1.0 | 0.356 | 0.071 | |
:vector-cons-obj-array | 10000 | 0.083 | 1.0 | 0.048 | 112.192 | |
:vector-construction | 10 | 0.460 | 1.0 | 1.124 | 0.078 | |
:vector-construction | 10000 | 0.082 | 1.0 | 0.078 | 117.432 | |
:vector-reduce | 10 | 1.996 | 1.0 | 1.088 | 0.150 | |
:vector-reduce | 10000 | 1.228 | 1.0 | 0.863 | 194.194 | |
:vector-to-array | 10 | 0.256 | 1.0 | 0.503 | 0.041 | |
:vector-to-array | 10000 | 0.063 | 1.0 | 0.124 | 69.590 |
:test | :n-elems | :java | :clj | :eduction | :hamf | :norm-factor-μs |
---|---|---|---|---|---|---|
:assoc-in | 5 | 1.0 | 0.801 | 0.274 | ||
:assoc-in-nil | 5 | 1.0 | 0.275 | 0.142 | ||
:concatv | 100 | 1.0 | 0.120 | 6.810 | ||
:frequencies | 10000 | 1.0 | 0.421 | 960.710 | ||
:get-in | 5 | 1.0 | 0.598 | 0.125 | ||
:group-by | 10000 | 1.0 | 0.335 | 1410.690 | ||
:group-by-reduce | 10000 | 1.0 | 0.293 | 1433.528 | ||
:hashmap-access | 10000 | 0.817 | 1.0 | 1.046 | 541.540 | |
:hashmap-access | 10 | 0.791 | 1.0 | 0.904 | 0.402 | |
:hashmap-cons-obj-ary | 4 | 1.0 | 0.398 | 0.407 | ||
:hashmap-cons-obj-ary | 10 | 1.0 | 0.696 | 0.682 | ||
:hashmap-cons-obj-ary | 1000 | 1.0 | 0.449 | 130.423 | ||
:hashmap-construction | 10000 | 0.586 | 1.0 | 0.927 | 1281.371 | |
:hashmap-construction | 10 | 0.278 | 1.0 | 0.401 | 2.238 | |
:hashmap-reduce | 10000 | 0.625 | 1.0 | 0.704 | 321.295 | |
:hashmap-reduce | 10 | 0.714 | 1.0 | 0.862 | 0.312 | |
:int-list | 20000 | 1.000 | 1.017 | 497.492 | ||
:mapmap | 1000 | 1.0 | 0.325 | 226.518 | ||
:object-array | 20000 | 1.0 | 0.391 | 1374.725 | ||
:object-list | 20000 | 1.000 | 0.981 | 493.795 | ||
:sequence-summation | 20000 | 1.0 | 0.37 | 0.227 | 1342.562 | |
:shuffle | 10000 | 1.0 | 0.430 | 286.478 | ||
:sort | 10000 | 1.0 | 0.284 | 2839.552 | ||
:sort-doubles | 10000 | 1.0 | 0.424 | 2485.058 | ||
:sort-ints | 10000 | 1.0 | 0.279 | 2885.107 | ||
:union | 10 | 0.147 | 1.0 | 0.093 | 1.938 | |
:union | 10000 | 0.278 | 1.0 | 0.198 | 1446.922 | |
:union-disj | 10 | 0.144 | 1.0 | 0.091 | 1.938 | |
:union-disj | 10000 | 0.285 | 1.0 | 0.198 | 1440.777 | |
:union-reduce | 10 | 0.114 | 1.0 | 0.230 | 25.724 | |
:union-reduce | 10000 | 0.081 | 1.0 | 0.159 | 37008.010 | |
:update-in | 5 | 1.0 | 1.401 | 0.292 | ||
:update-in-nil | 5 | 1.0 | 0.289 | 0.138 | ||
:update-values | 1000 | 1.0 | 0.083 | 166.794 | ||
:vector-access | 10 | 1.563 | 1.0 | 1.131 | 86.023 | |
:vector-access | 10000 | 0.945 | 1.0 | 1.047 | 139.996 | |
:vector-cons-obj-array | 10 | 1.150 | 1.0 | 0.365 | 0.075 | |
:vector-cons-obj-array | 10000 | 0.066 | 1.0 | 0.040 | 103.369 | |
:vector-construction | 10 | 0.508 | 1.0 | 1.119 | 0.073 | |
:vector-construction | 10000 | 0.062 | 1.0 | 0.070 | 109.040 | |
:vector-reduce | 10 | 2.041 | 1.0 | 1.031 | 0.152 | |
:vector-reduce | 10000 | 1.392 | 1.0 | 1.026 | 146.636 | |
:vector-to-array | 10 | 0.284 | 1.0 | 0.569 | 0.036 | |
:vector-to-array | 10000 | 0.052 | 1.0 | 0.068 | 65.946 |
This code is minimally tested. The datastructures especially need serious testing, potentially generative testing of edge cases.
Also, microbenchmarks do not always indicate how your system will perform overall. For instance, when testing assoc-in and update-in in this project we see better performance. In at least one real-world project, however, the inlining that makes the microbenchmark perform better definitely did not result in the project running faster - it ran a bit slower, even though profiling the original code indicated that the sequence operations performed during assoc-in and update-in were a source of some of the time.
The JVM is a complicated machine and there are issues with using, for instance, too many classes at a particular callsite. Overall I would recommend profiling and being careful. My honest opinion right now is that assoc-in and update-in do not improve program performance in at least some of the use cases I have tested.
- clj-fast - Great and important library more focused on compiler upgrades.
- bifurcan - High speed functional datastructures for Java. Perhaps ham-fisted should be based on this or we should measure the differences and take the good parts.
- Clojure Goes Fast - Grandaddy aggregator project with a lot of important information and a set of crucial github projects such as clj-memory-meter.