reduce
isn’t always the fastest way to transform or iterate over a collection. Especially if: 1. Primitive math is involved or 2. Multiple accumulators are needed or 3. Multiple sequences are involved.
Iterating over a sequence can be done with many functions in Clojure. There is map
, filter
, keep
, mapcat
and many more. Usually if none of the standard sequence transforming functions fit, developers tend to fall back to reduce
and more seldom to a manual loop
. These latter two are very flexible since you can do any transformation you’d like
All at the same time. reduce
also has a (deserved) reputation of being very fast since each collection itself will run it’s own optimized reducing code.
In this article I will be treating 2 problems:
Reducing into multiple valuesSuch as:- count all positive, count all odd number - calculate the sum and the product of all numbers in a collection (iterating once)
Iterating over multiple sequences at the same time.Such as:- calculate the transpose of the vectors - select all values from the one vector given another vector
In the examples I’ll be using vectors since that’s the most common data type to be used when you care about performance.
Let’s start with problem #1: To keep the code simple we’ll count the number of odd and even values of 10 million numbers:
229ms
Using Criterium and a properly configured JVM this runs in about 229ms. For comparison: A simple (reduce (fn [s _] (inc s)) 0 xs)
runs in about 82ms. So almost 3x slower just because we have two accumulators instead of one.
Can we do better? Let’s try a manual simple loop
:
148ms
This runs in about 148ms. Why is it so much faster? Multiple reasons:
However for such a simple even? — odd?
comparison it’s still much slower than the baseline of 82ms.
Internally, a vector is made up of chunks of Java arrays of length 32. A call to next
returns a sequential view on the part of the vector (a ChunkedSeq
). So this loop
generates a new object on every iteration just to throw it away at the next iteration of the loop. This mean we’re generating garbage which the JVM needs to collect.
Obviously, iterating over this array of the vector would be fastest. This is exactly what reduce
does internally. Can we somehow get access to this raw array and write our own loop? Many functions and macros do know about chunked sequences and this is the reason that map
is pretty fast on vectors.
Another candidate is doseq
which iterates over the chunked buffer (Run a (macroexpand '(doseq [x xs] “abc”))
to see how). Let’s use some tricks to speed things up:
109ms
This runs in about 109ms. Better! But now we left our nice immutable realm and the code is quite ugly. Also, this approach only works with a limited set of accumulators. What if we wanted to accumulate into another vector or a persistent map? It’s inflexible. Another even uglier version is to use arrays and mutate them:
73ms
This runs in about 73ms. Pretty fast, but als pretty ugly, error prone and inflexible if we wanted to accumulate values other than numbers.
Let’s finally introduce the construct that this article is all about: Using Iterator
. Most language have some kind of iterators, the Java ones are pretty simple, they only consist of two methods on their interface: it.hasNext()
and it.next()
. Let’s use it:
70ms
This runs in about 70ms. It’s only a tiny bit faster than using the mutable array approach but we’re much more flexible: We can accumulate any values we’d like to! The iterator is very fast, since it produces very little garbage in each iteration and has fast implementations for most data structures. The it.hasNext()
check only does an index comparison and the it.next()
call only an index lookup for PersistenVector
.
Side note: I wrote a macro which manually iterates over the chunked buffer, similarly to doseq
but also accumulates odd/even within the loop. It only ran about 7% faster (down to ~65ms). I won’t show it here since it’s much less flexible and only works for vector and makes assumptions about the internals of chunked sequences.
Let’s write a macro to make life easier:
We can now rewrite the above nicely:
Same as above but using the new macro.
Note: Telling Cursive to treat the [loop-it](https://cursive-ide.com/userguide/macros.html)
like a [for](https://cursive-ide.com/userguide/macros.html)
gives us nice syntax highlighting.
Let’s now go to our second problem: Iterating over multiple sequences. For this we’ll consider the problem of transposing two vectors. I.e: Given [a b c]
and [0 1 2]
we create [[a 0] [b 1] [c 2]]
as the output. Not too many function in clojure.core
can deal with multiple input sequences: As far as I can tell only map
, mapv
, pmap
, sequence
, interleave
, lazy-cat
and mapcat
can. So if you need to iterate over multiple collection you have to use these limited set of functions.
Instead, using iterators we can easily do this:
20ms
This runs in about 20ms for two vectors with 1 million elements. Using (mapv vector xs xs)
runs in about 85ms. So over 4x faster when using iterators. Of course, it’s slightly unfair since we’re not providing a function f
to call here. Doing so only slows down the function by about 1–2% however.
Side note: If you really need this to be fast you can use also use arrays to get this down to 10ms:
There are many applications for loop-it
. We can also use it to create a faster interleave
function (not lazy obviously):
26ms
This runs in about 26ms whereas (doall (interleave xs xs))
runs in about 64ms. A nice speedup.
Of course I hear you say: map
is much more flexible: It can take a variable number of sequences:
Heck, it can even properly deal with infinite sequences:
Can we achieve the same with loop-it
? Not in its current state: It requires you — at compile time — to give it the sequences you want to iterate over. For a variable number of sequences we can use a trick (copied from clojure.core
): Transform a variable number of iterators into a single iterator. The new iterator returns a collection for each call to it.next()
. We call this a multi iterator. Since the class in clojure.core
is private we have to rewrite it:
Just like clojure.lang.MultiIterator
it uses native arrays for speed. However, one problem remains: It cannot deal with infinite number of sequences since it is eager. So let’s create another lazy one:
Now colls
can be an infinite sequence.
Let’s actually see how we can use it:
It’s a start. But before benchmarking, let’s combine and unroll some more to come up with a really fast mapv
:
Note: I also change the loop-it
macro to call ensure-iter
instead of clojure.lang.RT/iter
.
Let’s benchmark:
(mapv vector xs xs xs xs)
For 1 million elements this runs in about 1.28 seconds. Our new iteration based implementation runs this in only 147ms. Almost a 9x speedup and the implementation is pretty readable.
take-nth
:
This runs in 52ms for (take-nth-vec 5 xs-large)
where xs-large
has 10 million elements. (into [] (take-nth 5) xs-large)
runs in 273ms and (doall (take-nth 5 xs-large))
runs in 290ms.
2. Partition your sequence into two sets: all-truthy-values
, all-falsy-values
:
This runs in about 176ms for the loop-it
based implementation and 291ms for reduce
based implementation. And IMO the loop is more readable.
Side note: This is how many other programming languages define partition
.
3. Compute the sum of a collection.
From top to bottom: 177ms, 130ms, 59ms
All this is also possible in Clojurescript. Creating an iter is done with iter
, only a change to ensure-iter
is a little trickier.
An Iterator
is pretty simple: it.hasNext()
and it.next()
. A really nice feature would an improved iterator that also allows us to say “give me the rest of the sequence right now”. Something like it.restSeq()
. This could allow us to speed up things like nthrest
which currently isn’t possible with iterators. One notable idea is to speed up apply
which isn’t all that fast. It currently iterates over the given sequence multiple times. See my clojurescript ticket for ideas. If an Iterator had a it.restSeq()
it could be used right there.
Don’t forget about Iterators
in clojure. They can often be worth it. Use them if you need to:
As always: If performance matters: Benchmark. There are times when using an Iterator is not faster. For instance, an eager mapcatv
implementation was slower with a nested loop-it
implementation (but faster using reduce
for the inner loop).
All code is at: https://github.com/rauhs/clj-bench
Note: Since loop-it
is really small you should just copy it. There is no clojars library.