reduce isn’t always the fastest way to transform or iterate over a collection. Especially if: 1. Primitive math is involved or 2. Multiple accumulators are needed or 3. Multiple sequences are involved.
Iterating over a sequence can be done with many functions in Clojure. There is
mapcat and many more. Usually if none of the standard sequence transforming functions fit, developers tend to fall back to
reduce and more seldom to a manual
loop. These latter two are very flexible since you can do any transformation you’d like
- Reduce to single value
- Ignore (filter) some values
- Map some values
- Add more values
All at the same time.
reduce also has a (deserved) reputation of being very fast since each collection itself will run it’s own optimized reducing code.
In this article I will be treating 2 problems:
- Reducing into multiple values
- count all positive, count all odd number
- calculate the sum and the product of all numbers in a collection (iterating once)
- Iterating over multiple sequences at the same time.
- calculate the transpose of the vectors
- select all values from the one vector given another vector
In the examples I’ll be using vectors since that’s the most common data type to be used when you care about performance.
Let’s start with problem #1: To keep the code simple we’ll count the number of odd and even values of 10 million numbers:
Using Criterium and a properly configured JVM this runs in about 229ms. For comparison: A simple
(reduce (fn [s _] (inc s)) 0 xs) runs in about 82ms. So almost 3x slower just because we have two accumulators instead of one.
Can we do better? Let’s try a manual simple
This runs in about 148ms. Why is it so much faster? Multiple reasons:
- We’re not destructing and constructing a new vector on every iteration.
- We’re doing primitive math inside the loop.
However for such a simple
even? — odd? comparison it’s still much slower than the baseline of 82ms.
Internally, a vector is made up of chunks of Java arrays of length 32. A call to
next returns a sequential view on the part of the vector (a
ChunkedSeq). So this
loop generates a new object on every iteration just to throw it away at the next iteration of the loop. This mean we’re generating garbage which the JVM needs to collect.
Obviously, iterating over this array of the vector would be fastest. This is exactly what
reduce does internally. Can we somehow get access to this raw array and write our own loop? Many functions and macros do know about chunked sequences and this is the reason that
map is pretty fast on vectors.
Another candidate is
doseq which iterates over the chunked buffer (Run a
(macroexpand '(doseq [x xs] “abc”)) to see how). Let’s use some tricks to speed things up:
This runs in about 109ms. Better! But now we left our nice immutable realm and the code is quite ugly. Also, this approach only works with a limited set of accumulators. What if we wanted to accumulate into another vector or a persistent map? It’s inflexible. Another even uglier version is to use arrays and mutate them:
This runs in about 73ms. Pretty fast, but als pretty ugly, error prone and inflexible if we wanted to accumulate values other than numbers.
A forgotten approach
Let’s finally introduce the construct that this article is all about: Using
Iterator. Most language have some kind of iterators, the Java ones are pretty simple, they only consist of two methods on their interface:
it.next(). Let’s use it:
This runs in about 70ms. It’s only a tiny bit faster than using the mutable array approach but we’re much more flexible: We can accumulate any values we’d like to! The iterator is very fast, since it produces very little garbage in each iteration and has fast implementations for most data structures. The
it.hasNext() check only does an index comparison and the
it.next() call only an index lookup for
Side note: I wrote a macro which manually iterates over the chunked buffer, similarly to
doseq but also accumulates odd/even within the loop. It only ran about 7% faster (down to ~65ms). I won’t show it here since it’s much less flexible and only works for vector and makes assumptions about the internals of chunked sequences.
A macro is born
Let’s write a macro to make life easier:
We can now rewrite the above nicely:
Let’s now go to our second problem: Iterating over multiple sequences. For this we’ll consider the problem of transposing two vectors. I.e: Given
[a b c] and
[0 1 2] we create
[[a 0] [b 1] [c 2]] as the output. Not too many function in
clojure.core can deal with multiple input sequences: As far as I can tell only
mapcat can. So if you need to iterate over multiple collection you have to use these limited set of functions.
Instead, using iterators we can easily do this:
This runs in about 20ms for two vectors with 1 million elements. Using
(mapv vector xs xs) runs in about 85ms. So over 4x faster when using iterators. Of course, it’s slightly unfair since we’re not providing a function
f to call here. Doing so only slows down the function by about 1–2% however.
Side note: If you really need this to be fast you can use also use arrays to get this down to 10ms:
There are many applications for
loop-it. We can also use it to create a faster
interleave function (not lazy obviously):
This runs in about 26ms whereas
(doall (interleave xs xs)) runs in about 64ms. A nice speedup.
Of course I hear you say:
map is much more flexible: It can take a variable number of sequences:
Heck, it can even properly deal with infinite sequences:
Can we achieve the same with
loop-it? Not in its current state: It requires you — at compile time — to give it the sequences you want to iterate over. For a variable number of sequences we can use a trick (copied from
clojure.core): Transform a variable number of iterators into a single iterator. The new iterator returns a collection for each call to
it.next(). We call this a multi iterator. Since the class in
clojure.core is private we have to rewrite it:
clojure.lang.MultiIterator it uses native arrays for speed. However, one problem remains: It cannot deal with infinite number of sequences since it is eager. So let’s create another lazy one:
colls can be an infinite sequence.
Let’s actually see how we can use it:
It’s a start. But before benchmarking, let’s combine and unroll some more to come up with a really fast
Note: I also change the
loop-it macro to call
ensure-iter instead of
(mapv vector xs xs xs xs)
For 1 million elements this runs in about 1.28 seconds. Our new iteration based implementation runs this in only 147ms. Almost a 9x speedup and the implementation is pretty readable.
- Let’s write a really fast
This runs in 52ms for
(take-nth-vec 5 xs-large) where
xs-large has 10 million elements.
(into  (take-nth 5) xs-large) runs in 273ms and
(doall (take-nth 5 xs-large)) runs in 290ms.
2. Partition your sequence into two sets:
This runs in about 176ms for the
loop-it based implementation and 291ms for
reduce based implementation. And IMO the loop is more readable.
Side note: This is how many other programming languages define
3. Compute the sum of a collection.
From top to bottom: 177ms, 130ms, 59ms
All this is also possible in Clojurescript. Creating an iter is done with
iter, only a change to
ensure-iter is a little trickier.
Iterator is pretty simple:
it.next(). A really nice feature would an improved iterator that also allows us to say “give me the rest of the sequence right now”. Something like
it.restSeq(). This could allow us to speed up things like
nthrest which currently isn’t possible with iterators. One notable idea is to speed up
apply which isn’t all that fast. It currently iterates over the given sequence multiple times. See my clojurescript ticket for ideas. If an Iterator had a
it.restSeq() it could be used right there.
Don’t forget about
Iterators in clojure. They can often be worth it. Use them if you need to:
- Accumulate multiple values
- Want to do primitive math
- Want to iterate over multiple sequences at the same time
As always: If performance matters: Benchmark. There are times when using an Iterator is not faster. For instance, an eager
mapcatv implementation was slower with a nested
loop-it implementation (but faster using
reduce for the inner loop).
All code is at: https://github.com/rauhs/clj-bench
loop-it is really small you should just copy it. There is no clojars library.