# Faster Clojure Reduce

by @rauh, November 30th, 2017


### Summary

`reduce` isn’t always the fastest way to transform or iterate over a collection, especially if:

1. Primitive math is involved,
2. Multiple accumulators are needed, or
3. Multiple sequences are involved.

#### Intro

Iterating over a sequence can be done with many functions in Clojure: there are `map`, `filter`, `keep`, `mapcat` and many more. Usually, if none of the standard sequence-transforming functions fit, developers fall back to `reduce` and, more rarely, to a manual `loop`. These latter two are very flexible since you can do any transformation you’d like:

1. Reduce to single value
2. Ignore (filter) some values
3. Map some values

All at the same time. `reduce` also has a (deserved) reputation for being very fast, since each collection runs its own optimized reducing code.

#### Problem

1. Reducing into multiple values, such as:
   - counting all positive and all odd numbers
   - calculating the sum and the product of all numbers in a collection (iterating once)

2. Iterating over multiple sequences at the same time, such as:
   - calculating the transpose of two vectors
   - selecting all values from one vector given another vector

In the examples I’ll be using vectors, since that’s the most common data type used when you care about performance.

Let’s start with problem #1. To keep the code simple, we’ll count the number of odd and even values among 10 million numbers:
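A minimal sketch of such a `reduce`-based version (the name `xs` and the exact accumulator shape are assumptions), destructuring and rebuilding a two-element accumulator vector on every step:

```clojure
;; Sketch: count odds and evens with a single reduce, carrying both
;; accumulators in a vector that is destructured and rebuilt each step.
(def xs (vec (range 10000000)))

(reduce (fn [[odds evens] x]
          (if (odd? x)
            [(inc odds) evens]
            [odds (inc evens)]))
        [0 0]
        xs)
;; => [5000000 5000000]
```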


Using Criterium and a properly configured JVM, this runs in about 229ms. For comparison, a simple `(reduce (fn [s _] (inc s)) 0 xs)` runs in about 82ms. So we’re almost 3x slower just because we have two accumulators instead of one.

Can we do better? Let’s try a manual simple `loop`:
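A sketch of such a loop (assuming the same `xs` as before), keeping the two counters as plain loop bindings so no vector is built per iteration:

```clojure
;; Sketch: two separate accumulators as loop bindings; the longs stay
;; primitive inside the loop, and nothing is allocated per element
;; apart from the seq traversal itself.
(loop [s (seq xs), odds 0, evens 0]
  (if s
    (let [x (first s)]
      (if (odd? x)
        (recur (next s) (inc odds) evens)
        (recur (next s) odds (inc evens))))
    [odds evens]))
```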


This runs in about 148ms. Why is it so much faster? Multiple reasons:

1. We’re not destructuring and constructing a new vector on every iteration.
2. We’re doing primitive math inside the loop.

However, for such a simple `even?`/`odd?` comparison it’s still much slower than the baseline of 82ms.

Internally, a vector is made up of chunks of Java arrays of length 32. A call to `next` returns a sequential view of part of the vector (a `ChunkedSeq`). So this `loop` generates a new object on every iteration just to throw it away on the next one. This means we’re generating garbage which the JVM needs to collect.

Obviously, iterating directly over these backing arrays would be fastest. This is exactly what `reduce` does internally. Can we somehow get access to the raw arrays and write our own loop? Many functions and macros know about chunked sequences, and this is the reason `map` is pretty fast on vectors.

Another candidate is `doseq`, which iterates over the chunked buffer (run a `(macroexpand '(doseq [x xs] "abc"))` to see how). Let’s use some tricks to speed things up:
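One way such a `doseq`-based version could look (the use of volatiles for the accumulators is an assumption of mine):

```clojure
;; Sketch: doseq iterates the chunked buffer directly; mutable
;; volatiles stand in for the two accumulators.
(let [odds  (volatile! 0)
      evens (volatile! 0)]
  (doseq [x xs]
    (if (odd? x)
      (vswap! odds inc)
      (vswap! evens inc)))
  [@odds @evens])
```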


This runs in about 109ms. Better! But now we’ve left our nice immutable realm and the code is quite ugly. Also, this approach only works with a limited set of accumulators: what if we wanted to accumulate into another vector or a persistent map? It’s inflexible. An even uglier version is to use arrays and mutate them:
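A sketch of the array-mutating version (the two-slot layout is an assumption):

```clojure
;; Sketch: a two-slot long array as mutable accumulator storage.
(let [^longs acc (long-array 2)]      ; acc[0] = odds, acc[1] = evens
  (doseq [x xs]
    (if (odd? x)
      (aset acc 0 (inc (aget acc 0)))
      (aset acc 1 (inc (aget acc 1)))))
  [(aget acc 0) (aget acc 1)])
```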


This runs in about 73ms. Pretty fast, but also pretty ugly, error-prone and inflexible if we wanted to accumulate values other than numbers.

#### A forgotten approach

Let’s finally introduce the construct that this article is all about: the `Iterator`. Most languages have some kind of iterator; the Java ones are pretty simple, consisting of just two interface methods, `it.hasNext()` and `it.next()`. Let’s use it:
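A sketch of the iterator-driven loop; `clojure.lang.RT/iter` obtains a `java.util.Iterator` for a collection:

```clojure
;; Sketch: drive a plain loop with the collection's iterator; the
;; accumulators stay primitive and almost no garbage is produced.
(let [^java.util.Iterator it (clojure.lang.RT/iter xs)]
  (loop [odds 0, evens 0]
    (if (.hasNext it)
      (let [x (.next it)]
        (if (odd? x)
          (recur (inc odds) evens)
          (recur odds (inc evens))))
      [odds evens])))
```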


This runs in about 70ms. It’s only a tiny bit faster than the mutable array approach, but we’re much more flexible: we can accumulate any values we’d like! The iterator is very fast since it produces very little garbage on each iteration and has fast implementations for most data structures. For `PersistentVector`, the `it.hasNext()` check only does an index comparison and the `it.next()` call only an index lookup.

Side note: I wrote a macro which manually iterates over the chunked buffer, similarly to `doseq`, but also accumulates odd/even within the loop. It only ran about 7% faster (down to ~65ms). I won’t show it here since it’s much less flexible: it only works for vectors and makes assumptions about the internals of chunked sequences.

#### A macro is born

Let’s write a macro to make life easier:
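A sketch of what such a `loop-it` macro could look like; the exact surface syntax here (`[bindings body finish]`, with a `:let` clause for accumulators) is an assumption of mine, not necessarily the original:

```clojure
(defmacro loop-it
  "Iterate over one or more colls via their java.util.Iterators.
  bindings => [x coll1, y coll2, ..., :let [acc init, ...]]
  `body` sees each element bound and may (recur new-accs ...);
  when any iterator is exhausted, `finish` is the result."
  [bindings body finish]
  (let [pairs      (partition 2 bindings)
        coll-pairs (take-while #(not= :let (first %)) pairs)
        let-binds  (second (drop-while #(not= :let %) bindings))
        it-syms    (mapv (fn [_] (gensym "it__")) coll-pairs)]
    `(let [~@(mapcat (fn [it [_ coll]]
                       [(vary-meta it assoc :tag 'java.util.Iterator)
                        `(clojure.lang.RT/iter ~coll)])
                     it-syms coll-pairs)]
       (loop [~@(or let-binds [])]
         (if (and ~@(map (fn [it] `(.hasNext ~it)) it-syms))
           (let [~@(mapcat (fn [[x _] it] [x `(.next ~it)])
                           coll-pairs it-syms)]
             ~body)
           ~finish)))))
```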

We can now rewrite the above nicely:
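Assuming a `loop-it` that takes `[bindings body finish]` with a `:let` clause for accumulators, the odd/even count becomes:

```clojure
;; Same computation as before, via the loop-it macro.
(loop-it [x xs
          :let [odds 0, evens 0]]
  (if (odd? x)
    (recur (inc odds) evens)
    (recur odds (inc evens)))
  [odds evens])
```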

Same as above but using the new macro.

Note: Telling Cursive to treat [`loop-it`](https://cursive-ide.com/userguide/macros.html) like a `for` gives us nice syntax highlighting.

Let’s now go to our second problem: iterating over multiple sequences. For this we’ll consider the problem of transposing two vectors, i.e. given `[a b c]` and `[0 1 2]` we create `[[a 0] [b 1] [c 2]]` as the output. Not too many functions in `clojure.core` can deal with multiple input sequences: as far as I can tell only `map`, `mapv`, `pmap`, `sequence`, `interleave`, `lazy-cat` and `mapcat` can. So if you need to iterate over multiple collections you have to use this limited set of functions.

Instead, using iterators we can easily do this:
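A sketch of the transpose, assuming the `[bindings body finish]` `loop-it` shape and a second input vector `ys`:

```clojure
;; Sketch: walk both vectors in lockstep, conj!-ing pairs into a
;; transient output vector.
(loop-it [a xs, b ys
          :let [acc (transient [])]]
  (recur (conj! acc [a b]))
  (persistent! acc))
```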


This runs in about 20ms for two vectors with 1 million elements. Using `(mapv vector xs xs)` runs in about 85ms, so it’s over 4x faster with iterators. Of course, the comparison is slightly unfair since we’re not providing a function `f` to call here. However, doing so only slows down the function by about 1–2%.

Side note: If you really need this to be fast, you can also use arrays to get this down to 10ms:
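A sketch of such an array-backed variant (again assuming the `[bindings body finish]` `loop-it` shape):

```clojure
;; Sketch: preallocate an object array, fill it by index, then wrap
;; it in a vector at the end.
(let [^objects out (object-array (count xs))]
  (loop-it [a xs, b ys
            :let [i 0]]
    (do (aset out i [a b])
        (recur (inc i)))
    nil)
  (vec out))
```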

There are many applications for `loop-it`. We can also use it to create a faster `interleave` function (not lazy obviously):
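A sketch of such an eager `interleave` (the function name is mine):

```clojure
;; Sketch: conj! one element from each input per iteration.
(defn interleave-v [xs ys]
  (loop-it [a xs, b ys
            :let [acc (transient [])]]
    (recur (-> acc (conj! a) (conj! b)))
    (persistent! acc)))

(interleave-v [1 2 3] [:a :b :c])
;; => [1 :a 2 :b 3 :c]
```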


This runs in about 26ms whereas `(doall (interleave xs xs))` runs in about 64ms. A nice speedup.

Of course, I hear you say, `map` is much more flexible: it can take a variable number of sequences:
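For example:

```clojure
(map vector [1 2 3] [:a :b :c] ["x" "y" "z"])
;; => ([1 :a "x"] [2 :b "y"] [3 :c "z"])
```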

Heck, it can even properly deal with infinite sequences:
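For example, pairing an infinite sequence with a finite one simply stops at the shorter input:

```clojure
(map vector (range) [:a :b :c])
;; => ([0 :a] [1 :b] [2 :c])
```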

Can we achieve the same with `loop-it`? Not in its current state: it requires you to give it the sequences you want to iterate over at compile time. For a variable number of sequences we can use a trick (copied from `clojure.core`): transform a variable number of iterators into a single iterator that returns a collection for each call to `it.next()`. We call this a multi iterator. Since the class in `clojure.core` is private, we have to rewrite it:
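A sketch of such a multi iterator with `deftype` (the names `MultiIter` and `multi-iter` are mine):

```clojure
;; Sketch: wrap several iterators in one. hasNext is true only while
;; every underlying iterator has a next element (map-like semantics),
;; and next returns a vector with one element per input.
(deftype MultiIter [^objects iters]
  java.util.Iterator
  (hasNext [_]
    (loop [i 0]
      (if (< i (alength iters))
        (if (.hasNext ^java.util.Iterator (aget iters i))
          (recur (inc i))
          false)
        true)))
  (next [_]
    (let [n   (alength iters)
          arr (object-array n)]
      (dotimes [i n]
        (aset arr i (.next ^java.util.Iterator (aget iters i))))
      (vec arr))))

(defn multi-iter [colls]
  (MultiIter. (object-array (map #(clojure.lang.RT/iter %) colls))))
```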

Just like `clojure.lang.MultiIterator`, it uses native arrays for speed. However, one problem remains: it cannot deal with an infinite number of sequences since it is eager. So let’s create a lazy one:
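A sketch of a lazy variant; note the caveats in the comments, and that checking only the first iterator in `hasNext` is a simplifying assumption of mine:

```clojure
;; Sketch: `iters` stays a lazy (possibly infinite) seq of iterators,
;; and next returns a lazy seq that advances each iterator only as it
;; is realized. Caveat: each returned seq must be fully consumed
;; before the next call to .next, since realization is what advances
;; the underlying iterators.
(deftype LazyMultiIter [iters]
  java.util.Iterator
  (hasNext [_]
    ;; Simplification: check only the first iterator, assuming the
    ;; inputs run out together (or are infinite).
    (.hasNext ^java.util.Iterator (first iters)))
  (next [_]
    (map #(.next ^java.util.Iterator %) iters)))

(defn lazy-multi-iter [colls]
  (LazyMultiIter. (map #(clojure.lang.RT/iter %) colls)))
```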

Now `colls` can be an infinite sequence.

Let’s actually see how we can use it:
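Since `clojure.lang.RT/iter` doesn’t accept an existing `Iterator`, a small `ensure-iter` helper (the body here is my guess) lets `loop-it` consume a multi iterator directly, assuming `loop-it` routes through it:

```clojure
(defn ensure-iter ^java.util.Iterator [x]
  (if (instance? java.util.Iterator x)
    x
    (clojure.lang.RT/iter x)))

;; Lockstep iteration over two vectors through one multi iterator
;; (multi-iter is a hypothetical constructor name for the multi
;; iterator described above):
(loop-it [tuple (multi-iter [[1 2 3] [:a :b :c]])
          :let [acc (transient [])]]
  (recur (conj! acc tuple))
  (persistent! acc))
;; => [[1 :a] [2 :b] [3 :c]]
```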

It’s a start. But before benchmarking, let’s combine and unroll some more to come up with a really fast `mapv`:
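A sketch of such an unrolled `mapv` (the name `mapv-iter` is mine; the variadic arity assumes a multi iterator constructor and the `ensure-iter` change the article notes):

```clojure
;; Sketch: unrolled arities for one and two colls, falling back to a
;; multi iterator for more.
(defn mapv-iter
  ([f c1]
   (loop-it [x c1 :let [acc (transient [])]]
     (recur (conj! acc (f x)))
     (persistent! acc)))
  ([f c1 c2]
   (loop-it [x c1, y c2 :let [acc (transient [])]]
     (recur (conj! acc (f x y)))
     (persistent! acc)))
  ([f c1 c2 & colls]
   (loop-it [args (multi-iter (list* c1 c2 colls))
             :let [acc (transient [])]]
     (recur (conj! acc (apply f args)))
     (persistent! acc))))
```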

Note: I also changed the `loop-it` macro to call `ensure-iter` instead of `clojure.lang.RT/iter`.

Let’s benchmark against `(mapv vector xs xs xs xs)`: for 1 million elements this runs in about 1.28 seconds, while our new iteration-based implementation runs in only 147ms. That’s almost a 9x speedup, and the implementation is pretty readable.

#### Other applications

1. Let’s write a really fast `take-nth`:
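A sketch, assuming the `[bindings body finish]` `loop-it` shape:

```clojure
;; Sketch: keep every n-th element, tracking the index as a
;; primitive loop binding.
(defn take-nth-vec [n coll]
  (loop-it [x coll
            :let [acc (transient []), i 0]]
    (if (zero? (rem i n))
      (recur (conj! acc x) (inc i))
      (recur acc (inc i)))
    (persistent! acc)))

(take-nth-vec 5 [0 1 2 3 4 5 6 7 8 9])
;; => [0 5]
```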

This runs in 52ms for `(take-nth-vec 5 xs-large)` where `xs-large` has 10 million elements. `(into [] (take-nth 5) xs-large)` runs in 273ms and `(doall (take-nth 5 xs-large))` runs in 290ms.

2. Partition your sequence into two sets: `all-truthy-values`, `all-falsy-values`:
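A sketch (the function name is mine):

```clojure
;; Sketch: two transient accumulators, one conj! per element.
(defn partition-truthy [coll]
  (loop-it [x coll
            :let [truthy (transient []), falsy (transient [])]]
    (if x
      (recur (conj! truthy x) falsy)
      (recur truthy (conj! falsy x)))
    [(persistent! truthy) (persistent! falsy)]))

(partition-truthy [1 nil :a false 2])
;; => [[1 :a 2] [nil false]]
```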

This runs in about 176ms for the `loop-it`-based implementation and 291ms for the `reduce`-based implementation. And IMO the loop is more readable.

Side note: This is how many other programming languages define `partition`.

3. Compute the sum of a collection.
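A plausible progression (my reconstruction, not necessarily the original code) goes from boxed `reduce` to a primitive-hinted reducing function to a primitive `loop-it`:

```clojure
;; Boxed math: every + call boxes its arguments.
(reduce + xs)

;; Primitive-hinted reducing function.
(reduce (fn [^long acc ^long x] (+ acc x)) 0 xs)

;; Primitive accumulator in an iterator-driven loop.
(loop-it [x xs :let [sum 0]]
  (recur (+ sum (long x)))
  sum)
```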

From top to bottom: 177ms, 130ms, 59ms

#### Clojurescript

All of this is also possible in ClojureScript. Creating an iterator is done with `iter`; only the change to `ensure-iter` is a little trickier.

#### Dreams

An `Iterator` is pretty simple: `it.hasNext()` and `it.next()`. A really nice feature would be an improved iterator that also allows us to say “give me the rest of the sequence right now”, something like `it.restSeq()`. This could allow us to speed up things like `nthrest`, which currently isn’t possible with iterators. One notable idea is to speed up `apply`, which isn’t all that fast: it currently iterates over the given sequence multiple times. See my ClojureScript ticket for ideas. If an `Iterator` had an `it.restSeq()`, it could be used right there.

#### Conclusion

Don’t forget about `Iterator`s in Clojure. They can often be worth it. Use them when you:

• Need to accumulate multiple values
• Want to do primitive math
• Want to iterate over multiple sequences at the same time

As always, if performance matters: benchmark. There are times when using an `Iterator` is not faster. For instance, an eager `mapcatv` implementation was slower with a nested `loop-it` implementation (but faster using `reduce` for the inner loop).

All code is at: https://github.com/rauhs/clj-bench

Note: Since `loop-it` is really small you should just copy it. There is no clojars library.
