5,882 reads

Faster Clojure Reduce

by Andre RauhNovember 30th, 2017

Too Long; Didn't Read

<code class="markup--code markup--p-code">reduce</code> isn’t always the fastest way to transform or iterate over a collection. Especially if: 1. Primitive math is involved or 2. Multiple accumulators are needed or 3. Multiple sequences are involved.

Company Mentioned

Summary

reduce isn’t always the fastest way to transform or iterate over a collection. Especially if: 1. Primitive math is involved or 2. Multiple accumulators are needed or 3. Multiple sequences are involved.

Intro

Iterating over a sequence can be done with many functions in Clojure. There is map, filter, keep, mapcat and many more. Usually if none of the standard sequence transforming functions fit, developers tend to fall back to reduce and more seldom to a manual loop. These latter two are very flexible since you can do any transformation you’d like

Reduce to single value
Ignore (filter) some values
Map some values
Add more values

All at the same time. reduce also has a (deserved) reputation of being very fast since each collection itself will run it’s own optimized reducing code.

Problem

In this article I will be treating 2 problems:

Reducing into multiple valuesSuch as:- count all positive, count all odd number - calculate the sum and the product of all numbers in a collection (iterating once)
Iterating over multiple sequences at the same time.Such as:- calculate the transpose of the vectors - select all values from the one vector given another vector

In the examples I’ll be using vectors since that’s the most common data type to be used when you care about performance.

Let’s start with problem #1: To keep the code simple we’ll count the number of odd and even values of 10 million numbers:

229ms

Using Criterium and a properly configured JVM this runs in about 229ms. For comparison: A simple (reduce (fn [s _] (inc s)) 0 xs) runs in about 82ms. So almost 3x slower just because we have two accumulators instead of one.

Can we do better? Let’s try a manual simple loop:

148ms

This runs in about 148ms. Why is it so much faster? Multiple reasons:

We’re not destructing and constructing a new vector on every iteration.
We’re doing primitive math inside the loop.

However for such a simple even? — odd? comparison it’s still much slower than the baseline of 82ms.

Internally, a vector is made up of chunks of Java arrays of length 32. A call to next returns a sequential view on the part of the vector (a ChunkedSeq). So this loop generates a new object on every iteration just to throw it away at the next iteration of the loop. This mean we’re generating garbage which the JVM needs to collect.

Obviously, iterating over this array of the vector would be fastest. This is exactly what reduce does internally. Can we somehow get access to this raw array and write our own loop? Many functions and macros do know about chunked sequences and this is the reason that map is pretty fast on vectors.

Another candidate is doseq which iterates over the chunked buffer (Run a (macroexpand '(doseq [x xs] “abc”)) to see how). Let’s use some tricks to speed things up:

109ms

This runs in about 109ms. Better! But now we left our nice immutable realm and the code is quite ugly. Also, this approach only works with a limited set of accumulators. What if we wanted to accumulate into another vector or a persistent map? It’s inflexible. Another even uglier version is to use arrays and mutate them:

73ms

This runs in about 73ms. Pretty fast, but als pretty ugly, error prone and inflexible if we wanted to accumulate values other than numbers.

A forgotten approach

Let’s finally introduce the construct that this article is all about: Using Iterator. Most language have some kind of iterators, the Java ones are pretty simple, they only consist of two methods on their interface: it.hasNext() and it.next(). Let’s use it:

70ms

This runs in about 70ms. It’s only a tiny bit faster than using the mutable array approach but we’re much more flexible: We can accumulate any values we’d like to! The iterator is very fast, since it produces very little garbage in each iteration and has fast implementations for most data structures. The it.hasNext() check only does an index comparison and the it.next() call only an index lookup for PersistenVector.

Side note: I wrote a macro which manually iterates over the chunked buffer, similarly to doseq but also accumulates odd/even within the loop. It only ran about 7% faster (down to ~65ms). I won’t show it here since it’s much less flexible and only works for vector and makes assumptions about the internals of chunked sequences.

A macro is born

Let’s write a macro to make life easier:

We can now rewrite the above nicely:

Same as above but using the new macro.

Note: Telling Cursive to treat the [loop-it](https://cursive-ide.com/userguide/macros.html) like a [for](https://cursive-ide.com/userguide/macros.html) gives us nice syntax highlighting.

Let’s now go to our second problem: Iterating over multiple sequences. For this we’ll consider the problem of transposing two vectors. I.e: Given [a b c] and [0 1 2] we create [[a 0] [b 1] [c 2]] as the output. Not too many function in clojure.core can deal with multiple input sequences: As far as I can tell only map, mapv, pmap, sequence, interleave, lazy-cat and mapcat can. So if you need to iterate over multiple collection you have to use these limited set of functions.

Instead, using iterators we can easily do this:

20ms

This runs in about 20ms for two vectors with 1 million elements. Using (mapv vector xs xs) runs in about 85ms. So over 4x faster when using iterators. Of course, it’s slightly unfair since we’re not providing a function f to call here. Doing so only slows down the function by about 1–2% however.

Side note: If you really need this to be fast you can use also use arrays to get this down to 10ms:

There are many applications for loop-it. We can also use it to create a faster interleave function (not lazy obviously):

26ms

This runs in about 26ms whereas (doall (interleave xs xs)) runs in about 64ms. A nice speedup.

Of course I hear you say: map is much more flexible: It can take a variable number of sequences:

Heck, it can even properly deal with infinite sequences:

Can we achieve the same with loop-it? Not in its current state: It requires you — at compile time — to give it the sequences you want to iterate over. For a variable number of sequences we can use a trick (copied from clojure.core): Transform a variable number of iterators into a single iterator. The new iterator returns a collection for each call to it.next(). We call this a multi iterator. Since the class in clojure.core is private we have to rewrite it:

Just like clojure.lang.MultiIterator it uses native arrays for speed. However, one problem remains: It cannot deal with infinite number of sequences since it is eager. So let’s create another lazy one:

Now colls can be an infinite sequence.

Let’s actually see how we can use it:

It’s a start. But before benchmarking, let’s combine and unroll some more to come up with a really fast mapv:

Note: I also change the loop-it macro to call ensure-iter instead of clojure.lang.RT/iter.

Let’s benchmark:

(mapv vector xs xs xs xs)

For 1 million elements this runs in about 1.28 seconds. Our new iteration based implementation runs this in only 147ms. Almost a 9x speedup and the implementation is pretty readable.

Other applications

Let’s write a really fast take-nth:

This runs in 52ms for (take-nth-vec 5 xs-large) where xs-large has 10 million elements. (into [] (take-nth 5) xs-large) runs in 273ms and (doall (take-nth 5 xs-large)) runs in 290ms.

2. Partition your sequence into two sets: all-truthy-values, all-falsy-values:

This runs in about 176ms for the loop-it based implementation and 291ms for reduce based implementation. And IMO the loop is more readable.

Side note: This is how many other programming languages define partition.

3. Compute the sum of a collection.

From top to bottom: 177ms, 130ms, 59ms

Clojurescript

All this is also possible in Clojurescript. Creating an iter is done with iter, only a change to ensure-iter is a little trickier.

Dreams

An Iterator is pretty simple: it.hasNext() and it.next(). A really nice feature would an improved iterator that also allows us to say “give me the rest of the sequence right now”. Something like it.restSeq(). This could allow us to speed up things like nthrest which currently isn’t possible with iterators. One notable idea is to speed up apply which isn’t all that fast. It currently iterates over the given sequence multiple times. See my clojurescript ticket for ideas. If an Iterator had a it.restSeq() it could be used right there.

Conclusion

Don’t forget about Iterators in clojure. They can often be worth it. Use them if you need to:

Accumulate multiple values
Want to do primitive math
Want to iterate over multiple sequences at the same time

As always: If performance matters: Benchmark. There are times when using an Iterator is not faster. For instance, an eager mapcatv implementation was slower with a nested loop-it implementation (but faster using reduce for the inner loop).

All code is at: https://github.com/rauhs/clj-bench

Note: Since loop-it is really small you should just copy it. There is no clojars library.