How to Optimize Scala Code Performance With Almost No Effort

How is it even possible?

In this article, I describe Scala performance tips based on my experience of optimizing Ergo's smart contracts interpreter. I found that some typical Scala code fragments have surprisingly much faster alternatives. The tips below can help you to avoid accidental performance issues and write faster Scala code for no additional effort. There is also source code with all the benchmarks.

Scala provides powerful constructs and idioms to build abstractions for correctness and clarity but may have runtime performance penalties. When solving a task, there may be several alternative solutions, each with its own performance characteristics. Even similar-looking code can have radically different performance.

Teams often specify things like formatting and documentation to make sure the code stays consistent and good quality. Similarly, tips on making the code run faster can also be useful. Performance improvement can come from almost no special effort. As such, it is a matter of coding style and habits rather than a matter of extensive code optimization, which shouldn't be premature.

The code speedups described below are only examples and not necessarily exact numbers, as there may be various factors that can impact the timings. Nevertheless, after systematically applying the outlined recommendations, I have observed strictly positive effects on performance.

Performance tips

Tip 1: Avoid Seq()

Let's start with the simplest. It's not uncommon to see the use of Seq() wherever an empty collection is required. However, this method involves the following:

an empty array is initiated
the array is wrapped into an instance of ArraySeq
the following methods called: SeqFactory$Delegate.apply -> IterableFactory.apply -> List.from -> Nil.prependAll
Nil object returned in the end

What to use instead

Simple Nil is 20-28% faster (see Benchmark: Seq) depending on whether the JIT is able to inline the Seq() invocation all the way down to Nil or not. So, why not help it and use Nil directly?

Tip 2: Avoid Map()

Next one is Map(). It is often used wherever an empty Map is required. However, as is the case with Seq(), this method involves allocations and many method calls.

What to use instead

Calling Map.empty is 25-85% faster, depending on the context (see Benchmark: Map)

Tip 3: Avoid `for` comprehensions

Looping pattern for (x <- xs) { ... } is used quite often due to its convenience. It looks very similar to the for loop in Java, but it is not.

In Scala, it is desugared by the compiler to xs.foreach { x => ... } which, besides execution of the block of code involves the following list of overheads:

foreach method call
allocation of lambda object
boxing (memory allocation) of the value x for every xs item (if xs values are not yet boxed)
hidden overhead of concrete implementation of foreach method

Such for comprehensions are particularly sub-optimal (see Benchmark: for over Range) for Range loops, where x is an Int.

What to use instead

There is a cfor macro from spire library.

I recommend the following code as a replacement when xs provides an O(1) indexing access, for example if xs is an Array or an array wrapped into Seq.

import spire.syntax.cfor._
cfor(0)(_ < xs.length, _ + 1) { i => 
  val x = xs(i)
      ...
}

The above xs collection can also be a Range in which case for (x <- 0 until n) { ... } can be replaced with

cforRange(0 until n) { x => ... }

These cfor macros are compiled to efficient Java loop without overhead points 1) - 4). Depending on xs.length it is 10-170x faster (see Benchmark: for over Range). And since foreach already implies a side effect operation, cfor doesn't make the code any less readable.

Note, however, that cfor is not a silver bullet. It is NOT a replacement for foreach in the case of non-indexed collections, when indexed access is NOT a fast O(1) operation. In many cases using cfor instead of foreach is a no-brainer and often increases the performance of any code which loops over an indexed collection (see for example Benchmark: dot-product and Benchmark: check brackets).

Tip 4: Avoid creating sequences with Seq(...)

It is tempting to use Seq.apply method where a Seq of items is required like Seq(1, 2, 3) because it is easy and concise. You can pass it as a method argument or as a method result.

However, the following happens under the hood:

new Array with the items 1, 2, 3 is created, and each item is boxed into its own Integer object
WrappedArray#ofInt wrapper is created and passed as vararg argument of Seq.apply
the above-mentioned methods down to Nil.prependAll are called
new ListBuffer is created and all items are copied from array to the buffer

Benchmarks show that JIT is not able to fully optimize this code.

What to use instead

Simple drop-in replacement Array(1, 2, 3) performs better. This code is 4-10x faster than using Seq(1, 2, 3). (see Benchmark: Seq vs Array). The speedup is even more significant when the number of items in the sequence grows (see Benchmark: Seq vs Array (from Range)).

What happens here is that:

native unboxed Java array is created and then
wrapped via implicit conversion into a WrappedArray#ofInt which is inherited from the Seq trait and can be used directly.

Why is this faster?

It avoids unnecessary allocations (only two new objects are allocated in the Array case). Note that to create Seq not only each Int is boxed into java.lang.Integer (or other primitive type), but also scala.collection.immutable.:: instances are created for each item of the list.
Creating Array avoids boxing (which is proportional to the size of Seq).
Accessing of array items is cache friendly both when the array is created and when it is later used.
Fewer allocations mean less garbage to collect later. This is especially important when the application is multithreaded because, in this case the garbage collector will compete with application threads for CPU resources, thus further slowing down the application.

Notice that using arrays like this doesn't make the code any less readable. Moreover, the arrays created in this way are only accessible via Seq interface, which is immutable. Thus, better performance is achieved without sacrificing other desirable code qualities like readability, safety, and conciseness.

Caveat: the array is not efficient if it is accessed using list operations head andtail. See, for example, Seq vs Array (list ops) benchmark. In this case, replacing Seq with Array may lead to performance degradation. However, using lists in a performance-critical code may not be a good idea as well.

Tip 5: Avoid extractors with Option

An extractor is an object with an unapply method taking an arbitrary argument and returning an Option of some (maybe different) type. Here is a simple example:

object IsEven:
  def unapply(n: Int): Option[Int] =
    if (n % 2 == 0) Some(n) else None

With extractors, pattern matching can be concise and expressive.

However, unapply returning an Option might have a negative impact on runtime performance because on every matching call an instance of Some needs to be created for each successful extraction.

What to use instead

Since version 2.11, Scala introduces name based extractors that no longer requires unapply to return an Option. Combined with value classes these two features allow implementing allocation-free extractors, which are 10-70% faster than the original Option based extractors (see Benchmark: Option vs Opt).

object IsEvenOpt:
  def unapply(n: Int): Opt[Int] =
    if (n % 2 == 0) Opt(n) else Opt.empty

A suitable implementation of Opt class can be found in the spire library.

Benchmarks Summary

The following tables summarize the results of the benchmarks run on the code snippets above and the following configuration: osArch: aarch64; osName: Mac OS X OpenJDK 64-Bit Server VM; vendor: Eclipse Adoptium; version: 17.0.5+8

You can reproduce the results on your computer by running the Benchmarks class.

Clone the repository and run sbt "Test/runMain benchmarks.Benchmark".

Benchmark: Seq

Size	Seq(), ms	Nil, ms	Speedup
10	0.0001250	0.0001250	1.00
100	0.0002080	0.0002080	1.00
1000	0.0007500	0.0006250	1.20
10000	0.0065830	0.0051250	1.28
100000	0.0625830	0.0502920	1.24

Benchmark: Map

Size	Map(), ms	Map.empty, ms	Speedup
10	0.0001250	0.0001250	1.00
100	0.0002080	0.0001660	1.25
1000	0.0010410	0.0006250	1.67
10000	0.0091670	0.0050830	1.80
100000	0.0902500	0.0487080	1.85

Benchmark: foreach vs cfor

Size	foreach, ms	cfor, ms	Speedup
10	0.0006660	0.0000830	8.02
100	0.0007500	0.0000830	9.04
1000	0.0025830	0.0001250	20.66
10000	0.0238330	0.0003330	71.57
100000	0.2244170	0.0042910	52.30

Benchmark: for over Range

Size	foreach, ms	cforRange, ms	Speedup
10	0.0004160	0.0000830	5.01
100	0.0009160	0.0000830	11.04
1000	0.0061250	0.0001250	49.00
10000	0.0569170	0.0003330	170.92
100000	0.5648750	0.0042500	132.91

Benchmark: Seq vs Array (from elems)

Size	Seq, ms	Array, ms	Speedup
10	0.0007910	0.0002080	3.80
100	0.0066250	0.0008340	7.94
1000	0.0661660	0.0160000	4.14
10000	0.6733330	0.0679170	9.91
100000	6.8687920	0.6750420	10.18

Benchmark: Seq vs Array (from Range)

Size	Seq, ms	Array, ms	Speedup
10	0.0120420	0.0066250	1.82
100	0.0836670	0.0160830	5.20
1000	0.8315420	0.0533340	15.59
10000	8.1187080	0.5097500	15.93
100000	81.0221660	4.5335830	17.87

Benchmark: Seq vs Array (list ops)

Size	Seq, ms	Array, ms	Speedup
1	0.0003330	0.0003750	0.89
10	0.0005840	0.0007500	0.78
100	0.0030410	0.0042500	0.72
1000	0.0295000	0.1258330	0.23
10000	0.2695000	15.5551250	0.02

Benchmark: Option vs Opt

Size	Option, ms	Opt, ms	Speedup
100	0.0004160	0.0003750	1.11
1000	0.0014160	0.0009160	1.55
10000	0.0117910	0.0069580	1.69
100000	0.1155420	0.0666670	1.73
1000000	1.1547090	0.6657910	1.73

Benchmark: dot-product

Size	slow, ms	fast, ms	Speedup
10	0.0050000	0.0011250	4.44
100	0.0059160	0.0016660	3.55
1000	0.0242090	0.0073750	3.28
10000	0.1297920	0.1235000	1.05
100000	1.4926660	0.8446250	1.77

Benchmark: check brackets

Size	slow, ms	fast, ms	Speedup
1	0.0007080	0.0002500	2.83
10	0.0017080	0.0005000	3.42
100	0.0077500	0.0027090	2.86
1000	0.0753340	0.0260000	2.90
10000	0.7195830	0.2587080	2.78

Concluding remarks

Program performance upkeep & optimization is the result of constant vigilance and work, rather than just a one-time job. There is no one size fits all solution as there are many trade-offs along the way.

The tips presented in this article are so simple that they can be applied by default and regardless of whether you have already identified the bottleneck or not. Back to my experience, once I systematically applied these tips all over the code base, those parts of code disappeared from the profiler's hot spots bringing up on the surface other bottlenecks

and simplifies profiling.

For further reading, I collected a list of useful references on the topic of Scala optimization. Happy optimizing!

How to Optimize Scala Code Performance With Almost No Effort

How is it even possible?

Performance tips

Tip 1: Avoid Seq()

What to use instead

Tip 2: Avoid Map()

What to use instead

Tip 3: Avoid for comprehensions

What to use instead

Tip 4: Avoid creating sequences with Seq(...)

What to use instead

Tip 5: Avoid extractors with Option

What to use instead

Benchmarks Summary

Benchmark: Seq

Benchmark: Map

Benchmark: foreach vs cfor

Benchmark: for over Range

Benchmark: Seq vs Array (from elems)

Benchmark: Seq vs Array (from Range)

Benchmark: Seq vs Array (list ops)

Benchmark: Option vs Opt

Benchmark: dot-product

Benchmark: check brackets

Concluding remarks

References

Tip 3: Avoid `for` comprehensions