In this article, I describe Scala performance tips based on my experience optimizing Ergo's smart contract interpreter. I found that some typical Scala code fragments have surprisingly faster alternatives. The tips below can help you avoid accidental performance issues and write faster Scala code with no additional effort. There is also source code with all the benchmarks.
Scala provides powerful constructs and idioms to build abstractions for correctness and clarity but may have runtime performance penalties. When solving a task, there may be several alternative solutions, each with its own performance characteristics. Even similar-looking code can have radically different performance.
Teams often specify conventions for things like formatting and documentation to keep the code consistent and of good quality. Similarly, tips on making the code run faster can be useful. Performance improvements can then come at almost no special effort. As such, this is a matter of coding style and habits rather than of extensive code optimization, which shouldn't be premature.
The code speedups described below are only examples and not necessarily exact numbers, as there may be various factors that can impact the timings. Nevertheless, after systematically applying the outlined recommendations, I have observed strictly positive effects on performance.
Let's start with the simplest. It's not uncommon to see `Seq()` used wherever an empty collection is required. However, this method involves the following call chain:

```
SeqFactory$Delegate.apply -> IterableFactory.apply -> List.from -> Nil.prependAll
```

Simple `Nil` is 20-28% faster (see Benchmark: Seq), depending on whether the JIT is able to inline the `Seq()` invocation all the way down to `Nil` or not. So, why not help it and use `Nil` directly?
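As an illustration, the drop-in replacement is trivial (a minimal sketch; the variable names are mine):

```scala
// Prefer Nil over Seq() for an empty sequence: Nil is the empty list itself,
// while Seq() goes through several factory calls before returning it.
val empty1: Seq[Int] = Nil
val empty2: Seq[Int] = Seq()

assert(empty1 == empty2)  // both denote the same empty sequence
```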
The next one is `Map()`. It is often used wherever an empty `Map` is required. However, as with `Seq()`, this method involves allocations and many method calls. Calling `Map.empty` is 25-85% faster, depending on the context (see Benchmark: Map).
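The replacement is again a one-to-one substitution (a minimal sketch):

```scala
// Prefer Map.empty over Map(): Map.empty returns the cached empty instance
// directly, without going through the varargs factory machinery.
val m1: Map[String, Int] = Map.empty
val m2: Map[String, Int] = Map()

assert(m1 == m2)  // both are the empty map
```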
for comprehensions

The looping pattern `for (x <- xs) { ... }` is used quite often due to its convenience. It looks very similar to the `for` loop in Java, but it is not. In Scala, the compiler desugars it to `xs.foreach { x => ... }`, which, besides executing the block of code, involves the following overheads:

1) the `foreach` method call;
2) the allocation of a closure object for the loop body;
3) the boxing of `x` for every item of `xs` (if the `xs` values are not yet boxed);
4) the invocation of the closure inside the `foreach` method.

Such for comprehensions are particularly sub-optimal for `Range` loops, where `x` is an `Int` (see Benchmark: for over Range).
There is a `cfor` macro in the spire library. I recommend the following code as a replacement when `xs` provides O(1) indexed access, for example if `xs` is an `Array` or an array wrapped into a `Seq`:

```scala
import spire.syntax.cfor._

cfor(0)(_ < xs.length, _ + 1) { i =>
  val x = xs(i)
  ...
}
```
The above `xs` collection can also be a `Range`, in which case `for (x <- 0 until n) { ... }` can be replaced with:

```scala
cforRange(0 until n) { x => ... }
```

These `cfor` macros are compiled to an efficient Java loop without the overhead points 1)-4) above. Depending on `xs.length`, this is 10-170x faster (see Benchmark: for over Range). And since `foreach` already implies a side-effecting operation, `cfor` doesn't make the code any less readable.
Note, however, that `cfor` is not a silver bullet. It is NOT a replacement for `foreach` in the case of non-indexed collections, where indexed access is NOT a fast O(1) operation. In many other cases, using `cfor` instead of `foreach` is a no-brainer and often increases the performance of any code which loops over an indexed collection (see, for example, Benchmark: dot-product and Benchmark: check brackets).
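For illustration, a dot-product in the spirit of that benchmark might look like this (a sketch of my own, not the exact benchmark code):

```scala
import spire.syntax.cfor._

// Element-wise multiply-and-sum over two arrays. cfor compiles down to a
// plain Java-style index loop: no closure allocation, no boxing of Doubles.
def dotProduct(xs: Array[Double], ys: Array[Double]): Double = {
  require(xs.length == ys.length)
  var sum = 0.0
  cfor(0)(_ < xs.length, _ + 1) { i =>
    sum += xs(i) * ys(i)
  }
  sum
}
```

For example, `dotProduct(Array(1.0, 2.0), Array(3.0, 4.0))` yields `11.0`.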
It is tempting to use the `Seq.apply` method where a `Seq` of items is required, like `Seq(1, 2, 3)`, because it is easy and concise. You can pass it as a method argument or return it as a method result. However, the following happens under the hood:

1) an array with the items `1, 2, 3` is created, and each item is boxed into its own `Integer` object;
2) `Seq.apply` and `Nil.prependAll` are called, building a list node for each item.

Benchmarks show that the JIT is not able to fully optimize this code. The simple drop-in replacement `Array(1, 2, 3)` performs better: it is 4-10x faster than `Seq(1, 2, 3)` (see Benchmark: Seq vs Array). The speedup is even more significant as the number of items in the sequence grows (see Benchmark: Seq vs Array (from Range)).
What happens here is that a native unboxed Java array is created and then wrapped via an implicit conversion into a `WrappedArray#ofInt`, which inherits from the `Seq` trait and can be used directly.

Why is this faster? It avoids unnecessary allocations: only two new objects are allocated in the `Array` case. Note that to create a `Seq`, not only is each `Int` boxed into a `java.lang.Integer` (and similarly for the other primitive types), but a `scala.collection.immutable.::` instance is also created for each item of the list. Creating an `Array` avoids the boxing (whose cost is proportional to the size of the `Seq`). Access to array items is cache-friendly, both when the array is created and when it is later used. Fewer allocations also mean less garbage to collect later. This is especially important when the application is multithreaded, because in this case the garbage collector will compete with application threads for CPU resources, further slowing down the application.

Notice that using arrays like this doesn't make the code any less readable. Moreover, the arrays created in this way are only accessible via the `Seq` interface, which is immutable. Thus, better performance is achieved without sacrificing other desirable code qualities like readability, safety, and conciseness.
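For example, a method written against the `Seq` interface can be fed an `Array` directly (a sketch; note that in Scala 2.13+ the implicit conversion copies the array into an immutable `ArraySeq`, while in 2.12 it wraps it without copying):

```scala
// The callee sees an immutable Seq; the caller allocates a cheap unboxed array.
def sumAll(xs: Seq[Int]): Int = {
  var s = 0
  var i = 0
  while (i < xs.length) { s += xs(i); i += 1 }  // O(1) indexed access
  s
}

val total = sumAll(Array(1, 2, 3))  // implicit Array-to-Seq conversion
assert(total == 6)
```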
Caveat: the array is not efficient if it is accessed using the list operations `head` and `tail` (see, for example, the Seq vs Array (list ops) benchmark). In this case, replacing `Seq` with `Array` may lead to performance degradation. However, using lists in performance-critical code may not be a good idea either.
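To see why, consider a list-style traversal (a sketch; the helper is mine): on a `List`, `tail` is O(1), but on an array-backed `Seq` every `tail` call copies the remaining elements, making the whole loop quadratic.

```scala
// List-style traversal via head/tail. Fast on List (tail is O(1)),
// slow on an array-backed Seq (each tail allocates a copy of the rest).
def sumListStyle(xs: Seq[Int]): Int = {
  var s = 0
  var rest = xs
  while (rest.nonEmpty) {
    s += rest.head
    rest = rest.tail
  }
  s
}
```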
An extractor is an object with an `unapply` method taking an arbitrary argument and returning an `Option` of some (maybe different) type. Here is a simple example:

```scala
object IsEven:
  def unapply(n: Int): Option[Int] =
    if (n % 2 == 0) Some(n) else None
```
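Such an extractor is used in pattern matching as follows (the `describe` helper is mine, for illustration):

```scala
object IsEven:
  def unapply(n: Int): Option[Int] =
    if (n % 2 == 0) Some(n) else None

// Every successful IsEven match allocates a Some instance behind the scenes.
def describe(n: Int): String = n match
  case IsEven(k) => s"$k is even"
  case _         => s"$n is odd"
```

For example, `describe(4)` returns `"4 is even"` and `describe(7)` returns `"7 is odd"`.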
With extractors, pattern matching can be concise and expressive. However, `unapply` returning an `Option` might have a negative impact on runtime performance, because an instance of `Some` needs to be created for each successful extraction. Since version 2.11, Scala supports name-based extractors, which no longer require `unapply` to return an `Option`. Combined with value classes, these two features allow implementing allocation-free extractors, which are 10-70% faster than the original `Option`-based extractors (see Benchmark: Option vs Opt).
```scala
object IsEvenOpt:
  def unapply(n: Int): Opt[Int] =
    if (n % 2 == 0) Opt(n) else Opt.empty
```

A suitable implementation of the `Opt` class can be found in the spire library.
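For reference, here is a minimal sketch of how such an allocation-free option can be built on a value class (for illustration only; spire's actual `Opt` differs in details). Name-based extractors only require the result type to define `isEmpty` and `get`:

```scala
// Sketch: an option-like value class with no wrapper allocation on success.
// A null `ref` encodes emptiness (so this sketch cannot store null values).
final class Opt[+A](val ref: AnyRef) extends AnyVal {
  def isEmpty: Boolean = ref eq null  // required by name-based extractors
  def get: A = ref.asInstanceOf[A]    // required by name-based extractors
}

object Opt {
  def apply[A](a: A): Opt[A] = new Opt(a.asInstanceOf[AnyRef])
  def empty[A]: Opt[A] = new Opt[A](null)
}
```

Note that for primitives like `Int` the value itself still gets boxed inside `Opt(n)`, but the extra `Some` allocation on every successful match is avoided.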
The following tables summarize the results of the benchmarks run on the code snippets above with the following configuration: osArch: aarch64; osName: Mac OS X; VM: OpenJDK 64-Bit Server VM; vendor: Eclipse Adoptium; version: 17.0.5+8. You can reproduce the results on your computer by running the Benchmarks class: clone the repository and run `sbt "Test/runMain benchmarks.Benchmark"`.
Benchmark: Seq

| Size | Seq(), ms | Nil, ms | Speedup |
|---|---|---|---|
| 10 | 0.0001250 | 0.0001250 | 1.00 |
| 100 | 0.0002080 | 0.0002080 | 1.00 |
| 1000 | 0.0007500 | 0.0006250 | 1.20 |
| 10000 | 0.0065830 | 0.0051250 | 1.28 |
| 100000 | 0.0625830 | 0.0502920 | 1.24 |
Benchmark: Map

| Size | Map(), ms | Map.empty, ms | Speedup |
|---|---|---|---|
| 10 | 0.0001250 | 0.0001250 | 1.00 |
| 100 | 0.0002080 | 0.0001660 | 1.25 |
| 1000 | 0.0010410 | 0.0006250 | 1.67 |
| 10000 | 0.0091670 | 0.0050830 | 1.80 |
| 100000 | 0.0902500 | 0.0487080 | 1.85 |
Benchmark: foreach vs cfor

| Size | foreach, ms | cfor, ms | Speedup |
|---|---|---|---|
| 10 | 0.0006660 | 0.0000830 | 8.02 |
| 100 | 0.0007500 | 0.0000830 | 9.04 |
| 1000 | 0.0025830 | 0.0001250 | 20.66 |
| 10000 | 0.0238330 | 0.0003330 | 71.57 |
| 100000 | 0.2244170 | 0.0042910 | 52.30 |
Benchmark: for over Range

| Size | foreach, ms | cforRange, ms | Speedup |
|---|---|---|---|
| 10 | 0.0004160 | 0.0000830 | 5.01 |
| 100 | 0.0009160 | 0.0000830 | 11.04 |
| 1000 | 0.0061250 | 0.0001250 | 49.00 |
| 10000 | 0.0569170 | 0.0003330 | 170.92 |
| 100000 | 0.5648750 | 0.0042500 | 132.91 |
Benchmark: Seq vs Array

| Size | Seq, ms | Array, ms | Speedup |
|---|---|---|---|
| 10 | 0.0007910 | 0.0002080 | 3.80 |
| 100 | 0.0066250 | 0.0008340 | 7.94 |
| 1000 | 0.0661660 | 0.0160000 | 4.14 |
| 10000 | 0.6733330 | 0.0679170 | 9.91 |
| 100000 | 6.8687920 | 0.6750420 | 10.18 |
Benchmark: Seq vs Array (from Range)

| Size | Seq, ms | Array, ms | Speedup |
|---|---|---|---|
| 10 | 0.0120420 | 0.0066250 | 1.82 |
| 100 | 0.0836670 | 0.0160830 | 5.20 |
| 1000 | 0.8315420 | 0.0533340 | 15.59 |
| 10000 | 8.1187080 | 0.5097500 | 15.93 |
| 100000 | 81.0221660 | 4.5335830 | 17.87 |
Benchmark: Seq vs Array (list ops)

| Size | Seq, ms | Array, ms | Speedup |
|---|---|---|---|
| 1 | 0.0003330 | 0.0003750 | 0.89 |
| 10 | 0.0005840 | 0.0007500 | 0.78 |
| 100 | 0.0030410 | 0.0042500 | 0.72 |
| 1000 | 0.0295000 | 0.1258330 | 0.23 |
| 10000 | 0.2695000 | 15.5551250 | 0.02 |
Benchmark: Option vs Opt

| Size | Option, ms | Opt, ms | Speedup |
|---|---|---|---|
| 100 | 0.0004160 | 0.0003750 | 1.11 |
| 1000 | 0.0014160 | 0.0009160 | 1.55 |
| 10000 | 0.0117910 | 0.0069580 | 1.69 |
| 100000 | 0.1155420 | 0.0666670 | 1.73 |
| 1000000 | 1.1547090 | 0.6657910 | 1.73 |
Benchmark: dot-product

| Size | slow, ms | fast, ms | Speedup |
|---|---|---|---|
| 10 | 0.0050000 | 0.0011250 | 4.44 |
| 100 | 0.0059160 | 0.0016660 | 3.55 |
| 1000 | 0.0242090 | 0.0073750 | 3.28 |
| 10000 | 0.1297920 | 0.1235000 | 1.05 |
| 100000 | 1.4926660 | 0.8446250 | 1.77 |
Benchmark: check brackets

| Size | slow, ms | fast, ms | Speedup |
|---|---|---|---|
| 1 | 0.0007080 | 0.0002500 | 2.83 |
| 10 | 0.0017080 | 0.0005000 | 3.42 |
| 100 | 0.0077500 | 0.0027090 | 2.86 |
| 1000 | 0.0753340 | 0.0260000 | 2.90 |
| 10000 | 0.7195830 | 0.2587080 | 2.78 |
Program performance upkeep and optimization is the result of constant vigilance and work rather than a one-time job. There is no one-size-fits-all solution, as there are many trade-offs along the way.
The tips presented in this article are so simple that they can be applied by default, regardless of whether you have already identified a bottleneck or not. Back to my experience: once I systematically applied these tips all over the code base, those parts of the code disappeared from the profiler's hot spots, bringing other bottlenecks to the surface and simplifying profiling.
For further reading, I have collected a list of useful references on the topic of Scala optimization. Happy optimizing!