A few weeks back, I was invited to talk about concurrency at a local university. This write-up summarizes what I presented there.
We all learn about parallelism/concurrency at university or in other ways. Anyone who is learning how to program will inevitably come across fundamental concepts such as:
- Threads, Thread Groups, Thread States
- Race Conditions, Mutual Exclusion, Deadlocks, Starvation
- Locks, Barriers, Thread Locals, Atomic variables
This list goes on …
Even though concurrency is mainstream, we still find it difficult to deal with. Why?
- There is a significant gap between introductory material on concurrency and real-world concurrent/distributed applications.
- Most of us end up not following the best practices.
- We apply textbook, intro-level content (or cut-and-paste from the web) to solve the problem at hand.
So what could be the end result?
- Lots of technical debt.
- Poor application performance / stability.
- Nobody understands the code! → Lots of sleepless nights / cursing!
So what’s the approach/solution?
I started digging into this issue because of my own past experiences, and obviously I’m not the only one who is suffering. There is a lot of literature on the internet on how this can be solved. Here’s a summarized list, which I think is helpful to keep in mind:
Rule #1: Global Mutable State is Bad! (With or Without Concurrency)
Note: Anything is mutable if its state changes over time, immutable otherwise.
E.g. global static variables, static classes with public instance variables, singleton objects, environment variables, configuration objects, and the data/state shared within a clustered application.
As the application grows in size and complexity (think multiple moving parts within the program, with the program running in a clustered environment), having lots of global mutable state makes it practically impossible to predict the program’s next state at a given time, especially when you are in troubleshooting mode. To make things even worse, you do not control when or where the state changes, because any part of the program can modify it.
Now I’m not going to be an idealist and say we should have zero global state, because that’s not practical. But it’s reasonable to say we need to keep it to a minimum.
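One way to keep shared state to a minimum is to make shared objects immutable and pass them explicitly to the components that need them. The following is a minimal sketch of that idea, not a prescription; the names `AppConfig`, `ConnectionPlanner`, and the example URL are hypothetical and chosen only for illustration.

```java
// Hypothetical immutable configuration object (illustrative names only).
final class AppConfig {
    private final String dbUrl;
    private final int maxConnections;

    AppConfig(String dbUrl, int maxConnections) {
        this.dbUrl = dbUrl;
        this.maxConnections = maxConnections;
    }

    String dbUrl() { return dbUrl; }
    int maxConnections() { return maxConnections; }

    // "Changing" a value returns a new object; existing readers are unaffected.
    AppConfig withMaxConnections(int newMax) {
        return new AppConfig(dbUrl, newMax);
    }
}

// Hypothetical component that receives its configuration explicitly
// instead of reading a global static field.
final class ConnectionPlanner {
    private final AppConfig config;

    ConnectionPlanner(AppConfig config) {
        this.config = config;
    }

    // Splits the connection budget evenly across workers (at least 1 each).
    int connectionsPerWorker(int workers) {
        return Math.max(1, config.maxConnections() / workers);
    }
}

final class ImmutableConfigDemo {
    public static void main(String[] args) {
        AppConfig base = new AppConfig("jdbc:example://localhost/db", 20);
        ConnectionPlanner planner = new ConnectionPlanner(base);

        // Deriving a tuned copy does not change what 'planner' sees.
        AppConfig tuned = base.withMaxConnections(40);

        System.out.println(planner.connectionsPerWorker(4));                      // 5
        System.out.println(new ConnectionPlanner(tuned).connectionsPerWorker(4)); // 10
        System.out.println(base.maxConnections());                                // 20
    }
}
```

Because `AppConfig` never changes after construction, any number of threads can read it without locks, and the state a component sees is fixed by what it was handed at construction time.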
Rule #2: Use an Application Framework and Adhere to Its Programming Model.
Most of the code we write (roughly 90%?) is about moving data around: [Get Data → Do something → Show in the GUI or put into storage]. This means we don’t deal with concurrency directly, and in fact we don’t have to, unless we are writing special-purpose code such as a framework or server runtime of our own.
Most mainstream programming languages and their surrounding ecosystems provide higher-level concurrency control constructs, so most of us don’t need to create threads or manage synchronized blocks and locks by hand. Most application frameworks provide simple, component-based programming models (e.g. Beans, Servlets, Controllers in J2EE/Spring) that allow application programmers to focus on business logic rather than boilerplate code.
Therefore, every time we feel like introducing a thread/synchronized block it’s always a good idea to take a step back and rationalize the need.
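To illustrate the point without pulling in a framework, here is a rough sketch of the [Get Data → Do something → Store] flow using only the JDK's `CompletableFuture` and `ExecutorService`. It stands in for what a framework would normally do for you. The class `PipelineDemo`, its methods, the sample amounts, and the `dailyTotal` key are all assumptions made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PipelineDemo {
    // Hypothetical stand-in for a remote call or database query.
    static List<Integer> fetchOrderAmounts() {
        return List.of(120, 80, 45);
    }

    // The "do something" step: pure business logic, no threading code.
    static int total(List<Integer> amounts) {
        return amounts.stream().mapToInt(Integer::intValue).sum();
    }

    // Runs get -> transform -> store on the given pool and returns the stored total.
    static int runPipeline(ExecutorService pool, Map<String, Integer> store) {
        return CompletableFuture
                .supplyAsync(PipelineDemo::fetchOrderAmounts, pool)
                .thenApply(PipelineDemo::total)
                .thenApply(sum -> {
                    store.put("dailyTotal", sum);
                    return sum;
                })
                .join(); // wait for the result; no Thread or synchronized in sight
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Map<String, Integer> store = new ConcurrentHashMap<>();
            System.out.println(runPipeline(pool, store)); // 245
        } finally {
            pool.shutdown();
        }
    }
}
```

The business logic (`fetchOrderAmounts`, `total`) contains no concurrency code at all; where and how it runs is decided in one place, which is the same separation a framework's programming model gives you.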
Rule #3: Best Practices
Most of these are fairly obvious and apply to software engineering in a broader sense. I don’t mean to provide an exhaustive list, but please comment if I missed anything that needs to go here!
- Reduce the use of global variables/data: Use the correct scope. This needs to happen at the application level as well as across a cluster if the application runs in that mode. This includes defining components/layers in the application correctly and how data/state is exchanged between each adjacent layer.
- Minimize locking scope: Let’s not abuse synchronization. I have encountered many situations where we resort to it by default to fix concurrency-related problems. The more code that runs under a lock, the more time threads spend waiting on each other, and the more likely performance is to degrade.
- Thread Pools: This is a no-brainer. Creating threads by hand requires a lot of boilerplate work (and run-time overhead). Anyone who still does it is living in the past and trying to reinvent the wheel.
- Work with higher-level constructs: Some of us might disagree, but most data-sharing requirements I have encountered map to the producer-consumer pattern (and its variants). Most languages support it through their concurrency packages, so why not use it? (E.g. in Java, an implementation of java.util.concurrent.BlockingQueue.) There are also more sophisticated alternatives to choose from, such as the LMAX Disruptor (in Java) and reactive libraries and toolkits (e.g. RxJava, Akka, Project Reactor in the Spring ecosystem).
- Threading Architecture: Concurrency is not an aspect to figure out or design later. Think about the following up front:
  - Data access patterns (e.g. compute-heavy / IO-heavy)
  - Deployment (e.g. VM environments, containers, industrial computers)
  - Hardware/resource constraints
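The thread pool and producer-consumer bullets above can be sketched together in a few lines of plain JDK code. This is only an illustration under assumptions: the class name `ProducerConsumerDemo`, the queue capacity of 8, the 5-second timeouts, and the `DONE` sentinel are all choices made for this example, not recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ProducerConsumerDemo {
    // Sentinel ("poison pill") that tells the consumer to stop; an assumption of this sketch.
    private static final int DONE = -1;

    // Produces 1..count on one pool thread, consumes on another, returns the consumer's sum.
    static long sumThroughQueue(int count) {
        // Bounded queue: the producer blocks in put() when the consumer falls behind.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(8);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> producer = pool.submit(() -> {
                for (int i = 1; i <= count; i++) {
                    queue.put(i);
                }
                queue.put(DONE);
                return null;
            });
            Future<Long> consumer = pool.submit(() -> {
                long sum = 0;
                for (int item = queue.take(); item != DONE; item = queue.take()) {
                    sum += item;
                }
                return sum;
            });
            producer.get(5, TimeUnit.SECONDS);
            return consumer.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("producer or consumer failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumThroughQueue(100)); // 5050
    }
}
```

Note that the code contains no `synchronized` block and no explicit lock: the queue is the only shared object, and all coordination between the two tasks happens inside `put()` and `take()`.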
To summarize:
- Dealing with concurrency becomes hard when we lack the ‘working knowledge’ and do not follow best practices.
- An excessive amount of global mutable state in an application is problematic.
- Prefer working with higher-level concurrency control constructs (or a framework) over going bare-metal.
- Choosing/designing the threading architecture up front is important.
Reference:
- Concurrency — The good, the bad, the ugly — https://www.youtube.com/watch?v=OJfS7K-Vkgk