As you might know, concurrency and parallelism are different terms. Concurrency means that an application is making progress on more than one task at the same time (concurrently); and parallelism means that an application splits its tasks up into smaller subtasks which can be processed in parallel, for instance on multiple CPUs at the exact same time.
Let’s imagine a simple example where an application has to make progress on multiple tasks, and where applying concurrency would help a lot:
A university began to use a new database as the source of questions for its tests. Because this database has far more questions than the previous one, they are having trouble checking whether the answers are right in order to calculate the final grade for each student. There is too much data to make caching worthwhile, and it is impossible to keep it all in memory.
Performance dropped because they had to fetch all the answers and check the right choice for each one before calculating the final grade, so the scripts took a long time to run.
This problem can be solved with concurrency. Applying it lets each task make progress without having to wait for the others to finish. While one task is fetching answers from the DB, another is already checking whether each answer is right, and a third can be aggregating the results to persist them back.
Now, let’s implement a great solution for this using Golang:
git clone email@example.com:shodocan/golang-blocking-channels-example.git
In this example, we will be using Datastore (from Google Cloud Platform), but you can use any database with cursor support.
To generate your data:
go run utils.go
First, check the script in the directory without-blocking-channels.
It took 882 seconds to check 3000 answers of 300 students.
Now check the directory using-blocking-channels. Just switching to pipelines reduced that number to 849 seconds. No big deal yet, but with pipelines we can make big improvements to the steps that slow the code down.
Find your main task and replace it with:
Now you are running 10 calcGrade tasks and 2 upgradeGrades tasks. Those tasks are slow because they need to read from or write to the DB. With this change alone, our script now takes 230 seconds to process all 3000 answers. You can go deeper and examine every step to find what is slowing you down and make it faster.
Great, right? Now, let’s understand the Go tools used and how they work.
Blocking Channels: I think they are the most powerful tool that Golang provides for concurrent tasks. Blocking channels let your tasks communicate with each other in a clean and elegant way. Communication flows one way, from producer to consumer: with an unbuffered channel, the producer is blocked from publishing another message until the consumer is ready to receive the current one. This keeps you from wasting time reading data from the database when the task that processes it cannot keep up with the reader, which would otherwise lead the software to consume resources it does not need.
Check grades.go. After checking whether an answer is right or wrong, we accumulate the grades so that we make fewer requests to update the students’ grades in the database.
When using channels, the producer should always close the channel when it is done, so the consumer knows it’s time to stop listening.
Blocking Channels Buffer: Sometimes you read packs of data from the database to make fewer requests. In this case, it makes no sense to block the producer if it already has a pack of data in hand. We can use a buffer to let it publish the pack and start fetching the next one before the consumer has finished with the current one.
Wait Group: a tool that lets you run multiple asynchronous tasks and wait for all of them to finish. I used it to merge channels, ensuring that the aggregated channel only closes once all the input channels are closed. This can be done without it, but Golang made it so easy and elegant that I really had to show it.
I don’t even need to talk about the advantages; the example shows them clearly. Golang made this easier and left the code understandable, testable and elegant.