Gabriel Mesquita

@gabrieljustware

AWK for the win

A few months ago, I was handed the task to perform a massive update on some amount of data here in Creditas. As a “rubyist” I tought: “Ok I will simply build a ruby script to read the CSV that operations gave to me, and perform the updates”. Simple task, right?

It is, but if you think about it, with that approach I have to waste time writing the script, testing locally or in staging environment (scripts has bugs too!) and finally set up everything in production to run the script. It’s simple, but it takes time to do all that.

Now who’s going to save us?

AWK

AWK is a programming language created in 1977. The language is interpreted line by line, and was created with the goal to make shell scripts more powerful. It is based on the C language, and it is usually used to process text and manipulate files. Here you can find the book written by the creators of awk: https://www.amazon.com/AWK-Programming-Language-Alfred-Aho/dp/020107981X

So AWK gives you some tools that can make you process text and files really fast. Let’s see some examples:

Let’s say that you have a test.txt file with the following content:

test1 test2 test3

If you type in your console the following command:

awk ‘{print $0}’ test.txt

You will see the entire file as result:

test1 test2 test3

So we can see that AWK gives us some variables to work with, the $0 prints in the console the entire content of the file. Now if you use the $1 variable in the last command, you will have the following result:

test1

The same goes for variable $2 and $3, printing “test2” and “test3” respectively.

Ok but how can AWK help with the scenario that I described early? I will illustrate an example similar to the one I had to deal with. Let’s say you have to update the names of all your clients, and you receive a csv file like this:

id,new_name
1,new_name_1
2,new_name_2
...

You can process the test.csv file and generate all the updates statements with one command like this:

awk -F "," '{print "UPDATE client SET name='\''"$2"'\'' WHERE id='\''"$1"'\''"}' test.csv

And the result will be:

UPDATE client SET name='new_name_1' WHERE id='1'
UPDATE client SET name='new_name_2' WHERE id='2'
...

Notice that we need to specify the comma delimiter with the -F flag. And we also need to escape the character. Besides these two points, there is nothing complex with that command.

Now with all the update statements you can connect to your database directly and execute all at once. No need to connect to your production machine to execute any script, or to test in staging environment in order to guarantee bug safety.

And there are lots of other features inside AWK language, such like for loops and if statements. I think that AWK is an unknown language, specially with those starting their career, that can improve a lot your productivity once you get used to it. You can check this link for more examples: https://likegeeks.com/awk-command/

That’s it everyone, thanks for reading!

More by Gabriel Mesquita

Topics of interest

More Related Stories