As you may know, there is an ongoing struggle in Catalonia for self-determination. The Catalan government is planning to hold a referendum on independence on October 1st 2017. The Spanish government, on the other hand, is determined to prevent this referendum because they consider it illegal. As a result, it has taken the following measures to thwart the vote: it has censored websites, seized millions of ballot papers, used military police to intercept mail with information on where voters can cast their vote, put people in jail, etc.
However, in this post, rather than focusing on politics, I want to highlight how the Catalan government is using a rather unorthodox but interesting method to tell its citizens where to vote. I will walk through the information the voter needs to provide, how it is handled in order to return the relevant data to the voter, and the pros and cons of that method.
What data is requested from users?
The Catalan Government launched a website so that voters could check where to cast their vote on October 1st. On that website the user is asked to enter the following information:
- National ID number (DNI)
- Date of Birth (DOB)
- Postal Code
What happens with this data?
After the user enters this data an async request takes place, and when it finishes, if there is a match the user is provided with the information about where they should vote.
Pretty standard stuff, right? However, there is one important detail: the data entered by the user never leaves the browser. The request that takes place when the user hits the submit button is a fetch request that looks like this:
X is a hexadecimal digit. The response contains between 70 and 100 lines of what seem to be encrypted records.
Is the database being exposed?
I admit that when I first saw that I freaked out a little. It seemed pretty obvious that the whole database was being exposed in little chunks. And yep, I looked at the JS code and that’s exactly what is happening. There are 65,536 different DB chunks (one per 4-digit hexadecimal prefix, 0x0000 through 0xFFFF) publicly accessible to anyone.
My concern was whether a user would be able to decrypt just their data or the whole database. As I dug into the code I realized that in fact things are not as bad as they initially looked. On the contrary, this is what actually happens when the user hits the submit button:
- The first 3 characters of the national ID number are trimmed. They are not used at all.
- The trimmed ID number gets concatenated with the DOB and the postal code.
- A SHA256 hash gets applied to that key 1715 times. Let’s call the result of that hash loop $HASH1.
- Another SHA256 hash gets applied to $HASH1. Let’s refer to the result of this as $HASH2.
- Both $HASH1 and $HASH2 are a string of 64 hexadecimal characters.
- The first 4 characters of $HASH2 are used to determine the chunk of the DB that needs to be fetched. (e.g:
- The remaining 60 characters of $HASH2 are used to find the record that contains the data that’s relevant to the user. If the first 60 characters of one of the lines of the response match those 60 characters of $HASH2, that line is the one that contains the voting data for that user.
- If there is a match, what comes after the first 60 characters is decrypted using $HASH1 as the password. The result is the voting place info of that user.
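To make those steps concrete, here is a minimal Python sketch of the derivation and the lookup. It is not the site’s actual code (that is JavaScript) and several details are my own assumptions: the names `derive_hashes` and `find_record` are hypothetical, the three fields are assumed to be concatenated with no separator, the DOB format is assumed to be `YYYYMMDD`, and each round is assumed to re-hash the previous hex digest as a string. The decryption step is left out because the cipher is not described above.

```python
import hashlib
from typing import List, Optional, Tuple

HASH_ROUNDS = 1715  # number of SHA-256 applications described in the steps above


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def derive_hashes(national_id: str, dob: str, postal_code: str) -> Tuple[str, str]:
    """Derive ($HASH1, $HASH2) from the three fields the voter types in."""
    # Drop the first 3 characters of the ID; they are not used at all.
    # Assumption: plain concatenation, no separator between the fields.
    key = national_id[3:] + dob + postal_code
    hash1 = key
    for _ in range(HASH_ROUNDS):
        # Assumption: each round hashes the previous hex digest as text.
        hash1 = sha256_hex(hash1)
    hash2 = sha256_hex(hash1)
    return hash1, hash2


def find_record(hash2: str, chunk_lines: List[str]) -> Optional[str]:
    """Return the still-encrypted payload of the matching line, if any."""
    lookup = hash2[4:]  # the 60 characters after the 4-character chunk prefix
    for line in chunk_lines:
        if line[:60] == lookup:
            return line[60:]  # this part would be decrypted with $HASH1
    return None


if __name__ == "__main__":
    h1, h2 = derive_hashes("12345678Z", "19800101", "08001")
    print("chunk to fetch:", h2[:4])
    print("lookup key:    ", h2[4:])
```

The point to notice is that the server only ever sees the 4-character chunk prefix; the lookup key and the decryption password stay in the browser.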
Why did they choose this method?
My hypothesis is they did this for two reasons:
- Avoiding censorship. Once those DB chunks are public it’s almost impossible to prevent people from knowing where they are supposed to go to vote. As long as you have a way to make those files accessible, any website that’s able to serve static content can be used for that purpose (e.g. IPFS).
- Preventing DDoS attacks by having the client do all the heavy lifting (computing hashes and decrypting). The server only has to be able to serve static files efficiently.
So… Is that OK?
First off, even if a malicious user is able to “decrypt” the whole DB the only info that they could obtain would be the last 6 characters of the national ID, with a DOB and a postal code. I’m having a hard time coming up with possible malicious uses of that information. There are no names, no addresses, no social insurance numbers, etc.
The only data that’s a bit sensitive is the national ID number, and its first 3 digits are trimmed. So, if a malicious user got hold of a few records using a brute-force attack, they wouldn’t get much. One caveat is worth pointing out, though: the last character of the national ID is a letter that is computed from the number modulo 23. So, the attacker would be able to narrow the 3 missing digits down to a list of ~43 possible candidates, rather than a thousand.
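Here is a small Python sketch of that narrowing. The letter table is the standard one for Spanish DNI numbers; the helper names `dni_letter` and `candidate_prefixes` are mine, and the sample ID is the usual dummy value, not a real person’s.

```python
from typing import List

# Standard DNI control-letter table: the letter is the one at index (number mod 23).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_letter(number: int) -> str:
    """Return the control letter for an 8-digit DNI number."""
    return DNI_LETTERS[number % 23]


def candidate_prefixes(last_five_digits: str, letter: str) -> List[str]:
    """List every 3-digit prefix that is consistent with the known
    last five digits and the control letter."""
    suffix = int(last_five_digits)
    return [
        f"{prefix:03d}"
        for prefix in range(1000)
        if dni_letter(prefix * 100_000 + suffix) == letter
    ]


if __name__ == "__main__":
    candidates = candidate_prefixes("45678", "Z")
    print(len(candidates), "possible prefixes, e.g.", candidates[:5])
```

Because 1000 is not a multiple of 23, the list has either 43 or 44 entries depending on the record.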
What would it take to “decrypt” all that data?
What if an attacker wants to get a hold of all, or most of that data, using a brute-force attack? How many samples would they need to run it against?
- 5 digits of the national ID = 100,000 combinations
- One letter of the national ID = 23
- Postal Codes: I’ve checked and in Catalonia there are 1146 different postal codes.
- Date of Birth: A voter must be at least 18 years old on October 1st 2017. Let’s say the attacker wants to target everyone that’s between the ages of 18 and 75:
(75 - 18) * 365 = 20,805 (ignoring leap days)
Therefore, in order to “decrypt” most of the data the attacker would need to use a brute-force attack against ~54,837,819,000,000 keys. It would take ~850 years for my current computer to be able to process all those keys.
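If you want to check the arithmetic, here is a short Python sketch. The constants come straight from the list above; the function name `years_to_exhaust` is mine, and the throughput it prints is simply what the ~850-year figure implies, not a separate benchmark.

```python
ID_DIGIT_COMBINATIONS = 10 ** 5     # 5 remaining digits of the national ID
ID_LETTERS = 23                     # possible control letters
POSTAL_CODES = 1146                 # postal codes in Catalonia, per the count above
BIRTH_DATES = (75 - 18) * 365       # ages 18 to 75, ignoring leap days

SECONDS_PER_YEAR = 365 * 24 * 3600

keyspace = ID_DIGIT_COMBINATIONS * ID_LETTERS * POSTAL_CODES * BIRTH_DATES


def years_to_exhaust(keys: int, keys_per_second: float) -> float:
    """Years needed to try every key at a given rate (one key = 1716 SHA-256 rounds)."""
    return keys / (keys_per_second * SECONDS_PER_YEAR)


# Throughput implied by the ~850-year estimate, in keys per second.
implied_rate = keyspace / (850 * SECONDS_PER_YEAR)

if __name__ == "__main__":
    print(f"keyspace: {keyspace:,} keys")
    print(f"implied rate: {implied_rate:,.0f} keys/s")
    print(f"at 10x that rate: {years_to_exhaust(keyspace, 10 * implied_rate):.0f} years")
```

The estimate scales linearly: every 10x increase in hashing throughput cuts the time by 10x, so an attacker with much faster hardware would need far less than 850 years for the full keyspace.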
Of course, it would be possible to try brute-force attacks against the “low hanging fruit”: postal codes with larger populations, with the age range narrowed down to 30–50. So, yep, it’s possible to get a hold of a few records, although that is probably not very useful.
Why am I publishing this?
Mainly because I think that this is interesting and I would like to know your opinions on this method.
Full disclosure: I’m not a nationalist, but I’m pro-independence. I would like for the referendum to take place, but I do value privacy and security. So, for me ensuring that this method is safe is paramount.