There are two broad approaches to logging: record every input as raw as possible, or clean up log events and user input before saving them. Both have pros and cons. Whichever path you choose, it is important to remember that choice when analysing the log events, so that you don't get any surprises along the way.
I’ll try to illustrate my point with the following AWS S3 Server Access Log example. Although the example is based on S3, please keep in mind that other application servers have similar “logging features”, so make sure you have a good overview of how your systems deal with event logging.
The AWS S3 Server Access Log format is quite similar to the Apache web server's access log. One important difference, however, is that AWS S3 server access logs are saved as raw data! No validation of the data and no escaping of non-printable symbols takes place before the events are saved to the log files. It is not a bug, but it is an important aspect to remember when analysing log files later on or when choosing your tools for log analysis. If your logs contain unescaped raw data, your analytical tools have to be ready to deal with malicious content or attacks against logging.
Let me explain with some examples. When you request a file from your AWS S3 bucket, an HTTP GET request is sent to AWS, the content of the file is returned and the event is logged to the server access logs.
GET /<bucket_name>/public.txt HTTP/1.1
Host: s3.eu-central-1.amazonaws.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 TESTIME Firefox/61.0
The log event that would be written is as follows:
<bucket_owner> <bucket_name> [30/Jul/2018:17:16:55 +0000] 328.496.13.534 - <request_id> REST.GET.OBJECT public.txt "GET /<bucket_name>/public.txt HTTP/1.1" 200 - 19 19 7 7 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 TESTIME Firefox/61.0" -
So far so good. But as I mentioned, non-printable characters are not escaped in AWS S3 logs. This means that when analysing your logs with command-line tools, you must keep in mind that non-printable symbols might be interpreted as control characters or escape sequences, so the events on the screen might not look the way they are written to the file. This can create confusion when looking at the log files. Let's illustrate this by making another request.
GET /<bucket_name>/public.txt?a=<08><08><08><08><08><08><08><08><08><08><08><08><08>semi.txt HTTP/1.1
Host: s3.eu-central-1.amazonaws.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 TESTIME Firefox/61.0
The <08> character in the request is the ASCII backspace character, written here in hex. The file is still returned as before, but the log event as shown on the screen is somewhat different:
[Figure: Log event for the unescaped backspaces]
Since the application server didn’t escape the backspaces before writing the event to the log file, the terminal interpreted each of them as a control character: it moved the cursor back 13 positions, and the text that followed overwrote the 13 characters that had been displayed there. It is important to understand that the removal took place only on the screen: the log file still contains the original text and the backspace characters. The terminal merely removed the characters from your view. Opening the same file with ‘vi’ or a hex editor reveals the truth:
[Figure: Log event seen in vi]
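To make the mechanism concrete, here is a minimal Python sketch that approximates how a terminal treats backspaces: each one moves the cursor left, and whatever is printed afterwards overwrites what was there. The function name `render_backspaces` and the shortened request line (with a literal `bucket` in place of the real bucket name) are assumptions made for illustration; a real terminal handles many more control characters.

```python
def render_backspaces(text: str) -> str:
    """Approximate what a terminal displays when text contains backspaces."""
    screen = []   # characters currently visible on the line
    cursor = 0    # current cursor column
    for ch in text:
        if ch == "\b":
            # Backspace only moves the cursor left; it erases nothing by itself.
            cursor = max(0, cursor - 1)
        else:
            if cursor < len(screen):
                screen[cursor] = ch      # overwrite what was displayed here
            else:
                screen.append(ch)        # extend the line
            cursor += 1
    return "".join(screen)


# Shortened, hypothetical version of the request line from the log event.
raw = "GET /bucket/public.txt?a=" + "\b" * 13 + "semi.txt HTTP/1.1"

print(render_backspaces(raw))  # what the analyst sees on the screen
print(repr(raw))               # what is actually stored in the file
```

The `repr()` output plays the same role as opening the file in vi or a hex editor: the backspaces show up as `\x08` instead of being acted upon.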
The situation could be even more confusing when using ‘grep’.
[Figure: Grepping the unescaped characters]
As you can see, I grepped for the parameter name, which is deleted when the line is displayed on the screen. Imagine stumbling upon this kind of event in the middle of an incident: grep matches on something that does not appear to be there (e.g. usernames or other user input values).
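One way to avoid this confusion is to never print a matching line verbatim. The following sketch is a hypothetical grep-like helper (`grep_visible` is an invented name) that returns matches passed through Python's built-in `ascii()`, so control characters are shown as escapes instead of being sent to the terminal.

```python
from typing import Iterable, List


def grep_visible(lines: Iterable[str], needle: str) -> List[str]:
    """Return the lines containing needle, with control characters made visible."""
    # ascii() renders non-printable and non-ASCII characters as escapes such as \x08.
    return [ascii(line) for line in lines if needle in line]


# Hypothetical sample: the second line hides "a=" behind two backspaces.
sample = [
    "GET /bucket/public.txt HTTP/1.1",
    "GET /bucket/public.txt?a=\b\bsemi.txt HTTP/1.1",
]

for match in grep_visible(sample, "a="):
    print(match)
```

The match and the displayed text now agree: the searched parameter is visible in the output, followed by the escaped backspaces.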
That is not all, though. Poisoning log files with ASCII control characters is not the only way to ruin command-line log analysis. Another approach uses ANSI escape sequences. You’ve probably used these escape sequences and colour codes to create fancy CLI screens in your terminal. The same approach can be applied here as well. Imagine a request like the following:
GET /<bucket_name>/public.txt HTTP/1.1
Host: s3.eu-central-1.amazonaws.com
User-Agent: <1b>[31;49mMozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 TESTIME Firefox/61.0<1b>[0;49m
Note the escape codes before and after the User-Agent value, where <1b> is the hex representation of the escape character (ESC). When grepping your logs you would see the following output on the screen:
[Figure: Coloured events]
Colour is not the end of it, either. ANSI escape sequences allow you to do much more than just colour the characters in your terminal. The codes in the log file can reposition the cursor on the screen, reconfigure your terminal settings and do lots of other “fun” stuff with it. Coming back to log evasion, consider the following request:
GET /<bucket_name>/public.txt HTTP/1.1
Host: s3.eu-central-1.amazonaws.com
User-Agent: <1b>[21D401<1b>[18CMozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 TESTIME Firefox/61.0
The escape sequences mean “move left 21 places, write 401 and move right 18 places”. In other words: overwrite the response code shown on the screen.
[Figure: Response changed with escape sequence]
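The numbers 21 and 18 follow from the layout of the log event. The sketch below recomputes them from the text that sits between the status code and the start of the User-Agent value in the example event shown earlier. It assumes, as an adversary would have to guess, that the fields in between (bytes sent, object size and the two timings) have exactly the widths seen in that example; the variable names are illustrative.

```python
ESC = "\x1b"

# Text from the start of the status code up to and including the opening
# quote of the User-Agent field, copied from the example log event.
between = '200 - 19 19 7 7 "-" "'

fake_status = "401"

# Move left far enough to land on the first digit of the real status code.
left = len(between)

# After writing the fake status, move right to where the cursor started,
# so the rest of the User-Agent is displayed in its usual place.
right = left - len(fake_status)

payload = f"{ESC}[{left}D{fake_status}{ESC}[{right}C"

print(left, right)      # cursor movements in columns
print(repr(payload))    # the bytes injected in front of the User-Agent
```

If any of the intermediate fields has a different width, the offsets no longer line up and the overwrite lands in the wrong place, which is why this only works against predictable responses.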
When just browsing your logs, you might be satisfied to see that all of the requests were answered with an HTTP 40x status. Imagine the surprise when an adversary tells you that they actually saw the file.
Or, if an adversary appended the escape sequence <1b>[2A (“move cursor up 2 lines”) to the User-Agent header value, the cursor would end up at the beginning of your log event, so that the next event overwrites it. In other words: your log event would be hidden from sight when using command-line tools like ‘cat’ or ‘more’. These are just a few examples to show what to consider when analysing logs.
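Sequences like these share a common shape (ESC, `[`, optional parameters, one final character), so they can be detected with a regular expression. The following is a minimal sketch; the pattern follows the general CSI layout from ECMA-48, while the function name `find_csi` and the small lookup table of final characters are choices made for this example and cover only the sequences used above.

```python
import re
from typing import List, Tuple

# ESC [ + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F) + final byte (0x40-0x7E)
CSI = re.compile(r"\x1b\[([0-?]*)([ -/]*)([@-~])")

# Only the final bytes that appear in this article; not a complete list.
MEANING = {
    "A": "cursor up",
    "C": "cursor forward",
    "D": "cursor back",
    "m": "select graphic rendition (colour)",
}


def find_csi(text: str) -> List[Tuple[str, str]]:
    """Return (parameters, final byte) for every CSI sequence found in text."""
    return [(m.group(1), m.group(3)) for m in CSI.finditer(text)]


user_agent = "\x1b[21D401\x1b[18CMozilla/5.0\x1b[2A"
for params, final in find_csi(user_agent):
    print(params, final, MEANING.get(final, "unknown"))
```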
Input validation is always important! Not only when writing an application but also when logging your application's behaviour. In the examples I gave, it’s mainly a matter of taste whether you do your validation when writing the log event or when loading the log events into your analysis environment. Just keep in mind that you have to do it somewhere, and design the rest of your log analysis environment accordingly.
There are pros and cons with both approaches; for example, keeping the original input is useful for understanding when someone is trying to poison your logs with unexpected input. However, instead of saving the data in its original format, you might consider escaping the non-printable symbols rather than whitelisting or ignoring them.
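As an illustration of escaping rather than dropping, here is a small Python sketch. It is one possible scheme, not a standard: the function name `escape_nonprintable` and the `\xNN` notation are assumptions, and the backslash itself is escaped so that an escaped value can be told apart from a literal one.

```python
def escape_nonprintable(value: str) -> str:
    """Replace non-printable characters with visible escapes, keeping the evidence."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")                  # keep the escaping unambiguous
        elif ch.isprintable():
            out.append(ch)                      # letters, digits, punctuation, space
        elif ord(ch) < 0x100:
            out.append(f"\\x{ord(ch):02x}")     # e.g. ESC becomes \x1b
        else:
            out.append(f"\\U{ord(ch):08x}")     # non-printable characters beyond Latin-1
    return "".join(out)


print(escape_nonprintable("\x1b[21D401\x1b[18CMozilla/5.0"))
print(escape_nonprintable("public.txt?a=\b\bsemi.txt"))
```

The attack attempt remains visible in the stored or displayed event, but the terminal no longer acts on it.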
It would also be beneficial to search your logs for such non-printable symbols from time to time, to see if someone is trying to evade your logging system.
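Such a periodic check could look like the sketch below. It works on raw bytes, so nothing is interpreted along the way, and reports which control bytes appear on which line. The function name `scan_for_control_bytes` and the in-memory sample are made up for the example; tab and the trailing line ending are deliberately not reported.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Control bytes except tab (0x09) and line feed (0x0a), plus DEL (0x7f).
CONTROL = re.compile(rb"[\x00-\x08\x0b-\x1f\x7f]")


def scan_for_control_bytes(lines: Iterable[bytes]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line number, control bytes found) for every suspicious line."""
    for lineno, line in enumerate(lines, start=1):
        body = line.rstrip(b"\r\n")  # ignore the line ending itself
        found = sorted({f"0x{m.group()[0]:02x}" for m in CONTROL.finditer(body)})
        if found:
            yield lineno, found


# Hypothetical sample; in practice, pass a file opened in binary mode ("rb").
sample = [
    b"GET /bucket/public.txt HTTP/1.1\n",
    b"GET /bucket/public.txt?a=\x08\x08semi.txt HTTP/1.1\n",
    b"User-Agent: \x1b[2AMozilla/5.0\n",
]

for lineno, found in scan_for_control_bytes(sample):
    print(lineno, found)
```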