Keep Your Logs Close And Your Code Closer

Lets talk about web applications and logging. For most people, when talking about log analysis or log events, the Apache log format is the first that comes to mind as a good example of a log event. But is Apache log content enough to understand what is going on in your system? Let’s take a closer look.

Custom output of Apache access log

When sending an ordinary GET request to your web server, everything seems perfect. For the following request

GET /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.eu

You will see a proper event description in the Apache access log:

423.619.551.990 — — [06/Jul/2018:14:31:52 +0300] "GET /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"

At first this looks quite good: you have the timestamp, IP address that the request originated from, URI that was visited and even the ID of the page that was requested. Or do you?

To understand if logging is enough you first have to know your application and its implmentation in your environment. To illustrate my example, I will use the popular LAMP setup and PHP parameter handling in the web application.

One of the common mistakes is to use the $_REQUEST associative array instead of GET or POST arrays to get the parameters sent to the page. I’m not going too deep about the differences and I’m not going to talk about CSRF and other harm this approach can bring to your application or customers. I’m only focusing on the logging part.

To illustrate my example, I’m creating a web page with the following content:

<?phpprintf("Current page ID is %d", intval($_REQUEST['page_id']) );?>

As a result of the last request, the output on the page would be: “Current page ID is 1”. So far so good, right?

Lets make another request. This time lets perform a POST request:

POST /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.euContent-Length: 9Content-Type: application/octet-stream application/x-www-form-urlencoded

page_id=2

and check the respective entry in the log file. The event is as follows:

423.619.551.990 — — [06/Jul/2018:14:40:13 +0300] "POST /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"

However — the output on the page tells us “Current page ID is 2”. How come the page ID on the page differs from the one in the access log event? Seems that the POST parameter was used, but why? I’ll get to that, but first things first.

How could we get the correct information to the logs, i.e. how could we log the correct page_id used by the application? One approach would be to log the POST parameters as well. This would give us the full payload but this might give you additional overhead on logging (e.g. when logging file content that are POST-ed) or it could create problems with compliance when saving all the post request parameters to the log file (including usernames, password, credit card numbers etc). But it’s doable, e.g. with parameter filtering. Could that solve the issue?

Lets try another example:

POST /index.php?page_id=1 HTTP/1.1Host: www.unionselect1.euCookie: page_id=3Content-Length: 9Content-Type: application/octet-stream application/x-www-form-urlencoded

page_id=2

This time I added the cookie to the equation. And yes, you are right: the log file still shows

423.619.551.990 — — [06/Jul/2018:14:44:05 +0300] "POST /index.php?page_id=1 HTTP/1.1" 200 795 "-" "-"

but this time the output on the page shows “Current page ID is 3”. Seems that this time, the Cookie value was used by the application.

A solution could be to log all the cookie parameters as well, but the amount that you’d log would only increase and would it actually help you? If at some point you will go through your logs, you still have to understand, which parameters were used by the application.

To illustrate, here’s a little quiz: what would the following request show on the page and what would you find in the log event?

POST /index.php?page_id=1&page_id=4 HTTP/1.1Host: www.unionselect1.euContent-Length: 29Content-Type: application/octet-stream application/x-www-form-urlencoded

page_id=2&page_id=5

(Hint: the page shows 5)

But how about this one?

POST /index.php?page_id=1&page_id=4 HTTP/1.1Host: www.unionselect1.euCookie: page_id=3; page_id=6Content-Length: 19Content-Type: application/octet-stream application/x-www-form-urlencoded

page_id=2&page_id=5

(Hint: it’s not 6, it’s 3)

Hang on, what is this sorcery!?!

Generated at https://imgflip.com/memegenerator/Jackie-Chan-WTF

To answer Jackie: it has to do with three factors:

The fact that the code uses the pre-mentioned $_REQUEST array, which combines GET, POST and COOKIE arrays, thus enables you to access all the request parameters from the same array.
The “request_order” parameter in PHP.ini which describes the order in which PHP registers GET, POST and Cookie variables into the $_REQUEST array.
The “variables_order” in PHP.ini which sets the order of the EGPCS (Environment, Get, Post, Cookie, and Server) variable parsing.

In other words: a combination of poor software development practices and application server configuration might lead to a situation where the information logged might not express the behavior of the application. The same code can work differently in different servers, depending on the configuration of PHP in the server by using different request parameters (depending on their order).

What is the solution?

By secretlondon123 (originally posted to Flickr as way out) [CC BY-SA 2.0], via Wikimedia Commons

Use logging within the application and log the input values that are actually used by the program. It’s a good idea to log in Apache as well (e.g. to log attacks with unexpected input values etc) but if you want to get an understanding about what is going on in the application, you must log on the application level and as close to the data processing as possible.
A friend once told me: “A good program does what it’s supposed to do. A secure program ONLY does what it’s supposed to do”. Use only expected parameters as input values and only from the sources you’re expecting them from. So instead of REQUEST use GET and POST when either is needed.
Know your systems, know your settings and test your implementation in practise. There is no time to figure out “which parameter did what” when you are in the middle of solving an incident and are already analysing logs. You have to know how your application works in advance.