Shaping attacks in low latency networks, or why Tor won’t stop government agencies

Because the potential de-anonymization of Tor users is being widely discussed throughout Internet, I’m publishing the “print” version of the abstract for my presentation from PHDays (Positive Hack Days) 2014 conference. The attack described below is not new, is not Tor-specific and can be used against any other low latency source-hiding tools like VPNs, proxy chains, or even a combination of many. Since there are more than 3 years passed, some implementation details could change, but general attack concept does not depend on implementation.

Timing attack susceptibility is a known vulnerability for the Tor network, and has been widely discussed, you can find around a dozen articles dealing with this issue alone. So why write another one? There exists the popular misbelief that attacks of this kind always require statistical analysis, are quite difficult to implement and are theoretical. The articles around right now are devoted in particular to this class of attacks (statistical analysis). Today we’re going to check out a realistic scenario where you may only need a single query to de-anonymize a network user accessing a resource or to discover a hidden service. We’re going to review a scenario that may be implemented for Tor, based on the following condition being met:

There is the opportunity to intervene in traffic between the exit node and destination server. This can be done by accessing the end server, exit node or any other point where traffic is passing between them. In other words, this condition can be met by anyone who has arranged a sufficient number of their own exit nodes or e.g. it can be an institution capable to monitor and interact with all cross-border traffic or otherwise control the traffic from sufficient number of exit nodes.
There is passive access to traffic between the network client and entry node. That is, somebody is able to sniff the traffic passively on bulk number of broadband connections or on sufficient number of entry nodes, or, again, cross-border traffic.

So what do government agencies have to do with all this? Most countries have regulations like US CALEA or Russian SORM which require broadband and another ISPs to install hardware capable to intercept the broadband traffic and perform deep packets inspection. There are known cases this equipment was used for mass internet traffic surveillance, e.g. in Hepting v. AT&T case NSA used Narus (currently part of Symantec) hardware and software for all broadband traffic DPI. Currently, USA Freedom Act and Russian legislation* limit the ability of agencies to perform mass surveillance, but, like any legislation, it has backdoors like FISC. FISC can authorize mass surveillance and did it before. In over countries including UK and China mass surveillance is allowed and performed by government agencies.

(*In 2016 Yarovaya law was introduced in Russia. Yarovaya low requires service providers to collect bulk data and, while it doesn’t directly authorize government agencies to use this data for mass surveillance, this process cannot be controlled by any independent side.)

Tracing the presence of monitoring hardware “from the outside” is hardly possible, as it generates no native traffic and has no impact on the traffic being monitored.

The access of the government agency to surveillance hardware meets the second condition for the attack’s implementation (or sometimes both conditions, if this equipment is active). This means that the intelligence services accessing the hardware and capable of arranging a number of exit nodes in Tor or otherwise modify the traffic have all the required tools, including the desire to do so.

Principle of the attack

An exit node implements predefined changes in traffic that do not modify any data being transferred, but influence the traffic’s “shape” — the size of sent packets is changed, as well as the latency between packets. Timing specifications of low latency networks are only slightly modified, which is the fact usually exploited for timing attacks. Packet size changes in quite a predictable way, as we’ll see later. This means that by re-parsing traffic for packets with a predetermined size sequence and latency sequence, you can tag traffic from the exit node side in such a way that enables detecting this tagging from the entry node side. Thus, the connection or the query addressed to the server can be matched with a Tor user. Moreover, together with the traffic you can transmit the information intended for the listener, for instance, the client query’s unique identifier. That means that if you have two recognizable packet sizes and two recognizable time latencies, you can secretly pass two bits of information to the listener of the encrypted traffic between a user and entry node, by adding these bits to each data packet starting from the second. In fact, you can add a greater number of recognizable states, but there are still some additional restrictions we’ll check out below.

Is it difficult to implement, and how can it be implemented practically?

I carried out the attack by redirecting the exit node traffic to a proxy server that was previously slightly modified so we could arrange traffic shaping based on a predetermined pattern. So what does “ordinary” Tor traffic look like? This is the snippet of traffic from the entry node to the client, associated with sending HTTP query results without adding any tags to the traffic.

pic.1 — how tor connection is seen to observer

Tor uses 512-byte cells to mitigate packet size attacks. This cells are incapsulated into TLS. TLS itself uses the same approach, but inside TLS data potion is usually refered as a “blob”.

TLS traffic is occurring that contains data blobs (packets) that primarily have a size of 3,648 octets. The blob size is determined based on the number of Tor traffic cells with a fixed size that the blob contains. On the TCP level, blobs are parsed into IP packets with a typical size of 1,414 octets, which is due to the Path MTU. A TCP packet may contain either a fragment of a single blob, or the end of the previous blob together with the start of the next. However, there may be blobs with a size of 560 octets (1 tor cell). How data is packet into tor cells, tor cells are packet into blobs and blobs into TCP packets depends on various parameters like server timings, size of buffers used to pass the data, Nagle algorithm and network latencies. When re-sending the query, the picture may differ slightly. However, from the point of view of statistics, a query sent to the same server will have a distinct picture. When loading a website, quite a large amount of queries is transmitting basically the same requests and replies, that is, standard amounts of data with typical latencies. This enables data matching in order to discover the specific resource the user is querying, especially if they visit the resource on a regular basis. This is what classical timing attacks are based on. But we’ll be going about it our own way. Instead of passively measuring the timing, a shaping tag (maRk) (because russian word “МАЯК” means “beacon”) may be added to the source traffic from the exit node side. Clean, unencrypted traffic by-passing the Tor network now looks as follows:

pic.2 — maRk inserted into unencrypted traffic

How is tagging carried out? Here we’re transmitting small packets of two different sizes. In this case, a size difference of several hundred bytes has no impact on anything, but enables to visually distinguish two various packet types in a series. Latencies of 60 and 110 ms between the two types of packets are specifically set to present the most easily notable output picture. When the same traffic passes through the Tor network, it looks as follows during the transmission between the entry node and the user:

pic.3 — Tor traffic between entry node and client with maRk inserted on exit node

So, what do we see? TCP packet from entry node to client follow 1414–389–619 octets (link level) pattern and ~60–60–220 ms delay pattern with some minor fluctuations. All blobs in TLS now have the size of 560 octets (single tor cell). Each packet we’ve sent in open network comes in as an individual blob in TLS. At the same time, an IP packet with the size of 1,414 octets contains two complete blobs/cells and the start of the third blob/cell, a packet of 389 octets carries the end of the third blob/cell and a packet of 619 octets contains a separate fourth blob/cell. That is, four IP packets of the source traffic come in as three IP packets in Tor traffic. Is this good or bad? We’ve just lost initial size and timing information. But, looking at the traffic closer, we can clearly see size and traffic signature and, what is most impressive, we can predict it and it will be same on the most of the tor chains.

So what happened and why is this sequence so strange? This is due to how TCP stacking works, namely the collaboration between the Nagle algorithm and TCP delayed acknowledgements. However, the interval between packet groups and packets within a group remains unchanged. Therefore, we are still retrieving > 50% of timing information. In the “fight against” waiting and grouping associated with the Nagle algorithm and to create different patterns, we can send such an amount of traffic that the size of transmitted blobs was larger than the MTU size (but not large enough to make a “big” blob of 3,648 bytes); send 1 packet (for Nagle delay) or 2 packets to avoid delay; use timings above and below Nagle’s timeout. When you compare the third picture with the first, it is easy to see that a clear and easily detectable anomaly is added to the traffic, which enables the surveillance equipment to alert when and where this anomaly is detected.

This attack can be implemented against virtually any encrypted or unencrypted connection, both in the server-client and client-server direction, that is, it can be used to detect hidden onion services.

What restrictions exist?

Due to the use of the Nagle algorithm and network fluctuations, non-linear traffic behavior can result in a partial timing information loss. However, such a loss may be detected on the receiver’s side and compensated by transferring redundant data. To enable traffic shaping, the amount of traffic must be sufficient. This type of tagging may seem quite difficult for an SSH connection, for instance, with the bash initialized and regular command output. This is because the amount of transmitted data is insufficient to create a packet with required maRk signatures. In fact, you can even tag a connection where data weren’t transmitted at all. The matter at hand is that even after the TCP connection is closed by a client application, the delivery of data sent to the client via the Tor network will still continue, because this is valid state for TCP (so called “half-closed connection”). This enables the sending of a tagged portion of random data to the client after FIN+ACK is received as part of the connection from the client side. This data will never be read by a client application, but it still reaches the client, thus disclosing their identity. Therefore, the attack may be implemented completely in secret from the client, and the amount of data available to tag the connection is more than sufficient. A similar method can be applied for the majority of VPNs. Luckily, it won’t be effective with proxies or any other application-level gateway.

Is there a reliable solution available?

An attack can be hindered by the client’s current activity, as you have to detect packets associated with the same chain, and side traffic noise-masks the signature. You can also act as a relay in the Tor network, which will further hinder the detection of tagged traffic. There are also some other methods to impede attacks: a combination of Tor, a VPN and proxy chain makes it difficult to guess the final shape of traffic and can partially thwart an attack through a half-closed connection. You can also hinder detection by corrupting known signatures using certain non-standard TCP stack parameters. However, there is no reliable method to completely eliminate such threats in the Tor network or other popular VPNs. Shaping attacks as a sort of timing attack are outside the existing safety profile of these networks. The only reliable solution is to use a virtual link inside an encrypted connection with fixed bandwidth, where the steady stream of fixed-length cells is transmitted. For instance, ATM (Asynchronous Transfer Mode) networks function this way. You should also keep in mind that encryption must be done without data compression, which means the consumed bandwidth must remain unchanged. These technologies are still not suitable for everyday operations, as there are excessive extra costs from the bandwidth used even during downtime.