A limited number of relay groups can see you enter and exit the Tor network (deanonymization).
TL;DR: If you want the list of relevant tor relays, go to the bold URL near the end of this page.
When a Tor client routes traffic through the Tor network, it tries to select three (mostly) “random” relays: a guard, a middle and an exit relay. The number of relays is crucial. Using 3 hops should help reduce the risk that a single entity (excluding global passive adversaries) can “see” Alice’s real IP address and the fact that she is talking to Bob. Alice’s protection is lost should her tor client use relays controlled by a single entity (there are multiple cases; we mainly talk about the worst case, where the operator sees both the entry and the exit connection).
There are multiple safeguards and properties in the Tor client to help reduce the risk of Alice losing Tor’s traffic analysis protection due to unlucky path selection:
A default Tor client enforces distinct subnets: a /16 IPv4 network block is treated as a “single relay operator”. Alice would not establish a circuit like 10.1.2.3 (Guard) → 10.2.3.4 (Middle) → 10.1.200.5 (Exit), because that would violate the distinct subnets protection (more than one relay in 10.1.0.0/16).
In addition to the IP address check, a Tor client never uses more than one relay from a given “family” in the same circuit. This safeguard depends on the relay operators actually declaring their group of relays in their configuration.
A tor client only uses relays with the guard flag as its first hop (ignoring bridges for now). Guards stay the same for multiple months before the client rotates them.
A tor client can only use relays with the exit flag to connect to the actual destination. (The guard and exit flags are not in place to mitigate this risk specifically, but in practice they help as well because relay operators tend to run exits or non-exits exclusively, so they cannot see both ends.)
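The four safeguards above can be sketched as a small path check. This is only an illustration, not tor's actual path selection code: the relay dictionaries, their keys (nick, fp, ip, flags, family) and the function names are made up for this example, and the addresses are placeholders.

```python
import ipaddress


def same_slash16(ip_a, ip_b):
    """Return True if two IPv4 addresses fall into the same /16 block."""
    block = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in block


def path_violations(guard, middle, exit_relay):
    """List the reasons a guard -> middle -> exit path would be rejected."""
    problems = []
    if "Guard" not in guard["flags"]:
        problems.append("first hop lacks the Guard flag")
    if "Exit" not in exit_relay["flags"]:
        problems.append("last hop lacks the Exit flag")
    hops = [guard, middle, exit_relay]
    for i, a in enumerate(hops):
        for b in hops[i + 1:]:
            if same_slash16(a["ip"], b["ip"]):
                problems.append(f"{a['nick']} and {b['nick']} share a /16")
            # Assumption of this sketch: a family only counts when both
            # relays declare each other.
            if b["fp"] in a["family"] and a["fp"] in b["family"]:
                problems.append(f"{a['nick']} and {b['nick']} share a family")
    return problems


# Hypothetical relays; nicknames, fingerprints and addresses are placeholders.
guard = {"nick": "relayG", "fp": "A" * 40, "ip": "10.1.2.3",
         "flags": {"Guard"}, "family": set()}
middle = {"nick": "relayM", "fp": "B" * 40, "ip": "10.2.3.4",
          "flags": set(), "family": set()}
exit_relay = {"nick": "relayE", "fp": "C" * 40, "ip": "10.1.200.5",
              "flags": {"Exit"}, "family": set()}

print(path_violations(guard, middle, exit_relay))
# ['relayG and relayE share a /16']
```

Note that the family check can only ever fire if the operator declared the family in the first place, which is exactly the weak spot discussed in the rest of this post.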
Are there relay groups that despite these safeguards could be used in multiple positions in a circuit?
Yes, but I guess most are probably not in that position because they want to harm you.
Finding relay groups in end-to-end correlation position
To identify groups of relays one can use the ContactInfo field of relays. This is the easiest method, but keep in mind that this field is completely optional and a relay operator can provide an arbitrary string. That means anyone can set up relays with other people’s contact addresses. A relay operator would notice that when we contact them to update their family configuration.
We use the contact string to group relays with the same (or similar) contact. Once we find groups we check if they meet all the requirements for end-to-end correlation (multiple netblocks, no or incomplete family configuration, guard and exit probability > 0%).
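A rough sketch of that grouping step could look like the following. It is not the code behind OrNetStats: the record layout (fp, contact, ip, family, guard_prob, exit_prob) and all function names are assumptions for illustration, and it only matches contacts that are identical after lowercasing and whitespace cleanup, whereas "similar" contacts would need fuzzier matching.

```python
from collections import defaultdict


def normalize_contact(contact):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(contact.lower().split())


def slash16(ip):
    """Return the first two octets of an IPv4 address as a /16 key."""
    return ".".join(ip.split(".")[:2])


def family_complete(members):
    """True if every relay lists all other relays of the group as family."""
    fps = {r["fp"] for r in members}
    return all(fps - {r["fp"]} <= set(r["family"]) for r in members)


def e2e_groups(relays):
    """Return {contact: [fingerprints]} for groups meeting all requirements."""
    groups = defaultdict(list)
    for relay in relays:
        if relay["contact"]:  # relays without ContactInfo cannot be grouped
            groups[normalize_contact(relay["contact"])].append(relay)

    flagged = {}
    for contact, members in groups.items():
        if len(members) < 2:
            continue
        multiple_netblocks = len({slash16(r["ip"]) for r in members}) > 1
        guard_prob = sum(r["guard_prob"] for r in members)
        exit_prob = sum(r["exit_prob"] for r in members)
        if (multiple_netblocks and not family_complete(members)
                and guard_prob > 0 and exit_prob > 0):
            flagged[contact] = sorted(r["fp"] for r in members)
    return flagged
```

Grouping by the raw contact string keeps the method simple, at the cost of inheriting the weakness described above: the field is unauthenticated, so the result is a hint about who to contact, not proof of common ownership.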
Since this list changes over time, it would be a bit pointless to list these relays here. Instead, I’m pointing you to the regularly updated list on my OrNetStats page:
Note: I’m not saying that these groups actually perform any deanonymization attacks but we should not have to trust any individual tor relay operators (completely). That is a core principle of the Tor design.
Am I affected?
If you use one of the relays in the above list as your entry guard relay (static over multiple months), then you might sooner or later also use one of their exit relays (which change frequently).
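To put a rough number on "sooner or later", here is a back-of-the-envelope sketch. It assumes every circuit picks its exit independently and that the group's exit probability stays constant, which simplifies real path selection (exit policies, bandwidth weights and changing consensus data are ignored). The 1% exit probability in the example is a made-up value, not a measurement.

```python
def prob_shared_exit(exit_prob, circuits):
    """Chance that at least one of `circuits` circuits exits via the group.

    exit_prob: assumed per-circuit probability of choosing one of the
               group's exits (hypothetical input, between 0 and 1).
    """
    if not 0.0 <= exit_prob <= 1.0:
        raise ValueError("exit_prob must be between 0 and 1")
    if circuits < 0:
        raise ValueError("circuits must not be negative")
    return 1.0 - (1.0 - exit_prob) ** circuits


# Hypothetical example: a group with 1% exit probability, 100 circuits.
print(round(prob_shared_exit(0.01, 100), 3))
# 0.634
```

Even a small per-circuit probability adds up quickly once the guard position is fixed for months.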
To find out whether you currently use them as your entry guard, you can search for these fingerprints in your tor “state” file located in your tor client’s “DataDirectory”. The probability that you use them is around 0.9% as of 2017-05-09. Even if they are not in the state file, this might affect you in the future when your guard is rotated.
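A minimal sketch of that search follows. It does not rely on the exact layout of the state file; it simply looks for 40-character hex fingerprints anywhere in the text. The function name and the example path are assumptions, and your DataDirectory may be somewhere else. A match means the fingerprint is recorded in the state file, which, depending on your tor version, does not necessarily mean that relay is the guard you are using right now.

```python
import re
from pathlib import Path


def guards_in_state(state_text, fingerprints):
    """Return the listed fingerprints that also occur in the state file text."""
    wanted = {fp.strip().lstrip("$").upper() for fp in fingerprints}
    # Relay fingerprints are 40 hex characters; match them case-insensitively
    # by uppercasing the whole text first.
    found = set(re.findall(r"\b[0-9A-F]{40}\b", state_text.upper()))
    return sorted(wanted & found)


def check_state_file(state_path, fingerprints):
    """Read a state file from disk and run the search on its contents."""
    text = Path(state_path).read_text(encoding="utf-8", errors="replace")
    return guards_in_state(text, fingerprints)


# Usage (hypothetical path, fingerprints copied from the list):
# matches = check_state_file("/var/lib/tor/state", ["<fingerprint>", ...])
# print(matches or "none of the listed relays found")
```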
Try to contact relay operators and ask them to add proper MyFamily configurations to their relays. I have been doing this occasionally for quite some time (that is also a reason why the list is shorter than it used to be), but some relay operators do not respond or have invalid or no contact info (and to be honest, this is a boring task and I didn’t want to auto-generate these emails). If you do reach out to them, please be kind and also thank them for running relays (and put firstname.lastname@example.org or me in CC: so we can track who was contacted already). All operators on this list (with usable contactinfo) as of 2017-05-09 have been contacted (at least once).
Raise awareness of the importance of proper MyFamily configuration (this blog post).
(Update 2017-05-09): Make MyFamily easier for relay operators. The current MyFamily design requires a relay operator to modify all relay configurations whenever they add a single new relay. That is cumbersome and one of the reasons why MyFamily configurations are often not updated. Proposal242 could help with that. Another option is automation for relay operators, so they do not have to worry about that setting at all.
Technically it is possible to configure a tor client to not use these relays (ExcludeNodes, StrictNodes), but that does not scale and needs constant updating, and if not everyone excludes them, those who do might become more unique than others. You do not want to become the unique tor client. In reality it might be hard to single out tor clients that excluded 0.9% of guard capacity in their tor configuration. Another problem is that the list is based on unauthenticated contact information, so an attacker could trick you into excluding good relays (an unlikely attack, since the attacker would have to keep these relays running).
I also tried to contact the Tor directory authorities, without much success (only one out of 7 replied). (One soon-to-be tor directory authority operator has actually been on the list of end-to-end groups for a long time as well.)
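As a footnote to the MyFamily point above (the 2017-05-09 update), here is a small sketch of the kind of automation an operator could use. It is not an official tool; the function name is made up, and it assumes each relay lists the fingerprints of all its siblings, each prefixed with "$". It also shows why the current design is cumbersome: adding one relay changes the line for every existing relay.

```python
def myfamily_lines(fingerprints):
    """Return {fingerprint: torrc MyFamily line} for a group of relays."""
    fps = sorted({fp.strip().lstrip("$").upper() for fp in fingerprints})
    if len(fps) < 2:
        return {}  # a single relay has no family to declare
    return {
        fp: "MyFamily " + ",".join("$" + other for other in fps if other != fp)
        for fp in fps
    }


# Hypothetical fingerprints for three relays of one operator.
before = myfamily_lines(["A" * 40, "B" * 40, "C" * 40])
# The operator adds a fourth relay ...
after = myfamily_lines(["A" * 40, "B" * 40, "C" * 40, "D" * 40])
# ... and every previously deployed torrc now needs a new MyFamily line.
changed = [fp for fp in before if before[fp] != after[fp]]
print(len(changed))
# 3
```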