For over a decade a software project called Tor has been helping people to stay anonymous on the Internet. This isn't something dodgy: Tor has been designed to help people living under oppressive regimes to access and disseminate information without risk. But now a team of computer scientists has revealed a vulnerability in the system. By looking for patterns in the data transferred between computers, they can identify, with a high degree of accuracy, information that Tor aims to hide, and they don't even need to break the code Tor uses to encrypt messages.
"Anonymity is considered a big part of freedom of speech now," says Albert Kwon, a graduate student at the Massachusetts Institute of Technology (MIT) and a member of the team. "The Internet Engineering Task Force is trying to develop a human-rights standard for the Internet, and as part of their definition of freedom of expression, they include anonymity. If you're fully anonymous, you can say what you want about an authoritarian government without facing persecution." It's important to keep systems like Tor, which is used by 2.5 million people each day, safe from attack. Fortunately, the team that revealed the vulnerability have also suggested a remedy.
Layer upon layer
Sitting atop the ordinary Internet, the Tor network consists of Internet-connected computers, run by volunteers, that relay traffic using the Tor software. If a Tor user wants to, say, anonymously view the front page of The New York Times, his or her computer will wrap a web request in several layers of encryption and send it to one of these computers, selected at random. That computer, known as the guard, will "peel off" the first layer of encryption and forward the request to another randomly selected computer in the network. That computer peels off the next layer of encryption, and so on; a typical chain consists of three computers.
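The wrapping and peeling can be sketched in a few lines of Python. This is a toy illustration with loudly stated assumptions: the per-relay keys (`guard-key` and so on) are made up, and the "encryption" is a keystream derived from SHA-256 and XORed with the data, which is not secure. Tor itself negotiates a separate key with each relay and uses a real cipher. What the sketch does show is that each relay, holding only its own key, removes exactly one layer, and the request only becomes readable once every layer is gone.

```python
import hashlib
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (SHA-256 in counter mode).
    Toy construction for illustration only: NOT secure encryption."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer: XORing with the same keystream twice undoes it."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, hop_keys: List[bytes]) -> bytes:
    """Client side: add one layer per relay, the exit's layer first."""
    onion = message
    for key in reversed(hop_keys):
        onion = xor_layer(key, onion)
    return onion


def peel(onion: bytes, key: bytes) -> bytes:
    """Relay side: strip exactly one layer with this relay's own key."""
    return xor_layer(key, onion)


# Hypothetical keys for a three-relay chain: guard, middle, exit.
keys = [b"guard-key", b"middle-key", b"exit-key"]
request = b"GET https://www.nytimes.com/"

onion = wrap(request, keys)
after_guard = peel(onion, keys[0])          # still unreadable
after_middle = peel(after_guard, keys[1])   # still unreadable
after_exit = peel(after_middle, keys[2])    # request exposed at the exit
print(after_exit)  # b'GET https://www.nytimes.com/'
```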
The last computer in the chain, called the exit, peels off the final layer of encryption, exposing the request's true destination: The New York Times. The guard knows the Internet address of the sender, and the exit knows the Internet address of the destination site, but no computer in the chain knows both. This scheme, with its successive layers of encryption, is known as onion routing, and it gives the network its name: "Tor" is an acronym for "the onion router".
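The claim that no computer in the chain knows both ends can be checked with a small model. In this sketch, which is an illustration rather than anything taken from Tor's code, a circuit is just a list of names (the labels `user`, `guard` and so on are hypothetical), and each relay is credited with seeing only its immediate neighbours.

```python
from typing import Dict, List, Tuple


def neighbour_view(path: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each relay to the two parties it can observe directly:
    the computer it receives from and the one it forwards to."""
    view = {}
    # The first and last entries are the sender and destination, not relays.
    for i in range(1, len(path) - 1):
        view[path[i]] = (path[i - 1], path[i + 1])
    return view


circuit = ["user", "guard", "middle", "exit", "nytimes.com"]
view = neighbour_view(circuit)
print(view["guard"])  # ('user', 'middle')
print(view["exit"])   # ('middle', 'nytimes.com')

# No single relay sees both the sender and the destination.
sender, destination = circuit[0], circuit[-1]
print(any(sender in seen and destination in seen for seen in view.values()))  # False
```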
In addition to anonymous Internet browsing, Tor also offers what it calls hidden services. A hidden service protects the anonymity not just of the person browsing, but of the destination site, too. Say, for instance, that someone in an oppressive regime wishes to host a site archiving news reports from other countries, but doesn't want it on the public Internet. Using the Tor software, the host's computer identifies Tor routers that it will use as introduction points for anyone wishing to access its content. It then publishes the addresses of those introduction points in a directory on the network, without revealing its own location.
If another Tor user wants to browse the hidden site, both his or her computer and the host's computer build Tor-secured links to the introduction point, creating what the Tor project calls a circuit: a chain of encrypted links between computers in which each computer knows only which computer it receives data from and which it passes data to, never the full path. Using these circuits, the user and the host agree on yet another router in the Tor network, known as a rendezvous point (in practice the user's computer chooses it and passes its identity to the host through the introduction point), and each builds a second circuit to it. The location of the rendezvous point, unlike that of the introduction point, is kept private, and the two circuits meeting there are used to exchange information. (You can find out more about how hidden services work on the Tor website.)
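How the two circuits meet can be modelled in the same spirit. This is a deliberately simplified sketch: the relay names `g1`, `m1` and so on are hypothetical, and each side is given just a guard and one middle relay before the rendezvous point, whereas real Tor circuits may be longer. Joining the two circuits at the rendezvous point shows that it sits between two ordinary relays and never touches the user or the host directly.

```python
from typing import List, Tuple


def splice_at_rendezvous(user_hops: List[str], host_hops: List[str], rp: str) -> List[str]:
    """Join the user's circuit and the host's circuit at the rendezvous point.
    Each hop list starts at that side's own guard and ends just before `rp`."""
    return ["user"] + user_hops + [rp] + list(reversed(host_hops)) + ["host"]


def neighbours(path: List[str], node: str) -> Tuple[str, str]:
    """Return the two computers adjacent to `node` on the joined path."""
    i = path.index(node)
    return path[i - 1], path[i + 1]


path = splice_at_rendezvous(["g1", "m1"], ["g2", "m2"], "RP")
print(path)                     # ['user', 'g1', 'm1', 'RP', 'm2', 'g2', 'host']
print(neighbours(path, "RP"))   # ('m1', 'm2')
print(neighbours(path, "g2"))   # ('m2', 'host')
```

Note that the host's own guard (`g2` here) is the one relay directly adjacent to the host, which is why the attack described next depends on occupying the guard position.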
Traffic fingerprinting
The attack devised by the researchers requires the attacker's computer to serve as the guard on a Tor circuit. Since guards are selected at random, if an attacker connects enough computers to the Tor network, the odds are high that, at least on some occasions, one or another of them would be well-positioned to snoop in the role of guard.
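The "enough computers" argument is a standard probability calculation. If a fraction f of guard choices lands on attacker-controlled computers, and a user makes n independent guard choices, the chance that at least one of them is the attacker's is 1 - (1 - f)^n, which climbs towards 1 as n grows. The sketch below assumes independent choices and a made-up value of f; real Tor clients keep the same small set of guards for long periods, partly to slow exactly this kind of accumulation, so the numbers are only indicative.

```python
def prob_attacker_guard(attacker_share: float, selections: int) -> float:
    """Chance that at least one of `selections` independent guard choices lands on
    an attacker's computer, if each choice does so with probability `attacker_share`."""
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    return 1.0 - (1.0 - attacker_share) ** selections


# Hypothetical: the attacker's computers attract 1% of guard choices.
for n in (1, 10, 100, 500):
    print(n, round(prob_attacker_guard(0.01, n), 3))
```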
During the establishment of a circuit, computers on the Tor network have to pass a lot of data back and forth. The researchers showed that simply by looking for patterns in the number of packets passing in each direction through a guard, they could, with 99 percent accuracy, determine whether the circuit was an ordinary Web-browsing circuit, an introduction-point circuit, or a rendezvous-point circuit. Breaking Tor's encryption wasn't necessary. The patterns aren't spotted by people, but by machine-learning algorithms, which learn to recognise the characteristic traffic of each type of circuit from examples.
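The classification step can be illustrated with one of the simplest machine-learning methods, a nearest-neighbour classifier. This is a hedged sketch, not the researchers' method, whose exact features and algorithms are not described here. It assumes a guard records the direction of each of the first few cells (Tor's fixed-size packets) on a circuit, +1 for outgoing and -1 for incoming, and labels a new circuit with the type of the most similar known example. The three training traces are invented for illustration and are not data from the study.

```python
from typing import List, Sequence, Tuple

Trace = Tuple[int, ...]  # +1 = cell sent outwards, -1 = cell coming back


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length direction sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(trace: Trace, training: List[Tuple[Trace, str]]) -> str:
    """1-nearest-neighbour: return the label of the closest training trace."""
    _, label = min(training, key=lambda item: hamming(trace, item[0]))
    return label


# Made-up toy traces of the first 8 cells seen at a guard; not data from the study.
training = [
    ((+1, -1, +1, -1, +1, -1, +1, -1), "general"),
    ((+1, -1, +1, -1, +1, +1, -1, -1), "introduction"),
    ((+1, -1, +1, -1, +1, -1, -1, -1), "rendezvous"),
]

observed = (+1, -1, +1, -1, +1, +1, -1, +1)
print(classify(observed, training))  # introduction
```

With real traffic, such a classifier would be trained on many labelled circuits, and its accuracy measured on circuits it had not seen before.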
Furthermore, by using a Tor-enabled computer to connect to a range of different hidden services, they showed that a similar analysis of traffic patterns could identify those services with 88 percent accuracy. That means that an adversary who lucked into the position of guard for a computer hosting one of those hidden services could, with 88 percent accuracy, identify it as the service's host. Similarly, a spy who lucked into the position of guard for a user could, with 88 percent accuracy, tell which of those hidden services the user was accessing.
The aim of the research wasn't to break Tor, but to identify weak spots so they could be fixed. That's why the team also recommended a way of defending against this type of attack. "We recommend that they mask the [data] sequences so that all the sequences look the same," says Mashael AlSabah, a member of the team from the Qatar Computing Research Institute (QCRI). "You send dummy packets to make all five types of circuits look similar."
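The dummy-packet defence can be sketched as follows. It is an illustration under stated assumptions, not Tor's actual countermeasure: every circuit is made to follow the same hypothetical template of cell directions, a real cell is sent when the next slot in the template points its way, and dummy cells fill every other slot. The sketch also assumes both ends of a link cooperate, since a client cannot itself create cells travelling towards it. An observer at the guard then sees the same sequence for every type of circuit.

```python
from typing import List, Tuple


def pad_to_template(real: List[int], template: List[int]) -> List[Tuple[int, bool]]:
    """Send cells in exactly the order given by `template` (+1 out, -1 in).
    A real cell goes out when its direction matches the next template slot;
    otherwise a dummy cell fills the slot. Returns (direction, is_dummy) pairs."""
    sent: List[Tuple[int, bool]] = []
    i = 0  # index of the next real cell still waiting to be sent
    for direction in template:
        if i < len(real) and real[i] == direction:
            sent.append((direction, False))
            i += 1
        else:
            sent.append((direction, True))
    if i < len(real):
        raise ValueError("template too short to carry every real cell")
    return sent


template = [+1, -1] * 6  # hypothetical common shape shared by every circuit
intro = pad_to_template([+1, -1, +1, +1, -1], template)
rendezvous = pad_to_template([+1, -1, -1, -1], template)

# An observer sees only directions, and both circuits now look identical.
print([d for d, _ in intro] == [d for d, _ in rendezvous])  # True
```

The price is extra bandwidth for the dummy cells and some delay for real cells waiting for a matching slot, the usual trade-off with padding defences.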
Kwon and AlSabah devised their attack with Srini Devadas and David Lazar, both at MIT, and Marc Dacier of QCRI. They will present their work at the USENIX Security Symposium in Washington D.C. this month.