Full Title
Supervised Deanonymization of Dark Web Traffic for Cybercrime Investigation

Cybercrime is escalating to unprecedented levels. Perpetrators often communicate on the Internet using highly sophisticated anonymization systems that allow them to thrive without being tracked by law enforcement authorities. Tor is by far the most popular of such systems. What makes Tor communications so hard to trace is that Tor relies on a large-scale network of servers, called relays, that employ advanced encryption and complex traffic obfuscation techniques. For this reason, although anonymous networks play a vital role on the Web in protecting user privacy and allowing censorship-free access to information, they can also be used as a backbone of the so-called Dark Web, providing a key technological pillar that sustains the flourishing ecosystem of cybercrime.
To assist law enforcement authorities in cybercrime investigation, our recent research efforts have led to the development of a traffic analysis technique based on Machine Learning (ML) that can deanonymize pairs of flows between clients and targeted Tor Onion Services (OSes). Onion Services are the central technology that provides strong anonymity between clients and websites on the Dark Web. Many illicit OSes take advantage of these features to promote drug dealing, human trafficking, child pornography distribution, and many other illegal businesses. By evaluating our new traffic analysis techniques on a controlled testbed comprising multiple Onion Services and clients, we have successfully deanonymized 14% of all circuits to OSes with zero false positives. Prompted by these promising results, our ultimate goal is to put this technology at the service of law enforcement authorities such as Interpol or Europol in the form of a tool that can help them deanonymize illicit OSes in a responsible and supervised manner.
There are, however, several limitations in our current approach that require deeper investigation. First, in a real-world deployment, our techniques require exchanging traffic-derived data between multiple ISPs, potentially based in several countries. This requirement may hamper the adoption of our technology given the existing legal restrictions in most countries, not to mention the conspicuous ethical implications of these operations. Second, our traffic analysis engine is based on a complex sequence of data processing stages involving ML models based on Deep Neural Networks (DNNs). Although DNNs are highly effective from the viewpoint of accuracy, they are quite limited in terms of latency and throughput. Third, unless these techniques are properly secured and regulated, they may allow for deanonymizing the communications not only of criminal suspects but also of innocent users. These risks raise serious ethical concerns.
In this project, named DAnon, we aim to develop new techniques to overcome these challenges. Driven by our ultimate goal of building a practical cybercrime investigation tool for analyzing Dark Web traffic, this work will advance the state of the art on cutting-edge topics in privacy-preserving computation, machine learning, and “ethical-by-design” systems. Concretely, to tackle the aforementioned challenges, we propose to investigate and combine three complementary approaches: (i) employ secure multiparty computation (MPC) protocols to enable privacy-preserving Tor OS traffic correlation, (ii) develop ML optimization techniques based on model compression and depth reduction to reduce the query latency, and (iii) incorporate quorum-based consensus between participating law enforcement agencies to reduce the chances of deanonymization abuses.
By extending our preliminary work, we will deliver a new prototype of our tool that will be able to efficiently process deanonymization queries in a privacy-preserving manner. To this end, our tool will incorporate three new components: (i) an MPC-secured ML-based traffic classifier, (ii) an optimized ML traffic correlation model, and (iii) a quorum-based query processing protocol. To design our system, we will harness the combined expertise of our team in low-latency anonymity networks, traffic analysis, and distributed systems (PI Nuno Santos, CMU PI Nicolas Christin, and Diogo Barradas), secure multiparty computation (Co-PI Bernardo Portela and Bernardo Ferreira), and machine learning (João Vinagre).

Funding Entity
Start Date
End Date
INESC-ID - Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa
LASIGE/FCiências.ID; INESC TEC; NOVA.ID.FCT
Principal Investigator at LASIGE
Bernardo Ferreira
Team at LASIGE