Motivation/Overview
The paper “An Empirical Analysis of Traceability in the Monero Blockchain” revealed that, due to optional privacy features and biases in the decoy selection algorithm, approximately 62% of Monero transaction inputs from the currency’s first two years were traceable. This article demonstrates a replication of that approach on a later state of the blockchain and outlines the design of the data analysis pipeline I chose. Initially, the aim was to determine how many transactions an attacker needs to know in order to de-anonymize other transactions. However, since this was a private side project, it never quite reached that point.
Infrastructure
I was interested in how this attack worked, so I built my own analysis pipeline. Unfortunately, I cannot share my code, but I can describe the infrastructure and share some of my plots. The pipeline has two stages:
1. Enrich the data from the JSON RPC API and write each complete JSON block as a message to Kafka.
2. Read the messages, extract nodes and edges, and write them to a database of your choice; sketches of both stages follow below. Many graph databases offer a fast CSV importer: use it for the initial import, and prefer Kafka connectors for your database if possible.
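Here is a minimal sketch of the first stage, assuming a local monerod with its JSON RPC exposed on port 18081 and a Kafka broker on localhost:9092. The topic name `monero-blocks` and the height range are my own placeholders, and further enrichment (e.g., fetching full transaction bodies via `get_transactions`) is omitted.

```python
# Sketch of stage 1: pull blocks from a local monerod node and publish them
# to Kafka. Assumes monerod's JSON RPC on 127.0.0.1:18081 and a Kafka broker
# on localhost:9092; the topic name "monero-blocks" is my own choice.
import json

import requests
from kafka import KafkaProducer  # kafka-python package

RPC_URL = "http://127.0.0.1:18081/json_rpc"

def get_block(height: int) -> dict:
    """Fetch one block (header plus block JSON) via monerod's get_block RPC."""
    payload = {"jsonrpc": "2.0", "id": "0",
               "method": "get_block", "params": {"height": height}}
    resp = requests.post(RPC_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["result"]

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for height in range(0, 1000):  # replace with the height range you need
    block = get_block(height)
    # Key by height so messages for the same block land in the same partition.
    producer.send("monero-blocks", key=str(height).encode(), value=block)

producer.flush()
```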
I wrote multiple methods for inserting blocks into different databases. Here are my experiences with four of them:
| Database | Pro | Contra |
| --- | --- | --- |
| RedisGraph | Fast | Does not scale horizontally; limited by RAM size |
| ArangoDB | Scales horizontally, fast import | Visualization is not as good as Neo4j’s |
| ElasticSearch | Scales horizontally, nice visualizations | Not a graph database |
| Neo4j | Fast graph queries, nice frontend, good query language | Does not scale horizontally in the free version |
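For the second stage, here is a hedged sketch of a consumer that writes blocks into Neo4j. The `Block`/`Tx` node labels and the `CONTAINS` relationship are illustrative choices of mine, not necessarily the schema my original pipeline used.

```python
# Sketch of stage 2 for Neo4j: consume block messages from Kafka and MERGE
# nodes/edges. Assumes the "monero-blocks" topic from above, a local Neo4j at
# bolt://localhost:7687, and the 5.x Python driver (execute_write).
import json

from kafka import KafkaConsumer
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def insert_block(tx, block: dict):
    header = block["block_header"]
    # MERGE keeps the import idempotent: re-consuming a block is a no-op.
    tx.run("MERGE (b:Block {height: $height}) SET b.hash = $hash",
           height=header["height"], hash=header["hash"])
    for tx_hash in block.get("tx_hashes", []):
        tx.run("MERGE (t:Tx {hash: $tx_hash}) "
               "WITH t MATCH (b:Block {height: $height}) "
               "MERGE (b)-[:CONTAINS]->(t)",
               tx_hash=tx_hash, height=header["height"])

consumer = KafkaConsumer("monero-blocks",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda v: json.loads(v))

with driver.session() as session:
    for msg in consumer:
        session.execute_write(insert_block, msg.value)
```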
Impact of Zero Mixing
Now I would like to share some results of this analysis with you. The zero-mixin filter exploits inputs that reference only a single output, i.e. use no decoys: their real spend is known with certainty, so that output can be removed as a decoy candidate from every other ring, which in turn can shrink further rings to a single candidate. This is the “chain reaction” described in the paper; a sketch of the deduction follows below.
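Here is a minimal sketch of that chain reaction, with rings modelled as sets of candidate output IDs. The IDs and the toy data are made up for illustration.

```python
# Zero-mixin "chain reaction": any ring of size one spends its only member,
# and a spent output can be deleted as a decoy from all other rings, which
# may shrink them to size one in turn.
from collections import defaultdict

def chain_reaction(rings: dict[str, set[str]]) -> dict[str, str]:
    """Return a mapping input -> deduced real output."""
    # Index which inputs reference each output, so deletions are cheap.
    users = defaultdict(set)
    for inp, candidates in rings.items():
        for out in candidates:
            users[out].add(inp)

    deduced = {}
    frontier = [inp for inp, c in rings.items() if len(c) == 1]
    while frontier:
        inp = frontier.pop()
        if inp in deduced or len(rings[inp]) != 1:
            continue
        (real_out,) = rings[inp]  # the single remaining candidate
        deduced[inp] = real_out
        # Remove the now-spent output from every other ring that uses it.
        for other in users[real_out]:
            if other == inp or other in deduced:
                continue
            rings[other].discard(real_out)
            if len(rings[other]) == 1:
                frontier.append(other)
    return deduced

# Toy example: input i2 becomes traceable once i1's zero-mixin spend of o1
# removes o1 from i2's ring, which in turn resolves i3.
rings = {"i1": {"o1"}, "i2": {"o1", "o2"}, "i3": {"o2", "o3"}}
print(chain_reaction(rings))  # {'i1': 'o1', 'i2': 'o2', 'i3': 'o3'}
```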
[Plot: without any filter]

[Plots: after the zero-mixin filter]
General Structure of Monero Transactions
Here are some plots showing how the general structure of transactions changed.
[Plot: before any filter]

[Plot: after the first filter]
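Plots like these can be computed directly from the graph database. As an illustration, here is a hypothetical query against the Neo4j schema sketched above, assuming `Tx` nodes with `SPENDS` edges to the outputs they consume (my own naming, not confirmed by the original pipeline):

```python
# Sketch: histogram of inputs per transaction, computed from the graph.
from collections import Counter

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    result = session.run(
        "MATCH (t:Tx) "
        "OPTIONAL MATCH (t)-[:SPENDS]->(o) "
        "RETURN t.hash AS tx, count(o) AS inputs"
    )
    # Count how many transactions have each input count.
    histogram = Counter(rec["inputs"] for rec in result)

for n_inputs, n_txs in sorted(histogram.items()):
    print(f"{n_txs} transactions have {n_inputs} inputs")
```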