Understanding Blockchain Analysis

Blockchains generate a lot of data: transactions, addresses, fees, smart contract state updates, and more. Combine that with market and trading data from exchanges, and you have a glut of data sources to analyze for insights.

To better analyze this data and understand who might find value in blockchain analysis, I categorize it as follows:

Data Sources

OnChain Data | Trading Data | Market Data | Metadata


Labeling | Pattern Matching

Who are the buyers?

Speculators | Users | Regulators | Due Diligence Teams | Academics


Data Sources

OnChain Data – the blockchain, under normal operation, generates a lot of data. Each component of the blockchain data can be analyzed separately.


Address statistics

  • Number of unique addresses
  • Addresses that have done X
  • Addresses with balances within a given range


Transaction statistics

  • Amount of fees paid in USD/native token
  • Transaction size in bytes/amount

Mempool statistics

  • Size of the mempool in transactions/amount/fees


Block statistics

  • Block size in bytes/value transferred/fees
  • Confirmation times, number of confirmations
  • Fees, subsidy, and who the block was mined by
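
To make a few of these metrics concrete, here is a minimal sketch over a toy list of transfers. The record layout (sender, receiver, amount, fee), the addresses, and the function names are all assumptions made for illustration; real chain data is messier (multiple inputs and outputs, change addresses, contract calls).

```python
from collections import defaultdict

# Hypothetical transfers: (sender, receiver, amount, fee), in the chain's smallest unit.
TRANSFERS = [
    ("alice", "bob", 500, 1),
    ("bob", "carol", 200, 2),
    ("alice", "carol", 100, 1),
]


def unique_addresses(transfers):
    """Every address that appears as a sender or a receiver."""
    return {addr for sender, receiver, _, _ in transfers for addr in (sender, receiver)}


def net_flows(transfers):
    """Net change per address: received minus sent minus fees paid."""
    flows = defaultdict(int)
    for sender, receiver, amount, fee in transfers:
        flows[sender] -= amount + fee
        flows[receiver] += amount
    return dict(flows)


def addresses_in_range(balances, low, high):
    """Addresses whose balance lies in [low, high], sorted for stable output."""
    return sorted(addr for addr, bal in balances.items() if low <= bal <= high)


def total_fees(transfers, usd_per_unit=None):
    """Total fees in native units, or in USD when a price per unit is supplied."""
    fees = sum(fee for _, _, _, fee in transfers)
    return fees if usd_per_unit is None else fees * usd_per_unit
```

Net flows stand in for balances here only because the toy data has no starting balances.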

Trading Data – data from exchanges (centralized, P2P, OTC, automated), usually:

  • Price – calculated by a specified weighting algorithm
  • Order book data – trades, depth, liquidity, slippage, etc., on all kinds of speculative products
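
As an illustration of both bullets, the sketch below uses volume weighting as one possible weighting algorithm (the text does not say which one a given provider uses) and measures slippage by walking the ask side of an order book. The function names and input shapes are my own assumptions.

```python
def volume_weighted_price(quotes):
    """quotes: list of (price, volume) pairs, for example one pair per exchange."""
    total_volume = sum(volume for _, volume in quotes)
    if total_volume == 0:
        raise ValueError("no volume to weight by")
    return sum(price * volume for price, volume in quotes) / total_volume


def buy_slippage(asks, quantity):
    """Walk the ask side of a book and return (average fill price, slippage).

    asks: list of (price, size) sorted from the best (lowest) price upward.
    Slippage is the average fill price relative to the best ask, as a fraction.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining, cost = quantity, 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("order book too thin for this quantity")
    average = cost / quantity
    best = asks[0][0]
    return average, (average - best) / best
```

For example, buying 2 units against asks of 1 @ 100 and 1 @ 102 fills at an average of 101, which is 1% worse than the best ask.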

Market Data – data about the many businesses that have come up around blockchains. Things like:

  • lists of upcoming projects
  • vendor databases
  • financing information
  • open source project information
  • signals from analysts
  • legal & regulatory changes

Metadata – one could easily add metadata to each of the data points stated above. This is extremely useful for detecting patterns and for regulatory compliance. Things like:

  • Frequency of address re-use
  • Tainted addresses
  • Address labeling
  • compliance data like disclosures
  • Holding and control structures
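
Frequency of address re-use is the easiest of these to compute. A minimal sketch, assuming we already have the list of receiving addresses for the payments we care about (the function name and the definition of "re-use" are my own choices):

```python
from collections import Counter


def address_reuse_rate(receiving_addresses):
    """Fraction of payments that went to an address that was already used earlier."""
    if not receiving_addresses:
        return 0.0
    counts = Counter(receiving_addresses)
    # The first payment to each address is "fresh"; every later one is a re-use.
    reused_payments = sum(n - 1 for n in counts.values())
    return reused_payments / len(receiving_addresses)
```

A rate near zero suggests a wallet that generates a fresh address per payment, while a high rate points to a static address such as a donation or deposit address.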


Labeling – address labeling is a very useful tool in ensuring compliance. Usually, labels come from:

  • Self Avowed – static donation addresses, ENS names, Twitter handles, officially declared addresses, smart contracts, miners labelling transactions, etc.
  • Regulatory lists – when addresses are included in lists by bodies like OFAC
  • Data Breaches – usually include personal data attributed to an address
  • Malware, Viruses and Ransomware – when they demand payment in a blockchain native token
  • Generic labels – active during X hours, interacted with X known entity, matches this transaction pattern
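
One way to hold labels from these sources side by side is a small store keyed by address. This is only a sketch: the class names, the source identifiers, and the choice of which sources count as a compliance flag are assumptions for illustration, not how any particular vendor does it.

```python
from collections import defaultdict
from dataclasses import dataclass

# Source names mirror the list above; the identifiers themselves are made up.
SOURCES = {"self_avowed", "regulatory_list", "data_breach", "malware", "generic"}


@dataclass(frozen=True)
class Label:
    address: str
    text: str
    source: str


class LabelStore:
    def __init__(self):
        self._by_address = defaultdict(set)

    def add(self, label):
        if label.source not in SOURCES:
            raise ValueError(f"unknown source: {label.source}")
        self._by_address[label.address].add(label)

    def labels_for(self, address):
        """All labels for an address, sorted by source and then text."""
        labels = self._by_address.get(address, ())
        return sorted(labels, key=lambda lab: (lab.source, lab.text))

    def flagged(self, address, sources=("regulatory_list", "malware")):
        """True when any label comes from a source treated as a compliance flag."""
        return any(lab.source in sources for lab in self._by_address.get(address, ()))
```

Keeping the source on every label matters, because a self-declared donation address and a regulator's listing deserve very different levels of trust.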

Pattern Matching – working on the assumption that operating models do not change, analysts can recognize entities based on patterns they see in the data:

  • operating hours – can indicate which timezone an entity is operating in
  • trading patterns – is this entity an unnamed exchange, an OTC provider, a large holder, or a miner?
  • input vs output patterns – is this a mixing service, a bot, or an automated market maker?
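
The operating-hours idea can be sketched as follows: find the busiest block of hours in UTC and assume it corresponds to a local working day. The 8-hour window and the 09:00 local start are loud assumptions, as are the function names; an entity run by bots or by teams in several timezones will defeat this.

```python
from collections import Counter


def busiest_window(utc_hours, window=8):
    """Starting UTC hour of the `window`-hour span (wrapping midnight) with the most activity."""
    if not utc_hours:
        raise ValueError("no activity to analyze")
    counts = Counter(hour % 24 for hour in utc_hours)

    def activity(start):
        return sum(counts[(start + i) % 24] for i in range(window))

    # On ties, the earliest starting hour wins.
    return max(range(24), key=activity)


def guess_utc_offset(utc_hours, local_start=9, window=8):
    """Guess a UTC offset, assuming the busiest window starts at `local_start` local time."""
    start = busiest_window(utc_hours, window)
    offset = (local_start - start) % 24
    # Fold into the range -11..+12.
    return offset - 24 if offset > 12 else offset
```

Activity clustered between 01:00 and 08:00 UTC, for instance, would be read as an entity working roughly at UTC+8.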


Buyers of blockchain analysis can be classified as looking for one of three things:

  1. Alpha – an advantage over other members of a market. Traders looking to spot patterns or get ahead of a major price move.
  2. Compliance – whether it is people looking to keep their business in compliance with local regulation, or regulators keeping them in check.
  3. Novel Information – researchers and academics who are looking to understand blockchains and their effects.

Businesses have come up to fill these needs.

  • Data aggregators – these businesses usually pick a niche like exchanges, labeling, or onchain events and aggregate data from multiple sources. For example, providers like
    • Nomics, Kaiko provide exchange data
    • Blockset, Infura and Alchemy give access to onchain data
  • Analysis as a Service – usually boils down to custom reporting and BI tooling for charts and dashboards. Companies like Messari, Glassnode, Skew, and Dune Analytics
  • Compliance as a Service – companies that automate KYC/AML tracking, source-of-funds tracking, etc., like Chainalysis, CipherTrace, and Merkle Science
  • Directories – Messari, Chainalysis, Crunchbase, AngelList, Tracxn, and others have a directory component to them
  • Recommendation providers are rarer but exist to help larger entities make better trades and decisions.


Transparency vs Privacy – privacy (at least via pseudonymity and via privacy-preserving blockchain tech) is a huge reason for people to transact via public permissionless blockchains. Blockchain analysis can reduce the privacy of transactions, especially if they are tied back to off-chain entities.

Compliance vs New Business models – the rapid growth of Binance is a testament to how quickly new business models can let companies grow. The same rapid growth could also be explained by Binance operating wholly in crypto and circumventing the need for strict KYC. Would this growth have happened if it was forced to do strict KYC from the start?

Whether this is a good or bad thing depends on your view of the value created by entities like Binance or MakerDAO (which operates onchain).

Decentralized Ideals vs Centralized Reality – decentralization, for better or worse, has been touted as an aspiration. The real-world necessities of coordination and task management often require very centralized entities. These entities then turn around and use this as justification for disproportionate ownership of their native token. Similar things happen in mining, commit access to code, and narrative ownership.

Blockchain analysis can bring this centralized reality to light.

Aragon Grants – Openness, Decentralization, and Aragon vs Autark

I wanted to understand the core issues involved in the dispute between Aragon and Autark. I think they give an insight into how decentralization and transparency entangle with current jurisdictions.

State of Play

The Aragon Association claims to have initiated litigation against Autark for delivering only 1 out of 13 deliverables, along with “Interpersonal issues (including threats), Underperformance, Lack of code quality, Breach of confidentiality (including defamation)”.

Autark LLC claims this is a “baseless legal action in Switzerland against Autark LLC”.

The central issue is the disbursement of grants from Aragon to Autark: what contracts and agreements have been made, who is in breach, and how this process should play out.

Open Questions

In spite of Aragon’s efforts towards a fully transparent system, there are many questions that can’t be answered through publicly available data (at least, not that I could find).

  • What agreement was breached? Why was it not made public? Do all grantees have to sign this agreement? Does the community know that such an agreement existed and was signed to consummate the grant IRL?
  • Is the roadmap file in the grants repo considered a legal agreement?
  • Do all other grantees pass the hurdles set for this grantee?
  • Who makes decisions about contract completion?
  • Who tracks the work done by each grantee?
  • Given this legal action has caused leaders in the space to reduce the scope of a DAO to only on-chain actions, why is the jurisdiction messaging still on?
  • Why did no one make public a dispute that, by both parties' admission, has been an issue since at least January 2020?
  • Why are none of the token holders affected?
  • Why call outsourced product development “grants”?
  • Is jurisdiction the only reason to initiate a legal proceeding without going through Aragon Court?
  • What is the reason for preferring litigation in the Swiss courts over American courts?


Make governance transparent – Aragon the project has many DAOs and legal entities associated with it. That may be required to comply with local regulations, but it makes governance opaque.
More concretely, make clear the relationship between Aragon the project, the Aragon Association, Aragon One, Aragon Black, and other entities: who is in control at each entity, how decisions are made at each entity, and which jurisdictions these entities are subject to.
ANT holders should be aware of these ownership patterns.

Outside directors – have independent “directors” who can act as ombudsmen. The lack of these structures is, of course, in line with the grand tradition of blockchain entities re-learning the lessons of legacy fintech.

Make the decision process transparent
In their blog post, Aragon states “most of the data has always been publicly accessible on GitHub”, but only the artifacts of applications are on GitHub.

To audit a Nest payment, one has to look at the application process on GitHub, the voting process in the DAO voting app, and the funding in the DAO financing app, with no way to track work done.

Name things appropriately –
Calling features and payments what they are lets people understand what the boundaries are and what recourse they have. Calling payments to independent software vendors “Grants” and calling what functionally amounts to support ticket escalation a “Court” causes people to misunderstand the process.

Understanding what happened

There are three main actors here

  • Aragon – a collection of open source code, a non-profit Swiss foundation, and a for-profit entity.
  • Autark – a Wyoming LLC that develops software for blockchain projects and has its own projects.
  • Token holders – the community of people who own ANT, the token used to participate in decisions about Aragon.

Aragon would like to decentralize (read outsource) as much of product development and Governance (read project management and capital allocation) as possible. This is primarily done by incentivizing various software developers via their Grants program.

Token holders also vote on things like which vendors should be given grants via an aptly named Aragon Grants Process (AGP), and funds are then granted via a sub-organization (read business unit) that is managed via a DAO.

For each grant (read payment), the assumption is that all three parties (token holders, Aragon, and the grantee) are aware of the transaction, having voted for it, and are aware of the recourse each party has if there are any issues. So far so good.

Now there is a dispute (the issue stated in Aragon’s blog post). Autark claims they have not been compensated for work done; Aragon claims breach of contract and other issues.

So what? Why is this even a question? Just use the resolution system that all three parties agreed upon and are aware of. Here is the rub: that dispute resolution system, called Aragon Court, is apparently still not ready to handle an issue of this magnitude. There is also the pesky detail that no one knew such a dispute had started and that both sides were thinking about starting legal action.

What happens next? Internal talks break down, and Autark apparently threatens to sue the Aragon Association (as they are the purveyor of grant funds), first in the Swiss courts and later in American courts. The Association then initiates legal action against Autark in Swiss courts. Luis Cuende, co-founder of the Aragon project, explains in this YouTube video that the reason for doing this was to “establish jurisdiction”.

He goes on to say that Aragon Court is not recognized as a jurisdiction worldwide, and that they needed to establish jurisdiction first so that Autark couldn’t sue them in another jurisdiction. (I am not a lawyer, but I would think that Autark can sue in the US even after there is a legal action in Switzerland.)

What this angling over jurisdictions makes clear is that the Aragon Association doesn’t want this case to be tried in US courts, while Autark believes they will have an advantage there.

Why is there not a pre-defined dispute resolution process? Why was there a need to create and sign an agreement for use of funds that was then kept secret from the rest of the community? Another reason for going to the courts directly could be that a legal ruling could go against whatever decision was reached in Aragon Court. That would be a blow to Aragon Court’s credibility with its users.


Aragon is one of the most transparent projects in the blockchain space. Most of the investigation that I did here was directly on applications built for Aragon like Apiary, the Aragon App, and their GitHub account. They are a private legal entity and certainly do not have to do anything mentioned here.

What bothers me is that power is still centralized, and decisions are made that would be fine in any other context but look ridiculous when you also say you want to decentralize the project. Asking the community to vote on name changes while making a significant legal decision on your own is not what decentralization should be about.

Similarly, making only some parts of decision making transparent is detrimental to the goal of being a transparent entity. Either make major decisions in public or say that during this phase of development this entity will take charge. Having transactions be public but having confidential grant agreements just confuses people.

Current regulations may not allow DAOs or Aragon to be as transparent or decentralized as they would want, but that is where they need to innovate. This is the exact problem they are solving. No regulation has stopped DAOs from paying software vendors, but calling those payments Grants might be an issue. No regulation has stopped DAOs from having an internal ticketing system, but calling it a court and applying jurisdiction is an issue.

DAOs were supposed to disrupt organizations by reducing transaction costs, with decentralized decision making, programmatic payments, and a tamper-proof chain of evidence. What we have now are glorified project management systems that are deployed onto Ethereum, which is great but is a long way from disrupting the company structure.

Required Reading:

Autark’s side

Aragon’s side

Auditing the Grants

Modelling Miners

Miners are critical infrastructure for blockchain networks. I am going to model a minimum viable miner to understand this process better. For this exercise, I will use a proof-of-work chain (like Bitcoin) for simplicity.

Inputs:

  • Software that implements the mining algorithm
  • Hardware to run the software on
  • Electricity to power the hardware
  • Other – place to put the hardware, people to oversee the process, etc.

Outputs:

  • The token, in this case bitcoin
  • Heat as a byproduct
  • Expertise in running these types of systems

Goal: Generate and maximize profits

Simple Miner Model

The business of running a miner

Expenses:

  • Initial capital expenses
  • Recurring
    • Electricity
    • Rent
    • Salaries
    • Compliance

The majority of expenses are in fiat and are generally stable.

Revenue Sources: Sell or rent each of the following

  • Token
  • Heat – geographically restricted or not possible
  • Expertise – is situation dependent and may generate competition

A miner can choose how much of the output they need to convert to revenue.
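
The business model so far can be written down as a few lines of arithmetic. This is a minimal sketch under my own simplifications: one accounting period, only the recurring fiat costs listed above, only token revenue (no heat or expertise), and no initial capital expenses. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Miner:
    tokens_mined: float  # tokens produced in the period
    electricity: float   # recurring costs for the period, in fiat
    rent: float
    salaries: float
    compliance: float

    def recurring_costs(self):
        return self.electricity + self.rent + self.salaries + self.compliance

    def profit(self, token_price, sell_fraction=1.0):
        """Fiat profit for the period when `sell_fraction` of the mined tokens is sold."""
        if not 0.0 <= sell_fraction <= 1.0:
            raise ValueError("sell_fraction must be between 0 and 1")
        revenue = self.tokens_mined * sell_fraction * token_price
        return revenue - self.recurring_costs()

    def break_even_fraction(self, token_price):
        """Share of output that must be sold to cover recurring costs (above 1 means a loss)."""
        potential_revenue = self.tokens_mined * token_price
        if potential_revenue <= 0:
            raise ValueError("nothing to sell at this price")
        return self.recurring_costs() / potential_revenue
```

With 10 tokens mined, 10,000 in recurring costs, and a token price of 2,000, the miner has to sell half of the output to break even and can hold the rest.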

Optimizing Profits

  • Decreasing Costs – find cheaper or free sources of electricity, get mining hardware for cheaper, find places with lower rent (that still have access to cheap electricity)
  • Increasing Revenue – Find better hardware to produce more tokens per hour
  • Mine other tokens that have a higher price or lower cost of production
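
The last bullet is a comparison a miner can run with a few inputs per network. A rough sketch, assuming the same hardware can mine every candidate network (in practice it has to share the proof-of-work algorithm) and that expected revenue is proportional to the miner's share of hashpower. The network names, dictionary keys, and numbers are invented for illustration.

```python
def profit_per_day(hashrate, network):
    """Expected daily fiat profit from pointing `hashrate` at one network.

    network: dict with keys hashrate, blocks_per_day, reward, price, daily_cost.
    The miner's own hashrate is added to the network total when computing its share.
    """
    share = hashrate / (network["hashrate"] + hashrate)
    revenue = share * network["blocks_per_day"] * network["reward"] * network["price"]
    return revenue - network["daily_cost"]


def most_profitable(hashrate, networks):
    """Name of the network with the highest expected daily profit."""
    return max(networks, key=lambda name: profit_per_day(hashrate, networks[name]))


# Hypothetical networks, not real figures.
NETWORKS = {
    "chain_a": {"hashrate": 990, "blocks_per_day": 144, "reward": 6, "price": 100, "daily_cost": 500},
    "chain_b": {"hashrate": 90, "blocks_per_day": 100, "reward": 1, "price": 50, "daily_cost": 500},
}
```

In this toy data chain_a wins, but doubling chain_b's price flips the answer, which is why this kind of switching is sensitive to price moves.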

Smoothing the Revenue Curve

  • Capital Management – If a miner has a reserve of tokens, then they can hire someone to manage that reserve to generate other revenue.
  • Create relationships with OTC dealers, Exchanges, Funds or any buyers of decent size, who can have more stable buying patterns
  • Financialize things like hashing power, participate in futures markets, etc.

External Forces

  • Block Subsidy
    • All else being equal, miners would like the price of the mined token to be higher.
    • If the block reward goes to zero, miners are incentivized to seek higher fees per block.
  • Electricity prices
    • All else being equal, miners are incentivized to find the least expensive electricity.
  • Competition
    • Because the probability of mining the next reward is (somewhat) proportional to the miner's share of total hashpower, miners are incentivized to grow their capacity.
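
To see why share of hashpower matters, here is a small sketch that treats the relationship as exactly proportional (the text only says "somewhat") and treats each block as an independent draw. The function names and the sample numbers in the note below are illustrative assumptions.

```python
def hashpower_share(miner_hashrate, network_hashrate):
    """Miner's fraction of total hashpower; network_hashrate includes the miner."""
    if not 0 < miner_hashrate <= network_hashrate:
        raise ValueError("miner hashrate must be positive and at most the network total")
    return miner_hashrate / network_hashrate


def expected_blocks(share, blocks):
    """Expected number of blocks won out of `blocks` mined by the whole network."""
    return share * blocks


def expected_revenue(share, blocks, subsidy, avg_fees):
    """Expected tokens earned: each block won pays the subsidy plus its fees."""
    return expected_blocks(share, blocks) * (subsidy + avg_fees)


def chance_of_any_block(share, blocks):
    """Probability of winning at least one of `blocks` independent blocks."""
    return 1 - (1 - share) ** blocks
```

A miner with 1% of the hashpower expects about 1.44 of every 144 blocks, and if the subsidy goes to zero the same formula pays out fees only.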

How miners can play offense

  • Given that changes in the reward are public knowledge, miners can change their holding patterns to take advantage of them.
  • Invest in more powerful or more efficient hardware, ASICs for example.
  • Find monopoly access to electricity
  • Mine empty blocks
  • Invest in creating better mining software
  • Lobby local regulators for favorable treatment
  • Invest in longer-term research like photonics

Playing with the variables

  • Mining Pools
    • Spread out the cost of electricity and hardware, but introduce a revenue share.
  • Optimizers
    • Given total capacity remains the same, miners can move to other networks for more profits, especially if this is done programmatically.
  • Mining as a service
    • When you have the expertise but not the capital to start mining at scale. There are principal-agent problems.
  • Personal miners
    • Currently there are large mining pools that dominate mining. Given the process is public, anyone can set up a miner.
    • This can be profitable on networks with little competition or where the goal is not strictly financial.
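
The trade-off a pool offers can be sketched with a little probability. The assumptions are mine and idealized: blocks are independent draws, the pool splits each reward in proportion to hashpower, and it keeps a flat fee as its revenue share. Real payout schemes differ in the details.

```python
import math


def solo_stats(share, blocks, reward):
    """Mean and standard deviation of revenue when mining alone (binomial block count)."""
    mean = blocks * share * reward
    std = reward * math.sqrt(blocks * share * (1 - share))
    return mean, std


def pooled_stats(share, pool_share, blocks, reward, pool_fee):
    """Mean and standard deviation of revenue as a member of a pool.

    share: the miner's fraction of network hashpower.
    pool_share: the whole pool's fraction of network hashpower, including the miner.
    pool_fee: fraction of each reward the pool keeps.
    """
    if not 0 < share <= pool_share <= 1:
        raise ValueError("need 0 < share <= pool_share <= 1")
    payout_per_pool_block = (share / pool_share) * reward * (1 - pool_fee)
    mean = blocks * pool_share * payout_per_pool_block
    std = payout_per_pool_block * math.sqrt(blocks * pool_share * (1 - pool_share))
    return mean, std
```

For a miner with 1% of the hashpower joining a pool with 25% and a 2% fee, expected revenue over 100 blocks drops from 1.0 to 0.98 rewards, while the standard deviation falls from roughly 0.99 to roughly 0.17. That is the smoother revenue curve being bought with the fee.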

There is still a lot of ground to be covered here, like modelling miners for non-proof-of-work networks, generalized mining, participation in forks, and implementing updates. However, this model can act as a base that can be expanded as needed.

Understand the strategy, business and impact of blockchain projects.