Understanding Blockchain Analysis

Blockchains generate a lot of data: transactions, addresses, fees, smart contract state updates, and more. Combine that with market data and trading data from exchanges, and you have a glut of data sources to analyze for insights.

To better analyze this data and understand who might find value in blockchain analysis, I categorize it as follows:

Data Sources

OnChain Data | Trading Data | Market Data | Metadata


Labeling | Pattern Matching

Who are the buyers?

Speculators | Users | Regulators | Due Diligence Teams | Academics


Data Sources

OnChain Data – under normal operation, a blockchain generates a lot of data. Each component of that data can be analyzed separately.


Addresses

  • Number of unique addresses
  • Addresses that have done X
  • Addresses with balances within a given range
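
The address metrics above can be sketched in a few lines. This is a minimal illustration, not a production indexer: the `transfers` records, the address names, and the helper functions are all hypothetical, and the "balance" here is only the net flow within the sample (it ignores fees and any earlier history).

```python
from collections import defaultdict

# Hypothetical transfer records: (sender, receiver, amount). Not real chain data.
transfers = [
    ("addr_a", "addr_b", 5.0),
    ("addr_b", "addr_c", 2.0),
    ("addr_a", "addr_c", 1.0),
]

def unique_addresses(transfers):
    """Return every address that appears as a sender or a receiver."""
    seen = set()
    for sender, receiver, _ in transfers:
        seen.add(sender)
        seen.add(receiver)
    return seen

def net_flows(transfers):
    """Net amount received minus sent per address, within this sample only."""
    flows = defaultdict(float)
    for sender, receiver, amount in transfers:
        flows[sender] -= amount
        flows[receiver] += amount
    return dict(flows)

def addresses_in_range(flows, low, high):
    """Addresses whose net flow lies in the closed range [low, high]."""
    return sorted(a for a, value in flows.items() if low <= value <= high)
```

Calling `addresses_in_range(net_flows(transfers), 1.0, 10.0)` returns the addresses whose net flow falls in that band. A real analysis would start from full balances rather than a window of transfers.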


Transactions

  • Amount of fees paid in USD/native token
  • Transaction size in bytes/amount
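
As a rough sketch of these transaction metrics, assume a Bitcoin-style (UTXO) chain where the fee is whatever the inputs hold beyond the outputs. The `Tx` class, its field names, and the exchange rate passed in are illustrative assumptions, not any particular node's API.

```python
from dataclasses import dataclass

@dataclass
class Tx:
    """Hypothetical, simplified transaction record."""
    input_total: float   # sum of input values, in native token
    output_total: float  # sum of output values, in native token
    size_bytes: int      # serialized transaction size

def fee_native(tx):
    """Fee in native token: inputs minus outputs (UTXO-model assumption)."""
    return tx.input_total - tx.output_total

def fee_usd(tx, usd_per_token):
    """Fee converted to USD at a caller-supplied exchange rate."""
    return fee_native(tx) * usd_per_token

def fee_per_byte(tx):
    """Fee rate in native token per byte of transaction size."""
    if tx.size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return fee_native(tx) / tx.size_bytes
```

Account-based chains price fees differently (for example gas used times gas price), so the first function would need replacing there.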

Mempool statistics

  • Size of the mempool in transactions/amount/fees


Blocks

  • Block size in bytes/value transferred/fees
  • Confirmation times, number of confirmations
  • Fees, subsidy, and who mined the block

Trading Data – data from exchanges (centralized, P2P, OTC, automated), usually:

  • Price – calculated by a specified weighting algorithm
  • Order book data – trades, depth, liquidity, slippage, etc. across all kinds of speculative products
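
To make the two bullets above concrete, here is a small sketch. The text does not say which weighting algorithm a given provider uses; volume-weighted average price (VWAP) is used here purely as one common choice. The function names and the order book layout (a list of `(price, quantity)` asks sorted by ascending price) are assumptions for illustration.

```python
def vwap(trades):
    """Volume-weighted average price over (price, quantity) trades."""
    total_qty = sum(qty for _, qty in trades)
    if total_qty == 0:
        raise ValueError("no traded volume")
    return sum(price * qty for price, qty in trades) / total_qty

def buy_slippage(asks, quantity):
    """Fractional slippage of a market buy relative to the best ask.

    asks: list of (price, quantity) levels sorted by ascending price.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0.0
    for price, available in asks:
        take = min(remaining, available)  # fill as much as this level allows
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough depth to fill the order")
    avg_price = cost / quantity
    best_ask = asks[0][0]
    return (avg_price - best_ask) / best_ask
```

For example, `buy_slippage([(100.0, 1.0), (102.0, 1.0)], 2.0)` walks both levels and reports how far the average fill price sits above the best ask.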

Market Data – a lot of businesses have come up around blockchains, and they produce data of their own. Things like:

  • lists of upcoming projects
  • vendor databases
  • financing information
  • open source project information
  • signals from analysts
  • legal & regulatory changes

Metadata – one could easily add metadata to each of the data points stated above. This is extremely useful for detecting patterns and for regulatory compliance. Things like:

  • Frequency of address re-use
  • Tainted addresses
  • Address labeling
  • compliance data like disclosures
  • Holding and control structures
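
As one example of derived metadata, the frequency of address re-use can be estimated from a list of receiving addresses. The definition used here (the share of payments that go to an address already seen in the sample) is an assumption chosen for simplicity. The function name and input shape are hypothetical.

```python
from collections import Counter

def reuse_stats(receiving_addresses):
    """Return (addresses used more than once with their counts, re-use rate).

    The re-use rate is the fraction of payments that went to an address
    that had already received a payment earlier in the sample.
    """
    if not receiving_addresses:
        return {}, 0.0
    counts = Counter(receiving_addresses)
    reused = {addr: n for addr, n in counts.items() if n > 1}
    repeat_payments = len(receiving_addresses) - len(counts)
    return reused, repeat_payments / len(receiving_addresses)
```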


Labeling – address labeling is a very useful tool for ensuring compliance. Usually, labels come from:

  • Self-avowed – static donation addresses, ENS names, Twitter handles, officially declared addresses, smart contracts, miners labeling transactions, etc.
  • Regulatory lists – when addresses are included in lists by bodies like OFAC
  • Data breaches – these usually include personal data attributed to an address
  • Malware, viruses and ransomware – when they demand payment in a blockchain-native token
  • Generic labels – active during X hours, interacted with X known entity, matches this transaction pattern
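
A label store that merges these sources can be sketched as a simple index from address to `(source, label)` pairs. The source names, addresses, and labels below are made up for illustration. Real providers track far more context, such as confidence and timestamps.

```python
from collections import defaultdict

# Hypothetical label sources, each mapping address -> label.
label_sources = {
    "self_avowed": {"addr_a": "project donation address"},
    "regulatory_list": {"addr_b": "sanctioned"},
    "generic": {"addr_a": "active 09:00-17:00 UTC"},
}

def build_label_index(sources):
    """Merge all sources into address -> list of (source, label) pairs."""
    index = defaultdict(list)
    for source, labels in sources.items():
        for address, label in labels.items():
            index[address].append((source, label))
    return dict(index)

def addresses_from_source(index, source):
    """Addresses that carry at least one label from the given source."""
    return sorted(
        address
        for address, labels in index.items()
        if any(src == source for src, _ in labels)
    )
```

Keeping the source next to each label matters for compliance work, since a self-declared label and a regulatory listing carry very different weight.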

Pattern Matching – working on the assumption that operating models do not change, analysts can recognize entities based on patterns they see in the data:

  • Operating hours – can indicate which timezone an entity is covering
  • Trading patterns – is this entity an unnamed exchange, an OTC provider, a large holder, or a miner?
  • Input vs output patterns – is this a mixing service, a bot, or an automated market maker?
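
The operating-hours heuristic can be sketched as an hourly histogram plus a search for the busiest window. This is only an illustration under stated assumptions: timestamps are Unix seconds, the 8-hour default window is an arbitrary stand-in for a working day, and both function names are hypothetical.

```python
from datetime import datetime, timezone

def hourly_activity(timestamps):
    """Count transactions per UTC hour, given Unix timestamps in seconds."""
    counts = [0] * 24
    for ts in timestamps:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    return counts

def busiest_window(counts, width=8):
    """Return (start_hour_utc, total) for the busiest contiguous window.

    The window wraps around midnight; ties go to the earliest start hour.
    """
    best_start, best_total = 0, -1
    for start in range(24):
        total = sum(counts[(start + i) % 24] for i in range(width))
        if total > best_total:
            best_start, best_total = start, total
    return best_start, best_total
```

A window starting around 01:00 UTC, for instance, would be consistent with an entity working East Asian business hours, though automated activity can easily blur this signal.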


Who are the buyers?

Buyers of blockchain analysis can be classified as looking for one of three things:

  1. Alpha – an advantage over other members of a market. Traders looking to spot patterns or get ahead of a major price move.
  2. Compliance – whether it is people looking to keep their business in compliance with local regulation, or regulators keeping them in check.
  3. Novel information – researchers and academics who are looking to understand blockchains and their effects.

Businesses have come up to fill these needs.

  • Data aggregators – these businesses usually pick a niche such as exchanges, labeling, or onchain events and aggregate data from multiple sources. For example:
    • Nomics and Kaiko provide exchange data
    • Blockset, Infura and Alchemy give access to onchain data
  • Analysis as a Service – usually boils down to custom reporting and BI tooling for charts and dashboards, from companies like Messari, Glassnode, Skew, and Dune Analytics
  • Compliance as a Service – companies that automate KYC/AML tracking, source-of-funds tracking, etc., like Chainalysis, CipherTrace, and Merkle Science
  • Directories – Messari, Chainalysis, Crunchbase, AngelList, Tracxn, and others have a directory component to them
  • Recommendation providers are rarer but exist to help larger entities make better trades and decisions.


Transparency vs Privacy – privacy (at least via pseudo-anonymity and privacy-preserving blockchain tech) is a huge reason for people to transact via public permissionless blockchains. Blockchain analysis can reduce the privacy of transactions, especially if they are tied back to off-chain entities.

Compliance vs New Business models – the rapid growth of Binance is a testament to how quickly new business models can let companies grow. The same rapid growth could also be explained by Binance operating wholly in crypto and circumventing the need for strict KYC. Would this growth have happened if it was forced to do strict KYC from the start?

Whether this is a good or bad thing depends on your view of the value created by entities like Binance or MakerDAO (which operates onchain).

Decentralized Ideals vs Centralized Reality – decentralization, for better or worse, has been touted as an aspiration. The real-world necessities of coordination and task management often require highly centralized entities. These entities then turn around and use this as justification for disproportionate ownership of their native token. Similar things happen in mining, commit access to code, and narrative ownership.

Blockchain analysis can bring this centralized reality to light.
