Entropy is a well-known measure of the “randomness” of a system. It originates in thermodynamics: the second law of thermodynamics states that in an isolated system, entropy never decreases.
At the statistical level, entropy also measures the amount of information needed to completely “describe” the system. This follows from the definition of information:
I(X;Y) = H(X) − H(X|Y)
where
I(X;Y) = information about X obtained from Y
H(X) = entropy of X
H(X|Y) = entropy of X, knowing Y
As a consequence, completely describing the system requires no fewer than H(X) bits of information. If all elements of the system are identical, the entropy is 0; the maximum is log2(N), where N is the number of elements, and it is reached when all N elements are different.
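The two bounds can be checked with a minimal Python sketch. The function name `shannon_entropy` and the toy size `n = 8` are illustrative assumptions, not part of the original analysis.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

n = 8  # toy system size, chosen only for illustration

single_state = [1.0] + [0.0] * (n - 1)  # every element is in the same state
uniform = [1.0 / n] * n                 # all n states are equally likely

h_single = shannon_entropy(single_state)
h_uniform = shannon_entropy(uniform)

print("single state:", h_single)
print("uniform:     ", h_uniform, "vs log2(n) =", math.log2(n))
```

The first case gives zero bits, and the second matches log2(n), the largest value any distribution over n states can reach.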
How much information does the blockchain actually carry? Knowing the answer makes us more aware of the analytical limits and of the value we can get by extracting information from the blockchain. The more entropy there is, the more interesting information can be extracted from on-chain data.
We start with a simple experiment: calculating the actual entropy of Bitcoin (BTC) and Ethereum (ETH) balances. Note that we leave aside smart contracts and the scripts/data in Bitcoin for future investigation. This favors BTC over ETH, since smart contracts are a very important part of Ethereum blockchain analytics.
At Bloxy.info we have already indexed several blockchains, with the most extensive data coming from the Bitcoin and Ethereum main nets. These datasets let us easily query the whole set of addresses using standard SQL. This is what the SQL query looks like for all Bitcoin addresses:
Bitcoin Balance Distribution
We got a list of 26,835,583 addresses. The number of unique balances in this list is lower: 1,528,767. Which of these 1,528,767 balances are found on the largest number of addresses? In other words, which balances are the most popular?
This is the distribution of Bitcoin balances, sorted by the number of addresses holding each unspent balance:
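The grouping step behind such a distribution can be sketched in Python. This is a minimal illustration rather than the query used for the article: the address names and balances below are hypothetical sample data, and balances are assumed to be integers in satoshi.

```python
from collections import Counter

# Hypothetical sample: address -> unspent balance in satoshi (not real data).
balances = {
    "addr_a": 1,
    "addr_b": 1,
    "addr_c": 5_000_000_000,
    "addr_d": 546,
    "addr_e": 1,
    "addr_f": 546,
}

def balance_distribution(balances):
    """Return (balance, address_count) pairs, most popular balance first."""
    counts = Counter(balances.values())
    return counts.most_common()

distribution = balance_distribution(balances)
for balance, count in distribution:
    # Integer arithmetic avoids float rounding when printing 8 decimals.
    print(f"{balance // 10**8}.{balance % 10**8:08d} BTC  {count} addresses")
```

Each row pairs one balance value with the number of addresses holding it, which is the only input the entropy calculation needs.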
At the top of the list are addresses holding just 0.00000001 BTC. This is most probably a result of the change functionality of software wallets, as it is a very small amount of money (currently about 1/100 of a US cent). Almost 2 percent of all Bitcoin addresses hold just this small amount.
Notably, there are a lot of round numbers in the list, decimal fractions of 1. People, you know, love round numbers.
The 50 BTC balance appears as a trace of history, from the time when you could mine 50 BTC per block. Many of these rewards are still unspent and sit in wallets as balances.
Some other numbers, such as 0.00000546 BTC, 0.00007800 BTC, and 0.00005461 BTC, which are a minority in the list, are most likely generated artificially. For example, the 0.00000546 BTC balance is a result of the Omni protocol.
We did not dig too deep. If you know the origin of the other “strange” numbers among the popular balances, please comment on the article with your thoughts!
Doing the same with Ethereum, we found that the most popular balances also fall in the sub-cent zone. We truncated the balance values to 8 digits in the fractional part to be consistent with the Bitcoin data.
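The truncation can be sketched as follows, assuming balances are stored as integers in wei (1 ETH = 10^18 wei). The helper names and the sample value are hypothetical; the actual implementation on the Bloxy.info side is not shown in this article.

```python
WEI_PER_ETH = 10**18
UNITS_PER_ETH = 10**8  # same granularity as satoshi per BTC

def truncate_wei(balance_wei):
    """Truncate (not round) a wei balance to units of 1e-8 ETH."""
    return balance_wei // (WEI_PER_ETH // UNITS_PER_ETH)

def format_units(units):
    """Render 1e-8 ETH units as a decimal string with 8 fractional digits."""
    whole, fraction = divmod(units, UNITS_PER_ETH)
    return f"{whole}.{fraction:08d}"

sample_wei = 5_580_000_123_456_789  # hypothetical balance, slightly above 0.00558 ETH
units = truncate_wei(sample_wei)
print(format_units(units), "ETH")
```

Integer division keeps the arithmetic exact, so balances that differ only below the eighth decimal place fall into the same bucket.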
The total count of Ethereum addresses with a balance is 26,352,814, of which 1,136,686 (4.3%) have nonunique balances. The top of the list is:
We see a very similar distribution pattern, but the top balance is now 0.00558 ETH. More than 2% of non-zero Ethereum addresses hold this amount.
This “strange” balance is explained by the work of a large tumbler, which has the side effect of generating such small-balance addresses, as in this example:
In total, these addresses hold 3515 ETH, currently equivalent to ~850,000 USD. Even after covering gas costs, this is a significant amount. It is likely that the private keys for all these addresses are in the hands of one person, who either does not care or does not want to take the funds out.
Knowing the balance distribution, we can calculate the entropy using the formula H = −Σ p_i · log2(p_i), where p_i is the share of addresses holding balance i:
It can be expressed as an SQL query that executes pretty fast:
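The same calculation can be sketched in Python, assuming the formula is the standard Shannon entropy over the balance distribution. The function name and the sample counts are illustrative. With c_i addresses holding balance i and N addresses in total, the entropy simplifies to log2(N) − (1/N) · Σ c_i · log2(c_i), which only needs the per-balance counts.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy in bits; counts[i] is the number of addresses holding balance i."""
    total = sum(counts)
    weighted_logs = sum(c * math.log2(c) for c in counts if c > 0)
    return math.log2(total) - weighted_logs / total

# Hypothetical distribution: three balance values held by 3, 2 and 1 addresses.
sample_counts = [3, 2, 1]
sample_entropy = entropy_from_counts(sample_counts)
print(f"{sample_entropy:.3f} bits")
```

Working from counts rather than probabilities mirrors a grouped query result: one row per balance value with its address count.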
We get a number of around 17. Is that large or small, good or bad? The theoretical maximum of entropy is reached when the distribution is completely uniform, which for this number of Bitcoin addresses gives:
Since entropy is a logarithmic measure, a difference of 7 “bits” from the maximum corresponds to a factor of approximately 2⁷ = 128. From this we can conclude that the balances of Bitcoin addresses are far from a random uniform distribution. On average, we need just 17 bits of information to describe the balance of an address, which is around 2¹⁷ = 128K combinations.
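The arithmetic can be reproduced with a short Python sketch. The address count is the figure quoted earlier in the article, and the measured entropy is taken as the rounded value of 17 bits, so the gap computed here is only approximate.

```python
import math

address_count = 26_835_583  # Bitcoin addresses with a balance, from the article
measured_entropy = 17.0     # "around 17" bits, rounded as in the article

# Maximum entropy: every address holds a different balance.
max_entropy = math.log2(address_count)
gap_bits = max_entropy - measured_entropy

print(f"maximum entropy: {max_entropy:.2f} bits")
print(f"gap to maximum:  {gap_bits:.2f} bits, a factor of about {2 ** gap_bits:.0f}")
print(f"2^17 = {2 ** 17} combinations")
```

The maximum lands between 24 and 25 bits, so with the measured value rounded to 17 the computed gap is somewhat above the 7 bits used in the text.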
If we repeat the query while limiting the date range of the transactions, we get the entropy over time:
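One way to picture this is to replay transfers up to each cutoff date and measure the entropy of the resulting balances. The sketch below is a simplified illustration under stated assumptions, not the query behind the chart: the transfer tuples, address names, and cutoff dates are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import date

def entropy_bits(balances):
    """Shannon entropy in bits of the empirical distribution of balance values."""
    counts = Counter(balances).values()
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts)

def entropy_by_cutoff(transfers, cutoffs):
    """Replay transfers up to each cutoff date and measure the balance entropy."""
    result = {}
    for cutoff in cutoffs:
        balances = defaultdict(int)
        for day, address, delta in transfers:
            if day <= cutoff:
                balances[address] += delta
        held = [b for b in balances.values() if b > 0]  # non-zero balances only
        result[cutoff] = entropy_bits(held) if held else 0.0
    return result

# Hypothetical transfers: (date, address, balance change in satoshi).
transfers = [
    (date(2009, 1, 10), "miner_a", 5_000_000_000),
    (date(2009, 1, 11), "miner_b", 5_000_000_000),
    (date(2010, 6, 1), "miner_a", -1_000_000_000),
    (date(2010, 6, 1), "user_c", 1_000_000_000),
    (date(2010, 6, 2), "miner_b", -300_000_000),
    (date(2010, 6, 2), "user_d", 300_000_000),
]
cutoffs = [date(2009, 12, 31), date(2010, 12, 31)]

entropy_over_time = entropy_by_cutoff(transfers, cutoffs)
for cutoff, bits in entropy_over_time.items():
    print(cutoff, round(bits, 3))
```

In this toy data, the first cutoff sees only two identical mined balances, so the entropy is zero; once coins are split into distinct amounts, the entropy rises.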
At the beginning, most BTC addresses held the mined amounts (50 / 25 BTC), and the entropy was very low (below 1 bit).
Entropy is increasing, which is in line with the second law of thermodynamics. The rate of growth reflects the amount of information accumulated in the blockchain. Over time, this rate decreases.
Currently, the entropy values for BTC and ETH are quite close, with the ETH entropy approaching that of BTC. As a physical system, the blockchain is pretty small and not very random.
This article was composed using data and analytical tools from the Bloxy.info analytical engine. The Bloxy.info web site provides a set of tools for analysts, traders, companies, and crypto enthusiasts. The tools include APIs, dashboards, and search engines, all available on-site, providing accurate data indexed directly from live blockchain nodes. Bloxy's mission is to make blockchain more transparent and accessible to people and businesses. Please cite the source of the data when referencing this article.