Decentralization Matters

Three Things #108: February 25, 2024

Feb 26, 2024

Many hands make light work. But how politically decentralized is this crew? 🤔 Photo by Randy Fath on Unsplash

We take decentralization very seriously at Spacemesh. People in our industry tend to throw around the word “decentralization” without taking the time to really understand what it means, why it matters, or how to achieve it. We’re really trying to get decentralization right, and all three of those questions are part of the puzzle.

I’m giving a talk in a few days at ETHDenver on decentralization in the context of Spacemesh: why it matters, what it means and how we measure it, how we’re doing at Spacemesh, the challenges we’re facing and what we’re doing about them, and how we can do better. Sometimes it’s good to go back to the basics and remember why we started—and decentralization is a big part of that story. Here’s how we’re thinking about decentralization at Spacemesh.

Thing #1: Why Decentralization Matters 💠

In his landmark post on the subject from a few years ago, Vitalik Buterin lays out three reasons decentralization is important: fault tolerance, attack resistance, and collusion resistance. I broadly agree with his framework. I’ll explain my interpretation of each of these and why I think each is important in the context of today’s world.

The first reason, fault tolerance, is, as Vitalik points out, noncontroversial. It’s an idea that’s been around a lot longer than blockchains: jet engines, electrical systems, hospitals, military infrastructure, and financial portfolios (his examples) are all designed to be fault tolerant for good reason, since failure in any of these could be catastrophic. Decentralization is the best and arguably the only way to achieve real fault tolerance. It not only removes single points of failure but also minimizes the likelihood of common mode failure, e.g., when four aircraft engines or four hard drives provide redundancy but all were manufactured in the same facility on the same day and fail for the same reason.

I’d add political systems to this list. Highly centralized political systems are also fragile and more prone to failure than decentralized systems based on the principle of subsidiarity. A single bad leader or ruler can take down an entire centralized system. So can idiotic, bureaucratic decision making, as was especially on display during the Covid era. Thank goodness there are many states and many countries which together form a natural experiment: bad policy in one place can be compared in practice to good policy elsewhere.

The second reason, attack resistance, is a little more nuanced. In fact I think attack resistance is a special case of fault tolerance where the faults we’re trying to defend against are intentional rather than accidental. If one of the failure modes for your jet engine is a malicious engineer or a terrorist then you need to factor in both fault tolerance and attack resistance—but it amounts to more or less the same thing and we engineer for both in a similar fashion. We want multiple layers of redundancy, better known as defense in depth, so that it takes a failure of (or a successful attack on) many such layers for the entire system to fail catastrophically. We can design and engineer systems such that the likelihood of such a cascading failure, even in many plausible attack scenarios, is infinitesimal.

In the blockchain world the types of attacks we need to defend against are mostly of the economic variety, as Vitalik describes. We need to make sure that any attacker is fighting an uphill battle, i.e., that the amount they stand to gain from attacking the system is significantly less than their cost to successfully attack the system. Decentralization matters a lot here because bribing or coercing one miner, one pool operator, or one influential participant in governance is much easier than bribing tens or hundreds, so it’s important that too much power not reside in the hands of one person or one organization.

Finally, collusion resistance is the most nuanced and complex of the three. It’s difficult even to define; Vitalik’s reasonable attempt is, “coordination that we don’t like.” As he points out, this is the hardest goal to achieve and unlike the other two it cannot be achieved without severe tradeoffs. For example, making it harder to change the protocol might prevent certain types of attacks but would also make it harder to respond in times of crisis, e.g., if a severe bug is discovered. Vitalik has three suggestions: we can build protocols to resist such collusion, we can try to distinguish between good and bad forms of collusion and try to make one harder and the other easier, or we can try to strike a balance between the two.

In my experience collusion resistance is especially difficult because it flies in the face of the harsh reality of economies of scale. It’s nearly always cheaper, faster, easier, and more efficient to do things at scale. This means that wealthy, experienced actors, known as whales, are pretty much always going to have advantages over the little guy. This leads directly to the dismal phenomenon of the rich getting richer, which has been true always and everywhere.

Nearly every system and every market trends towards a small number of dominant players. Once this trend plays out, if indeed it’s allowed to, it’s very difficult to achieve collusion resistance since those dominant players tend to know and talk to each other, and it may very well be in their collective interest (and against everyone else’s interest) for them to collude. We have to be extraordinarily thoughtful about how we design systems that make such collusion difficult or, in the ideal case, impossible. Perhaps the best-known example of a collusion-resistant protocol is antitrust law, as Vitalik points out. I think this is very much an unsolved problem in the blockchain space and most projects that claim to have solved it haven’t even come close.

Is anything missing from this list? What about the classic blockchain goals, e.g., censorship and seizure resistance, unstoppable applications, separating money and state, disintermediation, and permissionlessness?

In fact nearly all of these fall under attack resistance or collusion resistance. Censorship and seizure are clearly forms of attack and they’re much harder to carry out when a system is decentralized and there isn’t a single known actor to go after. The reason we need to separate money and state, cypherpunk style, is to prevent the government from attacking the monetary system (through, e.g., rampant inflation to pay for never-ending wars) or simply building a monetary system that fails. The purpose of disintermediation is to prevent a form of economic attack known as rent seeking so this, too, falls under attack resistance (the topic is indeed quite broad).

What about permissionlessness? At first glance this seems like it should be its own, separate goal. But upon further examination it becomes clear that permissionlessness is quite literally the lack of all censorship! In other words, a system that can successfully defend against all forms of attack, including censorship, is by definition permissionless: no one can stop you from building on or transacting on the system regardless of who you are. There are other, more nuanced aspects to permissionlessness such as accessibility of information, keeping costs low, and fostering an inclusive culture, but for the purposes of this brief analysis I think we’ve checked all of the boxes.

Thing #2: What It Means 📖

Lots of blockchain folks tend to throw around the term decentralization without being clear what they mean. It’s important to remember that decentralization is not one-dimensional! In the same post mentioned above Vitalik outlines three types of decentralization: logical, architectural, and political. This is a good starting point for thinking about decentralization. Let’s consider each.

Logical decentralization is quite straightforward: does the system act like a single, monolithic system or more like an “amorphous swarm”? Another way to think about logical centralization or decentralization is in terms of peer connections and namespaces. Can any peer on the network discover and communicate with any other peer? Is there a single namespace, database, or state? The article gives examples of both. Corporations, countries, and blockchains are all logically centralized. CDNs, languages, and BitTorrent are logically decentralized.

The entire purpose of a blockchain is to have a single, politically and architecturally decentralized (but logically centralized) state machine that can be used to implement things like the Bitcoin ledger, so for our purposes logical centralization is a basic requirement.

The second type of decentralization is architectural. This is the type of decentralization that most blockchain people mean when they talk about decentralization. Vitalik defines it as, “How many physical computers make up the network?” A better definition is how many nodes are there, and how much diversity is there among those nodes? This ideally includes diversity along several dimensions: nodes run by both hobbyists and companies, nodes run in many geographic locations, nodes run on many different networks and in many different data centers, nodes running on different hardware configurations and different operating systems, nodes running different versions of the software, and nodes running different client implementations, where possible.

The main reason diversity is good and monoculture is bad is for fault tolerance purposes, as discussed above: the network should be able to withstand failure along one or even multiple of these dimensions (e.g., an issue affecting nodes running the latest version of a particular client on AWS). As mentioned above there are also political reasons why diversity is good, and this type of decentralization is fundamental to blockchains. It’s how they achieve many of the desirable properties described above.

Finally, this brings us to the third type of decentralization, political. Vitalik defines this in terms of how many individuals or organizations ultimately control the computers that compose the network. This is the most nuanced type of decentralization and the most difficult one to understand. A big reason is that, unlike the logical and architectural dimensions, political centralization or decentralization is often invisible. We can see roughly how many nodes are on the network at any given time and we can see the contents of the data structures (e.g., blocks and transactions) but we can only speculate on who is controlling or profiting from those nodes.

Despite being difficult this type of decentralization is also critical to the success of blockchains. If there are hundreds or thousands of nodes on a network, even if they’re running a wide array of implementations on different operating systems in different data centers, etc., but all are controlled by a single actor, it should be fairly obvious how and why the network isn’t meaningfully decentralized at all. This is an extreme example, but many blockchain networks seem decentralized at first glance—because they’re very architecturally decentralized—but are in fact highly politically centralized. A good rule of thumb for political centralization is, how many individual humans would have to collude to attack, halt, or destroy the network? And, do those humans know one another and have common interests? How many doors would the police have to knock on to shut things down?

In addition to Vitalik’s list I think there’s also an economic dimension to decentralization. It could be thought of as an extension to the last kind: political-economic decentralization. How many actors profit from the protocol? And, even more importantly and more interestingly in the case of proof of stake networks where coins translate directly to protocol votes, how many actors collectively control enough of the stake to shut down or permanently capture the network? Some folks use the term Nakamoto coefficient to capture this idea; the number is shockingly low for many networks, and I’d argue that these networks are not meaningfully decentralized! (The Nakamoto coefficient is a useful idea but it’s also incomplete.)
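
To make the Nakamoto coefficient concrete, here is a minimal sketch in Python. It assumes we somehow know how much stake (or voting weight) each real-world actor controls, which is exactly the part that is hard to observe in practice. The `threshold` parameter is also an assumption: the share of stake needed to halt or capture a network depends on the protocol, and the stake figures below are made up purely for illustration.

```python
def nakamoto_coefficient(stakes, threshold=1 / 3):
    """Smallest number of actors whose combined stake share exceeds `threshold`.

    `stakes` maps a (hypothetical) actor name to the stake it controls.
    `threshold` is protocol-dependent; 1/3 is only an illustrative default.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    running = 0
    # Add actors from largest to smallest until the threshold is crossed.
    for count, stake in enumerate(sorted(stakes.values(), reverse=True), start=1):
        running += stake
        if running > threshold * total:
            return count
    # The threshold is never crossed (e.g. threshold >= 1): everyone is needed.
    return len(stakes)


# Made-up stake distribution, for illustration only.
example_stakes = {"a": 40, "b": 25, "c": 15, "d": 10, "e": 5, "f": 5}
print(nakamoto_coefficient(example_stakes))       # 1: actor "a" alone holds 40% > 1/3
print(nakamoto_coefficient(example_stakes, 0.5))  # 2: "a" and "b" together hold 65%
```

Note that the result is only as good as the mapping from on-chain identities to actors: if one actor hides behind several identities, the true coefficient is lower than the computed one.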

I won’t consider Spacemesh meaningfully decentralized unless there are many (millions of) nodes running on a diverse array of infrastructure controlled by a diverse array of people and organizations.

Thing #3: How We Measure It 🌡️

Just in case it wasn’t clear from the above: decentralization is a fuzzy concept and it’s difficult to measure objectively. It’s impossible to measure something you can’t define, and one of the problems with defining decentralization is the different dimensions described above. For our purposes we’re most interested in architectural decentralization, but it’s still not obvious how to measure it. Even focusing on only architectural decentralization there’s no single metric that perfectly captures everything that matters. You might claim that your network is super-duper decentralized because it has 1M nodes, but as discussed above if all of those nodes are identical (same software, same release, same operating system, same hardware, same data center, etc.), not to mention operated by the same person, then there’s nothing decentralized about it at all.

Nevertheless, we do need to attempt to measure it, even imperfectly. We created a “decentralization ratio” for Spacemesh a few years ago, which is visible today on our network dashboard (it currently hovers around 80%). It’s a weighted average of two components: the number of miners, measured quadratically up to some arbitrary maximum, and the distribution of recent proposals and their voting weight in recent blocks. We use the Gini coefficient for the latter, which gives us some measurement of how evenly voting weight is distributed across the network.

There are a few parameters here: how much weight to give each of the two measures, what to set as the “target” number of miners, the number of blocks or layers to look back when measuring weight, etc. These numbers were set relatively arbitrarily, they’re subjective, and I make no claim that they’re perfect or final, but they do nevertheless produce some interesting output. As is so often the case with imperfect metrics, the absolute number (which is really just a totally arbitrary score, normalized to 100) is much less interesting than the relative change day to day and epoch to epoch.
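
For readers who want to see the shape of such a score, here is a rough sketch in Python. It is not the actual dashboard formula, which isn't spelled out here. Everything specific is an assumption made for illustration: the function names, the reading of “measured quadratically” as squaring the capped miner fraction, the use of `1 - gini` as the evenness component, and the default values for `target_miners` and `miner_share`. The `weights` argument stands for per-identity voting weight already aggregated over whatever lookback window is chosen.

```python
def gini(weights):
    """Gini coefficient of non-negative weights: 0 means perfectly even."""
    xs = sorted(weights)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over ascending-sorted values with 1-based rank i.
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * ranked_sum) / (n * total) - (n + 1) / n


def decentralization_ratio(num_miners, weights, target_miners=10_000, miner_share=0.5):
    """Illustrative score from 0 to 100 (hypothetical parameters, not Spacemesh's)."""
    # Assumed reading of "quadratically up to a maximum": cap at the target, then square.
    miner_score = min(1.0, num_miners / target_miners) ** 2
    # Turn the Gini coefficient into an evenness score so that higher is better.
    evenness = 1.0 - gini(weights)
    return 100 * (miner_share * miner_score + (1 - miner_share) * evenness)


# Made-up inputs: half the target miner count, perfectly even voting weight.
print(decentralization_ratio(5_000, [1, 1, 1, 1]))  # 62.5
```

Changing `miner_share`, `target_miners`, or the lookback window shifts the absolute score, which is one more reason to watch the trend rather than the number itself.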

The astute reader can already poke holes in this formula. The main issue here is that the only thing we can measure quantitatively is miner identities (we call them “ATXs”, for “activation transaction”—each miner needs to produce one each epoch to ensure ongoing eligibility), and miner identities don’t map one to one to anything in the real world like people or organizations. In some cases, one actor may control many such identities for a variety of reasons: they initialized different plots of storage at different times, they’re running multiple miners on multiple machines, they want to be able to prove their storage in parallel, they want to hide the fact that they’re large, or they just didn’t initialize their storage efficiently. The opposite may also be true in the case of a pool—this is exactly the definition of a pool, many actors who collectively appear like one to the network. So we don’t even know for sure whether this is an overcount or an undercount.

Another way we try to measure architectural decentralization is by counting the number of nodes on the network. This is also difficult. We don’t collect any analytics by default. Spacemesh is a P2P network so it’s certainly not the case that all of the node software talks to our servers.

The only thing we can count is the number of distinct nodes that our own nodes happen to see on the network. Over a long enough period of time we probably see most nodes, since all nodes talk to our bootstrap nodes when they first connect to the network and since we operate some pretty big “master” nodes that each have thousands of peers. But we definitely don’t see all the nodes. In particular, we don’t see private nodes, i.e., nodes that the operator has configured to talk only to their own “gateway” node. And if a node has been running for a long time and has a stable peerset that happens not to include any of our nodes, it would be effectively invisible to us. The number of nodes we see over a given two-week period fluctuates between 25k and 30k. At least in this case we know this is an undercount, so we can safely say that the total number of nodes is at least 25k, but whether it’s closer to 25k or to 250k, we really don’t know.
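
The counting itself is simple. As a sketch, assuming each of our observer nodes logs a `(timestamp, node_id)` pair whenever it sees a peer (a hypothetical log format, not the real one), the number we report is just the size of the set of distinct IDs in the window:

```python
from datetime import datetime, timedelta


def distinct_nodes_seen(sightings, window_end, window=timedelta(days=14)):
    """Count distinct node IDs seen by any observer within the window.

    `sightings` is an iterable of (seen_at, node_id) pairs pooled from all
    observer nodes. The result is a lower bound: nodes that never talk to
    an observer never show up in the input at all.
    """
    window_start = window_end - window
    return len({
        node_id
        for seen_at, node_id in sightings
        if window_start <= seen_at <= window_end
    })


# Made-up sightings, for illustration only.
now = datetime(2024, 2, 25)
sightings = [
    (now - timedelta(days=1), "node-a"),
    (now - timedelta(days=3), "node-a"),   # same node seen twice counts once
    (now - timedelta(days=10), "node-b"),
    (now - timedelta(days=30), "node-c"),  # outside the two-week window
]
print(distinct_nodes_seen(sightings, now))  # 2
```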

One cannot discuss decentralization without touching upon the issue of pools, which I wrote about at more length a few months ago. While Spacemesh is designed so that there’s no economic incentive to join a pool—a relatively small home miner still earns frequent rewards without pooling—there are other incentives to join a pool, mainly that operating your own node is difficult and pools make it easier (in exchange for a portion of the fees earned). We don’t have exact data on pools either but we know that there are a few big ones operating on Spacemesh.

Every blockchain community has to deal with pools. They’ve been a feature of Ethereum and Bitcoin mining and validation almost since the very beginning. Delegated proof of stake protocols effectively embed pooling into the protocol via delegation. Try as we might, we cannot completely eliminate the effects of economies of scale, which pools exploit and which is the reason they exist. A very small number of pools have always dominated Bitcoin mining, and Ethereum mining and validation both before and after the Merge. This remains true today.

Pools are less than ideal from a decentralization perspective. They reduce architectural and political decentralization. In theory, a pool operator could “go rogue” and attack the protocol by, say, censoring transactions, refusing to build on honest blocks, or causing a reorg. In practice this sort of attack is so rare that I’m not aware of it ever having happened to a major blockchain. Having multiple pools helps balance the influence of any one pool. And it’s important to remember that we can always fall back on the social coordination layer if a pool were to go rogue. I have no doubt that, in the unlikely event that happened, the community would not hesitate to “fork out” the offending pool and keep going as if nothing had happened. There is precedent for this happening!

As this issue has hopefully made clear, decentralization isn’t simple, it isn’t cheap, and it isn’t easy. It also doesn’t happen overnight. The arc of the blockchain story is long but it bends towards decentralization. We’re on this path, we still have our work cut out for us, and we hope you’ll join us!
