I wrote previously about our plans for the design of the Spacemesh VM, Athena, and this week we finally released a longer document explaining those plans in more depth and answering some frequently asked questions. We’ve spoken about it enough; now begins the hard but exciting work of designing and building the thing.
As with most complex engineering, the devil’s in the details. VM design isn’t for the faint of heart, and there are still a lot of details to work out as we build Athena. The thing we’re building isn’t actually a VM, or rather, it isn’t just a VM. The VM—as in the “virtual processor” and ISA that interprets smart contract bytecode in the context of the blockchain—is just a piece of a bigger puzzle. A better term is probably “smart contract engine,” but that doesn’t roll off the tongue like “VM” does.
What does this entail? Other than the VM and instruction set there’s the account model, the bytecode format, the precompiles/system contracts/host functions, the API between the VM and the node, and of course all the details of the rollup architecture: how transactions are created, signed, gossiped, bundled into batches, approved, processed, finalized, etc., as well as the game theory, incentives, and mechanism design around all of this. It’s a big project.
But the piece that excites me the most is the programmability model, which has the biggest impact on developer experience. As a software developer, as someone who loves building apps above almost all else, and as someone who has been building infrastructure these past seven years so I could get back to writing apps, I think the experience of building web3 apps today, while markedly better than when I started, is still pretty terrible. Here are some of the ways I think we can and should make web3 programming easier.
Thing #1: No Backend 🛢
To understand where we’re heading it’s helpful to know where we’re coming from. In the beginning, decades ago, apps ran locally with a local database and there was no need for “multiplayer” mode, servers, or shared data. That began to change as we moved into the web era in the nineties, and continued to evolve when we moved into the interactive web2 era in the late nineties. At the time the standard way of writing interactive web apps was CGI, where the entire application ran on the server. So while these applications did have a notion of multiple users and data ownership, those data all resided in one place. In retrospect the experience was pretty bad—for those not old enough to remember, imagine filling out a form, clicking Send, and waiting 30 seconds for the page to refresh (if you didn’t encounter an error)—but that’s how Google and Facebook and Amazon and every other successful web2 app started.
Things began to get better in the early 2000s with the introduction of AJAX. Web development became much more sophisticated: web2 apps evolved into SPAs that ran on the client’s browser and were more responsive. Web apps began to look and feel a little more like native apps. Data moved back and forth using RESTful APIs and asynchronous Javascript made the experience much better. These same ideas carried over into mobile and they’re by and large how web and mobile apps are still written today.
The API is the glue that connects the data in the frontend, i.e., the app the user sees and interacts with, with the data on the backend: typically a database that stores data for all users and for the entire system, and also typically a set of business logic processes or routines that use those data, depending on the application. There are many strategies for splitting the data model between the frontend and the backend. REST and RPC are two of the most popular, and GraphQL is a more recent contender, but as a developer I’ve always found them all frustrating and difficult to work with. For one thing each requires a ton of additional logic just to manage the API on either end. I’ve seen many web apps, and have probably written one or two, where this “boilerplate” API and data wrangling code involves many more lines of code than the actual business logic!
There are all sorts of issues with client-server apps and the APIs that connect the two. What happens if the client makes a request and the server doesn’t respond, or produces an error? What happens if it takes a long time to get a response? What happens if the Internet connection disappears or changes? How can the client avoid downloading data it doesn’t actually need? These questions are just the tip of the iceberg of the challenges that arise in API design, and every single app built with such an API needs a strategy for dealing with them.
There have been a tiny number of projects and frameworks that took a different approach to solving this problem by uniting the backend and frontend data model (or, viewed differently, not unnecessarily splitting them). My favorite is a web and mobile app framework called Parse from way back in 2011. Facebook acquired the company and shut down the platform, but it fortunately still exists as an open source project. I remember writing an app using Parse after writing tons of traditional web and mobile apps and feeling like it was a much better, more intuitive way of architecting an application. The details of how it works are beyond the scope of this issue but the rough idea is that the app had direct knowledge of the underlying tables or data sets powering the entire application and could read from them and write to them using a simple SDK. The Parse framework handled permissions, synchronization, and all of the gory details for the developer invisibly. While I haven’t developed an app with it, the Meteor Javascript framework offers a similar model.
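To make the model concrete, here’s a toy sketch of what Parse-style direct data access feels like. The `Store` class and its `save`/`query` methods are invented for illustration and are not the actual Parse SDK (the real one, with `Parse.Object` and `Parse.Query`, talks to a server and enforces permissions for you):

```typescript
// A toy, in-memory sketch of Parse-style direct data access.
// `Store`, `save`, and `query` are invented for illustration.
type Row = Record<string, unknown> & { id?: string };

class Store {
  private tables = new Map<string, Row[]>();
  private nextId = 1;

  // Write a row directly to a named table; no REST endpoint, no serializer,
  // no hand-rolled API layer between the app and its data.
  save(table: string, row: Row): Row {
    const rows = this.tables.get(table) ?? [];
    const saved = { ...row, id: String(this.nextId++) };
    rows.push(saved);
    this.tables.set(table, rows);
    return saved;
  }

  // Read rows back with a simple predicate instead of a bespoke API call.
  query(table: string, where: (r: Row) => boolean): Row[] {
    return (this.tables.get(table) ?? []).filter(where);
  }
}
```

The point isn’t the toy implementation but the shape of the developer experience: the app reads and writes its own tables through one small SDK surface, and synchronization and permissions happen invisibly underneath.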
The typical web3 app is even more complicated. In addition to a frontend (running on the user’s device) and a backend (running on the operator’s server), there’s a third component, the smart contract, running on chain. Web3 apps therefore typically have to split their data model in three. This increases coding complexity and increases the cognitive burden on the developer. (Three is much harder than two, not 50% harder.)
I’m sure there’s a better way. As an application developer I want to focus on the application and its business logic, not the data model, not data pipelining, not networking code, and certainly not things like gas and finality. DevEx and UX issues like gas and finality will never go away completely in web3 apps of course but there are ways to make them much less of a headache to developers and much less of a stumbling block in the application development process. (Solana’s requestAirdrop is a great example of this.) If I’m building an app that needs to transfer an NFT from Alice to Bob it should be as simple as adding four or five lines of code to my app: check that the NFT asset exists where I think it does, check that Alice’s and Bob’s accounts both exist, then perform the transfer, erroring out at any step that fails. That’s it. (Compare this to what it looks like today in practice.)
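As a sketch of what that four-or-five-line flow could look like (every name here, including `Chain`, `getAsset`, `getAccount`, and `transferNft`, is hypothetical, not a real Athena or Spacemesh API; the in-memory `Chain` stands in for whatever the SDK talks to):

```typescript
// Hypothetical SDK surface, invented for illustration only.
interface Nft { id: string; owner: string; }

class Chain {
  private accounts = new Set<string>();
  private assets = new Map<string, Nft>();

  createAccount(id: string) { this.accounts.add(id); }
  mint(id: string, owner: string) { this.assets.set(id, { id, owner }); }
  getAccount(id: string): boolean { return this.accounts.has(id); }
  getAsset(id: string): Nft | undefined { return this.assets.get(id); }
  setOwner(id: string, owner: string) {
    const nft = this.assets.get(id);
    if (nft) nft.owner = owner;
  }
}

// The whole app-side flow: check the asset, check both accounts, transfer.
// Gas, signing, and finality would be handled by the SDK, not the app.
function transferNft(chain: Chain, assetId: string, from: string, to: string) {
  const nft = chain.getAsset(assetId);
  if (!nft || nft.owner !== from) throw new Error(`asset ${assetId} not held by ${from}`);
  if (!chain.getAccount(from)) throw new Error(`unknown account ${from}`);
  if (!chain.getAccount(to)) throw new Error(`unknown account ${to}`);
  chain.setOwner(assetId, to);
}
```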
To be clear I’m not suggesting that APIs are always bad or that applications should always be built without them. There are valid reasons to divide an application’s data model, including permissions and security, modularity, testing, etc. A backendless app isn’t appropriate in every situation. But where it is possible, it makes prototyping easier, it can greatly reduce the cognitive burden of application architecture, and it’s worth pursuing for this reason alone.
It would be amazing to include something like this in Athena but a client-server data synchronization and permissioning model and a sophisticated SDK like Parse isn’t something that can be built overnight. Athena is complicated enough without it; it’s not realistically going to ship with v1 of Athena. But it’s a direction we should move towards in order to make web3 application development sane and legible to millions of web2 devs. Web2 development ain’t no walk in the park either, and web3 development has unfair advantages which means it should be easier than web2, not harder. That’s the direction we’re headed.
Thing #2: Embedded Consensus 📦
Applications have different kinds of state. To simplify just a little bit there’s at least one kind that corresponds to each of the major architectural components discussed above. For a web2 app that’s local state, which tends to be ephemeral, and backend state, which tends to be persistent. Local state includes things like the state of a UI application: which page or view the user is on, form fields the user has filled out, a message the user is in the process of typing, and generally local changes that haven’t been saved or synchronized with the backend. For most kinds of applications this local state gets erased or reset when the application restarts. By contrast the meat and potatoes of the application, i.e., the data that really matter, get sent to the backend and persisted in a database.
For a web3 app there’s also on chain data. Well designed web3 apps store only tiny pieces of critical data on chain—typically this is just ownership information (who owns which asset, unique name mappings, token balances, things like that). For games this may be the critical elements of game state, for voting applications it may be actual votes, etc., but it should only be those elements of the application’s state that require global agreement (i.e., consensus) and/or verifiability.
In web3 apps today each type of state needs to be managed differently by a different component of the application, often written in a different language using different UI and data model frameworks and different programming paradigms, and often managed by different developers or even entirely different teams. As mentioned above this can be a good thing! Modularity and abstraction are good for many reasons and a protocol isn’t a protocol if it only has one frontend. In other words, a clean separation between the on chain and off chain components of an app quite often makes a lot of sense.
The astute reader may have noticed how this thing is related to the previous one. As with removing the backend, at least for certain types of apps and especially for prototyping, I think we can do better here too. I think we can go so far as to remove the on chain component entirely, or rather, embed it into the frontend app as well. The idea and the motivation are the same as in the first thing above: streamline the process of designing and building an app and make it possible for a single developer to build a monolithic web3 app top to bottom using a single language and a single framework.
For those not old enough to remember, it was a really big deal when it became possible to build a full-stack web application, including both the frontend and the backend, using a single language, Javascript, thanks to Node.js and frameworks like Express. There are two reasons this is such a big deal, one technical and one social. The technical one is simply the fact that it’s easier to pass data structures around when the code running on the frontend, generating and interacting with the data, is written in the same language as the code on the backend that’s responsible for managing the data. (And in the case of Javascript the format most commonly used to ship data back and forth between the frontend and the backend is also based on Javascript.) The social one is more straightforward: it’s easier to assemble a team to build a full stack app when they only need to know a single language! And by the same token aspiring full stack engineers could get by with only Javascript.
How could we extend this idea to web3? The basic idea is that rather than having a frontend app written in, say, TypeScript, a backend written in Go or Java, and a separate smart contract written in, say, Solidity and deployed to Ethereum, the app developer could instead write their frontend app without a backend (as described above) and additionally make inline updates to those few, critical components that require on chain consensus. The SDK would handle the rest, and could do so automagically for many use cases. The app developer could indicate in the app’s data model which elements can be local, which need to be saved to the database, and which elements should live on chain, and could additionally indicate things like whether the data structure should be upgradable. Some affordance could be made for concepts like ownership and delegation where necessary. The rest would more or less take care of itself. The SDK would dynamically deploy code and data on chain as necessary and would also handle the messy bits like gas and finality.
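A rough sketch of what such a declarative data model might look like. The tier names, the `field` helper, and the `upgradable` flag are all assumptions invented for illustration, not a real SDK:

```typescript
// Hypothetical declarative data model: each field declares where it lives,
// and the SDK routes reads and writes accordingly.
type Tier = "local" | "backend" | "onchain";

interface FieldSpec { tier: Tier; value: unknown; upgradable: boolean; }

function field(tier: Tier, value: unknown, upgradable = false): FieldSpec {
  return { tier, value, upgradable };
}

// Example app model: a draft message stays local, the display name goes to
// the backend database, and the token balance requires on-chain consensus.
const model: Record<string, FieldSpec> = {
  draftMessage: field("local", ""),
  displayName:  field("backend", "anon"),
  balance:      field("onchain", 0, true),
};

// Stub for the SDK's routing step: group fields by where writes should go.
// The real work (deploying code on chain, gas, finality) would hide behind this.
function partition(m: Record<string, FieldSpec>): Record<Tier, string[]> {
  const out: Record<Tier, string[]> = { local: [], backend: [], onchain: [] };
  for (const [name, spec] of Object.entries(m)) out[spec.tier].push(name);
  return out;
}
```

The design choice worth noticing: the developer states *where data lives* once, in the model, instead of writing separate frontend, backend, and contract code for each field.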
This is just an idea. It’s an idea I’ve had for many years and I’m not at all convinced that it’s impossible—I won’t know for sure until I try to build it—but I’m also not convinced that it is possible, so no promises! As with the other ideas here I think it has merit and it’s definitely worth trying, and Athena could be a great platform for running this sort of experiment.
Thing #3: Hierarchical Consensus 🪆
Most of the time it’s good enough to run something on only one system. This is true of 99.99% of the computing we do. We don’t usually need our computing to be verifiable, i.e., for someone else to be able to independently verify that we ran some algorithm or did some math correctly. This is true for word processing, text messaging, email, image editing, gaming, listening to music, and the vast majority of other things we use our computers and other devices for. In fact I can’t think of a single consumer application today that relies on verifiable computing! That’s because computing today is centralized. The bank doesn’t need to check that you did the math correctly; of course, it relies on its own centralized database and its own computers to manage your account balance.
This is one of the key differentiators of a decentralized system like a blockchain. Since there’s no central operator every actor needs to verify on their own that each transaction was performed correctly (or, in practice, to outsource this task to a trusted third party). There are a few other obscure computation systems based on things like MPC that rely on verifiability, but these are largely experimental today.
The way blockchains work today consensus is all or nothing. A transaction is either run and finalized on Bitcoin or Ethereum, or it isn’t. If it is then, barring an attack or major network failure, substantially every node in the network agrees about the state of the transaction. In a nutshell that’s literally the definition of a blockchain! If this weren’t the case, it wouldn’t be a blockchain.
The problem is that global consensus is slow and expensive, and as mentioned above, it’s not appropriate for many use cases. It makes sense that the entire Bitcoin network needs to come to consensus on the outcome of a $1M transaction. But does the entire world need to agree about your ownership of an obscure NFT or how you voted on some obscure DAO? Probably not.
Blockchains have addressed this in practice through sidechains and rollups. When Ethereum consensus and block space became expensive lots of lower-value transactions migrated to other, less secure networks like Gnosis Chain and Polygon. Today there are many more options in the Ethereum world, and many people are also working on Bitcoin L2s. Layer three chains have even begun to emerge. This is a good trend! It means that applications are beginning to organically discriminate on the basis of price and security requirements. The downside is that, so far anyway, nearly all L2 chains, rollups, and sidechains are extremely centralized and have very different security guarantees from robust L1 chains like Bitcoin and Ethereum. And these differences aren’t always clearly explained or understood, which is risky.
In the real world consensus isn’t all or nothing. Blockchains need tiered or hierarchical consensus, i.e., a form of consensus that more closely matches real world social and economic structures. In the real world not every transaction needs to be visible or verifiable to everyone, everywhere! If friends trade things amongst themselves it’s sufficient that the friend group be able to see and verify the “settlement” of those trades. Within a company it’s often sufficient that the company itself, or certain people or departments within the company, agree about the state of a transaction.
Blockchains aren’t there yet. It’ll take some time. Global consensus is already a very difficult problem—take it from someone who’s been working on it for some time now. And it’s not quite a “solved” problem yet as the proliferation of blockchains, consensus mechanisms, and security models shows. Ultimately, however, the only way blockchain and verifiable computing scale to billions of ordinary users and consumer use cases is through a hierarchical model. That’s how everything scales. The only project I’m aware of that’s working on this problem today is IPC, which is still in testing.
How does all of this relate to programmability? What I’d like to see is an option for a given data model to live at different consensus “tiers,” and/or for a given transaction to settle at a different “tier” or layer based on several factors: urgency, value, how much the transaction is willing to pay, and the context of the data structure or transaction.
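One naive way to express that routing decision as code. The tier names and the thresholds below are made up purely for illustration; a real mechanism would be a market, not hardcoded numbers:

```typescript
// Hypothetical tier-selection heuristic: route a transaction to a consensus
// tier based on its value, urgency, and fee budget. All values are invented.
type SettlementTier = "group" | "l2" | "l1";

interface TxIntent {
  valueUsd: number;   // economic value at stake
  urgent: boolean;    // does the app need fast settlement?
  maxFeeUsd: number;  // how much the transaction is willing to pay
}

function chooseTier(tx: TxIntent): SettlementTier {
  // High-value transfers justify the cost and latency of global consensus.
  if (tx.valueUsd >= 100_000 && tx.maxFeeUsd >= 10) return "l1";
  // Mid-value or urgent transactions settle on a cheaper, faster lower tier.
  if (tx.valueUsd >= 100 || tx.urgent) return "l2";
  // Everything else only needs agreement within a small group.
  return "group";
}
```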
It’ll take a while to design and build a blockchain that’s capable of offering tiered consensus like this but in the meantime we should still try to make progress on the front end, i.e., on adding support to the application layer. In order to be future proof it would be nice if the Athena programming language, SDK, and VM could all support tiered consensus.
A transaction settling or becoming final, or, in application terms, an update to a data structure being committed, shouldn’t be binary, all or nothing. It should happen in stages: a preconfirmation, then a confirmation, then finality at a lower tier and eventually, for certain applications, finality at higher tiers. The entire application development stack has to be designed with this in mind!
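In code, an update might expose its settlement stage to the app instead of a boolean. The stage names below are illustrative, and `advance` stands in for notifications a hypothetical SDK would deliver as the network progresses:

```typescript
// Hypothetical staged-finality model: an update moves through stages rather
// than flipping from "pending" straight to "final". Stage names are invented.
const STAGES = ["pending", "preconfirmed", "confirmed", "final-l2", "final-l1"] as const;
type Stage = (typeof STAGES)[number];

class TrackedUpdate {
  stage: Stage = "pending";
  private listeners: Array<(s: Stage) => void> = [];

  // Apps subscribe to stage changes, e.g. to show optimistic UI at
  // preconfirmation and only mark a transfer irreversible at the top tier.
  onStage(cb: (s: Stage) => void) { this.listeners.push(cb); }

  // Called by the (hypothetical) SDK as each tier settles; no-op once final.
  advance() {
    const i = STAGES.indexOf(this.stage);
    if (i < STAGES.length - 1) {
      this.stage = STAGES[i + 1];
      this.listeners.forEach(cb => cb(this.stage));
    }
  }
}
```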
This is another example of something we won’t realistically be able to ship with v1 of Athena, but it’s worth pursuing down the road. It’s the sort of breakthrough feature we need to attract millions of developers to web3 and to foster the development of bigger, better, more usable, mainstream web3 applications.