What Is the Ethereum Virtual Machine?

Aug 11, 2021

Take Java as an example. In Java, developers write code and compile it into bytecode. This bytecode is then loaded into the Java Virtual Machine (JVM) and executed there. The JVM specification ensures that programs are interoperable across different implementations. Ethereum takes a similar approach: developers write smart contracts in a language such as Solidity, and the contracts are compiled into bytecode and uploaded to the blockchain for execution on the Ethereum Virtual Machine (EVM), which runs on various hardware platforms.

The Ethereum Virtual Machine is a “quasi-Turing complete” machine. A Turing-complete machine can, given enough time and memory, compute anything that is computable. So why is the Ethereum Virtual Machine seen only as quasi-Turing complete? The reason is that every computation is bounded by Gas, which acts as a safety limit on the number of computational steps the machine can perform.

The Ethereum Virtual Machine is at the heart of the Ethereum ecosystem. It handles the deployment and execution of smart contracts. Simple value transfers between externally owned accounts do not need to involve the EVM, but any other operation requires a state update computed by the EVM. The EVM can be thought of as a decentralized computer containing millions of executable objects, each with its own permanent data store.

The Ethereum Virtual Machine uses a stack-based architecture: operands and intermediate values are kept on a stack rather than in registers. All operations are performed as defined in the EVM code, or bytecode. It also has several addressable data components:

● An immutable program code ROM, loaded with the bytecode of the contract to be executed.

● A volatile memory, initialized to zero.

● Permanent storage, which is part of the Ethereum state.

Comparing EVM with Existing Technology

The Ethereum Virtual Machine has a relatively limited operational scope. It is similar to the Java Virtual Machine in that it acts only as a computation engine, providing an abstraction of computation and storage. The Java Virtual Machine enables compatibility across systems because it provides a runtime environment that is agnostic of the underlying operating system and hardware.

The EVM functions similarly, executing its own bytecode instruction set, into which higher-level smart contract programming languages are compiled. The order of execution is organized externally, which means that the EVM has no scheduling capability: Ethereum clients determine the order in which smart contracts execute. The EVM also has no system interface and no physical hardware to interact with, as it is entirely virtual.

Understanding How the Ethereum Virtual Machine Functions

The Ethereum Virtual Machine can be described as a sandboxed, stack-based virtual machine embedded within every Ethereum node. Its primary function is to execute smart contracts, which in turn lets developers build dApps. Developers do not need powerful or expensive hardware of their own to use the EVM, which keeps the barrier to entry low for beginners.

Contracts for the Ethereum Virtual Machine are written in languages such as Solidity or Vyper and then compiled into EVM bytecode. The bytecode runs in a sandbox, so the code is completely isolated from the network, the file system, and other processes of the host node. Each node on the network runs its own EVM instance, and because every instance executes the same instructions in the same way, the nodes can agree on the result.

The Ethereum Virtual Machine allows developers to deploy and run their code in a trustless environment. The execution cost of every instruction run on the EVM is tracked, and each instruction is assigned an associated cost in Gas. To initiate a contract call or any other transaction, the user has to provide enough Ether to pay for the Gas.

Gas solves two issues for the EVM: validators receive the pre-paid amount even if the execution fails, and an execution cannot run longer than the pre-paid amount allows. For each transaction sent to the network, the validator has to ensure the following:

● All information regarding the transaction has to be valid.

● The EVM does not run into any exceptions.

● The sender needs to have enough funds to pay for the execution.
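
The pre-payment rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not client code: the `Tx` fields, the `apply_tx` helper, and the `"validator"` account name are all hypothetical, and details such as fee burning and state reverts are left out.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    sender: str
    gas_limit: int   # maximum gas units the sender pre-pays for
    gas_price: int   # price per gas unit, in wei


def apply_tx(balances, tx, gas_needed, validator="validator"):
    """Charge the sender up front, then refund whatever gas is left over."""
    upfront = tx.gas_limit * tx.gas_price
    # The sender must have enough funds to pay for the execution.
    if balances.get(tx.sender, 0) < upfront:
        return "rejected: insufficient funds"
    balances[tx.sender] -= upfront

    if gas_needed > tx.gas_limit:
        # Execution stops when the pre-paid gas is exhausted; all of it is spent.
        gas_used, status = tx.gas_limit, "failed: out of gas"
    else:
        gas_used, status = gas_needed, "success"

    # Unused gas goes back to the sender; used gas is paid out either way.
    balances[tx.sender] += (tx.gas_limit - gas_used) * tx.gas_price
    balances[validator] = balances.get(validator, 0) + gas_used * tx.gas_price
    return status
```

In this model a failed execution still costs the sender the full pre-paid amount, which is what makes it expensive to submit work that cannot finish.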

Ethereum’s State Transition Function

Ethereum acts like a mathematical function: you give it an input, and it produces an output. Take a look at the function below.

Y(S, T) = S’

Given an old state S and a new set of transactions T, the Ethereum state transition function Y(S, T) produces a new output state S’.
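
As a rough illustration, the function can be written as a pure Python function over a toy state. The state here is assumed to be a plain mapping from account name to balance and each transaction a `(sender, recipient, amount)` tuple; both are simplifications chosen for this sketch, and the name `transition` is hypothetical.

```python
def transition(state, txs):
    """Y(S, T) = S': apply the transactions T to state S and return a new state."""
    new_state = dict(state)  # the old state S is left untouched
    for sender, recipient, amount in txs:
        if new_state.get(sender, 0) < amount:
            continue  # invalid transaction: skipped, it changes nothing
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state
```

The same S and T always yield the same S’, which is what allows independent nodes to arrive at an identical state.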

The State

In the context of Ethereum, the state is a data structure known as a “Modified Merkle Patricia Trie.” What is the role of the Merkle Patricia Trie? It provides a cryptographically authenticated, persistent data structure for storing key-value pairs. It is a mutable data structure that maps 256-bit binary fragments to arbitrary-length binary data. When it comes to Ethereum, the sole requirement of the trie is to provide a single 32-byte value that identifies a given set of key-value pairs.
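
The idea of one 32-byte value standing for a whole set of key-value pairs can be shown with a much simpler construction. The sketch below is not a Merkle Patricia Trie: it is a flat hash over the sorted pairs, it uses SHA-256 from the Python standard library as a stand-in for Ethereum's Keccak-256, and the function name `state_digest` is made up for this example.

```python
import hashlib


def state_digest(pairs):
    """Return one 32-byte value that identifies a set of key-value pairs."""
    h = hashlib.sha256()
    for key in sorted(pairs):  # sort so that insertion order does not matter
        value = pairs[key]
        # Length prefixes keep (b"ab", b"c") distinct from (b"a", b"bc").
        h.update(len(key).to_bytes(4, "big") + key)
        h.update(len(value).to_bytes(4, "big") + value)
    return h.digest()
```

Any change to any pair produces a different digest. What the real trie adds on top of this is the ability to update the digest and prove membership of a single pair without rehashing everything.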

The Ethereum world state sits at the top level, mapping Ethereum addresses to accounts. Each address represents an account that consists of an ether balance, a nonce, account storage, and the account program code. When smart contract code is executed, the EVM is presented with the current block and transaction information. The code of the contract account is loaded into the program code ROM, and the program counter is set to zero. The storage is loaded from the contract account storage, and the memory is set to all zeros.
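
A minimal sketch of this setup, assuming the world state is a plain dictionary keyed by address. The class names, field names, and the address `"0xabc"` used in the usage example are illustrative only and do not come from any client implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int = 0
    nonce: int = 0
    code: bytes = b""
    storage: dict = field(default_factory=dict)


@dataclass
class ExecutionContext:
    code: bytes      # immutable program code (the "ROM")
    storage: dict    # persistent, loaded from the contract account
    pc: int = 0      # program counter starts at zero
    memory: bytearray = field(default_factory=bytearray)  # volatile, starts empty
    stack: list = field(default_factory=list)


def new_context(world_state, address):
    """Prepare a fresh execution context for the contract at address."""
    account = world_state[address]
    return ExecutionContext(code=account.code, storage=account.storage)
```

For example, `new_context({"0xabc": Account(code=bytes.fromhex("6001600101"))}, "0xabc")` yields a context whose program counter is 0 and whose memory and stack are empty.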

What Are Transactions?

Ethereum uses two types of accounts: Normal Accounts (also called Externally Owned Accounts) and Contract Accounts. Normal Accounts are controlled by a private key, which they use to sign transactions such as ETH payments. Contract Accounts are controlled by their code. Transactions themselves also come in two kinds: those that result in message calls and those that result in contract creation.

What Are EVM Instructions?

The Ethereum Virtual Machine stack has a maximum depth of 1,024 items, with each item being a 256-bit word. While the EVM executes a transaction, it also maintains a transient memory (a word-addressed byte array). This memory does not persist between transactions. Compiled smart contracts execute as a sequence of EVM opcodes. These include arithmetic and bitwise logic operations that work on the stack, such as ADD, SUB, AND, and XOR. The EVM also provides blockchain-specific operations such as ADDRESS, BALANCE, and BLOCKHASH.
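
The stack rules can be sketched as follows, assuming the 1,024-item limit and 256-bit words. The `Stack` class and the `op_*` helpers are hypothetical names; only four of the opcodes mentioned above are modeled.

```python
WORD_MASK = 2**256 - 1   # every stack item is a 256-bit word
MAX_DEPTH = 1024         # maximum number of items on the stack


class Stack:
    def __init__(self):
        self.items = []

    def push(self, value):
        if len(self.items) >= MAX_DEPTH:
            raise OverflowError("stack limit reached")
        self.items.append(value & WORD_MASK)  # keep only the low 256 bits

    def pop(self):
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()


def op_add(stack):
    stack.push(stack.pop() + stack.pop())   # wraps modulo 2**256 inside push()


def op_sub(stack):
    a, b = stack.pop(), stack.pop()         # a is the top item
    stack.push(a - b)


def op_and(stack):
    stack.push(stack.pop() & stack.pop())


def op_xor(stack):
    stack.push(stack.pop() ^ stack.pop())
```

Because results are truncated to 256 bits, adding 1 to the largest word wraps around to 0 rather than raising an error.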

Gas And EVM

Transactions from one account to another can carry Ether, binary data known as the payload, or both. There is no central authority, and contracts are executed on Ethereum nodes. This presents the risk of a malicious actor slowing down the network by submitting computationally expensive contracts. To protect the network from such attacks, every opcode has a base Gas cost that must be paid to execute it. Gas therefore limits the amount of work a transaction can demand, and it is consumed step by step as the transaction executes.
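
To see how Gas stops a runaway program, here is a small metered interpreter. It is a sketch that handles only three opcodes (JUMPDEST, PUSH1, and JUMP) with their Yellow Paper costs of 1, 3, and 8 Gas, and it skips the jump-destination validation a real EVM performs. The names `run` and `INFINITE_LOOP` are made up for the example.

```python
# Gas cost of the three opcodes modeled here: JUMPDEST, PUSH1, JUMP.
GAS = {0x5B: 1, 0x60: 3, 0x56: 8}


def run(code, gas):
    """Execute until the code ends or the gas runs out; return (status, steps)."""
    pc, stack, steps = 0, [], 0
    while pc < len(code):
        op = code[pc]
        if GAS[op] > gas:
            return "out of gas", steps
        gas -= GAS[op]
        steps += 1
        if op == 0x60:        # PUSH1: push the next byte onto the stack
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x56:      # JUMP: continue at the offset popped from the stack
            pc = stack.pop()
        else:                 # JUMPDEST: only marks a jump target
            pc += 1
    return "finished", steps


# JUMPDEST, PUSH1 0x00, JUMP: an endless loop back to offset 0.
INFINITE_LOOP = bytes.fromhex("5b600056")
```

Each pass through the loop costs 12 Gas, so with a budget of 100 Gas the program is halted during its ninth pass instead of running forever.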

If we take the example of the opcode KECCAK256, we see that it has a base Gas cost of 30, with a dynamic Gas cost of 6 Gas per word of input. Computationally expensive instructions cost more than straightforward ones. Gas can also be refunded when instructions that reduce the state size are executed. Under the rules in force before the London upgrade (EIP-3529), setting a storage value to zero refunded 15,000 Gas and using the SELFDESTRUCT opcode refunded 24,000 Gas; EIP-3529 reduced the storage refund and removed the SELFDESTRUCT refund.
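
The KECCAK256 pricing above works out as a short formula. The sketch below assumes a word is 32 bytes and that a partial word is charged as a full one; it leaves out any memory expansion cost, and the function name `keccak256_gas` is illustrative.

```python
def keccak256_gas(data_length):
    """Gas for hashing data_length bytes: 30 base plus 6 per 32-byte word."""
    words = (data_length + 31) // 32   # round up to whole words
    return 30 + 6 * words
```

Hashing 32 bytes therefore costs 36 Gas, while hashing 33 bytes already costs 42 Gas because a second word is started.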

What Are Opcodes?

Now that we have seen an example, let us understand opcodes. Opcodes are the instructions that tell the Ethereum Virtual Machine to carry out very specific tasks. Together, they are what allow the Ethereum Virtual Machine to be quasi-Turing complete. Comprehensive descriptions of the opcodes are available in the Ethereum Yellow Paper. As of writing, the EVM can execute a little over 150 opcodes, which can be divided into the following categories:

● Comparison and bitwise logic operations

● Stop and arithmetic operations

● SHA3

● Block information

● Environmental information

● Stack, memory, storage, and flow operations

● Duplication operations

● Push operations

● Logging operations

● Exchange operations

● System operations

Bytecode

Bytecode is crucial to the EVM because it stores the opcodes compactly: each opcode is encoded as a single byte. Let us look at an example to understand bytecode better.

Consider the bytecode 0x6001600101. The first instruction is 0x60, which translates to PUSH1. This tells us that the data is 1 byte long, so the next byte, 0x01, is pushed onto the stack. Now the stack consists of one item. The next instruction is another 0x60 with the same data, so 0x01 is pushed again. Now the stack consists of two identical items. The last instruction, 0x01, translates to ADD. This takes both items from the stack and pushes their sum back onto the stack. Now the stack contains only one item, 0x02.
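
The walkthrough can be reproduced with a tiny interpreter for the byte sequence 0x60 0x01 0x60 0x01 0x01. This is a sketch that understands only PUSH1 (0x60) and ADD (0x01); the function name `execute` is made up for the example, and Gas accounting is ignored.

```python
def execute(bytecode):
    """Run bytecode made of PUSH1 and ADD instructions; return the final stack."""
    stack, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == 0x60:      # PUSH1: the next byte is data, not an instruction
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == 0x01:    # ADD: replace the top two items with their sum
            stack.append((stack.pop() + stack.pop()) % 2**256)
            pc += 1
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack
```

Calling `execute(bytes.fromhex("6001600101"))` leaves a single item, 2, on the stack. Note that the byte 0x01 is read as data when it follows PUSH1 and as the ADD instruction otherwise.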

How To Decompile Bytecodes

Several projects provide tools that help decompile a contract, for example eveem.org and ethervm.io. Even with these tools, parts of the original contract are almost always lost because of compiler optimization. Function names can still be recovered by brute force, however, by comparing the signatures found in the bytecode to datasets containing known function and event names.
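
A sketch of the signature lookup, assuming function selectors appear in the bytecode as the 4-byte operands of PUSH4 (0x63) instructions. The table holds three well-known ERC-20 selectors; real tools consult far larger datasets. The names `KNOWN_SELECTORS` and `find_function_names` are hypothetical, and the bytecode in the usage note is a hand-made fragment rather than a real contract.

```python
# A tiny stand-in for a signature database: selector (hex) -> function signature.
KNOWN_SELECTORS = {
    "a9059cbb": "transfer(address,uint256)",
    "70a08231": "balanceOf(address)",
    "095ea7b3": "approve(address,uint256)",
}


def find_function_names(bytecode):
    """Return (selector, name or None) for every PUSH4 operand in the bytecode."""
    found, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if 0x60 <= op <= 0x7F:             # PUSH1..PUSH32 carry 1..32 data bytes
            size = op - 0x5F
            if op == 0x63:                 # PUSH4: candidate function selector
                selector = bytecode[pc + 1 : pc + 1 + size].hex()
                found.append((selector, KNOWN_SELECTORS.get(selector)))
            pc += 1 + size                 # skip the data so it is not decoded
        else:
            pc += 1
    return found
```

For the fragment `63a9059cbb14` (PUSH4 followed by EQ), the scan reports `transfer(address,uint256)`. Selectors missing from the table come back with `None`, which is why the names of uncommon functions often cannot be recovered.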

Storage

The Ethereum Virtual Machine operates on a stack of 256-bit words. The stack can hold a total of 1,024 items, but only the 16 most recent items are directly accessible at any time. These limitations are quite restrictive, so opcodes end up using the contract memory to pass data. The memory is not saved once the contract has finished executing.

The only way to keep data permanently, so that it is available to later contract executions, is to write it to “storage.” What is Contract Storage? It is a publicly readable database whose values can be read externally without sending a transaction to the contract. However, writing to storage is expensive compared to writing to memory.

SELFDESTRUCT

SELFDESTRUCT is an EVM opcode, available in Solidity as selfdestruct, that is used to remove a contract's code from the blockchain. When a SELFDESTRUCT operation is executed, both the code and the storage are removed from the current state. However, the code may still be kept by nodes that retain historical state, and it will definitely remain in the history of the chain.
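
The effect on the state can be modeled roughly as follows, assuming the state is a dictionary of accounts and that the contract's remaining ether is forwarded to a beneficiary address. The function name, the dictionary layout, and the account names in the usage note are illustrative only.

```python
def self_destruct(state, address, beneficiary):
    """Remove a contract's code and storage; forward its ether to beneficiary."""
    account = state.pop(address)   # code and storage leave the current state
    recipient = state.setdefault(
        beneficiary, {"balance": 0, "code": b"", "storage": {}}
    )
    recipient["balance"] += account["balance"]
    return state
```

After `self_destruct(state, "contract", "alice")`, the key `"contract"` is gone from `state` and its balance has been added to `"alice"`. Nothing in this model touches earlier blocks, which is why the code remains visible in the chain's history.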

Creating A Smart Contract

Smart contracts on Ethereum are most commonly written in a programming language called Solidity, whose syntax resembles that of languages such as JavaScript and C++. Other languages, such as Vyper and Bamboo, can also be used to write smart contracts. The Ethereum Virtual Machine cannot execute Solidity source code directly. Instead, a contract is compiled into bytecode, which is a sequence of opcodes.

An Example of a Smart Contract

Let’s look at a simple example of what a smart contract has to specify. Let’s say you want to pool money with your family to purchase a few things. The smart contract would have to include several rules:

● First, you and your family members will have to create individual accounts.

● Each member will deposit some money from their accounts into a holding account.

● Each member will agree that no one can take money out of the holding account on their own.

● You can withdraw money from the holding account only if all members provide a digital signature.
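
The rules above can be modeled in a few lines. The sketch is written in Python rather than Solidity and is not deployable contract code: the class and method names are hypothetical, and a simple approval recorded per member stands in for a digital signature.

```python
class FamilyHoldingAccount:
    """Toy model: funds leave only when every member has approved."""

    def __init__(self, members):
        self.members = set(members)
        self.balance = 0
        self.approvals = set()

    def deposit(self, member, amount):
        if member not in self.members:
            raise PermissionError("only family members can deposit")
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def approve(self, member):
        if member not in self.members:
            raise PermissionError("only family members can approve")
        self.approvals.add(member)   # stands in for a digital signature

    def withdraw(self, amount):
        if self.approvals != self.members:
            raise PermissionError("every member must approve first")
        if amount > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= amount
        self.approvals.clear()       # the next withdrawal needs fresh approvals
        return amount
```

With two members, a withdrawal attempted after only one approval raises `PermissionError`; once both have approved, it succeeds and the approvals are reset.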

What if you want to create a different smart contract that deals with buying groceries for the house? That smart contract would need to specify the following:

● The weekly budget for groceries

● The store or address of the store from where the groceries need to be bought

● The address where you want the groceries delivered

Smart Contract Deployment

Let’s go through the process of deploying a smart contract. To deploy a smart contract, a transaction is created that has no “to” address. The contract bytecode is added as the input data, and the first part of it acts as the constructor. The constructor writes the initial variables to storage and returns the runtime bytecode. The constructor bytecode therefore runs only once, at deployment, while the runtime bytecode runs on every contract call.

The bytecode can be split into the following parts:

● Constructor

● Runtime

● Metadata

Finally, the Solidity compiler creates a Swarm hash of the contract metadata and appends it to the end of the runtime bytecode. Swarm is both a distributed storage platform and a content distribution service. The Ethereum Virtual Machine never interprets the hash as opcodes, because execution cannot reach its location.
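
The metadata section can be separated from the executable part, assuming the layout used by the Solidity compiler: the metadata is CBOR-encoded and the last two bytes of the bytecode give its length as a big-endian integer. The function name `split_metadata` is hypothetical, and the sketch does not check that the input really ends in a metadata section.

```python
def split_metadata(runtime_bytecode):
    """Split runtime bytecode into (executable code, CBOR metadata)."""
    # The final two bytes hold the length of the metadata that precedes them.
    length = int.from_bytes(runtime_bytecode[-2:], "big")
    end = len(runtime_bytecode) - 2
    return runtime_bytecode[: end - length], runtime_bytecode[end - length : end]
```

Decoding the returned metadata bytes with a CBOR library would reveal the hash itself. Depending on the compiler version, that hash may be a Swarm hash or an IPFS hash.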

Conclusion

The Ethereum Virtual Machine is the bedrock of Ethereum, acting as a decentralized computer that contains millions of executable objects. The EVM is central to the deployment of smart contracts and the execution of their functions. Through mechanisms such as sandboxed execution and Gas metering, it keeps the distributed ledger usable and protects it from runaway or malicious code.