In-depth Explanation of Kakarot zkEVM: Starknet's Journey to EVM Compatibility

TL;DR

A virtual machine is a software simulation of a computer system that provides an execution environment for programs. It can simulate various hardware devices and allow programs to run in a controlled and compatible environment. The Ethereum Virtual Machine (EVM) is a stack-based virtual machine used for executing Ethereum smart contracts.
zkEVM is an EVM that integrates zero-knowledge proof (validity proof) technology. It uses zero-knowledge proofs to verify the EVM's execution, so that validators do not all have to re-execute it. There are various zkEVM products on the market, each with its own methods and designs.
The need for zkEVM arises from the demand for a virtual machine that supports smart contract execution on Layer 2. Additionally, some projects choose to use zkEVM to leverage the extensive user ecosystem of the EVM and design instruction sets that are more friendly to zero-knowledge proofs.
Kakarot is a zkEVM implemented on Starknet using the Cairo language. It simulates the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Kakarot faces challenges in compatibility with the Starknet account system, in cost optimization, and in stability, the last because the Cairo language is still in the experimental stage.
Warp is a converter that transforms Solidity code into Cairo code, providing compatibility at the high-level language level. On the other hand, Kakarot provides compatibility at the EVM level by implementing EVM opcodes and precompiles.

What is a virtual machine?

To explain what a virtual machine is, we must first explain the execution process of computers under the mainstream von Neumann architecture. Various programs running on computers are usually written in high-level languages and undergo multiple transformations to generate machine-readable machine code for execution. Depending on the method of transformation into machine code, high-level languages can be roughly divided into compiled languages and interpreted languages.

In a compiled language, a compiler converts the finished high-level code into machine code and produces an executable file, which can then be run many times with high efficiency. The advantages of compiled languages are fast execution, because the code has already been converted into machine code at compile time, and the ability to run programs without a compiler, so users do not need to install additional software. Common compiled languages include C, C++, Go, etc.

In contrast, code in an interpreted language is executed line by line by an interpreter and must be re-translated each time it runs. The advantages of interpreted languages are high development efficiency and easy debugging; the drawback is relatively slower execution. Common interpreted languages include Python, JavaScript, Ruby, etc.

It is important to note that a language is not inherently compiled or interpreted, although its initial design may lean one way. C/C++ is mostly compiled, but can also be interpreted (Cint, Cling). Many traditionally interpreted languages are now compiled into intermediate code for execution on a virtual machine (Python, Lua).
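
As a small illustration of the last point, the sketch below uses CPython's built-in compile() and the standard dis module to show that Python source is first translated into bytecode, which the Python virtual machine then executes. The expression and variable names are arbitrary examples, and the exact instructions printed vary between Python versions.

```python
import dis

source = "a * b + 1"

# Step 1: translate the source text into a code object holding bytecode.
code = compile(source, "<example>", "eval")

# The bytecode is a plain bytes object, not native machine code.
bytecode = code.co_code

# Step 2: the CPython virtual machine executes that bytecode.
result = eval(code, {"a": 6, "b": 7})

if __name__ == "__main__":
    dis.dis(code)   # human-readable listing of the bytecode instructions
    print(result)   # 43
```
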
Knowing the execution process of physical machines, let's now talk about virtual machines.

A virtual machine typically provides a virtual computing environment by simulating different hardware devices. Different virtual machines can simulate different hardware devices, but usually include CPU, memory, hard disk, network interfaces, etc.

Taking the Ethereum Virtual Machine (EVM) as an example, the EVM is a stack-based virtual machine used for executing Ethereum smart contracts. The EVM provides a virtual computing environment by simulating CPU, memory, storage, and stack, among other hardware devices.

Specifically, the EVM uses a stack to store data and execute instructions. Its instruction set consists of opcodes for arithmetic operations, logical operations, storage operations, jump operations, and so on. These instructions operate on the EVM's stack to carry out the execution of smart contracts.

The memory and storage simulated by the EVM are devices used to store the state and data of smart contracts. The EVM treats memory and storage as two different areas and can access the state and data of smart contracts by reading from and writing to memory and storage.
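
The difference between the two areas can be sketched with a toy model. This is not real EVM code: the class and method names are hypothetical, and it only illustrates that memory is fresh scratch space for each call while storage persists between calls, with unset storage slots reading as zero and data handled in 32-byte words.

```python
class ToyContract:
    """Toy model of persistent storage vs. per-call memory (not real EVM code)."""

    def __init__(self):
        self.storage = {}  # persists across calls, like EVM storage

    def call(self, value: int) -> int:
        memory = bytearray(32)  # fresh, zeroed scratch space for every call
        memory[0:32] = value.to_bytes(32, "big")  # write one 32-byte word to memory
        previous = self.storage.get(0, 0)  # read slot 0; unset slots read as 0
        self.storage[0] = int.from_bytes(memory[0:32], "big")  # persist the word
        return previous  # memory is discarded when the call returns


contract = ToyContract()
first = contract.call(5)   # nothing stored yet, so slot 0 reads as 0
second = contract.call(9)  # slot 0 still holds the 5 written by the first call
```

After the two calls, first is 0 and second is 5: the value survived in storage even though each call started with empty memory.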

The stack simulated by the EVM is used to store the operands and results of instructions. Most of the instructions in the EVM's instruction set are stack-based, meaning they read operands from the stack and push results back to the stack.

In summary, the EVM provides a virtual computing environment by simulating CPU, memory, storage, and stack, allowing the execution of smart contract instructions and storage of smart contract state and data. In practice, the EVM loads the bytecode of smart contracts into memory and executes the logic of smart contracts by executing the instruction set. The EVM effectively replaces the operating system + hardware part in the diagram.
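
To make the execution loop concrete, here is a minimal sketch of a stack machine that understands four opcodes. The opcode byte values (STOP, ADD, MUL, PUSH1) are the real EVM ones, and words wrap at 256 bits as in the EVM, but everything else is a simplification: there is no gas accounting, no memory or storage, no stack-depth limit, and the function name run is just an illustrative choice.

```python
WORD = 2**256  # EVM words are 256 bits wide; arithmetic wraps around

STOP, ADD, MUL, PUSH1 = 0x00, 0x01, 0x02, 0x60  # real EVM opcode values


def run(bytecode: bytes) -> list:
    """Execute a tiny subset of EVM opcodes and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to execute
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == STOP:
            break
        elif op == PUSH1:
            stack.append(bytecode[pc])  # the byte after PUSH1 is its operand
            pc += 1
        elif op == ADD:
            a, b = stack.pop(), stack.pop()  # read operands from the stack
            stack.append((a + b) % WORD)     # push the result back
        elif op == MUL:
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % WORD)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP  computes (2 + 3) * 4
program = bytes([0x60, 2, 0x60, 3, 0x01, 0x60, 4, 0x02, 0x00])
final_stack = run(program)  # [20]
```

Note that this interpreter itself runs on the Python virtual machine, so it is already a small virtual machine on top of another one.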

The design process of the EVM is clearly bottom-up: the simulated hardware environment (stack, memory) was determined first, and a custom set of assembly instructions (opcodes) and bytecode was then designed for that environment. Although the assembly instruction set is meant to be human-readable, it involves a lot of low-level knowledge and places high demands on developers. Developing with it directly is cumbersome, so a high-level language is needed to hide the obscure low-level calls and give developers a better experience. Because the assembly instruction set is custom-designed, it is difficult to use traditional high-level languages directly for the EVM. The Ethereum community has therefore designed two compiled high-level languages, Solidity and Vyper. Solidity is widely used, while Vyper is an EVM high-level language designed by Vitalik to address some deficiencies in Solidity. However, Vyper has seen far less adoption in the community than Solidity.

What is zkEVM?

Simply put, zkEVM is an EVM that uses zero-knowledge proof (validity proof) technology to verify the EVM's execution efficiently and cheaply, without requiring all validators to re-execute it.

There are many zkEVM products in the market, and the competition is fierce. The main players include Starknet, zkSync, Scroll, Taiko, Linea, Polygon zkEVM (formerly Polygon Hermez), etc., which Vitalik has classified into 5 types (1, 2, 2.5, 3, 4). For specific details, please refer to Vitalik's blog.

Why do we need zkEVM?

This question needs to be viewed from two perspectives.

The earliest zk Rollups, such as zkSync Lite and Loopring, could implement only simple transfer and trading functions. However, people had become accustomed to the Turing-complete EVM on Ethereum and began to call for a programmable virtual machine on Layer 2 that could support diverse applications. The demand for writing smart contracts is the first reason.

Because some aspects of the EVM's design are unfriendly to generating zero-knowledge proofs/validity proofs, some players use proof-friendly instruction sets at the lower level, such as Starknet's Cairo Assembly and zkSync's Zinc Instruction. However, nobody wants to give up the extensive user ecosystem of the EVM, so these projects maintain compatibility with the EVM at a higher level; these are the Type 3 and Type 4 zkEVMs. Other players insist on using the EVM's original opcode instruction set and focus on generating proofs for those opcodes more efficiently; these are the Type 1 and Type 2 zkEVMs. The extensive ecosystem of the EVM is the second reason.

Kakarot: A virtual machine on a virtual machine?

Why can we create another virtual machine on a virtual machine? This is a common occurrence for computer professionals, but it may not be so obvious to users who are not familiar with computers. It is actually quite easy to understand. It's like building with building blocks. As long as the lower layers are solid enough (having a Turing-complete execution environment), you can add blocks infinitely. However, no matter how many layers are stacked, the final execution still needs to be handed over to the physical hardware at the lowest layer, so increasing the number of layers will reduce efficiency. At the same time, as the design of different blocks (virtual machines) varies, the higher the stack, the greater the possibility of collapse (runtime errors), which requires higher technical expertise to support.

Kakarot is an EVM implemented on Starknet using the Cairo language. It simulates the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Implementing the EVM is not in itself a difficult task: many implementations already exist, of which Go-Ethereum is the most widely used, and others are written in Python, Java, JavaScript, Rust, etc.

The technical challenges of Kakarot zkEVM lie in the fact that the protocol exists as a contract on Starknet, which brings three key issues.

Compatibility: Starknet uses a completely different account system from Ethereum. In Ethereum, accounts are divided into EOA (Externally Owned Account) and CA (Contract Account), while Starknet supports native account abstraction, where all accounts are contract accounts. Additionally, due to the use of different cryptographic algorithms, users cannot generate the same addresses in Starknet using the same entropy as in Ethereum.
Cost: Since Kakarot zkEVM exists as a contract on the chain, there are high requirements for code implementation, and optimization based on Gas is necessary to reduce interaction costs.
Stability: Unlike traditional high-level languages such as Golang, Rust, and Python, the Cairo language is still in the experimental stage. From Cairo 0 to Cairo 1 to Cairo 2 (or Cairo 1 version 2, if you prefer), the official team is still modifying the language's features. At the same time, the Cairo VM has not undergone sufficient testing, and a large-scale rewrite in the future cannot be ruled out.

The Kakarot protocol consists of five main components (the GitHub documentation mentions four, excluding EOA, but this article has been adjusted for better understanding):

Kakarot (Core): Responsible for executing Ethereum-style transactions and providing corresponding Starknet accounts for Ethereum users.
Contract Accounts: Equivalent to CA in Ethereum, responsible for storing bytecode and variable states of contracts.
Externally Owned Accounts: Equivalent to EOA in Ethereum, responsible for forwarding Ethereum transactions to Kakarot Core.
Account Registry: Stores the mapping between Ethereum accounts and Starknet accounts.
Blockhash Registry: BLOCKHASH is a special opcode that requires past block data, which Kakarot cannot obtain directly on-chain. This component stores the mapping between block_number and block_hash; administrators write to it, and Kakarot Core reads from it.
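
The Blockhash Registry described in the list above can be sketched as a small admin-gated lookup table. This is only an illustration of the idea, not Kakarot's actual Cairo contract: the class, method names, and the string used to identify the administrator are all hypothetical. Returning 0 for an unknown block is an assumption that mirrors how the EVM's BLOCKHASH opcode returns zero for blocks it cannot serve.

```python
class BlockhashRegistry:
    """Hypothetical sketch of an admin-written block_number -> block_hash map."""

    def __init__(self, admin: str):
        self.admin = admin
        self._hashes = {}

    def set_blockhashes(self, caller: str, entries: dict) -> None:
        # Only the administrator may write to the registry.
        if caller != self.admin:
            raise PermissionError("only the admin can write block hashes")
        self._hashes.update(entries)

    def get_blockhash(self, block_number: int) -> int:
        # Assumption: blocks that were never registered read as 0.
        return self._hashes.get(block_number, 0)


registry = BlockhashRegistry(admin="admin")
registry.set_blockhashes("admin", {100: 0xABC})
known = registry.get_blockhash(100)    # value written by the admin
unknown = registry.get_blockhash(101)  # never written, so 0
```

The core would call get_blockhash when it executes the BLOCKHASH opcode, which is why the data has to be supplied ahead of time by a trusted writer.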

According to feedback from Kakarot CEO Elias Tazartes, the team's latest version has abandoned the Account Registry design and instead uses a mapping from a 31-byte Starknet address to a 20-byte EVM address to store the correspondence. In the future, to improve interoperability and allow Starknet contracts to register their own EVM addresses, the Account Registry design may be reconsidered.
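
A minimal sketch of such a mapping follows, assuming the byte widths quoted above (31 bytes for the Starknet address, 20 bytes for the EVM address). The class and method names are hypothetical and the real implementation lives in Cairo contract storage; the point is only that the correspondence is a simple keyed lookup with range checks on both sides.

```python
EVM_ADDRESS_BYTES = 20       # Ethereum addresses are 20 bytes
STARKNET_ADDRESS_BYTES = 31  # width quoted in the text above (assumption)


class AddressMapping:
    """Hypothetical sketch: Starknet address -> EVM address lookup."""

    def __init__(self):
        self._evm_by_starknet = {}

    def register(self, starknet_address: int, evm_address: int) -> None:
        # Reject values that do not fit the stated address widths.
        if not 0 <= starknet_address < 2 ** (8 * STARKNET_ADDRESS_BYTES):
            raise ValueError("Starknet address does not fit in 31 bytes")
        if not 0 <= evm_address < 2 ** (8 * EVM_ADDRESS_BYTES):
            raise ValueError("EVM address does not fit in 20 bytes")
        self._evm_by_starknet[starknet_address] = evm_address

    def evm_address_of(self, starknet_address: int):
        # Returns None when no EVM address has been registered.
        return self._evm_by_starknet.get(starknet_address)


mapping = AddressMapping()
mapping.register(0x1234, 0xDEADBEEF)
found = mapping.evm_address_of(0x1234)
missing = mapping.evm_address_of(0x9999)
```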

Compatibility with EVM on Starknet: What are the differences between Warp and Kakarot?

In terms of the zkEVM types defined by Vitalik, Warp belongs to Type 4, while Kakarot currently belongs to Type 2.5.

Warp is a transpiler that transforms Solidity code into Cairo code. It lets Solidity developers keep their existing workflow without having to learn a new language like Cairo. For many projects, Warp lowers the entry barrier to the Starknet ecosystem, as there is no need to rewrite a large amount of engineering code in Cairo.

Transpilation is conceptually simple, but it offers the weakest compatibility. Some Solidity code cannot be translated well into Cairo, and when the code involves account systems, cryptographic algorithms, or similar logic, the source must be modified before the migration can be completed. The specific unsupported features are listed in the Warp documentation. For example, many projects use different execution logic for EOAs and contract accounts, but all accounts in Starknet are contract accounts, so this part of the code needs to be modified before transpilation.

Warp provides compatibility at the high-level language level, while Kakarot provides compatibility at the EVM level.

By completely reimplementing the EVM, including its opcodes and precompiles, Kakarot achieves higher native compatibility. After all, executing in the same virtual machine (EVM) is always more compatible than executing in a different virtual machine (Cairo VM). The Account Registry and Blockhash Registry cleverly shield the differences between the two systems, minimizing the friction of user migration.
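
For readers unfamiliar with precompiles: they are built-in functions that live at fixed addresses and are executed natively by the EVM implementation instead of as bytecode. The sketch below shows the dispatch idea with two real Ethereum precompiles, SHA-256 at address 0x02 and identity at address 0x04. The function names and the run_bytecode callback are hypothetical, and gas costs are omitted; an EVM implementation such as Kakarot has to supply its own version of each precompile in its host language.

```python
import hashlib


def sha256_precompile(data: bytes) -> bytes:
    # Ethereum precompile at address 0x02: SHA-256 of the input.
    return hashlib.sha256(data).digest()


def identity_precompile(data: bytes) -> bytes:
    # Ethereum precompile at address 0x04: returns the input unchanged.
    return data


PRECOMPILES = {0x02: sha256_precompile, 0x04: identity_precompile}


def call(address: int, data: bytes, run_bytecode) -> bytes:
    """Dispatch a call to a native precompile, or fall back to bytecode.

    run_bytecode is a hypothetical callback standing in for the interpreter.
    """
    handler = PRECOMPILES.get(address)
    if handler is not None:
        return handler(data)  # executed natively, no opcode interpretation
    return run_bytecode(address, data)


echoed = call(0x04, b"abc", run_bytecode=None)
digest = call(0x02, b"", run_bytecode=None)
```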

Kakarot Team

Thanks to the Kakarot team, especially Elias Tazartes, for their valuable feedback on this article. Thank you, sir!