Summary

The provided content explains the basics of Solidity bytecode and opcodes within the context of the Ethereum Virtual Machine (EVM), detailing how Solidity code is compiled into bytecode that the EVM can execute.

Abstract

The article delves into the foundational aspects of Solidity bytecode and opcodes, which are crucial for understanding how smart contracts operate within the Ethereum ecosystem. It outlines the process of compiling Solidity code into bytecode, which is a hexadecimal representation that the EVM can interpret. The author notes the scarcity of educational resources on EVM opcodes and emphasizes the importance of this knowledge for developers who wish to engage with the EVM at a deeper level. The article also touches on the EVM's nature as a stack machine, the significance of hexadecimal values in EVM interactions, and the different storage options available in the EVM, including stack, memory, and storage, each with varying gas costs. Additionally, the concept of assembly language in Solidity is introduced as a means to optimize gas usage and perform operations not directly supported by Solidity.

Opinions

The author expresses surprise at the limited availability of resources on EVM opcodes, suggesting that the subject might be considered too technical for some.
There is a suggestion that while understanding opcodes is not necessary for beginning smart contract development, it can be beneficial for debugging and optimizing gas usage.
The author implies that learning about bytecode and opcodes can be advantageous despite the additional complexity, as it provides a deeper insight into the functioning of smart contracts and the EVM.
The article conveys that the EVM operates as a stack machine, which processes instructions in a Last In, First Out (LIFO) manner, a concept that may be unfamiliar to developers accustomed to other programming paradigms.
The author seems to advocate for the utility of Solidity's assembly language for developers looking to minimize gas costs and implement functionalities beyond Solidity's capabilities.

Solidity Bytecode and Opcode Basics

As we go deeper into writing smart contracts, we will come across terms like “PUSH1”, “SSTORE”, “CALLVALUE”, etc. What are they, and should we even care about them?

To understand these commands, we have to go deeper into the Ethereum Virtual Machine (EVM). I was surprised that there were very few resources on this subject when I googled around. Perhaps they were too technical? In this article, I’ll try to explain some EVM basics as simply as I can.

Like many other popular programming languages, Solidity is a high-level programming language. We understand it, but the machine doesn’t. When we install an Ethereum client such as geth, it also comes with the Ethereum Virtual Machine, a lightweight, sandboxed runtime (think of it as a tiny operating system) created specifically to run smart contracts.

When we compile Solidity code with the solc compiler, it translates our code into bytecode, the only thing the EVM understands.

Let us take a very simple contract for example:

pragma solidity ^0.4.11;

contract MyContract {
    uint i = (10 + 2) * 2;
}

If we run this code in the Remix browser IDE and click on the contract details, we see lots of information.

In this case, the compiled code is:

60606040525b600080fd00a165627a7a7230582012c9bd00152fa1c480f6827f81515bb19c3e63bf7ed9ffbb5fda0265983ac7980029

This long value is the hexadecimal representation of the final contract, also known as bytecode. Under the “Web3 Deploy” section of Remix, we see:

...
   {
     from: web3.eth.accounts[0], 
     data: '0x606060405260186000553415601357600080fd5b5b60368060216000396000f30060606040525b600080fd00a165627a7a7230582012c9bd00152fa1c480f6827f81515bb19c3e63bf7ed9ffbb5fda0265983ac7980029', 
     gas: '4300000'
   }, function (e, contract){
    console.log(e, contract);
    if (typeof contract.address !== 'undefined') {
         console.log('Contract mined! address: ' + contract.address + ' transactionHash: ' + contract.transactionHash);
    }
 })

In simple terms, when we deploy the contract, we simply send the hexadecimal string in the data field, with the recommended gas of 4300000.
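As a quick sanity check on the two hex strings above, here is a small JavaScript sketch that compares them. It suggests that the data field is the compiled code from earlier with a short extra prefix in front. My reading is that the prefix is setup code that runs once at deployment, but treat that interpretation as an assumption. The variable and helper names are my own.

```javascript
// Hex strings copied from the Remix output above, without the "0x" prefix.
const compiled =
  "60606040525b600080fd00a165627a7a7230582012c9bd00152fa1c480f6827f81515bb19c3e63bf7ed9ffbb5fda0265983ac7980029";
const deployData =
  "606060405260186000553415601357600080fd5b5b60368060216000396000f30060606040525b600080fd00a165627a7a7230582012c9bd00152fa1c480f6827f81515bb19c3e63bf7ed9ffbb5fda0265983ac7980029";

// Two hex characters encode one byte.
const byteLength = (hex) => hex.length / 2;

// How many bytes sit in front of the compiled code inside the data field.
const prefixBytes = byteLength(deployData) - byteLength(compiled);

console.log(deployData.endsWith(compiled)); // true
console.log(byteLength(compiled)); // 54, which is 0x36
console.log(prefixBytes); // 33, which is 0x21
```

The numbers 0x36 and 0x21 also show up as PUSH1 values right before CODECOPY in the opcode listing further down, which fits the idea that the prefix copies the remaining 54 bytes into place.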

We have to start thinking in hexadecimal if we want to talk to the EVM. Ever wonder why there is a “0x” in front of your wallet address or transaction hash? That’s right, anything beginning with “0x” simply means the value is written in hexadecimal. The “0x” is only a notation for human readers: the EVM itself works on raw bytes, so the prefix never appears inside the bytecode.

We also see the operation codes (aka opcodes):
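To get a feel for the notation, here is a minimal JavaScript sketch using only the built-in parseInt and toString. It shows that the same value can be written with or without the “0x”, and that 0x18 (which appears in the opcode listing below) is the decimal 24 that our contract computes from (10 + 2) * 2.

```javascript
// parseInt with radix 16 accepts the value with or without the "0x" prefix.
const withPrefix = parseInt("0x18", 16);
const withoutPrefix = parseInt("18", 16);

console.log(withPrefix, withoutPrefix); // 24 24

// Going the other way: decimal 24, the value of (10 + 2) * 2, back to hex.
const asHex = "0x" + (24).toString(16);
console.log(asHex); // 0x18
```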

PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x18 PUSH1 0x0 SSTORE CALLVALUE ISZERO PUSH1 0x13 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST JUMPDEST PUSH1 0x36 DUP1 PUSH1 0x21 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN STOP PUSH1 0x60 PUSH1 0x40 MSTORE JUMPDEST PUSH1 0x0 DUP1 REVERT STOP LOG1 PUSH6 0x627A7A723058 KECCAK256 SLT 0xc9 0xbd STOP ISZERO 0x2f LOG1 0xc4 DUP1 0xf6 DUP3 PUSH32 0x81515BB19C3E63BF7ED9FFBB5FDA0265983AC798002900000000000000000000

Opcodes are the low-level, human-readable instructions of the program. Every opcode has a hexadecimal counterpart, e.g. “MSTORE” is “0x52”, “SSTORE” is “0x55”, etc. The pyethereum GitHub repo and the older Ethereum yellow paper are good references for all the EVM opcodes and their hexadecimal values.

The EVM is also a stack machine. To explain simply, imagine stacking up slices of bread in a microwave: the LAST slice you put in is the FIRST one you take out. In computer science jargon, we call this LIFO (Last In, First Out).

In normal arithmetic, we write our equation this way:
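To see how the hexadecimal values map back to opcode names, here is a rough disassembler sketch in JavaScript. The table is deliberately partial (only the opcodes needed for the first few bytes of our deployment data), and the disassemble helper is my own illustration, not a real tool.

```javascript
// Partial opcode table: only the opcodes that occur in the snippet below.
const OPCODES = {
  0x00: "STOP", 0x15: "ISZERO", 0x34: "CALLVALUE", 0x39: "CODECOPY",
  0x52: "MSTORE", 0x55: "SSTORE", 0x57: "JUMPI", 0x5b: "JUMPDEST",
  0x60: "PUSH1", 0x80: "DUP1", 0xf3: "RETURN", 0xfd: "REVERT",
};

// Hypothetical helper: turn a hex string into a list of readable instructions.
function disassemble(hex) {
  const bytes = hex.match(/../g).map((pair) => parseInt(pair, 16));
  const out = [];
  for (let pc = 0; pc < bytes.length; pc++) {
    const name = OPCODES[bytes[pc]] || "UNKNOWN_0x" + bytes[pc].toString(16);
    if (name === "PUSH1" && pc + 1 < bytes.length) {
      // PUSH1 is followed by one byte of data, not by another opcode.
      pc++;
      out.push("PUSH1 0x" + bytes[pc].toString(16));
    } else {
      out.push(name);
    }
  }
  return out;
}

// The first 19 bytes of the data field shown earlier.
const firstBytes = "606060405260186000553415601357600080fd";
console.log(disassemble(firstBytes).join(" "));
// PUSH1 0x60 PUSH1 0x40 MSTORE PUSH1 0x18 PUSH1 0x0 SSTORE CALLVALUE ISZERO PUSH1 0x13 JUMPI PUSH1 0x0 DUP1 REVERT
```

The output matches the beginning of the opcode listing from Remix, which is a nice confirmation that the listing is just the bytecode read one byte at a time.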

// Answer is 14. We do multiplication before addition.
10 + 2 * 2

In a stack machine, it works on the LIFO principle:

2 2 * 10 +

It means: put “2” on the stack first, followed by another “2”, then apply the multiplication. The result is “4” sitting on top of the stack. Now push the number “10” on top of the “4” and finally add the two numbers together. The final value on the stack is 14. This type of arithmetic is called postfix notation or Reverse Polish Notation.

The act of putting data on the stack is called the “PUSH” instruction, and the act of removing data from the stack is called the “POP” instruction. It’s obvious that the most common opcode in our example above is “PUSH1”, which means putting 1 byte of data onto the stack.
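The walk-through above can be sketched in a few lines of JavaScript, using a plain array as the stack. The evalPostfix function is my own illustration and only understands “+” and “*”.

```javascript
// Evaluate a space-separated postfix expression using an array as the stack.
function evalPostfix(expression) {
  const stack = [];
  for (const token of expression.trim().split(/\s+/)) {
    if (token === "+" || token === "*") {
      // An operator takes the top two values off the stack and puts the result back.
      const right = stack.pop();
      const left = stack.pop();
      stack.push(token === "+" ? left + right : left * right);
    } else {
      // Anything else is treated as a number and goes on top of the stack.
      stack.push(Number(token));
    }
  }
  return stack.pop();
}

console.log(evalPostfix("2 2 * 10 +")); // 14
console.log(evalPostfix("10 2 + 2 *")); // 24, the (10 + 2) * 2 from our contract
```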

So, this instruction:

PUSH1 0x60

means putting the 1-byte value “0x60” on the stack. Coincidentally, the hexadecimal value for “PUSH1” happens to be “0x60” as well. Dropping the optional “0x”, we could write this instruction in bytecode as “6060”.

Let us go a bit further.

PUSH1 0x60 PUSH1 0x40 MSTORE

Looking at our favourite pyethereum opcode chart again, we see that MSTORE (0x52) takes 2 inputs and produces no output. The opcodes above mean:

PUSH1 (0x60): put 0x60 on the stack.
PUSH1 (0x40): put 0x40 on the stack.
MSTORE (0x52): take both values off the stack and write 0x60, as a 32-byte word, to memory position 0x40.

The resulting bytecode is:

6060604052
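Going from opcodes to bytecode is mostly a lookup. Here is a tiny assembler sketch in JavaScript that handles only PUSH1, MSTORE and SSTORE; the assemble function and its table are my own illustration.

```javascript
// Hex values for the few opcodes this sketch understands.
const OPCODE_BYTES = { PUSH1: "60", MSTORE: "52", SSTORE: "55" };

// Hypothetical helper: turn "PUSH1 0x60 ..." into a bytecode hex string.
function assemble(source) {
  let hex = "";
  for (const token of source.trim().split(/\s+/)) {
    if (token.startsWith("0x")) {
      // Data for the preceding PUSH1: one byte, so always two hex digits.
      hex += token.slice(2).padStart(2, "0");
    } else if (token in OPCODE_BYTES) {
      hex += OPCODE_BYTES[token];
    } else {
      throw new Error("Unknown token: " + token);
    }
  }
  return hex;
}

console.log(assemble("PUSH1 0x60 PUSH1 0x40 MSTORE")); // 6060604052
console.log(assemble("PUSH1 0x18 PUSH1 0x0 SSTORE")); // 6018600055
```

Both results can be found verbatim at the start of the data field from the “Web3 Deploy” section.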

In fact, we see this magic number “6060604052” at the beginning of any Solidity bytecode from compilers of this generation (newer compiler versions start with “6080604052” instead), because it is how the smart contract bootstraps.

To further complicate matters, 0x40 and 0x60 must not be read as the decimal numbers 40 and 60. Since they are hexadecimal, 0x40 equals 64 (4 x 16¹) and 0x60 equals 96 (6 x 16¹) in decimal.

In short, what “PUSH1 0x60 PUSH1 0x40 MSTORE” does is write the number 96 into the 32-byte slot that starts at byte 64 of memory. Solidity uses that slot as its free memory pointer: the first 64 bytes (0x00 to 0x3f) are scratch space, the next 32 bytes (0x40 to 0x5f) hold the pointer itself, and the pointer’s initial value of 96 (0x60) says that free memory starts right after them.

In the EVM, there are 3 places to store data. Firstly, the stack: we’ve just used the “PUSH” opcode to put data there, as in the example above. Secondly, memory (like RAM), which we write to with the “MSTORE” opcode. Lastly, storage (like disk), which we write to with “SSTORE”. Writing to storage costs the most gas, and putting data on the stack costs the least.
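To make the effect of MSTORE concrete, here is a sketch that models memory as a plain byte array. In this model MSTORE takes the position from the top of the stack (0x40) and the value from just below it (0x60), then writes the value as a 32-byte word at that position. The mstore and mload helpers and the 128-byte array size are my own simplifications.

```javascript
// Memory modelled as a plain byte array, initially all zeros.
const memory = new Uint8Array(128);

// MSTORE writes a 32-byte big-endian word at the given offset.
function mstore(offset, value) {
  const word = BigInt(value).toString(16).padStart(64, "0");
  for (let i = 0; i < 32; i++) {
    memory[offset + i] = parseInt(word.slice(2 * i, 2 * i + 2), 16);
  }
}

// MLOAD reads the 32-byte word at the given offset back as a number.
function mload(offset) {
  let hex = "";
  for (let i = 0; i < 32; i++) {
    hex += memory[offset + i].toString(16).padStart(2, "0");
  }
  return Number(BigInt("0x" + hex));
}

// PUSH1 0x60 PUSH1 0x40 MSTORE
mstore(0x40, 0x60);

console.log(mload(0x40)); // 96
console.log(memory[0x5f]); // 96, the last byte of the word holds 0x60
console.log(memory[0x00]); // 0, the first 64 bytes are left untouched
```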
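Here is a toy interpreter in JavaScript showing the stack and storage working together. It runs the bytes “6018600055” (PUSH1 0x18 PUSH1 0x0 SSTORE) from our contract. The run function is my own sketch: it knows only these two opcodes, ignores gas entirely, and uses a Map to stand in for storage.

```javascript
// Toy interpreter for PUSH1 and SSTORE only. The real EVM has many more
// opcodes and charges gas for each one.
function run(hex) {
  const code = hex.match(/../g).map((pair) => parseInt(pair, 16));
  const stack = []; // short-lived values, gone when execution ends
  const storage = new Map(); // stands in for the contract's persistent storage

  for (let pc = 0; pc < code.length; pc++) {
    const op = code[pc];
    if (op === 0x60) {
      // PUSH1: put the next byte on the stack.
      stack.push(code[++pc]);
    } else if (op === 0x55) {
      // SSTORE: take the key off the top of the stack, then the value.
      const key = stack.pop();
      const value = stack.pop();
      storage.set(key, value);
    } else {
      throw new Error("Unsupported opcode 0x" + op.toString(16));
    }
  }
  return { stack, storage };
}

const { stack, storage } = run("6018600055");
console.log(storage.get(0)); // 24, our variable i = (10 + 2) * 2
console.log(stack.length); // 0, the stack is empty again
```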

Assembly Language

It is also possible to write a whole smart contract using opcodes. That’s where Solidity’s assembly language comes in. It is a lot harder to read, but it can be useful if you want to save gas or do things that cannot be done in plain Solidity.

Summary

We have only covered the basics of bytecode and a few opcodes. There are many opcodes we have not discussed, but you get the idea. Back to the original question of whether we should even bother learning EVM opcodes: possibly yes and no.

We don’t need to know opcodes to start writing smart contracts, and learning them adds to the learning curve. On the other hand, EVM error handling is still very primitive at the time of writing, and it’s handy to look at opcodes when things go wrong. At the end of the day, there is no harm in learning more.