Summary

The provided content discusses the role of Calldata in Ethereum transactions, detailing its structure, usage, and gas cost implications within Solidity smart contracts.

Abstract

Calldata is a crucial component in Ethereum transactions, serving as a data location in Solidity smart contracts alongside storage, memory, and stack. It is the array of bytes in the "Data" field of an Ethereum transaction, used to convey information from sender to receiver. In contract creation transactions, Calldata includes the creation code, runtime code, metadata hash, and input parameters for the constructor. For contract invocation transactions, it carries the function selector followed by the input parameters, with value types encoded in 32-byte words and reference types like arrays and strings encoded with additional metadata indicating their location and length. Calldata is cost-effective, with a gas cost of 4 gas for zero bytes and 16 gas for non-zero bytes, but it is read-only and, before Solidity 0.6.9, could only be used for the parameters of external methods.

Opinions

The author suggests that using Calldata to store information on the blockchain is not considered good practice, emphasizing its primary function for data transfer.
The text implies that the Ethereum community values the immutability and efficiency of Calldata, as evidenced by its specific encoding rules and lower gas costs compared to other data locations.
The author indicates a preference for clarity and understanding in the receiver's interpretation of the Calldata content, highlighting the importance of standardized encoding practices.
There is an underlying appreciation for the strict rules Solidity enforces when handling Calldata, ensuring consistency and predictability in smart contract interactions.
The mention of storing metadata hashes in Calldata, with the actual metadata stored off-chain (e.g., IPFS, Swarm), reflects a practical approach to balancing on-chain and off-chain data storage for smart contracts.

The nitty-gritty of Ethereum and Solidity: EVM Calldata

Calldata is one of the four data locations available to a Solidity smart contract (the others are storage, memory and the stack). It is the array of bytes that is passed in the “Data” field of an Ethereum transaction.

Calldata Definition

An Ethereum transaction contains a field called “Data” where senders (EOAs or smart contracts) can add as many bytes as they want, bounded in practice by the gas they are willing to pay and by the block gas limit.

They can leave the field empty or put literally anything in it. The point of this field is to transfer information from the sender to the receiver, so what matters is that the receiver understands its content.

It is true that this field is sometimes used simply to store information on the blockchain, since a transaction is immutable once added to a block, but this is not considered good practice.

In simple words, Calldata is the message contained within an Ethereum transaction.
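
To make this concrete, here is a minimal sketch of a receiver that accepts any message. The contract name Inbox and its Received event are hypothetical, chosen only for illustration. When the calldata is empty or matches no function, Solidity runs the fallback function, where the raw bytes of the “Data” field are available as msg.data:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, for illustration only.
contract Inbox {
    // Logs the sender and the raw bytes it sent.
    event Received(address indexed sender, bytes data);

    // Runs when the calldata is empty or matches no function selector.
    fallback() external {
        emit Received(msg.sender, msg.data);
    }
}
```

Whether the bytes mean anything is entirely up to the receiver: this sketch just records them, while a regular contract interprets them according to the layout rules described next.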

Calldata Layout

Solidity follows very strict rules when building and deconstructing the array of bytes in the data field (aka calldata).

Contract Creation Transaction

When an EOA sends a transaction to deploy a new smart contract onto the blockchain (a transaction whose “to” field is left empty), the content of the Calldata is the list of opcodes to be executed by the EVM in order to generate the new smart contract’s runtime bytecode.

The Calldata contained in a creation transaction is thus divided in this way:

Creation Code: These are the opcodes that will actually be executed by the EVM. Once this code is executed, the “Runtime Code” is associated with the new smart contract address and stored in the blockchain. The creation code can also set some initial storage variables or immutable variables, for instance by calling other smart contracts already deployed on the blockchain.
Runtime Code: Opcodes that will be returned at the end of the creation code execution and stored in the blockchain.
Metadata Hash: These bytes are unreachable by the EVM (the creation and runtime code will never access this part). They are NOT actual bytecode, just bytes forming a hash of the contract’s metadata (compiler version, source code, …). The contract’s metadata can be stored anywhere (IPFS, Swarm, …) so that anyone can download it, hash it and check that it corresponds to the Metadata Hash appended here.
Input parameters: Constructor methods might also have input parameters. These can always be found at the very end of the creation transaction calldata.
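
The last point can be sketched in Solidity itself. The snippet below is only an illustration under stated assumptions: Greeter and InitCodeBuilder are hypothetical contracts, not taken from the article. It rebuilds the bytes a deployment transaction would carry by appending the ABI-encoded constructor argument to the compiler output exposed as type(Greeter).creationCode:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract with one constructor parameter.
contract Greeter {
    string public greeting;

    constructor(string memory initialGreeting) {
        greeting = initialGreeting;
    }
}

// Hypothetical helper that rebuilds the calldata of a Greeter deployment.
contract InitCodeBuilder {
    function creationCalldata(string calldata initialGreeting)
        external
        pure
        returns (bytes memory)
    {
        // type(Greeter).creationCode holds the creation code, the runtime
        // code and the metadata hash, in that order.
        // The ABI-encoded constructor arguments go at the very end.
        return abi.encodePacked(
            type(Greeter).creationCode,
            abi.encode(initialGreeting)
        );
    }
}
```

Because the constructor arguments sit after everything else, the creation code can locate them by their position relative to the end of the code it was given.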

Contract Invocation Transaction

When an EOA sends a “regular” transaction (any transaction that is not meant to deploy a new contract) to a smart contract, the content of the calldata indicates the method to execute and the input parameters.

Function selector: The first 4 bytes of the calldata field correspond to the “function selector”. The function selector of a given contract’s method is obtained by hashing the function signature with Keccak-256 and extracting the first 4 bytes:

function testMethod(uint256 var1, address var2) external {
        /**
         method signature :
                 testMethod(uint256,address) -- Method name and input parameter data types (names excluded)
         method selector :
                 bytes4(keccak256("testMethod(uint256,address)")) -- First 4 bytes of the method signature hash
        */
    }
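
As a minimal sketch, the selector can be computed in two equivalent ways inside a contract. SelectorDemo and its two helper functions are hypothetical names introduced here for illustration; only testMethod comes from the example above:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; only testMethod comes from the article.
contract SelectorDemo {
    function testMethod(uint256 var1, address var2) external {}

    // Hash the canonical signature (no parameter names, no spaces)
    // and keep the first 4 bytes of the result.
    function selectorFromSignature() external pure returns (bytes4) {
        return bytes4(keccak256("testMethod(uint256,address)"));
    }

    // The compiler exposes the same value through the .selector member.
    function selectorFromCompiler() external pure returns (bytes4) {
        return this.testMethod.selector;
    }
}
```

Both helpers should return the same 4 bytes. Note that the signature string must use the canonical type names: writing uint instead of uint256, or adding a space after the comma, would produce a different hash.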

Input parameters layout: After the function selector come the function input parameters, encoded as 32-byte words. The encoding depends on the data type:

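Value types (uint256, address, …): each is encoded in one 32-byte word. If the data type takes fewer than 32 bytes, it gets padded: unsigned integers, addresses and booleans with extra 0s at the beginning, fixed-size byte types (bytes1 to bytes32) with extra 0s at the end.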

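Reference types (arrays, structs and strings): structs whose members are all statically sized are encoded in place like value types, with each member taking a 32-byte word (a struct containing a dynamic member is instead encoded through an offset, like the arrays described next). Dynamic arrays and strings are a different story, since their size is not necessarily known at compilation time. In the parameter's own slot, a 32-byte word gives the calldata location at which the array/string can be found, expressed as an offset in bytes from the start of the parameters (right after the function selector). At that location, the first 32-byte word gives the array/string length, and the actual array/string value follows right after the length.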

struct MyStruct{
        uint256 structVar1;
        address structVar2;
   }

function testMethod(uint256 var1, address var2, bool[] calldata var3, MyStruct calldata var4) external {
        /**
            Byte positions are 0-indexed and inclusive.
            -   Bytes 0 to 3: method's selector.
            -   SLOT 1 (bytes 4 to 35): var1.
            -   SLOT 2 (bytes 36 to 67): var2.
            -   SLOT 3 (bytes 68 to 99): var3 location (offset 160, counted from byte 4) => SLOT 6.
            -   SLOT 4 (bytes 100 to 131): structVar1 of var4.
            -   SLOT 5 (bytes 132 to 163): structVar2 of var4.
            -   SLOT 6 (bytes 164 to 195): var3 length.
            -   SLOT 7 (bytes 196 to 227): var3 item 1.
            -   SLOT 8 (bytes 228 to 259): var3 item 2.
            - ...
        */
   }
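
The layout can be checked from inside the function by reading the raw calldata. The following is a sketch, not a definitive implementation: LayoutDemo and the three return values are hypothetical additions, and the fixed positions assume the caller used the standard ABI encoding (a hand-crafted call could place the array elsewhere).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; testMethod and MyStruct mirror the example above.
contract LayoutDemo {
    struct MyStruct {
        uint256 structVar1;
        address structVar2;
    }

    function testMethod(
        uint256 var1,
        address var2,
        bool[] calldata var3,
        MyStruct calldata var4
    )
        external
        pure
        returns (bytes4 selector, uint256 var3Offset, uint256 var3Length)
    {
        // Bytes 0 to 3: the function selector.
        selector = msg.sig;
        // SLOT 3 (bytes 68 to 99; the slice end is exclusive):
        // offset of var3, counted from byte 4.
        var3Offset = abi.decode(msg.data[68:100], (uint256));
        // The first word at that location is the number of elements in var3.
        uint256 start = 4 + var3Offset;
        var3Length = abi.decode(msg.data[start:start + 32], (uint256));
    }
}
```

With a standard encoder and a two-element array for var3, this should return var3Offset = 160 (five 32-byte slots precede the array data) and var3Length = 2, the same value Solidity reports as var3.length.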

Gas Cost

Calldata is the cheapest data location, but it is also the only “read-only” one, which means that you can only use it for reference-type arguments (strings, arrays, structs) that are not supposed to be modified.

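Before Solidity 0.6.9, calldata could only be used for the parameters of “external” methods, since any other method (public, internal, private) is reachable from your internal code, where arguments are passed via memory or the stack. Since 0.6.9, the calldata location is accepted for reference-type parameters and variables regardless of visibility, as long as the value they refer to actually lives in the transaction's calldata.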
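
A minimal sketch of the difference between the two locations, using a hypothetical LocationDemo contract written for illustration:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical example contract, for illustration only.
contract LocationDemo {
    // calldata: the array is read straight from the transaction data,
    // without being copied first.
    function sum(uint256[] calldata values)
        external
        pure
        returns (uint256 total)
    {
        for (uint256 i = 0; i < values.length; i++) {
            total += values[i];
        }
        // values[0] = 1; // would not compile: calldata is read-only
    }

    // memory: the array is copied into memory, so it can be modified.
    function doubled(uint256[] memory values)
        public
        pure
        returns (uint256[] memory)
    {
        for (uint256 i = 0; i < values.length; i++) {
            values[i] *= 2;
        }
        return values;
    }
}
```

The calldata version avoids the copy, which is where the savings come from. The memory version pays for the copy but gets a mutable array in exchange.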

The gas cost of the calldata bytes depends on the number of bytes and their values:

4 gas for each zero byte.
16 gas for each non-zero byte.

For instance, if your transaction data field contains the byte array 0x98a3450007bca40023 (9 bytes: 98 a3 45 00 07 bc a4 00 23), the total cost will be:

2 * 4 gas (for the 2 zero bytes) + 7 * 16 gas (for the 7 non-zero bytes) = 120 gas.
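
The same arithmetic can be written as a small helper. This is only a sketch: CalldataGasDemo, its constants and calldataGas are hypothetical names, and the function prices just the bytes it is given, not the selector and ABI framing of its own call nor any other transaction cost.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper contract, for illustration only.
contract CalldataGasDemo {
    uint256 constant ZERO_BYTE_GAS = 4;
    uint256 constant NONZERO_BYTE_GAS = 16;

    // Applies the per-byte prices quoted above to an arbitrary byte array.
    function calldataGas(bytes calldata data)
        external
        pure
        returns (uint256 gasCost)
    {
        for (uint256 i = 0; i < data.length; i++) {
            gasCost += data[i] == 0x00 ? ZERO_BYTE_GAS : NONZERO_BYTE_GAS;
        }
    }
}
```

Calling calldataGas(hex"98a3450007bca40023") should return 120, matching the manual calculation.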