Summary

The web content explains the process of program execution within an operating system, detailing the steps a CPU takes to execute instructions, the conversion of high-level code to machine code, and the advantages of 64-bit CPUs over 32-bit CPUs.

Abstract

The article titled "Operating System — The Basic Process of Program Execution" outlines how a computer's CPU executes a program. It begins with the CPU fetching instructions from memory using the program counter and instruction register, then incrementing the program counter based on the CPU's architecture. The CPU deciphers the instruction, executing computational tasks with the arithmetic logic unit (ALU) or handling data storage/retrieval with the control unit. This cycle repeats until the program completes. High-level code, such as "a = 1 + 2", must be compiled into assembly code and then translated into machine code, which consists of binary instructions that the CPU can execute. These instructions are stored in the text segment of memory, while data is stored in the data segment. The CPU uses pipelining to enhance performance by splitting tasks into subtasks, with each instruction going through fetch, decode, execute, and store stages. The article also categorizes instructions into data transfer, arithmetic, jump, signal, and idle types, and discusses instruction execution speed in relation to CPU clock cycles. Finally, it highlights the benefits of 64-bit CPUs, including larger address space, increased data throughput, better performance with 64-bit applications, and more general-purpose registers.

Opinions

The article suggests that the CPU's architecture determines how quickly it can process instructions, with 32-bit CPUs incrementing the program counter by 4 bytes per instruction, implying a direct relationship between architecture and performance.
It implies that the conversion of high-level code to machine code is essential for the CPU to execute programs, emphasizing the importance of compilers and assemblers in the execution process.
The use of pipelining in contemporary CPUs is presented as a key factor in improving CPU efficiency and performance.
The article conveys that a CPU's clock rate, measured in gigahertz, is a significant indicator of its performance capability, with higher clock rates allowing for more operations per second.
It posits that 64-bit CPUs are superior to 32-bit CPUs due to their ability to handle more memory, process more data per cycle, and run 64-bit applications more efficiently, suggesting a strong advantage for systems using 64-bit processors.

Operating System — The Basic Process of Program Execution

A program is actually a series of instructions, so the process of running a program is to execute each instruction step by step, and the CPU is responsible for executing these instructions.

Program Execution Steps

The process of the CPU executing a program is outlined in the follows:

Step one: The CPU looks at the “program counter” to fetch the memory address where the current instruction is stored. The CPU’s “control unit” then uses the “address bus” to pinpoint the required memory location. It signals the memory to get ready to send data, and once the data is prepped, it’s shuttled over to the CPU through the “data bus.” The CPU, upon receiving this data, stores the incoming instruction into the “instruction register.”
Step two: The “program counter” increases its value to point to the next instruction. How much it increases by is determined by the CPU’s architecture — for instance, with a 32-bit CPU, instructions are 4 bytes, therefore the program counter jumps ahead by 4 to align with the next instruction’s memory location.
Step three: The CPU deciphers the instruction that’s currently in the “instruction register,” figuring out what operation is to be performed and what the operands are. If it’s an operation that requires computation, it passes the task to the “arithmetic logic unit” (ALU). If the instruction is about storing or retrieving data, then the “control unit” handles it.

This repetitive process, where the CPU reads an instruction, executes it, and then proceeds to the next, creating a loop until the program is complete, is known as the CPU’s instruction cycle.

The Execution Process of “a = 1 + 2”

The CPU cannot interpret the string “a = 1 + 2” as it is; such strings are simply meant to make the code readable for programmers. In order to get this code to function, it must be converted into an assembly language program. This conversion process is referred to as compilation into assembly code.

Once we have the assembly code, it requires further translation by an assembler into machine code, which is the binary language of 0s and 1s that computers operate with. These strings of machine code are the actual instructions that the CPU can execute.

During the program compilation process, the compiler, by analyzing the code, identifies that 1 and 2 are data. Thus, when the program runs, there will be a specific area in memory allocated to store this data, known as the “data segment”. The positions in the data segment for the data 1 and 2 are as follows:

1 is stored at the 0x200 location
2 is stored at the 0x204 location;

The compiler will translate “a = 1 + 2” into 4 instructions, which are stored in the text segment. As illustrated, these 4 instructions are placed in the region from 0x100 to 0x10c:

The content at 0x100 is a load instruction that takes the data 1 from address 0x200 and loads it into register R0;
The content at 0x104 is another load instruction that takes the data 2 from address 0x204 and loads it into register R1;
The content at 0x108 is an add instruction that adds the data in register R0 to the data in R1, and stores the result in register R2;
The content at 0x10c is a store instruction that takes the data from register R2 and stores it back into the data segment at address 0x208, which is the memory address corresponding to the variable a.

Sample assembly code:

# Assume that 0x200 is the address where the value 1 is stored
# and 0x204 is the address where the value 2 is stored.
# The variable 'a' will be stored at address 0x208.

LOAD R0, 0x200    # Load the value from memory address 0x200 into register R0
LOAD R1, 0x204    # Load the value from memory address 0x204 into register R1
ADD R2, R0, R1    # Add the contents of R0 to R1 and store the result in R2
STORE R2, 0x208   # Store the result from R2 to memory address 0x208 (variable 'a')

CPU Pipelining

During the program compilation phase, compilers create machine-level instructions, a process referred to as instruction encoding. When the CPU runs the program, it reads and interprets these machine-level instructions, a process called instruction decoding.

Most contemporary CPUs execute instructions using a method called pipelining, which involves splitting a task into several subtasks. Consequently, an instruction is commonly divided into four distinct stages, which are collectively termed a four-stage pipeline.

Fetch: The CPU retrieves the instruction from the memory address specified by the program counter. The process of retrieving instructions through the program counter and the instruction register is orchestrated by the controller.
Decode: The CPU interprets the instruction, the decoding of instructions is also carried out by the controller.
Execution: The CPU carries out the instruction. The execution of instructions, whether it involves arithmetic operations, logical operations, data transfers, or conditional branching, is conducted by the arithmetic logic unit, meaning it is processed by the ALU.
Store: The CPU saves the result of the computation into a register or transfers the register’s content back into memory. This is completed directly within the controller.

These four stages mentioned above are referred to as the Instruction Cycle. The operation of a CPU is a continuous loop of these cycles, repeating over and over again.

Instructions Types

Instructions can be classified into five major categories based on their functionality:

Data transfer instructions: Such as store/load, which transfer data between registers and memory, and mov, which moves data from one memory address to another.
Arithmetic instructions: Such as addition, subtraction, multiplication, division, bitwise operations, comparisons, etc. These can process data from at most two registers.
Jump instructions: Which alter the value of the program counter to achieve the process of jumping to execute an instruction, similar to if-else, switch-case, and function calls in programming.
Signal instructions: Like the trap instruction that occurs during an interrupt.
Idle instructions: Such as the nop instruction, after which the CPU idles for one cycle.

Instruction Execution Speed

A CPU’s performance capability is often measured in gigahertz (GHz), for instance, a CPU with a 1 GHz specification. This measurement denotes the clock rate, which corresponds to 1 billion cycles per second. Each cycle is a pulse of electrical signal, fluctuating between high and low levels, and this fluctuation period is called a clock cycle.

Within the span of one clock cycle, a CPU is limited to executing one elementary operation. The greater the clock frequency, the shorter each clock cycle is, which translates to increased operational speed.

Does this mean a CPU can execute one instruction per clock cycle? Not necessarily. The majority of instructions require more than one clock cycle to be executed fully. Different instructions consume different amounts of clock cycles; for example, although both addition and multiplication are single CPU instructions, multiplication typically consumes more clock cycles than addition.

64-bit CPU vs 32-bit CPU

Larger Address Space: A 64-bit CPU can handle a larger address space, which means it can support more than 4GB of RAM directly, while a 32-bit CPU is limited to addressing a maximum of 4GB of RAM. This larger address space allows for more applications to run simultaneously and for applications to work with larger data sets without running into memory constraints.
Increased Data Throughput: With a wider data bus, a 64-bit processor can process more data per clock cycle compared to a 32-bit processor. This means that for certain computational tasks, a 64-bit processor can be significantly faster than its 32-bit counterpart.
Enhanced Performance with 64-bit Applications: Applications that are written specifically for a 64-bit operating system can take advantage of the increased computing power of a 64-bit CPU, performing complex calculations and tasks more efficiently.
More General-Purpose Registers: 64-bit processors typically have more general-purpose registers, which allows for more local variables to be stored without accessing main memory. This can lead to performance improvements for certain types of programs.