Abstract
The article "Easy to Understand PHP Garbage Collection Mechanism" delves into the intricacies of memory management in PHP, particularly focusing on the zval variable container system. It explains that each PHP variable is associated with a zval, which includes metadata such as is_ref and refcount to manage references and optimize memory usage. The process of increasing and decreasing reference counts is described, along with the specifics of how composite types like arrays and objects are handled. The article highlights a significant issue: memory leaks due to circular references, which PHP 5.3.0 addresses with a garbage collection algorithm that identifies and frees up unreferenced memory. The performance implications of this mechanism are discussed, noting that while there may be some additional time consumption for garbage collection in long-running scripts, the overall benefit is a reduction in memory footprint, allowing for more concurrent processes on a server.
Opinions
The author suggests that the garbage collection mechanism in PHP is crucial for preventing memory leaks, especially in long-running scripts or daemon processes.
It is implied that the introduction of the garbage collection algorithm in PHP 5.3.0 was a significant improvement in handling circular references that previous versions could not resolve.
The article conveys that while garbage collection may introduce some overhead, the trade-off is acceptable given the memory savings and the subsequent ability to run more scripts simultaneously.
The author seems to value the efficiency and optimization of PHP scripts, emphasizing the importance of understanding PHP's memory management to improve overall performance.
There is an appreciation expressed for the complexity of garbage collection and the need for developers to be aware of how PHP manages memory under the hood to write better-performing code.
Easy to Understand PHP Garbage Collection Mechanism
Each PHP variable is stored in a variable container called a zval.
In addition to the variable's type and value, a zval container holds two extra pieces of information.
The first is is_ref, a boolean that indicates whether the variable is part of a reference set. With this flag, the PHP engine can distinguish normal variables from reference variables. Because PHP lets users create their own references with the & operator, the zval container also has an internal reference-counting mechanism to optimize memory usage.
The second piece is refcount, the number of variable names (symbols) that point to this zval container.
All symbols are stored in a symbol table, and there is one symbol table per scope: one for the main script (for example, the script requested by the browser) and one for each function or method.
Generate zval container
When a variable is assigned a constant value, a zval variable container is generated.
If Xdebug is installed, both values can be viewed via xdebug_debug_zval().
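As a minimal sketch of the step above, the snippet below creates one variable and inspects its container. The variable name and the string value are illustrative; the call is guarded because xdebug_debug_zval() only exists when the Xdebug extension is loaded.

```php
<?php
// Assigning a literal value creates the variable and its value container.
$a = "new string";

// xdebug_debug_zval() is only defined when the Xdebug extension is loaded.
// It takes the variable's NAME as a string, not the variable itself.
if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');
} else {
    echo "Xdebug not loaded: refcount and is_ref cannot be displayed\n";
}
```

On the PHP 5 engine this article describes, the output shows a refcount of 1 and is_ref of 0. PHP 7 and later store scalar values differently, so the reported numbers may not match.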
Increase the reference count of zval
Assigning one variable to another will increase the number of references.
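A short sketch of that assignment, again assuming Xdebug may or may not be available. The names $a and $b are illustrative.

```php
<?php
$a = "new string";
$b = $a; // no copy is made here: both symbols can point at one container

// Show the container behind $a if Xdebug is loaded.
if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');
}
```

On the PHP 5 engine the reported refcount is 2, because both $a and $b point to the same container. Newer engines handle strings differently, so treat the exact number as version dependent.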
Reduce the zval reference count
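Calling unset() on a variable removes the symbol and decreases the reference count of the container it pointed to. The same happens when a variable goes out of scope.
When the count reaches zero, the variable container, including its type and value, is removed from memory.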
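Reference counts are not visible without Xdebug, so the sketch below uses an object with a destructor as an observable stand-in: the destructor runs at the moment the last reference disappears. The class name Tracked and its counter are hypothetical helpers for illustration only.

```php
<?php
// Hypothetical helper: counts how many instances have been destroyed.
final class Tracked
{
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

$a = new Tracked();
$b = $a;      // a second symbol now refers to the same object

unset($a);    // one reference is left, so nothing is freed yet
$afterFirstUnset = Tracked::$destroyed;

unset($b);    // the last reference is gone, the object is freed immediately
$afterSecondUnset = Tracked::$destroyed;

printf("destroyed after first unset: %d, after second: %d\n",
    $afterFirstUnset, $afterSecondUnset);
```

The first unset() leaves the counter at 0 and the second raises it to 1, which mirrors the rule that memory is released only when the reference count drops to zero.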
Composite type zval container
Composite types behave differently from scalar values.
Variables of type array and object store their members or properties in a symbol table of their own.
This means that the following example will generate three zval variable containers.
The three zval variable containers are: a, meaning, and number.
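The original listing did not survive, so here is a hedged reconstruction that matches the three names mentioned above. The values 'life' and 42 are assumptions chosen for illustration.

```php
<?php
// One container for the array itself, plus one per element.
$a = array('meaning' => 'life', 'number' => 42);

// With Xdebug loaded, this prints the array container and its two elements,
// each with its own refcount and is_ref values.
if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');
}
```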
Increase the reference count of composite types
Adding an existing element to the array a second time increases that element's reference count.
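A minimal sketch of this case, assuming the same illustrative array as before. The key name 'life' is an assumption.

```php
<?php
$a = array('meaning' => 'life', 'number' => 42);

// Reuse an existing element under a second key.
$a['life'] = $a['meaning'];

// On the PHP 5 engine, Xdebug reports refcount=2 for both 'meaning' and
// 'life', because the two keys point at the same container.
if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');
}
```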
Reduce the reference count of composite types
Deleting an element from an array is similar to removing a variable from a scope.
After deletion, the “refcount” of the container that the array element pointed to is decremented.
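The same destructor technique makes this observable for array elements. TrackedElement is a hypothetical class used only to detect the moment the container is freed.

```php
<?php
// Hypothetical helper: counts how many instances have been destroyed.
final class TrackedElement
{
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

$item = new TrackedElement();
$list = array('first' => $item); // the array element is a second reference

unset($item);                    // the symbol is gone, the element remains
$afterSymbolUnset = TrackedElement::$destroyed;

unset($list['first']);           // the last reference is removed
$afterElementUnset = TrackedElement::$destroyed;

printf("destroyed after unset(\$item): %d, after unset element: %d\n",
    $afterSymbolUnset, $afterElementUnset);
```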
Special circumstances
Things get interesting when we add an array itself as an element of this array.
As above, calling unset on a variable will delete the symbol, and the number of references in the variable container it points to is also reduced by 1.
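The situation can be sketched as follows. The array content 'one' is illustrative; the reference assignment with =& is what makes the array contain itself.

```php
<?php
$a = array('one');
$a[] =& $a;   // element 1 is now a reference to the array itself

// The array can be reached through its own element.
$firstViaSelf = $a[1][0];

if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');
}

// Remove the only symbol. Element 1 still refers to the container,
// so its reference count does not drop to zero.
unset($a);
```

After the unset() call no variable name leads to the array any more, yet the structure still references itself.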
The problem of cleaning up variable containers
Although there is no longer any symbol in a scope pointing to this structure (that is, the variable container), since the array element “1” still points to the array itself, the container cannot be cleared.
Since no other symbols are pointing to it, there is no way for the user to clear the structure, resulting in a memory leak.
Fortunately, PHP clears this data structure at the end of script execution, but until then it can consume a lot of memory.
It’s okay if the above happens only once or twice, but if there are thousands or even hundreds of thousands of memory leaks, it’s a big problem.
Recycling cycle
Reference counting alone, as used by earlier PHP versions, cannot deal with memory leaks caused by circular references.
Since PHP 5.3.0, a synchronous cycle-collection algorithm handles this problem. It rests on a few basic rules.
If a reference count is incremented, the container is still in use and is therefore not garbage.
If a reference count is decremented to zero, the variable container is freed immediately.
That is, a garbage cycle can only come into existence when a reference count is decremented to a non-zero value.
Within a garbage cycle, the garbage can be identified by tentatively decrementing each reference count by 1 and then checking which variable containers end up with zero references.
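To see the collector reclaim cycles, the sketch below builds self-referencing objects with automatic collection switched off and then triggers a run by hand. CycleNode and the count of 1000 are assumptions for illustration; gc_disable(), gc_collect_cycles() and gc_enable() are standard PHP functions available since PHP 5.3.

```php
<?php
// Hypothetical helper: an object that points at itself and counts destructions.
final class CycleNode
{
    public $self;
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable(); // no automatic runs, so the effect of a manual run is visible

for ($i = 0; $i < 1000; $i++) {
    $node = new CycleNode();
    $node->self = $node; // the object refers to itself: a cycle
    unset($node);        // the count drops to a non-zero value, nothing is freed
}

$destroyedBefore = CycleNode::$destroyed; // cycles are still in memory

$collected = gc_collect_cycles();         // force a collection run

$destroyedAfter = CycleNode::$destroyed;  // destructors have now run

gc_enable();

printf("destroyed before: %d, after: %d, reported by collector: %d\n",
    $destroyedBefore, $destroyedAfter, $collected);
```

Plain reference counting never frees these objects, because each one still holds a reference to itself. They are released only once the cycle collector runs.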
Analysis of recovery algorithm
To avoid checking for garbage cycles every time a reference count decreases, the algorithm puts all possible roots (a possible root is a zval variable container) into a root buffer. They are marked purple and treated as suspected garbage, and each possible garbage root appears in the buffer only once. Garbage collection runs over all the variable containers in the buffer only when the root buffer is full. See step A in the figure.
In step B, the algorithm simulates deleting each purple variable. During this simulated deletion, the reference count of the ordinary (non-purple) variables it points to is decremented by 1, and if such a count drops to 0, that variable is in turn subjected to simulated deletion. Each variable can be deleted by the simulation only once, and it is marked gray afterwards.
In step C, the algorithm simulates restoring each purple variable. Restoration is conditional: it happens only when the variable's reference count is greater than 0. Each variable can be restored only once and is marked black afterwards; this is the inverse of step B. The nodes that cannot be restored are marked blue, and these are the ones that should be deleted. In step D, the algorithm walks over the root buffer and frees them.
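Steps B to D can be modeled in a few lines of plain PHP. This is a toy simulation, not engine code: ToyZval and all toy* functions are hypothetical, the root buffer is passed in as an ordinary array, and the color 'white' stands for the blue nodes in the figure.

```php
<?php
// Toy model of a variable container: a name, a count, a color, and outgoing links.
final class ToyZval
{
    public $name;
    public $refcount = 0;
    public $color = 'black';    // black means "in use"
    public $children = array(); // containers this one points to

    public function __construct($name)
    {
        $this->name = $name;
    }
}

// Record that $from points to $to.
function toyLink(ToyZval $from, ToyZval $to)
{
    $from->children[] = $to;
    $to->refcount++;
}

// Step B: simulated deletion. Each container is visited once (gray).
function toyMarkGray(ToyZval $z)
{
    if ($z->color === 'gray') {
        return;
    }
    $z->color = 'gray';
    foreach ($z->children as $child) {
        $child->refcount--;
        toyMarkGray($child);
    }
}

// Step C, restore branch: undo the decrements for everything still reachable.
function toyScanBlack(ToyZval $z)
{
    $z->color = 'black';
    foreach ($z->children as $child) {
        $child->refcount++;
        if ($child->color !== 'black') {
            toyScanBlack($child);
        }
    }
}

// Step C: containers with a count above zero are restored, the rest turn white.
function toyScan(ToyZval $z)
{
    if ($z->color !== 'gray') {
        return;
    }
    if ($z->refcount > 0) {
        toyScanBlack($z);
        return;
    }
    $z->color = 'white';
    foreach ($z->children as $child) {
        toyScan($child);
    }
}

// Step D: gather every white container reachable from a root.
function toyCollectWhite(ToyZval $z, array &$garbage)
{
    if ($z->color !== 'white') {
        return;
    }
    $z->color = 'freed';
    $garbage[] = $z->name;
    foreach ($z->children as $child) {
        toyCollectWhite($child, $garbage);
    }
}

function toyCollect(array $roots)
{
    foreach ($roots as $root) { toyMarkGray($root); }
    foreach ($roots as $root) { toyScan($root); }
    $garbage = array();
    foreach ($roots as $root) { toyCollectWhite($root, $garbage); }
    return $garbage;
}

// a <-> b is an unreachable cycle; c <-> d is a cycle still held by one symbol.
$a = new ToyZval('a'); $b = new ToyZval('b');
$c = new ToyZval('c'); $d = new ToyZval('d');
toyLink($a, $b); toyLink($b, $a);
toyLink($c, $d); toyLink($d, $c);
$c->refcount++; // the live symbol that still points at c

$garbage = toyCollect(array($a, $c));
echo implode(', ', $garbage), "\n";
```

Only a and b are reported as garbage. The cycle between c and d survives with its original counts because the external reference keeps c above zero after the simulated deletion.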
Performance Considerations
Two main areas have an impact on performance:
The first is the savings in memory footprint.
Another is the increased time spent by the garbage collection mechanism to free leaked memory.
Finally
PHP's garbage collection only costs extra time when the cycle-collection algorithm actually runs, so normal (smaller) scripts should see no performance impact.
When the collector does run in normal scripts, the memory it saves keeps total memory use below the limit, which allows more of these scripts to run concurrently on your server.
This benefit is especially noticeable in long-running scripts, such as long test suites or daemon scripts.
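For a long-running script, one way to observe this trade-off is to measure time and peak memory around a batch of work. The sketch below is an assumption-laden illustration: CyclicJob, runBatch() and the batch size are hypothetical, and gc_status() is only called when it exists (PHP 7.3 and later).

```php
<?php
// Hypothetical unit of work that leaves a reference cycle behind.
final class CyclicJob
{
    public $self;
}

function runBatch($count)
{
    for ($i = 0; $i < $count; $i++) {
        $job = new CyclicJob();
        $job->self = $job; // becomes garbage once $job is reassigned
    }
}

gc_enable();

$start = microtime(true);
runBatch(50000);
$collected = gc_collect_cycles(); // force one final run
$elapsedMs = (microtime(true) - $start) * 1000;

printf("peak memory: %d bytes, final run collected: %d, time: %.1f ms\n",
    memory_get_peak_usage(), $collected, $elapsedMs);

// Collector statistics, where the running PHP version provides them.
if (function_exists('gc_status')) {
    $status = gc_status();
    printf("collector runs so far: %d\n", $status['runs']);
}
```

Running the same batch after replacing gc_enable() with gc_disable() and removing the gc_collect_cycles() call gives a rough comparison of peak memory against elapsed time on your own workload.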