Python Up Your Code: Reference cycles
The why, the how and the garbage collector

Greetings! For today, I picked a more “hairy” topic to share with you. It revolves around a phenomenon called a reference cycle. This article focuses on how reference cycles get created in the first place, what memory issues they can cause and, of course, a few of the approaches we can take to either prevent them or eliminate them altogether.
But before we can discuss reference cycles any further, we first need to (re)visit another very important element that will prove instrumental to the point of this article: the garbage collector. Garbage collection, simply put, is a form of automatic memory management that attempts to reclaim allocated memory that’s no longer referenced.
A bit of context — the road to the garbage collector
Prior to the creation of automatic memory management mechanisms, programmers needed to deal with manual memory management. It seems to have been a tedious practice, tedious enough that an automatic mechanism, namely the famous garbage collector, surfaced as early as 1959, when John McCarthy, an American computer scientist, invented it to simplify memory management in Lisp.
Typically, the values of our program’s objects are stored in memory. Before the garbage collector came into play, people had to manually allocate memory for their variables and, most importantly, they also had to deallocate that memory once they were done with those variables, making it available for reuse. In practice, two big issues were plaguing the world of software:
- It was (and still is, depending on the language being used) easy to simply forget to free the memory, leading to the all-too-familiar memory leaks;
- On the other hand, freeing the memory too soon may also cause serious issues, when your code tries to access a variable that lived in a memory zone that has been freed in the meantime; a reference to such freed memory is known as a dangling pointer.
Automatic memory management aimed at (and succeeded in) solving these shortcomings by relieving programmers from having to manage memory manually. Programmers also experienced an increase in development speed, as they no longer needed to think about the low-level memory details.
One of the most popular forms of automatic memory management relies on what’s known as reference counting: the runtime keeps track of every reference to every object. Each object has a reference count, and when that count reaches 0, the object is considered unreachable and can be safely deleted.
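To make that bookkeeping concrete, here is a toy sketch of reference counting. The names ToyObject, incref and decref are hypothetical and exist only for illustration; this is not how any real runtime lays out its objects, just the idea of "count up, count down, free at 0".

```python
class ToyObject:
    """Hypothetical stand-in for a runtime-managed object (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False

    def incref(self):
        # Someone started referencing this object.
        self.refcount += 1

    def decref(self):
        # Someone stopped referencing this object.
        self.refcount -= 1
        if self.refcount == 0:
            # Nobody references the object any more: it can be reclaimed.
            self.freed = True


obj = ToyObject("payload")
obj.incref()                  # e.g. bound to a variable
obj.incref()                  # e.g. stored in a container
obj.decref()                  # the container goes away
still_alive = not obj.freed   # one reference is left, so the object survives
obj.decref()                  # the variable goes away too

print(still_alive, obj.freed)
```

The object stays alive as long as at least one reference remains and is marked as freed the moment the last one disappears.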
The garbage collector in Python
CPython, currently the most popular Python implementation out there, identifies and collects garbage primarily via the reference counting algorithm. Every time we create a new object in Python, the underlying C object has not only a Python type (list, dict, etc.), but also a reference count.
This reference count is incremented and decremented whenever the object is referenced and dereferenced, respectively. As I mentioned a bit earlier, when this count reaches 0, the memory for the object is deallocated.
We can actually see these reference counts in Python. In our extremely simple example, we’re just going to create a list object, assign it to a variable and store it in a dictionary. Using the sys module, we can check the reference count for the object behind our variable:
import sys
# assign a list object to a variable (ref count increases to 1)
my_list = [1, 2, 3]
# add the object to a data structure (ref count increases to 2)
my_dict = {"key": my_list}
# print out the reference count
print(sys.getrefcount(my_list))
Output:
3
And the output is a staggering… 3. If you’re as surprised as I was when I first learned about this, allow me to explain: the reference count for my_list increases to 1 when we create the list object and assign it to the variable, and it further increases to 2 when we add the list object to a dictionary. But it also goes up by 1 when we pass it as an argument to the sys.getrefcount() function. As the documentation for this function states:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
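Of course, we could also count the references using the ctypes module. The differences are that we wouldn’t pass the variable itself, but the id of the object (in CPython, its memory address) as the argument, and the result wouldn’t be incremented by 1. Bear in mind that this relies on a CPython implementation detail: in standard builds, the reference count is stored at the very start of the object in memory.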
import ctypes
# assign a list object to a variable (ref count increases to 1)
my_list = [1, 2, 3]
# add the object to a data structure (ref count increases to 2)
my_dict = {"key": my_list}
# print out the reference count
print(ctypes.c_long.from_address(id(my_list)))
Output:
c_long(2)
Summing up, there are multiple ways the reference count increases in Python:
- by assigning an object to a variable (my_list = [1, 2, 3]);
- by passing the object as an argument to a function / method (print(sys.getrefcount(my_list)));
- by adding the object to a data structure (my_dict = {"key": my_list}).
Of course, when we delete a variable, as in del my_list, the reference count of the list object it referred to decreases by 1.
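The ups and downs listed above can be checked without worrying about the extra temporary reference that sys.getrefcount() adds, by comparing each reading against a baseline. This is a minimal sketch for CPython; the names payload, alias and holder are made up for the example.

```python
import sys

payload = object()                    # a fresh object nobody else references
baseline = sys.getrefcount(payload)   # includes getrefcount's own temporary reference

alias = payload                       # a second name bound to the same object
after_alias = sys.getrefcount(payload) - baseline

holder = [payload]                    # the object is also stored in a container
after_holder = sys.getrefcount(payload) - baseline

del alias                             # drop the second name
holder.clear()                        # drop the container's reference
after_cleanup = sys.getrefcount(payload) - baseline

print(after_alias, after_holder, after_cleanup)
```

Working with differences rather than absolute numbers keeps the example independent of how many references the interpreter itself happens to hold at the time of the call.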
Now, since we’re finally getting very close to the topic of the article, I propose we leave the rest of the theoretical and practical aspects of the garbage collector aside (but here’s the official docs page on it, if you want more on that subject).
Moving on: we’ve learned that every object has a reference count associated with it and that the memory allocated for the object is freed once that count reaches 0. With that in mind, here’s a simplified case of a reference cycle:
import sys
# define a list object
my_list = [1, 2, 3]
# define a dictionary object referencing the list as value
my_dict = {"key": my_list}
# now append the dictionary to the list
my_list.append(my_dict)
# now the cycle is complete. Each container references
# the other in a reference cycle
print(my_list)
print(my_dict)
# let's count the references for each container:
print(sys.getrefcount(my_list))
print(sys.getrefcount(my_dict))
Output:
[1, 2, 3, {'key': [...]}]
{'key': [1, 2, 3, {...}]}
3
3
What we have here is a typical case of a reference cycle: the dictionary references the list as one of its values, while the list references the dictionary as one of its elements.
Notice the list values? Python printed it out as [1, 2, 3, {'key': [...]}]. The three-dot notation is used here to avoid infinite recursion, because the last list element is the dictionary, which references the list, whose last element is the dictionary… you get the idea.
Same goes for the dictionary values: {'key': [1, 2, 3, {...}]}. The single element of the dictionary has "key" as its key and my_list as its value. But my_list contains said dictionary object, which features my_list as one of its values, which contains the dictionary object… again, recursion.
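To convince ourselves that the printed dots really stand for "the very same object again" and not for a copy, a quick identity check helps. This is a small sketch reusing the my_list and my_dict names from the example above; same_list, same_dict and text are names introduced just for this check.

```python
my_list = [1, 2, 3]
my_dict = {"key": my_list}
my_list.append(my_dict)

# Follow the cycle one step in each direction and compare identities.
same_list = my_dict["key"] is my_list   # the dictionary's value is the list itself
same_dict = my_list[-1] is my_dict      # the list's last element is the dictionary itself

# repr() detects the recursion and abbreviates it instead of looping forever.
text = repr(my_list)

print(same_list, same_dict, text)
```

Both checks come back True, which is exactly what makes this a cycle: following the references from either container eventually leads back to where we started.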
The reference counts are 3 for each of the two objects:
- increased to 1 the moment we defined each of them;
- further increased to 2 when we made them reference each other;
- finally increased to 3 when we passed them as arguments to the sys.getrefcount() function.
The question is now the following: how will the garbage collector deal with this thing?
The garbage collector in Python has 2 components: the reference counting algorithm (the one we’ve been showcasing just now) and a generational garbage collector, which is built for just this kind of situation.
Let’s consider an even simpler case, where a list references itself:
import ctypes
# create the reference cycle
my_list = [1, 2, 3]
my_list.append(my_list)
# get the address for the object
my_list_id = id(my_list)
# count the references of the object
print(ctypes.c_long.from_address(my_list_id))
# delete the variable 'my_list'
del my_list
# count the references of the object again
print(ctypes.c_long.from_address(my_list_id))
Output:
c_long(2)
c_long(1)
What happened here? Well, as soon as we created the list and bound it to my_list, the reference count became 1. Then, as soon as we appended the list to itself, the reference count increased to 2. We then deleted the variable my_list, so there was one less reference to the list object in memory and the reference count decreased from 2 to 1. But it never went down to 0, so the object was never reclaimed by reference counting, even though our code can no longer reach it.
Enter the generational garbage collector (gc module):
import ctypes
import gc
# create the reference cycle
my_list = [1, 2, 3]
my_list.append(my_list)
# get the address for the object
my_list_id = id(my_list)
# count the references of the object
print(ctypes.c_long.from_address(my_list_id))
# delete the variable 'my_list'
del my_list
# count the references of the object again
print(ctypes.c_long.from_address(my_list_id))
# run a manual collection
print(f"Collected {gc.collect()} object(s)")
# count the references of the object again
print(ctypes.c_long.from_address(my_list_id))
Output:
c_long(2)
c_long(1)
Collected 1 object(s)
c_long(0)
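Following the creation of the list object and that of the reference cycle, the reference count was naturally increased to 2. As soon as we deleted the variable my_list, the reference count dropped to 1. Upon running a manual collection via gc.collect(), the value read at that address became 0. Strictly speaking, the object has been deallocated by that point, so reading its old address is not reliable and is done here only for illustration.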
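A gentler way to observe the same behaviour, without peeking at raw memory, is a weak reference, which lets us ask whether an object is still alive without keeping it alive. The sketch below is a variation on the article's example and not the same code: Node is a hypothetical helper class, used because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; plain lists cannot be weakly referenced."""

    def __init__(self):
        self.me = self  # the instance references itself, forming a cycle


was_enabled = gc.isenabled()
gc.disable()  # keep an automatic collection from interfering with the demo
try:
    node = Node()
    probe = weakref.ref(node)   # a weak reference does not raise the count
    del node                    # the name is gone, but the cycle keeps the object alive
    alive_after_del = probe() is not None
    gc.collect()                # the cycle detector finds the cycle and frees it
    alive_after_collect = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)
```

On CPython this is expected to print True False: the object outlives its last name and only disappears once the cycle collector has run.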
There are a number of things to consider here. First of all, detecting and, more importantly, destroying reference cycles is fairly expensive computationally, so automatic garbage collection has to be scheduled. To that end, thresholds are set. We can get them using gc.get_threshold():
import gc
print(gc.get_threshold())
Output:
(700, 10, 10)
Python’s garbage collector features 3 generations of objects. All objects start their life in the first generation. If Python executes a garbage collection process on a generation and the object survives it, it is moved up to the next (also called older) generation.
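The 700, 10, 10 values are the three thresholds for the garbage collector’s generations (the defaults may differ between Python versions). The first threshold applies to the youngest generation: when the number of allocations minus the number of deallocations since the last collection exceeds it, a collection of that generation starts. The other two thresholds count collections rather than objects: roughly speaking, an older generation is examined once the generation below it has been collected more than that many times.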
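The thresholds can be put side by side with the collector's current counters, which gc.get_count() returns. This is a small sketch; the actual numbers depend on the Python version and on whatever the interpreter has allocated so far, so no particular output is assumed.

```python
import gc

thresholds = gc.get_threshold()  # e.g. (700, 10, 10) in the run shown above
counts = gc.get_count()          # current counters, one per generation

# Print each generation's counter next to the threshold it is compared against.
for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```

Running this a few times in a row, with some allocations in between, shows the first counter climbing and being reset whenever a collection takes place.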
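The example we just saw relied on an automatic generational garbage collector, but one that did not get to run between the del statement and our last print, presumably because our tiny script did not push the allocation counter past the first threshold in that window. This is why we had to manually run gc.collect().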
Luckily for us, we get the chance to see it in action, as this threshold can also be altered:
import ctypes
import gc
# alter the threshold for gc
gc.set_threshold(1, 1, 1)
# now disable the gc
gc.disable()
# create the reference cycle
my_list = [1, 2, 3]
my_list.append(my_list)
# get the address for the object
my_list_id = id(my_list)
# count the references of the object
print(ctypes.c_long.from_address(my_list_id))
# delete the variable 'my_list'
del my_list
# count the references of the object again
print(ctypes.c_long.from_address(my_list_id))
# enable the gc
gc.enable()
# count the references of the object again
print(ctypes.c_long.from_address(my_list_id))
Output:
c_long(2)
c_long(1)
c_long(0)
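The automatic collection can also be observed with the default thresholds left untouched, simply by allocating enough container objects for the collector to kick in on its own. This is a sketch under a few assumptions: Node is a hypothetical helper class (lists cannot be weakly referenced), the kept list and the 1,000,000 safety cap are arbitrary choices for the demo, and the exact number of allocations needed will vary between Python versions.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; plain lists cannot be weakly referenced."""

    def __init__(self):
        self.me = self  # the instance references itself, forming a cycle


gc.disable()                   # make sure nothing is collected while we set up
probe = weakref.ref(Node())    # the temporary Node now survives only through its cycle
survived_refcounting = probe() is not None
gc.enable()                    # hand control back to the automatic collector

# Keep allocating container objects (and keep them alive) so that the
# youngest generation's allocation counter eventually passes its threshold.
kept = []
while probe() is not None and len(kept) < 1_000_000:
    kept.append([])

collected_automatically = probe() is None
print(f"Cycle reclaimed after about {len(kept)} list allocations")
```

No call to gc.collect() appears anywhere in this snippet; the cycle disappears as a side effect of ordinary allocations crossing the threshold.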
Please bear in mind that the gc module only controls the generational garbage collector, which exists to find reference cycles and deallocate the objects involved. The gc module does not control the reference counting algorithm which we have also presented.
It’s been a pretty long article, but its complexity demanded that some notions be revisited and fresh in our memory before we could attempt to understand what the reference cycle’s all about.
Having wrapped up yet another (hopefully) useful knowledge-sharing article, I wish you nothing but the best. Stay safe and, as always, happy coding! Until next time!
Deck is a software engineer, mentor, writer, and sometimes even a teacher. With 12+ years of experience in software engineering, he is now a real advocate of the Python programming language while his passion is helping people sharpen their Python — and programming in general — skills. You can reach Deck on Linkedin, Facebook, Twitter, and Discord: Deck451#6188, as well as follow his writing here on Medium.