This article provides a deep dive into Python's "self" argument, explaining its purpose, how it works under the hood, and its role in the descriptor protocol.
Abstract
Python's "self" argument is a well-known concept in Python programming, used in every method declaration of every class. It refers to the class instance, but its internal workings are not always fully understood. This article explains how Python internally converts "instance.do_stuff("whatever")" to "MyClass.do_stuff(instance, "whatever")". It also delves into the descriptor protocol, which allows customization of attribute lookup in classes, ultimately creating a "bound method" with "self" prepended to its other arguments. The article concludes by discussing the design philosophy behind the explicit "self" method argument.
Bullet points
"self" is a convention in Python, referring to the class instance.
Python internally converts "instance.do_stuff("whatever")" to "MyClass.do_stuff(instance, "whatever")".
Methods are just regular functions defined in a class's namespace, making them an attribute of the class.
The descriptor protocol allows customization of attribute lookup in classes, creating a "bound method" with "self" prepended to its other arguments.
The explicit "self" method argument is a design choice in favor of simplicity, following the "worse is better" design philosophy.
Understanding the descriptor protocol can be practically useful, as it has some use cases beyond the @property descriptor.
What Is Python’s “Self” Argument, Anyway?
A behind-the-scenes look into this well-known argument
Every Python developer is familiar with the self argument, which is present in nearly every method declaration of every class (static and class methods being the exceptions). We all know how to use it, but do you really know what it is, why it's there, and how it works under the hood?
What We Already Know
Let’s start with what we already know: self — the first argument in methods — refers to the class instance, as shown below:
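As a minimal sketch of what this looks like in practice (the names MyClass, do_stuff, and some_arg come from the text; the method body is an assumption made purely for illustration):

```python
class MyClass:
    def do_stuff(self, some_arg):
        # `self` is the instance the method was called on.
        print(f"self is {self!r}, some_arg is {some_arg!r}")
        return self


instance = MyClass()

# Only one argument is passed explicitly; Python supplies `self`.
returned = instance.do_stuff("whatever")
print(returned is instance)  # True
```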
Also, this argument doesn’t have to be called self — it's just a convention. You could use, for example, this as is common in other languages (but don't).
The above code probably looks natural and obvious since you've been using it forever. Notice, though, that we've given .do_stuff() only one argument (some_arg), yet the method declares two (self and some_arg), which at first glance doesn't add up. Somehow self ends up referring to the instance, but how did it really get there?
What Python does internally is a conversion from instance.do_stuff("whatever") to MyClass.do_stuff(instance, "whatever"). We could end it here and call it "Python magic," but if we want to understand what's going on under the hood, we need to understand what Python methods are and how they relate to functions.
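The equivalence of the two call forms can be checked directly. This is a small sketch; the method body, which returns its arguments as a tuple, is an assumption chosen so the two results can be compared:

```python
class MyClass:
    def do_stuff(self, some_arg):
        # Return both arguments so the two call styles can be compared.
        return (self, some_arg)


instance = MyClass()

via_instance = instance.do_stuff("whatever")
via_class = MyClass.do_stuff(instance, "whatever")

print(via_instance == via_class)  # True
```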
Class Attributes/Methods
In Python, defining a function inside a class doesn't create any special "method" object. In reality, methods are just regular functions. The difference between a function and a method is that methods are defined in a class's namespace, making them attributes of said class.
These attributes are stored in the class dictionary, __dict__, which we can access directly or through the vars built-in function. Here's the code:
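A short sketch of inspecting the class dictionary, assuming the same illustrative MyClass as before:

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


# vars(MyClass) returns the same mapping as MyClass.__dict__.
stored = vars(MyClass)["do_stuff"]

print(stored)
print(stored is MyClass.__dict__["do_stuff"])  # True
print(type(stored).__name__)  # function
```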
The most common way to access them is through the class itself, as shown below:
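For illustration, a sketch of attribute access through the class, again using the assumed MyClass:

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


func = MyClass.do_stuff

# Prints a plain function, e.g. <function MyClass.do_stuff at 0x...>
print(func)

# Accessed via the class, we get the very object stored in the class dictionary.
print(func is MyClass.__dict__["do_stuff"])  # True
```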
Here we accessed the function using a class attribute, which, as expected, prints that do_stuff is a function of MyClass. We can, however, access it also using the instance attribute. Here’s what that looks like:
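A sketch of the same lookup through an instance. The attributes __self__ and __func__ are standard on bound method objects; the class itself is the illustrative one used throughout:

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()
bound = instance.do_stuff

# Prints e.g. <bound method MyClass.do_stuff of <...MyClass object at 0x...>>
print(bound)

print(bound.__self__ is instance)          # True: the instance it is bound to
print(bound.__func__ is MyClass.do_stuff)  # True: the underlying function
print(bound("whatever"))                   # whatever
```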
In this case, we get back a “bound method” rather than the raw function. What Python does for us here is that it binds the class attribute to the instance, creating what’s called a “bound method.” This “bound method” is a wrapper around the underlying function that has the instance already inserted as a first argument (self).
Therefore, methods are plain functions that have a class instance (self) prepended to their other arguments.
We need to look at the descriptor protocol to understand how that happens.
Descriptor Protocol
Descriptors are the mechanism behind methods (among other things). They’re objects whose class defines a __get__(), __set__(), or __delete__() method. To understand how self works, we will only consider __get__(), which has the signature __get__(self, instance, owner=None).
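To see what each parameter of __get__() receives, here is a small sketch. ShowArgs is a hypothetical descriptor invented for this illustration; it simply returns the arguments it was called with:

```python
class ShowArgs:
    """Hypothetical descriptor that echoes what __get__ receives."""

    def __get__(self, instance, owner=None):
        # instance: the object the attribute was accessed through (or None)
        # owner: the class the attribute was looked up on
        return (instance, owner)


class MyClass:
    attr = ShowArgs()


instance = MyClass()

print(MyClass.attr == (None, MyClass))       # True: accessed via the class
print(instance.attr == (instance, MyClass))  # True: accessed via an instance
```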
But what does the __get__() method actually do? It allows us to customize an attribute lookup in classes — or in other words, what happens when the class attribute is accessed using dot notation. This is very useful, considering that methods are just class attributes. This means that we can use the __get__ method to create a "bound method" of a class.
Let’s demonstrate this by implementing a “method” using a descriptor to make it a little easier to understand. First, we create a pure-Python implementation of a function object:
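The following is a sketch consistent with the description in the surrounding text, not necessarily the exact original snippet. In particular, the __init__ that stores a wrapped callable (func) is an assumption added so the object does something useful when called:

```python
import types


class Function:
    def __init__(self, func):
        # Assumption for this sketch: wrap an ordinary callable.
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed through the class: return the function object itself.
            return self
        # Accessed through an instance: bind ourselves to that instance.
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        # Makes Function objects callable, as types.MethodType requires.
        return self.func(*args, **kwargs)
```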
The above Function class implements __get__, which makes it a descriptor. This dunder method receives the class instance in its instance argument. If this argument is None, we know that the __get__ method was called directly from a class (e.g., MyClass.do_stuff), so we just return self.
If it was, however, called on a class instance, such as instance.do_stuff, then we return types.MethodType, which is a way of creating a "bound method" manually.
Additionally, we provide a __call__ dunder method. While __init__ is invoked when a class is called to initialize an instance (e.g. instance = MyClass()), __call__ is invoked when the instance itself is called (e.g. instance()). We need this because self in types.MethodType(self, instance) must be callable.
Now that we have our function implementation, we can use it to bind a method to a class, as shown below:
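A self-contained sketch of that step. The Function class is repeated from the description above (with the same assumed __init__), and _do_stuff is a hypothetical helper function used as the method body:

```python
import types


class Function:
    def __init__(self, func):
        self.func = func  # assumption: wrap an ordinary callable

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return types.MethodType(self, instance)

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def _do_stuff(self, some_arg):
    # Hypothetical method body for illustration.
    return f"{type(self).__name__} got {some_arg}"


class MyClass:
    do_stuff = Function(_do_stuff)


instance = MyClass()

print(isinstance(MyClass.do_stuff, Function))  # True: class access returns the descriptor
print(type(instance.do_stuff).__name__)        # method
print(instance.do_stuff("whatever"))           # MyClass got whatever
```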
By giving MyClass an attribute do_stuff of type Function, we roughly simulate what Python does when you define a method in a class's namespace.
To summarize, upon attribute access such as instance.do_stuff, Python finds no do_stuff entry in the instance's own attribute dictionary (__dict__), so it uses the one found in the class dictionary (or that of a base class). If the object found there defines a __get__ method, then do_stuff.__get__ is invoked with the instance and its class, ultimately running the types.MethodType(self, instance) call from our Function implementation.
This, as we now know, returns a bound method: a callable wrapper around the original function, which has self prepended to its arguments!
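The lookup described above can be reproduced by hand with a regular method. This is a simplified sketch that ignores inheritance and data descriptors:

```python
class MyClass:
    def do_stuff(self, some_arg):
        return some_arg


instance = MyClass()

# Step 1: the function lives in the class dictionary, not the instance's.
print("do_stuff" in vars(instance))  # False
raw = type(instance).__dict__["do_stuff"]

# Step 2: functions are descriptors, so __get__ produces the bound method.
bound = raw.__get__(instance, type(instance))

print(bound == instance.do_stuff)  # True
print(bound("whatever"))           # whatever
```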
If you want to explore this further, you can similarly implement static and class methods. Examples of how to do that can be found in the Python documentation, in the Descriptor HowTo Guide.
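As a rough sketch of the idea (simplified, and not the built-in implementations), static and class methods differ only in what their __get__ returns. StaticMethod and ClassMethod are hypothetical names chosen to avoid shadowing the built-ins:

```python
import types


class StaticMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        # No binding at all: hand back the plain function.
        return self.func


class ClassMethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Bind to the class instead of the instance.
        return types.MethodType(self.func, owner)


class MyClass:
    @StaticMethod
    def add(a, b):
        return a + b

    @ClassMethod
    def name(cls):
        return cls.__name__


print(MyClass.add(1, 2))   # 3
print(MyClass().name())    # MyClass
```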
Why Is It There, Though?
We know how it works, but a more philosophical question stands: “Why does it have to appear in method definitions?”
The explicit self method argument is a controversial design choice, but it’s a choice in favour of simplicity.
Python’s self is the embodiment of the “worse is better” design philosophy described by Richard P. Gabriel. The priority of this design philosophy is “simplicity,” defined as:
The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface…
That’s exactly the case with self — a simple implementation, at the expense of interface, where the method signature doesn’t match its invocation.
There are more reasons why we have explicit self or rather, why it has to stay. Some of them are described in the blog post by Guido van Rossum in response to a proposal calling for its removal.
Closing Thoughts
Python abstracts away a lot of complexity, but digging into low-level details and intricacies can be — in my opinion — very valuable for getting a greater understanding of how the language works. This can come in handy when things break, and high-level troubleshooting/debugging isn’t enough.
Additionally, understanding descriptors can be quite practical, as they have real use cases. While most of the time you will only really need the @property descriptor, there are situations where custom ones make sense, such as the ones in SQLAlchemy or custom validators.
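To give a flavor of the validator use case, here is a minimal sketch of a custom data descriptor. PositiveNumber and Order are hypothetical names invented for this example:

```python
class PositiveNumber:
    """Hypothetical validating descriptor: only accepts values above zero."""

    def __set_name__(self, owner, name):
        # Called when the owning class is created; remember where to store the value.
        self.private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"expected a positive number, got {value!r}")
        setattr(instance, self.private_name, value)


class Order:
    quantity = PositiveNumber()

    def __init__(self, quantity):
        self.quantity = quantity  # goes through PositiveNumber.__set__


order = Order(3)
print(order.quantity)  # 3

try:
    order.quantity = 0
except ValueError as exc:
    print(f"rejected: {exc}")
```

Every assignment to quantity is validated, including the one in __init__, without any per-attribute boilerplate in the Order class.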