The provided content discusses best practices for structuring PyTorch models using Module, Sequential, ModuleList, and ModuleDict.
Abstract
The article "Pytorch: how and when to use Module, Sequential, ModuleList and ModuleDict" by Francesco Saverio Zuppichini offers guidance on organizing PyTorch code for neural network models. It emphasizes the importance of using the Module class as the foundation for all models, demonstrating how to create a convolutional neural network (CNN) classifier as an example. The author then introduces Sequential for stacking layers in a straightforward manner, which simplifies the model definition and enhances code readability. The article further explores the use of ModuleList for iterating over layers, particularly useful in complex architectures like U-net, and ModuleDict for dynamically selecting layers or blocks, such as switching between activation functions. The final implementation showcases a clean and maintainable CNN model using these constructs, with the full code available on GitHub. The article concludes with a summary of when to use each of the four main building blocks in PyTorch to create well-organized and efficient machine learning models.
Opinions
The author, Francesco Saverio Zuppichini, suggests that many people write disorganized PyTorch code, implying a need for better coding practices in the ML community.
Sequential is presented as a cleaner alternative to manually writing each layer in the __init__ and forward methods of a Module.
The author argues that ModuleList is beneficial when one needs to store and iterate over layers, a common requirement in more complex models.
ModuleDict is highlighted as a useful tool for making models more flexible, allowing for dynamic changes, such as swapping activation functions.
The article advocates for dividing models into submodules (e.g., encoder and decoder) to improve sharing, debugging, and testing of the code.
The author states a preference for using classes for model components like MyEncoder and MyDecoder, and functions returning nn.Sequential for smaller building blocks, which reflects a structured approach to model design.
The final implementation is showcased as a best-practice example, suggesting that following these guidelines leads to more maintainable and efficient code.
PyTorch is an open source deep learning framework that provides a smart way to create ML models. Even though the documentation is well made, I still find that many people manage to write bad and disorganized PyTorch code.
Today, we are going to see how to use the four main building blocks of PyTorch: Module, Sequential, ModuleList and ModuleDict. We will start with an example and improve it step by step.
All four classes are contained in torch.nn.
Module: the main building block
Module is the main building block: it defines the base class for all neural networks, and you MUST subclass it.
Consider a very simple classifier with an encoding part that uses two layers of 3x3 conv + batchnorm + relu, and a decoding part with two linear layers. If you are not new to PyTorch, you may have seen this style of code before, but it has two problems.
First, if we want to add a layer, we have to write more code in both __init__ and forward. Second, if we have a common block that we want to reuse in another model, e.g. the 3x3 conv + batchnorm + relu, we have to write it again.
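A minimal sketch of such a classifier written with the plain Module pattern. The class name MyClassifier, the 28x28 single-channel input size and the hidden size of 128 are assumptions for illustration, not taken from the original code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyClassifier(nn.Module):
    def __init__(self, in_c, n_classes):
        super().__init__()
        # Encoder: two 3x3 conv + batchnorm layers (relu is applied in forward);
        # padding=1 keeps the 28x28 spatial size
        self.conv1 = nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        # Decoder: two linear layers (assumes 28x28 inputs)
        self.fc1 = nn.Linear(64 * 28 * 28, 128)
        self.fc2 = nn.Linear(128, n_classes)

    def forward(self, x):
        # Every layer registered in __init__ must be called again by hand here
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = x.flatten(1)  # (batch, 64 * 28 * 28)
        x = torch.sigmoid(self.fc1(x))
        return self.fc2(x)


model = MyClassifier(in_c=1, n_classes=10)
out = model(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```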
Sequential: stack and merge layers
Sequential is a container of Modules that are stacked together and run one after another: the output of each module is passed as input to the next.
Notice that we have to store everything in self. We can use Sequential to improve our code.
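A minimal sketch of the same classifier rewritten with Sequential. The conv_block1 and conv_block2 names follow the text; the class name MyCNNClassifier, the 28x28 input and the hidden size of 128 are illustrative assumptions:

```python
import torch
import torch.nn as nn


class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes):
        super().__init__()
        # Each Sequential bundles conv + batchnorm + relu into one module
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        # The decoder is also a Sequential: linear -> sigmoid -> linear
        self.decoder = nn.Sequential(
            nn.Linear(64 * 28 * 28, 128),  # assumes 28x28 inputs
            nn.Sigmoid(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        # forward now only wires the blocks together
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = x.flatten(1)  # flatten everything except the batch dimension
        return self.decoder(x)


model = MyCNNClassifier(in_c=1, n_classes=10)
out = model(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```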
Did you notice that conv_block1 and conv_block2 look almost the same? We could create a function that returns an nn.Sequential to simplify the code even further!
self.encoder now holds both conv blocks. We have decoupled the logic of our model, making it easier to read and reuse. Our conv_block function can be imported and used in another model.
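A possible version of that helper, assuming it forwards any extra arguments (kernel_size, padding, ...) to nn.Conv2d. The classifier around it keeps the same illustrative sizes as before:

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, **kwargs):
    # Extra positional/keyword args (kernel_size, padding, ...) go to Conv2d
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )


class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes):
        super().__init__()
        # Both conv blocks now live in a single Sequential
        self.encoder = nn.Sequential(
            conv_block(in_c, 32, kernel_size=3, padding=1),
            conv_block(32, 64, kernel_size=3, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.Linear(64 * 28 * 28, 128),  # assumes 28x28 inputs
            nn.Sigmoid(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.flatten(1)
        return self.decoder(x)


model = MyCNNClassifier(in_c=1, n_classes=10)
out = model(torch.randn(2, 1, 28, 28))
```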
Dynamic Sequential: create multiple layers at once
What if we want to add new layers to self.encoder? Hardcoding each one of them is not convenient.
Wouldn't it be nice if we could define the sizes as a list and automatically create all the layers without writing each one of them? Fortunately, we can build a list of blocks and pass it to Sequential.
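A minimal sketch of that idea, assuming the conv_block helper forwards extra arguments to nn.Conv2d and that inputs are 28x28 (both assumptions, as is the class name MyCNNClassifier):

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )


class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes):
        super().__init__()
        # Channel sizes of the encoder, starting from the input channels
        self.enc_sizes = [in_c, 32, 64]
        # One conv_block per consecutive (in, out) pair: (in_c, 32), (32, 64)
        conv_blocks = [
            conv_block(in_f, out_f, kernel_size=3, padding=1)
            for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])
        ]
        # Sequential takes modules as separate arguments, so unpack the list
        self.encoder = nn.Sequential(*conv_blocks)
        self.decoder = nn.Sequential(
            nn.Linear(self.enc_sizes[-1] * 28 * 28, 128),  # assumes 28x28 inputs
            nn.Sigmoid(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = x.flatten(1)
        return self.decoder(x)


model = MyCNNClassifier(in_c=1, n_classes=10)
out = model(torch.randn(2, 1, 28, 28))
```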
Let's break it down. We created a list, self.enc_sizes, that holds the sizes of our encoder. Then we created a list, conv_blocks, by iterating over the sizes. Since each layer needs both an input size and an output size, we zipped the size list with itself shifted by one.
Just to be clear, take a look at the following example:
1 32
32 64
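That output comes from zipping a size list with itself shifted by one. A minimal sketch that produces it, assuming the size list is [1, 32, 64]:

```python
sizes = [1, 32, 64]

# Pair each size with the next one: (1, 32), (32, 64)
for in_f, out_f in zip(sizes, sizes[1:]):
    print(in_f, out_f)
# prints:
# 1 32
# 32 64
```

zip stops at the shorter of its inputs, so the last size never appears as an input size, which is exactly what we want.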
Since Sequential does not accept a list, we unpack it into separate arguments with the * operator.
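To see why the * is needed, a small check with arbitrary Linear layers (the sizes are placeholders): passing the list directly makes nn.Sequential raise a TypeError, because it expects each module as a separate argument.

```python
import torch.nn as nn

layers = [nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)]

# Passing the list itself fails: a list is not a Module
failed_with_list = False
try:
    nn.Sequential(layers)
except TypeError as err:
    failed_with_list = True
    print("TypeError:", err)

# Unpacking with * passes each module as its own argument
model = nn.Sequential(*layers)
print(len(model))  # 3
```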
Tada! Now, if we want to add a layer, we just add a new number to the list. It is common practice to make the sizes a parameter of the model.
We followed the same pattern for the decoding part: we created a new block, linear + sigmoid, and passed a list with the sizes. We had to add a self.last layer since we do not want to apply an activation to the output.
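A sketch of the fully parametrized classifier, assuming a hypothetical dec_block helper (linear + sigmoid), a dec_sizes parameter for the decoder and 28x28 inputs. The first decoder size is derived from the flattened encoder output:

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )


def dec_block(in_f, out_f):
    # Hypothetical decoding block: linear + sigmoid
    return nn.Sequential(nn.Linear(in_f, out_f), nn.Sigmoid())


class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, enc_sizes, dec_sizes, n_classes):
        super().__init__()
        self.enc_sizes = [in_c, *enc_sizes]
        # First decoder input = flattened encoder output (assumes 28x28 inputs)
        self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]

        conv_blocks = [
            conv_block(in_f, out_f, kernel_size=3, padding=1)
            for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])
        ]
        self.encoder = nn.Sequential(*conv_blocks)

        dec_blocks = [
            dec_block(in_f, out_f)
            for in_f, out_f in zip(self.dec_sizes, self.dec_sizes[1:])
        ]
        self.decoder = nn.Sequential(*dec_blocks)
        # Final projection to class scores, deliberately left without activation
        self.last = nn.Linear(self.dec_sizes[-1], n_classes)

    def forward(self, x):
        x = self.encoder(x)
        x = x.flatten(1)
        x = self.decoder(x)
        return self.last(x)


model = MyCNNClassifier(1, [32, 64], [128, 64], 10)
out = model(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10])
```

Here MyCNNClassifier(1, [32, 64], [128, 64], 10) builds two conv blocks, two decoding blocks and a final Linear(64, 10).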
Now we can even break our model down into two parts: an encoder and a decoder.
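A sketch of that split, assuming the same hypothetical conv_block and dec_block helpers and 28x28 inputs. MyEncoder and MyDecoder are named as in the text, but their bodies here are our own illustration:

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )


def dec_block(in_f, out_f):
    return nn.Sequential(nn.Linear(in_f, out_f), nn.Sigmoid())


class MyEncoder(nn.Module):
    def __init__(self, enc_sizes):
        super().__init__()
        self.conv_blocks = nn.Sequential(*[
            conv_block(in_f, out_f, kernel_size=3, padding=1)
            for in_f, out_f in zip(enc_sizes, enc_sizes[1:])
        ])

    def forward(self, x):
        return self.conv_blocks(x)


class MyDecoder(nn.Module):
    def __init__(self, dec_sizes, n_classes):
        super().__init__()
        self.dec_blocks = nn.Sequential(*[
            dec_block(in_f, out_f)
            for in_f, out_f in zip(dec_sizes, dec_sizes[1:])
        ])
        self.last = nn.Linear(dec_sizes[-1], n_classes)  # no output activation

    def forward(self, x):
        return self.last(self.dec_blocks(x))


class MyCNNClassifier(nn.Module):
    def __init__(self, in_c, enc_sizes, dec_sizes, n_classes):
        super().__init__()
        self.enc_sizes = [in_c, *enc_sizes]
        # Flattened encoder output size (assumes 28x28 inputs)
        self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]
        self.encoder = MyEncoder(self.enc_sizes)
        self.decoder = MyDecoder(self.dec_sizes, n_classes)

    def forward(self, x):
        x = self.encoder(x)
        x = x.flatten(1)
        return self.decoder(x)


model = MyCNNClassifier(1, [32, 64], [128, 64], 10)
out = model(torch.randn(2, 1, 28, 28))
```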
Be aware that MyEncoder and MyDecoder could also be functions that return an nn.Sequential. I prefer to use the first pattern (classes) for models and the second (functions) for building blocks.
By dividing our module into submodules, it becomes easier to share, debug and test the code.
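For comparison, a sketch of the function-based alternative for the encoder. my_encoder is a hypothetical name, and conv_block is the same assumed helper as before:

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )


def my_encoder(enc_sizes):
    # Function version of MyEncoder: just returns a Sequential of conv blocks
    return nn.Sequential(*[
        conv_block(in_f, out_f, kernel_size=3, padding=1)
        for in_f, out_f in zip(enc_sizes, enc_sizes[1:])
    ])


encoder = my_encoder([1, 32, 64])
features = encoder(torch.randn(2, 1, 28, 28))
print(features.shape)  # torch.Size([2, 64, 28, 28])
```

The function is shorter, while a class gives the component its own type and a forward method you can customize later, which fits the preference for classes at the model level.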
ModuleList: when we need to iterate
ModuleList allows you to store Modules as a list. It is useful when you need to iterate through the layers and store or use some information along the way, as in U-Net.
The main difference from Sequential is that ModuleList does not have a forward method, so the inner layers are not connected. Assuming we need the output of each layer in the decoder, we can store it as follows:
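A minimal sketch, assuming a hypothetical MyModule built from Linear layers whose sizes come from a list. Here the per-layer outputs are returned rather than kept on self, so they do not accumulate across calls:

```python
import torch
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self, sizes):
        super().__init__()
        # ModuleList registers the layers (so their parameters are tracked)
        # but, unlike Sequential, defines no forward: we call them ourselves
        self.layers = nn.ModuleList(
            [nn.Linear(in_f, out_f) for in_f, out_f in zip(sizes, sizes[1:])]
        )

    def forward(self, x):
        outputs = []  # output of every layer, e.g. for skip connections
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return x, outputs


model = MyModule([1, 16, 32])
y, outputs = model(torch.rand(4, 1))
print([o.shape for o in outputs])  # [torch.Size([4, 16]), torch.Size([4, 32])]
```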
ModuleDict: when we need to choose
What if we want to switch to LeakyReLU in our conv_block? We can use ModuleDict to create a dictionary of Modules and dynamically pick the one we want by its key.
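A sketch of conv_block with a selectable activation, assuming the keys "relu" and "lrelu" (our own choice) and making activation keyword-only so it cannot collide with the positional arguments forwarded to nn.Conv2d:

```python
import torch
import torch.nn as nn


def conv_block(in_f, out_f, *args, activation="relu", **kwargs):
    # Name -> activation module; the key picks which one goes into the block
    activations = nn.ModuleDict({
        "lrelu": nn.LeakyReLU(),
        "relu": nn.ReLU(),
    })
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        activations[activation],
    )


block = conv_block(1, 32, kernel_size=3, padding=1, activation="lrelu")
out = block(torch.randn(2, 1, 28, 28))
print(block)
```

Building the ModuleDict on every call is cheap here; in a larger codebase you might define the mapping once and look up a fresh module per block instead.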