n library. This library contains many DNN models. Our task is a 10-class classification problem, but the models in the library are built for the ImageNet dataset, which is a 1000-class problem, so I will replace the very last fully connected layer to change its 1000 output nodes to 10.</p><div id="8d8c"><pre><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torchvision

model = torchvision.models.resnet18().to(<span class="hljs-string">'cuda'</span>)
model.fc = torch.nn.Linear(in_features=<span class="hljs-number">512</span>,
                           out_features=<span class="hljs-number">10</span>,  <span class="hljs-comment"># same number of output units as our number of classes</span>
                           bias=<span class="hljs-literal">True</span>).to(<span class="hljs-string">'cuda'</span>)</pre></div><p id="a557">Here, I changed the number of output nodes to 10 by referring to the last layer by its name, fc. Most tutorials and blogs don’t say how to find the name of that layer, and because it differs from model to model it is often hard to track down. The correct value of in_features is another concern. You can use summary from torchinfo to get this information.</p><div id="c45c"><pre><span class="hljs-keyword">from</span> torchinfo <span class="hljs-keyword">import</span> summary

summary(model=model,
        input_size=(<span class="hljs-number">32</span>, <span class="hljs-number">3</span>, <span class="hljs-number">32</span>, <span class="hljs-number">32</span>),
        col_names=[<span class="hljs-string">"input_size"</span>, <span class="hljs-string">"output_size"</span>, <span class="hljs-string">"num_params"</span>, <span class="hljs-string">"trainable"</span>],
        col_width=<span class="hljs-number">20</span>,
        row_settings=[<span class="hljs-string">"var_names"</span>]
)</pre></div><p id="47b0">You need to pass summary the model and the input size of the data that will be fed to it. It uses that size to generate a random input batch and shows you the name of each layer, its number of parameters, and whether it is trainable. <i>The image below is for a VGG model, not the ResNet-18 model. The ResNet-18 printout was too large to show here, so the VGG output is included for reference only.</i></p><figure id="f175"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*5lr_YZ5eSAGWcbe0R6lIrg.png"><figcaption>Visualizing the model using summary</figcaption></figure><p id="0d6f">Now that we have our model, we simply train it on the CIFAR-10 data. The detailed implementation can be found <a href="https://github.com/aminul-huq/medium">here</a>. After training for 10 epochs, the testing accuracy is 45.80%. We can visualize how the model learned in each epoch using the loss vs. epoch and accuracy vs. epoch curves shown below.</p><figure id="4f44"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*A4j7TEBm4nCS468NxnHTMg.png"><figcaption>Without pretraining.</figcaption></figure><p id="1995">Let’s now experiment with pre-training and freezing some layers. I will change the model initialization slightly to use the weights pre-trained on the ImageNet dataset, which are available in PyTorch; to use them we just need to set ‘pretrained=True’. However, there are other ways to do the same thing: in newer torchvision releases (0.13 and later), the weights argument, e.g. weights=torchvision.models.ResNet18_Weights.DEFAULT, replaces the now-deprecated pretrained flag.</p><div id="3adc"><pre>model = torchvision.models.resnet18(pretrained=<span class="hljs-literal">True</span>).to(<span class="hljs-string">'cuda'</span>)
model.fc = torch.nn.Linear(in_features=<span class="hljs-number">512</span>,
                           out_features=<span class="hljs-number">10</span>, bias=<span class="hljs-literal">True</span>).to(<span class="hljs-string">'cuda'</span>)</pre></div><div id="3b60"><pre><span class="hljs-comment"># name[5] is '.' for 'conv1.*' and '1' for 'layer1.*'; both sort before '2',</span>
<span class="hljs-comment"># so conv1 and layer1 are frozen while bn1, layer2-layer4 and fc stay trainable.</span>
<span class="hljs-keyword">for</span> name, param <span class="hljs-keyword">in</span> model.named_parameters():
    <span class="hljs-keyword">if</span> name[<span class="hljs-number">5</span>] < <span class="hljs-string">'2'</span>:
        param.requires_grad = <span class="hljs-literal">False</span></pre></div><p id="4d2a">In the code snippet above, I freeze several of the initial layers, selected by parameter name, by setting ‘requires_grad=False’. This ensures those layers are not updated during training and keep their pre-trained weights. If you pass the model to summary after these lines of code, you will find that the trainable column now has several False values, which was not the case previously.</p><p id="d3f5">Now, if we retrain the model from the start using the same hyper-parameters, we see that after 10 epochs we get 62.01% accuracy on the testing data, about 16 percentage points higher than before. We can visualize the loss and accuracy curves of this model below.</p><figure id="882c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aArjzOgvEP6Su6WNEy8Pxw.png"><figcaption>With pre-training and freezing layers</figcaption></figure><p id="d7c1">Comparing this graph with the previous one, we can see that the model with pre-trained weights does much better on both the training and validation data. After 10 epochs, the loss is much lower and the accuracy is much higher.</p><p id="ee64">I hope this blog helped you to some extent in understanding the concepts of pre-training and fine-tuning. The detailed implementation can be found <a href="https://github.com/aminul-huq/medium">here</a>.</p><blockquote id="648f"><p>If you have any difficulty understanding anything or want to reach out to me with any questions, shoot me an email at [email protected].</p></blockquote></article></body>