Chun-kit Ho


ML Paper Challenge Day 34, 35 — SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Day 34–35: 2020.05.15–16
Paper: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Category: Model/Optimization

SqueezeNet

Strategy

  1. Replace 3x3 filters with 1x1 filters
  2. Decrease the number of input channels to 3x3 filters using squeeze layers
  3. Downsample late in the network so that convolution layers have large activation maps
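
The arithmetic behind Strategies 1 and 2 can be sketched in a few lines. This is only an illustration: the helper name `conv_weights` and the channel counts (64 and 16) are assumptions chosen for the example, not values from the paper, and biases are ignored.

```python
def conv_weights(in_channels, num_filters, kernel_size):
    """Number of weights in a convolution layer (biases ignored)."""
    return in_channels * num_filters * kernel_size * kernel_size


# Strategy 1: a 1x1 filter has 9x fewer weights than a 3x3 filter.
weights_3x3 = conv_weights(in_channels=64, num_filters=64, kernel_size=3)  # 36864
weights_1x1 = conv_weights(in_channels=64, num_filters=64, kernel_size=1)  # 4096
ratio = weights_3x3 // weights_1x1  # 9

# Strategy 2: the weight count of a 3x3 layer scales with its input channels,
# so feeding it 16 channels instead of 64 cuts the weights by 4x.
weights_3x3_squeezed = conv_weights(in_channels=16, num_filters=64, kernel_size=3)  # 9216

print(ratio, weights_3x3 // weights_3x3_squeezed)  # prints: 9 4
```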

Fire Module, comprised of

  • a squeeze convolution layer (which has only 1x1 filters) (as per Strategy 1)
  • feeding into an expand layer that has a mix of 1x1 and 3x3 convolution filters
  • We expose three tunable dimensions (hyperparameters) in a Fire module: s_(1x1), e_(1x1), and e_(3x3).
  • In a Fire module, s_(1x1) is the number of filters in the squeeze layer (all 1x1), e_(1x1) is the number of 1x1 filters in the expand layer, and e_(3x3) is the number of 3x3 filters in the expand layer.
  • When we use Fire modules we set s_(1x1) to be less than [e_(1x1) + e_(3x3)], so the squeeze layer helps to limit the number of input channels to the 3x3 filters, as per Strategy 2.
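
A rough sketch of the Fire module's bookkeeping, in plain Python rather than a deep learning framework. The class name `Fire`, the example sizes (16, 64, 64) and the 128 input channels are assumptions for illustration. It also assumes that the two expand outputs are concatenated along the channel axis, so the module emits e_(1x1) + e_(3x3) channels. Biases are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fire:
    s1x1: int  # filters in the squeeze layer (all 1x1)
    e1x1: int  # 1x1 filters in the expand layer
    e3x3: int  # 3x3 filters in the expand layer

    def __post_init__(self):
        # The squeeze layer must be narrower than the expand layer.
        if not self.s1x1 < self.e1x1 + self.e3x3:
            raise ValueError("expected s1x1 < e1x1 + e3x3")

    @property
    def out_channels(self):
        # Assumption: expand outputs are concatenated channel-wise.
        return self.e1x1 + self.e3x3

    def weights(self, in_channels):
        squeeze = in_channels * self.s1x1      # 1x1 filters over the input
        expand_1x1 = self.s1x1 * self.e1x1     # 1x1 filters over squeeze output
        expand_3x3 = self.s1x1 * self.e3x3 * 9 # 3x3 filters over squeeze output
        return squeeze + expand_1x1 + expand_3x3


fire = Fire(s1x1=16, e1x1=64, e3x3=64)  # illustrative sizes
in_channels = 128

fire_weights = fire.weights(in_channels)                 # 12288
plain_3x3_weights = in_channels * fire.out_channels * 9  # 147456
print(plain_3x3_weights // fire_weights)                 # prints: 12
```

With these example sizes, a single 3x3 convolution mapping 128 channels to 128 channels would need 12 times as many weights as the Fire module with the same input and output widths.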

SqueezeNet Architecture

  • begins with a standalone convolution layer (conv1)
  • followed by 8 Fire modules (fire2–9)
  • ending with a final conv layer (conv10)
  • gradually increase the number of filters per fire module from the beginning to the end of the network
  • performs max-pooling with a stride of 2 after layers conv1, fire4, fire8, and conv10; these relatively late placements of pooling are per Strategy 3
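
To make the layer sequence concrete, here is a sketch that walks conv1, fire2–9 and conv10 and adds up their parameters. The per-layer sizes are not given in these notes; the values in `FIRE_CONFIG`, the 96 7x7 filters in conv1, and the 1000 output classes are assumptions taken from the commonly cited SqueezeNet v1.0 configuration. Pooling layers have no parameters and are omitted.

```python
# (name, s1x1, e1x1, e3x3): assumed v1.0 sizes, not listed in this summary.
FIRE_CONFIG = [
    ("fire2", 16, 64, 64),
    ("fire3", 16, 64, 64),
    ("fire4", 32, 128, 128),
    ("fire5", 32, 128, 128),
    ("fire6", 48, 192, 192),
    ("fire7", 48, 192, 192),
    ("fire8", 64, 256, 256),
    ("fire9", 64, 256, 256),
]


def conv_params(in_channels, num_filters, kernel_size):
    """Weights plus one bias per filter."""
    return in_channels * num_filters * kernel_size * kernel_size + num_filters


def squeezenet_params(num_classes=1000):
    total = conv_params(3, 96, 7)  # conv1 on RGB input (assumed 96 7x7 filters)
    in_channels = 96
    for _name, s1x1, e1x1, e3x3 in FIRE_CONFIG:
        total += conv_params(in_channels, s1x1, 1)  # squeeze
        total += conv_params(s1x1, e1x1, 1)         # expand 1x1
        total += conv_params(s1x1, e3x3, 3)         # expand 3x3
        in_channels = e1x1 + e3x3                   # concatenated expand output
    total += conv_params(in_channels, num_classes, 1)  # conv10 (1x1 filters)
    return total


print(squeezenet_params())  # prints: 1248424
```

Under these assumptions the network has roughly 1.25 million parameters, and the filter counts per Fire module grow from 16/64/64 at fire2 to 64/256/256 at fire9.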

Other details

  • To have the same height and width in the output activations from 1x1 and 3x3 filters, add a 1-pixel border of zero-padding in the input data to 3x3 filters of expand modules.
  • ReLU is applied to activations from squeeze and expand layers.
  • Dropout with a ratio of 50% is applied after the fire9 module.
  • There are no fully-connected layers.
  • Training begins with a learning rate of 0.04, which is decreased linearly throughout training.
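
Two of the details above reduce to simple formulas, sketched here. The function names, the 55-pixel activation size, and the total step count are assumptions for illustration. The notes only say the learning rate decreases linearly, so decaying all the way to zero at the last step is also an assumption.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# A 1x1 filter keeps the size; a 3x3 filter needs 1 pixel of zero-padding
# to match it, otherwise the two expand outputs would differ in shape.
size_1x1 = conv_output_size(55, kernel=1)                   # 55
size_3x3_padded = conv_output_size(55, kernel=3, padding=1) # 55
size_3x3_unpadded = conv_output_size(55, kernel=3)          # 53


def linear_lr(step, total_steps, base_lr=0.04):
    """Learning rate decayed linearly from base_lr to 0 (assumed end point)."""
    return base_lr * (1.0 - step / total_steps)


for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```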