SqueezeNet Architecture Design


What is SqueezeNet?
  • a deep convolutional neural network (CNN)
  • compressed architecture design
  • model contains a relatively small number of parameters
  • achieves AlexNet-level accuracy on the ImageNet dataset with 50x fewer parameters
Three advantages of small CNN architectures:
  • require less communication across servers during distributed training.
  • require less bandwidth to export a new model from the cloud.
  • more feasible to deploy on customized hardware with limited memory.

Architectural Design Strategies

The authors outline three main strategies for reducing the number of parameters while maximizing accuracy.

Strategy 1

Make the network smaller by replacing 3x3 filters with 1x1 filters
  • conventional 3x3 convolution filters are replaced by 1x1 convolution filters
  • a 1x1 filter has 9x fewer parameters than a 3x3 filter (1 weight per input channel instead of 9)

Difference between 3x3 filters and 1x1 filters

3x3 filters
  • larger spatial receptive field
  • captures spatial information of pixels close to each other.
1x1 filters
  • looks at one pixel at a time
  • captures relationships among the channels of that pixel
  • equivalent to a fully connected layer applied along the channel dimension at each pixel
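
The equivalence between a 1x1 convolution and a per-pixel fully connected layer can be checked on a toy input. The following is a minimal sketch in plain Python, assuming no bias terms, stride 1, and no padding; the names conv2d and dense and the example values are illustrative and not taken from any library.

```python
def conv2d(image, kernels):
    """Stride-1 convolution without padding or bias.

    image:   nested list indexed [row][col][in_channel]
    kernels: nested list indexed [filter][k_row][k_col][in_channel]
    """
    k = len(kernels[0])
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            pixel = []
            for kernel in kernels:
                total = 0
                for dr in range(k):
                    for dc in range(k):
                        for ch, weight in enumerate(kernel[dr][dc]):
                            total += weight * image[r + dr][c + dc][ch]
                pixel.append(total)
            row.append(pixel)
        output.append(row)
    return output


def dense(vector, weights):
    """Fully connected layer without bias; weights indexed [out][in]."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


# Toy 2x2 image with 3 channels (integers keep the comparison exact).
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [0, 1, 2]]]

# One weight row per output channel: 3 input channels -> 2 output channels.
weights = [[1, 0, -1],
           [2, 1, 0]]

# Wrap each weight row as a 1x1 kernel: [filter][1][1][in_channel].
kernels_1x1 = [[[row]] for row in weights]

conv_out = conv2d(image, kernels_1x1)
dense_out = [[dense(pixel, weights) for pixel in row] for row in image]

print(conv_out == dense_out)  # True
```

The same weights give the same result either way: the 1x1 convolution mixes channels at each pixel and never looks at neighboring pixels.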

Strategy 2

Reduce the number of inputs for the remaining 3x3 filters.
  • fewer inputs to conv layers result in fewer parameters
  • achieved by using only 1x1 filters prior to the 3x3 conv layer
  • called the squeeze layer (description in next section)
  • total number of parameters in 3x3 conv layer = (number of input channels) * (number of filters) * (3 * 3)
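
The parameter formula above can be turned into a small calculation. This is a sketch that ignores bias terms; the channel and filter counts (64, 16) are hypothetical values chosen only for illustration.

```python
def conv_params(in_channels, num_filters, kernel_size):
    """Number of weights in a conv layer with square kernels, ignoring biases."""
    return in_channels * num_filters * kernel_size * kernel_size


# Strategy 1: same layer shape, 3x3 versus 1x1 filters (hypothetical sizes).
params_3x3 = conv_params(in_channels=64, num_filters=64, kernel_size=3)
params_1x1 = conv_params(in_channels=64, num_filters=64, kernel_size=1)
print(params_3x3, params_1x1, params_3x3 // params_1x1)  # 36864 4096 9

# Strategy 2: squeeze the input from 64 to 16 channels before the 3x3 layer.
params_3x3_squeezed = conv_params(in_channels=16, num_filters=64, kernel_size=3)
print(params_3x3_squeezed)  # 9216
```

Because the count is a plain product, cutting the input channels by a factor of 4 cuts the 3x3 layer's parameters by the same factor.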

Strategy 3

Downsample late in the network so that convolution layers have large activation maps.
  • makes the most of a smaller number of parameters to maximize accuracy
  • delaying downsampling until late in the network creates larger activation/feature maps
  • departure from more traditional architectures like the VGG network that use early downsampling
  • large activation maps result in higher classification accuracy given the same number of parameters
Figure: VGG architecture with early downsampling
The two main ways to achieve downsampling:
  • strides > 1 in the convolutional layers
  • pooling layers (e.g. max or average pooling)
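
Both downsampling routes can be seen on a toy single-channel input. The sketch below uses plain Python with no padding; the function names max_pool and strided_conv, the 4x4 input, and the 2x2 kernel are illustrative assumptions.

```python
def max_pool(image, size, stride):
    """Max pooling over a single-channel image given as a list of rows."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + dr][c + dc]
                      for dr in range(size) for dc in range(size)]
            row.append(max(window))
        out.append(row)
    return out


def strided_conv(image, kernel, stride):
    """Single-channel convolution without padding, moving `stride` pixels per step."""
    k = len(kernel)
    out = []
    for r in range(0, len(image) - k + 1, stride):
        row = []
        for c in range(0, len(image[0]) - k + 1, stride):
            row.append(sum(kernel[dr][dc] * image[r + dr][c + dc]
                           for dr in range(k) for dc in range(k)))
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

pooled = max_pool(image, size=2, stride=2)
print(pooled)  # [[6, 8], [14, 16]]

kernel = [[1, 0],
          [0, 1]]
strided = strided_conv(image, kernel, stride=2)
print(strided)  # [[7, 11], [23, 27]]
```

Either way the 4x4 map shrinks to 2x2; the difference is that the strided convolution has learnable weights while pooling has none.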

General Strategy

  • Strategies 1 and 2 are about carefully decreasing the quantity of parameters in a CNN while attempting to preserve accuracy.
  • Strategy 3 is about maximizing accuracy on a limited budget of parameters.

Fire Module

What is the Fire Module?
  • building block used in SqueezeNet
  • employs Strategies 1 and 2 (Strategy 3 is applied at the network level, through where pooling is placed)
  • contains a squeeze layer, which has only 1x1 filters (strategy 1)
  • contains an expand layer, which has a mix of 1x1 and 3x3 convolution filters
  • number of filters in the squeeze layer must be less than the total number of filters in the expand layer (strategy 2)
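
To see how much the squeeze layer saves, the weights of a Fire module can be counted and compared with a plain 3x3 convolution that has the same input and output channels. This is a rough sketch that ignores biases; the function name fire_params, its argument names, and the filter counts (96, 16, 64, 64) are assumptions for illustration, not values given in this lesson.

```python
def fire_params(in_channels, squeeze_1x1, expand_1x1, expand_3x3):
    """Weights in one Fire module, ignoring biases.

    The squeeze layer feeds both expand branches, so each branch
    sees only `squeeze_1x1` input channels.
    """
    if squeeze_1x1 >= expand_1x1 + expand_3x3:
        raise ValueError("squeeze layer should have fewer filters than the expand layer")
    squeeze = in_channels * squeeze_1x1 * 1 * 1
    branch_1x1 = squeeze_1x1 * expand_1x1 * 1 * 1
    branch_3x3 = squeeze_1x1 * expand_3x3 * 3 * 3
    return squeeze + branch_1x1 + branch_3x3


# Hypothetical Fire module: 96 input channels, 16 squeeze filters,
# 64 + 64 expand filters, so 128 output channels.
fire = fire_params(in_channels=96, squeeze_1x1=16, expand_1x1=64, expand_3x3=64)

# Plain 3x3 convolution with the same 96 input and 128 output channels.
plain = 96 * 128 * 3 * 3

print(fire, plain)  # 11776 110592
```

With these illustrative sizes the Fire module needs roughly 9x fewer weights than the plain 3x3 layer, most of which still sit in the 3x3 expand branch.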

SqueezeNet Architecture

Layers breakdown
  • layer 1: regular convolution layer
  • layers 2-9: fire modules (squeeze + expand layers)
  • layer 10: regular convolution layer
  • layer 11: softmax layer
Architecture specifications
  • gradually increase number of filters per fire module
  • max-pooling with a stride of 2 after layers 1, 4, and 8
  • global average pooling after layer 10
  • delayed downsampling with pooling layers
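
The pooling schedule above can be traced to see how long the activation maps stay large. The sketch below is only illustrative: the 227x227 input, the 7x7 stride-2 first convolution, and the 3x3 pooling windows are assumptions that this lesson does not state, and fire modules and layer 10 are assumed to preserve spatial size.

```python
def out_size(n, kernel, stride):
    """Output width of a conv or pooling layer without padding."""
    return (n - kernel) // stride + 1


POOL_AFTER = {1, 4, 8}  # max-pooling placement taken from the notes above


def trace_sizes(input_size=227):
    """Return {layer number: activation map width at that layer's output}."""
    sizes = {}
    size = input_size
    for layer in range(1, 11):
        if layer == 1:
            # Assumed first layer: 7x7 convolution with stride 2.
            size = out_size(size, kernel=7, stride=2)
        # Layers 2-10 are assumed to keep the spatial size unchanged.
        sizes[layer] = size
        if layer in POOL_AFTER:
            # Assumed pooling window: 3x3 with stride 2.
            size = out_size(size, kernel=3, stride=2)
    return sizes


sizes = trace_sizes()
for layer, width in sizes.items():
    print(f"layer {layer}: {width}x{width}")
```

Under these assumptions the maps are still 27x27 at layer 8 and only drop to 13x13 for the last two layers, which is the delayed downsampling that Strategy 3 calls for.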

Next Lesson

Implementation of SqueezeNet

  • Implementation of Fire module
  • Implementation of full SqueezeNet model