
SqueezeNet Architecture Design

SqueezeNet

What is SqueezeNet?

a deep convolutional neural network (CNN)
uses a compressed architecture design
the model contains a relatively small number of parameters
achieves AlexNet-level accuracy on the ImageNet dataset with 50x fewer parameters

Three advantages of small CNN architectures:

require less communication across servers during distributed training.
require less bandwidth to export a new model from the cloud.
more feasible to deploy on customized hardware with limited memory.


Architectural Design Strategies

The authors outline three main strategies for reducing the number of parameters while maximizing accuracy:

Strategy 1

Make the network smaller by replacing 3x3 filters with 1x1 filters

conventional 3x3 convolution filters are replaced by 1x1 filters
a 1x1 filter has 9x fewer parameters than a 3x3 filter (1 weight per input channel instead of 9)
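The 9x figure can be checked by counting weights directly. The sketch below uses a hypothetical helper `conv_weights` and a hypothetical layer with 64 input channels and 64 filters; it ignores bias terms, so the ratio comes out exactly 9.

```python
def conv_weights(in_channels: int, num_filters: int, kernel_size: int) -> int:
    """Number of weights in a conv layer with square kernels (biases ignored)."""
    return in_channels * num_filters * kernel_size * kernel_size


# Hypothetical layer: 64 input channels, 64 filters.
w3 = conv_weights(64, 64, 3)  # 64 * 64 * 9 = 36864
w1 = conv_weights(64, 64, 1)  # 64 * 64 * 1 = 4096
print(w3, w1, w3 // w1)       # 36864 4096 9
```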

Difference between 3x3 filters and 1x1 filters

3x3 filters

larger spatial receptive field (a 3x3 neighborhood)
captures spatial information about pixels close to each other

1x1 filters

looks at one pixel at a time
captures relationships among its channels
equivalent to a fully connected layer along the channel dimension
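The "fully connected along the channel dimension" view can be made concrete with a small pure-Python sketch. It implements a naive convolution (stride 1, no padding) and shows that with 1x1 kernels the output at every pixel equals a dense layer applied to that pixel's channel vector. The helper names `conv2d_valid` and `dense` and the toy image and weights are illustrative assumptions, not part of any library.

```python
def conv2d_valid(image, kernels, bias):
    """Naive 2D convolution, stride 1, no padding.

    image:   H x W x C_in nested lists
    kernels: C_out x K x K x C_in nested lists
    bias:    C_out values
    returns: (H - K + 1) x (W - K + 1) x C_out nested lists
    """
    H, W, K = len(image), len(image[0]), len(kernels[0])
    out = []
    for i in range(H - K + 1):
        row = []
        for j in range(W - K + 1):
            pixel_out = []
            for kern, b in zip(kernels, bias):
                acc = b
                for di in range(K):
                    for dj in range(K):
                        for c, w in enumerate(kern[di][dj]):
                            acc += w * image[i + di][j + dj][c]
                pixel_out.append(acc)
            row.append(pixel_out)
        out.append(row)
    return out


def dense(vector, weights, bias):
    """Fully connected layer; weights is C_out x C_in."""
    return [sum(w * x for w, x in zip(w_row, vector)) + b
            for w_row, b in zip(weights, bias)]


# Toy 2x2 image with 3 channels and a 1x1 conv with 2 filters.
image = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [0, 1, 2]]]
weights = [[1, 0, -1], [2, 1, 0]]           # C_out x C_in
bias = [0, 1]
kernels = [[[w_row]] for w_row in weights]  # reshape to C_out x 1 x 1 x C_in

conv_out = conv2d_valid(image, kernels, bias)
dense_out = [[dense(pixel, weights, bias) for pixel in row] for row in image]
print(conv_out == dense_out)  # True
```

Integer inputs are used so the comparison is exact rather than subject to floating-point rounding.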

Strategy 2

Reduce the number of inputs for the remaining 3x3 filters.

fewer input channels to a conv layer result in fewer parameters
achieved by placing a layer of 1x1 filters before the 3x3 conv layer to reduce the number of input channels
this layer is called the squeeze layer (described in the next section)
total number of parameters in a 3x3 conv layer (ignoring biases) = (number of input channels) × (number of filters) × (3 × 3)
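Plugging numbers into the formula shows why squeezing the input first pays off. This is a minimal sketch with a hypothetical layer (128 input channels, 128 3x3 filters) and a hypothetical squeeze width of 16; biases are ignored, as in the formula above.

```python
def conv3x3_weights(in_channels: int, num_filters: int) -> int:
    # (number of input channels) x (number of filters) x (3 x 3), biases ignored
    return in_channels * num_filters * 3 * 3


def conv1x1_weights(in_channels: int, num_filters: int) -> int:
    return in_channels * num_filters


# Hypothetical layer: 128 input channels feeding 128 3x3 filters.
direct = conv3x3_weights(128, 128)  # 147456

# Same 3x3 layer, but a 1x1 squeeze layer first reduces 128 channels to 16.
squeezed = conv1x1_weights(128, 16) + conv3x3_weights(16, 128)  # 2048 + 18432

print(direct, squeezed)  # 147456 20480
```

Even after paying for the extra 1x1 layer, the squeezed version needs roughly 7x fewer weights in this example.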

Strategy 3

Downsample late in the network so that convolution layers have large activation maps.

make the most of a small number of parameters to maximize accuracy
delaying downsampling until late in the network keeps the activation/feature maps large in most layers
a departure from more traditional architectures, like the VGG network, that downsample early
large activation maps can lead to higher classification accuracy for the same number of parameters
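The effect of where downsampling happens can be illustrated by tracking map sizes through a stack of layers. This sketch assumes a hypothetical 8-layer network on a 224x224 input where every layer preserves spatial size and each downsampling step halves it; only the positions of the three downsampling steps differ, so the layers' parameter counts are the same in both runs.

```python
def activation_sizes(input_size, num_layers, downsample_after):
    """Side length of each layer's output map.

    Assumes every layer preserves spatial size ("same" padding, stride 1)
    and that a stride-2 downsampling step follows each layer listed in
    downsample_after, halving the side length.
    """
    sizes = []
    size = input_size
    for layer in range(1, num_layers + 1):
        sizes.append(size)
        if layer in downsample_after:
            size //= 2
    return sizes


early = activation_sizes(224, 8, {1, 2, 3})
late = activation_sizes(224, 8, {1, 4, 8})
print(early)  # [224, 112, 56, 28, 28, 28, 28, 28]
print(late)   # [224, 112, 112, 112, 56, 56, 56, 56]
```

With late downsampling, most layers work on 112x112 or 56x56 maps instead of 28x28 ones.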

VGG architecture with early downsampling

The two main ways to achieve downsampling:

strides > 1 in the convolutional layers
pooling layers (e.g. max/average pooling)
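Both kinds of downsampling shrink the map according to the same output-size formula, out = floor((in + 2 × padding - kernel) / stride) + 1. The sketch below applies it to a strided 7x7 convolution followed by three 3x3 max-pooling steps with stride 2; the 227x227 input and the kernel sizes are illustrative assumptions.

```python
def output_size(in_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (floor rounding)."""
    return (in_size + 2 * padding - kernel) // stride + 1


# Strided convolution: 7x7 kernel, stride 2, on a 227x227 input.
s = output_size(227, kernel=7, stride=2)   # 111
# 3x3 max-pooling with stride 2, applied three times.
p1 = output_size(s, kernel=3, stride=2)    # 55
p2 = output_size(p1, kernel=3, stride=2)   # 27
p3 = output_size(p2, kernel=3, stride=2)   # 13
print(s, p1, p2, p3)  # 111 55 27 13
```

Some frameworks can round up instead of down for pooling (PyTorch's pooling layers accept a `ceil_mode` flag, for example), which changes the result when the division is not exact.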

General Strategy

Strategies 1 and 2 are about carefully decreasing the quantity of parameters in a CNN while attempting to preserve accuracy.
Strategy 3 is about maximizing accuracy on a limited budget of parameters.

Fire Module

What is the Fire Module?

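the building block used in SqueezeNet
employs Strategies 1 and 2 (Strategy 3 is applied at the level of the overall architecture, through the placement of pooling layers)
consists of a squeeze layer, which has only 1x1 filters (strategy 1)
feeding an expand layer, which has a mix of 1x1 and 3x3 convolution filters
the number of filters in the squeeze layer is set to be less than the total number of filters in the expand layer, which limits the number of input channels to the 3x3 filters (strategy 2)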
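A Fire module's parameter count follows directly from its three filter counts. In this sketch the names `s1x1`, `e1x1`, and `e3x3` stand for the numbers of squeeze 1x1, expand 1x1, and expand 3x3 filters; the example configuration (96 input channels, 16/64/64 filters) is an illustrative assumption, and biases are included. The expand outputs are concatenated along the channel axis, so the module outputs e1x1 + e3x3 channels.

```python
def fire_params(in_channels, s1x1, e1x1, e3x3, bias=True):
    """Parameter count of a Fire module.

    squeeze: s1x1 filters of size 1x1 over in_channels
    expand:  e1x1 filters of size 1x1 and e3x3 filters of size 3x3,
             both reading the s1x1 squeeze outputs
    """
    b = 1 if bias else 0
    squeeze = in_channels * s1x1 + b * s1x1
    expand1 = s1x1 * e1x1 + b * e1x1
    expand3 = s1x1 * e3x3 * 9 + b * e3x3
    return squeeze + expand1 + expand3


def fire_out_channels(e1x1, e3x3):
    # Expand outputs are concatenated along the channel axis.
    return e1x1 + e3x3


# Illustrative Fire module: 96 input channels, s1x1=16, e1x1=e3x3=64.
fire = fire_params(96, 16, 64, 64)
# A plain 3x3 conv (with biases) producing the same 128 output channels.
out_ch = fire_out_channels(64, 64)
plain = 96 * out_ch * 9 + out_ch
print(fire, plain)  # 11920 110720
```

Keeping s1x1 (16) well below e1x1 + e3x3 (128) is what keeps the 3x3 filters cheap here.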

SqueezeNet Architecture

Layers breakdown

layer 1: regular convolution layer
layers 2-9: Fire modules (squeeze + expand layers)
layer 10: regular convolution layer (1x1 filters, one per output class)
layer 11: softmax layer
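The layer breakdown above can be turned into a rough parameter budget. The filter counts and the 7x7, 96-filter first layer below are not given on this page; they are assumptions modeled on the SqueezeNet v1.0 configuration and should be checked against the paper before reuse. Biases are included, and pooling and softmax contribute no parameters.

```python
def conv_params(in_ch, out_ch, k):
    return in_ch * out_ch * k * k + out_ch  # weights + biases


def fire_params(in_ch, s1x1, e1x1, e3x3):
    return (conv_params(in_ch, s1x1, 1)
            + conv_params(s1x1, e1x1, 1)
            + conv_params(s1x1, e3x3, 3))


# Assumed SqueezeNet v1.0-style settings; each entry is (s1x1, e1x1, e3x3)
# and a Fire module outputs e1x1 + e3x3 channels.
FIRE_CONFIG = [(16, 64, 64), (16, 64, 64), (32, 128, 128), (32, 128, 128),
               (48, 192, 192), (48, 192, 192), (64, 256, 256), (64, 256, 256)]


def squeezenet_params(num_classes=1000):
    total = conv_params(3, 96, 7)                # layer 1: 7x7 conv, 96 filters
    in_ch = 96
    for s1x1, e1x1, e3x3 in FIRE_CONFIG:         # layers 2-9: Fire modules
        total += fire_params(in_ch, s1x1, e1x1, e3x3)
        in_ch = e1x1 + e3x3
    total += conv_params(in_ch, num_classes, 1)  # layer 10: 1x1 conv
    return total                                 # pooling/softmax: no parameters


print(squeezenet_params())  # 1248424
```

About 1.25 million parameters under these assumptions, consistent with the roughly 50x reduction relative to AlexNet's roughly 60 million.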

Architecture specifications

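the number of filters per Fire module gradually increases from the beginning to the end of the network
max-pooling with a stride of 2 after layers 1, 4, and 8
average-pooling after layer 10
the relatively late placement of the pooling layers delays downsampling (strategy 3)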
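Tracing spatial sizes through the network shows how late the downsampling is. This sketch assumes a 227x227 input, a 7x7 stride-2 first convolution without padding, 3x3 stride-2 max-pooling with floor rounding, Fire modules and the 1x1 layer 10 that preserve spatial size, and a global average pool after layer 10; all of these details are assumptions, not taken from this page.

```python
def pool_out(size, kernel=3, stride=2):
    # Max-pooling output size with floor rounding and no padding.
    return (size - kernel) // stride + 1


def trace_sizes(input_size=227):
    """Return (layer name, output side length) pairs."""
    trace = []
    size = (input_size - 7) // 2 + 1      # layer 1: 7x7 conv, stride 2
    trace.append(("conv1", size))
    size = pool_out(size)                 # max-pool after layer 1
    for layer in range(2, 10):            # layers 2-9: Fire modules keep size
        trace.append((f"fire{layer}", size))
        if layer in (4, 8):               # max-pool after layers 4 and 8
            size = pool_out(size)
    trace.append(("conv10", size))        # layer 10: 1x1 conv keeps size
    trace.append(("avgpool", 1))          # assumed global average pooling
    return trace


for name, side in trace_sizes():
    print(name, side)
```

Under these assumptions, fire2 to fire4 run at 55x55, fire5 to fire8 at 27x27, and only fire9 and layer 10 at 13x13.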

Next Lesson

Implementation of SqueezeNet

Implementation of Fire module
Implementation of full SqueezeNet model

