Summary of AlexNet Paper

AlexNet is a deep convolutional neural network that learns visual features from images in order to classify them. It produced breakthrough results in the ImageNet LSVRC-2010 contest, achieving a top-1 error rate of 37.5%, well below the 47.1% achieved by the previous state-of-the-art methods. The network consists of 8 learned layers: 5 convolutional layers followed by 3 fully connected layers. The section below details its architecture.

Architecture

For ILSVRC-2010, the network was trained on roughly 1.2 million images, validated on 50,000 images, and tested on 150,000 images.

Layer 1

Before images are fed to this layer, the variable-resolution images in the training set are rescaled to a fixed size of 256 × 256, because the network expects all inputs to have the same size. This is done by first rescaling the image so that its shorter side has length 256, and then cropping out the central 256 × 256 patch.
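The rescale-then-crop arithmetic can be sketched in a few lines. This is only an illustration of the geometry described above, not the authors' preprocessing code; the function name `resize_then_center_crop_box` and the rounding choice are assumptions made for this example.

```python
def resize_then_center_crop_box(width, height, target=256):
    """Return (new_width, new_height, crop_box) for the rescale-then-crop step.

    crop_box is (left, top, right, bottom) in the rescaled image.
    Rounding to the nearest pixel is an assumption of this sketch.
    """
    # Scale so that the shorter side becomes `target` pixels long.
    scale = target / min(width, height)
    new_width = max(target, round(width * scale))
    new_height = max(target, round(height * scale))

    # Take the central target x target patch of the rescaled image.
    left = (new_width - target) // 2
    top = (new_height - target) // 2
    return new_width, new_height, (left, top, left + target, top + target)


# A 500 x 375 landscape image: the height becomes 256, the width scales along.
print(resize_then_center_crop_box(500, 375))
```

With an image library, the returned size and box would typically be passed to its resize and crop routines.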

To reduce overfitting, the network uses dropout with probability 0.5 in the first two fully connected layers, together with two label-preserving forms of data augmentation. The first extracts random 224 × 224 patches (and their horizontal reflections) from each 256 × 256 image, which increases the size of the training set by a factor of 2048, although the resulting examples are highly interdependent. The second alters the intensities of the RGB channels of the training images, since object identity is invariant to changes in the intensity and color of the illumination.
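The first augmentation can be sketched as follows. This is a minimal illustration that assumes the image is stored as a plain list of pixel rows; the helper name `random_crop_and_flip` and the 50% flip probability are choices made for this example rather than details taken from the paper.

```python
import random


def random_crop_and_flip(image, crop=224, rng=random):
    """Return a random crop x crop patch of `image`, mirrored half the time.

    `image` is a list of rows, each row a list of pixel values.
    """
    height, width = len(image), len(image[0])
    # Pick the top-left corner so the patch fits inside the image.
    top = rng.randrange(height - crop + 1)
    left = rng.randrange(width - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch


# Toy 256 x 256 image whose pixel value records its own (row, column).
image = [[(y, x) for x in range(256)] for y in range(256)]
patch = random_crop_and_flip(image)
print(len(patch), len(patch[0]))
```

Because each call draws a fresh offset and flip, the same source image yields a different training example every time it is visited.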

The neurons are Rectified Linear Units (ReLUs), whose activation function is max(0, x). The paper reports that networks with ReLUs train several times faster than equivalent networks with saturating tanh units.
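A small sketch makes the contrast with a saturating unit concrete. The helper names `relu`, `relu_grad`, and `tanh_grad` are illustrative only. The point is that the ReLU gradient stays at 1 for any positive input, while the tanh gradient shrinks toward 0 as the input grows.

```python
import math


def relu(x):
    """Rectified linear activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x == 0 in this sketch)."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


for x in (-2.0, 0.5, 2.0, 5.0):
    print(f"x={x:+.1f}  relu={relu(x):.2f}  relu'={relu_grad(x):.1f}  "
          f"tanh={math.tanh(x):+.4f}  tanh'={tanh_grad(x):.4f}")
```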

Layer 2

Layer 3

Layer 4

Layer 5

Similarly after the fifth convolutional layer

Softmax

The output of the final fully connected layer is fed to a 1000-way softmax, which produces a predicted probability for each of the 1000 classes.
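For reference, a softmax over a handful of scores can be sketched as below. The three-element input stands in for the 1000 class scores, and subtracting the maximum before exponentiating is a standard numerical-stability trick assumed here, not something the summary specifies.

```python
import math


def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The predicted class is the index with the highest probability, and the top-5 predictions are the five highest-scoring indices.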