Summary of Dilated Convolutions Paper

Paper

Key contributions

Achievements

Dilated Convolutions in semantic segmentation

The goal of semantic segmentation is to classify each pixel of the input image into one of a given set of classes. The main challenge is to combine pixel-level accuracy with multi-scale contextual information. Previous state-of-the-art models were adaptations of convolutional neural networks originally designed for image classification.

The authors developed a new convolutional architecture that systematically uses dilated convolutions for multi-scale context aggregation. The idea is motivated by the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage.
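The receptive-field claim can be checked numerically. The following is a minimal 1-D sketch in plain Python, not the authors' implementation: the helper `dilated_conv1d_same`, the 3-tap kernel of ones, and the dilation schedule 1, 2, 4, 8 are assumptions chosen for illustration. It pushes a unit impulse through four stacked dilated layers and counts how many output positions the impulse reaches, which equals the receptive field width.

```python
def dilated_conv1d_same(x, w, dilation):
    """Zero-padded dilated correlation; the output has the same length as x.

    Computes y[n] = sum_k w[k] * x[n + (k - half) * dilation], treating
    out-of-range samples as zero. (Hypothetical helper for illustration.)
    """
    half = len(w) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = n + (k - half) * dilation
            if 0 <= idx < len(x):
                acc += wk * x[idx]
        out.append(acc)
    return out


# Unit impulse in the middle of a length-41 signal.
signal = [0.0] * 41
signal[20] = 1.0
kernel = [1.0, 1.0, 1.0]  # assumed 3-tap kernel; positive weights avoid cancellation

widths = []
out = signal
for i in range(4):
    out = dilated_conv1d_same(out, kernel, dilation=2 ** i)
    # Number of positions influenced by the impulse after this layer.
    widths.append(sum(1 for v in out if v != 0.0))

print(widths)    # [3, 7, 15, 31]
print(len(out))  # 41, the resolution is unchanged
```

With a 3-tap kernel and the dilation doubling at every layer, the width after layer i+1 (counting i from 0) is 2^(i+2) - 1, so it grows exponentially with depth while the number of weights grows only linearly and the signal length stays fixed.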

Front-end prediction module

This module is an adaptation of the VGG-16 network for semantic segmentation in which the last two pooling and striding layers are removed. For each pooling layer that was removed, the convolutions in all subsequent layers were dilated by a factor of 2. The front-end prediction module was compared with FCN-8s and DeepLab (2015). The results showed that the front-end prediction module is both simpler and more accurate.
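Why dilating by 2 compensates for a removed striding layer can be seen in a small 1-D sketch. It is a simplification under stated assumptions: striding is modeled as plain subsampling (`x[::2]`) rather than max pooling, and the helper `correlate_valid`, the signal, and the filter are hypothetical. Filtering the subsampled signal gives the same values as filtering the full-resolution signal with a filter dilated by 2 and then looking only at every second output, so the dilated version keeps the original responses while also producing outputs at the positions that subsampling would have discarded.

```python
def correlate_valid(x, w, dilation=1):
    """Valid-mode dilated correlation: y[n] = sum_k w[k] * x[n + k * dilation]."""
    span = dilation * (len(w) - 1)
    return [
        sum(wk * x[n + k * dilation] for k, wk in enumerate(w))
        for n in range(len(x) - span)
    ]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
w = [1.0, -2.0, 1.0]

# Original pipeline: stride-2 subsampling, then an ordinary filter.
low_res = correlate_valid(x[::2], w)

# Modified pipeline: no subsampling, filter dilated by a factor of 2.
high_res = correlate_valid(x, w, dilation=2)

print(low_res)        # [0.0, -4.0, 6.0, -3.0]
print(high_res[::2])  # [0.0, -4.0, 6.0, -3.0]
print(len(low_res), len(high_res))  # 4 8
```

The dilated pipeline yields twice as many outputs, which is the higher-resolution prediction the front end relies on.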

Context module

This module improves performance by aggregating multi-scale contextual information. Its architecture is described in the table below.

[Table: architecture of the context module]

The table above describes two architectures, a basic and a larger context network: the basic network keeps the number of feature maps constant across layers, while the larger one uses more feature maps. Both networks process the given feature maps by aggregating contextual information at increasingly large scales.
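How the scale grows from layer to layer can be sketched with a short receptive-field calculation. The layer schedule below (seven 3x3 layers with dilations 1, 1, 2, 4, 8, 16, 1, followed by a 1x1 layer) is an assumption recalled from the paper's architecture table and is not stated in this summary, so it should be checked against the paper; the names `LAYERS` and `receptive_fields` are hypothetical.

```python
# (kernel_size, dilation) per layer. Assumed schedule, to be verified
# against the architecture table in the paper.
LAYERS = [(3, 1), (3, 1), (3, 2), (3, 4), (3, 8), (3, 16), (3, 1), (1, 1)]


def receptive_fields(layers):
    """Return the receptive field width after each stride-1 layer.

    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    size = 1
    sizes = []
    for kernel_size, dilation in layers:
        size += (kernel_size - 1) * dilation
        sizes.append(size)
    return sizes


fields = receptive_fields(LAYERS)
for i, ((k, d), rf) in enumerate(zip(LAYERS, fields), start=1):
    print(f"layer {i}: {k}x{k} kernel, dilation {d:2d}, receptive field {rf}x{rf}")

# fields == [3, 5, 9, 17, 33, 65, 67, 67]
```

Under this assumed schedule every layer runs at the input resolution, and the context seen by each output roughly doubles per dilated layer until it reaches 67x67.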

Experiments were conducted by plugging the basic and the larger context module into the front end. The resulting models were evaluated on the PASCAL VOC 2012 test set. The results show that the larger context module combined with the front end yields a significant boost in accuracy over the front end alone. The table below reports the results.

[Table: results on the PASCAL VOC 2012 test set]

Thus, the context network outperforms the DeepLab++ architecture without performing structured prediction. Combining the context network with the CRF-RNN structured prediction module improves the accuracy of the CRF-RNN system.