Written by Swetha Tanamala
on August 01, 2018

Summary of FCN paper

Paper

Title: Fully Convolutional Networks for Semantic Segmentation (FCN)
Submission date: 14 Nov 2014

Achievements

The FCN models were evaluated on the following datasets, and the reported results are compared with the previous state-of-the-art methods.

PASCAL VOC 2012
- achieved the best results on mean intersection over union (IoU) by a relative margin of 20%
NYUDv2 dataset
- improved mean IoU by a relative margin of 18.8%
SIFT Flow dataset
- achieved the best results on pixel accuracy by a relative margin of 8.39%
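Both metrics above can be computed from a per-class confusion matrix. The sketch below uses the standard definitions: pixel accuracy is the fraction of correctly labelled pixels, and the IoU of a class is its true positives divided by the union of predicted and ground-truth pixels, averaged over classes to give mean IoU. A "relative margin" is (new - old) / old. It assumes NumPy, integer label maps, and 255 as an "ignore" label; the function names are hypothetical, and skipping classes with an empty union is an assumption of this sketch.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """n[i, j] = number of pixels of true class i predicted as class j.
    Pixels whose ground truth equals ignore_index are skipped (assumed void label)."""
    mask = gt != ignore_index
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(n):
    # correctly labelled pixels / all counted pixels
    return np.diag(n).sum() / n.sum()

def mean_iou(n):
    tp = np.diag(n)
    # |prediction U ground truth| = row sum + column sum - intersection
    union = n.sum(axis=1) + n.sum(axis=0) - tp
    present = union > 0  # assumption: skip classes absent from both gt and prediction
    return (tp[present] / union[present]).mean()

def relative_improvement(new, old):
    # a "relative margin of 20%" means (new - old) / old == 0.20
    return (new - old) / old

gt = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
n = confusion_matrix(gt, pred, num_classes=3)
print(pixel_accuracy(n))  # 0.8
print(mean_iou(n))        # about 0.722, the mean of 1/2, 2/3 and 1
```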

Key Contributions

This paper introduced the idea of converting classification networks into fully convolutional networks that produce coarse output maps. These coarse outputs are then upsampled within the network to dense, pixel-wise predictions.
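To see why a classifier can be "convolutionalized", note that a fully connected layer reading a C x k x k feature map performs the same computation as a k x k convolution whose kernels are the same weights reshaped. The NumPy sketch below (toy sizes, hypothetical helper name `conv2d_valid`) checks this on a k x k input, then shows that on a larger input the convolutional version produces a coarse spatial map of outputs instead of a single vector.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Naive 'valid' convolution (cross-correlation), stride 1.
    x: (C, H, W) input, w: (K, C, kh, kw) kernels, b: (K,) biases
    returns a (K, H - kh + 1, W - kw + 1) output map"""
    _, H, W = x.shape
    K, _, kh, kw = w.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            # sum over channels and the kernel window for every output unit
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

rng = np.random.default_rng(0)
C, k, num_out = 4, 3, 5                      # toy sizes
W_fc = rng.standard_normal((num_out, C * k * k))
b_fc = rng.standard_normal(num_out)
W_conv = W_fc.reshape(num_out, C, k, k)      # same weights, viewed as k x k kernels

# On a k x k input the two layers compute the same numbers
x_small = rng.standard_normal((C, k, k))
fc_out = W_fc @ x_small.reshape(-1) + b_fc
conv_out = conv2d_valid(x_small, W_conv, b_fc)
print(np.allclose(fc_out, conv_out[:, 0, 0]))   # True

# The convolutional version also accepts larger inputs and returns a spatial map
x_large = rng.standard_normal((C, 8, 10))
print(conv2d_valid(x_large, W_conv, b_fc).shape)  # (5, 6, 8)
```

For reference, VGG16's first fully connected layer reads a 512 x 7 x 7 feature map, so in this view it becomes a 7 x 7 convolution.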

The ideas the authors used to improve semantic segmentation are as follows:

- Adapting classifiers for dense prediction
- Upsampling using deconvolution
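A deconvolution (transposed convolution) layer can be sketched as follows: each coarse pixel scatters a scaled copy of a kernel into the output, with copies placed `stride` pixels apart. A common choice is to initialize that kernel to bilinear weights, so the layer starts out as bilinear interpolation and can then be refined during training. This single-channel NumPy version, the kernel size of 2 x factor, the border crop and the helper names are assumptions for illustration, not the paper's exact layer configuration.

```python
import numpy as np

def bilinear_kernel(size):
    """2-D bilinear interpolation weights for a size x size deconvolution filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    r = np.arange(size)
    w1d = 1 - np.abs(r - center) / factor
    return np.outer(w1d, w1d)

def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution: every input pixel adds a scaled
    copy of the kernel to the output, copies placed `stride` pixels apart."""
    H, W = x.shape
    k = kernel.shape[0]
    out = np.zeros(((H - 1) * stride + k, (W - 1) * stride + k))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def upsample(x, factor):
    """Upsample by an even integer factor with a bilinear-initialized deconvolution
    (kernel size 2 * factor), then crop so the output is (factor*H, factor*W)."""
    assert factor % 2 == 0, "this sketch only handles even factors"
    y = transposed_conv2d(x, bilinear_kernel(2 * factor), factor)
    c = factor // 2
    return y[c:c + factor * x.shape[0], c:c + factor * x.shape[1]]

fine = upsample(np.ones((4, 4)), 2)
print(fine.shape)  # (8, 8)

# each row is the ramp 0, 1, 2, 3; away from the borders the output is a finer ramp
ramp = upsample(np.tile(np.arange(4.0), (4, 1)), 2)
print(ramp[3, 1:-1])  # [0.25 0.75 1.25 1.75 2.25 2.75]
```

Away from the image border a constant input stays constant and a linear ramp is linearly interpolated, which is the behaviour expected from bilinear upsampling.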

Architecture

This paper cast ImageNet classifiers into fully convolutional networks and augmented them with in-network upsampling and a pixel-wise loss. The representations learned by the ImageNet classifiers were then fine-tuned for semantic segmentation.
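A pixel-wise loss applies an ordinary classification loss at every output location. The sketch below computes a softmax cross-entropy per pixel and averages it over labelled pixels; averaging (rather than summing) is chosen here for convenience and may differ from the paper's normalization. It assumes NumPy, a (K, H, W) score map, and 255 as the "ignore" label (the PASCAL VOC convention for void/boundary pixels); `pixelwise_cross_entropy` is a hypothetical name.

```python
import numpy as np

IGNORE = 255  # assumption: void/boundary pixels carry label 255, as in PASCAL VOC

def pixelwise_cross_entropy(scores, labels, ignore_index=IGNORE):
    """Softmax cross-entropy at every pixel, averaged over non-ignored pixels.
    scores: (K, H, W) unnormalized class scores
    labels: (H, W) integer class ids in [0, K), or ignore_index
    """
    # numerically stable log-softmax over the class axis
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    ys, xs = np.nonzero(labels != ignore_index)
    if ys.size == 0:
        return 0.0
    # log-probability of the correct class at each labelled pixel
    picked = log_probs[labels[ys, xs], ys, xs]
    return float(-picked.mean())

scores = np.zeros((3, 2, 2))              # uniform scores over 3 classes
labels = np.array([[0, 1], [IGNORE, 2]])
print(pixelwise_cross_entropy(scores, labels))  # log(3), about 1.0986
```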


The authors adapted and extended the best ImageNet classification networks of the time (AlexNet, VGG16 and GoogLeNet). They compared the networks by inference time and by mean IoU on the PASCAL VOC validation set.


The 32-pixel stride at the final prediction layer (of the unmodified classification networks) limits the scale of detail in the upsampled output. The authors addressed this by adding skip connections that fuse coarse, semantic information from deep layers with fine, location information from shallow layers. See the architecture below: the FCN-32, FCN-16 and FCN-8 architectures have strides of 32, 16 and 8, respectively.

[Figure: FCN-32, FCN-16 and FCN-8 architectures]
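The skip fusion can be sketched as a few array operations. In VGG16 the 32-pixel stride comes from five 2x pooling stages (2^5 = 32). FCN-16 upsamples the stride-32 class scores by 2x and adds scores predicted from pool4 (stride 16); FCN-8 repeats the step with pool3 (stride 8); the result is then upsampled to input resolution. This NumPy sketch assumes input sides divisible by 32 (so no cropping is needed) and uses nearest-neighbour upsampling in place of the learned deconvolution layers; `fcn_fuse` and `upsample_nearest` are hypothetical names.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (K, H, W) score map by an integer factor.
    Stands in for FCN's deconvolution layers to keep the sketch short."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fcn_fuse(score32, score16=None, score8=None):
    """Combine class-score maps in the FCN-32 / FCN-16 / FCN-8 pattern.
    score32: (K, H/32, W/32) scores from the last layer
    score16: (K, H/16, W/16) scores predicted from pool4 (FCN-16, FCN-8)
    score8:  (K, H/8,  W/8)  scores predicted from pool3 (FCN-8 only)
    Returns (K, H, W) scores at input resolution."""
    if score8 is not None and score16 is None:
        raise ValueError("FCN-8 fusion also needs the stride-16 scores")
    fused, stride = score32, 32
    if score16 is not None:
        # upsample the coarse scores 2x and add the finer pool4 predictions
        fused, stride = upsample_nearest(fused, 2) + score16, 16
    if score8 is not None:
        fused, stride = upsample_nearest(fused, 2) + score8, 8
    # one final upsampling step back to the input size
    return upsample_nearest(fused, stride)

K, H, W = 3, 64, 96  # toy sizes; H and W divisible by 32
s32 = np.ones((K, H // 32, W // 32))
s16 = np.ones((K, H // 16, W // 16))
s8 = np.ones((K, H // 8, W // 8))
for out in (fcn_fuse(s32), fcn_fuse(s32, s16), fcn_fuse(s32, s16, s8)):
    print(out.shape)  # (3, 64, 96) each time
```

All three variants return full-resolution scores; the difference is the stride of the last map that is upsampled (32, 16 or 8), which is why FCN-8 can recover finer detail than FCN-32.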

Conclusion

Extending classification networks to semantic segmentation and augmenting these architectures with multi-resolution layer combinations dramatically improved on the then state-of-the-art results.
