Summary of FCN paper

Paper

Achievements

The FCN models are evaluated on the following datasets, and the reported results are compared against the previous state-of-the-art methods.

Key Contributions

This paper introduced the idea of converting classification networks into fully convolutional networks that produce coarse output maps. These coarse outputs are then upsampled back to dense, pixelwise predictions.

The ideas the authors used to improve semantic segmentation are as follows:

  1. Adapting classifiers for dense prediction
  2. Upsampling via deconvolution
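The first step, adapting a classifier for dense prediction, rests on the observation that a fully connected layer is equivalent to a convolution whose kernel covers its entire input region. A minimal numpy sketch of that equivalence (with made-up channel and kernel sizes, and a naive loop-based convolution written for clarity, not speed):

```python
import numpy as np

def conv2d_valid(x, w):
    # x: (C, H, W) input, w: (O, C, k, k) filters -> (O, H-k+1, W-k+1)
    O, C, k, _ = w.shape
    oh, ow = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((O, oh, ow))
    for o in range(O):
        for i in range(oh):
            for j in range(ow):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
    return out

rng = np.random.default_rng(0)
C, k, O = 4, 7, 10                            # hypothetical sizes
fc_w = rng.normal(size=(O, C * k * k))        # fully connected weights
conv_w = fc_w.reshape(O, C, k, k)             # same weights, viewed as a k x k conv

x_small = rng.normal(size=(C, k, k))          # classifier-sized input
x_large = rng.normal(size=(C, k + 4, k + 4))  # larger image

# On a classifier-sized input, both views produce the same single score vector.
fc_out = fc_w @ x_small.reshape(-1)
conv_out = conv2d_valid(x_small, conv_w)
assert np.allclose(fc_out, conv_out[:, 0, 0])

# On a larger input, the convolutional view "slides" the classifier and
# yields a spatial grid of class scores instead of a single prediction.
dense_out = conv2d_valid(x_large, conv_w)
print(dense_out.shape)  # (10, 5, 5)
```

This is why the converted network produces a coarse score map: each spatial cell of `dense_out` is the original classifier applied to one receptive field of the larger image.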

Architecture

This paper casts ImageNet classifiers into fully convolutional networks and augments them with in-network upsampling and a pixelwise loss. The representations learned by the ImageNet classifiers are then fine-tuned for semantic segmentation.

[Figure: transforming a classifier into a fully convolutional network]
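The pixelwise loss mentioned above is just the usual classification loss applied independently at every output location. A toy numpy version of per-pixel softmax cross-entropy (shapes and class count are illustrative; 21 matches PASCAL VOC's 20 classes plus background):

```python
import numpy as np

def pixelwise_cross_entropy(scores, labels):
    # scores: (C, H, W) class scores, labels: (H, W) integer class labels
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=0, keepdims=True))
    H, W = labels.shape
    # negative log-likelihood of the true class at each pixel, averaged
    return -log_probs[labels,
                      np.arange(H)[:, None],
                      np.arange(W)[None, :]].mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=(21, 8, 8))          # coarse score map
labels = rng.integers(0, 21, size=(8, 8))     # ground-truth label map
loss = pixelwise_cross_entropy(scores, labels)
print(round(loss, 3))
```

Because the loss is a sum over per-pixel terms, training the converted network is ordinary stochastic gradient descent, just with many classification problems per image.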

The authors adapted and extended the then-best classification networks on ImageNet (AlexNet, VGG-16 and GoogLeNet). They compared the networks by inference time and mean IoU on the PASCAL VOC validation set.

[Table: adapted classifiers compared by mean IoU and inference time]
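Mean IoU (intersection over union, averaged across classes) is the main metric here. A simplified per-image sketch in numpy (the paper accumulates the statistics over the whole dataset rather than per image, but the per-class computation is the same):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # pred, gt: (H, W) integer label maps
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1],
                 [1, 1, 2],
                 [2, 2, 2]])
gt   = np.array([[0, 0, 1],
                 [1, 1, 1],
                 [2, 2, 2]])
print(mean_iou(pred, gt, num_classes=3))  # ≈ 0.833
```

Because every class contributes equally regardless of how many pixels it covers, mean IoU penalizes models that ignore small or rare classes, unlike plain pixel accuracy.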

The 32-pixel stride at the final prediction layer (of the unmodified classification networks) limits the scale of detail in the upsampled output. The authors addressed this issue by adding skip connections between layers to fuse coarse semantic information with fine location information. See the architecture below: the FCN-32s, FCN-16s and FCN-8s architectures have output strides of 32, 16 and 8 respectively.

[Figure: FCN-32s, FCN-16s and FCN-8s architectures with skip connections]
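The skip fusion itself is a simple shape exercise: upsample the coarser score map by 2x and add the score map predicted from an earlier layer. A numpy sketch of the FCN-16s-style fusion, with hypothetical spatial sizes, using nearest-neighbor upsampling for brevity (the paper instead uses learnable deconvolutions initialized to bilinear interpolation):

```python
import numpy as np

def upsample2x(x):
    # x: (C, H, W) -> (C, 2H, 2W); nearest-neighbor stand-in for the
    # paper's learned bilinear deconvolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
C = 21
coarse = rng.normal(size=(C, 7, 7))           # stride-32 scores (FCN-32s level)
pool4_scores = rng.normal(size=(C, 14, 14))   # 1x1-conv scores from an earlier,
                                              # stride-16 layer (e.g. pool4)

# FCN-16s-style fusion: upsample the stride-32 scores 2x, add the
# stride-16 predictions; a final upsampling would restore image resolution.
fused = upsample2x(coarse) + pool4_scores
print(fused.shape)  # (21, 14, 14)
```

FCN-8s repeats the same pattern once more with a stride-8 layer, which is why each variant halves the effective output stride of the previous one.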

Conclusion

Extending classification networks to semantic segmentation, and modifying these architectures with multi-resolution layer combinations, dramatically improved on the then state-of-the-art results.