OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. (2014) ☻
This paper introduces several useful techniques, including offset max-pooling and a scheme for combining predictions
(the authors claim this approach works better than NMS). It also discusses multi-scale training.
R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation. (2014) ☻
Region proposal (selective search) -> Feature extraction (AlexNet) -> SVM -> Bounding-box regression.
This is a multi-stage pipeline: feature extraction, the SVMs, and bounding-box regression must each be trained separately.
Detection is slow, at about 47 s/image (the figure reported for the VGG16 variant on a GPU).
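The bounding-box regression stage can be illustrated with the usual center/size parameterization: offsets are scaled by the proposal size, and width and height are regressed in log space. This is a minimal sketch under that assumption; the function names and the sample boxes are made up for illustration and are not taken from the paper's code.

```python
import math

def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) for a proposal and its matched ground truth.

    Boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode_box(proposal, targets):
    """Apply predicted targets to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = targets
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

# Hypothetical example boxes.
proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 48.0, 30.0, 40.0)
targets = encode_box(proposal, ground_truth)
recovered = decode_box(proposal, targets)
```

Decoding the encoded targets returns the ground-truth box, which is what the regressor is trained to approximate.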
SPPnet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. (2015) ☻
This paper uses SPP to speed up classification and detection. It also introduces a multi-size training method.
SPPnet computes a convolutional feature map for the entire input image and then classifies each object proposal
using a feature vector extracted from the shared feature map. For single-scale input, the flow is:
Region proposal (selective search) -> Feature extraction for full image (AlexNet) -> SPP for every window -> FC -> SVM -> Bounding-box regression
Because the pooling window size and stride adapt to the input, this method can handle input images of different sizes. However, it is still a multi-stage pipeline.
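The fixed-length output of SPP can be sketched as follows: each pyramid level splits the feature map into n x n bins whose size depends on the input, and every bin is max-pooled. This is a single-channel, pure-Python illustration; the pyramid levels (1, 2, 4) and the toy feature maps are assumptions chosen for the example, not the paper's configuration.

```python
import math

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into a fixed-length vector.

    For each level n the map is split into n x n bins; bin edges are
    computed from the map size, so any input size gives the same length.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled

# Two hypothetical feature maps of different sizes (5x7 and 8x6).
small_map = [[r * 7 + c for c in range(7)] for r in range(5)]
large_map = [[r * 6 + c for c in range(6)] for r in range(8)]
small_vec = spp(small_map)
large_vec = spp(large_map)
```

Both vectors have 1 + 4 + 16 = 21 entries, so the same fully connected layer can follow either input.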
Fast R-CNN (2015) ☻
ION: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks (2015)
This paper presents an object detector based on Fast R-CNN that exploits information both inside and outside the region of interest. Contextual information outside
the region of interest is integrated using spatial recurrent neural networks (IRNNs). Inside the region, it uses skip pooling to extract information at multiple
scales and levels of abstraction.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. (2016) ☻
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection. (2016)
This paper is based on Faster R-CNN. It improves Faster R-CNN's performance by using "less channels with more layers", with the help of C.ReLU and
concatenation of multi-scale intermediate outputs.
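As a rough illustration of C.ReLU (concatenated ReLU), the sketch below shows the plain form: the activation is concatenated with its negation and ReLU is applied, which doubles the number of output channels without extra convolution filters. Any learned scale and shift that a particular network applies after the concatenation is omitted here, and the function name is illustrative.

```python
def crelu(values):
    """Concatenated ReLU: ReLU(x) followed by ReLU(-x), doubling the length."""
    positive = [max(v, 0.0) for v in values]
    negative = [max(-v, 0.0) for v in values]
    return positive + negative

activations = [1.5, -2.0, 3.0]
doubled = crelu(activations)
```

Each input value contributes to exactly one half of the output, so both the positive and the negative response of a filter are preserved.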
R-FCN: Object Detection via Region-based Fully Convolutional Networks. (2016) ☻
This paper improves on Faster R-CNN by constructing a set of position-sensitive score maps, using a bank of specialized convolutional layers as the FCN output. Each score map
encodes position information with respect to a relative spatial position. A position-sensitive RoI pooling layer is then appended that gathers information from these score maps for every region proposal.
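Position-sensitive RoI pooling can be sketched for a single class as follows: the RoI is divided into k x k bins, bin (i, j) is average-pooled from its own dedicated score map only, and the bin scores are then averaged as a vote. This is a simplified pure-Python sketch assuming integer RoI coordinates on the score-map grid; the function name and the toy score maps are illustrative.

```python
import math

def ps_roi_pool(score_maps, roi, k):
    """Position-sensitive RoI pooling for one class.

    score_maps[i * k + j] is the 2-D score map dedicated to bin (i, j).
    roi is (x0, y0, x1, y1) in score-map coordinates, end-exclusive.
    Returns the class score obtained by averaging the k*k bin scores.
    """
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k
    bin_scores = []
    for i in range(k):
        r0 = y0 + math.floor(i * bin_h)
        r1 = y0 + math.ceil((i + 1) * bin_h)
        for j in range(k):
            c0 = x0 + math.floor(j * bin_w)
            c1 = x0 + math.ceil((j + 1) * bin_w)
            fmap = score_maps[i * k + j]  # only this bin's own map is read
            values = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            bin_scores.append(sum(values) / len(values))
    return sum(bin_scores) / len(bin_scores)

# Toy example: k*k = 4 constant score maps with values 1, 2, 3, 4.
k = 2
score_maps = [[[float(m + 1)] * 4 for _ in range(4)] for m in range(k * k)]
score = ps_roi_pool(score_maps, (0, 0, 4, 4), k)
```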
HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection. (2016)
HyperNet is primarily based on an elaborately designed Hyper Feature, which first aggregates hierarchical feature maps and then compresses them into a uniform space.
Next, a lightweight region proposal generation network is constructed to produce about 100 proposals. Finally, these proposals are classified and adjusted by the detection module.
FPN: Feature Pyramid Networks for Object Detection. (2017) ☻
The feature pyramid is built from a bottom-up pathway, a top-down pathway, and lateral connections. The method is a generic solution for building feature
pyramids inside deep ConvNets.
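The top-down pathway with lateral connections can be sketched on single-channel maps: the coarser map is upsampled by 2 (nearest neighbour) and added element-wise to the lateral map from the bottom-up pathway. In this simplified sketch the lateral connection is treated as an identity and no smoothing convolution is applied afterwards; both are assumptions made to keep the example small, and the names are illustrative.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling of a 2-D map by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def merge(top, lateral):
    """Add the upsampled coarser map to the lateral (bottom-up) map."""
    up = upsample2x(top)
    return [[u + l for u, l in zip(up_row, lat_row)]
            for up_row, lat_row in zip(up, lateral)]

def build_pyramid(bottom_up):
    """bottom_up: maps ordered fine -> coarse, each half the size of the previous."""
    pyramid = [bottom_up[-1]]  # the coarsest level starts the top-down pathway
    for lateral in reversed(bottom_up[:-1]):
        pyramid.insert(0, merge(pyramid[0], lateral))
    return pyramid

# Toy bottom-up maps of size 4x4, 2x2 and 1x1.
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]
pyramid = build_pyramid([c3, c4, c5])
```

Every output level keeps the resolution of its bottom-up counterpart while also carrying the signal propagated down from the coarser levels.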
Non-region-based methods
YOLO: You Only Look Once: Unified, Real-Time Object Detection. (2015)
This paper frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.
A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
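To make the "single evaluation" output concrete, the sketch below computes the size of the prediction tensor for an S x S grid where each cell predicts B boxes (x, y, w, h, confidence) and C class probabilities, using the PASCAL VOC setting S=7, B=2, C=20. The helper that maps a cell-relative box centre to image coordinates is illustrative, and its name is made up for this example.

```python
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

values_per_cell = B * 5 + C          # each box has x, y, w, h, confidence
output_size = S * S * values_per_cell

def cell_box_to_image(row, col, x, y, w, h, grid=S):
    """Convert a box predicted by grid cell (row, col) to image-relative units.

    x, y are offsets inside the cell in [0, 1]; w, h are already relative
    to the whole image. Returns (center_x, center_y, w, h) in [0, 1].
    """
    return ((col + x) / grid, (row + y) / grid, w, h)

center_box = cell_box_to_image(3, 3, 0.5, 0.5, 0.2, 0.4)
```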
YOLO9000: Better, Faster, Stronger. (2016) ☻ ☻
This paper uses anchors (as in Faster R-CNN) to improve YOLO. It selects the anchors by running k-means clustering to automatically find a good number and size of anchor boxes.
This work uses k=5, whereas Faster R-CNN uses k=9. Finally, the paper proposes a hierarchical tree method to jointly train on object detection and classification.
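The anchor clustering can be sketched as k-means over box widths and heights with IoU as the similarity measure, so that large boxes do not dominate the way they would under Euclidean distance. This is a minimal sketch: the deterministic initialization from the first k boxes, the fixed iteration count, and the toy box list are assumptions made for the example.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (width, height) and sharing the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iterations=20):
    """Cluster (width, height) pairs, assigning each box to the centroid with highest IoU."""
    centroids = list(boxes[:k])  # assumption: initialize from the first k boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        centroids = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centroids[i]  # keep the old centroid if a cluster is empty
            for i, c in enumerate(clusters)
        ]
    return centroids

# Hypothetical ground-truth box sizes: three small, three large.
boxes = [(10, 12), (100, 90), (12, 10), (90, 110), (11, 11), (95, 100)]
anchors = kmeans_anchors(boxes, k=2)
```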
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving. (2016)
This paper improves YOLO in three ways. First, it uses stacked convolution filters to extract a high-dimensional, low-resolution feature map from the input image.
Then it uses ConvDet, a convolutional layer that takes the feature map as input, computes a large number of object bounding boxes, and predicts their categories.
Finally, it filters these bounding boxes to obtain the final detections.
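One common way to filter a large set of candidate boxes is greedy non-maximum suppression (NMS): keep the highest-scoring box and drop any remaining box that overlaps a kept one too much. The sketch below assumes this kind of filtering; whether it matches the paper's exact procedure is an assumption, and the threshold and sample detections are illustrative.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs; returns kept pairs, best score first."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept

# Hypothetical detections: two near-duplicates and one separate object.
detections = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),
    (0.7, (100, 100, 140, 140)),
]
final = nms(detections)
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box is kept.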
SSD: Single Shot MultiBox Detector. (2015) ☻
This paper creates feature maps of different sizes and discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales
per feature map location.
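The default-box scales can be sketched as a linear spacing between a minimum and a maximum scale across the feature maps, with each aspect ratio changing the width and height while keeping the area fixed. The values s_min=0.2 and s_max=0.9 follow the paper's stated setting; the number of feature maps and the aspect-ratio list here are illustrative, and the extra box added for aspect ratio 1 is omitted.

```python
import math

def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced scales, one per feature map (relative to image size)."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1) for k in range(num_maps)]

def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) per aspect ratio; the area stays scale**2 for every ratio."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]

scales = default_box_scales(6)
shapes = default_box_shapes(scales[0])
```

With six feature maps the scales are roughly 0.2, 0.34, 0.48, 0.62, 0.76 and 0.9, so finer maps handle small objects and coarser maps handle large ones.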
There are several papers based on this work:
DSSD: Deconvolutional Single Shot Detector. (2017) This paper combines residual blocks and a deconvolution module to improve the SSD model.
BlitzNet: A Real-Time Deep Network for Scene Understanding. (2017) This paper is almost the same as the DSSD model.
DSOD: Learning Deeply Supervised Object Detectors from Scratch. (2017) This paper uses DenseNet to improve the SSD model.
PLN: Point Linking Network for Object Detection. (2017) ☻
This paper proposes a novel object bounding box representation using points and links, implemented with deep ConvNets. There are two kinds of points in this system:
the center point of an object bounding box, denoted as O, and a corner point of an object bounding box. Detecting a point pair involves two tasks:
localizing the two points and associating them. Based on these tasks, the authors propose a point-based object detection framework.