CNN
1. Why CNN for Image?
- Too many parameters in a fully connected network; we can reduce them via local connections, parameter sharing, and convolution
- Some patterns are much smaller than the whole image
- A neuron does not need to see the whole image, only a small region of it
- The same patterns appear in different regions.
- Parameters can be shared across locations for patterns that recur
- Subsampling the pixels will not change the object
- Subsampling does not change the semantics of the image
- We can subsample the pixels to make image smaller
- Fewer parameters for the network to process the image
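A quick back-of-the-envelope comparison makes the parameter savings concrete (the layer sizes here are hypothetical, chosen only for illustration):

```python
# Hypothetical sizes: a 100x100 grayscale image feeding 100 hidden units,
# versus a single shared 3x3 convolution filter.
H, W = 100, 100
fc_params = (H * W) * 100  # fully connected: every pixel connects to every neuron
conv_params = 3 * 3        # one 3x3 filter, shared across all image locations
```

The fully connected layer needs a million weights; the shared filter needs nine.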
1.2 The whole CNN
- Convolution is a weighted sum over a local region; since the same-size filter is applied everywhere, its parameters are shared
- Max pooling amounts to subsampling
1.3 CNN Convolution
- Since the filter for each output channel learns one pattern and is applied across the whole image, this amounts to parameter sharing
- Do the same process for every filter
- CNN Zero Padding
- CNN Colorful image
- Convolution v.s. Fully Connected
- Greatly reduces the number of parameters
- Producing multiple feature maps adds more nonlinear transforms, strengthening the network's representational capacity
- CNN Max Pooling
- Adds nonlinearity and reduces the number of parameters
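A minimal pure-Python sketch of the two operations just described, a valid convolution with one shared filter followed by 2x2 max pooling (the toy image and filter are made up):

```python
def conv2d(img, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (subsampling); assumes even dimensions."""
    return [[max(fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                 fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
             for j in range(len(fmap[0]) // 2)] for i in range(len(fmap) // 2)]

img = [[1] * 6 for _ in range(6)]                 # toy 6x6 image
kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # toy 3x3 diagonal filter
fmap = conv2d(img, kernel)                        # 4x4 feature map
small = max_pool2x2(fmap)                         # 2x2 map after subsampling
```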
Flatten
Convolutional Neural Network
What does CNN learn?
What is the essential difficulty?
- Deep learning bridges the semantic gap: it extracts high-level semantic patterns that are robust to illumination, rotation, and similar variations
What can CNN do for computer vision?
Before deep learning was born
Feature extraction example #1
Feature name: Local Binary Pattern (LBP)
Use the center pixel value to threshold the 3x3 neighborhood
The result is a binary number
The histogram of these labels is used as a texture descriptor
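The three LBP steps above can be sketched in a few lines (the clockwise neighbor ordering used here is one common convention; others exist):

```python
from collections import Counter

def lbp_code(patch):
    """LBP of a 3x3 patch: threshold the 8 neighbors by the center pixel,
    read the bits clockwise from the top-left, and return the binary
    number as an integer label in [0, 255]."""
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[i][j] >= c else 0 for i, j in order]
    return sum(b << k for k, b in enumerate(reversed(bits)))

def lbp_histogram(img):
    """Texture descriptor: histogram of LBP labels over all interior pixels."""
    h, w = len(img), len(img[0])
    codes = [lbp_code([row[j - 1:j + 2] for row in img[i - 1:i + 2]])
             for i in range(1, h - 1) for j in range(1, w - 1)]
    return Counter(codes)
```

A flat patch gives all-ones bits (label 255); a bright center surrounded by darker pixels gives label 0.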
Feature extraction example #2
Feature name: Scale invariant feature transform (SIFT)
Divide the 16x16 window into a 4x4 grid of cells
Compute an orientation histogram for each cell
16 cells x 8 orientations = 128 dimensional descriptor
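A toy version of the descriptor construction described above; real SIFT additionally applies Gaussian weighting, trilinear interpolation, and normalization, all omitted here:

```python
import math

def sift_like_descriptor(gx, gy):
    """Toy SIFT-style descriptor for a 16x16 window of image gradients
    (gx, gy): a 4x4 grid of 4x4-pixel cells, 8 orientation bins per cell,
    giving 16 cells x 8 orientations = a 128-dimensional vector."""
    desc = []
    for cy in range(4):
        for cx in range(4):
            hist = [0.0] * 8
            for i in range(4):
                for j in range(4):
                    y, x = 4 * cy + i, 4 * cx + j
                    mag = math.hypot(gx[y][x], gy[y][x])
                    ang = math.atan2(gy[y][x], gx[y][x]) % (2 * math.pi)
                    hist[int(ang / (2 * math.pi) * 8) % 8] += mag
            desc.extend(hist)
    return desc
```

For a window whose gradients all point right with unit magnitude, each cell puts its 16 votes into bin 0.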
What’s wrong with traditional features?
Image classification with deep learning
Four typical image classification nets
- AlexNet
Characters of AlexNet
Trained on two GPUs
Data augmentation
Clipping / flipping / …
Using ReLU rather than the sigmoid function
Overlapped pooling
Dropout in fully connected layers
- VGG
- Q: Why use smaller filters? (3x3 conv)
- A large filter can be decomposed into stacked small ones: the network gets deeper, keeps the same receptive field with fewer parameters, and runs faster
- Two stacked 3x3 convs have the same receptive field as one 5x5; three are equivalent to one 7x7 (receptive field = 2 x 3 + 1 = 7)
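With stride 1, each extra k x k conv layer grows the receptive field by k - 1, so the equivalences above can be checked directly:

```python
def receptive_field(n_layers, k=3):
    """Receptive field of n stacked k x k convs with stride 1: 1 + n * (k - 1)."""
    return 1 + n_layers * (k - 1)

# Weight counts for C input/output channels (biases ignored):
# two 3x3 layers:   2 * 9 * C*C = 18*C^2   vs. one 5x5 layer: 25*C^2
# three 3x3 layers: 3 * 9 * C*C = 27*C^2   vs. one 7x7 layer: 49*C^2
```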
- GoogLeNet
- Richer multi-scale information, preventing loss of information
- More patterns learned at each layer
- Why not going much deeper?
- ResNet
- The learned mapping becomes the residual: the branch fits F(x) = H(x) - x rather than the target H(x) directly
- Bottleneck residual block: reduces the amount of computation
- Benefits: keeps forward information propagation smooth and keeps backward gradient flow stable
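A minimal sketch of the residual connection: the branch learns only F(x), and the identity shortcut guarantees that information (and gradients) can always pass through unchanged.

```python
def residual_block(x, branch):
    """y = F(x) + x, elementwise, with an identity shortcut."""
    return [xi + fi for xi, fi in zip(x, branch(x))]

# If the branch outputs zeros, the block degenerates to the identity mapping,
# so adding depth cannot, in principle, make the representation worse:
y = residual_block([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
```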
From classification to segmentation
- Converting the segmentation problem to classification
- Cut a small patch (window) around each pixel and classify it with convolutions
- Enumerate all sliding windows and run convolutional classification on each
- This leads to an explosion in parameters and computation
Downsampling and Upsampling
First downsample so that there are fewer parameters, then upsample to recover the per-pixel classification result
Review: Unpooling
- Remember the positions of the maxima during pooling; during unpooling, put each value back at its remembered max position and set everything else to zero
- Review: Deconvolution
- Simply pad with zeros, then convolve
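The unpooling step above can be sketched as a pooling operator that records argmax positions plus an inverse that scatters values back:

```python
def max_pool_with_indices(fmap):
    """2x2 max pooling (stride 2) that also records where each max came from.
    Assumes even height and width."""
    h, w = len(fmap), len(fmap[0])
    pooled, idx = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            val, pos = max((fmap[i + di][j + dj], (i + di, j + dj))
                           for di in range(2) for dj in range(2))
            prow.append(val)
            irow.append(pos)
        pooled.append(prow)
        idx.append(irow)
    return pooled, idx

def unpool(pooled, idx, h, w):
    """Unpooling: put each value back at its remembered max position;
    every other entry stays zero."""
    out = [[0] * w for _ in range(h)]
    for prow, irow in zip(pooled, idx):
        for val, (i, j) in zip(prow, irow):
            out[i][j] = val
    return out
```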
Object detection
- Predict bounding boxes, class labels, and confidence scores
- For each detection, determine whether it is true or false
Basic idea to detection: Sliding windows
- Design decision criteria to keep only the boxes with the highest confidence
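One standard such criterion is non-maximum suppression (NMS): greedily keep the highest-scoring box and drop any remaining box that overlaps it too much. A minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box; suppress others with IoU above thresh."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```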
R-CNN: Region proposals + CNN features
- Region-based Convolutional Neural Network (R-CNN)
- Fast R-CNN
RoI pooling goal
“Crop and resample” a fixed size feature representing a region of interest out of the feature map
Use nearest-neighbor interpolation of coordinates, then max pooling
Map each candidate box from the original image onto the feature map
- For each RoI, the network predicts probabilities for c+1 classes (including background) and four bounding-box offsets for each of the c classes
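A sketch of RoI pooling under the description above, with the RoI coordinates already mapped onto the feature map and rounded to integers:

```python
def roi_pool(fmap, roi, out_size=2):
    """'Crop and resample': max-pool the region (x0, y0, x1, y1) of a 2-D
    feature map into a fixed out_size x out_size grid. Assumes the RoI is
    at least out_size pixels in each dimension."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            ys = range(y0 + i * h // out_size, y0 + (i + 1) * h // out_size)
            xs = range(x0 + j * w // out_size, x0 + (j + 1) * w // out_size)
            row.append(max(fmap[y][x] for y in ys for x in xs))
        out.append(row)
    return out
```

Whatever the RoI size, the output is always out_size x out_size, so a fully connected head can follow.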
Fast R-CNN training
Bounding box regression
Faster R-CNN
- Slide a small window (3x3) over the conv5 layer
- Predict object/no object
- Regress bounding box coordinates with reference to anchors (3 scales x 3 aspect ratios)
- Initially every candidate box is adjusted, but in the end the loss is computed only on a subset of the boxes
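Anchor generation at one sliding-window position might look like this; the base size and exact scales are illustrative, roughly following the 3 scales x 3 aspect ratios scheme above:

```python
import math

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """3 scales x 3 aspect ratios = 9 anchors centered at (0, 0), each as
    (x0, y0, x1, y1); every anchor keeps area = (base * scale)^2."""
    anchors = []
    for s in scales:
        area = float(base * s) ** 2
        for r in ratios:  # r = height / width
            w = math.sqrt(area / r)
            h = w * r
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors
```

The RPN then regresses offsets relative to each of these 9 reference boxes.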
YOLO
Streamlined detection architectures
- The Faster R-CNN pipeline separates proposal generation and region classification:
- Is it possible to do detection in one shot?
- Idea: No bounding box proposals. Predict a class and a box for every location in a grid.
- Divide the image into 7x7 cells.
- Each cell trains a detector.
- The detector needs to predict the object’s class distributions.
The detector also predicts bounding boxes and confidence scores.
Objective function
- Square roots of width and height are regressed so that size errors on small boxes have a larger influence on the loss
- Localization accuracy suffers compared to Fast(er) R-CNN due to coarser features and errors on small boxes
- 7x speedup over Faster R-CNN (45-155 FPS vs. 7-18 FPS)
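To weight small boxes more heavily, YOLO regresses sqrt(w) and sqrt(h) rather than w and h directly; a quick numerical check shows the effect (box sizes here are made up):

```python
import math

def size_term(pred_wh, true_wh):
    """YOLO-style size loss term: squared error on sqrt(w) and sqrt(h)."""
    return sum((math.sqrt(p) - math.sqrt(t)) ** 2
               for p, t in zip(pred_wh, true_wh))

# The same +5 pixel error hurts more on a 10-pixel box than on a 100-pixel box:
small_box_loss = size_term([15, 15], [10, 10])
big_box_loss = size_term([105, 105], [100, 100])
```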