CNN

1.1 Why CNN for Images?

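  • A fully connected network needs too many parameters for images; we can cut them down by reducing each neuron's connections, sharing parameters, and using convolution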

image-20211203101411725

  • Some patterns are much smaller than the whole image
    • A neuron does not need to see the whole image, only a small region of it

image-20211203101707872

  • The same patterns appear in different regions.
    • Neurons that detect the same recurring pattern can share parameters

image-20211203102046106

  • Subsampling the pixels will not change the object
    • Subsampling does not change the semantics of the image

image-20211203102127222

  • We can subsample the pixels to make image smaller
    • Fewer parameters for the network to process the image

1.2 The whole CNN

image-20211203102209409

image-20211203102220746

image-20211203102250570

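  • Convolution is a weighted sum over a local region; because the same kernel is applied at every position, the parameters are shared
  • Max pooling acts as subsampling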
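A minimal sketch of the first point, in pure Python: the function below slides one kernel over an image and takes a weighted sum at each position (strictly a cross-correlation, which is what deep learning libraries usually call convolution). The names `conv2d`, `image`, and `diag` and the toy values are illustrative assumptions, not taken from the lecture.

```python
def conv2d(image, kernel, stride=1):
    """'Valid' cross-correlation of a 2-D list `image` with a 2-D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over one local kh x kw region. The same kernel
            # weights are reused at every (i, j): this is parameter sharing.
            s = 0
            for a in range(kh):
                for b in range(kw):
                    s += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


image = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
diag = [[1, -1], [-1, 1]]  # hypothetical 2x2 filter that responds to a diagonal
feature_map = conv2d(image, diag)
# feature_map == [[2, 0, -2], [0, 0, 0], [-2, 0, 2]]
```

The filter has only four weights no matter how large the image is, and the large responses at the corners show where its pattern occurs.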

1.3 CNN Convolution

image-20211203102539835

  • The kernel (filter) of each output channel learns one pattern and is applied at every position, which is exactly parameter sharing

image-20211203102754826

  • Do the same process for every filter

image-20211203103045245

  • CNN Zero Padding

image-20211203103111181

  • CNN Colorful image

image-20211203103130349

  • Convolution vs. Fully Connected
    • Far fewer parameters
    • Producing multiple feature maps adds more nonlinear transformations and strengthens the network's representational power
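To put rough numbers on the parameter saving, here is a small counting sketch. The image size (100x100x3), the 1000 hidden units, and the 64 filters of size 3x3 are assumed values chosen only for illustration.

```python
def fc_params(in_h, in_w, in_c, out_units):
    """Weights + biases of a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(k, in_c, out_c):
    """Weights + biases of a conv layer with out_c filters of size k x k x in_c."""
    return k * k * in_c * out_c + out_c


fc = fc_params(100, 100, 3, 1000)   # 30,001,000
conv = conv_params(3, 3, 64)        # 1,792
```

The convolutional count does not depend on the image size at all, because every position reuses the same filters.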

image-20211203103159154

  • CNN Max Pooling

image-20211203103519246

  • Adds nonlinearity and shrinks the feature maps, which reduces the parameters needed by later layers
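A minimal max pooling sketch, assuming non-overlapping 2x2 windows and a feature map whose sides are multiples of the window size. The function name and the toy values are illustrative.

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))  # keep only the strongest response
        out.append(row)
    return out


fmap = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]
pooled = max_pool2d(fmap)
# pooled == [[3, 0], [3, 1]]
```

Pooling itself has no weights; the saving comes from the 4x4 map becoming 2x2 before it reaches the next layer.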

image-20211203104206311

image-20211203104309324

image-20211203104339104

Flatten

image-20211203104453704

Convolutional Neural Network

image-20211203104522777

What does CNN learn?

image-20211203104609839

What is the essential difficulty?

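  • Deep learning bridges the semantic gap: it extracts high-level semantic patterns that are robust to lighting, rotation, and similar variations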

image-20211203104951828

What can CNN do for computer vision?

Before deep learning was born

image-20211203105051720

Feature extraction example #1

  • Feature name: Local Binary Pattern (LBP)

  • Use center pixel value to threshold the 3x3 neighborhood

  • The result is an 8-bit binary number

  • Histogram of the labels is used as a texture descriptor
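The four steps above can be sketched in a few lines. The clockwise neighbor order starting at the top-left corner and the `>=` comparison are assumptions; LBP variants differ in both conventions.

```python
def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch (list of 3 lists)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner (assumed order).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in order:
        bit = 1 if patch[r][c] >= center else 0  # threshold by the center value
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            patch = [row[j - 1:j + 2] for row in image[i - 1:i + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)  # bits 1,0,0,0,1,1,1,1 -> 0b10001111 == 143
```

The histogram returned by `lbp_histogram` is the texture descriptor; it discards where each code occurred and keeps only how often.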

image-20211203105257758

Feature extraction example #2

  • Feature name: Scale invariant feature transform (SIFT)

  • Divide the 16x16 window into a 4x4 grid of cells

  • Compute an orientation histogram for each cell

  • 16 cells x 8 orientations = 128 dimensional descriptor
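A heavily simplified sketch of the descriptor layout described above. It only shows how 16 cells times 8 orientation bins give 128 numbers; it is not full SIFT, since it omits keypoint detection, Gaussian weighting, interpolation between bins, and rotation normalization. The function name and the central-difference gradient are assumptions.

```python
import math


def sift_like_descriptor(window):
    """window: 16x16 list of gray values. Returns a 128-dim list."""
    desc = [0.0] * 128
    # Border pixels are skipped because central differences need neighbors.
    for y in range(1, 15):
        for x in range(1, 15):
            dx = window[y][x + 1] - window[y][x - 1]
            dy = window[y + 1][x] - window[y - 1][x]
            mag = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * 8) % 8   # one of 8 orientation bins
            cell = (y // 4) * 4 + (x // 4)           # one of 4x4 = 16 cells
            desc[cell * 8 + o] += mag                # magnitude-weighted vote
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# A horizontal intensity ramp: every gradient points in the +x direction.
ramp = [[x for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
```

For the ramp, all of the energy lands in orientation bin 0 of each cell, which is what an orientation histogram should report for a uniform gradient.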

image-20211203105351055

What’s wrong with traditional features?

image-20211203105436988

image-20211203105443905

Image classification with deep learning

image-20211203105551122

Four typical image classification nets

image-20211203105720465

Image classification with deep learning

  • AlexNet

image-20211203105904474

image-20211203105927967

image-20211203105934868

Characteristics of AlexNet

  • Trained on two GPUs

  • Data augmentation
    • Cropping / flipping / …

  • Using ReLU rather than the sigmoid function

  • Overlapping pooling

  • Dropout in the fully connected layers

  • VGG

image-20211203110427542

  • Q: Why use smaller filters? (3x3 conv)

image-20211203110921407

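  • A large kernel can be decomposed into a stack of small kernels: the network gets deeper, reaches a large receptive field with fewer parameters, and runs faster
  • Two stacked 3x3 convs have the receptive field of one 5x5, and three have that of one 7x7 (3*2 + 1 = 7)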
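The arithmetic behind this answer can be checked directly. The sketch assumes stride-1 convolutions and the same number of channels `C` in and out; `C = 64` is an arbitrary illustrative value.

```python
def stacked_receptive_field(num_layers, k=3):
    """Receptive field of num_layers stacked k x k convs with stride 1."""
    return num_layers * (k - 1) + 1


def conv_weights(k, channels):
    """Weights (no bias) of one k x k conv with `channels` inputs and outputs."""
    return k * k * channels * channels


C = 64  # assumed channel count, for illustration only

rf_two = stacked_receptive_field(2)     # 5
rf_three = stacked_receptive_field(3)   # 7

two_3x3 = 2 * conv_weights(3, C)     # 18 * C^2 = 73,728
one_5x5 = conv_weights(5, C)         # 25 * C^2 = 102,400
three_3x3 = 3 * conv_weights(3, C)   # 27 * C^2 = 110,592
one_7x7 = conv_weights(7, C)         # 49 * C^2 = 200,704
```

The stacked version also places a nonlinearity after every 3x3 layer, so it is both cheaper and more expressive than the single large kernel.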

  • GoogLeNet

image-20211203111115747

  • Richer multi-scale information, which helps avoid losing information
  • Increases the number of patterns each layer can learn

image-20211203111307114

image-20211203111331196

  • Why not go much deeper?

image-20211203111552795

image-20211203111443312

  • ResNet
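    • The function to learn becomes a residual: instead of fitting the desired mapping H(x) directly, the block fits F(x) = H(x) - x and outputs F(x) + x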

image-20211203111718478

image-20211203111926728

  • Bottleneck residual block: reduces the amount of computation
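A rough weight count shows where the saving comes from. The channel sizes below (256 channels outside the block, 64 inside the bottleneck) are assumed for illustration; the figure in the notes may use different numbers.

```python
def conv_weights(k, c_in, c_out):
    """Weights (no bias) of one k x k convolution."""
    return k * k * c_in * c_out


# Plain block: two 3x3 convs at full width (assumed 256 channels).
plain = 2 * conv_weights(3, 256, 256)           # 1,179,648

# Bottleneck block: 1x1 reduce -> 3x3 at reduced width -> 1x1 restore.
bottleneck = (conv_weights(1, 256, 64)
              + conv_weights(3, 64, 64)
              + conv_weights(1, 64, 256))       # 69,632
```

The expensive 3x3 convolution runs on 64 channels instead of 256, and the two 1x1 convolutions that shrink and restore the width are cheap by comparison.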

image-20211203112039379

  • Benefits: information flows smoothly in the forward pass, and gradients propagate stably in the backward pass
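A toy sketch of why the shortcut keeps information flowing, assuming a residual block of the form y = F(x) + x, where F is the learned branch. The helper names are hypothetical and the "network" is just a list of floats.

```python
def residual_block(x, f):
    """Return f(x) + x elementwise; f is the learned residual branch."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    # A residual branch whose weights are all zero outputs zeros.
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
y = residual_block(x, zero_branch)  # the block reduces to the identity

# Even 50 stacked blocks pass the input through unchanged.
deep = x
for _ in range(50):
    deep = residual_block(deep, zero_branch)
```

The same structure helps in the backward pass: the derivative of F(x) + x with respect to x is F'(x) + 1, so the gradient always has a direct path through the "+ 1" term even when F'(x) is small.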

image-20211203112328869

From classification to segmentation

image-20211203112501571

  • Converting the segmentation problem to classification
    • Cut a window out of the image as a small patch and run it through the convolutional network

image-20211203113242819

  • Enumerate every sliding window and classify each one with the convolutional network
    • This makes the cost explode

image-20211203113406451

Downsampling and Upsampling

image-20211203113519759

  • Downsample first to reduce the cost, then upsample to restore the classification result to the input resolution

  • Review: Unpooling

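    • Remember where each maximum came from during pooling; when unpooling, put the value back at that position and set every other position to zero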
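A minimal sketch of pooling with remembered positions followed by unpooling, assuming 2x2 non-overlapping windows and sides that are multiples of the window size. The function names and toy values are illustrative.

```python
def max_pool_with_indices(fmap, size=2):
    """Max pooling that also records the (row, col) of each maximum."""
    pooled, indices = [], []
    for i in range(0, len(fmap), size):
        prow, irow = [], []
        for j in range(0, len(fmap[0]), size):
            best = max(
                ((fmap[i + a][j + b], (i + a, j + b))
                 for a in range(size) for b in range(size)),
                key=lambda t: t[0],
            )
            prow.append(best[0])
            irow.append(best[1])
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(values, indices, out_h, out_w):
    """Place each value back at its recorded position; everything else is 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for vrow, irow in zip(values, indices):
        for v, (r, c) in zip(vrow, irow):
            out[r][c] = v
    return out


fmap = [
    [1, 2, 6, 3],
    [3, 5, 2, 1],
    [1, 2, 2, 1],
    [7, 3, 4, 8],
]
pooled, idx = max_pool_with_indices(fmap)   # pooled == [[5, 6], [7, 8]]
restored = max_unpool(pooled, idx, 4, 4)
# restored == [[0, 0, 6, 0], [0, 5, 0, 0], [0, 0, 0, 0], [7, 0, 0, 8]]
```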

image-20211203113700909

  • Review: Deconvolution
    • Pad the input with zeros, then apply an ordinary convolution
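A 1-D sketch of deconvolution (transposed convolution), written two ways under my own assumptions: a direct "scatter" form, and a form that inserts and pads zeros before running an ordinary convolution. The function names, kernel, and stride are illustrative; the point is that both forms give the same output.

```python
def transposed_conv1d(x, w, stride=1):
    """Scatter form: each input value adds a scaled copy of the kernel."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out


def transposed_conv1d_by_padding(x, w, stride=1):
    """Same result via zero insertion, zero padding, then a plain convolution."""
    k = len(w)
    z = []
    for i, xi in enumerate(x):
        z.append(xi)
        if i < len(x) - 1:
            z.extend([0] * (stride - 1))          # zeros between inputs
    padded = [0] * (k - 1) + z + [0] * (k - 1)    # zeros at both ends
    flipped = w[::-1]
    return [
        sum(padded[j + m] * flipped[m] for m in range(k))
        for j in range(len(padded) - k + 1)
    ]


x = [1, 2, 3]
w = [1, 10, 100]
a = transposed_conv1d(x, w, stride=2)
b = transposed_conv1d_by_padding(x, w, stride=2)
# a == b == [1, 10, 102, 20, 203, 30, 300]
```

A length-3 input becomes a length-7 output, which is the upsampling effect used in the decoder half of a segmentation network.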

image-20211203113818259

Object detection

image-20211203114212752

  • Predict bounding boxes, class labels, and confidence scores
  • For each detection, determine whether it is true or false

Basic idea to detection: Sliding windows

image-20211203114320913

image-20211203114411621

image-20211203114424578

  • Design some selection criteria and keep only the boxes with the highest confidence
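One common criterion of this kind is non-maximum suppression (NMS): keep the highest-scoring box and drop any box that overlaps it too much. The notes do not name a specific rule, so treat this as one possible sketch; the 0.5 IoU threshold and the `(x1, y1, x2, y2)` box format are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```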

image-20211203114526791

R-CNN: Region proposals + CNN features

  • Region-based Convolutional Neural Network (R-CNN)

image-20211203114621937

image-20211203114722303

  • Fast R-CNN

image-20211203115029829

RoI pooling goal

  • “Crop and resample” a fixed size feature representing a region of interest out of the feature map

  • Use nearest neighbor interpolation of coordinates, max pooling

  • Map the region proposals from the original image onto the feature map
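A simplified RoI max pooling sketch for a single-channel feature map. It assumes the box lies inside the image, that `stride` is the total downsampling factor between image and feature map, and that bins use a floor start and ceiling end (so neighboring bins may overlap by one cell). All names and values are illustrative.

```python
def roi_max_pool(fmap, roi, stride, out_size=2):
    """fmap: 2-D list; roi: (x1, y1, x2, y2) in image pixels."""
    # Map the box onto feature-map cells by rounding to the nearest cell.
    x1, y1, x2, y2 = [int(round(v / stride)) for v in roi]
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)   # cover at least one cell
    h, w = y2 - y1, x2 - x1
    out = []
    for by in range(out_size):
        ys = y1 + (by * h) // out_size
        ye = max(y1 + ((by + 1) * h + out_size - 1) // out_size, ys + 1)
        row = []
        for bx in range(out_size):
            xs = x1 + (bx * w) // out_size
            xe = max(x1 + ((bx + 1) * w + out_size - 1) // out_size, xs + 1)
            # Max over the bin gives one value of the fixed-size output.
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
whole = roi_max_pool(fmap, (0, 0, 64, 64), stride=16)    # [[6, 8], [14, 16]]
part = roi_max_pool(fmap, (16, 16, 64, 48), stride=16)   # [[7, 8], [11, 12]]
```

Both regions produce a 2x2 output even though they cover different numbers of feature-map cells, which is what lets the following fully connected layers take a fixed-size input.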

image-20211203115241898

  • For each RoI, predict probabilities for c+1 classes (including background) and four bounding-box offsets for each of the c classes

image-20211203115325462

Fast R-CNN training

image-20211203115411330

Bounding box regression

image-20211203115505658

Faster R-CNN

image-20211203115618842

  • Slide a small window (3x3) over the conv5 layer
    • Predict object/no object
    • Regress bounding box coordinates with reference to anchors (3 scales x 3 aspect ratios)
  • At first every candidate box is adjusted, but in the end the loss is computed on only a subset of the boxes
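A sketch of how the 3 scales x 3 aspect ratios give 9 anchors at one sliding-window position. The concrete scales (128, 256, 512 pixels) and ratios (0.5, 1, 2) are assumed values for illustration, and the area-preserving parameterization is one common choice.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 boxes (x1, y1, x2, y2) centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:              # r = height / width
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)      # w * h == s * s, so the area is preserved
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(300, 200)
```

The network then predicts, for each anchor, an object/no-object score and four offsets that move the anchor toward the true box.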

image-20211203120013554

image-20211203120104446

YOLO

Streamlined detection architectures

  • The Faster R-CNN pipeline separates proposal generation and region classification:

image-20211203120306245

  • Is it possible to do detection in one shot?

image-20211203120326525

  • Idea: No bounding box proposals. Predict a class and a box for every location in a grid.

image-20211203120453030

image-20211203120633612

  • Divide the image into 7x7 cells.
  • Each cell trains a detector.
  • The detector needs to predict the object’s class distributions.
  • The detector also predicts bounding boxes and confidence scores.
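A small sketch of the grid bookkeeping described above. The 7x7 grid comes from the text; the number of boxes per cell (`B = 2`), the number of classes (`C = 20`), and the 448x448 input size are assumed values for illustration.

```python
S, B, C = 7, 2, 20   # grid size from the text; B and C are assumed values


def responsible_cell(cx, cy, img_w, img_h, s=S):
    """Grid cell (row, col) that contains the object center (cx, cy) in pixels."""
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# Each box carries x, y, w, h, and a confidence score; classes are per cell.
values_per_cell = B * 5 + C
output_shape = (S, S, values_per_cell)        # (7, 7, 30)

cell = responsible_cell(224, 100, 448, 448)   # (1, 3)
```

The whole detection is therefore a single tensor predicted in one forward pass, with no separate proposal stage.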

  • Objective function

    • Designed so that the width and height of small boxes have a larger effect on the loss
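One way to get this effect, and the one I understand YOLO to use, is to regress the square roots of width and height rather than the raw values. The sketch below compares the two penalties for the same 5-pixel error; the box sizes are made-up numbers.

```python
import math


def size_error(pred, true, use_sqrt):
    """Squared error on a box side, optionally on its square root."""
    if use_sqrt:
        return (math.sqrt(pred) - math.sqrt(true)) ** 2
    return (pred - true) ** 2


# The same 5-pixel width error on a small box and on a large box.
small_plain = size_error(15, 10, use_sqrt=False)     # 25
large_plain = size_error(105, 100, use_sqrt=False)   # 25
small_sqrt = size_error(15, 10, use_sqrt=True)
large_sqrt = size_error(105, 100, use_sqrt=True)
```

With raw values both errors cost the same, while with square roots the error on the small box costs several times more than the error on the large one.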

image-20211203121111420

  • Localization accuracy suffers compared to Fast(er) R-CNN due to coarser features and errors on small boxes
  • 7x speedup over Faster R-CNN (45-155 FPS vs. 7-18 FPS)

image-20211203121536705

Author: Smurf
Link: http://example.com/2021/08/15/cv/11.%20CNN/
Copyright notice: This article is licensed under CC BY-NC-SA 3.0 CN