Review

1. Course Outline

2. What is computer vision?

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and take actions or make recommendations based on that information. (IBM)
- Computer systems extract meaningful information from digital images, videos, and other visual inputs, and make decisions based on that information.

Key point 1: human visual system

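Mimicking the human visual system:
- Binocular vision -> 3D reconstruction
- Multi-scale property -> SIFT
- Attention mechanism -> op algorithm, spatial attention mechanism (FCNN)
- Parallel processing -> multi-scale, multi-feature fusion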
Design philosophies that computer vision systems learn from the human visual system:
Hierarchical processing: multi-scale fusion
- Meaning
- Applications:
  - Handcrafted features, e.g., SIFT, HOG, ...
  - Deep learning architectures, e.g., segmentation networks, ...
Attention mechanism
- Meaning
- Applications: various CNNs

3. Key point 2: computer vision system (CVS)

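System analysis
Identify the related domain knowledge needed when you construct a CVS.
Examples: a self-driving system, a video surveillance system, etc.
Weather forecasting
- Cloud images: predict future weather from cloud images and past weather records; this draws on image processing, information science, meteorology, and deep learning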

4. Key point 3: CVS in our daily lives

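Face recognition: VGmodel, RCNN
- Time-consuming, low accuracy
Fingerprint recognition, ID card recognition
Various applications
- Analyze computer vision applications and the strengths and weaknesses of their algorithms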

5. Key point 4: challenges

Analyze the challenges faced by real-life CV systems

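Scale: scale pyramid
Illumination: edges, corners
- Group pixels into cells to increase robustness to illumination changes
Viewpoint
Occlusion: feature point detection; local features help cope with occlusion
Deformation: consider the parts separately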

6. Image filtering

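Smoothing filters remove high-frequency components
Gaussian filtering: the Gaussian kernel is separable, which reduces the computational cost
- Relationship between the $\sigma$ values of two successive Gaussian filters
Zero padding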
Computing
Properties:
- Remove “high-frequency” components from the image
- Convolution with self is another Gaussian
  - So can smooth with small-σ kernel, repeat, and get same result as larger-σ kernel would have
  - Convolving two times with Gaussian kernel with std. dev. σ is same as convolving once with kernel with std. dev. σ√2
- Separable kernel
  - Factors into product of two 1D Gaussians
- Padding on the edge: methods and problems
- Complexity
  - What is the complexity of filtering an n×n image with an m×m kernel? O(n²m²)
  - What if the kernel is separable? O(n²m)
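The σ√2 property can be checked numerically. This is a minimal pure-Python sketch in 1D (the same argument applies along each axis in 2D), assuming a sampled Gaussian truncated at a radius of 4σ; the helper names are hypothetical. Convolving two normalized kernels adds their variances, so the standard deviation grows by √2.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Sampled 1D Gaussian on [-radius, radius], normalized to sum to 1."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_1d_full(a, b):
    """Full 1D convolution; output length is len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kernel_std(k):
    """Standard deviation of a symmetric, normalized kernel centered at len(k) // 2."""
    c = len(k) // 2
    return math.sqrt(sum(w * (i - c) ** 2 for i, w in enumerate(k)))


k = gaussian_kernel_1d(2.0, 8)
twice = convolve_1d_full(k, k)
print(kernel_std(k))                      # close to 2.0 (sampling and truncation)
print(kernel_std(twice) / kernel_std(k))  # sqrt(2), about 1.4142
```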

6.2 Key point 2: Separability
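Separability can be illustrated by comparing a direct 2D filter with two 1D passes. This is a minimal pure-Python sketch assuming zero padding and "same"-size output; `filter2d` and `filter_separable` are hypothetical names. The direct version does m² multiply-adds per pixel, the separable one 2m, which is where O(n²m²) versus O(n²m) comes from.

```python
import math


def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def filter2d(img, kernel):
    """Correlate img with a 2D kernel using zero padding.

    For a symmetric kernel such as the Gaussian, correlation equals convolution.
    """
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ry, x + j - rx
                    if 0 <= yy < h and 0 <= xx < w:  # outside pixels count as 0
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def filter_separable(img, k1d):
    """Same result as filter2d with the outer product of k1d, via two 1D passes."""
    rows = filter2d(img, [k1d])                 # 1 x m kernel: horizontal pass
    return filter2d(rows, [[v] for v in k1d])   # m x 1 kernel: vertical pass
```

For example, with `k = gaussian_1d(1.0, 2)` and `k2d = [[a * b for b in k] for a in k]`, `filter2d(img, k2d)` and `filter_separable(img, k)` agree up to floating-point rounding.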

6.3 Key point 3: Image filtering - noise

Salt and pepper noise: contains random occurrences of black and white pixels
Impulse noise: contains random occurrences of white pixels
Gaussian noise: variations in intensity drawn from a Gaussian (normal) distribution

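Median filtering suits salt-and-pepper noise because it readily removes abnormally low or high outliers
- Preserves edges well
- Drawback: it is nonlinear and cannot be written as a convolution
Gaussian filtering suits Gaussian noise
- May blur edges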
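The contrast between median and Gaussian filtering can be seen on tiny images. This is a minimal pure-Python sketch assuming a 3×3 window, replicated borders, and the common [1, 2, 1] ⊗ [1, 2, 1] / 16 kernel as the Gaussian; the function names are hypothetical.

```python
def _clamp(v, lo, hi):
    return max(lo, min(v, hi))


def median_filter(img, radius=1):
    """Median over a (2r+1)x(2r+1) window, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[_clamp(y + dy, 0, h - 1)][_clamp(x + dx, 0, w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            )
            row.append(window[len(window) // 2])  # odd count, so a true median
        out.append(row)
    return out


GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16


def gaussian3x3(img):
    """3x3 Gaussian smoothing (a linear filter), replicating border pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += GAUSS_3X3[dy + 1][dx + 1] * img[_clamp(y + dy, 0, h - 1)][_clamp(x + dx, 0, w - 1)]
            row.append(acc / 16)
        out.append(row)
    return out
```

On a flat image of 100s with one "salt" pixel of 255, the median filter restores every pixel to 100, while the Gaussian filter spreads the outlier (the center becomes 138.75). On a step edge such as rows of `[0, 0, 0, 255, 255, 255]`, the median filter returns the step unchanged.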

6.4 Key point 4: Sharpening

Understand the process and the influence of its parameters
- What does blurring take away?

Let’s add it back:
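One standard way to add the detail back is unsharp masking: detail = f − blur(f), sharpened = f + α · detail. The formula from the slide is not in these notes, so this is a minimal 1D pure-Python sketch under that assumption, using a [1, 2, 1] / 4 blur with replicated borders; `alpha` is the strength parameter and the function names are hypothetical.

```python
def blur_1d(signal):
    """[1, 2, 1] / 4 smoothing with replicated borders."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def sharpen_1d(signal, alpha=1.0):
    """Unsharp masking: add back alpha times what blurring took away."""
    blurred = blur_1d(signal)
    detail = [s - b for s, b in zip(signal, blurred)]  # high-frequency part
    return [s + alpha * d for s, d in zip(signal, detail)]
```

On the step `[0, 0, 0, 10, 10, 10]`, `alpha=1.0` gives `[0.0, 0.0, -2.5, 12.5, 10.0, 10.0]`: the edge gains contrast through undershoot and overshoot, and a larger `alpha` (or a wider blur) strengthens that effect. Flat regions are unchanged because their detail is zero.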

7. Edge detection

7.1 Key point 1: Image gradient

7.2 Key point 2: Edge filters

Design philosophy of each filter and its function
How to compute the filter responses
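As an example of computing edge-filter responses, the Sobel filters give the image gradient (gx, gy), from which magnitude and orientation follow. This is a minimal pure-Python sketch assuming correlation (not flipped convolution), x to the right, y downward, and replicated borders; the helper names are hypothetical.

```python
import math

# Sobel derivative filters, applied as correlation.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply3x3(img, kernel, y, x):
    """Correlate a 3x3 kernel at (y, x), replicating border pixels."""
    h, w = len(img), len(img[0])
    acc = 0
    for i in range(3):
        for j in range(3):
            yy = min(max(y + i - 1, 0), h - 1)
            xx = min(max(x + j - 1, 0), w - 1)
            acc += kernel[i][j] * img[yy][xx]
    return acc


def sobel_gradient(img):
    """Per-pixel gradient magnitude and orientation (radians)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = apply3x3(img, SOBEL_X, y, x)
            gy = apply3x3(img, SOBEL_Y, y, x)
            mag[y][x] = math.hypot(gx, gy)
            ori[y][x] = math.atan2(gy, gx)
    return mag, ori
```

On a vertical step such as rows of `[0, 0, 10, 10]`, the magnitude is 40 on both sides of the step and the orientation is 0, i.e. the gradient points across the edge.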

7.3 Key point 3: Canny edge detector

Steps and their motivations
Parameter choice and reasons

Filter image with derivative of Gaussian
Find magnitude and orientation of gradient
Non-maximum suppression:
- Thin wide “ridges” down to single-pixel width
Linking and thresholding (hysteresis):
- Define two thresholds: low and high
- Use the high threshold to start edge curves and the low threshold to continue them
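The hysteresis step can be sketched as a flood fill from strong pixels. This minimal pure-Python sketch assumes the input magnitudes have already been thinned by non-maximum suppression, uses 8-connectivity, and treats values equal to a threshold as passing; `hysteresis` is a hypothetical name.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep pixels >= high, plus pixels >= low connected (8-neighborhood) to them."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Strong pixels start edge curves.
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))
    # Weak pixels continue a curve only if they touch an accepted pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and mag[ny][nx] >= low:
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges
```

For the row `[0, 5, 9, 5, 0, 5]` with `low=4` and `high=8`, the strong pixel 9 pulls in its weak neighbors, while the isolated weak pixel at the end is dropped.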

8. Local features - corners

8.1 Key point 1: Corner detection - basic idea

We should be able to recognize the point easily by looking through a small window
Shifting a window in any direction should give a large change in intensity
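The basic idea can be made concrete by measuring the change E for each one-pixel shift of a 3×3 window (sum of squared differences). This is a minimal pure-Python sketch; the helper names are hypothetical, and the caller must keep the shifted window inside the image.

```python
# One-pixel shifts as (row shift dv, column shift du).
SHIFTS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def window_change(img, y, x, dv, du, r=1):
    """SSD between the window at (y, x) and the same window shifted by (dv, du)."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            diff = img[y + dy + dv][x + dx + du] - img[y + dy][x + dx]
            total += diff * diff
    return total


def shift_changes(img, y, x):
    """Intensity change for each of the eight one-pixel shifts."""
    return [window_change(img, y, x, dv, du) for dv, du in SHIFTS]
```

On a 10×10 image with a bright square in the lower right (`img[y][x] = 1` if `y >= 5 and x >= 5`): in a flat region every shift gives 0; on an edge, shifting along the edge gives 0 while shifting across it does not; at the corner (5, 5) every shift gives a change of at least 2.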

8.2 Key point 1: Harris detector

Step 3: Compute the corner response function R and judge whether the point is a corner, an edge, or a flat region
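Step 3 can be sketched with the usual response R = det(M) − k · trace(M)², where M is the second-moment matrix summed over a window. This is a minimal pure-Python sketch assuming central-difference gradients, a uniform 3×3 window (Gaussian weighting is also common), k = 0.05, and a hypothetical classification threshold of 0.01.

```python
def harris_response(img, y, x, k=0.05, r=1):
    """Harris response R = det(M) - k * trace(M)^2 at pixel (y, x).

    M sums Ix^2, Iy^2 and Ix*Iy over a (2r+1)x(2r+1) window with uniform
    weights. The window plus one pixel of margin must lie inside the image.
    """
    sxx = syy = sxy = 0.0
    for yy in range(y - r, y + r + 1):
        for xx in range(x - r, x + r + 1):
            ix = (img[yy][xx + 1] - img[yy][xx - 1]) / 2.0
            iy = (img[yy + 1][xx] - img[yy - 1][xx]) / 2.0
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def classify(r_value, threshold=0.01):
    """Hypothetical threshold: large positive R -> corner, large negative R -> edge."""
    if r_value > threshold:
        return "corner"
    if r_value < -threshold:
        return "edge"
    return "flat"
```

On a 10×10 image with a bright square whose corner is at (5, 5), R is about 0.7375 at the corner (both eigenvalues large), about −0.1125 on the square's vertical edge (one large eigenvalue), and 0 in the flat background.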

8.3 Key point 2: Harris detector - properties

9.1 Key point 1: Scale space/SIFT

9.2 Key point 4: HOG - steps and motivations

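How to compute
Blocks and cells:
- Each block contains 2×2 cells
- Each cell contains 8×8 pixels
- Each block: 16×16 pixels
- Neighboring blocks overlap by 50% (a stride of one cell, 8 pixels)
- A 64×128 image therefore contains (64 - 16)/8 + 1 = 7 blocks horizontally and (128 - 16)/8 + 1 = 15 vertically,
  7×15 = 105 blocks in total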
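The block count generalizes to any window size. This pure-Python sketch reproduces the 105 blocks above and also gives the descriptor length, assuming 9 orientation bins per cell (a common choice that these notes do not state); `hog_layout` is a hypothetical name.

```python
def hog_layout(width, height, cell=8, cells_per_block=2, stride_cells=1, bins=9):
    """Return (blocks across, blocks down, total blocks, descriptor length)."""
    block = cell * cells_per_block   # 16 pixels
    stride = cell * stride_cells     # 8 pixels, i.e. 50% block overlap
    bx = (width - block) // stride + 1
    by = (height - block) // stride + 1
    n_blocks = bx * by
    # Each block concatenates the histograms of its cells.
    length = n_blocks * cells_per_block ** 2 * bins
    return bx, by, n_blocks, length


print(hog_layout(64, 128))  # (7, 15, 105, 3780)
```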

9.3 Key point 5: HOG for Detection

Convolution, HOG features, 3D

10. Key point 1: RANSAC for line fitting

Parameter choice
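For parameter choice, the number of iterations N is commonly set so that, with probability p, at least one sample of s points is outlier-free: N = log(1 − p) / log(1 − (1 − e)^s), where e is the outlier ratio. The pure-Python sketch below implements that formula and a basic RANSAC line fit with a distance threshold; the function names, the fixed seed, and the threshold value are assumptions for illustration, and the final least-squares refit on the inliers is omitted.

```python
import math
import random


def ransac_iterations(p, outlier_ratio, s):
    """Samples needed so that, with probability p, one sample of size s is all inliers."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - outlier_ratio) ** s))


def line_through(p1, p2):
    """Line a*x + b*y + c = 0 through two points, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2          # normal to the direction (x2 - x1, y2 - y1)
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold, iterations, seed=0):
    """Keep the line (from 2 random points) with the most points within threshold."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p1, p2 = rng.sample(points, 2)
        if p1 == p2:                  # duplicate input points define no line
            continue
        a, b, c = line_through(p1, p2)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


print(ransac_iterations(0.99, 0.5, 2))  # 17
```

With p = 0.99, 50% outliers, and s = 2 points per line, 17 samples suffice; a larger outlier ratio or sample size raises N quickly. The threshold should be chosen from the expected noise level of the inliers.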

11. K-means: pros and cons
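A minimal pure-Python k-means sketch, assuming points are tuples (e.g. pixel colors or positions) and the caller supplies the initial centers; `kmeans` is a hypothetical name. Passing the initialization explicitly highlights one of the method's cons: the result depends on it, and k must be chosen in advance.

```python
def kmeans(points, init_centers, iterations=100):
    """Alternate nearest-center assignment and mean update until centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        labels = []
        for p in points:
            dists = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers]
            j = dists.index(min(dists))
            labels.append(j)
            clusters[j].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)  # keep the old center of an empty cluster
        if new_centers == centers:     # converged: labels match these centers
            break
        centers = new_centers
    return centers, labels
```

For the 1D points 1, 2, 3, 10, 11, 12 initialized at 1 and 2, the centers move to 2.0 and 11.0 after a few iterations.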

Normalized cut: why is it different?

Key point 1: Viola-Jones face detector
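Viola-Jones evaluates rectangle (Haar-like) features in constant time using an integral image. This is a minimal pure-Python sketch; the helper names are hypothetical, and the left-minus-right sign convention of the two-rectangle feature is an assumption for illustration.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of img rows y0..y1-1 and columns x0..x1-1 with four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def two_rect_feature(ii, y, x, h, w):
    """Hypothetical two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return box_sum(ii, y, x, y + h, x + half) - box_sum(ii, y, x + half, y + h, x + w)
```

After one pass to build `ii`, every rectangle sum costs four lookups regardless of its size, which is what makes evaluating many features per window feasible.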

Visual vocabularies: issues

- How to choose the vocabulary size?
  - Too small: visual words are not representative of all patches
  - Too large: quantization artifacts, overfitting
- Why BOW?
  - Efficiency
  - BOW has been useful for matching an image against a large database
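A bag-of-words representation assigns each local descriptor to its nearest visual word and counts the assignments. This is a minimal pure-Python sketch assuming the vocabulary is already given (typically learned with k-means) and that the histogram is L1-normalized; the function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(descriptor, word)) for word in vocabulary]
    return dists.index(min(dists))


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences in one image."""
    hist = [0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Each image becomes one fixed-length vector whatever its number of features, so comparing it against a large database reduces to comparing histograms. The vocabulary size sets that length, which is where the too-small/too-large trade-off shows up.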

Key point 2: Pedestrian detection with HOG
- Train a pedestrian template using a linear support vector machine
- At test time, convolve the feature map with the template
- Find local maxima of the response
- For multi-scale detection, repeat over multiple levels of a HOG pyramid
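The test-time steps can be sketched as a dot product between the template and every window of a 3D HOG feature map (rows × columns × histogram bins), followed by a local-maximum search. This minimal pure-Python sketch assumes the linear SVM weights are already reshaped into a template of the same depth, with an optional bias. The pyramid loop for multi-scale detection is omitted, and the function names and threshold are hypothetical.

```python
def template_response(feat, tmpl, bias=0.0):
    """Score every valid window position: sum of feature-template dot products plus bias."""
    H, W = len(feat), len(feat[0])
    th, tw = len(tmpl), len(tmpl[0])
    out = []
    for y in range(H - th + 1):
        row = []
        for x in range(W - tw + 1):
            s = bias
            for i in range(th):
                for j in range(tw):
                    s += sum(f * t for f, t in zip(feat[y + i][x + j], tmpl[i][j]))
            row.append(s)
        out.append(row)
    return out


def local_maxima(resp, threshold):
    """Positions whose score exceeds the threshold and all 8 neighbors."""
    h, w = len(resp), len(resp[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = resp[y][x]
            if v <= threshold:
                continue
            neighbors = [
                resp[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy, xx) != (y, x)
            ]
            if all(v > n for n in neighbors):
                peaks.append((y, x, v))
    return peaks
```

If a copy of the template is embedded in an otherwise empty feature map, the response peaks at exactly that position, and partial overlaps elsewhere score lower.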

Strengths
- Works very well for non-deformable objects with canonical orientations: faces, cars, pedestrians
- Fast detection
Weaknesses
- Works less well for highly deformable objects or “stuff”
- Not robust to occlusion
- Requires lots of training data