Image Feature Extraction

1. Why computer vision is challenging?

Viewpoint variation 视觉角度
Illumination 光照
Occlusion 遮挡
Scale 尺度
Deformation 姿态变形
Background clutter 背景杂乱
Local ambiguity 局部迷惑
Object intra class variation 目标内部变换

2. Motivation for using local features

Occlusions
Articulation 关节

Intra category variations 局部不变

3. General Approach

Find a set of distinctive keypoints 找到一系列关键点
Define a region around each keypoint 在关键点周围定义对应的区域
Extract and normalize the region content 从区域中提取内容并归一化
Compute a local descriptor from the normalized region 用特征算子提取特征
Match local descriptors 匹配局部描述信息

4. Characteristics of good features

Compactness and efficiency 紧凑且高效，要求关键点远少于图片pixel，且提取得算法是高效的，存储量小的
Saliency 具有判别力
- Each feature is distinctive
Locality 局部特征不会变
Repeatability 可重复性

5. Keypoint extraction: Corners

平滑部分区域的各个方向梯度都为零
边缘区域有一个方向的梯度不为零
角点位置至少有两个方向的梯度值不为零

6. Corner Detection

最终结果是一个窗口大小的像素，不是只有一个像素

6.1 Derivation

$E(u,v)=\sum_{(x, y)\in W}[I(x+u, y+v)-I(x, y)]^{2}$

窗口移动前后像素值变化
类似于欧式距离

$E(u, v)=\sum_{( x, y )\in W} w(x,y)[I(x+u, y+v)-I(x, y)]^{2}$

window function 取窗口为矩形波或者高斯窗口

First-order Taylor approximation for small motions [u, v]:

$I(x+u, y+v) \approx I(x, y)+I_{x} u+I_{y} v$

一维泰勒：

$f(x_0+h)=f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f''(x_0)}{2!}h^2+o(h^2)$

二维泰勒：

$f(x_0+h,y_0+k)=f(x_0,y_0)+\frac{f'_x(x_0,y_0)h+f'_y(x_0,y_0)k}{1!}+\frac{f_{xx}''(x_0,y_0)}{2!}h^2+\frac{f_{xy}''(x_0,y_0)} {2!}hk+\frac{f_{yx}''(x_0,y_0)}{2!}hk+\frac{f_{yy}''(x_0,y_0)}{2!}k^2+...$

代入$E(u,v)$：

$E(u, v)=\sum_{(x, y) \in W}\left(I_{x} u+I_{y} v\right)^{2}=\sum_{(x, y) \in w} I_{x}^{2} u^{2}+2 I_{x} I_{y} u v+I_{y}^{2} v^{2}$ $E(u, v) \approx u^{2} \sum_{x, y} I_{x}^{2}+2 u v \sum_{x, y} I_{x} I_{y}+v^{2} \sum_{x, y}I_{y}^{2}$

$E(u, v) \approx[u, v]\left[\begin{array}{cc} \sum_{x,y\in w} I_{x}^{2} & \sum_{x,y\in w} I_{x} I_{y} \\ \sum_{x,y\in w} I_{x} l_{y} & \sum_{x,y\in w} I_{y}^{2} \end{array}\right]\left[\begin{array}{l} u \\ v \end{array}\right]$

6.2 研究$M$矩阵

A horizontal slice ” of 𝐸(𝑢,𝑣)is given by the equation of an ellipse: 做垂直切片，即令E=const

$\left[\begin{array}{ll} u & v \end{array}\right] M\left[\begin{array}{l} u \\ v \end{array}\right]=\mathrm{const}$

$M =\left[\begin{array}{cc} \sum_{x,y\in w} I_{x}^{2} & \sum_{x,y\in w} I_{x} I_{y} \\ \sum_{x,y\in w} I_{x} l_{y} & \sum_{x,y\in w} I_{y}^{2} \end{array}\right]$

6.2.1 假设窗口只朝x，y方向移动

$M$矩阵为对角矩阵，记为：

$M =\left[\begin{array}{cc} a & 0\\ 0 & b \end{array}\right]$

即没有旋转角度的影响，角点刚好和x，y方向平行

$[u, v]\left[\begin{array}{cc} a & 0\\ 0 & b \end{array}\right]\left[\begin{array}{l} u \\ v \end{array}\right]=1$

$au^2+bv^2=1$ $\frac{u^2}{(a^{-\frac{1}{2}})^2}+\frac{v^2}{(b^{-\frac{1}{2}})^2}=1$

这个式子说明，$a,b$都不接近0，则为角点，这说明E随u,v呈现至少两个方向的变化。
当a或b其中一个接近于0，则为边缘，因为此时椭圆坍塌为一条直线，这说明E随着u,v变化只有一个方向。
当a和b都很接近于零，则说明，E随着u,v基本不变，则说明此时该部分为平滑区域。

6.2.2 假设窗口只朝有一定转角

由于：

$M =\left[\begin{array}{cc} \sum_{x,y\in w} I_{x}^{2} & \sum_{x,y\in w} I_{x} I_{y} \\ \sum_{x,y\in w} I_{x} l_{y} & \sum_{x,y\in w} I_{y}^{2} \end{array}\right]$

我们可以知道$M$是对称矩阵，所以$M$可以进行正交分解：

$M=R^{-1}\left[\begin{array}{cc} \lambda_{1} & 0 \\ 0 & \lambda_{2} \end{array}\right] R$

不难发现，$R$是决定旋转的旋转矩阵，$\lambda_1,\lambda_2$为$M$的特征值，所以真正决定该处是否为角点的是$M$的特征值。

6.3 改进

$R=\operatorname{det}(M)-\alpha \operatorname{trace}(M)^{2}=\lambda_{1} \lambda_{2}-\alpha\left(\lambda_{1}+\lambda_{2}\right)^{2}$

用这种方法，可以避免计算矩阵的特征值。

如果$\lambda_1,\lambda_2$可以比拟，且均不接近零，可以推出$R>0$，则为角点：

$\begin{array}{cc} R=\lambda_{1} \lambda_{2}-\alpha\left(\lambda_{1}+\lambda_{2}\right)^{2} \\ =\lambda^2-\alpha(4\lambda^2)\\ =0.84\lambda^2 > 0 \end{array}$

如果$\lambda_1,\lambda_2$可以比拟，且均接近零，可以推出$|R|\approx 0$，则为平滑区域：

$\begin{array}{cc} R=\lambda_{1} \lambda_{2}-\alpha\left(\lambda_{1}+\lambda_{2}\right)^{2} \\ =\lambda^2-\alpha(4\lambda^2)\\ =0.84\lambda^2 \approx 0 \end{array}$

如果$\lambda_1\gg\lambda_2$，则可以推出$R<0$,为边缘处：

$R=\lambda_1\lambda_2-\alpha\lambda_1^2<0$

7. The Harris corner detector

Compute partial derivatives at each pixel 计算每个像素值的梯度，即利用高斯偏导核做卷积
Compute second moment matrix $M$ in a Gaussian window around each pixel 用窗口遍历所有像素，并计算每个$M$矩阵

$M =\left[\begin{array}{cc} \sum_{x,y\in w} I_{x}^{2} & \sum_{x,y\in w} I_{x} I_{y} \\ \sum_{x,y\in w} I_{x} l_{y} & \sum_{x,y\in w} I_{y}^{2} \end{array}\right]$

Compute corner response function $R$

$R=\operatorname{det}(M)-\alpha \operatorname{trace}(M)^{2}(\alpha\in(0.04,0.06))$

Threshold $R$, 对 $R$ 进行阈值处理，以去除不必要的噪音
Find local maxima of response function nonmaximum suppression) 非极大抑制

8. Harris Detector：Robustness of corner features

8.1 Affine intensity change 像素值强度发生仿射变化，例如光照

$b$ 相当于均匀光照，对 $R$ 的计算没有影响；
对于 $a$ 虽然会产生一定增益，但是部分角点的判别不会发生改变

8.2 Image translation 图像平移

Derivatives and window function are shift invariant Corner location 检测到的角点会发生位置的变化
Corner location is covariant w.r.t . translation

8.3 Image rotation

Second moment ellipse rotates but its shape (i.e. eigenvalues ) remains the same
Corner location is covariant w.r.t . rotation

8.4 Scaling

会导致角点被检测为边缘
Corner location is not covariant w.r.t .

9. Blob detection 为了解决尺度问题

9.1 Laplacian of Gaussian

Circularly symmetric operator for blob detection in 2D

$\nabla^{2} g=\frac{\partial^{2} g}{\partial x^{2}}+\frac{\partial^{2} g}{\partial y^{2}}$

Edge = ripple波纹
Blob = superposition of two ripples 两个波纹的叠加

Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is matched ” to the scale of the blob.
当拉普拉斯核的规模和斑点相匹配，拉普拉斯响应模值处极为斑点中心处
However, Laplacian response decays as scale increases: 拉普拉斯响应会随着规模增大而逐渐减小

9.2 Laplacian of Gaussian’s scale normalization

对于高斯一阶偏导核响应曲线与x轴围成的面积为$\frac{1}{\sigma \sqrt{2 \pi}}$，$\sigma$与响应强度呈反比：

To keep response the same (scale invariant), must multiply Gaussian derivative by $\sigma$
Laplacian is the second Gaussian derivative, so it must be multiplied by $\sigma^2$
Effect of scale normalization：

Scale-normalized Laplacian of Gaussian:

$\nabla_{\text {norm }}^{2} g=\sigma^{2}\left(\frac{\partial^{2} g}{\partial x^{2}}+\frac{\partial^{2} g}{\partial y^{2}}\right)$

9.3 Laplacian of Gaussian’s scale

拉普拉斯算子在多大尺度上对半径为 r 的二元圆达到最大响应?

To get maximum response, the zeros of the Laplacian have to be aligned with
the circle 为了得到最大的响应，拉普拉斯零点需要与斑点对齐

因为这样才可以把拉普拉斯小于零的部分完全抵消，注意上图，两条曲线做卷积后才得到响应。
演算：

$G_{\sigma}=\frac{1}{2\pi \sigma^2}exp(-\frac{x^2+y^2}{2\sigma^2})$ $\frac{\partial G_\sigma}{\partial x}=-\frac{x}{\sigma^2}exp(-\frac{x^2+y^2}{2\sigma^2})$ $\frac{\partial G_\sigma}{\partial y}=-\frac{y}{\sigma^2}exp(-\frac{x^2+y^2}{2\sigma^2})$ $\frac{\partial^2 G_\sigma}{\partial x^2}=\frac{x^2-\sigma^2}{\sigma^4}exp(-\frac{x^2+y^2}{2\sigma^2})$ $\frac{\partial^2 G_\sigma}{\partial y^2}=\frac{y^2-\sigma^2}{\sigma^4}exp(-\frac{x^2+y^2}{2\sigma^2})$ $LoG=\sigma^{2}\left(\frac{\partial^{2} g}{\partial x^{2}}+\frac{\partial^{2} g}{\partial y^{2}}\right)=\frac{x^2+y^2-2\sigma^2}{\sigma^2}exp(-\frac{x^2+y^2}{2\sigma^2})$

Let $LoG = 0$：

$x^2+y^2-2\sigma^2=0$ $\sigma=\frac{r}{\sqrt{2}}$

10. Scale space blob detector

Convolve image with scale normalized Laplacian at several scales 用归一化后的拉普拉斯核与图片卷积
Find maxima of squared Laplacian response in scale space 找到拉普拉斯响应平方的最大值

对于每一个像素点，不仅要比较邻近的八个像素点，还要比较相邻尺度的18个像素点

11. Scale-invariant feature transform (SIFT)

Approximating the Laplacian with a difference of Gaussians:

Laplacian：

$L=\sigma^{2}\left(G_{x x}(x, y, \sigma)+G_{y y}(x, y, \sigma)\right)$

Difference of Gaussians：

$D o G=G(x, y, k \sigma)-G(x, y, \sigma)$ $\begin{aligned} D(x, y, \sigma) &=(G(x, y, k \sigma)-G(x, y, \sigma)) * I(x, y) \\ &=L(x, y, k \sigma)-L(x, y, \sigma) \end{aligned}$

Scale space construction 尺度空间创建
由于计算DoG等价于计算不同$\sigma$的高斯核的差分，所以要像下图计算：

首先是 2n*2n 的高斯核得到的响应，进行差分后才得到DoG，之后在进行下采样从而缩小高斯核的规模，再进行卷积得到响应，接着差分在得到一系列DoG
最后再找出最大的像素，即为角点

12. Representing keypoints with descriptors

用描述符表示关键点

对于每个窗口提取出来的角点，我们算出这些像素的梯度，然后将其每16个分为一个cell（4*4），对于每一个这样的cell我们用一个八维的向量表示它：第一个维度表示0~45°的幅值，第二个维度表示45°~90°的幅值…依次类推，对于cell里的每个梯度值我们通过量化的方式决定其在那个维度，即：若有个像素的梯度值的角度为25°，幅值为3，那么它表示为（3,0,0,0,0,0,0,0），对于每一个像素都这样表示，再把cell里的所有坐标求和，即可得表示结果。
对于16*16大小的图片（关键点），我们可以分为16个cell，每一个cell会有8维向量表示，我们把它们都拼接起来，最就形成了一个128维向量
16 cells * 8 orientations = 128 dimensional descriptor

128 dim vector normalized to 1 对这128维向量进行归一化
Threshold gradient magnitudes to avoid excessive influence of high gradients 对梯度幅值进行阈值处理（模值大于0.2）
- after normalization, clamp gradients > 0.2
- renormalize

13. Numeric Example

Various code available http://www.cs.ubc.ca/~lowe/keypoints/

13. How to achieve the matching result?

Given a feature in $I_1$ , how to find the best match in $I_2$ ?
- Define distance function that compares two descriptors
- Test all the features in I 2 , find the one with min distance
Simple way:
- Calculate L 2 distance, $||f 1 - f 2 ||$
- But, may give good scores to ambiguous (incorrect) matches
Better way:
- Calculate ratio distance = $||f 1-f 2 || / || f 1-f 2 ’ ||$
  - $f_2$ is best match to $f_1$ in $I_2$
  - $f_2 ’$ is $2^{nd}$ best match to $f_1$ in $I_2$
  - gives large values for ambiguous matches

Image Feature Extraction

1. Why computer vision is challenging?

2. Motivation for using local features

3. General Approach

4. Characteristics of good features

5. Keypoint extraction: Corners

6. Corner Detection

6.1 Derivation

6.2 研究$M$​矩阵

6.2.1 假设窗口只朝x，y方向移动

6.2.2 假设窗口只朝有一定转角

6.3 改进

7. The Harris corner detector

8. Harris Detector：Robustness of corner features

8.1 Affine intensity change 像素值强度发生仿射变化，例如光照

8.2 Image translation 图像平移

8.3 Image rotation

8.4 Scaling

9. Blob detection 为了解决尺度问题

9.1 Laplacian of Gaussian

9.2 Laplacian of Gaussian’s scale normalization

9.3 Laplacian of Gaussian’s scale

10. Scale space blob detector

11. Scale-invariant feature transform (SIFT)

12. Representing keypoints with descriptors

13. Numeric Example

13. How to achieve the matching result?

6.2 研究$M$矩阵