Image Feature Extraction

1. Why is computer vision challenging?

  • Viewpoint variation
  • Illumination
  • Occlusion
  • Scale
  • Deformation
  • Background clutter
  • Local ambiguity
  • Intra-class variation

2. Motivation for using local features

  • Occlusions
  • Articulation (jointed motion)

image-20210904202810079

  • Intra-category variations (local parts remain stable)

image-20210904202838940

3. General Approach

  1. Find a set of distinctive keypoints
  2. Define a region around each keypoint
  3. Extract and normalize the region content
  4. Compute a local descriptor from the normalized region
  5. Match local descriptors

4. Characteristics of good features

  • Compactness and efficiency: there should be far fewer keypoints than image pixels, and the extraction algorithm should be efficient with a small memory footprint
  • Saliency: each feature is distinctive
  • Locality: a feature occupies a small area, so it is robust to clutter and occlusion
  • Repeatability: the same feature can be found again in other images

5. Keypoint extraction: Corners

image-20210904204836818

image-20210904204908615

  • In a flat region, the gradient is close to zero in every direction
  • Along an edge, the gradient is nonzero in only one direction
  • At a corner, the gradient is nonzero in at least two directions
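
These three cases can be verified on tiny synthetic patches (a sketch using NumPy's `np.gradient` for the derivatives; the patch contents are arbitrary):

```python
import numpy as np

def grad_energy(patch):
    """Sum of squared x- and y-gradients over a patch."""
    gy, gx = np.gradient(patch.astype(float))
    return (gx ** 2).sum(), (gy ** 2).sum()

flat = np.zeros((9, 9))                          # uniform region
edge = np.zeros((9, 9)); edge[:, 5:] = 1.0       # vertical step edge
corner = np.zeros((9, 9)); corner[5:, 5:] = 1.0  # axis-aligned corner

for name, p in [("flat", flat), ("edge", edge), ("corner", corner)]:
    ex, ey = grad_energy(p)
    print(name, ex, ey)
```

The flat patch has zero gradient energy, the edge has energy along one axis only, and the corner has energy along both axes.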

6. Corner Detection

  • The result covers a window-sized neighborhood of pixels, not a single pixel

6.1 Derivation

$$E(u,v)=\sum_{(x,y)\in W}\left[I(x+u,y+v)-I(x,y)\right]^2$$

  • E(u, v) measures how much the pixel values change when the window shifts by (u, v)
  • It is analogous to a (summed) squared Euclidean distance

image-20210904212211890

image-20210904212228132

$$E(u,v)=\sum_{(x,y)\in W}w(x,y)\left[I(x+u,y+v)-I(x,y)\right]^2$$

  • The window function w(x, y) can be a box (rectangular) window or a Gaussian window

image-20210904212417658

  • First-order Taylor approximation for small motions [u, v]:

$$I(x+u,y+v)\approx I(x,y)+I_x u+I_y v$$

  • 1-D Taylor expansion:

$$f(x_0+h)=f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f''(x_0)}{2!}h^2+o(h^2)$$

  • 2-D Taylor expansion:

$$f(x_0+h,y_0+k)=f(x_0,y_0)+\frac{f_x(x_0,y_0)h+f_y(x_0,y_0)k}{1!}+\frac{f_{xx}(x_0,y_0)}{2!}h^2+\frac{f_{xy}(x_0,y_0)}{2!}hk+\frac{f_{yx}(x_0,y_0)}{2!}hk+\frac{f_{yy}(x_0,y_0)}{2!}k^2+\cdots$$

  • Substituting into E(u, v):

$$E(u,v)\approx\sum_{(x,y)\in W}w(x,y)\left(I_x u+I_y v\right)^2=\sum_{(x,y)\in W}w\left(I_x^2u^2+2I_xI_yuv+I_y^2v^2\right)=u^2\sum_{x,y}wI_x^2+2uv\sum_{x,y}wI_xI_y+v^2\sum_{x,y}wI_y^2$$

image-20210904213920321

$$E(u,v)\approx\begin{bmatrix}u & v\end{bmatrix}\begin{bmatrix}\sum_{x,y}wI_x^2 & \sum_{x,y}wI_xI_y\\ \sum_{x,y}wI_xI_y & \sum_{x,y}wI_y^2\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix}$$

6.2 Analyzing the matrix M

  • A horizontal "slice" of E(u, v) is given by the equation of an ellipse, i.e. set E = const:

$$\begin{bmatrix}u & v\end{bmatrix}M\begin{bmatrix}u\\ v\end{bmatrix}=\text{const}$$

image-20210904214334583

$$M=\begin{bmatrix}\sum_{x,y}wI_x^2 & \sum_{x,y}wI_xI_y\\ \sum_{x,y}wI_xI_y & \sum_{x,y}wI_y^2\end{bmatrix}$$

6.2.1 Case 1: the window moves only along the x and y axes

  • M is a diagonal matrix, written as:

$$M=\begin{bmatrix}a & 0\\ 0 & b\end{bmatrix}$$

  • That is, there is no rotation: the corner directions are exactly parallel to the x and y axes:

$$\begin{bmatrix}u & v\end{bmatrix}\begin{bmatrix}a & 0\\ 0 & b\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix}=1$$

image-20210904215934434

$$au^2+bv^2=1\;\Longrightarrow\;\frac{u^2}{\left(a^{-1/2}\right)^2}+\frac{v^2}{\left(b^{-1/2}\right)^2}=1$$

  • If neither a nor b is close to 0, the point is a corner: E changes along at least two directions of (u, v).
  • If one of a, b is close to 0, the point lies on an edge: the ellipse collapses toward a line, so E changes along only one direction.
  • If both a and b are close to 0, E barely changes with (u, v), so the region is flat.

6.2.2 Case 2: the window ellipse has some rotation

  • Since:

$$M=\begin{bmatrix}\sum_{x,y}wI_x^2 & \sum_{x,y}wI_xI_y\\ \sum_{x,y}wI_xI_y & \sum_{x,y}wI_y^2\end{bmatrix}$$

  • M is symmetric, so it admits an orthogonal eigendecomposition:

$$M=R^{-1}\begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}R$$

  • Here R is the rotation matrix that determines the orientation, and λ1, λ2 are the eigenvalues of M; so what really decides whether a point is a corner is the eigenvalues of M.

image-20210905151352207

6.3 Refinement: the corner response function

$$R=\det(M)-\alpha\,\operatorname{trace}(M)^2=\lambda_1\lambda_2-\alpha(\lambda_1+\lambda_2)^2$$

  • This form avoids computing the eigenvalues of M explicitly.

image-20210905152039117

  • If λ1 and λ2 are comparable (take λ1 = λ2 = λ) and neither is close to zero, then R > 0 and the point is a corner (here with α = 0.04):

$$R=\lambda_1\lambda_2-\alpha(\lambda_1+\lambda_2)^2=\lambda^2-\alpha\cdot 4\lambda^2=0.84\lambda^2>0$$

  • If λ1 and λ2 are comparable and both close to zero, then |R| ≈ 0 and the region is flat:

$$R=\lambda_1\lambda_2-\alpha(\lambda_1+\lambda_2)^2=\lambda^2-\alpha\cdot 4\lambda^2=0.84\lambda^2\approx 0$$

  • If λ1 ≫ λ2, then R < 0 and the point lies on an edge:

$$R=\lambda_1\lambda_2-\alpha(\lambda_1+\lambda_2)^2\approx-\alpha\lambda_1^2<0$$
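
The three cases can be verified with a toy computation (using α = 0.04, as in the derivation above):

```python
def harris_response(l1, l2, alpha=0.04):
    """R = det(M) - alpha * trace(M)^2, expressed through the eigenvalues."""
    return l1 * l2 - alpha * (l1 + l2) ** 2

print(harris_response(10.0, 10.0))  # comparable, large -> corner, R = 0.84 * 100 > 0
print(harris_response(1e-3, 1e-3))  # comparable, small -> flat,   |R| ~ 0
print(harris_response(10.0, 0.0))   # l1 >> l2          -> edge,   R < 0
```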

7. The Harris corner detector

  • Compute partial derivatives at each pixel, i.e. convolve with Gaussian derivative kernels

  • Compute the second moment matrix M in a Gaussian window around each pixel:

$$M=\begin{bmatrix}\sum_{x,y}wI_x^2 & \sum_{x,y}wI_xI_y\\ \sum_{x,y}wI_xI_y & \sum_{x,y}wI_y^2\end{bmatrix}$$

  • Compute the corner response function R:

$$R=\det(M)-\alpha\,\operatorname{trace}(M)^2,\qquad \alpha\in(0.04,\,0.06)$$

  • Threshold R to suppress weak, noisy responses
  • Find local maxima of the response function (non-maximum suppression)
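
The five steps above can be sketched in a few lines of NumPy/SciPy (a minimal illustration, not a reference implementation; the window sizes and the relative threshold are arbitrary choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris(img, sigma=1.0, alpha=0.04, rel_thresh=0.01):
    """Minimal Harris corner detector following the five steps above."""
    img = img.astype(float)
    # 1. partial derivatives via Gaussian derivative kernels
    ix = gaussian_filter(img, sigma, order=(0, 1))  # d/dx (axis 1)
    iy = gaussian_filter(img, sigma, order=(1, 0))  # d/dy (axis 0)
    # 2. entries of M, averaged inside a Gaussian window
    sxx = gaussian_filter(ix * ix, 2 * sigma)
    syy = gaussian_filter(iy * iy, 2 * sigma)
    sxy = gaussian_filter(ix * iy, 2 * sigma)
    # 3. corner response R = det(M) - alpha * trace(M)^2
    r = sxx * syy - sxy ** 2 - alpha * (sxx + syy) ** 2
    # 4 + 5. threshold, then keep only local maxima (non-maximum suppression)
    corners = (r == maximum_filter(r, size=5)) & (r > rel_thresh * r.max())
    return np.argwhere(corners)

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0  # white square on black: four corners
print(harris(img))
```

On the synthetic square, the detections cluster at the four vertices; the edges produce R < 0 and are rejected by the threshold.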

8. Harris detector: robustness of corner features

8.1 Affine intensity change (e.g. illumination): I → aI + b

image-20210905153852375

  • The offset b acts like uniform illumination and does not affect R;
  • The gain a scales the response, but the classification of most corners does not change (only responses near the threshold may flip)

8.2 Image translation 图像平移

image-20210905160344357

  • Derivatives and the window function are shift-invariant; the detected corner moves together with the image
  • Corner location is covariant w.r.t. translation

8.3 Image rotation

image-20210905160623756

  • The second moment ellipse rotates, but its shape (i.e. its eigenvalues) remains the same

  • Corner location is covariant w.r.t. rotation

8.4 Scaling

image-20210905160724042

  • Scaling can cause a corner to be detected as edges

  • Corner location is not covariant w.r.t. scaling

9. Blob detection: addressing the scale problem

9.1 Laplacian of Gaussian

  • Circularly symmetric operator for blob detection in 2D

image-20210905161142396

$$\nabla^2 g=\frac{\partial^2 g}{\partial x^2}+\frac{\partial^2 g}{\partial y^2}$$

  • Edge = ripple
  • Blob = superposition of two ripples

image-20210905170618778

  • Spatial selection: the magnitude of the Laplacian response achieves a maximum at the center of the blob, provided the scale of the Laplacian is "matched" to the scale of the blob.

  • In other words, when the Laplacian kernel's scale matches the blob, the extremum of the response magnitude lies at the blob center.

  • However, the Laplacian response decays as the scale increases:

image-20210905171418159

9.2 Laplacian of Gaussian’s scale normalization

  • For the first Gaussian derivative kernel, the area enclosed between the response curve and the x-axis is $\frac{1}{\sigma\sqrt{2\pi}}$, so σ is inversely proportional to the response strength:

image-20210905171634346

  • To keep response the same (scale invariant), must multiply Gaussian derivative by σ

  • Laplacian is the second Gaussian derivative, so it must be multiplied by σ2

  • Effect of scale normalization:

image-20210905172106477

  • Scale-normalized Laplacian of Gaussian:
$$\nabla^2_{\text{norm}}\, g=\sigma^2\left(\frac{\partial^2 g}{\partial x^2}+\frac{\partial^2 g}{\partial y^2}\right)$$

9.3 Laplacian of Gaussian: characteristic scale

  • At what scale does the Laplacian achieve maximum response for a binary circle of radius r?

image-20210905172345683

  • To get the maximum response, the zeros of the Laplacian have to be aligned with the circle

image-20210905172529389

  • This way the negative part of the Laplacian is completely cancelled; note in the figure above that the response is obtained by convolving the two curves.

  • Derivation:

$$G_\sigma=\frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$

$$\frac{\partial G_\sigma}{\partial x}=-\frac{x}{\sigma^2}G_\sigma,\qquad \frac{\partial G_\sigma}{\partial y}=-\frac{y}{\sigma^2}G_\sigma$$

$$\frac{\partial^2 G_\sigma}{\partial x^2}=\frac{x^2-\sigma^2}{\sigma^4}G_\sigma,\qquad \frac{\partial^2 G_\sigma}{\partial y^2}=\frac{y^2-\sigma^2}{\sigma^4}G_\sigma$$

$$\mathrm{LoG}_{\text{norm}}=\sigma^2\left(\frac{\partial^2 G_\sigma}{\partial x^2}+\frac{\partial^2 G_\sigma}{\partial y^2}\right)=\frac{x^2+y^2-2\sigma^2}{\sigma^2}\,G_\sigma$$

  • Setting LoG = 0 on the circle x² + y² = r²:

$$x^2+y^2-2\sigma^2=0\;\Longrightarrow\;\sigma=\frac{r}{\sqrt{2}}$$
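
The σ = r/√2 result can be checked numerically (a sketch assuming SciPy's `gaussian_laplace`; the tolerance in the comment reflects grid discretization):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# binary disk of radius r, centred on a 64x64 grid
r = 10.0
y, x = np.mgrid[-32:32, -32:32]
disk = (x ** 2 + y ** 2 <= r ** 2).astype(float)

# |scale-normalized LoG| at the disk centre, over a range of scales
sigmas = np.arange(3.0, 12.0, 0.25)
responses = [abs((s ** 2 * gaussian_laplace(disk, s))[32, 32]) for s in sigmas]
best = float(sigmas[int(np.argmax(responses))])
print(best, r / np.sqrt(2))  # best scale should land near r / sqrt(2) ~ 7.07
```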

10. Scale space blob detector

  • Convolve the image with the scale-normalized Laplacian at several scales

  • Find maxima of the squared Laplacian response in scale space

image-20210905174440732

  • For each pixel, compare not only its 8 spatial neighbors but also the 2 × 9 = 18 pixels at the two adjacent scales
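
The whole procedure, including the 3 × 3 × 3 neighborhood comparison, can be sketched as follows (a minimal illustration using SciPy; the relative threshold is an arbitrary choice):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, maximum_filter

def detect_blobs(img, sigmas, rel_thresh=0.2):
    """Blob detection: squared scale-normalized LoG + 3-D non-max suppression."""
    stack = np.stack([(s ** 2 * gaussian_laplace(img.astype(float), s)) ** 2
                      for s in sigmas])
    # a peak must beat its 8 spatial neighbours at the same scale and the
    # 2 x 9 = 18 pixels at the two adjacent scales (a 3x3x3 neighbourhood)
    peaks = (stack == maximum_filter(stack, size=3)) & \
            (stack > rel_thresh * stack.max())
    return [(float(sigmas[k]), i, j) for k, i, j in np.argwhere(peaks)]

y, x = np.mgrid[-32:32, -32:32]
blob = (x ** 2 + y ** 2 <= 8 ** 2).astype(float)  # one bright blob, radius 8
print(detect_blobs(blob, np.arange(3.0, 10.0, 0.5)))
```

The strongest detection sits at the blob center with a scale close to 8/√2 ≈ 5.7, matching the previous section.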

11. Scale-invariant feature transform (SIFT)

  • Approximating the Laplacian with a difference of Gaussians:

image-20210905174726451

  • Laplacian:
$$L=\sigma^2\left(G_{xx}(x,y,\sigma)+G_{yy}(x,y,\sigma)\right)$$
  • Difference of Gaussians:

$$\mathrm{DoG}=G(x,y,k\sigma)-G(x,y,\sigma)$$

$$D(x,y,\sigma)=\left(G(x,y,k\sigma)-G(x,y,\sigma)\right)*I(x,y)=L(x,y,k\sigma)-L(x,y,\sigma)$$

  • Scale space construction
  • Since computing the DoG amounts to differencing Gaussian kernels with different σ, it is computed as in the figure below:

image-20210905175349385

  • First compute the Gaussian responses of the full 2n × 2n image; differencing adjacent responses gives the DoG. Then downsample (which shrinks the image relative to the Gaussian kernel), convolve again, and difference again to obtain the next octave's DoG stack.
  • Finally, find the extremal pixels over space and scale; these are the keypoints.
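
The quality of the DoG approximation can be sanity-checked numerically: since ∂G/∂σ = σ∇²G, for small k we have G(kσ) − G(σ) ≈ (k − 1)σ²∇²G. A sketch assuming SciPy (the test image and k are arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

rng = np.random.default_rng(0)
img = gaussian_filter(rng.random((64, 64)), 2.0)  # a smooth test image

sigma, k = 2.0, 1.2
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
log = (k - 1) * sigma ** 2 * gaussian_laplace(img, sigma)

# relative error of the approximation DoG ~ (k - 1) * sigma^2 * LoG
err = np.abs(dog - log).max() / np.abs(log).max()
print(err)
```

The constant factor (k − 1) cancels when searching for extrema, which is why SIFT can use the DoG directly in place of the scale-normalized Laplacian.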

12. Representing keypoints with descriptors


image-20210905175846479

  • For each detected keypoint, compute the gradient at every pixel of the surrounding window. Group the pixels into 4 × 4 cells of 16 pixels each, and represent each cell by an 8-dimensional vector: the first dimension accumulates the magnitudes of gradients with orientation in 0°–45°, the second those in 45°–90°, and so on. Each gradient is assigned to a dimension by quantizing its orientation: e.g. a pixel with gradient orientation 25° and magnitude 3 contributes (3, 0, 0, 0, 0, 0, 0, 0). Summing the contributions of all pixels in a cell gives the cell's 8-D histogram.
  • For a 16 × 16 patch around the keypoint, we get 16 cells with an 8-D vector each; concatenating them yields a 128-dimensional vector.
  • 16 cells * 8 orientations = 128-dimensional descriptor

image-20210905180739701

  • The 128-dim vector is normalized to unit length
  • Threshold the gradient magnitudes to avoid excessive influence of high gradients:
    • after normalization, clamp entries > 0.2
    • renormalize
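
A simplified version of this descriptor (ignoring SIFT's Gaussian weighting, trilinear interpolation, and rotation to the dominant orientation) can be sketched as:

```python
import numpy as np

def sift_descriptor(patch):
    """128-D descriptor for a 16x16 patch: 4x4 cells x 8 orientation bins."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360  # orientation in [0, 360)
    bins = (ang // 45).astype(int)              # quantize into 8 bins of 45 degrees
    desc = np.zeros((4, 4, 8))
    for i in range(16):                         # accumulate magnitudes per cell
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()                         # 16 cells x 8 bins = 128 dims
    desc /= np.linalg.norm(desc) + 1e-12        # normalize to unit length
    desc = np.minimum(desc, 0.2)                # clamp entries > 0.2 ...
    return desc / (np.linalg.norm(desc) + 1e-12)  # ... and renormalize

patch = np.tile(np.arange(16.0), (16, 1))       # ramp: every gradient points +x
d = sift_descriptor(patch)
print(d.shape)
```

On the horizontal ramp, all the gradient mass falls into orientation bin 0 of each of the 16 cells, so exactly 16 of the 128 entries are nonzero.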

13. Numeric Example

image-20210906102214828

image-20210906102225222

image-20210906102235139

14. How to achieve the matching result?

  • Given a feature in I1, how do we find the best match in I2?

    • Define a distance function that compares two descriptors
    • Test all the features in I2 and pick the one with minimum distance
  • Simple way:

    • Calculate the L2 distance ||f1 − f2||
    • But this may give good scores to ambiguous (incorrect) matches
  • Better way:

    • Calculate the ratio distance = ||f1 − f2|| / ||f1 − f2′||

      • f2 is the best match to f1 in I2

      • f2′ is the 2nd-best match to f1 in I2

      • this gives large values for ambiguous matches
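
The ratio test can be sketched as follows (a minimal illustration with NumPy; the 0.8 threshold is a common but arbitrary choice):

```python
import numpy as np

def match_ratio(desc1, desc2, ratio=0.8):
    """Keep (i, j) only when the best L2 distance is clearly below the 2nd best."""
    matches = []
    for i, f1 in enumerate(desc1):
        d = np.linalg.norm(desc2 - f1, axis=1)  # L2 distance to every f2
        j1, j2 = np.argsort(d)[:2]              # best and second-best candidates
        if d[j1] < ratio * d[j2]:               # reject ambiguous matches
            matches.append((i, int(j1)))
    return matches

rng = np.random.default_rng(0)
feats = rng.random((5, 128))
noisy = feats + 0.01 * rng.random((5, 128))     # same features, slightly perturbed
print(match_ratio(feats, noisy))                # each feature matches its twin
```

Because the perturbed twin is far closer than any other descriptor, the ratio is tiny and every pair is accepted; two near-identical candidates would instead push the ratio toward 1 and be rejected.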

image-20210906103123721

Author: Smurf
Permalink: http://example.com/2021/08/15/cv/4.1%20Image%20Feature%20Extraction/
License: this post is licensed under CC BY-NC-SA 3.0 CN