Fitting
1. Definition
- Choose a parametric model to represent a set of features
2. Challenges
Noise in the measured feature locations
Extraneous data (outliers): clutter, multiple lines interfering with each other
Missing data: occlusions
3. Overview
- If we know which points belong to the line, how do we find the “optimal” line parameters?
- Least squares
- What if there are outliers?
- Robust fitting, RANSAC
- What if there are many lines?
- Voting methods: RANSAC, Hough transform
- What if we’re not even sure it’s a line?
- Model selection: Snake (not covered)
4. Least squares line fitting
- Data: $\left(x_{1}, y_{1}\right), \ldots,\left(x_{n}, y_{n}\right)$
- Line equation: $y_{i}=m x_{i}+b$
- Find $(m, b)$ to minimize $E=\sum_{i=1}^{n}\left(y_{i}-m x_{i}-b\right)^{2}$
- This assumes the sample points are distributed evenly on both sides of the line
4.1 Least squares line fitting
- Normal equations: writing $Y = XB$ with rows $(x_i, 1)$ of $X$ and $B = (m, b)^T$, the least squares solution for $B$ satisfies $X^T X B = X^T Y$, i.e. $B = (X^T X)^{-1} X^T Y$ (see the sketch below)
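A minimal numpy sketch of this solve; the function name and the example data are illustrative, not from the lecture:

```python
import numpy as np

def fit_line_least_squares(x, y):
    """Fit y = m*x + b by solving the normal equations (X^T X) B = X^T Y."""
    X = np.column_stack([x, np.ones_like(x)])  # rows are (x_i, 1)
    m, b = np.linalg.solve(X.T @ X, X.T @ y)   # B = (m, b)
    return m, b

# Example: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, x.shape)
print(fit_line_least_squares(x, y))            # roughly (2, 1)
```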
4.2 Problem
- Not rotation-invariant
- Fails completely for vertical lines
4.3 Total least squares
- Distance between point $(x_i, y_i)$ and the line $ax+by=d$ (with $a^2+b^2=1$): $|ax_i + by_i - d|$
- Find $ (a, b, d) $ to minimize the sum of squared perpendicular distances
- Recall: for an $n \times n$ matrix $A$ and a non-zero vector $x$, if $Ax = \lambda x$, then $x$ is an eigenvector of $A$ and $\lambda$ its eigenvalue.
- Setting $\partial E / \partial d = 0$ gives $d = a\bar{x} + b\bar{y}$; substituting back, $E = \|UN\|^2$ where the rows of $U$ are $(x_i-\bar{x}, y_i-\bar{y})$ and $N = (a, b)^T$.
- Solution to $(U^TU)N=0$, s.t. $\|N\|^2=1$:
- eigenvector of $U^T U$ associated with the smallest eigenvalue, i.e., the eigenvector corresponding to the smallest eigenvalue is the solution (see the sketch below)
- $(a,b)\cdot(x_i-\bar{x},y_i-\bar{y})$ is the perpendicular (projected) distance of the $i$-th point from the line
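A hedged numpy sketch of the eigenvector solution (function name illustrative):

```python
import numpy as np

def fit_line_total_least_squares(x, y):
    """Fit a*x + b*y = d (a^2 + b^2 = 1) minimizing perpendicular distances."""
    x_bar, y_bar = x.mean(), y.mean()
    U = np.column_stack([x - x_bar, y - y_bar])  # rows (x_i - x_bar, y_i - y_bar)
    eigvals, eigvecs = np.linalg.eigh(U.T @ U)   # eigenvalues in ascending order
    a, b = eigvecs[:, 0]                         # eigenvector of smallest eigenvalue
    d = a * x_bar + b * y_bar                    # from setting dE/dd = 0
    return a, b, d
```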
4.5 Least squares as likelihood maximization
- Generative model: line points are sampled independently and corrupted by Gaussian noise in the direction perpendicular to the line
- That is, each point is sampled independently, and its distance to the line follows a Gaussian distribution
- Likelihood of points given line parameters $(a, b, d)$: $P \propto \prod_{i=1}^{n} \exp\left(-\frac{(a x_i + b y_i - d)^2}{2\sigma^2}\right)$
- Log-likelihood: $\ln P = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (a x_i + b y_i - d)^2 + \text{const}$, so maximizing the likelihood is equivalent to minimizing the total least squares error
4.6 Problem
- Squared error heavily penalizes outliers
5. Robust estimators
- General approach: find model parameters $\theta$ that minimize $\sum_{i} \rho\left(u_{i}\left(x_{i}, \theta\right); \sigma\right)$
- $u_{i}\left(x_{i}, \theta\right)$ — residual of the $i$-th point w.r.t. model parameters $\theta$; $\rho$ — robust function with scale parameter $\sigma$
- This can be understood as wrapping the original distance in a robust function
- The robust function $\rho$ behaves like squared distance for small values of the residual $u$ but saturates (levels off) for larger values of $u$
Attention:
- Robust fitting is a nonlinear optimization problem that must be solved iteratively
- Least squares solution can be used for initialization
- Scale of robust function should be chosen carefully
- $\sigma$ too small: the error value is almost the same for every point and the fit is very poor
- $\sigma$ too large: behaves much the same as least squares (a sketch of the iterative fit follows)
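One possible way to run the iterative optimization, sketched with scipy's `least_squares` and a Cauchy loss standing in for a generic robust function $\rho$; the lecture does not prescribe a specific $\rho$ or solver:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_line_robust(x, y, sigma=1.0):
    """Iterative robust line fit, initialized from the least squares solution."""
    m0, b0 = np.polyfit(x, y, 1)                 # least squares initialization
    def residuals(theta):
        m, b = theta
        return y - (m * x + b)
    # The 'cauchy' loss grows like u^2 for small residuals and saturates for
    # large ones; f_scale plays the role of the scale parameter sigma.
    fit = least_squares(residuals, x0=[m0, b0], loss='cauchy', f_scale=sigma)
    return fit.x                                  # (m, b)
```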
6. RANSAC
- Robust fitting can deal with a few outliers, but what if we have very many?
- Random sample consensus (RANSAC):
Very general framework for model fitting in the presence of outliers
- Outline
- Choose a small subset of points uniformly at random
- Fit a model to that subset
- Find all remaining points that are “close” to the model and reject the rest as outliers; record the model's inlier count
- Do this many times and choose the best model, i.e., the one whose inlier count exceeds the threshold
- Least squares fit
- Randomly select a minimal subset of points:
- Hypothesize a model
- Compute the error function (distance from the remaining points to the line)
- Select points consistent with the model (inliers within a distance threshold)
- Repeat the hypothesize-and-verify loop
- Uncontaminated sample
- The model with the most inliers is the one we keep
Steps:
- Repeat $N$ times:
- Draw $s$ points uniformly at random. Fit a line to these $s$ points
- Find inliers to this line among the remaining points (i.e., points whose distance from the line is less than $t$)
- If there are $d$ or more inliers, accept the line and refit using all inliers; if the inliers are numerous enough, the refit gives the final model
- If several models each have more than $d$ inliers, all of them can be output (a sketch of the loop follows)
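A compact sketch of this hypothesize-and-verify loop for a 2D line; the parameter values and names are illustrative:

```python
import numpy as np

def ransac_line(points, n_iters=100, t=1.0, d=20, rng=None):
    """RANSAC line fit. points: (n, 2) array; s = 2 points per sample,
    t = inlier distance threshold, d = minimum consensus set size."""
    rng = rng or np.random.default_rng()
    best_inliers = None
    for _ in range(n_iters):
        # Hypothesize: draw 2 points and form the line normal @ p = c
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        normal = np.array([p1[1] - p2[1], p2[0] - p1[0]], dtype=float)
        if not normal.any():
            continue                                  # degenerate sample
        normal /= np.linalg.norm(normal)
        c = normal @ p1
        # Verify: inliers are points within distance t of the line
        inliers = points[np.abs(points @ normal - c) < t]
        if len(inliers) >= d and (best_inliers is None
                                  or len(inliers) > len(best_inliers)):
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no model reached the consensus threshold d")
    # Refit using all inliers of the best model
    return np.polyfit(best_inliers[:, 0], best_inliers[:, 1], 1)  # (m, b)
```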
6.1 Choosing the parameters
- Initial number of points $s$
- Typically minimum number needed to fit the model.
- Distance threshold $t$
Number of iterations $N$
- Choose $N$ so that, with probability $p$, at least one random sample is free from outliers (e.g. $p=0.99$); let the outlier ratio be $e$
- That is, with confidence $p$, at least one of the $N$ samples drawn contains no outliers
- How to choose $N$? Suppose every one of the $N$ samples is contaminated; the probability of this is $\left(1-(1-e)^{s}\right)^{N} = 1-p$
- $(1-e)^s$ is the probability that all $s$ randomly sampled points are inliers, so $1-(1-e)^s$ is the probability that a single sample contains at least one outlier; solving gives $N = \frac{\log(1-p)}{\log\left(1-(1-e)^{s}\right)}$ (numeric check below)
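A quick numeric check of this formula, using the example values $p=0.99$ and a worst-case $e=0.5$ with $s=2$:

```python
import numpy as np

def num_iterations(p=0.99, e=0.5, s=2):
    """N such that, with probability p, at least one of N random
    s-point samples contains no outliers (outlier ratio e)."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - (1 - e) ** s)))

# With half the data outliers and s = 2, only 17 iterations are needed:
print(num_iterations(p=0.99, e=0.5, s=2))  # -> 17
```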
Problem:
- Outlier ratio $e$ is often unknown a priori, so pick the worst case, e.g. 50%, and adapt if more inliers are found; e.g. an 80% inlier ratio would yield $e=0.2$. In other words, compute $e$ from the best inlier ratio found so far.
Adaptive procedure:
- $N=\infty$, sample_count $= 0$ (start with infinitely many planned iterations and zero samples drawn)
- While $N >$ sample_count (continue while the planned iteration count exceeds the number of samples drawn):
- Choose a sample, fit a model, and count the number of inliers
- If the inlier ratio is the highest of any found so far, set $e = 1 - (\text{number of inliers})/(\text{total number of points})$, since a higher inlier ratio means $e$ should be smaller
- Recompute $N$ from $e$
- Increment sample_count by 1
- sample_count is the total number of fits performed; $N$ decreases as better models are found, and the loop stops once $N$ drops below the number of samples already drawn (see the sketch below)
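The same adaptive loop as a hedged Python sketch; `count_inliers` is a hypothetical callback that fits a model to the sampled indices and returns its inlier count:

```python
import numpy as np

def adaptive_ransac_iterations(n_points, count_inliers, p=0.99, s=2, rng=None):
    """Adaptive RANSAC loop: start from the worst case (N = infinity) and
    lower N whenever a better inlier ratio is observed."""
    rng = rng or np.random.default_rng()
    N, sample_count, best = np.inf, 0, 0
    while N > sample_count:
        idx = rng.choice(n_points, size=s, replace=False)
        n_inliers = count_inliers(idx)
        if n_inliers > best:                     # highest inlier ratio so far
            best = n_inliers
            e = 1 - n_inliers / n_points         # updated outlier ratio
            # Recompute N from e; guard against e == 0 (log of zero)
            denom = np.log(1 - (1 - e) ** s) if e > 0 else -np.inf
            N = np.log(1 - p) / denom
        sample_count += 1
    return sample_count
```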
Initial number of points $s$
- Typically minimum number needed to fit the model
Distance threshold t
- Choose $t$ so that the probability for an inlier is $p$ (e.g. 0.95). For zero-mean Gaussian noise with std. dev. $\sigma$: $t^2 = 3.84\sigma^2$
Number of samples N
- Choose $N$ so that, with probability $p$, at least one random sample is free from outliers (e.g. $p=0.99$) (outlier ratio: $e$)
Consensus set size $d$ (used to decide whether the model is acceptable)
- Should match expected inlier ratio
6.2 RANSAC pros and cons
- Pros
- Simple and general
- Applicable to many different problems
- Often works well in practice
- Cons
- Lots of parameters to tune
- Doesn’t work well for low inlier ratios (too many iterations, or can fail completely; with few inliers it is hard to converge)
- Can’t always get a good initialization of the model based on the minimum number of samples
- Refine by Least Squares
6.3 Plane fitting
- At least three points are needed to fit a plane, as in the sketch below
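A minimal sketch of the three-point plane hypothesis for RANSAC (numpy; the function name is illustrative):

```python
import numpy as np

def plane_from_three_points(p1, p2, p3):
    """Minimal RANSAC hypothesis: a plane n . x = d through three 3D points."""
    n = np.cross(p2 - p1, p3 - p1)       # plane normal via the cross product
    norm = np.linalg.norm(n)
    if norm == 0:
        raise ValueError("points are collinear: degenerate sample")
    n = n / norm
    d = n @ p1
    return n, d                          # inlier test: |n @ x - d| < t
```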
7. Hough Transform
7.1 Voting schemes
- Let each feature vote for all the models that are compatible with it
- Hopefully the noise features will not vote consistently for any single model
- Missing data doesn’t matter as long as there are enough features remaining to agree on a good model
7.2 An early type of voting scheme
- Discretize parameter space into bins
- For each feature point in the image, put a vote in every bin in the parameter space that could have generated this point
- Find bins that have the most votes
- Voting takes place in parameter space: every feature point casts votes there, and the bins with the most votes give the desired parameters
7.3 Parameter space representation
- A line in the image corresponds to a point in Hough space
- What does a point $(x_0, y_0)$ in the image space map to in the Hough space?
- Answer: the solutions of $y_0 = m x_0 + b$, i.e. $b = -x_0 m + y_0$
- This is a line in Hough space
- In other words, a point in the image corresponds to a line in parameter space
- Where is the line that contains both $(x_0 , y_0 ) $and $(x_1 , y_1)$?
- It is the intersection of the lines $b = -x_0 m + y_0$ and $b = -x_1 m + y_1$
Problems with the $(m, b)$ space:
- Unbounded parameter domains
- Vertical lines require infinite $m$; since $(m, b)$ is unbounded, the parameter space cannot be limited to a finite range
Alternative: polar representation, $x\cos\theta + y\sin\theta = \rho$
- Each point $(x, y)$ will add a sinusoid in the $(\theta, \rho)$ parameter space
7.4 Algorithm outline
Initialize accumulator H to all zeros
For each feature point (x, y) in the image
- For θ = 0 to 180
- $\rho = x\cos\theta + y\sin\theta$
- $H(\theta,\rho) = H(\theta,\rho) + 1$
- end
end
Find the value(s) of $(\theta, \rho)$ where $H(\theta, \rho)$ is a local maximum
For every data point we loop over $\theta$, compute the corresponding $\rho$, and cast a vote into that bin; a minimal sketch follows.
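A minimal numpy sketch of this accumulator loop; the array sizes and names are illustrative:

```python
import numpy as np

def hough_lines(edge_points, img_diag, n_theta=180):
    """Accumulate votes in the (theta, rho) space for each edge point.
    edge_points: (n, 2) array of (x, y); img_diag bounds |rho|."""
    thetas = np.deg2rad(np.arange(n_theta))             # theta = 0..179 degrees
    H = np.zeros((n_theta, 2 * img_diag + 1), dtype=int)
    for x, y in edge_points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # the point's sinusoid
        rho_idx = np.round(rho).astype(int) + img_diag  # shift rho to a valid index
        H[np.arange(n_theta), rho_idx] += 1             # one vote per theta bin
    return H, thetas  # peaks of H give the (theta, rho) of dominant lines
```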
7.5 Basic illustration
- So we can see that a line maps to a single point in Hough space
- while a single point maps to a sinusoid
- Other shapes:
- A more complicated image
- The concrete procedure is to detect edge points first and then fit using those edge points
7.6 Effect of noise
- Peak gets fuzzy and hard to locate
- Since each point maps to a sinusoid, noise prevents the sinusoids of the points from intersecting exactly at one point, making the peak hard to locate
- Number of votes for a line of 20 points with increasing noise:
- As noise increases, the number of votes falling in any single bin decreases
7.7 Random points
Uniform noise can lead to spurious peaks in the array
As the level of uniform noise increases, the maximum number of votes increases too:
- When random noise increases, the maximum vote count also increases; because the noise is uniformly distributed, it effectively raises all votes in parameter space by some proportion
7.8 Dealing with noise
- Choose a good grid / discretization (a slightly coarser grid increases error tolerance)
- Too coarse: large votes obtained when too many different lines correspond to a single bucket, because an overly coarse grid loses information from the original image
- Too fine: miss lines because some points that are not exactly collinear cast votes for different buckets; the discretization is so fine that it cannot tolerate even a little noise, so no common intersection is found
- Increment neighboring bins (smoothing in accumulator array)
- That is, when voting for a bin, also vote for its neighborhood, with the total vote still summing to 1
- Try to get rid of irrelevant features
- E.g. take only edge points with significant gradient magnitude
7.9 Incorporating image gradients
- When we detect an edge point, we also know its gradient orientation
- But this means the line is uniquely determined!
Modified Hough transform:
- For an edge point, the gradient uniquely determines the direction of its line, so $\theta$ no longer needs to be quantized over all values
- We only need the gradient angle to compute the corresponding $\rho$ and quantize that
For each edge point (x, y)
- θ = gradient orientation at (x, y)
- ρ = x cos θ + y sin θ
- H(θ, ρ) = H(θ, ρ) + 1
end
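The same idea as a sketch; `gradient_angles` is assumed to hold the gradient orientation of each edge point:

```python
import numpy as np

def hough_lines_with_gradient(edge_points, gradient_angles, img_diag):
    """Modified Hough transform: theta is fixed by the gradient orientation,
    so each edge point casts a single vote instead of one per theta bin."""
    n_theta = 180
    H = np.zeros((n_theta, 2 * img_diag + 1), dtype=int)
    for (x, y), theta in zip(edge_points, gradient_angles):
        rho = x * np.cos(theta) + y * np.sin(theta)
        t_idx = int(np.round(np.rad2deg(theta))) % n_theta
        r_idx = int(np.round(rho)) + img_diag
        H[t_idx, r_idx] += 1                            # single vote per point
    return H
```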
7.10 Hough transform for circles
- How many dimensions will the parameter space have?
- Given an unoriented edge point, what are all possible bins that it can vote for?
- What about an oriented edge point?
- To define a circle we need a center and a radius
- Any point in the image maps to a cone in the parameter space
- Given a point $(x_0, y_0)$, since we know neither the direction nor the radius, the votes form a cone passing through the point $(x_0, y_0, 0)$:
- If we know the radius, this degenerates to a circle
- A point with a known direction maps to two rays:
- that is, two rays through $(x_0, y_0)$,
- clearly because the direction perpendicular to the gradient has two sides, each determining a circle
- Since we know each edge point's gradient, we can compute the corresponding candidate circle center for each edge point
- and then vote, as in the sketch below
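A hedged sketch of the known-radius case, where each oriented edge point votes for its two candidate centers (names are illustrative):

```python
import numpy as np

def hough_circles_known_radius(edge_points, gradient_angles, r, img_shape):
    """Circle voting with known radius r: each edge point votes for the two
    candidate centers lying along its gradient direction."""
    H = np.zeros(img_shape, dtype=int)
    for (x, y), theta in zip(edge_points, gradient_angles):
        for sign in (1, -1):                 # the gradient may point in or out
            cx = int(round(x + sign * r * np.cos(theta)))
            cy = int(round(y + sign * r * np.sin(theta)))
            if 0 <= cx < img_shape[1] and 0 <= cy < img_shape[0]:
                H[cy, cx] += 1               # vote for center (cx, cy)
    return H  # peaks of H are likely circle centers
```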
7.11 Application in recognition
- We want to find a template defined by its reference point (center) and several distinct types of landmark points in a stable spatial configuration
- Learn the relation of the feature points to the center point
7.12 Hough transform: Pros and cons
- Pros
- Can deal with non-locality and occlusion
- Can detect multiple instances of a model
- Some robustness to noise: noise points unlikely to contribute consistently to any single bin
- Cons
- Complexity of search time increases exponentially with the number of model parameters
- Non-target shapes can produce spurious peaks in parameter space
- It’s hard to pick a good grid size