Smurf
文章50
标签0
分类6
Structure from motion (SFM)

Structure from motion (SFM)

Structure from motion (SFM)

Structure from motion (SFM)

1. Affine structure from motion

  • Given a set of corresponding points in two or more images, compute the camera parameters and the 3D point coordinates

  • 从图像中找一些点,再恢复到3D空间

image-20211126101000205

1.1 Recall: Corresponding point detection

  • Detect SIFT features

image-20211126101044938

  • Match features between each pair of images

image-20220201113742762

  • 何种方法可能会出现错误
  • Use RANSAC to estimate fundamental matrix between each pair
    • 不断找8个点去估计F矩阵,对于正确的F,一定可以使得这些点满足矩阵变换关系,那么当F保留内点最多时,则F正确
    • 这样估计出来的F可以用于去除一些错误的匹配
  • Keep only the matches at are “inliers” with respect to the best fundamental matrix
    • 只保留与最佳基本矩阵相关的“内联”匹配

1.2 Structure from motion

  • Given: $m$ images of $n$ fixed 3D points

  • estimate $m$ projection matrices $\mathbf{P}_{i}$ motion

  • estimate $n$​ 3D points $\mathbf{X}_{j}$​ from the $m n$​ correspondences $\mathbf{x}_{i j}$​ structure

image-20211126101549814

1.3 Orthographic Projection

image-20211126101720125

1.4 Affine cameras

  • A general affine camera combines the effects of an affine transformation of the $3 D$ space, orthographic projection, and an affine transformation of the image:
  • Affine projection is a linear mapping + translation in non-homogeneous coordinates (in Euclidean space))
    • 转换为非齐次坐标系

image-20211126102039611

image-20211126102049395

  • Given: $m$ images of $n$ fixed 3D points:
  • Problem: use the $m n$ correspondences $x_{i j}$ to estimate $m$ projection matrices $A_{i}$ and translation vectors $b_{i}$, and $n$ points $X_{j}$

  • Centering: subtract the centroid of the image points in each view

    • 对于每张图片做一个中心化处理
    • 这种变换等价于在空间上先做中心化,在进行投影
  • For simplicity, set the origin of the world coordinate system to the centroid of the 3D points
    • 以世界坐标系的原点为中心点,这样我们就可以简化表达式
  • After centering, each normalized $2 D$ point is related to the $3 D$ point $X_{j}$​ by
  • Let’s create a $2 m \times n$ data (measurement) matrix:
  • $x_{ij}$​是21,$A_i$​是2\3,$X_i$​是3*1

  • The measurement matrix $\mathbf{D}=\mathbf{M S} $​ must have rank 3 !

1.5 Factorizing the measurement matrix

  • Singular value decomposition of D:

image-20211126103159399

  • Obtaining a factorization from SVD:
    • 奇异值分解后,我们选取特征值最大的三个值所对应的$U$和$V$

image-20211126103423408

  • 因为恢复存在歧义性所以恢复出来的三维结构可能会失去深度信息,即会发生变形
    • $HS$为仿射变换,所以会发生变形
  • The decomposition is not unique. We get the same D by using any $3 \times 3$​ matrix $\mathrm{C}$​ and applying the transformations $\mathrm{M} \rightarrow \mathrm{MC}, \mathrm{S} \rightarrow \mathrm{C}^{-1} \mathrm{~S}$​
  • That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example)

image-20211126104218988

  • 一种解决方法:加入先验
  • 第二种通过灯箱实验,加入适当约束

2. Projective structure from motion

  • Given: $m$ images of $n$ fixed 3 D points
  • Problem: estimate $m$​ projection matrices $\boldsymbol{P}_{i}$​ and $n$​ 3D points $\boldsymbol{X}_{j}$​ from the $m n$​ correspondences $x_{i j} $​

image-20211126104451271

  • With no calibration info, cameras and points can only be recovered up to a $4 \times 4$​ projective transformation $\boldsymbol{H}$​​ :
    • 加入标定信息的变换

2.1 Problem

  • Factorization method (by SVD) Assume all points are visible.
    • This not true if: occlusions occur
      • 如果出现遮挡,会出现缺失点,导致一些点用不了
    • failure in establishing correspondences
      • 其次如果点的对应关系无法找到也用不了

2.2 Algebraic approach : Two-camera case

  • Compute fundamental matrix $\mathbf{F}$ between the two views From at least 8 point correspondences, compute $F$ associated to camera 1 and 2
  • Use $\mathrm{F}$​ to estimate projective cameras
  • Use these cameras to triangulate and estimate points in 3D

image-20211126105147946

3. Bundle adjustment

  • Non-linear method for refining structure and motion
  • Minimize reprojection error
  • D is the nonlinear mapping
  • Nonlinear minimization: Newton Method

image-20211126105311467

4. What is deep learning? How does it work?

  • Fully connect feedforward neural network
  • Loss function
  • Optimization of deep neural network

4.1 Perceptron

  • Mathematical model

image-20211126105835357

  • Distance from a incorrect classified point to the line

image-20211126110029692

image-20211126110044982

  • Goal: make L(w, b) as small as possible
  • Method: optimize $w$ and $b$ through gradıent descent

image-20211126110116316

  • Let’s optimize w as an example

image-20211126110205932

Question

  • Can we solve the problem below with perceptron?

image-20211126110351578

  • Goal: classifying samples to class A and class B

4.2 Idea: the Two-Layer Perceptron

  • Two-Layer Perceptron

    • For the XOR problem, draw two, instead, of one lines
  • What did it do?

image-20211126110551934

L4.3 et’s consider more general cases

  • Recall: Machine Learning ≈ Looking for a function f

  • Speech Recognition

image-20211126110802082

  • Image Recognition

image-20211126110815562

  • Playing Go

image-20211126110827467

  • Dialogue System

image-20211126110847856

4.4 Framework

image-20211126110940617

image-20211126111008090

image-20211126111047392

image-20211126111200415

4.5 Neuron

image-20211126111333640

4.6 Other activation functions

  • Rectified Linear Unit (ReLU)
  • Leaky ReLU
  • Softplus

4.7 Neural Network

image-20211126111633796

  • Network parameter 𝜃𝜃:all the weights and biases in the “neurons”

4.8 Fully Connect Feedforward Network

image-20211126111738784

image-20211126111820072

4.9 Matrix Operation

image-20211126111918831

image-20211126111954703

4.10 Output Layer as Multi-Class Classifier

image-20211126112101546

  • Softmaxlayer as the output layer
    • 为了更好的可解释性

image-20211126112236370

image-20211126112345078

4.11 Example Application

image-20211126113129050

4.12 Loss for an Example

image-20211126113240123

4.13 Total Loss

image-20211126113256440

4.14 Gradient Descent

image-20211126113342471

image-20211126113351919

image-20211126113408873

4.15 Local Minima

image-20211126113520205

  • Do we really minimize the total loss in engineering?

4.16 Shuffle the samples for each epoch

image-20211126113716067

4.17 Mini-batch Epoch

image-20211126113819013

  • Step #1: Randomly initialize network parameters

image-20211126113851854

  • Mini-batch is Faster

image-20211126114036946

4.18 Vanishing gradient and ReLU

image-20211126114221334

  • 梯度压缩,导致梯度反向传播时只能影响靠近输出的layer的参数

4.19 Rectified Linear Unit (ReLU)

image-20211126114636038

image-20211126114705978

image-20211126114720514

4.20 ReLU -variant

image-20211126114743861

4.21 Learnable activation function -Maxout

  • Learnable activation function [Ian J. Goodfellow, ICML’13]

image-20211126114855754

  • You can have more than 2 elements in a group

  • ReLu是Maxout的特俗情况

image-20211126115038207

  • 因为maxout会随着w变化,所以是可学习的激活函数

  • How many pieces depending on how many elements in a group

image-20211126115151293

  • 不同的样本会产生不同的网络架构,从而改变模式和参数都达到调整

  • Given a training data x, we know which z would be the max

image-20211126115354399

  • Given a training data x, we know which z would be the max

image-20211126115416252

  • Train this thin and linear network1x2x

  • Different thin and linear network for different examples

4.22 Hard to find optimal parameters

image-20211126115500790

4.23 In physical world……

image-20211126115521247

4.24 Review: Vanilla Gradient Descent

  • Start at position $\theta^{0}$
  • Compute gradient at $\theta^{0}$
  • Move to $\theta^{1}=\theta^{0}-\eta \nabla L\left(\theta^{0}\right)$
  • Compute gradient at $\theta^{1}$
  • Move to $\theta^{2}=\theta^{1}-\eta \nabla L\left(\theta^{1}\right)$
  • Stop until $\nabla L\left(\theta^{t}\right) \approx 0$

image-20211126115657540

4.25 Momentum

  • Start at point $\theta^{0}$​
  • Movement $\mathrm{v}^{0}=0$​
  • Compute gradient at $\theta^{0}$​
  • Movement $\mathrm{v}^{1}=\lambda \mathrm{v}^{0}-\eta \nabla L\left(\theta^{0}\right)$​
  • Move to $\theta^{1}=\theta^{0}+\mathrm{v}^{1}$​
  • Compute gradient at $\theta^{1}$​
  • Movement $\mathrm{v}^{2}=\lambda \mathrm{v}^{1}-\eta \nabla L\left(\theta^{1}\right)$​
  • Move to $\theta^{2}=\theta^{1}+\mathrm{v}^{2}$​

  • Movement not just based on gradient, but previous movement

image-20211126115820067

image-20211126115845906

4.26 Dropout

  • 回忆一下,3.8节 (多层感知机) 的图3.3描述了一个单隐藏层的多层感知机。其中输入个数为 4 ,隐藏单元个数为 5 ,且隐藏单元 $h_{i}(i=1, \ldots, 5)$ 的计算表达式为
  • 这里 $\phi$ 是激活函数, $x_{1}, \ldots, x_{4}$ 是输入,隐藏单元 $i$ 的权重参数为 $w_{1 i}, \ldots, w_{4 i}$ ,偏差参数为 $b_{i}$ 。当对该隐藏层使用丟弃法时,该层的隐藏单元将有一定概率被至弃掉。设丟弃概率为 $p$ ,那么有 $p$ 的概率 $h_{i}$ 会被 清零,有 $1-p$ 的概率 $h_{i}$ 会除以 $1-p$ 做拉伸。丟弃概率是丟弃法的超参数。具体来说,设随机变量 $\xi_{i}$ 为 0 和 1 的概率分别为 $p$ 和 $1-p_{\text {。 }}$ 使用丟弃法时我们计算新的隐藏单元 $h_{i}^{\prime}$
  • 由于 $E\left(\xi_{i}\right)=1-p$ ,因此
  • 即丟弃法不改变其输入的期望值。让我们对图3.3中的隐藏层使用丟弃法,一种可能的结果如图3.5所示,其中 $h_{2}$ 和 $h_{5}$ 被清零。这时输出值的计算不再依赖 $h_{2}$ 和 $h_{5}$ ,在反向传播时,与这两个隐藏单元相关的权重 的梯度均为 0 。由于在训练中隐藏层神经元的丢弃是随机的,即 $h_{1}, \ldots, h_{5}$ 都有可能被清零,输出层的计算无法过度依赖 $h_{1}, \ldots, h_{5}$ 中的任一个,从而在训练模型时起到正则化的作用,并可以用来应对过拟 合。在测试模型时,我们为了拿到更加确定性的结果,一般不使用丟弃法。
1
2
3
4
5
6
7
8
9
10
def dropout(X, drop_prob):
X = X.float()
assert 0 <= drop_prob <= 1
keep_prob = 1 - drop_prob
# 这种情况下把全部元素都丢弃
if keep_prob == 0:
return torch.zeros_like(X)
mask = (torch.rand(X.shape) < keep_prob).float()

return mask * X / keep_prob

image-20211126120000491

  • Each time before updating the parameters
  • Each neuron has p% to dropout
    • The structure of the network is changed.
      Thinner!
  • Using the new network for training
  • For each mini-batch, we resample the dropout neurons

image-20211126120100649

image-20211126120202155

  • Why the weights should multiply (1-p)% (dropout rate) when testing?

image-20211126160805493

  • Dropout is a kind of ensemble

image-20211126160829113

  • Train a bunch of networks with different structures

image-20211126160844987

image-20211126160914766

image-20211126160922403

本文作者:Smurf
本文链接:http://example.com/2021/08/15/cv/10.%20Structure%20from%20motion%20(SFM)/
版权声明:本文采用 CC BY-NC-SA 3.0 CN 协议进行许可