Structure from motion (SFM)
1. Affine structure from motion
Given a set of corresponding points in two or more images, compute the camera parameters and the 3D point coordinates
Find corresponding points in the images, then recover them in 3D space.
1.1 Recall: Corresponding point detection
- Detect SIFT features
- Match features between each pair of images
- Matching by features alone may produce incorrect correspondences
- Use RANSAC to estimate fundamental matrix between each pair
- Repeatedly sample 8 correspondences to estimate $F$; a correct $F$ makes the true matches satisfy the epipolar constraint, so the $F$ that retains the most inliers is taken as the correct one
- The $F$ estimated this way can then be used to remove erroneous matches
- Keep only the matches that are “inliers” with respect to the best fundamental matrix
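As a concrete illustration of this pipeline (not from the original notes), a sketch using OpenCV is shown below; the image file names, the ratio-test threshold, and the RANSAC parameters are placeholders.

```python
import cv2
import numpy as np

# Hypothetical input images; replace with real file paths.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Detect SIFT keypoints and descriptors in both views.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2. Match descriptors (brute-force matcher with Lowe's ratio test).
matcher = cv2.BFMatcher()
raw = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in raw if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 3. Estimate the fundamental matrix with RANSAC; the mask marks inliers.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)

# 4. Keep only the inlier matches.
inliers1 = pts1[mask.ravel() == 1]
inliers2 = pts2[mask.ravel() == 1]
```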
1.2 Structure from motion
Given: $m$ images of $n$ fixed 3D points
- Estimate $m$ projection matrices $\mathbf{P}_{i}$ (the motion) and $n$ 3D points $\mathbf{X}_{j}$ (the structure) from the $mn$ correspondences $\mathbf{x}_{ij}$
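Each correspondence couples one camera and one point; in homogeneous coordinates, the unknowns must satisfy the standard projection relation (up to scale):

$$\mathbf{x}_{ij} \cong \mathbf{P}_{i} \mathbf{X}_{j}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n$$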
1.3 Orthographic Projection
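The notes give only the heading here; for reference, orthographic projection keeps the $x$ and $y$ coordinates of a point (in the camera frame) and simply drops the depth:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$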
1.4 Affine cameras
- A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image
- Affine projection is a linear mapping plus a translation in non-homogeneous (Euclidean) coordinates
- Given: $m$ images of $n$ fixed 3D points:
Problem: use the $m n$ correspondences $x_{i j}$ to estimate $m$ projection matrices $A_{i}$ and translation vectors $b_{i}$, and $n$ points $X_{j}$
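Written out with the quantities just defined, the affine projection of point $\mathbf{X}_{j}$ into view $i$ is

$$\mathbf{x}_{ij} = \mathbf{A}_{i} \mathbf{X}_{j} + \mathbf{b}_{i}, \qquad \mathbf{A}_{i} \in \mathbb{R}^{2 \times 3}, \quad \mathbf{b}_{i} \in \mathbb{R}^{2}$$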
Centering: subtract the centroid of the image points in each view
- For each image, center the image points (subtract their centroid)
- This transformation is equivalent to centering the 3D points first and then projecting
- For simplicity, set the origin of the world coordinate system to the centroid of the 3D points
- Setting the origin of the world coordinate system at the centroid of the 3D points simplifies the expressions
- After centering, each normalized 2D point is related to the 3D point $\mathbf{X}_{j}$ by $\hat{\mathbf{x}}_{ij} = \mathbf{A}_{i} \mathbf{X}_{j}$ (the translation $\mathbf{b}_{i}$ cancels)
- Let’s create a $2 m \times n$ data (measurement) matrix:
- Dimensions: each $\hat{\mathbf{x}}_{ij}$ is $2 \times 1$, each $\mathbf{A}_{i}$ is $2 \times 3$, and each $\mathbf{X}_{j}$ is $3 \times 1$
The measurement matrix $\mathbf{D} = \mathbf{M}\mathbf{S}$ must have rank 3!
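Concretely (writing out the standard factorization for reference), stacking the centered observations gives

$$\mathbf{D} = \begin{pmatrix} \hat{\mathbf{x}}_{11} & \cdots & \hat{\mathbf{x}}_{1n} \\ \vdots & & \vdots \\ \hat{\mathbf{x}}_{m1} & \cdots & \hat{\mathbf{x}}_{mn} \end{pmatrix} = \underbrace{\begin{pmatrix} \mathbf{A}_{1} \\ \vdots \\ \mathbf{A}_{m} \end{pmatrix}}_{\mathbf{M}\ (2m \times 3)} \underbrace{\begin{pmatrix} \mathbf{X}_{1} & \cdots & \mathbf{X}_{n} \end{pmatrix}}_{\mathbf{S}\ (3 \times n)}$$

so $\mathbf{D}$ is the product of a $2m \times 3$ and a $3 \times n$ matrix, which is why its rank is at most 3.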
1.5 Factorizing the measurement matrix
- Singular value decomposition of D: $\mathbf{D} = \mathbf{U}\mathbf{W}\mathbf{V}^{T}$
- Obtaining a factorization from SVD:
- After the singular value decomposition, we keep the columns of $U$ and $V$ associated with the three largest singular values
- Because the reconstruction is ambiguous, the recovered 3D structure may lose true depth information, i.e., it can appear deformed
- The recovered structure is only determined up to an affine transformation ($\mathbf{H}\mathbf{S}$), which is why such deformation can occur
- The decomposition is not unique. We get the same $\mathbf{D}$ by using any invertible $3 \times 3$ matrix $\mathbf{C}$ and applying the transformations $\mathbf{M} \rightarrow \mathbf{M}\mathbf{C}$, $\mathbf{S} \rightarrow \mathbf{C}^{-1}\mathbf{S}$
- That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example)
- One remedy: add prior knowledge
- A second remedy: impose suitable additional constraints obtained experimentally (e.g., the Euclidean constraints mentioned above)
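A minimal NumPy sketch of this factorization pipeline (assuming a complete $2m \times n$ measurement matrix with no missing observations; the function and variable names are my own):

```python
import numpy as np

def affine_sfm(D):
    """Factor a 2m x n measurement matrix D into motion M (2m x 3)
    and structure S (3 x n), up to an affine ambiguity."""
    # Center the measurements: subtract each row's mean (the image centroids).
    D_centered = D - D.mean(axis=1, keepdims=True)

    # Singular value decomposition of the centered measurement matrix.
    U, s, Vt = np.linalg.svd(D_centered, full_matrices=False)

    # Keep the three largest singular values (D has rank 3 for noise-free data).
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]

    # One possible split of the singular values between motion and structure.
    M = U3 @ np.diag(np.sqrt(s3))      # stacked affine cameras A_i
    S = np.diag(np.sqrt(s3)) @ Vt3     # 3D points X_j
    return M, S

# Any invertible 3x3 matrix C gives the same D via M C and C^{-1} S,
# so M and S are only determined up to an affine transformation.
```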
2. Projective structure from motion
- Given: $m$ images of $n$ fixed 3D points
- Problem: estimate $m$ projection matrices $\boldsymbol{P}_{i}$ and $n$ 3D points $\boldsymbol{X}_{j}$ from the $m n$ correspondences $x_{i j} $
- With no calibration info, cameras and points can only be recovered up to a $4 \times 4$ projective transformation $\boldsymbol{H}$ :
- Adding calibration information further constrains this transformation, reducing the ambiguity
2.1 Problem
- The factorization method (by SVD) assumes that all points are visible in all images.
- This is not true if occlusions occur
- If occlusions occur, some measurements are missing, so those points cannot be used
- It is also not true if correspondences fail to be established
- Points whose correspondences cannot be found likewise cannot be used
2.2 Algebraic approach : Two-camera case
- Compute the fundamental matrix $\mathbf{F}$ between the two views: from at least 8 point correspondences, estimate the $\mathbf{F}$ relating camera 1 and camera 2
- Use $\mathbf{F}$ to estimate projective cameras
- Use these cameras to triangulate and estimate points in 3D
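A NumPy sketch of this recipe, using the common textbook construction $\mathbf{P}_1 = [\,\mathbf{I} \mid \mathbf{0}\,]$, $\mathbf{P}_2 = [\,[\mathbf{e}']_{\times}\mathbf{F} \mid \mathbf{e}'\,]$ and linear (DLT) triangulation; the exact construction used in the lecture may differ.

```python
import numpy as np

def skew(t):
    """Return the 3x3 skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def cameras_from_F(F):
    """Canonical projective cameras consistent with a fundamental matrix F."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    # Epipole in the second image: the null vector of F^T (i.e. e'^T F = 0).
    _, _, Vt = np.linalg.svd(F.T)
    e2 = Vt[-1]
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence (x1, x2 in image coords)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]        # homogeneous 3D point, scaled so the last entry is 1
```

The reconstruction obtained this way is, as stated above, only defined up to a projective transformation.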
3. Bundle adjustment
- Non-linear method for refining structure and motion
- Minimize reprojection error
- The reprojection error is a nonlinear function of the cameras and points (the mapping $D$ from parameters to reprojections is nonlinear)
- Nonlinear minimization, e.g., Newton's method
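As an illustration only, here is a toy refinement step built on `scipy.optimize.least_squares` (a trust-region / Levenberg-Marquardt style least-squares solver rather than a plain Newton method). The affine camera parameterization and the packing of all cameras and points into one vector are my own simplifications.

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, m, n, observations):
    """Residuals x_ij - (A_i X_j + b_i) for all observed points.

    params packs m affine cameras (A_i: 2x3, b_i: 2) followed by n 3D points.
    observations is a list of (i, j, x) with x a length-2 image point.
    """
    cam = params[:m * 8].reshape(m, 8)     # per camera: 6 numbers for A_i, 2 for b_i
    pts = params[m * 8:].reshape(n, 3)
    res = []
    for i, j, x in observations:
        A = cam[i, :6].reshape(2, 3)
        b = cam[i, 6:]
        res.extend(x - (A @ pts[j] + b))
    return np.asarray(res)

def bundle_adjust(cam0, pts0, observations):
    """Refine cameras and points by minimizing the total squared reprojection error."""
    m, n = len(cam0), len(pts0)
    x0 = np.concatenate([np.asarray(cam0).ravel(), np.asarray(pts0).ravel()])
    result = least_squares(reprojection_residuals, x0, args=(m, n, observations))
    cam = result.x[:m * 8].reshape(m, 8)
    pts = result.x[m * 8:].reshape(n, 3)
    return cam, pts
```

Because only observed $(i, j)$ pairs contribute residuals, this formulation also copes with missing points (occlusions), unlike the SVD factorization above.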
4. What is deep learning? How does it work?
- Fully connected feedforward neural network
- Loss function
- Optimization of deep neural network
4.1 Perceptron
- Mathematical model
- Distance from an incorrectly classified point to the decision line
- Goal: make $L(w, b)$ as small as possible
- Method: optimize $w$ and $b$ through gradient descent
- Let’s optimize w as an example
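A minimal NumPy sketch of this update (the learning rate, epoch count, and toy data are placeholders; the loss is the usual perceptron criterion over misclassified points):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Perceptron trained by gradient descent on misclassified points.

    X: (N, d) inputs, y: (N,) labels in {-1, +1}.
    L(w, b) = -sum over misclassified points of y * (w . x + b);
    its gradient w.r.t. w is -y * x and w.r.t. b is -y.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:      # misclassified (or on the line)
                w += lr * yi * xi           # w <- w - lr * dL/dw
                b += lr * yi                # b <- b - lr * dL/db
    return w, b

# Example: a linearly separable toy problem (class A vs. class B).
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
```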
Question
- Can we solve the problem below with perceptron?
- Goal: classifying samples to class A and class B
4.2 Idea: the Two-Layer Perceptron
- For the XOR problem, draw two lines instead of one
What did it do?
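What it does can be made concrete with hand-picked weights (my own choice, not from the notes): each hidden unit realizes one of the two lines, and the output unit combines them to produce XOR.

```python
import numpy as np

def step(z):
    """Hard-threshold activation of the classic perceptron."""
    return float(z > 0)

def xor_two_layer(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: two lines in the input plane.
    h1 = step(x @ np.array([1.0, 1.0]) - 0.5)   # fires when x1 + x2 > 0.5  (OR-like)
    h2 = step(x @ np.array([1.0, 1.0]) - 1.5)   # fires when x1 + x2 > 1.5  (AND-like)
    # Output layer: "OR but not AND" gives XOR.
    return step(np.array([h1, h2]) @ np.array([1.0, -2.0]) - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_two_layer(a, b)))   # prints 0, 1, 1, 0
```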
L4.3 et’s consider more general cases
Recall: Machine Learning ≈ Looking for a function f
- Speech Recognition
- Image Recognition
- Playing Go
- Dialogue System
4.4 Framework
4.5 Neuron
4.6 Other activation functions
- Rectified Linear Unit (ReLU)
- Leaky ReLU
- Softplus
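For reference, these three activations are:

$$\mathrm{ReLU}(z) = \max(0, z), \qquad \mathrm{LeakyReLU}(z) = \max(\alpha z,\, z)\ \ (\text{small } \alpha,\ \text{e.g. } 0.01), \qquad \mathrm{Softplus}(z) = \ln(1 + e^{z})$$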
4.7 Neural Network
- Network parameters $\theta$: all the weights and biases in the “neurons”
4.8 Fully Connect Feedforward Network
4.9 Matrix Operation
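The slide content is not reproduced in the notes; the matrix form of a fully connected feedforward pass is the standard one, stated here for completeness. With weight matrix $W^{\ell}$ and bias $b^{\ell}$ for layer $\ell$:

$$a^{1} = \sigma(W^{1} x + b^{1}), \qquad a^{\ell} = \sigma(W^{\ell} a^{\ell-1} + b^{\ell}) \ \text{ for } \ell = 2, \ldots, L$$

so the whole network is the composed function $y = f(x) = \sigma\big(W^{L} \cdots \sigma(W^{2}\, \sigma(W^{1} x + b^{1}) + b^{2}) \cdots + b^{L}\big)$, with the last $\sigma$ typically replaced by softmax for classification, as in the next subsection.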
4.10 Output Layer as Multi-Class Classifier
- Softmax layer as the output layer
- Motivation: better interpretability of the outputs
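For reference, the softmax layer turns the raw outputs (logits) $z_{1}, \ldots, z_{K}$ into a probability distribution over the $K$ classes, which is what makes the outputs easy to interpret:

$$y_{k} = \frac{e^{z_{k}}}{\sum_{k'=1}^{K} e^{z_{k'}}}, \qquad y_{k} > 0, \quad \sum_{k=1}^{K} y_{k} = 1$$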
4.11 Example Application
4.12 Loss for an Example
4.13 Total Loss
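The slide content is not reproduced here; the usual definition is that the total loss sums the per-example losses, and this is the quantity gradient descent minimizes:

$$L(\theta) = \sum_{n=1}^{N} l^{n}(\theta)$$

where $l^{n}$ is the loss on training example $n$ (e.g., the cross-entropy between the network output and the target).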
4.14 Gradient Descent
4.15 Local Minima
- Do we really minimize the total loss in practice?
4.16 Shuffle the samples for each epoch
4.17 Mini-batch Epoch
- Step #1: Randomly initialize network parameters
- Mini-batch is Faster
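A schematic training loop showing where the per-epoch shuffle and the mini-batch updates fit; `grad_fn` and the flat parameter vector are placeholders for whatever network is being trained.

```python
import numpy as np

def sgd_train(params, X, y, grad_fn, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch gradient descent.

    grad_fn(params, X_batch, y_batch) is assumed to return the gradient of the
    average loss on the batch; params is a flat parameter vector.
    """
    N = len(X)
    for epoch in range(epochs):
        order = np.random.permutation(N)           # shuffle the samples each epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]  # one mini-batch
            g = grad_fn(params, X[idx], y[idx])
            params = params - lr * g               # update after every mini-batch
    return params
```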
4.18 Vanishing gradient and ReLU
- The gradient shrinks as it propagates backward, so backpropagation mainly updates the parameters of the layers close to the output
4.19 Rectified Linear Unit (ReLU)
4.20 ReLU variants
4.21 Learnable activation function - Maxout
- Learnable activation function [Ian J. Goodfellow, ICML’13]
You can have more than 2 elements in a group
- ReLU is a special case of Maxout
- Because the maxout output changes with the weights $w$, it is a learnable activation function
- How many linear pieces there are depends on how many elements are in a group
- Different samples activate different pieces, giving a different thin, linear sub-network for each example, so both the effective structure and the parameters adapt
- Given a training example $x$, we know which $z$ would be the max
- Train this thin and linear network
- Different thin and linear networks are used for different examples
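A NumPy sketch of a maxout layer with groups of $k$ linear pieces (the shapes and names are my own; the final comment notes the ReLU special case mentioned above):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation.

    x: (d,) input; W: (k, m, d) weights and b: (k, m) biases for k linear
    pieces per output unit. Each of the m outputs is the max over its k pieces.
    """
    z = np.einsum('kmd,d->km', W, x) + b   # z[k, m] = w_km . x + b_km
    return z.max(axis=0)                   # element-wise max over each group

# With k = 2 and one piece fixed to zero weights and bias,
# max(w.x + b, 0) is recovered, i.e. ReLU is the special case noted above.
d, m, k = 4, 3, 2
x = np.random.randn(d)
W = np.random.randn(k, m, d)
b = np.random.randn(k, m)
print(maxout(x, W, b))
```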
4.22 Hard to find optimal parameters
4.23 In the physical world…
4.24 Review: Vanilla Gradient Descent
- Start at position $\theta^{0}$
- Compute gradient at $\theta^{0}$
- Move to $\theta^{1}=\theta^{0}-\eta \nabla L\left(\theta^{0}\right)$
- Compute gradient at $\theta^{1}$
- Move to $\theta^{2}=\theta^{1}-\eta \nabla L\left(\theta^{1}\right)$
- Stop when $\nabla L\left(\theta^{t}\right) \approx 0$
4.25 Momentum
- Start at point $\theta^{0}$
- Movement $\mathrm{v}^{0}=0$
- Compute gradient at $\theta^{0}$
- Movement $\mathrm{v}^{1}=\lambda \mathrm{v}^{0}-\eta \nabla L\left(\theta^{0}\right)$
- Move to $\theta^{1}=\theta^{0}+\mathrm{v}^{1}$
- Compute gradient at $\theta^{1}$
- Movement $\mathrm{v}^{2}=\lambda \mathrm{v}^{1}-\eta \nabla L\left(\theta^{1}\right)$
- Move to $\theta^{2}=\theta^{1}+\mathrm{v}^{2}$
- Movement is based not only on the current gradient, but also on the previous movement
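Both update rules in one NumPy sketch (`grad` is a placeholder for $\nabla L$; setting `lam = 0` recovers the vanilla rule above):

```python
import numpy as np

def gradient_descent_momentum(theta, grad, lr=0.1, lam=0.9, steps=100, tol=1e-8):
    """Gradient descent with momentum: v <- lam * v - lr * grad(theta); theta <- theta + v."""
    v = np.zeros_like(theta)              # movement v^0 = 0
    for _ in range(steps):
        g = grad(theta)
        if np.linalg.norm(g) < tol:       # stop when the gradient is ~ 0
            break
        v = lam * v - lr * g              # previous movement plus gradient step
        theta = theta + v
    return theta

# Example: minimize L(theta) = theta_0^2 + 10 * theta_1^2.
grad = lambda t: np.array([2.0 * t[0], 20.0 * t[1]])
print(gradient_descent_momentum(np.array([1.0, 1.0]), grad))
```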
4.26 Dropout
- Recall that Figure 3.3 in Section 3.8 (Multilayer Perceptron) describes a multilayer perceptron with a single hidden layer: 4 inputs and 5 hidden units, where hidden unit $h_{i}$ ($i=1, \ldots, 5$) is computed as $h_{i} = \phi(x_{1} w_{1 i} + x_{2} w_{2 i} + x_{3} w_{3 i} + x_{4} w_{4 i} + b_{i})$.
- Here $\phi$ is the activation function, $x_{1}, \ldots, x_{4}$ are the inputs, the weights of hidden unit $i$ are $w_{1 i}, \ldots, w_{4 i}$, and its bias is $b_{i}$. When dropout is applied to this hidden layer, each hidden unit has some probability of being dropped. Let the dropout probability be $p$: with probability $p$, $h_{i}$ is set to zero, and with probability $1-p$, $h_{i}$ is divided by $1-p$ (scaled up). The dropout probability is the hyperparameter of dropout. Concretely, let the random variable $\xi_{i}$ take the values 0 and 1 with probabilities $p$ and $1-p$, respectively. With dropout, the new hidden unit is $h_{i}^{\prime} = \frac{\xi_{i}}{1-p} h_{i}$.
- Since $E\left(\xi_{i}\right)=1-p$, it follows that $E\left(h_{i}^{\prime}\right) = \frac{E\left(\xi_{i}\right)}{1-p} h_{i} = h_{i}$.
- That is, dropout does not change the expected value of its input. Applying dropout to the hidden layer in Figure 3.3, one possible outcome is shown in Figure 3.5, where $h_{2}$ and $h_{5}$ are zeroed out. The output then no longer depends on $h_{2}$ and $h_{5}$, and during backpropagation the gradients of the weights attached to these two hidden units are zero. Because which hidden units are dropped during training is random, i.e., any of $h_{1}, \ldots, h_{5}$ may be zeroed, the output layer cannot rely too heavily on any single one of $h_{1}, \ldots, h_{5}$. This acts as regularization during training and helps combat overfitting. At test time, to obtain more deterministic results, dropout is generally not applied.
- The dropout operation can be implemented as a function `dropout(X, drop_prob)`; a sketch follows below.
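A sketch of how that function can be completed, written in NumPy since the original framework is not shown (this is the "inverted dropout" described above: zero with probability `drop_prob`, rescale the survivors by $1/(1-p)$):

```python
import numpy as np

def dropout(X, drop_prob):
    """Zero each element of X with probability drop_prob and rescale the rest by 1/(1 - drop_prob)."""
    assert 0 <= drop_prob <= 1
    X = np.asarray(X, dtype=float)
    keep_prob = 1 - drop_prob
    if keep_prob == 0:                    # everything is dropped
        return np.zeros_like(X)
    mask = (np.random.rand(*X.shape) < keep_prob).astype(float)
    return mask * X / keep_prob           # expected value equals X, as derived above

# At test time the function is simply not applied (equivalently, drop_prob = 0).
```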
- Each time before updating the parameters
- Each neuron has probability $p\%$ of being dropped out
- The structure of the network is changed.
Thinner!
- Using the new network for training
- For each mini-batch, we resample the dropout neurons
- Why should the weights be multiplied by $(1-p)\%$ at test time, where $p\%$ is the dropout rate?
- Dropout is a kind of ensemble
- Train a bunch of networks with different structures