Semi-Supervised Learning
1. What is semi-supervised learning?
- Humans learn in a semi-supervised way
1.1 Why does semi-supervised learning help?
- The distribution of the unlabeled data tells us something about the structure of the problem.
1.2 Low-density Separation Assumption
- We want the gap between the separated classes to be as large as possible, i.e., the decision boundary should pass through a low-density region.
Given: labelled data set $=\left\{\left(x^{r}, \hat{y}^{r}\right)\right\}_{r=1}^{R}$, unlabeled data set $=\left\{x^{u}\right\}_{u=1}^{U}$
- Self-training: repeat
  - Train a model from the labelled data
  - Apply it to the unlabeled data to obtain pseudo-labels
  - Move a confident subset of the pseudo-labelled examples into the labelled set
- Hard label vs. soft label
  - When using a neural network with parameters $\theta^{*}$ trained from the labelled data, a soft label just reproduces the network's current output and therefore has no effect on training, so hard labels should be used (a sketch of the loop follows below).
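A minimal sketch of this self-training loop, assuming a scikit-learn-style classifier with `fit`/`predict_proba`; the confidence threshold `tau` and round limit are hypothetical choices:

```python
import numpy as np

def self_train(model, X_lab, y_lab, X_unl, tau=0.95, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled points with hard labels."""
    for _ in range(max_rounds):
        model.fit(X_lab, y_lab)                 # train on the current labelled set
        if len(X_unl) == 0:
            break
        proba = model.predict_proba(X_unl)      # soft predictions on unlabeled data
        conf, hard = proba.max(axis=1), proba.argmax(axis=1)
        keep = conf >= tau                      # move only confident examples
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unl[keep]])
        y_lab = np.concatenate([y_lab, hard[keep]])
        X_unl = X_unl[~keep]
    return model
```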
1.3 Entropy-based Regularization
- We want the classifier's decisions to be black-and-white (confident).
- Entropy tells us how concentrated a prediction is: for unlabeled data we want the output distribution to be as concentrated as possible, i.e. minimize $E(y^{u})=-\sum_{m} y_{m}^{u} \ln y_{m}^{u}$ alongside the supervised loss (see the sketch below).
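A hedged sketch of this entropy regularizer in PyTorch: supervised cross-entropy plus $\lambda$ times the prediction entropy on unlabeled data; the weight `lam` is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_unl, lam=0.1):
    # Supervised term: standard cross-entropy on labelled data
    ce = F.cross_entropy(model(x_lab), y_lab)
    # Unsupervised term: entropy of predictions on unlabeled data;
    # low entropy = concentrated ("black-and-white") predictions
    p = F.softmax(model(x_unl), dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1).mean()
    return ce + lam * entropy
```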
1.4 Smoothness Assumption
- Assumption: “similar” $x$ has the same $\hat{y}$
- Similar data points should have similar labels.
- More precisely:
- $x$ is not uniformly distributed.
- If $x^{1}$ and $x^{2}$ are close in a high density region, $\hat{y}^{1}$ and $\hat{y}^{2}$ are the same.
- Connected by a high density path
- We can insert many intermediate examples of the digit "2" into the training set, so that the leftmost "2" connects to the rightmost "2" through a high-density path.
- Example: classify astronomy vs. travel articles.
- If a connected high-density region links two documents, they can be classified into the same class.
1.5 Graph-based Approach
- How do we know whether $x^{1}$ and $x^{2}$ are connected by a high-density path?
Define the similarity $s\left(x^{i}, x^{j}\right)$ between $x^{i}$ and $x^{j}$
Add edges:
- K nearest neighbours
- $\varepsilon$-neighborhood
- Edge weight is proportional to $s\left(x^{i}, x^{j}\right)$; a common choice is the Gaussian radial basis function $s\left(x^{i}, x^{j}\right)=\exp \left(-\gamma\left\|x^{i}-x^{j}\right\|^{2}\right)$
- The labelled data influence their neighbours, and the influence propagates through the graph.
  - Labels propagate along paths in the graph.
  - This is not always effective: it requires enough data for the graph to be well connected.
- Define the smoothness of the labels on the graph:
  $S=\frac{1}{2} \sum_{i, j} w_{i, j}\left(y^{i}-y^{j}\right)^{2}=\mathbf{y}^{T} L \mathbf{y}$
  - $w_{i, j}$ is the similarity in feature space; the smaller $S$ is, the smoother the labelling.
  - $\mathbf{y}$: an $(R+U)$-dim vector over both labelled and unlabeled points. During label propagation the labels are initialized first: the $R$ entries carry the given labels and the $U$ entries start unlabeled.
  - $L$: the $(R+U) \times(R+U)$ graph Laplacian, $L=D-W$, where $D$ places the row sums of $W$ on its diagonal.
  - The smoothness term can be added at any layer of a deep network: the forward pass returns each layer's output so the penalty can be computed on that layer's representation as well (see the sketch below).
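A small numpy sketch of these definitions, assuming a kNN graph with Gaussian RBF weights; `k` and `gamma` are illustrative values.

```python
import numpy as np

def graph_smoothness(X, y, k=5, gamma=1.0):
    """Build a kNN graph with RBF weights and return S = y^T L y."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # k nearest neighbours (skip self)
        W[i, nbrs] = np.exp(-gamma * d2[i, nbrs])        # s(x^i, x^j) = exp(-gamma ||.||^2)
    W = np.maximum(W, W.T)                               # symmetrize the graph
    D = np.diag(W.sum(axis=1))                           # row sums on the diagonal
    L = D - W                                            # graph Laplacian
    return y @ L @ y                                     # smaller = smoother labelling
```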
2. Unsupervised Neural Network
2.1 Recall: Unsupervised learning
- Data: just $x$, no labels!
- Goal: learn some underlying hidden structure of the data
- Examples: clustering, dimensionality reduction, density estimation, etc.
- K-means clustering
2.2 Auto-encoder
- We want the encoder to automatically distill the input into a compact code from which the decoder can reconstruct the input.
- Output of the hidden layer is the code
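A minimal sketch of such an auto-encoder in PyTorch; the layer sizes (784-dimensional input, 32-dimensional code) are assumed, MNIST-style.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Fully connected auto-encoder; the bottleneck output is the code."""
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, code_dim),            # hidden-layer output = the code
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code
```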
2.3 Deep Auto-encoder
- Deeper networks learn more expressive and more discriminative representations.
- De-noising auto-encoder
  - We want a noise-corrupted image to be reconstructed as the clean image, i.e., the auto-encoder learns to remove noise on its own (a sketch follows below).
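One hedged training step for a de-noising auto-encoder, reusing the `AutoEncoder` sketch above: corrupt the input with Gaussian noise but reconstruct the clean image; the noise level 0.3 is an assumed choice.

```python
import torch

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                        # stand-in for a clean mini-batch
x_noisy = x + 0.3 * torch.randn_like(x)        # corrupted input
recon, _ = model(x_noisy)
loss = torch.nn.functional.mse_loss(recon, x)  # target is the clean image
opt.zero_grad(); loss.backward(); opt.step()
```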
2.4 Auto-encoder – Text Retrieval
- Documents about the same topic will have similar codes.
2.5 Auto-encoder – Similar Image Search
2.6 Auto-encoder for CNN
2.7 CNN – Unpooling
2.8 CNN – Deconvolution
- Greedy Layer-wise Pre-training
  - Train one layer at a time; after a layer is trained, freeze its parameters.
  - Finally, fine-tune the whole network end-to-end (a sketch follows below).
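A hedged sketch of greedy layer-wise pre-training: each encoder layer is trained as a shallow auto-encoder with a throwaway decoder, then frozen; the helper names and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

def pretrain_layer(layer, decoder, batches, epochs=5, lr=1e-3):
    """Train one (encoder layer, throwaway decoder) pair to reconstruct its input."""
    params = list(layer.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x in batches:
            recon = decoder(torch.relu(layer(x)))
            loss = nn.functional.mse_loss(recon, x)
            opt.zero_grad(); loss.backward(); opt.step()
    for p in layer.parameters():   # freeze after pre-training
        p.requires_grad = False

# Hypothetical usage: pre-train layer1 on raw inputs, then layer2 on the
# frozen layer1 codes; finally unfreeze everything and fine-tune.
layer1, dec1 = nn.Linear(784, 256), nn.Linear(256, 784)
batches = [torch.rand(64, 784) for _ in range(10)]    # stand-in data
pretrain_layer(layer1, dec1, batches)
codes = [torch.relu(layer1(x)).detach() for x in batches]
layer2, dec2 = nn.Linear(256, 64), nn.Linear(64, 256)
pretrain_layer(layer2, dec2, codes)
```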
2.9 Why VAE (Variational Auto-Encoders)?
- Can we generate new images by interpolating between a plain auto-encoder's codes and sampling there?
  - Generally not.
  - But we would like linear interpolation between codes to yield meaningful new images.
- The VAE turns the deterministic code vector into a distribution, i.e., it adds noise to the code.
  - $e$ is a sample drawn from a standard Gaussian and $\sigma$ sets the standard deviation of the noise, giving a code $c=m+\exp (\sigma) \odot e$ (a sketch follows below).
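A hedged sketch of this reparameterization step in PyTorch, assuming the network outputs $\sigma$ as a log standard deviation; the Gaussian KL regularizer shown is the standard closed form.

```python
import torch
import torch.nn as nn

class VAEBottleneck(nn.Module):
    """Turn a deterministic code into a distribution via reparameterization."""
    def __init__(self, hid_dim=256, code_dim=32):
        super().__init__()
        self.to_mean = nn.Linear(hid_dim, code_dim)     # m
        self.to_logstd = nn.Linear(hid_dim, code_dim)   # sigma (log std)

    def forward(self, h):
        m, log_std = self.to_mean(h), self.to_logstd(h)
        e = torch.randn_like(m)                 # e ~ N(0, I)
        c = m + torch.exp(log_std) * e          # noisy code c = m + exp(sigma) * e
        # KL(N(m, exp(sigma)^2) || N(0, 1)), summed over dimensions
        kl = 0.5 * (torch.exp(2 * log_std) + m ** 2 - 1 - 2 * log_std).sum()
        return c, kl
```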
2.10 Pokémon Creation
- In the learned code space, the vertical dimension controls the Pokémon's size and the horizontal dimension controls its orientation.
2.11 Problems of VAE
- It does not really try to simulate real images
- Because the reconstruction loss is pixel-level, an output only needs to be close pixel-by-pixel; it does not have to look realistic.
3. Generative Adversarial Network (GAN)
3.1 Basic Idea of GAN
- The data we want to generate has a distribution $P_{\text {data }}(x)$
- A generator G is a network. The network defines a probability distribution.
- The generator does not explicitly model the original data distribution.
3.2 Generative adversarial networks
- Train two networks with opposing objectives:
- Generator: learns to generate samples
- Discriminator: learns to distinguish between generated and real samples
- The two networks play a game against each other, and both improve over time.
3.3 Evolution
- Generator
- Each dimension of the input vector determines some feature of the generated image.
- Discriminator
3.4 The evolution of generation
- Fix one network while updating the other, alternating between the two.
- The discriminator $D(x)$ should output the probability that the sample $x$ is real
- That is, we want $D(x)$ to be close to 1 for real data and close to 0 for fake
- Expected conditional log-likelihood for real and generated data:
  $V(D, G)=\mathbb{E}_{x \sim p_{\text {data }}}[\log D(x)]+\mathbb{E}_{z \sim p(z)}[\log (1-D(G(z)))]$
  - The discriminator wants to tell real from fake, so it maximizes this quantity.
  - The generator wants the opposite: to make it as small as possible.
- We seed the generator with noise $z$ drawn from a simple distribution $p$ (Gaussian or uniform)
3.5 GAN objective
- The discriminator wants to correctly distinguish real and fake samples: $\max _{D} V(D, G)$
- The generator wants to fool the discriminator: $\min _{G} V(D, G)$
- Train the generator and discriminator jointly in a minimax game:
  $\min _{G} \max _{D} \mathbb{E}_{x \sim p_{\text {data }}}[\log D(x)]+\mathbb{E}_{z \sim p(z)}[\log (1-D(G(z)))]$
3.6 Training algorithm in practice
- Repeat until happy with the results:
  - Update discriminator: repeat for $k$ steps:
    - Sample a mini-batch of noise samples $z_{1}, \ldots, z_{m}$ and a mini-batch of real samples $x_{1}, \ldots, x_{m}$
    - Update the parameters of $D$ by stochastic gradient ascent on $\frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(x_{i}\right)+\log \left(1-D\left(G\left(z_{i}\right)\right)\right)\right]$, pushing $D\left(x_{\text {data }}\right)$ close to 1 and $D(G(z))$ close to 0
  - Update generator:
    - Sample a mini-batch of noise samples $z_{1}, \ldots, z_{m}$
    - Update the parameters of $G$ by stochastic gradient ascent on $\frac{1}{m} \sum_{i=1}^{m} \log D\left(G\left(z_{i}\right)\right)$ (the non-saturating objective)
- A sketch of this loop follows below.
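A hedged PyTorch sketch of the alternating updates above; the MLP architectures, batch size, and learning rates are assumed, and the BCE losses implement the ascent objectives as equivalent descent problems.

```python
import torch
import torch.nn as nn

z_dim, x_dim, m = 64, 784, 128
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(x_real, k=1):
    for _ in range(k):                               # k discriminator steps
        z = torch.randn(m, z_dim)
        d_real, d_fake = D(x_real), D(G(z).detach())
        # ascent on log D(x) + log(1 - D(G(z))) == descent on this BCE loss
        loss_d = bce(d_real, torch.ones_like(d_real)) + \
                 bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    z = torch.randn(m, z_dim)                        # generator step
    d_fake = D(G(z))
    loss_g = bce(d_fake, torch.ones_like(d_fake))    # non-saturating log D(G(z))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```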
- The generator is a “black box” to the discriminator
- The generator is exposed to real data only via the output of the discriminator (and its gradients)
- At test time, the discriminator is discarded
3.7 Original GAN results
- The original GAN's samples are rather blurry, because blurry samples are harder for the discriminator to classify as fake.
3.8 Problems with GAN training
Stability
- Parameters can oscillate or diverge, generator loss does not correlate with sample quality
- Behavior very sensitive to hyperparameter selection
Mode collapse
- The generator ends up modeling only a small subset of the training data
- It imitates only a few modes and fails to reproduce the true multi-modality of the data
3.9 DCGAN
- Early, influential convolutional architecture for the generator
  - Use convolutions without pooling: strided convolutions take the place of pooling layers
- Discriminator architecture (empirically determined to give best training stability):
- Don’t use pooling, only strided convolutions
- Use Leaky ReLU activations (sparse gradients cause problems for training)
- Use only one FC layer before the softmax output
- Use batch normalization after most layers (in the generator also)
- Batch normalization reduces sensitivity to hyperparameter choices (a generator sketch follows below).
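A hedged DCGAN-style generator sketch following the rules above (strided transposed convolutions instead of pooling, batch norm after most layers); the channel sizes and 32×32 output are assumed.

```python
import torch.nn as nn

# Maps a (N, 100, 1, 1) noise tensor to a (N, 3, 32, 32) image.
gen = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)
```

The discriminator would mirror this with strided convolutions and Leaky ReLU activations.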
3.10 DCGAN results
- Interpolation between different points in the z space
- i.e., the learned z space is continuous: interpolated codes produce smoothly varying images
- Vector arithmetic in the z space
- Pose transformation by adding a “turn” vector
4. Conditional generation
- To condition the generation of samples on discrete side information (label) $y$, we need to add $y$ as an input to both the generator and the discriminator
- Adding the class label constrains what the generator produces (a sketch follows below).
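A hedged sketch of the conditioning: embed the label and concatenate it with the generator's noise and the discriminator's input; the dimensions are assumed.

```python
import torch
import torch.nn as nn

n_classes, z_dim = 10, 64
embed = nn.Embedding(n_classes, n_classes)   # learned label embedding

def gen_input(z, y):
    """G sees (z, y): concatenate noise with the label embedding."""
    return torch.cat([z, embed(y)], dim=1)

def disc_input(x, y):
    """D sees (x, y): the sample is judged together with its label."""
    return torch.cat([x, embed(y)], dim=1)
```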
4.1 BigGAN
- Class-conditional generation of ImageNet images at up to $512 \times 512$ resolution
- The truncation trick: sample $z$ only from a truncated region of the latent space, avoiding the blur contributed by low-density regions of the prior; restricting the code space in this way improves sample quality.
- But it can also reduce the variety of the samples, so there is a trade-off (a sketch follows below).
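A hedged PyTorch sketch of the truncation trick: resample any latent component whose magnitude exceeds a threshold, so codes come only from the high-density core of the prior; the threshold value is illustrative.

```python
import torch

def truncated_noise(n, z_dim, threshold=1.0):
    """Sample z ~ N(0, I), resampling components with |z| > threshold."""
    z = torch.randn(n, z_dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))  # redraw out-of-range components
        mask = z.abs() > threshold
    return z

# Lower thresholds trade sample variety for per-sample quality.
z = truncated_noise(16, 128, threshold=0.5)
```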
5. Image-to-image translation
- Produce a modified image $y$ conditioned on an input image $x$ (note the change of notation)
- The generator receives $x$ as input
- Discriminator receives an $x, y$ pair and has to decide whether it is real or fake
- The input $x$ acts as a reference for the discriminator, turning the real/fake decision into a conditional one.
- E.g., for edges-to-shoes, the generated shoe should keep the shape given by the input edges.
5.1 Translating between maps and aerial photos
- Day to night
- Edges to photos
5.2 Unpaired image-to-image translation
Sometimes we cannot obtain paired samples.
Given two unordered image collections $X$ and $Y$, learn to "translate" an image from one into the other and vice versa
5.3 CycleGAN
- Given: domains $X$ and $Y$
- We want an image from $X$ to be translated into $Y$, and the inverse translation to map it back to the original image.
- This constrains the translated image to keep a shape similar to the input from $X$.
- Train two generators $F$ and $G$ and two discriminators $D_{X}$ and $D_{Y}$
- $G$ translates from $X$ to $Y, F$ translates from $Y$ to $X$
- $D_{X}$ recognizes images from $X, D_{Y}$ from $Y$
- Cycle consistency: we want $F(G(x)) \approx x$ and $G(F(y)) \approx y$ (see the sketch after this list)
- Illustration of cycle consistency:
- Translation between maps and aerial photos
- Tasks for which paired data is unavailable
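A hedged sketch of the CycleGAN generator objective: least-squares adversarial terms (as CycleGAN uses in practice) plus L1 cycle-consistency in both directions; `F_inv` stands for the generator $F$ (renamed to avoid clashing with `torch.nn.functional`), and the cycle weight 10 follows the paper's common setting.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G, F_inv, D_X, D_Y, x, y, lam=10.0):
    """Adversarial + cycle-consistency loss for both generators."""
    fake_y, fake_x = G(x), F_inv(y)
    # Least-squares adversarial losses: make each D score fakes as real (1)
    adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()
    # Cycle consistency: F(G(x)) ~ x and G(F(y)) ~ y
    cycle = F.l1_loss(F_inv(fake_y), x) + F.l1_loss(G(fake_x), y)
    return adv + lam * cycle
```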
5.4 CycleGAN: Limitations
Cannot handle shape changes (e.g., dog to cat)
Can get confused on images outside of the training domains (e.g., horse with rider)
- It cannot generalize to images outside of the training domains.
- Cannot close the gap with paired translation methods