Transfer learning for CV

Transfer learning for CV, self-supervised learning

1. Transfer learning for CV

image-20211217101408721

image-20211217095438486

1.1 Why?

image-20211217095600777

1.2 Traditional vs. Transfer Learning

image-20211217095835039

  • Learn some shared knowledge from the source data first, then fine-tune it for the target task

  • Traditional machine learning:

    • learn a separate system for each task
  • Transfer learning:

    • transfer the knowledge from the source model to the target task
  • Task description

  • Source data: $(x^s, y^s)$ (a large amount)
  • Target data: $(x^t, y^t)$ (very little)
    • One-shot learning: only a few examples in target domain
  • Example: (supervised) speaker adaptation
    • Source data: audio data and transcriptions from many speakers
    • Target data: audio data and transcriptions of a specific user
  • Idea: train a model on the source data, then fine-tune it on the target data (a minimal sketch follows this list)
    • Challenge: only limited target data, so be careful about overfitting
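
A minimal PyTorch sketch of this idea, with torchvision's ImageNet-pretrained ResNet-18 standing in for the source model and random tensors standing in for the small target dataset (assumes torchvision ≥ 0.13):

```python
import torch
import torch.nn as nn
import torchvision

# Source model: pre-trained on a large source dataset (here, ImageNet)
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 10)   # new head for the target task

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on a (stand-in) batch of target data
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```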

1.3 Conservative Training

image-20211217100456443

  • Use a very low learning rate when fine-tuning, so the adapted model does not drift far from the source model
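
A minimal sketch of conservative fine-tuning: a very low learning rate, optionally combined with an L2 penalty that keeps the fine-tuned parameters close to the source parameters (the penalty term is an illustrative addition, not something stated in the notes):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 4)             # stand-in for a model pre-trained on the source data
source_model = copy.deepcopy(model)  # frozen snapshot of the source parameters

def conservative_penalty(model, source_model, weight=1e-2):
    # L2 distance between the fine-tuned parameters and the original source parameters
    return weight * sum(((p - q.detach()) ** 2).sum()
                        for p, q in zip(model.parameters(), source_model.parameters()))

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)   # very low learning rate
loss = F.cross_entropy(model(x), y) + conservative_penalty(model, source_model)
loss.backward()
optimizer.step()
```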

1.4 Layer Transfer

image-20211217100528414

  • Which layer can be transferred (copied)?
    • Speech: usually copy the last few layers
    • Image: usually copy the first few layers
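
A possible PyTorch sketch of layer transfer for images: copy the source model and freeze its first few stages, then train only the later, task-specific layers (assumes torchvision ≥ 0.13; which stages to freeze is an illustrative choice):

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")   # source model

# Freeze the copied "first few layers"; later layers remain trainable
for name, module in model.named_children():
    if name in {"conv1", "bn1", "layer1", "layer2"}:
        for p in module.parameters():
            p.requires_grad_(False)

model.fc = nn.Linear(model.fc.in_features, 10)   # new task-specific head, trained from scratch

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```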

image-20211217100601426

1.5 Neural Network Layers: General to Specific

  • Bottom/first/earlier layers: general learners
    • Low-level notions of edges, visual shapes
  • Top/last/later layers: specific learners
    • High-level features such as eyes, feathers

1.6 Multitask Learning

  • The multi-layer structure makes NN suitable for multitask learning
    • When the tasks are related, part of the parameters can be shared
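
A minimal sketch of this hard parameter sharing: a shared trunk with one output head per task (all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared bottom layers with one output head per task."""
    def __init__(self, in_dim=128, hidden=64, n_classes_a=10, n_classes_b=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # shared by related tasks
        self.head_a = nn.Linear(hidden, n_classes_a)   # task A head
        self.head_b = nn.Linear(hidden, n_classes_b)   # task B head

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

net = MultiTaskNet()
out_a, out_b = net(torch.randn(4, 128))
# The training loss is a (possibly weighted) sum of the per-task losses.
```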

image-20211217101104901

1.7 Progressive Neural Networks

image-20211217101220837

  • Does not rely on the tasks being related
  • Only features are shared (each new column receives features from the previously trained columns); parameters are not shared

2. Domain-adversarial training

2.1 Task description: domain adaptation

image-20211217101502380

  • How to remove the domain shift?
  • How to bridge the domain gap?
  • The domain can be a general concept:
    • Datasets: transfer from an “easy” dataset to a “hard” one
    • Modalities: transfer from RGB to depth, infrared images, point clouds, etc.

image-20211217101627652

  • Remove the domain shift

image-20211217101731118

2.2 Discrepancy-based approaches

  • We want the source and target data to be as close as possible, so that a model trained on the source data can transfer to the target data

  • Idea: minimize the domain distance in a feature space

  • Existing works focus on designing a reasonable distance measure

image-20211217101914275

  • Example: metric-learning-based approaches (one simple discrepancy measure is sketched below)
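
One simple instance of such a distance, as a sketch: a linear-kernel MMD-style discrepancy between batches of source and target features (the exact distance varies from paper to paper):

```python
import torch

def linear_mmd(source_feat, target_feat):
    """Squared distance between the mean source feature and the mean target feature
    (a simple, linear-kernel form of Maximum Mean Discrepancy)."""
    return ((source_feat.mean(dim=0) - target_feat.mean(dim=0)) ** 2).sum()

src = torch.randn(32, 256)   # features of a source batch (illustrative)
tgt = torch.randn(32, 256)   # features of a target batch
domain_loss = linear_mmd(src, tgt)
# total loss = task loss on the labeled source data + lambda * domain_loss
```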

2.3 Adversarial-based approaches

image-20211217102042921

  • We want to find a feature space in which the features of the two domains mix together and become indistinguishable

2.4 Adversarial-based approaches

  • Method 1: Domain-adversarial training

image-20211217102311355

image-20211217102408991

  • Unlike in a GAN, where the discriminator tries to separate fake data from real data, here we want the domain classifier to be as unable as possible to separate the two domains

image-20211217102417938

  • Therefore the gradients from the domain classifier's loss are not passed back to the feature extractor as-is (plain gradient descent); they are reversed (a gradient reversal layer), so the feature extractor learns to confuse the domain classifier
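
A minimal sketch of a gradient reversal layer in PyTorch (the feature tensor and the toy domain classifier are stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

features = torch.randn(8, 256, requires_grad=True)   # output of the feature extractor
domain_clf = nn.Linear(256, 2)                        # toy domain classifier (source vs. target)
domain_labels = torch.randint(0, 2, (8,))

loss = F.cross_entropy(domain_clf(GradReverse.apply(features, 1.0)), domain_labels)
loss.backward()
# The domain classifier is trained to tell the domains apart as usual, while the
# reversed gradient on `features` pushes the feature extractor to mix them up.
```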

image-20211217102659257

  • Method 2: GAN-based methods

image-20211217103006920

2.5 Reconstruction-based approaches

image-20211217103149535

  • The data reconstruction of source or target samples is an auxiliary task that simultaneously focuses on creating a shared representation between the two domains and keeping the individual characteristics of each domain.

2.6 Knowledge distillation

  • Distill the knowledge from a larger deep neural network into a small network

image-20211217103249019

  • Response-based knowledge
    • Use the neural response of the last output layer of the teacher model to transfer.
    • Directly mimic the final prediction of the teacher model.
    • Simple yet effective

image-20211217103307255

  • The closer the lightweight network's class predictions are to the large network's, the better
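
A minimal sketch of the usual response-based distillation loss: KL divergence between temperature-softened teacher and student predictions (the temperature value is illustrative):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student class distributions."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)    # produced by the frozen large network
loss = distillation_loss(student_logits, teacher_logits)
# usually combined with the ordinary cross-entropy loss on the ground-truth labels
```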

  • Feature-based knowledge

    • Extend the transfer point from the last layer to intermediate layers
    • A good extension of response-based knowledge, especially for the training of thinner and deeper networks.
    • Generalize feature maps to attention maps

image-20211217103432014

  • Relation-based knowledge
    • Both response-based and feature-based knowledge use the outputs of specific layers in the teacher model.
    • Relation-based knowledge further explores the relationships between different layers or data samples.
  • Considers the distributions of, and relations between, different features

image-20211217103528450

  • Extension: Cross-modal distillation
    • The data or labels for some modalities might not be available during training or testing

image-20211217104141848

3. Self-supervised learning

3.1 Motivation

  • Recall the idea of transfer learning: start with general-purpose feature representation pre-trained on a large, diverse dataset and adapt it to specialized tasks

  • Challenge: overcoming reliance on supervised pre-training

image-20211217104304133

image-20211217104341902

3.2 Self-supervised pretext tasks

  • Self-supervised learning methods solve “pretext” tasks that produce good features for downstream tasks.
    • Learn with supervised learning objectives, e.g., classification, regression.
    • Labels of these pretext tasks are generated automatically
  • Example: learn to predict image transformations / complete corrupted images

image-20211217104525627

3.2.1 Self-supervised learning workflow (I)

image-20211217105043956

  • Learn good feature extractors from self-supervised pretext tasks, e.g., predicting image rotations

3.2.2 Self-supervised learning workflow (II)

image-20211217105152186

  • Attach a shallow network to the feature extractor; train the shallow network on the target task with a small amount of labeled data
  • Evaluate the learned feature encoders on downstream target tasks

3.2.3 Self-supervised vs. unsupervised learning

  • The terms are sometimes used interchangeably in the literature, but self-supervised learning is a particular kind of unsupervised learning
  • Unsupervised learning: any kind of learning without labels

    • Clustering and quantization
    • Dimensionality reduction, manifold learning
    • Density estimation
  • Self-supervised learning: the learner “makes up” labels from the data and then solves a supervised task

3.3 Self-supervised vs. generative learning

  • Both aim to learn from data without manual label annotation.

  • Generative learning aims to model data distribution, e.g., generating realistic images.

    • The goal is to generate images as close to the real ones as possible, so it pays more attention to low-level details
  • Self-supervised learning aims to learn high-level semantic features with pretext tasks

    • Only aims to learn high-level semantic information

3.4 Types of self-supervised learning

  • Predict occluded/masked regions, predict colorization, predict the future

image-20211217105924241

  • Solve jigsaw puzzles (predict the arrangement of shuffled patches)

image-20211217110002112

  • Contrastive learning

image-20211217110018571

3.5 Self-Supervision as data prediction

3.5.1 Colorization

image-20211217110111322

  • The inherent ambiguity of color must be taken into account: as long as a predicted colorization could occur in nature, it should not be counted as wrong

  • Colorization: Training data generation

    • Convert the data to grayscale to obtain the network input

image-20211217110200700

  • Use the ab channels as the supervision signal
  • Quantize the ab space and predict a distribution over the bins, which accounts for the ambiguity of color
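
A small sketch of this data-generation step (assumes scikit-image is available; the uniform ab binning below is a simplification of the paper's quantization scheme):

```python
import numpy as np
from skimage import color

def colorization_example(rgb_image, bins=10):
    """Turn an RGB image (H, W, 3, values in [0, 1]) into an (L input, quantized-ab target) pair."""
    lab = color.rgb2lab(rgb_image)
    L = lab[..., :1]     # lightness channel: the network input
    ab = lab[..., 1:]    # ab channels: the supervision signal
    # Quantize ab (roughly in [-110, 110]) into bins x bins classes; the network
    # then predicts a distribution over these classes, which handles color ambiguity.
    ab_idx = np.clip(((ab + 110) / 220 * bins).astype(int), 0, bins - 1)
    target = ab_idx[..., 0] * bins + ab_idx[..., 1]
    return L, target

L_input, target = colorization_example(np.random.rand(64, 64, 3))
```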

image-20211217110310161

3.6 Self-supervision by transformation prediction

  • Pretext task: randomly sample a patch and one of its 8 neighbors; guess the spatial relationship between the two patches
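
A possible sketch of how such patch pairs and their self-generated labels could be produced (patch size and gap are illustrative; see the next subsection for why the gap matters):

```python
import random
import torch

def sample_patch_pair(image, patch=32, gap=8):
    """Crop a center patch and one of its 8 neighbors from a CHW image tensor;
    the neighbor's index (0-7) is the automatically generated label."""
    _, H, W = image.shape
    step = patch + gap                        # the gap avoids trivial edge-continuity cues
    cy = random.randint(step, H - 2 * step)   # center patch location
    cx = random.randint(step, W - 2 * step)
    offsets = [(-step, -step), (-step, 0), (-step, step),
               (0, -step),                 (0, step),
               (step, -step),  (step, 0),  (step, step)]
    label = random.randrange(8)
    dy, dx = offsets[label]
    center   = image[:, cy:cy + patch, cx:cx + patch]
    neighbor = image[:, cy + dy:cy + dy + patch, cx + dx:cx + dx + patch]
    return center, neighbor, label

center, neighbor, label = sample_patch_pair(torch.randn(3, 256, 256))
```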

image-20211217110648572

image-20211217110639386

3.6.1 Context prediction: Details

  • Leave a gap between patches when cropping, to prevent the network from simply learning these edge cues

image-20211217111005476

3.6.2 Jigsaw puzzle solving

image-20211217111210571

  • Instead of predicting the position of a single patch, this considers the overall ordering of all nine patches

Details

  • To prevent overfitting (trivial shortcuts), only 64 permutations are used, chosen so that their pairwise Hamming distance is large

image-20211217111411606

3.6.3 Rotation prediction

  • Pretext task: recognize image rotation (0, 90, 180, 270 degrees)

image-20211217111451479

  • During training, feed in all four rotated versions of an image in the same mini-batch
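
A small sketch of generating such a mini-batch with self-generated rotation labels:

```python
import torch

def rotation_batch(images):
    """Given a batch (N, C, H, W), return all four rotated copies plus the
    rotation labels 0-3 (0°, 90°, 180°, 270°) used as supervision."""
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotated, dim=0), labels

x = torch.randn(8, 3, 224, 224)
x_rot, y_rot = rotation_batch(x)   # (32, 3, 224, 224) and 32 labels in one mini-batch
```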

image-20211217111502032

3.7 Contrastive methods

  • Encourage representations of transformed versions of the same image to be the same and different images to be different
    • We want representations of the same data (under different transformations) to be as similar as possible, and representations of different data to be as dissimilar as possible

image-20211217111650163

image-20211217111745082

  • Given: a query point $x$, positive samples $x^{+}$, negative samples $x^{-}$
    • Positives are typically transformed versions of $x$; negatives are random examples from the same mini-batch or a memory bank
  • Key idea: learn a representation that makes $x$ similar to $x^{+}$ and dissimilar from $x^{-}$ (similarity is measured by the dot product of normalized features)
  • Given 1 positive sample and $N - 1$ negative samples, the contrastive loss is as written out after this list
  • This is just the cross-entropy loss of an $N$-way softmax classifier: try to pick the positive sample out of the $N$ samples
  • $\tau$ is the temperature hyperparameter (it determines how concentrated the softmax is)
  • A smaller temperature makes the softmax sharper, so the prediction is more concentrated
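
Written out, the loss referred to above takes the familiar InfoNCE form, where $\mathrm{sim}(\cdot,\cdot)$ denotes the dot product of normalized features:

$$
\mathcal{L} = -\log \frac{\exp\big(\mathrm{sim}(x, x^{+})/\tau\big)}{\exp\big(\mathrm{sim}(x, x^{+})/\tau\big) + \sum_{j=1}^{N-1} \exp\big(\mathrm{sim}(x, x^{-}_{j})/\tau\big)}
$$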

image-20211217112207576

3.8 SimCLR: A Simple Framework for Contrastive Learning

  • Generate positive samples through data augmentation.

  • Use a projection network $g(\cdot)$ to project features to a space where contrastive learning is applied
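
A minimal sketch of these two ideas together (a stand-in encoder, an illustrative projection head $g(\cdot)$, and a simplified one-directional contrastive loss; the two "views" below are random tensors standing in for augmented images):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimCLRModel(nn.Module):
    """Encoder f(.) followed by a small MLP projection head g(.);
    the contrastive loss is computed on z = g(f(x))."""
    def __init__(self, encoder, feat_dim=512, proj_dim=128):
        super().__init__()
        self.encoder = encoder
        self.g = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                               nn.Linear(feat_dim, proj_dim))

    def forward(self, x):
        h = self.encoder(x)                  # representation kept for downstream tasks
        z = F.normalize(self.g(h), dim=1)    # projection used by the contrastive loss
        return h, z

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))   # stand-in encoder
model = SimCLRModel(encoder)
view1, view2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # two augmented views
(_, z1), (_, z2) = model(view1), model(view2)
logits = z1 @ z2.t() / 0.5                         # pairwise similarities / temperature
loss = F.cross_entropy(logits, torch.arange(8))    # matching pairs sit on the diagonal
```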

image-20211217112348651

  • SimCLR: Evaluation

image-20211217112414759

image-20211217113153614

  • Train feature encoder on ImageNet (entire training set)
    using SimCLR.
  • Freeze feature encoder, train a linear classifier on top with
    labeled data.

3.8.1 SimCLR design choices: projection head

image-20211217113314855

  • Linear / non-linear projection heads improve representation learning.
    A possible explanation:
    • the projected space $z$ is trained to be invariant to data transformations
    • the contrastive learning objective may discard information that is useful for downstream tasks
    • by leveraging the projection head $g(\cdot)$, more information can be preserved in the representation space $h$