4 min read

Paper Link


One-line Summary ✔

  • Weakly-supervised learning usually comes with many noisy labels.
    • This is why it still performs worse than fully-supervised models.
  • The benefits of existing WS methods are overstated.
    • Clean validation samples:
      • Validation on data with correct labels; used for early stopping, meta-learning, etc.
      • If these samples were instead included in the training dataset, performance would be even better.
        • A model fine-tuned on them as training data outperforms recent WSL models that only use them as a validation set (Figure 1).
          • WSL models do still outperform plain weak labels, though.
  • Existing models simply learn to tie together examples with similar linguistic correlations, which can hurt generalization because of bias.
    • Further tuning that forces contradictory labels (e.g., a negative label onto a positive example) → improved generalization.
  • Contributions:
    • Fine-tuning that uses the validation samples as training data.
      • Revisiting the true benefits of WSL.

Preliminaries 🍱

Weak Supervision

  • definition
    • proposed to ease the annotation bottleneck in training machine learning models.
    • uses weak sources to automatically annotate the data
  • drawback
    • its annotations can be noisy (some annotations are incorrect), causing poor generalization
    • solutions
      • to re-weight the impact of individual examples in the loss computation (see the sketch after this list)
      • to train noise-robust models using knowledge distillation (KD)
        • optionally equipped with meta-learning, since KD alone can be fragile
      • to leverage the knowledge of pre-trained LLMs
  • datasets
    • WRENCH
    • WALNUT
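
As a concrete illustration of the re-weighting idea, here is a minimal sketch of a per-example weighted cross-entropy. The weights below are hand-picked placeholders; methods like L2R learn such weights via meta-learning rather than setting them manually.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, noisy_labels, sample_weights):
    """Cross-entropy where each (possibly noisy) example contributes with its own weight."""
    per_example = F.cross_entropy(logits, noisy_labels, reduction="none")
    return (sample_weights * per_example).sum() / sample_weights.sum().clamp(min=1e-8)

# Toy usage: 3 examples, 2 classes; the second example is suspected to be mislabeled.
logits = torch.tensor([[2.0, -1.0], [0.3, 0.1], [-1.5, 2.2]])
labels = torch.tensor([0, 1, 1])
weights = torch.tensor([1.0, 0.2, 1.0])
print(weighted_cross_entropy(logits, labels, weights))
```

Down-weighting suspected noisy examples limits how much an incorrect weak label can pull the decision boundary.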

Realistic Evaluation

Semi-supervised Learning

  • often trains with a few hundred training examples while retaining thousands of validation samples for model selection
    • manual labeling by humans is the bottleneck in the first place, so keeping thousands of labeled validation samples is unrealistic
  • realistic evaluations instead discard the validation set and use a fixed set of hyperparameters across datasets
    • rather than having a person hand-search the optimal hyperparameter combination for each dataset
  • prompt-based few-shot learning
    • sensitive to prompt selection and requires additional data for prompt evaluation
      • which contradicts the point of few-shot learning!
    • example prompt: "Decide whether the following review is positive or negative:"
  • recent work
    • omits fine-grained model selection
    • strictly controls the number of validation samples

Significance

  • To our knowledge, no similar work exists exploring the aforementioned problems in the context of weak supervision.

Challenges and Main Idea💣

C1) Existing WS methods overstate their benefits given how much clean data they actually use.

Idea) Including that clean validation data in the training dataset instead yields better performance.

C2) Existing WS methods carry a bias from how they are trained.

Idea) Further tuning that forces contradictory labels (e.g., a negative label onto a positive example) → improved generalization.


Problem Definition ❤️

Given a model trained on \(D_w \sim \mathcal{D}_n\).

Return a model.

Such that it generalizes well on \(D_{test} \sim \mathcal{D}_c\).


Methodology 👀

Setup

Formulation

  • \(\mathcal{X}\): feature space.
  • \(\mathcal{Y}\): label space.
    • \(\hat{y}_i\): label obtained from weak labeling sources; can differ from the ground-truth label \(y_i\).
  • \(D=\{(x_i,y_i)\}_{i=1}^{N}\).
    • \(\mathcal{D}_c\): clean data distribution.
    • \(D_w\): weakly labeled dataset.
    • \(\mathcal{D}_n\): noisy distribution.
  • The goal of WSL algorithms is to obtain a model that generalizes well on \(D_{test} \sim \mathcal{D}_c\) despite being trained on \(D_w \sim \mathcal{D}_n\).
  • baseline: RoBERTa-base.
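
For concreteness, a minimal sketch of fine-tuning RoBERTa-base on weakly labeled pairs \((x_i, \hat{y}_i)\). The toy texts and weak labels are made up, and this is just standard Hugging Face fine-tuning, not the paper's exact training code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical stand-ins for D_w: texts x_i paired with weak labels y_hat_i.
weak_texts = ["the movie was great", "terrible service", "loved every minute"]
weak_labels = torch.tensor([1, 0, 1])

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

enc = tok(weak_texts, padding=True, truncation=True, return_tensors="pt")
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the tiny weakly labeled set
    out = model(**enc, labels=weak_labels)  # HF models return a loss when labels are passed
    out.loss.backward()
    optim.step()
    optim.zero_grad()
```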

Datasets


  • eight datasets covering different NLP tasks in English

WSL baselines


  • \(FT_W\): standard fine-tuning approach for WSL.
  • \(L2R\): meta-learning to determine the optimal weights for each noisy training sample.
  • \(MLC\): meta-learning for the meta-model to correct the noisy labels.
  • \(BOND\): noise-aware self-training framework designed for learning with weak annotations.
  • \(COSINE\): self-training with contrastive regularization to improve noise robustness further.
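
The self-training baselines (BOND, COSINE) revolve around replacing weak labels with the model's own confident predictions. Below is a rough sketch of that core step only; the threshold and the keep-only-confident logic are illustrative, not the exact procedures from either paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def confident_pseudo_labels(model, batch_inputs, threshold=0.9):
    """Return pseudo-labels and a mask for predictions above `threshold` confidence.

    A later training step would fine-tune on the masked (inputs, pseudo-labels)
    instead of the original weak labels; the details differ per method.
    """
    model.eval()
    probs = F.softmax(model(**batch_inputs).logits, dim=-1)
    confidence, pseudo = probs.max(dim=-1)
    mask = confidence >= threshold
    return pseudo, mask
```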

y-axis (relative performance improvement over weak labels):

\[G_{\alpha}=\frac{P_{\alpha}-P_{WL}}{P_{WL}}\]
  • \(P_{\alpha}\): the performance achieved by a certain WSL method \(\alpha\).
  • \(P_{WL}\): the performance achieved by the weak labels alone.
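
A quick made-up example of how to read this metric: if the weak labels alone reach \(P_{WL}=0.70\) accuracy and a WSL method reaches \(P_{\alpha}=0.84\), then

\[G_{\alpha}=\frac{0.84-0.70}{0.70}=0.20,\]

i.e. a 20% relative improvement over the weak labels.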

⇒ Without clean validation samples, existing WSL approaches do not work.

But is WSL really doomed to perform poorly without any clean data?

Clean Data


⇒ a small amount of clean validation samples may be sufficient for current WSL methods to achieve good performance


⇒ the advantage of using WSL approaches vanishes when we have as few as 10 clean samples per class

Continuous Fine-tuning (CFT)


  • CFT
    • In the first phase, we apply WSL approaches on the weakly labeled training set, using the clean data for validation.
    • In the second phase, we take the model trained on the weakly labeled data as a starting point and continue to train it on the clean data.
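
A minimal sketch of the two phases; the loaders are assumed to yield Hugging Face-style batches that include labels, and the epoch counts and optimizer are placeholders rather than the paper's settings.

```python
import torch

def continuous_fine_tuning(model, weak_loader, clean_loader, epochs=(3, 3), lr=2e-5):
    """Phase 1: train on weakly labeled batches (the clean data would only be used
    for validation / model selection, omitted here).
    Phase 2 (CFT): keep the same weights and continue training on the clean batches."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for loader, n_epochs in ((weak_loader, epochs[0]), (clean_loader, epochs[1])):
        for _ in range(n_epochs):
            for batch in loader:
                loss = model(**batch).loss  # HF-style models return a loss when labels are in the batch
                loss.backward()
                optim.step()
                optim.zero_grad()
    return model
```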

⇒ the net benefit of using sophisticated WSL approaches may be significantly overestimated and impractical for real-world use cases.

  • Plain continued fine-tuning on the clean data already gives a substantial performance boost over existing WSL methods (even when the amount of clean data is low).
  • For L2R on the Yelp dataset, performance actually drops after CFT; since L2R already uses the validation loss to update its parameters, the extra value of the validation samples may simply be smaller there.


⇒ Pre-training on more data clearly helps to overcome biases from weak labels.

  • pre-training provides the model with an inductive bias to seek more general linguistic correlations instead of superficial correlations from the weak labels


⇒ contradictory samples play a more important role here, and at least a minimal set of contradictory samples is required for CFT to be beneficial


Open Review 💗

NA


Discussion 🍟

NA


Major Takeaways 😃

NA


Conclusion ✨

Strength

  • If a proposed WSL method requires extra clean data, such as for validation, then the simple \(FT_W\)+CFT baseline should be included in the evaluation to claim the real benefits gained by applying the method.

Weakness

  • it may be possible to perform model selection by utilizing prior knowledge about the dataset
  • For low-resource languages where no PLMs are available, training may not be that effective
  • We have not extended our research to more diverse types of weak labels

Reference

NA
