
Paper link


One-line Summary ✔

  • Limitations of image-level weak supervision in WSSS:
    • sparse object coverage
    • inaccurate object boundary
    • co-occurring pixels from non-target objects
  • Explicit Pseudo-pixel Supervision (EPS): obtains pixel-level feedback from two forms of weak supervision.
    • localization map: distinguishes different objects.
      • Generated with CAM (Class Activation Map).
    • saliency map: provides rich boundary information.
      • Generated with a saliency detection model.

Preliminaries 🍱

  • The typical WSSS pipeline consists of two stages:
    • Generate pseudo-masks (using an image classifier).
    • Train a segmentation model recursively, using the pseudo-masks as ground truth and refreshing the GT at each iteration.

Challenges and Main Idea 💣

C1) sparse object coverage

C2) inaccurate object boundary

C3) co-occurring pixels from non-target objects

Idea) Propose Explicit Pseudo-pixel Supervision (EPS).


Problem Definition ❤️

Given a weak dataset D.

Return a model S.

Such that S approximates the performance of its fully supervised counterpart T.


Proposed Method 🧿

Model Architecture (EPS)

[Figure 2]


  • A classifier over C+1 classes (including the background) produces C+1 localization maps, which are compared against the saliency map.
  • Foreground
    • The C object localization maps are merged into a foreground map, which is matched to the saliency map. ⇒ improves object boundaries.
  • Background
    • The background localization map is matched to the outside of the saliency map ($1 - M_s$). ⇒ mitigates the co-occurring pixels of non-target objects.
      • Consider an image containing a cat and a dog: the cat and dog are the targets, and everything else is non-target. "Co-occurring pixels of non-target objects" would then be pixels of, e.g., the floor, walls, trees, or furniture. Such pixels frequently appear alongside multiple objects and may surround each of them, but do not represent any particular target object by themselves.

Loss Function


$\mathcal{L}_{total} = \mathcal{L}_{sal} + \mathcal{L}_{cls}$

  • $\mathcal{L}_{sal} = \frac{1}{HW} \lVert M_s - \hat{M}_s \rVert^2$
    • $M_s$: from the off-the-shelf saliency detection model, PFAN [51] trained on the DUTS dataset.
    • Why divide by HW?
      • Normalization: the larger the image, the larger the loss value tends to be, so the loss is normalized by the image size.
    • Marked by the red box/arrow in Figure 2.
    • The sum of pixel-wise differences between the estimated saliency map and the actual saliency map.
    • Involved in updating the parameters for all C+1 classes, including the target objects and the background.
  • $\mathcal{L}_{cls} = -\frac{1}{C} \sum_{i=1}^{C} y_i \log \sigma(\hat{y}_i) + (1 - y_i) \log (1 - \sigma(\hat{y}_i))$
    • $\sigma$: sigmoid function.
    • This is the standard (multi-label) binary cross-entropy loss.
    • Marked by the blue box/arrow in Figure 2.
    • Only evaluates the label prediction for the C classes, excluding the background class.
      • The gradient from $\mathcal{L}_{cls}$ does not flow into the background class.

Joint Training

  • By jointly training the two objectives, we can synergize the localization map and the saliency map with complementary information.
  • We observe that the noisy and missing information of each is complemented by the joint training strategy, as illustrated in Figure 3.
    • Missing objects: the chair and boat objects missed in (c) are segmented well in (d).
    • Noise removal: artifacts such as the airplane's contrail in (c) are removed in (d).

$\hat{M}_s = \lambda M_{fg} + (1 - \lambda)(1 - M_{bg})$

  • $\hat{M}_s$: estimated saliency map.
  • $M_s$: actual saliency map.
  • $M_{fg}$: foreground saliency map.
    • $M_{fg} = \sum_{i=1}^{C} y_i M_i \mathbb{1}[O(M_i, M_s) > \tau]$
      • $O(M_i, M_s)$: the function that computes the overlapping ratio between $M_i$ and $M_s$.
      • $M_i$: the i-th localization map.
        • Assigned to the foreground if $M_i$ overlaps the saliency map by more than $\tau$%, otherwise to the background.
      • $y \in \mathbb{R}^C$: binary image-level label, $y_i \in \{0, 1\}$.
        • The final foreground saliency map sums the per-object saliency maps only for the objects that are actually present in the image.
  • $M_{bg}$: background saliency map.
    • $M_{bg} = \sum_{i=1}^{C} y_i M_i \mathbb{1}[O(M_i, M_s) \le \tau] + M_{C+1}$
  • $\lambda \in [0, 1]$: a hyperparameter adjusting the weighted sum of the foreground map and the inversion of the background map.
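A NumPy sketch of this adaptive assignment, under assumed shapes and one plausible definition of the overlap ratio $O$ (the fraction of a localization map's mass lying inside the binarized saliency map):

```python
import numpy as np

def estimated_saliency(loc_maps, y, sal_map, tau=0.5, lam=0.5):
    """Sketch of M_hat_s = lambda * M_fg + (1 - lambda) * (1 - M_bg).

    loc_maps: (C+1, H, W) localization maps; index C is the background map M_{C+1}
    y:        (C,)  binary image-level labels
    sal_map:  (H, W) binarized actual saliency map M_s
    tau:      overlap threshold; lam: mixing weight lambda
    """
    C = len(y)
    fg = np.zeros_like(sal_map, dtype=float)
    bg = loc_maps[C].astype(float).copy()       # start from the background map
    for i in range(C):
        m = loc_maps[i]
        # Overlap ratio O(M_i, M_s): share of M_i's mass inside the saliency map
        overlap = (m * sal_map).sum() / (m.sum() + 1e-8)
        if y[i] and overlap > tau:
            fg += m                             # assigned to the foreground
        elif y[i]:
            bg += m                             # assigned to the background
    return lam * fg + (1 - lam) * (1 - bg)
```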

Experiment 👀


Setup

Datasets

  • PASCAL VOC 2012 and MS COCO 2014
  • augmented training set with 10,582 images
    • How does performance change without the augmentation?

Baseline

  • ResNet38 pre-trained on ImageNet

Boundary Mismatch Problem


Co-occurrence Problem

  • What is it?
    • Some background classes frequently appear with target objects in PASCAL VOC 2012
  • Dataset: PASCAL-CONTEXT dataset.
    • provides pixel-level annotations for a whole scene (e.g., water and railroad).
  • Evaluation:


  • We choose three co-occurring pairs: boat with water, train with railroad, and train with platform. We compare IoU for the target class and the confusion ratio $m_{k,c} = \frac{FP_{k,c}}{TP_c}$ between a target class and its coincident class.
    • FPk,c: the number of pixels mis-classified as the target class c for the coincident class k.
    • TPc: the number of true-positive pixels for the target class c.
    • k: the coincident class.
    • c: the target class.
  • SEAM, unusually, is based on self-supervised training: it assigns incorrect target-object labels to pixels shared by different objects and then uses them as ground-truth labels during training, which is why its confusion ratio is measured even higher.
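The confusion ratio itself is straightforward to compute; a sketch assuming integer class-label maps for prediction and ground truth:

```python
import numpy as np

def confusion_ratio(pred, gt, target_c, coincident_k):
    """Sketch of m_{k,c} = FP_{k,c} / TP_c on integer class-label maps.

    pred, gt:     (H, W) predicted / ground-truth class-label maps
    target_c:     target class c (e.g., boat)
    coincident_k: coincident class k (e.g., water)
    """
    # FP_{k,c}: coincident-class pixels mis-classified as the target class
    fp = np.sum((pred == target_c) & (gt == coincident_k))
    # TP_c: true-positive pixels for the target class
    tp = np.sum((pred == target_c) & (gt == target_c))
    return fp / tp
```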

Map Selection Strategies


  • Naive strategy:
    • The foreground map is the union of all object localization maps; the background map equals the localization map of the background class.
  • Pre-defined class strategy:
    • We follow the naive strategy, except that the localization maps of several pre-determined classes (e.g., sofa, chair, and dining table) are assigned to the background map.
      • Certain objects are known in advance not to be target objects and are assigned to the background beforehand.
  • Our adaptive strategy:
    • See the EPS description above.

Our adaptive strategy achieves the highest IoU, meaning it represents the target objects best.
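For contrast with the adaptive strategy, the naive strategy can be sketched as follows; taking the "union" of maps as a pixel-wise max is one plausible reading, and the shapes are assumptions:

```python
import numpy as np

def naive_maps(loc_maps, y):
    """Sketch of the naive strategy: foreground = union of all present-class
    localization maps, background = the background-class map.

    loc_maps: (C+1, H, W) localization maps; index C is the background map
    y:        (C,) binary image-level labels
    """
    C = len(y)
    # "Union" taken as a pixel-wise max over the maps of present classes
    fg = np.max(loc_maps[:C] * y[:, None, None], axis=0)
    bg = loc_maps[C]
    return fg, bg
```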

Comparison with state-of-the-arts

Accuracy of pseudo-masks


Accuracy of segmentation maps


Effect of saliency detection models

  • Notably, our EPS using the unsupervised saliency model outperforms all existing methods using the supervised saliency model

Open Review 💗

NA


Discussion 🍟

NA


Major Takeaways 😃

NA


Conclusion ✨

  • We propose a novel weakly supervised segmentation framework, namely explicit pseudo-pixel supervision (EPS).

Reference

NA
