Applications of Focal Loss (Detection & Classification)

Focal Loss for Dense Object Detection

Motivation

• Dense object detection with one-stage networks (YOLO, SSD, etc.) is faster than with two-stage networks (the R-CNN family), but less accurate.

• The cause is extreme class imbalance:

# of hard positives (objects) << # of easy negatives (background)

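• The paper proposes Focal Loss, a small modification of the cross entropy loss commonly used for classification.

Easy samples receive a small weight and hard samples a large weight, so training concentrates on the hard examples.

RetinaNet is as fast as earlier one-stage detectors while surpassing the accuracy of all existing state-of-the-art detectors.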

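• A one-stage detector densely samples object locations, scales, and aspect ratios across the entire image.

# of hard positives (objects) << # of easy negatives (background)

During training, the loss is overwhelmed by the sheer number of background examples.

• Detectors evaluate $10^4$ to $10^5$ candidate locations per image, but only a few of them actually contain objects.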

This imbalance causes two problems:

1) Training is inefficient, because most locations are easy negatives that contribute almost nothing to learning.

2) The easy negatives, in aggregate, overwhelm training and lead to degenerate models.
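The effect of the imbalance on a summed cross entropy loss can be sketched with a few lines of Python. The counts and probabilities below are hypothetical values chosen only for illustration; they do not come from the paper.

```python
import math

def cross_entropy(p_t):
    """Cross entropy for one example, given the probability p_t of its true class."""
    return -math.log(p_t)

# Hypothetical numbers, chosen only to illustrate the imbalance.
n_easy_neg, p_easy = 100_000, 0.99  # background locations, confidently correct
n_hard_pos, p_hard = 10, 0.30       # object locations, poorly classified

loss_easy = n_easy_neg * cross_entropy(p_easy)
loss_hard = n_hard_pos * cross_entropy(p_hard)
easy_share = loss_easy / (loss_easy + loss_hard)

print(f"total loss from easy negatives: {loss_easy:.1f}")
print(f"total loss from hard positives: {loss_hard:.1f}")
print(f"share of total loss from easy negatives: {easy_share:.1%}")
```

Each easy negative has a tiny loss, yet under these assumed numbers their sum accounts for almost all of the total, which is the sense in which easy negatives "overwhelm" training.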

Problems with one-stage detectors

The focal loss proposal

The scaling factor $(1 - p_t)^{\gamma}$ automatically down-weights the contribution of easy examples during training, while focusing the model strongly on hard examples.

When the vast majority of examples are easy-to-classify background, training with focal loss makes it possible to obtain a high-accuracy dense object detector.

[Figure: loss as a function of $p_t$, with hard examples at low $p_t$ and easy examples at high $p_t$]

The focal loss proposal

RetinaNet owes its high accuracy to the new loss function rather than to its network design.

Focal Loss was proposed to handle one-stage object detection scenarios with extreme imbalance between the object and background classes (for example, 1:1000).

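Cross entropy (CE) loss for binary classification:

$\mathrm{CE}(p_t) = -\log(p_t)$

Focal loss:

$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t), \quad \gamma \ge 0$

Here $p_t$ is the model's estimated probability for the true class, and $(1 - p_t)^{\gamma}$ is the modulating factor.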
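The two losses can be written as a minimal pure-Python sketch. It assumes $p_t$ is the predicted probability of the true class (p for a positive label, 1 - p for a negative one), as in the original paper. The function names and the `EPS` clamp are illustrative choices, not part of the definition.

```python
import math

EPS = 1e-7  # assumption: small clamp to avoid log(0); not part of the formula

def true_class_prob(p, y):
    """p_t: the predicted probability p if the label y is 1, otherwise 1 - p."""
    pt = p if y == 1 else 1.0 - p
    return min(max(pt, EPS), 1.0 - EPS)

def cross_entropy(p, y):
    """CE(p_t) = -log(p_t)."""
    return -math.log(true_class_prob(p, y))

def focal_loss(p, y, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
    pt = true_class_prob(p, y)
    return -((1.0 - pt) ** gamma) * math.log(pt)

# Compare the two losses for a positive example at several confidence levels.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p_t = {p:.2f}  CE = {cross_entropy(p, 1):.4f}  FL = {focal_loss(p, 1):.4f}")
```

With `gamma=0` the modulating factor equals 1 and `focal_loss` reduces to `cross_entropy`, which is a quick sanity check for any implementation.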

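The focusing parameter $\gamma$ typically takes a value between 0 and 5 ($\gamma = 2$ works best).

(1) When an example is misclassified and $p_t$ is small:

$(1 - p_t)^{\gamma} \approx 1$, so the loss is unaffected.

(2) When an example is well classified and $p_t \approx 1$:

$(1 - p_t)^{\gamma} \approx 0$, so the loss for well-classified examples is down-weighted.

The modulating factor reduces the contribution of easy examples to the loss.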

For example, with $\gamma = 2$, an example classified with $p_t = 0.9$ has a loss 100x lower than under CE, and an example with $p_t \approx 0.968$ has a loss about 1000x lower.

▶ This raises the relative importance of misclassified examples.
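The size of the reduction follows directly from the formulas: CE divided by FL is 1 / (1 - p_t)^gamma. A short sketch makes the arithmetic explicit; the helper name `ce_to_fl_ratio` is hypothetical.

```python
def ce_to_fl_ratio(p_t, gamma=2.0):
    """How many times smaller FL is than CE: CE / FL = 1 / (1 - p_t)^gamma."""
    return 1.0 / (1.0 - p_t) ** gamma

for p in (0.5, 0.9, 0.968):
    print(f"p_t = {p}: focal loss is about {ce_to_fl_ratio(p):.0f}x smaller than CE")
```

For gamma = 2 this gives a factor of 4 at p_t = 0.5, 100 at p_t = 0.9, and just under 1000 at p_t = 0.968, so the down-weighting grows quickly as an example becomes easier.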

How focal loss works

RetinaNet Detector

- Backbone : ResNet + Feature Pyramid Network(FPN)

Constructs a pyramid with levels $P_3$ through $P_7$, where $l$ indicates the pyramid level ($P_l$ has resolution $2^l$ lower than the input). All pyramid levels have C = 256 channels.

- Two task-specific subnetworks : i) The first subnet performs convolutional object classification on the backbone’s output; ii) The second subnet performs convolutional bounding box regression.
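To get a feel for how many candidate locations such a pyramid produces, the sketch below computes the feature-map size at each level from the stride $2^l$. The 640x640 input size and the ceiling rounding are assumptions made for illustration, and the figure of 9 anchors per location is taken from the RetinaNet paper rather than from these notes.

```python
import math

def pyramid_shapes(height, width, levels=range(3, 8)):
    """Feature-map size at each level P_l, whose stride relative to the input is 2**l."""
    shapes = {}
    for l in levels:
        stride = 2 ** l
        # Assumption: sizes round up; real networks depend on padding choices.
        shapes[l] = (math.ceil(height / stride), math.ceil(width / stride))
    return shapes

ANCHORS_PER_LOCATION = 9  # assumption: value from the RetinaNet paper

shapes = pyramid_shapes(640, 640)  # hypothetical input size
total_locations = sum(h * w for h, w in shapes.values())
total_anchors = total_locations * ANCHORS_PER_LOCATION

for l, (h, w) in shapes.items():
    print(f"P{l}: stride {2 ** l:3d}, feature map {h}x{w}")
print(f"spatial locations: {total_locations}, anchors: {total_anchors}")
```

Under these assumptions the pyramid yields several thousand spatial locations and tens of thousands of anchors per image, consistent with the $10^4$ to $10^5$ candidates that make the class imbalance so severe.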

Comparison to State of the Art

Applying Focal Loss on Cats-vs-dogs Classification Task

https://shaoanlu.wordpress.com/2017/08/16/applying-focal-loss-on-cats-vs-dogs-classification-task/#more-1302

“The Focal Loss is designed to address the one-stage object detection scenario in which there is an extreme imbalance between foreground and background classes during training (e.g., 1:1000)”

Focal loss is applied here to a toy experiment: a highly imbalanced classification problem.

Related paper: “A systematic study of the class imbalance problem in convolutional neural networks”, published in Oct. 2017 (https://arxiv.org/abs/1710.05381)

https://github.com/shaoanlu/expriment-with-focal-loss/tree/master

Experiment Results

Experiment 1: imbalanced data (10:1)

• Data: 11500 cat images and 1000 dog images
• Approach: Fine-tuning ResNet50 top FC layers using focal loss
• Input: Features extracted from ResNet50 (keras ImageNet pre-trained model)
• Architecture: Input -> Dense512 -> BN -> Dropout -> Dense512 -> BN -> Dropout -> sigmoid
• Optimizer: RMSProp
• Learning rate: 3e-5 -> 1e-5 (30 epochs for each learning rate)

Validation accuracy with different hyper-parameters of focal loss


Experiment 2: imbalanced data (100:1)

• Data: 11500 cat images and 115 dog images
• Approach: Fine-tuning ResNet50 top FC layers using focal loss
• Input: Features extracted from ResNet50 (keras ImageNet pre-trained model)
• Architecture: Input -> BN -> Dropout -> Dense512 -> BN -> Dropout -> sigmoid
• Optimizer: adam
• Learning rate: 1e-4 (divided by 3 every 20 epochs)


$\alpha = 0.5$, with $\gamma$ gradually increased from 0.1 to 2 over 60 epochs
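The setting above can be sketched as follows, assuming that alpha refers to the alpha-balanced variant of focal loss from the original paper, FL = -alpha_t (1 - p_t)^gamma log(p_t), where alpha_t is alpha for positives and 1 - alpha for negatives. The linear shape of the gamma ramp and the 0-based epoch indexing are assumptions; the notes only say that gamma increases gradually.

```python
import math

def alpha_balanced_focal_loss(p, y, alpha=0.5, gamma=2.0, eps=1e-7):
    """FL = -alpha_t * (1 - p_t)^gamma * log(p_t) for one example with label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
    pt = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    return -alpha_t * (1.0 - pt) ** gamma * math.log(pt)

def gamma_schedule(epoch, total_epochs=60, start=0.1, end=2.0):
    """Linear ramp from start to end over total_epochs (assumed shape), epochs counted from 0."""
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + frac * (end - start)

# Loss of one moderately hard positive example as gamma ramps up.
for epoch in (0, 20, 40, 59):
    g = gamma_schedule(epoch)
    print(f"epoch {epoch:2d}: gamma = {g:.2f}, loss = {alpha_balanced_focal_loss(0.6, 1, gamma=g):.4f}")
```

Starting with a small gamma keeps the loss close to plain cross entropy early in training and introduces the down-weighting of easy examples progressively.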


Confusion matrix

Left: model using focal loss (val_acc=0.949)

Right: model using standard cross entropy loss (val_acc=0.868)


Comparing the two matrices, the model using the default CE loss predicts more cats than dogs. In contrast, the model using focal loss spreads its predictions evenly between cats and dogs.
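The kind of comparison described here can be made numerically from any 2x2 confusion matrix. The sketch below uses hypothetical counts that are NOT the matrices from the experiment; they only illustrate how a skew toward the majority class shows up in per-class recall and in the share of "cat" predictions.

```python
def summarize(confusion):
    """confusion[i][j] = number of class-i examples predicted as class j (0 = cat, 1 = dog)."""
    (cc, cd), (dc, dd) = confusion
    total = cc + cd + dc + dd
    return {
        "accuracy": (cc + dd) / total,
        "cat_recall": cc / (cc + cd),
        "dog_recall": dd / (dc + dd),
        "predicted_cat_share": (cc + dc) / total,
    }

# Hypothetical counts for illustration only, not the experiment's results.
skewed = [[90, 10], [40, 60]]  # many dogs mislabeled as cats
even = [[80, 20], [20, 80]]    # errors spread evenly over both classes

for name, matrix in (("skewed", skewed), ("even", even)):
    print(name, summarize(matrix))
```

A `predicted_cat_share` well above the true share of cats in the validation set, together with a low `dog_recall`, is the signature of a model biased toward the majority class.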

Observations:

- Focal loss did not really outperform standard CE loss on either balanced or imbalanced data.

This is understandable, since focal loss was designed for detection.

- The zigzagging curves are caused by high $\gamma$ values.

- The zigzagging curves reach higher validation accuracy (at their peak) than the default binary_crossentropy loss function.

- With carefully tuned $\alpha$ and $\gamma$, focal loss handles imbalanced data reasonably well (despite the oscillating validation accuracy).


• The model using focal loss achieves much higher validation accuracy than standard CE loss in this experiment (again, if we ignore the severe fluctuation).

• The confusion matrix also indicates that the model classifies well even though the data is heavily imbalanced.

* This is because focal loss increases the relative loss contribution from hard examples.
