Loss Function Search for Face Recognition

Presented by

Jan Lau, Anas Mahdi, Will Thibault, Jiwon Yang

Introduction

Face recognition is a technology that can label a face to a specific identity. The process involves two tasks: 1. Identifying and classifying a face to a certain identity and 2. Verifying if this face and another face map to the same identity. Loss functions are a method of evaluating how good the prediction models the given data. In the application of face recognition, they are used for training convolutional neural networks (CNNs) with discriminative features. Softmax probability is the probability for each class. It contains a vector of values that add up to 1 while ranging between 0 and 1. Cross-entropy loss is the negative log of the probabilities. When softmax probability is combined with cross-entropy loss in the last fully connected layer of the CNN, it yields the softmax loss function:

[math]\displaystyle{ L_1=-log\frac{e^{w^T_yx}}{e^{w^T_yx} + \sum_{k≠y}^K{e^{w^T_yx}}} }[/math] [1]

Specifically for face recognition, [math]\displaystyle{ L_1 }[/math] is modified such that [math]\displaystyle{ w^T_yx }[/math] is normalized and s represents the magnitude of [math]\displaystyle{ w^T_yx }[/math]:

[math]\displaystyle{ L_2=-log\frac{e^{s cos{(\theta_{{w_y},x})}}}{e^{s cos{(\theta_{{w_y},x})}} + \sum_{k≠y}^K{e^{s cos{(\theta_{{w_y},x})}}}} }[/math] [1]

This function is crucial in face recognition because it is used for enhancing feature discrimination. While there are different variations of the softmax loss function, they build upon the same structure as the equation above. Some of these variations will be discussed in detail in the later sections.

In this paper, the authors first identified that reducing the softmax probability is a key contribution to feature discrimination and designed two design search spaces (random and reward-guided method). They then evaluated their Random-Softmax and Search-Softmax approaches by comparing the results against other face recognition algorithms using nine popular face recognition benchmarks.

Previous Work

Margin-based (angular, additive, additive angular margins) soft-max loss functions are important in learning discriminative features in face recognition. There have been hand-crafted methods previously developed that require much effort such as A-softmax, V-softmax, AM-Softmax, and Arc-softmax. Li et al. proposed an AutoML for loss function search method also known as AM-LFS from a hyper-parameter optimization perspective [2]. It automatically determines the search space by leveraging reinforcement learning to the search loss functions during the training process, though the drawback is the complex and unstable search space.

Motivation

Previous algorithms for facial recognition frequently rely on CNNs that may include metric learning loss functions such as contrastive loss or triplet loss. Without sensitive sample mining strategies, the computational cost for these functions was high. This drawback prompts the redesign of classical softmax loss that cannot discriminate features. Multiple softmax loss functions have since been developed, and including margin-based formulations, they often require fine tuning of parameters and are susceptible to instability. Therefore, researchers need to put in a lot of effort in creating their method in the large design space. AM-LFS takes an optimization approach for selecting hyperparameters for the margin-based softmax functions, but its aforementioned drawbacks are caused by the lack of direction in designing the search space.

To solve the issues associated with hand-tuned softmax loss functions and AM-LFS, the authors attempt to reduce the softmax probability to improve feature discrimination when using margin-based softmax loss functions. The development of margin-based softmax loss with only one parameter required and an improved search space using reward-based method allows the authors to determine the best option for their loss function.

Problem Formulation

Analysis of Margin-based Softmax Loss

Based on the softmax probability and the margin-based softmax probability, the following function can be developed [1]:

[math]\displaystyle{ p_m=\frac{1}{ap+(1-a)}*p }[/math] where [math]\displaystyle{ a=1-e^{s{cos{(\theta_{w_y},x)}-f{(m,\theta_{w_y},x)}}} }[/math] and [math]\displaystyle{ a≤0 }[/math]

[math]\displaystyle{ a }[/math] is considered as a modulating factor and [math]\displaystyle{ h{(a,p)}=\frac{1}{ap+(1-a)} \in (0,1] }[/math] is a modulating function [1]. Therefore, regardless of the margin function (f), the minimization of the softmax probability will ensure success.

Compared to AM-LFS, this method involves only one parameter ([math]\displaystyle{ a }[/math]) that is also constrained, versus AM-LFS which has 2M parameters without constraints that specify the piecewise linear functions the method requires. Also, the piecewise linear functions of AM-LFS ([math]\displaystyle{ p_m={a_i}p+b_i }[/math]) may not be discriminative because it could be larger than the softmax probability.

Random Search

Reward-Guided Search

Optimization

Results and Discussion

Results on LFW, SLLFW, CALFW, CPLFW, AgeDB, DFP

Results on RFW

Results on MegaFace and Trillion-Pairs

Conclusion

In this paper, it is discussed that in order to enhance feature discrimination for face recognition, it is key to know how to reduce the softmax probability. To achieve this goal, unified formulation for the margin-based softmax losses is designed. Two search methods have been developed using a random and a reward-guided loss function and they were validated to be effective over six other methods using nine different test data sets.

Critiques

- Thorough experimentation and comparison of results to state-of-the-art provided a convincing argument. - Datasets used did require some preprocessing, which may have improved the results beyond what the method otherwise would. - AM-LFS was created by the authors for experimentation (the code was not made public) so the comparison may not be accurate. - The test data set they used to test Search-Softmax and Random-Softmax are simple and they saturate in other methods. So the results of their methods didn’t show much advantage since they produce very similar results. More complicated data set needs to be tested to prove the method's reliability.

References

[1] X. Wang, S. Wang, C. Chi, S. Zhang and T. Mei, "Loss Function Search for Face Recognition", in International Conference on Machine Learning, 2020, pp. 1-10.

[2] Li, C., Yuan, X., Lin, C., Guo, M., Wu, W., Yan, J., and Ouyang, W. Am-lfs: Automl for loss function search. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8410–8419, 2019. 2020].

[3] S. L. AI, “Reinforcement Learning algorithms - an intuitive overview,” Medium, 18-Feb-2019. [Online]. Available: https://medium.com/@SmartLabAI/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc. [Accessed: 25-Nov-2020].

[4] “Reinforcement learning,” Wikipedia, 17-Nov-2020. [Online]. Available: https://en.wikipedia.org/wiki/Reinforcement_learning. [Accessed: 24-Nov-2020].

[5] B. Osiński, “What is reinforcement learning? The complete guide,” deepsense.ai, 23-Jul-2020. [Online]. Available: https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/. [Accessed: 25-Nov-2020].