Fairness Without Demographics in Repeated Loss Minimization
This page contains a summary of the paper "Fairness Without Demographics in Repeated Loss Minimization" by Hashimoto, T. B., Srivastava, M., Namkoong, H., & Liang, P., which was published at the International Conference on Machine Learning (ICML) in 2018. In the following, an
Overview of the Paper
Introduction
Fairness
Example and Problem Setup
Why Empirical Risk Minimization (ERM) does not work
Distributionally Robust Optimization (DRO)
Risk Bounding Over Unknown Groups
At this point, our goal is to minimize the worst-case group risk over a single time-step, [math]\displaystyle{ \mathcal{R}_{max} (\theta^{(t)}) }[/math]. As previously mentioned, this is difficult because neither the population proportions [math]\displaystyle{ \{a_k\} }[/math] nor the group distributions [math]\displaystyle{ \{P_k\} }[/math] are known. Therefore, Hashimoto et al. developed an optimization technique that is robust "against all directions around the data generating distribution". This refers to the fact that distributionally robust optimization (DRO) is robust to any group distribution [math]\displaystyle{ P_k }[/math] of a group [math]\displaystyle{ k \in K }[/math], provided that the population proportion [math]\displaystyle{ a_k }[/math] of this group is greater than or equal to the lowest population proportion [math]\displaystyle{ a_{min} }[/math] (which is specified in practice). To achieve this distributional robustness, the optimization's risk function [math]\displaystyle{ \mathcal{R}_{dro} }[/math] has to "up-weight" data [math]\displaystyle{ Z }[/math] that cause a high loss [math]\displaystyle{ \ell(\theta, Z) }[/math]. In other words, for groups that suffer high loss, the risk function has to over-represent the mixture components (i.e. the group distributions [math]\displaystyle{ \{P_k\} }[/math]) relative to their original mixture weights (i.e. the population proportions [math]\displaystyle{ \{a_k\} }[/math]). To do this, we need to consider the worst-case loss (i.e. the highest risk) over all perturbations [math]\displaystyle{ P_k }[/math] around [math]\displaystyle{ P }[/math] within a certain range [math]\displaystyle{ r }[/math]. This range is defined as the radius [math]\displaystyle{ r }[/math] of a chi-squared ball [math]\displaystyle{ \mathcal{B}(P,r) }[/math] around the probability distribution [math]\displaystyle{ P }[/math]. This ball is defined so that