Fairness Without Demographics in Repeated Loss Minimization


This page contains a summary of the paper "Fairness Without Demographics in Repeated Loss Minimization" by Hashimoto, T. B., Srivastava, M., Namkoong, H., & Liang, P., which was published at the International Conference on Machine Learning (ICML) in 2018. In the following, an overview of the paper is given.

Overview of the Paper

Introduction

Fairness

Example and Problem Setup

Why Empirical Risk Minimization (ERM) does not work

Distributionally Robust Optimization (DRO)

Risk Bounding Over Unknown Groups

At this point, our goal is to minimize the worst-case group risk over a single time step, [math]\displaystyle{ \mathcal{R}_{max} (\theta^{(t)}) }[/math]. As previously mentioned, this is difficult because neither the population proportions [math]\displaystyle{ \{a_k\} }[/math] nor the group distributions [math]\displaystyle{ \{P_k\} }[/math] are known. Therefore, Hashimoto et al. developed an optimization technique that is robust "against all directions around the data generating distribution". This means that the distributionally robust optimization (DRO) is robust to the distribution [math]\displaystyle{ P_k }[/math] of any group [math]\displaystyle{ k \in K }[/math] whose population proportion [math]\displaystyle{ a_k }[/math] is greater than or equal to a lowest population proportion [math]\displaystyle{ a_{min} }[/math], which must be specified in practice. To achieve this distributional robustness, the risk function [math]\displaystyle{ \mathcal{R}_{dro} }[/math] has to up-weight data [math]\displaystyle{ Z }[/math] that cause a high loss [math]\displaystyle{ \ell(\theta, Z) }[/math]. In other words, the risk function has to over-represent the mixture components (i.e., the group distributions [math]\displaystyle{ \{P_k\} }[/math]) of groups that suffer high loss relative to their original mixture weights (i.e., the population proportions [math]\displaystyle{ \{a_k\} }[/math]).

To do this, we consider the worst-case loss (i.e., the highest risk) over all perturbations [math]\displaystyle{ Q }[/math] of [math]\displaystyle{ P }[/math] within a certain radius [math]\displaystyle{ r }[/math]. Here, [math]\displaystyle{ r }[/math] is the radius of a chi-squared ball [math]\displaystyle{ \mathcal{B}(P,r) }[/math] around the data-generating distribution [math]\displaystyle{ P }[/math], defined as [math]\displaystyle{ \mathcal{B}(P,r) := \{Q \ll P : D_{\chi^2} (Q || P) \leq r \} }[/math]. The distributionally robust risk is then the worst-case expected loss over this ball, [math]\displaystyle{ \mathcal{R}_{dro}(\theta; r) := \sup_{Q \in \mathcal{B}(P,r)} \mathbb{E}_{Q}[\ell(\theta, Z)] }[/math].
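To make this concrete, the sketch below evaluates the worst-case risk over the chi-squared ball for an empirical sample of per-example losses. It is a minimal illustration, not the authors' implementation: it uses the dual form of the chi-squared-constrained worst case, [math]\displaystyle{ \sup_{Q \in \mathcal{B}(P,r)} \mathbb{E}_Q[\ell] = \inf_{\eta} \left\{ \sqrt{1+r}\, \big(\mathbb{E}_P[(\ell - \eta)_+^2]\big)^{1/2} + \eta \right\} }[/math], which holds when [math]\displaystyle{ D_{\chi^2}(Q || P) }[/math] is taken to be [math]\displaystyle{ \mathbb{E}_P[(dQ/dP - 1)^2] }[/math] (with an extra factor of 1/2 in the divergence the constant becomes [math]\displaystyle{ \sqrt{1+2r} }[/math]). The function name dro_risk, the SciPy-based one-dimensional search over [math]\displaystyle{ \eta }[/math], and the synthetic losses are illustrative choices, not part of the paper.

<pre>
import numpy as np
from scipy.optimize import minimize_scalar

def dro_risk(losses, r):
    """Worst-case risk sup_{Q in B(P, r)} E_Q[loss] for an empirical sample.

    Uses the dual form  inf_eta { sqrt(1 + r) * sqrt(E_P[(loss - eta)_+^2]) + eta },
    valid when the ball is defined by D_chi2(Q||P) = E_P[(dQ/dP - 1)^2] <= r.
    (With a factor 1/2 in the divergence, the constant is sqrt(1 + 2r) instead.)
    """
    losses = np.asarray(losses, dtype=float)
    c = np.sqrt(1.0 + r)

    def dual(eta):
        excess = np.clip(losses - eta, 0.0, None)   # (loss - eta)_+
        return c * np.sqrt(np.mean(excess ** 2)) + eta

    # The dual objective is convex in eta, so a bounded 1-D search suffices.
    span = losses.max() - losses.min() + 1.0
    result = minimize_scalar(dual,
                             bounds=(losses.min() - span, losses.max() + span),
                             method="bounded")
    return result.fun

# Toy example: a minority group (20% of the data) suffers much higher loss.
rng = np.random.default_rng(0)
majority = rng.normal(0.2, 0.05, size=800)   # low-loss group
minority = rng.normal(1.0, 0.10, size=200)   # high-loss group
losses = np.concatenate([majority, minority])

# With the divergence normalization above, r = 1/a_min - 1 suffices for the
# ball to contain any group whose proportion is at least a_min (the paper's
# parameterization of the radius may differ).
a_min = 0.2
r = 1.0 / a_min - 1.0
print("average risk:", losses.mean())        # about 0.36
print("DRO risk    :", dro_risk(losses, r))  # about 1.0, i.e. the minority group's risk
</pre>

In the paper, a dual objective of this form is minimized over the model parameters [math]\displaystyle{ \theta }[/math] during training; the sketch only evaluates the worst-case risk for a fixed vector of losses, which already shows how DRO shifts weight from the average loss toward the high-loss (minority) group.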