Revision as of 16:33, 19 October 2018

This page contains a summary of the paper "Fairness Without Demographics in Repeated Loss Minimization" by Hashimoto, T. B., Srivastava, M., Namkoong, H., & Liang, P., which was published at the International Conference on Machine Learning (ICML) in 2018. In the following, an

Overview of the Paper

Introduction

Fairness

Example and Problem Setup

Why Empirical Risk Minimization (ERM) does not work

Distributionally Robust Optimization (DRO)

Risk Bounding Over Unknown Groups

At this point our goal is to minimize the worst-case group risk over a single time-step, [math]\displaystyle{ \mathcal{R}_{max} (\theta^{(t)}) }[/math]. As previously mentioned, this is difficult because neither the population proportions [math]\displaystyle{ \{a_k\} }[/math] nor the group distributions [math]\displaystyle{ \{P_k\} }[/math] are known. Therefore, Hashimoto et al. developed an optimization technique that is robust "against all directions around the data generating distribution". This refers to the fact that this distributionally robust optimization (DRO) is robust to any group distribution [math]\displaystyle{ P_k }[/math] of a group [math]\displaystyle{ k \in K }[/math], provided that the population proportion [math]\displaystyle{ a_k }[/math] of this group is greater than or equal to the lowest population proportion [math]\displaystyle{ a_{min} }[/math] (which is specified in practice).

To achieve this distributional robustness, the risk function [math]\displaystyle{ \mathcal{R}_{dro} }[/math] of the optimization has to "up-weight" data [math]\displaystyle{ Z }[/math] that cause high loss [math]\displaystyle{ \ell(\theta, Z) }[/math]. In other words, for groups that suffer high loss, the risk function has to over-represent the mixture components (i.e. the group distributions [math]\displaystyle{ \{P_k\} }[/math]) relative to their original mixture weights (i.e. the population proportions [math]\displaystyle{ \{a_k\} }[/math]). To do this, we consider the worst-case loss (i.e. the highest risk) over all distributions [math]\displaystyle{ Q }[/math] that lie within a certain radius [math]\displaystyle{ r }[/math] of [math]\displaystyle{ P }[/math]. Here [math]\displaystyle{ r }[/math] is the radius of a chi-squared ball [math]\displaystyle{ \mathcal{B}(P,r) }[/math] around the probability distribution [math]\displaystyle{ P }[/math], defined as [math]\displaystyle{ \mathcal{B}(P,r) := \{Q \ll P : D_{\chi^2} (Q \| P) \leq r \} }[/math].
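To make the chi-squared ball more concrete, the following is a minimal sketch for discrete distributions. It assumes the common form of the divergence, [math]\displaystyle{ D_{\chi^2}(Q \| P) = \sum_i p_i (q_i / p_i - 1)^2 }[/math]; the normalization is an assumption, since the text above does not spell it out. The function names, the toy distributions and the radii are hypothetical and chosen only for illustration. Note that `worst_case_risk` only searches a finite list of candidate distributions, so it gives a lower bound on the true worst case over the whole ball and is not the optimization procedure of the paper.

```python
import math


def chi2_divergence(q, p):
    """Chi-squared divergence D(Q || P) = sum_i p_i * (q_i / p_i - 1)^2.

    Assumed normalization (no factor 1/2). Returns infinity when Q puts
    mass where P has none, i.e. when Q << P does not hold.
    """
    total = 0.0
    for q_i, p_i in zip(q, p):
        if p_i == 0.0:
            if q_i > 0.0:
                return math.inf
            continue
        total += p_i * (q_i / p_i - 1.0) ** 2
    return total


def in_chi2_ball(q, p, r):
    """True if Q lies in the ball B(P, r) = {Q << P : D(Q || P) <= r}."""
    return chi2_divergence(q, p) <= r


def risk(dist, losses):
    """Expected loss of a discrete distribution over per-outcome losses."""
    return sum(d * l for d, l in zip(dist, losses))


def worst_case_risk(candidates, p, losses, r):
    """Highest risk among P and the candidates that lie inside B(P, r).

    Only a finite list is searched, so this is a lower bound on the
    supremum over the full ball.
    """
    feasible = [q for q in candidates if in_chi2_ball(q, p, r)]
    return max([risk(p, losses)] + [risk(q, losses) for q in feasible])


# Toy setup (hypothetical): a majority group with weight 0.8 on outcomes 0-1
# and a minority group with weight 0.2 on outcomes 2-3.
majority = [0.5, 0.5, 0.0, 0.0]
minority = [0.0, 0.0, 0.5, 0.5]
p = [0.8 * a + 0.2 * b for a, b in zip(majority, minority)]
losses = [0.1, 0.1, 0.9, 0.9]  # the minority outcomes suffer high loss

print("risk under P:", risk(p, losses))
print("D(minority || P):", chi2_divergence(minority, p))
print("worst case, small radius:", worst_case_risk([majority, minority], p, losses, 1.0))
print("worst case, large radius:", worst_case_risk([majority, minority], p, losses, 4.5))
```

With the small radius the minority distribution falls outside the ball, so the worst case stays at the average risk under [math]\displaystyle{ P }[/math]; with the larger radius the ball contains the minority distribution and the worst-case risk rises to that group's risk. This is the sense in which a large enough ball up-weights high-loss data without knowing group membership.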

