When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, ℓ2-consistency and Neuroscience Applications: Summary
Main Contributions of the Research Article:
- The main result is a hypothesis test that evaluates whether pooling data from multiple sites for regression, either before or after correcting for site-specific distributional shifts, improves the estimation of the coefficients of interest as measured by mean squared error, while allowing for the influence of a set of confounding variables.
- Shows how pooling can be used even when the features differ across sites. To support this, the authors derive an ℓ2-consistency rate that justifies using the sparse multi-task Lasso when sparsity patterns are not identical across sites.
- Experimental results on early detection of Alzheimer’s disease (AD) in humans, showing consistent acceptance power for the test.
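The pooling decision in the first contribution rests on a bias-variance trade-off: pooling a second site lowers the variance of the estimate but adds bias when the sites' coefficients differ. The minimal sketch below makes this concrete for a single slope with no intercept, fixed designs, and a known noise level sigma, all simplifying assumptions. It is not the paper's test statistic, which must also handle unknown coefficients, multiple features, confounders, and shift correction, and the helper names (`single_site_mse`, `pooled_mse`, `pooling_ratio`, `simulate_mse`) are hypothetical. Under these assumptions, with delta = beta2 - beta1 and S_t the sum of squared inputs at site t, pooling lowers the MSE for the site-1 coefficient exactly when delta^2 / (sigma^2 (1/S1 + 1/S2)) < 1.

```python
import random


def single_site_mse(sigma, s1):
    """Exact MSE of the site-1-only OLS slope (no intercept, fixed design).

    The estimator is unbiased, so its MSE is its variance sigma^2 / S1,
    where S1 is the sum of squared inputs at site 1.
    """
    return sigma ** 2 / s1


def pooled_mse(sigma, s1, s2, delta):
    """Exact MSE of the pooled OLS slope, used as an estimate of beta1.

    delta = beta2 - beta1 is the (here assumed known) coefficient shift.
    Pooling shrinks the variance to sigma^2 / (S1 + S2) but adds a bias
    of delta * S2 / (S1 + S2).
    """
    variance = sigma ** 2 / (s1 + s2)
    bias = delta * s2 / (s1 + s2)
    return variance + bias ** 2


def pooling_ratio(sigma, s1, s2, delta):
    """delta^2 / (sigma^2 * (1/S1 + 1/S2)); pooling lowers the MSE iff this is < 1."""
    return delta ** 2 / (sigma ** 2 * (1.0 / s1 + 1.0 / s2))


def simulate_mse(beta1, delta, sigma, x1, x2, trials=2000, seed=0):
    """Monte Carlo check: empirical MSE of the single-site and pooled slopes."""
    rng = random.Random(seed)
    beta2 = beta1 + delta
    s1 = sum(x * x for x in x1)
    s2 = sum(x * x for x in x2)
    err_single = err_pooled = 0.0
    for _ in range(trials):
        # Fresh Gaussian noise at both sites; the designs x1, x2 stay fixed.
        xy1 = sum(x * (beta1 * x + rng.gauss(0.0, sigma)) for x in x1)
        xy2 = sum(x * (beta2 * x + rng.gauss(0.0, sigma)) for x in x2)
        err_single += (xy1 / s1 - beta1) ** 2
        err_pooled += ((xy1 + xy2) / (s1 + s2) - beta1) ** 2
    return err_single / trials, err_pooled / trials


if __name__ == "__main__":
    x1 = [1.0, -1.0] * 10  # small site: S1 = 20
    x2 = [1.0, -1.0] * 40  # larger site: S2 = 80
    s1, s2 = sum(v * v for v in x1), sum(v * v for v in x2)
    for delta in (0.0, 0.1, 1.0):
        print(delta, pooling_ratio(1.0, s1, s2, delta),
              simulate_mse(0.5, delta, 1.0, x1, x2))
```

With S1 = 20, S2 = 80 and sigma = 1, the single-site MSE is 0.05; for delta = 0.1 the ratio is 0.16 and the pooled MSE is 0.0164 (pooling helps), while for delta = 1 the ratio is 16 and the pooled MSE is 0.65 (pooling hurts). Because delta is unknown in practice, a test of this kind has to estimate such a ratio from data and account for its sampling distribution; that inferential step is omitted here.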
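For the second contribution, a minimal sketch of a sparse multi-task Lasso fitted by proximal gradient descent (ISTA) is given below. It assumes the common formulation sum_t (1/(2 n_t)) ||y_t - X_t b_t||^2 + lam1 * sum_{j,t} |B[j][t]| + lam2 * sum_j ||B[j]||_2, where row B[j] holds feature j's coefficients across sites. The row-wise ℓ2 term encourages a feature to be selected jointly across sites, while the elementwise ℓ1 term lets it be zero at one site and nonzero at another, which is the setting of non-identical sparsity patterns. The paper's exact objective, weights, and solver may differ; the function names and toy data are hypothetical, and the code is plain Python for readability rather than speed.

```python
import math


def soft_threshold(v, thr):
    """Elementwise prox of thr * |v|."""
    return math.copysign(max(abs(v) - thr, 0.0), v)


def prox_sparse_group(row, l1, l2):
    """Prox of l1*||row||_1 + l2*||row||_2: soft-threshold entries, then shrink the row."""
    z = [soft_threshold(v, l1) for v in row]
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= l2:
        return [0.0] * len(z)  # whole feature dropped at every site
    scale = 1.0 - l2 / norm
    return [scale * v for v in z]


def sparse_multitask_lasso(Xs, ys, lam1, lam2, iters=500):
    """ISTA for the assumed sparse multi-task Lasso objective.

    Xs[t] is a list of n_t rows (each of length p) and ys[t] the n_t responses
    of site t. Returns B as p rows by T columns: B[j][t] is feature j at site t.
    """
    T, p = len(Xs), len(Xs[0][0])
    # trace(X_t^T X_t) / n_t bounds the largest eigenvalue of X_t^T X_t / n_t,
    # so 1 / lips is a safe (conservative) step size.
    lips = max(sum(v * v for row in X for v in row) / len(X) for X in Xs)
    step = 1.0 / lips
    B = [[0.0] * T for _ in range(p)]
    for _ in range(iters):
        # Gradient of the squared loss: X_t^T (X_t b_t - y_t) / n_t per site.
        grad = [[0.0] * T for _ in range(p)]
        for t, (X, y) in enumerate(zip(Xs, ys)):
            n = len(X)
            for xi, yi in zip(X, y):
                resid = yi - sum(xi[j] * B[j][t] for j in range(p))
                for j in range(p):
                    grad[j][t] -= xi[j] * resid / n
        # Gradient step followed by the row-wise proximal step.
        B = [prox_sparse_group([B[j][t] - step * grad[j][t] for t in range(T)],
                               step * lam1, step * lam2)
             for j in range(p)]
    return B


if __name__ == "__main__":
    # Hypothetical toy data: four orthogonal +/-1 design rows shared by both sites.
    X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
    y1 = [2.0 * r[0] + 0.05 * r[1] for r in X]  # site 1: feature 1 barely matters
    y2 = [2.0 * r[0] + 1.5 * r[1] for r in X]   # site 2: feature 1 matters
    for row in sparse_multitask_lasso([X, X], [y1, y2], lam1=0.1, lam2=0.1):
        print([round(v, 3) for v in row])
```

On this toy data, feature 0 is kept at both sites, feature 2 is dropped at both, and feature 1 is zeroed at site 1 (coefficient exactly 0.0) but kept at site 2 (about 1.3). Setting lam1 = 0 leaves only the row-wise ℓ1/ℓ2 penalty, the one used by scikit-learn's `MultiTaskLasso`, and then both entries of feature 1's row stay nonzero, since that penalty can only select or drop a feature for all tasks together.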