Our work on distributed partial least squares (PLS) has been accepted for an oral presentation at SIPAIM 2016. An earlier version of this paper was presented by Marco Lorenzi at MASAMB (Mathematical and Statistical Aspects of Molecular Biology) in Cambridge in early October.
Title: Secure multivariate large-scale multi-centric analysis through on-line learning: an imaging genetics case study
Authors: Marco Lorenzi, Boris Gutman, Paul M. Thompson, Daniel C. Alexander, Sebastien Ourselin, Andre Altmann
Abstract: State-of-the-art data analysis methods in genetics and related fields have advanced beyond massively univariate analyses. However, these methods suffer from the limited amount of data available at a single research site. Recent large-scale multi-centric imaging-genetic studies, such as ENIGMA, have to rely on meta-analysis of mass univariate models to achieve critical sample sizes for uncovering statistically significant associations. Indeed, model parameters, but not data, can be securely and anonymously shared between partners. We propose here partial least squares (PLS) as a multivariate imaging-genetics model in meta-studies. In particular, we propose an online estimation approach to partial least squares for the sequential estimation of the model parameters in data batches, based on an approximation of the singular value decomposition (SVD) of partitioned covariance matrices. We applied the proposed approach to the challenging problem of modeling the association between 1,167,117 genetic markers (SNPs, single nucleotide polymorphisms) and the brain cortical and sub-cortical atrophy (354,804 anatomical surface features) in a cohort of 639 individuals from the Alzheimer’s Disease Neuroimaging Initiative. We compared two different modeling strategies (sequential- and meta-PLS) to the classic non-distributed PLS. Both strategies exhibited only minimal approximation errors of model parameters. The proposed approaches pave the way to the application of multivariate models in large scale imaging-genetics meta-studies, and may lead to novel understandings of the complex brain phenotype-genotype interactions.
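To give a flavour of what sequential estimation over data batches can look like, here is a minimal sketch in Python with NumPy. It is not the algorithm from the paper. It is a generic low-rank SVD update of the cross-covariance X^T Y, which is the matrix whose leading singular vectors give the PLS weights in the SVD formulation of PLS. The function names (update_low_rank_svd, sequential_pls), the rank parameter, and the assumption that every batch has already been centred with shared means are all illustrative choices made for this sketch.

```python
import numpy as np


def update_low_rank_svd(U, s, V, X_batch, Y_batch, rank):
    """Fold one batch's cross-covariance X_batch.T @ Y_batch into a
    rank-limited SVD (U, s, V) of the running cross-covariance.

    The full p x q matrix is never formed: the sum
    U diag(s) V^T + X_batch^T Y_batch is written as A @ B.T and
    decomposed through two thin QR factorisations.
    """
    A = np.hstack([U * s, X_batch.T])  # p x (r + n_batch)
    B = np.hstack([V, Y_batch.T])      # q x (r + n_batch)
    Qa, Ra = np.linalg.qr(A)
    Qb, Rb = np.linalg.qr(B)
    # SVD of the small core matrix, then rotate back to the original spaces.
    Us, s_new, Vts = np.linalg.svd(Ra @ Rb.T)
    U_new = Qa @ Us[:, :rank]
    V_new = Qb @ Vts[:rank].T
    return U_new, s_new[:rank], V_new


def sequential_pls(batches, rank):
    """Visit (X, Y) batches one at a time, keeping only a rank-`rank`
    summary between batches. Assumes all batches are already centred."""
    first_X, first_Y = batches[0]
    p, q = first_X.shape[1], first_Y.shape[1]
    # Start from an empty decomposition (zero components).
    U, s, V = np.zeros((p, 0)), np.zeros(0), np.zeros((q, 0))
    for X, Y in batches:
        U, s, V = update_low_rank_svd(U, s, V, X, Y, rank)
    return U, s, V
```

In this sketch only the factors U, s and V move from one batch to the next, so a centre never needs to expose its individual-level data. The truncation to a fixed rank after each batch is what makes the result an approximation of the non-distributed decomposition.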