Research interests: I am interested in mathematical statistics, from a theoretical and algorithmic point of view, with applications to computational biology.
Keywords: High-dimensional statistics, sparse multi-linear models, graphical models, optimization, gene regulatory network inference, multi-omics data integration, applications to medical research.
Current projects:
In collaboration with Mathilde Pacault (MD, from Brest), we are working on Sequential Probability Ratio Tests for applications to non-invasive prenatal diagnosis (NIPD).
Keywords: Sequential probability ratio tests, applications to medical research.
As part of Camille Champion's PhD work (IMT Toulouse), I am also interested in sparse robust spectral clustering techniques for clustering large perturbed graphs.
For two years, I was part of the LIONS project, funded by INSERM and the French National Cancer Institute. The main goal of this project was to develop mathematical models to identify deregulated transcription factors involved in specific subtypes of bladder cancer.
Keywords: Deregulation, Gene regulatory networks, Cancer systems biology.
I previously worked at the Stanford Center for Biomedical Informatics Research. My main goal was to create statistical models that integrate multi-omics data in order to identify cancer driver genes and to understand their roles within the genome.
Keywords: Multi-omics data fusion, Cancer driver gene discovery, Module network.
PhD work: My PhD work was devoted to the theoretical analysis and the use of statistical and optimization methods in the context of gene regulatory networks. Such networks are powerful tools for representing and analyzing complex biological systems, and they enable the modeling of functional relationships between the elements of these systems. The first part of this work was dedicated to statistical learning methods for inferring networks from sparse linear regressions in a high-dimensional setting, with a particular focus on the L2-Boosting algorithm. From a theoretical point of view, consistency and support stability results were obtained under conditions on the dimension of the problem.
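As an illustration of the kind of procedure studied in this first part, here is a minimal sketch of componentwise L2-Boosting for sparse linear regression. It is a generic textbook-style version, not the exact algorithm analyzed in the thesis; the function name `l2_boosting`, the parameters `n_steps` and `step_size`, and the simulated data are all assumptions introduced for this example.

```python
import numpy as np

def l2_boosting(X, y, n_steps=200, step_size=0.1):
    """Componentwise L2-Boosting (illustrative sketch, hypothetical helper).

    At each step, the residual is regressed on every single column of X,
    the column giving the largest drop in residual sum of squares is
    selected, and its coefficient is moved by a shrunken least-squares step.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()
    col_norms_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        # Least-squares coefficient of the residual on each column taken alone.
        coefs = X.T @ residual / col_norms_sq
        # Reduction in residual sum of squares achieved by each column.
        gains = coefs ** 2 * col_norms_sq
        j = int(np.argmax(gains))
        # Shrunken update of the selected coefficient and of the residual.
        beta[j] += step_size * coefs[j]
        residual -= step_size * coefs[j] * X[:, j]
    return beta

# Simulated high-dimensional example (p > n) with a sparse signal.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

beta_hat = l2_boosting(X, y)
print("indices of the largest coefficients:", np.argsort(-np.abs(beta_hat))[:5])
```

Because each step updates a single coefficient, stopping after a moderate number of steps leaves most coefficients at exactly zero, which is what makes the method usable when the number of variables exceeds the number of observations.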
The second part dealt with the use of L2-Boosting algorithms to learn Sobol indices in a sensitivity analysis setting. The estimation of these indices was based on the functional ANOVA decomposition of the model. The elements of this decomposition were estimated in two steps: a Hierarchical Orthogonal Gram-Schmidt procedure, used to build an approximation of the analytical basis, followed by an L2-Boosting algorithm, used to obtain a sparse approximation of the signal. We showed that the resulting estimator is consistent in a setting where the approximation dictionary is noisy.
The last part concerned the development of optimization methods to estimate relationships in networks. We showed that the minimization of the negative log-likelihood could be written as an optimization problem with two components: finding the structure of the complete graph (that is, the ordering of the nodes), and then making the graph sparse. To solve it, we developed GADAG, a combination of a convex program and a tailored genetic algorithm adapted to the particular structure of our problem.
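The two-component problem described above can be sketched as follows. The notation is an assumption introduced for illustration and does not appear on this page: $X$ is the $n \times p$ data matrix, $P$ a $p \times p$ permutation matrix encoding the node ordering, $T$ a strictly lower-triangular $p \times p$ matrix of edge weights, and $\lambda > 0$ a penalty level. The sketch also assumes a Gaussian linear model with equal noise variances, so that the negative log-likelihood reduces to a least-squares term, and an l1 penalty to enforce sparsity.

```latex
\min_{P,\,T} \;\; \frac{1}{n} \left\lVert X - X P T P^{\top} \right\rVert_F^2 + \lambda \lVert T \rVert_1
\quad \text{subject to } P \text{ a permutation matrix and } T \text{ strictly lower triangular.}
```

For a fixed ordering $P$, the problem in $T$ is convex, which is where a convex program can be used; the search over orderings is combinatorial, which is where a heuristic such as a genetic algorithm comes in.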
Champion, C., Champion, M., Blazère, M. and Loubes, J.M., l1-spectral clustering algorithm: a robust spectral clustering using Lasso regularization.
Submitted.
International journals and conference proceedings:
The 12th International Conference on Bioinformatics and Computational Biology (BICOB) "Identification of deregulated transcription factors in specific bladder cancer subtypes", San Francisco (USA), 23rd of March 2020. Canceled due to coronavirus.
Intelligent Systems for Molecular Biology (ISMB) "Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response", Boston (USA), 9th of July 2018. Presented by O. Gevaert.
The 16th International Conference on Bioinformatics (InCoB), poster: "Identification of deregulated transcription factors in specific subtypes of cancer", Shenzhen (China), 22nd of September 2017.
Keystone symposia on Molecular and Cellular Biology: The Cancer Genome "Pancancer module analysis captures major oncogenic pathways and identifies master regulator of immune response", Banff (Canada), 7th of February 2016.
14th Annual International Conference on Critical Assessment of Massive Data Analysis "Multi-omics data fusion for cancer data", Dublin (Ireland), 11th of July 2015. Presented by O. Gevaert.
SIAM Conference on Uncertainty Quantification "L2-Boosting on Generalized Hoeffding Decomposition for Dependent Variables. Application to Sensitivity Analysis", Savannah (USA), 31st of March 2014.
NIPS Workshop Machine Learning for Computational Biology, poster: "An L2-Boosting algorithm for sparse multivariate regression: application to gene network recovery", Sierra Nevada (Spain), 17th of December 2011.
National conferences:
6ièmes Rencontres R "GADAG : un paquet R dédié à l'inférence de Graphes Acycliques Dirigés par maximum de vraisemblance pénalisé", Anglet (France), 06/30/17.
49ièmes Journées de Statistique de la SFDS "Inférence de Graphes Acycliques Dirigés par maximum de vraisemblance pénalisé", Avignon (France), 05/30/17.
Stanford Cancer Institute Trainees Symposium "Pancancer module analysis captures major oncogenic pathways and identifies master regulator of immune response", Stanford (USA), 02/23/16.
StatMathAppli 2015 "Multi-omics data fusion for cancer data", Fréjus (France), 08/31/15.
45ièmes Journées de Statistique de la SFDS "Résultats sur les algorithmes de L2-Boosting pour les régressions parcimonieuses", Toulouse (France), 05/27/13.
Other talks:
"l1-spectral clustering algorithm: a robust spectral clustering using Lasso regularization": 01/08/21 at University of Rennes.
"Sparse regression and optimization in high-dimensional framework for Gene Regulatory Network inference: application to cancer data": 05/23/19 at University of Rouen, 03/05/18 at University of Montpellier, 01/15/18 at AgroParisTech, 10/09/17 at University of Angers, 02/10/17 at University of Rennes, 01/30/17 at University of Marseille, 01/18/17 at University of Lille I, 12/06/16 at University of Toulouse III, 11/24/16 at INRIA Nancy.
"Identification of deregulated transcription factors in bladder cancer": 11/27/19 at University of Lille, 10/14/16 at the Colloquium CARTABLE for Network Learning in Toulouse.
"AMARETTO: a multi-omics data integration framework": 02/19/19 at University of Bordeaux, 12/06/18 at INRIA Nancy, 12/14/17 at University of Montpellier, 09/08/17 at INRA Toulouse, 01/21/16 at Stanford University and 09/25/15 webtalk for the Cancer Target Discovery Network.
"Sparse regression and optimization in high-dimensional framework: application to Gene Regulatory Networks": 03/17/15 at INRIA Lille, 11/06/14 at University of Nantes, 10/07/14 at University of Toulouse I.
"Statistical causal inference of Gene Regulatory Networks using l1 penalized likelihood" at INRA Jouy-en-Josas.
"An hybrid convex/greedy algorithm for learning DAG": 06/16/14 at University of Strasbourg, 03/20/14 at MIA Colloquium in Ecully.
"Convex optimization for learning Gene Regulatory Network": 09/12/13 for the NETBIO workgroup in Paris, 06/14/13 at INRA Toulouse.
"Résultats sur les algorithmes de L2-Boosting pour les régressions sparses : cadre formel et extensions à la situation multivariée": 02/04/13 INRA Montpellier.