
Machine Learn the Roman Universe

Wide-Field Science – Regular
Shirley Ho, New York University, PI

The Roman Space Telescope (Roman) and its large-scale structure (LSS) survey will provide data of unprecedented information content with which to address fundamental questions about our Universe, such as its origin, contents, and future. Progress in any one of these directions could constitute a groundbreaking discovery in physical cosmology. However, nonlinear gravitational evolution makes it challenging to extract the relevant information with traditional methods: none of the methods currently deployed by LSS surveys can extract the full information content of the data.

To address this challenge, we propose to develop three machine learning (ML)-based methods that learn the information in the data and determine the cosmological parameters and initial conditions of the Universe. These methods have the potential to extract information from the Roman LSS data optimally, in the information-theoretic sense. The first method is based on a Bayesian statistical inference framework in which the initial conditions are first reconstructed and then used to learn the data likelihood. The second method is based on unsupervised learning: a normalizing flow learns the data likelihood as a function of the cosmological parameters. The third method uses diffusion models, a class of score-based generative models, to generate posterior samples of the initial conditions and properties of the Universe from the nonlinear large-scale structure.
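To make the second method concrete, a conditional normalizing flow gives an exact likelihood through the change-of-variables formula, log p(x | θ) = log p_z(f⁻¹(x; θ)) + log|det ∂f⁻¹/∂x|, where f maps a standard-normal base variable z to the data x. The toy sketch below is not the proposal's model: the two affine layers, their shift and scale functions of θ, the data vector `x_obs`, and the grid over θ are all hypothetical. In practice the shift and scale would be neural networks of θ (and of the data), trained on simulations.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One conditional affine layer: y = shift(theta) + exp(log_scale(theta)) * u.
# In a real flow these would be neural networks of theta (hypothetical here).
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def log_likelihood(x: Sequence[float], theta: float, layers: List[Layer]) -> float:
    """Exact log p(x | theta) by the change-of-variables formula.

    The generative map runs z -> layers[0] -> ... -> layers[-1] -> x, with
    z ~ N(0, I). We invert it layer by layer, from the last layer back to z,
    accumulating log|det(dz/dx)|.
    """
    z = list(x)
    log_det = 0.0
    for shift, log_scale in reversed(layers):
        s = log_scale(theta)
        z = [(zi - shift(theta)) * math.exp(-s) for zi in z]
        log_det -= s * len(z)  # each dimension contributes -s
    log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return log_base + log_det


# Hypothetical two-layer flow: x = 0.5 * (theta + z), i.e. x ~ N(theta / 2, 0.5**2).
layers: List[Layer] = [
    (lambda th: th, lambda th: 0.0),             # shift the base sample by theta
    (lambda th: 0.0, lambda th: math.log(0.5)),  # then rescale by 0.5
]

# Hypothetical "observed" data vector and a flat-prior grid over theta.
x_obs = [0.45, 0.55, 0.50]
theta_grid = [0.1 * i for i in range(21)]
log_post = [log_likelihood(x_obs, th, layers) for th in theta_grid]
theta_hat = theta_grid[max(range(len(theta_grid)), key=lambda i: log_post[i])]
print(f"maximum-likelihood theta on the grid: {theta_hat:.1f}")
```

Because the flow is invertible with a tractable Jacobian, the likelihood is exact rather than approximated by summary statistics; with a flat prior, the grid of log-likelihoods is an unnormalized log-posterior. For realistic parameter counts, a sampler such as MCMC would replace the grid.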

We will pay special attention to the robustness of all methods against systematic errors and astrophysical effects, by introducing astrophysical nuisance parameters that can be marginalized over and by exploiting scale-separation information. An important contribution of this proposal is the generation of mock survey datasets via deep-learning-accelerated simulations of galaxy surveys; these mocks will also serve as a testbed for our ML methods. As an example use case, we will apply these tools to extracting information about the initial conditions of the Universe, via primordial non-Gaussianity, from space-based galaxy survey data.
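For orientation, the most widely used parameterization of primordial non-Gaussianity is the local type, in which the primordial potential Φ is a Gaussian field φ plus a quadratic correction with amplitude f_NL. The abstract does not say which shape of non-Gaussianity will be targeted, so the local form below is an assumption. In galaxy surveys, local-type f_NL produces a scale-dependent correction to the linear galaxy bias that grows toward large scales, which is one reason large-volume surveys are sensitive to it:

```latex
\Phi(\mathbf{x}) = \phi(\mathbf{x}) + f_{\mathrm{NL}}\left[\phi^{2}(\mathbf{x}) - \langle \phi^{2} \rangle\right],
\qquad
\Delta b(k, z) \propto f_{\mathrm{NL}}\,\frac{b - 1}{k^{2}\,T(k)\,D(z)}
```

Here b is the Gaussian linear bias, T(k) the matter transfer function, and D(z) the linear growth factor. The proportionality constant (involving the collapse threshold, Ω_m, and H_0) and the sign convention for f_NL depend on normalization choices that differ between references, so they are omitted.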

A second major goal of the proposal is to develop a community framework within which different ML methods can be tested and compared. We will create deep-learning-accelerated simulated datasets with survey realism that can be used for benchmarking and for blind analyses of different methods. We will promote open-access ML tools by releasing both the software and the simulated datasets into the public domain and by providing community support for these products. We will also encourage community engagement through data challenges.

This study will provide new ML methods that promise to considerably increase the information extracted relative to existing methods of LSS analysis. This could expand the potential of Roman, and also of other space-based LSS missions, to illuminate the fundamental physics of the Universe.