Astro2020 Science White Paper
The Next Decade of Astroinformatics and Astrostatistics
Thematic Areas: Planetary Systems; Star and Planet Formation; Formation and Evolution of Compact Objects; Cosmology and Fundamental Physics; Stars and Stellar Evolution; Resolved Stellar Populations and their Environments; Galaxy Evolution; Multi-Messenger Astronomy and Astrophysics
Name: Aneta Siemiginowska
Institution: Center for Astrophysics Harvard & Smithsonian
Chair, AAS Working Group on Astroinformatics and Astrostatistics
Email: email@example.com
Gwendolyn Eadie (1, 2, 3), Ian Czekala (4), Eric Feigelson (5), Eric B. Ford (5), Vinay Kashyap (6), Michael Kuhn (7), Tom Loredo (8), Michelle Ntampaka (9), Abbie Stevens (10, 11), Arturo Avelino (6), Kirk Borne (12), Tamas Budavari (13), Blakesley Burkhart (6), Jessi Cisewski-Kehe (14), Francesca Civano (6), Igor Chilingarian (6), David A. van Dyk (15), Giuseppina Fabbiano (6), Douglas P. Finkbeiner (9), Daniel Foreman-Mackey (16), Peter Freeman (17), Antonella Fruscione (6), Alyssa A. Goodman (9), Matthew Graham (7), Hans Moritz Guenther (18), Jon Hakkila (19), Lars Hernquist (9), Daniela Huppenkothen (2, 3), David J. James (6), Casey Law (4), Joseph Lazio (20), Thomas Lee (21), Mercedes López-Morales (6), Ashish A. Mahabal (22), Kaisey Mandel (23), Xiao-Li Meng (9), John Moustakas (24), Demitri Muna (25), J. E. G. Peek (26, 27), Gordon Richards (28), Stephen K.N. Portillo (2, 3), Jeff Scargle (29), Rafael S. de Souza (30), Joshua S. Speagle (9), Keivan G. Stassun (31), David C. Stenning (15), Stephen R. Taylor (22), Grant R. Tremblay (6), Virginia Trimble (32), Padma A. Yanamandra-Fisher (33), C. Alex Young (34).
Affiliations:
1. eScience Institute, University of Washington, Seattle, WA 98195, USA
2. DIRAC Institute, Department of Astronomy, University of Washington, Seattle, WA 98195, USA
3. Department of Astronomy, University of Washington, Seattle, WA 98195, USA
4. Department of Astronomy, University of California, Berkeley, CA 94720 USA
5. Penn State University, University Park, PA 16802, USA
6. Center for Astrophysics Harvard & Smithsonian, Cambridge, MA 02138, USA
7. California Institute of Technology, Pasadena, CA 91109, USA
8. Cornell University, Cornell Center for Astrophysics and Planetary Science (CCAPS) & Department of Statistical Sciences, Ithaca, NY 14853, USA
9. Harvard University, Cambridge, MA 02138, USA
10. Department of Physics & Astronomy, Michigan State University, East Lansing, MI 48824, USA
11. Department of Astronomy, University of Michigan, Ann Arbor, MI 48109, USA
12. Booz Allen Hamilton, Annapolis Junction, MD, USA
13. Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
14. Department of Statistics & Data Science, Yale University, New Haven, CT 06511, USA
15. Department of Mathematics, Imperial College London, SW7 2AZ, UK
16. Flatiron Institute, Center for Computational Astrophysics, New York, NY 10010
17. Carnegie Mellon University, Pittsburgh, PA, USA
18. Massachusetts Institute of Technology, Kavli Institute for Astrophysics and Space Research, Cambridge, MA 02139, USA
19. Department of Physics & Astronomy, Associate Dean of the Graduate School, University of Charleston, Charleston, SC 29424, USA
20. Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA
21. University of California Davis, CA 95616, USA
22. TAPIR Group, Division of Physics, Mathematics, & Astronomy, California Institute of Technology, Pasadena, CA 91125, USA
23. University of Cambridge, Cambridge, CB3 0HA, UK
24. Department of Physics & Astronomy, Siena College, Loudonville, NY 12211, USA
25. Center for Cosmology and AstroParticle Physics, The Ohio State University, Columbus, OH 43210, USA
26. Department of Physics & Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA
27. Space Telescope Science Institute, Baltimore, MD 21218, USA
28. Drexel University, Department of Physics, Philadelphia, PA 19104
29. Space Science Division, NASA Ames Research Center, Moffett Field, CA 94035-0001
30. Department of Physics & Astronomy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
31. Vanderbilt School of Engineering, Vanderbilt University, Nashville, TN 37235, USA
32. University of California, Irvine, CA 92697, USA
33. Space Science Institute, Boulder, CO 80301, USA
34. NASA Goddard Space Flight Center, Greenbelt, MD 20771 USA
Abstract: Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high-quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are also placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.
1. What is the role of astrostatistics and astroinformatics research?
To develop modern methods for extracting scientific information from astronomical data.
Astrostatistics forms the foundation for robust algorithms and principled methods that are applied to a variety of problems in astronomy. Astroinformatics involves the systematic and disciplined development of code, data management and dissemination techniques, high-performance computing, and machine learning based inference. Both astrostatistics and astroinformatics (i.e., astro data science) have been rapidly emerging fields of research rigorously pursued at the intersection of observational astronomy, statistics, algorithm development, and data science (Borne 2010; Loredo 2012; Feigelson & Babu 2013; van Dyk et al. 2015; STScI Big Data 2016). The number of articles with the keyword ‘Methods: Statistical’ increased by a factor of 2.5 in the past decade; those with ‘machine learning’ increased fourfold over five years; and those with ‘deep learning’ have more than tripled every year since 2015. Thus, the challenges of astronomical sciences reveal a deep and broad demand for advanced methodology and techniques. Astronomical problems that were impossible to approach with traditional methods are now at the forefront of research because of advancements in astrostatistics and astroinformatics.
In the next decade, astronomy data will present new challenges, and will make astrostatistics and astroinformatics research a necessity for nontrivial scientific inference in an increasing range of critical research areas. Astronomy ‘big data’, described by the four V’s (volume, velocity, variety, and veracity), demand new methodologies. It is vitally important that the quality and sophistication of the techniques match the quality and sophistication of the data. The specific application of any new method requires research involving data, statistics, algorithm development, and computation, and thus the combined knowledge and experience of astronomers, statisticians, and computational experts. Cross-disciplinary collaboration and communication at a very high level are critical to such research; conceptual and jargon barriers between disciplines must be overcome.
Several white papers on astrostatistics and astroinformatics research, endorsed by dozens of leaders in the fields, were submitted to the Astro2010 Decadal Survey (Loredo et al. 2009; Borne et al. 2009a, b; Ferguson et al. 2009). Since then, some recommendations have been implemented, such as the formation of the Working Group on Astroinformatics & Astrostatistics within the American Astronomical Society, and the Astrostatistics Interest Group within the American Statistical Association. What remains underdeveloped, however, is the formal recognition of and financial commitment to the efforts needed to make necessary progress in astrostatistics and astroinformatics.
Astrostatistics and astroinformatics research impacts all areas of astronomy and needs to be recognized as a science area within astronomy. Our recommendation is to: (1) create supportive environments for long-term research in astrostatistics and astroinformatics; (2) promote research in this field with specific national level programs, fellowships, professional development, and consulting; and (3) provide sustained funding for long-term research programs.
2. How do modern astrostatistics and astroinformatics methods impact astronomy?
They overcome challenges with data and improve scientific inference.
Astrostatistics and astroinformatics research does not fit traditional thematic boundaries, as it includes both technological development and scientific research in statistical and information sciences. However, these disciplines are now a necessity for modern astronomical research. Tables 1 and 2 highlight recent advances and expected challenges, and indicate the impact of emerging methods in thematic areas of astronomy.
3. How can the state-of-the-art methods be best applied in astronomy?
Through astronomy's involvement in active methodology research.
Existing statistical and machine learning methods need to be further developed to be applicable in astronomy. For example, adapting recent machine learning advances to build explanatory models, rather than task-specific predictive models, requires astronomy's involvement in two active research areas of machine learning:
Scalable probabilistic machine learning (including deep learning): Most ML algorithms seek to make one set of predictions or point estimates, optimal for one specific end task. In astronomy, methods need to quantify uncertainty and provide results (e.g., probabilistic catalogs) that enable uncertainty propagation. Existing methods that are well suited to this are computationally expensive and not easily scalable to large datasets. Scalable approaches using ML are being investigated. Collaboration with statisticians and computer scientists is needed to develop such methods tailored to astronomers’ needs. Astronomy needs to become a driver of this research.
Interpretable machine learning (especially deep learning): Complex machine learning methods are coming to astronomy (e.g., deep learning methods involving convolutional and adversarial nets for analyzing image data [Reiman & Göhre 2018; Fussell & Moews 2018; Pasquet-Itam & Pasquet 2018; Ntampaka et al. 2018], and recurrent neural nets for time series data [Narayan et al. 2019]). Unfortunately, such methods are often used as ‘black box’ predictors, while generalizable understanding of a phenomenon requires an interpretable model. The emerging field of interpretable machine learning involves explanatory goals, not just predictive goals. Astronomy needs to actively participate in this research.
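To make the point about probabilistic catalogs and uncertainty propagation concrete, the following is a minimal Python sketch, not a method taken from this paper. It assumes a hypothetical two-source toy catalog in which each source carries Gaussian posterior draws for flux and distance; the names `toy_catalog`, `draw_posterior`, and `summarize`, and all numerical values, are illustrative assumptions. The draws are propagated to luminosity through L = 4πd²F and compared with the single number obtained from point estimates alone.

```python
import math
import random
import statistics


def luminosity(flux, distance):
    """Isotropic luminosity L = 4 * pi * d**2 * F (arbitrary units)."""
    return 4.0 * math.pi * distance**2 * flux


def draw_posterior(rng, mean, sigma, n):
    """Stand-in for a real posterior: n Gaussian draws around a mean (assumption)."""
    return [rng.gauss(mean, sigma) for _ in range(n)]


def summarize(samples):
    """Median and approximate central 68% interval, read directly from the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return statistics.median(ordered), ordered[int(0.16 * n)], ordered[int(0.84 * n)]


rng = random.Random(42)
N_SAMPLES = 20000

# Hypothetical toy catalog: (mean, sigma) of the flux and distance posteriors.
toy_catalog = {
    "source_a": {"flux": (1.0, 0.1), "distance": (10.0, 2.0)},
    "source_b": {"flux": (0.5, 0.05), "distance": (5.0, 1.0)},
}

results = {}
for name, entry in toy_catalog.items():
    flux_samples = draw_posterior(rng, *entry["flux"], N_SAMPLES)
    dist_samples = draw_posterior(rng, *entry["distance"], N_SAMPLES)

    # Propagate sample by sample, so the luminosity keeps its full distribution.
    lum_samples = [luminosity(f, d) for f, d in zip(flux_samples, dist_samples)]
    median, lo, hi = summarize(lum_samples)

    # Point-estimate catalog: only the means survive, so the spread is lost.
    lum_point = luminosity(statistics.fmean(flux_samples),
                           statistics.fmean(dist_samples))

    results[name] = {
        "median": median,
        "lo": lo,
        "hi": hi,
        "mean": statistics.fmean(lum_samples),
        "point": lum_point,
    }

for name, r in results.items():
    print(f"{name}: median={r['median']:.1f} "
          f"68% interval=({r['lo']:.1f}, {r['hi']:.1f}) "
          f"mean={r['mean']:.1f} point={r['point']:.1f}")
```

In this toy setup the sample-based mean luminosity sits above the point estimate, because luminosity is nonlinear in distance and the distance uncertainty therefore shifts the expected value. A catalog of point estimates cannot recover that shift or the interval, which is the kind of information a probabilistic catalog is meant to preserve.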
The compilations in Tables 1 and 2 highlight two important facts: (1) common methodology is repeatedly used with small alterations across different wave-bands to address diverse problems that span many thematic areas; and (2) duplicated development efforts slow the pace of advance. To facilitate the faster development and dissemination of advanced methods we recommend:
Funding: Astrostatistics and astroinformatics must be recognized as a subfield of astronomical research that affects all of its thematic areas. Proposals in this field must be evaluated by appropriately cross-disciplinary panels.
Communication: Astronomy conferences must make room for methodological discussion, both to disseminate new advances and to raise awareness among non-experts. Funding for tutorials and other means of communication should be encouraged.
Sustainability: There must be sustained funding through grants and fellowships, to support graduate students and post-docs for several years. Astronomy departments should be encouraged to have more tenure-track positions focused on data science research.
Infrastructure: There must be support for maintaining data archives and training data sets, for publicly available and supported software, and for efficient computing.