Dr. Leo Singer
NASA TOPS—Success Stories of Open Science Series
Get involved as early as possible and contribute to others’ projects: Q&A with Dr. Leo Singer on open science practices
Dr. Leo Singer has been a research astrophysicist at NASA’s Goddard Space Flight Center since 2015, working on the General Coordinates Network (GCN)—NASA’s next-generation time domain and multimessenger astronomy alert system. With a background in gravitational waves, his research projects concern ground-based optical telescopes and multiwavelength follow-up. He also works on real-time analysis of the Laser Interferometer Gravitational-Wave Observatory (LIGO) data, signal processing, Bayesian inference, and synoptic optical transient surveys. He is actively participating in the Astropy Project, a community effort to develop Python packages for astronomy. He co-authored and co-maintains LIGO's open source public alerts pipeline and science outreach material for astronomers, the LIGO/Virgo/KAGRA Public Alerts User Guide.
What is your definition of open science?
When dealing with data, software, and publications, reproducibility and having complete descriptions of the experiment or the analysis you've done are important so that someone else can repeat it. That's what open science is all about. My whole career has been dedicated to more openness because of the way many NASA missions operate. When the Neil Gehrels Swift Observatory launched, its data policy was revolutionary—it had no proprietary period. All the data was public. The science team at NASA didn't have any privileged access to the data that the public didn't have. And that's been a role model. All of the projects that I've worked on have been open-source. If someone doesn't release their code, you have to wonder what bugs they are hiding. In astronomy, there are proprietary periods for data, but the data eventually becomes public. If the data or code isn't ultimately public, I'm not sure it's science because it may not be reproducible. You need to ensure that there is enough of the derivation there that someone else who comes along can follow it. Also, whatever code you've developed to write a paper should be released along with the paper. I think that packaging and distributing software is almost as important a tool for scientists as being able to write manuscripts in LaTeX.
What are the steps you are taking to accelerate open science?
When I was a graduate student, I wanted to use several software packages that weren't easy to install on the computing cluster. So, I volunteered with the MacPorts Project and Debian to help package some open-source astronomy software for various distributions. I also started using NumPy, Matplotlib, and SciPy so intensely that occasionally I'd find a bug or a faster way to do things. I started contributing little fixes and pull requests. One of the projects I worked on as a graduate student was a gravitational wave signal processing pipeline that used GStreamer. Because this software had not been designed for science applications, there were types of filters that needed a little bit more improvement. So I contributed some code to GStreamer to improve the signal processing. That was my first significant open-source contribution to a really general-purpose project rather than a project that applied narrowly to my own research. And, around the time that I came to NASA’s Goddard Space Flight Center, I got involved in the Astropy Project because I think that they are writing some of the highest quality astronomy software out there. Also, they're just fantastic people to work with and to learn from.
How has open science improved your research? Are there other benefits you have experienced from practicing open science?
Most of my papers have a source code component, and people come across packages I've developed and cite those papers. I kind of pride myself on how I use my open-source contributions to promote my papers and get citations, which is one of the measures by which people get promotions. People don't get promotions because of the software they develop; they get promotions based on how well-cited their journal articles are. Same thing for meeting collaborators. I'll encounter people at conferences, and they'll say, “Oh, I've heard your name.” They've heard my name because they used my software. So it's a great way to make connections with people. Also, I think a lot of soft skills are involved in crafting open-source contributions so that a stranger can understand, review, and accept them. Learning how to write code for someone else's project in a way that's easy to understand and having technical discussions with strangers about code makes it easier for me to effect the changes I want to make. Through years and years of practice with submitting pull requests and working with people on GitHub, I'm usually pretty confident that when I find a bug in Astropy, NumPy, or Matplotlib, I can fix it. And the fix will likely go into production and benefit everyone. It's just so empowering.
What challenges have you faced while practicing open science? What strategies did you use to overcome these challenges?
My biggest challenge is how to transfer expertise to first-time contributors. I often have some idea of what the final product might look like. But the contributor's skill set might not yet be at the point where they can produce that final product. This is one of the difficult things about developing open-source software. It does the first-time contributor no good if the maintainer tells them exactly what to do. You have to teach best practices without taking the agency away from the contributor; you have to do it in a way that's respectful and helps that person grow. Also, anyone in science who deals with software eventually needs to become very proficient with Git. Git is hard to learn, and it’s hard to get to the point where you can solve problems with it on your own. It’s a fundamental skill for working on any project moving with some substantial velocity. It's just a complex skill to teach, and it takes time for these skills to percolate through the community.
What would you say to early career researchers who want to practice open science?
I think prospective graduate students selecting a research group to work for should ask these questions: What is their open-source output? Does this lab have relevant software skill sets, or are they 20 years behind the times? Do they put their code on GitHub? Do they publish code along with their papers? Are the data and code produced by this group having an impact? If the answer is no, you're unlikely to learn open science practices by working in that group. But also, these questions require a lot of awareness and savviness on the student's part. Not everyone has enough information to make informed decisions. So I think this is mainly on the senior people—they need to know how to promote open science and develop a recruitment pipeline that keeps their group’s software and data skills current.
Other than that, if you're lucky enough to work in a group that values open science, my next suggestion would be: Don't assume that the tools you use are immutable. In astronomy, at least, almost everyone uses NumPy, SciPy, and Astropy. Those are all open-source packages. They are among the best projects in terms of the quality of the implementation, but their developer communities are also very welcoming. For example, it can be as simple as contributing a pull request if you notice a typo in the NumPy documentation. Get involved as early as possible because it takes years to get really good at contributing to other people's projects.
What are the most urgent things that should be addressed in the field to accelerate open science further?
As a NASA civil servant, I think the most urgent thing is to reform NASA Procedural Requirements (NPR) 2210, abolish the NASA Open Source Agreement and NASA Contributor License Agreements, and make it clear that NPR 7150 does not apply to science software intended mainly for the public. Then, I can do science for open data and open software much more effectively.
Lastly, is there anything you would like to add regarding open science?
My colleagues and I at NASA’s Goddard Space Flight Center are working on this new science data portal called the General Coordinates Network. This is a system that takes gamma rays and other high-energy transients that are detected by space missions, physics experiments, and observatories around the world. And it sends real-time alerts publicly to a community of thousands of astronomers worldwide, who follow up on the sources we've detected from space using their telescopes on the ground. And then, they publish these astronomical bulletins called Circulars to share their observations. This system has been going on since the 90s. But we're modernizing it and converting it from its antiquated network protocol stack to Apache Kafka, a modern streaming framework for data distribution and analysis. We’re doing a soft launch right now, and I'm really excited about this. Not only is our client software open-source, our website is open-source too, which I'm quite proud of.