Africa Ixmucane Flores-Anderson
NASA TOPS—Success Stories of Open Science Series
Learn by doing transparent, replicable, and understandable science: Q&A with Africa Ixmucane Flores-Anderson on open science practices
Originally from Guatemala, Africa Ixmucane Flores-Anderson is a research scientist at the University of Alabama in Huntsville who has been working with satellite data for over 15 years.
Currently, she works with SERVIR Global—a joint initiative between NASA and the United States Agency for International Development (USAID)—as the Land Cover Land Use Change (LCLUC) and Ecosystems Theme Lead. She focuses on using optical and synthetic aperture radar (SAR) satellite datasets for environmental monitoring, including but not limited to 1) water quality of freshwater bodies, 2) forest disturbances, and 3) LCLUC. She also serves as an Envoy of the NASA-Indian Space Research Organization (ISRO) Synthetic Aperture Radar (NISAR) mission, and she led the creation of the joint SERVIR-SilvaCarbon SAR Handbook on how to use SAR for forest monitoring and ecosystem applications. She is also a Ph.D. candidate in natural resource sciences at McGill University, specializing in SAR applications to improve forest monitoring.
What is your definition of open science?
Open science is a process that allows us to make science understandable, replicable, and accessible, and all the components that make that happen. For example, any discovery has to be understandable and replicable by someone interested in that topic. In order to do that, we have to create all the components and materials because a peer-reviewed publication sometimes does not provide all of that. Even for scripts, you need to allow everyone to access your code with enough descriptions explaining each step in the workflow. By doing that, I can do the research again myself, and others can understand the concept of that discovery. So, open science is a combination of components that make a discovery replicable.
There are different schools of thought in open science. Where do you focus your open science practices?
At SERVIR, we have always pursued open science. In Latin America, Africa, and Asia, we have projects called “services,” which are a combination of activities, datasets, training, tools, applications, you name it. Any activities that are executed at each of the SERVIR hubs are publicly available. We started with open data because we have the commitment to share all the outcomes of our work. Another big component of what we’re working on is making scripts and algorithms transparent and available to the public. I realized that, for satellite data to be completely adopted by end users, it’s not only a matter of providing access to the satellite-derived final map because that's a map that someone else created. People want to see what goes on behind the scenes, what parameters were used, and how they can recreate it; they want to understand the process and create it for themselves and say, “This is mine. I did it.”
I think that's where our field is moving, particularly with Big Data and Big Data analytics. Now everyone has access to data and is looking forward to generating their own information. And we are making those methods and algorithms accessible to the end user—the whole workflow from beginning to end, and sometimes even providing the script itself. Scripts may not be easily accessible to everyone due to the programming language used, lack of documentation, proprietary software, or script errors. A lot of work is required to make those workflows accessible, understandable, and replicable—that’s what I'm looking forward to providing with my research. And that's what we try to provide with our handbook (the SAR Handbook) at SERVIR. Working with USAID and NASA, we always try to follow open data policies at SERVIR. In fact, some of our activities were to support the implementation of spatial data infrastructures for countries to create their own data systems to make data shareable. Many of the activities we do in these regions depend on local data. Sometimes data infrastructure is one of the hardest things to obtain—just the process of curating information and making it available to other local institutions and groups.
How do you maintain respect and ethical policies with data without jeopardizing local communities?
We ensure that we have targeted and specific services. That allows us to make sure that we are following good practices. Another way to ensure that is through co-development. Any activity that is implemented is driven by a need particular to what the local stakeholder has requested. That stakeholder becomes the co-developer of that service or activity. Eventually, if not immediately, they will be the ones who continue to provide that service—it is in their best interest to make that activity or process happen. All of that is decided through the co-development process and strong communication. I think that becomes key in the development of science. The SERVIR model works because we have relationships in each of these regions and with local scientists who are doing this work and enabling those communications.
What steps are you taking to accelerate open science?
We start with the definition of what project or analysis we will do, which stems from a local need. We scope our projects around a particular stakeholder, given our understanding of the region or country. Who are the actors, and who is doing what with what responsibilities? For my research on Lake Atitlán, for example, the lake authority from the Ministry of Environment in Guatemala had the mandate to monitor the lake and issue official information about the conditions of the lake. There was also a university constantly collecting data on the lake. So those two were our main stakeholders for anything that we produced. They were happy with our qualitative research on the algae bloom maps in 2009. It was through those images that they could define the extent of the bloom and the damage; at its highest point, algae covered almost 40% of the lake. We also created quantitative products in collaboration with the lake authority and the university; I generated an algorithm to define chlorophyll concentration with satellite images for Lake Atitlán. Also, we provided training about how all of this works: the workflow, the principles of why we can use satellite data to monitor algae blooms, and algorithms to do it for Lake Atitlán.
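A satellite-based chlorophyll retrieval like the one described can be sketched as a band-ratio model. The sketch below is illustrative only: the index choice (a red-edge/red NDCI-style ratio), the band reflectance values, and the linear coefficients are assumptions for demonstration, not the actual Lake Atitlán algorithm, which was calibrated against field observations collected with local partners.

```python
import numpy as np

def ndci(red_edge, red):
    """Normalized Difference Chlorophyll Index from surface reflectance.

    A commonly used proxy for chlorophyll-a in inland waters, built from
    a red-edge band (~708 nm) and a red band (~665 nm).
    """
    red_edge = np.asarray(red_edge, dtype=float)
    red = np.asarray(red, dtype=float)
    return (red_edge - red) / (red_edge + red)

def chlorophyll_estimate(red_edge, red, a=14.0, b=86.0):
    """Map NDCI to a chlorophyll-a estimate (mg/m^3) with a linear model.

    The coefficients a and b here are placeholders; in practice they are
    fitted to in-situ water samples for the specific lake and sensor.
    """
    return a + b * ndci(red_edge, red)

# Toy example: pixel 0 is clear water, pixel 1 is bloom-affected
# (elevated red-edge reflectance relative to red).
red = np.array([0.030, 0.020])
red_edge = np.array([0.028, 0.035])
chl = chlorophyll_estimate(red_edge, red)
```

Mapping every pixel of a lake scene this way yields the kind of quantitative bloom-extent product the stakeholders could compare against their own field measurements.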
We published a paper with our stakeholders because they helped us collect the field observation data; they are co-authors of the paper. They were involved in the whole process and were aware of what was happening. Everything is communicated back and provided to them.
Regarding ethics, I think scientists often forget to credit local scientists who have created and collected data. In my case, we had a relatively small amount of money to pay local scientists to collect the data. But that's not possible most of the time. There are no resources to do that, and researchers end up using data that someone else has created without paying or even recognizing them. They just use the data and publish without acknowledging who created the data. It has to change if we want to do open science. We need to include and engage everyone working on a project.
What is the biggest factor that has helped you successfully practice open science?
I used to be one of the people who received information from the community. Now I feel like I'm on the other side, one of the people who are producing information. So, I can understand what it is like to be on both sides. I also worked at the Guatemalan Protected Areas Council, so I can also wear that hat. Usually, local scientists deal with very limited resources and have a lot of work, and national priorities might be different from those of international programs. So, when we decide what we want to focus on in a new project, it has to come from the local situation. We can have great ideas, but what if there is no one on the other side? They might be interested and may agree with our ideas. But my idea might not be one of the national priorities, and their supervisor will not approve that activity. So, I think it has greatly helped me put myself on both sides.
What challenges have you faced while practicing open science? What strategies did you use to overcome these challenges?
First, something very common in international development is the high turnover of professionals at the local level because of government instability. You have great collaborators working on a national priority. Then, when a new government comes in, the priorities change, and the whole team changes. It's hard to have a successful project or activity when those changes occur. A way to address that is working with universities because we know that what we build and the knowledge that is shared will be maintained. And they already have the basic knowledge of the topic, and they are going to be the professionals of the future. This is a long-term solution that works.
Another thing is that everything is moving very fast with Big Data and analytics in this new era of science and technology. One of the main issues right now, and something that I’m facing personally, is that what you do now becomes obsolete very fast tomorrow. We need stronger programming skills to leverage Big Data. I need a very good team with very good programming skills. And that's a challenge when we try to work in the cloud. I wasn't trained that much in programming skills. I am building them right now to generate the science that I want to generate, because now our questions are different. What I did today requires other things tomorrow. We need to create more and more data. So you need the skills to go with that. There are multiple datasets for the same period, for example. If we want to combine all of them, we need to work with communities facing the same problem. Even if we create scripts to generate something new, we will need to do more work explaining how all of this works to make that accessible, understandable, and replicable.
What is your philosophy on training researchers in open science?
Training is key to making open science a reality. There are a lot of online communities, like Pangeo, where people can build programming skills. But something that I have seen and I'm very fearful of is that we can be very technology-driven and say, “AI [Artificial Intelligence] is going to solve everything,” without really understanding the basic principles behind the technology. There’s a disconnect there. For example, you need to be able to program but also understand the science behind satellite remote sensing. There are a lot of things that we can do with data, but we need a basic level of knowledge. Something that stems from this idea is that satellite remote sensing and observations can be used for different applications in other fields. As time goes by, we find new applications and ways to use satellite remote sensing for all those different fields where building programming skills is not a tradition. But we need to learn new skills to gain new knowledge.
Have you made mistakes while practicing open science? How would you address them differently if you were to do it again?
A big lesson learned for me has been making the right connections and communicating with the key groups that have the mandate to generate official information. We can’t come to a new place, region, or country and dictate what is happening, contradict the official records, or overstep with the organizations there. We, as scientists, need to be responsible and informed when doing work abroad. The key is to communicate with the key groups and be well informed. Just think, if there are multiple groups providing conflicting information about a national priority, it can be challenging for authorities to know what to do and what to believe.
How has open science improved your research? Are there other benefits you have experienced from practicing open science?
For me, there is no way to do science if it's not transparent, replicable, and understandable. I think everything we do in science should have applicability and benefit someone. Personally, I have benefited from open science because it taught me how to do my job better. I think it makes me a better scientist in general. If I cannot even explain what I am doing, and if someone cannot understand and replicate my work, I don't find purpose in my work. In order to be a good scientist, we need to apply open science principles. Sometimes, people like to be very cryptic, but I find that the scientists I admire are all excellent communicators. Their work is open, transparent, and understandable. And I can replicate it. That's how I learn.
What would you say to researchers who want to practice open science but experience barriers to doing so?
You need to understand that open science definitely includes more hours of work. For example, you need to create your algorithm, publish well-documented scripts, and make sure that the input datasets are accessible to the public. We also need to think about how grants include the needed additional resources to make things shareable, accessible, and replicable. Discussion about open science has to include the funding to create materials, training, white papers, and one-pagers that will make open science a reality. Sometimes this means going the extra mile.
In your opinion, what is the most urgent thing that should be addressed in open science?
To make open science a reality, all the associated documentation has to be created for every grant, like training materials from end to end. For example, scripts have to be written in open-source languages and made public. And all the datasets produced for a given grant have to be available to the public. Much of that is already happening, but we need a comprehensive system that manages all of that in an organized and harmonized way. We need funding and policies that will make this happen faster.
What are some of your favorite open science tools or resources that you’d like to share?
There are so many different tools and platforms that enable open science, but I’m going to mention a few of my favorite ones. First, OpenSARLab at the Alaska Satellite Facility (ASF) allows us to process SAR data in a Jupyter Notebook. This is also great for teaching. I need to disclaim that I’m a co-investigator with Principal Investigator Franz Meyer on a project that uses OpenSARLab in Latin America. I use ASF Data Search to generate a time series of SAR data to monitor deforestation. This is a great introductory step for people to access SAR and process data in the cloud. SAR data are very computation-heavy, and the cloud is the optimal way to process these large datasets.
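A SAR time-series deforestation check like the one described can be sketched as a threshold on the drop in radar backscatter over time. The sketch below is a minimal illustration, not an ASF or SERVIR algorithm: the 3 dB threshold, the baseline definition, and the synthetic gamma-0 values are all assumptions (forest clearing typically lowers C-band backscatter, which is what the threshold exploits).

```python
import numpy as np

def flag_disturbance(gamma0_db, drop_db=3.0):
    """Flag pixels whose backscatter drops sharply over a time series.

    gamma0_db: array of shape (time, pixels) with SAR backscatter in dB.
    A decrease larger than drop_db in the latest acquisition, relative
    to the median of the first half of the series (the pre-change
    baseline), is treated as a possible forest disturbance.
    """
    gamma0_db = np.asarray(gamma0_db, dtype=float)
    baseline = np.median(gamma0_db[: len(gamma0_db) // 2], axis=0)
    latest = gamma0_db[-1]
    return (baseline - latest) > drop_db

# Synthetic series of 5 acquisitions over 2 pixels:
# pixel 0 is stable forest; pixel 1 is cleared mid-series.
series = np.array([
    [-7.1, -12.0],
    [-7.3, -11.8],
    [-7.0, -12.1],
    [-7.2, -16.5],  # clearing lowers backscatter in pixel 1
    [-7.1, -16.8],
])
disturbed = flag_disturbance(series)  # -> [False, True]
```

In a real workflow the input stack would come from radiometrically terrain-corrected scenes found through ASF Data Search, and the threshold would be tuned per forest type and sensor.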
The Land Processes Distributed Active Archive Center (LP DAAC) has the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS), which is extremely useful to interact with satellite data hosted in the cloud. Something important to mention is that all the data in the NASA DAACs are going to transition to the cloud. I have been able to access LP DAAC directly through my computer terminal and run some analyses in Jupyter Notebooks with the examples they provide on their website.
Last but not least is Google Earth Engine (GEE), which has really changed the game in remote sensing analysis. Effectively, GEE has given us a platform to perform large-scale Big Data analytics at our fingertips. It has opened the door and lowered the barriers to accessing and processing Big Data. This is a user-friendly interface that has good documentation with many examples that make things easily accessible and replicable.
In addition to those tools and platforms, I would like to highlight the work we did at SERVIR with the SAR Handbook, as it lowers the entry barrier to work with open-source SAR data and provides resources and scripts to analyze and derive information from SAR images.
Lastly, is there anything you would like to share regarding open science?
I truly believe NASA is at the forefront of open science. It's so strong because there is a lot of attention to making all our data and methods available. We have the products accompanied by theoretical algorithms and datasets that are derived from satellite data.