When more is less – what makes environmental data useful?

Improvements in technology and data processing now enable the effects of air pollution to be monitored by individuals. However, as Dawn Nafus discusses, this radical singularisation poses challenges for broad-based community action.

From computer vision systems that detect smokestack emissions, to low-cost air quality sensor networks, data, and now AI, have become important resources for grassroots environmental organisations and governments alike. Some data are hugely effective in democratic deliberation, others less so.

What makes some environmental data more useful than others? One answer has to do with whether it is integrated into systems where people can act on it. Another has to do with who gets meaningful access and control, and who is pushed to the wrong end of the digital divide. As I accidentally learned the hard way, yet another answer has to do with the qualities of the data itself: not whether it is big or small, rigorously cleaned or raw and messy, but something even more peculiar. It turns out that some datasets are structured in ways that make it impossible to ignore long-held assumptions. While that might raise fascinating questions for researchers, it can also make for a hard road ahead.

“What makes data useful?” is not a question I set out to ask. I was doing a participatory research project with a group of communities in a large industrial corridor in the United States that suffered from heavy pollution. The local community coalition had successfully advocated for a series of high-grade, real-time air monitors to be placed throughout the corridor. For a while, the data was flowing. It wasn’t everything the communities wanted, but there still was a prodigious amount of it, about once every minute across a dazzling array of pollutants. We wanted to know if this data could be combined with consumer health devices like smart watches and blood oxygen monitors to see the effects of pollution in real time.

Were these new technical tools up to the job?

If they were, that would signal a major milestone for digital accessibility. That’s where my role came in. At the time, my team and I were building data accessibility tools for people who wanted to know more about what their smart watch could tell them beyond “take more steps,” but did not want to learn programming skills to do it. If the communities did not need to turn to a professional data scientist to get their experiment done, we were onto something.

So we started experimenting

As I report in my new article, things went well until they took an unexpected turn. In a small pilot, community members loaded themselves up with health gadgets for two months, and there appeared to be correlations between some of the health data and some pollutants. While correlation is not causation, it did suggest more investigation was warranted. The second, larger pilot got trickier. No matter how we sliced and diced the data, or how many experts we consulted, nothing seemed to hold. Infuriatingly, the two air monitors, about two miles apart, seemed to tell entirely different stories about what was in the air. A pollutant would be present at one, but not the other, but then sometimes at both, but not always following the same patterns of ups and downs. With an inconsistent story about the air, it was harder to see any relationship with people’s bodies.

If this were more than a test of the technology, we could look to all sorts of reasons that could explain why our experiment went awry. There could have been data quality or sample size issues, or we might have needed to collect participants’ location, wind and topography data to account for the variation across the two monitors. It might even be the case that there is nothing to see at all in this way, although locals’ experiences of their own bodies suggest otherwise.

Yet the closer I looked, the more peculiar it became. Say we did consult a specialist who could build us a model to incorporate wind and topography, and even an air chemist who could incorporate how pollutants react with one another while airborne, and how they fall to the ground at different rates. If we were to look that closely, it would not look like what we typically see on maps of air quality, where an airshed is shown to have higher or lower levels of pollution. It would have multiple, interacting chemicals moving through the air, some quickly and others lingering.

When we add into the mix the fact that people are hardly stationary, and the air is unlikely to be the same thing from one pocket to the next, the very notion of a shared airshed began to unravel before my eyes. All I could see was relentless atypicality at scale, with each body suffering just that little bit differently, like a toxic homage to Tolstoy.

This was no coincidence. The data was structured to zoom us in to specific times and specific places with a level of specificity that we don’t typically see. That made it just good enough to make a fundamental question unavoidable: is there such a thing as a shared exposure? At one level, it’s an absurd question. When we zoom out again, it is plain as day that we do have shared exposure. Wildfire smoke makes its way from one continent to another, and microplastics make their way onto the remotest mountaintops. Patterns of environmental injustice are real. At another level, it is not absurd at all. These are the kinds of questions that precision medicine and rare disease communities ask all the time to great effect.

There might (or might not) be something empirically worthwhile to see in such radical singularisation, where each combination of pollutants, at each precise moment in time and place, interacts with a body in particular ways. For communities seeking to establish the harms of environmental injustice, it is a harder road to go down because it doesn’t fit with the expectation of a single cause and a single effect, which is what regulations are currently built around.

Does this mean that granular, real-time data is useless for environmental advocacy? Of course not. There is only more work to do. In fact, a key area of undone science is applying machine learning techniques to air quality datasets. With interest flooding in to “AI for good,” that science might just get done. What it does mean, though, is that the scale and shape of the data need to match the expectations of what you are trying to achieve. Expectations change with time, and if anything, the experiment shows the need for a broader cultural repertoire of ways to understand the air, a repertoire that goes beyond simple stories of elevated levels or machine classifications. More isn’t necessarily better, but it is different.


This post draws on the author’s article, Unclearing the air: Data’s unexpected limitations for environmental advocacy, published in Social Studies of Science.

The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.

Image Credit: Marek Piwnicki via Unsplash.
