By Genna Reed, Union of Concerned Scientists
Instead of listening to over a half million comments and abandoning a truly harmful rule, the EPA has forged ahead with its restricted science proposal, according to the New York Times reporting of a supplemental notice that exposes many of the flaws of the original proposal in high definition.
The draft rule, issued by disgraced former Administrator Scott Pruitt in April 2018, stated that unless a scientific study’s underlying raw data and models, including computer code, were made publicly available, it could not be used to inform agency rulemaking, effectively disallowing the agency from considering the best available science.
The new supplemental notice reveals just how wasteful and unscientific the rule actually is. The reason for making the data available, according to the notice, is so that EPA can reanalyze studies using the original data to confirm the findings and show that they are “capable of being substantially reproduced.” By reanalysis, the agency clarified that they mean taking the original data and modeling methods and recalculating the results—in effect to check that the researchers didn’t make any basic math errors.
The phrase in quotes above comes directly from a 2001 OMB guidance meant to help agencies align best practices for weighing the scientific evidence. The broader scientific community tackles reproducibility in many ways, and EPA has already created its own policies on peer review and information quality based on those guidelines. EPA’s new transparency rule would add an impossible requirement for public access to raw data and model code, clearly stepping beyond the provisions of current EPA policies and practices and beyond the needs of the agency. Suffice it to say that the agency has not shown that the current review process fails to catch simple calculation errors or that some epidemic of calculation mistakes is plaguing agency science.
Let’s think about what reanalysis would mean in practice.
No matter how much you love a topic, you wouldn’t be tempted to check every footnote of a nonfiction book written by a well-respected historian, read the original sources, and verify that you would reach the same conclusion if you were to write a similar sentence. You would never do it, not just because you don’t have the time or interest (and all that extra work would make reading insufferable) but because you trust that a well-established historian deployed a painstaking fact-checking and editing process, including fact-checking by editors and other researchers, so that you could just absorb the well-vetted information and enjoy the book.
Now picture EPA attempting to do this same exercise for the underlying variables, raw data, and calculation methods for studies used to inform every one of its policy decisions, including risk assessments on air pollutants, pesticides, Superfund chemicals, and more. If you think fact-checking every book would take the fun out of reading, the restricted science rule would grind EPA science to a halt. This would be astronomically costly in terms of time and resources, assuming it is even possible.
It is also unnecessary. The EPA already has methods for reviewing the quality and strength of scientific studies. And the agency doesn’t need to reanalyze each and every study informing an air quality standard or a pesticide regulation decision, because peer review is a trusted way to ensure that researchers asked the right questions, designed the study according to best practices, and made the appropriate assumptions to reach their conclusions. The goal of reanalysis is to give this administration a chance to rule out some studies and attempt to discredit others by coming to different conclusions using the same data. It is a bureaucratic standard for scientific information, not a scientific one.
Broad application of a reproducibility standard will harm, not help
EPA’s draft rule reveals that the agency wants to use original data and methods to reanalyze studies and prove that they are reproducible before using them to support a rule. But why the focus on reproducibility as the criterion for a study’s credibility? The ability of a study to be reproduced or replicated should not be held up as the gold standard for checking its credibility. Instead, EPA scientists should have the flexibility to use scientific criteria to judge the rigor and validity of evidence informing rules. In fact, EPA already does this well, and the rule does not point to any evidence to the contrary.
There are many reasons why a study might not be able to be reproduced or replicated, not the least of which is the need to protect privacy, trade secrets, and intellectual property and to address other confidentiality concerns associated with the underlying data. Challenges also arise when studying environmental hazards. Observational data must be used for certain studies of air and water pollution, and it is often not possible or ethical to recreate the conditions under which people were exposed to a contaminant. Even OMB acknowledges that the reproducibility standard cannot be applied to all science: “OMB urges caution in the treatment of original and supporting data because it may often be impractical or even impermissible or unethical to apply the reproducibility standard to such data. For example, it may not be ethical to repeat a ‘negative’ (ineffective) clinical (therapeutic) experiment and it may not be feasible to replicate the radiation exposures studied after the Chernobyl accident.”
Even if this rule were about showing reproducibility (which it isn’t), reproducibility is not something that can be solved by the end users of the data, who can’t address large-scale challenges in the scientific community, like the lack of infrastructure and resources needed to ensure privacy protections for sensitive data. The NAS touched on this in a recent report, arguing that funding agencies should invest in the research and development of open-source tools and related trainings for researchers so that transparency is fostered at the beginning of the scientific process instead of being used as an opportunity to exclude crucial public health studies that have already been conducted.
Life before reanalysis
This proposal fails to solve any existing problem. A study looking at the 79 requests through the Information Quality Act to EPA asking that the agency correct or reconsider the data supporting its regulatory decisions between 2002 and 2012 found only two asking for the raw data. One possible explanation for this low number is that the public already has access to the science EPA relies on and can fully access methods, summary data, results, and interpretation, as can all peer reviewers. In developing the National Ambient Air Quality Standards (NAAQS), for example, EPA collects all relevant data and studies used to develop the Integrated Science Assessment and the merits of those scientific studies are publicly debated by the Clean Air Scientific Advisory Committee. The requirement for access to raw data is entirely unnecessary and causes significant problems by arbitrarily precluding the use of critical studies.
Alarmingly, we now know the rule would change the EPA’s approach to evaluating scientific evidence. Right now the agency uses best practices that allow for a weight of the evidence approach, and this rule would not just change that process but codify the change, making it much harder to reverse. Part of that codification would be to add “data availability” to the list of assessment factors that the EPA will use to determine the quality of a study. These factors have been guiding EPA’s data quality procedures since 2003 and include soundness, applicability and utility, clarity and completeness, uncertainty and variability, and evaluation and review. They help the agency to assess the context surrounding the objectivity and integrity of scientific information and to determine how to use it in its weight of the evidence approach. The last factor, “evaluation and review,” is where the agency already independently validates the research without using underlying data by asking the following questions:
a) To what extent has there been independent verification or validation of the study method and results? What were the conclusions of these independent efforts, and are they consistent?
b) To what extent has independent peer review been conducted of the study method and results, and how were the conclusions of this review taken into account?
c) Has the procedure, method or model been used in similar, peer reviewed studies? Are the results consistent with other relevant studies?
d) In the case of model-based information, to what extent has independent evaluation and testing of the model code been performed and documented?
These guiding questions make it so that the agency doesn’t have to fully reanalyze every study and instead can focus on the weight of the evidence approach which “considers all relevant information in an integrative assessment that takes into account the kinds of evidence available, the quality and quantity of the evidence, the strengths and limitations associated with each type of evidence and explains how the various types of evidence fit together.” Including data availability in the weighting injects an arbitrary parameter, unrelated to the study quality and reliability.
One option offered in the new notice would be an arrangement where, if a study’s underlying data are not made available, the EPA should place less weight on the results “to the point of entirely disregarding them” even if the results were strong and compelling, the work met the highest scientific standards, and it had passed through extensive scientific peer review. In other words, EPA is proposing to give the most weight to studies with publicly available data rather than to studies that provide the strongest scientifically supportable evidence.
Because there are many environmental health studies relying on medical data and other private information, the restrictions of this rule would mean a failure to evaluate the best available science. And as if this proposal couldn’t get any worse, these restrictions would not only apply to studies developed now and into the future but also retroactively. It’s as if the political leadership of EPA has decided that any scientific work that came before them is suspect and that only “their” science may be considered.
In addition to being unnecessary, setting a goal of reanalysis for each and every study used by the EPA is completely unattainable. Its implementation would make it impossible for EPA to do its job.
This is an industry-devised plan to bury EPA under mountains of red tape
The only way this draft rule makes sense is when you remember the forces behind it. The idea that government science is secretive and needs to be made more public is a premise devised by tobacco industry lobbyists as far back as the mid-1990s. In a memo to RJ Reynolds Tobacco Company, lobbyist (and Trump administration transition team member) Christopher Horner wrote about the need to create “required review procedures which EPA and other federal agencies must follow in developing extra-judicial documents” and constructing “explicit procedural hurdles the Agency must follow in issuing scientific reports” because “our approach is one of addressing process as opposed to scientific substance.”
This wasn’t just about tobacco. These lobbyists were thinking about EPA’s efforts to address everything from mercury emissions, hazardous waste, and dioxins, to air pollution. They knew that the science showing harm from a range of industrial products and processes would result in stronger protections and the only way to stop it would be to break the process. This policy will indeed break EPA’s process at our expense, and we will continue to fight alongside the scientific community to stop its advancement.