State-Variable and Representativeness Errors Conceal “Clean Diesel” Harm: Methodologically Fallacious ACES Research

In 2015 authors of four joint US-government and auto-and-oil-industry studies, ACES, claimed to have done the first comprehensive evaluation of lifetime exposure to new-technology-diesel exhaust (NTDE-2007), so-called “clean diesel” required by US emissions standards for year-2007 and later heavy-duty trucks. ACES claimed to have found no evidence that NTDE-2007 causes lung cancer. However, since at least 2012, the World Health Organization (WHO), International Agency for Research on Cancer, American Public Health Association, and many other scientists have said that any diesel exhaust, especially diesel particulate matter (DPM), causes lung-cancer, cardiovascular, and neurological problems. Who is right about diesel exhaust, ACES or WHO? This question is important both because the US and other governments cite ACES research in their diesel-exhaust standard-setting, and because the auto and oil industries use ACES conclusions to claim new diesel exhaust is virtually harmless. This article (1) begins the task of assessing the ACES-versus-WHO scientific debate. It (2) argues that the ACES research is fatally flawed because it neither studies what it claims nor does so in an unbiased way. The article (3) shows that ACES research (3.1) relies on state-variable biases (in focusing mainly on NO2 and mass, not also on DPM and particle size/number), and (3.2) exhibits representativeness errors (in using only the healthiest animals, too-small sample sizes, and non-lifetime exposures). Despite some ACES strengths, the article (4) concludes that because ACES fails to fully assess the worst NTDE-2007 harm and typical exposures to typical subjects, it draws no valid conclusions about NTDE-2007 harm.


Introduction
Burning fossil fuels has driven much of the economic progress and military dominance of the past few centuries. Without oil and coal, the Industrial Revolution and its massive increase in incomes, manufacturing outputs, and standards of living probably would not have occurred. And as Hitler learned in World War II, Germany's small oil reserves were a factor in "its military defeat" (Becker, 1981).
Yet oil- and coal-created prosperity has been bought at a price, one that is especially high for diesel fuels. Apart from the many carcinogens such as benzene and formaldehyde in typical diesel exhaust, its particulate-matter (DPM) emissions are deadly; they are carcinogenic, have no safe dose, and thus exhibit a no-threshold, linear concentration-response relationship (Pope and Dockery, 2006; Dominici et al., 2003; Laden, 2006). In the US alone, tens of millions of diesel engines, mostly heavy-duty trucks, emit pollutants that cause 21,000 avoidable, premature deaths annually; the cancer risks from diesel vehicles are 7 times greater than the combined risk of all 187 other air toxics that the US Environmental Protection Agency (US-EPA) regulates (CATF, 2005a; see EPA, 2014; SCAQMD, 2008). In the UK, DPM alone causes 29,000 preventable premature deaths each year (COMEAP et al., 2010). Diesel-related threats are even worse in developing nations.
In 2015 the diesel debate came to a head when authors of four joint US-government and auto-and-oil-industry studies, ACES, said that apart from older, dirtier diesel engines, the latest-technology diesel was virtually harmless. ACES authors claimed to have done the first comprehensive evaluation of lifetime exposure to new-technology-diesel exhaust (NTDE-2007), so-called "clean diesel" based on US requirements for 2007-and-later models of heavy-duty diesel trucks. ACES claimed to have found no evidence that NTDE-2007 causes lung cancer (Greenbaum et al., 2015). However, groups such as the World Health Organization (WHO), International Agency for Research on Cancer (e.g., IARC, 2012a; IARC, 2012b), American Public Health Association (APHA), and government agencies say NTDE-2007 merely reduces but does not eliminate diesel harm, especially harm from DPM. Because diesel exhaust has no safe dose, they say it still causes avoidable lung-cancer, cardiovascular, and neurological problems (APHA, 2014).

Importance of the ACES-versus-WHO/IARC/APHA Debate
Who is right, the 2005-2015 ACES researchers or WHO-IARC-APHA? This question is important for at least four reasons. One reason is that the leading physicians and scientists agree that diesel exhaust is a major public-health problem (APHA, 2014). A second reason for the importance of the debate is that both top medical and scientific research associations say politics has interfered with diesel-related medical science. For well over two decades, they say the freight and oil industries repeatedly have used the courts to try to block clean-air, diesel, and particulate matter (PM) standards and health studies; yet they say these industries have argued, at the same time, that these very studies (that they have been blocking) are needed prior to any additional diesel regulation (Monforton, 2006; see Crump and Landingham, 2012). "From early days" of diesel research, says a prominent scientific-journal editor, DPM studies have "been subject to a series of legal actions initiated by industry bodies…which has delayed the publication of these [DPM] papers" (Ogden, 2010, p. 727).
A third reason for the importance of the ACES-versus-WHO/IARC debate is regulation. Because the US government cites the ACES research in its rulemaking about diesel-exhaust standards, it is important to determine whether these alleged grounds for not strengthening diesel regulations are scientifically defensible (EPA, 2012). After all, industry groups claim that because NTDE-2007 harm is unknown, thus controversial, the controversy should be resolved before imposing any new diesel regulations (Carter, 2014). Yet leading government and university scientists say diesel harm is well known and that industry is merely trying to delay regulations by claiming the harm is controversial (Michaels, 2008).
A fourth reason for the importance of the diesel debate is its scientific implications for clean-energy research. On one hand, government groups, physicians, environmentalists, and medical scientists say "clean diesel" is an oxymoron, as diesel has no safe dose (Monforton, 2006). On the other hand, oil and auto industries say the ACES studies show "clean diesel" is virtually harmless and should not be confused with the dirtier "old diesel" studied by IARC/WHO. Indeed, diesel-industry spokespeople suggest that "the new diesel engines are now so clean that the findings from this [WHO/IARC] monograph [that condemns diesel as carcinogenic]…are no longer relevant to today's situation" (Carter, 2014). Who is right about NTDE-2007?
One way to begin to assess the important ACES-versus-WHO/IARC debate is to ask whether, in challenging scientific consensus about diesel-exhaust harm, ACES has actually studied what it claims--and done so accurately. This article shows that the 2005-2015 ACES research is fatally flawed, both because it does not do what it claims to do, and because it exhibits several well-known scientific biases. Instead of including full assessment of the most harmful components of NTDE-2007, ACES researchers make at least two state-variable errors--in focusing mainly on NO2 and mass, not also on particle size/number, the main determinants of DPM harm. Likewise they exhibit two major representativeness biases in using only the healthiest animals over a short term, and in using too-small sample sizes. As a result, they fail to consider typical, genuinely representative exposures. Consider each of these four problems in order.
Yet ACES studies did not assess the total diesel-exhaust risk, especially DPM, because they did not fully and correctly determine DPM exposures. Instead ACES researchers exposed rats to one of three target dilutions of nitrogen dioxide, NO2, or to filtered air as a control. As ACES editors admit, "Exposure levels were set based on NO2 rather than PM…because…calibrating exposures based on PM would have been problematic" (Greenbaum et al., 2015, p. 2), that is, too difficult for the ACES researchers to do, although many scientists have done such measurements.

The NO2 State-Variable Error
The ACES authors and editors admitted that classic studies of diesel exhaust are based on direct measures of DPM, the most hazardous component of diesel exhaust (Greenbaum et al., 2015, p. 2). Because the ACES researchers assessed effects of NO2 rather than full and correct DPM risks--when DPM is responsible for 78 percent of the total diesel-vehicle cancer risk--they may have addressed no more than 22 percent of the relevant diesel risk. This means that their claims to have done a groundbreaking, "comprehensive" study of diesel exhaust are misplaced ( 2013). Of course, DPM is blown by the wind, and not all of it would be released to every part of the US. Hence not everyone would be exposed to all US DPM, although it travels for miles because the PM is so small. Yet in industrial areas of hundreds of US towns, every day people are exposed to DPM from 10 or more diesel trucks. Even if all 10 trucks were NTDE-2007, people easily are exposed to substantial DPM. Besides, as already mentioned, although NTDE-
To understand the ultrafine/fine PM threat from NTDE-2007, recall that according to scientific consensus, PM--an air-suspended mixture of solid or liquid particles--has no safe dose and exhibits a linear concentration-response relationship ( 2013), although the precise harm depends on the number, size, shape, surface area, chemical composition, solubility, and origin of the PM (Pope and Dockery, 2006). According to size, PM is classified into three main categories: coarse, fine, and ultrafine. PM of 2.5 to 10 µm (PM10) is inhalable coarse particles. PM of 0.1 to 2.5 µm (PM2.5) is inhalable fine particles, and PM of 0.1 µm or less (PM0.1) is inhalable ultrafine particles.
Ultrafine PM is the most dangerous of all types of PM because it can easily pass into the nose, through the blood-brain barrier, and directly into the brain, where it causes disease and brain dysfunction (Oberdörster et al., 2004; Cassee et al., 2013). Ultrafine PM also is much more potent than fine and coarse PM in inducing oxidative stress, reactive oxygen species, and inflammation (Li et
Because most DPM is ultrafine, it has four ultrafine characteristics that make it especially deadly. These include having small size; having a large surface area, and thus worse inflammatory properties; being a Trojan-Horse pollutant; and having the ability to travel great distances. Its small size enables DPM to enter either the nose and then the brain, or the lungs, bloodstream, and all bodily organs, where it can cause chronic inflammation and organ degeneration (CATF, 2005b; Peters et al., 2006; Terzano et al., 2010). Its small size also means it has relatively larger surface areas. For the same mass, smaller ultrafine or fine particles like DPM are far greater in number and have much greater surface areas than do coarse particles. As a result, DPM has much greater opportunity to interact with cell surfaces and cause inflammatory damage (EPA, 2013).
A third ultrafine and DPM characteristic, being a Trojan-Horse pollutant, means that the DPM attracts other diesel-exhaust carcinogens, toxins, and metals such as arsenic, cadmium, formaldehyde, polyaromatic hydrocarbons or PAHs, and zinc. They adhere to the ultrafine PM, form fine PM, enter the brain or lungs, and can travel to all bodily organs, where they can cause chronic inflammation leading to diseases such as Alzheimer's, autism, birth defects, cancer, Parkinson's, and even death.

The Mass State-Variable Error
ACES researchers also erroneously minimize NTDE-2007 harm because they use another flawed state variable--mass--as an indicator of diesel-exhaust exposure and hazard. Yet recall (from the preceding section) that for the same total mass, smaller ultrafine or fine particles like DPM have much larger numbers and surface areas, and therefore pose much greater health harm than larger particles. Recall also that the larger DPM surface areas mean they have much greater opportunity to interact with cell surfaces and cause inflammatory and other damage that is much worse than that caused by larger particles having the same total mass (EPA, 2013). For instance, scientists know that, per unit of mass, ultrafine PM can be about 65 times more hazardous than coarse or fine PM (e.g., Sager and Castranova, 2009).
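The geometric point behind this argument can be illustrated with a minimal sketch, assuming idealized spherical particles of equal density. Real DPM consists of irregular agglomerates, so the numbers are indicative only. The function names, the unit density, and the 10 µm and 0.1 µm diameters are illustrative assumptions, not values taken from the ACES reports, and the sketch says nothing about the 65-fold hazard figure, which is a toxicological rather than a geometric result.

```python
import math

def particles_per_microgram(diameter_um, density_g_cm3=1.0):
    """Number of idealized spherical particles of one diameter in 1 microgram of PM."""
    d_cm = diameter_um * 1e-4                      # micrometers -> centimeters
    mass_per_particle_g = density_g_cm3 * math.pi / 6.0 * d_cm ** 3
    return 1e-6 / mass_per_particle_g

def surface_area_per_microgram(diameter_um, density_g_cm3=1.0):
    """Total surface area (cm^2) of those particles in 1 microgram of PM."""
    d_cm = diameter_um * 1e-4
    return particles_per_microgram(diameter_um, density_g_cm3) * math.pi * d_cm ** 2

coarse_um = 10.0     # assumed coarse-particle diameter (upper PM10 boundary)
ultrafine_um = 0.1   # assumed ultrafine-particle diameter (upper PM0.1 boundary)

# For equal mass, particle number scales as 1/d^3 and total surface area as 1/d.
number_ratio = particles_per_microgram(ultrafine_um) / particles_per_microgram(coarse_um)
area_ratio = surface_area_per_microgram(ultrafine_um) / surface_area_per_microgram(coarse_um)

print(f"particle-number ratio (ultrafine/coarse): {number_ratio:.3g}")
print(f"surface-area ratio (ultrafine/coarse):    {area_ratio:.3g}")
```

Under these assumptions, a hundredfold reduction in diameter gives about a millionfold increase in particle number and a hundredfold increase in total surface area for the same mass, which is why a mass-only state variable can hide large changes in the quantities that matter here.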
How did ACES researchers erroneously attempt to use mass, as a state variable, to supposedly assess DPM harm? Consider two examples of ACES errors in this regard. One instance concerns ACES attempts to measure DPM mass concentrations at the inlet and middle of the animal diesel-exhaust-exposure chambers. They claimed the different DPM-mass concentrations at these two spots would distinguish DPM from PM from the animals themselves; based on the mass differences, they claimed that the "major portion" of PM mass and hazard was from the animals themselves, not DPM (McDonald et al., 2015, p. 21). Yet for reasons already given in the previous section, one cannot distinguish DPM from animal PM, based on mass, as the ACES researchers attempt to do; instead one also must use particle number and surface area (EPA, 2013; Sager and Castranova, 2009), given that most DPM is ultrafine and fine PM, therefore much more hazardous than animal PM, which is mostly coarse PM.
For ACES researchers to use PM-mass differences as a way to distinguish DPM from experimental-animal PM such as feces or manure is erroneous and incomplete for at least three reasons. One reason is that metals tend to be toxic and carcinogenic, and DPM is mostly metals, whereas animal PM is not.
Another reason is that animal PM is typically directly emitted, coarse PM, whereas DPM is not the less hazardous, coarse PM, but the more hazardous ultrafine and fine PM, as just mentioned. Moreover, animal PM does not appear to become smaller or more hazardous because during decomposition, particle size of animal PM remains the same, typically coarse, and thus less hazardous. Finally, because animal PM is not a Trojan-horse pollutant, as DPM is, it does not carry PM hazards such as formaldehyde and PAHs (Copeland, 2014;Hansen et al., 1976).
In other words, largely because the ACES researchers made many false factual assumptions already outlined--such as that animal and diesel PM can be distinguished from each other on the basis of mass, or that particle number and surface area are not necessary to separate DPM from animal PM levels--they erroneously underestimated DPM exposure and harm. They invalidly trimmed the data on DPM harm, just as they did when they invalidly assumed they could measure DPM by measuring mainly NO2 levels. In using the state variable of mass to measure DPM levels and harm, ACES researchers made at least two scientific errors, against which a US National Academy of Sciences committee warned. They falsely assumed that urban or DPM air pollution is not different from rural or animal-waste PM. Their other error is ignoring the fact that less hazardous, coarse PM--not the more hazardous fine and ultrafine PM of DPM--is what is "often encountered" in animal wastes (US-NRC, 2003).
Another way in which ACES researchers erroneously used the state variable of mass to assess DPM exposure levels and harm occurred when they evaluated the most hazardous NTDE-2007 pollutants by mass. As a result, they invalidly assumed that mass indicates degree of hazard, something that is obviously false for nanomaterials and for fine and ultrafine PM, as already argued. After making this false assumption, ACES authors erroneously inferred that because mass-based particle concentrations were low, DPM harm was low. ACES authors likewise assumed that because their "calculated" ratio of mass:NO2 was much lower, by a factor of 30, than in earlier studies, they could conclude that DPM was mostly removed (Bemis et al., 2015, p. 150). Yet as already argued, the PM of NTDE-2007 typically has less mass but far greater numbers and surface areas of particles and therefore up to 65 times the typical DPM hazard. In other words, because the ACES researchers used an invalid state variable, mass, they erroneously concluded that DPM was mostly removed. They ignored the fact that their results are consistent with NTDE-2007 particles being smaller in size, greater in surface area, and therefore far more hazardous than traditional DPM. Thus, as already noted, although government says NTDE-2007 has 90 percent less PM by mass than traditional diesel exhaust, NTDE-2007 does not remove 90 percent of DPM hazards because the much smaller PM of NTDE-2007 is far more hazardous, once one considers particle size and surface area (EPA, 2013; Sager and Castranova, 2009). ACES researchers, however, ignore this scientific consensus about relevant state variables for DPM. Instead, they erroneously claim that "the steep drop in particle mass…significantly decreased the overall toxicity of NTDE-2007 compared with the toxicity" of traditional diesel exhaust (Bemis et al., 2015, pp. 154-5).
Interestingly, reviewers of the ACES research also noticed these ACES state-variable problems, namely, ACES assuming that NTDE-2007 toxicity is a function of NO2 rather than DPM levels, and assuming that NTDE-2007 toxicity is reduced because of reduced DPM mass in NTDE-2007, as compared to DPM in traditional diesel. The reviewers warned that "although engine-generated PM mass was greatly reduced [in NTDE-2007], substantial numbers of particles…in the [far more hazardous] ultrafine range…were detected. These levels are in the range of (or somewhat higher than) those found on or near major roads in urban areas and in environments in which diesel-powered traffic dominates…[Therefore] it is possible that components of NTDE-2007 other than NO2 may have contributed to the effects reported" (Bemis et al., 2015, p. 156). However, ACES researchers ignored these reviewer comments, continued to use the flawed state variables of NO2 and mass, and thus drew the invalid conclusion that "observed" NTDE-2007 harm is minimal or nonexistent. ACES authors minimized NTDE-2007 harm because they ignored the fact that NTDE-2007 filters produce far greater numbers of far more hazardous ultrafine particles. Thus the ACES researchers do not fully assess the most relevant and largest contributors to NTDE-2007 harm: DPM number and surface area rather than merely NO2 and particle mass.

The Life-Span Representativeness Error
ACES researchers likewise underestimate and minimize diesel-exhaust harm in a third main way: They use test subjects whose NTDE-2007 exposures trim the magnitude of actual DPM doses. That is, although the ACES researchers claim to have done "lifetime cancer and non-cancer assessment" in rats exposed to NTDE-2007, the exposures were not lifetime but partial-lifetime. As a result, although the title of the 2015 ACES report itself claims it does "lifetime assessment" of NTDE-2007 exposures, it does not.
How did ACES authors "trim the data" on supposed "lifetime" exposures to NTDE-2007? The authors say they received their experimental rats when they were 6 weeks of age, then quarantined them for at least another 2 weeks (McDonald et al., 2015, p. 11). This means that all ACES rats were 2 months of age or older. Yet researchers agree that when using rat studies to calculate effects on humans, each rat month of age is equivalent to 3 years of human age (Sengupta, 2013). This means that the ACES studies were equivalent to human studies whose subjects were already 6 years of age and older--far beyond the period of greatest vulnerability to pollutants.
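The age conversion in this paragraph can be written out as a short worked calculation. This is a sketch of the arithmetic only: the 3-years-per-rat-month factor is the one the text attributes to Sengupta (2013), the 4-weeks-per-month rounding is an assumption made here to match the text's "2 months," and the function and variable names are illustrative.

```python
RAT_MONTH_TO_HUMAN_YEARS = 3   # conversion factor as stated in the text
WEEKS_PER_MONTH = 4            # simplifying assumption used to round 8 weeks to 2 months

def human_equivalent_years(rat_age_weeks):
    """Convert a rat's age in weeks to an approximate human-equivalent age in years."""
    rat_age_months = rat_age_weeks / WEEKS_PER_MONTH
    return rat_age_months * RAT_MONTH_TO_HUMAN_YEARS

age_at_receipt_weeks = 6       # age at which the rats were received
minimum_quarantine_weeks = 2   # minimum quarantine before exposures began

age_at_first_exposure_weeks = age_at_receipt_weeks + minimum_quarantine_weeks
human_years = human_equivalent_years(age_at_first_exposure_weeks)

print(f"Rat age at first exposure: {age_at_first_exposure_weeks} weeks")
print(f"Human-equivalent age: {human_years:g} years")
```

Because the quarantine lasted at least 2 weeks, the resulting 6 years is a lower bound on the human-equivalent age at first exposure under this conversion.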
Moreover, studies of humans 6 years of age and older are not "lifetime" exposure studies, contrary to what the ACES researchers repeatedly claim. Indeed, because the ACES researchers failed to use subjects equivalent to humans 6 years of age and younger, they failed, for at least two reasons, to test the most sensitive members of the population.
First, human subjects under 6 years of age can be 40 to 50 times more sensitive than adults when both are subjected to the same levels of pollutants (Makhijani, 2006). This is why scientists can predict rates of autism and IQ losses, based on young children's exposures to diesel exhaust, especially PM (e.g., Genc and Zadeoglulari, 2012). Thus by ignoring young subjects, ACES researchers falsely report less harm from diesel exhaust than actually occurs.
Second, given the latest understanding of epigenetic effects, scientists now believe that because very young humans under age 6 are so plastic, their early environmental-pollution exposures typically "program" them for various diseases later in life (e.g., Grandjean, 2013; Rassoulzadegan et al., 2006; Tollefsbol, 2014). Epigenetics research thus indicates that if subjects under age 6 receive fewer environmentally harmful exposures, they will be far less likely to have any sort of disease in later life. By pre-selecting as their experimental subjects those who have not had this typical, below-age-6 exposure, ACES researchers have biased their studies against finding any diesel harm from NTDE-2007 and trimmed the data on diesel harm. Contrary to their own explicit claims, ACES authors clearly have not considered lifetime NTDE-2007 exposures, but only NTDE-2007 exposures during the least-sensitive portion of life.

The Small-Sample Representativeness Error
In a fourth way, ACES researchers have not studied what they claim to have studied, and therefore draw invalid conclusions that deny NTDE-2007 harm. Not only did they not study lifetime human exposure to NTDE-2007, as they falsely claimed, but they also did not study representative samples of subjects. Instead, they used very small samples of rats, theoretically 140 male rats and 140 female rats at each of four exposure levels, for a total of 280 rats maximum at a single exposure level (Greenbaum et al., 2015, p. 2). Likely as a result, they drew false-negative conclusions about NTDE-2007 harm. Scientists agree that any sample size below several thousand is typically too low to detect even very large harmful effects. As a result, they typically use sample sizes at least in the thousands (e.g., Ein-Dor et al., 2006). Thus the ACES research used sample sizes that were at least 8-10 times too small to detect most significant effects. Standard error is larger with smaller samples because it varies inversely with the square root of the sample size, so an estimate from a few hundred animals is much less precise than one from several thousand.
For instance, recall that each 10 µg/m3 increase in NO2 causes those exposed to have a 4 percent increase in premature lung cancer (Hamra et al., 2015). That is, each 10 µg/m3 increase causes 4 in every 100 people, or 40 in every 1000 people, who are exposed to have premature cancer, when they each otherwise would not have had it. But this, in turn, suggests that each 1 µg/m3 increase in NO2 might cause 1 in every 1000 exposed people to contract premature cancer. But because of genetic and interindividual variability, to adequately test whether some exposure causes 1 in every 1000 people to have premature cancer, obviously one would need a sample size much larger than 1000. Hence it is puzzling that the ACES researchers did not use samples of thousands of rats, at each exposure level, given that rat generations are quite short, that lifetime effects on rats are easy to test, and that animal testing is relatively inexpensive, compared to human testing.
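A rough sense of why group sizes in the hundreds struggle with a 1-in-1000 effect can be had from a minimal sketch. It assumes independent subjects and a zero background rate, and it asks only how likely a group is to show even a single exposure-caused case. The function name and the hypothetical group sizes of 1000 and 3000 are illustrative assumptions; 280 is the nominal per-exposure-level maximum cited above.

```python
def prob_at_least_one_case(excess_risk, n_subjects):
    """Chance that n independent subjects yield one or more exposure-caused cases."""
    return 1.0 - (1.0 - excess_risk) ** n_subjects

EXCESS_RISK = 1 / 1000  # the 1-in-1000 figure used in the text

# 280 is the nominal ACES maximum per exposure level; 1000 and 3000 are hypothetical.
chances = {n: prob_at_least_one_case(EXCESS_RISK, n) for n in (280, 1000, 3000)}

for n, p in chances.items():
    print(f"n = {n:5d}: chance of at least one case = {p:.2f}")
```

Under these assumptions a group of 280 has roughly a one-in-four chance of containing even one such case, while a group of about 3000 is needed before that chance reaches roughly 95 percent. Observing one case is a far weaker requirement than showing a statistically significant excess over background, so these figures are optimistic about what small groups can detect.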
Moreover, for several reasons, the ACES false-negative bias is even worse than is apparent. This is because ACES sample sizes were really much smaller than the authors claim. For one thing, because the ACES scientists sacrificed 10 animals at the end of each of four time periods (1, 3, 12, 24 months), at each of 4 exposure levels, the number of rats tested at each exposure level theoretically could not be 140 females and 140 males, but 100 males and 100 females. Yet because many rats died during the studies, ACES (159) researchers admitted "some groups in the 12 and 24-month exposures had between 3 and 5 animals." But if so, the ACES sample sizes, at least for some exposure levels, actually were much smaller than 100 females and 100 males. Thus, both the initial ACES sample size and the final sample sizes were too small, by at least 800 to 1000 percent, to detect most harm, even if the studies had been designed correctly with respect to state variables, sampling biopoints, and so on. Given the too-small, nonrepresentative samples, ACES studies exhibit a false-negative bias that makes it impossible to draw conclusions about NTDE-2007 harm.
An additional representativeness error in the ACES research may be that most US experimenters use Sprague-Dawley and not Wistar-Han strains of rats, as ACES did. Most researchers seem to view the ACES Wistar-Han rats as experimentally unacceptable, in part because Wistar-Han rats are less susceptible to cancer and naturally have longer lifetimes (e.g., Hayakawa et al., 2013; Zmarowski et al., 2012; Kacew and Festing, 1996). Indeed, even the ACES researchers noted that the Wistar-Han rat is less susceptible to cancer and has a "relatively low incidence of background lung tumors" (McDonald et al., 2015); even the ACES authors say their Wistar-Han rats are "less sensitive to chemically induced neoplastic and non-neoplastic outcomes," compared with F344 and other rats such as Sprague-Dawley. But if so, how can the ACES authors justify their conclusions that NTDE-2007 is safe, if they used less sensitive experimental animals? Again, the flawed ACES methods appear to lead to false-negative conclusions, false conclusions that NTDE-2007 does not cause cancer. ACES did not do representative testing, and thus underestimated NTDE-2007 harm.
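The group-size arithmetic in this paragraph can be checked with a short sketch. It assumes, as the paragraph's own totals imply, that the 10 animals sacrificed at each time point are 10 per sex, and it uses the standard result that standard error scales with one over the square root of the sample size. The variable names are illustrative.

```python
import math

animals_per_sex_at_start = 140
sacrificed_per_sex_per_time_point = 10   # assumption: 10 per sex, as the text's totals imply
sacrifice_time_points_months = (1, 3, 12, 24)

# Animals per sex that could remain for lifetime observation at one exposure level.
remaining_per_sex = animals_per_sex_at_start - (
    sacrificed_per_sex_per_time_point * len(sacrifice_time_points_months)
)

# Relative growth in standard error if a group shrinks from 100 animals to 5,
# the upper end of the "between 3 and 5 animals" range quoted above.
smallest_reported_group = 5
standard_error_inflation = math.sqrt(remaining_per_sex / smallest_reported_group)

print(f"Remaining per sex at each exposure level: {remaining_per_sex}")
print(f"Standard-error inflation at n = {smallest_reported_group}: {standard_error_inflation:.2f}x")
```

On these assumptions the nominal 140 animals per sex fall to 100, and a group reduced to 5 animals has a standard error more than four times that of a group of 100.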

Conclusions
ACES authors fail to rationally justify their conclusions that NTDE-2007 does not cause cancer because (A) they did not correctly and completely study the main components of the pollutant that they claimed to have studied, diesel exhaust; (B) they did not use correct methods, likely able to detect most of the harmful NTDE-2007 effects; and (C) they did not study representative exposure subjects during representative, lifetime exposure periods. The ACES authors err regarding (A) because they attempted to evaluate levels of DPM exposures, the most hazardous part of NTDE-2007, by erroneously studying different NO2 levels, instead of measuring DPM levels themselves. They also erroneously studied only total PM mass, instead of also assessing PM number and surface area. As a result, they studied incomplete, therefore erroneous, NTDE-2007 state variables. The ACES authors err regarding (B) because they attempted to evaluate DPM hazards by erroneously using only low-powered, small-sample studies. As a result, they used sample sizes that were 8-10 times too small to detect most of the NTDE-2007 harmful effects. The ACES authors likewise err regarding (C) because they studied a less-sensitive type of experimental rat during the least-sensitive periods of the rats' lives, rather than representative, lifetime exposures, as they claimed. As a result, at best the ACES authors' conclusions hold only for less-sensitive types of rats, only for shorter time periods, and only for NO2 exposures and not DPM, the most hazardous component of NTDE-2007. Together, all these biases and errors of the ACES authors--studying the wrong or incomplete pollutants, exposures, experimental subjects, sample sizes, and time frames--mean that the ACES results exhibit strong false-negative biases. In other words, even before hearing the supposed ACES conclusions, one can see that the ACES errors made the studies certain to underestimate harm caused by NTDE-2007.
Moreover, the errors that the ACES authors made are not sophisticated ones. They are textbook examples of how to bias science, how to purportedly show that a harmful pollutant is not harmful. Both ethics and science demand better.