Eklund A, Nichols TE, Knutsson H. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proc Natl Acad Sci U S A. 2016 Jul 12;113(28):7900-5. Epub 2016 Jun 28. PubMed.
Comments
Prashanthi Vemuri, Mayo Clinic and Foundation
This is an important and positive study for the field of neuroimaging. Though there are some flaws in the analyses, as pointed out by the SPM folks, the important points that this study brings forward need to be examined closely.
What does the study say?
There are two components to any acquired image: the noise and the signal (i.e., the findings). The noise in fMRI images is high and arises from several sources, such as physiological fluctuations, scanner noise, and measurement error. In any neuroimaging experiment, the signal detected above all this noise (i.e., changes in the brain due to the disease process) constitutes the "findings" of the study. The study by Eklund et al. suggests that most studies may not have used appropriately rigorous methods for filtering out the noise. However, it is also important to note that the authors corrected their original statement: only about one-tenth of all study results may have relied on faulty corrections, not all 40,000 as estimated in the paper. Some of the available methods are indeed lenient in filtering out noise, which may have led to faulty corrections.
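For readers less familiar with the corrections under scrutiny, the paper is about cluster-extent inference: a voxelwise statistic map is cut at a cluster-forming threshold, and only contiguous clusters larger than some extent are reported as findings. The sketch below is a minimal, illustrative Python example of that idea, not the authors' pipeline; the thresholds and image size are arbitrary placeholders, and the point is only that a lenient forming threshold applied to pure noise can still produce "clusters."

```python
import numpy as np
from scipy import ndimage, stats

# Illustrative sketch of cluster-extent thresholding (not the authors' pipeline).
# A voxelwise statistic map is thresholded at a "cluster-forming" level, and only
# contiguous clusters larger than some extent are reported as findings.

rng = np.random.default_rng(0)
t_map = rng.standard_normal((32, 32, 32))          # stand-in for a voxelwise map (pure noise here)

cluster_forming_p = 0.01                            # a common, fairly lenient forming threshold
z_threshold = stats.norm.isf(cluster_forming_p)     # approximate z cut-off
extent_threshold = 20                               # arbitrary minimum cluster size (voxels)

suprathreshold = t_map > z_threshold
labels, n_clusters = ndimage.label(suprathreshold)  # connected components above threshold
sizes = ndimage.sum(suprathreshold, labels, index=range(1, n_clusters + 1))

surviving = [i + 1 for i, s in enumerate(sizes) if s >= extent_threshold]
print(f"{n_clusters} clusters found, {len(surviving)} exceed the extent threshold")
# Because the map is pure noise, any surviving cluster here is a false positive.
# Eklund et al. asked how often that happens when the extent threshold is derived
# from parametric (random-field/Gaussian) assumptions that real fMRI noise violates.
```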
What does it mean for the neuroimaging field?
This paper is going to force the neuroimaging community to use rigorous methods for reporting results. The quality of peer-reviewed neuroimaging publications should improve significantly now that reviewers are aware of these flaws in methodology. An important point to note is that over the years both fMRI acquisition and processing methods have improved immensely, with the aim of minimizing the noise in the fMRI signal. This paper will further propel interest in the development of better fMRI acquisition, processing, and analysis methods.
What does this mean for AD fMRI studies?
Of the two subject groups we study, AD and MCI, AD studies are less likely to be affected by this controversy: the decrease in fMRI signal caused by significant neurodegeneration in AD, relative to normal controls, is substantially larger than the noise, so noise is less likely to be detected as "findings." In MCI subjects, however, fMRI studies have found both increases in signal (i.e., compensation) and decreases in signal compared to controls, which may partly reflect differences arising from lenient analysis methods.
Mark Jenkinson, University of Oxford
A number of aspects of this recent paper have been somewhat sensationalized in various ways. The first is the estimated number of papers that are truly affected; this is something the original authors wished to change, and they have now been able to submit an erratum.
A better estimate of the number of papers potentially affected is contained in a blog post by one of the authors, Tom Nichols. From this you can see that the original number of 40,000 is a vast overestimate and is no longer supported by the original authors.
Another issue that has not been clearly reported is that strong results (p-values much smaller than 0.05) remain unaffected by this issue. Furthermore, any result that has been independently replicated on separate data can still be trusted. Such replication is crucial, not only because of the statistical issue raised in Eklund et al., but also because of other known issues, such as the bias toward reporting results just under p=0.05 over those just above it, and the lack of corrections applied to analyses that are repeated with different parameter settings. Thankfully, a great many studies and results over the last 15 years have in fact been replicated, and so the impact on the field is smaller than has been reported and talked about.
Somewhat more problematically, it seems that there is still some section of the neuroimaging community publishing without applying any form of multiple comparison correction at all. Results from the Eklund et al. paper clearly show how bad this can be with respect to false positives, and the re-estimation of the number of papers where this happens is more worrying (about 13,000 from the above blog). Failure to apply any multiple comparison correction is well known to be problematic, and all of the major software packages insist on users applying multiple comparison correction to obtain valid statistics. It is possible that the majority of these papers are older, from the early days when reviewers may have been less well informed, but it still seems that this is, unfortunately, happening now as well. This is something that hopefully the Eklund et al. paper can help to eradicate.
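To make the scale of that problem concrete, here is a small simulation, not taken from the paper, in which the voxel and subject counts are assumptions chosen for illustration. It shows how many "significant" voxels appear in pure noise when no multiple comparison correction is applied, compared with a simple Bonferroni correction.

```python
import numpy as np
from scipy import stats

# Illustrative simulation: how many "significant" voxels appear in pure noise
# when no multiple-comparison correction is applied.

rng = np.random.default_rng(42)
n_voxels = 50_000          # rough order of magnitude for a whole-brain analysis (assumption)
n_subjects = 20

# Null data: no true effect anywhere.
data = rng.standard_normal((n_subjects, n_voxels))
t_vals, p_vals = stats.ttest_1samp(data, popmean=0.0, axis=0)

uncorrected_hits = np.sum(p_vals < 0.05)
bonferroni_hits = np.sum(p_vals < 0.05 / n_voxels)

print(f"Uncorrected p<0.05: {uncorrected_hits} false-positive voxels "
      f"(~{100 * uncorrected_hits / n_voxels:.1f}% of the brain)")
print(f"Bonferroni-corrected: {bonferroni_hits} false-positive voxels")
# With ~50,000 tests, an uncorrected 0.05 threshold flags roughly 2,500 noise voxels;
# correction methods (Bonferroni, FDR, cluster-based, permutation) exist to bring
# that error rate back under control.
```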
As for software, we have always been aware that corrections based on random-field theory are approximate, but this work helps to put the degree of that approximation into context by testing against modern null data. Software packages have all been tested in various ways in the past, but often with simulated or synthetic data to measure false-positive rates, with real data used to test sensitivity. This testing paradigm, using large data sets of resting-state fMRI, will help all of us in software development to build even better tests for our software. In addition, it shows the benefits of permutation-based analysis, which is available in major software packages such as FSL. Such permutation tests have been the default, or only, option in FSL for the analyses of other imaging modalities such as structural and diffusion data, and so we are planning to also make permutation tests the default for group-level analyses of task fMRI in our upcoming release.
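As a rough illustration of the permutation principle mentioned above, the sketch below implements a sign-flipping permutation test for a one-sample group analysis with a maximum-statistic null distribution, which is the basic idea behind permutation tools such as FSL's randomise. The data and parameter choices are placeholders for illustration, not recommendations for a real analysis.

```python
import numpy as np

# Minimal sketch of a sign-flipping permutation test for a one-sample group analysis.
rng = np.random.default_rng(7)
n_subjects, n_voxels = 16, 1_000
data = rng.standard_normal((n_subjects, n_voxels))   # subject-level effect maps (null here)

def group_t(x):
    """One-sample t-statistic per voxel."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(x.shape[0]))

observed_t = group_t(data)

# Build the null distribution of the maximum statistic across voxels by randomly
# flipping the sign of each subject's map; the max statistic gives family-wise control.
n_perm = 1_000
max_null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
    max_null[i] = group_t(data * signs).max()

# FWE-corrected p-value per voxel: fraction of permutations whose maximum statistic
# meets or exceeds the observed voxel statistic.
p_fwe = (max_null[None, :] >= observed_t[:, None]).mean(axis=1)
print(f"Voxels significant at FWE p<0.05: {np.sum(p_fwe < 0.05)} (expected ~0 under the null)")
```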