Marco Del Giudice
Abstract
In this paper, I highlight a problem that has become ubiquitous in scientific applications
of machine learning methods, and can lead to seriously distorted inferences about the phenomena
under study. I call it the prediction-explanation fallacy. The fallacy occurs when researchers use
prediction-optimized models for explanatory purposes, without considering the tradeoffs
between explanation and prediction. This is a problem for at least two reasons. First, prediction-optimized models are often deliberately biased and unrealistic in order to prevent overfitting, and
hence fail to accurately explain the phenomenon of interest. In other cases, they have an
exceedingly complex structure that is hard or impossible to interpret, which greatly limits their
explanatory value. Second, different predictive models trained on the same or similar data can be
biased in different ways, so that multiple models may predict equally well but suggest conflicting
explanations of the underlying phenomenon. In this paper, I introduce the tradeoffs between
prediction and explanation in a non-technical fashion, present some illustrative examples from
neuroscience, and end by discussing some mitigating factors and methods that can be used to
limit or circumvent the problem.
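To make the two concerns concrete, here is a minimal sketch in Python with scikit-learn. It is my own illustration, not taken from the paper: the data, the lasso penalty, and all settings are hypothetical. The point is only that a regularized, prediction-optimized model can predict about as well as an unbiased model while telling a different explanatory story.

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression
    from sklearn.model_selection import train_test_split

    # Synthetic data: two nearly collinear predictors that both drive the outcome.
    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)        # x2 is almost a copy of x1
    y = x1 + x2 + rng.normal(size=n)          # both predictors matter equally
    X = np.column_stack([x1, x2])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    ols = LinearRegression().fit(X_tr, y_tr)  # explanation-oriented baseline
    lasso = Lasso(alpha=0.2).fit(X_tr, y_tr)  # prediction-optimized, deliberately biased

    print("OLS coefficients:  ", ols.coef_)   # roughly equal weights (but unstable)
    print("Lasso coefficients:", lasso.coef_) # often shrinks one coefficient toward 0
    print("OLS   test R^2:", round(ols.score(X_te, y_te), 3))
    print("Lasso test R^2:", round(lasso.score(X_te, y_te), 3))
    # The two models predict comparably well, yet reading the lasso's sparse
    # coefficients as an explanation would suggest only one predictor matters.

Rerunning this sketch with a different random seed or penalty strength can flip which predictor the lasso retains, illustrating the second concern: multiple models with near-identical predictive accuracy can imply conflicting explanations of the same phenomenon.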