The Effects of Data Preprocessing Choices on Behavioral RCT Outcomes: A Multiverse Analysis / Veltri, Giuseppe A. - In: Multivariate Behavioral Research. - ISSN 0027-3171. - 2025, pp. 1-16. [10.1080/00273171.2025.2575399]

The Effects of Data Preprocessing Choices on Behavioral RCT Outcomes: A Multiverse Analysis

Veltri, Giuseppe A.
2025-01-01

Abstract

Seemingly routine data-preprocessing choices can exert outsized influence on the conclusions drawn from randomized controlled trials (RCTs), particularly in behavioral science, where data are noisy, skewed and replete with outliers. We demonstrate this influence with two fully specified multiverse analyses on simulated RCT data. Each analysis spans 180 analytical pathways, produced by crossing 36 preprocessing pipelines (which vary outlier handling, missing-data imputation and scale transformation) with five common model specifications. In Simulation A, which uses linear regression families, preprocessing decisions explain 76.9% of the total variance in estimated treatment effects, whereas model choice explains only 7.5%. In Simulation B, which replaces the linear models with advanced algorithms (generalized additive models, random forests, gradient boosting), the dominance of preprocessing is even clearer: 99.8% of the variance is attributable to data handling and just 0.1% to model specification. The ranges of mean effects show the same pattern (4.34 across preprocessing pipelines vs. 1.43 across model specifications in Simulation A; 15.30 vs. 0.56 in Simulation B). Particular pipelines, most notably those that standardize or log-transform variables, shrink effect estimates by more than 90% relative to the raw-data baseline, while pipelines that leave the original scale intact can inflate effects by an order of magnitude. Because preprocessing choices can overshadow even large shifts in statistical methodology, we call for meticulous reporting of these steps and for routine sensitivity or multiverse analyses that make their impact transparent. Such practices are essential for improving the robustness and replicability of behavioral-science RCTs.
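The abstract reports variance shares and ranges of mean effects without giving the formulas behind them. One common way to obtain such numbers from a fully crossed multiverse is a two-way sums-of-squares decomposition, sketched below. It assumes exactly one effect estimate per pipeline × model cell and reads "range of mean effects" as the range of each factor's marginal means; both are assumptions, not a description of the paper's own code. The function name `variance_shares` and the toy grid values are hypothetical.

```python
from statistics import fmean


def variance_shares(estimates):
    """Decompose the spread of a fully crossed pipeline x model grid.

    Hypothetical helper: estimates[p][m] is the treatment-effect estimate
    obtained with preprocessing pipeline p and model specification m,
    assuming one estimate per cell. Returns each factor's share of the total
    sum of squares plus the range of its marginal mean effects.
    """
    n_pipe = len(estimates)
    n_model = len(estimates[0])
    if any(len(row) != n_model for row in estimates):
        raise ValueError("grid must be fully crossed (every row same length)")

    cells = [y for row in estimates for y in row]
    grand = fmean(cells)
    # Marginal means: average over models for each pipeline, and vice versa.
    pipe_means = [fmean(row) for row in estimates]
    model_means = [fmean(row[m] for row in estimates) for m in range(n_model)]

    ss_total = sum((y - grand) ** 2 for y in cells)
    if ss_total == 0:
        raise ValueError("all estimates are identical; nothing to decompose")
    ss_pipe = n_model * sum((mu - grand) ** 2 for mu in pipe_means)
    ss_model = n_pipe * sum((mu - grand) ** 2 for mu in model_means)
    # Assumption: with one estimate per cell, the remainder is the interaction.
    ss_inter = ss_total - ss_pipe - ss_model

    return {
        "pipeline_share": ss_pipe / ss_total,
        "model_share": ss_model / ss_total,
        "interaction_share": ss_inter / ss_total,
        "pipeline_range": max(pipe_means) - min(pipe_means),
        "model_range": max(model_means) - min(model_means),
    }


# Toy 3-pipeline x 2-model grid with made-up numbers (not data from the paper).
toy_grid = [
    [5.0, 5.5],   # hypothetical pipeline A
    [0.4, 0.5],   # hypothetical pipeline B
    [10.0, 9.6],  # hypothetical pipeline C
]
result = variance_shares(toy_grid)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this toy grid the rows differ far more than the columns, so `pipeline_share` comes out close to 1 and `model_share` close to 0, mirroring the kind of contrast the abstract describes. If a simulation produces several replicate estimates per cell, a within-cell error term would also be needed; this sketch omits it.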
Files in this item:
HMBR_A_2575399_O.pdf (Adobe PDF, 2.01 MB)
Access: archive managers only
Type: Publisher’s version (publisher’s layout)
License: All rights reserved

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/466372