Researchers are often reminded that replications are a cornerstone of empirical science (e.g., Koole & Lakens, 2012). However, not every replication is equally valuable. Although most researchers will agree that a journal editor who rejects a manuscript reporting 20 high-powered direct replications of the Stroop effect (Stroop, 1935) is making the right decision, they also know that some replications are worth performing and publishing. Cumulative scientific knowledge requires a balance between original research and close replications of important findings.
When is a close replication of an empirical finding of sufficient value to the scientific community to justify being performed and published? This is an important question for any science that operates within financial and time constraints. Some years ago, I started a project on The Replication Value. The goal was to create a quantitative and objective index of the value and importance of a close replication. The Replication Value can guide decisions about what to replicate directly, and can serve as a tool both for researchers to assess whether time and resources should be spent on replicating a finding, and for journal editors to help determine whether close replications should be considered for publication.
Developing a formula that can quantify the value of a replication is an interesting challenge. I realized I needed more knowledge of statistics before I could contribute, and although we were already working with a fairly large team, I think the project will only benefit if more people contribute suggestions.
Now, Courtney Soderberg and Charlie Ebersole have taken over the coordination of this project, and from now on, anyone who feels like contributing to this important question can generate candidate formulas. Read more about how to contribute here. Want to demonstrate that the replication value can only be computed using Bayesian statistics? Convinced we need to rely on estimation instead? Show us the best way to quantify the value of replications, and earn authorship on what will no doubt be a nice paper in the end.
I’m not going to give away my approach completely – I don’t want to limit the creativity of others – but I want to give some pointers to get people started.
I think at least two components determine the Replication Value of an empirical finding: the impact of the effect, and the precision of the effect size estimate. Quantifying the impact of studies is notoriously difficult, but I think citation counts are an easy-to-use proxy. Because more data yields a better estimate of the population effect size, sample size is a dominant factor in precision (Borenstein, Hedges, Higgins, & Rothstein, 2009). The larger the sample, the lower the variance of the effect size estimate, which leads to a narrower confidence interval around it. We can therefore quantify precision through the width of the confidence interval. The confidence interval for r is calculated by first transforming r to Fisher's z:

z = 0.5 × ln((1 + r)/(1 − r))
A very good approximation of the variance of z is:

Vz = 1/(n − 3)
The confidence interval can then be calculated as normal:
95% CI = z ± 1.96 × √(Vz)
The values acquired through this procedure can be transformed back to r using:
r = (e^(2 × z) − 1)/(e^(2 × z) + 1)
where the z value is the z-transformed upper or lower boundary of the 95% CI.
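Putting these steps together, here is a minimal Python sketch of the procedure (the function name and the example values are my own, purely for illustration):

```python
import math

def r_confidence_interval(r, n):
    """95% CI for a correlation r from a sample of size n,
    computed via the Fisher z transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))   # transform r to Fisher's z
    v_z = 1 / (n - 3)                       # approximate variance of z
    margin = 1.96 * math.sqrt(v_z)          # half-width of the CI in z space
    # transform the z-space boundaries back to the r metric
    def back(zz):
        return (math.exp(2 * zz) - 1) / (math.exp(2 * zz) + 1)
    return back(z - margin), back(z + margin)

# e.g., r = .30 observed in a sample of n = 100
lo, hi = r_confidence_interval(0.30, 100)
print(f"95% CI [{lo:.3f}, {hi:.3f}]")
```

Note how the interval is asymmetric around r after the back-transformation, which is exactly why the calculation is done in z space.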
By expressing the width of the confidence interval of the effect size estimate as a percentage of the total possible width of the confidence interval, we get an index of the precision of the effect size estimate. I call this the 'Spielraum', or playing field, based on the conceptual similarity to the precision of a theoretical prediction in Meehl's (1990) work on appraising theories.
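As a sketch of that index (assuming, on my part, that the 'total possible width' for a correlation is the full range from −1 to 1, i.e. 2; the function name is hypothetical):

```python
def spielraum_precision(ci_lower, ci_upper, max_width=2.0):
    """CI width as a percentage of the maximum possible width.
    max_width=2.0 assumes effect sizes expressed as r (range -1 to 1);
    this reading of 'total possible width' is an assumption.
    Smaller percentages indicate more precise estimates."""
    return 100 * (ci_upper - ci_lower) / max_width

# e.g., a hypothetical CI of [0.11, 0.47] around r = .30
print(spielraum_precision(0.11, 0.47))
```

A maximally uninformative estimate (CI spanning the whole range) would score 100, so the index is directly comparable across studies.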
Now the tricky part is how these two factors interact to determine the Replication Value. While I go back to working on that question, perhaps you want to propose a completely different approach. I mean, really, this is a question that requires Bayesian statistics, right? Or are citation counts the worst possible way to quantify impact?
See how to contribute here: https://docs.google.com/document/d/1ufO7gwwI2rI7PnESn4wDA-pLcns46NyE7Pp-3zG3PN8/edit. I really look forward to your suggestions.