I think it is important that people report confidence intervals to provide an indication of the uncertainty in the point estimates they report. However, I am not too enthusiastic about the current practice of reporting 95% confidence intervals. I think there are good reasons to consider alternatives, such as reporting 99.9% confidence intervals instead.
I’m not alone in my dislike of a 95% CI. Sterne and Smith (2001, p. 230) have provided the following recommendation for the use of confidence intervals:
Confidence intervals for the main results should always be included, but 90% rather than 95% levels should be used. Confidence intervals should not be used as a surrogate means of examining significance at the conventional 5% level. Interpretation of confidence intervals should focus on the implications (clinical importance) of the range of values in the interval.
This last sentence is, along with world peace, an excellent recommendation of what we should focus on. Neither seems very likely in the near future. I think people (especially when they do purely theoretical research) will continue to interpret confidence intervals as an indication of whether an effect is statistically different from 0, and make even more dichotomous statements than they do with p-values. After all, a confidence interval either includes 0 or it does not, but p-values come in three* different** magnitudes***.
We know that relatively high p-values (e.g., p-values > 0.01) provide relatively weak support for H1 (or sometimes, in large samples with high power, they actually provide support for H0). So instead of using a 90% CI, I think it's a better idea to use a 99.9% CI. This has several benefits.
First of all, instead of arguing that we should stop reporting p-values (which I don't think is necessary) because confidence intervals give us exactly the same information, we can report p-values as we are used to (using p < 0.05) alongside 99.9% CIs that tell us whether an effect differs from 0 with p < .001. We can then immediately see whether an effect is statistically different from 0 using the typical alpha level, and whether it would still be statistically different from 0 under a much stricter alpha level of 0.001. Note that I have argued against using p < .001 as a hard criterion to judge whether a scientific finding has evidential value, and prefer a more continuous evaluation of research findings. However, when using a 99.9% CI, you can use the traditional significance criterion we are used to, while at the same time seeing what would have happened had you followed the stricter recommendation of p < .001. Since evidential value is negatively correlated with p-values (the lower the p-value, the higher the evidential value, all else being equal; see Good, 1992), any effect that would have been significant at p < .001 has more evidential value than an effect significant only at p < .05.
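To make this concrete, here is a minimal Python sketch (using a hypothetical sample drawn from a normal distribution; the sample size and effect are arbitrary choices for illustration) of reporting a p-value together with 95% and 99.9% CIs. For a two-sided one-sample t-test, checking whether 0 falls outside the 99.9% CI is equivalent to checking p < .001:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical sample: 30 observations drawn from N(0.6, 1)
x = rng.normal(loc=0.6, scale=1.0, size=30)

t_stat, p = stats.ttest_1samp(x, popmean=0)

def ci(sample, level):
    """t-based confidence interval for the mean at the given confidence level."""
    m = sample.mean()
    se = stats.sem(sample)
    half = stats.t.ppf(1 - (1 - level) / 2, df=len(sample) - 1) * se
    return m - half, m + half

lo95, hi95 = ci(x, 0.95)
lo999, hi999 = ci(x, 0.999)
print(f"p = {p:.4f}")
print(f"95%   CI: [{lo95:.2f}, {hi95:.2f}]")
print(f"99.9% CI: [{lo999:.2f}, {hi999:.2f}]")
# If the 99.9% CI excludes 0, the same effect is also significant at p < .001
```

The point is that one extra line in a results section gives both pieces of information at once.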
Second, confidence intervals are often incorrectly interpreted as the range that will contain the true parameter of interest (such as an effect size) with 95% probability. The confidence level is a statement about the proportion of future confidence intervals, constructed the same way, that will include the true parameter; it is not a statement about how often the parameter will fall within one particular confidence interval.
The more intuitive interpretation people want to use when they see a confidence interval is to interpret it as a Capture Percentage (CP). My back-of-the-envelope explanation in the picture below shows how a 95% CI is only a 95% CP when the parameter (such as an effect size) you observe in a single sample happens to be exactly the same as the true parameter (left). When this is not the case (and it is almost never exactly the case), less than 95% of future effect sizes will fall within the CI from your current sample (see the right side of the figure). In the long run, that is, on average, a 95% CI has an 83.4% capture percentage.
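You can verify the 83.4% figure yourself with a small simulation. The sketch below again assumes a normal sampling distribution with known standard error: draw one sample mean, build its 95% CI, and then check whether a second, future sample mean lands inside it. Averaged over many replications, the capture rate comes out near 83.4%, not 95%.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, reps = 0.0, 1.0, 50, 20000
z = 1.959964  # two-sided 95% critical value of the standard normal
se = sigma / np.sqrt(n)

# One "current" sample mean per replication, plus one "future" sample mean;
# check how often the future mean falls inside the current mean's 95% CI.
first_means = rng.normal(mu, se, size=reps)
future_means = rng.normal(mu, se, size=reps)
captured = np.mean(np.abs(future_means - first_means) <= z * se)
print(f"Average capture percentage: {captured:.3f}")  # close to 0.834
```

The gap arises because both the current mean and the future mean vary around the true parameter, so their difference is more variable than either one alone.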
This is explained in Cumming & Maillardet (2006), who present two formulas to convert a confidence interval to a capture percentage. I've made a spreadsheet in case you want to try out different values:
Capture percentages are interesting. You typically have only a single sample, and thus a single confidence interval, so any statement about what an infinite series of future confidence intervals will do is relatively uninteresting. A capture percentage, however, is a statement you can make based on a single confidence interval: it says something about where future statistics (such as means or effect sizes) are likely to fall. A value of 83.4% is a little low (it means that, on average, you will be wrong about the future 16.6% of the time). For a 99.9% confidence interval, the capture percentage is 98%. Those are two easy-to-remember numbers, and being 98% certain of where you can expect something is pretty good.
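For the large-sample (known-sigma) normal case, the conversion from confidence level to average capture percentage has a simple closed form: the difference between a future mean and the current mean is sqrt(2) times as variable as either alone, so CP = 2*Phi(z/sqrt(2)) - 1, where z is the critical value for the chosen confidence level. A sketch:

```python
from math import sqrt
from scipy.stats import norm

def capture_percentage(level):
    """Average capture percentage of a level-% CI (large-sample normal case)."""
    z = norm.ppf(1 - (1 - level) / 2)
    return 2 * norm.cdf(z / sqrt(2)) - 1

print(f"95%   CI -> CP {capture_percentage(0.95):.3f}")   # ~0.834
print(f"99.9% CI -> CP {capture_percentage(0.999):.3f}")  # ~0.980
```

Plugging in other levels reproduces the values you would get from the spreadsheet.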
So, changing reporting practices away from 95% confidence intervals toward 99.9% confidence intervals and 98% capture intervals has at least two benefits. The only downside is that confidence intervals become a little wider (see below for an independent t-test with n = 20 and a true d of 1), but if you really care about the width of a confidence interval, you can always collect a larger sample. Does this make sense? I'd love to hear your ideas about using 99.9% confidence intervals in the comments.
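To give a rough sense of how much wider the interval gets, here is a sketch for the mean difference in an independent t-test with n = 20 per group, assuming a within-group standard deviation of 1 (so a true d of 1 corresponds to a mean difference of 1):

```python
import numpy as np
from scipy import stats

# Half-width of the CI around the mean difference for an independent t-test
# with n = 20 per group and sd = 1 in each group.
n, sd = 20, 1.0
se_diff = sd * np.sqrt(2 / n)  # standard error of the difference in means
df = 2 * n - 2

halves = {}
for level in (0.95, 0.999):
    halves[level] = stats.t.ppf(1 - (1 - level) / 2, df) * se_diff
    print(f"{level:.1%} CI half-width: {halves[level]:.2f}")
```

Under these assumptions the 99.9% CI is wider than the 95% CI by a factor of roughly 1.75, which is the price paid for the extra assurance.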