A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Sunday, February 21, 2016

Where are all the competent researchers?


In response to failed replications, some researchers argue that replication studies are especially convincing when the people who performed the replication are ‘competent’ ‘experts’.

Paul Bloom has recently remarked: “Plainly, a failure to replicate means a lot when it’s done by careful and competent experimenters, and when it’s clear that the methods are sensitive enough to find an effect if one exists. Many failures to replicate are of this sort, and these are of considerable scientific value. But I’ve read enough descriptions of failed replications to know how badly some of them are done. I’m aware as well that some attempts at replication are done by undergraduates who have never run a study before. Such replication attempts are a great way to train students to do psychological research, but when they fail to get an effect, the response of the scientific community should be: Meh.”

This mirrors the response by John Bargh after replications of the elderly priming studies yielded no significant effects: “The attitude that just anyone can have the expertise to conduct research in our area strikes me as more than a bit arrogant and condescending, as if designing and conducting these studies were mere child's play.” “Believe it or not, folks, a PhD in social psychology actually means something; the four or five years of training actually matters.”

So where is the evidence that we should ‘meh’ replications by novices that show no effect? How do we define a ‘competent’ experimenter? And can we justify the intuition that a non-significant finding by undergraduate students is ‘meh’, when we are more than willing to submit the same students’ work for publication when the outcome is statistically significant?

One way to define a competent experimenter is simply by looking at who managed to observe the effect in the past. However, this won’t do. If we look at the elderly priming literature, a p-curve analysis gives no reason to assume anything more is going on than p-hacking. Thus, merely finding a significant result in the past should not be our definition of competence. It is a good definition of an ‘expert’, where the difference between an expert and a novice is the amount of expertise one has in researching a topic. But I see no reason to believe expertise and competence are perfectly correlated.
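
For readers unfamiliar with the logic of p-curve: under selective reporting of null effects, significant p-values should be roughly uniformly distributed between 0 and .05, whereas true effects produce mostly very small p-values. Below is a minimal sketch in Python of the simple binomial version of this test, using made-up p-values; the actual p-curve analysis by Simonsohn, Nelson, and Simmons includes additional continuous tests, so treat this purely as an illustration of the idea.

```python
# A minimal sketch of the binomial version of a p-curve test.
# Idea: if a literature contains only p-hacked null effects, significant
# p-values are roughly uniform between 0 and .05, so about half should fall
# below .025. True effects produce a right-skewed p-curve (mostly small p's).
from scipy.stats import binomtest

significant_p_values = [0.012, 0.034, 0.041, 0.048, 0.029, 0.022]  # hypothetical

below_025 = sum(p < 0.025 for p in significant_p_values)
test = binomtest(below_025, n=len(significant_p_values), p=0.5,
                 alternative="greater")
print(f"{below_025}/{len(significant_p_values)} p-values below .025; "
      f"binomial test for right skew: p = {test.pvalue:.3f}")
```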

There are cases where competence matters, as Paul Meehl reminds us in his lecture series (video 2, 46:30 minutes). He discusses a situation where Dayton Miller provided evidence in support of the ether drift, long after Einstein’s relativity theory explained it away. This is perhaps the opposite of a replication showing a null effect, but the competence of Miller, who had the reputation of being a very reliable experimenter, is clearly being taken into account by Meehl. It took until 1955 before the ‘occult result’ observed by Miller was explained by a temperature confound.

Showing that you can reliably reproduce findings is an important sign of competence – if this has been done without relying on publication bias and researchers’ degrees of freedom. This could easily be done in a single well-powered pre-registered replication study, but over the last few years, I am not aware of researchers demonstrating their competence in reproducing contested findings in a pre-registered study. I definitely understand that researchers prefer to spend their time in ways other than defending their past research. At the same time, I’ve seen many researchers who spend a lot of time writing papers criticizing replications that yield null results. Personally, I would say that if you are going to invest in defending your study, and data collection doesn’t take too much time, the most convincing demonstration of competence is a pre-registered study showing the effect.
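
As a rough illustration of what ‘well-powered’ means in practice, the sketch below computes the required sample size for a two-group replication. The assumed effect size (d = 0.4) and the 90% power target are arbitrary illustrative values, not recommendations tied to any specific study.

```python
# A minimal power-analysis sketch: how many participants per group would a
# 'well-powered' two-group replication need? The assumed effect size (d = 0.4)
# and the 90% power target are illustrative choices, not values from any study.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05,
                                          power=0.90, alternative="two-sided")
print(f"Roughly {n_per_group:.0f} participants per group are needed.")
```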

So, the idea that there are competent researchers who can reliably demonstrate the presence of effects that are not observed by others is not supported by empirical data (so far). In the extreme case of clear incompetence, there is no need for an empirical justification, as the importance of competence to observe an effect is trivially true. It might very well be true under less trivial circumstances. These circumstances are probably not experiments that occur completely in computer cubicles, where people are guided through the experiment by a computer program. I can’t see how the expertise of experimenters has a large influence on psychological effects in these situations. This is also one of the reasons (along with the 50 participants randomly assigned to four between-subject conditions) why I don’t think the ‘experimenter bias’ explanation for the elderly priming studies by Doyen and colleagues is particularly convincing (see Lakens & Evers, 2014).

In a recent pre-registered replication project re-examining the ego-depletion effect, both experts and novices performed replication studies. Although this paper is still in press, preliminary reports at conferences and on social media tell us the overall effect is not reliably different from 0. Is expertise a moderator? I have it on good authority that the answer is: No.
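
For readers wondering what testing expertise as a moderator looks like statistically, here is one minimal way to sketch it: pool the replication effects separately for expert and novice labs and test whether the two pooled estimates differ. All effect sizes and sampling variances below are hypothetical, and this is not the analysis used in the Registered Replication Report itself.

```python
# A minimal sketch of a fixed-effect subgroup (moderator) test: do pooled
# replication effects differ between 'expert' and 'novice' labs?
# All effect sizes (Cohen's d) and sampling variances below are hypothetical.
import numpy as np
from scipy.stats import chi2


def pooled(effects, variances):
    """Inverse-variance weighted (fixed-effect) estimate and its variance."""
    w = 1.0 / np.asarray(variances)
    return np.sum(w * effects) / np.sum(w), 1.0 / np.sum(w)


expert_d, expert_v = [0.05, -0.02, 0.10], [0.02, 0.03, 0.02]
novice_d, novice_v = [0.01, 0.08, -0.04], [0.02, 0.02, 0.03]

est_e, var_e = pooled(expert_d, expert_v)
est_n, var_n = pooled(novice_d, novice_v)

# Q_between with 1 degree of freedom: is the expert-novice difference
# larger than expected from sampling error alone?
q_between = (est_e - est_n) ** 2 / (var_e + var_n)
print(f"expert d = {est_e:.2f}, novice d = {est_n:.2f}, "
      f"moderator test p = {chi2.sf(q_between, df=1):.3f}")
```

In practice a random-effects model or meta-regression would be the more common choice; the fixed-effect version above just keeps the arithmetic transparent.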

This last set of studies shows the importance of getting experts involved in replication efforts, since it allows us to empirically examine the idea that competence plays a big role in replication success. There are, apparently, people who will go ‘meh’ whenever non-experts perform replications. As is clear from my post, I am not convinced the correlation between expertise and competence is 1, but in light of the importance of social aspects of science, I think experts in specific research areas should get more involved in registered replication efforts of contested findings. In my book, and regardless of the outcome of such studies, performing pre-registered studies examining the robustness of your findings is a clear sign of competence.

18 comments:

  1. I think experimental expertise really does exist, but I take the point that we need to be skeptical about assuming it is playing a role in (non)-replications.

    One approach is to ensure that all experiments build in "self-testifying" indicators of experimental competence. Positive as well as negative controls, manipulation checks and the like.

    1. Hi Tom, yes, I'm fully on board with making this an empirical question. But so far, I think the intuition is brought up too often as a fact, while the available data contradict it.

  2. Paul Bloom has recently remarked: “Plainly, a failure to replicate means a lot when it’s done by careful and competent experimenters, and when it’s clear that the methods are sensitive enough to find an effect if one exists. Many failures to replicate are of this sort, and these are of considerable scientific value. But I’ve read enough descriptions of failed replications to know how badly some of them are done. I’m aware as well that some attempts at replication are done by undergraduates who have never run a study before. Such replication attempts are a great way to train students to do psychological research, but when they fail to get an effect, the response of the scientific community should be: Meh.”

    When I was a student at university, I almost never saw a professor (competent researcher?) in the lab. Isn't a lot of psychological research actually carried out by research assistants, who could be undergraduates?

    If so, then I think many of the *original* experiments Bloom talks about might very well have been executed by undergraduates themselves, which makes his whole reasoning strange.

    1. I fully agree - indeed my main point is that this line of reasoning lacks an empirical basis. I simply don't know how to define 'competent'.

  3. I run replication experiments with undergrad students. We use large sample sizes, and each study gets an additional positive control. As controls we've used the Retrospective Gambler's Task and the Gain/Loss Framing Effect, both of which have very well defined effect sizes thanks to the Many Labs projects. When the sample sizes are good and the positive controls turn out as expected, we can feel assured that even undergrads can do quality and informative work (here's an example: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0140806).

    It's also funny to hear researchers worry about undergraduate researchers in the context of failed replications, when no one seems to worry about undergrad researchers in the context of original results. I run an undergrad-powered neurobiology lab, and have found that most undergrads with good training can perform electrophysiology, qPCR, and lots of other extremely time- and skill-dependent techniques. Our results with undergrads have been replicated within our own lab and across others. This is the norm in lots of the life sciences. Herding participants into cubicles to complete a psychology study (often fully on a computer) does not exactly compare in terms of the need for expertise...

    1. Very nice example of a high quality study using positive controls! Well done! Indeed, if you design such a strong study, and train undergraduates well, I see little reason why they cannot collect high quality data.

  4. This comment of Bargh's struck me as incorrect:
    "The attitude that just anyone can have the expertise to conduct research in our area strikes me as more than a bit arrogant and condescending, as if designing the conducting these studies were mere child's play.”

    The point is not that these students are "designing" the studies. The original authors did that, and that is the part that requires a PhD and years of experience. However, once you've done the hard part of designing the study, yes, it should be possible for a competent student to conduct the research and replicate the finding, except in the cases where the experiments require particular technical expertise. This is not to say that all replication attempts are done well, of course. But the claim that one needs a PhD and many years of experience to even attempt a replication doesn't ring true.

    1. That's indeed an important distinction to make! Designing a good study is not easy, but I agree that when the hard work is done, performing the study is easier. But let me play the devil's advocate. Isn't this similar to saying 'creating a good dish is difficult and requires an experienced cook, but following a recipe is not difficult'? I hear the cook vs. recipe counterpoint a lot. Is there a point? Or are our studies most often more like an instant mix that can't go wrong?

    2. I thought about that while writing this, but decided not to go there. As you know, it's been said before about this issue that if your experiment requires some 'secret sauce' that you can't write down in sufficient detail for another reasonably intelligent person with the right background to replicate your method, then what you're doing isn't science. I agree with that statement, if for no other reason than that if the next generation of psychologists cannot recreate your testing conditions, your research method dies with you, and then what have we gained from your work? Also, by way of precedent: when the original cold fusion result failed to replicate, it wasn't an acceptable defense to argue that other physicists weren't doing it right, and the same should hold for psychology.

    3. I agree with your points, Brad.

      Like you, I'm a bit reluctant to entertain the chef analogy. One problem with it is that it assumes that the experienced chef can produce the dish perfectly every time. But food production, like the production of "scientific findings," suffers from selection biases.

      It could very well be the case that the chef only shows us the dishes that came out in the desired way; the slightly over-cooked dishes are discarded. As a consequence, those who taste the chef's final results assume the chef is infallible. And those who attempt to follow the recipe themselves and fail end up being seen as failures. But they don't appreciate that the original chef might have less than a 100% success rate in creating the dish themselves.

      Some things are simply hard to create; they involve a large number of factors that are difficult to control and are largely unknown. Producing it once or twice doesn't make you an expert. But producing the same thing repeatedly by following a transparent protocol demonstrates your ability to execute well. And, even if that is not a necessary skill for creative cooking, it is a necessary skill for the transmission of knowledge.

      Like Daniel suggested: We have no basis in psychology for evaluating research expertise. Part of the problem simply boils down to the fact that we never get to see what doesn't leave the kitchen.

    4. As someone who loves cooking and who likes to spend time learning about cooking, I don't think you guys understand how cooking works. If I had to sum up my comparison between people who can and can't cook (incl. my past selves), I would say people who can cook are able to improvise dishes from some of the available ingredients, while people who can't cook need to follow the recipe, and if they are forced to design a new meal the result is often disastrous. Replication is easy; designing a new meal, especially from a constrained set of available ingredients, is difficult. The main difference between you and a pro chef is that they buy expensive/quality ingredients and use expensive machines and tools. There is simply no way to make a steak with a pan, oven, or grill that is as delicious as a steak done with a sous-vide toolkit.

      Good cooking is similar to good science - design is difficult, especially with few available resources; attention to detail and exact description are important; good recipes are easily replicable and always taste good; when experimenting, change one factor at a time and taste/test; know your tools, know your ingredients; and it always helps to throw some money at your tools and ingredients. (Also, there are a lot of poor tools and ingredients, and many bad recipes.)

  5. Suppose that a competent researcher can replicate a robust psychology effect nearly every time in the laboratory. Suppose also (as is usually the case) that this effect is virtually undetectable in ecologically valid settings.

    Example: Bystander effect.

    Luckily, a brand new graduate student is tasked with replicating this phenomenon. Just like in the real world, where we are actually trying to understand cognition, they read the directions wrong and make critical errors in measurement and data analysis. Thus, the effect fails to replicate. This communicates the correct thing: that the effect in question is unreliable. Unreliability means being subject to unpredictable forces, like incompetent junior graduate students.

    Very few graduate students, even in psychology, are unable to replicate the findings of universal gravitation, even (especially!) after a few beers. The fact that these students struggle to replicate well-known effects is a valid critique of those effects. It has the valuable effect of making the original researcher retrench their discussion to include caveats like "only when we do it at our lab." Those caveats are useful things to know!

  6. I would venture that the issue with confounding experience with competence goes much deeper. It has long been assumed that possession of a PhD implies competence. This is a naive assumption. Our PhDs are as good as our training, and well, the fact is that there is no real agreement about what type of training experimental researchers should receive. The actual training researchers do receive is at the discretion of particular departments/supervisors. I'm all for academic freedom, but I have a hard time accepting that as a reason not to produce some type of standardization of the training required to produce competence.

    1. I guess that I disagree here. Psychology is too broad in its methods to develop a one-size-fits-all training program. Furthermore, such a program would be extremely restrictive in that it would inhibit the development of new techniques, since students would have even less time to explore. We're probably doing more or less the right thing, which is to focus on statistical competence and let the other methods training emerge as needed from the lab/PI. Some would argue we don't even go far enough in the stats, and I think they're right.

    2. I expect Monica also mainly meant the focus on some standardization in statistics (and perhaps methods, and philosophy of science) training. Beyond that, there is indeed a lot of room for differences. But the empirical basis of any PhD training should be solid, and in my experience it is not.

    3. Daniel understood what I meant exactly. We need a solid core beginning with 1. philosophy of science, many researchers don't quite understand what scientific thinking is, or how to engage in it productively, 2. methods, we shouldn't confuse the particulars of conducting eye tracking or response time experiments with a solid grounding in experimental methodology. There are general structures that apply across research environments. They are what make what we do science. 3. statistics, oh my, are most of us woefully undertrained, with the biggest problem being the cookbook approach, and not enough time spent on understanding why things are done the way they are, and particularly, on their explanatory limits. 4. Meta-understanding. By this I mean training in how what you do fits in, first, with immediately adjacent fields, and then, with other types of science. In short, I suppose what I'm trying to get at is the truism associated with holding university degrees before they became a dime a dozen: the reason a degree (B.A. or B.S.) could get you a job in the 1960s was that it was evidence that you'd been taught how to think and how to learn. If you could do those things, it didn't really matter if the industry that hired you was automotive or chemical, you were equipped to learn how to function productively in that environment. In contrast, I have lost count of how many PhDs I've met who are essentially one-trick ponies. They know how to do one thing, unquestioningly and strictly within the constraints of their original training, and have no real clue about how little they know about anything beyond it. The dialogue typically goes like this:

      Me: I read your paper on x but have a question about how you controlled for y.

      PhD: oh really? My supervisor said that's how I should do it.

      Me: ok but there are a,b,c problems with that.

      PhD: but my supervisor said it was fine.

      Me: Have you considered that your supervisor might be misinformed?

      PhD: (Blank look of terror).

      Last I checked, this attitude is closer to religion than to science.

    4. "We need a solid core beginning with 1. philosophy of science, many researchers don't quite understand what scientific thinking is, or how to engage in it productively, 2. methods, we shouldn't confuse the particulars of conducting eye tracking or response time experiments with a solid grounding in experimental methodology. There are general structures that apply across research environments. They are what make what we do science. 3. statistics, oh my are most of us woefully undertrained, with the biggest problem being the cookbook approach, and not enough time spend on understanding why things are done the way they are, and particularly, on their explanatory limits. 4. Meta-understanding."

      Yes! I was educated in a research master's program in behavioral science, and I found that almost all of this was absent, or that fundamental information was left out.

      This has made me wonder if it could be possible to come up with (part of) a curriculum which tackles the main issues, and which any university in the world could adopt. Perhaps there could even be some sort of "stamp of approval" handed out by an institution/society that would develop such a program, which universities could then mention on their respective websites/curriculum descriptions.

      Is it possible to create such a program, and could that help in elevating the quality of the education students receive?

    5. That would be a dream come true, wouldn't it? The benefits to science, and to a grounded sense of proficiency in practitioners, would be enormous. That said, I have no idea how something like this would come about, other than to create it ourselves. Otherwise, we're working on a methods hub with the goal of creating community-approved standards for doing research. The project is inching along at the moment, but it is moving. I'll post updates to my Twitter account @aeryn_thrace.
