Quasi-revolution in psychology and reproducibility of extraordinary results

Artem Akopyan
The University of Western Ontario

Keywords: replication, ESP, randomness, confirmatory research


This article deals with the issue of reliability in contemporary research. Specifically, the idea of purposeful replications of psychological studies is discussed. Next, the two main types of replication are presented and their pertinence to psychology is explained in connection with Daryl Bem's (2011) notorious article that claimed overwhelming evidence of precognition. The reader is shown the potential shortcomings that might come about as a result of neglecting literal replications or misinterpreting the results obtained with the help of each form of replication. Finally, some modern changes in research practices in psychology are mentioned and their significance in scientific testing and theorizing is briefly outlined.


The main purpose of scientific enquiry is the testing of intuitive suppositions. The potential of psychological science is enormous, and experimental research in psychology frequently produces remarkable results that lead to bold claims and new perspectives on phenomena of interest. In order for the potential of psychology to be fully realized, statistical results obtained by one researcher should be reproducible at will by any of his/her colleagues. Replication occurs when a psychologist, wanting to verify or falsify a theory or a concept obtained through informed study by a colleague, performs an experiment which is intended to be in many ways similar to the initial study; tasks, materials and procedures carried out as a part of this latter study may or may not be the same.

Classes of Replications

In the realm of replications, Schmidt (2009) distinguished two main classes based on the degree of the researcher's adherence to the procedural specifications provided by the initial author. A conceptual replication is one that tests a claim about the same phenomenon as the primary source, yet deviates to a varying extent in the use of measurement instruments, sampling, or other aspects of experimental procedure. Literal replications, on the other hand, observe as closely as possible the execution of procedures as they are described in the original report (for that reason literal replications are also sometimes referred to as "close replications").

Replications are an important part of confirming or falsifying the result of a previous study. Schmidt (2009) speculated that "successful" conceptual replications cannot be interpreted as supplementary evidence in favour of the hypothesis tested in the initial study. Likewise, some authors point out the confusion between literal and conceptual replication, as well as between confirmatory and exploratory research (Wagenmakers et al., 2012). In formulating unambiguous criteria that would help determine whether a study can be considered a replication, Schmidt referred to "…eight classes of variables that define the total research reality" (p. 93) (see Table 1).

The most important class is termed primary information focus. This construct describes the instructions, materials, and events that create a certain stimulus complex for the participant.

Table 1. Eight classes of research reality (as found in Schmidt, 2009)

Class 1: Primary information focus, immaterial

This construct describes the instructions, materials, and events that create a certain stimulus complex for the participant. The immaterial subclass includes information conveyed to participants.

Class 2: Primary information focus, material

The material realization that is necessary to convey this information.

Class 3: Participant characteristics

E.g., gender.

Class 4: Specific research history of participants

Includes prior experiences and motivation for the participation in the experiment.

Class 5: Cultural and historical context in which the study is embedded

E.g., the point in time when the experiment is performed.

Class 6: Control agent

The experimenter who is interacting with the participants.

Class 7: Specific task variables

Minute material circumstances such as typing font, color of paper, etc.

Class 8: Modes of data reduction and presentation

How the assessment of the experimental effect is transformed and reported.

If one considers experimental manipulation (treatment) with all of the potential contributing factors in mind, an experiment may be likened to a function that maps the Class 2-7 characteristics into corresponding scores, which are later subjected to statistical analysis. Let [a1, b1, c1, d1, …, λ1] be the set of personality characteristics of participant 1; likewise, the set of personality characteristics of each subsequent participant, [ai, bi, ci, di, …, λi], is indexed according to the order of appearance of the participant in the discussion. The hypothetical experiment then involves the application of a treatment T = [t1, t2, …, ti], where the set T consists of all characteristics of the primary information focus pertaining to that particular treatment. A literal replication of the initial study in which T was employed produces a mapping from [ai, bi, ci, di, …, λi] to T(ai, bi, ci, di, …, λi), whereas a conceptual replication performs a set of manipulations T* = [t*1, t*2, t*3, …, t*i] on [ai, bi, ci, di, …, λi], resulting in T*(ai, bi, ci, di, …, λi). Schmidt (2009) acknowledged that a replication can never be exact in the sense of performing the very same experiment twice, because Class 2-7 specifications cannot be emulated with perfect accuracy. A literal replication nevertheless embodies the best approximation to the initial experiment, because the execution of the experimental procedure is observed as closely as possible. The recently introduced emphasis on literal replications is therefore in some sense justified, but mostly because human beings are not always good at making inferences about environments (in this case, results of experiments) that appear to be different in more ways than they appear similar.
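The mapping described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the characteristic names "a" and "b", the treatment elements t1, t2 and t2_star, and in particular the choice to combine treatment elements by simple addition, which the original formalization does not specify.

```python
from typing import Callable, Dict, List

# A participant is a set of characteristics [a_i, b_i, ..., λ_i].
Participant = Dict[str, float]
# A treatment is a set of elements [t_1, ..., t_i]; each element is modelled
# here (hypothetically) as a function contributing to the output score.
Treatment = List[Callable[[Participant], float]]


def run_experiment(treatment: Treatment, sample: List[Participant]) -> List[float]:
    """Map each participant's characteristics to one output score.

    Assumption: element contributions are summed. A real treatment could
    combine its elements in any other way.
    """
    return [sum(element(p) for element in treatment) for p in sample]


# Hypothetical treatment elements (not taken from any real study).
def t1(p: Participant) -> float:
    return 2.0 * p["a"]


def t2(p: Participant) -> float:
    return p["b"] + 1.0


def t2_star(p: Participant) -> float:
    # Procedurally different element used only in the conceptual replication.
    return p["b"] - 1.0


T = [t1, t2]            # treatment of the initial study
T_STAR = [t1, t2_star]  # treatment T* of a conceptual replication

original_sample = [{"a": 1.0, "b": 0.5}, {"a": 0.0, "b": 2.0}]
new_sample = [{"a": 0.5, "b": 1.0}]

original_scores = run_experiment(T, original_sample)
literal_scores = run_experiment(T, new_sample)          # same T, new participants
conceptual_scores = run_experiment(T_STAR, new_sample)  # T*, new participants
```

In this toy setting the literal replication differs from the initial study only in the sample, whereas the conceptual replication differs in both the sample and the treatment, so a change in scores cannot be attributed to either source alone.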

Potential Shortcomings

Which of the two types of replication is more difficult to achieve? The personality characteristics [ai, bi, ci, …, λi] react to the treatment, and the resulting scores are collected (albeit with less-than-perfect precision) by a psychologist; whether a statistically significant result is achieved depends precisely on how the treatment parameters interact with personality characteristics. If the combined treatments are not sufficiently unequivocal in their effect on the output scores, the researcher will have obtained a so-called "failed replication": an effect of treatment that does not exceed that of a chance finding. Similarly, if the net effect of experimental treatment set T (or T*) is directional inasmuch as it yields an above-chance result, having multiple distinctions between the replication and the initial study would always lead to the detection of a statistically significant result. Because many psychologists (Open Science Collaboration, 2012) see literal replication as the ultimate test of whether a hypothesis or theory is sound, the interpretation of "failed" literal replications is problematic, since the precise influence of treatment specifications on personality is not known. In such cases, psychologists duly rely on fundamental, "classic" concepts that exist in their area of interest (priming in cognitive psychology and out-group bias in social psychology, among others). If one replicates any one of the classic psychological experiments and obtains a significant result, that result is in a certain sense taken for granted because priming, for instance, is a priori expected to facilitate recall. Schmidt (2009) indicated that a psychologist's decisions about the treatment to be administered might (and do) lead to the subsequent presentation of the study as either a replication or an independent test of a similar yet different hypothesis.
In practice, a fellow researcher is most inclined to emphasize the potential confounding effect of procedural differences if he/she does not agree with the result of that study as such; on the other hand, if the results appear plausible, he/she cites the study as yet another confirmation of the underlying theory because "priming ought to always facilitate recall in principle." Thus understanding the real effect of an experimental treatment set [t1, t2, …, ti] must be the key to interpreting experimental findings. Advocates of the popularization of literal replications dislike the ambiguity arising from mixed results produced by conceptual replications, yet the confirmatory power of a literal replication hinges upon the specific characteristics of both the personality set and the treatment set. In view of the importance of procedural distinctions in producing statistically significant results, speculations about the various impacts of treatment on output scores will be in line with this approach. The ultimate goal of such efforts is the identification of one-to-one correspondences between a given element tn of the (hypothetical) treatment set and a commensurate change in the set of output scores. If theories depend upon propositions about the interaction between experimental manipulations and notable results, the meta-analytic approach described above would avoid the conflicting interpretations of experimental findings that result from disagreement about some aspect of the theoretical construct in question, including which other constructs should or should not covary with it.

The use of replication, if conducted properly, can help protect the scientific community from the proliferation of mistaken (in some cases, fraudulent) claims. Such insulation is especially important in the study of phenomena not readily embraced by the majority of psychologists; parapsychological events are a perfect example of such ambivalence. For instance, a recent controversy was instigated by Daryl Bem's article (2011), which reported statistically significant results that supported the existence of precognition. Participants were able to predict the side of the screen (left/right) where an erotic stimulus would appear after the prediction had been registered; also, participants' performance on word recall appeared to be significantly better for words that were rehearsed after the test.
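What it means for a left/right prediction task to exceed chance can be made concrete with a one-sided binomial test. The sketch below is only an illustration under stated assumptions: each guess is treated as an independent trial with a 0.5 probability of success, and the counts (60 correct out of 100) are hypothetical and are not taken from Bem (2011), whose own analyses may have differed.

```python
from math import comb


def p_value_at_chance(hits: int, trials: int) -> float:
    """Probability of at least `hits` correct guesses in `trials` two-choice
    trials when every guess succeeds with probability 0.5 (pure chance)."""
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    # Count the equally likely outcome sequences with `hits` or more successes.
    favourable = sum(comb(trials, k) for k in range(hits, trials + 1))
    return favourable / 2 ** trials


# Hypothetical session: 60 correct left/right predictions in 100 trials.
p = p_value_at_chance(60, 100)
print(f"one-sided p-value under chance: {p:.4f}")
```

A small p-value indicates only that the observed hit rate would be unusual under pure guessing; it does not by itself identify which element of the procedure produced the departure from chance.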

The main difficulty in settling the dispute lay in the inferential procedures as, strictly speaking, no number of experiments would allow scientists to gain absolute confidence in the correctness of either hypothesis. Moreover, in spite of the overwhelming skepticism surrounding the issue of psi, Dean Radin, one of the world's leading specialists in parapsychology, suggested that the outward disagreement between precognition and traditional science is illusory and is merely the result of preconceived notions instilled in aspiring psychologists by jeer pressure from their more conservative and eminent colleagues (Science and the taboo of psi, 2008). If that is the case, the issue of precognition remains unresolved as scientists project their beliefs about psi onto a pre-selected subset of published studies that do not find the supposed effect of precognition, if any, significantly exceeding that of a chance finding. Because the studies of Bem (2011) were challenged by subsequent replications and critical assessments of other investigators, an open-minded scientist is confined to disagreeable conclusions based on the articles published thus far, and ultimately to the aforementioned preference for one set of beliefs or another in spite of the wealth of published experiments (LeBel & Peters, 2011).

The idea of a replication (conceptual or literal) is simple and potentially powerful as there is no such thing as a "replication attempt" or "failed replication," provided a researcher is diligent in conducting the replication. However, a replication that produces data consistent with the primary source is not in itself an unequivocal validation of it; likewise, a replication that does not lead to the same conclusion cannot justify a dismissal of the "source" hypothesis. The popularization of literal replications is not a panacea for uncertainty in scientific discourse, for establishing the laws of nature requires knowledge of the number of factors determining an outcome as well as insight into which of those factors are more likely to be confounded.

Modern Changes in Research Practice

Still, psychologists must be aware of the benefit of literal replications as well as the distinction between confirmatory and exploratory research in order to plan studies appropriately and to correctly interpret those of their colleagues. Moreover, statistical science is currently being used for the formulation of fine-grained cognitive models; for instance, Dr. Etienne LeBel at the University of Western Ontario is developing a sophisticated technique for understanding participants' individual differences based on a modification of multinomial modeling (see Batchelder & Riefer, 1990). Websites including PLoS and PsychFileDrawer, the Reproducibility Project (2012) launched through the Open Science Framework, and Bayesian statistics (centered around Bayes' theorem) are being introduced into contemporary research practices, allowing for meta-analyses of data-to-hypothesis fit based on series of literal replications. With initiatives like these in place, the practical significance of psychological findings will rise considerably. Literal replications are a vital part of the ongoing quasi-revolution in psychology. By reaching consensus with the initial author about how the literal replication is best carried out, an inquisitive researcher can be sure that any differences in the materials, procedures, or participant demographics are conceptually negligible, and that the output scores obtained in a later study are comparable with the initial set and may be merged with it to yield more compelling evidence in favour of the theory being scrutinized. Conceptual equivalence among researchers is therefore central to the practical merit of literal replications.
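How Bayes' theorem lets evidence from a series of literal replications accumulate can be sketched in its odds form: posterior odds equal prior odds multiplied by the Bayes factor of each study. The sketch below is a simplification under stated assumptions: the studies are treated as independent, each contributes a single Bayes factor for the hypothesis against its alternative, and the prior and the Bayes factors shown are hypothetical numbers, not results from any published replication.

```python
from math import prod
from typing import Sequence


def posterior_probability(prior: float, bayes_factors: Sequence[float]) -> float:
    """Update the prior probability of a hypothesis with one Bayes factor per
    study (values above 1 favour the hypothesis, values below 1 oppose it)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Odds form of Bayes' theorem, applied once per (assumed independent) study.
    posterior_odds = prior_odds * prod(bayes_factors)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical series: an initial study favouring the hypothesis (factor 3.0)
# followed by two literal replications that favour the alternative (0.5 each).
belief = posterior_probability(0.5, [3.0, 0.5, 0.5])
print(f"posterior probability of the hypothesis: {belief:.3f}")
```

Because the update is multiplicative, a replication that does not reproduce the original result lowers the posterior without forcing it to zero, which matches the point that no single replication validates or dismisses a hypothesis outright.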


Batchelder, W. H., & Riefer, D. M. (1990). Multinomial processing models of source monitoring. Psychological Review 97, (4): 548-564.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology 100, (3): 407-425, https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi/docview/851236583?accountid=15115 (accessed December 17, 2012).

LeBel, E., & Peters, K. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology 15, (4): 371-379.

Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Submitted for Perspectives on Psychological Science.

Science and the taboo of psi with Dean Radin. (2008). Retrieved December 17, 2012 from http://www.youtube.com/watch?v=qw_O9Qiwqew

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology 13, (2): 90-100, https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi/docview/621988722?accountid=15115 (accessed December 17, 2012).

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Submitted for Perspectives on Psychological Science.


©2002-2016 All rights reserved by the Undergraduate Research Community.
