
The Wharton School, University of Pennsylvania, Philadelphia, PA 19104

College of Business and Public Administration, Drake University, Des Moines, IA 50311

As Cicchetti indicates, agreement among reviewers is not high.

This conclusion is empirically supported by Fiske and Fogg (1990), who reported that two independent reviews of the same papers typically had no critical point in common. Does this imply that journal editors should strive for a high level of reviewer consensus as a criterion for publication? Prior research suggests that such a requirement would inhibit the publication of papers with controversial findings. We summarize this research and report on a survey of editors.
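Agreement of the kind discussed here is conventionally quantified with chance-corrected statistics such as Cohen's kappa, which Cicchetti's target article treats at length. The following minimal sketch (not part of the original commentary; the function and the manuscript counts are hypothetical) shows how kappa can be computed for two reviewers' accept/reject recommendations, and how modest raw agreement can reduce to near-chance agreement once the reviewers' acceptance rates are taken into account.

```python
# Illustrative sketch: Cohen's kappa for two reviewers' accept/reject
# recommendations on the same set of manuscripts. All counts are hypothetical.

def cohens_kappa(both_accept, r1_only, r2_only, both_reject):
    """Chance-corrected agreement for a 2x2 table of two reviewers."""
    n = both_accept + r1_only + r2_only + both_reject
    observed = (both_accept + both_reject) / n   # raw agreement rate
    p1 = (both_accept + r1_only) / n             # reviewer 1's acceptance rate
    p2 = (both_accept + r2_only) / n             # reviewer 2's acceptance rate
    expected = p1 * p2 + (1 - p1) * (1 - p2)     # agreement expected by chance
    return (observed - expected) / (1 - expected)

# 100 hypothetical manuscripts: 55% raw agreement, yet kappa is about 0.08,
# i.e., barely better than chance.
print(round(cohens_kappa(both_accept=20, r1_only=25, r2_only=20, both_reject=35), 2))
```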

Prior research. Horrobin (1990) suggests that the primary function of peer review should be to identify new and useful findings, that is, to promote the publication of important innovations. This function is typically subordinated to the quality control aspects of peer review, however. The quality control approach looks for agreement among the reviewers. The result, Horrobin claims, is that competent research yielding relatively unimportant findings is more readily accepted for publication.1

He provides numerous examples of harsh peer review given to important research that presents controversial results.

The popular press often reports difficulties associated with the publication of important research findings. The scanning tunneling microscope (STM) is a case in point. The STM is capable of distinguishing individual atoms and has been hailed as one of the most important inventions of this century. It earned a Nobel Prize in physics for its inventors. Nevertheless, the first attempt to publish the results produced by the STM in 1981 failed because a journal referee found the paper "not interesting enough" (Fisher 1989).

Armstrong (1982c) provides additional examples of lapses in the peer review system, along with summaries of empirical evidence that disconfirming findings about important topics are difficult to publish. Among these, the experimental studies by Goodstein and Brazis (1970) and Mahoney (1977) are of particular interest. They found that reviewers were biased against negative findings. Reviewers rejected papers reporting negative results on the basis of poor methodology while accepting papers with confirmatory outcomes that used the identical methodology.

Given the above results, one might expect that if editors rely on consensus among reviewers for their publication decisions, few controversial findings will be published. This problem could be especially serious in social science journals. These journals generally have low acceptance rates, and their editors may decide to publish only manuscripts with high agreement among reviewers.

A survey of journal editors. To assess how journals treat empirical papers that present controversial findings, we conducted a survey of 20 current or recent editors of American Psychological Association (APA) journals. The two-page questionnaire, together with a stamped, self-addressed return envelope, was mailed out in March 1990. We followed up with phone calls 10 days after the mailing.

Replies were received from 16 of the 20 editors. One question asked: "To the best of your memory, during the last two years of your tenure as editor of an APA journal, did your journal publish one or more papers that were considered to be both controversial and empirical? (That is, papers that presented empirical evidence contradicting the prevailing wisdom.)" Seven editors could recall none.2 Four said "yes" and indicated that there was one such paper. Three editors replied that there was at least one. Two said that they published several such papers. It seems that controversial empirical papers do get published, but infrequently. Almost half the editors could not recall publishing such papers in the past two years.

We then asked about the peer review for the one published controversial empirical paper that they remembered most clearly. The question was worded: "How did the reviewers respond to this paper?" A five-point scale from "unanimously accepted" to "unanimously rejected" was provided, as well as a "don't recall" option. One of the nine respondents to this question reported unanimous acceptance, three reported "majority in favor," four reported "even split," and one answered "don't recall." In response to a question on this published paper's contribution to the discipline, one editor said "not important," four said "somewhat important," and four selected the highest rating, "important."

The editors were also asked if they had rejected any papers that were controversial and empirical. Six of the editors stated that they did not receive such papers, and four said they could not recall any. The six editors who rejected papers with controversial findings did so, they said, because of poor methodology and poor supporting arguments. Of the rejected papers that the editors "remembered most clearly," only one was "unanimously rejected"; a "majority not in favor" was reported for two, an "even split" for two, and a "majority in favor" for one. Three papers were rated as "not important," and three as "somewhat important."

These results suggest that one can get reviewer agreement on controversial empirical papers. Moreover, most of these papers are published without high levels of reviewer agreement. Apparently, editors do not rely solely on reviewer agreement.

It is interesting that our survey found only two instances of unanimous reviewer agreement for empirical papers with controversial findings. In one case, the recommendation was to reject. In the other, it was to accept. In the case of the accepted manuscript, it should be noted that the editor had invited this submission and had selected reviewers who, he said, were sympathetic to its content.

Our survey indicates that some controversial empirical papers do get published, even when there is disagreement among the reviewers. The willingness of editors to publish such papers is encouraging. On the other hand, 7 of 16 editors could recall no instances of publishing controversial empirical findings. Consequently, in the next section we consider some strategies to increase the odds of publishing this type of paper.

Possible solutions. Some methods that are currently used by journals should help.

1. Some journals' editorial policies allow the author to submit a list of possible referees, one of whom would be selected.

2. Items can be included on structured rating sheets so that reviewers rate the extent to which the findings are controversial. Editors can then give such ratings more weight.

3. Additional reviews can be sought when papers are judged to contain controversial findings. (This strategy was used for only one of the nine published papers and for only one of the six rejected papers in our survey.)

4. Special appeal procedures may help for controversial papers. This might involve other members of the editorial board.

5. Controversial papers can be reviewed initially without revealing the findings. This procedure is currently used by the International Journal of Forecasting. It has not been used frequently but, when used, it has been beneficial.

6. Provide a section of the journal for "Controversial Findings." The selection of an editor for such a section would indicate the journal's willingness to provide space for such studies.

Unfortunately, the one application of this approach that we know of (Armstrong 1982b) has produced only one submission, and the findings reported in that submission were not controversial; only the methods were.

Rather than looking for agreement, it might be useful to seek reviewers to act as advocates. This advocacy system would be used for papers that are designated as containing controversial results. A paper could be so designated by the author, the editor, or a reviewer, after which special advocacy procedures would be used. These might include some of the suggestions mentioned above. In addition, one could use more reviewers in an effort to find an advocate. An advocate could insist on publication; a note could be included with the published paper so that reviewers would, in a sense, stake their reputations on the paper.3 Through this note, readers would receive information about the nature of the acceptance. All referees could be given the opportunity to write peer commentary on the paper. This procedure would greatly increase the likelihood that important papers would be published. The increased effort given to reviewing might also improve quality control.

Conclusions. Controversial empirical papers are expected to receive harsh treatment in peer review, but our survey indicates that such works occasionally get published, sometimes without much peer agreement. More can be done to encourage publication, however. We suggest ways to accomplish this, in particular, the use of an advocacy procedure that explicitly recognizes the need to promote this type of research.

NOTES

Department of Epidemiology and Biostatistics, McGill University School of Medicine, Montreal H3A 1A2, Canada

The following remarks are cast largely in terms of peer review of manuscripts for possible journal publication, but they also apply generally to peer review of grant proposals.

Cicchetti consistently misses the mark. The purpose of peer review is not reliability but improved decisions concerning publication and funding, and these authors simply do not discuss this critical matter.

Cicchetti fairly states the value of both redundancy and "creative" disagreement in peer review, but fails to acknowledge adequately that editors and grants managers choose (and should choose) reviewers for their different, complementary expertise. For example, a report on a randomized trial of a new drug for the control of hypertension might be sent to a cardiologist, a pharmacologist, and a statistician. They would, and should, be alert to quite different kinds of strengths and problems, and there is no reason to expect either their detailed reports or their summary judgments to agree. Too much agreement is in fact a sign that the review process is not working well, that reviewers are not properly selected for diversity, and that some are redundant. Without this negative point, measures of inter-referee agreement are of no value in assessing peer review mechanisms.

Cicchetti refers to the role of the reviewer in informing the judgment of the editor or grants manager, but does not adequately stress the point that reviewers are no more than sources of relevant information. I know of no leading journal where decisions about publication are made by a "vote" of the reviewers. As a former editor (of JNCI [Journal of the National Cancer Institute], 1974-1980), I had a section on the reviewers' form asking for a judgment about publication (publish as submitted, publish with minor revisions, etc.) and regularly found that it was of little value in sorting out the merits of a paper. There is no substitute for careful study of specific comments, integrated with the wisdom of editorial board members and, sometimes, special consultants. As a result, it was not unusual for us to publish papers that three reviewers had recommended for disapproval, and vice versa.

A further point is that editors can adjust for (or sometimes deliberately use) reviewer bias. There have been few studies of the comments of peer reviewers to date, and all have focused on what reviewers write, not on the critical issue of how they have affected the information base on which a decision was made. I knew and regularly used reviewers who could never bring themselves to criticize a colleague directly, though their detailed comments were full of insight. And I used others who could never find a paper good enough to publish; with appropriate interpretation, their comments, too, were helpful. On rare occasions, when it appeared that an editorial decision might be challenged on the basis of the position or prestige of an author rather than scientific merit, I deliberately chose reviewers from one or the other camp to ensure that a strong and balanced review would be on the record. Some other editors do the same, and our journals have been the stronger for it.

The paper by Peters and Ceci (1982) is a weak reed. Shortly after it was published, I wrote to Peters with some specific questions about their work. I made at least two telephone calls to verify his address at the time, but received no reply to my letter. Folklore to the contrary, few first-class letters are really lost by the Postal Service. I must assume that I received no reply because their answers would have undercut the strength of the conclusions in their paper. I cannot find my copy of the letter at this late date, but I recall that two points of special interest were how they "randomly" chose the papers they resubmitted (in more detail than was given in their paper), and how (also in detail) they revised titles and content to reduce the likelihood of detection of their own fraud. Most long-time editors have had the experience of publishing papers and almost immediately regretting the decision to publish, so that biased selection of winners or losers is simply not informative about practice in general.

I am concerned that Cicchetti accepts without comment the appropriateness of studies carried out without the consent of the subjects, whether journals (and editors) or reviewers. Substantial investments of time, and direct financial investments as well, have been requested under false pretenses in the name of "science." I know and understand the arguments that some research cannot be carried out if the subject is properly informed, but reject any notion that such research thereby becomes ethical.

Editors do, and should, base their editorial decisions in part on results. Many negative studies are never properly completed; others are presented in slapdash fashion. Some are trivial because few knowledgeable investigators would have expected anything other than negative results; still others have samples too small to have much chance of showing a real effect even if one should be present. Many other negative studies are indeed published in the sense of "made public," but not as full-length original contributions. Instead, their results may be disseminated as abstracts, in short sections of later papers that extend the work, or even by word of mouth. Arching over all of this is the proper concern of editors about their readers' interests. I know of no evidence that readers are harmed by editorial decisions that depend in part on results. Many fewer people, and different people, may need to know that something did not work than would need to know what did work. A good editor must be even more concerned about readers' legitimate interests than about authors' complaints, and the "need to know" is chief among these. Thus, some kinds of bias against publication of negative results in the usual full form are entirely appropriate and should be encouraged.

Cicchetti's section 7, on improving the reliability of peer review, tacitly takes improved reliability as an important goal. But the fundamental objective of peer review, and of the manuscript selection process in general, is not "fairness" to authors (though that may be a welcome byproduct). It is to improve decisions. Will larger numbers of reviewers, better training, or instructions for reviewing improve decisions? The matter has not been studied, perhaps because no one has yet devised a good measure of the quality of decisions to publish or disapprove. I know of no good statistical evidence that blinding reviewers to authors, or authors to reviewers, affects editorial decisions in generally good or bad ways. There is substantial anecdotal evidence, however, that both the strengths and the weaknesses of a paper are appraised more accurately when reviewers know who the authors are, but not vice versa.

I find no recognition here that editorial decisions can, do, and should make use of criteria other than abstract scientific/technical merit. Such criteria include originality, the suitability of the topic for a given journal, readability and the appropriateness of length and style, the need for a balance of topics in journals with broad coverage, the importance of findings to readers, and even whether there is reason to suspect unconscious bias or deliberate error in the data or the analysis.

Overall, I believe that Cicchetti's paper shows a misunderstanding of the role of peer review as an aid to editorial decisions and grants management.

The predictive validity of peer review: A neglected issue

Robert F. Bornstein

Department of Psychology, Gettysburg College, Gettysburg, PA 17325

Cicchetti's analysis of inter-reviewer reliability in manuscript and grant proposal assessments is both timely and valuable, and will help to resolve a number of unsettled issues in this area.

Cicchetti - like most researchers investigating aspects of the peer review process - focuses mainly on reliability issues in peer review. His analysis confirms that inter-reviewer reliability in manuscript and grant proposal assessments is generally quite low. An important question remains unanswered, however: What do we know about the validity of peer review? Peer review is, at least in part, an assessment tool designed to identify the best research efforts in a given sample of manuscripts (see Bornstein 1990; Eichorn & VandenBos 1985). Thus, we should be able to demonstrate empirically that peer reviews have predictive validity and that reviews can discriminate high-quality from low-quality research.
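One concrete form such a demonstration could take, sketched below, is to correlate reviewers' quality ratings of manuscripts with a later criterion of impact; a rank correlation is a natural choice because citation counts are heavily skewed. The ratings, citation counts, and five-year window in this sketch are invented for illustration and are not data from this commentary or from the studies it cites.

```python
# Sketch of a predictive-validity check: do reviewers' quality ratings
# rank-order papers the way later citations do? All data below are invented.
from scipy.stats import spearmanr

mean_review_rating = [4.5, 3.0, 2.0, 4.0, 1.5, 3.5]  # mean reviewer rating per paper
citations_5yr = [40, 12, 3, 25, 1, 9]                 # citations over five years

rho, p_value = spearmanr(mean_review_rating, citations_5yr)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A high rho would support predictive validity; the caveats below explain
# why citation counts remain a questionable criterion measure.
```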

Unfortunately, designing studies to investigate the predictive validity of peer review is considerably trickier than designing studies to assess inter-reviewer reliabilities. In particular, difficulties in selecting an appropriate criterion measure with which to assess research quality have hindered efforts to conduct empirical research on this topic. Researchers typically use journal citation frequency as a criterion measure in these studies, testing the hypothesis that, if manuscript reviews have predictive validity, then papers that receive highly positive reviews should be those that report the most important, well-designed studies. These papers should therefore be cited more frequently than papers that receive less positive reviews (Gottfredson 1978). Although citation frequencies have been used to assess journal quality and the eminence of individual researchers (Garfield 1972; Lindzey 1977), the use of citation indices as a measure of the quality of a particular piece of research is questionable for several reasons.

First, we make a number of assumptions regarding the quality of research based on the journal in which it appears. If a paper is published in a prestigious journal, we infer that it must be good and valuable research. Were the same paper to appear in a less prestigious journal, it would most likely be seen as less rigorous and important, and we would be less likely to cite it. Clearly, the well-known "halo effect" (Nisbett & Wilson 1977) influences our perceptions of psychological research.

Second, variables unrelated to research quality will influence the number of citations a paper receives. Mediocre research in an area that is tangentially related to a variety of topics will probably receive a greater number of citations than excellent research in a more obscure and narrowly defined area. Research on experimental design and methodology tends to be the most widely cited in all branches of science (see Lindsey 1978). This is not surprising, given that such papers have implications for a