
A Cross-Disciplinary Investigation. Behavioral and Brain Sciences, 14 (1991) 119-186

3. Empirical issues: Methodology and data-analytic strategies

4.5. Reliability of manuscript reviews: Physical sciences

As far as we are aware, no formal studies of the reliability of peer review have been undertaken for manuscript or abstract submissions to journals in the physical sciences, yet there is a prevailing belief that levels of interreferee agreement are substantially higher for journals in the physical sciences than in other areas studied.

Table 2. Levels of reviewer agreement in the evaluation of the scientific merit of submitted manuscripts

A. Behavioral Science

Journal                                          No. of Reviews   R_I or Kappa Value   Source(s)
"Social Problems" (1958-61)                      193              .40 (b)              Smigel & Ross (1970)
"Journal of Personality and Social Psychology"   286              .26 (a)              Scott (1974)
"Sociometry"                                     140              .21 (a)              Hendrick (1976)
"Personality and Social Psychology Bulletin"     177              .21 (a)              Hendrick (1977)
"Journal of Abnormal Psychology" (1973-8)        1319             .19 (a)              Cicchetti & Eron (1979; and unpublished)
"American Psychologist" (1977-8)                 87               .54 (a)              Cicchetti (1980); Scarr & Weber (1978)
"American Psychologist" (1978-9)                 72               .38 (a)              Cicchetti (unpublished)
"Journal of Educational Psychology" (1978-80)    325              .34 (a)              Marsh & Ball (1981)
"Developmental Review"                           72               .44 (a)              Whitehurst (1983; 1984)
"American Sociological Review"                   22               .29 (a)              Hargens & Herting (1990)
"Law & Society Review"                           251              .23 (a)              Hargens & Herting (1990)

B. Medicine

Journal                                          No. of Reviews   R_I or Kappa Value   Source(s)
2 Untitled Biomedical Journals                   1572             .34 (b)              Orr & Kassab (1965)
"New England Journal of Medicine"                496              .26 (b)              Ingelfinger (1974)
A Major Medical Subspecialty Journal             866              .37 (a)              Cicchetti & Conn (1978)
"British Medical Journal"                        707              .31 (b)              Lock (1985)
"Physiological Zoology"                          209              .31 (a)              Hargens & Herting (1990)

Note: (a) Intraclass R_I values; (b) kappa values. By the criteria of Cicchetti & Sparrow (1981) and Fleiss (1981), kappa or R_I values < .40 = POOR; .40-.59 = FAIR; .60-.74 = GOOD; and .75-1.00 = EXCELLENT. Note that levels of observed agreement (where available) ranged between 68.30% and 77.00%, and that the levels of chance-corrected agreement were all significant at or beyond the .05 level. Note also that the R_I value of .54 for reviews of manuscripts submitted to the "American Psychologist" dropped to .38 on replication.
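To make these benchmarks concrete, the sketch below (our illustration; the two-referee accept/reject design and all counts are hypothetical, not taken from any study in Table 2) computes Cohen's kappa from a 2 x 2 table of paired referee judgments and grades it against the Cicchetti & Sparrow (1981)/Fleiss (1981) criteria:

```python
# Sketch: Cohen's kappa for two referees independently rating the same
# manuscripts accept/reject. All counts are hypothetical.

def cohens_kappa(n11, n10, n01, n00):
    """Kappa from a 2x2 table: n11 = both accept, n00 = both reject,
    n10/n01 = the two disagreement cells."""
    n = n11 + n10 + n01 + n00
    p_obs = (n11 + n00) / n                   # observed agreement
    p1 = (n11 + n10) / n                      # referee 1 acceptance rate
    p2 = (n11 + n01) / n                      # referee 2 acceptance rate
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    return (p_obs - p_chance) / (1 - p_chance)

def benchmark(k):
    """Cicchetti & Sparrow (1981) / Fleiss (1981) criteria."""
    if k < .40:
        return "POOR"
    if k < .60:
        return "FAIR"
    if k < .75:
        return "GOOD"
    return "EXCELLENT"

k = cohens_kappa(n11=30, n10=15, n01=17, n00=38)  # hypothetical counts
print(round(k, 2), benchmark(k))                  # -> 0.36 POOR
```

Note how the hypothetical table yields 68% observed agreement but a kappa of only .36: this is exactly the pattern in Table 2, where observed agreement of 68-77% corresponds to chance-corrected values in the POOR range.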

This conclusion seems to be based on a statement made some years ago about one of the most prestigious journals in the physical sciences:

We have found, for example, that in a sample of 172 papers evaluated by two referees for the Physical Review (in the period 1948-56), agreement was very high. In only five cases did the referees fully disagree, with one recommending acceptance and the other, rejection. For the rest, the recommended decision was the same, with two-thirds of these involving minor differences in the character of proposed revisions (Zuckerman & Merton 1971, p. 67).

Unfortunately, this brief analysis provides no answers to some very basic questions: (1) What type of rating system was used by the referees? (2) Given the high acceptance rates of Physical Review, how much agreement between reviewers would one expect on the basis of chance alone? (3) What is meant by "minor differences in the character of proposed revisions"? and (4) How representative a subset is this sample of all the manuscripts submitted at that time?
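Question (2) is not idle: if two referees judge independently and each accepts a proportion p of papers, the agreement expected by chance alone is p^2 + (1 - p)^2, which rises sharply as the acceptance rate departs from 50%. A minimal sketch (our illustration, using round acceptance rates in the neighborhood of Physical Review's):

```python
# Chance agreement between two independent referees who each accept a
# proportion p of papers: both accept (p^2) or both reject ((1-p)^2).
for p in (0.50, 0.75, 0.80, 0.90):
    chance = p**2 + (1 - p)**2
    print(f"acceptance rate {p:.0%}: chance agreement {chance:.0%}")
# acceptance rate 50%: chance agreement 50%
# acceptance rate 75%: chance agreement 62%
# acceptance rate 80%: chance agreement 68%
# acceptance rate 90%: chance agreement 82%
```

At a 90% acceptance rate, then, two referees would "fully agree" on roughly four papers in five with no shared judgment whatsoever, which is why raw agreement figures for high-acceptance journals are uninterpretable without a chance correction.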

The question of representativeness seems the most important. Commenting recently on this issue, Hargens (1988) and Hargens and Herting (1990b, p. 17) note the following:

One reason that studies of referee reliability are relatively rare for physical-science journals is that such journals often use the single initial referee system. Thus, data on pairs of referee assessments of all submissions are unavailable for these journals. Those manuscripts that do receive at least two independent referee evaluations under this system are an unrepresentative subset of all manuscripts. Thus, nonexperimental data on referee agreement for these journals, such as the evidence reported by Zuckerman and Merton, should be reviewed with caution.

Hargens is right in his conclusions, especially with respect to the structure of the journal Physical Review during the early study period (1948-56) from which the Zuckerman & Merton (1971) data were derived. From that time until 1969, the Physical Review did not allocate separate sections to physics specialty areas or subfields.

Beginning in 1970, however, and continuing to the present, the Physical Review allocated its total pages to four distinct subfields: general physics, condensed matter, nuclear physics, and particles and fields. Data deriving from the Physical Review and Physical Review Letters Annual Report 1986 indicate that although the overall acceptance rate of Physical Review for 1986 (75%) remained consistent with previous years (an average of 77% between 1969 and 1986), the percentage of manuscripts accepted in the four subfields varied rather widely. These data indicate that the acceptance rates were 81% for nuclear physics and 78% for condensed matter, but only 70% and 69% for general physics and particles and fields, respectively. The nonparametric Jonckheere (1970) test of trend (Leach 1979) showed a highly significant trend, producing a Z value of 21.41 (p < .00001). This is interesting in its own right because it is consistent with the known higher manuscript-rejection rates for more general disciplines compared to more specific ones, the latter being thought of as "more experimentally and observationally oriented, with an emphasis on rigour of observation and analysis" (Zuckerman & Merton 1971, p. 77).
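Jonckheere's test is not available in the standard Python scientific libraries, so the sketch below substitutes a Cochran-Armitage trend test on the same acceptance counts (from Table 3, below); the ordering of subfields from most general to most specific is our assumption, and since this is a different statistic from the one used in the text, it will not reproduce the reported Z of 21.41:

```python
# Sketch (our substitution, not the authors' computation): Cochran-Armitage
# trend test for acceptance rates across subfields ordered from most
# general to most specific. Data are the 1986 counts from Table 3.
from math import sqrt

# (subfield, manuscripts received, manuscripts accepted)
groups = [("particles & fields", 1118, 775),
          ("general physics",    1325, 931),
          ("condensed matter",   2281, 1786),
          ("nuclear physics",     540, 440)]
scores = [1, 2, 3, 4]  # ordinal scores: most general -> most specific

N = sum(n for _, n, _ in groups)
p_bar = sum(a for _, _, a in groups) / N   # overall acceptance rate (~0.75)

# Trend statistic: score-weighted departures of accepted counts from
# their expectation under a common acceptance rate.
t = sum(s * (a - n * p_bar) for s, (_, n, a) in zip(scores, groups))
var = p_bar * (1 - p_bar) * (
    sum(s * s * n for s, (_, n, _) in zip(scores, groups))
    - sum(s * n for s, (_, n, _) in zip(scores, groups)) ** 2 / N
)
z = t / sqrt(var)
print(f"trend Z = {z:.2f}")  # ~7.3: acceptance rises with specificity
```

Even under this substitute statistic, the trend is overwhelmingly significant, which is the substantive point: acceptance rates climb monotonically as the subfield becomes more specific.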

What further implications do such data have? It seemed plausible that even within the Physical Review journal, as the subfields become more and more general, there should be progressively less dependence on the deliberations of a single reviewer for any given manuscript. The pattern of acceptance rates across the four subfields does indeed parallel the use of one or more reviewers (see Table 3), and this trend is also statistically significant (Jonckheere 1970, Z = 6.87, p < .00001).

Since manuscripts requiring more than one reviewer tend to be those that are problematic, these data indicate that even within the same physics journal the single initial referee system is not uniformly applied but, rather, varies as a function of the subfield, with more general subfields having higher rejection rates and also requiring more reviewers before manuscripts are finally accepted for publication. We would predict that if the editors of Physical Review were willing to undertake a reviewer reliability study of manuscripts submitted in the four subfields, one would find appreciably higher levels of agreement for nuclear physics and condensed matter than for particles and fields and general physics. These recent findings are also of great theoretical importance, since they allow one's reasoning to come "full circle" to the conclusion that Merton's normative model is not even wholly appropriate for the physical sciences. Another way of putting this is that physics itself appears to share many of the same peer-review problems found in other disciplines. Data from Physical Review Letters also suggest that some of the problems about the applicability of Merton's (1973) normative model may not be unique to medical and behavioral science. According to the editors of Physical Review Letters: "The referees, representative of the readers, are severe judges of the papers. Only about 45% of the 2,300 papers submitted each year are accepted for publication" (Adair & Trigg 1979, p. 475).

The editors continue in their Statement of Policy for the journal:

For the majority of the papers the comments of the two referees are sufficiently equivocal so that the editor cannot decide, with confidence, on the disposition of the paper. Further information is sought from the authors, from further communication with the original referees, from other referees, and/or from the Divisional Associate Editors. The editors initiate an average of five written communications per paper to referees, authors, and Associate Editors to gather the information which allows them to come to a conclusion concerning the disposition of the paper. Even then, for most papers, accepted or rejected, the evidence is not completely conclusive and the editors must judge as best they can the inconclusive evidence which bears on the subjective acceptance criteria (Adair & Trigg 1979, p. 476).

Table 3. The parallel relationship between acceptance rates for manuscripts submitted to "Physical Review" and the use of one or more reviewers

A. 1986 Data (N = 5264 Total Manuscripts [MS])

Subfield                 No. MS Received   No. MS Accepted   % Accepted
C. nuclear physics       540               440               81%
B. condensed matter      2281              1786              78%
A. general physics       1325              931               70%
D. particles & fields    1118              775               69%
Across all Subfields     5264              3932              75%

Consistent with Adair's assessment, Lazarus (1982, p. 219) notes that with respect to levels of interreviewer agreement for manuscripts submitted to the Physical Review Letters, "in only 10-15% of cases do two referees agree on acceptance or rejection the first time around - and this with the authors' and institutional identities known!"

Adair (1982) has expressed optimism that this situation will improve. Formal studies of the reliability of peer review for manuscripts submitted to physical science journals, especially in the more general areas, must be conducted, however, so that our conclusions can be based on more quantitative results than have been available thus far. Since the Physical Review Letters has been considered one of the two most prestigious publications in the field (Beyer 1978; Lodahl 1970), and, similar to the general journals in behavioral science and medicine, it does use the two-initial-referee system, a more quantitative assessment of peer-review practices should be of more than passing interest to an important segment of the scientific community. If such a study were undertaken, we would predict that levels of referee consensus for Physical Review Letters would be of the same relatively low order of magnitude (typically below R_I of .40) characterizing general journals in many other disciplines.

The 1985-86 rejection rates of Physical Review Letters (consistent with the ordering of those for the Physical Review) are highest for the general subfields of general physics (74%, or 631 manuscripts [MS] rejected/854 MS received) and cross-disciplinary physics (68%, or 71/106); the rejection rate was lowest for the much more specific subfield, atoms and molecules (52%, or 243/470).

Moreover, these data are consistent with journal rejection rates in psychology (Summary Report of Journal Operations 1988), in which general focus journals have the highest rejection rates, for example, the Journal of Applied Psychology (93%), Psychological Review (89%), and the Journal of Experimental Psychology (JEP): General (81%). At the same time, the more specific focus journals have the lowest rejection rates, for example, JEP: Learning, Memory, and Cognition (58%), the Journal of Comparative Psychology (39%), and Behavioral Neuroscience (also 39%). These data are also consistent with those reported by Lock (1985) for medical journals. Similarly, Hargens (1988, p. 139) notes that "cultural anthropology journals have higher rejection rates than physical anthropology journals, and rates for journals in social, abnormal, and educational psychology exceed those in experimental, comparative, and physiological psychology."

During the early 1980s, the general focus (cultural) journal, the American Anthropologist, had a rejection rate of 85%, while the American Journal of Physical Anthropology evidenced a sharply contrasting rejection rate of only 22% (Hargens 1988, p. 150).

Our work and that of Hargens and Herting (1990b) support the argument that while manuscripts submitted to the journals studied in the behavioral and medical areas seem routinely to receive at least two independent reviews, this option is used in physics and related fields only when a manuscript seems problematic. In contrast to the experience of Physical Review and other physics journals (e.g., Abt 1988), fewer than 1 in 4 manuscripts (22% of 2274 manuscripts) submitted to the general Journal of Abnormal Psychology in 1973 received reviews based on the deliberations of a single referee. Moreover, the overwhelming majority of them (52/59 or 88%) were rejected.

Since the only comprehensive study of peer review of grant proposals was undertaken by Cole et al. (1981), this area is completely open to further research. Roy (1985) reminds us that there are five systems of grant review which are so different that criticisms aimed at one of them are not applicable to the others. For example, although all five systems use mail reviewers, they differ in terms of: (a) who selects the reviewers (i.e., program managers or peers unknown to the program managers); (b) the specific method of grant evaluation (average of referees' ratings, or the decision of an independent panel of peers); and (c) whether or not peer reviews are followed by a panel site visit. One interesting research question accordingly concerns how such differences might influence both the reliability and validity of grant reviews.

4.6. Reliability of grant reviews. The major source of data