
PART II. THE TREATMENT OF INCONSISTENCIES RELATED TO EXPERIMENTS IN

III. THE EVALUATION OF THEORIES WITH RESPECT TO EXPERIMENTAL RESULTS IN

15. SUMMARY EFFECT SIZES AS EVIDENCE

15.2. Deciding between theories on the basis of summary effect sizes

As we have seen in Sections 11 and 15.1, statistical meta-analysis yields more fine-grained results than the customary practice of hypothesis testing. While the latter provides a dichotomy of significant vs. non-significant results, the former produces effect sizes. Therefore, in order to confront and compare predictions drawn from rival theories with the results of statistical meta-analyses, more refined predictions are needed, too. The application of the concepts of ‘weak/relative/strong evidence’ has to be re-thought as well. More precisely, only the concept of weak evidence has to be adjusted; relative and strong evidence can be applied as laid down in (ER) and (ES).

Let’s start with the concept of ‘weak evidence’ as defined in (EW) in Section 13.2. Suppose that the meta-analysis we conducted is a reliable data source. This yields a plausible inference with the following structure:

(EWM) 0 < |If theory T correctly describes metaphor processing, then the effect of factor F on the processing times of metaphors should be x.|R < 1

0 < |The effect of factor F on the processing times of metaphors is y.|meta-analysis < 1

0 < |x and y are in harmony.|R < 1

------------------------------------------------------------

0 < |Theory T correctly describes metaphor processing.|I < 1

The first premise of (EWM) presents a prediction drawn from theory T by researcher R. As we have seen in Section 15.1, this step involves many uncertainties; therefore, this statement is not true with certainty but only plausible. Similarly, as our analyses in Section 11.3 exemplify, the second premise presenting a datum, namely, the summary effect size of a statistical meta-analysis, can only be a plausible statement, too, since the synthesis of even a huge number of experiments is not capable of balancing out all possible systematic errors which may burden the single experiments. As the third premise captures, the conclusion can be regarded as plausible on the basis of this inference as an indirect source if the prediction and the result of the meta-analysis are in harmony. Here y is a real number between -1 and 1. Predictions, however, are in most cases considerably less informative than the summary effect size and often stipulate solely the presence of an effect without characterising its size. This yields a 3-point scale of ‘there is an effect – there is no effect – there is a reverse effect’. In other cases, x is a category from the 7-point scale ‘reverse large – reverse moderate – reverse small – no effect – small – moderate – large effect’. Thus, the critical point of the evaluation of such inferences is the comparison of a rather rudimentary prediction with a precise summary effect size. Accordingly, the following rule of thumb presents itself as a first possible explication of the “harmony” required by the third premise of (EWM):
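The mapping from a numerical summary effect size onto the categorical scale above can be sketched as follows. The cut-off values 0.1/0.3/0.5 are our assumption, borrowed from Cohen’s conventions for r-type effect sizes; the text itself does not fix numerical thresholds.

```python
# Sketch: classifying an r-type summary effect size (a real number in [-1, 1])
# into the 7-point scale used in the text. The cut-offs are an illustrative
# assumption (Cohen's conventions), not a choice made in the text.

def classify_effect(r: float) -> str:
    """Return the 7-point-scale category for an r-type effect size."""
    magnitude = abs(r)
    if magnitude < 0.1:
        return "no effect"
    if magnitude < 0.3:
        label = "small"
    elif magnitude < 0.5:
        label = "moderate"
    else:
        label = "large"
    return f"reverse {label}" if r < 0 else label

print(classify_effect(0.42))   # moderate
print(classify_effect(-0.07))  # no effect
print(classify_effect(-0.6))   # reverse large
```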

(RTR) In cases in which the prediction P drawn from a theory T cannot be refined and it stipulates only the presence and direction of the effect, a statistical meta-analysis provides weak evidence for P if the summary effect size indicates the presence and the direction of the effect (irrespective of its magnitude) as P did.

In all other cases, the summary effect size is a datum which has to be interpreted as weak evidence against P.
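The decision procedure (RTR) describes can be sketched in a few lines. The width of the zero band (here 0.1) is our assumption; (RTR) itself only speaks of presence and direction.

```python
# Sketch of (RTR): the prediction names only a direction ("effect",
# "no effect", "reverse effect"); the summary effect size counts as weak
# evidence for it iff the direction matches, regardless of magnitude.
# The zero-band width 0.1 is an illustrative assumption.

def sign_category(r: float, zero_band: float = 0.1) -> str:
    if abs(r) < zero_band:
        return "no effect"
    return "effect" if r > 0 else "reverse effect"

def rtr_verdict(prediction: str, summary_r: float) -> str:
    match = sign_category(summary_r) == prediction
    return "weak evidence for" if match else "weak evidence against"

print(rtr_verdict("effect", 0.2))   # weak evidence for
print(rtr_verdict("effect", 0.9))   # weak evidence for (magnitude ignored)
print(rtr_verdict("effect", -0.4))  # weak evidence against
```

Note that, exactly as the text goes on to criticise, 0.2 and 0.9 receive the same verdict here.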

(RTR) yields only a quite rudimentary evaluation. For example, summary effect sizes of 0.2 and 0.9 may both fulfil the requirements and count as weak evidence for the prediction that there should be an effect. In the case of the experiments in Case Study 5, we obtain that the statistical meta-analyses we conducted provide weak evidence for the predictions of both theories if we apply (RTR) and compare the predictions and the summary effect sizes separately.

This interpretation of the results is, however, not satisfactory, since it does not take into consideration the relative strength of the effects, that is, whether the predictions the two rival theories yield also stipulate which factor should be stronger.

If we are in possession of predictions which specify an effect size at least on the scale ‘reverse large – reverse moderate – reverse small – no effect – small – moderate – large effect’, then the following rule of thumb might enable us to apply the definition (EW) to summary effect size data:

(RTS) A statistical meta-analysis provides weak evidence for the predictions of theory T if the predictions and the related summary effect sizes indicate similar effect sizes, that is, both show a small/moderate/large (reverse) effect or no effect.

In all other cases, the summary effect size is a datum which has to be interpreted as weak evidence against the predictions of the theory at issue.
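(RTS), in effect, classifies the summary effect size onto the 7-point scale and demands an exact category match. A sketch, again with the cut-offs 0.1/0.3/0.5 as our illustrative assumption:

```python
# Sketch of (RTS): the prediction names a category on the 7-point scale;
# the summary effect size is weak evidence for the theory iff it falls
# into exactly the same category. Cut-offs are an illustrative assumption.

SCALE = ["reverse large", "reverse moderate", "reverse small", "no effect",
         "small", "moderate", "large"]

def category(r: float, cuts=(0.1, 0.3, 0.5)) -> str:
    """Map an r-type effect size onto the 7-point scale."""
    idx = sum(abs(r) >= c for c in cuts)          # 0..3 levels of magnitude
    return SCALE[3 + idx] if r >= 0 else SCALE[3 - idx]

def rts_verdict(predicted: str, summary_r: float) -> str:
    return ("weak evidence for" if category(summary_r) == predicted
            else "weak evidence against")

print(rts_verdict("moderate", 0.42))  # weak evidence for
print(rts_verdict("large", 0.42))     # weak evidence against
```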

With the help of (RTS), the results presented in the previous section can be evaluated as follows. The statistical meta-analyses carried out yield, in the case of comprehensibility ratings, strong evidence for the predictions from IPAM, and via this, for the model of metaphor processing delineated in Table 26; in contrast, with grammatical form preference and comprehensibility latencies the summary effect sizes provide weak evidence against both rival theories, since neither of them was capable of predicting the effect of all three factors correctly. This evaluation of the results is, however, strongly counter-intuitive, since IPAM fared, as Table 31 shows, considerably better with grammatical form preference ratings than did CMH.

Indeed, a less strict version of (RTS) is also possible, which formulates looser stipulations for weak evidence and establishes a “neutral zone” between evidence and counter-evidence:

(RTL) A statistical meta-analysis provides weak evidence for the predictions of a theory if the predictions and the related summary effect sizes indicate similar effect sizes, that is, if both show a (reverse) small/moderate/large effect or no effect. In contrast, a statistical meta-analysis provides weak evidence against the predictions of a theory if the difference between the predictions and the summary effect sizes is at least two levels on the ‘reverse large – reverse moderate – reverse small – no effect – small – moderate – large effect’ scale. If the difference is only one level, then the prediction has neutral plausibility on the basis of the summary effect size as evidence, i.e., the datum at issue is indecisive.
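The three-way verdict of (RTL) depends only on the distance, measured in scale levels, between the predicted category and the category of the summary effect size. A minimal sketch:

```python
# Sketch of (RTL): distance 0 on the 7-point scale -> weak evidence for,
# distance 1 -> indecisive (neutral zone), distance >= 2 -> weak evidence
# against. Both arguments are category names from the scale.

SCALE = ["reverse large", "reverse moderate", "reverse small", "no effect",
         "small", "moderate", "large"]

def rtl_verdict(predicted: str, observed: str) -> str:
    dist = abs(SCALE.index(predicted) - SCALE.index(observed))
    if dist == 0:
        return "weak evidence for"
    if dist == 1:
        return "indecisive"
    return "weak evidence against"

print(rtl_verdict("moderate", "moderate"))   # weak evidence for
print(rtl_verdict("small", "moderate"))      # indecisive
print(rtl_verdict("no effect", "moderate"))  # weak evidence against
```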

These looser guidelines yield that the summary effect size data in the case of the comprehension latencies and comprehensibility ratings provide weak evidence for both theories, since the distance between their predictions and the summary effect sizes never transgresses the 1-level mark. Since the predictions of IPAM are more precise (that is, they often correctly predicted the size of the effect), the summary effect size data provide relative evidence for the predictions of Glucksberg’s theory. As for grammatical form preferences, the summary effect sizes provide strong evidence for the predictions of IPAM, and via them, for the model presented in Table 26, since they were in two cases correct and in one case there was only a 1-level difference; in sharp contrast, CMH was in a 2-level error with two factors and a 1-level error in one case.

There is indeed a fourth possibility: we may focus on the relationship of the factors of ‘conventionality’ and ‘aptness’ and interpret the predictions of the rivals not in separation but in such a way that the key predictions have to be fulfilled, that is, either conventionality or aptness has to be clearly stronger, while familiarity has to be similar to aptness or slightly weaker or stronger. At a more general level this yields the following rule of thumb for complex (multi-factor) predictions:

(RTC) A statistical meta-analysis provides weak evidence for the predictions of a theory if the predictions estimate the relative strength of the relevant factors correctly.

If we build our evaluation on (RTC), then we obtain that grammatical form preference and comprehensibility ratings data provide strong evidence for IPAM, while comprehension latencies provide weak evidence against both rivals, since according to them, there should be a big difference between conventionality and aptness, while there was no substantial difference.
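The relative-strength rule above reduces to comparing two orderings: the ordering of the factors by predicted strength and their ordering by summary effect size. The factor names and numbers below are illustrative, not the data of the case study.

```python
# Sketch of the relative-strength rule: a complex prediction counts as
# correct iff it orders the factors' effects as the summary effect sizes do.
# The factor names and effect sizes below are hypothetical placeholders.

def strength_ranking(effects: dict) -> list:
    """Factors sorted from strongest to weakest absolute effect."""
    return sorted(effects, key=lambda f: abs(effects[f]), reverse=True)

predicted = ["aptness", "familiarity", "conventionality"]   # hypothetical ordering
observed = {"aptness": 0.45, "familiarity": 0.30, "conventionality": 0.10}

verdict = ("weak evidence for" if strength_ranking(observed) == predicted
           else "weak evidence against")
print(verdict)  # weak evidence for
```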

Of course, it is also possible to elaborate quantitative methods for the comparison of predictions and summary effect sizes. This is, however, a quite complicated task.
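As a hint at what such a quantitative method might look like, one could score agreement continuously, e.g. as one minus the normalized distance between the summary effect size and a midpoint assigned to the predicted category. The midpoints below are our illustrative assumptions, not values proposed in the text.

```python
# One possible quantitative agreement score between a categorical prediction
# and a summary effect size on [-1, 1]. The category midpoints are an
# illustrative assumption.

MIDPOINTS = {"reverse large": -0.7, "reverse moderate": -0.4,
             "reverse small": -0.2, "no effect": 0.0,
             "small": 0.2, "moderate": 0.4, "large": 0.7}

def agreement(predicted: str, summary_r: float) -> float:
    """1.0 = perfect agreement; 0.0 = maximal disagreement on [-1, 1]."""
    return 1.0 - abs(MIDPOINTS[predicted] - summary_r) / 2.0

print(round(agreement("moderate", 0.42), 2))  # 0.99
print(round(agreement("large", -0.7), 2))     # 0.3
```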

To sum up, the best way to interpret the upshot of our analyses is that the statistical meta-analyses we conducted make the IPAM predictions (and via this, the related hypotheses of the theory) moderately plausible, while they render the CMH predictions and related hypotheses slightly implausible. This means that a decision between the two theories based on our results is unequivocal but cannot be final and remains fallible. Nonetheless, our results suggest more clearly and strongly than any single experiment could that IPAM should be preferred over CMH – at least, on the basis of the totality of experiments conducted so far. New experiments making use of revised experimental designs may, however, override this decision in the future.
