PART II. THE TREATMENT OF INCONSISTENCIES RELATED TO EXPERIMENTS IN

11. INCONSISTENCY RESOLUTION AND STATISTICAL META-ANALYSIS IN RELATION TO EXPERIMENTS IN

11.3. Case study 5, Part 2: Meta-analysis as a tool of inconsistency resolution

11.3.3. Comprehensibility ratings

A) Familiarity

Table 19 in Appendix 1 presents the data on the relationship between comprehensibility ratings and familiarity. After thorough consideration, Thibodeau et al. (2016, 2018) were dropped from the analyses because the instructions for evaluating the familiarity of metaphors were not formulated clearly enough. The researchers' own uncertainty shows this: they labelled the same factor ‘conventionality’ in Thibodeau et al. (2016) but ‘familiarity’ in Thibodeau et al. (2018).⁸⁷

According to Figure 23, all experiments produced a positive correlation, but the effect sizes seem to fall into subgroups.

Figure 23. Random-effects model of comprehensibility ratings with familiarity as a decisive factor

87 The instructions asked participants to focus on the base/vehicle term by capitalizing it, but they had to judge its “conventionality” not in isolation but in a metaphorical context in the same sentence (and in a supporting/not supporting metaphorical/literal wider context). A comparison of the instructions and the excerpts also leaves it unclear whether this judgement should be made in relation to the base/vehicle term’s metaphorical meaning or to the target/topic term.

Figure 23 (forest plot), statistics for each study:

Study name             Time point  Subgroup   Correlation  Lower limit  Upper limit  Z-Value  p-Value  Relative weight
Marscharketal832       1983        high       0.910         0.888       0.928        26.457   0.000    8.90
Marscharketal831       1983        moderate   0.820         0.781       0.852        21.046   0.000    8.91
Katzetal88             1988        moderate   0.820         0.652       0.911         6.011   0.000    7.95
McKay04                2004        high       0.930         0.908       0.947        23.277   0.000    8.84
Laietal09              2009        high       0.877         0.821       0.916        13.204   0.000    8.68
Cardilloetal10pred     2010        low        0.300        -0.164       0.655         1.276   0.202    7.44
Cardilloetal10nom      2010        low        0.270        -0.196       0.636         1.142   0.254    7.44
Sanford                2010        low        0.182        -0.139       0.468         1.111   0.267    8.21
Bambinietal14without   2014        low        0.600         0.461       0.710         7.000   0.000    8.70
Bambinietal14context   2014        low        0.400         0.270       0.516         5.636   0.000    8.83
CampbellRaney16        2016        high       0.970         0.955       0.980        19.516   0.000    8.65
Cardilloetal17         2017        moderate   0.790         0.534       0.913         4.418   0.000    7.44
Summary effect                                0.767         0.605       0.868         6.371   0.000
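The confidence limits in the forest plot can be checked against the reported correlations and Z-values. The following is a minimal sketch, assuming that the limits were obtained via Fisher's z transformation, as is customary for correlations; the source does not state this explicitly. The function name `ci_from_r_and_z` is hypothetical, and the standard error is recovered from the reported Z-value rather than from a sample size, since sample sizes are not given in this section.

```python
import math

def ci_from_r_and_z(r, z_value, crit=1.959964):
    """Recover an approximate 95% CI for a correlation from r and its Z-value.

    Assumption: Z-value = atanh(r) / SE on the Fisher z scale.
    """
    fisher_z = math.atanh(r)      # Fisher's z transform of the correlation
    se = fisher_z / z_value       # standard error implied by the reported Z-value
    lower = math.tanh(fisher_z - crit * se)   # back-transform to the r scale
    upper = math.tanh(fisher_z + crit * se)
    return lower, upper

# Katzetal88 row: r = 0.820, Z-Value = 6.011, reported limits 0.652 and 0.911
lower, upper = ci_from_r_and_z(0.820, 6.011)
print(f"{lower:.3f} {upper:.3f}")
```

If the assumption holds, the printed limits should lie close to the reported ones; small deviations are to be expected because the inputs are rounded to three decimals.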

The summary effect size of 0.767 [0.605; 0.868] means that, according to this set of experiments, there is a very strong relationship between familiarity and comprehensibility ratings. The prediction interval is [-0.217; 0.978]; that is, a future experiment could yield anything from a small reverse effect to a large effect. The heterogeneity analysis reinforces our impression that the experiments do not share a common true effect. The total amount of observed between-study variance is very high, Q = 344.861, and this value differs significantly from its expected value, df(Q) = 11. The I² value is 96.81, from which we can conclude that practically all of the observed variance is real and cannot be ascribed to random error. The standard deviation of the true effect sizes is T = 0.485.
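How the summary effect, Q and the prediction interval relate to the study-level statistics can be illustrated with a short computation. This is only a sketch under stated assumptions, not a reconstruction of the software actually used: it assumes a DerSimonian-Laird random-effects model on the Fisher z scale, takes each study's variance from its reported correlation and Z-value in Figure 23, and uses 2.228 as the two-sided 5% critical value of Student's t with k - 2 = 10 degrees of freedom. The names `STUDIES` and `random_effects` are hypothetical.

```python
import math

# (correlation, Z-value) pairs copied from the study rows of Figure 23
STUDIES = [
    (0.910, 26.457), (0.820, 21.046), (0.820, 6.011), (0.930, 23.277),
    (0.877, 13.204), (0.300, 1.276), (0.270, 1.142), (0.182, 1.111),
    (0.600, 7.000), (0.400, 5.636), (0.970, 19.516), (0.790, 4.418),
]

def random_effects(studies, t_crit):
    """DerSimonian-Laird random-effects pooling of correlations (assumed method)."""
    zs = [math.atanh(r) for r, _ in studies]                     # Fisher z per study
    vs = [(z / zval) ** 2 for z, (_, zval) in zip(zs, studies)]  # implied variances
    ws = [1.0 / v for v in vs]                                   # fixed-effect weights
    k = len(studies)
    fixed = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - fixed) ** 2 for w, z in zip(ws, zs))        # Cochran's Q
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - (k - 1)) / c)                           # between-study variance
    ws_re = [1.0 / (v + tau2) for v in vs]                       # random-effects weights
    mean = sum(w * z for w, z in zip(ws_re, zs)) / sum(ws_re)
    se = math.sqrt(1.0 / sum(ws_re))
    half_ci = 1.959964 * se                                      # 95% confidence interval
    half_pi = t_crit * math.sqrt(tau2 + se ** 2)                 # 95% prediction interval
    return {
        "q": q,
        "summary": math.tanh(mean),
        "ci": (math.tanh(mean - half_ci), math.tanh(mean + half_ci)),
        "pi": (math.tanh(mean - half_pi), math.tanh(mean + half_pi)),
    }

result = random_effects(STUDIES, t_crit=2.228)
for name, value in result.items():
    print(name, value)
```

Because the inputs are rounded to three decimals, the output can only approximate the reported values. The sketch nevertheless shows why the prediction interval is so much wider than the confidence interval: it adds the between-study variance to the uncertainty of the mean.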

These findings clearly motivate a subgroup analysis. A possibility is shown in Table 20.

Group                       Summary effect size     Within-group variance   Experiments
below average effect size   0.392 [0.193; 0.561]    9.524 (p = 0.049)       Cardillo2010pred, Cardillo2010nom, Sandford2010, Bambini2014withcontext, Bambini2014withoutcontext
average effect size         0.814 [0.694; 0.890]    0.118 (p = 0.943)       Marschark1983/1, Katz et al.1988, Cardillo2017
above average effect size   0.929 [0.894; 0.953]    28.582 (p < 0.001)      Marschark1983/2, McKay2004, Lai2009, CampbellRaney2016

Between-groups variance: 306.637

Table 20. Three groups of experiments on comprehensibility ratings with familiarity as a decisive factor

Only the above average group indicates heterogeneity; this results from the very high precision of the estimates of the true (underlying) effect size by these experiments. The three groups are distinct from each other, as the confidence intervals and the between groups variance indicate.
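The figures in Table 20 can be cross-checked in two ways. The sketch below assumes that the within-group and between-groups variances are Q statistics that add up to the total Q, and that each within-group Q is tested against a chi-square distribution with one degree of freedom fewer than the number of experiments in the group. The helper `chi2_sf_even_df` is hypothetical and uses a closed form that is valid only for even degrees of freedom, which is why the above average group (4 experiments, 3 degrees of freedom) is left out of the p-value check.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# Values copied from Table 20
q_within = {"below": 9.524, "average": 0.118, "above": 28.582}
q_between = 306.637

# Partition of heterogeneity: total Q = sum of within-group Q + between-groups Q
q_total = sum(q_within.values()) + q_between

# Group sizes read off Table 20: 5 experiments (df = 4) and 3 experiments (df = 2)
p_below = chi2_sf_even_df(q_within["below"], df=4)
p_average = chi2_sf_even_df(q_within["average"], df=2)

print(round(q_total, 3), round(p_below, 3), round(p_average, 3))
```

Under these assumptions the total should coincide with the Q = 344.861 reported for the whole set of experiments, and the two p-values should agree with those in the table.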

Duval and Tweedie’s trim and fill model does not indicate missing experiments, nor any sign of publication bias; Egger’s test is non-significant, too (p = 0.32). Figure 24 presents the funnel plot, whose asymmetry might result from the heterogeneity we detected.

Figure 24. Funnel plot for comprehensibility ratings with familiarity as a decisive factor

B) Aptness

Table 21 in Appendix 1 includes the relevant experimental data on the basis of which the effect of aptness on comprehensibility ratings can be determined.

Figure 25 presents the results of the random-effects analysis.

Figure 25. Random-effects model of comprehensibility ratings with aptness as a decisive factor

The most striking feature of this set of experiments is that several of them estimate the effect size very precisely, although some have quite wide confidence intervals. In sum, these experiments provide a very high and very precise summary effect size of 0.789, with a 95% confidence interval as narrow as [0.719; 0.844]. The prediction interval is [0.347; 0.944], from which we can conclude that a future experiment can be expected to yield a moderate to large effect. As for the heterogeneity analysis, the total amount of observed between-study variance is very high in this case, too: Q = 244.131. This value is significantly different from its expected value, df(Q) = 17. The I² value is 93.037, signalling that the observed variance is not due to random error but is real, i.e. the experiments do not share a common true effect size. The standard deviation of the true effect sizes is T = 0.312.

These findings clearly motivate a subgroup analysis. We might try to divide up the experiments in such a way that the three experiments by McQuire et al. belong to one group, because they produced an effect size below 0.5, and the other experiments belong to the second group. This grouping is, however, not satisfactory because the second group shows a high amount of heterogeneity. A second attempt might be to classify the experiments into 3 groups. This grouping fares better, yielding three significantly different groups. See Table 22 for the details.
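The I² values reported for familiarity and aptness follow directly from Q and its degrees of freedom. As a minimal sketch, assuming the usual definition I² = (Q - df) / Q × 100, truncated at zero when Q falls below df; the function name `i_squared` is hypothetical:

```python
def i_squared(q, df):
    """Percentage of observed variance that reflects real differences in effect size."""
    if q <= 0:
        return 0.0
    # Truncate at zero: Q below its expected value (df) signals no excess variance
    return max(0.0, (q - df) / q * 100.0)

i2_familiarity = i_squared(344.861, 11)   # reported as 96.81
i2_aptness = i_squared(244.131, 17)       # reported as 93.037

print(round(i2_familiarity, 2), round(i2_aptness, 3))
```

The statistic expresses the excess of Q over its expected value as a proportion of Q, so it says how much of the observed dispersion is real, not how large that dispersion is in absolute terms.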

Figure 25 (forest plot), statistics for each study:

Study name              Subgroup   Time point  Correlation  Lower limit  Upper limit  Z-Value  p-Value  Relative weight
SternbergNigro80        low        1980        0.610         0.274       0.813         3.249   0.001    4.67
TourangeauSternberg81   low        1981        0.640         0.276       0.844         3.126   0.002    4.35
Marscharketal832        high       1983        0.870         0.840       0.895        23.090   0.000    6.59
Marscharketal831        moderate   1983        0.820         0.781       0.852        21.046   0.000    6.61
Kusumi87                moderate   1987        0.830         0.755       0.883        11.458   0.000    6.16
Katz88                  moderate   1988        0.820         0.652       0.911         6.011   0.000    5.02
Gagné021both            moderate   2002        0.810         0.635       0.906         5.856   0.000    5.02
Gagné021met             moderate   2002        0.770         0.567       0.885         5.302   0.000    5.02
ChiappeKennedyChiappe   high       2003        0.940         0.882       0.970         9.677   0.000    5.19
McKay04                 low        2004        0.590         0.491       0.674         9.511   0.000    6.48
JonesEstes062           low        2006        0.684         0.592       0.759        10.474   0.000    6.41
Utsumi072met            moderate   2007        0.816         0.748       0.866        12.827   0.000    6.32
Utsumi072sim            moderate   2007        0.740         0.646       0.812        10.226   0.000    6.28
Thibodeauetal16         high       2016        0.883         0.870       0.895        47.923   0.000    6.74
CampbellRaney16         high       2016        0.970         0.955       0.980        19.516   0.000    6.12
McQuireetal171elderly   low        2017        0.440        -0.003       0.739         1.947   0.052    4.35
McQuireetal171young     low        2017        0.427        -0.019       0.731         1.881   0.060    4.35
McQuireetal171litex     low        2017        0.407        -0.043       0.720         1.781   0.075    4.35
Summary effect                                 0.789         0.719       0.844        12.709   0.000

Group                       Summary effect size     Within-group variance   Experiments
below average effect size   0.580 [0.449; 0.687]    6.132 (p = 0.409)       SternbergNigro1980, TourangeauSternberg1981, McKay2004, JonesEstes2006/2, McQuire2016/1young, McQuire2016/1litexp, McQuire2016/1elderly
average effect size         0.803 [0.742; 0.852]    4.568 (p = 0.600)       Marschark1983/1, Kusumi1987, Katz1988, Gagné2002/1both, Gagné2002/1met, Utsumi2007/2met, Utsumi2007/2sim
above average effect size   0.920 [0.886; 0.944]    45.832 (p < 0.001)      Marschark1983/2, ChiappeKennedy&Chiappe2003, CampbellRaney2016, Thibodeau2016

Between-groups variance: 56.533 (p < 0.001)

Table 22. Three groups of experiments on comprehensibility ratings with aptness as a decisive factor

Finally, we can look for publication bias. Duval and Tweedie’s trim and fill model indicates 5 missing studies. See Figure 26.

Figure 26. Funnel plot for comprehensibility ratings with aptness as a decisive factor

As in all previous cases, Egger’s test is not significant, p = 0.09. A cumulative analysis does not provide support for our suspicion that there is publication bias, either, because there is no clear tendency among the cumulative effect sizes. See Figure 27.

Figure 27. Cumulative meta-analysis for comprehensibility ratings with aptness as a decisive factor

The conflicting results of these tests might stem from the high heterogeneity, which makes it harder to judge whether publication bias is present.

C) Conventionality

Table 23 in Appendix 1 presents data from experiments investigating the role of conventionality in comprehensibility.

As with familiarity in Section 11.3.3A, Thibodeau et al. (2016, 2018) were excluded from the analyses. As Figure 28 shows, the results of the experiments are in harmony.

Figure 28. Random-effects model of comprehensibility ratings with conventionality as a decisive factor

The summary effect size of 0.36, with a 95% confidence interval of [0.281; 0.435], indicates a moderately strong relationship between base/vehicle conventionality and comprehensibility ratings; the prediction interval is [0.179; 0.517]. The total amount of observed between-study variance is very low in this case (Q = 1.905, p = 0.592, I² = 0), which means that it is completely due to random error.
