
PART II. THE TREATMENT OF INCONSISTENCIES RELATED TO EXPERIMENTS IN

11. INCONSISTENCY RESOLUTION AND STATISTICAL META-ANALYSIS IN RELATION TO EXPERIMENTS IN

11.3. Case study 5, Part 2: Meta-analysis as a tool of inconsistency resolution

11.3.2. Comprehension latencies

Chronologically, familiarity was the first factor whose impact on comprehension latencies was tested experimentally. Table 14 in Appendix 1 summarises the most important characteristics of the 14 related experiments.

Study name            Time point   Correlation   Lower limit   Upper limit   Z-Value   p-Value   Relative weight
ChiappeKennedy012     2001          0.570         0.104         0.831         2.335     0.020      9.62
UtsumiKuwabara0512    2005          0.470         0.132         0.710         2.650     0.008     15.52
Utsumi071             2007          0.450         0.240         0.620         3.960     0.000     23.46
Roncero13             2013          0.470         0.305         0.608         5.126     0.000     26.60
Dulcinati14           2014          0.100        -0.120         0.310         0.892     0.373     24.80
Summary                             0.393         0.215         0.546         4.136     0.000

As Figure 18 reveals, the great majority of the confidence intervals overlap, suggesting that there should be no heterogeneity in the results.

Figure 18. Random-effects model of comprehension latencies with familiarity as a decisive factor

The experiments together produce an effect size of -0.314 very precisely, since the 95% confidence interval is as narrow as [-0.388; -0.237]. The prediction interval of [-0.494; -0.109] is quite narrow, too, indicating that a future experiment should yield a small to moderate reverse effect of the metaphorical frame on the comprehension latencies. The Q-statistic reinforces our impression that the results of the experiments are in harmony with each other, since its value of 18.733 is not significantly different from its expected value of 13 (the degrees of freedom for 14 studies), p = 0.132. The standard deviation of the true effect sizes, T = 0.087, is low. An I² value of 30.604 indicates that only about 30% of the observed variance is real, and about 70% is due to random error.

From these data a very important conclusion can be drawn. Namely, the experiments above seemed to constitute diverging evidence in the sense that 5 of the 14 studies produced insignificant results, while 9 produced significant ones. In the absence of heterogeneity, however, this conflict can be resolved: the outcome of the experiments can be interpreted as an instance of converging evidence for the summary effect size, i.e., a small to moderate effect.
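The relation between the Q-statistic and I² quoted above can be checked with a few lines of arithmetic. The sketch below assumes the standard definition I² = (Q - df) / Q, with df taken as the number of studies minus one (13 for the 14 familiarity experiments); the function name `i_squared` is illustrative and not taken from the software used in the study.

```python
def i_squared(q: float, df: int) -> float:
    """Return I² in percent: the share of observed variance exceeding
    what sampling error alone would be expected to produce."""
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Familiarity analysis: Q = 18.733 for 14 studies, so df = 14 - 1 = 13 (assumed).
familiarity_i2 = i_squared(18.733, 13)
print(round(familiarity_i2, 3))
```

Under these assumptions the result agrees with the reported I² value of 30.604, which also supports reading the expected value of Q as 13.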

According to Duval and Tweedie’s trim and fill model, three small experiments are missing from the left side. See Figure 19.

Figure 19. Funnel plot for comprehension latencies with familiarity as a decisive factor

Data plotted in Figure 18:

Study name             Time point   Correlation   Lower limit   Upper limit   Z-Value   p-Value   Relative weight
BlaskoConnine931       1993         -0.243        -0.380        -0.096        -3.209    0.001     13.56
BlaskoConnine932       1993         -0.352        -0.531        -0.143        -3.220    0.001      8.93
Arzouanetal071         2007         -0.380        -0.565        -0.159        -3.269    0.001      8.17
Arzouanetal073         2007         -0.638        -0.791        -0.410        -4.631    0.000      5.40
Laietal09              2009         -0.543        -0.689        -0.355        -5.021    0.000      8.27
Sanford10              2010         -0.284        -0.507        -0.026        -2.153    0.031      7.07
ThibodeauDurgin11      2011         -0.249        -0.454        -0.018        -2.113    0.035      8.34
CailliesDeclercq111a   2011         -0.344        -0.578        -0.058        -2.335    0.020      5.92
CailliesDeclercq111b   2011         -0.175        -0.449         0.130        -1.125    0.261      5.72
CailliesDeclercq111c   2011         -0.225        -0.489         0.077        -1.466    0.143      5.77
CailliesDeclercq111d   2011         -0.099        -0.401         0.223        -0.598    0.550      5.23
Gioraetal121           2012         -0.306        -0.516        -0.060        -2.420    0.016      7.49
Gioraetal122           2012         -0.209        -0.439         0.047        -1.603    0.109      7.35
Cardilloetal17         2017         -0.150        -0.556         0.313        -0.623    0.533      2.78
Summary                             -0.314        -0.388        -0.237        -7.576    0.000

Nonetheless, the adjusted values are similar to the observed values. Egger’s test is not significant, p = 0.934, suggesting that there is no bias. Since the power of this test is weak, a cumulative meta-analysis seems to be advisable. Figure 20 also shows that there is no sign of any publication bias.

Figure 20. Cumulative analysis for comprehension latencies with familiarity as a decisive factor

The upshot of the tests presented is that there is no evidence of publication bias.

B) Aptness

The second factor which had been regarded as relevant by some researchers is aptness. Consult Table 15 in Appendix 1 for the details.

The confidence intervals, as shown in Figure 21, overlap only slightly.

Figure 21. Random-effects model of comprehension latencies with aptness as a decisive factor

Data plotted in Figure 21:

Study name              Time point   Correlation   Lower limit   Upper limit   Z-Value   p-Value   Relative weight
BlaskoConnine933        1993         -0.267        -0.455        -0.056        -2.461    0.014     11.99
Brisandetal011          2001         -0.030        -0.206         0.148        -0.329    0.742     12.95
Brisandetal012          2001         -0.125        -0.295         0.053        -1.380    0.168     12.97
Gagné021both            2002         -0.460        -0.704        -0.120        -2.584    0.010      8.24
Gagné021metaphors       2002         -0.500        -0.729        -0.170        -2.854    0.004      8.24
ChiappeKennedyChiappe   2003         -0.550        -0.749        -0.260        -3.443    0.001      8.77
JonesEstes062           2006         -0.487        -0.605        -0.349        -6.215    0.000     13.21
UtsumiSakamoto10        2010         -0.061        -0.278         0.162        -0.535    0.593     11.82
UtsumiSakamoto112       2011         -0.038        -0.257         0.184        -0.333    0.739     11.82
Summary                              -0.269        -0.408        -0.117        -3.419    0.001

Data plotted in Figure 20 (cumulative statistics):

Study name             Time point   Point    Lower limit   Upper limit   Z-Value   p-Value
BlaskoConnine931       1993         -0.243   -0.380        -0.096        -3.209    0.001
ThibodeauDurgin11      2011         -0.245   -0.361        -0.122        -3.842    0.000
BlaskoConnine932       1993         -0.272   -0.371        -0.167        -4.932    0.000
Arzouanetal073         2007         -0.357   -0.509        -0.184        -3.910    0.000
Laietal09              2009         -0.399   -0.539        -0.238        -4.597    0.000
Gioraetal121           2012         -0.382   -0.503        -0.246        -5.227    0.000
Gioraetal122           2012         -0.359   -0.471        -0.236        -5.436    0.000
Sanford10              2010         -0.349   -0.450        -0.240        -5.962    0.000
CailliesDeclercq111a   2011         -0.347   -0.439        -0.249        -6.561    0.000
CailliesDeclercq111c   2011         -0.337   -0.423        -0.245        -6.840    0.000
CailliesDeclercq111b   2011         -0.326   -0.409        -0.238        -6.933    0.000
Cardilloetal17         2017         -0.320   -0.401        -0.235        -7.058    0.000
CailliesDeclercq111d   2011         -0.308   -0.388        -0.224        -6.888    0.000
Arzouanetal071         2007         -0.314   -0.388        -0.237        -7.576    0.000
Summary                             -0.314   -0.388        -0.237        -7.576    0.000

The summary effect size lies at -0.269 [-0.408; -0.117], indicating a small reverse effect. The prediction interval of [-0.662; 0.24] is indeterminate insofar as it allows a remarkable reverse effect but also a small positive effect. Since the confidence intervals do not overlap in each case, there should be some amount of heterogeneity. In fact, the Q-statistic (32.961) is significantly different from its expected value, df(Q) = 8. The I² value of 75.729 indicates that about 76% of the observed variance is real, and only about 24% is due to random error. The standard deviation of the true effect sizes is T = 0.201. This suggests that the experiments do not share a common true effect size and motivates a subgroup analysis. If we create three groups on the basis of the effect sizes in such a way that the first group consists of experiments with an effect size between -0.2 and -0.3, the second group of experiments with effect sizes close to -0.5, and the third group of experiments with effect sizes close to 0, then we obtain three homogeneous groups.

See Table 16 for an overview.

group                 below average effect size   average effect size       above average effect size
experiments           Brisand2001/1
summary effect size   -0.067 [-0.164; 0.032]      -0.267 [-0.455; -0.056]   -0.495 [-0.588; -0.389]
within group

Table 16. Three groups of experiments on comprehension latencies with aptness as a decisive factor
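The wide prediction interval reported for aptness above follows from combining the uncertainty of the summary effect with the spread of the true effects. The sketch below is a rough reconstruction under stated assumptions, not the procedure of the software used in the study: it works on the Fisher z scale, treats T as being expressed on that scale, recovers the standard error of the summary effect from its 95% confidence interval, and uses a critical value of Student's t with k - 2 degrees of freedom taken from a standard table. The function name `prediction_interval` is illustrative.

```python
import math


def prediction_interval(r_summary, ci_lower, ci_upper, tau, t_crit):
    """Approximate 95% prediction interval for a true correlation.

    All arithmetic is done on the Fisher z scale; the limits are
    converted back to correlations with tanh.
    """
    z_mean = math.atanh(r_summary)
    # Standard error of the summary effect, recovered from its 95% CI.
    se = (math.atanh(ci_upper) - math.atanh(ci_lower)) / (2 * 1.96)
    half_width = t_crit * math.sqrt(tau ** 2 + se ** 2)
    return math.tanh(z_mean - half_width), math.tanh(z_mean + half_width)


# Aptness: summary -0.269 [-0.408; -0.117], T = 0.201, 9 studies.
# 2.3646 is the 97.5th percentile of Student's t with 9 - 2 = 7 df (table value).
low, high = prediction_interval(-0.269, -0.408, -0.117, 0.201, 2.3646)
print(round(low, 3), round(high, 3))
```

Under these assumptions the limits land close to the reported [-0.662; 0.24], though not exactly on them, since the inputs are rounded to three decimals. The interval spans zero because T² is large relative to the squared standard error of the summary effect.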

As with the previous cases, neither the experimental design, nor the formulation of the task nor the range of metaphors seems to influence the effect sizes.

As for publication bias, for the application of Duval and Tweedie’s trim and fill model we would need at least one further experiment. The low power of Egger’s test makes its application questionable in this case, too.

C) Conventionality

Table 17 in Appendix 1 summarises the most important data pertaining to the relevant experiments. Figure 22 presents how statistical meta-analysis makes the comparison and combination of these results possible.

Figure 22. Random-effects model of comprehension latencies with conventionality as a decisive factor

Data plotted in Figure 22:

Study name            Time point   Correlation   Lower limit   Upper limit   Z-Value   p-Value   Relative weight
BowdleGentner052fig   2005         -0.375        -0.559        -0.157        -3.276    0.001     19.04
BowdleGentner055met   2005         -0.408        -0.583        -0.196        -3.619    0.000     19.13
JonesEstes062         2006         -0.038        -0.214         0.139        -0.422    0.673     22.42
UtsumiSakamoto10      2010         -0.035        -0.254         0.187        -0.309    0.757     19.70
UtsumiSakamoto112     2011         -0.064        -0.281         0.159        -0.559    0.576     19.71
Summary                            -0.184        -0.345        -0.013        -2.106    0.035

The summary effect size is -0.184 with a 95% confidence interval of [-0.345; -0.013], indicating a small reverse effect of conventionality on comprehension times. The prediction interval is much wider at [-0.653; 0.387]. All effect sizes are below the null-value, but they can easily be divided into two groups: the experiments conducted by Bowdle and Gentner produced a correlation coefficient close to -0.4, while the other three experiments show an effect size only slightly below 0. The total amount of the observed between-study variance, i.e. the Q-statistic (12.621), is significantly different from its expected value of 4, p = 0.013. The I² value of 68.307 indicates that about 68% of the observed variance is real, and only about a third is due to random error. The standard deviation of the true effect sizes is T = 0.162. In this case, a subgroup analysis with the authors as a variable seems to be a quite natural choice. See Table 18.

group                     below average            above average
experiments               JonesEstes2006/2         BowdleGentner2005/2figuratives
                          UtsumiSakamoto2010       BowdleGentner2005/2metaphors
                          UtsumiSakamoto2011/2
summary effect size       -0.045 [-0.162; 0.074]   -0.392 [-0.523; -0.243]
within group variance     0.040 (p = 0.980)        0.051 (p = 0)
between groups variance   12.530

Table 18. Two groups of experiments on comprehension latencies with conventionality as a decisive factor
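The random-effects figures reported for conventionality can be approximately reconstructed from the five correlations and their confidence limits alone. The following is a minimal sketch, assuming a DerSimonian-Laird estimator on the Fisher z scale and within-study variances recovered from the width of each 95% confidence interval; the source does not state which estimator its software used, and the names `STUDIES` and `random_effects` are illustrative.

```python
import math

# (label, correlation, lower limit, upper limit) as reported for Figure 22.
STUDIES = [
    ("BowdleGentner052fig", -0.375, -0.559, -0.157),
    ("BowdleGentner055met", -0.408, -0.583, -0.196),
    ("JonesEstes062", -0.038, -0.214, 0.139),
    ("UtsumiSakamoto10", -0.035, -0.254, 0.187),
    ("UtsumiSakamoto112", -0.064, -0.281, 0.159),
]


def random_effects(studies):
    """DerSimonian-Laird pooling of correlations on the Fisher z scale."""
    ys = [math.atanh(r) for _, r, _, _ in studies]
    # Within-study variance recovered from the width of each 95% CI (assumption).
    vs = [((math.atanh(hi) - math.atanh(lo)) / (2 * 1.96)) ** 2
          for _, _, lo, hi in studies]
    ws = [1.0 / v for v in vs]
    fixed_mean = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    # Q: weighted sum of squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(ws, ys))
    df = len(studies) - 1
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    ws_re = [1.0 / (v + tau2) for v in vs]
    z_mean = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)
    se = math.sqrt(1.0 / sum(ws_re))
    return {
        "summary": math.tanh(z_mean),
        "ci": (math.tanh(z_mean - 1.96 * se), math.tanh(z_mean + 1.96 * se)),
        "q": q,
        "tau": math.sqrt(tau2),
        "weights": [100.0 * w / sum(ws_re) for w in ws_re],
    }


result = random_effects(STUDIES)
print(result)
```

Because the inputs are rounded to three decimals, the output should only be expected to approximate the reported summary effect, Q, T and relative weights. The subgroup decomposition in Table 18 is consistent with the same logic: the two within-group values and the between-groups value add up to the total, 0.040 + 0.051 + 12.530 = 12.621.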

As regards publication bias, too few experiments are at our disposal to check for it.

11.3.3. Comprehensibility ratings
