

11.3. Case study 5, Part 2: Meta-analysis as a tool of inconsistency resolution

11.3.4. Comprehensive analyses

In the interim summaries, we compared the effect of the factors of conventionality, familiarity and aptness on the performance of participants in three distinct rounds, that is, by applying different task types. This perspective can be widened in three directions. First, we can try to generalise these results by co-analysing the outcome of the three rounds conducted in Sections 11.3.1-3 and asking whether a general pattern emerges in the relationship of the three factors with each other.

[Forest plot (meta-analysis software output): cumulative correlations with 95% CIs for the studies, grouped as high/moderate/low; the cumulative estimate converges to 0.789 (95% CI 0.719-0.844).]

[Forest plot (meta-analysis software output): correlations with 95% CIs and relative weights for McKay04, Utsumi072met, Utsumi072sim and Gokcesu0923; summary correlation 0.360 (95% CI 0.281-0.435).]

                                   conv.      fam.     aptness   average effect size   average I²   average T
grammatical form preference         0.273      0.393     0.551         0.406             77.338        0.228
comprehension latencies            -0.184     -0.314    -0.269        -0.256             58.213        0.150
comprehensibility ratings           0.360      0.767     0.789         0.639             63.282        0.266
average of the absolute values
  of the effect sizes               0.272      0.491     0.536
average I²                         51.986     62.361    84.486
average T                           0.142      0.246     0.256

Table 24. Comparison of the effect sizes, I² and T values related to aptness/conventionality/familiarity in the three groups of experiments

As a comparison of the columns of Table 24 indicates, the results of the three types of experiments are in harmony and thus provide converging evidence for the hypothesis that all three factors influence metaphor processing, with conventionality having a weaker effect than aptness and familiarity. This evidence is considerably stronger than any evidence gained from individual experiments. As the last two rows show, conventionality produced the most consistent results, because the amount and impact of the dispersion of the true effects are the smallest for this factor. In contrast, aptness showed a high proportion and a large amount of real variance.
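For reference, the I² and T values in Table 24 can be read as the standard heterogeneity statistics of random-effects meta-analysis. The following is a sketch of the usual Higgins–Thompson and DerSimonian–Laird definitions, which we assume underlie the reported values:

\[
Q = \sum_{i=1}^{k} w_i (y_i - \bar{y})^2, \qquad
I^2 = \max\!\left(0,\ \frac{Q-(k-1)}{Q}\right) \times 100\%,
\]
\[
T = \hat{\tau} = \sqrt{\max\!\left(0,\ \frac{Q-(k-1)}{C}\right)}, \qquad
C = \sum_i w_i - \frac{\sum_i w_i^{2}}{\sum_i w_i},
\]

where \(y_i\) is the observed effect size of the i-th experiment, \(w_i = 1/v_i\) its inverse-variance weight, \(\bar{y}\) the fixed-effect weighted mean, and \(k\) the number of experiments. T thus estimates the standard deviation of the true effects, while I² expresses the proportion of the observed variance that reflects real differences among them.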

The second possible line of analysis is to investigate and compare the data related to the three tasks, i.e., a horizontal analysis of Table 24. Experiments dealing with comprehension latencies clearly produced the smallest summary effect sizes in absolute value (|-0.256|, as against 0.406 and 0.639) and the most consistent results (the smallest amount of real variance, average I² = 58.213, and the smallest standard deviation of the true effects, average T = 0.150).

A third promising route could be a deeper and more comprehensive analysis of Tables 9, 11, 13, 16, 18, 20, and 22, that is, separate comparisons of the three factors’ behaviour in the three experiment types. For example, we can check whether there is a connection between the researchers conducting the experiments and the effect sizes obtained in relation to one particular factor. This was not possible for the analyses we conducted in Subsections 11.3.1-3, because there was not enough data at our disposal. In the tables mentioned above, on the basis of our meta-analyses, we could group the experiments related to a given factor within a given experiment type into two or three groups on the basis of their effect sizes. If we assign 1 to experiments belonging to the below-average effect size group, 3 to experiments in the average effect size group (as well as to cases in which there was no heterogeneity among the experiments), and 5 to those in the above-average group, assign 2 to the below-average and 4 to the above-average experiments in cases in which there were only two groups, and then add up the values obtained in the three experiment types for each factor separately, we get the results in Table 25 (see Appendix 1).
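A minimal sketch of this scoring scheme in Python, with invented group memberships purely for illustration (the actual groupings are those underlying Table 25):

# Sketch of the scoring scheme described above; the group memberships below are
# invented placeholders, not the actual groupings of Tables 9-22.
SCORE_THREE = {"below": 1, "average": 3, "above": 5}   # three effect size groups
SCORE_TWO = {"below": 2, "above": 4}                    # only two groups

def score(group: str, n_groups: int) -> int:
    """Score of one experiment for one factor within one experiment type."""
    if group == "none" or n_groups == 1:   # no heterogeneity among the experiments
        return 3
    if n_groups == 2:
        return SCORE_TWO[group]
    return SCORE_THREE[group]

# Hypothetical researcher: (group, number of groups) in each of the three experiment types.
researcher = {
    "conventionality": [("below", 3), ("below", 2), ("none", 1)],
    "aptness":         [("above", 3), ("above", 2), ("average", 3)],
    "familiarity":     [("above", 3), ("none", 1), ("above", 2)],
}

totals = {factor: sum(score(g, n) for g, n in entries)
          for factor, entries in researcher.items()}
print(totals)   # {'conventionality': 6, 'aptness': 12, 'familiarity': 12}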

As these analyses reveal, there are only a few researchers who have conducted experiments in relation to all three factors:

– Chiappe and Roncero both belong to the below average effect size group with conventionality, and to the above average group with aptness and familiarity.

– Dulcinati is in the average group with conventionality and aptness, and in the below average group with familiarity. This is the most balanced performance.

– McKay was average with conventionality, below average with aptness, and above average with familiarity. This is the most unbalanced performance.

– Utsumi was below average with conventionality and aptness, and above average with familiarity.

As for the two rival theories’ point of view regarding the crucial contrast of conventionality vs. aptness, three patterns can be distinguished:

– conventionality in a higher group than aptness: Bowdle & Gentner, McKay;

– aptness in a higher group than conventionality: Chiappe, Roncero;

– conventionality and aptness in the same/similar group: Dulcinati, Jones & Estes, Utsumi.

11.4.4. Interim summary

To sum up, our main results are the following:

– With the help of random-effects models, we combined the results of a series of experiments conducted over the past few decades pertaining to the impact of conventionality, familiarity and aptness on metaphor processing in three types of experiments. These analyses yielded considerably more reliable and accurate estimates of the impact of the factors mentioned than single experiments do, because the calculation of the summary effect sizes synthesised the whole range of the available information. Additional analyses also provided information about the precision of these estimates (confidence intervals) and their dispersion (prediction intervals); a schematic sketch of such a random-effects computation is given after this summary.

Caveats: The summary effect size is an estimate of the true effect size on the basis of a relatively small set of experiments; in some cases, the number of experiments was very low. Adding further experiments (to be conducted in the future or conducted in the past but unavailable to me and thus not included in these analyses) could modify the results. A further concern is the high amount of heterogeneity we found in the majority of the cases.

These problems decrease the reliability of the outcome of our meta-analyses, because they reduce their ability to counterbalance the shortcomings of the individual experiments.

– By performing heterogeneity analyses, we were able to decide whether the results of a set of experiments are consistent or whether there might be subgroups among them. Our attempts at identifying factors which could distinguish these subgroups, however, were unsuccessful.

Therefore, we chose another route: we divided the experiments on the basis of their effect sizes into 2 or 3 subgroups, and then checked with the help of subgroup analysis whether these groups are in fact different from each other.

Caveats: The small number of experiments, as well as the lack of exact replications of the experiments (and, as a consequence, the low reliability of the individual experiments), makes the groupings questionable, because it is not clear whether the subgroups we identified are stable constructs.

– A simple comparison of the subgroups in relation to the experimental design, the formulation of the task in the control experiments, or the range of metaphors in the stimulus materials pointed to the conclusion that a subgroup analysis based on these factors is pointless. Therefore, it seems that they did not influence the outcome of the experiments.

Caveats: The concerns we mentioned in relation to the previous point emerge in this case, too.

– In some cases, we found that there is no heterogeneity among the experiments at issue.

This makes it possible to resolve the alleged conflict between significant and non-significant results, too. Namely, the relative closeness of the effect sizes and the overlap of their confidence intervals motivate the re-interpretation of the outcome of the experiments as an instance of converging evidence for the summary effect size.

Caveats: The concerns we mentioned in relation to the previous two points emerge in this case, too.

– Further and deeper theoretical and empirical research should be done in relation to the predictions which can be drawn from theories, as well as the definition and operationalization of the concepts ‘conventionality’, ‘familiarity’ and ‘aptness’. The stimulus materials used in experiments should be revised, too, alongside the experimental designs, in order to rule out, for example, boredom effects arising from the huge number of very similar tasks to be performed by participants, or to prevent participants’ naïve theories of metaphor or other conscious considerations from influencing their performance and distorting the results. Our investigations underline several researchers’ concerns that while testing one of the factors, the other two (or even further ones) should be carefully controlled for. The grouping of the experiments we presented in Section 11.3.4 offers an especially promising starting point for a thorough comparative analysis of the experiments conducted so far.

– Our findings prompt the suggestion that statistical meta-analysis should be part of a thorough and radical revision of the methodology in this research field.
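To make the random-effects procedure referred to in the first point above concrete, the following is a minimal Python sketch of a DerSimonian–Laird random-effects meta-analysis on correlation coefficients, computing the summary effect, its confidence interval and an approximate prediction interval. The choice of estimator and the input values are illustrative assumptions, not the actual computations or data reported in this chapter.

import math

# Minimal sketch of a random-effects meta-analysis (DerSimonian-Laird estimator)
# on correlation coefficients, using Fisher's z transformation.
# The input values at the bottom are invented for illustration only.

def random_effects_meta(correlations, sample_sizes, z_crit=1.96):
    ys = [0.5 * math.log((1 + r) / (1 - r)) for r in correlations]   # Fisher z
    vs = [1.0 / (n - 3) for n in sample_sizes]                       # variance of z
    ws = [1.0 / v for v in vs]                                       # fixed-effect weights

    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))          # heterogeneity Q
    df = len(ys) - 1
    c = sum(ws) - sum(w ** 2 for w in ws) / sum(ws)
    tau2 = max(0.0, (q - df) / c)                                    # between-study variance T^2
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    ws_re = [1.0 / (v + tau2) for v in vs]                           # random-effects weights
    y_re = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)        # summary effect (z scale)
    se_re = math.sqrt(1.0 / sum(ws_re))

    ci = (y_re - z_crit * se_re, y_re + z_crit * se_re)              # confidence interval
    pi_se = math.sqrt(tau2 + se_re ** 2)                             # approximate prediction interval
    pi = (y_re - z_crit * pi_se, y_re + z_crit * pi_se)

    back = math.tanh                                                 # back-transform z to r
    return {
        "summary_r": back(y_re),
        "ci_r": (back(ci[0]), back(ci[1])),
        "prediction_interval_r": (back(pi[0]), back(pi[1])),
        "I2": i2,
        "T": math.sqrt(tau2),
    }

# Invented example values (four correlations and their sample sizes):
print(random_effects_meta([0.42, 0.34, 0.34, 0.28], [200, 90, 90, 110]))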

