
Kruskal-Wallis Test

In document IBM SPSS Exact Tests (pages 141-145)

The Kruskal-Wallis test (Siegel and Castellan, 1988) is a widely used nonparametric test for comparing K independent samples. When $K = 2$, it specializes to the Mann-Whitney test. The Kruskal-Wallis test has good power against shift alternatives.

Specifically, you assume, as in Hollander and Wolfe (1973), that the one-way layout, u, shown in Table 8.2, was generated by the model

$$u_{ij} = \mu + \tau_j + \varepsilon_{ij}$$

for all $i = 1, 2, \ldots, n_j$ and $j = 1, 2, \ldots, K$. In this model, $\mu$ is the overall mean, $\tau_j$ is the treatment effect, and the $\varepsilon_{ij}$'s are identically distributed unobservable error terms from an unknown distribution with a mean of 0. All parameters are unknown, but for identifiability, you can assume that

$$\sum_{j=1}^{K} \tau_j = 0$$

Equation 8.30

Figure 8.4 Hematologic toxicity data grouped into a 2 x K contingency table for the median test (categories: WBC≤7, WBC>7)

Figure 8.5 Pearson's chi-square results for hematologic toxicity data, divided by the median

Pearson Chi-Square: Value 4.317 (1), df 4, Asymp. Sig. .365, Exact Sig. .429
N of Valid Cases: 28
1. 9 cells (90.0%) have expected count less than 5. The minimum expected count is 1.71.

The null hypothesis of no treatment effect can be formally stated as

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_K$$

Equation 8.31

The Kruskal-Wallis test has good power against the alternative hypothesis

$$H_1: \tau_{j_1} \neq \tau_{j_2} \text{ for at least one pair } (j_1, j_2)$$

Equation 8.32

Notice that this alternative hypothesis does not specify any ordering of the treatments in terms of increases in response levels. The alternative to the null hypothesis is simply that the treatments are different, not that one specific treatment elicits greater response than another. If there were a natural ordering of treatments under the alternative hypothesis (that is, if you could state a priori that the $\tau_j$'s are ordered under the alternative hypothesis), a more powerful test would be the Jonckheere-Terpstra test (Hollander and Wolfe, 1973), discussed on p. 135.

To define the Kruskal-Wallis test statistic, the first step is to convert the one-way layout, u, of raw data, as shown in Table 8.2, into a corresponding one-way layout of scores, w, as shown in Table 8.3. The scores, $w_{ij}$, for the Kruskal-Wallis test are the ranks of the observations in the pooled sample of size N. If there were no ties, the set of $w_{ij}$ values in Table 8.3 would simply be some permutation of the first N integers. However, to allow for the possibility that some observations might be tied, you can assign the mid-rank of a set of tied observations to each of them.

The easiest way to explain how the mid-ranks are computed is by considering a numerical example. Suppose that four observations are all tied at the same numerical value, say 55. Assume that these four observations would occupy positions 15, 16, 17, and 18, if all the N observations were pooled and then sorted in ascending order. In this case, you would assign the mid-rank $(15 + 16 + 17 + 18)/4 = 16.5$ to these four tied observations. Thus, the score for each of them is 16.5.

More generally, let $u_{(1)} \le u_{(2)} \le \cdots \le u_{(N)}$ denote the pooled sample of all of the N observations sorted in ascending order. To allow for the possibility of ties, let there be g distinct values among the sorted $u_{(i)}$'s, with $e_1$ observations being equal to the smallest value, $e_2$ observations being equal to the second smallest value, $e_3$ observations being equal to the third smallest value, and so on, until, finally, $e_g$ observations are equal to the largest value. It is now possible to define the

K-Sample Inference: Independent Samples 133

mid-ranks precisely. For $l = 1, 2, \ldots, g$, the distinct mid-rank assumed by all of the $e_l$ observations tied in the lth smallest position is

$$w^*_l = e_1 + e_2 + \cdots + e_{l-1} + \frac{e_l + 1}{2}$$

In this way, the original one-way layout of raw data is converted into a corresponding one-way layout of mid-ranks.
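The mid-rank rule can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not SPSS code: the function name `mid_ranks` and the sample values are hypothetical, chosen so that four observations tied at 55 occupy sorted positions 15 through 18.

```python
def mid_ranks(values):
    """Return the mid-rank of each value within the pooled sample."""
    # Indices of the observations, sorted by ascending value.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions are 1-based: the block occupies start+1 .. end+1,
        # and every member receives the average of those positions.
        mid = ((start + 1) + (end + 1)) / 2
        for k in order[start:end + 1]:
            ranks[k] = mid
        start = end + 1
    return ranks


# Hypothetical pooled sample: 14 smaller values, four ties at 55, two larger values.
pooled = list(range(1, 15)) + [55] * 4 + [60, 70]
print(mid_ranks(pooled)[14:18])  # [16.5, 16.5, 16.5, 16.5]
```

The same value follows from the general formula: 14 observations lie below the tied block of size 4, so the mid-rank is 14 + (4 + 1)/2 = 16.5.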

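Next, for any treatment j, where $j = 1, 2, \ldots, K$, define the rank-sum as

$$w_j = \sum_{i=1}^{n_j} w_{ij}$$

Equation 8.33

The Kruskal-Wallis test statistic, $T$, for any one-way layout of scores $w$, can now be defined as

$$T = \frac{12}{N(N+1)\left[1 - \lambda/(N^3 - N)\right]} \sum_{j=1}^{K} \frac{\left[w_j - n_j(N+1)/2\right]^2}{n_j}$$

Equation 8.34

where $\lambda$ is a tie correction factor given by

$$\lambda = \sum_{l=1}^{g} \left(e_l^3 - e_l\right)$$

Equation 8.35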
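As a rough sketch of how the rank-sums, the tie correction factor, and the statistic fit together, the following Python function computes the tie-corrected Kruskal-Wallis statistic in its standard form. The function name `kruskal_wallis_statistic` and the sample data are hypothetical, and the code is an illustration rather than the SPSS implementation. It assumes that not all observations are tied, since the correction term would otherwise be zero.

```python
from collections import Counter


def kruskal_wallis_statistic(groups):
    """Tie-corrected Kruskal-Wallis statistic for a list of samples."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)

    # Mid-rank of each distinct value: count of smaller observations
    # plus (tie count + 1) / 2.
    counts = Counter(pooled)
    rank_of = {}
    below = 0
    for value in sorted(counts):
        rank_of[value] = below + (counts[value] + 1) / 2
        below += counts[value]

    # Tie correction factor: sum of e^3 - e over the distinct values.
    tie_factor = sum(e**3 - e for e in counts.values())

    # Squared deviation of each rank-sum from its null expectation.
    spread = 0.0
    for group in groups:
        rank_sum = sum(rank_of[x] for x in group)
        expected = len(group) * (n_total + 1) / 2
        spread += (rank_sum - expected) ** 2 / len(group)

    denominator = n_total * (n_total + 1) * (1 - tie_factor / (n_total**3 - n_total))
    return 12 * spread / denominator


# Hypothetical data: three perfectly separated samples of size 3.
print(round(kruskal_wallis_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))  # 7.2
```

With no ties the tie factor is 0 and the denominator reduces to N(N+1), which is the familiar untied form of the statistic.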

The Kruskal-Wallis test is also defined in Chapter 11, using the notation developed for analyzing contingency tables. The two definitions are equivalent. Since the test is applicable to both continuous and categorical data, the test statistic is defined twice, once in the context of a one-way layout and once in the context of a contingency table.

Let t denote the value of T actually observed from the data. The exact, Monte Carlo, and asymptotic p values based on the Kruskal-Wallis statistic can be obtained as discussed in “P Value Calculations” on p. 123. The exact two-sided p value is computed as shown in Equation 8.7. The Monte Carlo two-sided p value is computed as in Equation 8.11, and the asymptotic two-sided p value is computed as shown in Equation 8.16. One-sided p values are not defined for tests against unordered alternatives like the Kruskal-Wallis test.
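Equations 8.7 and 8.11 are not reproduced in this section, so the sketch below rests on an assumption: that the Monte Carlo p value is the proportion of randomly sampled rearrangements of the pooled data whose statistic is at least as large as the observed one. The function names and data are hypothetical. The default seed only echoes the value quoted for the SPSS run; Python's generator differs from the one in SPSS, so the sampled layouts will not match.

```python
import random
from collections import Counter


def kw_statistic(groups):
    """Tie-corrected Kruskal-Wallis statistic (compact form)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    counts, rank_of, below = Counter(pooled), {}, 0
    for value in sorted(counts):
        rank_of[value] = below + (counts[value] + 1) / 2  # mid-rank
        below += counts[value]
    tie_factor = sum(e**3 - e for e in counts.values())
    spread = sum(
        (sum(rank_of[x] for x in g) - len(g) * (n + 1) / 2) ** 2 / len(g)
        for g in groups
    )
    return 12 * spread / (n * (n + 1) * (1 - tie_factor / (n**3 - n)))


def monte_carlo_p_value(groups, n_samples=10000, seed=2000000):
    """Estimate the permutation p value by random reshuffling of the pooled data."""
    rng = random.Random(seed)
    observed = kw_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        # Cut the shuffled pool back into samples of the original sizes.
        layout, start = [], 0
        for size in sizes:
            layout.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point noise in equal statistics.
        if kw_statistic(layout) >= observed - 1e-12:
            hits += 1
    return hits / n_samples


# Hypothetical data; the estimate varies with the seed and the sample count.
p_hat = monte_carlo_p_value([[1, 2, 3], [4, 5, 6], [7, 8, 9]], n_samples=2000, seed=1)
print(p_hat)
```

Only large values of the statistic count as extreme, which is consistent with the statement that one-sided p values are not defined for this test.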

Example: Hematologic Toxicity Data, Revisited

The Kruskal-Wallis test can be used to re-analyze the hematologic toxicity data and determine whether the drug regimens have significantly different response distributions. This time, however, the test statistic actually takes advantage of the relative rankings of the different observations instead of simply using the information that an observation is either above or below the pooled median. Thus, you can expect the Kruskal-Wallis test to be more powerful than the median test. Although the exact p value is too difficult to compute for this data set, you can obtain an extremely accurate Monte Carlo estimate of it based on a Monte Carlo sample of size 10,000. The results are shown in Figure 8.6.

As expected, the greater power of the Kruskal-Wallis test leads to a smaller p value than obtained with the median test. There is, however, a difference between the asymptotic inference and the exact inference computed by the Monte Carlo estimate. The Monte Carlo estimate of the exact p value is 0.038, and the exact p value lies in the range (0.033, 0.043) with 99% confidence. Thus, the null hypothesis can be rejected at the 5% significance level. The asymptotic inference, in contrast, was unable to estimate the true p value with this degree of accuracy. It generated a p value of 0.052, which is not significant at the 5% level.
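The 99% bounds of 0.033 and 0.043 reported in Figure 8.6 can be checked with a short calculation. The sketch below assumes the usual normal approximation to the binomial for a Monte Carlo proportion, which may not be exactly the formula SPSS uses. The function name `monte_carlo_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def monte_carlo_interval(p_hat, n_samples, level=0.99):
    """Normal-approximation confidence interval for a Monte Carlo p value."""
    # Two-sided critical value, about 2.576 for a 99% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat - half_width, p_hat + half_width


low, high = monte_carlo_interval(0.038, 10000)
print(round(low, 3), round(high, 3))  # 0.033 0.043
```

Because the half-width shrinks with the square root of the number of sampled tables, the whole interval falls below 0.05 here, which is what supports rejection at the 5% level.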

Figure 8.6 Monte Carlo results of Kruskal-Wallis test for hematologic toxicity data

Test Statistics (1, 2)
Asymp. Sig.: .052
Monte Carlo Sig.: .038 (3)
99% Confidence Interval: Lower Bound .033, Upper Bound .043

1. Kruskal-Wallis Test
2. Grouping Variable: Drug Regimen
3. Based on 10000 sampled tables with starting seed 2000000.
