
Jonckheere-Terpstra Test

In document IBM SPSS Exact Tests (pages 145-151)

The Jonckheere-Terpstra test (Hollander and Wolfe, 1973) is more powerful than the Kruskal-Wallis test for comparing K samples against ordered alternatives. Once again, assume that the one-way layout shown in Table 8.2 was generated by the model Equation 8.29. The null hypothesis of no treatment effect is again given by Equation 8.31. This time, however, suppose that the alternative hypothesis is ordered. Specifically, the one-sided alternative might be of the form

$$F_1(u) \geq F_2(u) \geq \cdots \geq F_K(u), \quad \text{with at least one strict inequality} \quad \text{(Equation 8.36)}$$

implying that as you increase the index $j$, identifying the treatment, the distribution of responses shifts to the right. Or else, the one-sided alternative might be of the form

$$F_1(u) \leq F_2(u) \leq \cdots \leq F_K(u), \quad \text{with at least one strict inequality} \quad \text{(Equation 8.37)}$$

implying that as you increase the index $j$, identifying the treatment, the distribution shifts to the left. The two-sided alternative states that either Equation 8.36 or Equation 8.37 is true, without specifying which.

To define the Jonckheere-Terpstra statistic, the first step, as usual, is to replace the original observations with scores. Here, however, let the score, $w_{ij}$, be exactly the same as the actual observation, $u_{ij}$. Then $w = u$, and $W$, as defined by Equation 8.3, is the set of all possible permutations of the one-way layout of actually observed raw data. Now, for any $w \in W$, you compute $K(K-1)/2$ Mann-Whitney counts (see, for example, Lehmann, 1976), $\{M_{st}\}$, as follows. For any $s < t$, $M_{st}$ is the count of the number of pairs, $(w_{is}, w_{jt})$, which are such that $w_{is} < w_{jt}$, plus half the number of pairs which are such that $w_{is} = w_{jt}$. The Jonckheere-Terpstra test statistic, $T$, is defined as follows:

$$T = \sum_{s=1}^{K-1} \sum_{t=s+1}^{K} M_{st} \quad \text{(Equation 8.38)}$$
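As a concrete illustration, the Mann-Whitney counts and the statistic of Equation 8.38 can be computed directly from their definitions. The sketch below uses made-up data for three ordered treatment groups; it is not the SPSS implementation.

```python
# Direct computation of the Mann-Whitney counts M_st and the statistic T
# of Equation 8.38. The three ordered treatment groups are hypothetical.
def mann_whitney_count(a, b):
    """M_st: number of pairs with x < y, plus half the tied pairs."""
    return sum(1.0 if x < y else (0.5 if x == y else 0.0)
               for x in a for y in b)

def jonckheere_terpstra(samples):
    """T: sum of M_st over all ordered pairs of samples s < t."""
    K = len(samples)
    return sum(mann_whitney_count(samples[s], samples[t])
               for s in range(K) for t in range(s + 1, K))

groups = [[1, 2, 3], [2, 4, 5], [5, 6, 8]]
T = jonckheere_terpstra(groups)   # 7.5 + 9.0 + 8.5 = 25.0
```

Large values of $T$ favor the right-shift alternative of Equation 8.36, since observations in later groups then tend to exceed those in earlier groups.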

The mean of the Jonckheere-Terpstra statistic is

$$\mathsf{E}(T) = \frac{N^2 - \sum_{j=1}^{K} n_j^2}{4} \quad \text{(Equation 8.39)}$$

To obtain the variance, suppose that the $N$ observations take on $g$ distinct values, with $e_1$ observations being equal to the smallest value, $e_2$ observations being equal to the second smallest value, $e_3$ observations being equal to the third smallest value, and so on, until, finally, $e_g$ observations are equal to the largest value. The variance of the Jonckheere-Terpstra statistic is

$$\sigma_T^2 = \frac{N(N-1)(2N+5) - \sum_{j=1}^{K} n_j(n_j-1)(2n_j+5) - \sum_{i=1}^{g} e_i(e_i-1)(2e_i+5)}{72} + \frac{\left[\sum_{j=1}^{K} n_j(n_j-1)(n_j-2)\right]\left[\sum_{i=1}^{g} e_i(e_i-1)(e_i-2)\right]}{36N(N-1)(N-2)} + \frac{\left[\sum_{j=1}^{K} n_j(n_j-1)\right]\left[\sum_{i=1}^{g} e_i(e_i-1)\right]}{8N(N-1)}$$

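These moments can be sketched in code as well. The tie counts $e_i$ are tallied over the pooled sample; the toy data below are hypothetical, and with no ties the two tie-correction terms vanish.

```python
# Sketch of the mean (Equation 8.39) and the tie-corrected variance of
# the Jonckheere-Terpstra statistic. Tie counts e_i come from the pooled
# sample; the example data are made up.
from collections import Counter

def jt_moments(samples):
    """Return (mean, variance) of T for a list of samples."""
    n = [len(s) for s in samples]
    N = sum(n)
    e = list(Counter(x for s in samples for x in s).values())
    mean = (N * N - sum(nj * nj for nj in n)) / 4.0
    var = (N * (N - 1) * (2 * N + 5)
           - sum(nj * (nj - 1) * (2 * nj + 5) for nj in n)
           - sum(ei * (ei - 1) * (2 * ei + 5) for ei in e)) / 72.0
    var += (sum(nj * (nj - 1) * (nj - 2) for nj in n)
            * sum(ei * (ei - 1) * (ei - 2) for ei in e)
            / (36.0 * N * (N - 1) * (N - 2)))
    var += (sum(nj * (nj - 1) for nj in n)
            * sum(ei * (ei - 1) for ei in e)
            / (8.0 * N * (N - 1)))
    return mean, var

# With no ties, only the first term contributes:
mean, var = jt_moments([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```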
Now, let $t$ be the observed value of $T$. The exact, Monte Carlo, and asymptotic p values based on the Jonckheere-Terpstra statistic can be obtained as discussed in “P Value Calculations” on p. 123. The exact one- and two-sided p values are computed as in Equation 8.8 and Equation 8.9, respectively. The Monte Carlo two-sided p value is computed as in Equation 8.11, with an obvious modification to reflect the fact that you want to estimate the probability inside the region $\{|T - \mathsf{E}(T)| \geq |t - \mathsf{E}(T)|\}$ instead of the region $\{T \geq t\}$. The Monte Carlo one-sided p value can be similarly defined. The asymptotic distribution of $T$ is normal, with mean $\mathsf{E}(T)$ and variance $\sigma_T^2$. The asymptotic one- and two-sided p values are obtained by Equation 8.17 and Equation 8.18, respectively.

Example: Space-Shuttle O-Ring Incidents Data

Professor Richard Feynman, in his delightful book What Do You Care What Other People Think? (1988), recounted at great length his experiences as a member of the presidential commission formed to determine the cause of the explosion of the space shuttle Challenger in 1986. He suspected that the low temperature at takeoff caused the O-rings to fail. In his book, he published the data on temperature versus the number of O-ring incidents for 24 previous space shuttle flights. These data are shown in Figure 8.7. There are two variables in the data: incident indicates the number of O-ring incidents and is either none, one, two, or three; temp indicates the temperature in degrees Fahrenheit.


The null hypothesis is that the temperatures in the four samples (0, 1, 2, or 3 O-ring incidents) have come from the same underlying population distribution. The one-sided alternative hypothesis is that populations with a higher number of O-ring incidents have their temperature distributions shifted to the right of populations with a lower number of O-ring incidents. The Jonckheere-Terpstra test is superior to the Kruskal-Wallis test for this data set because the populations have a natural ordering under the alternative hypothesis. The results of the Jonckheere-Terpstra test for these data are shown in Figure 8.8.

Figure 8.7 Space-shuttle O-ring incidents and temperature at launch

Figure 8.8 Jonckheere-Terpstra test results for O-ring incidents data

The Jonckheere-Terpstra test statistic is displayed in its standardized form

$$T^* = \frac{T - \mathsf{E}(T)}{\sigma_T} \quad \text{(Equation 8.40)}$$

whose observed value is

$$t^* = \frac{t - \mathsf{E}(T)}{\sigma_T} \quad \text{(Equation 8.41)}$$

The output shows that $t = 29.5$, $\mathsf{E}(T) = 65$, and $\sigma_T = 15.902$. Therefore,

$$t^* = \frac{29.5 - 65}{15.902} = -2.232.$$

The exact one-sided p value is

$$p_1 = \Pr(T^* \leq t^*) \quad \text{(Equation 8.42)}$$

The exact two-sided p value is

$$p_2 = \Pr(|T^*| \geq |t^*|) \quad \text{(Equation 8.43)}$$

These definitions are completely equivalent to those given by Equation 8.8 and Equation 8.9, respectively. Asymptotic and Monte Carlo one- and two-sided p values can be similarly defined in terms of the standardized test statistic. Note that $T^*$ is asymptotically normal with zero mean and unit variance.

The exact one-sided p value of 0.012 reveals that there is indeed a statistically significant correlation between temperature and number of O-ring incidents. The sign of the standardized test statistic, $t^* = -2.232$, is negative, thus implying that higher launch temperatures are associated with fewer O-ring incidents. The two-sided p value would be used if you had no a priori reason to believe that the number of O-ring incidents is negatively correlated with takeoff temperature. Here the exact two-sided p value, 0.024, is also statistically significant.
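The arithmetic behind the standardized statistic and the asymptotic p values can be checked directly from the figures reported in the output ($t = 29.5$, $\mathsf{E}(T) = 65$, $\sigma_T = 15.902$); the sketch below uses the standard normal approximation via the error function.

```python
# Checking Equation 8.41 and the asymptotic p values against the output:
# t = 29.5, E(T) = 65, sigma_T = 15.902.
from math import erfc, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * erfc(-z / sqrt(2.0))

t_star = (29.5 - 65.0) / 15.902   # standardized statistic, about -2.232
p_one = phi(t_star)               # asymptotic one-sided p value
p_two = 2.0 * p_one               # asymptotic two-sided p value, about .026
```

The asymptotic two-sided value of about 0.026 matches the output, and is close to, but slightly larger than, the exact two-sided p value of 0.024.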

Test variable: Temperature
  Number of levels in O-Ring Incidents: 4
  N: 24
  Observed J-T statistic: 29.500
  Mean J-T statistic: 65.000
  Std. deviation of J-T statistic: 15.902
  Std. J-T statistic: -2.232
  Asymp. sig. (2-tailed): .026
  Exact sig. (2-tailed): .024
  Exact sig. (1-tailed): .012
  Point probability: .001
Grouping variable: O-Ring Incidents

Introduction to Tests on R x C Contingency Tables

This chapter discusses hypothesis tests on data that are cross-classified into contingency tables with r rows and c columns. The cross-classification is based on categorical variables that may be either nominal or ordered. Nominal categorical variables take on distinct values that cannot be positioned in any natural order. An example of a nominal variable is color (for example, red, green, or blue). In some statistical packages, nominal variables are also referred to as class variables, or unordered variables. Ordered categorical variables take on distinct values that can be ordered in a natural way. An example of an ordered categorical variable is drug dose (for example, low, medium, or high). Ordered categorical variables can assume numerical values as well (for example, the drug dose might be categorized into 100 mg/m², 200 mg/m², and 300 mg/m²). When the number of distinct numerical values assumed by the ordered variable is very large (for example, the weights of individuals in a population), it is more convenient to regard the variable as continuous (possibly with ties) rather than categorical. There is considerable overlap between the statistical methods used to analyze continuous data and those used to analyze ordered categorical data. Indeed, many of the same statistical tests are applicable to both situations. However, the probabilistic behavior of an ordered categorical variable is captured by a different mathematical model than that of a continuous variable. For this reason, continuous variables are discussed separately in Part 1.

This chapter summarizes the statistical theory underlying the exact, Monte Carlo, and asymptotic p value computations for all the tests in Chapter 10, Chapter 11, and Chapter 12. Chapter 10 discusses tests for contingency tables in which the row and column classifications are both nominal. These are referred to as unordered contingency tables. Chapter 11 discusses tests for contingency tables in which the column classifications are based on ordered categorical variables. These are referred to as singly ordered contingency tables. Chapter 12 discusses tests for tables in which both the row and column classifications are based on ordered categorical variables. These are referred to as doubly ordered contingency tables.

Table 9.1 shows an observed $r \times c$ contingency table in which $x_{ij}$ is the count of the number of observations falling into row category $i$ and column category $j$.


The main objective is to test whether the observed contingency table is consistent with the null hypothesis of independence of row and column classifications. Exact Tests computes both exact and asymptotic p values for many different tests of this hypothesis against various alternative hypotheses. These tests are grouped in a logical manner and are presented in the next three chapters, which discuss unordered, singly ordered, and doubly ordered contingency tables, respectively. Despite these differences, there is a unified underlying framework for performing the hypothesis tests in all three situations.

This unifying framework is discussed below in terms of p value computations.

The p value of the observed contingency table is used to test the null hypothesis of no row-by-column interaction. Exact Tests provides three categories of p values for each test. The “gold standard” is the exact p value. When it can be computed, the exact p value is recommended. Sometimes, however, a data set is too large for the exact p value computations to be feasible. In this case, the Monte Carlo technique, which is easier to compute, is recommended. The Monte Carlo p value is an extremely close approximation to the exact p value and is accompanied by a fairly narrow confidence interval within which the exact p value is guaranteed to lie (at the specified confidence level). Moreover, by increasing the number of Monte Carlo samples, you can make the width of this confidence interval arbitrarily small. Finally, the asymptotic p value is the easiest to obtain, but it is only as reliable as the asymptotic theory on which it rests. For large, well-balanced data sets, the asymptotic p value is not too different from its exact counterpart, but, obviously, you can’t know this for the specific data set on hand without also having the exact or Monte Carlo p value available for comparison. In this section, all three p values will be defined. First, you will see how the exact p value is computed. Then, the Monte Carlo and asymptotic p values will be discussed as convenient approximations to the exact p value computation.
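As a sketch of how that confidence interval behaves, the usual normal approximation for a binomial proportion gives an interval whose half-width shrinks like $1/\sqrt{M}$, where $M$ is the number of Monte Carlo samples. The 99% level and $z = 2.576$ below are illustrative choices, not the package's defaults.

```python
# Illustrative 99% confidence interval for a Monte Carlo p value
# estimate p_hat based on M sampled tables (normal approximation to a
# binomial proportion; z = 2.576 is the 99% normal quantile).
from math import sqrt

def monte_carlo_ci(p_hat, M, z=2.576):
    half = z * sqrt(p_hat * (1.0 - p_hat) / M)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# e.g. an estimate of 0.024 from 10,000 sampled tables:
lo, hi = monte_carlo_ci(0.024, 10000)
```

Quadrupling the number of samples halves the interval width, which is why the exact p value can be bracketed as tightly as desired.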

To compute the exact p value of the observed contingency table, it is necessary to:

1. Define a reference set of tables in which each table has a known probability under the null hypothesis of no row-by-column interaction.

2. Order all the tables in the reference set according to a discrepancy measure (or test statistic) that quantifies the extent to which each table deviates from the null hypothesis.

3. Sum the probabilities of all tables in the reference set that are at least as discrepant as the observed table.
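A literal, brute-force rendering of these three steps for a tiny data set might look as follows. Here the reference set is generated by permuting the column labels of the raw observations, each permutation being equally likely under the null hypothesis, and Pearson's chi-square serves as the discrepancy measure. This is purely illustrative; practical software enumerates tables with fixed margins rather than raw permutations.

```python
# Brute-force sketch of the three-step exact p value recipe on paired
# row/column category labels. Illustrative only -- real implementations
# enumerate tables with fixed margins instead of permutations.
from itertools import permutations
from collections import Counter

def chi_square(rows, cols):
    """Pearson chi-square discrepancy for paired category labels."""
    n = len(rows)
    table = Counter(zip(rows, cols))
    r_tot, c_tot = Counter(rows), Counter(cols)
    return sum((table[(r, c)] - r_tot[r] * c_tot[c] / n) ** 2
               / (r_tot[r] * c_tot[c] / n)
               for r in r_tot for c in c_tot)

def exact_p(rows, cols):
    """Step 1: reference set; step 2: order by discrepancy; step 3: sum."""
    observed = chi_square(rows, cols)
    perms = list(permutations(cols))
    hits = sum(1 for p in perms if chi_square(rows, p) >= observed - 1e-9)
    return hits / len(perms)
```

For the 2 x 2 table built from rows ['a', 'a', 'b', 'b'] and columns [1, 1, 2, 2], this enumeration reproduces the two-sided Fisher-type p value of 1/3.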

Table 9.1 Observed r x c contingency table

Rows       Col_1   Col_2   ...   Col_c   Row_Total
Row_1      x_11    x_12    ...   x_1c    m_1
Row_2      x_21    x_22    ...   x_2c    m_2
...
Row_r      x_r1    x_r2    ...   x_rc    m_r
Col_Total  n_1     n_2     ...   n_c     N

