
In document Biometry (Pages 68-71)

4. HYPOTHESIS TESTING

4.2. Non-Parametric Tests for a Single Sample (Tests for Normality, Goodness-of-fit Tests)

First, tests concerning the distribution of a population will be introduced, as normality is an important assumption for the estimation and hypothesis testing of the population parameters (µ, P, σ) and their differences. In Chapter 3.7 some methods of assessing normality were presented; in this section, statistical tests for normality will be discussed.

To check for normality, the first tests presented are the z-test approximations for skewness and kurtosis. The null hypotheses state that the distribution is normal. The sampling distribution of the test statistic is Student's t distribution with n − 1 degrees of freedom if the sample values are used for estimation; for a large sample size the sampling distribution approaches the standard normal distribution. If both absolute values of the test statistics are less than 2.0 (the approximate critical value of a standard normal or Student's t variable), the data are assumed reasonably normal. The tests are sensitive to outliers.

Table 8 The null hypothesis and the test statistic for the tests of skewness and kurtosis

                  Skewness               Kurtosis
Null hypothesis   H0: S = 0              H0: K = 0
Test statistic    z_S = S / √(6/n)       z_K = K / √(24/n)

(S denotes the sample skewness and K the excess kurtosis.)

The tests for skewness and kurtosis can be combined. The Jarque-Bera test statistic, defined as

JB = (n/6) · (S² + K²/4),

is χ² distributed with ν = 2 degrees of freedom under the assumption that H0 is correct, that is, the data set is normally distributed. This test can be used only for large sample sizes.
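A sketch of this statistic in Python (hypothetical helper, using the same moment-based S and K as above):

```python
def jarque_bera(x):
    """Jarque-Bera statistic JB = (n/6) * (S**2 + K**2 / 4);
    approximately chi-square with 2 degrees of freedom under H0."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    s = m3 / m2 ** 1.5          # skewness
    k = m4 / m2 ** 2 - 3.0      # excess kurtosis
    return (n / 6.0) * (s ** 2 + k ** 2 / 4.0)
```

H0 is rejected at α = 0.05 when JB exceeds the χ² critical value 5.991 (ν = 2).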

EXAMPLE 4.1

Using the results of EXAMPLE 3.9, test the normality of the average weight of Pinot Noir grape clusters by the z-test approximations for skewness and kurtosis.

z_S = S / √(6/n) = … = 1.533 and z_K = K / √(24/n) = … = −1.473

Both absolute values are less than 2.0, so there is not enough evidence to conclude that the null hypothesis is false.


There are other tests to check that the data fit a normal distribution. Tests called goodness-of-fit tests answer the question of whether a sample data set is likely to come from a population with a specified distribution F0, or, in other words, whether the population from which the sample was taken is likely to follow a particular theoretical distribution (e.g. normal).

Therefore these tests are capable of checking the normality of a distribution. In this case, the null hypothesis is the following: H0: F = F0.

According to the traditional approach, we wish to find out to what extent the frequency (fi) with which the observed data fall within a set of categories differs from an expected – perhaps theoretical – frequency (fi*). For testing normality or another theoretical distribution, first a frequency distribution with frequency classes is constructed from the sample. The class frequencies are called observed frequencies (fi = Oi). For each frequency class an expected frequency (fi* = Ei) is also computed from the specified theoretical distribution.

The null hypothesis of the goodness-of-fit test states that the observed frequencies for all classes are equal to the expected frequencies, H0: Oi = Ei, claiming that the class probabilities for all classes are equal to the probabilities of the specified theoretical distribution.

The test statistic is defined as

χ² = Σ (Oi − Ei)² / Ei

As it is computed from sample data, it is an estimate. The sampling distribution of the test statistic is χ² with degrees of freedom ν = c − k (c is the number of categories, k is the number of independent quantities; e.g. if one constant is required, viz. the total number of observations, then ν = c − 1).
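For illustration, the statistic is a one-liner in Python (hypothetical helper name; observed and expected class frequencies are passed as sequences):

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O_i - E_i)**2 / E_i
    over the c frequency classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

The value is then compared with the χ² critical value at ν = c − k degrees of freedom.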

Note here that this test statistic can also be used for testing association. The test of independence (two-way tables) compares the frequencies in one set of categories with the frequencies observed in another set of categories. The formula of the test statistic is the same; the only difference is the way the expected frequencies are calculated. The observed frequencies fij = Oij come directly from the sample, and the expected values fij* = Eij are determined from the observed sample data.

According to the null hypothesis, if the variables are totally independent of each other, then the ratio of the frequencies is constant across the categories:

fi1 / f.1 = fi2 / f.2 = ⋯ = fi. / n
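Equivalently, under independence the expected counts follow from the marginal totals as E_ij = (row total i) × (column total j) / n, which can be sketched as (hypothetical function name):

```python
def expected_independence(table):
    """Expected counts of a two-way table under independence:
    E_ij = (row total i) * (column total j) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]
```

These E_ij values are then plugged into the same Σ (O − E)²/E statistic.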


This looks like a two-tailed test, but the most extreme left-tail value of the test statistic is zero, which means that the condition of the null hypothesis is perfectly met. Therefore, despite the form of the hypotheses, this test – like the Jarque-Bera test – is a right-tailed test.

Another goodness-of-fit test is the Kolmogorov–Smirnov goodness-of-fit test, which determines the probability that a set of data falls within the expected standardized proportions of the normal distribution. The Kolmogorov–Smirnov test compares the observed cumulative relative frequencies with the expected ones, and the largest absolute difference between them provides the value of the test statistic, dmax(observed). The appropriate critical value is obtained from the probability distribution of dmax(critical)(c, n).
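The observed statistic can be sketched as follows (a minimal illustration assuming a hypothesised normal F0; since the empirical CDF jumps at each ordered observation, both sides of each step are checked):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_dmax(x, mu, sigma):
    """Largest absolute difference between the empirical CDF and the
    hypothesised normal CDF: the observed Kolmogorov-Smirnov statistic."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f0 = normal_cdf(v, mu, sigma)
        d = max(d, abs(i / n - f0), abs((i - 1) / n - f0))
    return d
```

The resulting dmax(observed) is then compared with the tabulated critical value.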

The combination of the skewness and kurtosis coefficients and the Shapiro–Wilk test are the most powerful tests for checking normality in small samples (fewer than 20 elements).

After checking normality, the parametric tests can be used. In the next section, a step-by-step guide for hypothesis testing, based on the normality assumption, will be followed.

EXAMPLE 4.2

Using the data of the BROILER_CHICKENS database, test the normality of the weight of the chickens by the Jarque-Bera test, the chi-squared goodness-of-fit test, and the Kolmogorov–Smirnov and Shapiro–Wilk tests (by SPSS), for type A soybean feed on day 10 (n = 143).

JB = (n/6) · (S² + K²/4) = …

For the chi-squared goodness-of-fit test, the predicted frequencies were calculated from the normal probability distribution. For standardization the estimates, i.e. the sample mean and standard deviation, were used; therefore the degrees of freedom are c − k = 6 − 2 − 1 = 3.

χ2 = 2.168 < 7.815 = χ2(6-2-1)0.95

P(Kolmogorov–Smirnov) = 0.200 > α = 0.05 and P(Shapiro–Wilk) = 0.143 > α = 0.05

so there is not enough evidence to conclude that the null hypothesis is false, i.e., normality cannot be rejected.
