

8.2.4 Point estimations of the correlation coefficients

The "empirical formula" for the estimate of the correlation coefficient r, based on n independent pairs of observations (x_i, y_i), reads:

$$\hat{r} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (8.8)$$

We recall (cf. Section 6.2.1) that the correlation coefficient r provides sufficient information on the degree of dependence between the components of a random vector first and foremost in the case of a two-variable normal distribution.

Examples

Example 8.12. Determine the empirical value (estimate) of the correlation coefficient r for the birth weight and relative weight growth data of Example 8.11; see Table 8.4.

. . .

Figure 8.7: The connection of birth weight x and relative weight growth y with the regression lines.

Solution: The details of the calculation, needed for the application of formula (8.8), are: . . . Taking into account that the correlation coefficient lies in the interval (−1, 1) and that r equals 0 in case of independence, the result shows a moderate correlation, especially if we can assume that the distribution of the random vector (birth weight, weight growth) is normal or close to normal. Fortunately, the correlation coefficient is a relatively "robust" measure, which can often be applied even in spite of the common and frequent shortcomings of the data. The observed values are plotted in Fig. 8.7.

Example 8.13. The numbers of bird species and the geographical north latitudes attributed to the sample areas are given in Table 8.5, see also Fig. 8.8. Estimate the correlation coefficient r.

x_i (north latitude, degrees)   y_i (number of species)
39.217   128
38.8     137
39.467   108
38.958   118
38.6     135
38.583    94
39.733   113
38.033   118
38.9      96
39.533    98
39.133   121
38.317   152
38.333   108
38.367   118
37.2     157
37.967   125
37.667   114

Table 8.5: Relationship between the geographical north latitude xi (degrees) and the number of bird species (yi) [MCD][page 215].

Solution: The details of the calculation, needed for the application of formula (8.8), are:

$$\sum_{i=1}^{17}(x_i-\bar{x})^2 = 7.5717, \qquad \sum_{i=1}^{17}(y_i-\bar{y})^2 = 5118.00, \qquad \sum_{i=1}^{17}(x_i-\bar{x})(y_i-\bar{y}) = -91.085.$$

From here we get $\hat{r} = -0.4629$. The result reflects a moderate negative correlation.
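The calculation can be reproduced with the following short Python sketch (not part of the original text), applied directly to the data of Table 8.5; the helper name empirical_r is illustrative only.

```python
# A minimal sketch of formula (8.8) applied to the data of Table 8.5.
from math import sqrt

def empirical_r(xs, ys):
    """Empirical correlation coefficient, formula (8.8)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# north latitude (degrees) and number of bird species, Table 8.5
x = [39.217, 38.8, 39.467, 38.958, 38.6, 38.583, 39.733, 38.033, 38.9,
     39.533, 39.133, 38.317, 38.333, 38.367, 37.2, 37.967, 37.667]
y = [128, 137, 108, 118, 135, 94, 113, 118, 96, 98, 121, 152, 108, 118,
     157, 125, 114]

print(empirical_r(x, y))   # approximately -0.463
```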

Remark: Note that here the geographical coordinates have been considered as observations with respect to a random variable. This suggests that the observation locations could be selected, to a certain extent, "at random". Another problem is that the number of species is in fact a discrete random variable. Furthermore, we do not know whether the distribution of the random vector (geographical latitude, number of species) is close to normal. We deliberately chose an example where the nature of the data requires some caution.

. . .

Figure 8.8: Geographical north latitude x (degrees) vs. number of species y.

Point estimation of Spearman’s rank correlation coefficient

For this estimation one can use the empirical (and, at the same time, estimation) formula

$$\hat{\varrho} = \frac{\sum_{i=1}^{n}(r_i-\bar{r})(s_i-\bar{s})}{\sqrt{\sum_{i=1}^{n}(r_i-\bar{r})^2\,\sum_{i=1}^{n}(s_i-\bar{s})^2}}$$

(cf. Section 6.2.2), where r_i and s_i denote the ranks of the two coordinates within the sample. It can be written in the simpler form

$$\hat{\varrho} = 1 - \frac{6\sum_{i=1}^{n}(r_i-s_i)^2}{n(n^2-1)} \qquad (8.9)$$

as well. The estimate is biased.

Point estimation of Kendall's rank correlation coefficient

For this estimation one can use the empirical formula

$$\hat{\tau} = \frac{4A}{n(n-1)} - 1 \qquad (8.10)$$

(cf. Section 6.2.2), which is an unbiased point estimate. (Remember that for the sake of simplicity τ was only defined above for untied ranks.)

Examples

Example 8.14. The level of professional knowledge and suitability of ten university students were ranked by a specialist. The data is contained in Table 8.6.

(student identifier)        A   B   C   D   E   F   G   H   I   J
rank of knowledge, r_i      4  10   3   1   9   2   6   7   8   5
rank of suitability, s_i    5   8   6   2  10   3   9   4   7   1

Table 8.6: Ranking the knowledge and suitability of people [ARM][page 405].

The data pairs (ri, si) are plotted on a point diagram in Fig. 8.9.

Figure 8.9: Relationship between the level of knowledge (r) and suitability (s).

Estimate Spearman's ϱ and Kendall's τ rank correlation coefficients between the level of knowledge and the suitability.

Solution: For the determination (estimation) of Spearman's rank correlation coefficient we can use formula (8.9). First calculate the quantities r_i − s_i. These are −1, 2, −3, −1, −1, −1, −3, 3, 1, 4 (in order of succession). The sum of their squares is 52. Since n = 10,

$$\hat{\varrho} = 1 - \frac{6\cdot 52}{990} = 0.685.$$

To determine (estimate) Kendall’s coefficient we should re-order the data according to the increasing rank of one of the variables:

1 2 3 4 5 6 7 8 9 10

2 3 6 5 1 9 4 7 10 8

It is easy to see that the number A can be obtained by going along the lower row and, for each rank, counting how many of the ranks standing to the right of it exceed it. According to this, A = 8 + 7 + 4 + 4 + 5 + 1 + 3 + 2 + 0 + 0 = 34. Hence, on the basis of formula (8.10),

$$\hat{\tau} = \frac{136}{90} - 1 = 0.511.$$

Taking into account that both rank correlation coefficients are located in the interval (−1, 1) and that their expected value is 0 in case of independence, in our case both rank correlation coefficients reveal a moderate positive stochastic relationship.
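Both estimates can be reproduced with the following Python sketch (added for illustration), using formula (8.9) for Spearman's coefficient and formula (8.10), with the counting rule for A described above, for Kendall's coefficient; the function names are illustrative only.

```python
# A minimal sketch of Example 8.14 for the (untied) ranks of Table 8.6.

def spearman_rho(r, s):
    n = len(r)
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))          # formula (8.9)

def kendall_tau(r, s):
    # re-order s according to increasing r, then count the number A of cases
    # where a rank is exceeded by a rank standing to the right of it
    n = len(r)
    s_ordered = [si for _, si in sorted(zip(r, s))]
    A = sum(1
            for i in range(n)
            for j in range(i + 1, n)
            if s_ordered[j] > s_ordered[i])
    return 4 * A / (n * (n - 1)) - 1                # formula (8.10)

rank_knowledge   = [4, 10, 3, 1, 9, 2, 6, 7, 8, 5]
rank_suitability = [5, 8, 6, 2, 10, 3, 9, 4, 7, 1]

print(spearman_rho(rank_knowledge, rank_suitability))  # about 0.685
print(kendall_tau(rank_knowledge, rank_suitability))   # about 0.511
```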

Example 8.15. An experiment was designed to study in how many days rats can be trained to orient in a labyrinth, and how many mistakes they make daily on average. The data is contained in Table 8.7. Estimate Spearman's rank correlation coefficient ϱ.

Solution: The pairs of ranks from Table 8.7, suitably ordered, are:

1    2   3   4    5   6   7   8   9   10
3.5  1   2   3.5  6   6   6   8   10  9

This yields

$$\sum_{i=1}^{10}(r_i-s_i)^2 = 12.5.$$

By using formula (8.9), the estimate of Spearman's rank correlation coefficient is

$$\hat{\varrho} = 1 - \frac{6\cdot 12.5}{990} = 0.924.$$

The result clearly suggests a positive relationship.

index of    average no. of     rank of      no. of days     rank of
rat (i)     mistakes (x_i)     x_i (r_i)    needed (y_i)    y_i (s_i)    (r_i − s_i)²
 1          11                  1           10               3.5         6.25
 2          15                  3            7               2           1
 3          18                  4           10               3.5         0.25
 4          28                  8           21               8           0
 5          27                  7           16               6           1
 6          21                  5           16               6           1
 7          12                  2            5               1           1
 8          43                 10           24               9           1
 9          25                  6           16               6           0
10          39                  9           28              10           1

Table 8.7: Data for Example 8.15 [HAJ][page 244].
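The step carried out by hand in Example 8.15, namely assigning the average of the tied positions ("mid-ranks") to tied observations before applying formula (8.9), can be sketched in Python as follows (added for illustration; the helper name mid_ranks is not from the text).

```python
# A minimal sketch of Example 8.15: mid-ranks for ties, then formula (8.9).

def mid_ranks(values):
    """Rank the values, giving tied observations the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2      # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

x = [11, 15, 18, 28, 27, 21, 12, 43, 25, 39]   # average number of mistakes
y = [10, 7, 10, 21, 16, 16, 5, 24, 16, 28]     # number of days needed

r, s = mid_ranks(x), mid_ranks(y)
d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
n = len(x)
print(1 - 6 * d2 / (n * (n ** 2 - 1)))          # about 0.924
```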

Chapter 9

Hypothesis testing

Introduction

When examining a hypothesis or an assumption, we can decide about accepting or rejecting it on the basis of certain observations. Assume, for example, that we would like to decide whether a coin is fair. Let us flip the coin n times, and note the number of "heads". If the coin is fair, then in most cases the number of "heads" will be close to n/2. So, if we experience the latter case, then we should probably accept the hypothesis that the coin is fair, i.e., we will not reject it (the question of which of the two expressions is better to use here is conceptual). However, if the number of "heads" "significantly differs" from n/2, then we reject the hypothesis that the coin is fair. We can also think of hypotheses of a completely different nature. We may assume, for example, the guilt of a prisoner at the bar, a certain illness of a person, etc. The hypothesis which, as a null hypothesis, is always contrasted with some alternative hypothesis (see below), is often denoted by H0. The alternative hypothesis is usually denoted by H1. A given hypothesis H0 can be contrasted with different alternative hypotheses. In the simplest case H1 is the opposite of the statement involved in H0; then H1 is sometimes not even mentioned. If we can set up several alternative hypotheses H1, H1', H1'', . . . depending on the nature of the problem, then the pairs H0 vs. H1, H0 vs. H1', H0 vs. H1'', . . . impose different hypothesis tests. So, the choice of the alternative hypothesis depends on the given circumstances and on the statistical question raised. Since, as was said, the alternative hypothesis is not necessarily the negation of H0, in certain cases the rejection of H0 does not automatically mean that the alternative hypothesis can be accepted.

It should be noted that several basic questions arise in connection with the above concepts which we cannot discuss here (cf. the Neyman–Pearson approach, R. A. Fisher's conception of hypothesis testing, and their relationship).

On the basis of what has been said above, our decision can be based on a random observational result. In the decision process we can make two kinds of errors. It can happen that a true hypothesis (more precisely, a hypothesis of "true" logical value) is rejected. This is called a type I error (error of the first kind). In this case, provided that the alternative hypothesis is the negation of the null hypothesis, we at the same time – falsely – accept the alternative hypothesis as a consequence. In the other case H0 is false, yet we still accept it, falsely. The latter type of error is called a type II error (error of the second kind) (cf. Table 9.1).

              H0 is accepted     H0 is rejected
H0 is true    right decision     type I error
H0 is false   type II error      right decision

Table 9.1: For the explanation of type I and type II errors.

Some examples of null hypotheses H0 from the field of mathematical statistics:

(a) some parameter, e.g., the expected value m of a certain random variable, is equal to 0.5;

(b) for the previous parameter m: 3 < m < 6;

(c) for the coin flip, the probability p of "heads" is p = 0.5.

Examples for alternative hypotheses in the above case (a):

H1: m ≠ 0.5,   H1': m < 0.5,   H1'': m > 0.6.

Examples for the previous notions

Consider the hypothesis H0 concerning the fairness of a coin (P(heads) = 0.5) on the basis of 6 flips. The alternative hypothesis H1 is: "H0 is not true, i.e., the coin is unfair".

The latter alternative hypothesis is, somewhat loosely speaking, two-sided in the sense that there is no specification of the direction of any deviation from 0.5. Taking this into account, we perform the following two-sided test. We choose the decision strategy that if the observation with regard to the result of the flips, i.e., the number of "heads", falls into the interval [2, 4], then we accept the hypothesis; if, however, it falls outside of this interval, then we reject the hypothesis. So, the interval [2, 4] (together with its boundaries) is the region of acceptance, while the region of rejection, lying on the two sides of the acceptance region, is the union of the intervals [0, 1] and [5, 6]. This is why tests of this type are called two-sided tests. Now, if the hypothesis H0 is true, i.e., the coin is fair, then, in view of n = 6, p = 1/2, the result of the flips, i.e., the observation, falls into the interval [2, 4] with a probability of

$$P(2) + P(3) + P(4) = \binom{6}{2}\frac{1}{2^6} + \binom{6}{3}\frac{1}{2^6} + \binom{6}{4}\frac{1}{2^6} = \frac{50}{64} = 0.78125,$$

and falls outside of it with a probability of 0.21875. This means that in case the hypothesis H0 is true, we make a good decision with a probability of 0.78125, and we make a type I error with a probability of 0.21875. (It is worth examining the probability of the type I error also for the case where the coin is still fair, but the region of acceptance is the interval [3, 5].) The case where the hypothesis H0 is false is more complicated. The reason is that the extent of "falseness" plays an essential role in the probability of the type II error. For example, if the coin only slightly differs from the regular shape, then the observation falls into the region of acceptance with a probability of approximately 0.78125 again, which means that H0 will be accepted (even if it is false), and so we make a type II error, the probability of which is approximately 0.78125 (abnormally big). If by "falseness" we mean a kind of concrete deformity of the coin, and the alternative hypothesis is aimed at this, then in a given, concrete case we can give the concrete probability of the type II error. However, if the deformity is rather uncertain, then the probability of the type II error usually cannot be given. In connection with this, we remark that, due to the usually undetermined nature of the probability of the type II error, it would in fact be better to talk about not rejecting the hypothesis H0 rather than accepting it.
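The probabilities discussed above can be checked with the following short Python sketch (added for illustration); the value p = 0.6 used for the type II error is only one possible choice, corresponding to the manipulated coin mentioned later in this section as an alternative hypothesis.

```python
# A minimal sketch: acceptance region [2, 4] for the number of heads in 6 flips.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 6
accept = range(2, 5)                       # acceptance region [2, 4]

p_accept_fair = sum(binom_pmf(k, n, 0.5) for k in accept)
print(p_accept_fair)      # 0.78125 -> type I error = 1 - 0.78125 = 0.21875

p_accept_unfair = sum(binom_pmf(k, n, 0.6) for k in accept)
print(p_accept_unfair)    # about 0.726: probability of a type II error if p = 0.6
```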

A decision about the null hypothesis H0 is mostly made on the basis of some observation of a random variable Y as a sample statistic, in our case a test statistic. An example is the ratio of the results "heads" and "tails". Namely, if according to a previously fixed strategy the observed sample value y of Y falls into a region A, called the region of acceptance, then the hypothesis H0 will be accepted against H1. In the opposite case, where y falls outside the region of acceptance, into the so-called region of rejection or critical region, the hypothesis will be rejected. The region of acceptance is usually fixed in such a way that the probability α of the type I error is some previously fixed (small) value. The probability that our decision is right provided that H0 is right (!), i.e., the probability that we avoid the type I error, is obviously 1 − α in this case; this probability is called the significance level of the test. When carrying out a statistical test, this level 1 − α is to be fixed before the observation and the decision. In practice, the significance level 1 − α is usually 0.99 or 0.95, occasionally 0.90.

When defining the region of acceptance, we should, of course, take into account the concrete alternative hypothesis (see below).

Deciding about the type of the region of acceptance (and consequently the region of rejection, too) is usually a complicated question, and it can significantly depend on whether the alternative hypothesis H1 is one-sided or two-sided. All this will soon be illustrated with several examples.

In some cases it is hard, and indeed pointless, to insist on the question of one-sidedness or two-sidedness of a hypothesis. In contrast, for tests such a classification is generally straightforward. Namely, in case of a one-sided (one-tailed) test the region of rejection falls onto a single tail of the (discrete) probability distribution or density function. In case of two-sided (two-tailed) tests the region of rejection falls onto two "tails" of the probability distribution or density function.

In the case of a two-sided alternative hypothesis or two-sided test the region of acceptance is very often chosen as an interval (a_crit, b_crit) or [a_crit, b_crit], and the region of rejection is the set (−∞, a_crit] ∪ [b_crit, +∞) or (−∞, a_crit) ∪ (b_crit, +∞), cf. Fig. 9.1.

Here a_crit and b_crit are the critical values corresponding to the significance level of the test. If the significance level is 1 − α, then it is more expedient to use the notations a_crit,1 and b_crit,2, where α = α1 + α2. Then for the test statistic Y we have

$$P(a_{crit,1} < Y < b_{crit,2} \mid H_0) = 1-\alpha$$

and

$$P(Y < a_{crit,1} \mid H_0) + P(b_{crit,2} < Y \mid H_0) = \alpha_1 + \alpha_2 = \alpha.$$

Often α1 and α2 are chosen such that P(Y < a_crit,1 | H0) = P(b_crit,2 < Y | H0) = α/2 holds.

Figure 9.1: Critical values, region of acceptance and regions of rejection.

Note that the equality P(a_crit,1 < Y < b_crit,2 | H0) = 1 − α itself does not usually determine the values a_crit,1, b_crit,2 uniquely (cf. Fig. 9.2), unless there is some relationship between them.

Such a relationship does hold in some important cases. For example, if the probability density function is symmetric about zero, then (for a reason not discussed in detail here) the region of acceptance should be chosen as

$$\left(-a_{crit,\alpha/2},\; a_{crit,\alpha/2}\right).$$

All this is true for the t-distributions and the standard normal distribution, which are common distributions of test statistics in the case where H0 holds.
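As an illustration (not part of the original text), such symmetric critical values can be obtained, for instance, with scipy; the significance level 0.95 and the 9 degrees of freedom below are arbitrary choices.

```python
# A minimal sketch: symmetric two-sided critical values for a standard normal
# or t-distributed test statistic under H0.
from scipy.stats import norm, t

alpha = 0.05                              # significance level 1 - alpha = 0.95

b_crit = norm.ppf(1 - alpha / 2)          # upper critical value, about 1.96
print(-b_crit, b_crit)                    # region of acceptance (-1.96, 1.96)

b_crit_t = t.ppf(1 - alpha / 2, df=9)     # t-distribution with 9 degrees of freedom
print(-b_crit_t, b_crit_t)                # about (-2.262, 2.262)
```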

Instead of H0, we often indicate the concrete null hypothesis by its formula, for example m = m0 (see below); then, instead of the previous formula, we may write:

$$P(a_{crit} < Y < b_{crit} \mid m = m_0) = 1-\alpha.$$

Figure 9.2: Area under the graph of the probability density function f of the test statistic Y: two different pairs of critical values enclose the same area 1 − α, and thus correspond to the same significance level 1 − α.

In another typical case, that of one-sided tests, where the hypothesis is one-sided, the alternative hypothesis is usually also one-sided, and the region of acceptance is of the form (−∞, a_crit) or (a_crit, +∞) (cf. Fig. 9.3).

Figure 9.3: The region of acceptance (−∞, a_crit) and the region of rejection (a_crit, +∞) with the critical value a_crit in a one-sided test. If the significance level of the test is 1 − α, then for the test statistic Y we have P(Y < a_crit | H0) = 1 − α and P(a_crit < Y | H0) = α.

In the following, in most cases we will not write "crit" in the index, writing only a_α, etc.

Example for a one-sided test and a region of acceptance

In the previous example it is a natural alternative hypothesis that the coin, because of its shape, is not fair. Most frequently, we tacitly mean the hypothesis H1: "H0 is not true". Naturally, there are other possibilities as well. For example, for the coin flip an alternative hypothesis can be the following H1': "the coin has been manipulated so that the probability of the result 'heads' is 0.6", see the above example. Here we only remark that we have to choose a different decision strategy when we test H0 against H1 than when we test it against H1'; see later in this section in connection with one-sided and two-sided tests.

”p-value” test

In hypothesis tests, very often the basis of the decision strategy is the so-called p-value.

The latter is the probability that – provided the null hypothesis is true – the value of the test statistic, as a random variable, is not "more extreme" (e.g., not greater) than the value of the test statistic observed on the sample.

So,

$$p := P(\text{value of the test statistic} \le \text{observed value} \mid H_0 \text{ is true}).$$

For example, in case the region of rejection is of the type (−∞, critical value), the null hypothesis is rejected at the given significance level if the p-value does not exceed the probability α belonging to the critical value, and it is accepted otherwise. It is obvious that this decision strategy is in fact the same as the one introduced above; if anything, it is more "tempting" here to change the critical value or the significance level in order to justify some preconception. We dealt with the application of the p-value only because it is often mentioned.
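A minimal Python sketch (added for illustration) of this decision rule for the coin example, assuming a left-tailed region of rejection and an arbitrarily chosen observed value of 1 "heads" out of 6 flips:

```python
# p-value in the spirit of the definition above: P(X <= observed | H0), H0: p = 0.5.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, observed_heads = 6, 1
p_value = sum(binom_pmf(k, n, 0.5) for k in range(observed_heads + 1))
print(p_value)             # P(X <= 1 | H0) = 7/64, about 0.109

alpha = 0.05
print(p_value <= alpha)    # False: H0 is not rejected at the 0.95 significance level
```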

A hypothesis or alternative hypothesis is also commonly called one-sided or two-sided, see above, although these attributes are not as unambiguous as in the case of one-sided and two-sided tests. For example, the hypothesis H0: m1 ≤ a can be called one-sided, while the alternative hypothesis H1: m1 ≠ a two-sided (see also above).

Sometimes it happens that we apply a one-sided test or a one-sided critical region for a null hypothesis that is neither really one-sided, nor two-sided, see for example Section 9.2.1.

We can distinguish between simple and composite hypotheses. For example, H0: m1 = a is a simple hypothesis, while H0': m1 ≠ a is a composite hypothesis.

An example for the role of the alternative hypothesis

As we mentioned above, the alternative hypothesis can influence the choice between a one-sided and a two-sided test. To shed light on this question, we consider the following example.

On the individuals of a subspecies A one can observe the realization of a certain continuous quantitative character G with the probability density function gA shown in Fig. 9.4.

Figure 9.4: Illustration for the role of the alternative hypothesis in decision making. The observed value g in no way supports the validity of the density function gB against gA.

Let us make a decision, for the sake of simplicity, from the G-value (= g) of only one sample element about the hypothesis H0 that the sample is from subspecies A, contrasted with the alternative hypothesis H1 below. An important circumstance is that there is a decision constraint between H0 and H1 as well as between H0 and H1' (see below). In connection with this, let us choose the regions of rejection as (−∞, g_crit,1) and (g_crit,2, +∞). (The union of the two regions is a two-sided critical region, so the test is a two-sided test.) Assume that g falls into the region of rejection (−∞, g_crit,1).

Therefore we reject the hypothesis H0. Of course, in the present case we tacitly assumed that the alternative hypothesis H1 is as follows: "H0 does not hold (i.e., the sample is not from subspecies A)". However, if the alternative hypothesis is H1': "the sample originates from subspecies B", and the individuals of B have the density function gB with respect to the character G (see Fig. 9.4), then the concrete location of the point g in no way supports the rejection of H0 (against H1'), i.e., the acceptance of H1', since the position of g obviously supports H1' less than it supports H0.

As mentioned, we have no possibility here to touch upon all essential aspects of hypothesis testing. We only mention two questions [VIN][page 115].

If H0 is a composite hypothesis, for example m < m0 for the expected value m, then of course for each concrete partial hypothesis m = m′ (m′ < m0) the test statistic may have a different distribution. So, in case H0 holds, the critical region corresponding to the given significance level can be different in each case (see all this in connection with the concrete case of the one-sided t-test in Section 9.1.1).

Somewhat similarly, the probability of the type II error is unambiguous only if the alternative hypothesis H1 is simple.

9.1 Parametric statistical tests

In the following points we will first deal with so-called parametric tests, where the hypothesis refers to the parameters of a parametric distribution family, such as the mean or the standard deviation of the normal distribution, or, in other cases, the parameter of the exponential distribution.