
Tests of Significance

In document Statistical Analysis* BY (Pldal 52-61)

After the computations have been made it is necessary to interpret the statistics by use of appropriate tables, the most frequently used of which will be those of (1) the normal deviate, (2) "Student's" t, (3) chi square and (4) Snedecor's "F." The tables are to be used in testing the null hypothesis, as considered in section 2.3, as follows:

(a) a statistic, D, is computed from the sample, with the associated df;

(b) the appropriate table is entered with the df and the desired probability level, and the tabular value D0 is noted; then

(c) if D > D0, the null hypothesis is disproved, and the fact that D is as large as it is cannot be ascribed to sampling fluctuations.

Or, alternatively and equivalently:

(a) the appropriate table is entered with the df, and the probability is noted for which D0, the tabular value, is closest to D. For present purposes the following evaluations will be given (of course in a purely conventional manner), according to the range in which the probability (Prob) falls:

Prob > .05: no significance, i.e., the null hypothesis is not disproved and the statistic may be as great as it is merely as a result of sampling fluctuations;

.05 > Prob > .01: almost significant or indecisive, indicated by (√);

.01 > Prob > .001: significant, indicated by (*);

.001 > Prob: highly significant, indicated by (**).
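The conventional evaluation above can be written as a small lookup. This is only a sketch: the function name `evaluate` is hypothetical, and assigning a probability that falls exactly on a boundary (.05, .01, .001) to the more significant class is an assumption, since the text does not settle that case.

```python
def evaluate(prob: float) -> str:
    """Map a tabular probability to the conventional verbal evaluation.

    Boundary values (.05, .01, .001) go to the more significant class;
    this is an assumption, not a rule stated in the text.
    """
    if prob > 0.05:
        return "no significance"
    if prob > 0.01:
        return "almost significant or indecisive"
    if prob > 0.001:
        return "significant (*)"
    return "highly significant (**)"


for p in (0.20, 0.03, 0.005, 0.0004):
    print(p, evaluate(p))
```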

By "significance" is understood that the null hypothesis is disproved, i.e., the statistic is as great as it is not merely as the result of sampling fluctuations from a certain known or assumed parent population. The precise application of significance tests may involve a high degree of subtlety, but the procedures to be indicated are considered adequate for spectrographic purposes.

The general test of significance involves a test ratio, T, defined as follows:

$$T = \frac{S - h}{\sigma_S}$$

where S is the sample value of a statistic, h is the parameter value by hypothesis and σ_S is the standard deviation of the sampling distribution of the statistic. Often the significance of the difference between statistics estimated from two samples is desired, in which case the general formula becomes

$$T = \frac{(S_a - S_b) - (h_a - h_b)}{\sigma(S_a - S_b)} = \frac{d}{\sigma_d}$$

where S_a - S_b = d is the difference between the statistics (usually the difference between sample means) and σ(S_a - S_b) = σ_d is the standard deviation of the difference; the hypothetical difference, or difference between population parameters, is zero, since that is the real meaning of the significance test. When the number of observations, or samples, is large enough, the distribution of the statistics to be used in this discussion may be assumed to be normal. However, when the number of observations is small, tables specially computed for the relevant sampling distributions must be consulted.

1.8.1. The Normal Deviate. This is the ratio of the deviation of the sample value (mean) from the hypothetical mean of the normal population, or assumed true value, to the standard deviation of the sample mean. It has been remarked that for spectrographic data it may be assumed that sample means are normally distributed, with standard deviation σ0/√n, where n is the size of the sample and σ0 the standard deviation of an individual measurement. The value of σ0 is assumed known or derived from a sufficiently extended sequence of observations, so that it can be considered as derived from a large sample.
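A minimal sketch of the normal deviate with σ0 treated as known. The function name and the numbers (a mean of 10.2 from 16 determinations on a standard whose true value is 10.0, with σ0 = 0.4) are hypothetical.

```python
import math


def normal_deviate(sample_mean, true_value, sigma0, n):
    """c = (sample mean - hypothetical mean) / (sigma0 / sqrt(n))."""
    return (sample_mean - true_value) / (sigma0 / math.sqrt(n))


c = normal_deviate(10.2, 10.0, 0.4, 16)
# |c| > 1.96 indicates significance at the .05 level (numerical difference).
print(round(c, 3), abs(c) > 1.96)
```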

1.8.2. "Student's" t. When the sample available for determining the standard deviation is small (i.e., the standard deviation of the mean must itself be estimated from a small sample), the ratio of the deviation of the mean to this estimated standard deviation is not normally distributed, and consequently another table, namely that of Student's t, has to be consulted. The df associated with t is that of the standard deviation, or variance, involved, i.e., n - 1. When n is more than 30, t may be treated as a normal deviate without serious inaccuracy.
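For a small sample, the standard deviation is estimated from the sample itself (divisor n - 1) and the ratio is referred to the t table with n - 1 df. The sketch below uses Python's `statistics` module; the function name and data are hypothetical, and the dictionary holds the standard two-sided .05 points of t for 1 to 10 df.

```python
import math
import statistics

# Two-sided .05 points of Student's t for 1..10 df (standard tabular values).
T_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def student_t(values, true_value):
    """Return (t, df) for the mean of a small sample; sigma is estimated from it."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    t = (statistics.mean(values) - true_value) / (s / math.sqrt(n))
    return t, n - 1


t, df = student_t([1.0, 2.0, 3.0, 4.0, 5.0], 2.0)
print(round(t, 3), df, abs(t) > T_05[df])  # 1.414 4 False
```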

1.8.3. Chi-Square, χ². The sample is divided into a number of categories or classes, and the frequency of occurrence of the sample values within each class is compared to the theoretical frequency. The exact form of the ratio is as follows:

$$\chi^2 = \sum \frac{(f_s - f_h)^2}{f_h}$$

where f_s is the sample frequency, f_h is the expected or hypothetical frequency, and the summation is over n classes or categories. As may be expected, the df of χ² is the number of classes minus the number of constraints. Thus, if the entire sample is divided into a set of independent, mutually exclusive classes, the only constraint is the total number of observations and the df is n - 1. When the number of df, ν, is more than 30, √(2χ²) - √(2ν - 1) may be treated as a normal deviate (that is, normally distributed about a mean equal to zero with standard deviation equal to 1).
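The χ² ratio and the large-df normal approximation can be sketched in a few lines. The function names and the observed and expected counts are hypothetical; the approximation is the one stated above, √(2χ²) - √(2ν - 1).

```python
import math


def chi_square(observed, expected):
    """chi^2 = sum of (f_s - f_h)**2 / f_h over the classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((fs - fh) ** 2 / fh for fs, fh in zip(observed, expected))


def fisher_normal_deviate(chi2, df):
    """sqrt(2*chi2) - sqrt(2*df - 1): roughly a unit normal deviate for df > 30."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)


print(chi_square([55, 15, 20, 10], [50, 20, 20, 10]))  # 1.75, for 3 df
print(round(fisher_normal_deviate(40.0, 30), 3))
```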

The χ² test may be used in many forms by suitable transformations. However, for present purposes two applications will be important, namely (1) to test the agreement of the errors of spectrographic data with the normal distribution law and (2) to test the data for "homogeneity of variance" as a necessary condition for the application of the "F" test.

The use of χ² as a significance test should be limited to cases in which enough observations are available. For best results, the number of categories should be about 30, with the expected frequency not less than 10 in each category. In any event one should not use this test for fewer than 100 observations in total, or with classes in which the expected occurrence is less than 10.
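The two firm limits above (at least 100 observations in total, expected frequency at least 10 in every class) can be checked before a χ² is computed. A minimal sketch; the function name is hypothetical, and the softer "about 30 categories" recommendation is deliberately not enforced.

```python
def chi_square_adequate(expected, min_total=100, min_expected=10):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    total = sum(expected)
    if total < min_total:
        problems.append(f"only {total} observations in total")
    small = [i for i, fh in enumerate(expected) if fh < min_expected]
    if small:
        problems.append(f"expected frequency below {min_expected} in classes {small}")
    return problems


print(chi_square_adequate([182, 26, 26, 26]))  # []
print(chi_square_adequate([35, 5, 5, 5]))
```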

1.8.4. "F," "Z," or Variance Ratio Test. This is merely the ratio of two variance mean squares as displayed in column 3 of either Table II or Table V. The ratio is always taken so that it is greater than 1, that is, the larger variance mean square is always in the numerator. There are two df values associated with this test: n1 = the df of the numerator, i.e., the larger variance mean square, and n2 = the df of the denominator.

These df are shown in the second column of the analysis of variance Tables II or V, on the same line as the mean squares. There are two forms of tables for using this ratio. The F or Snedecor "F" table exhibits the ratio directly, while the "Z" or Fisher "Z" table uses the transformation F = e^(2Z), or Z = ½ ln F. Hence if only a Z table is available, one must enter it by taking one half the natural logarithm of the ratio.
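Entering a Z table from an observed F, and returning to F, is a one-line transformation each way. A sketch with hypothetical function names, using the value F(1, 10) = 4.96 quoted later in this section.

```python
import math


def z_from_f(f):
    """Fisher's Z = (1/2) ln F, for entering a Z table."""
    return 0.5 * math.log(f)


def f_from_z(z):
    """Inverse transformation F = exp(2Z)."""
    return math.exp(2 * z)


z = z_from_f(4.96)
print(round(z, 4), round(f_from_z(z), 2))
```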

Referring to the published significance tables, their forms may be described as follows.

The c or normal deviate table is a one-way table; that is, under the heading of probability is given the value of c. Thus at Prob .05, c = 1.96, or approximately 2.

The t and χ² tables are similarly constructed and are two-way tables. The top row or column headings are the probability values, while the left margin or column lists, in increasing order, the values of the df, sometimes indicated by N. Sometimes the bottom row of the t table is written for ∞ df, that is, it gives the values of c.

The F or Z tables are three-way tables, i.e., they must be entered by use of three numbers, n1, n2 (or N1, N2) and the probability or significance level; hence the tables cannot be displayed in their entirety in one listing.

Snedecor's "F" table usually displays n1 or N1 (the df of the larger variance mean square) in the top row or column headings, and the df of the smaller variance mean square, n2 or N2, in the left margin. The roman type gives the F values at Prob = .05 (or 5%) and the bold-face (dark) type those at Prob = .01 (or 1%). The Z tables are usually displayed as separate tables, each headed by the probability (.20, .05, .01 and .001 are common values), with N1 in the top row and N2 in the left margin.

A difficulty may arise in the use of the t or c tables, in that the indicated probability in one set may be twice that of another. This is accounted for as follows: if one considers the algebraic deviate, i.e., the difference between the sample mean and the true mean with proper algebraic sign, the value of the probability is one half of that obtained when only the numerical difference is considered. As a guide, when the numerical difference is considered, the value of c, or of t for ∞ df, is approximately 2 (1.96) at Prob .05 (c = t∞ ≈ 2 for Prob 1/20).
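The factor of two can be checked numerically with the unit normal distribution from Python's standard library (`statistics.NormalDist`). The function names are hypothetical.

```python
from statistics import NormalDist

unit = NormalDist()  # mean 0, standard deviation 1


def prob_numerical(c):
    """Probability that |deviate| >= |c| (numerical difference, both tails)."""
    return 2 * (1 - unit.cdf(abs(c)))


def prob_algebraic(c):
    """Probability that the signed deviate >= c (algebraic deviate, one tail)."""
    return 1 - unit.cdf(c)


print(round(prob_numerical(1.96), 4), round(prob_algebraic(1.96), 4))  # 0.05 0.025
```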

NOTE: The four tests, c, t, χ² and F, were developed for different purposes and at different times, and it is not surprising that their interrelations have tended to be obscured. They are essentially ratios of two variances, or standard deviations, although these ratios may be subject to transformations. The relations of c, t, and χ² may be easily explained in relation to F. As has been noted, F is a three-dimensional table involving N1, N2 and P (= probability). t² is a variance ratio whose numerator always has 1 df, i.e., N1 = 1. χ², divided by its df, is a variance ratio whose denominator is fixed by hypothesis, which is equivalent to its having ∞ df, i.e., N2 = ∞. c has a numerator with 1 df and a denominator fixed by hypothesis, i.e., df = ∞. Hence c is a special case of both t and χ².

For example:

The value of F for N1 = 1, N2 = 10, P = .05 is 4.96, and the value of t for N = 10, P = .05 is 2.228; hence t² = 2.228 × 2.228 = 4.96 = F(1, 10).

The value of F for N1 = 10, N2 = ∞, P = .05 is 1.83, and the corresponding χ²/10 is 18.31/10 = 1.83. The relation between t and c has already been stated, namely c = t∞.
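The numerical relations in the NOTE can be verified in a few lines. The tabular values are those quoted above; the unit normal quantile comes from Python's `statistics.NormalDist`. The last line checks c² against the .05 point of χ² for 1 df (3.84), a standard tabular value not given in the text.

```python
from statistics import NormalDist

t_10 = 2.228      # t, 10 df, P = .05 (from the text)
f_1_10 = 4.96     # F(1, 10), P = .05 (from the text)
chi2_10 = 18.31   # chi-square, 10 df, P = .05 (from the text)
f_10_inf = 1.83   # F(10, infinity), P = .05 (from the text)

print(round(t_10 ** 2, 2), f_1_10)        # t squared equals F(1, 10)
print(round(chi2_10 / 10, 2), f_10_inf)   # chi-square / df equals F(10, infinity)

c = NormalDist().inv_cdf(0.975)           # c = t for infinite df
print(round(c, 2), round(c ** 2, 2))      # 1.96 3.84
```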

1.9. Interpretation of the Analysis of Variance

The tests of significance will now be applied to the various parts of the analysis of variance as displayed in Tables II and V.

1.9.1. Total Mean Square. The total mean square (V_y), or the mean square reduced for regression of y on x (S²_y.x), is to be considered as an estimate of the variance of the population of deviations or errors in the spectrographic data. The corresponding standard deviations will be indicated by σ_y (standard error) and S_y.x or σ_y.x (standard error of estimate). It will be assumed that the number of observations in the total, or total df, is large, at least 100.

Normality of the Population. While strict normality of the error population is not necessary for the application of the significance tests, it is advisable to investigate whether the population is approximately normal. It is known that χ² is additive (the sum of independent χ² values is itself distributed as χ² with the sum of their df) and that it is more indicative when larger numbers are available. Consider the experiment to determine an analytical curve, and compute S_y.x. The observations may be classified in mutually exclusive sets in many ways, but since the connotation of the random variate is that of an error, and since one is normally interested in the frequency of the larger errors, the following division is suggested. Compute the significant differences, d_i = c_i S_y.x, choosing the values of c_i:

Per cent included   70      80      90      Remainder
c_i                 1.04    1.28    1.64

That is, the deviations of the observed content of the standard samples from the known values are listed and classified according to the value of c_i; i.e., the first cell contains all values lying within (x1 ± 1.04 S_y.x), (x2 ± 1.04 S_y.x), ..., (xn ± 1.04 S_y.x), and so on for the other values of c_i. The theoretical frequencies follow from the top line: 70% of the observations in the first cell and 10% in each of the other three. Table VIII illustrates the results obtained in the analysis of a certain type of stainless steel for the eight listed elements; χ² is computed for each element according to the formula and the values are summed.
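The classification into cells can be sketched as follows: each absolute deviation is placed in the first cell whose limit c_i · S_y.x it does not exceed, and anything beyond 1.64 S_y.x goes to the remainder. The function names and the sample deviations are hypothetical; counting a deviation that falls exactly on a limit as "within" that cell is an assumption.

```python
import bisect

C_LIMITS = (1.04, 1.28, 1.64)               # c_i from the table above
THEOR_FRACTIONS = (0.70, 0.10, 0.10, 0.10)  # 70%, then 10% per further cell


def classify(deviations, s_yx):
    """Count |deviation| per cell bounded by c_i * S_y.x; last cell = remainder."""
    limits = [c * s_yx for c in C_LIMITS]
    counts = [0] * (len(limits) + 1)
    for dev in deviations:
        # bisect_left puts a value equal to a limit inside that cell.
        counts[bisect.bisect_left(limits, abs(dev))] += 1
    return counts


def theoretical(n):
    """Theoretical frequencies for n observations."""
    return [frac * n for frac in THEOR_FRACTIONS]


print(classify([0.5, -1.1, 1.2, -1.5, 2.0], s_yx=1.0))  # [1, 2, 1, 1]
print(theoretical(260))
```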

If one considers each row as a sample from an infinite population, 1 df is lost for the finite sample (leaving 3 df per row) and the sum of χ² is 16.3 for 24 df, which is quite non-significant (probability about 80%). If, on the other hand, each row were to be considered as drawn from a restricted universe with a standard deviation equal to the sample value (the mean or true value being fixed), 2 df are lost in each row, and there are 16 available df. The χ² is still non-significant, for χ² = 16 for 16 df has a significance level of about 40%.

One may treat the frequencies almost as a variance table, sum each column, and compute a pooled χ². This is 5.8, which is non-significant for 2 or 3 df.

The conclusion is that the errors may be considered as being normally distributed.

TABLE VIII

Chi Square Test for Normality of Error Distribution, Stainless Steel

                  70%            80%            90%            Remainder
Element  Total    Theor.  Obs.   Theor.  Obs.   Theor.  Obs.   Theor.  Obs.    χ²
V         260      182    190     26      21     26      23     26      26     1.659
Cu        500      350    355     50      50     50      40     50      55     2.571
Mo        200      140    145     20      15     20      18     20      22     1.828
Sn        260      182    174     26      23     26      30     26      33     3.199
Pb        110       77     83     11       7     11      11     11       9     2.286
Cr        240      168    177     24      19     24      24     24      20     2.190
Mn        200      140    148     20      16     20      20     20      16     2.057
Ni        240      168    173     24      22     24      22     24      23      .524
Sum               1407   1445    201     173    201     188    201     204    16.314
Pooled χ² (from the column sums): 5.8

Example (V):

$$\chi^2_{\mathrm{V}} = \frac{(190-182)^2}{182} + \frac{(26-21)^2}{26} + \frac{(26-23)^2}{26} + \frac{(26-26)^2}{26} = \frac{64}{182} + \frac{25}{26} + \frac{9}{26} + 0 = 1.659$$
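The row values, their sum, and the pooled χ² of Table VIII can be recomputed from the observed counts alone, with the theoretical counts taken as 70%, 10%, 10%, 10% of each row total. This is a sketch using only the table's data; recomputing the Sn row gives 3.198 against the printed 3.199, a last-digit difference presumably due to rounding.

```python
# Observed counts per cell (70%, 80%, 90%, remainder) from Table VIII.
THEOR_FRACTIONS = (0.70, 0.10, 0.10, 0.10)
OBSERVED = {
    "V": (190, 21, 23, 26), "Cu": (355, 50, 40, 55),
    "Mo": (145, 15, 18, 22), "Sn": (174, 23, 30, 33),
    "Pb": (83, 7, 11, 9), "Cr": (177, 19, 24, 20),
    "Mn": (148, 16, 20, 16), "Ni": (173, 22, 22, 23),
}


def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


total = 0.0
for element, obs in OBSERVED.items():
    expected = [f * sum(obs) for f in THEOR_FRACTIONS]
    chi2 = chi_square(obs, expected)
    total += chi2
    print(f"{element:2s} {chi2:6.3f}")
print(f"sum {total:6.3f} for {len(OBSERVED) * 3} df")

# Pooled chi-square from the column sums, as in the last line of the table.
col_obs = [sum(col) for col in zip(*OBSERVED.values())]
col_exp = [f * sum(col_obs) for f in THEOR_FRACTIONS]
print(f"pooled {chi_square(col_obs, col_exp):.1f}")
```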

1.9.2. Comparison of Mean Squares. The total mean square, as has been stated, is to be considered as an estimate of the variance of the parent population of errors. It may be desired to compare the total mean squares, or the square root of that quantity, called the total standard error, as they are obtained from repetitions of experiments similar to those analyzed in Table II or Table V.

Let σ0² represent the mean square, determined from an extensive series of observations, that is to be used as the reference value, and let σ1² represent the mean square as determined from a new experiment with n1 df. Then it may be concluded that σ1²/σ0² is significantly < 1 whenever

$$\frac{\sigma_1^2}{\sigma_0^2} < \frac{\chi^2_{1-\alpha}(n_1)}{n_1},$$

where χ²_P(n1) denotes the value of χ² for n1 df that is exceeded with probability P, and α is the chosen level of significance; conversely, σ1²/σ0² is significantly > 1 when

$$\frac{\sigma_1^2}{\sigma_0^2} > \frac{\chi^2_{\alpha}(n_1)}{n_1}.$$

This test is equivalent to treating σ1²/σ0² as F(n1, n0) at the desired level, with n0 effectively infinite since σ0² comes from an extensive series, and determining the significance of F for (n1, n0) df from the F table. A more critical discussion of this test will be found in the references.
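A sketch of the comparison with a reference mean square. The χ² points are approximated with the Wilson-Hilferty formula rather than read from a table; that substitution, and the function names, are assumptions, so use a printed table or a statistics library where exact values matter. For 10 df the approximation gives about 18.29 against the tabular 18.31.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-square value with lower-tail probability p.

    Wilson-Hilferty approximation (an assumption here, not a table lookup).
    """
    z = NormalDist().inv_cdf(p)
    k = 2.0 / (9.0 * df)
    return df * (1.0 - k + z * math.sqrt(k)) ** 3


def compare_to_reference(var_new, var_ref, df_new, alpha=0.05):
    """One-sided tests of sigma1^2 / sigma0^2 against chi-square / df."""
    ratio = var_new / var_ref
    if ratio < chi2_quantile(alpha, df_new) / df_new:
        return "significantly < 1"
    if ratio > chi2_quantile(1.0 - alpha, df_new) / df_new:
        return "significantly > 1"
    return "not significant"


print(round(chi2_quantile(0.95, 10), 2))   # close to the tabular 18.31
print(compare_to_reference(2.0, 1.0, 10))  # significantly > 1
```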

1.9.3. Homogeneity of Variance. Homogeneity of variance is a mathematically necessary condition for the validity of the F test. This may be explained as follows: the F, or variance ratio, test is designed to test the significance of the differences between the mean values within an effect. Referring to the experiment P4Q3P2S6, Table II, it is assumed that the mean values of P1, P2, P3, P4, for example, are not identical, and the significance of the differences between the means is estimated by the F test; in a similar fashion one may test the significance of the differences between the Q1, Q2, Q3 means, or between the twelve P × Q means, etc.

However, the F test is valid only if the variances within P1, P2, P3, P4, or within the Qj, or the PiQj, are not significantly different, that is, are statistically identical.

An entirely satisfactory homogeneity-of-variance test has not yet been devised. The usual test is known as Bartlett's test, which considers that the natural logarithm of the ratio of the arithmetic mean to the geometric mean of the variances within classes, multiplied by the total within-class df, is distributed approximately as χ².

The precise formulation is as follows: let m = the number of mutually exclusive classifications of the data (4 P, 3 Q, 12 P × Q, or 6 S, for example), f = the df within each class (assumed equal for every P, Q, P × Q, or S), (SS) = the set of sums of squares within the selected classes, and (V) = the set of mean squares within the selected classes. Then Bartlett's test states that

$$\chi^2(m-1) = 2.3026\, f m \log_{10}\frac{\mathrm{arithmetic\ mean\ of\ }(SS)}{\mathrm{geometric\ mean\ of\ }(SS)} = 2.3026\, f m \log_{10}\frac{\mathrm{arithmetic\ mean\ of\ }(V)}{\mathrm{geometric\ mean\ of\ }(V)}$$

For computational purposes it is perhaps simpler to determine

$$V_0 = \mathrm{arithmetic\ mean\ of\ }(V) = \frac{\Sigma SS}{\Sigma f}$$

and

$$\chi^2(m-1) = 2.3026\left[(\log_{10} V_0)\,\Sigma f - \Sigma (f \log_{10} V)\right]$$

where χ²(m - 1) is χ² for (m - 1) degrees of freedom.
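The computational form of Bartlett's test can be sketched directly, for m mean squares that each carry f df. As in the text, the usual correction divisor 1 + (m + 1)/(3mf) for equal df is omitted; the function name is hypothetical. A zero mean square makes the geometric mean zero, so the sketch returns infinity in that case, which is the difficulty discussed below.

```python
import math


def bartlett_chi2(mean_squares, f):
    """chi2(m - 1) = 2.3026 [ (log10 V0) * sum(f) - sum(f * log10 V) ].

    V0 is the arithmetic mean of the mean squares, each with f df.
    The correction divisor 1 + (m + 1) / (3 m f) is omitted, as in the text.
    """
    m = len(mean_squares)
    if any(v == 0 for v in mean_squares):
        return math.inf  # zero variance: geometric mean is zero
    v0 = sum(mean_squares) / m
    return 2.3026 * (math.log10(v0) * m * f
                     - sum(f * math.log10(v) for v in mean_squares))


print(round(bartlett_chi2([1.0, 4.0], 10), 3))  # for m - 1 = 1 df
print(bartlett_chi2([1.0, 0.0, 2.0], 4))        # inf
```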

The selection of a suitable division into classes cannot be decided once for all experiments. It is not advisable to choose classes with only a few observations within them. If regression is not to be considered, P (plates) may be a suitable classification. If regression is important, as in determining a working curve, the group of standards should be a suitable classification if, and this is important, many standards are used.

When only a few standards, say three or four, are used, the test is apt to be misleading.

This test may be unsatisfactory in its application to spectrographic data for two reasons:

(1) If the variances are not proved homogeneous, there is no satisfactory alternative. The best one can do is to consider the rational subgroups that have physical interest and try to decide just what has happened.

(2) While in many other applications Bartlett's test may be disturbed by some class having an unduly large variance, thus increasing the value of the arithmetic mean, the writer has found that spectrographic data almost always present a difficulty of the opposite sort; that is, some class, whether it is a plate or a group of standards, almost invariably has a very low variance; for example, the points may lie too close to the analytical curve. This will make the geometric mean very low and, consequently, χ² much too high. Thus if only three standards are used, it is not unusual to find in an extended investigation that one group of three (and it does not take more than one) lies exactly on the analytical curve. Consequently, that set or class has zero variance, the geometric mean of the variances of the entire experiment is zero, and χ² becomes infinite. In this case, or others approximately similar, the spectrographer must use his judgment whether to discard the offending set or the entire test.

1.9.4. "F" Test (Table V). The use of this test is really the purpose behind the construction of Table V, and consequently it will be considered at some length.

The ratio of two independently distributed variances,

$$F = \frac{\mathrm{larger\ mean\ square}}{\mathrm{smaller\ mean\ square}},$$

is called the F ratio or F value, and tables have been computed relating the F value, the df of the mean squares involved, and a probability.
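Forming the ratio as prescribed, with the larger mean square always in the numerator and its df carried along as n1, can be sketched as follows. The function name and the mean squares are hypothetical; the resulting F is then looked up in the F table at (n1, n2) df.

```python
def f_ratio(ms_a, df_a, ms_b, df_b):
    """Return (F, n1, n2) with the larger mean square in the numerator."""
    if ms_a >= ms_b:
        return ms_a / ms_b, df_a, df_b
    return ms_b / ms_a, df_b, df_a


# Hypothetical mean squares: Q means (6 df) and P means (3 df).
F, n1, n2 = f_ratio(1.5, 6, 3.0, 3)
print(F, n1, n2)  # 2.0 3 6
```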

The meaning of these tables involves the null hypothesis; that is, it is assumed that there is no reason for the F ratio being as large as it is other than chance, or sampling fluctuations. If the probability is high the null hypothesis is tenable; otherwise, if the probability is low, the null hypothesis must be taken as unwarranted. As an illustration, let it be assumed that a comparison is made between P means and Q means (any other two terms could have been selected) and that the mean square of P means is the greater. The question is whether the contribution of P means (i.e., the differences between the mean values of the plates) to the total variance is really greater than that of Q means, i.e., is actually based on some physical difference in the effect of P means compared to Q means, or whether the higher value of the mean square of P means may be ascribed to a random effect, or sampling fluctuations, in view of the available df. The null hypothesis assumes the difference is random, and the table furnishes the probability that a value of F at least as large as the one obtained will occur by chance. It may be considered unfortunate that the hypothesis is a negative one, that is, that there is no physical significance, and consequently the answer does not have the force it would have if a positive hypothesis were made. However, on the basis of the null hypothesis, if the probability is high, say .20, one may assume that the differences between the mean values of P means, compared to the differences between the mean values of Q means, even though greater, have one chance in five of being a random effect. If, on the other hand, the probability is low, say .001, meaning that the chance of the F ratio being as high as it is merely as a consequence of sampling fluctuation is only 1 in 1000, one may reasonably conclude that there is a physical significance to the differences between the mean values of Pi means compared to the differences between the mean values of Qj means.

It is a clear logical step from a significant difference between the mean values of Pi means to a significant contribution of the effect to the total variance.

It should be noticed that significance has been determined only as a mutual relationship between effects, or comparisons as they may now be called. However, almost any practical investigation involves more than two comparisons, and it is desirable to establish one of these as a measure or reference for the comparison of all the others. In some instances there seems to be a logical choice. For example, if the investigation involved replicates (R), then it may appear reasonable to consider the R effect as a non-reducible source of error, and to pool it together with its interactions into a ΣR term and call it the error term. When such a comparison is not available, it is customary to pool all the higher-order interactions into an error term (considering interaction as a discrepancy), particularly if no clear physical meaning can be found for these terms.
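Pooling terms into one error term amounts to adding their sums of squares and their df, then dividing. A minimal sketch with a hypothetical function name and made-up (SS, df) pairs for an R effect and two of its interactions; the other mean squares would then be compared against the pooled error mean square with the F test.

```python
def pool(terms):
    """Pool (sum of squares, df) pairs into one error mean square and its df."""
    ss = sum(s for s, _ in terms)
    df = sum(d for _, d in terms)
    return ss / df, df


# Hypothetical (SS, df) for R and two of its interactions.
error_ms, error_df = pool([(2.0, 1), (3.0, 3), (5.0, 6)])
print(error_ms, error_df)  # 1.0 10
```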

If such a logical choice is not available, then each investigation must be considered independently to see whether it is reasonable to select some other common comparison. If such a choice is still not available, then only mutual comparisons are possible. In that case the tabular probability should be doubled, since presumably either of the terms may have been the larger. Some analysts pool all non-significant terms into a common error term, but the logical justification for the procedure is not too evident.
