
In document Statistical Analysis* BY (Pages 30-42)

1.5. Analysis of Variance. Computation

1.5.2. Complete Balanced Experiment for Means. Consider an experiment of the form $P_4Q_3R_2S_6$, where the following meanings may be attached:

P = photographic plates; four levels, or four plates used

Q = excitation condition; three levels

R = repetition in immediate succession; two levels

S = specimen; six levels

The dependent variable, or observation, is taken to be the Δ log I of a line pair corresponding to an element common to all samples and a line of the matrix element; 36 spectra may be placed on a plate by appropriate masking.

The connotations peculiar to the factors are different. Plates, P, may be considered the block unit. Evidently plates may be used only once, and for present purposes are considered indistinguishable except for a serial-like designation or dictionary ordering. Q, the excitation conditions, may or may not have a quantitative graduation; thus the differences between $Q_1$, $Q_2$, $Q_3$ may be those of known secondary voltage, capacitance, or even exposure times. R, the repetition factor, is assumed indistinguishable between $R_1$ and $R_2$; that is, $R_1$ on $P_1$ does not in any peculiar way correspond to $R_1$ on $P_2$ rather than to $R_2$ on $P_2$. The sample factor, S, is not taken to indicate a different composition between the levels but some other characteristic, electrode size for example. The influence of differences in composition will be treated later under regression.

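The set of data, i.e., the set of Δ log I observations or measurements, is arranged as illustrated in Table I.

TABLE I

     |        P1         |        P2         |        P3         |        P4         |
     | Q1    Q2    Q3    | Q1    Q2    Q3    | Q1    Q2    Q3    | Q1    Q2    Q3    |
     | R1 R2 R1 R2 R1 R2 | R1 R2 R1 R2 R1 R2 | R1 R2 R1 R2 R1 R2 | R1 R2 R1 R2 R1 R2 |
S1   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |
S2   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |
S3   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |
S4   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |
S5   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |
S6   | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  | -  -  -  -  -  -  |

One proceeds to compute various sums of squares of deviations from the mean. The computations in general are apt to be laborious unless performed in a methodical manner.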

First, a general simplification in the computation will be indicated. For any set of N numbers, the sum of squares (of deviations), i.e., $\sum_{i=1}^{N} (Y_i - \bar{y})^2$, summed over all values of i, will be computed by means of the formula

$$\sum_{i=1}^{N} (Y_i - \bar{y})^2 = \sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N},$$

an algebraic identity that may be easily proved:

$$\sum_{i=1}^{N} (Y_i - \bar{y})^2 = \sum_{i=1}^{N} Y_i^2 - 2\bar{y} \sum_{i=1}^{N} Y_i + N\bar{y}^2 .$$
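The identity can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the readings in `sample` are made-up values used only for illustration.

```python
# Compare the direct sum of squared deviations with the shortcut
# sum(Y^2) - (sum Y)^2 / N on a small, made-up set of numbers.

def sum_sq_direct(values):
    """Sum of squared deviations from the mean, computed term by term."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sum_sq_shortcut(values):
    """Same quantity from the running totals sum(Y) and sum(Y^2)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return total_sq - total * total / n


sample = [0.12, 0.15, 0.11, 0.18, 0.14]  # hypothetical readings
direct = sum_sq_direct(sample)
shortcut = sum_sq_shortcut(sample)
print(direct, shortcut)
```

The shortcut needs only the two running totals, which is what makes it convenient on a machine that accumulates the sum and the sum of squares at the same time.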

The latter form is easily computed on modern computing machines with automatic multiplication, since in the process of squaring, the sum of the $Y_i$ appears on one dial and the sum of squares, $\sum Y_i^2$, on another. The following symbols will be used hereinafter: $\Sigma P_1$ will mean the sum of all observations on $P_1$; $\Sigma P_1Q_1$ means the sum of all measurements on $P_1Q_1$, etc.

There are 4 × 3 × 2 × 6 = 144 observations in all, i.e., N = 144.

Compute $(\Sigma Y)^2/144$; this quantity will be called the correction factor for the mean, or simply the correction factor, and will be indicated by the symbol cf. The following computations are now made, where $\Sigma^2 P = (\Sigma P)^2$:

(a) (Total) sum of squares: $\sum_{1}^{144} Y^2 - cf$

(b) (P) sum of squares: $\dfrac{\Sigma^2 P_1 + \Sigma^2 P_2 + \Sigma^2 P_3 + \Sigma^2 P_4}{36} - cf$

(c) (Q) sum of squares: $\dfrac{\Sigma^2 Q_1 + \Sigma^2 Q_2 + \Sigma^2 Q_3}{48} - cf$

(d) (P × Q) sum of squares: $\dfrac{\Sigma^2 P_1Q_1 + \cdots + \Sigma^2 P_3Q_2 + \cdots + \Sigma^2 P_4Q_3}{12} - cf$ − (P) sum of squares − (Q) sum of squares

(e) (S) sum of squares: $\dfrac{\Sigma^2 S_1 + \cdots + \Sigma^2 S_6}{24} - cf$

(f) (P × S) sum of squares: $\dfrac{\Sigma^2 P_1S_1 + \cdots + \Sigma^2 P_4S_6}{6} - cf$ − (P) sum of squares − (S) sum of squares

(g) (Q × S) sum of squares: $\dfrac{\Sigma^2 Q_1S_1 + \cdots + \Sigma^2 Q_3S_6}{8} - cf$ − (Q) sum of squares − (S) sum of squares

(h) (P × Q × S) sum of squares: $\dfrac{\Sigma^2 P_1Q_1S_1 + \cdots + \Sigma^2 P_4Q_3S_6}{2} - cf$ − (P) sum of squares − (Q) sum of squares − (S) sum of squares − (P × Q) sum of squares − (P × S) sum of squares − (Q × S) sum of squares

(i) (ΣR) sum of squares: Remainder, i.e., the difference between the total sum of squares and all the sums previously computed.

It should be noted that the divisor, i.e., the number in the denominator of the above fractions, is the number of observations used in the summation of each of the terms in the numerator.
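The procedure in (a) to (i) can be sketched in Python as follows. This is an illustrative sketch only: the observations are random numbers, not spectrographic data, and the helper `raw_ss` and the dictionary keys are hypothetical names introduced here. The divisor is derived from the rule just stated, as the number of observations entering each subclass total.

```python
import itertools
import random

# Levels of each factor in the P4 Q3 R2 S6 design.
LEVELS = {"P": 4, "Q": 3, "R": 2, "S": 6}
FACTORS = list(LEVELS)

random.seed(0)
# Hypothetical data: one made-up observation per (p, q, r, s) cell.
data = {
    idx: random.gauss(0.0, 1.0)
    for idx in itertools.product(*(range(LEVELS[f]) for f in FACTORS))
}
N = len(data)
cf = sum(data.values()) ** 2 / N  # correction factor for the mean


def raw_ss(factors):
    """Squared subclass totals over their divisor, minus cf."""
    pos = [FACTORS.index(f) for f in factors]
    totals = {}
    for idx, y in data.items():
        key = tuple(idx[i] for i in pos)
        totals[key] = totals.get(key, 0.0) + y
    divisor = N // len(totals)  # observations entering each total
    return sum(t * t for t in totals.values()) / divisor - cf


total_ss = sum(y * y for y in data.values()) - cf
ss = {"P": raw_ss("P"), "Q": raw_ss("Q"), "S": raw_ss("S")}
ss["PxQ"] = raw_ss("PQ") - ss["P"] - ss["Q"]
ss["PxS"] = raw_ss("PS") - ss["P"] - ss["S"]
ss["QxS"] = raw_ss("QS") - ss["Q"] - ss["S"]
ss["PxQxS"] = raw_ss("PQS") - sum(ss.values())
ss["SumR"] = total_ss - sum(ss.values())  # remainder: pooled R terms

for name, value in ss.items():
    print(f"{name:6s} {value:10.4f}")
```

With two repetitions per $P_iQ_jS_k$ cell, the remainder should equal the sum over cells of half the squared difference between the two repetitions, which gives an independent check on the arithmetic.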

The computed values may be exhibited in an Analysis of Variance Table.

Referring to the Analysis of Variance Table, the first column, headed "Source of Variation," lists the main effects P, Q, S, their first-order interactions [...] independent factors, the comparisons obtained above are independent or orthogonal, and their sums of squares and df are linearly additive, respectively, to give a combined sum of squares and a combined df corresponding to the combined or pooled effects. The physical reason for the pooling of the R terms, as has been stated before, is that $R_1$ and $R_2$ are indistinguishable when compared in any classification larger than the cell $P_iQ_jS_k$. In a similar fashion, if the entire experiment were to be repeated over several days, introducing a new effect, D, then, since the various P levels within each day have only a serial difference, P would have to be pooled with its D interactions, i.e., P + P × D would be the least combination that would have physical meaning.

TABLE II. Analysis of Variance [table body missing in source]

The second column, headed "Degrees of freedom (df)," is derived as follows: degrees of freedom has the connotation of the number of observations or levels corresponding to an effect minus the number of constraints. In this example the constraint is the use of the mean. However, in keeping with the formalistic attitude assumed throughout this discussion, df is to be considered as a number computed from the data to be used in entering tables of statistical significance. The rule for computing this number is as follows: (a) the df of the total is N − 1, where N is the number of observations, (b) the df of a main effect is the number of levels of that effect minus 1, for example, df(P) = 4 − 1 = 3, and (c) the df of an interaction is the product of the df of the component main effects entering into the interaction, for example, df(P × Q) = 3 × 2 = 6.
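Rules (a) to (c) can be applied mechanically. The short Python sketch below is illustrative only; the name `df_effect` and the label `SumR` for the pooled R terms are hypothetical, and the df of the pooled line is obtained here as the remainder, on the assumption that it absorbs everything not assigned to P, Q, S and their interactions.

```python
from math import prod

levels = {"P": 4, "Q": 3, "R": 2, "S": 6}
N = prod(levels.values())  # 144 observations


def df_effect(effect):
    """df of a main effect or interaction, e.g. 'P' or 'PQS'."""
    return prod(levels[f] - 1 for f in effect)


effects = ["P", "Q", "PQ", "S", "PS", "QS", "PQS"]
df = {e: df_effect(e) for e in effects}
df_total = N - 1
# Remainder: R pooled with all of its interactions.
df["SumR"] = df_total - sum(df.values())

for name, value in df.items():
    print(name, value)
print("Total", df_total)
```

The remainder agrees with adding the df of R and of every interaction containing R, which is the additivity property on which pooling relies.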

The third column, headed "Mean square (V)," is the quotient of the corresponding sum of squares by the associated df.

The fourth column, headed "Components of variance," indicated by $\sigma^2$ with various subscripts on line with the various sources of variation, may be computed stepwise, starting at the bottom. It is to be noted that the coefficient of each component is the number corresponding to the multiplicity of the effect corresponding to the subscript, except for $\sigma_0^2$. The interpretation of these components is intimately connected with their significance and will be discussed later.

NOTE: The above computation procedure may be easily adapted to experiments that have unequal class numbers, for example a different number of observations or repetitions in $S_1$ than in $S_2$, by an arithmetical adjustment of the divisors, which will still remain the number of terms used in forming the sums in the numerators separately considered. However, details of this modification and its influence on significance, particularly interactions, and other refinements will be omitted. The interested reader is advised to consult the references for complete discussions.

1.5.3. Complete Balanced Experiment, for Regression (Covariance) Analysis. Qualitative Interpretation of Effects. The previous computation, to anticipate a later discussion, was designed to estimate the significance of differences of mean values between various classifications of the data due to the various effects. However, spectrographic data is largely concerned with establishing a line or relation between two variables. The most common example is the relation between the Δ log I of a line pair corresponding to an element and matrix line and the log per cent of that element in the sample, S. The relation, of course, need not be limited to that mentioned, for it could easily be that of the Δ log I of a sample and the secondary voltage in the effect Q, for example. Or, to be sure, the experiment may be designed to yield, simultaneously, two lines: one line correlating Δ log I with per cent and the other correlating Δ log I with voltage. However, the discussion will be restricted to the determination of one line, assumed straight, namely, that correlating Δ log I with log per cent.

Consider the previous experiment $P_4Q_3R_2S_6$. To each sample $S_i$ there corresponds an independent, known variable, not subject to error, namely the log per cent, $X_i$. $S_1$ is always associated with $X_1$, $S_2$ with $X_2$, etc., i.e., the samples or standards are the same throughout the entire experiment. It will simplify the computations if the log per cent values are transformed into deviations from the mean, i.e., $x_i = X_i - \bar{x}$; otherwise, when computing $\Sigma x_iY_i$ it will be necessary to subtract another [...] number of paired observations.
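The effect of the transformation can be seen in a small Python sketch. The values of `X` and `Y` are made up, and the correction term $\Sigma X\,\Sigma Y/n$ used for the uncentred form is the standard one, supplied here as an assumption because the corresponding passage is incomplete in the source.

```python
# Sum of cross products with and without centring the log per cent values.
X = [-1.0, -0.5, 0.0, 0.3, 0.7, 1.1]        # hypothetical log per cent values
Y = [0.10, 0.22, 0.35, 0.41, 0.55, 0.62]    # hypothetical Δ log I readings
n = len(X)

x_bar = sum(X) / n
x = [Xi - x_bar for Xi in X]                 # deviations from the mean

# Centred form: no further correction is needed.
centred = sum(xi * Yi for xi, Yi in zip(x, Y))

# Uncentred form: a correction term must be subtracted (assumed standard form).
uncentred = sum(Xi * Yi for Xi, Yi in zip(X, Y)) - sum(X) * sum(Y) / n

print(centred, uncentred)
```

Because the same six standards are used throughout, the centred values need to be computed only once for the whole experiment.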

The introduction of the new condition, i.e., correlation of $Y(S_i)$ (= Δ log I $(S_i)$) with $x_i$, introduces some new and important considerations. Consider R at only 1 level, i.e., $P_4Q_3S_6$; then a regression line will be obtained within each $P_iQ_j$ combination, or twelve regression lines in all. These regression lines, as may have been apparent, are least-squares solutions and hence will pass through $(\bar{x}, \bar{y})$, with $\bar{y}$ determined within each $P_iQ_j$ cell. The regression lines in general will not be identical, but will move around in the (x, y) plane. Kinematical considerations indicate the generalized motion of a line to be resolvable into two independent components, namely (1) a translation and (2) a rotation about some point. Since $\bar{x}$ is fixed for all combinations of $P_iQ_j$, the translation of the line will become a motion along the Y-axis corresponding to the different values of $\bar{y}$ in the $P_iQ_j$ cells, while the rotation will be reflected in the different values of $b_n$. This may also be seen from the equations. The twelve regression lines will be of the form $\hat{Y} = \bar{y} + b_n x$, n = 1, 2, ..., 12. Consequently, the analysis of variance reduced for regression should distinguish three main varieties of error, namely: (1) the error due to the variation in $\bar{y}$, or means (this has been analyzed in the previous computation of the analysis of variance), (2) the error due to the variation in $b_n$, the regression coefficient, or the variation in slopes; this will be called covariance or regression analysis, and (3) the error due to the points not lying immediately on the line, or the error due to deviations.
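The resolution into a translation and a rotation can be illustrated with two cells. The Python sketch below is an illustration under stated assumptions: the centred `x` values and the readings in `cell_1` and `cell_2` are invented, and `fit_line` is a hypothetical helper that is valid only when `x` is already expressed as deviations from its mean.

```python
def fit_line(x, y):
    """Least-squares line for centred x; returns (a, b) with a = mean of y."""
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


# Hypothetical centred log per cent values for the six standards.
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
# Hypothetical readings in two different P_i Q_j cells.
cell_1 = [0.10, 0.21, 0.33, 0.41, 0.52, 0.60]
cell_2 = [0.18, 0.27, 0.41, 0.55, 0.63, 0.78]

a1, b1 = fit_line(x, cell_1)
a2, b2 = fit_line(x, cell_2)

translation = a2 - a1   # shift of the line along the Y-axis (means)
rotation = b2 - b1      # change of slope (regression)

# Deviations of the points of cell 1 from their own line (third kind of error).
residuals_1 = [yi - (a1 + b1 * xi) for xi, yi in zip(x, cell_1)]
print(translation, rotation, residuals_1)
```

The three printed quantities correspond to the three varieties of error distinguished in the text: means, slopes and deviations.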

Accordingly, reconsidering the original experiment, it is seen that while it is still of the form $P_4Q_3S_6R_2$ there is an added important modification, namely the set $S_1$ to $S_6$ is a recognizable class correlated with the six independent variables, $x_1$ to $x_6$.

The computation may be divided into three parts, namely: (1) means, (2) slopes or regression (covariance analysis), and (3) deviations. The procedure may well be arranged as follows.

Prepare a covariance table (Table III), that is, compute $\Sigma xy = \Sigma xY$ for every $P_iQ_j$ combination, summing $Y_{ijk}(R_1) + Y_{ijk}(R_2)$ in each combination to obtain the Y for each S, as illustrated. For easy reference it is suggested that the values of the covariance be exhibited in a manner [...] and call this quantity the sum of squares reducible for regression, with the symbol rf. This quantity is analogous in its use to cf. It is to be noted that the denominator in the above fraction, as well as in the similar fractions to be used below, is the product of $\Sigma x^2$ into the number of sets of S that are used in the summation of the covariance represented by each term in the numerator.

The analysis of covariance reduced for regression, of Y on X(S), may now be performed along lines quite parallel to the previous analysis of variance.

(1) Means. Compute, according to the previously indicated procedure for the analysis of variance, the sum of squares corresponding to the total and the various effects, i.e., P, Q, P × Q, ΣR, S, P × S, Q × S, P × Q × S and total. Represent the terms as follows:

Sum of squares for the mean: (P mean), (Q mean), (P × Q mean); (S), (P × S), (Q × S), (P × Q × S), and Total

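(2) Slopes, Regression or Covariance Analysis. In the following, $\Sigma^2 xy(P_1) = [\Sigma xy(P_1)]^2$, and similarly for the other terms.

(a) Total sum of squares reduced by regression of Y on x: $SS_{y \cdot x} = \text{Total} - rf = \Sigma Y^2 - cf - rf = \Sigma y^2 - rf$

(b) (P reg) sum of squares: $\dfrac{\Sigma^2 xy(P_1) + \cdots + \Sigma^2 xy(P_4)}{6\,\Sigma x^2} - rf$

(c) (Q reg) sum of squares: $\dfrac{\Sigma^2 xy(Q_1) + \cdots + \Sigma^2 xy(Q_3)}{8\,\Sigma x^2} - rf$

(d) (P × Q reg) sum of squares: $\dfrac{\Sigma^2 xy(P_1Q_1) + \cdots + \Sigma^2 xy(P_4Q_3)}{2\,\Sigma x^2} - rf$ − (P reg) − (Q reg).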
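The regression sums of squares can be sketched in Python as follows. This is a hedged illustration, not the author's worked example: the data are simulated, the variable names are hypothetical, and the expression used for rf (the square of the grand covariance total divided by 24 Σx²) is an assumption inferred from the stated rule for the denominator, since the original formula is missing from the source.

```python
import itertools
import random

random.seed(1)
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]   # hypothetical centred log per cent
sxx = sum(v * v for v in x)             # sum of x squared over the six standards

# Hypothetical observations indexed by (p, q, r, s).
y = {
    (p, q, r, s): 0.3 * x[s] + random.gauss(0.0, 0.1)
    for p, q, r, s in itertools.product(range(4), range(3), range(2), range(6))
}

# Covariance table: sum of x * (Y(R1) + Y(R2)) for every P_i Q_j combination.
cov = {
    (p, q): sum(x[s] * (y[p, q, 0, s] + y[p, q, 1, s]) for s in range(6))
    for p in range(4) for q in range(3)
}

# Assumed form of rf: 24 sets of S enter the grand covariance total.
rf = sum(cov.values()) ** 2 / (24 * sxx)

cov_P = [sum(cov[p, q] for q in range(3)) for p in range(4)]  # 6 sets each
cov_Q = [sum(cov[p, q] for p in range(4)) for q in range(3)]  # 8 sets each

ss_P_reg = sum(c * c for c in cov_P) / (6 * sxx) - rf
ss_Q_reg = sum(c * c for c in cov_Q) / (8 * sxx) - rf
ss_PQ_reg = (sum(c * c for c in cov.values()) / (2 * sxx)
             - rf - ss_P_reg - ss_Q_reg)

print(rf, ss_P_reg, ss_Q_reg, ss_PQ_reg)
```

Under this assumption the (P reg) sum of squares equals $6\,\Sigma x^2$ times the sum of squared differences between the four plate slopes and the overall slope, which is one way to see why it measures variation in slope.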

(3) Deviation Analysis.

(a) (S dev) sum of squares = (S) sum of squares − rf

(b) (P dev) sum of squares = (P × S) sum of squares − (P reg) sum of squares

(c) (Q dev) sum of squares = (Q × S) sum of squares − (Q reg) sum of squares

(d) (P × Q dev) sum of squares = (P × Q × S) sum of squares − (P × Q reg) sum of squares.
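Item (a) can be checked against a direct computation. The Python sketch below uses simulated data with a slight curvature so that the deviations are not negligible; the names are hypothetical, and rf is again taken in the assumed form (grand covariance total squared, divided by 24 Σx²).

```python
import random

random.seed(2)
x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]   # hypothetical centred log per cent
sxx = sum(v * v for v in x)
n_sets = 24                              # 4 plates x 3 conditions x 2 repetitions

# Hypothetical readings: obs[k][s] is standard s in set k.
obs = [
    [0.3 * x[s] + 0.02 * x[s] ** 2 + random.gauss(0.0, 0.05) for s in range(6)]
    for _ in range(n_sets)
]
N = n_sets * 6
grand = sum(sum(row) for row in obs)
cf = grand ** 2 / N

T = [sum(row[s] for row in obs) for s in range(6)]   # totals for each S
ss_S = sum(t * t for t in T) / n_sets - cf
sxy = sum(xs * t for xs, t in zip(x, T))
rf = sxy ** 2 / (n_sets * sxx)                        # assumed form of rf
ss_S_dev = ss_S - rf

# Direct check: squared distances of the S averages from the overall line.
y_bar = grand / N
b0 = sxy / (n_sets * sxx)
direct = n_sets * sum((t / n_sets - y_bar - b0 * xs) ** 2 for xs, t in zip(x, T))

print(ss_S_dev, direct)
```

The two printed numbers agree, so (S dev) measures how far the average reading for each standard lies from the overall regression line.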

(4) Analysis of Replication. If R is still retained as a mere repetition factor for individual spectra, for example if spectrum $P_1Q_1S_1R_1$ is followed immediately by spectrum $P_1Q_1S_1R_2$ and similarly throughout the entire experiment, no further analysis is possible. On the other hand, suppose the unit of repetition is the set of six spectra ($P_1Q_1S_iR_1$, i = 1, ..., 6): for example, a set of six spectra, $R_1$, one for each specimen, is exposed in succession and after a time lapse the other set of six, $R_2$, is exposed; or perhaps the first set is exposed for each combination of $P_iQ_j$ for all twelve combinations and then the series is repeated to expose the second set, $R_2$. Then the repetitions will be distinguishable within each $P_iQ_j$ if not between them. For example, the two sets in $P_2Q_3$ are distinguishable as $R_1$ and $R_2$, where the subscripts have only a serial relationship, and no peculiar correspondence is implied between set $R_1$ in $P_2Q_3$ and $R_1$ in any other $P_iQ_j$, as opposed to a correspondence of $R_1$ in $P_2Q_3$ to $R_1$ or $R_2$ in any other $P_iQ_j$. To repeat, the repetitions are distinguishable within $P_iQ_j$ but not between them.

Under this assumption of the nature of the repetitions, the analysis may be extended as follows.

The covariance Table III is extended, using Table I, so that the covariance is computed explicitly for each $P_iQ_jR_k$ set, or 4 × 3 × 2 = 24 numbers. These numbers may be exhibited in a manner similar to Table III, but with two entries in each $P_iQ_j$ cell, one for $R_1$ and the other for $R_2$, as indicated in Table IV. However, the $R_1$ and $R_2$ values are not to be distinguished by any index common between the different [...]

The following sums of squares may now be computed in addition to those previously obtained.

(a) Sum of squares (computation independent of any "remainder" process): [...]

Note: There are 4 × 3 × 6 = 72 terms in the numerator.

(b) (ΣR means) sum of squares: [...] Compute twelve numbers similar to the one indicated for all $P_iQ_j$ [...] or, equivalently: [...]

(c) (ΣR reg) sum of squares: [...]

Analysis of Variance Reduced for Regression. N = 144

Source of variation    Degrees of freedom (df)    Mean square (V)
Total                  144 − 2 = 142              Total reduced for regression [...]

Or, if one wishes to summarize the three principal types of errors, one may prepare a condensation as follows:

TABLE V(a)

Source of variation    Degrees of freedom    Mean square
Means                  23                    (Σ Means)/23
Regression             23                    (Σ Reg)/23
Deviations             96                    (Σ Dev)/96
Total                  142                   (Total reduced for regression)/142

Tables V and V(a) should be carefully considered. The second column lists degrees of freedom, df, not quite the same as those in Table II; 1 df, for regression, is lost from the total, hence the total df = N − 2 = 142. The df for means and regression is the same as in Table II, that is, for main effects, P and Q, the df is the number of levels minus 1, and the df of P × Q = df(P) × df(Q). Since there are twenty-four repeated sets (of S), there are 24 − 1 = 23 df, respectively, for both means and regression. Since the regression is around S, the S effect loses 2 df, one for the mean and one for regression; hence df(S dev) = 6 − 1 − 1 = 4. On account of its linear properties, the df of a deviation effect, say P dev, may be derived from its mode of computation, namely df(P dev) = df(P × S) − df(P reg) = 15 − 3 = 12. Or else, one may consider P dev as equal to the interaction P × S dev, and df(P × S dev) = 3 × 4 = 12, by the usual rule.
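The bookkeeping behind the condensed table can be reproduced with a few lines of Python. This is an illustrative sketch: the df assigned to the pooled R lines (`df_R_pooled`, `df_R_dev`) are obtained here as remainders, which is an inference from the totals quoted in the text rather than a value stated in the source.

```python
levels = {"P": 4, "Q": 3, "R": 2, "S": 6}
n_sets = levels["P"] * levels["Q"] * levels["R"]   # 24 repeated sets of S
N = n_sets * levels["S"]                            # 144 observations

df_P = levels["P"] - 1
df_Q = levels["Q"] - 1
df_PQ = df_P * df_Q
df_S = levels["S"] - 1

# Means and regression share one breakdown; the pooled R line takes what is
# left of the 24 - 1 = 23 df (an inference, not a value given in the source).
df_R_pooled = (n_sets - 1) - (df_P + df_Q + df_PQ)
df_means = df_P + df_Q + df_PQ + df_R_pooled
df_reg = df_means

# Deviation effects: interaction with S, less the df spent on regression.
df_S_dev = df_S - 1
df_P_dev = df_P * df_S - df_P
df_Q_dev = df_Q * df_S - df_Q
df_PQ_dev = df_PQ * df_S - df_PQ

df_total = N - 2
df_dev = df_total - df_means - df_reg
df_R_dev = df_dev - (df_S_dev + df_P_dev + df_Q_dev + df_PQ_dev)

print(df_means, df_reg, df_dev, df_total)
```

The alternative rule quoted in the text gives the same figures, for example df(P dev) = df(P) × df(S dev) = 3 × 4 = 12.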

NOTE: It may be perhaps unnecessary to remark that in an analysis of this type the sum of squares as well as the covariance or regression factors for all effects and interactions are non-negative numbers.

Although the full quantitative meaning of the various mean squares will be appreciated only after the discussion on significance, the following discussion may be considered as an introduction to their qualitative meanings. The viewpoint will be that of the null hypothesis.

Total: If the total (variance) mean square were zero it would mean that all twenty-four observations on each of the six samples, $S_i$, are identical and that they all lie immediately on the regression line (analytical curve) drawn through the six points $(x_1y_1, x_2y_2, x_3y_3, x_4y_4, x_5y_5, x_6y_6)$, where each point is the identical location of the twenty-four observations. This composite condition, involving slopes, means, deviations, plates, excitation condition and repetitions, is admittedly on the improbable side. Consequently, it may be assumed that the total (variance) mean square, after reduction for regression, is not zero.

There are twenty-four individual regression lines (analytical curves), one line for each $P_iQ_jR_k$ cell or combination. The equations of these lines are all of the form $\hat{Y} = a + bx$, where $a = \bar{y}$ and b is the slope, i.e., [...] the observed and calculated value.

The qualitative interpretations of the effects may be tabulated under their implications with regard to a, b and deviations, under the null hypothesis.

[...] i.e., the sum of the squares of the differences of the average Δ log I for each $S_i$ from the calculated values, using the overall regression line, is zero.

P × Q mean = 0: $\{a(P_iQ_j) - a_0\} = \{a(P_i) - a_0\} + \{a(Q_j) - a_0\}$, i.e., $a(P_iQ_j) = a(P_i) + a(Q_j) - a_0$

P × Q reg = 0: $b(P_iQ_j) = b(P_i) + b(Q_j) - b_0$

P × Q dev = 0: $\Sigma\{Y_n(P_iQ_j) - \hat{Y}_n(P_iQ_j)\}^2 = \Sigma\{Y_n(P_i) - \hat{Y}_n(P_i)\}^2$

R means = 0: $a(P_iQ_jR_1) = a(P_iQ_jR_2)$

R reg = 0: $b(P_iQ_jR_1) = b(P_iQ_jR_2)$

R dev = 0: $\Sigma\{Y_n(P_iQ_jR_1) - \hat{Y}_n(P_iQ_jR_1)\}^2 = \Sigma\{Y_n(P_iQ_jR_2) - \hat{Y}_n(P_iQ_jR_2)\}^2$

ΣR = 0: $Y_n(P_iQ_jR_1) = Y_n(P_iQ_jR_2)$

1.6. Computation. General Procedure for Orthogonal Comparisons. Reduction to Single Degrees of Freedom

1.6.1. Introduction. While the computational procedure presented
