
SIMPLIFIED METHODS FOR PERIOD ANALYSIS*



by

J. REIMANN

Department of Civil Engineering Mathematics, Technical University, Budapest (Received February 8, 1972)

Presented by Prof. P. RÓZSA

1. Introduction

In connection with the statistical analysis of time series, several methods have been developed for the analysis of periods, generally based on different heuristic considerations. Most of these methods are mathematically interesting, but their practical application is rather difficult and laborious, and commonly requires a very long series of data. Most of the statistical methods are suited to hypothesis testing: if one has a guess at the length p of the period, they help to check whether p can really be the period length. Such a guess might result from considerations on the nature of the process producing the data series, or from certain of its statistical properties (e.g., the most frequent time intervals between outstanding values). Often, however, there is no well-founded guess at the period length, or the data series may contain several periods. It is therefore desirable to have a method of period analysis that furnishes the period length directly, that is, a procedure of "estimation" nature, in contrast to the hypothesis testing methods.

True, variants of "harmonic analysis" are suitable for the explicit determination of the period length, but only after a series of transformations and lengthy trial-and-error calculations.

All this made it imperative to develop a simple statistical method, convenient for computer use, for the relatively quick determination of the "main" period length. To the author's knowledge, no comparable method has been published in the literature so far.

Prior to the description of the direct method for finding the length of the period, a hypothesis testing method reported by WHITTAKER and ROBINSON [1] should briefly be outlined. It starts from heuristic considerations which inspired the author to develop the direct method. The essentials of that method are as follows.

* Based on research done at the Institute of Water Management and Hydraulic Engineering.


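Let the observed data series be

$$u_1, u_2, \ldots, u_n.$$

In the case considered, the values $u_i$ are the monthly mean water levels of Lake Balaton (for example, $u_1$ is the mean level of November 1921 and $u_n$ that of October 1958). Let the hypothesis be that p is the real period length.

Dividing the data series into m subsets of length p and writing them under each other, with the column sums in the last row:

$$\begin{array}{cccc}
u_1 & u_2 & \cdots & u_p \\
u_{p+1} & u_{p+2} & \cdots & u_{2p} \\
\vdots & \vdots & & \vdots \\
u_{(m-1)p+1} & u_{(m-1)p+2} & \cdots & u_{mp} \\
\hline
U_1 & U_2 & \cdots & U_p
\end{array}$$

If p is in fact the real period, then every row of the above table follows by and large the same course, which means that the values $u_{ip+1}, \ldots, u_{(i+1)p}$ fit a kind of wave line.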

Since the elements of every row fit about the same curve, the column sums $U_1, U_2, \ldots, U_p$ describe this curve with an m-fold amplitude. If, in turn, the data are divided into subsets of a length other than p, then the rows cannot be said to approach the same curve; on the contrary, the rows, so to say, compensate each other. Hence, expressing the difference between the largest and the smallest column sum among $U_1, U_2, \ldots, U_p$ as

$$\Delta U(p) = \max_i U_i - \min_i U_i,$$

the function $\Delta U(p)$ will be maximal if p is the real period.

If $\Delta U(p') > \Delta U(p)$ for a different value p', then the hypothesis that p is the real period should be rejected.
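The column-sum criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `column_range`, the synthetic series and the choice to drop an incomplete last row are not part of the paper.

```python
def column_range(series, p):
    """Return max(U_j) - min(U_j) for the series cut into rows of length p."""
    m = len(series) // p  # complete rows only; leftover values are dropped (assumption)
    if m == 0:
        raise ValueError("series is shorter than one assumed period")
    sums = [0] * p
    for i in range(m):
        for j in range(p):
            sums[j] += series[i * p + j]  # U_j: sum of the j-th column
    return max(sums) - min(sums)


# Synthetic series with period 4 (illustrative values, not the Balaton data).
series = [1, 3, 5, 3] * 6
ranges = {p: column_range(series, p) for p in range(2, 7)}
best = max(ranges, key=ranges.get)  # assumed period with the largest range
```

For this synthetic series the rows reinforce each other only when the assumed length equals the true period; for the other lengths tried the rows compensate each other and the range collapses.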

This method has been applied in trials on the data series of the water levels of Lake Balaton, reported below.

The data collected by SZESZTAY [2] were grouped for a primary investigation according to the periods 9, 10, 11, 12, 13, 14 and 15. The following values $\Delta U(p) = \max_i U_i - \min_i U_i$ were obtained for each of the assumed periods:

p:         9     10     11     12     13     14     15
ΔU(p):   109    141    325   1193    161    210    401

The result markedly shows the 12-month periodicity.

The calculations were done with a desk-top calculator by making use of a data series of 38 years.


Unfortunately, the distribution of $\Delta U(p)$ is not known exactly, thus no lower limit for $\Delta U(p)$ can be established above which p may be considered the real period. Still, a principle can be laid down: the higher $\Delta U(p)$, the more p may be considered a period.

2. Direct method for finding the period

The basic idea is now utilized in another way: if a periodic series of data is divided into subsets of a length equal to the period, these subsets follow roughly the same course (or at least, by a certain criterion, they are more similar than subsets of any other length).

The starting data series is again the series of the monthly mean water levels of Lake Balaton (456 data from 1921 to 1958 by SZESZTAY [2]). Let the data series be

$$u_1, u_2, \ldots, u_{456}.$$

The monthly mean water levels are continuous random variables. Thus, in fact, this is a set of continuous random variables $\xi_1, \xi_2, \ldots, \xi_{456}$. The problem is much simplified, without loss of efficiency, by replacing the set of continuous variables by a set of discrete variables using the following simple transformation (the rounding-off customary in statistics).

The possible water level values are grouped as follows:

[Figure: water level scale in cm, marked at 10, 20, 30, 40, ..., 140, 150 and divided into intervals coded 1 to 15.]

The range of values is divided into intervals of 10 cm; values lower than 10 cm are coded by 1, those between 10 and 20 cm by 2, and so on, and values higher than 140 cm by 15, that is:

$$v_i = [u_i] + 1,$$

where [ ] denotes the integer part, 10 cm being the unit.
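The coding rule can be illustrated with a short Python sketch. The function name `code_level`, the sample readings and the clamping of out-of-range input to the codes 1 and 15 are assumptions made for the illustration; the paper only states the rule itself.

```python
import math


def code_level(level_cm, width_cm=10.0, top_code=15):
    """Code a water level: below 10 cm -> 1, 10 to 20 cm -> 2, ..., above 140 cm -> 15."""
    code = math.floor(level_cm / width_cm) + 1  # integer part in units of 10 cm, plus 1
    return max(1, min(top_code, code))          # clamping to 1..15 is an assumption


# Hypothetical readings in cm, not taken from the Balaton record.
levels_cm = [4.0, 10.0, 57.5, 139.9, 140.0, 163.0]
codes = [code_level(u) for u in levels_cm]
```

All levels above 140 cm fall into the single top class, so the coded series takes at most 15 distinct values regardless of the highest level observed.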

Thus a coded table is obtained for the monthly mean water level values $v_1, v_2, \ldots, v_{456}$, the actual values of which are listed in Table 1.

Suppose now that the same coded series of data $v_1, v_2, \ldots, v_{456}$ is written on two tapes (the first tape containing it twice consecutively), which are then superimposed and shifted relative to each other; count how many times two identical values coincide at every shift. With the actual data, the following situation will be seen:

[Illustration: the beginning of the coded series on the two tapes, one shifted against the other, with the coinciding values marked by braces.]


Table 1

1 2 :1 ,1, S 6 7 B 9 10 11 ]2 I:{ ]4. ]5

l'-'

HO ISO l~

X

1921 S S (I) 6 (I 6 6 6 ·1, :1 6}

]1

2 1922 I I I I 7 2 :1 :1 I I I I J

:1 192:1 :1 :1 S S (I 7 H H 7 6 S 'I,

,I, 192·1, S () 7] 7] 7 10 11 11 II 10 9 H

S 192;' H a] 7J 7J H 7 H 9] 7 6 6 6

() 1926 7 HJ 9 9

Hq

lOl 91 J9J 9 9 9 H

7 1927 9 10 11 10 lOJ 10 J f9J

191

Hl 8 7 7

8 I92H 7 71 7 B] 9 9 19 9J IlJ 6 6] 61

9 ,i929 6 7J H f8J H] 101 11 ]0 9 H f6 J f6J

10 1.93H 7 H 7 IH HJ J()J to 9 7 6

t

6 16

11 19:11 Hl 9 11 11 1:1 14 H, I:{ ] I 9 B B

12 19:12 HJ 71 7 H] 8 10 10

1

91 B 7 71 7]

1:1 19:1:1 7 7J H HJ 9 9 ]OJ 9J 9 H, PJ J7J

H 19:\11, H to .10 10 101 101 9 H HI HJ

17

l7 '-<

IS .19:\;' 7 8 9) 9 10J IOJ 10 9 HJ 7 6 6 ::tJ

16 ,19:16 6 7 9J 10 11 II ] I 11] ]0) H 7 H t>J

17 19:17 H B 8 9

10}

1:1 .12 ]1 J 10J 9 H 71 ~ ~

18 19:\B 9 11 11 11 ]0 9 9 9) H( 71 61 (7J ~

19 19:19 6'] (I 7 HI H H H 9J BI 7J 6J l7 :.?,

2H 19'1,0 6J 7 H HJ 9 Iq III 1J 9 9 9 6

21 19,1,1 ] I 11 If) 10 11 HJ IIJ 10 B 7 61 7

22 1942 7 7 H H, ]0 12 12 11 91 81 6J 8

2,1 1911,:1 SI 6 6 faJ 9 H H 9 9J BJ 7 61

2"- 19'1 .. (, ;,J 7 a lH 10 10 10 .10 ]0 8 6 6J

2:; 19~,S 7 a 9 IH 11 11 9 8 71 61 S JI

2(1 19'1,6 51 6 7] H, 9 9 H 7 7J 6J 6 6

27 19'1,7 SJ 7 7J HJ 12 IS I;' IS 15 1:1 ]2 S

28 19'f,H H 8 8 9 9 10 IH 9 10 10 9 61

29 19,1,9 7 7 7 7] (I 7 7 (j 61 S] 11,1 6J

:10 1950 ,I, S S 7J H 8 9 7 (i

J

SJ 'J.) S

:11 1% I () (I In 11 12 11 I.I 1:1 ] :1 11 9 B

:12 1%2 7 7] B 91 11 10 91 91 H 6 S 61

:1:1 19S:1 (11 7) 9 9J 9 9 9) 9J 9 a 7 6J

:\tI. 1%,1, 6J (I (i (I 7 H HI 10 I lO 91 H III

:IS I95S a 7 a 9] 11 11 11 IH) 9 J9J 9 IlJ

:1(1 1956 9 9 9

9J

to] 12 ]2 U 11

19

H 6

:17 .1957

1

6 7 7 a ] n J ]0 9 9 H H 7 7

:w

I95H 7 6 6} 7 B a 7 7 7 6 6} S

{ 78 coincidences


If it is true that subsets of the actual period length are more similar than those of other lengths, then the most coincidences will be observed when the tapes are shifted by the true length of the period. This fact may also serve as the statistical definition of the period, if it is completed by certain numerical stipulations. The series of data is a stochastic process, hence a random function, and thus the number of coincidences is a random variable. In the case of a strictly periodic, non-random function, if the shift equals the period, then all of the data coincide; that is, if N is the length of the data series, then shifting by the period length brings about N coincidences, whereas shifting by a different length results in fewer coincidences.
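The tape comparison can be imitated directly in Python. The sketch below assumes that writing the series twice on the upper tape amounts to a circular comparison, so that the value at place i is compared with the value at place i + shift taken modulo N. The function name `coincidences` and the strictly periodic test series are illustrative.

```python
def coincidences(codes, shift):
    """Count the places where the series agrees with itself shifted circularly."""
    n = len(codes)
    return sum(1 for i in range(n) if codes[i] == codes[(i + shift) % n])


# Strictly periodic, non-random series: N = 40, period 5 (illustrative values).
periodic = [3, 7, 7, 2, 5] * 8
counts = {s: coincidences(periodic, s) for s in range(1, len(periodic))}
```

Here the shifts by 5 and by its multiples bring about all N = 40 coincidences, while every other shift yields fewer.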

Let us now examine how many coincidences may be expected in a certain position in the case of a stochastic data series.

The number of coincidences depends partly on the statistical distribution of each value in the data series and partly on the relative position of the individual values, i.e. on the spacing of identical digits.

Let the statistical distribution of the data series be

$$\begin{pmatrix} 1 & 2 & 3 & \cdots & 15 \\ p_1 & p_2 & p_3 & \cdots & p_{15} \end{pmatrix} \tag{1}$$

Confronting the data series with itself at a certain shift is the same as taking two random permutations of the data

$$v_1, v_2, \ldots, v_N$$

and writing them under each other. What is the probability of a coincidence at the ith place?

Denote by $A_1, A_2, \ldots, A_{15}$ the events that at the ith place the digit 1, 2, ..., 15, respectively, is found twice, one under the other. Then

$$P(A_k) = p_k^2 \qquad (k = 1, 2, \ldots, 15).$$

Since $A_1, A_2, \ldots, A_{15}$ are mutually exclusive events,

$$P(A_1 + A_2 + \ldots + A_{15}) = \sum_{k=1}^{15} p_k^2.$$

If $\xi_i$ is a characteristic variable allotted to the ith place, which assumes the value 1 if there is a coincidence at the ith place and the value 0 if not, then

$$P(\xi_i = 1) = \sum_{k=1}^{15} p_k^2, \qquad P(\xi_i = 0) = 1 - \sum_{k=1}^{15} p_k^2.$$
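As a small check of this reasoning, the probability of a coincidence can be computed in two ways: from the sum of squares, and by enumerating every ordered pair of values under the assumption that the two places are filled independently. The three-valued distribution and the function name `coincidence_probability` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import product


def coincidence_probability(p):
    """Sum of p_k squared: chance that two independent draws from p agree."""
    return sum(pk * pk for pk in p)


# Hypothetical three-valued distribution (not the Balaton distribution).
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Enumerate all ordered pairs of values and split them by agreement.
pairs = list(product(range(len(p)), repeat=2))
same = sum(p[a] * p[b] for a, b in pairs if a == b)       # P(xi_i = 1)
different = sum(p[a] * p[b] for a, b in pairs if a != b)  # P(xi_i = 0)
```

Exact fractions are used so that the two computations can be compared without rounding error.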


If the data series is of length N, then the random variable

$$\xi = \sum_{i=1}^{N} \xi_i$$

(the number of coincidences in a data series of length N) is a random variable of binomial distribution, with mathematical expectation and standard deviation

$$M(\xi) = N \sum_{k=1}^{15} p_k^2, \qquad D(\xi) = \sqrt{N \sum_{k=1}^{15} p_k^2 \left(1 - \sum_{k=1}^{15} p_k^2\right)},$$

respectively.

On the basis of the Moivre-Laplace limit theorem, $\xi$ is of approximately normal distribution; thus, if the data series is of statistical distribution (1), then the relationship

$$P\left(\frac{\xi - N \sum_{k=1}^{15} p_k^2}{\sqrt{N \sum_{k=1}^{15} p_k^2 \left(1 - \sum_{k=1}^{15} p_k^2\right)}} > 1.28\right) < 0.1 \tag{3}$$

holds for the coincidences in a data series shifted at random.

The statistical distribution for the data series of the water levels of Lake Balaton is the following:

 k     p_k        p_k^2
 1     0.02       0.0004
 2     0.00...    0....
 3     0.01       0.0001
 4     0.01       0.0001
 5     0.04       0.0016
 6     0.15       0.0225
 7     0.17       0.0289
 8     0.19       0.0361
 9     0.17       0.0289
10     0.11       0.0121
11     0.09       0.0081
12     0.02       0.0004
13     0.01       0.0001
14     0.00...    0....
15     0.01       0.0001

Σ p_k^2 = 0.1394

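$$P(\xi_i = 1) = 0.14; \qquad P(\xi_i = 0) = 0.86$$

$$M(\xi) = 456 \cdot 0.14 \approx 64, \qquad D(\xi) = \sqrt{456 \cdot 0.14 \cdot 0.86} \approx 7.$$

From relationship (3):

$$P\left(\frac{\xi - 64}{7} > 1.28\right) = P(\xi > 73) < 0.1.$$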

This means that a shift yielding 73 or more coincidences should be considered a period. (Here, a 90% reliability is sufficient.)
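The arithmetic leading to this limit can be retraced with a few lines of Python, using the rounded relative frequencies tabulated above. The two entries printed as "0.00..." are taken as zero, which is an assumption, and the variable names are illustrative.

```python
import math

# Rounded relative frequencies p_1 ... p_15 from the table above;
# the entries printed as "0.00..." are taken as 0.0 (assumption).
p = [0.02, 0.00, 0.01, 0.01, 0.04, 0.15, 0.17, 0.19,
     0.17, 0.11, 0.09, 0.02, 0.01, 0.00, 0.01]
N = 456                                # number of monthly values

q = sum(pk * pk for pk in p)           # probability of a coincidence at one place
mean = N * q                           # M(xi)
std = math.sqrt(N * q * (1.0 - q))     # D(xi)
threshold = mean + 1.28 * std          # upper 10% point of the normal approximation
```

Without the intermediate rounding to 64 and 7, the limit comes out within a tenth of 73, so the 78 coincidences counted for the shift by 12 lie clearly above it.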


Recall that it can be read off directly from the data in Table 1 that shifting by 12 yields 78 coincidences; therefore the 12-month periodicity of the data series is beyond doubt.

In order to see whether there is another longer period in the data series, the coincidences are to be tested by the outlined procedure. (It should be noted that the data series in Table 1 is rather short for this purpose.)

It would be rather impractical and hardly feasible to actually write the data on tapes and count the number of coincidences at the different shifts. Therefore, the aim has been to develop an algorithm suitable for computer programming.

This algorithm delivers the exact number of coincidences obtained at a given shift.

For an easier understanding of the algorithm, let us again imagine that the data of Table 1 are written on long paper tapes with centimetre graduation, twice consecutively on the upper tape.

Assume that, for example, at the shift by 9, $k_1$ coincidences have been observed between digits 1, $k_2$ coincidences between digits 2, ..., and $k_{15}$ coincidences between digits 15. If we now examine at which graduations of the data series (on the centimetre tape) the digit 1 occurs, record them and find their mutual distances, then the distance 9 will be encountered among them exactly $k_1$ times; among the distances between the location marks of digits 2, the distance 9 occurs $k_2$ times, and so on.

For the sake of clarity, let us see the location marks of the water level value 1. In Table 1 the digit 1 is found at graduations 12, 13, 14, 15, 21, 22, 23, 24. Since there are 456 data, coincidences will be observed at 455 shifts; therefore this data series is written twice consecutively on the upper tape. In the data series of double length, the graduations at digit 1 and their distances yield Table 2.

Table 2

Graduations at digit 1: 12, 13, 14, 15, 21, 22, 23, 24, 468, 469, 470, 471, 477, 478, 479, 480

Graduation   Distances to the higher graduations
    12          1     2     3     9}   10    11    12
    13          1     2     8     9}   10    11   455
    14          1     7     8     9}   10   454   455
    15          6     7     8     9}  453   454   455
    21          1     2     3   447   448   449   450
    22          1     2   446   447   448   449   455
    23          1   445   446   447   448   454   455
    24        444   445   446   447   453   454   455
   468          1     2     3
   469
   470
   471
   477
   478


The distances from every graduation to all higher graduations should be established for distances below 456. In Table 2, for a shift by 9, for example, four digits 1 are seen to coincide, because the distance 9 occurs four times among the distances.

Such a table of distances should be compiled for the graduations at digits 2, 3, 4, ..., 15 as well. Counting the occurrences of the distance 9 in each of the 15 tables of distances yields the number of coincidences for a shift by 9. Counting in each of the 15 tables the repetitions of every distance encountered (1, 2, ..., 455) delivers the number of coincidences for every shift. As concerns the data in Table 1, the frequency of the distance corresponding to the actual period must be above 72.
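A possible way of programming the distance tables is sketched below in Python. It is one reading of the procedure, not the author's program: the positions of each coded value are collected, and for every ordered pair of positions the distance modulo N is counted, which corresponds to measuring distances on the tape of double length. The function name `coincidences_by_distance` and the filler values used to isolate the digit 1 are assumptions.

```python
from collections import Counter, defaultdict


def coincidences_by_distance(codes):
    """Return a Counter mapping each shift 1..N-1 to its number of coincidences."""
    n = len(codes)
    positions = defaultdict(list)          # one list of graduations per coded value
    for i, v in enumerate(codes):
        positions[v].append(i)
    counts = Counter()
    for places in positions.values():      # one table of distances per coded value
        for a in places:
            for b in places:
                if a != b:
                    counts[(b - a) % n] += 1   # distance on the doubled tape
    return counts


# Digit 1 at graduations 12-15 and 21-24 as in the text; every other place
# gets a unique filler value so that only the digit 1 can coincide (assumption).
N = 456
ones = {12, 13, 14, 15, 21, 22, 23, 24}
codes = [1 if g in ones else 1000 + g for g in range(1, N + 1)]
counts = coincidences_by_distance(codes)
```

For the digit 1 alone this gives four coincidences at the distance 9, in agreement with Table 2. The work grows with the square of the number of occurrences of each value, which is modest for a series of 456 data.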

Summary

Probability criteria are involved to define the concept of periodicity; then the length of a possible period is determined from the number of coincidences of random sets of data.

A numerical example is presented to illustrate that the method suggested in the paper is, from computing aspects, more advantageous than the methods used so far for the analysis of periods.

References

1. WHITTAKER, E. T., ROBINSON, G.: Calculus of Observations. 3rd ed. Blackie and Sons, 1940.

2. SZESZTAY, K.: Water Conservancy of the Lake Balaton. Studies and Research Results, No. 9. Bp. 1962, VITUKI (in Hungarian).

Ass. Prof. Dr. József REIMANN, 1111 Budapest, Műegyetem rkp. 3, Hungary
