Mathematical Approach to Everyday Life

(1)


Dr Ivana Djolović, idjolovic@tfbor.bg.ac.rs

University of Belgrade, Technical Faculty in Bor, Bor, Serbia

Today, in the modern information society, we are surrounded by all kinds of media stories involving predictions, claims, confidence levels and conclusions.

Verbal expressions and everyday phrases are presented to the audience in order to warn or simply inform people, but the mathematics stays “backstage”. Correctly used, mathematics and statistics can be a powerful tool for explaining many situations in everyday life.

This talk is devoted to statistical interpretations of real-life situations. Starting from a real situation, we will discover where the statistical interpretation is hidden. We will also point out potential traps in understanding the situation.

(2)

- ...9 out of 10 women recommend anti-age cream...
- ...30% chance of snow...
- ...the average lifetime of a light bulb is 562 days...
- ...certain medication is the best solution for headache...
- ...6-year-old children spend 200 minutes watching TV...
- ...less than 5% of our items are defective...
- ...washing detergent A is more effective than others...
- ...drinking 2 liters of water per day is healthy...
- ...100% success in teaching...

?

- Can I believe all those numbers?
- How did they get those numbers?
- Real life or suspicious information?
- Who was included in the survey?

Can we test and check such claims?

(3)

Claim: 3-month-old babies sleep an average of 20 hours in a 24-hour period.

Mathematical (statistical) interpretation 1:

A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 20 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period. Assume that the sleeping times of all 3-month-old babies are normally distributed and that the population standard deviation is 45 minutes.

Using the 5% significance level, test the claim of the earlier study.

Mathematical (statistical) interpretation 2:

A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 20 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period, with a standard deviation of 45 minutes. Assume that the sleeping times of all 3-month-old babies are normally distributed. Using the 5% significance level, test the claim of the earlier study.

The same problem? The same text? NO!!!

(4)

Hypothesis Testing - Hypothesis tests about the mean

(hypothesis tests are used to confirm (accept) or deny (reject) a claim that is made about a population)

X - random variable (characteristic)

$(x_1, x_2, x_3, \ldots, x_n)$ - sample

n - sample size

Population                               Sample
μ - population mean                      $\bar{x}$ - sample mean
σ - population standard deviation        s - sample standard deviation

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}, \qquad s = \sqrt{s^2}$$

$$\hat{s}^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}, \qquad \hat{s} = \sqrt{\hat{s}^2}$$

Notation:

s - sample standard deviation

$\hat{s}$ - improved sample standard deviation
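The three formulas above translate directly into code. The following is a minimal sketch; the function names and the small data set are illustrative assumptions, not part of the talk.

```python
import math

def sample_mean(xs):
    """x-bar: the arithmetic mean of the sample."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2: sum of squared deviations divided by n."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def improved_sample_variance(xs):
    """s-hat^2: sum of squared deviations divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical sample, chosen only so that the numbers are easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

x_bar = sample_mean(data)                          # 5.0
s = math.sqrt(sample_variance(data))               # 2.0
s_hat = math.sqrt(improved_sample_variance(data))  # sqrt(32/7)
```

The only difference between s and ŝ is the divisor (n versus n − 1), so ŝ is always slightly larger than s for the same sample.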

(5)

Elements in the hypothesis tests:

- Null hypothesis H₀ (a claim about a population parameter that is assumed to be true until it is declared false)
- Alternative hypothesis H₁ (true if the null hypothesis is false)

Null hypothesis vs Alternative hypothesis

                 Real situation
Decision         H₀ true              H₀ false
Accept H₀        OK                   Type II Error (β)
Reject H₀        Type I Error (α)     OK

- α - the significance level
- C - the rejection region
- T - test statistic (random variable)
- Statistically significant = significantly different (the null hypothesis is rejected; the observed difference has a very small probability of happening just by chance; the difference between $\bar{x}$ and μ is statistically significant)
- (Statistically) not significantly different (the difference between $\bar{x}$ and μ is so small that it may have occurred just by chance)
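The meaning of α as the probability of a Type I error can be illustrated by simulation. The sketch below repeatedly draws samples from a population in which H₀ is actually true and counts how often a two-tailed test with known σ (the test statistic and the critical value 1.96 are the ones this talk uses for α = 0.05) rejects H₀ anyway. The population parameters, the number of trials and the seed are assumptions made for the illustration.

```python
import math
import random

def rejects_h0(sample, mu0, sigma, z_crit=1.96):
    """Two-tailed test with known sigma: reject H0 when |t| >= z_crit."""
    n = len(sample)
    x_bar = sum(sample) / n
    t = (x_bar - mu0) / sigma * math.sqrt(n)
    return abs(t) >= z_crit

def type_one_error_rate(trials=4000, n=20, mu0=20.0, sigma=0.75, seed=1):
    """Fraction of samples that reject H0 although H0 (mu = mu0) is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"observed Type I error rate: {rate:.3f} (alpha = 0.05)")
```

The observed rate should land close to 0.05, which is exactly what "significance level 5%" promises: a true null hypothesis is rejected in about 5% of samples.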

(6)

Hypothesis tests about the mean μ

1. σ known, $X : N(\mu, \sigma^2)$

Null hypothesis: $H_0(\mu = \mu_0)$

- Alternative hypothesis: $H_1(\mu \neq \mu_0)$
  Two-tailed test; the rejection region: $C = (-\infty, -z_\alpha] \cup [z_\alpha, \infty)$, where $\Phi(z_\alpha) = \frac{1 - \alpha}{2}$;
- Alternative hypothesis: $H_1(\mu > \mu_0)$
  Right-tailed test; the rejection region: $C = [z_\alpha, \infty)$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$;
- Alternative hypothesis: $H_1(\mu < \mu_0)$
  Left-tailed test; the rejection region: $C = (-\infty, -z_\alpha]$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$;

Test statistic:

$$T = \frac{\bar{X} - \mu_0}{\sigma}\sqrt{n} \quad\to\quad t = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n}$$

(7)

$\Phi(z_\alpha)$   0.4     0.45    0.475   0.48    0.49    0.495
$z_\alpha$         1.285   1.645   1.96    2.055   2.325   2.575

(8)

$\Phi(z_\alpha) = 0.475 \Rightarrow z_\alpha = 1.96$

$\Phi(z_\alpha) = 0.49 \Rightarrow 2.32 \le z_\alpha \le 2.33 \Rightarrow z_\alpha \approx 2.325$ or $z_\alpha \approx 2.33$ or $z_\alpha \approx 2.32$
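The table lookups can be cross-checked numerically. The tabulated pairs (for example Φ = 0.475 with z = 1.96) suggest that Φ here is the Laplace function, Φ(z) = P(0 ≤ Z ≤ z) for a standard normal Z; that reading is an inference from the table, and the helper names below are illustrative. Under that assumption, Φ(z) = p is the same as the usual cumulative probability 0.5 + p.

```python
from statistics import NormalDist

def z_alpha(phi_value):
    """Solve Phi(z) = phi_value, where Phi(z) = P(0 <= Z <= z) (assumed convention)."""
    return NormalDist().inv_cdf(0.5 + phi_value)

def z_two_tailed(alpha):
    """Critical value for a two-tailed test: Phi(z) = (1 - alpha) / 2."""
    return z_alpha((1 - alpha) / 2)

def z_one_tailed(alpha):
    """Critical value for a one-tailed test: Phi(z) = (1 - 2 * alpha) / 2."""
    return z_alpha((1 - 2 * alpha) / 2)

for phi in (0.4, 0.45, 0.475, 0.48, 0.49, 0.495):
    print(f"Phi = {phi:<5}  z = {z_alpha(phi):.3f}")
```

The computed values agree with the table up to the rounding used there, and for Φ = 0.49 the result falls between 2.32 and 2.33, as in the interpolation above.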

(9)

Research 1:

A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 20 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period. Assume that the sleeping times of all 3-month-old babies are normally distributed and that the population standard deviation is 45 minutes.

Using the 5% significance level, test the claim of the earlier study.

X - the sleeping times of all 3-month-old babies; $X : N(\mu, \sigma^2)$

$\sigma = 45\ \text{min} = 0.75\ \text{h}$ (σ known) - population standard deviation

$n = 20$ (sample size)

$\bar{x} = 19\ \text{h}\ 15\ \text{min} = 19.25\ \text{h}$ (sample mean)

$\alpha = 5\% = 0.05$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n} = \frac{19.25 - 20}{0.75}\sqrt{20} \approx -4.47$$

1) $H_0(\mu = 20)$ vs $H_1(\mu \neq 20)$

two-tailed test;

the rejection region: $C = (-\infty, -z_\alpha] \cup [z_\alpha, \infty)$

$\Phi(z_\alpha) = \frac{1 - 0.05}{2} = 0.475 \Rightarrow z_\alpha = 1.96 \Rightarrow C = (-\infty, -1.96] \cup [1.96, \infty)$

Since $t \approx -4.47 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study

2) $H_0(\mu = 20)$ vs $H_1(\mu > 20)$

right-tailed test;

the rejection region: $C = [z_\alpha, \infty)$

$\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 \Rightarrow C = [1.645, \infty)$

Since $t \approx -4.47 \notin C$, we accept $H_0(\mu = 20)$ - the claim of the earlier study

3) $H_0(\mu = 20)$ vs $H_1(\mu < 20)$

left-tailed test;

the rejection region: $C = (-\infty, -z_\alpha]$

$\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 \Rightarrow C = (-\infty, -1.645]$

Since $t \approx -4.47 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study
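The three decisions of Research 1 can be reproduced in a few lines. This is a sketch that uses the numbers of the slide (x̄ = 19.25 h, μ₀ = 20 h, σ = 0.75 h, n = 20) and the critical values 1.96 and 1.645 read from the table; the function names and the labels for the alternatives are assumptions made for the example.

```python
import math

def z_test_statistic(x_bar, mu0, sigma, n):
    """t = (x_bar - mu0) / sigma * sqrt(n), for known population sigma."""
    return (x_bar - mu0) / sigma * math.sqrt(n)

def in_rejection_region(t, z_crit, alternative):
    """Check whether t lies in the rejection region C for the given alternative."""
    if alternative == "two-tailed":    # H1: mu != mu0
        return abs(t) >= z_crit
    if alternative == "right-tailed":  # H1: mu > mu0
        return t >= z_crit
    if alternative == "left-tailed":   # H1: mu < mu0
        return t <= -z_crit
    raise ValueError(f"unknown alternative: {alternative}")

t = z_test_statistic(x_bar=19.25, mu0=20.0, sigma=0.75, n=20)

decisions = {
    "two-tailed": in_rejection_region(t, 1.96, "two-tailed"),
    "right-tailed": in_rejection_region(t, 1.645, "right-tailed"),
    "left-tailed": in_rejection_region(t, 1.645, "left-tailed"),
}
for name, reject in decisions.items():
    print(f"{name}: t = {t:.2f}, {'reject' if reject else 'accept'} H0")
```

The same data lead to different decisions depending on the alternative hypothesis, which is why H₁ has to be fixed before looking at the sample.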

(10)

2. σ not known, $X : N(\mu, \sigma^2)$

Null hypothesis: $H_0(\mu = \mu_0)$

- Alternative hypothesis: $H_1(\mu \neq \mu_0)$
  Two-tailed test; the rejection region: $C = (-\infty, -t_{n-1;\alpha}] \cup [t_{n-1;\alpha}, \infty)$
- Alternative hypothesis: $H_1(\mu > \mu_0)$
  Right-tailed test; the rejection region: $C = [t_{n-1;2\alpha}, \infty)$
- Alternative hypothesis: $H_1(\mu < \mu_0)$
  Left-tailed test; the rejection region: $C = (-\infty, -t_{n-1;2\alpha}]$

t-distribution (Student’s t distribution)

n-1 - degrees of freedom

Test statistic:

$$T = \frac{\bar{X} - \mu_0}{S}\sqrt{n - 1} \quad\to\quad t = \frac{\bar{x} - \mu_0}{s}\sqrt{n - 1}$$

OR

$$T = \frac{\bar{X} - \mu_0}{\hat{S}}\sqrt{n} \quad\to\quad t = \frac{\bar{x} - \mu_0}{\hat{s}}\sqrt{n}$$
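The two forms of the test statistic give the same number. A short derivation, using only the definitions of $s^2$ (divisor $n$) and $\hat{s}^2$ (divisor $n-1$) given earlier in the talk:

$$\hat{s}^2 = \frac{n}{n-1}\, s^2 \;\Rightarrow\; \hat{s} = s\sqrt{\frac{n}{n-1}}$$

$$\frac{\bar{x} - \mu_0}{\hat{s}}\sqrt{n} = \frac{\bar{x} - \mu_0}{s}\sqrt{\frac{n-1}{n}}\,\sqrt{n} = \frac{\bar{x} - \mu_0}{s}\sqrt{n-1}$$

So the choice between the two formulas depends only on which standard deviation (s or ŝ) is reported for the sample.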

(11)

(12)

(13)

Research 2:

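A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 20 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period, with a standard deviation of 45 minutes. Assume that the sleeping times of all 3-month-old babies are normally distributed. Using the 5% significance level, test the claim of the earlier study.

X - the sleeping times of all 3-month-old babies; $X : N(\mu, \sigma^2)$

$s = 45\ \text{min} = 0.75\ \text{h}$ - sample standard deviation

σ unknown - population standard deviation

$n = 20$ (sample size)

$\bar{x} = 19\ \text{h}\ 15\ \text{min} = 19.25\ \text{h}$ (sample mean)

$\alpha = 5\% = 0.05$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{n - 1} = \frac{19.25 - 20}{0.75}\sqrt{20 - 1} \approx -4.36$$

1) $H_0(\mu = 20)$ vs $H_1(\mu \neq 20)$

two-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;\alpha}] \cup [t_{n-1;\alpha}, \infty)$

$t_{n-1;\alpha} = t_{20-1;0.05} = t_{19;0.05} = 2.093 \Rightarrow C = (-\infty, -2.093] \cup [2.093, \infty)$

Since $t \approx -4.36 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study

2) $H_0(\mu = 20)$ vs $H_1(\mu > 20)$

right-tailed test;

the rejection region: $C = [t_{n-1;2\alpha}, \infty)$

$t_{n-1;2\alpha} = t_{20-1;2 \cdot 0.05} = t_{19;0.10} = 1.729 \Rightarrow C = [1.729, \infty)$

Since $t \approx -4.36 \notin C$, we accept $H_0(\mu = 20)$ - the claim of the earlier study

3) $H_0(\mu = 20)$ vs $H_1(\mu < 20)$

left-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;2\alpha}]$

$t_{n-1;2\alpha} = t_{20-1;2 \cdot 0.05} = t_{19;0.10} = 1.729 \Rightarrow C = (-\infty, -1.729]$

Since $t \approx -4.36 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study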
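Research 2 can be checked numerically as well. The sketch below computes the statistic in both forms (with s and √(n−1), and with ŝ and √n) and compares it with the tabulated critical values 2.093 and 1.729 quoted above. It treats the reported 45 minutes as s (divisor n), as the slide does; the function names are illustrative.

```python
import math

def t_from_s(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / s * sqrt(n - 1), with s computed using divisor n."""
    return (x_bar - mu0) / s * math.sqrt(n - 1)

def t_from_s_hat(x_bar, mu0, s_hat, n):
    """t = (x_bar - mu0) / s_hat * sqrt(n), with s_hat computed using divisor n - 1."""
    return (x_bar - mu0) / s_hat * math.sqrt(n)

n, x_bar, mu0, s = 20, 19.25, 20.0, 0.75
s_hat = s * math.sqrt(n / (n - 1))  # improved sample standard deviation

t1 = t_from_s(x_bar, mu0, s, n)
t2 = t_from_s_hat(x_bar, mu0, s_hat, n)

# Critical values for 19 degrees of freedom, taken from the slide (t-table).
T_TWO_TAILED = 2.093  # t_{19; 0.05}
T_ONE_TAILED = 1.729  # t_{19; 0.10}

reject_two_tailed = abs(t1) >= T_TWO_TAILED
reject_right_tailed = t1 >= T_ONE_TAILED
reject_left_tailed = t1 <= -T_ONE_TAILED
print(f"t = {t1:.2f} (both forms agree: {math.isclose(t1, t2)})")
```

Compared with Research 1, the statistic is slightly smaller in absolute value (−4.36 instead of −4.47) and the critical values are larger, because the uncertainty about σ has to be paid for.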

(14)

Research 3:

A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 2000 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period, with a standard deviation of 45 minutes. Using the 5% significance level, test the claim of the earlier study.

Research 4:

A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 2000 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period. Assume that the population standard deviation is 45 minutes. Using the 5% significance level, test the claim of the earlier study.

Where is the assumption that the sleeping times of all 3-month-old babies are normally distributed?

(15)

Central Limit Theorem

If one takes random samples of size n from a population of mean μ and standard deviation σ, then, as n gets large, $\bar{X}$ approaches the normal distribution, that is:

$$\bar{X} : N\left(\mu, \frac{\sigma^2}{n}\right)$$

X - random variable (characteristic)

$(X_1, X_2, X_3, \ldots, X_n)$ - sample

n - sample size

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \mu$$

$$\sigma^2(X_1) = \sigma^2(X_2) = \cdots = \sigma^2(X_n) = \sigma^2(X) = \sigma^2.$$

$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = E(X) = \mu,$$

$$\sigma^2(\bar{X}) = \sigma^2\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{\sigma^2(X)}{n} = \frac{\sigma^2}{n}.$$

If $X : N(\mu, \sigma^2)$ then $\bar{X} : N\left(\mu, \frac{\sigma^2}{n}\right)$, for all n (either small (n<30) or large sample).

If X has unknown distribution (not normal distribution) and known standard deviation σ, then $\bar{X} : N\left(\mu, \frac{\sigma^2}{n}\right)$ for large sample n≥30.

BUT

for CLT, we need the following:

- a large sample size
- known standard deviation σ

What about the case: a sample is large and σ is not known?
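The statements E(X̄) = μ and σ²(X̄) = σ²/n can be observed in a small simulation. The sketch below draws many samples from an exponential population (clearly not normal, with μ = 1 and σ = 1) and looks at the sample means. The choice of population, the sample size 40, the number of replications and the seed are all assumptions made for the illustration.

```python
import random
import statistics

def simulate_sample_means(n, replications, seed=0):
    """Draw `replications` samples of size n from Exp(1) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

n = 40
means = simulate_sample_means(n, replications=5000)

mean_of_means = statistics.fmean(means)   # should be close to mu = 1
var_of_means = statistics.pvariance(means)  # should be close to sigma^2 / n = 0.025
print(f"mean of sample means:     {mean_of_means:.4f} (mu = 1)")
print(f"variance of sample means: {var_of_means:.4f} (sigma^2/n = {1 / n:.4f})")
```

A histogram of `means` would also look roughly bell-shaped even though the population itself is strongly skewed, which is the content of the Central Limit Theorem.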

(16)

Hypothesis tests about the mean μ according to a sample size

1. Small sample

1.1. σ known
1.2. σ not known

2. Large sample

2.1. σ known (CLT): $\bar{X} : N\left(\mu, \frac{\sigma^2}{n}\right)$, that is $\frac{\bar{X} - \mu}{\sigma}\sqrt{n} : N(0, 1)$

2.2. σ not known: the distribution of the random variable $\frac{\bar{X} - \mu}{\hat{S}}\sqrt{n}$ can be approximated with the normal distribution (as the sample size becomes larger, the t-distribution approaches the standard normal distribution)

The rejection regions can be obtained in the following way:

- two-tailed test
  $C = (-\infty, -z_\alpha] \cup [z_\alpha, \infty)$ or $C = (-\infty, -t_{n-1;\alpha}] \cup [t_{n-1;\alpha}, \infty)$, where $\Phi(z_\alpha) = \frac{1 - \alpha}{2}$;
- right-tailed test
  $C = [z_\alpha, \infty)$ or $C = [t_{n-1;2\alpha}, \infty)$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$;
- left-tailed test
  $C = (-\infty, -z_\alpha]$ or $C = (-\infty, -t_{n-1;2\alpha}]$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$.
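The case distinction of this slide can be written as a small decision helper. This is only a sketch of one reading of the classification: the threshold n ≥ 30 for a "large" sample is the one used in the talk, while the function name, the return labels and the handling of the case the talk does not discuss (small sample from a non-normal population) are assumptions.

```python
def critical_value_source(n, sigma_known, population_normal, large_n=30):
    """Return which table supplies the critical value: "z", "t", "t or z", or None.

    None marks the combination not covered here:
    a small sample from a population that is not assumed normal.
    """
    large = n >= large_n
    if sigma_known:
        # Normal population (any n) or large sample (CLT): normal distribution.
        return "z" if (population_normal or large) else None
    if large:
        # sigma unknown, large sample: t, which is approximately z.
        return "t or z"
    # sigma unknown, small sample: t-test needs a normal population.
    return "t" if population_normal else None

# The four baby-sleep problems from the talk:
research = {
    1: critical_value_source(20, sigma_known=True, population_normal=True),
    2: critical_value_source(20, sigma_known=False, population_normal=True),
    3: critical_value_source(2000, sigma_known=False, population_normal=False),
    4: critical_value_source(2000, sigma_known=True, population_normal=False),
}
print(research)
```

Reading a problem statement with these three questions in mind (how large is n, is σ given for the population, is normality assumed) is usually enough to pick the right test.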

(17)

Research 3:

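A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 2000 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period, with a standard deviation of 45 minutes. Using the 5% significance level, test the claim of the earlier study.

X - the sleeping times of all 3-month-old babies; No assumption about the distribution!!!

$s = 45\ \text{min} = 0.75\ \text{h}$ - sample standard deviation

σ unknown - population standard deviation

(as the sample size becomes larger, the t-distribution approaches the standard normal distribution)

$n = 2000$ (LARGE sample)

$\bar{x} = 19\ \text{h}\ 15\ \text{min} = 19.25\ \text{h}$ (sample mean)

$\alpha = 5\% = 0.05$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{n - 1} = \frac{19.25 - 20}{0.75}\sqrt{2000 - 1} \approx -44.7$$

1) $H_0(\mu = 20)$ vs $H_1(\mu \neq 20)$

two-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;\alpha}] \cup [t_{n-1;\alpha}, \infty)$

$t_{n-1;\alpha} = t_{2000-1;0.05} = t_{1999;0.05} \approx t_{\infty;0.05} = 1.96 \Rightarrow C = (-\infty, -1.96] \cup [1.96, \infty)$

BUT the rejection region can also be $C = (-\infty, -z_\alpha] \cup [z_\alpha, \infty)$, where $\Phi(z_\alpha) = \frac{1 - \alpha}{2}$

For α = 0.05 we have $\Phi(z_\alpha) = \frac{1 - 0.05}{2} = 0.475 \Rightarrow z_\alpha = 1.96 = t_{\infty;0.05}$

Since $t \approx -44.7 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study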

(18)

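2) $H_0(\mu = 20)$ vs $H_1(\mu > 20)$

right-tailed test;

the rejection region: $C = [t_{n-1;2\alpha}, \infty)$

$t_{n-1;2\alpha} = t_{2000-1;2 \cdot 0.05} = t_{1999;0.10} \approx t_{\infty;0.10} = 1.645 \Rightarrow C = [1.645, \infty)$

BUT the rejection region can also be $C = [z_\alpha, \infty)$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$

For α = 0.05 we have $\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 = t_{\infty;0.10}$

Since $t \approx -44.7 \notin C$, we accept $H_0(\mu = 20)$ - the claim of the earlier study

3) $H_0(\mu = 20)$ vs $H_1(\mu < 20)$

left-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;2\alpha}]$

$t_{n-1;2\alpha} = t_{2000-1;2 \cdot 0.05} = t_{1999;0.10} \approx 1.645 \Rightarrow C = (-\infty, -1.645]$

BUT the rejection region can also be $C = (-\infty, -z_\alpha]$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$

For α = 0.05 we have $\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 = t_{\infty;0.10}$

Since $t \approx -44.7 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study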

(19)

Research 4:

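A past study claimed that 3-month-old babies sleep an average of 20 hours in a 24-hour period. A researcher took a random sample of 2000 babies and found that they slept an average of 19 hours 15 minutes in a 24-hour period. Assume that the population standard deviation is 45 minutes. Using the 5% significance level, test the claim of the earlier study.

X - the sleeping times of all 3-month-old babies; No assumption about the distribution!!! CLT!!!

$\sigma = 45\ \text{min} = 0.75\ \text{h}$ (σ known) - population standard deviation

$n = 2000$ (LARGE sample)

$\bar{x} = 19\ \text{h}\ 15\ \text{min} = 19.25\ \text{h}$ (sample mean)

$\alpha = 5\% = 0.05$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{\sigma}\sqrt{n} = \frac{19.25 - 20}{0.75}\sqrt{2000} \approx -44.72$$

1) $H_0(\mu = 20)$ vs $H_1(\mu \neq 20)$

two-tailed test;

the rejection region: $C = (-\infty, -z_\alpha] \cup [z_\alpha, \infty)$

$\Phi(z_\alpha) = \frac{1 - 0.05}{2} = 0.475 \Rightarrow z_\alpha = 1.96 \Rightarrow C = (-\infty, -1.96] \cup [1.96, \infty)$

Since $t \approx -44.72 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study

2) $H_0(\mu = 20)$ vs $H_1(\mu > 20)$

right-tailed test;

the rejection region: $C = [z_\alpha, \infty)$

$\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 \Rightarrow C = [1.645, \infty)$

Since $t \approx -44.72 \notin C$, we accept $H_0(\mu = 20)$ - the claim of the earlier study

3) $H_0(\mu = 20)$ vs $H_1(\mu < 20)$

left-tailed test;

the rejection region: $C = (-\infty, -z_\alpha]$

$\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.05}{2} = 0.45 \Rightarrow z_\alpha = 1.645 \Rightarrow C = (-\infty, -1.645]$

Since $t \approx -44.72 \in C$, we reject $H_0(\mu = 20)$ - the claim of the earlier study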
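For n = 2000 it hardly matters whether the 45 minutes is read as σ (factor √n) or as s (factor √(n−1)): the statistics are about −44.72 and −44.7. For n = 20 the same two readings gave −4.47 and −4.36. The sketch below quantifies this gap; the function name is an assumption made for the example.

```python
import math

def statistic(x_bar, mu0, sd, n, sd_is_population):
    """Test statistic with sqrt(n) for known sigma, sqrt(n - 1) for sample s."""
    scale = math.sqrt(n) if sd_is_population else math.sqrt(n - 1)
    return (x_bar - mu0) / sd * scale

relative_gap = {}
for n in (20, 2000):
    t_sigma = statistic(19.25, 20.0, 0.75, n, sd_is_population=True)
    t_s = statistic(19.25, 20.0, 0.75, n, sd_is_population=False)
    # Relative difference between the two statistics: 1 - sqrt((n - 1) / n).
    relative_gap[n] = 1 - t_s / t_sigma
    print(f"n = {n:4d}: sigma known {t_sigma:8.2f}, sigma unknown {t_s:8.2f}, "
          f"gap {100 * relative_gap[n]:.3f}%")
```

The gap shrinks from about 2.5% to about 0.025%, which is the numerical counterpart of the remark that the t-distribution approaches the standard normal distribution as the sample grows.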

(20)

Example A: A farmer is supposed to deliver potatoes to a grocery store in packages (bags) that weigh 20 kilos (kg) on average. The grocery store claims that the packages weigh under 20 kilos on average.

A random sample of 50 packages of potatoes has an average of 19.4 kilos and a standard deviation of 1.9 kilos. Test the claim of the store at the 1% significance level.

X - the weights of the farmer’s packages;

σ unknown - population standard deviation

$n = 50$ (large sample)

$\bar{x} = 19.4\ \text{kg}$ (sample mean)

$s = 1.9\ \text{kg}$ - sample standard deviation

$\alpha = 1\% = 0.01$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{n - 1} = \frac{19.4 - 20}{1.9}\sqrt{50 - 1} \approx -2.21$$

$H_0(\mu = 20)$ vs $H_1(\mu < 20)$

left-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;2\alpha}]$

$t_{n-1;2\alpha} = t_{50-1;2 \cdot 0.01} = t_{49;0.02} \approx 2.4 \Rightarrow C = (-\infty, -2.4]$

Since $t \approx -2.21 \notin C$, we accept $H_0(\mu = 20)$ - the claim of the farmer

Notice:

Since the sample is large, the rejection region can also be $C = (-\infty, -z_\alpha]$, where $\Phi(z_\alpha) = \frac{1 - 2\alpha}{2}$

For α = 0.01 we have $\Phi(z_\alpha) = \frac{1 - 2 \cdot 0.01}{2} = 0.49 \Rightarrow z_\alpha = 2.325 \approx t_{\infty;0.02}$
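The arithmetic of Example A, with both possible critical values, can be verified as follows. The sketch uses only numbers stated in the example (x̄ = 19.4, μ₀ = 20, s = 1.9, n = 50, and the critical values 2.4 and 2.325); the variable names are illustrative.

```python
import math

x_bar, mu0, s, n = 19.4, 20.0, 1.9, 50

# Test statistic for unknown sigma, with s computed using divisor n.
t = (x_bar - mu0) / s * math.sqrt(n - 1)

T_CRIT = 2.4    # t_{49; 0.02}, approximate value from the t-table
Z_CRIT = 2.325  # z_alpha for Phi(z_alpha) = 0.49 (large-sample alternative)

# Left-tailed test: reject H0 when t <= -critical value.
reject_with_t = t <= -T_CRIT
reject_with_z = t <= -Z_CRIT

print(f"t = {t:.2f}")
print(f"t-table: {'reject' if reject_with_t else 'accept'} H0")
print(f"z-table: {'reject' if reject_with_z else 'accept'} H0")
```

Both critical values lead to the same decision here: at the 1% level the sample does not support the store's claim that the packages are underweight.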

(21)

Example B: A journalist claims that all adults in her city spend an average of 2 hours or more per week on jogging. A researcher wanted to test this claim. (S)he took a sample of 25 adults from that city and asked them about the time they spend per week on jogging. Their responses are as follows:

30 min, 1h, 20 min, 0 min, 1h15min, 45 min, 1h, 2h, 2h15min, 3h, 0 min, 30 min, 1h45min, 1h30min, 2h30min, 1h, 2h30min, 3h, 3h30min, 1h, 0 min, 15 min, 20 min, 45 min, 1h15min.

Assume that the times spent on jogging per week of all adults from this city are normally distributed. Using the 10% significance level, test the claim of the journalist.

X - the times spent on jogging per week of all adults from the city; $X : N(\mu, \sigma^2)$

$n = 25$ (small sample)

$\bar{x} = ?$

$$\bar{x} = \frac{30 + 60 + 20 + 0 + 75 + 45 + 60 + 120 + 135 + 180 + \cdots + 20 + 45 + 75}{25}\ \text{min} = 76.6\ \text{min}$$

$s = ?$, $\hat{s} = ?$

$$s^2 = \frac{30^2 + 60^2 + 20^2 + 0^2 + 75^2 + 45^2 + 60^2 + 120^2 + \cdots + 20^2 + 45^2 + 75^2}{25}\ \text{min}^2 - 76.6^2\ \text{min}^2 = 3659.44\ \text{min}^2$$

or

$$s^2 = \frac{(30 - 76.6)^2 + (60 - 76.6)^2 + (20 - 76.6)^2 + \cdots + (45 - 76.6)^2 + (75 - 76.6)^2}{25}\ \text{min}^2 = 3659.44\ \text{min}^2$$

$$\hat{s}^2 = \frac{(30 - 76.6)^2 + (60 - 76.6)^2 + (20 - 76.6)^2 + \cdots + (45 - 76.6)^2 + (75 - 76.6)^2}{24}\ \text{min}^2 \approx 3811.92\ \text{min}^2$$

(22)

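$\bar{x} = 76.6\ \text{min}$ (sample mean)

$s = \sqrt{3659.44\ \text{min}^2} = 60.49\ \text{min}$ - sample standard deviation

$\hat{s} = \sqrt{3811.92\ \text{min}^2} = 61.74\ \text{min}$ - improved sample standard deviation

$\alpha = 10\% = 0.10$ (significance level)

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{n - 1} = \frac{76.6 - 120}{60.49}\sqrt{25 - 1} \approx -3.515$$

or

$$t = \frac{\bar{x} - \mu_0}{\hat{s}}\sqrt{n} = \frac{76.6 - 120}{61.74}\sqrt{25} \approx -3.515$$

$H_0(\mu = 120)$ vs $H_1(\mu < 120)$

or

$H_0(\mu \ge 120)$ vs $H_1(\mu < 120)$

left-tailed test;

the rejection region: $C = (-\infty, -t_{n-1;2\alpha}]$

$t_{n-1;2\alpha} = t_{25-1;2 \cdot 0.10} = t_{24;0.20} = 1.318 \Rightarrow C = (-\infty, -1.318]$

Since $t \approx -3.515 \in C$, we reject $H_0(\mu = 120)$ (or $H_0(\mu \ge 120)$) - the claim of the journalist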
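Example B starts from raw data, so the whole chain (mean, both variances, test statistic, decision) can be recomputed. The sketch below converts the 25 responses to minutes and uses the critical value 1.318 quoted for 24 degrees of freedom; the variable names are illustrative.

```python
import math

# The 25 responses from Example B, converted to minutes.
minutes = [30, 60, 20, 0, 75, 45, 60, 120, 135, 180,
           0, 30, 105, 90, 150, 60, 150, 180, 210, 60,
           0, 15, 20, 45, 75]

n = len(minutes)
x_bar = sum(minutes) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in minutes)

s2 = sum_sq_dev / n            # sample variance (divisor n)
s2_hat = sum_sq_dev / (n - 1)  # improved sample variance (divisor n - 1)

mu0 = 120  # the journalist's claim: 2 hours = 120 minutes
t_from_s = (x_bar - mu0) / math.sqrt(s2) * math.sqrt(n - 1)
t_from_s_hat = (x_bar - mu0) / math.sqrt(s2_hat) * math.sqrt(n)

T_CRIT = 1.318                 # t_{24; 0.20}, taken from the slide
reject = t_from_s <= -T_CRIT   # left-tailed test

print(f"n = {n}, mean = {x_bar:.1f} min")
print(f"s^2 = {s2:.2f}, s_hat^2 = {s2_hat:.2f}")
print(f"t = {t_from_s:.3f} (or {t_from_s_hat:.3f}), reject H0: {reject}")
```

Both forms of the statistic agree, and the value is far inside the rejection region, so the sample contradicts the claim of an average of two hours or more.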

(23)

Example C: A recent study claimed that the mean yield per apple plant of sort „G“ is 60 kilos. A researcher has measured the yields of 55 apple plants of sort „G“ and obtained the following:

Yields (kg) per plant   [51, 53)   [53, 55)   [55, 57)   [57, 59)   [59, 61)   [61, 63]
Number of plants        6          9          10         12         10         8

Test the claim of the recent study at the 10% significance level.

X - the yields per apple plant of sort „G“

$n = 55$ (large sample)

$\bar{x} = ?$

$$\bar{x} = \frac{6 \cdot 52 + 9 \cdot 54 + 10 \cdot 56 + 12 \cdot 58 + 10 \cdot 60 + 8 \cdot 62}{55}\ \text{kg} \approx 57.27\ \text{kg}$$

$s = ?$, $\hat{s} = ?$

$$s^2 = \frac{6 \cdot 52^2 + 9 \cdot 54^2 + 10 \cdot 56^2 + 12 \cdot 58^2 + 10 \cdot 60^2 + 8 \cdot 62^2}{55}\ \text{kg}^2 - 57.27^2\ \text{kg}^2$$

or

$$s^2 = \frac{6(52 - 57.27)^2 + 9(54 - 57.27)^2 + 10(56 - 57.27)^2 + 12(58 - 57.27)^2 + 10(60 - 57.27)^2 + 8(62 - 57.27)^2}{55}\ \text{kg}^2$$

or

$$\hat{s}^2 = \frac{6(52 - 57.27)^2 + 9(54 - 57.27)^2 + 10(56 - 57.27)^2 + 12(58 - 57.27)^2 + 10(60 - 57.27)^2 + 8(62 - 57.27)^2}{54}\ \text{kg}^2$$

$\hat{s}^2 = 9.98\ \text{kg}^2$

$s^2 \approx 9.80\ \text{kg}^2$
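For grouped data each class is represented by its midpoint and weighted by its frequency. The sketch below recomputes the mean and both variances of Example C this way; the midpoints 52, 54, ..., 62 are the ones used in the formulas above, and the variable names are illustrative.

```python
# Class midpoints (kg) and numbers of plants from the table in Example C.
midpoints = [52, 54, 56, 58, 60, 62]
counts = [6, 9, 10, 12, 10, 8]

n = sum(counts)
x_bar = sum(f * m for f, m in zip(counts, midpoints)) / n

# Weighted sum of squared deviations from the (unrounded) mean.
sum_sq_dev = sum(f * (m - x_bar) ** 2 for f, m in zip(counts, midpoints))

s2 = sum_sq_dev / n            # sample variance (divisor n)
s2_hat = sum_sq_dev / (n - 1)  # improved sample variance (divisor n - 1)

print(f"n = {n}, mean = {x_bar:.2f} kg")
print(f"s^2 = {s2:.2f} kg^2, s_hat^2 = {s2_hat:.2f} kg^2")
```

Note that the code keeps the mean unrounded. Plugging the rounded value 57.27 into the shortcut formula (mean of squares minus squared mean) shifts the variance noticeably, because that formula subtracts two large, nearly equal numbers.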