Types of Data


(1)

Types of Data

(2)

Examples of types of data

The values (range) of a variable may be continuous or discrete (categorical):

Qualitative, nominal
 continuous: not possible
 discrete (categorical): sex, country, place of birth, profession, blood group

Qualitative, ordinal
 continuous: subjective statements about the intensity of different things (brightness, voice)
 discrete (categorical): very good - good - acceptable - wrong - very wrong; low - normal - high; etc.

Quantitative
 continuous: temperature, concentration
 discrete (categorical): number of hospitals, number of children, other counts

(3)

 A data set contains information on a number of individuals.

 Individuals are objects described by a set of data; they may be people, animals or things. For each individual, the data give values for one or more variables.

 A variable describes some characteristic of an individual, such as a person's age, height, gender or salary.

(4)

The data-table

 The data of one experimental unit ("person") must be in one record (row).

 The answers to the same question (one variable) must be in the same field of every record (column).

 The variables (fields) are generally named by an identifier of up to 8 characters (e.g. in SPSS).

Number SEX AGE ....

1 1 20 ....

2 2 17 ....

. . . ...
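The row/column rule above can be sketched in a few lines of Python. This is only an illustration: the field names and the two records come from the table on this slide, while the helper names `column` and `valid_name` are hypothetical, and the meaning of the SEX codes is not given in the slides.

```python
# One record (row) per experimental unit, one field (column) per variable.
# Values are taken from the slide's table; the SEX coding is not specified there.
records = [
    {"NUMBER": 1, "SEX": 1, "AGE": 20},
    {"NUMBER": 2, "SEX": 2, "AGE": 17},
]


def column(rows, field):
    """Return all values of one variable, i.e. one column of the data table."""
    return [row[field] for row in rows]


def valid_name(name, max_len=8):
    """Check the naming rule mentioned on the slide: at most 8 characters."""
    return 0 < len(name) <= max_len


print(column(records, "AGE"))
print(all(valid_name(name) for name in records[0]))
```

Keeping every record in the same shape is what makes it possible to pull out a whole variable with a single call such as `column(records, "AGE")`.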

(5)

Statistical software packages

 SPSS

 STATGRAPHICS

 SAS

 STATA

 SIGMASTAT

 BMDP

 SOLO

 CSS/STATISTICA

 STATXACT

 (EXCEL)

(6)

Distribution

 The distribution of a categorical variable describes what values it takes and how often it takes these values.

 The distribution of a continuous variable describes what values it takes and how often these values fall into an interval.

 Describing distributions with graphs:

 Categorical variables: bar chart, pie chart

 Continuous variable: histogram

(7)

The distribution of a categorical variable, example

Education categories:

1: < 8 elementary
2: 8 elementary
3: secondary school
4: high school or university

Frequencies:

Education         Frequency   Percent
< 8 elementary    4           20.0
8 elementary      2           10.0
secondary         9           45.0
high school       5           25.0
Total             20          100.0
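A frequency table like the one above can be computed as follows. This is a minimal sketch: the raw answers are not given in the slides, so the coded sample `education` is rebuilt from the published frequencies (4, 2, 9, 5), and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

labels = {1: "< 8 elementary", 2: "8 elementary", 3: "secondary", 4: "high school"}

# Assumption: a coded sample reconstructed from the slide's frequencies.
education = [1] * 4 + [2] * 2 + [3] * 9 + [4] * 5


def frequency_table(values):
    """Map each category code to (frequency, percent of all observations)."""
    counts = Counter(values)
    total = len(values)
    return {code: (counts[code], 100.0 * counts[code] / total)
            for code in sorted(counts)}


for code, (freq, pct) in frequency_table(education).items():
    print(f"{labels[code]:<16}{freq:>4}{pct:>8.1f}")
```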

[Figure: bar chart of Education (frequency per category) and pie chart of Education (< 8 elementary 20.0%, 8 elementary 10.0%, secondary 45.0%, high school 25.0%)]

(8)

The distribution of a continuous variable, example

Values:

20.00 17.00 22.00 28.00 9.00 5.00 26.00 60.00 35.00 51.00 17.00 50.00 9.00 10.00 19.00 22.00 25.00 29.00 27.00 19.00

Categories:

0-10, 10-20, 20-30, 30-40, 40-50, 50-60
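Counting how many values fall into each interval might look like the sketch below. The ages and the interval edges are the ones listed on this slide. The slide does not say which interval a boundary value such as 10 or 20 belongs to, so the code assumes intervals that are open on the left and closed on the right (so 60 falls into 50-60); the function name `bin_counts` is hypothetical.

```python
ages = [20, 17, 22, 28, 9, 5, 26, 60, 35, 51,
        17, 50, 9, 10, 19, 22, 25, 29, 27, 19]
edges = [0, 10, 20, 30, 40, 50, 60]


def bin_counts(values, edges):
    """Count values per interval (lo, hi]; the first interval also includes lo.

    Assumption: right-closed intervals. Other conventions shift boundary values
    into the neighbouring interval and give slightly different counts.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            if lo < v <= hi or (i == 0 and v == lo):
                counts[i] += 1
                break
    return counts


for i, count in enumerate(bin_counts(ages, edges)):
    print(f"{edges[i]}-{edges[i + 1]}: {count}")
```

The resulting counts are the bar heights of the histogram; the choice of interval width and boundary convention changes the picture, which is why it is worth stating.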

[Figure: histogram of age in years with the intervals 0-10 to 50-60; vertical axis: frequency]

(9)

Histogram

(Body weights)

[Figure: histogram of current body weight (kg), x-axis labelled from 32.5 to 87.5; Std. Dev = 8.74, Mean = 57.0, N = 1090]

(10)

The overall pattern of a distribution:

 The center, spread and shape describe the overall pattern of a distribution.

 Some distributions have a simple shape, such as symmetric or skewed. Not all distributions have a simple overall shape, especially when there are few observations.

 A distribution is skewed to the right if the right side of the histogram extends much further out than the left side.

(11)

Outliers

 Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them (real data, typing mistake or other).

[Figure: histogram, x-axis labelled from 40.0 to 110.0, frequency axis from 0 to 10; Std. Dev = 13.79, Mean = 62.1, N = 43]

(12)

Describing distributions with numbers

 Measures of central tendency:

 the mean, the mode and the median are three commonly used measures of the center.

 Measures of variability:

 the range, the quartiles, the variance and the standard deviation are the most commonly used measures of variability.

 Measures of an individual:

 rank, z score

(13)

Measures of central tendency

 Mean: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$

 Mode: the most frequent value

 Median: the value that half the members of the sample fall below and half above. In other words, it is the middle number when the sample elements are written in numerical order.

(14)

Example

 The grades of a test written by 11 students were the following:

 100 100 100 63 62 60 12 12 6 2 0.

 A student indicated that the class average was 47, which he felt was rather low. The professor stated that nevertheless there were more 100s than any other grade. The department head said that the middle grade was 60, which was not unusual.

(15)

Results

 The mean is 517/11=47,

 the mode is 100,

 the median is 60.
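These three results can be checked with Python's standard `statistics` module. The grades are the ones from the example; the variable names are only illustrative.

```python
import statistics

grades = [100, 100, 100, 63, 62, 60, 12, 12, 6, 2, 0]

mean = statistics.mean(grades)      # sum of the grades divided by their number
mode = statistics.mode(grades)      # the most frequent grade
median = statistics.median(grades)  # the middle grade of the sorted list

print(sum(grades), len(grades), mean, mode, median)
```

All three people in the example are right: each of them quotes a different measure of the center of the same data.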

(16)

Relationships among the mean(m), the median(M) and the mode(Mo)

 A symmetrical curve: m = M = Mo

 A curve skewed to the right: typically Mo < M < m

 A curve skewed to the left: typically m < M < Mo
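As a rough illustration of these relationships, the mean and the median of a sample can be compared: a mean above the median hints at a right skew, a mean below it at a left skew. The sketch below is a rule of thumb under that assumption, not a formal test of skewness; the function name `skew_hint` is hypothetical, and the data are the 20 ages from the histogram example.

```python
import statistics

ages = [20, 17, 22, 28, 9, 5, 26, 60, 35, 51,
        17, 50, 9, 10, 19, 22, 25, 29, 27, 19]


def skew_hint(values):
    """Compare mean and median; this is only a heuristic for the direction of skew."""
    m = statistics.mean(values)
    med = statistics.median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "symmetric"


print(statistics.mean(ages), statistics.median(ages), skew_hint(ages))
```

With few observations the comparison can be misleading, so it is best read together with the histogram.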

(17)

Measures of variability (dispersion)

 The range is the difference between the largest number (maximum) and the smallest number (minimum).

 The variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

 The standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

(18)

Example

Var 1           Var 2           Var 3
2               12              20
3               13              30
4               14              40
5               15              50
8               18              80
9               19              90
9               19              90
10              20              100
40              50              400
44              54              440
44              54              440
62              72              620

mean=20         mean=30         mean=200
SD=21.0971777   SD=21.0971777   SD=210.971777
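The table can be reproduced with the standard `statistics` module, whose `stdev` function divides by n - 1. In the table, Var 2 equals Var 1 plus 10 and Var 3 equals Var 1 times 10, so the sketch below builds them that way; the helper name `summarize` is hypothetical.

```python
import statistics

var1 = [2, 3, 4, 5, 8, 9, 9, 10, 40, 44, 44, 62]
var2 = [x + 10 for x in var1]  # every value shifted by a constant
var3 = [x * 10 for x in var1]  # every value multiplied by a constant


def summarize(values):
    """Return the mean, the range and the sample standard deviation (n - 1)."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


for name, values in (("Var 1", var1), ("Var 2", var2), ("Var 3", var3)):
    print(name, summarize(values))
```

Adding a constant moves the mean but leaves the standard deviation unchanged, while multiplying by a constant multiplies both the mean and the standard deviation by that constant.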

(19)

Displaying data

 Categorical data

 bar chart

 pie chart

 Continuous data

 dot plot

 histogram

 box-whisker plot

 mean-standard deviation plot

 scatterplot

(20)

Bar chart and histogram

Discrete: [Figure: bar chart of SEX (female, male); vertical axis: frequency]

Continuous: [Figure: histogram of age in years; vertical axis: frequency]

(21)

Box-plot

[Figure: box plot of current body weight (kg), grouped by the answer to "Has there been a depressed period in your life?" (no / yes; N = 595 / 474)]

(22)

How to create a box-plot

 We need

 Median (P50%), P25% and P75%

 Calculate the differences

 d1 = P50% - P25% and

 d2 = P75% - P50%

 Then calculate 1.5 x d1 and 1.5 x d2.

 Then draw the plot
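The numbers needed for the plot can be computed as in the sketch below. Several things here are assumptions rather than the slides' own method: the data are the 20 ages from the histogram example, the percentiles come from `statistics.quantiles` with its default "exclusive" method (statistical packages may use other definitions and give slightly different quartiles), and the function name `box_plot_numbers` is hypothetical. The slides do not state how 1.5 x d1 and 1.5 x d2 are used when drawing, so the code stops at computing them.

```python
import statistics

ages = [20, 17, 22, 28, 9, 5, 26, 60, 35, 51,
        17, 50, 9, 10, 19, 22, 25, 29, 27, 19]


def box_plot_numbers(values):
    """Return the quartiles and the differences d1 and d2 described on the slide."""
    # n=4 yields the three cut points P25%, P50% (the median) and P75%.
    p25, p50, p75 = statistics.quantiles(values, n=4)
    d1 = p50 - p25
    d2 = p75 - p50
    return {
        "P25": p25, "P50": p50, "P75": p75,
        "d1": d1, "d2": d2,
        "1.5*d1": 1.5 * d1, "1.5*d2": 1.5 * d2,
    }


for key, value in box_plot_numbers(ages).items():
    print(key, value)
```

Unequal values of d1 and d2 indicate that the median does not sit in the middle of the box, which is how the plot shows asymmetry.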

(23)

Mean and standard deviation

[Figure: mean ± 1 SD plot of current body weight (kg), grouped by the answer to "Has there been a depressed period in your life?" (no / yes; N = 595 / 474); vertical axis from 40 to 70]

(24)

Box-plot vs Mean±SD plot

 Box-plot

 Gives information about the symmetry of the distribution

 Mean and standard deviation plot

 Can be used for normally distributed data
