Probability Distribution for a Discrete Random Variable

1. INTRODUCTION TO PROBABILITY DISTRIBUTIONS

1.4. Probability Distribution

1.4.1. Probability Distribution for a Discrete Random Variable

The probability distribution of a discrete random variable (X) assigns a probability 0 ≤ P(x) ≤ 1 to each distinct value (x) of the variable. The probabilities of all possible values must sum to 1. Because a discrete random variable takes no value between two adjacent integers, its probability distribution is best presented graphically as a stick plot: the values of X are placed along the horizontal axis and the probabilities of the outcomes along the vertical axis.

The expected value of a discrete random variable is the weighted average of all possible values (x) of the variable, where the weights are the probabilities P(x):

$$\mu_X = E(X) = \sum_x x\,P(x)$$

The variance of a discrete random variable is:

$$\sigma_X^2 = \mathrm{Var}(X) = \sum_x (x-\mu_X)^2 P(x) = \sum_x x^2 P(x) - \mu_X^2$$

EXAMPLE 1.4.1

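Assume that 5 (n) cows are calving. The probability distribution P(x) and the cumulative distribution F(x) of the number (x) of female newborn calves are as follows:

$$P(x) = \binom{5}{x}\,0.5^{x}\,0.5^{5-x} = \frac{1}{32}\binom{5}{x}, \qquad F(x) = \sum_{t \le x} P(t), \qquad x = 0, 1, \dots, 5$$

The mean and the variance are calculated as:

$$\mu = E(X) = 0 \cdot \frac{1}{32} + 1 \cdot \frac{5}{32} + 2 \cdot \frac{10}{32} + 3 \cdot \frac{10}{32} + 4 \cdot \frac{5}{32} + 5 \cdot \frac{1}{32} = \frac{80}{32} = 2.5$$

$$\sigma^2 = \mathrm{Var}(X) = 0^2 \cdot \frac{1}{32} + 1^2 \cdot \frac{5}{32} + 2^2 \cdot \frac{10}{32} + 3^2 \cdot \frac{10}{32} + 4^2 \cdot \frac{5}{32} + 5^2 \cdot \frac{1}{32} - 2.5^2 = 1.25$$

EXAMPLE 1.4.2

[…] selected for planting. The probability distribution of the number (x) of planted apple trees, with the probability P(x) of each outcome, is as follows: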

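$$P(1) = \frac{3}{10}, \qquad P(2) = \frac{6}{10}, \qquad P(3) = \frac{1}{10}$$

The mean and the variance are calculated as:

$$\mu = E(X) = 1 \cdot \frac{3}{10} + 2 \cdot \frac{6}{10} + 3 \cdot \frac{1}{10} = \frac{18}{10} = 1.8$$

$$\sigma^2 = \mathrm{Var}(X) = 1^2 \cdot \frac{3}{10} + 2^2 \cdot \frac{6}{10} + 3^2 \cdot \frac{1}{10} - 1.8^2 = 0.36$$

$$\sigma_X = 0.6$$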
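The definitions of E(X) and Var(X) given at the start of this section can be checked numerically. The following Python sketch is only an illustration: the function name `mean_and_variance` is hypothetical, and the code assumes that in EXAMPLE 1.4.1 each of the 5 calves is female with probability 0.5, independently of the others.

```python
from fractions import Fraction
from math import comb


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution given as {x: P(x)}."""
    if sum(dist.values()) != 1:
        raise ValueError("the probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())
    # Var(X) = sum of x^2 P(x) minus the squared mean
    variance = sum(x**2 * p for x, p in dist.items()) - mean**2
    return mean, variance


# Assumption: P(female) = 1/2 for each calf, so P(x) = C(5, x) / 2^5.
calves = {x: Fraction(comb(5, x), 2**5) for x in range(6)}

mu, var = mean_and_variance(calves)
print(float(mu), float(var))  # 2.5 1.25
```

Exact fractions are used so that the check "the probabilities sum to 1" is not disturbed by floating-point rounding.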

Observations from experiments can be classified and described by a certain probability distribution. It is important to be able to recognize the type of a discrete distribution. The most common probability distributions are the following:

• Uniform distribution: every outcome has the same probability, P(x) = 1/k, where k is the number of possible outcomes; the probability function is defined by this single parameter. The mean and the variance of a discrete uniform variable on the consecutive integers a, a + 1, …, b (so that k = b − a + 1) are

$$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a+1)^2 - 1}{12}$$
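As a quick check, the standard closed forms for the discrete uniform distribution, mean (a + b)/2 and variance ((b − a + 1)² − 1)/12, can be compared with direct summation over the outcomes. The sketch below uses a fair six-sided die (a = 1, b = 6) as a hypothetical example; the function name `uniform_moments` is not from the text.

```python
from fractions import Fraction


def uniform_moments(a, b):
    """Mean and variance of a discrete uniform variable on a, a+1, ..., b,
    computed by direct summation rather than by the closed forms."""
    values = range(a, b + 1)
    p = Fraction(1, len(values))  # every outcome has probability 1/k
    mean = sum(x * p for x in values)
    variance = sum((x - mean) ** 2 * p for x in values)
    return mean, variance


a, b = 1, 6  # hypothetical example: a fair die
mean, variance = uniform_moments(a, b)

# Compare with the closed-form expressions.
assert mean == Fraction(a + b, 2)
assert variance == Fraction((b - a + 1) ** 2 - 1, 12)
print(mean, variance)  # 7/2 35/12
```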

If an experiment has only two possible outcomes (‘success’ and ‘failure’), the trial is called a Bernoulli trial and the variable a binary (Bernoulli) variable. Such a trial is repeated several times (n) under identical conditions.

• Binomial distribution: the probability function for this dichotomous event is defined by two parameters: the number of trials (n) and the probability (p) of the preferred outcome (called ‘success’), which is the same in each trial. The trials are independent, as in sampling with replacement. The probability function of the variable X (0 ≤ x ≤ n), the number of occurrences of the preferred outcome, is:

$$P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$$

The mean and the variance of a binomial variable are

$$\mu = np, \qquad \sigma^2 = np(1-p)$$

• Hypergeometric distribution: the probability of ‘success’ is not constant and the trials are not independent, because sampling is done without replacement. It should be used when the population is relatively small. If a random sample is drawn from a very large population (n < 0.05N), the trials are approximately independent, the probabilities change very little, and the hypergeometric distribution approaches the binomial one. The variable X (max(0, n − N + k) ≤ x ≤ min(k, n)) represents the number of preferred outcomes (‘successes’) in a sample of n drawn from a set of size N (n < N) that contains k ‘successes’, so that the initial probability of ‘success’ is p = k/N. Its probability function is:

$$P(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$

EXAMPLE 1.4.3

A biologist is studying a new hybrid plant whose seeds have a germination probability of 0.85 (p).

The biologist plants four (n) seeds. What is the probability that exactly x seeds will germinate, and what are the expected number and the standard deviation of germinating seeds?

$$P(x) = \binom{4}{x}\,0.85^{x}\,0.15^{4-x}$$

$$\mu = 4 \cdot 0.85 = 3.4$$

$$\sigma = \sqrt{4 \cdot 0.85 \cdot 0.15} = \sqrt{0.51} = 0.7141$$

The mean and the variance in EXAMPLE 1.4.1 can also be calculated as:

$$\mu = 5 \cdot 0.5 = 2.5, \qquad \sigma^2 = 5 \cdot 0.5 \cdot 0.5 = 1.25$$
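The calculation in EXAMPLE 1.4.3 can be reproduced with a short Python sketch. It evaluates the standard binomial probability function C(n, x) p^x (1 − p)^(n − x) for n = 4 and p = 0.85, then obtains the mean and the standard deviation by direct summation rather than from the shortcut formulas. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 4, 0.85  # values from EXAMPLE 1.4.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"P({x}) = {prob:.4f}")

# Mean and variance by direct summation over all outcomes.
mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))
print(round(mean, 4), round(sqrt(variance), 4))  # 3.4 0.7141
```

The summed values agree with np = 3.4 and √(np(1 − p)) = 0.7141 up to floating-point rounding.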

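The mean and the variance of a hypergeometric variable are

$$\mu = n\,\frac{k}{N} = np, \qquad \sigma^2 = n\,\frac{k}{N}\left(1-\frac{k}{N}\right)\frac{N-n}{N-1} = np(1-p)\,\frac{N-n}{N-1}$$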

In other distributions the number of trials n is not fixed; instead, they give the probability that a particular number of successes (k) is reached exactly on the n^th trial.

• Negative binomial distribution: gives the probability that the k^th success occurs on the n^th trial (n = k, k + 1, …):

$$P(n) = \binom{n-1}{k-1} p^{k} (1-p)^{n-k}, \qquad \mu = \frac{k}{p}, \qquad \sigma^2 = \frac{k(1-p)}{p^2}$$

• Geometric distribution: a special case of the negative binomial distribution, which gives the probability that the first success (k = 1) occurs on the n^th trial:

$$P(n) = p\,(1-p)^{n-1}, \qquad \mu = \frac{1}{p}, \qquad \sigma^2 = \frac{1-p}{p^2}$$
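The relation between these two distributions can be illustrated numerically. The sketch below uses the standard forms P(n) = C(n − 1, k − 1) p^k (1 − p)^(n − k) for the negative binomial and P(n) = p (1 − p)^(n − 1) for the geometric distribution. The values k = 3 and p = 0.4 are hypothetical, and the infinite sums are truncated at a point where the remaining tail is negligible.

```python
from math import comb, isclose


def negative_binomial_pmf(n, k, p):
    """Probability that the k-th success occurs on trial n (n = k, k+1, ...)."""
    if n < k:
        return 0.0
    return comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)


def geometric_pmf(n, p):
    """Probability that the first success occurs on trial n (n = 1, 2, ...)."""
    return p * (1 - p) ** (n - 1)


k, p = 3, 0.4  # hypothetical illustration values

# With k = 1 the negative binomial reduces to the geometric distribution.
assert all(
    isclose(negative_binomial_pmf(n, 1, p), geometric_pmf(n, p))
    for n in range(1, 50)
)

# Truncated sums; the tail beyond 400 trials is negligible for p = 0.4.
trials = range(k, 400)
mean = sum(n * negative_binomial_pmf(n, k, p) for n in trials)
variance = sum((n - mean) ** 2 * negative_binomial_pmf(n, k, p) for n in trials)
print(round(mean, 6), round(variance, 6))  # 7.5 11.25
```

The results match k/p = 7.5 and k(1 − p)/p² = 11.25.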

EXAMPLE 1.4.4

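In a pen there are 2 (k) ill and 10 (N − k) healthy pigs, from which 3 (n) pigs are selected at random.

What are the probability distribution, the expected value and the standard deviation of the number of ill animals in the random selection?

$$P(x) = \frac{\binom{2}{x}\binom{10}{3-x}}{\binom{12}{3}}, \qquad x = 0, 1, 2$$

$$\mu = 3 \cdot \frac{2}{12} = 0.5$$

$$\sigma = \sqrt{3 \cdot \frac{2}{12}\left(1-\frac{2}{12}\right)\frac{12-3}{12-1}} = \sqrt{0.34} \approx 0.58$$

The mean and the variance in EXAMPLE 1.4.2 can also be calculated as

$$\mu = 3 \cdot \frac{3}{5} = 1.8, \qquad \sigma^2 = 3 \cdot \frac{3}{5}\left(1-\frac{3}{5}\right)\frac{5-3}{5-1} = 0.36$$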
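EXAMPLE 1.4.4 can be verified by enumerating the hypergeometric probabilities directly. The sketch below uses the standard probability function C(k, x) C(N − k, n − x) / C(N, n) with exact fractions. The function names are illustrative. The second call uses N = 5, k = 3, n = 3, values that are an assumption inferred from the stated results μ = 1.8 and σ² = 0.36 of EXAMPLE 1.4.2, not quoted from it.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, k, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items of which k are successes."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


def hypergeom_moments(N, k, n):
    """Mean and variance by direct summation over the support."""
    support = range(max(0, n - (N - k)), min(k, n) + 1)
    pmf = {x: hypergeom_pmf(x, N, k, n) for x in support}
    mean = sum(x * p for x, p in pmf.items())
    variance = sum(x**2 * p for x, p in pmf.items()) - mean**2
    return pmf, mean, variance


# EXAMPLE 1.4.4: 2 ill pigs among 12, sample of 3.
pmf, mean, variance = hypergeom_moments(N=12, k=2, n=3)
for x, p in pmf.items():
    print(f"P({x}) = {p} = {float(p):.4f}")
print(mean, variance)  # 1/2 15/44

# Assumed parameters for EXAMPLE 1.4.2 (inferred, see the lead-in).
_, mean2, variance2 = hypergeom_moments(N=5, k=3, n=3)
print(float(mean2), float(variance2))  # 1.8 0.36
```

Here 15/44 ≈ 0.34, so the standard deviation is about 0.58.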

EXAMPLE 1.4.5

In a herd, the pregnancy rate of the cows is approximately 40%. Let n denote the random variable that represents the number of cows that must be examined to find the first pregnant one. What is the probability distribution of the random variable n, and what is the expected number of cows to be examined?

$$P(n) = 0.4 \cdot 0.6^{\,n-1}, \qquad \mu = \frac{1}{0.4} = 2.5$$

For rare events that occur infrequently in time or space, that is, events with a small probability of occurrence, the appropriate distribution is:

• Poisson distribution: the probability distribution of the number of rare events in a fixed time, area, volume or any other quantity that can be subdivided into smaller and smaller intervals, where the average rate over the designated interval (μ = λ) is known from long-term experience. The probability function, the mean and the variance of the distribution are:

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots, \qquad \mu = \lambda, \qquad \sigma^2 = \lambda$$

where e is the base of the natural logarithm and λ is the only parameter of the distribution. A binomial distribution can be approximated by a Poisson distribution with λ = np if n is large (n > 100) and p is small (np < 10).
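The quality of this approximation can be inspected numerically. The following sketch compares the binomial and Poisson probability functions for the hypothetical values n = 200 and p = 0.02 (so that n > 100 and np = 4 < 10); these numbers are chosen for illustration only and do not come from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical values satisfying n > 100 and np < 10.
n, p = 200, 0.02
lam = n * p  # Poisson parameter matched to the binomial mean

# Compare the two probability functions for small x, where almost all
# of the probability mass lies.
max_diff = 0.0
for x in range(13):
    b = binomial_pmf(x, n, p)
    q = poisson_pmf(x, lam)
    max_diff = max(max_diff, abs(b - q))
    print(f"x={x:2d}  binomial={b:.4f}  poisson={q:.4f}")

print(f"largest absolute difference: {max_diff:.4f}")
```

The comparison is restricted to small x because the probabilities beyond that range are negligible for λ = 4, and because very large factorials cannot be converted to floating-point numbers.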

EXAMPLE 1.4.6

Approximately 4.0% of untreated Jonathan apples have bitter pit disease. Let n denote the random variable that represents the number of apples that must be examined to find the first one with bitter pit. What is the probability distribution of the random variable n, and what is the expected number of apples that must be examined?

$$P(n) = 0.04 \cdot 0.96^{\,n-1}, \qquad \mu = \frac{1}{0.04} = 25$$

EXAMPLE 1.4.7

There are 100 potato plants in a garden, on which 50 Colorado potato beetles land at random. What is the expected number of beetles per plant, and what is the probability that a plant carries X = 0, 1, 2, … beetles?

$$\mu = \lambda = 0 \cdot 0.6065 + 1 \cdot 0.3033 + \dots = \frac{50}{100} = 0.5 \ \text{beetles/plant}$$

| Number of beetles (X) | P(X) | Estimated number of plants |
|---|---|---|
| 0 | 0.6065 | 61 |
| 1 | 0.3033 | 30 |
| 2 | 0.0758 | 8 |
| 3 | 0.0126 | 1 |
| 4 | 0.0016 | 0 |
| 5 | 0.0002 | 0 |
| more than 5 | 0.0000 | 0 |
| Sum | 1 | 100 |
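The probabilities in EXAMPLE 1.4.7 follow from the standard Poisson probability function e^(−λ) λ^x / x! with λ = 50/100 = 0.5. A minimal Python sketch that recomputes them, together with the unrounded expected number of plants (100 · P(X)), is given below; the variable and function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


plants, beetles = 100, 50  # values from EXAMPLE 1.4.7
lam = beetles / plants     # average number of beetles per plant

rows = []
for x in range(6):
    prob = poisson_pmf(x, lam)
    rows.append((str(x), prob, plants * prob))

# Everything above 5 beetles is collected in one tail category.
tail = 1 - sum(prob for _, prob, _ in rows)
rows.append(("more than 5", tail, plants * tail))

for label, prob, expected in rows:
    print(f"{label:>11}  {prob:.4f}  {expected:6.2f}")
```

Rounding the last column to whole plants gives the estimated numbers of plants in the example.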
