ST 371 (IX): Theories of Sampling Distributions

1 Sample, Population, Parameter and Statistic
The major use of inferential statistics is to use information from a sample
to infer characteristics about a population.
A population is the complete collection of subjects to be studied; it contains all subjects of interest. A sample is a part of the population of interest,
a sub-collection selected from a population. A parameter describes a characteristic of a population, while a statistic describes a characteristic of a
sample. In general, we will use a statistic to infer the value of a parameter.
Unbiased Sample: A sample is unbiased if every individual or element in the population has an equal chance of being selected. Next we discuss several examples that occur in survey sampling.
1. Survey in presidential election.
(a) Option I: Call all registered voters on the phone and ask them who
they will vote for. Although this would provide a very accurate
result, it would be a very tedious and time consuming project.
(b) Option II: Call 4 registered voters, 1 in each time zone, and ask
them who they will vote for. Although this is a very easy task, the
results would not be very reliable.
(c) Option III: Randomly select 20,000 registered voters and poll them.
The population of interest here is all registered voters, and the
parameter is the percentage of them that will vote for a candidate.
The sample is the 20,000 registered voters that were polled, and the
statistic is the percentage of them that will vote for a candidate.
2. Kathy wants to know how many students in her city use the internet
for learning purposes. She used an email poll. Based on the replies to her poll, she found that 83% of those surveyed used the internet. Kathy's sample is biased, since an email poll reaches only students who already use the internet. She should have randomly selected a few schools and colleges in the city to conduct the survey.
3. Another classic example of a biased sample and the misleading results
it produced occurred in 1936. In the early days of opinion polling, the
American Literary Digest magazine collected over two million postal
surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from
readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an overrepresentation of individuals who were rich, who, as a group, were more
likely to vote for the Republican candidate. In contrast, a poll of only
50 thousand citizens selected by George Gallup’s organization successfully predicted the result, leading to the popularity of the Gallup poll.
Conclusion: To use a sample to make inferences about a population, the
sample should be representative of the population (unbiased).
2 Statistics and their Distributions
A statistic is a random variable, denoted by an upper case letter, whose value can be computed from sample data. We often use a statistic to infer the value of a parameter. Examples include
• Measures of location:
– Suppose we observe n realizations of a random variable X: x1, · · · , xn. The sample mean is x̄ = (1/n) ∑_{i=1}^n x_i. In contrast, the population mean is E(X) = µ.
– The sample median: let x(1), · · · , x(n) denote the ordered values. If n is odd, then x̃ = x((n+1)/2). If n is even, x̃ = (1/2)[x(n/2) + x(n/2+1)]. In contrast, the population median is µ̃ = F_X^{−1}(0.5).
• Measure of variability: the sample variance
S² = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)².
Note that the population variance is
σ² = V(X) = E(X − µ)².
• Measure of contrasts: Consider random samples from two populations, {x1, · · · , xn} and {y1, · · · , ym}. For example, in a randomized clinical trial the quantity of interest may be the difference in quality of life (QOL), survival time, or cure rate between patients on the two treatment arms, estimated by T = x̄ − ȳ. The corresponding contrast between the two populations is
µ_X − µ_Y = E(X) − E(Y).
Each statistic is a random variable and has a probability distribution.
The probability distribution of a statistic is referred to as its sampling distribution. The sampling distribution depends not only on the population
distribution but also on the method of sampling. The most widely used
sampling method is random sampling with replacement.
The random variables X1 , · · · , Xn are said to form a random sample of
size n, or be independently identically distributed (i.i.d.), if
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.
Denote by µ and σ 2 the mean and variance of the random variable X.
The next theorem follows from the results on the distribution of a linear
combination that we shall discuss in Section 4.
Theorem on the distribution of the sample mean X̄:
1. E(X̄) = µ_X̄ = µ.
2. V(X̄) = σ²_X̄ = σ²/n.
3. σ_X̄ = σ/√n.
Example 1 Let X1 , · · · , X5 be a random sample from a normal distribution
with µ = 1.5 and σ = 0.35.
• Find P(X̄ ≤ 2.0).
• Find the variance of ∑_{i=1}^5 X_i.
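A quick numerical check (a minimal Python sketch using scipy; it relies only on the fact that X̄ ∼ N(µ, σ²/n) for normal samples):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 1.5, 0.35, 5

# X-bar ~ N(mu, sigma^2/n), so its standard deviation is sigma/sqrt(n)
print(norm.cdf(2.0, loc=mu, scale=sigma / sqrt(n)))  # P(X-bar <= 2.0) ~ 0.9993

# By independence, Var(sum of the X_i) = n * sigma^2
print(n * sigma**2)  # 0.6125
```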
Example 2 Service time for a certain bank transaction is a random variable having an exponential distribution with parameter λ. Suppose X1 and
X2 are service times for two independent customers. Consider the average
service time X̄ = (X1 + X2 )/2.
• Find the cdf of X̄.
• Find the pdf of X̄.
• Find the mean and variance of X̄.
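For intuition, a hedged Monte Carlo check (Python; λ = 1 is an illustrative value, since the problem leaves λ symbolic). The closed forms below follow from X1 + X2 ∼ Gamma(2, λ), so X̄ ∼ Gamma(2, 2λ): cdf 1 − e^{−2λt}(1 + 2λt), pdf 4λ²t e^{−2λt}, mean 1/λ, and variance 1/(2λ²).

```python
import numpy as np

lam = 1.0  # illustrative rate; the problem leaves lambda symbolic
rng = np.random.default_rng(0)

# Average service time of two independent Exp(lam) customers
xbar = rng.exponential(scale=1 / lam, size=(100_000, 2)).mean(axis=1)

t = 1.0
# Empirical cdf at t vs. the closed form 1 - exp(-2*lam*t) * (1 + 2*lam*t)
print(np.mean(xbar <= t), 1 - np.exp(-2 * lam * t) * (1 + 2 * lam * t))

# Sample mean and variance vs. 1/lam and 1/(2*lam^2)
print(xbar.mean(), xbar.var())
```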
3 Limit Theorems

3.1 Weak law of large numbers
Consider a sample of independent and identically distributed random variables X1, · · · , Xn. The relationship between the sample mean
X̄n = (X1 + · · · + Xn)/n
and the true mean of the Xi's, E(Xi) = µ, is a problem of pivotal importance
in statistics. Typically, µ is unknown and we would like to estimate µ based
on X̄n . The weak law of large numbers says that the sample mean converges
in probability to µ. This means that for a large enough sample size n, X̄n
will be close to µ with high probability.
The weak law of large numbers. Let X1, X2, · · · be a sequence of independent and identically distributed random variables, each having finite mean E(Xi) = µ. Then, for any ε > 0,
P(|X̄n − µ| ≥ ε) → 0 as n → ∞.   (3.1)
Example 3 A numerical study of the law of large numbers. We first simulate normal random variables from N (5, 1) with different sample sizes, then
calculate the difference between the sample mean and the population mean.
n        Bias: X̄n − µ
5        0.8323
20       -0.1339
500      0.0368
10000    0.0069
50000    -0.0092
We can see that X̄n based on a large n tends to be closer to µ than does
X̄n based on a small n.
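A sketch of the simulation behind this table (Python with numpy; the random seed is arbitrary, so the realized biases will differ from the values above):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 5  # population mean of N(5, 1)

for n in [5, 20, 500, 10_000, 50_000]:
    x = rng.normal(loc=mu, scale=1, size=n)
    print(n, x.mean() - mu)  # bias X-bar_n - mu shrinks as n grows
```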
Example 4 (optional) Application of Weak Law of Large Numbers: Monte
Carlo Integration. Suppose that we wish to calculate
I(f) = ∫_0^1 f(x) dx,
where the integration cannot be done by elementary means or evaluated
using tables of integrals. The most common approach is to use a numerical
method in which the integral is approximated by a sum; various schemes
and computer packages exist for doing this. Another method, called the
Monte Carlo method, works in the following way. Generate independent
uniform random variables on (0,1), that is, X1 , · · · , Xn , and compute
Î(f) = (1/n) ∑_{i=1}^n f(X_i).
By the law of large numbers, for large n, this should be close to E[f (X)],
which is simply
E[f(X)] = ∫_0^1 f(x) dx = I(f).
This simple scheme can easily be modified in order to change the range of
integration and in other ways. Compared to the standard numerical methods, it is not especially efficient in one dimension, but becomes increasingly
efficient as the dimensionality of the integral grows.
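As a concrete sketch (Python; the integrand f(x) = e^{−x²} is an illustrative choice, not from the notes, picked because it has no elementary antiderivative):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return np.exp(-x**2)  # no elementary antiderivative

n = 1_000_000
x = rng.uniform(0, 1, size=n)  # X_i ~ Uniform(0, 1)
print(f(x).mean())  # ~ 0.7468; the true value is (sqrt(pi)/2) * erf(1)
```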
3.2 Strong law of large numbers (optional)
The strong law of large numbers states that for a sequence of independent
and identically distributed random variables X1 , X2 , · · · , the sample mean
converges almost surely to the mean of the random variables E(Xi ) = µ.
Let X1, X2, · · · be a sequence of independent and identically distributed random variables, each having a finite mean µ = E(Xi). Then, with probability 1, X̄n → µ as n → ∞. The weak law of large numbers states that for any specified large value n∗, X̄n∗ is likely to be near µ. However, it does not say that X̄n is bound to stay near µ for all values of n larger than n∗. Thus, it leaves open the possibility that large values of |X̄n − µ| can occur infinitely often (though at infrequent intervals). The strong law shows that this cannot occur. In particular, it implies that, with probability 1, for any positive value ε, |X̄n − µ| will be greater than ε only a finite number of times.
The strong law of large numbers is of enormous importance, because it
provides a direct link between the axioms of probability and the frequency
interpretation of probability. If we accept the interpretation that “with
probability 1” means “with certainty”, then we can say that P (E) is the
limit of the long-run relative frequency of times E would occur in repeated,
independent trials of the experiment.
3.3 Central limit theorem
The weak law of large numbers says that for X1 , · · · , Xn , iid, the sample
mean X̄n is close to E(Xi ) = µ when n is large. The Central Limit Theorem
provides a more precise approximation by showing that a magnification
of the distribution of X̄n around µ has approximately a standard normal
distribution:
The Central Limit Theorem (CLT): Let X1, · · · , Xn be a sequence of independent and identically distributed random variables, each having finite mean E(Xi) = µ and finite variance Var(Xi) = σ². Then the distribution of (X̄n − µ)/(σ/√n) tends to the standard normal distribution as n → ∞. That is, for −∞ < a < ∞,
P( (X1 + · · · + Xn − nµ)/(σ√n) ≤ a ) → (1/√(2π)) ∫_{−∞}^a e^{−x²/2} dx
as n → ∞. The theorem can be thought of as roughly saying that the
sum of a large number of iid random variables has a distribution that is
approximately normal. By writing
(X1 + · · · + Xn − nµ)/(σ√n) = n(X̄n − µ)/(σ√n) = (X̄n − µ)/(σ/√n),
we see that the CLT says that the sample mean X̄n has approximately a normal distribution with mean µ and variance σ²/n. The CLT is a remarkable result: assuming only that a sequence of iid random variables has a finite mean and a finite variance, it shows that the sample mean, suitably standardized, converges in distribution to the standard normal distribution. The normal approximation to the binomial distribution is a special case of the central limit theorem.
Consider a skewed distribution (lognormal), and consider the histograms of the sample mean X̄n for n = 1, 5, 10, 30 shown below.
[Figure: four histograms of the simulated sampling distribution of X̄n for n = 1, 5, 10, 30 (horizontal axes x1.bar, x2.bar, x3.bar, x4.bar; vertical axes show frequency).]
We can see from the histograms that the sampling distributions become progressively less skewed as the sample size n increases, so the distribution of X̄n is better approximated by a normal distribution. This shows that the central limit theorem can be applied successfully when n is large. In general, the rule of thumb is n > 30. A simulation sketch follows.
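A minimal sketch reproducing the experiment (Python; the lognormal parameters, the replication count, and the use of skewness as the summary are illustrative assumptions):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

for n in [1, 5, 10, 30]:
    # 5000 replications of X-bar_n for a skewed (lognormal) population
    xbar = rng.lognormal(mean=0, sigma=1, size=(5000, n)).mean(axis=1)
    print(n, skew(xbar))  # skewness of the sampling distribution shrinks with n
```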
Example 5 An airline “overbooks” a flight because it expects that there
will be no-shows. Assume that (i) There are 200 seats available on the
flight. (ii) Seats are occupied only by individuals who made reservations
(no standbys). (iii) The probability that a person who made a reservation
shows up for the flight is 0.95. (iv) Reservations show up for the flight
independently of each other.
1. If the airline accepts 220 reservations, write an expression for the exact
probability that the plane will be full (i.e., at least 200 reservations show
up). Use the central limit theorem to approximate this probability.
2. Suppose the airline wants to choose a number n of reservations so that
the probability that at least 200 of the n reservations show up is 0.75.
Find the (approximate) minimum value of n.
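A sketch of both parts in Python (using scipy; the continuity correction in the CLT step is an optional refinement). The number of shows among n reservations is S ∼ Bin(n, 0.95):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 220, 0.95
mu, sd = n * p, sqrt(n * p * (1 - p))  # 209, ~3.23

# Part 1: exact P(S >= 200), then its CLT approximation
print(1 - binom.cdf(199, n, p))         # exact binomial tail
print(1 - norm.cdf((199.5 - mu) / sd))  # CLT with continuity correction

# Part 2: smallest n with P(at least 200 of n show up) >= 0.75,
# found here by exact search rather than the normal approximation
m = 200
while 1 - binom.cdf(199, m, p) < 0.75:
    m += 1
print(m)
```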
Example 6 The number of parking tickets issued in Raleigh on any given
weekday has a Poisson distribution with parameter λ = 50. What is the
approximate probability that
(a) Between 35 and 70 tickets are given out on a particular day?
(b) The total number of tickets given out during a 5-day week is between
225 and 275?
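A sketch (Python with scipy), using the normal approximation to the Poisson (mean and variance both λ); treating “between” as inclusive and adding a continuity correction are assumptions here:

```python
from math import sqrt
from scipy.stats import norm

def approx_between(lo, hi, lam):
    # Normal approximation to Poisson(lam) with continuity correction
    sd = sqrt(lam)
    return norm.cdf((hi + 0.5 - lam) / sd) - norm.cdf((lo - 0.5 - lam) / sd)

print(approx_between(35, 70, 50))     # (a) one day, lambda = 50: ~0.984
print(approx_between(225, 275, 250))  # (b) 5-day total ~ Poisson(250): ~0.893
```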
4 Distribution of a Linear Combination

Given a collection of n random variables X1, · · · , Xn and n numerical constants a1, · · · , an, the rv
Y = a1 X1 + · · · + an Xn = ∑_{i=1}^n a_i X_i
is called a linear combination of the Xi's.
Let X1, · · · , Xn have means µ1, · · · , µn, respectively, and variances σ²_1, · · · , σ²_n, respectively. Then
1. E(a1 X1 + a2 X2 + · · · + an Xn) = a1 E(X1) + a2 E(X2) + · · · + an E(Xn) = a1 µ1 + · · · + an µn.
2. If X1, X2, · · · , Xn are independent, then
Var(a1 X1 + · · · + an Xn) = a²_1 Var(X1) + · · · + a²_n Var(Xn) = a²_1 σ²_1 + · · · + a²_n σ²_n.
3. For any (possibly dependent) random variables X1, · · · , Xn,
Var(a1 X1 + · · · + an Xn) = ∑_{i=1}^n ∑_{j=1}^n a_i a_j Cov(X_i, X_j).
The case of normal random variables: If X1, · · · , Xn are independent, normally distributed rv's, then any particular linear combination of the Xi's is also normally distributed.
Special cases:
1. E(X̄) = µ_X̄ = µ.
2. If all Xi are independent, V(X̄) = σ²_X̄ = σ²/n.
3. E(X1 − X2) = E(X1) − E(X2).
4. If X1 and X2 are independent, then V(X1 − X2) = V(X1) + V(X2). Otherwise, V(X1 − X2) = V(X1) + V(X2) − 2 Cov(X1, X2).
Example 7 The total revenue from the sale of the three grades of gasoline
on a particular day was Y = 21.2X1 +21.35X2 +21.5X3 . Assume that X1 , X2
and X3 are independent with µ1 = 1000, µ2 = 500, µ3 = 300, σ1 = 100,
σ2 = 80 and σ3 = 50. What is the probability that the revenue exceeds
45000?
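A sketch of the computation (Python): E(Y) and V(Y) follow from the rules above; treating Y as normal is an extra assumption (natural if the Xi are modeled as normal):

```python
from math import sqrt
from scipy.stats import norm

a = [21.2, 21.35, 21.5]  # coefficients of the three grades
mu = [1000, 500, 300]
sigma = [100, 80, 50]

ey = sum(ai * mi for ai, mi in zip(a, mu))                  # E(Y) = 38325
sd = sqrt(sum((ai * si) ** 2 for ai, si in zip(a, sigma)))  # ~2927, by independence

print(1 - norm.cdf(45000, loc=ey, scale=sd))  # P(Y > 45000) ~ 0.011
```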
Example 8 A student has a class that is supposed to end at 9am and
another class that is supposed to begin at 9:10am. Suppose that the actual
ending time (in minutes after 9) is X1 ∼ N(2, 1.5²) and the starting time of the next class is X2 ∼ N(10, 1²). Suppose also that the time to get from one location to the other is X3 ∼ N(6, 1²). What is the probability that the student makes it to the second class before the lecture starts?
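A sketch (Python): the student makes it if X1 + X3 < X2, i.e. if D = X1 + X3 − X2 < 0; assuming the three times are independent, D ∼ N(2 + 6 − 10, 1.5² + 1² + 1²) = N(−2, 4.25):

```python
from math import sqrt
from scipy.stats import norm

mu_d = 2 + 6 - 10                  # E(X1 + X3 - X2)
sd_d = sqrt(1.5**2 + 1**2 + 1**2)  # independence assumed

print(norm.cdf(0, loc=mu_d, scale=sd_d))  # P(D < 0) ~ 0.834
```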
Example 9 Three different roads feed into a particular freeway entrance.
Suppose that during a fixed time period, the number of cars coming from
each road onto the freeway, Xi, is normally distributed, with X1 ∼ N(750, 16²), X2 ∼ N(1000, 24²) and X3 ∼ N(550, 18²).
(a). What is the expected total number of cars entering the freeway at this
point during the period?
(b). Suppose X1 , X2 and X3 are independent. Find the probability P (X1 +
X2 + X3 > 2500).
(c). Now suppose that the three streams of traffic are not independent, and
Cov(X1 , X2 ) = 80, Cov(X1 , X3 ) = 90 and Cov(X2 , X3 ) = 100. Compute the
expected value and variance of the total number of entering cars.
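A sketch of all three parts (Python; part (b) uses the fact that a sum of independent normals is normal, and part (c) applies the general variance rule with the given covariances):

```python
from math import sqrt
from scipy.stats import norm

mu = [750, 1000, 550]
sd = [16, 24, 18]

# (a) E(X1 + X2 + X3) = 2300, regardless of dependence
total_mean = sum(mu)
print(total_mean)

# (b) Under independence, V = 16^2 + 24^2 + 18^2 = 1156 (sd = 34)
var_indep = sum(s**2 for s in sd)
print(1 - norm.cdf(2500, loc=total_mean, scale=sqrt(var_indep)))

# (c) V = 1156 + 2*(80 + 90 + 100) = 1696; the mean is unchanged
var_dep = var_indep + 2 * (80 + 90 + 100)
print(total_mean, var_dep)
```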