Final Exam

MAT 3777
Final Exam
Date: April 29, 2009
Duration: 3 hours
Professor: G. Lamothe
Student number:
Last name:
First name:
This is a closed-book exam.
Only non-programmable, non-graphing calculators are allowed.
Two double-sided formula sheets are allowed.
Total = 90 points
There are 24 pages.
Question 1 (10 points)
English Version: Consider a population of size N = 5 and suppose that
we know $y_1 = 10$, $y_2 = 12$, $y_3 = 8$, $y_4 = 6$, $y_5 = 20$.
(a) Give the population mean and variance.
(b) Consider the following sampling design.

S       {1, 5}   {2, 3}   {2, 5}   {2, 4}
P(S)    1/4      1/4      1/4      1/4

If $\bar{y}$ is the sample mean, determine $E[\bar{y}]$, $V[\bar{y}]$, $\mathrm{Bias}[\bar{y}]$ and $\mathrm{MSE}[\bar{y}]$.
(c) Is the sampling design in (b) a simple random sampling design? Why?
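A minimal sketch of the arithmetic in (a) and (b). It assumes the course may use either the $N-1$ divisor ($S^2$) or the $N$ divisor ($\sigma^2$) for the population variance, so both are computed. Exact fractions avoid rounding, and the variable names are illustrative.

```python
from fractions import Fraction

# Population values y_1, ..., y_5 from Question 1, keyed by unit label.
y = {1: 10, 2: 12, 3: 8, 4: 6, 5: 20}
N = len(y)

# (a) Population mean and variance. S2 uses the N - 1 divisor, sigma2 the N divisor;
# which one the course calls "the" population variance is an assumption.
ybar_U = Fraction(sum(y.values()), N)
ss = sum((v - ybar_U) ** 2 for v in y.values())
S2 = ss / (N - 1)
sigma2 = ss / N

# (b) The sampling design: each listed sample is drawn with probability 1/4.
design = {
    (1, 5): Fraction(1, 4),
    (2, 3): Fraction(1, 4),
    (2, 5): Fraction(1, 4),
    (2, 4): Fraction(1, 4),
}

def sample_mean(s):
    """Mean of y over the units in sample s."""
    return Fraction(sum(y[i] for i in s), len(s))

E = sum(p * sample_mean(s) for s, p in design.items())
V = sum(p * (sample_mean(s) - E) ** 2 for s, p in design.items())
bias = E - ybar_U
MSE = V + bias ** 2   # equals the sum of p * (sample_mean(s) - ybar_U) ** 2

print("mean:", ybar_U, "S^2:", S2, "sigma^2:", sigma2)
print("E:", E, "V:", V, "bias:", bias, "MSE:", MSE)
```

Computing the expectation as a probability-weighted sum over the listed samples mirrors the definition of a design-based expectation. That makes it easy to check (c) by comparing the design with one that gives every 2-subset of the population equal probability.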
Question 2 (10 points)
English Version: Consider the following model: $Y_1, \dots, Y_N$ are independent such that
$E_M[Y_i] = \beta x_i$ and $V_M[Y_i] = \sigma^2 x_i$,
for $i = 1, \dots, N$.
(a) Show that the weighted least squares estimator of $\beta$ is the ratio
$\hat{\beta} = \bar{Y} / \bar{x}$.
(b) Using part (a), show that our prediction for the population total is of the form
$\hat{t}_y = \dfrac{\bar{y}}{\bar{x}}\, t_x$.
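A sketch of the derivation. It assumes the weights are $1/x_i$ (inversely proportional to $V_M[Y_i]$), that $x_i > 0$, and that the predicted total adds the fitted values $\hat{\beta} x_i$ for the units outside the sample $S$:

```latex
\begin{aligned}
Q(\beta) &= \sum_{i \in S} \frac{(Y_i - \beta x_i)^2}{x_i},
\qquad
\frac{dQ}{d\beta} = -2 \sum_{i \in S} (Y_i - \beta x_i) = 0
\;\Longrightarrow\;
\hat{\beta} = \frac{\sum_{i \in S} Y_i}{\sum_{i \in S} x_i} = \frac{\bar{Y}}{\bar{x}}, \\
\hat{t}_y &= \sum_{i \in S} y_i + \hat{\beta} \sum_{i \notin S} x_i
= \hat{\beta} \sum_{i \in S} x_i + \hat{\beta} \Big( t_x - \sum_{i \in S} x_i \Big)
= \frac{\bar{y}}{\bar{x}}\, t_x .
\end{aligned}
```

The middle step of the second line uses $\sum_{i \in S} y_i = \hat{\beta} \sum_{i \in S} x_i$, which is just the definition of $\hat{\beta}$. The second derivative $2 \sum_{i \in S} x_i > 0$ confirms that the stationary point is a minimum.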
Question 3 (10 points)
English Version: A bank has 150 000 clients with savings accounts, and
50 000 of those clients are 25 years old or younger. The manager wants an
estimate of the mean account balance on December 31. He suspects
that younger clients have smaller balances than more mature
clients. He assigns a researcher to examine the database and collect a sample
of 300 account holders stratified by age as above, with proportional allocation. The
mean for the young clients is $1200 with a standard deviation of $900, while
the more mature clients have a mean of $3600 with a standard deviation of
$1800.
(a) Is this a post-stratification? Why?
(b) Give an estimate of the mean balance of all young clients, with its
standard error.
(c) Give an estimate of the mean balance of all clients, with its standard
error.
(d) The manager wants an estimate of the mean balance of all clients at the
end of March. Assuming that the standard deviations of the strata have not
changed much, determine the optimal allocation for a stratified design. Determine
the sample size required so that the margin of error is at most $50 at the
95% confidence level.
Question 4 (10 points)
We used the following SAS program to produce the regression output:
proc reg data=counties;
model veterans=totpop;
run;
English Version: Consider a simple random sample of n = 100 counties from the N = 3141 counties in the United States. The mean number
of veterans per county is $\bar{y} = 12\,250$ veterans, and the sample standard deviation is
s = 47 574.9 veterans.
(a) Estimate the total number of veterans in the United States and calculate the standard error.
(b) To improve the precision of the estimate in (a), we shall use an auxiliary variable x = population of the county in 1994. In 1994, the population
of the United States was estimated at 255 077 036. Estimate the total number
of veterans in the United States with a regression estimator and give the standard error.
The SAS program and the output are above.
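A sketch of (a), plus a helper for (b) that takes the intercept, slope and MSE read from the proc reg output. The output itself is needed to get actual numbers for (b). The sketch assumes the usual SRS formulas with the residual variance on $n - 2$ degrees of freedom, and the function name `regression_total` is hypothetical.

```python
import math

N, n = 3141, 100             # counties in the population and in the SRS
ybar, s = 12_250, 47_574.9   # sample mean and SD of veterans per county

# (a) Expansion estimator of the total under SRS without replacement, with fpc.
t_hat = N * ybar
se_t = N * math.sqrt((1 - n / N) * s ** 2 / n)

def regression_total(b0, b1, mse, x_total, N, n):
    """Regression estimate of t_y and its SE from a fitted line y = b0 + b1 * x.

    b0, b1: intercept and totpop coefficient from proc reg; mse: residual mean
    square (Root MSE squared, n - 2 degrees of freedom); x_total: known t_x.
    """
    est = N * b0 + b1 * x_total          # N * (b0 + b1 * xbar_U)
    se = N * math.sqrt((1 - n / N) * mse / n)
    return est, se

print(t_hat, se_t)
```

For (b), you would call `regression_total(b0, b1, mse, 255_077_036, N, n)` with the values printed by `proc reg`.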
Question 5 (10 points)
English Version: Refer to Question 4. If we only consider the counties
with a population under 10 000, then the sample mean and standard deviation for those 26 counties are $\bar{y}_d = 586$ and $s_d = 337.93$, respectively.
(a) Estimate the mean number of veterans per county for the counties with
a population under 10 000. Calculate the standard error of the estimate.
(b) Estimate the total number of veterans in the counties with a
population under 10 000. Calculate the standard error of the estimate.
Note: $\sum_{i \in S_d} y_i = 15\,236$ and $\sum_{i \in S_d} y_i^2 = 11\,783\,224$.
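A sketch of the domain calculations. It assumes the approximate SE $\sqrt{(1 - n/N)\, s_d^2 / n_d}$ for a domain mean. For the domain total, it uses the indicator trick: $u_i = y_i$ if county $i$ is in the domain and $0$ otherwise. This is where the two given sums are needed.

```python
import math

N, n = 3141, 100                         # counties in the population / in the SRS
n_d, ybar_d, s_d = 26, 586.0, 337.93     # domain: counties under 10 000 people
sum_y_d, sum_y2_d = 15_236, 11_783_224   # sums of y and y^2 over the domain sample

# (a) Domain mean and its approximate SE (SRS fpc, domain sample size n_d).
mean_d = ybar_d
se_mean_d = math.sqrt((1 - n / N) * s_d ** 2 / n_d)

# (b) Domain total: u_i equals y_i inside the domain and 0 outside, so the
# domain total is an ordinary SRS total of u over all n sampled counties.
ubar = sum_y_d / n
s2_u = (sum_y2_d - n * ubar ** 2) / (n - 1)
total_d = N * ubar
se_total_d = N * math.sqrt((1 - n / N) * s2_u / n)

print(mean_d, se_mean_d, total_d, se_total_d)
```

The total's SE is much larger relative to its estimate than the mean's SE. This is because $n_d$ is random: the zeros in $u_i$ add between-domain variability.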
Question 6 (10 points)
English Version: A club is formed of 1000 local branches across the
country, for a total of 2 420 150 members. The number of members per branch
varies greatly, from 50 000 members for the largest branch to 10
members for the smallest. We shall select a sample of 10 branches (with replacement) with
probabilities proportional to size.
(a) What is the probability that the branch with 50 000 members is in the sample?
(b) What is the probability that the branch with 10 members is in the sample?
(c) For the variable y, let $\bar{y}_i$ be the mean for branch i. Here are the results
of the sampling:

Branch #     932   14  334  846  828  511  167  779  677
Mean          51   43   67   50   40   55   55   50   49
Frequency      2    1    1    1    1    1    1    1    1

Estimate the population mean $\bar{y}_U$ and give the standard error of the
estimate.
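A sketch for PPS sampling with replacement. It assumes each draw selects branch $i$ with probability $\psi_i = M_i / 2\,420\,150$ and uses the Hansen-Hurwitz estimator. The dictionary layout is illustrative.

```python
import math
import statistics

K = 2_420_150        # total members (sum of branch sizes M_i)
n_draws = 10         # PPS draws with replacement

def prob_in_sample(M_i, K=K, n=n_draws):
    """P(branch i is drawn at least once) when psi_i = M_i / K on every draw."""
    psi = M_i / K
    return 1 - (1 - psi) ** n

p_big = prob_in_sample(50_000)   # (a)
p_small = prob_in_sample(10)     # (b)

# (c) Branch -> (sampled mean, number of times drawn); repeats count separately.
draws = {932: (51, 2), 14: (43, 1), 334: (67, 1), 846: (50, 1), 828: (40, 1),
         511: (55, 1), 167: (55, 1), 779: (50, 1), 677: (49, 1)}
ybar_i = [mean for mean, freq in draws.values() for _ in range(freq)]

# With psi_i = M_i / K, t_i / psi_i = K * ybar_i, so the Hansen-Hurwitz estimate
# of ybar_U = t / K is the plain average of the drawn branch means.
ybar_U_hat = statistics.fmean(ybar_i)
se = math.sqrt(statistics.variance(ybar_i) / n_draws)

print(p_big, p_small, ybar_U_hat, se)
```

Because the design draws with replacement, branch 932 enters the average twice. Its two draws are treated as independent observations rather than merged.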
Question 7 (10 points)
English Version: We display 3 rows from a data set. Each row
represents a book that was selected.

Shelf   Total number of books on the shelf   Cost   Sampling weight
4       23                                   23     a
⋮
12      35                                   12     b
⋮
32      30                                   24     c
⋮

There are N = 50 shelves and K = 1500 books in the library. We want to
estimate the total cost of replacing the books. Determine the sampling weights
a, b, c for the following sampling designs.
(a) We select a simple random sample of 30 books.
(b) We consider a stratification of the shelves: 1-10, 11-20, 21-50. These strata contain 300, 300 and 900 books, respectively. We select 10 books from
each stratum.
(c) We select 5 shelves and consider all books on those shelves.
(d) We select 5 shelves and select 5 books at random from those shelves.
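A sampling weight is the inverse of an inclusion probability. This sketch assumes the shelves in (c) and (d) are chosen by SRS and that (d) subsamples 5 books from each selected shelf. Each dictionary maps a shelf number to its weight: shelf 4 gives a, shelf 12 gives b, and shelf 32 gives c.

```python
# Rows shown in Question 7: shelf number -> number of books on that shelf (M_i).
rows = {4: 23, 12: 35, 32: 30}
N_SHELVES, K_BOOKS = 50, 1500
SHELVES_DRAWN, BOOKS_PER_SHELF = 5, 5

def books_in_stratum(shelf):
    """Stratum sizes from (b): shelves 1-10 and 11-20 hold 300 books each, 21-50 hold 900."""
    return 300 if shelf <= 20 else 900

weights = {
    # (a) SRS of 30 books out of 1500
    "(a)": {sh: K_BOOKS / 30 for sh in rows},
    # (b) 10 books drawn by SRS within each stratum
    "(b)": {sh: books_in_stratum(sh) / 10 for sh in rows},
    # (c) SRS of 5 shelves, every book on them included
    "(c)": {sh: N_SHELVES / SHELVES_DRAWN for sh in rows},
    # (d) SRS of 5 shelves, then SRS of 5 books on each selected shelf (assumed)
    "(d)": {sh: N_SHELVES * M / (SHELVES_DRAWN * BOOKS_PER_SHELF)
            for sh, M in rows.items()},
}

for design, w in weights.items():
    print(design, w)
```

Only design (d) makes the weight depend on the number of books on the shelf. That is why the column "total number of books on the shelf" is recorded.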
Question 8 (10 points)
English Version: An agency wants to estimate the total number of calls
made by its employees during one day. The agency has N = 100 departments
and the ith department has $M_i$ employees, for i = 1, 2, ..., N. A simple
random sample of n = 4 departments is selected. Then a sub-sample of
20% of the employees is selected from each sampled department. Here are the data:

Dept.  Employees  Sub-sample  Number of calls     Σ y_ij   Σ y_ij²   ȳ_i     s_i²
i      M_i        size m_i    y_ij                (j ∈ S_i) (j ∈ S_i)
11     20         4           4, 5, 6, 3          18        86       4.5     1.667
25     30         6           2, 4, 7, 5, 3, 6    27       139       4.5     3.5
36     15         3           6, 7, 6             19       121       6.333   0.333
87     25         5           3, 6, 4, 5, 2       20        90       4.0     2.5
Total                                             84       436

Suppose that the total number of employees is 1500.
a) Estimate the total number of calls.
b) Compute the standard error of the estimate.
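A sketch of two standard two-stage estimators, assuming SRS at both stages. The first is the unbiased estimator $(N/n)\sum \hat{t}_i$. The second is the ratio estimator $M_0 \sum \hat{t}_i / \sum M_i$, which uses the known total of 1500 employees. The exam does not say which one it expects, and the ratio-estimator variance below takes $\bar{M} = M_0/N$ (an assumption).

```python
import math
import statistics

N, n = 100, 4          # departments in the population / in the sample
M0 = 1500              # total number of employees (given)
# Department -> (M_i, calls recorded for the sub-sampled employees)
data = {
    11: (20, [4, 5, 6, 3]),
    25: (30, [2, 4, 7, 5, 3, 6]),
    36: (15, [6, 7, 6]),
    87: (25, [3, 6, 4, 5, 2]),
}

M = [Mi for Mi, _ in data.values()]
m = [len(y) for _, y in data.values()]
ybar = [statistics.fmean(y) for _, y in data.values()]
s2 = [statistics.variance(y) for _, y in data.values()]
t_i = [Mi * yb for Mi, yb in zip(M, ybar)]   # estimated department totals

# Second-stage (within-department) contribution, summed over sampled departments.
within = sum((1 - mi / Mi) * Mi ** 2 * s2i / mi for Mi, mi, s2i in zip(M, m, s2))

# Unbiased estimator and its estimated variance.
t_unb = N / n * sum(t_i)
v_unb = N ** 2 * (1 - n / N) * statistics.variance(t_i) / n + (N / n) * within
se_unb = math.sqrt(v_unb)

# Ratio estimator using M0; with Mbar = M0 / N, M0^2 * V(ybar_r) reduces to N^2 * [...].
ybar_r = sum(t_i) / sum(M)
t_ratio = M0 * ybar_r
s2_r = sum((ti - Mi * ybar_r) ** 2 for ti, Mi in zip(t_i, M)) / (n - 1)
v_ratio = N ** 2 * ((1 - n / N) * s2_r / n + within / (n * N))
se_ratio = math.sqrt(v_ratio)

print(t_unb, se_unb, t_ratio, se_ratio)
```

The two estimates differ a lot here. The sampled departments average 22.5 employees, against 15 for the whole agency, and only the ratio estimator corrects for that.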
Question 9 (10 points)
English Version: Refer to Question 8.
(a) Describe the intra-cluster homogeneity.
(b) Using the information from Question 8, design a new survey. Suppose
that it takes about 8 hours to set up in a department and
about 45 minutes per employee to collect the data. We would like
to spend at most 250 hours collecting all of the data.
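A planning sketch with several assumptions. It uses the cost model $C = c_1 n + c_2 n \bar{m}$ with $c_1 = 8$ h and $c_2 = 0.75$ h. The adjusted $R_a^2$ is estimated by a one-way ANOVA on the Question 8 sub-samples. The optimal sub-sample size comes from the textbook formula for equal cluster sizes, with $M$ replaced by the average department size $1500/100$. The formula only applies when the estimated $R_a^2$ is positive.

```python
import math
import statistics

# Sub-sampled numbers of calls per department, from Question 8.
groups = [[4, 5, 6, 3], [2, 4, 7, 5, 3, 6], [6, 7, 6], [3, 6, 4, 5, 2]]
N, M0 = 100, 1500
M = M0 / N                          # average department size, used as a common M (assumption)
c1, c2, budget = 8.0, 0.75, 250.0   # hours per department, hours per employee, total hours

# (a) One-way ANOVA on the sub-samples to estimate the adjusted R^2.
all_y = [v for g in groups for v in g]
msw = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (len(all_y) - len(groups))
S2 = statistics.variance(all_y)
Ra2 = 1 - msw / S2   # near 0: little homogeneity; larger: calls within a department are alike

# (b) Optimal sub-sample size per department, then as many departments as the budget allows.
m_opt = math.sqrt(c1 * M * (N - 1) * (1 - Ra2) / (c2 * (N * M - 1) * Ra2))
m = round(m_opt)
n_depts = math.floor(budget / (c1 + c2 * m))

print(round(Ra2, 4), round(m_opt, 2), m, n_depts)
```

A small positive $R_a^2$ means employees within a department are only mildly similar. In that case it pays to sample more employees per department, as long as the fixed 8-hour setup cost still leaves room for enough departments.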