Analyses en Composantes Principales

Transcription

Analyses en Composantes Principales
Fiche TD avec le logiciel
:
tdr61
—————
Analyses en Composantes Principales
A.B. Dufour & D. Chessel
—————
La fiche passe en revue quelques usages de l’analyse en composantes
principales sur différents types de tableaux. On rencontre le non
centrage, le décentrage, le double centrage autour des tableaux de
pourcentages, de notes, de rangs ou de notes d’abondance. Dans cette
famille, le cas le plus utilisé est celui de l’ACP normée ou ACP sur
matrice de corrélation. Cette pratique est incontournable quand le
tableau contient des variables de nature diverse. La variance dépendant des unités, elle n’a pratiquement que la fonction de permettre
la normalisation, c’est-à-dire sa propre disparition. Les tableaux homogènes, au contraire comporte dans chaque cellule un nombre comparable au contenu des autres cellules, qu’il s’agisse d’une notation
unique d’abondance, une présence-absence, un rang, un pourcentage,
etc. L’usage de l’ACP normée peut alors être sans inconvénient ou au
contraire obscurcir définitivement l’information. A l’aide d’exemples,
la fiche regroupe des cas typiques qui permettra de faire des choix
pertinents.
Table des matières
1 Décathlon : le signe des corrélations
2
2 Examen : l’origine est un paramètre libre
5
3 Cohérence d’un jury : compromis
8
4 Reconstitution de données : auto-modélisation
12
5 Pourcentages : représenter des moyennes
14
6 Morphométrie et non-centrage
19
7 ACP et classification
21
8 Truites et valeurs propres
26
9 Voir le tableau
35
1
A.B. Dufour & D. Chessel
10 Notes d’abondance
37
11 Ne pas se tromper de centrage
42
12 Information supplémentaire
44
Références
46
1
Décathlon : le signe des corrélations
Les données (exemple n˚357 dans [6] d’après Lunn, A. D. & McNeil, D.R. (1991)
Computer-Interactive Data Analysis, Wiley, New York) sont dans la librairie.
Examiner l’objet olympic :
library(ade4)
data(olympic)
names(olympic)
[1] "tab"
"score"
is.list(olympic)
[1] TRUE
olympic$score
[1] 8488 8399 8328 8306 8286
[17] 7869 7860 7859 7781 7753
[33] 6907
head(olympic$tab)
100 long poid haut
400
1 11.25 7.43 15.48 2.27 48.90
2 10.87 7.45 14.97 1.97 47.71
3 11.18 7.44 14.20 1.97 48.29
4 10.62 7.38 15.02 2.03 49.06
5 11.02 7.43 12.92 1.97 47.44
6 10.83 7.72 13.58 2.12 48.34
dim(olympic$tab)
8272 8216 8189 8180 8167 8143 8114 8093 8083 8036 8021
7745 7743 7623 7579 7517 7505 7422 7310 7237 7231 7016
110
15.13
14.46
14.81
14.72
14.40
14.18
disq perc jave
1500
49.28 4.7 61.32 268.95
44.36 5.1 61.76 273.02
43.66 5.2 64.16 263.20
44.80 4.9 64.04 285.11
41.20 5.2 57.46 256.64
43.06 4.9 52.18 274.07
[1] 33 10
boxplot(as.data.frame(scale(olympic$tab)))
3
●
−2
−1
0
1
2
●
●
−3
●
●
●
●
●
100 long
haut 400 110 disq
jave
pca1 <- dudi.pca(olympic$tab, scannf = FALSE)
On sélectionne le nombre d’axes à partir du graphe des valeurs propres
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 2/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
0.0
0.5
1.0
1.5
2.0
2.5
3.0
barplot(pca1$eig)
par(mfrow = c(1, 2))
s.corcircle(pca1$co)
olympic2 <- olympic$tab
olympic2[, c(1, 5, 6, 10)] = -olympic2[, c(1, 5, 6, 10)]
pca2 <- dudi.pca(olympic2, scan = F)
s.corcircle(pca2$co)
disq
poid
jave
disq
poid
jave
X1500
X400
perc
haut
X100
X110
long
perc
haut
X110
long
X100
X400
X1500
L’image de gauche est particulièrement trompeuse ! Elle est statistiquement juste
et expérimentalement fausse. La performance des athlètes augmente avec les distances des lancers, la hauteur et la longueur des sauts, elle décroı̂t avec le temps
des courses. L’image de droite est mathématiquement équivalente et expérimentalement correcte.
plot(olympic$score, pca1$l1[, 1])
abline(lm(pca1$l1[, 1] ~ olympic$score))
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 3/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
2
●
●
●
●
●
1
●
●
●
●
●
●
●
●
●
0
pca1$l1[, 1]
●
●
●
●●
●
●
●
●
−1
●
● ●●
●
●
●●
●
7000
7500
●
8000
8500
olympic$score
Commenter.
s.label(pca1$l1, clab = 0.5)
s.arrow(2 * pca1$co, add.p = T)
d=1
17
poid disq
18
X1500
jave
31
28
X400
10
11
20
1
7
perc
X100
4
23
9
haut
2
3
X110
16
8
21
22
12
15
14
long
30
32
24
25
6
5
13
26
19
27
29
33
Ces deux figures sont des doubles représentations euclidiennes des nuages et des
bases canoniques. La fonction générique scatter pour ce type d’analyse retient
cette propriété.
par(mfrow = c(1, 2))
s.label(pca1$li, clab = 0.5)
s.arrow(5 * pca1$c1, add.p = T)
scatter(pca1)
scatter(pca2)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 4/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=2
17
Eigenvalues
poiddisq
d=2
17
X1500
jave
X400
disq
poid
jave
10
111
7
4
perc
9
haut
2
3 8
long
X1500
18
18
31
X400
perc
20
23
16
12
15
21
22
14
4
X100
X110
30
haut
2
32
5
long
X110
23
16
12
15
21
22
24
30
32
25
5
13
26 29
19
33
27
33
27
2
38
X100
20
6
26 29
19
10
111
7
9
14
24
25
6
13
31
28
28
Examen : l’origine est un paramètre libre
deug est une liste à trois composantes :
$tab un data frame avec 104 lignes-étudiants et 9 colonnes-disciplines académiques
$result un facteur donnant une synthèse du résultat à l’examen (A+, A, B,
B-, C-, D) pour les 104 étudiants
$cent un vecteur contenant la moyenne théorique pour chacune des disciplines
étudiées, ceci en fonction de leur coefficient.
data(deug)
names(deug)
[1] "tab"
"result" "cent"
deug$result
[1] C- B B A B C- B B
[27] C- B B B B B B B
[53] A+ A A C- B A C- A
[79] A C- C- C- B B B A
Levels: D A A+ C- B- B
names(deug$tab)
[1] "Algebra"
"Analysis"
[7] "Option2"
"English"
D
B
B
B
A
CB
A
B
D
B
A
"Proba"
"Sport"
B
D
CD
A
D
CD
CB
D
B-
B
B
BC-
B
B
CC-
A
A
CD
D
B
A
A
B
CB
B
A+
B
D
B
A
B
A
B
"Informatic" "Economy"
B
B
B
B
B
B
B
B
B
B
CB
B
B
CB
B
D
CB
"Option1"
Exécuter et interpréter une analyse en composantes principales normée et une
analyse en composantes principales décentrée.
args(dudi.pca)
function (df, row.w = rep(1, nrow(df))/nrow(df), col.w = rep(1,
ncol(df)), center = TRUE, scale = TRUE, scannf = TRUE, nf = 2)
NULL
pcanor <- dudi.pca(deug$tab, scann = F)
Définir et interpréter le graphe canonique.
plotreg <- function(x, y) {
par(mar = c(2, 2, 2, 2))
plot(y, x, pch = 20)
grid(lty = 1)
abline(lm(x ~ y))
mtext(paste("r = ", round(cor(x, y), digits = 3), sep = ""))
}
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 5/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
par(mfrow = c(3, 3))
apply(deug$tab, 2, plotreg, y = pcanor$l1[, 1])
NULL
pcanor$co
Comp1
Comp2
Algebra
-0.7924753 0.09239958
Analysis
-0.6531896 0.46263515
Proba
-0.7410261 0.24213427
Informatic -0.5287294 0.28126036
Economy
-0.5538660 -0.62678201
Option1
-0.7416171 0.01601705
Option2
-0.3336153 -0.37599247
English
-0.2755026 -0.66982537
Sport
-0.4171874 -0.13974793
●
●
●
●● ●
●● ●
●
●● ●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●●
●
●● ● ●● ● ● ●
●
●
●● ●
● ●
● ●●
●●●
●●●
●
● ●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
● ●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
20
●
●
3
−2
−1
●
● ●
● ●
●
30
●
●
50
●
0
1
2
●
●
●
3
−2
−1
0
1
30
●
●
●
2
●
●
● ●●
● ● ● ●
●
● ●
●
● ●● ●
●● ● ● ● ● ●
●
●●●● ●
●
●●●
●●
● ●
● ●●● ●
● ●
● ●● ●
●
● ●
●
●
● ●● ●
●●
● ● ● ●● ●
●● ●
● ●
●
●●●
● ●● ●
●●
●
●
●
●
●
● ●
● ●
●
●
3
−2
25
●
●
●
●
15
10
●●
●
−1
0
1
2
3
r = −0.417
y
●
●●
●
●
●
● ●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
● ●●
●●
●
● ●●
●
●
● ● ●●●
●●●
●
●
●
●
●●
●
●
●
●
●
●●● ● ● ●
●
●
● ●
●
●
● ●●
●
● ●●●
● ●
●
●
●
● ● ● ● ●
●
● ●
●
●
●
●
●
●
●
20
30
25
●
●
●
r = −0.276
y
●
● ● ●●● ● ●
●
●
●
●
●
●●
●●● ●● ●
●●
●
● ●
● ●
●
●
●
● ●
●●
●
●
● ● ●● ● ●●
●●●
● ● ● ●● ● ● ●
●
●
●● ● ● ● ● ●
●
●
●
●●
●●
● ●● ● ● ●●
●
●
●●
●
●
●
●
●
●
●
3
●
●
r = −0.334
y
●
●
●
●
−1
2
r = −0.742
y
35
●
●
●●
●
●
●
● ●
● ●● ● ● ●
●
● ●
●
●
●
●●
●●● ●
●
● ●● ●
●
● ● ● ●● ●● ●
●
●
● ●●
●● ●
●
●
●●
●●● ●
●
●
● ●●
●
● ●
●
●
●
●● ● ●
●
● ● ●●
●
●
1
●●
●
●
40
20
10
●
0
●
●
●
● ●
●● ●
●
●
●
●
●
●●
● ●●
●●
●
●● ●●
●● ●●●
●
●●
● ● ●● ●●●
●
● ●
●
●
●● ●
●
●
●
●
●
● ●
● ●● ●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
−2
−1
0
1
2
●
3
−2
−1
0
1
2
3
0
●
5
●
●
10
20
●
● ●
●●
●
●
● ● ● ●●
●
●
●
● ● ●●
●
● ●
●●
●
● ●
●
● ● ●●
● ●●
●
●●
● ●●● ● ●
● ● ●●
●
●
●●
●●
●
● ●
●● ● ●●●● ● ●
●
●
●
●
●
●
●●
● ●
●
● ●
●
●
−2
15
●
70
●
x
●
●
●
●
●
●
−1
●
●
25
●
x
40
●
●
−2
15
●
●
3
30
80
●
●
2
●
●
60
50
●
●
●
1
r = −0.554
y
90
r = −0.529
y
0
●
●
10
2
15
1
10
0
●
●
5
−1
●
●
● ● ●●
●
● ●● ●
●
●
●●
●
●● ●
●●
●
●●
●
● ●
●●
●
● ● ● ● ● ●●
●●
●● ●●
● ●● ●
●
●
●●
●
● ●
●
● ● ●●
●●
● ●●●● ●
●
●
●●
●
●
●
● ●●●●
●
●
●
●●
●●
●
●
● ●
● ●
●
x
−2
●
●
0
●
x
●
10 20 30 40 50 60
●● ●
20
50
●
●
●
●
●●
●
●● ● ●
●● ● ●
●
●
● ●●● ● ● ●● ●
●
●● ● ●●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●● ●
●●●
●
● ●
●●
●
●
●
●
●
● ●● ●
●
●●
●
●
● ●● ●
●
●
●●
●
●
●
●
●
●
●
x
●
●
●
40
●
r = −0.741
●
●
● ●
30
●
r = −0.653
●
●
x
10 20 30 40 50 60 70
r = −0.792
●
●
−2
−1
● ●●
●
●● ●●
●●
● ● ● ● ●● ●
0
1
2
● ●
3
La fonction générique score utilise ce point de vue :
score(pcanor)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 6/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
Algebra
Analysis
●
●
● ●
50
−2
10
●
●
90
y
3
●
●
●
●
●
●
25
y
70
●
●
●
2
3
● ●●
● ● ●
● ●
●
● ●● ●
●● ● ● ● ● ●
●●●● ●
●
●●●
●●
● ●
● ●●● ●
● ●● ●
●
● ●
●
●
● ●●
●
●●
● ● ● ●● ●
●● ●
● ●
●
●●●
● ●● ●
● ●
●
●
●
●
●
●
●
● ●
●
2
●
−2
●
● ● ●●● ● ●
●
●
●
●
●
●●
●●● ●● ●
●●
●
● ●
● ●
●
●
●
● ●
● ●
●
●
● ● ●● ● ●●
●●●
● ● ● ●● ● ● ●
●
●
●● ● ● ● ● ●
●
●
●
●●
●●
● ●● ● ● ●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
3
−1
0
English
●
●
●
●
●
15
−1
Sport
●
0
1
2
score
●●
●
●
3
●
● ●●
● ●
●
●
●
●
●
● ●
●● ●
●
●
●
●
●
●
● ●●
●
●
● ●●
●●
●
● ●●
● ●
● ● ●●●
●●●
●
●
●
●
●●
●
● ●●
●
●●● ●
● ●
●
●
● ●
●
●
● ●●
●
● ●●●
● ●
●
●
●
● ● ● ● ●
●
●
●
●
−2
●
●
●
●
●
●
●
●
3
●
●●
●
●
2
score
●
●
●
1
15
score
1
●
●
●
10
25
●
0
score
●
●
y
Option2
●
1
●
● ●
●
30
0
−1
Option1
●●
●
−1
−2
●
●
●
●●
●
●
●
●
●
● ●● ● ● ●
●
● ●
●
●
●
●●
●●● ●
●
● ●● ●
●
● ● ● ●● ●● ●
●
●
● ●●
●● ●
●
●
●●
●●● ●
●
●
● ●●
●
● ●
●
●
●
●
●
●
●
0
2
●
●● ● ●
●
● ● ●●
●
●
25
●
●●
●
●
●
●●● ●●
●●
●
●● ●●
●● ●●●
●
●●
● ● ●● ●●●
●
● ●
●
● ●● ● ●
●● ●
●
●
●
●
●
● ●
●
●
●●
●
●
10
●
●
●
●
●
●
10
●
5
●
●
−2
−1
0
1
0
30
y
● ●
●●
●
●
● ● ● ●●
●
●
●
● ● ●●
●
● ●
●●
●
● ●
●
● ● ●●
● ●●
●
● ●●● ● ●
● ● ●●
●
●
●●
●●
●
● ●
●● ● ●●●● ● ●
●
●
●
●
●
●●
● ●
●
● ●
●
−2
20
●
1
●
●
● ●
●
●
score
50
20
●●
● ●
●
40
●
●
●
●
●
● ●
●
30
y
●
y
40
●
●
●
●
●
●
80
●
●
●
●
0
●
●
●
●
−1
Economy
●
35
3
●
20
2
●
●
● ● ●●
●
● ●● ●
●
●
●●
●
●● ●
●●
●
●●
●
● ●
●●
●
● ● ● ● ● ●●
●●
●● ●●
● ●● ●
●
●
●●
●
● ●
●
● ● ●●
●●
●
●●● ●
●
● ●●●
●
●●●●
●
● ●
●
●
●
●●
●●
●
15
1
●
10
0
score
●
●
●
5
−1
Informatic
●
30
40
y
20
●
60
50
●
●
●
y
●
●
●
●● ●
●● ●
●
●● ●
●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●●
●
●● ● ●● ● ● ●
●
●
●● ●
● ●
● ●●
●●●
●●●
●
●
● ●
●●
●
●
● ●
●
●●
●
●
●
●
●
●●
●
●
● ●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
−2
15
●● ●
●
Proba
●
●
●
●
●
●
●●
●
●● ● ●
●● ● ●
●
●
● ●●● ● ● ●● ●
●
●● ● ●●●
●●
●
●
● ● ●●● ●
●
●●
●
●
●●
●● ●
●●●
●
● ●
●●
●
●●
●
●● ●
●
●●
●● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
30
●
10 20 30 40 50 60
●
●
●
●
●
20
y
10 20 30 40 50 60 70
A.B. Dufour & D. Chessel
●
2
3
−2
−1
0
1
2
3
●
−2
−1
● ●●●
●● ●●
●●
● ● ● ● ●● ●
0
1
2
● ●
3
pca1 <- dudi.pca(deug$tab, scal = FALSE, center = deug$cent, scan = FALSE)
s.class(pca1$li, deug$result)
s.arrow(50 * pca1$c1, add.plot = T, clab = 1.5)
d = 20
●
●
A+
●
●
●
● ●
●
●
Algebra
A● ●
●
Analysis
●
Proba
Economy
●
●
●
●
●
●
●
Option1
Option2
English
●
●
●
●
Informatic
●
●
●
●
●
●
●
●
●
Sport
●
●
●
●
●
●
●
●● ●
●
● ● ●
●
●
●
●
●
●●
●●
●
●
●
●
●● ● ●
●
B
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
B−
●
C−
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
D
●
●
●
L’origine n’est plus le centre de gravité du nuage. On y gagne la distinction
entre les matières qui font la forme du nuage et celles qui donnent sa position.
On retrouve cet aspect dans les données ’seconde’.
data(seconde)
pca2 <- dudi.pca(seconde, center = rep(10, 8), scale = F, scan = FALSE)
s.label(pca2$li)
s.arrow(10 * pca2$c1, add.plot = T)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 7/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=5
19 21
4
1
1814
11
2
ANGLBIOL
FRAN
MATH
17
13
20 5
PHYS
12
10
6
7
3
ESPA
ECON
HGEO
16
9
15
22
8
Cette figure ne modifie pas la notion de compromis (toutes les matières valorisent
peu ou prou les bons élèves) mais souligne que Maths et Physique excluent
d’abord (les élèves en haut à droite) et que le professeur d’espagnol est d’une
bienveillance évidente. Ce qui diffère sensiblement de l’ACP normée qui mettait
en évidence l’originalité et la solidarité des professeurs de langue :
scatter(dudi.pca(seconde, scan = F), clab.r = 1)
Eigenvalues
d=2
ESPA
ANGL
9
11
15
8
3
FRAN
HGEO
ECON
1
6
4 19
22
MATH
517
13
20
16
PHYS
10
2
7
21
14
18
12
BIOL
Quand un vecteur de valeurs définit un point signifiant de l’espace devant servir
de référence, il convient d’en faire l’origine.
3
Cohérence d’un jury : compromis
S’il s’agit de mesurer la cohérence du jury, d’exprimer un compromis entre jugements, un choix collectif (ou plusieurs tendances regroupant des parties du
jury), de mettre en évidence la ressemblance entre juges, les juges sont en colonnes dans une ACP normée. Les moyennes sont toutes égales, les variances
aussi, la normalisation n’est ni nécessaire, ni nuisible. On peut l’utiliser pour
tracer les cercles de corrélation.
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 8/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
25 juges classent 8 bouteilles dans le tour final du concours d’une grande foire :
data(macon)
macon
a b c d e f
A 5 5 4 3 3 4
B 4 8 2 4 1 5
C 2 6 1 1 6 2
D 6 7 5 8 2 6
E 1 4 3 2 7 1
F 3 2 8 6 5 8
G 7 1 6 5 4 7
H 8 3 7 7 8 3
g
7
2
1
8
6
3
4
5
h
2
7
5
8
4
3
1
6
i
1
8
5
6
3
4
7
2
j
3
8
4
6
1
7
5
2
k
5
1
3
6
2
8
7
4
l
4
6
7
5
8
1
3
2
m
4
3
2
6
1
5
8
7
n
5
7
2
6
1
8
3
4
o
4
8
6
3
1
7
2
5
p
8
5
2
6
3
4
7
1
q
5
7
1
8
2
4
3
6
r
7
8
6
1
2
3
5
4
s
8
1
2
7
6
3
4
5
t
5
4
1
6
2
8
7
3
u
4
1
2
7
8
6
3
5
v
6
5
1
4
2
8
7
3
w
7
4
2
1
8
6
3
5
x
2
4
5
6
1
7
8
3
y
8
6
1
7
2
3
5
4
16 juges ont rangé par ordre de préférence 28 lots de fruits [8].
data(fruits)
names(fruits)
[1] "type" "jug"
fruits$jug
J1 J2
1.nec 10 5
2.nec
3 1
3.pea
5 11
4.pea
6 12
5.pea
4 2
6.nec
2 6
7.pea 17 9
8.nec 22 20
9.nec
7 7
10.nec 14 18
11.nec 1 8
12.nec 9 3
13.nec 12 10
14.pea 11 14
15.nec 24 22
16.nec 15 4
17.pea 20 17
18.nec 13 15
19.pea 19 19
20.pea 23 27
21.pea 16 16
22.nec 18 21
23.nec 8 13
24.pea 21 24
25.pea 28 25
26.pea 27 23
27.pea 26 28
28.pea 25 26
J3
8
9
5
3
4
16
1
12
14
24
20
25
19
7
13
21
10
15
2
6
17
26
28
23
11
18
27
22
"var"
J4
3
8
2
4
14
10
5
6
16
27
15
12
13
7
1
19
17
23
9
11
22
18
28
20
26
21
24
25
J5
1
6
8
4
17
13
9
26
23
15
28
11
25
3
2
18
21
27
19
24
16
10
20
14
7
22
5
12
J6
18
16
8
7
10
2
24
23
17
1
15
13
22
21
28
3
9
14
6
25
5
20
19
4
11
27
12
26
J7
5
8
18
17
16
11
23
2
6
14
13
7
4
15
1
9
20
12
28
19
21
3
10
22
24
27
25
26
J8
17
10
3
2
1
5
19
7
21
6
11
12
18
23
16
8
9
13
14
20
4
27
24
15
26
22
28
25
J9 J10 J11 J12 J13 J14 J15 J16
3
4
1
2
5
3
1
1
2
1
8
8
9
5
6
4
4 15 14
4
7
1
3 13
1 16 13
7
3
8
4 14
5 19 21 13
6
2
5 15
13
8 10 14 10
9 15
8
23 18
6
3
8 14
7
5
8 14
3 11
1 11 22
2
9
2
2 16 20 13 11 17
11
3 11
6 19
6 10 10
10 11
7 22 15
7
8 11
14
7
9 17 16 10 14
9
6
5 17 15 11 12 18
3
17 17 12 10
4
4
9 24
7 13 15
1 27 16
2 20
12 10 16 19 21 20 26
7
20 22
4 12
2 17 20 18
16
9
5 23 14 15 25 12
21 21 18
9 12 22 21 19
25 25 23
5 17 19 12
6
19 20 19 20 13 18 23 16
15 12 22 18 25 21 27 21
18
6 20 24 24 27 28 22
22 23 25 21 18 28 16 25
24 26 26 27 22 25 17 27
28 24 28 25 23 23 13 23
27 28 24 28 28 24 19 28
26 27 27 26 26 26 24 26
51 étudiants de la filière biomathématique ont exprimé leur préférence sur 10
groupes de musique. Les données sont dans l’objet rankrock.
data(rankrock)
rankrock
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
Metallica
6 7 4 7 7 10 9 3 10
1
7
3
7
1
7
6
8
8 10
6
Guns.n.Roses 10 9 9 8 5 8 10 2 5
9
3
6
2
7
9
4
5
6
8
9
Nirvana
5 8 8 4 2 6 6 9 3
2
8
2
6
6
3
5
4
4
9
5
AC.DC
9 4 2 6 6 9 7 8 4
6
6
7
5
2
8
7
9
9
5
8
Noir.Desir
2 6 3 1 1 3 2 1 2 10
5
4
9
5
1
2
1
2
3
3
U2
7 2 7 3 8 4 5 4 1
4
2
1
4
8
2
1
2
7
6
7
Pink.Floyd
1 3 5 5 4 1 1 6 6
5
4 10
3
4
4
3
3
1
1
1
Led.Zeppelin 3 1 1 9 9 2 4 5 8
8
9
9
1
3
5
8
6
5
4
4
Deep.Purple
4 5 6 10 10 5 3 7 9
7 10
8
8
9
6
9
7
3
2
2
Bon.Jovi
8 10 10 2 3 7 8 10 7
3
1
5 10 10 10 10 10 10
7 10
X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38
Metallica
9
9 10
9
9
5 10
9
4
6
9
1
1
8
9
6
6
6
Guns.n.Roses
5
6
9
8
8
7
8
8
7
5
5
2
5
1
6
5
9
4
Nirvana
4
7
2
6
7
3
7
2
8
4
3
3
4
6
5
7
4
5
AC.DC
8 10
8 10 10
6
9
7
1
7 10
6
8
9 10 10
7
7
Noir.Desir
3
3
1
2
2
8
2 10
9 10
4
4
2
4
4
3
1
3
U2
1
2
3
1
1
1
1
5
6
2
1
9
3
2
1
2
3
1
Pink.Floyd
2
1
4
5
6
2
3
1
3
1
2
7
6
5
2
1
2
2
Led.Zeppelin
6
8
5
3
5
9
6
6
2
3
8
5
7
7
7
8 10 10
Deep.Purple
7
5
7
4
4 10
5
3
5
8
6 10
9 10
3
4
5
9
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 9/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
Bon.Jovi
10
4
6
7
3
4
4
4 10
9
7
8 10
X39 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51
Metallica
10
6
6
4
6
9
7
8
9
9
8
8
4
Guns.n.Roses
6
5
2
5
5
4
8
9
8
6
9
9
6
Nirvana
5
3
5
7
4
1
6
3
3
5
7
6
2
AC.DC
9
7 10
8
8 10
9
7
1
7
6
7
8
Noir.Desir
2
4
4
6
7
6
5
1
4
4
3
4
5
U2
3
1
7
1
1
5
4
5
2
1
1
1
1
Pink.Floyd
1
2
1
2
3
2
2
2
7
3
2
2
3
Led.Zeppelin
8
8
8
9
9
8
1
4
5
2
5
5
7
Deep.Purple
4
9
9 10 10
7
3
6
6
8
4
3
9
Bon.Jovi
7 10
3
3
2
3 10 10 10 10 10 10 10
3
8
9
8
Chacune des colonnes donne le rang (1 pour le préféré, . . . , 10 pour le moins
apprécié) qu’un étudiant attribue à chaque groupe.
par(mfrow = c(2, 2))
data(macon)
s.corcircle(dudi.pca(macon, scan = F)$co)
data(fruits)
s.corcircle(dudi.pca(fruits$jug, scan = F)$co)
data(rankrock)
s.corcircle(dudi.pca(rankrock, scan = F)$co)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 10/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
8
A.B. Dufour & D. Chessel
j
i
o
b
r
J10
J7
h
l
n
x
q
py
J1
J2
f
J8
v
t
a
J6
J9J11
J16
J14
dm
J13
k
J12
J4J15
c
w
e
J5
g
J3
us
X3
x1 X29
X2
X45
X20
X19
X7 X1
X50
X49
X18
X46
X6
X14
X47
X48
X24
X15
X35
X36
X21
X17
X23
X39
X37 X25
X27
x1
X13
X30X28
X8
X51 X33
X16
X22
X40
X31
X9
X38
X44X41X5
X26 X12
X4
X42
X34
X43 X11
X32
X10
par(mfrow = c(1, 2))
s.corcircle(dudi.pca(fruits$jug, scan = F)$co)
s.class(dudi.pca(fruits$jug, scan = F)$li, fruits$type)
d=2
●
J10
J7
J1
J2
J9
J16
J11
J8
J14
●
●
●
peche
●
●
J6
●●
●
●
●
●
●
●
●
●
●
J13
J12
J4
J15
necta
●
●
●
●
●
●
●●
●
●
J5
●
J3
Quelle est la particularité des juges 7 et 10 ? Indiquer comment on relie ces deux
figures.
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 11/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
S’il s’agit de faire une typologie des juges, de mettre en évidence ce qui les
opposent, de montrer qu’il existe plusieurs types de jugements, les juges sont
en lignes dans une ACP centrée. On laissera ainsi dominer dans l’analyse les
produits qui ont reçu les appréciations les plus variables. Les deux approches
sont antinomiques.
par(mfrow = c(2, 2))
s.arrow(dudi.pca(t(macon), scan = F)$li)
s.arrow(dudi.pca(t(fruits$jug), scan = F)$li)
s.arrow(dudi.pca(t(rankrock), scan = F)$li)
d=1
g
d=2
J6
s
u
w
e
J8
c
y
a
q
d
m
k
l
h
p
b
t v
J1
J13
r
n
J2
f
o
J11
J16 J9
J10
x
J7
i
j
J5 J3
J14
J12
J4 J15
d=1
X14
X29
X3
X32
X10
X33
X13
X47
X2
X8
X30
X12
X51
X5
X26
X9X40
X16
X11
X42X4
X38
X43
X34 X41
X44
X48
X15
X28
X37
X17
X21
X23
X36
X25
X35
X27
X31
X39
X22
X20
X45
X46 X1 X19
X49
X50 X7
X18
X6
X24
On remarquera, que derrière le compromis, s’il est suffisant pour définir le premier axe, les premières analyses définissent aussi les contradictions entre jugements, mais les secondes sont plus claires.
4
Reconstitution de données : auto-modélisation
Dans la thèse [1] de G. Carrel, les données portent sur 15 variables physicochimiques mesurées en une station au cours de l’année 1983-1984 à 39 reprises.
Ces 15 variables sont :
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 12/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
data(rhone)
names(rhone$tab)
[1] "air.temp"
"wat.temp"
[7] "caco3"
"totca"
[13] "suspension" "organique"
"conduc"
"mg"
"chloro"
"pH"
"so4"
"oxygen"
"no2"
"secchi"
"hco3"
air.temp Température de l’air (˚C ) mg Magnésium (mg/l Mg++)
wat.temp Température de l’eau (˚C) so4 Sulfates (mg/l x10)
conduc Conductivité (mS/cm) no2 Azote nitrique (mb/l)
pH potentiel Hyrogène (pH) hco3 TAC (mg/l HCO3-)
oxygen Saturation en oxygène (%) suspension Mat. en suspension (mg/l)
secchi Transparence (cm) organique Mat. organique (mg/l)
caco3 Dureté totale (mg/l CaCO3) chloro Chlorophyle a (mg/l)
totca Dureté calcique (mg/l Ca++)
Automodéliser la chronique :
dd1 <- dudi.pca(rhone$tab, nf = 2, scann = F)
rh1 <- reconst(dd1, 1)
rh2 <- reconst(dd1, 2)
par(mfrow = c(4, 4))
par(mar = c(2.6, 2.6, 1.1, 1.1))
for (i in 1:15) {
plot(rhone$date, rhone$tab[, i], pch = 20)
lines(rhone$date, rh1[, i], lty = 2)
lines(rhone$date, rh2[, i], lty = 1)
}
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 13/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
●
●●
●●●
●
●
● ●
●
● ●
●
●
●●
50
●●
●●
●●
●
● ●
●
●●
●
●
150 250 350
●
● ●
●●
●
●●
● ●
●
●
●●●
●
●
● ●
●● ●●
●
50
rhone$date
●
●
●
● ●
50
50
0.6
●
● ●
●
●
●
●
●
●
●
● ●● ●
● ●● ●
● ●
●
●
●● ●
150 250 350
●
●
50
8.2
8.0
7.8
●
150 250 350
●
70
●
●
●●
●
●
●
●
●
● ●
● ●
●
●●
● ●
●
●
●●
●
●●
150 250 350
rhone$date
●
●
●
● ●
●
●
●
● ●
●
●
●
●
● ●
●
●
●●
●
●
●
● ●●
●
●●
●
●
150 250 350
●
● ●●
●
●
●
●
● ●●
●
●
50
rhone$date
●
●
●●
●
●
●●
●
50
150 250 350
rhone$date
●
●
●
● ●
150 250 350
50
●
●●
●●
●
●●
● ●●●●
●
●
●● ● ●●● ●● ● ●
● ●●
●
150 250 350
8
●
6
●
●
●
●
●
● ●
50
●
●●
●
●
4
rhone$tab[, i]
10
●
●
●
2
15
●
●
5
rhone$tab[, i]
● ●
●
●●●●
●●
●
●● ●●●●●●● ●●● ●●● ●● ●
●●
●
●
●
rhone$date●
0
●
●
● ●
●
●
●
200
100
●●
●
●
●
●● ●●●
●
●
●●
100
●
150 250 350
rhone$date
●●
●
●
●
rhone$tab[, i]
●
rhone$tab[, i]
●
●
●
50
●●
● ● ●● ● ●
●
●●
●
●●
●
●
0.2
45
●●
● ●
●
●
●
●●
●
●
●●
0
●
●
35
●
●
●
●
●
● ●
●●
●
●
● ●●
●
● ●●
●
● ●
●
● ● ●
●
3
5
●
●
●
●
●
25
●
rhone$tab[, i]
●
●
50
150 250 350
●
●
●
●
● ●
7.6
●
● ●●
● ●
rhone$date
●●
● ●
●
●
●
●
●
●
150 250 350
●
rhone$date ●
●
9
50
●
●
60
150
●
●
●
150 250 350
rhone$date
●
●
●
rhone$tab[, i]
●
●
●
●
●
●●
●●
●●
●●
● ●
●
● ● ●
●
●
● ●●
●
●
120
●
● ●
● ●●
●
●
50
●
rhone$tab[, i]
●
●
●● ●
●
●
rhone$date
●
●
●
●●
●
50
5
250
100 110
90
●
●
●
●●
80
rhone$tab[, i]
●●●
●
●
● ●
●
●
●
●
●
7
150 250 350
●
rhone$date ●
●
50
rhone$tab[, i]
50
●
●
●●
●
●
●
●
●
●
●
0
●
●
150 250 350
rhone$date
●
5
●
●
50
rhone$tab[, i]
●
●
●
●
40
●
● ●
● ●
●●●
●●
●●
●
●● ●
●
●●●
● ● ●● ●● ●●●●●● ● ● ● ● ●
180
●
●
●
●
●
140
●
●
rhone$tab[, i]
●
●●
●
5
●
● ●●
●
●
●●
rhone$tab[, i]
●
●
●
●●●●
rhone$tab[, i]
●
●● ●●
●
220 260 300 340
●
●
●
●
200
●●
●
●●
●●●
● ●
160
●
●
●
●
●●
10 15 20
15
●
●●
rhone$tab[, i]
● ●●
● ●● ●
−5
rhone$tab[, i]
25
A.B. Dufour & D. Chessel
●
●
●●
● ●●
●●
●
●
●
● ●●
●
150 250 350
Pourcentages : représenter des moyennes
Un tableau X est un tableau de fréquences si la somme des valeurs par ligne ou la
somme des valeurs par colonne vaut l’unité. On notera alors xij = fj/i dans le premier
cas et xij = fi/j dans le second. Si on hésite, c’est qu’on est sur un problème d’analyse
des correspondances (AFC).
data(granulo)
names(granulo)
[1] "tab" "born"
head(granulo$tab)
V1 V2 V3 V4
V5
1 0 2
1 15 260
2 0 0 12 340 1220
3 0 0
0 12 1505
4 0 0 185 440 330
5 0 2
1
8 400
6 0 0
0
0 115
granulo$born
[1]
0.3
0.5
V6
810
2095
1960
335
1480
1810
1.0
V7
1815
2950
4415
725
4935
6845
2.0
V8
V9
1990
0
2050 1160
1265 260
1195 2210
2900
0
4030 280
4.0
8.0
16.0
32.0
64.0 128.0
Ce tableau comporte 49 lignes - échantillons et 9 colonnes - classes de diamètres [4].
L’échantillon 32 a fourni 2 grammes de grains ayant un diamètre compris entre 0.3
et 0.5 mm (sable fin), . . . , 293 grammes de grains compris entre 64 et 128 mm (gros
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 14/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
galets). Le poids total de sédiments récoltés par la drague n’est que le résultat du
hasard et permet de calculer un profil par lignes :
grapc <- t(apply(granulo$tab, 1, function(x) x/sum(x)))
round(head(grapc), dig = 2)
V1 V2
V3
V4
V5
V6
V7
V8
V9
1 0 0 0.00 0.00 0.05 0.17 0.37 0.41 0.00
2 0 0 0.00 0.03 0.12 0.21 0.30 0.21 0.12
3 0 0 0.00 0.00 0.16 0.21 0.47 0.13 0.03
4 0 0 0.03 0.08 0.06 0.06 0.13 0.22 0.41
5 0 0 0.00 0.00 0.04 0.15 0.51 0.30 0.00
6 0 0 0.00 0.00 0.01 0.14 0.52 0.31 0.02
grapc = data.frame(grapc)
Chaque cellule contient une fréquence et le tableau est homogène.
par(mfrow = c(2, 2))
pcaa <- dudi.pca(grapc, scann = F)
barplot(pcaa$eig)
s.corcircle(pcaa$co)
pcab <- dudi.pca(grapc, scal = F, scann = F)
barplot(pcab$eig)
s.arrow(pcab$co)
V5
V6
V2
V3
V1
V9
0.0
1.0
2.0
3.0
V4
V7
V8
d = 0.05
V7
0.015
0.030
x1
V8
V6
V1
V2
V3
0.000
V5
V4
V9
Les variances par colonne sont très différentes :
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 15/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
apply(grapc, 2, var)
V1
V2
V3
V4
V5
V6
5.047118e-07 6.827175e-06 9.624694e-05 2.187413e-03 6.208129e-03 1.333202e-02
V7
V8
V9
1.382115e-02 1.952539e-02 1.143872e-02
mais ce n’est pas une raison pour s’en débarrasser. La première classe joue dans l’ACP
normée un rôle disproportionné à son importance expérimentale. On peut regrouper
les cinq premières classes.
c1 <- apply(grapc[, 1:5], 1, sum)
granou <- cbind.data.frame(c1, grapc[, 6:9])
round(head(granou), dig = 2)
c1
V6
V7
V8
V9
1 0.06 0.17 0.37 0.41 0.00
2 0.16 0.21 0.30 0.21 0.12
3 0.16 0.21 0.47 0.13 0.03
4 0.18 0.06 0.13 0.22 0.41
5 0.04 0.15 0.51 0.30 0.00
6 0.01 0.14 0.52 0.31 0.02
Pour une première approche, rapide et approximative :
par(mfrow = c(1, 1))
scatter(dudi.pca(granou, scal = F, scan = F))
20
23
49
27
21
42
29
14 28
13
d = 0.2
V7
Eigenvalues
41
3
V6
5
48
34
43
36
22 38
35
16
15
44
26
30
37
19
31
40
6
47
1248 45
9
12
17
10
V8
2
33
c1
32
V9
39
25
18
11
7
4
46
On a perdu une valeur propre (démontrer que la dernière est toujours nulle). L’analyse
indique que la position des variables est celle de la représentation triangulaire étendue
à 5 dimensions.
x = cos((1:5) * 2 * pi/5)
y = sin((1:5) * 2 * pi/5)
xy = cbind.data.frame(x, y)
row.names(xy) = c("]0.3,8]", "]8,16]", "]16,32]", "]32,64]", "]64,128]")
xy
x
y
]0.3,8]
0.309017 9.510565e-01
]8,16]
-0.809017 5.877853e-01
]16,32] -0.809017 -5.877853e-01
]32,64]
0.309017 -9.510565e-01
]64,128] 1.000000 -2.449294e-16
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 16/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
s.corcircle(xy, grid = F)
s.distri(xy, data.frame(t(granou)), add.p = T, axesell = T, csta = 0.2,
clab = 0.75, cell = 0)
]0.3,8]
●
X7
]8,16] ●
X14
X29
X42
X28
X21
X13
X30
●
X11
X20
X27 X3
X19
X44 X31
X26
X40
X15
X12
X48
X36
X22
X23 X43
X16
X35
X34
X5 X38
X1
X24
X33
X6X47X8
X9
X45
X17X10
X41
X49
]16,32] ●
]64,128]
X4X46
X37 X2
X39
X25
X18
X32
●
]32,64]
names(granou) <- row.names(xy)
xy.pca <- dudi.pca(granou, scal = F, scan = F)$c1
s.arrow(xy.pca, grid = F, lab = names(granou))
s.distri(xy.pca, data.frame(t(granou)), add.p = T, axesell = T,
csta = 0.2, clab = 0.75, cell = 0)
]16,32]
●
X21
X13
]8,16] ●
X42
X29
X14X28
X20
X23X5X6
X27 X49
X48X34 X47
X3 X41 X15X43
X36
X22
X38
X35
X16
X45
X24
X8
X1
X9 X17
X44
X26
X12
X30
X31
X10
X37 X19 X40
X2
X33
X32
X39
X25
X11
X18
●
]32,64]
X7
X4
X46
]0.3,8] ●
●
]64,128]
L’image naı̈ve (pas tant que ça ?) contient presque la même information que l’image
optimale. Les structures des tableaux de pourcentages sont très particulières et supportent mal une normalisation indésirable. Il est logique de privilégier l’expression d’un
point au centre de gravité de sa distribution plutôt que de laisser faire le centrage. Mais
ce centrage est nécessaire pour éviter l’expression absurde de l’évidence ”les données
sont positives”. La représentation triangulaire s’impose pour trois variables.
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 17/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
data(housetasks)
hc <- t(apply(housetasks, 1, function(x) x/sum(x)))
round(hc, dig = 2)
Wife Alternating Husband Jointly
Laundry
0.89
0.08
0.01
0.02
Main_meal 0.81
0.13
0.03
0.03
Dinner
0.71
0.10
0.06
0.12
Breakfeast 0.59
0.26
0.11
0.05
Tidying
0.43
0.09
0.01
0.47
Dishes
0.28
0.21
0.04
0.47
Shopping
0.28
0.19
0.08
0.46
Official
0.12
0.48
0.24
0.16
Driving
0.07
0.37
0.54
0.02
Finances
0.12
0.12
0.19
0.58
Insurance 0.06
0.01
0.38
0.55
Repairs
0.00
0.02
0.97
0.01
Holidays
0.00
0.01
0.04
0.96
hc <- data.frame(hc)
hc.pca <- dudi.pca(hc, scal = F, scan = F)$c1
s.arrow(hc.pca, grid = F, lab = names(hc))
s.distri(hc.pca, data.frame(t(hc)), add.p = T, axesell = T, csta = 0.2,
clab = 0.75, cell = 0)
Jointly
●
Holidays
Tidying
Dishes
Shopping
Finances
Insurance
Dinner
Laundry
Main_meal
●
Wife
Breakfeast
Official
●
Alternating
Driving
Repairs
●
Husband
A Madame, la cuisine, à Monsieur la voiture, au couple le reste. Groupons ”ensemble”
et ”chacun à son tour”, la représentation triangulaire suffira :
tri <- cbind.data.frame(hc$Wife, hc$Husband, hc$Jointly + hc$Alternating)
names(tri) <- c("Wife", "Husband", "Both")
triangle.plot(tri, show = F, clab = 1, label = row.names(hc))
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 18/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
0 1
●
Holidays
Finances
●
Dishes
●
Shopping
●
Official
●
●
Tidying
●
Insurance
Wife
Both
●
Driving
●
Breakfeast
●
Dinner
●
Main_meal
●
Laundry
1
●
Repairs
0
1
Husband
0
Une dissymétrie certaine. Ci-après, l’analyse des correspondances.
row.names(tri) <- row.names(hc)
scatter(dudi.coa(tri, scann = F))
d = 0.5
Eigenvalues
Holidays
Both
Finances
Dishes
Shopping
Official
Insurance
Tidying
Driving
Breakfeast
Dinner
Husband
Wife
Main_meal
Laundry
Repairs
En indiquant comment, pour une tâche donnée, les couples se répartissent, on indique
clairement une intention.
6
Morphométrie et non-centrage
C’est souvent un problème de morphométrie. Dans tortues, les variables sont les trois
dimensions de carapaces de tortues mesurées en mm [7].
data(tortues)
ttaille <- tortues[, 1:3]
tsexe <- tortues[, 4]
names(ttaille)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 19/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
[1] "long" "larg" "haut"
dudi1 <- dudi.pca(ttaille, scan = F)
plot(dudi1$l1[, 1], tortues$long, ylim = c(30, 180), xlab = "Composante principale",
ylab = "Variables")
text(-1, 130, "Longueur")
points(dudi1$l1[, 1], tortues$haut)
text(1.5, 70, "Hauteur")
points(dudi1$l1[, 1], tortues$larg)
text(0.7, 115, "Largeur")
abline(lm(tortues$larg ~ dudi1$l1[, 1]))
abline(lm(tortues$long ~ dudi1$l1[, 1]))
abline(lm(tortues$haut ~ dudi1$l1[, 1]))
●
150
●
● ●
●●
●
●●
●
●
Longueur
●●
●
●
●
●● ● ●
●
●
●●● ●
● ● ●
●
●●
●
●
●●
Largeur
●●●
●●●
100
Variables
●
●
●
●
●●
●●●●
●
●●
●● ●
● ●
●
●
●
●●
●●● ●
● ●
●
●● ●
●●
●●● ●●
●
●
●●
● ●●
●●
●
●●
●
Hauteur
●
50
●
● ● ●
●
●●●
●
●●
●●
● ●
●
−2
−1
●
●●
●●● ● ● ●
● ●●
●
● ●●● ●● ●●
●
●
● ●●
0
● ●●
1
Composante principale
L’ACP joue ici son rôle de recherche de variable latente. La composante principale
prédit les trois variables. C’est une explicative cachée. Elle représente la taille théorique
de la tortue. C’est la variable cachée qui prédit au mieux les autres, c’est aussi la
variable cachée qui est le mieux prédite par toutes les autres.
s.corcircle(dudi1$co)
haut
long
larg
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 20/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
On dit que c’est un effet ”taille”.
coefficients(lm(tortues$larg ~ dudi1$l1[, 1]))
(Intercept) dudi1$l1[, 1]
95.50000
-12.41816
coefficients(lm(tortues$long ~ dudi1$l1[, 1]))
(Intercept) dudi1$l1[, 1]
124.68750
-20.09136
coefficients(lm(tortues$haut ~ dudi1$l1[, 1]))
(Intercept) dudi1$l1[, 1]
46.145833
-8.120671
apply(ttaille, 2, mean)
long
larg
haut
124.68750 95.50000 46.14583
Si on veut une variable latente qui fait des prédictions nulles pour une valeur nulle
(régression par l’origine), on fait une ACP non centrée.
dudi2 <- dudi.pca(ttaille, scan = F, cent = F, scal = F)
plot(dudi2$l1[, 1], tortues$long, ylim = c(30, 180), xlab = "Composante principale",
ylab = "Variables")
text(-1.1, 170, "Longueur")
points(dudi2$l1[, 1], tortues$haut)
text(-1.1, 70, "Hauteur")
points(dudi2$l1[, 1], tortues$larg)
text(-1.1, 125, "Largeur")
abline(lm(tortues$larg ~ -1 + dudi2$l1[, 1]))
abline(lm(tortues$long ~ -1 + dudi2$l1[, 1]))
abline(lm(tortues$haut ~ -1 + dudi2$l1[, 1]))
●
Longueur
150
●
●●
●
●
●
●
●
●
●
● ●
●●
●
● ●
100
Variables
●
●●●
●
●●
●
Largeur
●
● ●
●●
●●●
●
●
● ●
●
●●
●●●
●
●
●
●
●
●
●
● ●●
●●
●
● ●● ●
●● ●
●●
●
●
●
●●
●
●
●●●
●●
●●
●● ●
●
●
●●●● ● ●
●
●
Hauteur
●
50
●
−1.4
●
●
−1.3
● ●●
●
●
● ●
−1.2
●
●●
●●
●● ●
●
−1.1
●
●● ●
● ●
● ●●
● ●●●
●
●
−1.0
−0.9
●●
●
●●
●●● ●
● ●●
−0.8
Composante principale
Cet auto-modèle (créé par les données pour modéliser les données) est plus réaliste
mais moins bon (les erreurs ont une organisation, liée à la présence des deux sexes).
7
ACP et classification
La source des données est dans Prodon et Lebreton [10]. Le pourcentage de recouvrement de la végétation est mesuré pour 8 strates dans 182 sites :
1) Rocher
2) 0 m / 0.25 m
3) 0.25 m / 0.50 m
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 21/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
4) 0.50 m - 1 m
5) 1 m / 2 m
6) 2 m / 4 m
7) 4 m / 8 m
8) 8 m / 16 m
data(rpjdl)
names(rpjdl)
[1] "fau"
"mil"
"frlab" "lab"
"lalab"
mil = rpjdl$mil
names(mil)
[1] "ROCH" "C.25" "C.50" "C1"
"C2"
"C4"
dim(mil)
[1] 182
8
"C8"
"C16"
millog <- log(rpjdl$mil + 1)
pcamil <- dudi.pca(millog, scan = F)
s.corcircle(pcamil$co)
s.label(pcamil$li, clab = 0.75)
round(cor(millog), dig = 3)
ROCH
C.25
C.50
C1
C2
C4
C8
C16
ROCH 1.000 0.132 -0.484 -0.733 -0.725 -0.732 -0.686 -0.533
C.25 0.132 1.000 0.504 0.157 -0.076 -0.419 -0.523 -0.441
C.50 -0.484 0.504 1.000 0.752 0.532 0.212 0.079 0.017
C1
-0.733 0.157 0.752 1.000 0.840 0.539 0.395 0.256
C2
-0.725 -0.076 0.532 0.840 1.000 0.737 0.590 0.399
C4
-0.732 -0.419 0.212 0.539 0.737 1.000 0.920 0.665
C8
-0.686 -0.523 0.079 0.395 0.590 0.920 1.000 0.779
C16 -0.533 -0.441 0.017 0.256 0.399 0.665 0.779 1.000
C16
C8
C4
ROCH
C2
C1
C.25
C.50
Un artifice ? Retourner aux données :
par(mfrow = c(3, 3))
for (i in 1:8) s.value(pcamil$li, pcamil$tab[, i], cleg = 1.5)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 22/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=2
−0.5 0.5 1.5
d=2
−4.5
2.5
−3.5
−2.5
−1.5
−0.75
−0.25 0.25 0.75
d=2
−0.75
−0.25 0.25 0.75
1.25
−2.5
−1.5
−0.5 0.5
d=2
−1.25
−0.5 0.5 1.5
−3.5
−1.5 −0.5 0.5
d=2
d=2
1.75 −0.5 0.5 1.5
−0.75
1.25
d=2
−0.25 0.25 0.75
1.25
1.75
d=2
2.5
par(mfrow = c(2, 4))
for (i in 1:8) hist(pcamil$tab[, i], main = names(pcamil$tab)[i])
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 23/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
C.25
C.50
C1
1 2 3
60
−5
−2
0
20
40
Frequency
60
0
20
0
0
−1
40
Frequency
20
40
Frequency
60
40
0
20
Frequency
60
80
80
80
80
ROCH
−3
−1
1
−2.0
0.0
1.5
pcamil$tab[, i]
pcamil$tab[, i]
pcamil$tab[, i]
C2
C4
C8
C16
0.0
1.5
pcamil$tab[, i]
−1.0
0.5
2.0
pcamil$tab[, i]
80 100
60
0
20
40
Frequency
60
20
0
0
−1.5
40
Frequency
60
20
40
Frequency
30
20
0
10
Frequency
40
80
50
80
100
pcamil$tab[, i]
−1.0
0.5
2.0
pcamil$tab[, i]
−1
1 2 3
pcamil$tab[, i]
Variables à seuil ? Classement des stations ?
Deux gradients ? Un gradient et une partition ?
plot(hclust(dist(millog), "ward"))
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 24/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
150
100
69
50
63
67
73
71
66
72
33
58
57
68
62
59
65
48
56
60
53
61
12
49
29
27
34
39
55
44
54
35
51
64
41
32
46
38
36
47
52
42
43
91
45
2
31
37
19
5
21
14
28
30
15
40
4
7
3
6
8
18
22
10
13
24
26
16
20
25
17
11
23
95
102
105
104
106
115
100
151
91
150
93
96
90
94
97
70
81
78
83
76
82
84
74
75
89
85
86
88
80
79
77
87
129
121
130
122
127
125
123
128
124
126
174
153
164
156
157
144
154
155
165
152
160
161
140
146
147
149
107
163
141
170
172
158
167
182
162
175
109
168
159
173
179
176
169
171
108
117
166
180
145
178
181
116
111
112
120
113
114
118
119
132
133
136
137
103
135
143
92
101
110
177
139
142
148
131
138
134
98
99
0
50
Height
200
250
300
350
Cluster Dendrogram
dist(millog)
hclust (*, "ward")
par(mfrow = c(1, 2))
s.class(pcamil$li, as.factor(cutree(hclust(dist(millog), "ward"),
3)))
s.class(pcamil$li, as.factor(cutree(hclust(dist(millog), "ward"),
5)))
par(mfrow = c(1, 1))
d=2
d=2
●
●
●
●
●
●●
●
●●
●
● ●
●
●
●●
●
● ●
●
●
●
●●
●
● ●
●
●●
● ● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●● ●
●●
●
●
●
●
●
●●
●
●
●●
●● ●
●
●
●
●
●
●●
●
●
●●●●●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●●
● ●
● ● ● ●
●
●
●
●
●
●●●
●● ●
●● ●
●
●
● ●●
●●
● ●
●
●●●
●
●● ●●●
●
●
● ●
●
● ●
●●
●●● ●●●
●
●
●●
●●
●
●
●
●
●●●
●
●
3
1
2
●
5●●●
●
●
●
●●
●
● ●
●
●●
● ● ●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●● ●
●●
●
●
●
●
●
●●
●
●
●●
●● ●
●
●
●
●
●
●●
●
●
●●●●●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●●
● ●
● ● ● ●
●
●
●
●
●
●●●
●● ●
●● ●
●
●
● ●●
●●
● ●
●
●●●
●
●● ●●●
●
●
● ●
●
● ●
●●
●●● ●●●
●
●
●●
●●
●
●
●
●
●●●
●
●
1
4
2
3
parti <- as.factor(cutree(hclust(dist(millog), "ward"), 5))
summary(parti)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 25/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
1 2 3 4 5
46 26 32 68 10
round(data.frame(lapply(split(mil, parti), function(x) apply(x,
2, mean))), dig = 3)
X1
X2
X3
X4
X5
ROCH 16.848 7.000 2.156 0.235 0.0
C.25 74.130 83.462 90.938 67.353 7.6
C.50 17.413 52.308 83.438 63.603 7.6
C1
0.630 22.115 70.312 53.015 8.6
C2
0.087 0.577 45.000 41.191 14.0
C4
0.000 0.038 6.125 38.162 43.0
C8
0.000 0.000 0.125 28.824 78.0
C16
0.000 0.000 0.000 7.691 41.0
L’ordination suggère la classification. La classification renvoie une ordination. Il s’agit
de modèles. Et si on regardait les données ?
table.paint(t(mil), clabel.c = 0, cleg = 0)
ROCH
C.25
C.50
C1
C2
C4
C8
C16
table.paint(t(millog), x = sort(pcamil$li[, 1]), clabel.c = 0, cleg = 0)
ROCH
C.25
C.50
C1
C2
C4
C8
C16
8
Truites et valeurs propres
J.M. Lascaux a étudié 306 truites [9]
data(lascaux)
names(lascaux$meris)
[1] "rd"
"ra"
"rpelg" "rpecg" "caec"
J.M. Lascaux a mesuré 5 variables méristiques :
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 26/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
rd Nombre de rayons à la dorsale
ra Nombre de rayons à l’anale
rpelg Nombre de rayons à la pelvienne gauche
rpecg Nombre de rayons à la pectorale gauche
caec Nombre de caeca pyloriques
0.0
0.2
0.4
0.6
0.8
1.0
1.2
pcmeris <- dudi.pca(lascaux$meris, scan = F)
barplot(pcmeris$eig)
cor(lascaux$meris)
rd
ra
rpelg
rpecg
caec
rd
1.00000000 0.31108004 0.02633742 0.07528987 -0.06038263
ra
0.31108004 1.00000000 -0.05677408 0.12549442 -0.11191258
rpelg 0.02633742 -0.05677408 1.00000000 0.04819827 0.10350996
rpecg 0.07528987 0.12549442 0.04819827 1.00000000 0.06108539
caec -0.06038263 -0.11191258 0.10350996 0.06108539 1.00000000
cor.test(lascaux$meris$ra, lascaux$meris$rpecg)
Pearson's product-moment correlation
data: lascaux$meris$ra and lascaux$meris$rpecg
t = 2.2055, df = 304, p-value = 0.02817
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.01356170 0.23432087
sample estimates:
cor
0.1254944
Conclure sur la nature de ce type de variables.
J.M. Lascaux a mesuré 35 variables morphologiques :
names(lascaux$morpho)
[1] "LS"
"MD"
[10] "DPEL"
"DPEC"
[19] "PELAN" "PELC"
[28] "LAD"
"LD"
[37] "ETET"
"MAD"
"ADC"
"ANC"
"HD"
"MAN"
"ADAN"
"LPRO"
"LC"
"MPEL"
"ADPEL"
"DO"
"LAN"
"MPEC"
"ADPEC"
"LPOO"
"HAN"
"DAD"
"PECPEL"
"LTET"
"LPELG"
"DC"
"PECAN"
"HTET"
"LPECG"
"DAN"
"PECC"
"LMAX"
"HPED"
LS Longueur standard
MD Distance bout du museau - insertion de la dorsale
MAD Distance bout du museau - insertion de l’adipeuse
MAN Distance bout du museau - insertion de l’anale
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 27/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
MPEL Distance bout du museau - insertion de la pelvienne
MPEC Distance bout du museau - insertion de la pectorale
DAD Distance insertion de la dorsale - insertion de l’adipeuse
DC Distance insertion de la dorsale - départ de la caudale
DAN Distance insertion de la dorsale - insertion de l’anale
DPEL Distance insertion de la dorsale - insertion de la pelvienne
DPEC Distance insertion de la dorsale - insertion de la pectorale
ADC Distance insertion de l’adipeuse - départ de la caudale
ADAN Distance insertion de l’adipeuse - insertion de l’anale
ADPEL Distance insertion de l’adipeuse - insertion de la pelvienne
ADPEC Distance insertion de l’adipeuse - insertion de la pectorale
PECPEL Distance insertion de la pectorale - insertion de la pelvienne
PECAN Distance insertion de la pectorale - insertion de l’anale
PECC Distance insertion de la pectorale - départ de la caudale
PELAN Distance insertion de la pelvienne - insertion de l’anale
PELC Distance insertion de la pelvienne - départ de la caudale
ANC Distance insertion de l’anale - départ de la caudale
LPRO Longueur préorbitale
DO Diamètre de l’orbite
LPOO Longueur postorbitale
LTET Longueur de la tête
HTET Hauteur de la tête (en passant par au milieu de l’orbite)
LMAX Longueur de la mâchoire supérieure
LAD Longueur de l’adipeuse
LD Longueur de la dorsale
HD Hauteur de la dorsale
LC Longueur de la caudale
LAN Longueur de l’anale
HAN Hauteur de l’anale
LPELG Longueur de la pelvienne gauche
LPECG Longueur de la pectorale gauche
HPED Hauteur du corps au niveau du pédoncule caudal
ETET Largeur de la tête (au niveau des orbites)
m <- na.omit(lascaux$morpho)
m.acp <- dudi.pca(m, scann = F)
L’effet taille est écrasant :
barplot(m.acp$eig)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 28/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
0
5
10
15
20
25
30
A.B. Dufour & D. Chessel
● ●
●● ●
●● ●●●
●●●●
●
●
●●
●●
●●
●
●●
●●
●
●
●
●
●
●
●●●● ●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●● ●
0
score
0
score
0
score
30 60
20 40
y
30 60
50
y
110
35
15
5 25
25
50
6 12
15 30
y
score
0
score
10
15
0
score
●
●●
●● ●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
LPOO
0
HD
● ●● ●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●● ●
●
●
●
●●
−2
LPECG
●
●●●score
●
●●●
●●
0
●
●
●
●●
●
●
●
●●
●●
●
●
●●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●● ●
●
●●
●
●
●
●
●
●
●
●●●
●● ●●
●●
−2
●
−2
y
score
0
LD
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
−2
● ●●●
● ●●
●●
●●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
0
score
●●● score
●
●
●
−2
40
0
●
●● ●●
●
●
●● ●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●●
●
●
●●●
●
●
●
●
−2
y
60 120
y
y
y
● ●
●●●●●
● ●
●● ●
●
●●
●● ●● ●
●●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
y
y
y
10
−2
ADC
PECC
0
score
0
LPELG
score
●
●
● ●●
● ●
●● ● ●
●● ●●
●
●●
●●● ●
●●
●●●●●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
0
● ●●
●
●●● ●
●●
●●
●
●●
●●●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
−2
PECAN
DO
MPEC
score
●
●●●
●●
0
●●
●
●
●●
●●●●
●●
● ●
●
●●
●●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
−2
score
LAD
35
score
● ●●
●
●
● ●●
●
●
●
●●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
0
●●
●
●●
●
●
●
●● ●● ●
●
●
●
●
●
● ●●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−2
●●
●
● ●●●
● ●●●
−2
PECPEL
●
0
HAN
score
●
●
●● ●
●
●
●●●
●● ●
●
●
●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
−2
DPEC
●
●
●
● ●
● ●●●●●
●● ●●
●●●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
−2
●
●●
●●
●
●●●●
●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●●
−2
y
6
0
score
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●●
0
LPRO
0
score
score
14
score
MPEL
−2
● ●
●
●
●●●
● ●
●●
●●●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
−2
●●
●●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
LMAX
DPEL
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
y
0
0
score
●
●
y
160
y
80
50
20
y
y
30 60
score
ADPEC
ANC
MAN
−2
●
●
●●●
●
●
●
●●
●
●●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
−2
score
0
y
180
80
40
60 120
60
20 100
0
●
●●●
●●
●
● ●●
●●
●●
●
●●●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
−2
30
y
●●●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●●
●●
●●
LAN
●● ●
●
●● ●
●
● ●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
−2
y
15 30
0
● ● score
● ●●
HTET
DAN
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
−2
15
●
0
80
y
score
PELC
0
●●● score
●
●
●●
−2
15 35
score
●
● ●
●●●
●
●
●●
● ●●●
● ●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●●
●
●
●
●●
●
●
0
●
●
●●●
●
●●●
●
●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
−2
25
0
y
ADPEL
MAD
−2
●
●●
●
●
●●●
●●●
●
●●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
−2
●
●
●
● ●
●
●●
● ●●●
●●●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
−2
0
y
0
score
score
y
60 120
30 60
50 100
PELAN
ETET
y
y
score
●● ●
●
●
● ●●
● ●●
●●
●
●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●●●
−2
DC
●
●
●●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
−2
●
●●
●
●●
●
●
●●●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
−2
10
45
y
20
60
y
30
35
0
LC
0
score
score
y
70
y
20
score ●
LTET
MD
−2
ADAN
−2
y
0
●
●
●●
●●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●
●
●
●
−2
10
y
DAD
−2
●
●
●
●●●
●●●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
−2
●
●
●●●
●●
●
● ●●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
−2
10
0
score
y
40
y
80
−2
y
LS
50 100
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
y
y
100 220
score(m.acp)
0
score
●●
● ●●● ●
●●●●
● ●●
●
●
●●
● ●●
● ●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●●
●●
●●
HPED
−2
0
score
0
Une truite qui a perdu un morceau de nageoire dans une bataille est facilement repérée !
Ce qui reste quand on enlève l’effet taille concerne, grossièrement parlant, la forme.
Les variables 13, 18, 20 et 27 présentent des outliers. Elles sont exclues :
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 29/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
0
10
y
●●
●●
● ●●
●●
●
●● ●
●
●●
●
●●●● ●
●
●
●
●
●
●●
●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●●
●●
20
●score
●
●● ●
●
●
●
●●
●●
●●●
●
●●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●●●● ●●
●
●●●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
ETET
−2
0
score
30 50
y
20 35 50
y
y
30 50 70
120
60
60
30
40
20
0
●●
●
●●●●
●
●●●●●
● ●
●
●●
●
●
●●
●
●●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
ANC
10
score
y
0
score
LAD
●●●
●●
● ●
● ●
●● ●
● ● ●●
●●
●
●●
●●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●●●
●●
−2
HAN
0
score
●●
0
●●
●
●●●
●
● ●●● ●
● ●
●●
●
●●
●
●●
●
●
●
● ●
●
●
●
●●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
−2
score
ADC
●
20
HTET
y
30
15
●
●●
●
●
●●●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●●
●
●
●
0
●
●
●
●●
●
● ●●
●●●
●●●
●
●●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
−2
0
score
30
y
y
y
LAN
−2
PELAN
●●
● ●●
●
●●
●● ●
●●
●
●
●●
●
● ●●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●●
−2
MPEC
−2
score
0
●
●●●
●●
● ●●
● ●
●● ●
●
●●
●●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
0
15
●
−2
LTET
score
20
score
LC
● ●
●●
●
●
●●●
● ●●
●●
●
● ●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●●●
●
●
●●
●●
●
●●
●
●●
DPEC
−2
●
●
●
● ●
●
●●
●
● ●●●
● ●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●●
−2
y
30
0
score
0
●
●
●
●●●
● ●●
●
● ●●●
●●
●
● ●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●●
●●
−2
●● ●score
15 30
●
●
●●
●
●
0
10
y
HD
PECAN
0
●
●●
●●
●●●●
●
●
● ●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
−2
score
50
y
LPOO
30
●
●●
● ●
●
●●●
●● ●●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
MPEL
score
0
●●
●
●
●●
●●●●●
●
● ●
●
●●
● ●●
●
●
●●
●●●
●
●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
−2
10
score
●●
●
●
●●●
●
●●
● ●
● ●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●●
●
●
●
●
score
DPEL
y
60
90
y
0
●
●
●
●●
●●
● ●●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
−2
score
50
PECPEL
0
●
●
● ●
●
●
●
●
●
● ●
● ●●
●
●●●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
−2
●●
●●
●
●
●●
●
●● ●
●●
●●
● ●
●
●● ●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●●
−2
MAN
y
140
80
y
y
40
70
40
30
30
15
y
0
HPED
20
0
score
●
● ● ●● ●
●●●
● ●
●●● ●
●●
●
●●●●
●●●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
0
20
80 140
y
score
DO
−2
DAN
−2
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
−2
score
●
0
●
●
● ●●
● ●
●
●●
10
y
15 30
y
0
ADPEC
−2
LPECG
−2
●
●●
●● ●
●●● ●
●●
●●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
0
●
●●●
●
●
●
●● ●●
●●
●●
● ●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
−2
60
60
60
20
0
● score
●
●
●
●●
●
●
●
●●●
●
● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●● ●●
●
●
0
5
y
15
y
30
score
LD
MAD
score
score
−2
40
0
●●●
●
●
●
● ●●●
● ●●
● ●●
●
●●
●
●
●●
●●
●●
●
●●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●
●
−2
16
LPRO
6 10
score
●●
●
● ●●
●
●
●● ●● ●●
●●
●
● ●
●
●●
●●●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
−2
score
DC
−2
y
18
y
12
6
0
●
●
●●
●●
●
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
−2
y
ADPEL
y
60
y
30
●
●●
●
●
●●●
●●●
●
●●
●●
●●
●
●● ●
●
●●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
0
●
●
●
●●
●●
●
●
●●●
●
● ●●
●●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
−2
120
0
score
−2
MD
y
score
DAD
●●●
●●
●
● ●●
●
●●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●
●●
−2
●
●●
●
● ●
●●●
●●●
●●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
−2
y
40
y
70
●
●
0
120
−2
90
LS
50
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
y
200
100
y
m <- na.omit(lascaux$morpho)[, -c(13, 18, 20, 27)]
m.acp <- dudi.pca(m, scan = F, nf = 3)
score(m.acp)
0
score
●
●●
●
●●
●
●
●●
●●
● ●●
●● ●
●
●
●●●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
● ●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●
●●
●
●●●
●●
●●
●
LPELG
−2
0
score
0
Noter que le problème esr résolu, avec brutalité, certes !
names(lascaux$sex) <- row.names(lascaux$morpho)
names(lascaux$gen) <- row.names(lascaux$morpho)
Enlever des données le modèle issu de l’effet taille ou dépouiller l’analyse à partir du
facteur 2 donne strictement le même résultat. C’est l’illustration qu’une ACP peut
faire deux choses de nature radicalement différente. Le plan 2-3 est directement une
analyse de la forme :
s.class(m.acp$li[, 2:3], lascaux$sex[as.numeric(row.names(m.acp$li))])
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 30/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=1
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
● ● ●
●
●●
●
●
●
●
●
●
●
F
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
● ●
●● ●
●
●
●
● ● ●
● ●
●
●
●
●
●
● ● ●●
●
●
● ● ●●● ● ●●
● ●
●
●
●
●
●
●
●
●● ● ●
●
● ● ●● ●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
M
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
Dans la forme, il y a certainement une composante sexuelle.
s.class(m.acp$li[, 2:3], lascaux$gen[as.numeric(row.names(m.acp$li))])
d=1
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
● ●
●●
●
● ●●
●
●
●
● ● ●●
●
● ●
●●
●
●
● ●
●●● ● ●●
●
●
●
●
● ● ● ● ●
●●
●
●
●
●
●
●● ● ●
●●
●
● ● ●● ●
●
●
●
●
●
●●
● ● ●●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
5
4 6
0
1 23
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
Dans la forme, il y a certainement une composante génétique.
par(mfrow = c(1, 2))
barplot(m.acp$eig[1:15])
barplot(m.acp$eig[2:16])
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 31/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
0.4
0.0
0
10
20
0.8
A.B. Dufour & D. Chessel
Une ACP peut ainsi en cacher une autre. On peut analyser l’effet taille par une ACP
non centrée sur le tableau des log doublement centré :
mlog <- log(m)
mlog0 <- bicenter.wt(mlog)
mlog0 <- data.frame(mlog0)
names(mlog0) <- names(m)
mlog0.acp <- dudi.pca(mlog0, scan = F, scal = F, cent = F)
Ceci indique que l’introduction des deux facteurs directement dans l’analyse est pertinente. L’ACP inter-classe est celle des centres de gravité des sous-nuages.
croi <- lascaux$gen[as.numeric(row.names(m.acp$li))]:lascaux$sex[as.numeric(row.names(m.acp$li))]
plot(between(mlog0.acp, croi, scan = F))
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 32/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
LAN
d = 0.2
DO
d = 0.2
HAN
LD
ANC
PELAN
DADADPEL
DC
DAN
ADPEC
LS
MAD
ADC
LPECG
PECAN
MAN
HD
LPELG
DPEL
MPEL
MD MPEC
HPED
PECPEL
LC LTET
DPEC
LPOO
HTET
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
LAD
●
●
●
●
●
●
●
LPRO
ETET
Canonical
weights
●
●
●
●
● ●
●
2:F
●
●
●
LD
ANC
PELAN
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
5:M
●
●
4:M
●
●●
●
●●
●●
●
●●
●
6:M
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
DAD
DC
ADPEL
DAN
ADPEC
LS
MAD
ADC
PECAN
MAN
DPEL
MPEL
HPED
MD MPEC
LC
LTET
PECPEL DPEC
LPOO
HTET
●
●
●
●
● ●●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
3:M
1:M
2:M
●
●
●
●●●
● ●
●
●●
●
●
DO
HAN
●
●
●
●
●
●
●
●
● ●
0:F
1:F
3:F0:M
●
●
●
●
●
6:F
●
4:F
●
●
●
●●
●
●
●
●
●
● ●
●
●
5:F
●
●
LAN
●
●
●
●
●
●
●
●●
●
LPECG
LPELGHD
●
●
●
●
●
LAD
ETET
LPRO
Variables
Scores and classes
d = 0.1
Eigenvalues
Axis2
5:F
6:F
4:F
Axis1 2:F
0:F1:F
3:F 0:M
5:M
1:M
2:M
Inertia axes
4:M
3:M
6:M
Classes
Ces truites sont classées en 7 groupes génétiques par le nombre d’allèles méditerranéens. La variable génétique prend les modalités 0 (0 allèle méditerranéen, homozygotes
atlantiques, truites dites modernes), 1 à 5 (respectivement 1 à 5 allèles méditerranéens)
et 6 (6 allèles méditerranéens, homozygotes méditerranéens, truites dites ancestrales).
Les modernes sont à gauche, les ancestrales à droite, les femelles en haut et les mâles
en bas. Le code des variables permettra de poursuivre.
J.M. Lascaux a mesuré 15 variables de coloration de la robe :
names(lascaux$colo)
[1] "PRAD"
"PNAD"
[8] "PRINF"
"PNINF"
[15] "PND"
"PRAA"
"PRSUP"
"PNAA"
"PNSUP"
"PRDA"
"PNO"
"PNDA"
"PRLI"
"PNPERIOP" "PRD"
PRAD Nombre de points rouges avant l’aplomb de la Dorsale
PNAD Nombre de points noirs avant l’aplomb de la Dorsale
PRAA Nombre de points rouges après l’aplomb de l’Anale
PNAA Nombre de points noirs après l’aplomb de l’Anale
PRDA Nombre de points rouges entre aplomb Dorsale et aplomb Anale
PNDA Nombre de points noirs entre aplomb Dorsale et aplomb Anale
PRLI Nombre de points rouges sur la ligne latérale
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 33/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
PRINF Nombre de points rouges au dessous de la ligne latérale
PNINF Nombre de points noirs au dessous de la ligne latérale
PRSUP Nombre de points rouges au dessus de la ligne latérale
PNSUP Nombre de points noirs au dessus de la ligne latérale
PNO Nombre de points noirs sur l’opercule
PNPERIOP Nombre de tâches noires à la périphérie de l’opercule
PRD Nombre de points rouges sur la Dorsale
PND Nombre de points noirs sur la Dorsale
0
1
2
3
4
5
6
pcacol <- dudi.pca(lascaux$colo, scann = F)
barplot(pcacol$eig)
Questions
1) Trouver en quoi une partie de la corrélation est arte-factuelle dans ce tableau.
2) Interpréter les composantes.
3) Trouve-t-on dans la coloration de la robe une composante génétique ?
4) Trouve-t-on dans la coloration de la robe une composante environnementale ?
J.M. Lascaux a mesuré 15 variables ornementales qualitatives :
names(lascaux$ornem)
[1] "ocpr"
"ocpn"
[10] "frpel" "frc"
"maju"
"taop"
"ptsdos" "coufl"
"pntet"
"coupn"
"frd"
"zeb"
"ptsad"
"frad"
"fran"
ocpr Ocelles autour des points rouges (1 nulles, 2 faibles, 3 marquées)
ocpn Ocelles autour des points noirs (1 nulles, 2 faibles, 3 marquées)
maju Marques juvéniles (1 absence, 2 présence)
taop Tache operculaire (1 absence, 2 présence)
pntet Points noirs sur la tête (1 absence, 2 présence)
frd Frange de la dorsale (1 aucune, 2 blanche, 3 blanche et noire)
ptsad Points sur l’adipeuse (1 absence, 2 présence)
frad Frange de l’adipeuse (1 aucune, 2 rouge, 3 très rouge)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 34/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
fran Frange de l’anale (1 aucune, 2 blanche, 3 blanche et noire)
frpel Frange des pelviennes (1 aucune, 2 blanche, 3 blanche et noire)
frc Frange de la caudale (1 aucune, 2 plus ou moins rouge)
ptsdos Points sur le dos (1 absence, 2 présence)
coufl Couleur des flancs (1 brun-jaune, 2 gris, 3 blanc argenté)
coupn Contour des points noirs du flan (1 net, 2 flou)
zeb Zébrures sur les flancs (1 absence, 2 présence)
summary(lascaux$ornem)
ocpr
ocpn
maju
1:203
1:238
1:111
2: 66
2: 48
2:195
3: 37
3: 20
frc
ptsdos coufl
1:249
1:248
1:213
2: 57
2: 58
2: 74
3: 19
taop
1: 57
2:249
pntet
1: 89
2:217
coupn
1:239
2: 67
zeb
1:278
2: 28
frd
1:178
2: 14
3:114
ptsad
1:218
2: 88
frad
1: 51
2:251
3: 4
fran
1: 27
2:173
3:106
frpel
1:107
2:146
3: 53
On retrouvera ces données plus tard.
9
Voir le tableau
Source : Devillers, J., J. Thioulouse, and W. Karcher. 1993 [2].
data(toxicity)
head(toxicity$tab)
V1
V2
V3
1 1.224 1.824 3.004
2 1.438 1.540 2.630
3 0.679 1.893 2.855
4 1.001 1.390 2.807
5 1.097 1.747 2.815
6 0.967 1.666 2.870
toxicity$species
[1]
[3]
[5]
[7]
[9]
[11]
[13]
[15]
[17]
V4
3.323
2.801
2.293
2.939
3.086
3.198
V5
2.579
2.833
2.547
3.636
3.170
3.358
V6
5.241
5.284
4.943
4.926
5.303
5.441
"Daphnia magna (LC50)"
"Orconectes immunis"
"Oncorhynchus mykiss"
"Gambusia affinis"
"Carassius auratus"
"Arbacia punctulata (embryo)"
"Photobacterium phosphoreum"
"Daphnia magna (EC50)"
"Ceriodaphnia reticulata"
V7
6.264
4.024
3.163
6.103
6.364
6.120
"Tanytarsus dissimilis"
"Rana catesbeiana"
"Lepomis macrochirus"
"Ictalurus punctatus"
"Pimephales promelas"
"Arbacia punctulata (sperm)"
"Arbacia punctulata (DNA)"
"Daphnia pulex"
toxicity$chemicals
[1] "2-Methyl-2,4-pentanediol" "2-Methyl-1-propanol"
[4] "2,4-Pentanedione"
"2-Chloroethanol"
[7] "Pentachlorophenol"
"2,2,2-Trichloroethanol"
"Hexachloroethane"
On a la toxicité de 7 molécules (colonnes) sur 16 cibles (lignes) exprimée en : log(mol/litre)
data(toxicity)
table.paint(toxicity$tab, row.lab = toxicity$species, col.lab = toxicity$chemicals)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 35/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
Daphnia magna (LC50)
Tanytarsus dissimilis
Orconectes immunis
Rana catesbeiana
Oncorhynchus mykiss
Lepomis macrochirus
Gambusia affinis
Ictalurus punctatus
Carassius auratus
Pimephales promelas
Arbacia punctulata (embryo)
Arbacia punctulata (sperm)
Photobacterium phosphoreum
Arbacia punctulata (DNA)
Daphnia magna (EC50)
Daphnia pulex
Ceriodaphnia reticulata
1]
2]
3]
4]
5]
Pentachlorophenol
Hexachloroethane
2−Chloroethanol
2,4−Pentanedione
2,2,2−Trichloroethanol
2−Methyl−1−propanol
2−Methyl−2,4−pentanediol
A.B. Dufour & D. Chessel
6]
table.value(toxicity$tab, row.lab = toxicity$species, col.lab = toxicity$chemicals)
f1 <- as.factor(row(as.matrix(toxicity$tab)))
f2 <- as.factor(col(as.matrix(toxicity$tab)))
tox <- unlist(toxicity$tab)
anova(lm(tox ~ f1 + f2))
Analysis of Variance Table
Response: tox
Df Sum Sq Mean Sq F value Pr(>F)
f1
16
5.933
0.371
1.1217 0.3467
f2
6 244.098 40.683 123.0630 <2e-16 ***
Residuals 96 31.736
0.331
--Signif. codes: 0
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 36/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
Pentachlorophenol
Hexachloroethane
2−Chloroethanol
2,4−Pentanedione
2,2,2−Trichloroethanol
2−Methyl−1−propanol
2−Methyl−2,4−pentanediol
A.B. Dufour & D. Chessel
Daphnia magna (LC50)
Tanytarsus dissimilis
Orconectes immunis
Rana catesbeiana
Oncorhynchus mykiss
Lepomis macrochirus
Gambusia affinis
Ictalurus punctatus
Carassius auratus
Pimephales promelas
Arbacia punctulata (embryo)
Arbacia punctulata (sperm)
Photobacterium phosphoreum
Arbacia punctulata (DNA)
Daphnia magna (EC50)
Daphnia pulex
Ceriodaphnia reticulata
1 3 5
7
On a fait le tour. Commenter.
10
Notes d’abondance
Quand on pense l’ACP comme approche d’un échantillon d’une loi normale multivariée,
son usage sur un tableau de notes binaires (0-1) est sans objet. On dit à l’utilisateur
”qu’il n’a pas le droit”. Quand on pense l’ACP comme une opération géométrique ou la
maximisation d’une forme quadratique, on dit à l’utilisateur ”qu’il a le droit”. Quand
un lecteur formé à une école juge le manuscrit d’un auteur formé à l’autre école, le
dialogue est sommaire et tous les coups sont permis. Le même problème se pose pour
les notes d’abondance (souvent de 0 à 7 en phytosociologie). L’ACP d’un tableau florofaunistique est valide comme première approche d’objectifs précis. Elle a marqué les
pionniers [5] par la facilité avec laquelle elle permet des cartes de synthèse.
data(jv73)
par(mfrow = c(5, 4))
for (m in 1:19) {
s.value(jv73$xy, jv73$poi[, m] - mean(jv73$poi[, m]), incl = F,
contour = jv73$contour, sub = names(jv73$poi)[m], possub = "topleft",
csub = 3, cleg = 0)
}
s.value(jv73$xy, dudi.pca(jv73$poi, scal = F, scan = F)$li[, 1],
incl = F, contour = jv73$contour, sub = "PCA", possub = "topleft",
csub = 3, cleg = 0)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 37/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
Chb
Omb
Van
Spi
Tan
d = 50
d = 50
d = 50
d = 50
d = 50
Tru
Bla
Che
Gou
Gar
d = 50
d = 50
d = 50
d = 50
d = 50
Vai
Hot
Bar
Bro
Lam
d = 50
d = 50
d = 50
d = 50
d = 50
Loc
Tox
Lot
Per
PCA
d = 50
d = 50
d = 50
d = 50
d = 50
L’ACP est en fait souvent discutée sur ce genre de tableau parce que c’est une méthode
qui introduit une grande dissymétrie entre le traitement des lignes et des colonnes. Elle
est centrée sur la différence entre les lignes et la redondance entre colonnes. Dans un
tableau sites-espèces, on s’appuie sur la covariance entre espèces pour discriminer les
sites ; dans un tableau espèces-sites, on fait l’inverse. Seule l’analyse des correspondances fait les deux, c’est-à-dire, donne un compromis des deux objectifs. L’important
est de ne pas utiliser la procédure à l’aveugle. La difficulté duale est la grande diversité
des structures de ce genre de tableaux et l’objectif est de trouver le particulier par le
biais de procédure générale.
data(trichometeo)
trichometeo$fau comporte 49 pièges et 17 espèces.
t1 <- data.frame(apply(log(trichometeo$fau + 1), 2, function(x) x/sum(x)))
s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cell = 0, csta = 0.3,
clab = 1)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 38/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=1
●
●
Che
●
Hve
●
Ath
Cea
●
Hfo
Glo
●●
Set
Hsp Han
All
Ced
Psy
●
Sta
Aga
Hym
●
● Hys
● ●●●●
●
●
●
●●
●
●
● ● ●●●
●
●●
●
●● ●
●
● ●
●
Hyc
●
●
Les dénombrements sont généralement transformés par x 7→ log (x + 1). Le tableau
[11] est passé en pourcentage par colonnes (espèces). Un taxon est une distribution de
fréquence entre sites sur une colonne. La moyenne est 1/n et la colonne centrée est
l’écart entre la distribution de l’espèce et la distribution uniforme. Une composante
principale est un score normalisé des sites et la coordonnée de l’espèce est exactement
la position moyenne sur ce score. L’analyse maximise la somme des carrés des écarts
des positions moyennes à l’origine qui est la position du taxon indifférent :
sco.distri(dudi.pca(t1, scan = F, scale = F)$l1[, 1], t1)
Hys
●
●
All
Hym
●
Aga
●
●
Sta
Psy
●
●
Ced
●
Che
●
Han
●
Ath
Hsp
●
●
Hve
●
Set
●
Glo
●
●
●
Hfo
Hyc
Cea
d=1
L’essentiel est que certains relevés ont décalé la majorité des espèces dans le même
sens (une configuration météorologique particulière provoque l’émergence simultanée
des trichoptères). Autre ambiguı̈té : la covariance de deux variables est une mesure
de ressemblance même quand elle est négative (une corrélation de -1 indique qu’on a
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 39/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
mesuré exactement la même chose au signe près). Pour deux espèces, c’est au contraire
une mesure de différence : une espèce disparaı̂t quand l’autre apparaı̂t. L’ACP a alors
le statut de méthode qui fait une double typologie. Le même calcul n’a pas le même
sens expérimental.
t1 <- data.frame(apply(jv73$poi, 2, function(x) x/sum(x)))
par(mfrow = c(1, 2))
s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cstar = 0.3,
clab = 1, cell = 0, axesel = F, sub = "PCA")
s.distri(dudi.coa(jv73$poi, scan = F)$l1, jv73$poi, cstar = 0.3,
clab = 1, cell = 0, axesel = F, sub = "COA")
d=2
d=1
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
Tan
Gar
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
BroPer
●
●
●
●
●
Gou
●
TruVai
Loc Che
Van
Chb
Lam
Bla
●
●
Omb
Lot
●
Spi
Tox
●
Hot
Bar
●
●
●●
Bar
Hot
Tox
●
●
Spi
●
Bla
Lam Omb
Lot● ●
Chb
●
●
Van
●
●
●
Per
●
Gou
●
●
Che
●
●
Tru
●
Bro
●
●●
●
●
● Gar
●
Tan
●
●
Vai
● ●●
●
Loc
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
PCA
●
●
●
●
●
●
●
COA
Les deux analyses parlent de la typologie de distributions des espèces. La carte de
l’ACP est bien meilleure en ce sens qu’elle indique le gradient amont-aval de richesse
croissante (salmonidés-cyprinidés) et dans la seconde partie le gradient de vitesse de
courant (lénitique-lentique). Les contraintes de l’AFC déforment sans gain précis cette
vision juste du réseau hydrographique.
data(aviurba)
names(aviurba$fau) <- aviurba$species.names.fr
t1 <- data.frame(apply(aviurba$fau, 2, function(x) x/sum(x)))
par(mfrow = c(1, 2))
s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cstar = 0.3,
clab = 0.75, cell = 0, axesel = F, sub = "PCA")
s.distri(dudi.coa(aviurba$fau, scan = F)$l1, aviurba$fau, cstar = 0.3,
clab = 0.75, cell = 0, axesel = F, sub = "COA")
d=1
d=2
●
Chouca.des.tours
●
●
●
●
Pigeon.domestique
●
●
● ●●
●
Fauvette.des.jardins
Troglodyte Rougequeue.a.front.blanc
●
●
●
Pie.bavarde
●
●
●
●
Mouette.rieuse
Pouillot.fitis
Moineau.friquet
●●
Tourterelle.turque
Martinet.noir
Rougequeue.noir
Verdier
●
●
●
Hirondelle.de.fenetre
Etourneau.sansonnet
Moineau.domestique
Hypolais.polyglote
Pouillot.veloce
Serin.cini
Merle.noir
●
Pinson.des.arbres
Hirondelle.de.cheminee
Mesange.bleue Tourterelle.des.bois
●●
Mesange.charbonniere
Alouette.des.champs
●
●
Rossignol.philomele
Corneille.noire
Fauvette.a.tete.noire
●
Bruant.proyer
Faucon.crecerelle
Chardonneret.elegant
Loriot.jaune Linotte.melodieuse
Coucou.gris
Traquet.patre
●
●
●
●
Busard.cendre
●
Traquet.tarier
●
Mesange.bleue
●
Loriot.jaune
Fauvette.des.jardins
Troglodyte
Pouillot.fitis
Pouillot.veloce
●
Linotte.melodieuse
●
●●
Pinson.des.arbres
●
Tourterelle.des.bois
●
Mesange.charbonniere
●
Rossignol.philomele
●
●
Etourneau.sansonnet
Fauvette.a.tete.noire
● ● ●
Merle.noir
●
●
Serin.cini
Moineau.friquet
Tourterelle.turque
●
●
Pigeon.domestique
●
Verdier
● ●●●●●
Moineau.domestique
Chouca.des.tours
●● ●
●●
●
Pie.bavarde
Chardonneret.elegant
Hirondelle.de.fenetre
Coucou.gris
Bruand.zizi
●
Martinet.noir
Rougequeue.noir
Rougequeue.a.front.blanc
●
●● ●
Corneille.noire
Faucon.crecerelle
●
● ●
● ●Mouette.rieuse
Hirondelle.de.cheminee
Bruant.proyer
●
Hypolais.polyglote
●
Alouette.des.champs
●
●
●
Busard.cendre
●
Traquet.patre
●●
●
Bruand.zizi
●
Traquet.tarier
Bergeronnette.printaniere
Bergeronnette.printaniere
●
PCA
●
COA
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 40/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
Deux gradients centre-ville vers rural fermé et centre-ville vers rural ouvert pour l’ACP
correspondent aux deux gradients urbain-rural et ouvert-fermé pour l’AFC. L’avantage est à la seconde. Il n’y a pas de meilleures méthodes, il n’y a que de meilleures
interactions entre une méthode et un objet. Le même raisonnement s’applique aux
tableaux espèces-relevés en pourcentage par lignes (profils espèces) centrés par lignes.
Les moyennes se calculent toujours par espèces et la normalisation par espèces est à
éviter. Si on désire d’abord éviter toute réflexion préalable, on peut préférer l’analyse
des correspondances (on aura souvent un résultat acceptable mais ce peut être une
erreur totale).
data(steppe)
512 relevés par quadrat sur un transect de 5.12 km ont enregistré la présence / absence
de 37 espèces (milieu steppique). A gauche l’ACP du tableau non modifié,à droite son
AFC :
0.0
−1.5
PCA
1.5
par(mfrow = c(2, 1))
plot(dudi.pca(steppe$tab, scan = F, scale = F)$li[, 1], pch = 20,
ylab = "PCA", xlab = "", type = "b")
plot(dudi.coa(steppe$tab, scan = F)$li[, 1], pch = 20, ylab = "COA",
xlab = "", type = "b")
●
●
●
●
●●●●
●●
●●
●
●●● ● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●
●●
●●
●● ●●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●●
● ●●
●●●●●
●
●● ●
●
●
●
●
●
●
●
●●
●●
●●
● ●●
●
●
●●
●
● ● ●●
●●
●● ●
●●●
●●
●
●
●
●●
●
●
●
●●
● ●●●
●
●
●●
●
●●
●
●
●
●
●
●● ●●
●
●● ●
●●
●
●●●●●● ●
●
●
●
●
●
●
●●
●
● ●●
●●
● ●
● ●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
● ●●
●
● ●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
●● ● ●
●●
●
● ●
● ●
●
●
●
●● ●
●
●
● ● ●●
●●
●●
●
●
●
●
●
●
●
●
●
● ● ●●●
●
●●●●
● ●● ●
●
●●
●
●●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●●●●●
●
●
●● ●
●
●
●
●●
●●●●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●● ●●
●●● ● ●
●●
●
●
●●
●●
●
●
●
●●
●
● ●●●
●●
●
●●● ●
●● ● ●
●●●
●
●
●●
●●●●●●
●●●●●
●
●
●
●
●
● ●
●● ●●
●
●●
●
●
● ●●
●●
●●
●
●
●
●● ●●● ●
●● ●
●●●
●● ●
●
●
●
●
●
●●
●●● ● ●
● ●
●●
●
● ●●
●●
●●
●
●●●
●
●
●●
●
●
●● ●
●
●
● ●
●
●●
●●
●
●●
●
●
●
●
●●
●
0
100
200
300
400
500
1
−1
COA
2
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●●●
●●
●
●●
●
●
●
●●
●
●●
●
●●
●
●●●
●
●●
●
● ●●
●
●
●
●
● ● ●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●● ●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●● ●● ● ●
●
●
●● ●●
● ●
●
●●
●●● ● ●
●
● ●
●● ●
●
●
● ●
●
●●
● ●
●
●
●●
●
●
●●●
●
●
● ●●●
● ●●
●
●
●
●
●●●
●
● ● ●●
●●●
●●
●● ●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
● ● ●●
●
●● ●
●
●●
●
●
●●
●●
● ●
●
●
●
●
●
●
●
●●
●●
● ●●
●
●
●
●●● ● ●●
●
●
●
●
●●●
●
● ●● ●
●
●
● ●●
●●
●● ●●●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●●
●
●
●
●
●● ●
●●
●●●
●
●
●
●●●●
●●●●
●●
●
●●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●●
● ●●
● ●●●
●●
●●
●
●●●
●
●●
●
●
●
● ●●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
0
100
200
300
400
500
table.value(t(steppe$tab[1:512, 1:24]), clabel.c = 0, csize = 0.05,
cleg = 0)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 41/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
E1
E2
E3
E4
E5
E6
E7
E8
E9
E10
E11
E12
E13
E14
E15
E16
E17
E18
E19
E20
E21
E22
E23
E24
On obtient cette fois un résultat très voisin. L’ACP n’est cependant qu’un point
de départ dans la question finalement difficile de l’ordination [3] des tableaux florofaunistiques. Ces exemples illustrent l’insertion de la procédure dans un comportement
adapté à chaque cas.
11
Ne pas se tromper de centrage
data(veuvage)
par(mfrow = c(3, 2))
for (j in 1:6) plot(veuvage$age, veuvage$tab[, j], xlab = "^
age",
ylab = "pourcentage de veufs", type = "b", main = names(veuvage$tab)[j])
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 42/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
70
80
70
80
40
40
20
90
●
●
●●●
●
●
●●
●
● ● ●
●● ●
●●●●●●●●
●●●●●●●●●●●●
50
60
70
âge
Employe
Ouvrier
70
80
90
0
●
●
●●●
●●
●●
●●●●
●
●●●●
●●●●●●●
●●●●●●●●●●
20 40 60
âge
●
90
●
0
pourcentage de veufs
60
Cadre_moyen
60
40
20
0
80
Cadre_sup
●
60
70
âge
●
50
60
âge
●●●
●●●●●
●●●●
●●●●
●
●
●
●●●●●●●●●●●●●● ●●
60
●
●
●●
●
●
●
●
●●
●●
●●●●
● ● ●●
●●● ● ●
●●●●●●●●●●●
50
●
50
20
90
40
20
0
pourcentage de veufs
60
●
0
●●
●●●
●●
●
●
●●●
●●●●
●●●●●●
●●●●●●●●●●●●●
pourcentage de veufs
●
●
50
pourcentage de veufs
Artisan
pourcentage de veufs
20 40 60
0
pourcentage de veufs
Agriculteur
80
90
●
●
●
●
●●
●●
●
●
●
●●
●●●
●●●●●
●●●●●
●●●●●●●●●●●
50
60
âge
70
80
90
âge
par(mfrow = c(2, 2))
PCA1 <- dudi.pca(veuvage$tab, scal = F, scan = F)
PCA2 <- dudi.pca(t(veuvage$tab), scal = F, scan = F)
dotchart(PCA1$co[, 1], lab = names(veuvage$tab))
plot(veuvage$age, PCA1$li[, 1], type = "b", ylab = "ACP1")
dotchart(PCA2$li[, 1], lab = names(veuvage$tab))
plot(veuvage$age, PCA2$co[, 1], type = "b", ylab = "ACP2")
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 43/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
●
Employe
●
●
●
Artisan
50
ACP1
●
Cadre_moyen
Cadre_sup
●
100
Ouvrier
0
●
Agriculteur
●
12
13
14
15
●
●●
●
●
●
●●
●
●
●
●●
●●
●●●
●●●●
●
●
●
●
●●●●●●●●●
50
60
70
80
90
6
veuvage$age
Ouvrier
Employe
●
●
Agriculteur
●
−20
0
10
4
1
Artisan
●
2
●
●
●
●●
3
ACP2
●
Cadre_moyen
Cadre_sup
●
●
5
●
●
●
●
●
● ●
●
● ●● ●
● ●●
●●● ●
●●● ●●●●●
●●● ●
50
60
70
80
90
veuvage$age
Ces deux analyses se ressemblent fort et disent deux choses radicalement différentes.
Lesquelles ?
12
Information supplémentaire
Plusieurs auteurs ont souligné que le terme supplémentaire s’applique souvent de manière abusive à tout ce qui ne fait pas partie du tableau des données alors qu’on
devrait bien réserver le terme projection en individus supplémentaires à une opération
géométrique précise.
data(trichometeo)
names(trichometeo)
[1] "fau"
"meteo" "cla"
On a un tableau de 11 variables météorologiques [11]. Chaque ligne est une journée
d’été.
meteo.pca <- dudi.pca(trichometeo$meteo, scan = F)
par(mfrow = c(2, 2))
s.label(meteo.pca$li)
s.class(meteo.pca$li, trichometeo$cla)
s.traject(meteo.pca$li, trichometeo$cla)
s.corcircle(meteo.pca$co)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 44/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
d=2
11
12
●
10
●
●
2 41 25
1
840 24
26 42
43
44
3
9
45 28 18
13
14
33
35
48
23
19
29
32
27 1665
49
38
34
46
20
15 4
21
39
36 37
17
30
31
d=2
47
●
●
●
●
●
●
●
7
5●●
●
● ●
●
●
●
●
●
●
●
●
●
1 4
9
●
●
●
11
●
7
● 10
●
●
12
2 ●●
3 8
●
●
●
●
●
●
●
●
●
●
6
●
●
●
●
22
●
●
d=2
Pression
1
4
●
●
●
11
7
●
●
5
●
●
6
Var.Pression
●
●
●
10
2
●
●
9
Humidite
12
8
3
T.soir
T.max
Precip.Tot
Nebu.Moy
Precip.Nuit
T.min
Vent
Nebu.Nuit
Il y a beaucoup de redondance dans les mesures. La carte des lignes supporte un premier type d’information supplémentaire. Chaque journée est suivie d’une nuit pendant
laquelle fonctionne un piège lumineux destiné à la capture des trichoptères émergeant
de la rivière. Ce piège est récolté régulièrement mais non chaque jour (week-end ?). On
a gardé le résultat obtenu pour une nuit unique de piégeage et le facteur ’cla’ donne
les groupes de nuits consécutives. Ces groupes sont représentés sur la carte factorielle :
il s’agit simplement d’information complémentaire. Sur les trajectoires, on voit qu’il
faut lire une sorte de mouvement circulaire. On note la succession haute pression (beau
temps) puis fortes températures puis précipitations (orages d’été) caractéristiques du
temps estival de la région.
Le tableau ’fau’ associé contient les abondances d’animaux capturés dans le piège
triés par espèce. La question porte sur l’influence des variables météorologiques sur
l’abondance des piégeages lumineux. Le tableau faunistique a 17 espèces (variables).
Les variables faunistiques sont supplémentaires :
tabsup <- scalewt(log(trichometeo$fau + 1))
w <- supcol(meteo.pca, tabsup)
s.corcircle(meteo.pca$co)
s.corcircle(w$cosup, add.p = T, clab = 1.5)
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 45/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
Pression
Hys
T.soir
Var.Pression
Che
Hve All
Ath
Glo
Hfo
Hsp
Hyc
Aga Han
Cea
Hym
Set
Psy
Sta
Ced
Humidite
Precip.Tot
Nebu.Moy
Precip.Nuit
T.max
Vent
T.min
Nebu.Nuit
Les projections des variables supplémentaires normalisées (vecteurs de norme 1) donnent
des coordonnées qui sont des coefficients de corrélation avec les coordonnées factorielles. Ces corrélation sont faibles mais de même signe. Les nouvelles variables ont été
projetées sur les deux composantes principales. Il s’agit d’une projection de variables
supplémentaires au sens euclidien. Mais on peut considérer également que chaque espèce définit de l’information supplémentaire pour le plan des individus :
s.distri(meteo.pca$li, log(trichometeo$fau + 1), csta = 0.3, cell = 0,
clab = 1)
d=2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
Che All
Hve ● Ath PsyHym
●
Sta
Aga
Glo
Ced
Hsp
●
●
Han
●
●
Hfo
●
●
Set
●
Hyc
●
●
●
●
●
●
●
Cea
●
Hys
●
●
●
●
●
●
●
●
On a ainsi superposé les moyennes des positions des espèces. On pourra aussi représenter l’abondance des espèces sur le plan. Ici domine l’idée d’une combinaison de
variables météorologiques ayant une influence commune sur les émergences de tous les
taxons. Notons enfin qu’il arrive que de véritables projections euclidiennes soient également des représentations par moyennes de distribution et que les notions d’individus
supplémentaires et d’information supplémentaire se confondent.
Quoiqu’il en soit le graphique appliqué à la statistique multidimensionnelle est un
moyen d’expression. Cela suppose quelques libertés dans les choix et la référence à un
comportement “conforme à la règle” peut être le signe d’une certaine absence d’imagination. Ce n’est évidemment pas une raison pour faire n’importe quoi.
En conclusion, l’analyse en composantes principales est une procédure simple utilisable
de multiples façons. Elle donne à voir dans Rp .
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 46/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf
A.B. Dufour & D. Chessel
Références
[1] G. Carrel. Caractérisation physico-chimique du Haut-Rhône français et de ses
annexes : incidences sur la croissance des populations d’alevins. PhD thesis,
Université Claude Bernard, Lyon 1, 1986.
[2] J. Devillers, J. Thioulouse, and W. Karcher. Chemometrical evaluation of
multispecies-multichemical data by means of graphical techniques combined with
multivariate analyses. Ecotoxicology and Environnemental Safety, 26 :333–345,
1993.
[3] J. Estève. Les méthodes d’ordination : éléments pour une discussion. In J.M.
Legay and R. Tomassone, editors, Biométrie et Ecologie, pages 223–250. Société
Française de Biométrie, Paris, 1978.
[4] O. Gaschignard-Fossati. Répartition spatiale des macroinvertébrés benthiques d’un
bras vif du Rhône. Rôle des crues et dynamique saisonnière. PhD thesis, Université Claude Bernard, Lyon 1, 1986.
[5] D.W. Goodall. Objective methods for the classification of vegetation iii. an essay
in the use of factor analysis. Australian Journal of Botany, 2 :304–324, 1954.
[6] D.J. Hand, F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. A handbook
of small data sets. Chapman & Hall, London, 1994.
[7] P. Jolicœur and J.E. Mosimann. Size and shape variation in the painted turtle. a
principal component analysis. Growth, 24 :339–354, 1960.
[8] J. Kervella. Analyse de l’attrait d’un produit : exemple d’une comparaison de lots
de pêches. In 2émes journées européennes Agro-Industrie et Méthodes Statistiques,
pages 103–106. Association pour la Statistique et ses Utilisations, Paris, Nantes
13-14 juin 1991, 1991.
[9] J.M. Lascaux. Analyse de la variabilité morphologique de la truite commune
(Salmo trutta L.) dans les cours d’eau du bassin pyrénéen méditerranéen. PhD
thesis, INP Toulouse, 1996.
[10] R. Prodon and J.D. Lebreton. Breeding avifauna of a mediterranean succession :
the holm oak and cork oak series in the eastern pyrénées. 1 : Analysis and modelling of the structure gradient. Oı̈kos, 37 :21–38, 1981.
[11] P. Usseglio-Polatera and Y. Auda. Influence des facteurs météorologiques sur les
résultats de piégeage lumineux. Annales de Limnologie, 23 :65–79, 1987.
Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 47/47 – Compilé le 2010-11-24
Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf