Analyses en Composantes Principales
Transcription
Analyses en Composantes Principales
Fiche TD avec le logiciel : tdr61 ————— Analyses en Composantes Principales A.B. Dufour & D. Chessel ————— La fiche passe en revue quelques usages de l’analyse en composantes principales sur différents types de tableaux. On rencontre le non centrage, le décentrage, le double centrage autour des tableaux de pourcentages, de notes, de rangs ou de notes d’abondance. Dans cette famille, le cas le plus utilisé est celui de l’ACP normée ou ACP sur matrice de corrélation. Cette pratique est incontournable quand le tableau contient des variables de nature diverse. La variance dépendant des unités, elle n’a pratiquement que la fonction de permettre la normalisation, c’est-à-dire sa propre disparition. Les tableaux homogènes, au contraire comporte dans chaque cellule un nombre comparable au contenu des autres cellules, qu’il s’agisse d’une notation unique d’abondance, une présence-absence, un rang, un pourcentage, etc. L’usage de l’ACP normée peut alors être sans inconvénient ou au contraire obscurcir définitivement l’information. A l’aide d’exemples, la fiche regroupe des cas typiques qui permettra de faire des choix pertinents. Table des matières 1 Décathlon : le signe des corrélations 2 2 Examen : l’origine est un paramètre libre 5 3 Cohérence d’un jury : compromis 8 4 Reconstitution de données : auto-modélisation 12 5 Pourcentages : représenter des moyennes 14 6 Morphométrie et non-centrage 19 7 ACP et classification 21 8 Truites et valeurs propres 26 9 Voir le tableau 35 1 A.B. Dufour & D. Chessel 10 Notes d’abondance 37 11 Ne pas se tromper de centrage 42 12 Information supplémentaire 44 Références 46 1 Décathlon : le signe des corrélations Les données (exemple n˚357 dans [6] d’après Lunn, A. D. & McNeil, D.R. (1991) Computer-Interactive Data Analysis, Wiley, New York) sont dans la librairie. Examiner l’objet olympic : library(ade4) data(olympic) names(olympic) [1] "tab" "score" is.list(olympic) [1] TRUE olympic$score [1] 8488 8399 8328 8306 8286 [17] 7869 7860 7859 7781 7753 [33] 6907 head(olympic$tab) 100 long poid haut 400 1 11.25 7.43 15.48 2.27 48.90 2 10.87 7.45 14.97 1.97 47.71 3 11.18 7.44 14.20 1.97 48.29 4 10.62 7.38 15.02 2.03 49.06 5 11.02 7.43 12.92 1.97 47.44 6 10.83 7.72 13.58 2.12 48.34 dim(olympic$tab) 8272 8216 8189 8180 8167 8143 8114 8093 8083 8036 8021 7745 7743 7623 7579 7517 7505 7422 7310 7237 7231 7016 110 15.13 14.46 14.81 14.72 14.40 14.18 disq perc jave 1500 49.28 4.7 61.32 268.95 44.36 5.1 61.76 273.02 43.66 5.2 64.16 263.20 44.80 4.9 64.04 285.11 41.20 5.2 57.46 256.64 43.06 4.9 52.18 274.07 [1] 33 10 boxplot(as.data.frame(scale(olympic$tab))) 3 ● −2 −1 0 1 2 ● ● −3 ● ● ● ● ● 100 long haut 400 110 disq jave pca1 <- dudi.pca(olympic$tab, scannf = FALSE) On sélectionne le nombre d’axes à partir du graphe des valeurs propres Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 2/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 0.0 0.5 1.0 1.5 2.0 2.5 3.0 barplot(pca1$eig) par(mfrow = c(1, 2)) s.corcircle(pca1$co) olympic2 <- olympic$tab olympic2[, c(1, 5, 6, 10)] = -olympic2[, c(1, 5, 6, 10)] pca2 <- dudi.pca(olympic2, scan = F) s.corcircle(pca2$co) disq poid jave disq poid jave X1500 X400 perc haut X100 X110 long perc haut X110 long X100 X400 X1500 L’image de gauche est particulièrement trompeuse ! Elle est statistiquement juste et expérimentalement fausse. La performance des athlètes augmente avec les distances des lancers, la hauteur et la longueur des sauts, elle décroı̂t avec le temps des courses. L’image de droite est mathématiquement équivalente et expérimentalement correcte. plot(olympic$score, pca1$l1[, 1]) abline(lm(pca1$l1[, 1] ~ olympic$score)) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 3/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 2 ● ● ● ● ● 1 ● ● ● ● ● ● ● ● ● 0 pca1$l1[, 1] ● ● ● ●● ● ● ● ● −1 ● ● ●● ● ● ●● ● 7000 7500 ● 8000 8500 olympic$score Commenter. s.label(pca1$l1, clab = 0.5) s.arrow(2 * pca1$co, add.p = T) d=1 17 poid disq 18 X1500 jave 31 28 X400 10 11 20 1 7 perc X100 4 23 9 haut 2 3 X110 16 8 21 22 12 15 14 long 30 32 24 25 6 5 13 26 19 27 29 33 Ces deux figures sont des doubles représentations euclidiennes des nuages et des bases canoniques. La fonction générique scatter pour ce type d’analyse retient cette propriété. par(mfrow = c(1, 2)) s.label(pca1$li, clab = 0.5) s.arrow(5 * pca1$c1, add.p = T) scatter(pca1) scatter(pca2) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 4/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=2 17 Eigenvalues poiddisq d=2 17 X1500 jave X400 disq poid jave 10 111 7 4 perc 9 haut 2 3 8 long X1500 18 18 31 X400 perc 20 23 16 12 15 21 22 14 4 X100 X110 30 haut 2 32 5 long X110 23 16 12 15 21 22 24 30 32 25 5 13 26 29 19 33 27 33 27 2 38 X100 20 6 26 29 19 10 111 7 9 14 24 25 6 13 31 28 28 Examen : l’origine est un paramètre libre deug est une liste à trois composantes : $tab un data frame avec 104 lignes-étudiants et 9 colonnes-disciplines académiques $result un facteur donnant une synthèse du résultat à l’examen (A+, A, B, B-, C-, D) pour les 104 étudiants $cent un vecteur contenant la moyenne théorique pour chacune des disciplines étudiées, ceci en fonction de leur coefficient. data(deug) names(deug) [1] "tab" "result" "cent" deug$result [1] C- B B A B C- B B [27] C- B B B B B B B [53] A+ A A C- B A C- A [79] A C- C- C- B B B A Levels: D A A+ C- B- B names(deug$tab) [1] "Algebra" "Analysis" [7] "Option2" "English" D B B B A CB A B D B A "Proba" "Sport" B D CD A D CD CB D B- B B BC- B B CC- A A CD D B A A B CB B A+ B D B A B A B "Informatic" "Economy" B B B B B B B B B B CB B B CB B D CB "Option1" Exécuter et interpréter une analyse en composantes principales normée et une analyse en composantes principales décentrée. args(dudi.pca) function (df, row.w = rep(1, nrow(df))/nrow(df), col.w = rep(1, ncol(df)), center = TRUE, scale = TRUE, scannf = TRUE, nf = 2) NULL pcanor <- dudi.pca(deug$tab, scann = F) Définir et interpréter le graphe canonique. plotreg <- function(x, y) { par(mar = c(2, 2, 2, 2)) plot(y, x, pch = 20) grid(lty = 1) abline(lm(x ~ y)) mtext(paste("r = ", round(cor(x, y), digits = 3), sep = "")) } Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 5/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel par(mfrow = c(3, 3)) apply(deug$tab, 2, plotreg, y = pcanor$l1[, 1]) NULL pcanor$co Comp1 Comp2 Algebra -0.7924753 0.09239958 Analysis -0.6531896 0.46263515 Proba -0.7410261 0.24213427 Informatic -0.5287294 0.28126036 Economy -0.5538660 -0.62678201 Option1 -0.7416171 0.01601705 Option2 -0.3336153 -0.37599247 English -0.2755026 -0.66982537 Sport -0.4171874 -0.13974793 ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●●● ●●● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● 20 ● ● 3 −2 −1 ● ● ● ● ● ● 30 ● ● 50 ● 0 1 2 ● ● ● 3 −2 −1 0 1 30 ● ● ● 2 ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●●●● ● ● ●●● ●● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● 3 −2 25 ● ● ● ● 15 10 ●● ● −1 0 1 2 3 r = −0.417 y ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ●●● ●●● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 30 25 ● ● ● r = −0.276 y ● ● ● ●●● ● ● ● ● ● ● ● ●● ●●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● 3 ● ● r = −0.334 y ● ● ● ● −1 2 r = −0.742 y 35 ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● 1 ●● ● ● 40 20 10 ● 0 ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ●● ● ●● ●● ●● ●●● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● −2 −1 0 1 2 ● 3 −2 −1 0 1 2 3 0 ● 5 ● ● 10 20 ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● ● ●●● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● −2 15 ● 70 ● x ● ● ● ● ● ● −1 ● ● 25 ● x 40 ● ● −2 15 ● ● 3 30 80 ● ● 2 ● ● 60 50 ● ● ● 1 r = −0.554 y 90 r = −0.529 y 0 ● ● 10 2 15 1 10 0 ● ● 5 −1 ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ●● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ● ● ● ● ●●●● ● ● ● ●● ●● ● ● ● ● ● ● ● x −2 ● ● 0 ● x ● 10 20 30 40 50 60 ●● ● 20 50 ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ●●● ●● ●● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ●●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● x ● ● ● 40 ● r = −0.741 ● ● ● ● 30 ● r = −0.653 ● ● x 10 20 30 40 50 60 70 r = −0.792 ● ● −2 −1 ● ●● ● ●● ●● ●● ● ● ● ● ●● ● 0 1 2 ● ● 3 La fonction générique score utilise ce point de vue : score(pcanor) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 6/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf Algebra Analysis ● ● ● ● 50 −2 10 ● ● 90 y 3 ● ● ● ● ● ● 25 y 70 ● ● ● 2 3 ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●●● ● ● ●●● ●● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 ● −2 ● ● ● ●●● ● ● ● ● ● ● ● ●● ●●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● 3 −1 0 English ● ● ● ● ● 15 −1 Sport ● 0 1 2 score ●● ● ● 3 ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ●●● ●●● ● ● ● ● ●● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −2 ● ● ● ● ● ● ● ● 3 ● ●● ● ● 2 score ● ● ● 1 15 score 1 ● ● ● 10 25 ● 0 score ● ● y Option2 ● 1 ● ● ● ● 30 0 −1 Option1 ●● ● −1 −2 ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● 0 2 ● ●● ● ● ● ● ● ●● ● ● 25 ● ●● ● ● ● ●●● ●● ●● ● ●● ●● ●● ●●● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● 10 ● ● ● ● ● ● 10 ● 5 ● ● −2 −1 0 1 0 30 y ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● −2 20 ● 1 ● ● ● ● ● ● score 50 20 ●● ● ● ● 40 ● ● ● ● ● ● ● ● 30 y ● y 40 ● ● ● ● ● ● 80 ● ● ● ● 0 ● ● ● ● −1 Economy ● 35 3 ● 20 2 ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ●● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●●● ● ● ● ●●● ● ●●●● ● ● ● ● ● ● ●● ●● ● 15 1 ● 10 0 score ● ● ● 5 −1 Informatic ● 30 40 y 20 ● 60 50 ● ● ● y ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●●● ●●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● −2 15 ●● ● ● Proba ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ●●● ●● ● ● ● ● ●●● ● ● ●● ● ● ●● ●● ● ●●● ● ● ● ●● ● ●● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 30 ● 10 20 30 40 50 60 ● ● ● ● ● 20 y 10 20 30 40 50 60 70 A.B. Dufour & D. Chessel ● 2 3 −2 −1 0 1 2 3 ● −2 −1 ● ●●● ●● ●● ●● ● ● ● ● ●● ● 0 1 2 ● ● 3 pca1 <- dudi.pca(deug$tab, scal = FALSE, center = deug$cent, scan = FALSE) s.class(pca1$li, deug$result) s.arrow(50 * pca1$c1, add.plot = T, clab = 1.5) d = 20 ● ● A+ ● ● ● ● ● ● ● Algebra A● ● ● Analysis ● Proba Economy ● ● ● ● ● ● ● Option1 Option2 English ● ● ● ● Informatic ● ● ● ● ● ● ● ● ● Sport ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● B ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● B− ● C− ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● D ● ● ● L’origine n’est plus le centre de gravité du nuage. On y gagne la distinction entre les matières qui font la forme du nuage et celles qui donnent sa position. On retrouve cet aspect dans les données ’seconde’. data(seconde) pca2 <- dudi.pca(seconde, center = rep(10, 8), scale = F, scan = FALSE) s.label(pca2$li) s.arrow(10 * pca2$c1, add.plot = T) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 7/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=5 19 21 4 1 1814 11 2 ANGLBIOL FRAN MATH 17 13 20 5 PHYS 12 10 6 7 3 ESPA ECON HGEO 16 9 15 22 8 Cette figure ne modifie pas la notion de compromis (toutes les matières valorisent peu ou prou les bons élèves) mais souligne que Maths et Physique excluent d’abord (les élèves en haut à droite) et que le professeur d’espagnol est d’une bienveillance évidente. Ce qui diffère sensiblement de l’ACP normée qui mettait en évidence l’originalité et la solidarité des professeurs de langue : scatter(dudi.pca(seconde, scan = F), clab.r = 1) Eigenvalues d=2 ESPA ANGL 9 11 15 8 3 FRAN HGEO ECON 1 6 4 19 22 MATH 517 13 20 16 PHYS 10 2 7 21 14 18 12 BIOL Quand un vecteur de valeurs définit un point signifiant de l’espace devant servir de référence, il convient d’en faire l’origine. 3 Cohérence d’un jury : compromis S’il s’agit de mesurer la cohérence du jury, d’exprimer un compromis entre jugements, un choix collectif (ou plusieurs tendances regroupant des parties du jury), de mettre en évidence la ressemblance entre juges, les juges sont en colonnes dans une ACP normée. Les moyennes sont toutes égales, les variances aussi, la normalisation n’est ni nécessaire, ni nuisible. On peut l’utiliser pour tracer les cercles de corrélation. Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 8/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 25 juges classent 8 bouteilles dans le tour final du concours d’une grande foire : data(macon) macon a b c d e f A 5 5 4 3 3 4 B 4 8 2 4 1 5 C 2 6 1 1 6 2 D 6 7 5 8 2 6 E 1 4 3 2 7 1 F 3 2 8 6 5 8 G 7 1 6 5 4 7 H 8 3 7 7 8 3 g 7 2 1 8 6 3 4 5 h 2 7 5 8 4 3 1 6 i 1 8 5 6 3 4 7 2 j 3 8 4 6 1 7 5 2 k 5 1 3 6 2 8 7 4 l 4 6 7 5 8 1 3 2 m 4 3 2 6 1 5 8 7 n 5 7 2 6 1 8 3 4 o 4 8 6 3 1 7 2 5 p 8 5 2 6 3 4 7 1 q 5 7 1 8 2 4 3 6 r 7 8 6 1 2 3 5 4 s 8 1 2 7 6 3 4 5 t 5 4 1 6 2 8 7 3 u 4 1 2 7 8 6 3 5 v 6 5 1 4 2 8 7 3 w 7 4 2 1 8 6 3 5 x 2 4 5 6 1 7 8 3 y 8 6 1 7 2 3 5 4 16 juges ont rangé par ordre de préférence 28 lots de fruits [8]. data(fruits) names(fruits) [1] "type" "jug" fruits$jug J1 J2 1.nec 10 5 2.nec 3 1 3.pea 5 11 4.pea 6 12 5.pea 4 2 6.nec 2 6 7.pea 17 9 8.nec 22 20 9.nec 7 7 10.nec 14 18 11.nec 1 8 12.nec 9 3 13.nec 12 10 14.pea 11 14 15.nec 24 22 16.nec 15 4 17.pea 20 17 18.nec 13 15 19.pea 19 19 20.pea 23 27 21.pea 16 16 22.nec 18 21 23.nec 8 13 24.pea 21 24 25.pea 28 25 26.pea 27 23 27.pea 26 28 28.pea 25 26 J3 8 9 5 3 4 16 1 12 14 24 20 25 19 7 13 21 10 15 2 6 17 26 28 23 11 18 27 22 "var" J4 3 8 2 4 14 10 5 6 16 27 15 12 13 7 1 19 17 23 9 11 22 18 28 20 26 21 24 25 J5 1 6 8 4 17 13 9 26 23 15 28 11 25 3 2 18 21 27 19 24 16 10 20 14 7 22 5 12 J6 18 16 8 7 10 2 24 23 17 1 15 13 22 21 28 3 9 14 6 25 5 20 19 4 11 27 12 26 J7 5 8 18 17 16 11 23 2 6 14 13 7 4 15 1 9 20 12 28 19 21 3 10 22 24 27 25 26 J8 17 10 3 2 1 5 19 7 21 6 11 12 18 23 16 8 9 13 14 20 4 27 24 15 26 22 28 25 J9 J10 J11 J12 J13 J14 J15 J16 3 4 1 2 5 3 1 1 2 1 8 8 9 5 6 4 4 15 14 4 7 1 3 13 1 16 13 7 3 8 4 14 5 19 21 13 6 2 5 15 13 8 10 14 10 9 15 8 23 18 6 3 8 14 7 5 8 14 3 11 1 11 22 2 9 2 2 16 20 13 11 17 11 3 11 6 19 6 10 10 10 11 7 22 15 7 8 11 14 7 9 17 16 10 14 9 6 5 17 15 11 12 18 3 17 17 12 10 4 4 9 24 7 13 15 1 27 16 2 20 12 10 16 19 21 20 26 7 20 22 4 12 2 17 20 18 16 9 5 23 14 15 25 12 21 21 18 9 12 22 21 19 25 25 23 5 17 19 12 6 19 20 19 20 13 18 23 16 15 12 22 18 25 21 27 21 18 6 20 24 24 27 28 22 22 23 25 21 18 28 16 25 24 26 26 27 22 25 17 27 28 24 28 25 23 23 13 23 27 28 24 28 28 24 19 28 26 27 27 26 26 26 24 26 51 étudiants de la filière biomathématique ont exprimé leur préférence sur 10 groupes de musique. Les données sont dans l’objet rankrock. data(rankrock) rankrock X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 Metallica 6 7 4 7 7 10 9 3 10 1 7 3 7 1 7 6 8 8 10 6 Guns.n.Roses 10 9 9 8 5 8 10 2 5 9 3 6 2 7 9 4 5 6 8 9 Nirvana 5 8 8 4 2 6 6 9 3 2 8 2 6 6 3 5 4 4 9 5 AC.DC 9 4 2 6 6 9 7 8 4 6 6 7 5 2 8 7 9 9 5 8 Noir.Desir 2 6 3 1 1 3 2 1 2 10 5 4 9 5 1 2 1 2 3 3 U2 7 2 7 3 8 4 5 4 1 4 2 1 4 8 2 1 2 7 6 7 Pink.Floyd 1 3 5 5 4 1 1 6 6 5 4 10 3 4 4 3 3 1 1 1 Led.Zeppelin 3 1 1 9 9 2 4 5 8 8 9 9 1 3 5 8 6 5 4 4 Deep.Purple 4 5 6 10 10 5 3 7 9 7 10 8 8 9 6 9 7 3 2 2 Bon.Jovi 8 10 10 2 3 7 8 10 7 3 1 5 10 10 10 10 10 10 7 10 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 Metallica 9 9 10 9 9 5 10 9 4 6 9 1 1 8 9 6 6 6 Guns.n.Roses 5 6 9 8 8 7 8 8 7 5 5 2 5 1 6 5 9 4 Nirvana 4 7 2 6 7 3 7 2 8 4 3 3 4 6 5 7 4 5 AC.DC 8 10 8 10 10 6 9 7 1 7 10 6 8 9 10 10 7 7 Noir.Desir 3 3 1 2 2 8 2 10 9 10 4 4 2 4 4 3 1 3 U2 1 2 3 1 1 1 1 5 6 2 1 9 3 2 1 2 3 1 Pink.Floyd 2 1 4 5 6 2 3 1 3 1 2 7 6 5 2 1 2 2 Led.Zeppelin 6 8 5 3 5 9 6 6 2 3 8 5 7 7 7 8 10 10 Deep.Purple 7 5 7 4 4 10 5 3 5 8 6 10 9 10 3 4 5 9 Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 9/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel Bon.Jovi 10 4 6 7 3 4 4 4 10 9 7 8 10 X39 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 Metallica 10 6 6 4 6 9 7 8 9 9 8 8 4 Guns.n.Roses 6 5 2 5 5 4 8 9 8 6 9 9 6 Nirvana 5 3 5 7 4 1 6 3 3 5 7 6 2 AC.DC 9 7 10 8 8 10 9 7 1 7 6 7 8 Noir.Desir 2 4 4 6 7 6 5 1 4 4 3 4 5 U2 3 1 7 1 1 5 4 5 2 1 1 1 1 Pink.Floyd 1 2 1 2 3 2 2 2 7 3 2 2 3 Led.Zeppelin 8 8 8 9 9 8 1 4 5 2 5 5 7 Deep.Purple 4 9 9 10 10 7 3 6 6 8 4 3 9 Bon.Jovi 7 10 3 3 2 3 10 10 10 10 10 10 10 3 8 9 8 Chacune des colonnes donne le rang (1 pour le préféré, . . . , 10 pour le moins apprécié) qu’un étudiant attribue à chaque groupe. par(mfrow = c(2, 2)) data(macon) s.corcircle(dudi.pca(macon, scan = F)$co) data(fruits) s.corcircle(dudi.pca(fruits$jug, scan = F)$co) data(rankrock) s.corcircle(dudi.pca(rankrock, scan = F)$co) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 10/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf 8 A.B. Dufour & D. Chessel j i o b r J10 J7 h l n x q py J1 J2 f J8 v t a J6 J9J11 J16 J14 dm J13 k J12 J4J15 c w e J5 g J3 us X3 x1 X29 X2 X45 X20 X19 X7 X1 X50 X49 X18 X46 X6 X14 X47 X48 X24 X15 X35 X36 X21 X17 X23 X39 X37 X25 X27 x1 X13 X30X28 X8 X51 X33 X16 X22 X40 X31 X9 X38 X44X41X5 X26 X12 X4 X42 X34 X43 X11 X32 X10 par(mfrow = c(1, 2)) s.corcircle(dudi.pca(fruits$jug, scan = F)$co) s.class(dudi.pca(fruits$jug, scan = F)$li, fruits$type) d=2 ● J10 J7 J1 J2 J9 J16 J11 J8 J14 ● ● ● peche ● ● J6 ●● ● ● ● ● ● ● ● ● ● J13 J12 J4 J15 necta ● ● ● ● ● ● ●● ● ● J5 ● J3 Quelle est la particularité des juges 7 et 10 ? Indiquer comment on relie ces deux figures. Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 11/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel S’il s’agit de faire une typologie des juges, de mettre en évidence ce qui les opposent, de montrer qu’il existe plusieurs types de jugements, les juges sont en lignes dans une ACP centrée. On laissera ainsi dominer dans l’analyse les produits qui ont reçu les appréciations les plus variables. Les deux approches sont antinomiques. par(mfrow = c(2, 2)) s.arrow(dudi.pca(t(macon), scan = F)$li) s.arrow(dudi.pca(t(fruits$jug), scan = F)$li) s.arrow(dudi.pca(t(rankrock), scan = F)$li) d=1 g d=2 J6 s u w e J8 c y a q d m k l h p b t v J1 J13 r n J2 f o J11 J16 J9 J10 x J7 i j J5 J3 J14 J12 J4 J15 d=1 X14 X29 X3 X32 X10 X33 X13 X47 X2 X8 X30 X12 X51 X5 X26 X9X40 X16 X11 X42X4 X38 X43 X34 X41 X44 X48 X15 X28 X37 X17 X21 X23 X36 X25 X35 X27 X31 X39 X22 X20 X45 X46 X1 X19 X49 X50 X7 X18 X6 X24 On remarquera, que derrière le compromis, s’il est suffisant pour définir le premier axe, les premières analyses définissent aussi les contradictions entre jugements, mais les secondes sont plus claires. 4 Reconstitution de données : auto-modélisation Dans la thèse [1] de G. Carrel, les données portent sur 15 variables physicochimiques mesurées en une station au cours de l’année 1983-1984 à 39 reprises. Ces 15 variables sont : Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 12/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel data(rhone) names(rhone$tab) [1] "air.temp" "wat.temp" [7] "caco3" "totca" [13] "suspension" "organique" "conduc" "mg" "chloro" "pH" "so4" "oxygen" "no2" "secchi" "hco3" air.temp Température de l’air (˚C ) mg Magnésium (mg/l Mg++) wat.temp Température de l’eau (˚C) so4 Sulfates (mg/l x10) conduc Conductivité (mS/cm) no2 Azote nitrique (mb/l) pH potentiel Hyrogène (pH) hco3 TAC (mg/l HCO3-) oxygen Saturation en oxygène (%) suspension Mat. en suspension (mg/l) secchi Transparence (cm) organique Mat. organique (mg/l) caco3 Dureté totale (mg/l CaCO3) chloro Chlorophyle a (mg/l) totca Dureté calcique (mg/l Ca++) Automodéliser la chronique : dd1 <- dudi.pca(rhone$tab, nf = 2, scann = F) rh1 <- reconst(dd1, 1) rh2 <- reconst(dd1, 2) par(mfrow = c(4, 4)) par(mar = c(2.6, 2.6, 1.1, 1.1)) for (i in 1:15) { plot(rhone$date, rhone$tab[, i], pch = 20) lines(rhone$date, rh1[, i], lty = 2) lines(rhone$date, rh2[, i], lty = 1) } Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 13/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf ● ●● ●●● ● ● ● ● ● ● ● ● ● ●● 50 ●● ●● ●● ● ● ● ● ●● ● ● 150 250 350 ● ● ● ●● ● ●● ● ● ● ● ●●● ● ● ● ● ●● ●● ● 50 rhone$date ● ● ● ● ● 50 50 0.6 ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● 150 250 350 ● ● 50 8.2 8.0 7.8 ● 150 250 350 ● 70 ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● 150 250 350 rhone$date ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● 150 250 350 ● ● ●● ● ● ● ● ● ●● ● ● 50 rhone$date ● ● ●● ● ● ●● ● 50 150 250 350 rhone$date ● ● ● ● ● 150 250 350 50 ● ●● ●● ● ●● ● ●●●● ● ● ●● ● ●●● ●● ● ● ● ●● ● 150 250 350 8 ● 6 ● ● ● ● ● ● ● 50 ● ●● ● ● 4 rhone$tab[, i] 10 ● ● ● 2 15 ● ● 5 rhone$tab[, i] ● ● ● ●●●● ●● ● ●● ●●●●●●● ●●● ●●● ●● ● ●● ● ● ● rhone$date● 0 ● ● ● ● ● ● ● 200 100 ●● ● ● ● ●● ●●● ● ● ●● 100 ● 150 250 350 rhone$date ●● ● ● ● rhone$tab[, i] ● rhone$tab[, i] ● ● ● 50 ●● ● ● ●● ● ● ● ●● ● ●● ● ● 0.2 45 ●● ● ● ● ● ● ●● ● ● ●● 0 ● ● 35 ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● 3 5 ● ● ● ● ● 25 ● rhone$tab[, i] ● ● 50 150 250 350 ● ● ● ● ● ● 7.6 ● ● ●● ● ● rhone$date ●● ● ● ● ● ● ● ● ● 150 250 350 ● rhone$date ● ● 9 50 ● ● 60 150 ● ● ● 150 250 350 rhone$date ● ● ● rhone$tab[, i] ● ● ● ● ● ●● ●● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● 120 ● ● ● ● ●● ● ● 50 ● rhone$tab[, i] ● ● ●● ● ● ● rhone$date ● ● ● ●● ● 50 5 250 100 110 90 ● ● ● ●● 80 rhone$tab[, i] ●●● ● ● ● ● ● ● ● ● ● 7 150 250 350 ● rhone$date ● ● 50 rhone$tab[, i] 50 ● ● ●● ● ● ● ● ● ● ● 0 ● ● 150 250 350 rhone$date ● 5 ● ● 50 rhone$tab[, i] ● ● ● ● 40 ● ● ● ● ● ●●● ●● ●● ● ●● ● ● ●●● ● ● ●● ●● ●●●●●● ● ● ● ● ● 180 ● ● ● ● ● 140 ● ● rhone$tab[, i] ● ●● ● 5 ● ● ●● ● ● ●● rhone$tab[, i] ● ● ● ●●●● rhone$tab[, i] ● ●● ●● ● 220 260 300 340 ● ● ● ● 200 ●● ● ●● ●●● ● ● 160 ● ● ● ● ●● 10 15 20 15 ● ●● rhone$tab[, i] ● ●● ● ●● ● −5 rhone$tab[, i] 25 A.B. Dufour & D. Chessel ● ● ●● ● ●● ●● ● ● ● ● ●● ● 150 250 350 Pourcentages : représenter des moyennes Un tableau X est un tableau de fréquences si la somme des valeurs par ligne ou la somme des valeurs par colonne vaut l’unité. On notera alors xij = fj/i dans le premier cas et xij = fi/j dans le second. Si on hésite, c’est qu’on est sur un problème d’analyse des correspondances (AFC). data(granulo) names(granulo) [1] "tab" "born" head(granulo$tab) V1 V2 V3 V4 V5 1 0 2 1 15 260 2 0 0 12 340 1220 3 0 0 0 12 1505 4 0 0 185 440 330 5 0 2 1 8 400 6 0 0 0 0 115 granulo$born [1] 0.3 0.5 V6 810 2095 1960 335 1480 1810 1.0 V7 1815 2950 4415 725 4935 6845 2.0 V8 V9 1990 0 2050 1160 1265 260 1195 2210 2900 0 4030 280 4.0 8.0 16.0 32.0 64.0 128.0 Ce tableau comporte 49 lignes - échantillons et 9 colonnes - classes de diamètres [4]. L’échantillon 32 a fourni 2 grammes de grains ayant un diamètre compris entre 0.3 et 0.5 mm (sable fin), . . . , 293 grammes de grains compris entre 64 et 128 mm (gros Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 14/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel galets). Le poids total de sédiments récoltés par la drague n’est que le résultat du hasard et permet de calculer un profil par lignes : grapc <- t(apply(granulo$tab, 1, function(x) x/sum(x))) round(head(grapc), dig = 2) V1 V2 V3 V4 V5 V6 V7 V8 V9 1 0 0 0.00 0.00 0.05 0.17 0.37 0.41 0.00 2 0 0 0.00 0.03 0.12 0.21 0.30 0.21 0.12 3 0 0 0.00 0.00 0.16 0.21 0.47 0.13 0.03 4 0 0 0.03 0.08 0.06 0.06 0.13 0.22 0.41 5 0 0 0.00 0.00 0.04 0.15 0.51 0.30 0.00 6 0 0 0.00 0.00 0.01 0.14 0.52 0.31 0.02 grapc = data.frame(grapc) Chaque cellule contient une fréquence et le tableau est homogène. par(mfrow = c(2, 2)) pcaa <- dudi.pca(grapc, scann = F) barplot(pcaa$eig) s.corcircle(pcaa$co) pcab <- dudi.pca(grapc, scal = F, scann = F) barplot(pcab$eig) s.arrow(pcab$co) V5 V6 V2 V3 V1 V9 0.0 1.0 2.0 3.0 V4 V7 V8 d = 0.05 V7 0.015 0.030 x1 V8 V6 V1 V2 V3 0.000 V5 V4 V9 Les variances par colonne sont très différentes : Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 15/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel apply(grapc, 2, var) V1 V2 V3 V4 V5 V6 5.047118e-07 6.827175e-06 9.624694e-05 2.187413e-03 6.208129e-03 1.333202e-02 V7 V8 V9 1.382115e-02 1.952539e-02 1.143872e-02 mais ce n’est pas une raison pour s’en débarrasser. La première classe joue dans l’ACP normée un rôle disproportionné à son importance expérimentale. On peut regrouper les cinq premières classes. c1 <- apply(grapc[, 1:5], 1, sum) granou <- cbind.data.frame(c1, grapc[, 6:9]) round(head(granou), dig = 2) c1 V6 V7 V8 V9 1 0.06 0.17 0.37 0.41 0.00 2 0.16 0.21 0.30 0.21 0.12 3 0.16 0.21 0.47 0.13 0.03 4 0.18 0.06 0.13 0.22 0.41 5 0.04 0.15 0.51 0.30 0.00 6 0.01 0.14 0.52 0.31 0.02 Pour une première approche, rapide et approximative : par(mfrow = c(1, 1)) scatter(dudi.pca(granou, scal = F, scan = F)) 20 23 49 27 21 42 29 14 28 13 d = 0.2 V7 Eigenvalues 41 3 V6 5 48 34 43 36 22 38 35 16 15 44 26 30 37 19 31 40 6 47 1248 45 9 12 17 10 V8 2 33 c1 32 V9 39 25 18 11 7 4 46 On a perdu une valeur propre (démontrer que la dernière est toujours nulle). L’analyse indique que la position des variables est celle de la représentation triangulaire étendue à 5 dimensions. x = cos((1:5) * 2 * pi/5) y = sin((1:5) * 2 * pi/5) xy = cbind.data.frame(x, y) row.names(xy) = c("]0.3,8]", "]8,16]", "]16,32]", "]32,64]", "]64,128]") xy x y ]0.3,8] 0.309017 9.510565e-01 ]8,16] -0.809017 5.877853e-01 ]16,32] -0.809017 -5.877853e-01 ]32,64] 0.309017 -9.510565e-01 ]64,128] 1.000000 -2.449294e-16 Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 16/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel s.corcircle(xy, grid = F) s.distri(xy, data.frame(t(granou)), add.p = T, axesell = T, csta = 0.2, clab = 0.75, cell = 0) ]0.3,8] ● X7 ]8,16] ● X14 X29 X42 X28 X21 X13 X30 ● X11 X20 X27 X3 X19 X44 X31 X26 X40 X15 X12 X48 X36 X22 X23 X43 X16 X35 X34 X5 X38 X1 X24 X33 X6X47X8 X9 X45 X17X10 X41 X49 ]16,32] ● ]64,128] X4X46 X37 X2 X39 X25 X18 X32 ● ]32,64] names(granou) <- row.names(xy) xy.pca <- dudi.pca(granou, scal = F, scan = F)$c1 s.arrow(xy.pca, grid = F, lab = names(granou)) s.distri(xy.pca, data.frame(t(granou)), add.p = T, axesell = T, csta = 0.2, clab = 0.75, cell = 0) ]16,32] ● X21 X13 ]8,16] ● X42 X29 X14X28 X20 X23X5X6 X27 X49 X48X34 X47 X3 X41 X15X43 X36 X22 X38 X35 X16 X45 X24 X8 X1 X9 X17 X44 X26 X12 X30 X31 X10 X37 X19 X40 X2 X33 X32 X39 X25 X11 X18 ● ]32,64] X7 X4 X46 ]0.3,8] ● ● ]64,128] L’image naı̈ve (pas tant que ça ?) contient presque la même information que l’image optimale. Les structures des tableaux de pourcentages sont très particulières et supportent mal une normalisation indésirable. Il est logique de privilégier l’expression d’un point au centre de gravité de sa distribution plutôt que de laisser faire le centrage. Mais ce centrage est nécessaire pour éviter l’expression absurde de l’évidence ”les données sont positives”. La représentation triangulaire s’impose pour trois variables. Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 17/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel data(housetasks) hc <- t(apply(housetasks, 1, function(x) x/sum(x))) round(hc, dig = 2) Wife Alternating Husband Jointly Laundry 0.89 0.08 0.01 0.02 Main_meal 0.81 0.13 0.03 0.03 Dinner 0.71 0.10 0.06 0.12 Breakfeast 0.59 0.26 0.11 0.05 Tidying 0.43 0.09 0.01 0.47 Dishes 0.28 0.21 0.04 0.47 Shopping 0.28 0.19 0.08 0.46 Official 0.12 0.48 0.24 0.16 Driving 0.07 0.37 0.54 0.02 Finances 0.12 0.12 0.19 0.58 Insurance 0.06 0.01 0.38 0.55 Repairs 0.00 0.02 0.97 0.01 Holidays 0.00 0.01 0.04 0.96 hc <- data.frame(hc) hc.pca <- dudi.pca(hc, scal = F, scan = F)$c1 s.arrow(hc.pca, grid = F, lab = names(hc)) s.distri(hc.pca, data.frame(t(hc)), add.p = T, axesell = T, csta = 0.2, clab = 0.75, cell = 0) Jointly ● Holidays Tidying Dishes Shopping Finances Insurance Dinner Laundry Main_meal ● Wife Breakfeast Official ● Alternating Driving Repairs ● Husband A Madame, la cuisine, à Monsieur la voiture, au couple le reste. Groupons ”ensemble” et ”chacun à son tour”, la représentation triangulaire suffira : tri <- cbind.data.frame(hc$Wife, hc$Husband, hc$Jointly + hc$Alternating) names(tri) <- c("Wife", "Husband", "Both") triangle.plot(tri, show = F, clab = 1, label = row.names(hc)) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 18/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 0 1 ● Holidays Finances ● Dishes ● Shopping ● Official ● ● Tidying ● Insurance Wife Both ● Driving ● Breakfeast ● Dinner ● Main_meal ● Laundry 1 ● Repairs 0 1 Husband 0 Une dissymétrie certaine. Ci-après, l’analyse des correspondances. row.names(tri) <- row.names(hc) scatter(dudi.coa(tri, scann = F)) d = 0.5 Eigenvalues Holidays Both Finances Dishes Shopping Official Insurance Tidying Driving Breakfeast Dinner Husband Wife Main_meal Laundry Repairs En indiquant comment, pour une tâche donnée, les couples se répartissent, on indique clairement une intention. 6 Morphométrie et non-centrage C’est souvent un problème de morphométrie. Dans tortues, les variables sont les trois dimensions de carapaces de tortues mesurées en mm [7]. data(tortues) ttaille <- tortues[, 1:3] tsexe <- tortues[, 4] names(ttaille) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 19/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel [1] "long" "larg" "haut" dudi1 <- dudi.pca(ttaille, scan = F) plot(dudi1$l1[, 1], tortues$long, ylim = c(30, 180), xlab = "Composante principale", ylab = "Variables") text(-1, 130, "Longueur") points(dudi1$l1[, 1], tortues$haut) text(1.5, 70, "Hauteur") points(dudi1$l1[, 1], tortues$larg) text(0.7, 115, "Largeur") abline(lm(tortues$larg ~ dudi1$l1[, 1])) abline(lm(tortues$long ~ dudi1$l1[, 1])) abline(lm(tortues$haut ~ dudi1$l1[, 1])) ● 150 ● ● ● ●● ● ●● ● ● Longueur ●● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●● Largeur ●●● ●●● 100 Variables ● ● ● ● ●● ●●●● ● ●● ●● ● ● ● ● ● ● ●● ●●● ● ● ● ● ●● ● ●● ●●● ●● ● ● ●● ● ●● ●● ● ●● ● Hauteur ● 50 ● ● ● ● ● ●●● ● ●● ●● ● ● ● −2 −1 ● ●● ●●● ● ● ● ● ●● ● ● ●●● ●● ●● ● ● ● ●● 0 ● ●● 1 Composante principale L’ACP joue ici son rôle de recherche de variable latente. La composante principale prédit les trois variables. C’est une explicative cachée. Elle représente la taille théorique de la tortue. C’est la variable cachée qui prédit au mieux les autres, c’est aussi la variable cachée qui est le mieux prédite par toutes les autres. s.corcircle(dudi1$co) haut long larg Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 20/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel On dit que c’est un effet ”taille”. coefficients(lm(tortues$larg ~ dudi1$l1[, 1])) (Intercept) dudi1$l1[, 1] 95.50000 -12.41816 coefficients(lm(tortues$long ~ dudi1$l1[, 1])) (Intercept) dudi1$l1[, 1] 124.68750 -20.09136 coefficients(lm(tortues$haut ~ dudi1$l1[, 1])) (Intercept) dudi1$l1[, 1] 46.145833 -8.120671 apply(ttaille, 2, mean) long larg haut 124.68750 95.50000 46.14583 Si on veut une variable latente qui fait des prédictions nulles pour une valeur nulle (régression par l’origine), on fait une ACP non centrée. dudi2 <- dudi.pca(ttaille, scan = F, cent = F, scal = F) plot(dudi2$l1[, 1], tortues$long, ylim = c(30, 180), xlab = "Composante principale", ylab = "Variables") text(-1.1, 170, "Longueur") points(dudi2$l1[, 1], tortues$haut) text(-1.1, 70, "Hauteur") points(dudi2$l1[, 1], tortues$larg) text(-1.1, 125, "Largeur") abline(lm(tortues$larg ~ -1 + dudi2$l1[, 1])) abline(lm(tortues$long ~ -1 + dudi2$l1[, 1])) abline(lm(tortues$haut ~ -1 + dudi2$l1[, 1])) ● Longueur 150 ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● 100 Variables ● ●●● ● ●● ● Largeur ● ● ● ●● ●●● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ●●● ●● ●● ●● ● ● ● ●●●● ● ● ● ● Hauteur ● 50 ● −1.4 ● ● −1.3 ● ●● ● ● ● ● −1.2 ● ●● ●● ●● ● ● −1.1 ● ●● ● ● ● ● ●● ● ●●● ● ● −1.0 −0.9 ●● ● ●● ●●● ● ● ●● −0.8 Composante principale Cet auto-modèle (créé par les données pour modéliser les données) est plus réaliste mais moins bon (les erreurs ont une organisation, liée à la présence des deux sexes). 7 ACP et classification La source des données est dans Prodon et Lebreton [10]. Le pourcentage de recouvrement de la végétation est mesuré pour 8 strates dans 182 sites : 1) Rocher 2) 0 m / 0.25 m 3) 0.25 m / 0.50 m Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 21/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 4) 0.50 m - 1 m 5) 1 m / 2 m 6) 2 m / 4 m 7) 4 m / 8 m 8) 8 m / 16 m data(rpjdl) names(rpjdl) [1] "fau" "mil" "frlab" "lab" "lalab" mil = rpjdl$mil names(mil) [1] "ROCH" "C.25" "C.50" "C1" "C2" "C4" dim(mil) [1] 182 8 "C8" "C16" millog <- log(rpjdl$mil + 1) pcamil <- dudi.pca(millog, scan = F) s.corcircle(pcamil$co) s.label(pcamil$li, clab = 0.75) round(cor(millog), dig = 3) ROCH C.25 C.50 C1 C2 C4 C8 C16 ROCH 1.000 0.132 -0.484 -0.733 -0.725 -0.732 -0.686 -0.533 C.25 0.132 1.000 0.504 0.157 -0.076 -0.419 -0.523 -0.441 C.50 -0.484 0.504 1.000 0.752 0.532 0.212 0.079 0.017 C1 -0.733 0.157 0.752 1.000 0.840 0.539 0.395 0.256 C2 -0.725 -0.076 0.532 0.840 1.000 0.737 0.590 0.399 C4 -0.732 -0.419 0.212 0.539 0.737 1.000 0.920 0.665 C8 -0.686 -0.523 0.079 0.395 0.590 0.920 1.000 0.779 C16 -0.533 -0.441 0.017 0.256 0.399 0.665 0.779 1.000 C16 C8 C4 ROCH C2 C1 C.25 C.50 Un artifice ? Retourner aux données : par(mfrow = c(3, 3)) for (i in 1:8) s.value(pcamil$li, pcamil$tab[, i], cleg = 1.5) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 22/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=2 −0.5 0.5 1.5 d=2 −4.5 2.5 −3.5 −2.5 −1.5 −0.75 −0.25 0.25 0.75 d=2 −0.75 −0.25 0.25 0.75 1.25 −2.5 −1.5 −0.5 0.5 d=2 −1.25 −0.5 0.5 1.5 −3.5 −1.5 −0.5 0.5 d=2 d=2 1.75 −0.5 0.5 1.5 −0.75 1.25 d=2 −0.25 0.25 0.75 1.25 1.75 d=2 2.5 par(mfrow = c(2, 4)) for (i in 1:8) hist(pcamil$tab[, i], main = names(pcamil$tab)[i]) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 23/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel C.25 C.50 C1 1 2 3 60 −5 −2 0 20 40 Frequency 60 0 20 0 0 −1 40 Frequency 20 40 Frequency 60 40 0 20 Frequency 60 80 80 80 80 ROCH −3 −1 1 −2.0 0.0 1.5 pcamil$tab[, i] pcamil$tab[, i] pcamil$tab[, i] C2 C4 C8 C16 0.0 1.5 pcamil$tab[, i] −1.0 0.5 2.0 pcamil$tab[, i] 80 100 60 0 20 40 Frequency 60 20 0 0 −1.5 40 Frequency 60 20 40 Frequency 30 20 0 10 Frequency 40 80 50 80 100 pcamil$tab[, i] −1.0 0.5 2.0 pcamil$tab[, i] −1 1 2 3 pcamil$tab[, i] Variables à seuil ? Classement des stations ? Deux gradients ? Un gradient et une partition ? plot(hclust(dist(millog), "ward")) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 24/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 150 100 69 50 63 67 73 71 66 72 33 58 57 68 62 59 65 48 56 60 53 61 12 49 29 27 34 39 55 44 54 35 51 64 41 32 46 38 36 47 52 42 43 91 45 2 31 37 19 5 21 14 28 30 15 40 4 7 3 6 8 18 22 10 13 24 26 16 20 25 17 11 23 95 102 105 104 106 115 100 151 91 150 93 96 90 94 97 70 81 78 83 76 82 84 74 75 89 85 86 88 80 79 77 87 129 121 130 122 127 125 123 128 124 126 174 153 164 156 157 144 154 155 165 152 160 161 140 146 147 149 107 163 141 170 172 158 167 182 162 175 109 168 159 173 179 176 169 171 108 117 166 180 145 178 181 116 111 112 120 113 114 118 119 132 133 136 137 103 135 143 92 101 110 177 139 142 148 131 138 134 98 99 0 50 Height 200 250 300 350 Cluster Dendrogram dist(millog) hclust (*, "ward") par(mfrow = c(1, 2)) s.class(pcamil$li, as.factor(cutree(hclust(dist(millog), "ward"), 3))) s.class(pcamil$li, as.factor(cutree(hclust(dist(millog), "ward"), 5))) par(mfrow = c(1, 1)) d=2 d=2 ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ●●●●● ● ● ● ● ● ● ● ● ●●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ●● ● ● ● ● ●● ●● ● ● ● ●●● ● ●● ●●● ● ● ● ● ● ● ● ●● ●●● ●●● ● ● ●● ●● ● ● ● ● ●●● ● ● 3 1 2 ● 5●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ●●●●● ● ● ● ● ● ● ● ● ●●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ● ●● ● ● ● ● ●● ●● ● ● ● ●●● ● ●● ●●● ● ● ● ● ● ● ● ●● ●●● ●●● ● ● ●● ●● ● ● ● ● ●●● ● ● 1 4 2 3 parti <- as.factor(cutree(hclust(dist(millog), "ward"), 5)) summary(parti) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 25/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 1 2 3 4 5 46 26 32 68 10 round(data.frame(lapply(split(mil, parti), function(x) apply(x, 2, mean))), dig = 3) X1 X2 X3 X4 X5 ROCH 16.848 7.000 2.156 0.235 0.0 C.25 74.130 83.462 90.938 67.353 7.6 C.50 17.413 52.308 83.438 63.603 7.6 C1 0.630 22.115 70.312 53.015 8.6 C2 0.087 0.577 45.000 41.191 14.0 C4 0.000 0.038 6.125 38.162 43.0 C8 0.000 0.000 0.125 28.824 78.0 C16 0.000 0.000 0.000 7.691 41.0 L’ordination suggère la classification. La classification renvoie une ordination. Il s’agit de modèles. Et si on regardait les données ? table.paint(t(mil), clabel.c = 0, cleg = 0) ROCH C.25 C.50 C1 C2 C4 C8 C16 table.paint(t(millog), x = sort(pcamil$li[, 1]), clabel.c = 0, cleg = 0) ROCH C.25 C.50 C1 C2 C4 C8 C16 8 Truites et valeurs propres J.M. Lascaux a étudié 306 truites [9] data(lascaux) names(lascaux$meris) [1] "rd" "ra" "rpelg" "rpecg" "caec" J.M. Lascaux a mesuré 5 variables méristiques : Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 26/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel rd Nombre de rayons à la dorsale ra Nombre de rayons à l’anale rpelg Nombre de rayons à la pelvienne gauche rpecg Nombre de rayons à la pectorale gauche caec Nombre de caeca pyloriques 0.0 0.2 0.4 0.6 0.8 1.0 1.2 pcmeris <- dudi.pca(lascaux$meris, scan = F) barplot(pcmeris$eig) cor(lascaux$meris) rd ra rpelg rpecg caec rd 1.00000000 0.31108004 0.02633742 0.07528987 -0.06038263 ra 0.31108004 1.00000000 -0.05677408 0.12549442 -0.11191258 rpelg 0.02633742 -0.05677408 1.00000000 0.04819827 0.10350996 rpecg 0.07528987 0.12549442 0.04819827 1.00000000 0.06108539 caec -0.06038263 -0.11191258 0.10350996 0.06108539 1.00000000 cor.test(lascaux$meris$ra, lascaux$meris$rpecg) Pearson's product-moment correlation data: lascaux$meris$ra and lascaux$meris$rpecg t = 2.2055, df = 304, p-value = 0.02817 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.01356170 0.23432087 sample estimates: cor 0.1254944 Conclure sur la nature de ce type de variables. J.M. Lascaux a mesuré 35 variables morphologiques : names(lascaux$morpho) [1] "LS" "MD" [10] "DPEL" "DPEC" [19] "PELAN" "PELC" [28] "LAD" "LD" [37] "ETET" "MAD" "ADC" "ANC" "HD" "MAN" "ADAN" "LPRO" "LC" "MPEL" "ADPEL" "DO" "LAN" "MPEC" "ADPEC" "LPOO" "HAN" "DAD" "PECPEL" "LTET" "LPELG" "DC" "PECAN" "HTET" "LPECG" "DAN" "PECC" "LMAX" "HPED" LS Longueur standard MD Distance bout du museau - insertion de la dorsale MAD Distance bout du museau - insertion de l’adipeuse MAN Distance bout du museau - insertion de l’anale Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 27/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel MPEL Distance bout du museau - insertion de la pelvienne MPEC Distance bout du museau - insertion de la pectorale DAD Distance insertion de la dorsale - insertion de l’adipeuse DC Distance insertion de la dorsale - départ de la caudale DAN Distance insertion de la dorsale - insertion de l’anale DPEL Distance insertion de la dorsale - insertion de la pelvienne DPEC Distance insertion de la dorsale - insertion de la pectorale ADC Distance insertion de l’adipeuse - départ de la caudale ADAN Distance insertion de l’adipeuse - insertion de l’anale ADPEL Distance insertion de l’adipeuse - insertion de la pelvienne ADPEC Distance insertion de l’adipeuse - insertion de la pectorale PECPEL Distance insertion de la pectorale - insertion de la pelvienne PECAN Distance insertion de la pectorale - insertion de l’anale PECC Distance insertion de la pectorale - départ de la caudale PELAN Distance insertion de la pelvienne - insertion de l’anale PELC Distance insertion de la pelvienne - départ de la caudale ANC Distance insertion de l’anale - départ de la caudale LPRO Longueur préorbitale DO Diamètre de l’orbite LPOO Longueur postorbitale LTET Longueur de la tête HTET Hauteur de la tête (en passant par au milieu de l’orbite) LMAX Longueur de la mâchoire supérieure LAD Longueur de l’adipeuse LD Longueur de la dorsale HD Hauteur de la dorsale LC Longueur de la caudale LAN Longueur de l’anale HAN Hauteur de l’anale LPELG Longueur de la pelvienne gauche LPECG Longueur de la pectorale gauche HPED Hauteur du corps au niveau du pédoncule caudal ETET Largeur de la tête (au niveau des orbites) m <- na.omit(lascaux$morpho) m.acp <- dudi.pca(m, scann = F) L’effet taille est écrasant : barplot(m.acp$eig) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 28/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf 0 5 10 15 20 25 30 A.B. Dufour & D. Chessel ● ● ●● ● ●● ●●● ●●●● ● ● ●● ●● ●● ● ●● ●● ● ● ● ● ● ● ●●●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● 0 score 0 score 0 score 30 60 20 40 y 30 60 50 y 110 35 15 5 25 25 50 6 12 15 30 y score 0 score 10 15 0 score ● ●● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● LPOO 0 HD ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● −2 LPECG ● ●●●score ● ●●● ●● 0 ● ● ● ●● ● ● ● ●● ●● ● ● ●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ●● ●● −2 ● −2 y score 0 LD ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● −2 ● ●●● ● ●● ●● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● 0 score ●●● score ● ● ● −2 40 0 ● ●● ●● ● ● ●● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● −2 y 60 120 y y y ● ● ●●●●● ● ● ●● ● ● ●● ●● ●● ● ●● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ●● y y y 10 −2 ADC PECC 0 score 0 LPELG score ● ● ● ●● ● ● ●● ● ● ●● ●● ● ●● ●●● ● ●● ●●●●● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● 0 ● ●● ● ●●● ● ●● ●● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● −2 PECAN DO MPEC score ● ●●● ●● 0 ●● ● ● ●● ●●●● ●● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● −2 score LAD 35 score ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● 0 ●● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −2 ●● ● ● ●●● ● ●●● −2 PECPEL ● 0 HAN score ● ● ●● ● ● ● ●●● ●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● −2 DPEC ● ● ● ● ● ● ●●●●● ●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● −2 ● ●● ●● ● ●●●● ● ● ●● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ●● −2 y 6 0 score ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● 0 LPRO 0 score score 14 score MPEL −2 ● ● ● ● ●●● ● ● ●● ●●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● −2 ●● ●●● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● LMAX DPEL ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● y 0 0 score ● ● y 160 y 80 50 20 y y 30 60 score ADPEC ANC MAN −2 ● ● ●●● ● ● ● ●● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● −2 score 0 y 180 80 40 60 120 60 20 100 0 ● ●●● ●● ● ● ●● ●● ●● ● ●●● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● −2 30 y ●●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ● ●● ●● ●● LAN ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● −2 y 15 30 0 ● ● score ● ●● HTET DAN ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● −2 15 ● 0 80 y score PELC 0 ●●● score ● ● ●● −2 15 35 score ● ● ● ●●● ● ● ●● ● ●●● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ●● ● ● ● ●● ● ● 0 ● ● ●●● ● ●●● ● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ●● ● −2 25 0 y ADPEL MAD −2 ● ●● ● ● ●●● ●●● ● ●●● ●● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● −2 ● ● ● ● ● ● ●● ● ●●● ●●●●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● −2 0 y 0 score score y 60 120 30 60 50 100 PELAN ETET y y score ●● ● ● ● ● ●● ● ●● ●● ● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ●●● −2 DC ● ● ●●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● −2 ● ●● ● ●● ● ● ●●● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● −2 10 45 y 20 60 y 30 35 0 LC 0 score score y 70 y 20 score ● LTET MD −2 ADAN −2 y 0 ● ● ●● ●●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● −2 10 y DAD −2 ● ● ● ●●● ●●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● −2 ● ● ●●● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● −2 10 0 score y 40 y 80 −2 y LS 50 100 ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● y y 100 220 score(m.acp) 0 score ●● ● ●●● ● ●●●● ● ●● ● ● ●● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ●● HPED −2 0 score 0 Une truite qui a perdu un morceau de nageoire dans une bataille est facilement repérée ! Ce qui reste quand on enlève l’effet taille concerne, grossièrement parlant, la forme. Les variables 13, 18, 20 et 27 présentent des outliers. Elles sont exclues : Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 29/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 0 10 y ●● ●● ● ●● ●● ● ●● ● ● ●● ● ●●●● ● ● ● ● ● ● ●● ● ●● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●● ●● 20 ●score ● ●● ● ● ● ● ●● ●● ●●● ● ●● ● ●● ●● ● ● ●● ●● ● ● ● ● ● ●●●● ●● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ●●●● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ETET −2 0 score 30 50 y 20 35 50 y y 30 50 70 120 60 60 30 40 20 0 ●● ● ●●●● ● ●●●●● ● ● ● ●● ● ● ●● ● ●● ●●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ANC 10 score y 0 score LAD ●●● ●● ● ● ● ● ●● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ●●● ●● −2 HAN 0 score ●● 0 ●● ● ●●● ● ● ●●● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● −2 score ADC ● 20 HTET y 30 15 ● ●● ● ● ●●●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ●● ● ● ● 0 ● ● ● ●● ● ● ●● ●●● ●●● ● ●● ●● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● −2 0 score 30 y y y LAN −2 PELAN ●● ● ●● ● ●● ●● ● ●● ● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● −2 MPEC −2 score 0 ● ●●● ●● ● ●● ● ● ●● ● ● ●● ●● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● 0 15 ● −2 LTET score 20 score LC ● ● ●● ● ● ●●● ● ●● ●● ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ● ●●● ● ● ●● ●● ● ●● ● ●● DPEC −2 ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ●●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● −2 y 30 0 score 0 ● ● ● ●●● ● ●● ● ● ●●● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ● ● ●● ●● −2 ●● ●score 15 30 ● ● ●● ● ● 0 10 y HD PECAN 0 ● ●● ●● ●●●● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● −2 score 50 y LPOO 30 ● ●● ● ● ● ●●● ●● ●● ● ● ● ●● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● MPEL score 0 ●● ● ● ●● ●●●●● ● ● ● ● ●● ● ●● ● ● ●● ●●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● −2 10 score ●● ● ● ●●● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ●● ● ● ● ● score DPEL y 60 90 y 0 ● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● −2 score 50 PECPEL 0 ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● −2 ●● ●● ● ● ●● ● ●● ● ●● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ●● −2 MAN y 140 80 y y 40 70 40 30 30 15 y 0 HPED 20 0 score ● ● ● ●● ● ●●● ● ● ●●● ● ●● ● ●●●● ●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● 0 20 80 140 y score DO −2 DAN −2 ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● −2 score ● 0 ● ● ● ●● ● ● ● ●● 10 y 15 30 y 0 ADPEC −2 LPECG −2 ● ●● ●● ● ●●● ● ●● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● 0 ● ●●● ● ● ● ●● ●● ●● ●● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● −2 60 60 60 20 0 ● score ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●● ●● ● ● 0 5 y 15 y 30 score LD MAD score score −2 40 0 ●●● ● ● ● ● ●●● ● ●● ● ●● ● ●● ● ● ●● ●● ●● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● −2 16 LPRO 6 10 score ●● ● ● ●● ● ● ●● ●● ●● ●● ● ● ● ● ●● ●●● ● ● ●● ●● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● −2 score DC −2 y 18 y 12 6 0 ● ● ●● ●● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● −2 y ADPEL y 60 y 30 ● ●● ● ● ●●● ●●● ● ●● ●● ●● ● ●● ● ● ●● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● 0 ● ● ● ●● ●● ● ● ●●● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● −2 120 0 score −2 MD y score DAD ●●● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ●● −2 ● ●● ● ● ● ●●● ●●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● −2 y 40 y 70 ● ● 0 120 −2 90 LS 50 ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● y 200 100 y m <- na.omit(lascaux$morpho)[, -c(13, 18, 20, 27)] m.acp <- dudi.pca(m, scan = F, nf = 3) score(m.acp) 0 score ● ●● ● ●● ● ● ●● ●● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●●● ●● ●● ● LPELG −2 0 score 0 Noter que le problème esr résolu, avec brutalité, certes ! names(lascaux$sex) <- row.names(lascaux$morpho) names(lascaux$gen) <- row.names(lascaux$morpho) Enlever des données le modèle issu de l’effet taille ou dépouiller l’analyse à partir du facteur 2 donne strictement le même résultat. C’est l’illustration qu’une ACP peut faire deux choses de nature radicalement différente. Le plan 2-3 est directement une analyse de la forme : s.class(m.acp$li[, 2:3], lascaux$sex[as.numeric(row.names(m.acp$li))]) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 30/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● F ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● M ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Dans la forme, il y a certainement une composante sexuelle. s.class(m.acp$li[, 2:3], lascaux$gen[as.numeric(row.names(m.acp$li))]) d=1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 4 6 0 1 23 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Dans la forme, il y a certainement une composante génétique. par(mfrow = c(1, 2)) barplot(m.acp$eig[1:15]) barplot(m.acp$eig[2:16]) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 31/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf 0.4 0.0 0 10 20 0.8 A.B. Dufour & D. Chessel Une ACP peut ainsi en cacher une autre. On peut analyser l’effet taille par une ACP non centrée sur le tableau des log doublement centré : mlog <- log(m) mlog0 <- bicenter.wt(mlog) mlog0 <- data.frame(mlog0) names(mlog0) <- names(m) mlog0.acp <- dudi.pca(mlog0, scan = F, scal = F, cent = F) Ceci indique que l’introduction des deux facteurs directement dans l’analyse est pertinente. L’ACP inter-classe est celle des centres de gravité des sous-nuages. croi <- lascaux$gen[as.numeric(row.names(m.acp$li))]:lascaux$sex[as.numeric(row.names(m.acp$li))] plot(between(mlog0.acp, croi, scan = F)) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 32/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel LAN d = 0.2 DO d = 0.2 HAN LD ANC PELAN DADADPEL DC DAN ADPEC LS MAD ADC LPECG PECAN MAN HD LPELG DPEL MPEL MD MPEC HPED PECPEL LC LTET DPEC LPOO HTET ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● LAD ● ● ● ● ● ● ● LPRO ETET Canonical weights ● ● ● ● ● ● ● 2:F ● ● ● LD ANC PELAN ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● 5:M ● ● 4:M ● ●● ● ●● ●● ● ●● ● 6:M ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● DAD DC ADPEL DAN ADPEC LS MAD ADC PECAN MAN DPEL MPEL HPED MD MPEC LC LTET PECPEL DPEC LPOO HTET ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● 3:M 1:M 2:M ● ● ● ●●● ● ● ● ●● ● ● DO HAN ● ● ● ● ● ● ● ● ● ● 0:F 1:F 3:F0:M ● ● ● ● ● 6:F ● 4:F ● ● ● ●● ● ● ● ● ● ● ● ● ● 5:F ● ● LAN ● ● ● ● ● ● ● ●● ● LPECG LPELGHD ● ● ● ● ● LAD ETET LPRO Variables Scores and classes d = 0.1 Eigenvalues Axis2 5:F 6:F 4:F Axis1 2:F 0:F1:F 3:F 0:M 5:M 1:M 2:M Inertia axes 4:M 3:M 6:M Classes Ces truites sont classées en 7 groupes génétiques par le nombre d’allèles méditerranéens. La variable génétique prend les modalités 0 (0 allèle méditerranéen, homozygotes atlantiques, truites dites modernes), 1 à 5 (respectivement 1 à 5 allèles méditerranéens) et 6 (6 allèles méditerranéens, homozygotes méditerranéens, truites dites ancestrales). Les modernes sont à gauche, les ancestrales à droite, les femelles en haut et les mâles en bas. Le code des variables permettra de poursuivre. J.M. Lascaux a mesuré 15 variables de coloration de la robe : names(lascaux$colo) [1] "PRAD" "PNAD" [8] "PRINF" "PNINF" [15] "PND" "PRAA" "PRSUP" "PNAA" "PNSUP" "PRDA" "PNO" "PNDA" "PRLI" "PNPERIOP" "PRD" PRAD Nombre de points rouges avant l’aplomb de la Dorsale PNAD Nombre de points noirs avant l’aplomb de la Dorsale PRAA Nombre de points rouges après l’aplomb de l’Anale PNAA Nombre de points noirs après l’aplomb de l’Anale PRDA Nombre de points rouges entre aplomb Dorsale et aplomb Anale PNDA Nombre de points noirs entre aplomb Dorsale et aplomb Anale PRLI Nombre de points rouges sur la ligne latérale Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 33/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel PRINF Nombre de points rouges au dessous de la ligne latérale PNINF Nombre de points noirs au dessous de la ligne latérale PRSUP Nombre de points rouges au dessus de la ligne latérale PNSUP Nombre de points noirs au dessus de la ligne latérale PNO Nombre de points noirs sur l’opercule PNPERIOP Nombre de tâches noires à la périphérie de l’opercule PRD Nombre de points rouges sur la Dorsale PND Nombre de points noirs sur la Dorsale 0 1 2 3 4 5 6 pcacol <- dudi.pca(lascaux$colo, scann = F) barplot(pcacol$eig) Questions 1) Trouver en quoi une partie de la corrélation est arte-factuelle dans ce tableau. 2) Interpréter les composantes. 3) Trouve-t-on dans la coloration de la robe une composante génétique ? 4) Trouve-t-on dans la coloration de la robe une composante environnementale ? J.M. Lascaux a mesuré 15 variables ornementales qualitatives : names(lascaux$ornem) [1] "ocpr" "ocpn" [10] "frpel" "frc" "maju" "taop" "ptsdos" "coufl" "pntet" "coupn" "frd" "zeb" "ptsad" "frad" "fran" ocpr Ocelles autour des points rouges (1 nulles, 2 faibles, 3 marquées) ocpn Ocelles autour des points noirs (1 nulles, 2 faibles, 3 marquées) maju Marques juvéniles (1 absence, 2 présence) taop Tache operculaire (1 absence, 2 présence) pntet Points noirs sur la tête (1 absence, 2 présence) frd Frange de la dorsale (1 aucune, 2 blanche, 3 blanche et noire) ptsad Points sur l’adipeuse (1 absence, 2 présence) frad Frange de l’adipeuse (1 aucune, 2 rouge, 3 très rouge) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 34/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel fran Frange de l’anale (1 aucune, 2 blanche, 3 blanche et noire) frpel Frange des pelviennes (1 aucune, 2 blanche, 3 blanche et noire) frc Frange de la caudale (1 aucune, 2 plus ou moins rouge) ptsdos Points sur le dos (1 absence, 2 présence) coufl Couleur des flancs (1 brun-jaune, 2 gris, 3 blanc argenté) coupn Contour des points noirs du flan (1 net, 2 flou) zeb Zébrures sur les flancs (1 absence, 2 présence) summary(lascaux$ornem) ocpr ocpn maju 1:203 1:238 1:111 2: 66 2: 48 2:195 3: 37 3: 20 frc ptsdos coufl 1:249 1:248 1:213 2: 57 2: 58 2: 74 3: 19 taop 1: 57 2:249 pntet 1: 89 2:217 coupn 1:239 2: 67 zeb 1:278 2: 28 frd 1:178 2: 14 3:114 ptsad 1:218 2: 88 frad 1: 51 2:251 3: 4 fran 1: 27 2:173 3:106 frpel 1:107 2:146 3: 53 On retrouvera ces données plus tard. 9 Voir le tableau Source : Devillers, J., J. Thioulouse, and W. Karcher. 1993 [2]. data(toxicity) head(toxicity$tab) V1 V2 V3 1 1.224 1.824 3.004 2 1.438 1.540 2.630 3 0.679 1.893 2.855 4 1.001 1.390 2.807 5 1.097 1.747 2.815 6 0.967 1.666 2.870 toxicity$species [1] [3] [5] [7] [9] [11] [13] [15] [17] V4 3.323 2.801 2.293 2.939 3.086 3.198 V5 2.579 2.833 2.547 3.636 3.170 3.358 V6 5.241 5.284 4.943 4.926 5.303 5.441 "Daphnia magna (LC50)" "Orconectes immunis" "Oncorhynchus mykiss" "Gambusia affinis" "Carassius auratus" "Arbacia punctulata (embryo)" "Photobacterium phosphoreum" "Daphnia magna (EC50)" "Ceriodaphnia reticulata" V7 6.264 4.024 3.163 6.103 6.364 6.120 "Tanytarsus dissimilis" "Rana catesbeiana" "Lepomis macrochirus" "Ictalurus punctatus" "Pimephales promelas" "Arbacia punctulata (sperm)" "Arbacia punctulata (DNA)" "Daphnia pulex" toxicity$chemicals [1] "2-Methyl-2,4-pentanediol" "2-Methyl-1-propanol" [4] "2,4-Pentanedione" "2-Chloroethanol" [7] "Pentachlorophenol" "2,2,2-Trichloroethanol" "Hexachloroethane" On a la toxicité de 7 molécules (colonnes) sur 16 cibles (lignes) exprimée en : log(mol/litre) data(toxicity) table.paint(toxicity$tab, row.lab = toxicity$species, col.lab = toxicity$chemicals) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 35/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf Daphnia magna (LC50) Tanytarsus dissimilis Orconectes immunis Rana catesbeiana Oncorhynchus mykiss Lepomis macrochirus Gambusia affinis Ictalurus punctatus Carassius auratus Pimephales promelas Arbacia punctulata (embryo) Arbacia punctulata (sperm) Photobacterium phosphoreum Arbacia punctulata (DNA) Daphnia magna (EC50) Daphnia pulex Ceriodaphnia reticulata 1] 2] 3] 4] 5] Pentachlorophenol Hexachloroethane 2−Chloroethanol 2,4−Pentanedione 2,2,2−Trichloroethanol 2−Methyl−1−propanol 2−Methyl−2,4−pentanediol A.B. Dufour & D. Chessel 6] table.value(toxicity$tab, row.lab = toxicity$species, col.lab = toxicity$chemicals) f1 <- as.factor(row(as.matrix(toxicity$tab))) f2 <- as.factor(col(as.matrix(toxicity$tab))) tox <- unlist(toxicity$tab) anova(lm(tox ~ f1 + f2)) Analysis of Variance Table Response: tox Df Sum Sq Mean Sq F value Pr(>F) f1 16 5.933 0.371 1.1217 0.3467 f2 6 244.098 40.683 123.0630 <2e-16 *** Residuals 96 31.736 0.331 --Signif. codes: 0 Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 36/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf Pentachlorophenol Hexachloroethane 2−Chloroethanol 2,4−Pentanedione 2,2,2−Trichloroethanol 2−Methyl−1−propanol 2−Methyl−2,4−pentanediol A.B. Dufour & D. Chessel Daphnia magna (LC50) Tanytarsus dissimilis Orconectes immunis Rana catesbeiana Oncorhynchus mykiss Lepomis macrochirus Gambusia affinis Ictalurus punctatus Carassius auratus Pimephales promelas Arbacia punctulata (embryo) Arbacia punctulata (sperm) Photobacterium phosphoreum Arbacia punctulata (DNA) Daphnia magna (EC50) Daphnia pulex Ceriodaphnia reticulata 1 3 5 7 On a fait le tour. Commenter. 10 Notes d’abondance Quand on pense l’ACP comme approche d’un échantillon d’une loi normale multivariée, son usage sur un tableau de notes binaires (0-1) est sans objet. On dit à l’utilisateur ”qu’il n’a pas le droit”. Quand on pense l’ACP comme une opération géométrique ou la maximisation d’une forme quadratique, on dit à l’utilisateur ”qu’il a le droit”. Quand un lecteur formé à une école juge le manuscrit d’un auteur formé à l’autre école, le dialogue est sommaire et tous les coups sont permis. Le même problème se pose pour les notes d’abondance (souvent de 0 à 7 en phytosociologie). L’ACP d’un tableau florofaunistique est valide comme première approche d’objectifs précis. Elle a marqué les pionniers [5] par la facilité avec laquelle elle permet des cartes de synthèse. data(jv73) par(mfrow = c(5, 4)) for (m in 1:19) { s.value(jv73$xy, jv73$poi[, m] - mean(jv73$poi[, m]), incl = F, contour = jv73$contour, sub = names(jv73$poi)[m], possub = "topleft", csub = 3, cleg = 0) } s.value(jv73$xy, dudi.pca(jv73$poi, scal = F, scan = F)$li[, 1], incl = F, contour = jv73$contour, sub = "PCA", possub = "topleft", csub = 3, cleg = 0) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 37/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel Chb Omb Van Spi Tan d = 50 d = 50 d = 50 d = 50 d = 50 Tru Bla Che Gou Gar d = 50 d = 50 d = 50 d = 50 d = 50 Vai Hot Bar Bro Lam d = 50 d = 50 d = 50 d = 50 d = 50 Loc Tox Lot Per PCA d = 50 d = 50 d = 50 d = 50 d = 50 L’ACP est en fait souvent discutée sur ce genre de tableau parce que c’est une méthode qui introduit une grande dissymétrie entre le traitement des lignes et des colonnes. Elle est centrée sur la différence entre les lignes et la redondance entre colonnes. Dans un tableau sites-espèces, on s’appuie sur la covariance entre espèces pour discriminer les sites ; dans un tableau espèces-sites, on fait l’inverse. Seule l’analyse des correspondances fait les deux, c’est-à-dire, donne un compromis des deux objectifs. L’important est de ne pas utiliser la procédure à l’aveugle. La difficulté duale est la grande diversité des structures de ce genre de tableaux et l’objectif est de trouver le particulier par le biais de procédure générale. data(trichometeo) trichometeo$fau comporte 49 pièges et 17 espèces. t1 <- data.frame(apply(log(trichometeo$fau + 1), 2, function(x) x/sum(x))) s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cell = 0, csta = 0.3, clab = 1) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 38/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=1 ● ● Che ● Hve ● Ath Cea ● Hfo Glo ●● Set Hsp Han All Ced Psy ● Sta Aga Hym ● ● Hys ● ●●●● ● ● ● ●● ● ● ● ● ●●● ● ●● ● ●● ● ● ● ● ● Hyc ● ● Les dénombrements sont généralement transformés par x 7→ log (x + 1). Le tableau [11] est passé en pourcentage par colonnes (espèces). Un taxon est une distribution de fréquence entre sites sur une colonne. La moyenne est 1/n et la colonne centrée est l’écart entre la distribution de l’espèce et la distribution uniforme. Une composante principale est un score normalisé des sites et la coordonnée de l’espèce est exactement la position moyenne sur ce score. L’analyse maximise la somme des carrés des écarts des positions moyennes à l’origine qui est la position du taxon indifférent : sco.distri(dudi.pca(t1, scan = F, scale = F)$l1[, 1], t1) Hys ● ● All Hym ● Aga ● ● Sta Psy ● ● Ced ● Che ● Han ● Ath Hsp ● ● Hve ● Set ● Glo ● ● ● Hfo Hyc Cea d=1 L’essentiel est que certains relevés ont décalé la majorité des espèces dans le même sens (une configuration météorologique particulière provoque l’émergence simultanée des trichoptères). Autre ambiguı̈té : la covariance de deux variables est une mesure de ressemblance même quand elle est négative (une corrélation de -1 indique qu’on a Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 39/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel mesuré exactement la même chose au signe près). Pour deux espèces, c’est au contraire une mesure de différence : une espèce disparaı̂t quand l’autre apparaı̂t. L’ACP a alors le statut de méthode qui fait une double typologie. Le même calcul n’a pas le même sens expérimental. t1 <- data.frame(apply(jv73$poi, 2, function(x) x/sum(x))) par(mfrow = c(1, 2)) s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cstar = 0.3, clab = 1, cell = 0, axesel = F, sub = "PCA") s.distri(dudi.coa(jv73$poi, scan = F)$l1, jv73$poi, cstar = 0.3, clab = 1, cell = 0, axesel = F, sub = "COA") d=2 d=1 ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Tan Gar ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● BroPer ● ● ● ● ● Gou ● TruVai Loc Che Van Chb Lam Bla ● ● Omb Lot ● Spi Tox ● Hot Bar ● ● ●● Bar Hot Tox ● ● Spi ● Bla Lam Omb Lot● ● Chb ● ● Van ● ● ● Per ● Gou ● ● Che ● ● Tru ● Bro ● ●● ● ● ● Gar ● Tan ● ● Vai ● ●● ● Loc ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● PCA ● ● ● ● ● ● ● COA Les deux analyses parlent de la typologie de distributions des espèces. La carte de l’ACP est bien meilleure en ce sens qu’elle indique le gradient amont-aval de richesse croissante (salmonidés-cyprinidés) et dans la seconde partie le gradient de vitesse de courant (lénitique-lentique). Les contraintes de l’AFC déforment sans gain précis cette vision juste du réseau hydrographique. data(aviurba) names(aviurba$fau) <- aviurba$species.names.fr t1 <- data.frame(apply(aviurba$fau, 2, function(x) x/sum(x))) par(mfrow = c(1, 2)) s.distri(dudi.pca(t1, scan = F, scale = F)$l1, t1, cstar = 0.3, clab = 0.75, cell = 0, axesel = F, sub = "PCA") s.distri(dudi.coa(aviurba$fau, scan = F)$l1, aviurba$fau, cstar = 0.3, clab = 0.75, cell = 0, axesel = F, sub = "COA") d=1 d=2 ● Chouca.des.tours ● ● ● ● Pigeon.domestique ● ● ● ●● ● Fauvette.des.jardins Troglodyte Rougequeue.a.front.blanc ● ● ● Pie.bavarde ● ● ● ● Mouette.rieuse Pouillot.fitis Moineau.friquet ●● Tourterelle.turque Martinet.noir Rougequeue.noir Verdier ● ● ● Hirondelle.de.fenetre Etourneau.sansonnet Moineau.domestique Hypolais.polyglote Pouillot.veloce Serin.cini Merle.noir ● Pinson.des.arbres Hirondelle.de.cheminee Mesange.bleue Tourterelle.des.bois ●● Mesange.charbonniere Alouette.des.champs ● ● Rossignol.philomele Corneille.noire Fauvette.a.tete.noire ● Bruant.proyer Faucon.crecerelle Chardonneret.elegant Loriot.jaune Linotte.melodieuse Coucou.gris Traquet.patre ● ● ● ● Busard.cendre ● Traquet.tarier ● Mesange.bleue ● Loriot.jaune Fauvette.des.jardins Troglodyte Pouillot.fitis Pouillot.veloce ● Linotte.melodieuse ● ●● Pinson.des.arbres ● Tourterelle.des.bois ● Mesange.charbonniere ● Rossignol.philomele ● ● Etourneau.sansonnet Fauvette.a.tete.noire ● ● ● Merle.noir ● ● Serin.cini Moineau.friquet Tourterelle.turque ● ● Pigeon.domestique ● Verdier ● ●●●●● Moineau.domestique Chouca.des.tours ●● ● ●● ● Pie.bavarde Chardonneret.elegant Hirondelle.de.fenetre Coucou.gris Bruand.zizi ● Martinet.noir Rougequeue.noir Rougequeue.a.front.blanc ● ●● ● Corneille.noire Faucon.crecerelle ● ● ● ● ●Mouette.rieuse Hirondelle.de.cheminee Bruant.proyer ● Hypolais.polyglote ● Alouette.des.champs ● ● ● Busard.cendre ● Traquet.patre ●● ● Bruand.zizi ● Traquet.tarier Bergeronnette.printaniere Bergeronnette.printaniere ● PCA ● COA Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 40/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel Deux gradients centre-ville vers rural fermé et centre-ville vers rural ouvert pour l’ACP correspondent aux deux gradients urbain-rural et ouvert-fermé pour l’AFC. L’avantage est à la seconde. Il n’y a pas de meilleures méthodes, il n’y a que de meilleures interactions entre une méthode et un objet. Le même raisonnement s’applique aux tableaux espèces-relevés en pourcentage par lignes (profils espèces) centrés par lignes. Les moyennes se calculent toujours par espèces et la normalisation par espèces est à éviter. Si on désire d’abord éviter toute réflexion préalable, on peut préférer l’analyse des correspondances (on aura souvent un résultat acceptable mais ce peut être une erreur totale). data(steppe) 512 relevés par quadrat sur un transect de 5.12 km ont enregistré la présence / absence de 37 espèces (milieu steppique). A gauche l’ACP du tableau non modifié,à droite son AFC : 0.0 −1.5 PCA 1.5 par(mfrow = c(2, 1)) plot(dudi.pca(steppe$tab, scan = F, scale = F)$li[, 1], pch = 20, ylab = "PCA", xlab = "", type = "b") plot(dudi.coa(steppe$tab, scan = F)$li[, 1], pch = 20, ylab = "COA", xlab = "", type = "b") ● ● ● ● ●●●● ●● ●● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ●● ●● ●● ●● ●● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ●●●●● ● ●● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ● ●● ● ● ● ●● ●● ●● ● ●●● ●● ● ● ● ●● ● ● ● ●● ● ●●● ● ● ●● ● ●● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ●●●●●● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●●●● ● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ●● ● ● ● ● ●● ●●●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●● ●●● ● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ●●● ●● ● ●●● ● ●● ● ● ●●● ● ● ●● ●●●●●● ●●●●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ●● ●● ● ● ● ●● ●●● ● ●● ● ●●● ●● ● ● ● ● ● ● ●● ●●● ● ● ● ● ●● ● ● ●● ●● ●● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ●● ● 0 100 200 300 400 500 1 −1 COA 2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ●● ● ● ● ●● ● ●● ● ●● ● ●●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ●● ●● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●●● ● ● ● ●●● ● ●● ● ● ● ● ●●● ● ● ● ●● ●●● ●● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ●●● ● ● ●● ● ● ● ● ●● ●● ●● ●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ●● ●●● ● ● ● ●●●● ●●●● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ●● ●● ● ●●● ● ●● ● ● ● ● ●●● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● 0 100 200 300 400 500 table.value(t(steppe$tab[1:512, 1:24]), clabel.c = 0, csize = 0.05, cleg = 0) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 41/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 On obtient cette fois un résultat très voisin. L’ACP n’est cependant qu’un point de départ dans la question finalement difficile de l’ordination [3] des tableaux florofaunistiques. Ces exemples illustrent l’insertion de la procédure dans un comportement adapté à chaque cas. 11 Ne pas se tromper de centrage data(veuvage) par(mfrow = c(3, 2)) for (j in 1:6) plot(veuvage$age, veuvage$tab[, j], xlab = "^ age", ylab = "pourcentage de veufs", type = "b", main = names(veuvage$tab)[j]) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 42/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel 70 80 70 80 40 40 20 90 ● ● ●●● ● ● ●● ● ● ● ● ●● ● ●●●●●●●● ●●●●●●●●●●●● 50 60 70 âge Employe Ouvrier 70 80 90 0 ● ● ●●● ●● ●● ●●●● ● ●●●● ●●●●●●● ●●●●●●●●●● 20 40 60 âge ● 90 ● 0 pourcentage de veufs 60 Cadre_moyen 60 40 20 0 80 Cadre_sup ● 60 70 âge ● 50 60 âge ●●● ●●●●● ●●●● ●●●● ● ● ● ●●●●●●●●●●●●●● ●● 60 ● ● ●● ● ● ● ● ●● ●● ●●●● ● ● ●● ●●● ● ● ●●●●●●●●●●● 50 ● 50 20 90 40 20 0 pourcentage de veufs 60 ● 0 ●● ●●● ●● ● ● ●●● ●●●● ●●●●●● ●●●●●●●●●●●●● pourcentage de veufs ● ● 50 pourcentage de veufs Artisan pourcentage de veufs 20 40 60 0 pourcentage de veufs Agriculteur 80 90 ● ● ● ● ●● ●● ● ● ● ●● ●●● ●●●●● ●●●●● ●●●●●●●●●●● 50 60 âge 70 80 90 âge par(mfrow = c(2, 2)) PCA1 <- dudi.pca(veuvage$tab, scal = F, scan = F) PCA2 <- dudi.pca(t(veuvage$tab), scal = F, scan = F) dotchart(PCA1$co[, 1], lab = names(veuvage$tab)) plot(veuvage$age, PCA1$li[, 1], type = "b", ylab = "ACP1") dotchart(PCA2$li[, 1], lab = names(veuvage$tab)) plot(veuvage$age, PCA2$co[, 1], type = "b", ylab = "ACP2") Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 43/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel ● Employe ● ● ● Artisan 50 ACP1 ● Cadre_moyen Cadre_sup ● 100 Ouvrier 0 ● Agriculteur ● 12 13 14 15 ● ●● ● ● ● ●● ● ● ● ●● ●● ●●● ●●●● ● ● ● ● ●●●●●●●●● 50 60 70 80 90 6 veuvage$age Ouvrier Employe ● ● Agriculteur ● −20 0 10 4 1 Artisan ● 2 ● ● ● ●● 3 ACP2 ● Cadre_moyen Cadre_sup ● ● 5 ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●●● ● ●●● ●●●●● ●●● ● 50 60 70 80 90 veuvage$age Ces deux analyses se ressemblent fort et disent deux choses radicalement différentes. Lesquelles ? 12 Information supplémentaire Plusieurs auteurs ont souligné que le terme supplémentaire s’applique souvent de manière abusive à tout ce qui ne fait pas partie du tableau des données alors qu’on devrait bien réserver le terme projection en individus supplémentaires à une opération géométrique précise. data(trichometeo) names(trichometeo) [1] "fau" "meteo" "cla" On a un tableau de 11 variables météorologiques [11]. Chaque ligne est une journée d’été. meteo.pca <- dudi.pca(trichometeo$meteo, scan = F) par(mfrow = c(2, 2)) s.label(meteo.pca$li) s.class(meteo.pca$li, trichometeo$cla) s.traject(meteo.pca$li, trichometeo$cla) s.corcircle(meteo.pca$co) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 44/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel d=2 11 12 ● 10 ● ● 2 41 25 1 840 24 26 42 43 44 3 9 45 28 18 13 14 33 35 48 23 19 29 32 27 1665 49 38 34 46 20 15 4 21 39 36 37 17 30 31 d=2 47 ● ● ● ● ● ● ● 7 5●● ● ● ● ● ● ● ● ● ● ● ● ● 1 4 9 ● ● ● 11 ● 7 ● 10 ● ● 12 2 ●● 3 8 ● ● ● ● ● ● ● ● ● ● 6 ● ● ● ● 22 ● ● d=2 Pression 1 4 ● ● ● 11 7 ● ● 5 ● ● 6 Var.Pression ● ● ● 10 2 ● ● 9 Humidite 12 8 3 T.soir T.max Precip.Tot Nebu.Moy Precip.Nuit T.min Vent Nebu.Nuit Il y a beaucoup de redondance dans les mesures. La carte des lignes supporte un premier type d’information supplémentaire. Chaque journée est suivie d’une nuit pendant laquelle fonctionne un piège lumineux destiné à la capture des trichoptères émergeant de la rivière. Ce piège est récolté régulièrement mais non chaque jour (week-end ?). On a gardé le résultat obtenu pour une nuit unique de piégeage et le facteur ’cla’ donne les groupes de nuits consécutives. Ces groupes sont représentés sur la carte factorielle : il s’agit simplement d’information complémentaire. Sur les trajectoires, on voit qu’il faut lire une sorte de mouvement circulaire. On note la succession haute pression (beau temps) puis fortes températures puis précipitations (orages d’été) caractéristiques du temps estival de la région. Le tableau ’fau’ associé contient les abondances d’animaux capturés dans le piège triés par espèce. La question porte sur l’influence des variables météorologiques sur l’abondance des piégeages lumineux. Le tableau faunistique a 17 espèces (variables). Les variables faunistiques sont supplémentaires : tabsup <- scalewt(log(trichometeo$fau + 1)) w <- supcol(meteo.pca, tabsup) s.corcircle(meteo.pca$co) s.corcircle(w$cosup, add.p = T, clab = 1.5) Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 45/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel Pression Hys T.soir Var.Pression Che Hve All Ath Glo Hfo Hsp Hyc Aga Han Cea Hym Set Psy Sta Ced Humidite Precip.Tot Nebu.Moy Precip.Nuit T.max Vent T.min Nebu.Nuit Les projections des variables supplémentaires normalisées (vecteurs de norme 1) donnent des coordonnées qui sont des coefficients de corrélation avec les coordonnées factorielles. Ces corrélation sont faibles mais de même signe. Les nouvelles variables ont été projetées sur les deux composantes principales. Il s’agit d’une projection de variables supplémentaires au sens euclidien. Mais on peut considérer également que chaque espèce définit de l’information supplémentaire pour le plan des individus : s.distri(meteo.pca$li, log(trichometeo$fau + 1), csta = 0.3, cell = 0, clab = 1) d=2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● Che All Hve ● Ath PsyHym ● Sta Aga Glo Ced Hsp ● ● Han ● ● Hfo ● ● Set ● Hyc ● ● ● ● ● ● ● Cea ● Hys ● ● ● ● ● ● ● ● On a ainsi superposé les moyennes des positions des espèces. On pourra aussi représenter l’abondance des espèces sur le plan. Ici domine l’idée d’une combinaison de variables météorologiques ayant une influence commune sur les émergences de tous les taxons. Notons enfin qu’il arrive que de véritables projections euclidiennes soient également des représentations par moyennes de distribution et que les notions d’individus supplémentaires et d’information supplémentaire se confondent. Quoiqu’il en soit le graphique appliqué à la statistique multidimensionnelle est un moyen d’expression. Cela suppose quelques libertés dans les choix et la référence à un comportement “conforme à la règle” peut être le signe d’une certaine absence d’imagination. Ce n’est évidemment pas une raison pour faire n’importe quoi. En conclusion, l’analyse en composantes principales est une procédure simple utilisable de multiples façons. Elle donne à voir dans Rp . Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 46/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf A.B. Dufour & D. Chessel Références [1] G. Carrel. Caractérisation physico-chimique du Haut-Rhône français et de ses annexes : incidences sur la croissance des populations d’alevins. PhD thesis, Université Claude Bernard, Lyon 1, 1986. [2] J. Devillers, J. Thioulouse, and W. Karcher. Chemometrical evaluation of multispecies-multichemical data by means of graphical techniques combined with multivariate analyses. Ecotoxicology and Environnemental Safety, 26 :333–345, 1993. [3] J. Estève. Les méthodes d’ordination : éléments pour une discussion. In J.M. Legay and R. Tomassone, editors, Biométrie et Ecologie, pages 223–250. Société Française de Biométrie, Paris, 1978. [4] O. Gaschignard-Fossati. Répartition spatiale des macroinvertébrés benthiques d’un bras vif du Rhône. Rôle des crues et dynamique saisonnière. PhD thesis, Université Claude Bernard, Lyon 1, 1986. [5] D.W. Goodall. Objective methods for the classification of vegetation iii. an essay in the use of factor analysis. Australian Journal of Botany, 2 :304–324, 1954. [6] D.J. Hand, F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. A handbook of small data sets. Chapman & Hall, London, 1994. [7] P. Jolicœur and J.E. Mosimann. Size and shape variation in the painted turtle. a principal component analysis. Growth, 24 :339–354, 1960. [8] J. Kervella. Analyse de l’attrait d’un produit : exemple d’une comparaison de lots de pêches. In 2émes journées européennes Agro-Industrie et Méthodes Statistiques, pages 103–106. Association pour la Statistique et ses Utilisations, Paris, Nantes 13-14 juin 1991, 1991. [9] J.M. Lascaux. Analyse de la variabilité morphologique de la truite commune (Salmo trutta L.) dans les cours d’eau du bassin pyrénéen méditerranéen. PhD thesis, INP Toulouse, 1996. [10] R. Prodon and J.D. Lebreton. Breeding avifauna of a mediterranean succession : the holm oak and cork oak series in the eastern pyrénées. 1 : Analysis and modelling of the structure gradient. Oı̈kos, 37 :21–38, 1981. [11] P. Usseglio-Polatera and Y. Auda. Influence des facteurs météorologiques sur les résultats de piégeage lumineux. Annales de Limnologie, 23 :65–79, 1987. Logiciel R version 2.11.1 (2010-05-31) – tdr61.rnw – Page 47/47 – Compilé le 2010-11-24 Maintenance : S. Penel, URL : http://pbil.univ-lyon1.fr/R/pdf/tdr61.pdf