Des arbres de décision aux forêts aléatoires, état de l'art (From decision trees to random forests: a state of the art)
Transcription
Badih Ghattas, Université d'Aix Marseille, [email protected]

Outline
◮ Classification and Regression Trees (CART)
◮ Extensions:
  ◮ Oblique trees
  ◮ Multidimensional and functional outputs
  ◮ Time series covariates
  ◮ Bayesian approach
  ◮ Decision trees for clustering
  ◮ Decision trees for density estimation
  ◮ Decision trees and other methods: logistic regression, SVM
  ◮ Decision trees, distances, consensus
◮ Aggregating classifiers:
  ◮ Stacking
  ◮ Bagging, Boosting
  ◮ Random Forests
  ◮ Cforest and other competitors

CART

In the context of supervised learning
We wish to estimate f using the dataset at hand,
D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}, \qquad (X_i, Y_i) \ \text{iid} \sim P(X, Y),
where P(X, Y) is the joint distribution of (X, Y). We must choose f within a class of functions with unknown parameters, for example
y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_p x_p.
The data then yield an estimate: D_n \Longrightarrow f_n(X, D_n).

The model
Search for a partition of the space X and assign a value of Y to each class of the partition.
In regression:
E[Y \mid X = x] = \sum_{j=1}^{q} c_j \, 1_{N_j}(x), \qquad \hat c_j = \frac{1}{\mathrm{Card}\{i : x_i \in N_j\}} \sum_{i : x_i \in N_j} Y_i.
In classification (Y discrete with J levels): \hat c_j = the most frequent class in N_j(x).
General framework: a linear or convex combination of nonlinear functions.

Example: predicting Ozone concentration
[Figure: a regression tree predicting Ozone from Wind and Solar.R (root split Wind < 6.6, then splits such as Solar.R < 153, Wind < 8.9, Wind < 10.6, Solar.R < 232.5, Solar.R < 79.5), shown next to the induced partition of the (Wind, Solar.R) plane with the fitted leaf means.]

Two stages: maximal tree and pruning
All the observations are in the root node. Splitting rule: one variable and a threshold. How is the split chosen? Use the deviance to measure the heterogeneity of a node:
R(t) = \sum_{x_n \in t} (y_n - \bar y(t))^2.

Optimal splits: minimize the children's deviance
Minimize the total heterogeneity of the new nodes. Let s be a split of the form x_m < a:
\Delta R(s, t) = R(t) - \big(R(t_L) + R(t_R)\big) \ge 0,
and the retained split s^* satisfies \Delta R(s^*, t) = \max_{s \in \Sigma} \Delta R(s, t).
In classification,
R(t) = - \sum_{j=1}^{J} p_j(t) \log p_j(t),
where p_j(t) is the proportion of class j in node t.
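To make the splitting criterion concrete, here is a minimal base-R sketch of the node deviance R(t) and of the gain Delta R(s, t) used to rank candidate splits. The function and object names (node_dev, delta_R, best_split) are illustrative, not part of any package, and the scan over observed values is only an approximation of what CART implementations do (they usually cut between consecutive values).

```r
# Deviance of a node t: R(t) = sum over observations in t of (y - mean(y))^2.
node_dev <- function(y) sum((y - mean(y))^2)

# Deviance reduction of the candidate split "x < a" applied to node t.
delta_R <- function(y, x, a) {
  node_dev(y) - (node_dev(y[x < a]) + node_dev(y[x >= a]))
}

# Best threshold for one variable: scan the observed values and keep the
# split maximising Delta R (which is always >= 0).
best_split <- function(y, x) {
  cand  <- sort(unique(x))[-1]      # drop the minimum so both children are non-empty
  gains <- sapply(cand, function(a) delta_R(y, x, a))
  c(threshold = cand[which.max(gains)], gain = max(gains))
}

# Illustration on the Ozone example above (rows with missing values dropped);
# the threshold found should be close to the Wind < 6.6 root split of the figure,
# up to the exact placement of the cut point.
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind")])
best_split(aq$Ozone, aq$Wind)
```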
Stopping rule
Split the root t into two children t_L and t_R, and do the same recursively.
Stop when at least one of the following conditions is satisfied:
◮ very few observations in a node (minsize);
◮ \Delta R(s, t) is lower than a fixed threshold (mindev).
The maximal tree:
◮ has a low error over the learning sample,
◮ is poor over test samples,
◮ is too big, thus unreadable.

Penalized deviance
Tree deviance:
R(T) = \frac{1}{N} \sum_{t \in \tilde T} R(t).
Penalized deviance:
R_\alpha(T) = \frac{1}{N} \sum_{t \in \tilde T} R(t) + \alpha |\tilde T|.
For the branch T_t rooted at node t,
R_\alpha(T_t) = \sum_{t' \in \tilde T_t} R(t') + \alpha |\tilde T_t|,
and, for the node alone,
R_\alpha(t) = R(t) + \alpha.

Selecting the optimal tree
Selection is based on a deviance estimate for each tree in the pruned sequence. Suppose the data set S is randomly partitioned as S = S^{train} \cup S^{test}. One may build the sequence using S^{train} and select the best tree by estimating the deviance over S^{test}:
\hat R^{test}(T) = \frac{1}{|S^{test}|} \sum_{t \in \tilde T} \hat R^{test}(t),
\qquad
\hat R^{test}(t) = \sum_{x_i \in S^{test} \cap t} (y_i - \bar y_t)^2,
where \bar y_t is the value predicted in node t, computed from the n_t training observations falling in t. An original cross-validation procedure may also be used.

Simulation, p = 1
[Figure]

Simulation, p = 2
[Figure]

Advantages and disadvantages
◮ Works in high dimension
◮ Variables of different natures
◮ Regression and classification
◮ Model easy to interpret
◮ Interactions between variables are used
◮ Deals with missing data
◮ Variable importance
◮ Many possible extensions
Disadvantage: instability.

Instability
[Figure]
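As a complement, here is a minimal sketch of the two stages with the rpart package (a CART implementation in R): cp and minsplit play roughly the role of the mindev and minsize thresholds above, and the cp table drives the cost-complexity pruning. Refitting the same call on bootstrap resamples of the data is an easy way to reproduce the instability illustrated above; the dataset and object names are illustrative choices.

```r
library(rpart)
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind")])

# Stage 1: grow a deep, near-maximal tree (small cp and minsplit act like
# loose mindev / minsize stopping thresholds).
big <- rpart(Ozone ~ Solar.R + Wind, data = aq, method = "anova",
             control = rpart.control(cp = 0.001, minsplit = 5))

# Stage 2: cost-complexity pruning.  The cp table gives, for each value of the
# penalty, the size of the corresponding subtree and its cross-validated error.
printcp(big)
best_cp <- big$cptable[which.min(big$cptable[, "xerror"]), "CP"]
pruned  <- prune(big, cp = best_cp)
plot(pruned); text(pruned)
```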
Extensions

Types of extensions
Extension                         Modification                    Authors                                    Software
Specific form for the partition   Class of splits used            Breiman et al., Murthy et al.              rpart, OC1
Arbitrary form for the partition  Class of splits used            Morimoto, 2001                             NA
Regression within leaves          Output estimation + criterion   Chaudhuri et al. (1994, 1995), TSR, FACT   rpart
Bootstrap within nodes            -                               Danneger et al., 1999                      NA
X_j in R^{p_j}                    Class of splits used            Coming soon                                rpart variant
X_j in L2(R)                      Class of splits                 Yamada et al. 2003, Roche & Ghattas 2009   R+C
Y in R^q                          Criterion modification, L2      Yu 1999, Nerini & Ghattas 2007             R code, rpartmv
Y in {0,1}^q                      Criterion modification          Zhang, 1999                                NA
Y in L2(R)                        Criterion modification, KL      Segal 1992, Nerini & Ghattas 2007          R+C
Clustering                        Criterion modification          Fraiman et al., 2012                       R package
Density estimation                Specific                        J. Klemelä                                 delt package

OC1 (Murthy et al., 1994)
[Figure]

Difficulties
Find, at each level, oblique splits of the form
\sum_{m=1}^{p} a_m x_m + a_{p+1} \le s.
◮ NP-hard.
◮ Two families of solutions: deterministic (Breiman et al., 1984) and stochastic (Murthy et al., 1994).
◮ Advantages: less complex trees, sometimes with a better generalization error.
◮ Disadvantage: interpretation of the splits.

Oblique regression trees
[Figure]

Multidimensional or functional output
◮ Predict a vector and/or a function: Y \in R^d, or Y \in L^2(R).
◮ Predict the daily ozone profile.
◮ Predict the size distribution of zooplankton (an indicator of climate change).
◮ Predict sea-salinity profiles.
The regression function has the form
f(x) = E[Y \mid X = x] = \sum_{j=1}^{q} f_j \, I(X \in N_j).

Predicting salinity profiles
[Figure]

Modeling the densities of zooplankton sizes
[Figure]

How is it done?
Main difficulty: generalize the univariate criterion
R(t) = \sum_{x_n \in t} (y_n - \bar y(t))^2.
Natural idea:
R(t) = \sum_{x_n \in t} \| y_n - \bar y(t) \|^2,
where the norm must be chosen so that the property
\Delta R(s, t) = R(t) - \big(R(t_L) + R(t_R)\big) \ge 0
still holds.
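When the chosen norm is the Euclidean one (the multivariate case discussed next), the node heterogeneity and the split gain generalize directly. A small base-R sketch mirroring the univariate delta_R above; the names are illustrative and both children are assumed non-empty.

```r
# Heterogeneity of a node when Y is vector-valued: sum of squared Euclidean
# distances of the responses in the node to the node mean.
node_dev_mv <- function(Y) {                  # Y: n x d matrix, one row per observation
  sum(rowSums(sweep(Y, 2, colMeans(Y))^2))
}

# The split gain keeps the same form as in the univariate case.
delta_R_mv <- function(Y, x, a) {
  node_dev_mv(Y) - (node_dev_mv(Y[x < a, , drop = FALSE]) +
                    node_dev_mv(Y[x >= a, , drop = FALSE]))
}
```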
Multivariate and functional output
◮ Multivariate case: when Y is a vector, Y \in R^d,
  ◮ if the d components are independent, we can use the Euclidean norm;
  ◮ if not, we transform Y by projection onto an orthogonal basis, where the Euclidean norm may then be used.
◮ Functional case: we can use f-divergences,
  ◮ Kullback-Leibler,
  ◮ chi-squared distance,
  ◮ Hellinger distance,
with, for instance,
K(y_i, y_j) = \int y_j(t) \, \ln\frac{y_j(t)}{y_i(t)} \, dt,
\qquad
H^2(y_i, y_j) = \int \Big(\sqrt{y_i(t)} - \sqrt{y_j(t)}\Big)^2 dt.

Simulated example
[Figure]

Estimated tree
[Figure]

Bayesian approach
◮ Principle: look for the maximum a posteriori (MAP) tree within the space of trees.
◮ How: a stochastic search algorithm may be used, defining random moves within the tree space.
◮ See Denison, George...

Other usage
◮ Within survival models (Intrator, 1991)
◮ Within mixture models, ZIP models (Hu W., 2010)
◮ Combined with kernel models (Torgo, 2000)
◮ ...

Aggregating classifiers

Why?
◮ Instability?
◮ Multiple models?
◮ Boost?

Stacked regression, 1995
◮ Construct K different models \hat f_k, k = 1..K, over the data set at hand E = \{(y_i, x_i), i = 1..n\}.
◮ Define the stacked model as
\hat f^{(K)}(x) = \sum_{k=1}^{K} \beta_k \hat f_k(x),
where the \beta_k are the ridge regression coefficients obtained by regressing the y_i on the \hat f_k(x_i).
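A minimal R sketch of this idea, assuming two base models and MASS::lm.ridge for the stacking coefficients; in stacked regression the beta_k are normally fitted on cross-validated predictions, whereas in-sample predictions are used here only to keep the sketch short. The dataset and object names are illustrative.

```r
library(rpart)
library(MASS)                      # lm.ridge, for the stacking coefficients beta_k

aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Two base models f_k: a linear model and a regression tree.
f1 <- lm(Ozone ~ ., data = aq)
f2 <- rpart(Ozone ~ ., data = aq, method = "anova")

# Stacked model: ridge-regress y on the base models' predictions.
# (Proper stacking would use cross-validated predictions here.)
Z <- data.frame(Ozone = aq$Ozone,
                p1 = predict(f1, aq),
                p2 = predict(f2, aq))
stack <- lm.ridge(Ozone ~ p1 + p2, data = Z, lambda = 1)
coef(stack)                        # intercept and stacking weights beta_1, beta_2
```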
Bagging, Boosting, ...
◮ Freund: Weak Learner => Strong Learner (voting among several "learners"), "Boosting" (1995).
◮ Breiman: Unstable "Classifier" => Stable (by bootstrap aggregation), "Bagging" (1996), "Arcing" (1999).

Aggregation
In regression,
f^{(a)}(x) = \frac{1}{K} \sum_{k=1}^{K} \hat f_k(x).
In classification,
f^{(a)}(x) = \mathrm{Argmax}_j \sum_{k=1}^{K} 1_{\hat f_k(x) = j}.

Boosting (Y \in \{0, 1\})
\epsilon_k = \sum_{i=1}^{n} d_k(i) \, |\hat y_k(i) - y_i|,
\qquad
\beta_k = \frac{1 - \epsilon_k}{\epsilon_k},
\qquad
w_k = \log(\beta_k),
\qquad
d_{k+1}(i) = d_k(i) \, \beta_k^{\,|\hat y_k(i) - y_i|},
and the aggregated classifier is
\hat y^{a}(x) = 1 \ \text{ if } \ \sum_{k : \hat y_k(x) = 1} w_k \ \ge \ \sum_{k : \hat y_k(x) = 0} w_k.

Example: breast cancer
[Figure: learning and test errors of the boosted classifier.]

Datasets from the ML benchmark
            Name      Variables  Observations  Levels
Simulated   waveform  22         5000          3
            Ringnorm  21         7400          2
Real        Iono      35         351           2
            Glass     10         214           6
            Breast    10         683 (+16)     2
            DNA       61         3190          3
            Vowel     11         990           11
            Vtrl1     41         622           2
            Vtrl3     41         622           2

Some remarks from Breiman (99) and F.S. (95, 96, 99)
◮ Bagging + CART: a high gain in generalization.
◮ Boosting works because of adaptive re-sampling, not because of the particular form of the algorithm.
◮ Boosting gives better results than bagging in a majority of the tests.
◮ No tuning in these methods.
◮ The observation weights vary without converging; this is essential.
◮ Instability is also essential: boosting improves neither LDA nor kNN.
◮ The experiments of F.S. (95), Drucker and Cortes (97) and Quinlan (96) show that boosting trees yields a fast, well-performing classifier.
◮ Re-sampling and re-weighting the observations are equivalent in boosting.
◮ In the cited experiments, CART outperformed boosting in 5 cases out of 39; this never happened with bagging. (Why? Sample size, outliers?)
◮ Boosting CART gives very different trees.
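To make the reweighting scheme of the Boosting slide above concrete, here is a hand-rolled R sketch that follows those update rules with depth-one rpart trees ("stumps") as weak learners. The kyphosis data set (shipped with rpart), the 0/1 coding and all object names are illustrative choices, not those of the cited experiments.

```r
library(rpart)

data(kyphosis)
y <- as.numeric(kyphosis$Kyphosis == "present")   # y_i in {0, 1}
X <- kyphosis[, c("Age", "Number", "Start")]
n <- nrow(X); K <- 20
d <- rep(1 / n, n)                                # d_1(i): uniform weights
w <- numeric(K); fits <- vector("list", K)

for (k in 1:K) {
  dat <- data.frame(X, y = factor(y), wts = n * d)  # rpart only needs relative weights
  fits[[k]] <- rpart(y ~ Age + Number + Start, data = dat, weights = wts,
                     method = "class", control = rpart.control(maxdepth = 1))
  yhat <- as.numeric(as.character(predict(fits[[k]], X, type = "class")))
  eps  <- sum(d * abs(yhat - y))                  # epsilon_k
  if (eps <= 0 || eps >= 0.5) break               # degenerate weak learner: stop
  beta <- (1 - eps) / eps                         # beta_k
  w[k] <- log(beta)                               # vote weight w_k
  d    <- d * beta^abs(yhat - y)                  # d_{k+1}(i) = d_k(i) beta_k^{|yhat - y|}
  d    <- d / sum(d)                              # renormalise
}

# Aggregated classifier: predict 1 iff the trees voting 1 carry at least half
# of the total vote weight, as in the final rule above.
score <- rep(0, n)
for (k in which(w != 0)) {
  score <- score + w[k] *
    as.numeric(as.character(predict(fits[[k]], X, type = "class")))
}
yhat_boost <- as.numeric(score >= sum(w) / 2)
mean(yhat_boost != y)                             # training error of the aggregate
```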
Random Forests
◮ Construct bootstrap samples of the data.
◮ Leave the out-of-bag (OOB) sample aside.
◮ At each node of the tree, select the optimal split by searching over only about log(p) variables, chosen at random among the p available ones.
◮ Don't prune the trees.
◮ Aggregate the trees as in bagging.
◮ Random Features: random linear combinations of the selected variables at each node.

RF properties
◮ Each tree has a low bias (but a high variance).
◮ The trees are not correlated; the correlation is defined as the one computed between the trees' predictions over the OOB samples.
◮ Very high performance: the "best off-the-shelf" classifier.
◮ Reduced computational complexity.
◮ Possible parallelization.

Variable importance in RF
◮ Set N_i = 0, M_i = 0 and M_{ij} = 0, for i = 1..N and j = 1..p, where
  ◮ N_i = number of times observation i appears in an OOB sample,
  ◮ M_i = number of times observation i appears in an OOB sample and is misclassified,
  ◮ M_{ij} = number of times observation i appears in an OOB sample and is misclassified after permutation of the values of variable j in that OOB sample.
◮ For each variable j = 1..p and each tree k = 1..K of the forest:
  ◮ if observation i is in OOB_k, set N_i = N_i + 1;
  ◮ if observation i is in OOB_k and misclassified, set M_i = M_i + 1;
  ◮ permute at random the values of variable j in OOB_k;
  ◮ if observation i is in OOB_k and misclassified after the permutation, set M_{ij} = M_{ij} + 1.
◮ The importance of variable j is \frac{1}{n} \sum_i Z_i(j), where Z_i(j) = \frac{M_{ij} - M_i}{N_i}.
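This permutation measure is essentially what the randomForest R package reports as the mean decrease in accuracy when importance = TRUE (up to implementation details such as normalisation); note that its default mtry is sqrt(p) for classification rather than the log(p) quoted above. A minimal usage sketch, assuming the package is installed:

```r
library(randomForest)

rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                   mtry = 2,            # number of variables tried at each split
                   importance = TRUE)   # compute OOB permutation importance

importance(rf, type = 1)   # type 1: mean decrease in accuracy (permutation based)
varImpPlot(rf)
print(rf)                  # OOB error estimate and confusion matrix
```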
Other approaches
◮ Cforest:
  ◮ Hothorn T., Hornik K. & Zeileis A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15, 651-674.
  ◮ Hothorn T., Hornik K. & Zeileis A. (2007). party: A Laboratory for Recursive Part(y)itioning. The party package.
◮ Extremely randomized trees, P. Geurts, 2003.

Justifying and extending

State of the art
◮ Bayesian justifications.
◮ Comparison with generalized additive models.
◮ Connections with game theory.
◮ Comparison with SVM.
◮ Boosting and regression (Drucker, Schapire R., Friedman's MART).
◮ Multi-class approaches.
◮ Density estimation.

Multi-class generalizations: some principles
◮ AdaBoost.M2, Schapire, 1996: divide, learn and aggregate.
◮ AdaBoost.OC: error-correcting-code subdivisions.
◮ AdaBoost.ECC, Guruswami et al., 1997: error-correcting-code subdivisions.
◮ SAMME, Hastie 2007: direct generalization of AdaBoost.M1.
Different strategies are used to select the binary subproblems: one versus one, or one versus the rest.

SAMME (Hastie, 2007) versus binary AdaBoost
Binary AdaBoost (Y \in \{0, 1\}):
\epsilon_k = \sum_{i=1}^{n} d_k(i) \, |y_i - \hat y_k(i)|,
\quad \beta_k = \frac{1 - \epsilon_k}{\epsilon_k},
\quad d_{k+1}(i) = d_k(i) \, \beta_k^{\,|y_i - \hat y_k(i)|},
\quad w_k = \log(\beta_k),
\quad \hat y^{a}(x) = 1_{\{\sum_{k : \hat y_k(x) = 1} w_k \ \ge \ \sum_{k : \hat y_k(x) = 0} w_k\}}.
SAMME (Y \in \{1, \ldots, J\}):
\epsilon_k = \sum_{i=1}^{n} d_k(i) \, 1_{\{y_i \ne \hat y_k(i)\}},
\quad \beta_k = (J - 1) \, \frac{1 - \epsilon_k}{\epsilon_k},
\quad d_{k+1}(i) = d_k(i) \, \beta_k^{\,1_{\{y_i \ne \hat y_k(i)\}}},
\quad w_k = \log(\beta_k),
\quad \hat y^{a}(x) = \mathrm{Argmax}_j \Big\{ \sum_{k : \hat y_k(x) = j} w_k \Big\}.
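A small numeric illustration of the (J - 1) factor: SAMME only requires a weak learner to beat random guessing among J classes (error below 1 - 1/J), whereas the binary rule would reject it as soon as its error exceeds 1/2. The values below are an illustrative example, not taken from the cited experiments.

```r
eps <- 0.6; J <- 5                    # weighted error of one weak learner, 5 classes
log((1 - eps) / eps)                  # binary AdaBoost weight w_k: negative (rejected)
log((J - 1) * (1 - eps) / eps)        # SAMME weight w_k: positive, the learner still votes
```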
Density estimation
Di Marzio, Rigollet & Tsybakov, Bourel & Ghattas, J. Klemelä...
◮ Ridgeway, G. (2002). Looking for lumps: boosting and bagging for density estimation. Computational Statistics & Data Analysis, 38(4), 379-392.
◮ Rigollet, P. & Tsybakov, A. B. (2007). Linear and convex aggregation of density estimators. Mathematical Methods of Statistics, 16(3), 260-280.
◮ Rosset, S. & Segal, E. (2002). Boosting density estimation. In Advances in Neural Information Processing Systems 15, 641-648. MIT Press.
◮ Smyth, P. & Wolpert, D. (1999). Linearly combining density estimators via stacking. Machine Learning, 36(1-2), 59-83.
◮ Song, X., Yang, K. & Pavel, M. (2004). Density boosting for Gaussian mixtures. Neural Information Processing, 3316, 508-515.

Call for submission: special issue of the Journal de la SFDS
This special issue concerns decision trees, their variants, their extensions, their applications and the available software:
◮ Oblique trees, trees and linear models, TSR, trees and mixture models, trees and ZIP models,
◮ Trees for multivariate and functional output (continuous/discrete), trees for time series,
◮ Other extensions, unsupervised cases: trees for clustering, trees for density estimation,
◮ Different types of trees and relations to other models: Bayesian trees, dyadic trees, trees vs. logistic models, trees vs. SVM,
◮ Aggregating trees: random forests, boosting, bagging, forest garrote.
Call for submission: January 2013. Submission: January 2013 - April 2013. Reviewing: May 2013 - July 2013. Final submission and reviewing: July 2013 - September 2013.

References 1
◮ Breiman L., Friedman J. H., Olshen R., Stone C. J. (1984). Classification And Regression Trees. Wadsworth, Belmont, CA.
◮ Chaudhuri P., Huang M. C., Loh W. Y. and Yao R. (1994). Piecewise-polynomial regression trees. Statistica Sinica, 4, 143-167.
◮ Danneger F. (1999). Tree stability diagnostics and some remedies against instability. Submitted for publication.
◮ Leblanc M. and Tibshirani R. (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91(436), Theory and Methods, 1641-1650.
◮ Morimoto Y., Hiromu I., Morishita S. (2001). Efficient construction of regression trees with range and region splitting. Machine Learning, 45, 235-259.
◮ Segal M. R. (1992). Tree structured methods for longitudinal data. Journal of the American Statistical Association, 87(418), Theory and Methods, 407-418.
◮ Tibshirani R. (1996). Bias, variance, and prediction error for classification rules. Technical report, Statistics Department, University of Toronto.
◮ Vapnik V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
◮ Murthy S. K., Kasif S., Salzberg S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research.
◮ Hastie T., Tibshirani R. and Friedman J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Series in Statistics, New York, NY: Springer.
◮ Yamada Y., Suzuki E., Yokoi H., Takabayashi K. (2003). Decision-tree induction from time-series data based on a standard-example split test. Proceedings of the Twentieth International Conference on Machine Learning (ICML), Washington DC.
◮ Yu Y., Lambert D. (1999). Fitting trees to functional data, with an application to time-of-day patterns. Journal of Computational and Graphical Statistics, 8, 749-762.
◮ Zhang H. (1998). Classification trees for multiple binary responses. Journal of the American Statistical Association, 93.

References 2
◮ Breiman L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350-2383.
◮ Breiman L. (1996). Bagging predictors. Machine Learning, 24, 123-140.
◮ Breiman L. (1997). Arcing classifiers. The Annals of Statistics, 26(3), 801-849.
◮ Dietterich T. G. and Bakiri G. (1995). Solving multiclass problems via error-correcting output coding. Journal of Artificial Intelligence Research, 2, 263-286.
◮ Freund Y. and Schapire R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. To appear, Journal of Computer and System Sciences.
◮ Freund Y. and Schapire R. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference, 148-156.
◮ Schapire R. E. (1997). Using output codes to boost multiclass learning problems. In Machine Learning: Proceedings of the Fourteenth International Conference, 313-321.
◮ Schapire R., Freund Y., Bartlett P., Lee W. S. (1998). Boosting the margin: a new explanation of the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651-1686.