Abstract
Department of Mathematics and HEC-Ecole de Gestion
Statistics and Management Research Seminar
Tuesday, April 25, 2006, at 12:30 p.m. in the Salle du Conseil, building B31
Aurélie Lemmens
KULeuven
Bagging and Boosting Classification Trees to Predict Churn.
Insights from the US Telecom Industry.
We bring to the attention of marketers bagging and boosting, two recent classification techniques
originating from the statistical machine learning literature. Bagging (Breiman, 1996) and stochastic
gradient boosting (Friedman, 2002) consist of repeatedly estimating a binary choice model on
resampled versions of a given calibration sample. The resulting classifiers are ultimately aggregated
into a final choice model. While bagging is conceptually very simple and easy to use, boosting is
based on a more sophisticated weighted resampling procedure; both procedures are sketched in code
below. Bagging and boosting can successfully be used for consumer choice modeling. The present
study focuses on a binary choice problem, namely predicting customers' churn behavior in an
anonymous U.S. wireless telecom company. Using a cross-sectional database of more than 250,000
customers provided by the Teradata Center for Customer Relationship Management at Duke University,
bagging and boosting (based on decision trees) significantly outperform classical classification
methods, such as the binary logit model, especially on the predictive accuracy criterion. If the
telecom company were to adopt bagging and/or boosting, it could expect the gains of its future
retention campaign (as computed in Neslin et al., 2004) to increase by an additional $3,000,000
compared to traditional models. We also show that churn prediction requires a balanced sampling
scheme. However, balanced sampling usually overestimates the number of churners observed in real
life and therefore requires an appropriate bias correction. We discuss two simple correction methods
for bagging and boosting, from which marketers may profit when predicting churn; a standard
correction of this kind is also sketched below. (This is joint work with Christophe Croux, KULeuven.)
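As a rough illustration of the two ensemble procedures described above, the sketch below fits bagged and stochastically boosted classification trees on a synthetic, churn-like imbalanced dataset. The use of scikit-learn, the synthetic data, and all parameter settings are illustrative assumptions, not the study's actual implementation.

```python
# Illustrative sketch only: scikit-learn stand-ins for bagging (Breiman, 1996)
# and stochastic gradient boosting (Friedman, 2002) on a churn-style binary task.
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the calibration sample (the real study uses a
# proprietary database of 250,000+ telecom customers); churners are rare.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

# Bagging: many trees fit independently on bootstrap resamples, then averaged.
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=100, random_state=0)

# Stochastic gradient boosting: trees fit sequentially, each on a random
# subsample of the data (subsample < 1.0 makes the boosting "stochastic").
boosting = GradientBoostingClassifier(n_estimators=100, subsample=0.5,
                                      random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    print(name, "AUC:", roc_auc_score(y_test, scores))
```

Note the design difference the abstract alludes to: the bagged trees are mutually independent and merely averaged, whereas each boosted tree depends on the residual errors of its predecessors.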
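The abstract does not spell out the two bias-correction methods the talk discusses. As a placeholder, the sketch below applies one standard Bayes-rule prior correction that maps probabilities estimated on a 50/50 balanced sample back to the true (much lower) churn rate; the function name and the rates used are hypothetical.

```python
# Illustrative sketch only: a textbook prior-probability correction for scores
# estimated on a balanced (50/50) resample when the population churn rate is
# much lower. Not necessarily either of the two methods the talk covers.
import numpy as np

def correct_balanced_scores(p_balanced, true_rate, sample_rate=0.5):
    """Map probabilities fitted under a resampled class balance back to the
    population balance via Bayes' rule (a prior-probability correction)."""
    num = p_balanced * true_rate / sample_rate
    den = num + (1.0 - p_balanced) * (1.0 - true_rate) / (1.0 - sample_rate)
    return num / den

# Example: a score of 0.6 on a balanced sample corresponds to a churn
# probability of about 0.03 when churners are only 2% of the population.
print(correct_balanced_scores(np.array([0.2, 0.5, 0.6, 0.9]), true_rate=0.02))
```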
References:
Breiman, Leo. 1996. Bagging Predictors. Machine Learning 24(2), 123-140.
Friedman, Jerome H. 2002. Stochastic Gradient Boosting. Computational Statistics & Data Analysis 38(4), 367-378.
Neslin, Scott A., Sunil Gupta, Wagner Kamakura, Junxiang Lu, and Charlotte Mason. 2004. Defection Detection: Improving Predictive Accuracy of Customer Churn Models. Working Paper, Teradata Center for Customer Relationship Management at Duke University.
All are cordially invited (sandwiches will be offered to participants).