Abstract
Department of Mathematics and HEC-École de Gestion
Statistics and Management Research Seminar
Tuesday 25 April 2006, 12:30, Salle du Conseil, Building B31

Aurélie Lemmens (KULeuven)
Bagging and Boosting Classification Trees to Predict Churn: Insights from the US Telecom Industry

We bring to the attention of marketers bagging and boosting, two recent classification techniques originating from the statistical machine learning literature. Bagging (Breiman, 1996) and stochastic gradient boosting (Friedman, 2002) consist of sequentially estimating a binary choice model from resampled versions of a given calibration sample; the resulting classifiers are ultimately aggregated into a final choice model. While bagging is conceptually simple and easy to use, boosting relies on a more sophisticated weighted resampling procedure.

Bagging and boosting can be used successfully for consumer choice modeling. The present study focuses on a binary choice problem: predicting customers' churn behavior at an anonymous U.S. wireless telecom company. Using a cross-sectional database of more than 250,000 customers provided by the Teradata Center for Customer Relationship Management at Duke University, we show that bagging and boosting, based on decision trees, significantly outperform classical classification methods such as the binary logit model, especially in terms of predictive accuracy. If the telecom company adopted bagging and/or boosting, it could expect the gains of its future retention campaign (as computed in Neslin et al., 2004) to increase by an additional $3,000,000 compared to traditional models.

We also show that churn prediction requires a balanced sampling scheme. However, such a scheme typically overestimates the number of churners in real life and therefore requires an appropriate bias correction. We discuss two simple correction methods for bagging and boosting from which marketers may benefit when predicting churn.
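To make the two ensemble ideas concrete, the following is a minimal sketch of bagging and stochastic gradient boosting of classification trees on a binary churn-style problem, using scikit-learn rather than the authors' own implementation; the synthetic data, sample sizes, and hyperparameters are illustrative assumptions, not figures from the study.

```python
# Sketch: bagging (Breiman, 1996) and stochastic gradient boosting
# (Friedman, 2002) of classification trees on a synthetic binary
# "churn" dataset. This is NOT the talk's original setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a churn calibration sample (about 10% churners).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: fit a tree on each bootstrap resample of the calibration
# sample, then aggregate the trees' votes (the default base learner of
# BaggingClassifier is a decision tree).
bag = BaggingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Stochastic gradient boosting: each new tree is fit on a random
# subsample of the data (subsample < 1), weighted toward the current
# residuals of the ensemble.
boost = GradientBoostingClassifier(n_estimators=100, subsample=0.5,
                                   random_state=0).fit(X_tr, y_tr)

for name, model in [("bagging", bag), ("boosting", boost)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```

On real churn data one would compare these scores against a binary logit benchmark, as the study does.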
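The bias introduced by balanced sampling can be undone with a standard prior-probability correction; the sketch below shows that generic correction (not necessarily either of the two methods discussed in the talk), and the sample and population churn rates are illustrative assumptions.

```python
# Sketch: prior-probability correction for scores estimated on a
# balanced (oversampled) training set. Training on a 50/50 churn sample
# overstates churn probabilities relative to a population where churn
# is rare; this rescales scores back to the true base rate.
# sample_rate and true_rate below are hypothetical values.
def correct_priors(p, sample_rate=0.5, true_rate=0.02):
    """Rescale a probability p estimated under sample_rate of churners
    to the population with true_rate of churners."""
    num = p * true_rate / sample_rate
    den = num + (1 - p) * (1 - true_rate) / (1 - sample_rate)
    return num / den

# A score of 0.6 on the balanced sample maps to roughly 0.03 in a
# population where only 2% of customers churn.
print(correct_priors(0.6))
```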
(This is joint work with Christophe Croux, KULeuven.)

References:
Breiman, Leo. 1996. Bagging Predictors. Machine Learning 24, 123-140.
Friedman, Jerome H. 2002. Stochastic Gradient Boosting. Computational Statistics and Data Analysis 38, 367-378.
Neslin, Scott A., Sunil Gupta, Wagner Kamakura, Junxiang Lu, and Charlotte Mason. 2004. Defection Detection: Improving Predictive Accuracy of Customer Churn Models. Working Paper, Teradata Center for Customer Relationship Management at Duke University.

All are cordially invited (sandwiches will be offered to participants).