Article type: Research article
Authors
1 M.Sc., Department of Computer Engineering, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran.
2 Assistant Professor, Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran.
Objective
In competitive markets, companies focus on building long-term relationships with their customers and strengthening their loyalty. Because acquiring new customers is costly, businesses tend to concentrate on retaining existing ones. Predicting which customers are likely to churn therefore plays a crucial role in shaping effective retention strategies. To predict churn and identify its drivers, companies use customer information and the historical data recorded about each customer. This paper investigates customer churn prediction in the banking industry using real customer data from one of the largest banks in Iran.
Methods
To analyze and predict customer behavior, two consecutive periods of equal length were considered, and customer behavior in the first period was used to predict a target variable in the second. Customer churn was defined as a significant drop in the average effective balance during the second period compared with the first. By processing a large volume of banking transactions from the first period and aggregating them at different levels, we extracted a variety of behavioral features for each customer. One fixed validation set and three training sets of different sizes were then selected. To address class imbalance, class weights were set according to the ratio of class sizes, so that the minority class received greater weight during training. To predict churn, widely used machine learning algorithms (Naive Bayes, k-Nearest Neighbors, Support Vector Machine, Logistic Regression, and Decision Tree) were applied, along with ensemble learning methods such as Random Forest, Adaptive Boosting, and Gradient Boosting. Deep learning methods were then applied, and a model incorporating modern components such as residual connections and layer normalization, similar to those used in state-of-the-art architectures, was proposed. Extensive experiments were conducted to evaluate the performance of all of these methods.
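The labeling and class-weighting steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: churn is taken to be a relative drop of at least a hypothetical 50% (`DROP_THRESHOLD = 0.5`) between the two period averages, since the abstract only says "significant drop"; the features in `aggregate_features` are hypothetical examples of aggregated behavioral features; and the class weights use one common ratio-based scheme, n_samples / (n_classes × n_c), which is the formula scikit-learn uses for `class_weight="balanced"`. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold (assumption): a >= 50% relative drop in the average
# effective balance counts as churn. The abstract only says "significant drop".
DROP_THRESHOLD = 0.5


def average_balance(daily_balances):
    """Mean effective balance over one period (0.0 if there are no observations)."""
    return mean(daily_balances) if daily_balances else 0.0


def churn_label(balances_p1, balances_p2, drop_threshold=DROP_THRESHOLD):
    """Return 1 if the period-2 average fell by at least `drop_threshold`
    relative to the period-1 average, else 0."""
    avg1 = average_balance(balances_p1)
    avg2 = average_balance(balances_p2)
    if avg1 <= 0:
        return 0  # no positive baseline balance, so no drop can be measured
    return int((avg1 - avg2) / avg1 >= drop_threshold)


def aggregate_features(transactions):
    """Aggregate period-1 (customer_id, signed_amount) records into a few
    hypothetical per-customer behavioral features."""
    amounts_by_customer = defaultdict(list)
    for customer_id, amount in transactions:
        amounts_by_customer[customer_id].append(amount)
    features = {}
    for customer_id, amounts in amounts_by_customer.items():
        features[customer_id] = {
            "n_transactions": len(amounts),
            "total_deposits": sum(a for a in amounts if a > 0),
            "total_withdrawals": sum(-a for a in amounts if a < 0),
            "mean_amount": mean(amounts),
        }
    return features


def class_weights(labels):
    """Ratio-based weights n_samples / (n_classes * n_c): rarer classes get larger weights."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


# Toy data: (period-1 daily balances, period-2 daily balances) per customer.
balances = {
    "A": ([100, 100, 100], [100, 90, 110]),  # no drop
    "B": ([200, 200, 200], [50, 50, 50]),    # 75% drop -> churn
    "C": ([80, 80], [70, 70]),               # 12.5% drop, below threshold
    "D": ([50, 50], [60, 60]),               # balance grew
}
labels = {cid: churn_label(p1, p2) for cid, (p1, p2) in balances.items()}
print(labels)                                # {'A': 0, 'B': 1, 'C': 0, 'D': 0}
print(class_weights(list(labels.values())))  # class 1 -> 2.0, class 0 -> about 0.667
print(aggregate_features([("A", 500), ("A", -200), ("B", -150)]))
```

With these toy numbers the single churner receives weight 2.0 and each non-churner about 0.67, so misclassifying a churner costs three times as much during training. Weights of this form can be passed to scikit-learn estimators that accept a `class_weight` parameter, such as `LogisticRegression` or `DecisionTreeClassifier`.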
Results
The results showed that the ensemble learning algorithms and the proposed deep learning models outperformed the baseline models, and that increasing the size of the training set improved model performance. Among the traditional machine learning classifiers, the decision tree obtained the highest validation AUC-ROC, with 0.8531 and 0.8597 on two of the training sets. The gradient boosting model obtained the highest validation AUC-ROC overall, with 0.8984 and 0.9010. The single deep learning models achieved AUC-ROC values of 0.8825, 0.8909, and 0.8958, outperforming all traditional methods and two of the ensemble learning methods while remaining competitive with gradient boosting.
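The results are reported as AUC-ROC, where 0.5 corresponds to random ranking and 1.0 to perfect separation of churners from non-churners. As a minimal sketch (not the evaluation code used in the paper), the metric can be computed directly from its probabilistic interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The function name `auc_roc` and the toy scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy validation set: 1 = churned; scores are hypothetical model outputs.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.3, 0.4, 0.5, 0.1]
print(auc_roc(y_true, y_score))  # 5 of 6 pairs ranked correctly, about 0.833
```

The pairwise loop is O(P × N) and only meant to make the definition concrete; for validation sets of realistic size, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.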
Conclusion
Extracting behavioral features from customers' banking transactions and applying ensemble methods, together with the proposed deep learning models, is effective for predicting banking customer churn, in particular when churn is defined as a significant decrease in the average effective balance.