Article type: Scientific research article
Authors
1 Department of Financial Markets and Institutions, Faculty of Accounting and Financial Sciences, College of Management, University of Tehran, Tehran, Iran
2 Department of Financial Engineering, Faculty of Accounting and Financial Sciences, College of Management, University of Tehran, Tehran, Iran
Abstract
Objective
This study aims to forecast the performance of equity funds using supervised machine-learning algorithms. It seeks to identify the key drivers of fund performance and to propose an advanced forecasting framework that enables investors to pinpoint funds capable of generating positive alpha. Investors can then make better-informed capital-allocation decisions and earn higher returns than they would from passive or underperforming funds. Beyond benefiting investors, the approach can also improve overall market efficiency and the allocation of capital.
Methods
In terms of purpose, the study is both developmental and applied. We collected and cleaned data on 23 variables for 12 equity funds. Four supervised learning models, two linear (linear regression and elastic net) and two tree-based (random forest and gradient boosting), were implemented in Python. To maximize predictive accuracy and mitigate overfitting, the hyperparameters of each algorithm were tuned via cross-validation. The dataset was split into a training set (80%) and a test set (20%). After the models were fitted with the optimized hyperparameters, their out-of-sample performance was evaluated on the held-out test set using three accuracy metrics: mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). Shapley Additive Explanations (SHAP) were used for model interpretability and feature attribution, and the relative predictive accuracy of the algorithms was compared with the Diebold–Mariano test.
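A minimal sketch of the evaluation protocol described above, written in plain Python so it runs without third-party packages. The source does not say whether the 80/20 split was random or chronological, nor which cross-validation scheme was used; this sketch assumes a time-ordered split and forward-chaining folds, which avoid fitting on observations that postdate the validation or test period. All function names are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple


def chronological_split(rows: Sequence, train_frac: float = 0.8) -> Tuple[list, list]:
    """Split date-sorted rows into a training head and a held-out test tail.

    Assumption: rows are already sorted by date, so no shuffling is done and
    every test observation is later than every training observation.
    """
    cut = int(len(rows) * train_frac)
    return list(rows[:cut]), list(rows[cut:])


def forward_chaining_folds(n_train: int, n_folds: int):
    """Yield (fit_indices, validation_indices) for time-ordered CV on the training set.

    Each fold fits on an expanding window and validates on the block right after
    it (an assumed scheme; the source only says "cross-validation").
    """
    block = n_train // (n_folds + 1)
    if block == 0:
        raise ValueError("too few training rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        yield list(range(k * block)), list(range(k * block, (k + 1) * block))


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, less sensitive to a few large misses than MSE."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train, test = chronological_split(list(range(10)))
    print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
    for fit_idx, val_idx in forward_chaining_folds(len(train), 3):
        print(fit_idx, val_idx)
    # Toy target values, made up for illustration (not study data).
    y_true = [0.010, -0.020, 0.005]
    y_pred = [0.012, -0.015, 0.000]
    print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Tuning would then mean scoring each candidate hyperparameter setting by its average validation error across the folds, refitting the best setting on the full training set, and computing the three metrics once on the test set. Note that RMSE is a monotone transform of MSE, so the two always rank models identically; MAE is the metric that can disagree with them.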
Results
Tree-based models (gradient boosting and random forest) significantly outperformed linear models (linear regression and elastic net) on the evaluation metrics for 11 of the 12 funds. The value-added variable and the market return emerged as the most important return drivers across nearly all models and showed the strongest association with changes in alpha. Variables such as total net assets (TNA) and fund age also carried high importance in the linear models, suggesting that fund size and track record contribute to performance persistence. By contrast, the nonlinear models placed greater emphasis on features such as value-added and tended to down-weight many of the remaining variables.
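The claim that the tree-based models significantly outperformed the linear ones rests on the Diebold–Mariano test named in the Methods. The source does not specify the loss function, the forecast horizon, or whether a small-sample correction was applied, so the sketch below assumes squared-error loss, the standard-normal approximation, and a long-run variance built from autocovariances up to lag h-1. It is a stdlib-only illustration, not the authors' implementation; `diebold_mariano` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(e1: Sequence[float], e2: Sequence[float], h: int = 1) -> Tuple[float, float]:
    """Diebold-Mariano test of equal predictive accuracy under squared-error loss.

    e1, e2: test-set forecast errors of two models on the same dates.
    h: forecast horizon; autocovariances of the loss differential up to lag h-1
       enter the long-run variance (rectangular window).
    Returns (statistic, two-sided p-value from the standard normal).
    A negative statistic means model 1 has the lower average squared error.
    """
    if len(e1) != len(e2) or len(e1) < 2:
        raise ValueError("need two equally long error series with at least 2 points")
    d = [a * a - b * b for a, b in zip(e1, e2)]  # loss differential d_t
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k: int) -> float:
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; the test is undefined here")
    stat = d_bar / math.sqrt(long_run_var / n)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


if __name__ == "__main__":
    # Toy errors, made up for illustration: model 1 misses less often than model 2.
    tree_errors = [0.0, 0.0, 0.0, 1.0]
    linear_errors = [1.0, 1.0, 1.0, 1.0]
    stat, p = diebold_mariano(tree_errors, linear_errors)
    print(round(stat, 4), round(p, 4))
```

If the test set is short, a small-sample adjustment such as the Harvey, Leybourne and Newbold correction with Student-t critical values is often preferred; it is not implemented in this sketch.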
Conclusion
The results underscore the strong capability of tree-based machine-learning models (gradient boosting and random forest) to analyze financial data and uncover latent patterns. Their validity, however, hinges on sound evaluation design, most notably cross-validation and careful hyperparameter tuning. SHAP made it possible to compare the determinants of fund performance in linear and nonlinear settings, and the observed superiority of gradient boosting and random forest in this study is consistent with their capacity to capture nonlinearities and complex interactions. Although the magnitude and ranking of feature importance vary across models, several variables consistently play a pivotal role in explaining fund performance. A major contribution of this research is to offer methodologies that enable not only individual investors but also fund managers, financial advisors, and institutional investors (such as banks, insurance companies, and pension funds that manage large pools of capital) to identify superior funds more accurately and to optimize their portfolios using machine-learning algorithms. The proposed models may also intensify healthy competition between active and passive investment funds.
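SHAP attributes each prediction to the input features using Shapley values. The toy sketch below computes exact Shapley values by enumerating feature coalitions against a single baseline point, which is feasible only for a few features; the `shap` library typically relies on faster model-specific algorithms and a background dataset rather than one baseline. The helper `baseline_shapley` and the two toy models are hypothetical illustrations, not the authors' models. They show the linear-versus-nonlinear contrast discussed above: for a linear model each attribution reduces to coefficient times deviation from the baseline, whereas for a model with an interaction the joint effect is split between the interacting features.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def baseline_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x relative to one baseline point.

    A coalition S is evaluated as f(z), where z takes the features in S from x
    and all other features from baseline. Cost grows as 2^n, so keep n small.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def linear_model(z: Sequence[float]) -> float:
    """Hypothetical linear model with made-up coefficients."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


def interaction_model(z: Sequence[float]) -> float:
    """Hypothetical model whose output is a pure two-feature interaction."""
    return z[0] * z[1]


if __name__ == "__main__":
    # Linear case: attributions equal coefficient * (x_i - baseline_i).
    print(baseline_shapley(linear_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
    # Interaction case: the joint effect is shared equally between the two features.
    print(baseline_shapley(interaction_model, [1.0, 1.0], [0.0, 0.0]))
```

In both cases the attributions sum to f(x) minus f(baseline) (the efficiency property), which is what makes SHAP values additive and comparable across linear and tree-based models.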