Ensemble Strategy for Algorithmic Trading Using Deep Reinforcement Learning

Document Type: Research Paper

Authors

1 Assistant Prof., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

2 Associate Prof., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

3 MSc., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

DOI: 10.22059/frj.2025.378736.1007620

Abstract

Objective
Trading strategies are crucial for investment companies because they guide decision-making and help optimize returns. However, designing a profitable strategy in the complex and dynamic stock market environment is difficult: market behavior is intricate and is shaped by many interacting factors, which calls for advanced modelling techniques. The growing availability of large data sets and increased computational power have facilitated the use of agent-based models, which have become essential tools for understanding economic and financial systems. The Tehran Stock Exchange often demands rapid adaptation because of severe volatility, regulatory changes, and sudden economic shifts, and these conditions motivate the choice of an ensemble strategy built from deep reinforcement learning agents. Unlike traditional supervised learning models, which make predictions solely from historical data, agent-based models offer an adaptive approach that can respond to market changes in real time. A second reason for selecting this strategy is its capacity to perform complex portfolio management operations. By combining multiple deep reinforcement learning agents, each with distinct strengths, the ensemble approach can draw on diverse strategies to optimize trades, manage risk, and improve decision-making across different market conditions. This research therefore proposes an ensemble strategy for algorithmic trading that uses deep reinforcement learning to find stock trading strategies that maximize returns while minimizing investment risk.
 
Methods
This study implements an ensemble trading strategy by modelling the stock market and employing five distinct deep reinforcement learning algorithms. The ensemble combines the strengths of the individual algorithms, making it adaptable to various market conditions. Data from the stocks included in the price index of the top 50 companies on the Tehran Stock Exchange are used to train and test these algorithms. The performance of the trading agent under each reinforcement learning algorithm is then evaluated and compared against the benchmark index and a traditional minimum-variance portfolio allocation strategy. This comparative analysis provides a thorough assessment of the effectiveness of the ensemble approach in real-world trading scenarios.
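The abstract does not state how the five agents are combined. As a minimal sketch of one common selection rule in ensemble trading work, and not necessarily the rule used in this study, the snippet below picks the agent whose returns on a validation window had the highest Sharpe ratio, so that this agent would trade the next window. The sample returns, the placeholder name `agent_3`, the zero risk-free rate, the 252-periods-per-year convention, and the helper names `sharpe_ratio` and `select_agent` are all assumptions made for illustration.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed 0)."""
    if len(returns) < 2:
        # Too few observations to estimate volatility; treat as unusable.
        return float("-inf")
    std = statistics.stdev(returns)
    if std == 0:
        # Ratio is undefined for constant returns; treat as unusable here.
        return float("-inf")
    return statistics.fmean(returns) / std * math.sqrt(periods_per_year)


def select_agent(validation_returns):
    """Return the name of the agent with the highest validation Sharpe ratio."""
    return max(
        validation_returns,
        key=lambda name: sharpe_ratio(validation_returns[name]),
    )


# Hypothetical daily returns of three agents on one validation window.
validation_returns = {
    "SAC": [0.030, -0.040, 0.050, -0.010],
    "TD3": [0.004, 0.006, 0.005, 0.007],
    "agent_3": [0.010, -0.020, 0.015, 0.005],
}

# The selected agent would trade the next window; selection is then repeated.
chosen = select_agent(validation_returns)
print(chosen)
```

Selecting on a risk-adjusted score rather than on raw return favors agents that earned their return with less variability, which matches the stated goal of maximizing returns while limiting risk.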
 
Results
The trading models were evaluated over the period from June 29, 2022, to January 20, 2024. The ensemble strategy achieved an annual return of 47.13%, a cumulative return of 78.47%, and a risk-adjusted return of 1.56. These results indicate superior performance over the individual deep reinforcement learning algorithms, the benchmark price index of the top 50 Tehran Stock Exchange companies, and the traditional minimum-variance portfolio allocation strategy. Among the individual algorithms, the Soft Actor-Critic (SAC) algorithm recorded the highest returns, with an annual return of 29.89% and a cumulative return of 47.89%. However, its higher annual volatility of 44.22% suggests weaker risk management. Conversely, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm achieved a more balanced outcome, with a risk-adjusted return of 0.92, reflecting effective risk management alongside respectable returns. The findings therefore indicate that the ensemble strategy can produce a trading strategy that outperforms the individual deep reinforcement learning algorithms, the price index of the top 50 companies on the Tehran Stock Exchange, and the minimum-variance portfolio allocation strategy.
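To relate the reported figures to one another, the sketch below shows how cumulative return, annualized return, annualized volatility, and a Sharpe-style risk-adjusted return are commonly computed. The abstract does not define its risk-adjusted measure or its annualization convention, so the Sharpe-style ratio with a zero risk-free rate, the 252-periods-per-year default, and all function names are assumptions.

```python
import math
import statistics


def cumulative_return(returns):
    """Compound per-period returns into a total return over the whole window."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_return(cum_return, years):
    """Convert a cumulative return over `years` into a geometric annual rate."""
    return (1.0 + cum_return) ** (1.0 / years) - 1.0


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of per-period returns, scaled to one year."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def risk_adjusted_return(annual_return, annual_volatility, risk_free=0.0):
    """Sharpe-style ratio: excess annual return per unit of annual volatility."""
    return (annual_return - risk_free) / annual_volatility


# Assumption: the test window corresponds to roughly 1.5 years of trading.
ensemble_annual = annualized_return(0.7847, 1.5)
print(round(ensemble_annual, 2))
```

Under the assumption that the test window corresponds to roughly 1.5 years, the reported cumulative return of 78.47% annualizes to about 0.47, in line with the reported annual return of 47.13%.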
 
Conclusion
The ensemble strategy offers a robust and adaptive framework for dynamic stock portfolio management by combining the strengths of multiple deep reinforcement learning algorithms. In this study it enhanced returns while effectively managing investment risk. Future work could integrate fundamental and macroeconomic indicators to refine the strategy's predictive accuracy. In addition, incorporating legal and regulatory constraints into the stock market model, and considering market participants other than investors, could improve the realism and performance of the model. Such a holistic approach would give a more comprehensive picture of market dynamics and could lead to more stable and robust trading outcomes.
