Ensemble Strategy for Algorithmic Trading Using Deep Reinforcement Learning

Amiri, Meysam; Peymany, Moslem; Boghosian, Hemo

doi:10.22059/frj.2025.378736.1007620

Article type: Research Paper

Authors

¹ Assistant Professor, Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

² Associate Professor, Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

³ MSc, Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

Abstract

Objective: Trading strategies play an important role in investment companies; however, designing a profitable strategy in a stock market with its own complexity and dynamics is challenging. With greater access to data and more computational power, agent-based models have become more important for understanding the economy and financial markets. The Tehran Stock Exchange often requires rapid adaptation because of severe volatility, changes in regulatory rules, and sudden economic shifts. The choice of an ensemble strategy composed of deep reinforcement learning agents stems from the unique challenges and opportunities of the Tehran Stock Exchange. Unlike traditional supervised learning models, which make predictions based only on historical data, agent-based models offer an adaptive approach that can respond to market changes in real time. Another reason for choosing this strategy is its capacity to carry out complex portfolio management operations. By combining several deep reinforcement learning agents, each with distinct strengths, the ensemble approach can draw on diverse strategies to optimize trades, manage risk, and improve decision-making under different market conditions. The aim of this research is therefore to propose an ensemble strategy for algorithmic trading and to use deep reinforcement learning algorithms for stock trading, in order to maximize returns and minimize investment risk.
Method: In this research, an ensemble trading strategy is implemented by modeling the stock market and training five deep reinforcement learning algorithms: Advantage Actor Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). The strategy inherits and merges the best features of the five algorithms and is therefore more adaptable to different market situations. Stocks in the price index of the 50 companies of the Tehran Stock Exchange are used to train and test the algorithms. Finally, the results of trading with the designed strategy are compared with, and discussed against, the individual deep reinforcement learning algorithms, the 50-company price index of the Tehran Stock Exchange, and the minimum-variance portfolio allocation strategy.
Findings: In trading with the different models from 8 Tir 1401 to 30 Dey 1402 (June 29, 2022, to January 20, 2024), the designed ensemble strategy achieved an annual return of 47.13%, a cumulative return of 78.47%, a risk-adjusted return of 1.56, and a maximum drawdown of 18.49%. In terms of both return and risk management, it outperformed the individual deep reinforcement learning algorithms, the 50-company price index of the Tehran Stock Exchange, and the minimum-variance portfolio allocation strategy. Among the deep reinforcement learning algorithms, SAC performed best in terms of return, with annual and cumulative returns of 29.89% and 47.89%; however, its annual volatility of 44.22% meant that it did not perform well in terms of risk management. In contrast, TD3, with a risk-adjusted return of 0.92, showed the best combined return and risk-management performance among the deep reinforcement learning algorithms. The findings therefore show that the ensemble strategy can effectively produce a trading strategy that outperforms the individual deep reinforcement learning algorithms, the 50-company price index of the Tehran Stock Exchange, and the minimum-variance portfolio allocation strategy.
Conclusion: Because the ensemble strategy combines the best features of each deep reinforcement learning algorithm and manages the stock portfolio dynamically, it can be used as a dependable trading strategy for earning higher returns and managing investment risk. In future research, fundamental and macroeconomic variables could also be used in training the trading agent to improve the performance of this algorithm. In addition, accounting for legal and regulatory constraints when modeling the stock market, and implementing agents other than investors, could bring the model closer to reality and improve its performance.
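
The minimum-variance benchmark named in the Method paragraph can be illustrated with the textbook closed-form solution. The abstract does not say how the authors estimated or constrained their benchmark, so the sketch below is only an assumption: it uses the unconstrained global minimum-variance weights (short positions allowed) computed from a sample covariance matrix, and the function name `min_variance_weights` is hypothetical.

```python
import numpy as np

def min_variance_weights(returns: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights from a (T, N) matrix of asset returns.

    Assumption: unconstrained problem (weights sum to 1, shorting allowed),
    which may differ from the benchmark used in the article.
    """
    cov = np.cov(returns, rowvar=False)      # (N, N) sample covariance
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)         # Sigma^{-1} 1 without forming the inverse
    return raw / raw.sum()                   # normalise so the weights sum to 1
```

A long-only version would need a constrained quadratic-programming solver instead of the closed form.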

Subjects

20. Active portfolio management; trading strategies; algorithmic trading

Article Title [English]

Ensemble Strategy for Algorithmic Trading Using Deep Reinforcement Learning

Authors [English]

Meysam Amiri ¹
Moslem Peymany ²
Hemo Boghosian ³

¹ Assistant Prof., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

² Associate Prof., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

³ MSc., Department of Finance and Banking, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran.

Abstract [English]

Objective
Trading strategies are crucial in investment companies because they guide decision-making and help optimize returns. However, designing a profitable strategy within the complex and dynamic stock market environment poses significant challenges. The intricacies of market behavior and the multitude of influencing factors call for advanced modelling techniques. The growing availability of extensive data sets and increased computational power have facilitated the use of agent-based models, which have become important tools for understanding economic and financial systems. The Tehran Stock Exchange often requires rapid adaptation because of severe volatility, regulatory changes, and sudden economic shifts. The choice to implement an ensemble strategy consisting of deep reinforcement learning agents arises from the unique challenges and opportunities of the Tehran Stock Exchange. Unlike traditional supervised learning models, which make predictions based solely on historical data, agent-based models offer an adaptive approach that can respond to market changes in real time. Another reason for selecting this strategy is its capacity to perform complex portfolio management operations. By combining multiple deep reinforcement learning agents, each with distinct strengths, the ensemble approach can draw on diverse strategies to optimize trades, manage risk, and improve decision-making across different market conditions. Therefore, this research proposes an ensemble strategy for algorithmic trading that uses deep reinforcement learning to trade stocks with the aim of maximizing returns while minimizing investment risk.

Methods
This study implements an ensemble trading strategy by modelling the stock market and training five deep reinforcement learning algorithms: Advantage Actor Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3). The ensemble strategy combines the strengths of the five algorithms, making it adaptable to various market conditions. Data from stocks listed in the price index of the top 50 companies on the Tehran Stock Exchange are used to train and test the algorithms. The performance of the ensemble strategy is then evaluated and compared against the individual reinforcement learning algorithms, the benchmark index, and a traditional minimum-variance portfolio allocation strategy. This comparison helps assess the effectiveness of the ensemble approach.
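
The abstract does not state how the five agents are combined. As a minimal sketch of one possible rule, and only as an assumption, the example below scores each trained agent by the Sharpe ratio of its returns over a validation window and hands the next trading window to the best scorer. The function names, the agent return series, and the 252-day annualization factor are all hypothetical; no reinforcement learning library is called.

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    r = np.asarray(daily_returns, dtype=float)
    sd = r.std(ddof=1)
    if sd == 0.0:
        return 0.0                      # flat series: treat as no risk-adjusted edge
    return float(np.sqrt(periods_per_year) * r.mean() / sd)

def pick_agent(validation_returns):
    """Return the name of the agent with the highest validation Sharpe ratio.

    validation_returns: dict mapping agent name -> daily returns that the
    agent earned on a held-out validation window (hypothetical inputs).
    """
    scores = {name: sharpe_ratio(r) for name, r in validation_returns.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up validation returns, for illustration only.
validation = {
    "A2C": [0.010, -0.020, 0.010, 0.000],
    "TD3": [0.010, 0.005, 0.012, 0.008],
    "SAC": [0.030, -0.040, 0.050, -0.030],
}
best, scores = pick_agent(validation)
```

Repeating this selection at fixed intervals lets the active agent change as market conditions change; weighting the agents' actions instead of picking one is an equally plausible alternative.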

Results
The trading models were run from June 29, 2022, to January 20, 2024, and their performance was compared. The ensemble strategy achieved an annual return of 47.13%, a cumulative return of 78.47%, a risk-adjusted return of 1.56, and a maximum drawdown of 18.49%. These results indicate better performance than the individual deep reinforcement learning algorithms, the benchmark price index of the top 50 Tehran Stock Exchange companies, and the traditional minimum-variance portfolio allocation strategy, in terms of both return and risk management. Among the individual algorithms, Soft Actor-Critic (SAC) recorded the highest returns, with an annual return of 29.89% and a cumulative return of 47.89%. However, its annual volatility of 44.22% indicated weaker risk management. In contrast, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm achieved the best balance of return and risk among the individual algorithms, with a risk-adjusted return of 0.92. The findings therefore indicate that the ensemble strategy can produce a trading strategy that outperforms the individual deep reinforcement learning algorithms, the price index of the top 50 companies on the Tehran Stock Exchange, and the minimum-variance portfolio allocation strategy.
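
For readers who want to reproduce figures of the kind reported above, the following sketch computes them from a series of daily portfolio returns. It rests on assumptions not confirmed by the abstract: that the risk-adjusted return is a Sharpe-type ratio with a zero risk-free rate, that returns compound daily, and that a year has 252 trading days (the Tehran Stock Exchange calendar differs). The function name `performance_summary` is hypothetical.

```python
import numpy as np

def performance_summary(daily_returns, periods_per_year=252):
    """Summary statistics for a series of simple daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    # Wealth curve starting from 1.0 so a loss on the first day counts as drawdown.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    cumulative = wealth[-1] - 1.0
    annual = wealth[-1] ** (periods_per_year / r.size) - 1.0
    volatility = r.std(ddof=1) * np.sqrt(periods_per_year)
    # Sharpe-type ratio: annualized mean return over annualized volatility.
    sharpe = r.mean() * periods_per_year / volatility if volatility > 0 else float("nan")
    peak = np.maximum.accumulate(wealth)          # running maximum of wealth
    max_drawdown = np.max(1.0 - wealth / peak)    # largest peak-to-trough loss
    return {
        "cumulative_return": float(cumulative),
        "annual_return": float(annual),
        "annual_volatility": float(volatility),
        "risk_adjusted_return": float(sharpe),
        "max_drawdown": float(max_drawdown),
    }
```

For example, the returns `[0.10, -0.50, 0.20]` give a wealth path of 1.0, 1.1, 0.55, 0.66, so the cumulative return is -34% and the maximum drawdown is 50%.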

Conclusion
By combining the strengths of multiple deep reinforcement learning algorithms, the ensemble strategy offers an adaptive framework for dynamic stock portfolio management. It can serve as a dependable trading strategy for earning higher returns and managing investment risk. Future research could improve its performance by adding fundamental and macroeconomic variables to the trading agent's training. Incorporating legal and regulatory constraints into the stock market model, and modelling market participants other than investors, could also make the model more realistic and improve its performance.

Keywords [English]

Algorithmic trading
Agent-based modeling
Deep reinforcement learning
Ensemble strategy


Volume 28, Issue 2
1405 (Solar Hijri calendar)
Pages 349-372
