A hybrid approach to formulaic alpha discovery with large language model assistance

Shuo YU, Hong-Yan XUE, Xiang AO, Qing HE

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (2) : 2002316. DOI: 10.1007/s11704-025-41061-5

Artificial Intelligence
RESEARCH ARTICLE

Abstract

In quantitative trading, a central task is to translate historical financial data into predictive signals, commonly referred to as alpha factors, which serve to anticipate future market trends. Formulaic alphas, which can be expressed as explicit mathematical formulas, are particularly sought after by investors for their interpretability. Large language models (LLMs) are now deployed across a wide range of domains, which raises the question of whether they can also be effective for formulaic alpha-mining tasks. This paper presents several paradigms for integrating LLMs into the optimization loop of alpha mining, including scenarios where an LLM serves as the sole alpha generator as well as instances where LLMs enhance existing frameworks. Empirical evaluations on real-world stock data demonstrate significant performance improvements: our hybrid method achieves an average information coefficient (IC) of 0.0515, a 75% improvement over the baseline, a state-of-the-art reinforcement learning-based framework, and backtesting further reveals a cumulative excess return more than double that of the baseline. These results underscore the potential of LLM-enhanced approaches in advancing formulaic alpha discovery and driving innovation in quantitative trading.
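To make the two central notions of the abstract concrete, the following minimal Python sketch (ours, not the paper's implementation) defines a toy formulaic alpha, a 5-day price reversal, on synthetic close prices and measures its average daily cross-sectional information coefficient (IC) against next-day returns. All names and parameters here are illustrative assumptions.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days, n_stocks = 250, 100

# Synthetic close prices: one row per trading day, one column per stock.
close = pd.DataFrame(
    100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, (n_days, n_stocks)), axis=0))
)

# A toy formulaic alpha: 5-day price reversal (favor recent losers).
alpha = -(close / close.shift(5) - 1.0)

# The prediction target: each stock's next-day return.
fwd_ret = close.shift(-1) / close - 1.0

# Daily cross-sectional IC: for each day, the correlation between the
# factor values and next-day returns across all stocks.
daily_ic = alpha.corrwith(fwd_ret, axis=1)

# On random-walk data the mean IC is near zero by construction; the
# abstract's 0.0515 figure is measured on real stock data.
print(f"mean daily IC = {daily_ic.mean():.4f}")

The mean of daily_ic over the evaluation period is the "average IC" quoted in the abstract; candidate formulas in an alpha-mining loop are scored with this kind of metric.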

Keywords

computational finance / stock trend forecasting / large language model

Cite this article

Shuo YU, Hong-Yan XUE, Xiang AO, Qing HE. A hybrid approach to formulaic alpha discovery with large language model assistance. Front. Comput. Sci., 2026, 20(2): 2002316. DOI: 10.1007/s11704-025-41061-5

RIGHTS & PERMISSIONS

The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn
