Link to the preprint version: http://dx.doi.org/10.13140/RG.2.2.18713.33127
Abstract
We investigate the predictive power of LLM-based sentiment analysis in the 2021 GameStop short squeeze, analyzing 665,275 Reddit posts from r/wallstreetbets and r/GME using Claude 3 Haiku. Logistic regression models were trained on sentiment data to forecast the next trading day's price movement direction using a rolling window approach. The model achieved 60% accuracy, exceeding prior social media-based stock prediction studies. However, trading strategies based on the model underperformed the buy-and-hold benchmark in returns (113.3% vs. 291.5%), despite showing better risk control with lower maximum drawdown (61.0% vs. 84.7%), highlighting the need for further optimization. These findings demonstrate LLM-driven text analysis can effectively capture retail investor sentiment, offering potential applications in trading strategies.
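For readers who want a feel for the rolling window approach and the drawdown metric mentioned in the abstract, here is a minimal sketch. It is not the paper's code. The function names, the 20-day window, the learning rate, and the hand-rolled gradient-descent fit are all assumptions made for illustration and to keep the example dependency-free; in practice a library implementation such as scikit-learn's `LogisticRegression` would be the more natural choice. The input is assumed to be one list of daily sentiment features per trading day.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))      # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_up(w, b, x):
    """Return 1 if the model predicts an up move, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def rolling_forecast(features, next_day_up, window=20):
    """Refit on the previous `window` days, then forecast from day t's features.

    features[t]    : sentiment features observed on day t (hypothetical layout)
    next_day_up[t] : 1 if the close on day t+1 is above the close on day t
    The training slice ends at t-1, whose label only needs the close of day t,
    so no information from day t+1 leaks into the forecast made on day t.
    """
    preds = []
    for t in range(window, len(features)):
        w, b = fit_logistic(features[t - window:t], next_day_up[t - window:t])
        preds.append((t, predict_up(w, b, features[t])))
    return preds

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

Calling `rolling_forecast(features, next_day_up)` returns `(day_index, prediction)` pairs that can be compared against the realized directions to get an accuracy figure, and `max_drawdown` can be applied to the equity curve of any strategy built on those predictions, or to a buy-and-hold curve for comparison.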
This idea was born shortly after the ChatGPT moment, and most of the research was done quite early, which is why I used Claude 3, a model that was SOTA at the time but is now way outdated (LLM development has been incredibly fast). It took me a long time to finish this work for various reasons, including the fact that this is my first paper, but here we finally are.
Although the research focuses on the 2021 GameStop short squeeze, a highly unusual event that may never happen again (but who knows), the journey was fascinating and the conclusions I drew from it were insightful. LLMs are definitely capable of analyzing user-generated content, and the resulting sentiment signals can then be used to build effective prediction models and trading strategies. I'm also very excited to see innovative projects like AI Hedge Funds applying LLMs and agents in the financial world.
One more thing: this was also my first time using LaTeX, which turned out to be very similar to frontend languages, and unsurprisingly, LLMs and coding agents can help a lot with writing it.