[Module 1: Building Intuition, Part 1] Why Can't Financial Data Be Modeled with "Ordinary Machine Learning"?
Table of Contents
Introduction: Why Do Backtests Look Great but Fail in Live Trading?
What is “Standard Data Science Thinking”?
The Nature of Financial Data: A Continuously Evolving System
Non-Stationarity: Constantly Changing Market Conditions
High Noise: Extremely Limited Predictable Information
Time Dependency: Order Itself Contains Information
Why Common Machine Learning Models Fail in Finance
The Right Direction: From Predicting Outcomes to Modeling Processes
Roadmap of This Series: What You Will Learn
Conclusion
Introduction: Why Do Backtests Look Great but Fail in Live Trading?
Many people follow a very similar path when they first enter quantitative trading. They start by collecting historical market data, then construct technical indicators as features, and use models such as Random Forest, XGBoost, or even deep learning to predict future price movements. After training, the model may achieve over 60% accuracy on the test set, and the backtest curve often looks impressive. It is easy to conclude that financial markets are predictable.
However, the real problem usually appears in live trading. Once the strategy is deployed in the real market, performance often deteriorates rapidly—returns decline or even turn into losses. This phenomenon, where a strategy performs well in backtests but fails in live trading, is almost the norm in financial modeling.
The root cause is not whether the model is advanced enough, but rather that the underlying modeling mindset itself is not suitable for financial data. In other words, most people are applying a “standard data science paradigm” to a fundamentally different problem.
Reading Notes and Terminology
1. Technical Indicators
Technical indicators are essentially transformations of raw price data into forms that are easier to interpret. For example, calculating moving averages, measuring momentum, or determining whether a price is overbought or oversold. They do not introduce new information; instead, they re-express existing price data mathematically to make decision-making easier for humans or models.
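To make the "transformation, not new information" point concrete, here is a minimal sketch of two such transformations, a moving average and a simple momentum measure. The window lengths and price values are illustrative choices, not recommendations.

```python
# Minimal sketch: two common technical indicators as pure
# transformations of an existing price series (no new information).

def moving_average(prices, window):
    """Simple moving average over the trailing `window` prices."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def momentum(prices, lag):
    """Price change over `lag` periods: p_t - p_{t-lag}."""
    return [prices[i] - prices[i - lag] for i in range(lag, len(prices))]

prices = [100, 102, 101, 105, 107, 106]
print(moving_average(prices, 3))  # first entry: (100+102+101)/3 = 101.0
print(momentum(prices, 1))        # [2, -1, 4, 2, -1]
```

Both outputs are re-expressions of the same six prices; feeding them to a model adds no information the raw series did not already contain.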
2. Backtesting
Backtesting refers to simulating how a trading strategy would have performed using historical data. You define a set of rules—when to buy, when to sell—and apply them to past data to see whether the strategy would have made or lost money. It is essentially a “virtual experiment” conducted on historical markets.
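The "virtual experiment" can be sketched in a few lines. This toy backtest applies one fixed rule (hold the asset while yesterday's price is above its trailing average, otherwise hold cash) and tracks hypothetical capital; the rule, window, and prices are made up, and costs and slippage are ignored.

```python
# Toy backtest sketch: apply a fixed rule to historical prices and
# track hypothetical capital. Illustrative only: ignores transaction
# costs, slippage, and position sizing.

def backtest(prices, window=3, capital=100_000.0):
    equity = [capital]
    for t in range(window, len(prices)):
        avg = sum(prices[t - window:t]) / window
        in_market = prices[t - 1] > avg          # rule uses past data only
        ret = prices[t] / prices[t - 1] - 1 if in_market else 0.0
        capital *= 1 + ret
        equity.append(capital)
    return equity  # the equity path that a backtest curve plots

prices = [100, 101, 103, 102, 105, 108, 107]
curve = backtest(prices)
print(curve)
```

Note that the rule only ever looks at data strictly before the period it trades in; getting this wrong is one of the most common sources of inflated backtests.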
3. Backtest Curve
A backtest curve represents the evolution of your portfolio value during the backtest. The horizontal axis is time, and the vertical axis is capital. For example, your capital might go from 100,000 to 120,000 and then down to 90,000. Many people judge a strategy based on this curve—whether it rises steadily or experiences large drawdowns. However, it is important to remember that this curve reflects only historical simulation, not future performance.
4. Live Trading
Live trading means executing trades with real money in the actual market. Unlike backtesting, live trading is affected by real-world factors such as slippage, transaction costs, sudden market changes, and human emotions. As a result, many strategies that perform well in backtests deteriorate significantly in live trading.
5. Tree-Based Models
A tree-based model can be understood as a decision-making process that repeatedly asks questions. Its structure resembles an inverted tree, where the model splits data based on conditions such as “whether a feature exceeds a threshold,” and continues branching until it reaches a final decision (e.g., up or down). This process is similar to step-by-step reasoning. While tree models are intuitive and capable of capturing nonlinear relationships, they often memorize random fluctuations in financial data as rules, leading to good backtest performance but poor real-world results.
6. Random Forest
Random Forest can be seen as a collection of decision trees that vote together. Each tree makes a prediction based on its own rules, and the final output is determined by aggregating these predictions. While it is powerful in capturing complex patterns, it tends to learn noise in financial data, resulting in strong historical performance but weak future generalization.
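The "trees that vote" idea can be illustrated with a deliberately simplified sketch. This is not a real Random Forest (there is no bootstrapping or random feature sampling); each "tree" is just a depth-1 threshold rule, and the ensemble predicts by majority vote. The feature names in the comments are hypothetical.

```python
# Illustrative sketch of the "trees that vote" idea, not a real
# Random Forest: each "tree" is a single hard threshold rule, and
# the forest predicts up (1) or down (0) by majority vote.

def make_stump(feature_idx, threshold):
    """A depth-1 'tree': predicts 1 (up) if the feature exceeds the threshold."""
    return lambda x: 1 if x[feature_idx] > threshold else 0

forest = [
    make_stump(0, 0.5),   # e.g. "is the momentum feature above 0.5?"
    make_stump(1, -1.0),  # e.g. "is the volatility feature above -1.0?"
    make_stump(0, 2.0),
]

def predict(forest, x):
    votes = sum(tree(x) for tree in forest)
    return 1 if votes > len(forest) / 2 else 0

print(predict(forest, [1.0, 0.0]))  # two of three stumps vote "up" -> 1
```

The danger described above is visible even here: nothing stops a threshold from being tuned to a random wiggle in the training data, at which point the rule is memorized noise.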
7. XGBoost
XGBoost is an advanced tree-based model that builds a strong predictor by sequentially correcting errors from previous models. It performs exceptionally well in many machine learning competitions due to its strong fitting ability. However, in finance, this strength often leads to overfitting, where noise is mistaken for signal, resulting in strong backtests but failure in live trading.
8. Deep Learning Models
Deep learning models are composed of multiple layers of neural networks that can automatically extract features and make predictions. They are highly effective in domains like image and speech recognition. However, in financial markets, due to high noise and unstable structure, deep learning models often learn incidental historical patterns rather than stable relationships, making them not necessarily more effective than simpler models.
What is “Standard Data Science Thinking”?
In most machine learning applications, there exists a set of implicit assumptions that are rarely stated explicitly but form the foundation of the entire modeling framework.
First, data is assumed to come from a relatively stable system. This means patterns observed in historical data are expected to persist in the future. In image recognition, for example, what a cat looks like does not change over time, and in recommendation systems, user behavior patterns can be continuously learned and reused.
Under this assumption, samples are treated as independent observations. Data can be shuffled, and training and validation can be performed through random splits, because each sample is assumed to come from the same distribution regardless of time order. This is why cross-validation works effectively in many tasks.
Additionally, typical datasets tend to have a relatively high signal-to-noise ratio, meaning they contain clear and stable structures that models can learn, rather than being dominated by random fluctuations.
Overall, this mindset assumes that data comes from a stable environment with reusable patterns, and the role of machine learning is to extract these patterns for future prediction.
The problem is that financial markets do not satisfy these assumptions.
Reading Notes and Terminology
1. Cross-Validation
Cross-validation can be understood as repeatedly testing a model on different subsets of data to evaluate its reliability. The dataset is divided into multiple parts, and each part is used as a test set in turn while the rest is used for training. If the model performs consistently across different splits, it indicates that the model generalizes well rather than fitting only a specific dataset.
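The fold-generation logic can be sketched in a few lines of index arithmetic. This is exactly the random-split reasoning that breaks down on time-ordered financial data, since every fold mixes past and future observations.

```python
# Minimal sketch of k-fold index generation: each fold serves once
# as the test set while the remaining samples train the model.

def kfold_indices(n_samples, k):
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        test = list(range(i * fold_size, (i + 1) * fold_size))
        train = [j for j in range(n_samples) if j not in test]
        folds.append((train, test))
    return folds

for train, test in kfold_indices(6, 3):
    print("train:", train, "test:", test)
```

Notice that in the second and third folds the training set contains indices that come *after* the test set, which is harmless for i.i.d. data but leaks the future in a time series.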
2. Generalization Ability
Generalization ability refers to how well a model performs on unseen data. A model that performs well only on training data but fails on new data has poor generalization. In contrast, a model that maintains similar performance on new data has captured underlying patterns and thus has strong generalization ability.
3. Signal-to-Noise Ratio
The signal-to-noise ratio describes the proportion of useful information relative to random fluctuations in the data. A high ratio means useful patterns dominate, making learning easier. A low ratio means randomness dominates, making models prone to learning noise instead of true structure. Financial data typically has a very low signal-to-noise ratio, which makes modeling inherently difficult.
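One simple way to think about this ratio is as a variance comparison between a persistent signal component and the random component around it. The numbers below are made up purely for illustration.

```python
# Sketch of signal-to-noise as a variance ratio: how much of a
# series' variation comes from structured drift vs random swings.
# All values are illustrative.

from statistics import pvariance

signal = [0.10, 0.12, 0.11, 0.13, 0.12, 0.14]   # slow, structured drift
noise = [0.9, -1.1, 0.7, -0.8, 1.2, -0.9]       # large random swings
observed = [s + n for s, n in zip(signal, noise)]

snr = pvariance(signal) / pvariance(noise)
print(f"signal-to-noise ratio: {snr:.4f}")  # far below 1: noise dominates
```

With a ratio this far below one, a flexible model shown only `observed` will almost inevitably spend its capacity fitting the noise term.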
The Nature of Financial Data: A Continuously Evolving System
Financial data is not a collection of static samples, but a process that evolves over time. Understanding this is fundamental to all subsequent modeling efforts.
Non-Stationarity: Constantly Changing Market Conditions
One of the most prominent characteristics of financial time series is non-stationarity. In simple terms, statistical properties such as mean, variance, and even the entire distribution change over time.
Markets go through different regimes. Periods of loose liquidity are often associated with upward trends, while tightening cycles may lead to higher volatility and more frequent drawdowns. In extreme situations such as financial crises, market behavior can change abruptly.
This leads to a critical issue: the “patterns” learned by a model are merely local features of a specific historical period. Once the market enters a new regime, these patterns may no longer hold or may even reverse.
This phenomenon is known as regime switching, highlighting that financial markets do not follow a single stable distribution but rather shift between multiple states.
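Regime change can be made visible with rolling statistics. In the sketch below, a toy return series switches from a calm regime to a volatile one halfway through; the rolling mean and standard deviation computed on the first half simply stop describing the second half. The numbers are invented for illustration.

```python
# Sketch: rolling mean and standard deviation of a toy return series
# whose regime shifts halfway through. Statistics estimated in the
# calm regime no longer describe the volatile one.

from statistics import mean, pstdev

returns = [0.1, -0.1, 0.2, 0.0, 0.1,      # calm regime
           2.0, -3.0, 2.5, -2.0, 3.0]     # volatile regime

def rolling_stats(series, window):
    return [
        (mean(series[i - window:i]), pstdev(series[i - window:i]))
        for i in range(window, len(series) + 1)
    ]

for m, s in rolling_stats(returns, 5):
    print(f"mean={m:+.2f}  std={s:.2f}")
```

The first window's standard deviation is roughly 0.1 while the last window's exceeds 2: a model calibrated on the first regime would badly misjudge risk in the second.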
Reading Notes and Terminology
1. Non-stationarity
Non-stationarity can be understood as: the “rules” in the data keep changing. In other words, how data behaved in the past—its patterns and fluctuations—does not guarantee it will behave the same way in the future. Sometimes the market keeps rising, sometimes it keeps falling; sometimes volatility is low, and sometimes it becomes highly turbulent. These states continuously shift over time. So financial data is not like exam questions with fixed answers—it is more like an environment that is constantly changing.
2. Loose Liquidity Conditions
Loose liquidity conditions can be understood as a situation where “there is a lot of money in the market and it is easy to access.” For example, when central banks lower interest rates or inject liquidity, borrowing costs decrease, and investors become more willing to invest in assets. In such periods, markets tend to rise more easily. You can think of it like a pool filled with water—the more water there is, the easier it is for boats (asset prices) to rise.
3. Drawdown
Drawdown refers to the decline from a peak to a lower point. For example, if your account grows from 100,000 to 120,000 and then drops to 110,000, the 10,000 loss from the peak is the drawdown. It reflects how much a strategy can fall during the process of making profits. The larger the drawdown, the higher the volatility and the greater the risk.
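Maximum drawdown is easy to compute directly from an equity curve as the largest peak-to-trough decline. The sketch below uses the same 100,000 → 120,000 → 110,000 example as the entry above, expressed as a fraction of the peak.

```python
# Sketch: maximum drawdown of an equity curve, measured as the
# largest peak-to-trough decline relative to the running peak.

def max_drawdown(equity):
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

curve = [100_000, 120_000, 110_000]
print(f"max drawdown: {max_drawdown(curve):.1%}")  # 10,000 off the 120,000 peak
```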
4. Momentum Signal
A momentum signal can be understood as a way of judging trend continuation. Simply put, if something has been going up, it is likely to keep going up in the short term; if it has been going down, it may continue to fall. For example, if a stock has been rising continuously, a momentum signal would suggest it may keep rising for a while. Essentially, it relies on the idea of “inertia” in price movement.
5. Reversal Signal
A reversal signal is the opposite of momentum. It assumes that what has gone up too much is likely to fall, and what has fallen too much is likely to rise. For instance, if a stock has risen continuously and is now relatively expensive, a reversal signal would suggest it may decline. This is based on the concept of mean reversion, where prices tend to move back toward a more “normal” level over time.
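The two glossary entries above can be contrasted in code: on the same price history, momentum bets the recent move continues while reversal bets it unwinds. The lookback length and prices are illustrative.

```python
# Sketch contrasting the two signal types on the same price history.

def momentum_signal(prices, lookback=3):
    """+1 if the asset rose over the lookback window, -1 if it fell."""
    change = prices[-1] - prices[-1 - lookback]
    return 1 if change > 0 else -1 if change < 0 else 0

def reversal_signal(prices, lookback=3):
    """The opposite bet: fade the recent move."""
    return -momentum_signal(prices, lookback)

prices = [100, 103, 106, 110]   # steadily rising
print(momentum_signal(prices))  # +1: expect the rise to continue
print(reversal_signal(prices))  # -1: expect it to revert
```

Which bet pays off depends on the prevailing regime, which is precisely why a fixed rule learned from one historical period can reverse sign in another.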
High Noise: Extremely Limited Predictable Information
Another key characteristic of financial data is its extremely low signal-to-noise ratio. Short-term price movements are influenced by numerous factors, including macroeconomic policies, market sentiment, capital flows, and unexpected events. These factors are highly uncertain and difficult to model.
As a result, most price fluctuations are noise rather than meaningful signals. Machine learning models, due to their strong fitting capabilities, often mistake noise for patterns, leading to strong performance in training and backtesting but rapid failure in future data.
This is essentially an amplified form of overfitting. The weaker the signal, the easier it is for complex models to fit random fluctuations instead of true structure.
Time Dependency: Order Itself Contains Information
The third important characteristic of financial data is its strong temporal dependency. Price series contain rich dynamic structures such as momentum, mean reversion, and volatility clustering, all of which rely on the continuity of time.
Randomly shuffling data, as done in standard machine learning, destroys this structure and may introduce data leakage. Models may inadvertently use future information during training, leading to artificially high backtest performance that cannot be replicated in reality.
Therefore, time must be strictly preserved in financial modeling. Data splitting must follow chronological order, and evaluation should use techniques such as rolling windows to simulate real-world decision-making.
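The chronological evaluation described above can be sketched as a walk-forward split generator: train on a past window, test on the period immediately after it, then roll forward. Window sizes here are arbitrary illustrative choices.

```python
# Sketch of a chronological walk-forward split: no shuffling, so no
# future observation can leak into a training window.

def walk_forward_splits(n_samples, train_size, test_size):
    splits = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        splits.append((train, test))
        start += test_size  # roll the window forward
    return splits

for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print("train:", train, "-> test:", test)
```

Unlike the random k-fold splits described earlier, every training index here strictly precedes every test index, which mirrors the information actually available at decision time.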
Reading Notes and Terminology
1. Momentum Effect
The momentum effect can be understood as “trends have inertia.” In other words, if an asset has been rising recently, it is more likely to continue rising in the short term; if it has been falling, it may continue to decline. It is similar to a person running forward—once in motion, they do not stop instantly but continue moving due to inertia.
2. Mean Reversion
Mean reversion can be understood as “what goes up too much tends to come down, and what falls too much tends to rise.” When prices deviate too far from a normal level, they often move back toward a more reasonable range. It is like a stretched rubber band that snaps back once released—this tendency to return to normal is mean reversion.
3. Volatility Clustering
Volatility clustering refers to the idea that the “intensity of market fluctuations” also has inertia. In simple terms, if the market has been calm for a period of time, it tends to remain calm; if it has recently experienced large swings, those large fluctuations are likely to continue for a while. It is similar to emotions—once tension builds up, it does not disappear immediately.
4. Rolling Window
A rolling window can be understood as a “continuously moving training approach.” For example, you train a model using data from an earlier period, test it on the immediately following period, and then move the window forward and repeat the process. The purpose is to simulate real trading conditions, where only past data is available to predict the future, rather than using all historical data at once.
Why Common Machine Learning Models Fail in Finance
After understanding the characteristics of financial data, the limitations of models such as Random Forest and XGBoost become much clearer.
These models fundamentally rely on the assumption of stable data distributions, making predictions by learning a set of fixed rules. However, in financial markets, the meaning of the same signal can vary significantly across different periods, and fixed rules are unlikely to remain effective over time.
At the same time, these models typically have strong expressive power and are capable of fitting complex nonlinear relationships. But in a high-noise environment, this capability often translates into overfitting noise rather than capturing true underlying structure.
In addition, most traditional machine learning models are built on the framework of static tabular data and have limited ability to model temporal structures. Although time-related information can be partially introduced by constructing lagged features, this approach cannot fully capture the dynamic evolution of time series.
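The lagged-feature workaround mentioned above looks like this in practice: the return series is flattened into a table where each row's features are the previous few returns and the target is the current one. Lag count and values are illustrative.

```python
# Sketch: turning a return series into a lagged-feature table, the
# usual way time information is bolted onto a static tabular model.

def make_lagged_dataset(returns, n_lags):
    X, y = [], []
    for t in range(n_lags, len(returns)):
        X.append(returns[t - n_lags:t])  # features: the last n_lags returns
        y.append(returns[t])             # target: the current return
    return X, y

returns = [0.01, -0.02, 0.03, 0.01, -0.01]
X, y = make_lagged_dataset(returns, n_lags=2)
print(X)  # [[0.01, -0.02], [-0.02, 0.03], [0.03, 0.01]]
print(y)  # [0.03, 0.01, -0.01]
```

Once flattened this way, the rows are handed to the model as if they were independent, which is exactly the limitation the paragraph above describes: the table preserves a fixed slice of history but not the evolving dynamics between rows.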
More importantly, these models tend to focus on predicting outcomes rather than modeling uncertainty. In financial decision-making, simply knowing the direction of price movement is not sufficient; it is also necessary to evaluate the probability of that movement and the associated risk. If a model cannot characterize the distribution of outcomes, even a certain level of directional accuracy is unlikely to translate into stable returns.
The Right Direction: From Predicting Outcomes to Modeling Processes
To build effective financial models, a fundamental shift is required: moving from predicting outcomes to modeling processes.
Instead of directly predicting prices, financial modeling focuses on understanding how data is generated. This includes modeling return distributions, volatility dynamics, and interactions between variables.
Time series models serve as foundational tools, explicitly incorporating temporal structure and capturing evolving dynamics. They do not assume stability but instead seek structure within change.
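As a first taste of such a model, here is a sketch of an AR(1) fit, one of the simplest time series models that treats order as meaningful: it relates each return to the one before it, r_t ≈ c + φ·r_{t-1}, estimated by ordinary least squares on consecutive pairs. The toy series is constructed with φ = 0.5 so the fit can be checked by eye.

```python
# Sketch: fitting an AR(1) model r_t ≈ c + phi * r_{t-1} by ordinary
# least squares on consecutive (r_{t-1}, r_t) pairs.

def fit_ar1(series):
    x = series[:-1]   # r_{t-1}
    y = series[1:]    # r_t
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    c = my - phi * mx
    return c, phi

series = [1.0, 0.5, 0.25, 0.125, 0.0625]  # toy series built with phi = 0.5
c, phi = fit_ar1(series)
print(f"c={c:.3f}, phi={phi:.3f}")
```

The point is not this particular model but the framing: the estimate is explicitly about how one observation relates to the next, rather than about rows of an order-free table.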
Reading Notes and Terminology
1. Modeling Distribution Characteristics
Modeling distribution characteristics means that instead of only asking “will it go up or down,” we care about “how the ups and downs are distributed.” In other words, rather than predicting a single deterministic outcome, we aim to describe the probabilities of different possible scenarios—such as small gains, small losses, large gains, or large losses. Essentially, it answers the question: “What could happen in the future, and how likely is each outcome?”
2. Tail Risk
Tail risk refers to events that are rare but have severe consequences when they occur. For example, sudden market crashes or extreme price movements may not happen often, but when they do, they can cause significant losses. Focusing on tail risk means not only paying attention to normal market fluctuations but also being particularly aware of these extreme scenarios, as the most serious risks often lie in these low-probability but high-impact events.
3. Risk Measurement
Risk measurement means using concrete numbers to quantify potential losses, especially in worst-case scenarios. For example, you might ask: “Under normal conditions, what is the maximum loss I could face?” or “What is the loss level that could be exceeded with a 5% probability?” By quantifying risk in this way, you gain a clearer understanding of how risky a strategy is, rather than evaluating it based solely on returns, allowing for more rational decision-making.
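The "loss exceeded with 5% probability" question corresponds to historical Value-at-Risk, which can be sketched by sorting observed returns and reading off the relevant quantile. The daily return figures below are invented for illustration.

```python
# Sketch: historical Value-at-Risk at the 5% level, i.e. the loss
# threshold exceeded on only 5% of observed days.

def historical_var(returns, level=0.05):
    ordered = sorted(returns)        # worst outcomes first
    idx = int(level * len(ordered))
    return -ordered[idx]             # reported as a positive loss

daily_returns = [-0.08, -0.03, -0.01, 0.00, 0.00, 0.01,
                 0.01, 0.02, 0.02, 0.03, 0.03, 0.04,
                 -0.02, 0.01, 0.00, 0.02, -0.01, 0.01,
                 0.00, 0.02]
print(f"5% one-day VaR: {historical_var(daily_returns):.1%}")
```

Note that this number says nothing about how bad losses get *beyond* the threshold; that is exactly the tail-risk blind spot the previous entry warns about.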
Roadmap of This Series: What You Will Learn
If you have realized that traditional machine learning methods cannot directly solve financial problems, the next step is to build a new modeling framework.
In this series, we will start from the most fundamental data processing steps and gradually construct a complete financial modeling system. We will first introduce return modeling and stationarity testing to help you understand the core logic of time series analysis. Then we will move on to volatility modeling, with a focus on the GARCH model and its applications in risk analysis. Building on this, we will extend to multivariate models, analyzing the dynamic relationships between different assets and introducing more advanced methods for modeling correlations.
In the final stage, we will connect these models with practical strategies, explaining how to conduct proper backtesting and how to avoid the common pitfalls of “backtest illusions.”
Conclusion
Once you truly understand the nature of financial data, it becomes clear that this is not a problem that can be solved by simply applying machine learning tools. It requires a fundamental shift in modeling thinking.
The essence of financial modeling lies not in how complex the model is, but in whether the data itself is correctly understood.