Usage of ML models in financial time series prediction

May 10, 2024

Correlation Analysis:

Purpose: Measures the strength and direction of a linear relationship between two securities' prices.
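Method: Calculate the Pearson correlation coefficient over sliding windows to see how the correlation changes over time. Computing it on returns rather than raw prices avoids the spurious correlation that two trending price series can show.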
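
The sliding-window calculation can be sketched with pandas as follows. This is a minimal illustration, not a definitive implementation: the helper name rolling_correlation, the 60-bar window, and the synthetic minute bars are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def rolling_correlation(price_a: pd.Series, price_b: pd.Series, window: int) -> pd.Series:
    """Pearson correlation of simple returns over a sliding window."""
    # Returns are used instead of raw prices: two trending price series can
    # look highly correlated even when their moves are unrelated.
    ret_a = price_a.pct_change()
    ret_b = price_b.pct_change()
    return ret_a.rolling(window).corr(ret_b)

# Synthetic minute bars (made-up data, for illustration only).
# Both securities share a common return component plus their own noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-05-10 09:30", periods=300, freq="min")
common = rng.normal(0, 0.001, 300)
price_a = pd.Series(100 * np.exp(np.cumsum(common + rng.normal(0, 0.0005, 300))), index=idx)
price_b = pd.Series(50 * np.exp(np.cumsum(common + rng.normal(0, 0.0005, 300))), index=idx)

corr = rolling_correlation(price_a, price_b, window=60)
print(corr.dropna().describe())
```

The first 60 values are NaN because a full window of returns is not yet available. A shorter window reacts faster to regime changes but gives noisier estimates.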

Cointegration Test:

Purpose: Determines whether two or more non-stationary series are cointegrated, meaning some linear combination of them is stationary, so the series share a stable long-term relationship even though each one wanders on its own.
Method: Use the Engle-Granger two-step method for a single pair, or the Johansen test when there are more than two series or possibly more than one equilibrium relationship.

Vector Autoregression (VAR):

Purpose: Captures the linear interdependencies among multiple time series.
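Method: Model each security’s price as a function of its own lagged (past) values and the lagged values of the other security. A standard VAR assumes stationary inputs, so it is usually fitted to returns or differenced prices. The fitted coefficients help you understand the impact of one security’s price changes on the other.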

Granger Causality Test:

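Purpose: Tests whether past values of one time series improve forecasts of another. This identifies predictive lead-lag relationships between the prices, not causation in the strict sense.
Method: A statistical hypothesis test that compares a forecast of one series from its own lags against a forecast that also includes lags of the other series. The null hypothesis is that the extra lags add no predictive power.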
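
To make the mechanics concrete, here is a hand-rolled sketch of the F-test form of the Granger test using NumPy and SciPy. The helper names lagged and granger_f_test, the lag count of 2, and the synthetic data are assumptions for illustration; statsmodels provides grangercausalitytests for real use.

```python
import numpy as np
from scipy import stats

def lagged(series: np.ndarray, lags: int) -> np.ndarray:
    """Columns are series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k: n - k] for k in range(1, lags + 1)])

def granger_f_test(y: np.ndarray, x: np.ndarray, lags: int):
    """F-test of H0: lags of x add nothing to a forecast of y from its own lags."""
    target = y[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, lags)])         # own lags only
    unrestricted = np.hstack([restricted, lagged(x, lags)])  # plus lags of x

    def ssr(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_num = lags
    df_den = len(target) - unrestricted.shape[1]
    f_stat = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value

# Synthetic returns (illustrative only): y depends on the previous value of x.
rng = np.random.default_rng(7)
n = 1000
x = rng.normal(0, 1, n)
y = np.zeros(n)
y[1:] = 0.5 * x[:-1] + rng.normal(0, 1, n - 1)

f_xy, p_xy = granger_f_test(y, x, lags=2)  # does x help forecast y?
f_yx, p_yx = granger_f_test(x, y, lags=2)  # does y help forecast x?
print(f"x -> y: F={f_xy:.1f}, p={p_xy:.4g}")
print(f"y -> x: F={f_yx:.1f}, p={p_yx:.4g}")
```

Running the test in both directions matters: a significant result in only one direction points to a lead-lag relationship, while significance in both suggests feedback or a shared driver.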

Machine Learning Models:

Recurrent Neural Networks (RNNs): RNNs, and Long Short-Term Memory (LSTM) models in particular, suit time-series data because they carry context from previous points forward to inform the current point.
Gated Recurrent Units (GRUs): Similar to LSTMs but with fewer gates and parameters, which can make them faster to train in some cases.
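
To show what "simpler" means in practice, the sketch below implements the forward pass of a single GRU layer in plain NumPy and compares its parameter count with an LSTM of the same size. It is untrained and purely illustrative. The function names, the weight initialization, and the single bias vector per block are assumptions (some libraries use two bias vectors, which changes the counts). A real model would be built and trained with a deep learning framework.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def init_gru(input_size, hidden_size, rng):
    """Random weights for the three GRU blocks: update (z), reset (r), candidate (n)."""
    def block():
        return {
            "W": rng.normal(0, 0.1, (hidden_size, input_size)),   # input weights
            "U": rng.normal(0, 0.1, (hidden_size, hidden_size)),  # recurrent weights
            "b": np.zeros(hidden_size),                           # single bias (assumption)
        }
    return {"z": block(), "r": block(), "n": block()}

def gru_step(p, x_t, h_prev):
    """One time step: gates decide how much of the old state to keep."""
    z = sigmoid(p["z"]["W"] @ x_t + p["z"]["U"] @ h_prev + p["z"]["b"])
    r = sigmoid(p["r"]["W"] @ x_t + p["r"]["U"] @ h_prev + p["r"]["b"])
    n = np.tanh(p["n"]["W"] @ x_t + p["n"]["U"] @ (r * h_prev) + p["n"]["b"])
    return (1.0 - z) * n + z * h_prev

def gru_forward(p, window, hidden_size):
    """Run a (time, features) window through the cell and return the last state."""
    h = np.zeros(hidden_size)
    for x_t in window:
        h = gru_step(p, x_t, h)
    return h

rng = np.random.default_rng(0)
input_size, hidden_size = 2, 8           # e.g. returns of two securities
params = init_gru(input_size, hidden_size, rng)
window = rng.normal(0, 1, (30, input_size))  # 30 past steps of synthetic inputs
h_last = gru_forward(params, window, hidden_size)

# Parameter counts: a GRU has 3 weight blocks, an LSTM has 4.
per_block = hidden_size * (input_size + hidden_size) + hidden_size
gru_params = sum(a.size for blk in params.values() for a in blk.values())
lstm_params = 4 * per_block
print(gru_params, lstm_params)
```

The final hidden state would normally be passed to a small output layer that produces the forecast.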

Considerations:

Data Quality: Ensure data is clean and accurate, handling any outliers or missing data appropriately.
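Stationarity: Correlation, VAR, and Granger causality analyses generally assume stationary inputs, so use differencing or transformations such as logarithms (for example, log returns) where needed. Cointegration tests are the exception: they are applied to the non-stationary price levels themselves.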
Sampling Frequency: Your results can vary significantly based on the frequency of the data points (minute-by-minute in this case). Ensure that the sampling frequency is adequate for capturing the dynamics you're interested in.
Look-Back Period: The choice of how much past data (look-back period) to include in your models affects performance, especially for machine learning models.
Model Complexity: More complex models may provide better insights but also require more data and computational power. Simpler models might be more interpretable and quicker to run.
Economic Events: Be aware of macroeconomic events or market conditions that could affect the securities' prices and might need to be included in the model as exogenous variables.
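
Two of the considerations above, stationarity and the look-back period, translate directly into data preparation. The sketch below converts prices to log returns and then cuts them into fixed-length windows for a sequence model. The helper names, the 30-step look-back, the 80/20 split, and the synthetic prices are assumptions for illustration.

```python
import numpy as np

def log_returns(prices: np.ndarray) -> np.ndarray:
    """First difference of log prices; shape (T, k) -> (T - 1, k)."""
    return np.diff(np.log(prices), axis=0)

def make_windows(series: np.ndarray, lookback: int):
    """Inputs of shape (N, lookback, k) and next-step targets of shape (N, k)."""
    n_samples = len(series) - lookback
    X = np.stack([series[i:i + lookback] for i in range(n_samples)])
    y = series[lookback:]
    return X, y

# Synthetic minute prices for two securities (made-up data).
rng = np.random.default_rng(3)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.001, (500, 2)), axis=0))

rets = log_returns(prices)               # shape (499, 2)
X, y = make_windows(rets, lookback=30)   # X: (469, 30, 2), y: (469, 2)

# Split chronologically so the model is never trained on data that comes
# after the period it is evaluated on.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
print(X_train.shape, X_test.shape)
```

A longer look-back gives the model more context per sample but leaves fewer samples and more parameters to fit, so it is worth treating the look-back length as a tunable setting.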