Financial time series, like Bitcoin prices, are notoriously complex and volatile. While directly predicting price is challenging, analyzing and predicting underlying statistical properties can offer valuable insights. This article walks through a Python implementation that builds, trains, and evaluates a Long Short-Term Memory (LSTM) neural network to forecast the rolling autocorrelation of Bitcoin’s closing price. Autocorrelation measures the persistence of trends, and predicting it could potentially inform trading strategies or market analysis.
We’ll cover fetching data, calculating the target feature, preparing data for the LSTM, building and training the model with regularization, and finally evaluating its predictive performance.
1. Setting the Stage: Imports and Parameters
First, we import the necessary libraries: numpy and pandas for data manipulation, yfinance to fetch market data, matplotlib for plotting, sklearn for evaluation metrics and (optionally) scaling, math for calculations, and tensorflow.keras for building the LSTM model.
Python
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from math import sqrt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
import datetime
We then define key parameters for data fetching, feature calculation, and the LSTM model:
Python
# Data and Feature Parameters
ticker = 'BTC-USD'
start_date = '2023-01-01'
end_date = datetime.datetime.now().strftime('%Y-%m-%d')
rolling_window = 30 # Window for calculating autocorrelation
lag = 1 # Lag for autocorrelation (day-over-day)
# Model Hyperparameters
num_lags = 90 # How many past autocorrelation values to use as input
train_test_split = 0.80 # 80% for training, 20% for testing
num_neurons_in_hidden_layers = 128 # LSTM layer size
num_epochs = 100 # Max training epochs
batch_size = 20 # Samples per gradient update
dropout_rate = 0.1 # Regularization rate
2. Data Acquisition and Feature Engineering
We use yfinance to download historical Bitcoin price data.
Python
print(f"Fetching {ticker} data from {start_date} to {end_date}...")
data = yf.download(ticker, start=start_date, end=end_date)
# Clean up potential multi-level columns from yfinance
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.droplevel(1)
data = data['Close'] # We only need closing prices
data = data.dropna()
print(f"Data fetched successfully. Shape: {data.shape}")
The core feature we want to predict is the rolling autocorrelation. With lag = 1, this measures how correlated each day's closing price is with the previous day's closing price, calculated over the specified rolling_window. Note that the code applies it to the closing prices themselves, not to day-over-day price changes.
Python
print(f"Calculating {rolling_window}-day rolling autocorrelation (lag={lag})...")
rolling_autocorr_series = data.rolling(
    window=rolling_window
).apply(lambda x: x.autocorr(lag=lag), raw=False)  # Use pandas Series method
rolling_autocorr = rolling_autocorr_series.dropna().values # Drop initial NaNs
rolling_autocorr = np.reshape(rolling_autocorr, (-1)) # Ensure 1D shape
print(f"Rolling autocorrelation calculated. Shape: {rolling_autocorr.shape}")
Note: We use raw=False to ensure the apply function receives a pandas Series, which has the .autocorr() method.
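To make the rolling calculation concrete, here is a minimal sketch that checks the pandas result against a hand-rolled Pearson correlation between a window and its lagged copy. The helper manual_lag_autocorr, the seeded random-walk prices, and the shorter 10-day window are all assumptions made for illustration; they are not part of the article's pipeline.

```python
import numpy as np
import pandas as pd

def manual_lag_autocorr(window_values, lag=1):
    """Pearson correlation between a window and its lagged copy (illustrative helper)."""
    values = np.asarray(window_values, dtype=float)
    return np.corrcoef(values[lag:], values[:-lag])[0, 1]

# Synthetic stand-in for closing prices: a seeded random walk, not real market data.
rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(rng.normal(size=60)))

window = 10  # shorter than the article's 30 to keep the example small
rolling = prices.rolling(window=window).apply(
    lambda x: x.autocorr(lag=1), raw=False
)

# The last rolling value should match the manual calculation on the last window.
last_window = prices.iloc[-window:].to_numpy()
manual_value = manual_lag_autocorr(last_window, lag=1)
pandas_value = rolling.iloc[-1]

# The first (window - 1) entries are NaN, which is why the article calls dropna().
n_valid = int(rolling.notna().sum())
```

With 60 prices and a 10-day window, 51 valid values remain. The same arithmetic applies to the real series: the first rolling_window - 1 rows are lost before any lagged inputs are built.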
3. Preparing Data for the LSTM
LSTMs require input data in a specific format: sequences of past observations (features) paired with the next observation (target). We define a helper function data_preprocessing for this:
Python
def data_preprocessing(data_series, n_lags, train_split_ratio):
    """
    Prepares time series data into lags for supervised learning and splits.
    """
    X, y = [], []
    # Create sequences: Use 'n_lags' points to predict the next point
    for i in range(n_lags, len(data_series)):
        X.append(data_series[i-n_lags:i])
        y.append(data_series[i])
    X, y = np.array(X), np.array(y)
    # Split into training and testing sets
    split_index = int(len(X) * train_split_ratio)
    x_train = X[:split_index]
    y_train = y[:split_index]
    x_test = X[split_index:]
    y_test = y[split_index:]
    print(f"Data shapes: X_train={x_train.shape}, y_train={y_train.shape}, X_test={x_test.shape}, y_test={y_test.shape}")
    return x_train, y_train, x_test, y_test

# Create the datasets
x_train, y_train, x_test, y_test = data_preprocessing(
    rolling_autocorr, num_lags, train_test_split
)
This function iterates through the autocorrelation series, creating input sequences (X) of length num_lags and corresponding target values (y). It then splits these into training and testing sets.
LSTMs expect a 3D input shape: (samples, timesteps, features). Our timesteps dimension is num_lags, and we have 1 feature (the autocorrelation value).
Python
# Reshape Input for LSTM [samples, time steps, features]
x_train = x_train.reshape((-1, num_lags, 1))
x_test = x_test.reshape((-1, num_lags, 1))
print(f"Data reshaped for LSTM: x_train={x_train.shape}, x_test={x_test.shape}")
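To see the windowing, the chronological split, and the reshape on something small enough to inspect by hand, here is a sketch on a six-point toy series. The make_windows helper is hypothetical: it is a vectorised stand-in for the loop in data_preprocessing, built on NumPy's sliding_window_view, and is not the article's own code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, n_lags):
    """Hypothetical helper: vectorised equivalent of the windowing loop."""
    series = np.asarray(series, dtype=float)
    # Every window of length n_lags; the last one has no following target, so drop it.
    X = sliding_window_view(series, n_lags)[:-1]
    y = series[n_lags:]
    return X, y

toy_series = np.arange(6, dtype=float)  # 0, 1, 2, 3, 4, 5
X, y = make_windows(toy_series, n_lags=3)
# X rows: [0 1 2], [1 2 3], [2 3 4]; y: 3, 4, 5

# Chronological split: earlier windows train, later windows test.
split_index = int(len(X) * 0.80)
x_train_toy, x_test_toy = X[:split_index], X[split_index:]

# Add the trailing feature axis: (samples, timesteps, features).
x_train_3d = x_train_toy.reshape((-1, 3, 1))
```

Each row of X is followed in time by its target in y, and the split never shuffles, so the test windows always come after the training windows.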
4. Building the LSTM Model with Regularization
We use Keras’ Sequential API to define the model architecture. Key components include:
LSTM layer: The core recurrent layer that learns temporal dependencies.
BatchNormalization: Normalizes activations between layers, often leading to faster and more stable training.
Dropout: Randomly sets a fraction (dropout_rate) of input units to 0 during training, helping prevent overfitting.
Dense layer: A standard fully connected layer with one output neuron for our single predicted value.
Python
print("Building LSTM model...")
model = Sequential()
model.add(LSTM(units=num_neurons_in_hidden_layers, input_shape=(num_lags, 1)))
model.add(BatchNormalization()) # Regularization / Stability
model.add(Dropout(dropout_rate)) # Regularization
model.add(Dense(units=1)) # Output layer
# Compile: Define loss function and optimizer
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary() # Display model structure
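As a sanity check on what model.summary() reports, the parameter counts can be worked out by hand. The sketch below assumes default Keras layer settings (biases enabled, BatchNormalization with both scale and offset); the three helper functions are illustrative names, not Keras APIs.

```python
def lstm_param_count(units, input_dim):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def batchnorm_param_count(features):
    # gamma and beta are trainable; moving mean and variance are not.
    return 4 * features

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

units = 128  # num_neurons_in_hidden_layers in the article
lstm_params = lstm_param_count(units, input_dim=1)
bn_params = batchnorm_param_count(units)
dense_params = dense_param_count(units, outputs=1)
total_params = lstm_params + bn_params + dense_params

print(lstm_params, bn_params, dense_params, total_params)
```

Under these assumptions the LSTM layer accounts for nearly all of the roughly 67,000 parameters. That is a lot of capacity for a training set of a few hundred windows, and it is why the regularization layers matter here.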
5. Training the Model with Early Stopping
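To avoid unnecessary training time, we use EarlyStopping. This callback monitors a specified metric (here, the training loss, since no validation data is passed to fit) and stops training if it doesn’t improve for a set number of epochs (patience). restore_best_weights=True ensures the model weights from the best epoch are kept. Because training loss usually keeps falling as a model overfits, monitoring it mainly caps training time; guarding against overfitting would require monitoring val_loss on a held-out validation set.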
Python
# Early stopping implementation
early_stopping = EarlyStopping(monitor='loss', patience=15,
                               restore_best_weights=True, verbose=1)
print("Training model...")
history = model.fit(x_train, y_train,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    callbacks=[early_stopping],
                    verbose=1,
                    shuffle=False)  # Keep temporal order if needed
print("Training finished.")
if early_stopping.stopped_epoch > 0:
    print(f"Early stopping triggered at epoch {early_stopping.stopped_epoch + 1}")
Note: Using shuffle=False is often recommended for time series to maintain temporal sequence, although its impact might be less critical when using long input sequences (num_lags).
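The patience logic behind early stopping can be illustrated without Keras. The function below is a simplified simulation written for this walkthrough, not the callback's actual implementation: it ignores options such as min_delta and baseline, and the loss values are made up.

```python
def simulate_early_stopping(losses, patience):
    """Return (stopped_epoch, best_epoch) for a list of per-epoch losses.

    Epochs are 0-indexed. stopped_epoch is None if training runs to the end.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Made-up loss curve: improves for three epochs, then stalls.
toy_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.74]
stopped_epoch, best_epoch = simulate_early_stopping(toy_losses, patience=3)
```

Here training halts three epochs after the best loss, and restoring the best weights would roll the model back to the epoch recorded in best_epoch.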
6. Prediction and Evaluation
With the model trained, we generate predictions on both the training data (in-sample) and the unseen test data (out-of-sample).
Python
print("Predicting...")
y_predicted_train = model.predict(x_train).flatten()
y_predicted_test = model.predict(x_test).flatten()
# Prepare actual values (flatten)
y_train_flat = y_train.flatten()
y_test_flat = y_test.flatten()
We evaluate performance using several metrics:
Python
print("Evaluating performance...")
# Calculate Metrics
rmse_train = sqrt(mean_squared_error(y_train_flat, y_predicted_train))
rmse_test = sqrt(mean_squared_error(y_test_flat, y_predicted_test))
# (Assuming calculate_directional_accuracy function is defined as above)
accuracy_train = calculate_directional_accuracy(y_train_flat, y_predicted_train)
accuracy_test = calculate_directional_accuracy(y_test_flat, y_predicted_test)
min_len_train = min(len(y_train_flat), len(y_predicted_train))
min_len_test = min(len(y_test_flat), len(y_predicted_test))
correlation_train = np.corrcoef(y_train_flat[:min_len_train], y_predicted_train[:min_len_train])[0, 1]
correlation_test = np.corrcoef(y_test_flat[:min_len_test], y_predicted_test[:min_len_test])[0, 1]
# Print Results
print("\n--- Results ---")
# ... (print statements for metrics) ...
print("---------------\n")
Comparing the test metrics to the train metrics is crucial. If test performance is significantly worse, it indicates overfitting. Similar performance suggests the model generalizes well.
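The evaluation code calls calculate_directional_accuracy, whose definition does not appear in this section. A minimal sketch of one plausible definition follows; it assumes "direction" means the sign of the change between consecutive values, which may differ from the original helper.

```python
import numpy as np

def calculate_directional_accuracy(actual, predicted):
    """Percentage of steps where actual and predicted move in the same direction.

    Assumed definition: compare the signs of consecutive differences.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual_direction = np.sign(np.diff(actual))
    predicted_direction = np.sign(np.diff(predicted))
    return float(np.mean(actual_direction == predicted_direction) * 100)

# Toy values chosen for illustration only.
actual_toy = [1.0, 2.0, 1.5, 3.0]          # moves: up, down, up
same_moves = [1.0, 2.5, 2.0, 2.25]         # moves: up, down, up
mostly_wrong = [1.0, 0.5, 2.0, 2.5]        # moves: down, up, up

acc_same = calculate_directional_accuracy(actual_toy, same_moves)
acc_wrong = calculate_directional_accuracy(actual_toy, mostly_wrong)
```

A definition based on differences yields one fewer comparison than there are observations, so very short test sets give coarse percentages.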
7. Analysis of Results
The evaluation metrics provide quantitative insights into the model’s performance:
--- Results ---
Directional Accuracy Train = 72.96 %
Directional Accuracy Test = 73.61 %
RMSE Train = 0.10346005
RMSE Test = 0.07769025
Correlation In-Sample Predicted/Train = 0.971
Correlation Out-of-Sample Predicted/Test = 0.967
---------------
Let’s break down what these numbers tell us:
Correlation (Train: 0.971, Test: 0.967): These are very high correlation coefficients, close to 1.0. The model’s predictions track the level and general shape of the rolling autocorrelation closely, both on the data it was trained on and, more importantly, on the unseen test data. The minimal drop between train and test correlation points to good generalization. One caveat: consecutive 30-day windows share 29 of their 30 observations, so the target series changes slowly from day to day, and even a forecast that simply repeats the previous value would correlate strongly with it.
RMSE (Train: 0.103, Test: 0.078): The Root Mean Squared Error measures the typical magnitude of the prediction error. Given that autocorrelation ranges from -1 to +1, these RMSE values are relatively low. The test RMSE is also lower than the train RMSE, so there is no sign of overfitting on this metric. That is consistent with the regularization (Batch Normalization and Dropout) doing its job, although a lower test error can also arise when the test period is simply smoother than the training period.
Directional Accuracy (Train: 72.96%, Test: 73.61%): Both values are well above 50%, indicating the model is considerably better than random chance at predicting whether the autocorrelation will increase or decrease in the next time step. As with RMSE, the test accuracy is slightly higher than the train accuracy, which again suggests the model generalizes well.
Synthesis: Overall, these metrics paint a positive picture. The LSTM model learned to predict the one-step-ahead 30-day rolling autocorrelation with high correlation, relatively low error magnitude (low RMSE), and good directional correctness. On all three metrics, test performance matches or exceeds training performance, so the model shows no sign of overfitting on this dataset.
8. Visualizing the Forecast
While metrics provide quantitative scores, a visual inspection helps confirm the model’s behavior.
Python
print("Plotting results...")
# (Assuming plot_train_test_values function is defined as above)
plot_train_test_values(n_train_plot=300, n_test_plot=len(y_test_flat),
                       y_train=y_train_flat,
                       y_test=y_test_flat,
                       y_predicted=y_predicted_test)
Plot Interpretation:
The plot visually confirms the strong performance indicated by the metrics.
This visual confirmation reinforces our confidence that the model has successfully learned the underlying short-term dynamics of the rolling autocorrelation series in this dataset.
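One way to put both the metrics and the plot in context is to score a naive persistence forecast, which predicts that tomorrow's rolling autocorrelation equals today's. The sketch below is an addition to the article's workflow rather than part of it; the persistence_baseline helper and the toy values are assumptions for illustration, and in practice y_test_flat would be passed in.

```python
import numpy as np

def persistence_baseline(y_actual):
    """Score a naive forecast that repeats the previous observed value."""
    y_actual = np.asarray(y_actual, dtype=float).ravel()
    naive_pred = y_actual[:-1]  # forecast for step t is the value at t-1
    target = y_actual[1:]
    rmse = float(np.sqrt(np.mean((target - naive_pred) ** 2)))
    corr = float(np.corrcoef(target, naive_pred)[0, 1])
    return rmse, corr

# Toy series standing in for y_test_flat (illustrative values, not real results).
toy = np.array([0.10, 0.15, 0.12, 0.20, 0.26, 0.24, 0.30])
baseline_rmse, baseline_corr = persistence_baseline(toy)
print(f"Persistence RMSE = {baseline_rmse:.4f}, correlation = {baseline_corr:.3f}")
```

If the LSTM's test RMSE and correlation are clearly better than these baseline figures on the real series, that supports the claim that the model has learned more than simple persistence.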
Conclusion
This article demonstrated the complete workflow for building, training, and evaluating an LSTM model to forecast the rolling autocorrelation of Bitcoin prices. Key steps included fetching data, calculating the autocorrelation feature, preparing sequences for the LSTM, defining a regularized model architecture, training with early stopping, and evaluating using relevant metrics like RMSE, correlation, and directional accuracy.
While this model predicts a statistical feature rather than price directly, understanding and forecasting market persistence through autocorrelation could be a valuable component in developing more sophisticated trading algorithms or market analysis tools. Further work could involve hyperparameter tuning, exploring different model architectures, or integrating these predictions into a full backtesting framework like backtrader.