Forecasting Bitcoin Autocorrelation with 74% Directional Accuracy using LSTMs

Financial time series such as Bitcoin prices are notoriously complex and volatile. While directly predicting price is challenging, analyzing and predicting underlying statistical properties can offer valuable insights. This article walks through a Python implementation that builds, trains, and evaluates a Long Short-Term Memory (LSTM) neural network to forecast the rolling autocorrelation of Bitcoin’s closing price. Autocorrelation measures how strongly a series is correlated with its own lagged values, which makes it a gauge of persistence, and predicting it could inform trading strategies or market analysis.

We’ll cover fetching data, calculating the target feature, preparing data for the LSTM, building and training the model with regularization, and finally evaluating its predictive performance.

1. Setting the Stage: Imports and Parameters

First, we import the necessary libraries: numpy and pandas for data manipulation, yfinance to fetch market data, matplotlib for plotting, sklearn for evaluation metrics (MinMaxScaler is imported for optional scaling but is not used in the code shown here), math for the square root, datetime for today's date, and tensorflow.keras for building the LSTM model.

Python

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from math import sqrt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
import datetime

We then define key parameters for data fetching, feature calculation, and the LSTM model:

Python

# Data and Feature Parameters
ticker = 'BTC-USD'
start_date = '2023-01-01'
end_date = datetime.datetime.now().strftime('%Y-%m-%d')
rolling_window = 30 # Window for calculating autocorrelation
lag = 1             # Lag for autocorrelation (day-over-day)

# Model Hyperparameters
num_lags = 90       # How many past autocorrelation values to use as input
train_test_split = 0.80 # 80% for training, 20% for testing
num_neurons_in_hidden_layers = 128 # LSTM layer size
num_epochs = 100    # Max training epochs
batch_size = 20     # Samples per gradient update
dropout_rate = 0.1  # Regularization rate

2. Data Acquisition and Feature Engineering

We use yfinance to download historical Bitcoin price data.

Python

print(f"Fetching {ticker} data from {start_date} to {end_date}...")
data = yf.download(ticker, start=start_date, end=end_date)
# Clean up potential multi-level columns from yfinance
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.droplevel(1)
data = data['Close'] # We only need closing prices
data = data.dropna()
print(f"Data fetched successfully. Shape: {data.shape}")

The core feature we want to predict is the rolling autocorrelation. Within each rolling_window of closing prices, it measures how correlated the closing price on one day is with the closing price lag days earlier (here, the previous day). Note that the code applies this to price levels, not to daily price changes.

Python

print(f"Calculating {rolling_window}-day rolling autocorrelation (lag={lag})...")
rolling_autocorr_series = data.rolling(
    window=rolling_window
).apply(lambda x: x.autocorr(lag=lag), raw=False) # Use pandas Series method

rolling_autocorr = rolling_autocorr_series.dropna().values # Drop initial NaNs
rolling_autocorr = np.reshape(rolling_autocorr, (-1)) # Ensure 1D shape
print(f"Rolling autocorrelation calculated. Shape: {rolling_autocorr.shape}")

Note: We use raw=False to ensure the apply function receives a pandas Series, which has the .autocorr() method.
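To make the calculation concrete, here is a minimal NumPy sketch of what the pandas expression computes, assuming that Series.autocorr(lag) is the Pearson correlation between a series and a copy of itself shifted by lag. The function name rolling_autocorr_numpy and the toy prices are illustrative and not part of the original code.

```python
import numpy as np

def rolling_autocorr_numpy(values, window, lag=1):
    """Pearson correlation between each window and its lagged self.

    Hypothetical helper for illustration; it mirrors
    Series.rolling(window).apply(lambda x: x.autocorr(lag)).
    """
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)  # first window-1 entries stay NaN
    for end in range(window, len(values) + 1):
        w = values[end - window:end]
        # Pair every observation with the one `lag` steps before it
        out[end - 1] = np.corrcoef(w[lag:], w[:-lag])[0, 1]
    return out

# Toy closing prices (made up for the example)
toy_prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0])
toy_autocorr = rolling_autocorr_numpy(toy_prices, window=5, lag=1)
print(toy_autocorr)
```

With a window of 5 and a lag of 1, each output value is a correlation over only 4 pairs, which is why short windows give noisy estimates.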

3. Preparing Data for the LSTM

LSTMs require input data in a specific format: sequences of past observations (features) paired with the next observation (target). We define a helper function data_preprocessing for this:

Python

def data_preprocessing(data_series, n_lags, train_split_ratio):
    """
    Prepares time series data into lags for supervised learning and splits.
    """
    X, y = [], []
    # Create sequences: Use 'n_lags' points to predict the next point
    for i in range(n_lags, len(data_series)):
        X.append(data_series[i-n_lags:i])
        y.append(data_series[i])
    X, y = np.array(X), np.array(y)

    # Split into training and testing sets
    split_index = int(len(X) * train_split_ratio)
    x_train = X[:split_index]
    y_train = y[:split_index]
    x_test = X[split_index:]
    y_test = y[split_index:]

    print(f"Data shapes: X_train={x_train.shape}, y_train={y_train.shape}, X_test={x_test.shape}, y_test={y_test.shape}")
    return x_train, y_train, x_test, y_test

# Create the datasets
x_train, y_train, x_test, y_test = data_preprocessing(
    rolling_autocorr, num_lags, train_test_split
)

This function iterates through the autocorrelation series, creating input sequences (X) of length num_lags and corresponding target values (y). It then splits these into training and testing sets.
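As a small worked example of this windowing, the sketch below builds the same (X, y) pairs for a toy series using NumPy's sliding_window_view instead of the explicit loop. It is a vectorized alternative shown for illustration, not the article's function; the name make_lagged_dataset and the toy values are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_lagged_dataset(series, n_lags):
    """Return X of shape (len(series) - n_lags, n_lags) and the matching y."""
    series = np.asarray(series, dtype=float)
    windows = sliding_window_view(series, n_lags)
    X = windows[:-1]        # the final window has no next value to predict
    y = series[n_lags:]     # value that follows each window
    return X, y

toy_series = np.arange(10, dtype=float)   # 0.0, 1.0, ..., 9.0
X_toy, y_toy = make_lagged_dataset(toy_series, n_lags=3)

split_index = int(len(X_toy) * 0.80)      # chronological 80/20 split
X_toy_train, X_toy_test = X_toy[:split_index], X_toy[split_index:]

print(X_toy[0], y_toy[0])                 # first window and its target
print(X_toy_train.shape, X_toy_test.shape)
```

Ten observations with three lags give seven samples, so the number of usable samples is always the series length minus the number of lags.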

LSTMs expect a 3D input shape: (samples, timesteps, features). Our timesteps dimension is num_lags, and we have 1 feature (the autocorrelation value).

Python

# Reshape Input for LSTM [samples, time steps, features]
x_train = x_train.reshape((-1, num_lags, 1))
x_test = x_test.reshape((-1, num_lags, 1))
print(f"Data reshaped for LSTM: x_train={x_train.shape}, x_test={x_test.shape}")

4. Building the LSTM Model with Regularization

We use Keras’ Sequential API to define the model architecture. Key components include:

LSTM layer: The core recurrent layer that learns temporal dependencies.
BatchNormalization: Normalizes activations between layers, often leading to faster and more stable training.
Dropout: Randomly sets a fraction (dropout_rate) of input units to 0 during training, helping prevent overfitting.
Dense layer: A standard fully connected layer with one output neuron for our single predicted value.

Python

print("Building LSTM model...")
model = Sequential()
model.add(LSTM(units=num_neurons_in_hidden_layers, input_shape=(num_lags, 1)))
model.add(BatchNormalization()) # Regularization / Stability
model.add(Dropout(dropout_rate)) # Regularization
model.add(Dense(units=1))       # Output layer

# Compile: Define loss function and optimizer
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary() # Display model structure

5. Training the Model with Early Stopping

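To avoid unnecessary training time, we use EarlyStopping. This callback monitors a specified metric (here, the training loss) and stops training if it doesn’t improve for a set number of epochs (patience). restore_best_weights=True ensures the model weights from the best epoch are kept. Because the monitored metric is the training loss rather than a validation loss, the callback mainly detects when training has plateaued; guarding against overfitting in this way would require monitoring a held-out validation loss (for example, val_loss with a validation split).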

Python

# Early stopping implementation
early_stopping = EarlyStopping(monitor='loss', patience=15,
                             restore_best_weights=True, verbose=1)

print("Training model...")
history = model.fit(x_train, y_train,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    callbacks=[early_stopping],
                    verbose=1,
                    shuffle=False) # Keep temporal order if needed

print("Training finished.")
if early_stopping.stopped_epoch > 0:
    print(f"Early stopping triggered at epoch {early_stopping.stopped_epoch + 1}")

Note: Using shuffle=False is often recommended for time series to maintain temporal sequence, although its impact might be less critical when using long input sequences (num_lags).
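The patience rule can be illustrated without Keras. The sketch below is a simplified, pure-Python stand-in for the stopping logic (it ignores options such as min_delta); the function name and the loss values are made up for the example.

```python
def simulate_early_stopping(losses, patience):
    """Return (stopped_epoch, best_epoch) for a list of per-epoch losses.

    stopped_epoch is None if training would run through every epoch.
    Epoch indices are zero-based.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical training losses: the best value occurs at epoch 2
toy_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.90]
stopped_epoch, best_epoch = simulate_early_stopping(toy_losses, patience=3)
print(f"Stopped at epoch {stopped_epoch}, best epoch was {best_epoch}")
```

In this toy run, training stops three epochs after the last improvement, and restoring the best weights would roll the model back to the epoch with the lowest monitored loss.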

6. Prediction and Evaluation

With the model trained, we generate predictions on both the training data (in-sample) and the unseen test data (out-of-sample).

Python

print("Predicting...")
y_predicted_train = model.predict(x_train).flatten()
y_predicted_test = model.predict(x_test).flatten()

# Prepare actual values (flatten)
y_train_flat = y_train.flatten()
y_test_flat = y_test.flatten()

We evaluate performance using several metrics:

RMSE (Root Mean Squared Error): Measures the average magnitude of prediction errors. Lower is better.
Correlation: Measures how well the predicted values track the actual values (ranging from -1 to +1). Higher (closer to 1) is better.
Directional Accuracy: Measures the percentage of times the model correctly predicted whether the autocorrelation would increase or decrease compared to the previous day. Higher is better (> 50% suggests predictive ability).
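The directional accuracy helper is not shown in this article. A minimal sketch follows, assuming that "direction" means the sign of the day-over-day change and that the actual and predicted series are compared change by change. The function body is an assumption rather than the original implementation, so it may not reproduce the reported figures exactly.

```python
import numpy as np

def calculate_directional_accuracy(actual_values, predicted_values):
    """Percentage of steps where predicted and actual changes share a sign.

    Assumed implementation for illustration only.
    """
    actual_directions = np.sign(np.diff(actual_values))
    predicted_directions = np.sign(np.diff(predicted_values))
    return float(np.mean(actual_directions == predicted_directions) * 100)

# Toy values (made up): three of the four changes agree in sign
toy_actual = np.array([0.10, 0.20, 0.15, 0.30, 0.25])
toy_predicted = np.array([0.10, 0.18, 0.20, 0.28, 0.20])
toy_accuracy = calculate_directional_accuracy(toy_actual, toy_predicted)
print(f"Directional accuracy on toy data: {toy_accuracy:.2f} %")
```

An alternative convention compares each prediction with the previous actual value instead of the previous prediction; the two can give different percentages, so it is worth stating which one is used.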

Python

print("Evaluating performance...")
# Calculate Metrics
rmse_train = sqrt(mean_squared_error(y_train_flat, y_predicted_train))
rmse_test = sqrt(mean_squared_error(y_test_flat, y_predicted_test))

# calculate_directional_accuracy is a helper function that must be defined before this step
accuracy_train = calculate_directional_accuracy(y_train_flat, y_predicted_train)
accuracy_test = calculate_directional_accuracy(y_test_flat, y_predicted_test)

min_len_train = min(len(y_train_flat), len(y_predicted_train))
min_len_test = min(len(y_test_flat), len(y_predicted_test))
correlation_train = np.corrcoef(y_train_flat[:min_len_train], y_predicted_train[:min_len_train])[0, 1]
correlation_test = np.corrcoef(y_test_flat[:min_len_test], y_predicted_test[:min_len_test])[0, 1]

# Print Results
print("\n--- Results ---")
# ... (print statements for metrics) ...
print("---------------\n")

Comparing the test metrics to the train metrics is crucial. If test performance is significantly worse, it indicates overfitting. Similar performance suggests the model generalizes well.
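Because the target is a slowly moving rolling statistic, it can also help to compare the model against a naive persistence forecast that simply repeats the previous value. The sketch below is an additional sanity check that is not part of the original workflow; the function name and toy values are illustrative.

```python
import numpy as np

def persistence_baseline(y_true):
    """Score the forecast 'next value = current value' on a 1D series."""
    y_true = np.asarray(y_true, dtype=float)
    actual = y_true[1:]      # values to be predicted
    naive = y_true[:-1]      # previous value used as the prediction
    rmse = float(np.sqrt(np.mean((actual - naive) ** 2)))
    correlation = float(np.corrcoef(actual, naive)[0, 1])
    return rmse, correlation

# Toy target series (made up for the example)
toy_target = np.array([0.00, 0.10, 0.30, 0.20, 0.25, 0.40])
baseline_rmse, baseline_corr = persistence_baseline(toy_target)
print(f"Persistence RMSE = {baseline_rmse:.4f}, correlation = {baseline_corr:.3f}")
```

Applied to y_test_flat, these baseline numbers give a reference point: a model that clearly beats them is adding information beyond simple persistence.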

7. Analysis of Results

The evaluation metrics provide quantitative insights into the model’s performance:

--- Results ---
Directional Accuracy Train = 72.96 %
Directional Accuracy Test  = 73.61 %
RMSE Train = 0.10346005
RMSE Test  = 0.07769025
Correlation In-Sample Predicted/Train = 0.971
Correlation Out-of-Sample Predicted/Test = 0.967
---------------

Let’s break down what these numbers tell us:

Correlation (Train: 0.971, Test: 0.967): These correlation coefficients are very close to 1.0. They indicate that the model’s predictions track the movements of the rolling autocorrelation closely, both on the data it was trained on and, more importantly, on the unseen test data. The minimal drop between train and test correlation points to good generalization. One caveat: consecutive 30-day windows share 29 of their 30 prices, so the target changes slowly from day to day, and a high correlation is easier to reach on such a series than on raw prices or returns.
RMSE (Train: 0.103, Test: 0.078): The Root Mean Squared Error measures the typical magnitude of the prediction error. Given that autocorrelation ranges from -1 to +1, these RMSE values are relatively low. The Test RMSE is also lower than the Train RMSE. This is consistent with the regularization (Batch Normalization and Dropout) keeping overfitting in check, although it may partly reflect a test period in which the autocorrelation was simply easier to predict.
Directional Accuracy (Train: 72.96%, Test: 73.61%): Both values are well above 50%, indicating the model is considerably better than random chance at predicting whether the autocorrelation will increase or decrease in the next time step. As with RMSE, the test accuracy is slightly higher than the train accuracy, which again suggests that the model generalizes well.

Synthesis: Overall, these metrics paint a positive picture. The LSTM model learned to predict the one-step-ahead 30-day rolling autocorrelation with high correlation, relatively low error magnitude, and good directional correctness. Its test performance is on par with its training performance, so these results show no sign of overfitting.

8. Visualizing the Forecast

While metrics provide quantitative scores, a visual inspection helps confirm the model’s behavior.
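The plotting helper is likewise not shown in this article. The following is a hypothetical sketch that matches the call signature used in this section: it draws the tail of the training series, the actual test values (green), and the predicted test values (red, dashed). The color of the training line, the labels, and the layout are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_train_test_values(n_train_plot, n_test_plot, y_train, y_test, y_predicted):
    """Plot the last training points, then actual vs. predicted test points.

    Assumed implementation for illustration only.
    """
    train_tail = np.asarray(y_train)[-n_train_plot:]
    actual = np.asarray(y_test)[:n_test_plot]
    predicted = np.asarray(y_predicted)[:n_test_plot]

    # Test points continue on the x-axis where the training tail ends
    train_x = np.arange(len(train_tail))
    test_x = np.arange(len(train_tail), len(train_tail) + len(actual))

    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(train_x, train_tail, color="black", label="Training data")
    ax.plot(test_x, actual, color="green", label="Actual Test values")
    ax.plot(test_x, predicted, color="red", linestyle="--",
            label="Predicted Test values")
    ax.axvline(len(train_tail), color="gray", linewidth=1)  # train/test boundary
    ax.set_xlabel("Time step")
    ax.set_ylabel("Rolling autocorrelation")
    ax.legend()
    ax.grid(True)
    return fig, ax
```

When running as a script rather than in a notebook, call plt.show() after the function to display the figure.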

Python

print("Plotting results...")
# plot_train_test_values is a helper function that must be defined before this step
plot_train_test_values(n_train_plot=300, n_test_plot=len(y_test_flat),
                       y_train=y_train_flat,
                       y_test=y_test_flat,
                       y_predicted=y_predicted_test)

Plot Interpretation:

The plot visually confirms the strong performance indicated by the metrics.

The red dashed line (Predicted Test values) closely follows the overall pattern and major fluctuations of the green line (Actual Test values).
This visual alignment corroborates the high correlation score (0.967).
While the prediction captures the general trend and turning points well, it doesn’t perfectly match every peak and trough, which is expected and reflected in the non-zero RMSE (0.078). The predictions appear slightly smoother in some sections compared to the actual data.

This visual confirmation reinforces our confidence that the model has successfully learned the underlying short-term dynamics of the rolling autocorrelation series in this dataset.

Conclusion

This article demonstrated the complete workflow for building, training, and evaluating an LSTM model to forecast the rolling autocorrelation of Bitcoin prices. Key steps included fetching data, calculating the autocorrelation feature, preparing sequences for the LSTM, defining a regularized model architecture, training with early stopping, and evaluating using relevant metrics like RMSE, correlation, and directional accuracy.

While this model predicts a statistical feature rather than price directly, understanding and forecasting market persistence through autocorrelation could be a valuable component in developing more sophisticated trading algorithms or market analysis tools. Further work could involve hyperparameter tuning, exploring different model architectures, or integrating these predictions into a full backtesting framework like backtrader.