Neural Networks with Kalman Filter for Trading

In quantitative finance, statistical filtering and machine learning complement each other: a filter suppresses noise in the price series, and a model learns from the cleaned signal. In this article, we explore two such tools, neural networks and the Kalman filter, and show how they can be used together to predict the direction of asset price movements. We then outline a trading strategy that uses these predictions, backtest its performance, and compare it to a simple buy-and-hold approach.


1. Theoretical Background

1.1 Neural Networks

Neural networks are a class of machine learning models inspired by biological neural structures. They consist of layers of interconnected nodes (neurons) that transform inputs into outputs through a series of linear and nonlinear operations.

Feed-Forward Neural Network Model

A basic multilayer perceptron (MLP) can be mathematically described as follows:

  1. Input Layer:
    The network receives an input vector: \[\mathbf{x} = [x_1, x_2, \dots, x_n]^T\]

  2. Hidden Layers:
    Each hidden layer performs a linear transformation followed by a nonlinear activation function (e.g., ReLU or sigmoid). For layer \(l\): \[\mathbf{h}^{(l)} = \sigma \left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)\]

    where:

    • \(\mathbf{W}^{(l)}\) is the weight matrix.
    • \(\mathbf{b}^{(l)}\) is the bias vector.
    • \(\sigma\) is an activation function.
    • For \(l = 1\), \(\mathbf{h}^{(0)} = \mathbf{x}\).
  3. Output Layer:
    The final layer computes the output: \[\hat{\mathbf{y}} = \text{softmax} \left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right)\]

    The softmax function is often used for classification tasks to convert raw scores into probabilities.

  4. Training via Backpropagation:
    The network parameters \(\{\mathbf{W}^{(l)}, \mathbf{b}^{(l)}\}\) are optimized by minimizing a loss function \(\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y})\) (e.g., cross-entropy for classification) using gradient descent, with the gradients computed by backpropagation: \[\theta \leftarrow \theta - \eta \frac{\partial \mathcal{L}}{\partial \theta}\]

    where \(\theta\) represents all the parameters and \(\eta\) is the learning rate.

In our code, we use MLPClassifier from scikit-learn, a multilayer perceptron implementation, with two hidden layers of 32 and 16 neurons to predict the direction of asset price movements.
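To make the layer equations above concrete, here is a minimal NumPy sketch of the forward pass only (no training). The random weights, the ReLU choice for \(\sigma\), and the helper names are illustrative assumptions, not scikit-learn internals.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU hidden layers followed by a softmax output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                      # h^(l) = sigma(W^(l) h^(l-1) + b^(l))
    return softmax(weights[-1] @ h + biases[-1])

rng = np.random.default_rng(0)
sizes = [2, 32, 16, 2]  # 2 inputs, hidden layers of 32 and 16 neurons, 2 classes
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = np.array([0.5, -1.2])        # arbitrary example input
y_hat = mlp_forward(x, weights, biases)
print(y_hat, y_hat.sum())        # two class probabilities that sum to 1
```

The layer sizes mirror the (32, 16) architecture used later in the article; the weights here are untrained, so the probabilities carry no predictive meaning.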


1.2 Kalman Filter

The Kalman filter is a recursive algorithm used for estimating the state of a dynamic system from noisy observations. It is especially useful in financial applications where price signals are noisy.

Kalman Filter Equations

The filter works in two main steps: prediction and update.

  1. Prediction Step:

    • State Prediction: \[\hat{\mathbf{x}}_{k|k-1} = \mathbf{F} \hat{\mathbf{x}}_{k-1|k-1}\]
    • Error Covariance Prediction: \[\mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}\]

    Here, \(\mathbf{F}\) is the state transition model, and \(\mathbf{Q}\) is the process noise covariance.

  2. Update Step:

    • Kalman Gain: \[\mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}^T \left( \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R} \right)^{-1}\]
    • State Update: \[\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_{k|k-1} \right)\]
    • Error Covariance Update: \[\mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{H} \right) \mathbf{P}_{k|k-1}\]

    In these equations, \(\mathbf{H}\) is the observation model, \(\mathbf{R}\) is the measurement noise covariance, and \(\mathbf{z}_k\) is the measurement at time \(k\).

In our code, we use a custom KalmanFilter class to smooth the price series. The filter produces two outputs: a smoothed price and an estimated rate of change, which serve as features for the neural network.
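The custom KalmanFilter class itself is not listed in this article. As a rough sketch of what such a class could look like, the following implements the prediction and update equations above for a constant-velocity model with state [price, rate]. The class name SimpleKalmanFilter, the diagonal process noise, and the initial state and covariance are assumptions made for illustration.

```python
import numpy as np

class SimpleKalmanFilter:
    """Hypothetical constant-velocity filter; state = [price, rate]."""

    def __init__(self, delta_t=1.0, process_var=1e-7, measurement_var=1e-1):
        self.F = np.array([[1.0, delta_t], [0.0, 1.0]])  # state transition
        self.H = np.array([[1.0, 0.0]])                  # only the price is observed
        self.Q = process_var * np.eye(2)                 # assumed diagonal process noise
        self.R = np.array([[measurement_var]])           # measurement noise

    def filter(self, prices):
        z = np.asarray(prices, dtype=float)
        x = np.array([z[0], 0.0])   # assumed start: first observation, zero rate
        P = np.eye(2)               # assumed initial uncertainty
        out = np.empty((len(z), 2))
        for k, z_k in enumerate(z):
            # Prediction step
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            # Update step
            S = self.H @ P @ self.H.T + self.R
            K = P @ self.H.T @ np.linalg.inv(S)
            x = x + (K @ (z_k - self.H @ x)).ravel()
            P = (np.eye(2) - K @ self.H) @ P
            out[k] = x              # column 0: smoothed price, column 1: rate
        return out

rng = np.random.default_rng(1)
noisy_prices = 100 + 0.5 * np.arange(50) + rng.normal(scale=2.0, size=50)
estimates = SimpleKalmanFilter().filter(noisy_prices)
flat = SimpleKalmanFilter().filter(np.full(20, 5.0))   # noise-free constant series
```

On a noise-free constant series the innovation is zero at every step, so the estimate stays at the observed price with a zero rate, which is a quick sanity check for an implementation like this.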


2. The Trading Strategy

The core idea is to predict the price direction for the next week (7 days) using the neural network. The target is defined as:

\[\text{direction} = \begin{cases} 1 & \text{if } \text{close}_{t+7} > \text{close}_t \\ -1 & \text{otherwise} \end{cases}\]
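The target definition above can be sketched in pandas as follows. The helper name label_direction and the toy price series are illustrative assumptions; rows without a close 7 steps ahead are marked as missing here rather than given a label.

```python
import numpy as np
import pandas as pd

def label_direction(close: pd.Series, horizon: int = 7) -> pd.Series:
    """Return 1 where the close `horizon` rows ahead is higher, else -1."""
    future = close.shift(-horizon)
    direction = pd.Series(np.where(future > close, 1, -1),
                          index=close.index, dtype=float)
    direction[future.isna()] = np.nan   # no future close yet, so no label
    return direction

close = pd.Series([100.0, 101.0, 99.0, 102.0, 98.0,
                   103.0, 97.0, 104.0, 100.0, 96.0])
labels = label_direction(close)
print(labels)
```

Only the first three rows of this toy series have a close 7 rows later, so only they receive a label.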

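Trading Signal Generation

The network's class probabilities are turned into one of three signals: 1 (go long) when an up move is predicted, -1 (go short) when a down move is predicted, and 0 (stay flat) when the probability of the predicted class is below 80%.

Backtesting the Strategy

For backtesting, each trade is entered at the current close and exited at the close 7 days later. A long trade earns the 7-day price return, a short trade earns its negative, and a flat signal earns zero. Trades do not overlap.

The backtest compounds these trade returns into an equity curve and then compares the strategy to a buy-and-hold approach.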
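Written as equations, one way to express the per-trade return and the compounding used by the backtest is the following, where \(s_k \in \{-1, 0, 1\}\) is the signal and \(t_k\) the entry day of the \(k\)-th trade (symbols introduced here for illustration):

\[r_k = s_k \cdot \frac{\text{close}_{t_k+7} - \text{close}_{t_k}}{\text{close}_{t_k}}, \qquad E_{k+1} = E_k \, (1 + r_k), \qquad E_0 = 1\]

For example, a short trade (\(s_k = -1\)) entered at 110 and exited at 99 gives \(r_k = -1 \cdot (99 - 110)/110 = 0.10\), a 10% gain.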


3. Code Walkthrough

Below we break down key sections of the code, explaining how each component contributes to the overall strategy.

3.1 Data Acquisition and Preprocessing

from binance.client import Client
import pandas as pd
import numpy as np
import ta

# Download price data from Binance
client = Client()
pair = 'ETHUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1d', '1 year ago'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume', 
                'close_time', 'quote_asset_volume', 'trades', 
                'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)

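# Shift data by one row to avoid lookahead bias in indicator calculations,
# then drop the first row, which is empty after the shift
data = data.shift().iloc[1:]

Explanation:

  • The script downloads one year of daily ETHUSDC candles from Binance, names the columns, converts the timestamp into a datetime index, and casts the OHLCV columns to float.
  • Shifting every column down by one row means the row for day t holds the candle of day t-1, so anything computed on row t uses only data that was available before day t.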
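A small toy example, using made-up prices, shows the effect of the one-row shift on a derived indicator:

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0],
                  index=pd.date_range("2024-01-01", periods=4, freq="D"),
                  name="close")

shifted = close.shift()            # row t now holds the close of day t-1
sma2 = shifted.rolling(2).mean()   # 2-day average built only from earlier closes

print(pd.DataFrame({"close": close, "shifted": shifted, "sma2": sma2}))
```

The average on the third day is computed from the first two closes only; the third day's own close does not enter it.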

3.2 Smoothing with the Kalman Filter

from KalmanFilter import KalmanFilter
kf = KalmanFilter(delta_t=1, process_var=1e-7, measurement_var=1e-1)
data[['kalman_price', 'kalman_rate']] = kf.filter(data['close'])

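Explanation:

  • The custom KalmanFilter class is configured with a time step of 1 (one daily bar), a very small process variance (1e-7), and a much larger measurement variance (1e-1). With this ratio the filter trusts its own trend estimate far more than any single close, so the output is heavily smoothed.
  • kf.filter returns two columns: kalman_price (the smoothed price) and kalman_rate (the estimated rate of change per bar). Both are used as features for the neural network.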

3.3 Rolling Neural Network Training and Prediction

from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from tqdm import tqdm

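# Target from Section 2: 1 if the close 7 rows ahead is higher, else -1.
# The last 7 rows have no future close yet, so their label defaults to -1
# and should not be used for evaluation.
data['direction'] = np.where(data['close'].shift(-7) > data['close'], 1, -1)

# Features used by the neural network (in this case, the Kalman outputs)
features = ['kalman_price', 'kalman_rate']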
rolling_window = 30
nn_predictions = []
nn_probabilities = []
actuals = []
prediction_dates = []

# Set up scaler and MLP neural network
scaler = StandardScaler()
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)

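# Rolling window loop: train and predict
horizon = 7  # label horizon; must match the 7-day target
for i in tqdm(range(rolling_window + horizon - 1, len(data) - 1)):
    # Train only on rows whose 7-day label is already known at row i;
    # more recent rows would leak future closes into training
    train_end = i - horizon + 1
    X_train = data[features].iloc[train_end - rolling_window:train_end]
    y_train = data['direction'].iloc[train_end - rolling_window:train_end]
    X_train_scaled = scaler.fit_transform(X_train)

    # Train the network on the rolling window
    mlp.fit(X_train_scaled, y_train)

    # Predict for the current row (kept as a DataFrame so feature names match)
    X_next = data[features].iloc[[i]]
    X_next_scaled = scaler.transform(X_next)
    nn_pred = mlp.predict(X_next_scaled)[0]
    nn_prob = mlp.predict_proba(X_next_scaled)[0]
    nn_predictions.append(nn_pred)
    nn_probabilities.append(nn_prob)

    # Save the actual direction and prediction time
    actuals.append(data['direction'].iloc[i])
    prediction_dates.append(data.index[i])

Explanation:

  • For each row i, the scaler and the network are refit on a window of 30 past rows, and the fitted model then predicts the direction for row i.
  • The label of a row depends on the close 7 rows later, so the training window ends 7 rows before the prediction row. Training on more recent rows would use closes that are not yet known at prediction time.
  • Both the predicted label and the class probabilities are stored; the probabilities are used afterwards to filter out low-confidence predictions.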
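The index arithmetic of a walk-forward split with a label gap can be checked in isolation. The generator below is a minimal sketch; the name walk_forward_splits and its return format are assumptions made for illustration.

```python
def walk_forward_splits(n_rows, window, horizon):
    """Yield (train_start, train_end, predict_row), with train_end exclusive.

    The label of row j needs the close at row j + horizon, so at prediction
    row i the most recent usable training row is i - horizon.
    """
    for i in range(window + horizon - 1, n_rows):
        train_end = i - horizon + 1
        yield train_end - window, train_end, i

splits = list(walk_forward_splits(n_rows=40, window=30, horizon=7))
for train_start, train_end, predict_row in splits:
    print(f"train rows [{train_start}, {train_end}) -> predict row {predict_row}")
```

With 40 rows, a 30-row window, and a 7-row horizon, the first prediction is made at row 36 using training rows 0 to 29, whose labels depend on closes up to row 36 at most.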

Adjusting Predictions Based on Confidence

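# Keep the label returned by mlp.predict and use the highest class
# probability as its confidence. The columns of predict_proba follow
# mlp.classes_, so hard-coded column indices are avoided.
# Only accept predictions with high confidence (>= 80%)
for i in range(len(nn_predictions)):
    confidence = nn_probabilities[i].max()
    if confidence < 0.8:
        nn_predictions[i] = 0

data['nn_predictions'] = 0
data.loc[prediction_dates, 'nn_predictions'] = nn_predictions

Explanation:

  • The confidence of a prediction is the probability the network assigns to the class it predicted.
  • Predictions with a confidence below 80% are replaced by 0, which means no trade is taken.
  • The filtered signals are written back to the DataFrame on their prediction dates; all other rows keep the default signal of 0.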
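The same thresholding can be written in vectorized form. This is a small sketch on made-up labels and probabilities; the function name filter_by_confidence is an assumption made for illustration.

```python
import numpy as np

def filter_by_confidence(labels, probabilities, threshold=0.8):
    """Keep a label only if its highest class probability reaches the threshold."""
    labels = np.asarray(labels)
    confidence = np.asarray(probabilities).max(axis=1)
    return np.where(confidence >= threshold, labels, 0)

labels = [1, -1, 1]
probabilities = [[0.10, 0.90],   # confident up move
                 [0.85, 0.15],   # confident down move
                 [0.45, 0.55]]   # weak up move, filtered out
signals = filter_by_confidence(labels, probabilities)
print(signals)
```

A higher threshold produces fewer trades, so the choice of 0.8 trades off signal quality against sample size in the backtest.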

3.4 Performance Evaluation and 7-Day Return Calculation

# Calculate 7-day returns for the base asset
data['7d_return'] = (data['close'].shift(-7) / data['close']) - 1

# Calculate strategy returns based on NN predictions
data['nn_7d_return'] = data['nn_predictions'] * data['7d_return']

# Filter rows with valid predictions and returns
predicted_data = data[(data['nn_predictions'] != 0) & (data['7d_return'].notna())]

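# Print success rate of NN predictions, using only rows with a nonzero
# signal whose 7-day outcome is already known
predictions = predicted_data[['nn_predictions', 'direction']]
success_rate = (predictions['nn_predictions'] == predictions['direction']).mean() * 100
print("Neural Network Success Rate: {:.2f}%".format(success_rate))

Explanation:

  • 7d_return is the forward 7-day return of the asset, and nn_7d_return is that return signed by the signal (positive when the predicted direction was right).
  • predicted_data keeps only rows with a nonzero signal and a known 7-day return.
  • The success rate is the share of those rows where the signal matches the realized direction.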

3.5 Constructing and Plotting the Equity Curves

NN Strategy Equity Curve

# Initialize an equity column and starting capital
data['nn_equity'] = np.nan
equity = 1.0   # Starting capital
i = 0

# Simulate non-overlapping trades: enter at row i, exit at row i + 7,
# then start the next trade at row i + 8
while i < len(data) - 7:
    idx_entry = data.index[i]
    data.at[idx_entry, 'nn_equity'] = equity

    signal = data['nn_predictions'].iloc[i]
    entry_price = data['close'].iloc[i]
    exit_price  = data['close'].iloc[i + 7]

    if signal == 1:
        trade_return = (exit_price - entry_price) / entry_price
    elif signal == -1:
        trade_return = (entry_price - exit_price) / entry_price
    else:
        trade_return = 0.0

    equity *= (1 + trade_return)
    idx_exit = data.index[i + 7]
    data.at[idx_exit, 'nn_equity'] = equity
    i += 8

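# Fill missing equity values (assign the result back; calling ffill/bfill
# with inplace=True on a selected column may not update the DataFrame)
data['nn_equity'] = data['nn_equity'].ffill().bfill()

# Convert equity to percent profit
data['nn_equity_pct'] = (data['nn_equity'] - 1.0) * 100

Explanation:

  • Equity starts at 1.0. Each trade enters at the close of row i and exits at the close 7 rows later; a long trade earns the price return, a short trade earns its negative, and a signal of 0 earns nothing.
  • Equity is compounded after every trade and recorded at the entry and exit rows. The loop then advances 8 rows, so trades never overlap.
  • Rows between recorded points are forward-filled, and the curve is converted to percent profit. Fees and slippage are not modeled.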
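The compounding logic can be verified on a toy series where the result is easy to compute by hand. The function below is a stripped-down sketch of the same loop that returns only the final equity; the name simulate_equity and the prices are made up for illustration.

```python
def simulate_equity(close, signals, horizon=7):
    """Compound non-overlapping trades; returns final equity from a start of 1.0."""
    equity = 1.0
    i = 0
    while i < len(close) - horizon:
        entry, exit_price = close[i], close[i + horizon]
        # signal 1 = long, -1 = short, 0 = flat (zero return)
        trade_return = signals[i] * (exit_price - entry) / entry
        equity *= 1.0 + trade_return
        i += horizon + 1
    return equity

close = [100.0] * 16
close[7] = 110.0    # exit price of the first trade
close[8] = 110.0    # entry price of the second trade
close[15] = 99.0    # exit price of the second trade

signals = [0] * 16
signals[0] = 1      # long from row 0 to row 7: +10%
signals[8] = -1     # short from row 8 to row 15: +10%

final_equity = simulate_equity(close, signals)
print(final_equity)
```

Two consecutive 10% gains compound to 1.1 × 1.1 = 1.21, which is what the loop should produce up to floating-point rounding.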

Buy-and-Hold Equity Curve

# Buy and hold strategy: calculate daily returns and cumulative product
data['bh_return'] = data['close'].pct_change()
data['bh_equity'] = (1 + data['bh_return']).cumprod()
data['bh_equity_pct'] = (data['bh_equity'] - 1.0) * 100

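Explanation:

  • bh_return is the daily percentage change of the close.
  • The cumulative product of (1 + daily return) gives the value of one unit of capital invested on the first day and held throughout, which is then converted to percent profit.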
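As a quick check on made-up prices, compounding daily returns gives the same curve as dividing each close by the first close:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 121.0, 108.9])

bh_equity = (1 + close.pct_change()).cumprod()   # compounded daily returns
direct = close / close.iloc[0]                   # price relative to the first day

print(pd.DataFrame({"bh_equity": bh_equity, "direct": direct}))
```

The first value of bh_equity is NaN because the first daily return is undefined; the remaining rows agree with the direct ratio.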

Plotting the Comparison

import matplotlib.pyplot as plt

plt.figure(figsize=(12,6))
plt.plot(data.index, data['bh_equity_pct'], label='Buy & Hold')
plt.plot(data.index, data['nn_equity_pct'], label='NN Strategy')
plt.title('Buy & Hold vs. NN Strategy (Percent Profit)')
plt.xlabel('Date')
plt.ylabel('Percent Profit (%)')
plt.legend()
plt.show()

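Explanation:

  • Both percent-profit curves are plotted on the same axes, so the strategy can be compared directly with buy-and-hold over the same period.

[Figure: equity curve comparison, Buy & Hold vs. NN Strategy (percent profit)]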

4. Conclusion

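In this article, we explored how neural networks and the Kalman filter can be integrated into a trading strategy:

  • The Kalman filter smooths the price series and provides a smoothed price and an estimated rate of change as features.
  • An MLP classifier, retrained on a rolling window, predicts the 7-day price direction.
  • Only predictions with at least 80% confidence are turned into long or short signals.
  • The resulting non-overlapping 7-day trades are backtested and compared to buy-and-hold.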

This framework is a starting point for further research and refinement. Future enhancements might include improved feature engineering, more sophisticated risk management, and alternative model architectures. As always, caution is advised when applying these techniques to live trading due to the challenges of market dynamics and overfitting.