
Neural Networks with Kalman Filter for Trading

In quantitative finance, combining statistical filtering techniques with machine learning can provide robust insights into market dynamics. In this article, we explore two powerful tools—Neural Networks and the Kalman Filter—and show how they can be used together to predict the direction of asset price movements. We then outline a trading strategy that uses these predictions, backtests its performance, and compares it to a simple buy-and-hold approach.

1. Theoretical Background

1.1 Neural Networks

Neural networks are a class of machine learning models inspired by biological neural structures. They consist of layers of interconnected nodes (neurons) that transform inputs into outputs through a series of linear and nonlinear operations.

Feed-Forward Neural Network Model

A basic multilayer perceptron (MLP) can be mathematically described as follows:

Input Layer:
The network receives an input vector: \[\mathbf{x} = [x_1, x_2, \dots, x_n]^T\]
Hidden Layers:
Each hidden layer performs a linear transformation followed by a nonlinear activation function (e.g., ReLU or sigmoid). For layer \(l\): \[\mathbf{h}^{(l)} = \sigma \left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)\]

where:
- \(\mathbf{W}^{(l)}\) is the weight matrix.
- \(\mathbf{b}^{(l)}\) is the bias vector.
- \(\sigma\) is an activation function.
- For the first hidden layer (\(l = 1\)), the input is \(\mathbf{h}^{(0)} = \mathbf{x}\).
Output Layer:
The final layer computes the output: \[\hat{\mathbf{y}} = \text{softmax} \left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right)\]

The softmax function is often used for classification tasks to convert raw scores into probabilities.
Training via Backpropagation:
The network parameters \(\{\mathbf{W}^{(l)}, \mathbf{b}^{(l)}\}\) are optimized by minimizing a loss function \(L(\hat{\mathbf{y}}, \mathbf{y})\) (e.g., cross-entropy for classification) using gradient descent: \[\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}\]

where \(\theta\) represents all the parameters and \(\eta\) is the learning rate.

In our code, we use scikit-learn's MLPClassifier, a multilayer perceptron configured here with two hidden layers of 32 and 16 neurons, to predict the direction of asset price movements.
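To make the layer equations concrete, here is a minimal NumPy sketch of a forward pass. It is not scikit-learn's implementation: the helper names (`relu`, `softmax`, `mlp_forward`), the random weights, and the input values are assumptions chosen only to mirror the 32/16 layer sizes mentioned above.

```python
import numpy as np

def relu(a):
    # Elementwise nonlinearity sigma used in the hidden layers
    return np.maximum(a, 0.0)

def softmax(a):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(a - np.max(a))
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """Apply h^(l) = sigma(W^(l) h^(l-1) + b^(l)) for each hidden layer,
    then a softmax output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return softmax(weights[-1] @ h + biases[-1])

# Illustrative shapes only: 2 inputs -> 32 -> 16 -> 2 classes, random weights
rng = np.random.default_rng(0)
sizes = [2, 32, 16, 2]
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

y_hat = mlp_forward(np.array([0.5, -1.2]), weights, biases)
print(y_hat)  # two class probabilities that sum to 1
```

For a two-class problem, scikit-learn's MLPClassifier uses a single logistic output unit rather than a softmax; for two classes the two formulations give the same probabilities.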

1.2 Kalman Filter

The Kalman filter is a recursive algorithm used for estimating the state of a dynamic system from noisy observations. It is especially useful in financial applications where price signals are noisy.

Kalman Filter Equations

The filter works in two main steps: prediction and update.

Prediction Step:
- State Prediction: \[\hat{\mathbf{x}}_{k|k-1} = \mathbf{F} \hat{\mathbf{x}}_{k-1|k-1}\]
- Error Covariance Prediction: \[\mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}\]
Here, \(\mathbf{F}\) is the state transition model, and \(\mathbf{Q}\) is the process noise covariance.
Update Step:
- Kalman Gain: \[\mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}^T \left( \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R} \right)^{-1}\]
- State Update: \[\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_{k|k-1} \right)\]
- Error Covariance Update: \[\mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{H} \right) \mathbf{P}_{k|k-1}\]
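In these equations, \(\mathbf{H}\) is the observation model, \(\mathbf{R}\) is the measurement noise covariance, and \(\mathbf{z}_k\) is the measurement at time \(k\). The subscript \(k|k-1\) denotes an estimate for time \(k\) that uses measurements up to time \(k-1\), \(\mathbf{P}\) is the error covariance of the state estimate, and \(\mathbf{I}\) is the identity matrix.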

In our code, we use a custom KalmanFilter class to smooth the price series. The filter produces two outputs: a smoothed price and an estimated rate of change, which serve as features for the neural network.
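The prediction and update equations can be sketched as a single function. This is a generic illustration rather than the custom class used later; the function name `kalman_step` and the toy model matrices (a state of price and rate where only the price is observed) are assumptions.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Prediction step
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update step
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Assumed toy model: state = [price, rate], only the price is measured
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[0.1]])

x0, P0 = np.array([100.0, 0.0]), np.eye(2)
x1, P1 = kalman_step(x0, P0, np.array([101.0]), F, H, Q, R)
print(x1, P1[0, 0])
```

After one measurement of 101, the price estimate moves from 100 toward 101 and its variance drops below the initial value, which is the smoothing behaviour the strategy relies on.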

2. The Trading Strategy

The core idea is to predict the price direction for the next week (7 days) using the neural network. The target is defined as:

\[\text{direction} = \begin{cases} 1 & \text{if } \text{close}_{t+7} > \text{close}_t \\ -1 & \text{otherwise} \end{cases}\]
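As a small worked example of this target, the sketch below labels a made-up price series. The prices and the variable names (`horizon`, `future_close`) are illustrative assumptions; rows without a close 7 steps ahead are left unlabeled instead of defaulting to -1.

```python
import numpy as np
import pandas as pd

horizon = 7
close = pd.Series([100, 101, 99, 102, 103, 98, 97, 105, 100, 96], dtype=float)
future_close = close.shift(-horizon)

# 1 if the close 7 steps ahead is higher, otherwise -1
direction = pd.Series(np.where(future_close > close, 1, -1), index=close.index)

# Rows with no close 7 steps ahead have no defined label
direction = direction.where(future_close.notna())
print(direction)
```

Only the first three rows receive a label here, because the series has ten points and the horizon is seven.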

Trading Signal Generation

Long Position (+1):
If the model predicts a 1, the strategy goes long by buying at the current close and selling 7 days later.
Short Position (-1):
If the model predicts -1, the strategy goes short by selling (or taking a short position) at the current close and buying back 7 days later.
No Trade (0):
If the model’s confidence is below a threshold (e.g., 80%), the signal is set to 0, meaning no position is taken.
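The three cases above can be sketched as one vectorized mapping from class probabilities to a signal. The function name `to_signal` and the sample probabilities are assumptions for illustration; the 0.8 default follows the example threshold in the text.

```python
import numpy as np

def to_signal(prob_down, prob_up, threshold=0.8):
    """Map class probabilities to +1 (long), -1 (short), or 0 (no trade)."""
    prob_down = np.asarray(prob_down, dtype=float)
    prob_up = np.asarray(prob_up, dtype=float)
    side = np.where(prob_down > prob_up, -1, 1)       # more likely direction
    confidence = np.maximum(prob_down, prob_up)       # probability of that direction
    return np.where(confidence >= threshold, side, 0)

signals = to_signal([0.10, 0.85, 0.45], [0.90, 0.15, 0.55])
print(signals)
```

The third case has a slight edge toward "up" but falls below the threshold, so no position is taken.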

Backtesting the Strategy

For backtesting:

7-Day Returns:
The asset’s 7‑day return is computed as: \[\text{7d\_return}_t = \frac{\text{close}_{t+7}}{\text{close}_t} - 1\]
Strategy Return:
The trading return is given by: \[\text{strategy\_return}_t = \text{nn\_signal}_t \times \text{7d\_return}_t\]

The backtest aggregates these returns, computes cumulative performance (equity curve), and then compares the strategy to a buy-and-hold approach.
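A worked example of the two formulas on made-up numbers may help. The prices and signals below are assumptions chosen so the arithmetic is easy to follow.

```python
import pandas as pd

horizon = 7
close = pd.Series([100.0, 102.0, 101.0, 99.0, 98.0, 100.0, 103.0, 110.0, 96.9, 101.0])
signal = pd.Series([1, -1, 0, 0, 0, 0, 0, 0, 0, 0])

# 7-day forward return of the asset
ret_7d = close.shift(-horizon) / close - 1

# Signal times return: long keeps the sign, short flips it, 0 yields no return
strategy_return = signal * ret_7d
print(strategy_return)
```

The long on day 0 earns the asset's 10% rise (100 to 110), and the short on day 1 earns 5% from the asset's 5% fall (102 to 96.9). Rows without a close 7 days ahead come out as NaN. Because consecutive 7-day returns overlap, they should not be compounded day by day, which is why the equity curve later in the article simulates non-overlapping trades.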

3. Code Walkthrough

Below we break down key sections of the code, explaining how each component contributes to the overall strategy.

3.1 Data Acquisition and Preprocessing

from binance.client import Client
import pandas as pd
import numpy as np
import ta

# Download price data from Binance
client = Client()
pair = 'ETHUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1d', '1 year ago'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume', 
                'close_time', 'quote_asset_volume', 'trades', 
                'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)

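# Shift every column down one row so each timestamp holds the previous day's bar
# (avoids lookahead bias in indicator calculations), then drop the all-NaN first
# row that the shift creates
data = data.shift().iloc[1:]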

Explanation:

Data is fetched using the Binance API and converted into a DataFrame with proper datetime indexing.
The .shift() call moves every column down by one row, so each date holds the previous day's values and anything computed for a given date uses only data that was available before that day.
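To see the effect of the shift, here is a toy frame with made-up dates and prices (not the Binance data):

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
bars = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0]}, index=idx)

shifted = bars.shift()
print(shifted)
```

Each date now carries the previous day's close, and the first row is NaN because there is no earlier bar to move into it.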

3.2 Smoothing with the Kalman Filter

from KalmanFilter import KalmanFilter
kf = KalmanFilter(delta_t=1, process_var=1e-7, measurement_var=1e-1)
data[['kalman_price', 'kalman_rate']] = kf.filter(data['close'])

Explanation:

The Kalman filter is applied to the close price to produce a smoothed price and an estimated rate of change (velocity).
These filtered outputs are later used as features for the neural network.
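The custom KalmanFilter module is not shown in this article. As a rough idea of what such a class might look like, here is a hypothetical stand-in built on a constant-velocity model. The constructor arguments mirror the call above, but the state layout, the diagonal process noise, the initial covariance, and the return format are all assumptions.

```python
import numpy as np

class KalmanFilter:
    """Hypothetical constant-velocity filter; not the article's actual class."""

    def __init__(self, delta_t=1.0, process_var=1e-7, measurement_var=1e-1):
        self.F = np.array([[1.0, delta_t], [0.0, 1.0]])  # price += rate * delta_t
        self.H = np.array([[1.0, 0.0]])                  # only the price is observed
        self.Q = process_var * np.eye(2)                 # assumed diagonal process noise
        self.R = np.array([[measurement_var]])

    def filter(self, prices):
        z = np.asarray(prices, dtype=float)
        x = np.array([z[0], 0.0])   # start at the first price with zero rate
        P = np.eye(2)               # assumed initial uncertainty
        out = np.empty((len(z), 2))
        for k, zk in enumerate(z):
            # Prediction step
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            # Update step (the innovation covariance S is 1x1 here)
            S = self.H @ P @ self.H.T + self.R
            K = P @ self.H.T / S
            x = x + (K * (zk - self.H @ x)).ravel()
            P = (np.eye(2) - K @ self.H) @ P
            out[k] = x
        return out  # column 0: smoothed price, column 1: estimated rate

prices = 100.0 + 0.5 * np.arange(50)   # made-up linear uptrend
smoothed = KalmanFilter(delta_t=1, process_var=1e-7, measurement_var=1e-1).filter(prices)
print(smoothed[-1])
```

In this sketch a NaN input, such as the row a shift() leaves at the top of a series, would propagate into every later estimate, so missing values need to be dropped before filtering.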

3.3 Rolling Neural Network Training and Prediction

from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from tqdm import tqdm

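# Features used by the neural network (in this case, the Kalman outputs)
features = ['kalman_price', 'kalman_rate']
rolling_window = 30
horizon = 7  # prediction horizon in days (see Section 2)

# Target from Section 2: 1 if the close 7 days ahead is higher, otherwise -1.
# The last `horizon` rows have no future close, so the comparison is False
# and they receive -1.
data['direction'] = np.where(data['close'].shift(-horizon) > data['close'], 1, -1)

nn_predictions = []
nn_probabilities = []
actuals = []
prediction_dates = []

# Set up scaler and MLP neural network
scaler = StandardScaler()
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)

# Rolling window loop: Train and predict
for i in tqdm(range(rolling_window + horizon - 1, len(data) - 1)):
    # Train only on rows whose label is already known at row i: the label of
    # row j needs the close at row j + horizon, so the window ends at row i - horizon.
    train_end = i - horizon + 1
    X_train = data[features].iloc[train_end - rolling_window:train_end]
    y_train = data['direction'].iloc[train_end - rolling_window:train_end]
    X_train_scaled = scaler.fit_transform(X_train)

    # Train the network on the rolling window
    mlp.fit(X_train_scaled, y_train)

    # Predict for the current row (double brackets keep a one-row DataFrame,
    # so the scaler sees the feature names it was fitted with)
    X_next = data[features].iloc[[i]]
    X_next_scaled = scaler.transform(X_next)
    nn_pred = mlp.predict(X_next_scaled)[0]
    nn_prob = mlp.predict_proba(X_next_scaled)[0]
    nn_predictions.append(nn_pred)
    nn_probabilities.append(nn_prob)

    # Save the actual direction and prediction time
    actuals.append(data['direction'].iloc[i])
    prediction_dates.append(data.index[i])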

Explanation:

Rolling Window: The neural network is retrained on a moving window (30 days) to adapt to recent market behavior.
Scaling: Data is standardized using StandardScaler to ensure that features are on the same scale.
Prediction: The network predicts the 7-day direction for the current row. The class probabilities are stored for the confidence filter applied afterwards.

Adjusting Predictions Based on Confidence

# mlp.predict already returns the class with the higher probability,
# so the stored labels are kept and only the confidence filter is applied.
# Only accept predictions with high confidence (>= 80%)
for i in range(len(nn_predictions)):
    confidence = np.max(nn_probabilities[i])  # probability of the predicted class
    if confidence < 0.8:
        nn_predictions[i] = 0

data['nn_predictions'] = 0
data.loc[prediction_dates, 'nn_predictions'] = nn_predictions

Explanation:

The predicted direction is the class (-1 or 1) with the higher probability.
A confidence threshold is applied: if the probability of the predicted class is less than 80%, the model issues no trade (signal 0).

3.4 Performance Evaluation and 7-Day Return Calculation

# Calculate 7-day returns for the base asset
data['7d_return'] = (data['close'].shift(-7) / data['close']) - 1

# Calculate strategy returns based on NN predictions
data['nn_7d_return'] = data['nn_predictions'] * data['7d_return']

# Filter rows with valid predictions and returns
predicted_data = data[(data['nn_predictions'] != 0) & (data['7d_return'].notna())]

# Print success rate of NN predictions (only rows whose 7-day outcome is known)
predictions = predicted_data[['nn_predictions', 'direction']]
success_rate = (predictions['nn_predictions'] == predictions['direction']).mean() * 100
print("Neural Network Success Rate: {:.2f}%".format(success_rate))

Explanation:

7-Day Returns: The asset’s return over the next 7 days is calculated.
Strategy Return: The NN signal is multiplied by the 7-day return. A positive signal captures the asset return for a long position, and a negative signal inverts the return for a short position.
Success Rate: The percentage of correct predictions among the non-zero signals is computed.

3.5 Constructing and Plotting the Equity Curves

NN Strategy Equity Curve

# Initialize an equity column and starting capital
data['nn_equity'] = np.nan
equity = 1.0   # Starting capital
i = 0

# Simulate non-overlapping trades: enter at row i, exit at row i + 7, next entry at row i + 8
while i < len(data) - 7:
    idx_entry = data.index[i]
    data.at[idx_entry, 'nn_equity'] = equity

    signal = data['nn_predictions'].iloc[i]
    entry_price = data['close'].iloc[i]
    exit_price  = data['close'].iloc[i + 7]

    if signal == 1:
        trade_return = (exit_price - entry_price) / entry_price
    elif signal == -1:
        trade_return = (entry_price - exit_price) / entry_price
    else:
        trade_return = 0.0

    equity *= (1 + trade_return)
    idx_exit = data.index[i + 7]
    data.at[idx_exit, 'nn_equity'] = equity
    i += 8

# Fill missing equity values (assign back instead of calling inplace on a column selection)
data['nn_equity'] = data['nn_equity'].ffill().bfill()

# Convert equity to percent profit
data['nn_equity_pct'] = (data['nn_equity'] - 1.0) * 100

Explanation:

Trade Simulation: The code simulates entering a trade when a signal is generated, holds the position for 7 days, and then updates the equity.
Non-Overlapping Trades: After each entry check, the index is advanced by 8 days so that trades do not overlap. This also applies when the signal is 0, so signals on the skipped days are not traded.
Equity Curve: The cumulative equity is forward- and back-filled across the entire date range and then converted to percent profit.

Buy-and-Hold Equity Curve

# Buy and hold strategy: calculate daily returns and cumulative product
data['bh_return'] = data['close'].pct_change()
data['bh_equity'] = (1 + data['bh_return']).cumprod()
data['bh_equity_pct'] = (data['bh_equity'] - 1.0) * 100

Explanation:

This simple benchmark strategy simulates buying the asset at the beginning and holding it throughout the period.
The cumulative return is calculated as the cumulative product of one plus each daily return.

Plotting the Comparison

import matplotlib.pyplot as plt

plt.figure(figsize=(12,6))
plt.plot(data.index, data['bh_equity_pct'], label='Buy & Hold')
plt.plot(data.index, data['nn_equity_pct'], label='NN Strategy')
plt.title('Buy & Hold vs. NN Strategy (Percent Profit)')
plt.xlabel('Date')
plt.ylabel('Percent Profit (%)')
plt.legend()
plt.show()

Explanation:

The equity curves of the NN strategy and the buy-and-hold approach are plotted on the same time axis.
The y-axis is in percent profit, allowing for an intuitive comparison of overall performance.

4. Conclusion

In this article, we explored how neural networks and the Kalman filter can be integrated into a trading strategy:

Neural Networks provide a way to learn complex, nonlinear relationships from historical data, using layers of weighted transformations and activation functions.
Kalman Filters help smooth out noisy price data and estimate underlying trends; in this strategy their outputs (the smoothed price and its rate of change) are the features the network is trained on.
By training a neural network on a rolling window of past data and using its predictions to determine trading signals (long, short, or no trade), we can simulate a trading strategy that captures 7‑day returns.
The code further demonstrates how to backtest this strategy by constructing an equity curve and comparing it to a benchmark buy-and-hold strategy.

This framework is a starting point for further research and refinement. Future enhancements might include improved feature engineering, more sophisticated risk management, and alternative model architectures. As always, caution is advised when applying these techniques to live trading due to the challenges of market dynamics and overfitting.