In quantitative finance, combining statistical filtering techniques with machine learning can provide robust insights into market dynamics. In this article, we explore two powerful tools—Neural Networks and the Kalman Filter—and show how they can be used together to predict the direction of asset price movements. We then outline a trading strategy that uses these predictions, backtests its performance, and compares it to a simple buy-and-hold approach.
Neural networks are a class of machine learning models inspired by biological neural structures. They consist of layers of interconnected nodes (neurons) that transform inputs into outputs through a series of linear and nonlinear operations.
A basic multilayer perceptron (MLP) can be mathematically described as follows:
Input Layer:
The network receives an input vector: \[\mathbf{x} = [x_1, x_2, \dots, x_n]^T\]
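Hidden Layers:
Each hidden layer performs a linear transformation followed by a nonlinear activation function (e.g., ReLU or sigmoid). For layer \(l\): \[\mathbf{h}^{(l)} = \sigma \left( \mathbf{W}^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)} \right)\]
where \(\mathbf{W}^{(l)}\) is the weight matrix of layer \(l\), \(\mathbf{b}^{(l)}\) is its bias vector, \(\sigma\) is the activation function, and \(\mathbf{h}^{(0)} = \mathbf{x}\) is the input.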
Output Layer:
The final layer computes the output: \[\hat{\mathbf{y}} = \text{softmax} \left( \mathbf{W}^{(L)} \mathbf{h}^{(L-1)} + \mathbf{b}^{(L)} \right)\]
The softmax function is often used for classification tasks to convert raw scores into probabilities.
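The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's implementation: the layer sizes (2 inputs, hidden layers of 32 and 16, 2 output classes), the random weights, and the helper names `relu`, `softmax`, and `mlp_forward` are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    # Element-wise ReLU activation
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU hidden layers followed by a softmax output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return softmax(weights[-1] @ h + biases[-1])

# Assumed architecture: 2 inputs -> 32 -> 16 -> 2 classes, random (untrained) weights
rng = np.random.default_rng(0)
sizes = [2, 32, 16, 2]
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = np.array([0.5, -1.2])
y_hat = mlp_forward(x, weights, biases)
print(y_hat, y_hat.sum())  # two class probabilities that sum to 1
```

Because the weights are untrained, the probabilities themselves carry no meaning here; the point is only the shape of the computation.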
Training via Backpropagation:
The network parameters \(\{\mathbf{W}^{(l)}, \mathbf{b}^{(l)}\}\) are optimized by minimizing a loss function \(L(\hat{\mathbf{y}}, \mathbf{y})\) (e.g., cross-entropy for classification) using gradient descent: \[\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}\]
where \(\theta\) represents all the parameters and \(\eta\) is the learning rate.
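To make the update rule concrete, here is a small sketch that applies it to a single-layer logistic model rather than a full MLP (backpropagation applies the same kind of update to every layer). The toy data, the learning rate of 0.5, and the number of steps are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y):
    # Binary cross-entropy; eps guards against log(0)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy, linearly separable data (assumed for the example)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = np.zeros(2)  # parameters
eta = 0.5            # learning rate
losses = []
for _ in range(100):
    p = sigmoid(X @ theta)
    losses.append(cross_entropy(p, y))
    grad = X.T @ (p - y) / len(y)  # dL/dtheta for logistic cross-entropy
    theta = theta - eta * grad     # theta <- theta - eta * dL/dtheta

print(losses[0], losses[-1])  # the loss decreases over the iterations
```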
In our code, we use MLPClassifier from scikit-learn, which implements a multilayer perceptron with hidden layers (in our case, with sizes 32 and 16 neurons) to predict the direction of asset price movements.
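As a quick illustration of this classifier outside the trading context, the sketch below fits an MLPClassifier with the same hidden layer sizes on synthetic data. The data and the train/test split are assumptions for the example; only the `hidden_layer_sizes`, `max_iter`, and `random_state` settings mirror the article's code.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data with labels in {-1, 1} (assumed for the example)
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

# Fit the scaler on the training part only, then reuse it on the test part
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X[:250])
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)
mlp.fit(X_scaled, y[:250])

X_test = scaler.transform(X[250:])
pred = mlp.predict(X_test)
proba = mlp.predict_proba(X_test)  # columns follow the order of mlp.classes_

print(mlp.classes_, (pred == y[250:]).mean())
```

The columns of `predict_proba` follow `mlp.classes_`, which scikit-learn stores in sorted order, so with labels -1 and 1 the first column is the probability of -1.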
The Kalman filter is a recursive algorithm used for estimating the state of a dynamic system from noisy observations. It is especially useful in financial applications where price signals are noisy.
The filter works in two main steps: prediction and update.
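Prediction Step:
In its standard form (without a control input), the prediction step propagates the state estimate and its covariance: \[\hat{\mathbf{x}}_{k|k-1} = \mathbf{F} \hat{\mathbf{x}}_{k-1|k-1}\]
\[\mathbf{P}_{k|k-1} = \mathbf{F} \mathbf{P}_{k-1|k-1} \mathbf{F}^T + \mathbf{Q}\]
Here, \(\hat{\mathbf{x}}\) is the state estimate, \(\mathbf{P}\) is its error covariance, \(\mathbf{F}\) is the state transition model, and \(\mathbf{Q}\) is the process noise covariance.
Update Step:
\[\mathbf{K}_k = \mathbf{P}_{k|k-1} \mathbf{H}^T \left( \mathbf{H} \mathbf{P}_{k|k-1} \mathbf{H}^T + \mathbf{R} \right)^{-1}\]
\[\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \left( \mathbf{z}_k - \mathbf{H} \hat{\mathbf{x}}_{k|k-1} \right)\]
\[\mathbf{P}_{k|k} = \left( \mathbf{I} - \mathbf{K}_k \mathbf{H} \right) \mathbf{P}_{k|k-1}\]
In these equations, \(\mathbf{K}_k\) is the Kalman gain, \(\mathbf{H}\) is the observation model, \(\mathbf{R}\) is the measurement noise covariance, and \(\mathbf{z}_k\) is the measurement at time \(k\).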
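The two steps can be sketched in NumPy using the standard Kalman filter recursion. The function names `kf_predict` and `kf_update` and the scalar random-walk example (with its noise values and measurements) are assumptions for illustration, not part of the article's code.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    # Propagate the state estimate and its covariance one step ahead
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    # Blend the prediction with the new measurement z
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new

# Scalar random-walk example: the state is a single slowly varying level
F = np.array([[1.0]])
Q = np.array([[1e-4]])
H = np.array([[1.0]])
R = np.array([[0.1]])
x = np.array([0.0])
P = np.array([[1.0]])

for z in [1.0, 1.2, 0.9, 1.1]:
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, np.array([z]), H, R)

print(x, P)  # the estimate settles near 1 and the covariance shrinks
```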
In our code, we use a custom KalmanFilter class to smooth the price series. The filter produces two outputs: a smoothed price and an estimated rate of change, which serve as features for the neural network.
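The custom class itself is not shown in the article. As a rough idea of what such a filter could look like, the sketch below uses a constant-velocity model whose state is (price, rate of change). The class name `ConstantVelocityKalman`, the diagonal process noise, the initial state, and the return format are all assumptions; only the parameter names `delta_t`, `process_var`, and `measurement_var` are taken from the article's constructor call.

```python
import numpy as np

class ConstantVelocityKalman:
    """Hypothetical stand-in for the article's custom KalmanFilter class."""

    def __init__(self, delta_t=1.0, process_var=1e-7, measurement_var=1e-1):
        self.F = np.array([[1.0, delta_t], [0.0, 1.0]])  # price += rate * delta_t
        self.H = np.array([[1.0, 0.0]])                  # only the price is observed
        self.Q = process_var * np.eye(2)                 # assumed diagonal process noise
        self.R = np.array([[measurement_var]])

    def filter(self, prices):
        prices = np.asarray(prices, dtype=float)
        x = np.array([prices[0], 0.0])  # start at the first price with zero rate
        P = np.eye(2)
        out = np.empty((len(prices), 2))
        for k, z in enumerate(prices):
            # Prediction step
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            # Update step
            S = self.H @ P @ self.H.T + self.R
            K = P @ self.H.T @ np.linalg.inv(S)
            x = x + K @ (np.array([z]) - self.H @ x)
            P = (np.eye(2) - K @ self.H) @ P
            out[k] = x
        return out  # column 0: smoothed price, column 1: estimated rate

# Noisy linear trend with slope 0.5 per step (synthetic data for the example)
rng = np.random.default_rng(1)
true_price = 100 + 0.5 * np.arange(200)
noisy = true_price + rng.normal(scale=2.0, size=200)

kf = ConstantVelocityKalman(delta_t=1, process_var=1e-5, measurement_var=4.0)
smoothed = kf.filter(noisy)
print(smoothed[-1])  # last smoothed price and rate estimate
```

On this synthetic trend the second column converges toward the true slope, which is the kind of "rate of change" feature the article feeds to the network.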
The core idea is to predict the price direction for the next week (7 days) using the neural network. The target is defined as:
\[\text{direction} = \begin{cases} 1 & \text{if } \text{close}_{t+7} > \text{close}_t \\ -1 & \text{otherwise} \end{cases}\]
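A minimal pandas sketch of this target follows, using a made-up price series. One detail worth noting: the last 7 rows have no future close, so a plain comparison would silently label them -1. Masking those rows, as done here, is an assumption about how one might handle them and is not taken from the article.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for illustration
close = pd.Series([100, 101, 99, 102, 103, 98, 97, 105, 100, 104], dtype=float)
horizon = 7

future = close.shift(-horizon)  # close 7 rows ahead
direction = pd.Series(np.where(future > close, 1, -1), index=close.index)
direction = direction.where(future.notna())  # rows without a future close get NaN

print(direction.tolist())
```

Only the first three rows have a close 7 steps ahead, so only they receive a label (1, -1, and 1 respectively).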
Long Position (+1):
If the model predicts a 1, the strategy goes long by buying at the current close and selling 7 days later.
Short Position (-1):
If the model predicts -1, the strategy goes short by selling (or taking a short position) at the current close and buying back 7 days later.
No Trade (0):
If the model’s confidence is below a threshold (e.g., 80%), the signal is set to 0, meaning no position is taken.
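The three rules above can be expressed compactly as a function of the class probabilities. The helper name `signals_from_proba` and the example probabilities are assumptions; the 0.8 threshold is the one quoted in the text.

```python
import numpy as np

def signals_from_proba(proba, classes, threshold=0.8):
    """Map class probabilities to trading signals in {-1, 0, 1}."""
    proba = np.asarray(proba, dtype=float)
    classes = np.asarray(classes)
    best = proba.argmax(axis=1)      # index of the most likely class
    confidence = proba.max(axis=1)   # its probability
    # Keep the predicted class only when the model is confident enough
    return np.where(confidence >= threshold, classes[best], 0)

# Columns are assumed to follow the class order [-1, 1]
proba = np.array([[0.90, 0.10],
                  [0.30, 0.70],
                  [0.15, 0.85],
                  [0.50, 0.50]])
signals = signals_from_proba(proba, classes=[-1, 1])
print(signals)  # [-1  0  1  0]
```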
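For backtesting, each trade is entered at the current close and closed 7 days later, and trades do not overlap. A long trade returns \((\text{exit} - \text{entry}) / \text{entry}\), a short trade returns \((\text{entry} - \text{exit}) / \text{entry}\), and a signal of 0 contributes no return.
The backtest aggregates these returns, computes the cumulative performance (the equity curve), and then compares the strategy to a buy-and-hold approach.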
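A stripped-down version of such a backtest is sketched below on a synthetic price path. The function name `backtest`, the always-long signals, and the steadily rising prices are assumptions for illustration, and the sketch ignores fees, slippage, and funding costs.

```python
import numpy as np

def backtest(close, signals, hold=7, step=8):
    """Compound the returns of non-overlapping trades held for `hold` rows."""
    close = np.asarray(close, dtype=float)
    equity = 1.0
    curve = []
    i = 0
    while i < len(close) - hold:
        entry, exit_price = close[i], close[i + hold]
        # signal +1 -> long return, -1 -> short return, 0 -> no trade
        trade_return = signals[i] * (exit_price - entry) / entry
        equity *= 1.0 + trade_return
        curve.append(equity)
        i += step  # next entry is one row after the previous exit
    return equity, curve

# Synthetic prices rising 1% per row, with a long signal on every row
close = 100.0 * 1.01 ** np.arange(30)
signals = np.ones(30, dtype=int)

final_equity, curve = backtest(close, signals)
buy_hold = close[-1] / close[0]
print(final_equity, buy_hold)
```

Even with a correct signal on every row, the strategy ends below buy-and-hold in this example, because it is out of the market one row in eight and cannot open a trade in the last 7 rows. This is worth keeping in mind when comparing the two equity curves.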
Below we break down key sections of the code, explaining how each component contributes to the overall strategy.
from binance.client import Client
import pandas as pd
import numpy as np
import ta

# Download price data from Binance
client = Client()
pair = 'ETHUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1d', '1 year ago'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume',
                'close_time', 'quote_asset_volume', 'trades',
                'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)

# Shift data to avoid lookahead bias in indicator calculations
data = data.shift()
Explanation:
The .shift() function is used to avoid using current day data for calculations that would normally be computed using past data.

from KalmanFilter import KalmanFilter

kf = KalmanFilter(delta_t=1, process_var=1e-7, measurement_var=1e-1)
data[['kalman_price', 'kalman_rate']] = kf.filter(data['close'])
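Explanation:
The custom KalmanFilter class smooths the closing price and returns two series, a smoothed price (kalman_price) and an estimated rate of change (kalman_rate), which are used as features for the neural network.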
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from tqdm import tqdm

# Target column, assumed to follow the definition given earlier:
# +1 if the close 7 days ahead is higher than the current close, otherwise -1
data['direction'] = np.where(data['close'].shift(-7) > data['close'], 1, -1)

# Features used by the neural network (in this case, the Kalman outputs)
features = ['kalman_price', 'kalman_rate']
rolling_window = 30
nn_predictions = []
nn_probabilities = []
actuals = []
prediction_dates = []

# Set up scaler and MLP neural network
scaler = StandardScaler()
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)

# Rolling window loop: Train and predict
for i in tqdm(range(rolling_window, len(data) - 1)):
    # Training data for past window
    # Note: the labels of the last 7 rows of each window depend on closes after row i
    X_train = data[features].iloc[i - rolling_window:i]
    y_train = data['direction'].iloc[i - rolling_window:i]
    X_train_scaled = scaler.fit_transform(X_train)

    # Train the network on the rolling window
    mlp.fit(X_train_scaled, y_train)

    # Predict for the next interval
    X_next = data[features].iloc[i].values.reshape(1, -1)
    X_next_scaled = scaler.transform(X_next)
    nn_pred = mlp.predict(X_next_scaled)[0]
    nn_prob = mlp.predict_proba(X_next_scaled)[0]

    nn_predictions.append(nn_pred)
    nn_probabilities.append(nn_prob)

    # Save the actual direction and prediction time
    actuals.append(data['direction'].iloc[i])
    prediction_dates.append(data.index[i])
Explanation:
The features are standardized with StandardScaler to ensure that they are on the same scale.

# Use probabilities to adjust predictions
# (column 0 is class -1 and column 1 is class 1, the sorted order of mlp.classes_)
for i in range(len(nn_predictions)):
    prob = nn_probabilities[i]
    nn_predictions[i] = -1 if prob[0] > prob[1] else 1

# Only accept predictions with high confidence (>= 80%)
for i in range(len(nn_predictions)):
    pred = nn_predictions[i]
    confidence = nn_probabilities[i][0] if pred == -1 else nn_probabilities[i][1]
    if confidence < 0.8:
        nn_predictions[i] = 0

data['nn_predictions'] = 0
data.loc[prediction_dates, 'nn_predictions'] = nn_predictions
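Explanation:
Each prediction is re-derived from the class probabilities, and any prediction whose confidence is below 80% is set to 0, so that no trade is taken on it.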
# Calculate 7-day returns for the base asset
data['7d_return'] = (data['close'].shift(-7) / data['close']) - 1

# Calculate strategy returns based on NN predictions
data['nn_7d_return'] = data['nn_predictions'] * data['7d_return']

# Filter rows with valid predictions and returns
predicted_data = data[(data['nn_predictions'] != 0) & (data['7d_return'].notna())]

# Print success rate of NN predictions
predictions = data[['nn_predictions', 'direction']][data['nn_predictions'] != 0]
success_rate = np.where(predictions['nn_predictions'] == predictions['direction'], 1, 0).mean() * 100
print("Neural Network Success Rate: {:.2f}%".format(success_rate))
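Explanation:
The 7-day return of the asset is multiplied by the signal to obtain the return of each trade, and the success rate is the share of non-zero signals that match the realized direction.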
# Initialize an equity column and starting capital
data['nn_equity'] = np.nan
equity = 1.0  # Starting capital
i = 0

# Simulate non-overlapping trades (advance 8 rows after each entry)
while i < len(data) - 7:
    idx_entry = data.index[i]
    data.at[idx_entry, 'nn_equity'] = equity

    signal = data['nn_predictions'].iloc[i]
    entry_price = data['close'].iloc[i]
    exit_price = data['close'].iloc[i + 7]

    if signal == 1:
        trade_return = (exit_price - entry_price) / entry_price
    elif signal == -1:
        trade_return = (entry_price - exit_price) / entry_price
    else:
        trade_return = 0.0

    equity *= (1 + trade_return)
    idx_exit = data.index[i + 7]
    data.at[idx_exit, 'nn_equity'] = equity
    i += 8

# Fill missing equity values (assign back instead of using inplace on a column)
data['nn_equity'] = data['nn_equity'].ffill().bfill()

# Convert equity to percent profit
data['nn_equity_pct'] = (data['nn_equity'] - 1.0) * 100
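Explanation:
Starting from a capital of 1.0, the loop holds at most one trade at a time: it enters at the current close, exits 7 days later, compounds the trade return into the equity, and then moves 8 days ahead so that trades never overlap.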
# Buy and hold strategy: calculate daily returns and cumulative product
data['bh_return'] = data['close'].pct_change()
data['bh_equity'] = (1 + data['bh_return']).cumprod()
data['bh_equity_pct'] = (data['bh_equity'] - 1.0) * 100
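Explanation:
The buy-and-hold benchmark compounds the daily close-to-close returns and expresses the result as percent profit, on the same scale as the strategy's equity curve.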
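The same calculation on a four-point, made-up price series makes the arithmetic easy to check by hand. The prices are assumptions for the example; the three operations are the ones used in the listing.

```python
import pandas as pd

# Made-up closing prices for illustration
close = pd.Series([100.0, 110.0, 99.0, 120.0])

bh_return = close.pct_change()            # first value is NaN (no previous close)
bh_equity = (1 + bh_return).cumprod()     # cumulative growth factor
bh_equity_pct = (bh_equity - 1.0) * 100   # percent profit relative to the first close

print(bh_equity_pct.tolist())
```

The compounded result equals the ratio of each close to the first close, so the final value is a 20% profit. The first row stays NaN because it has no return; filling it with 0% would be a reasonable choice if the curve is to start at the origin.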
import matplotlib.pyplot as plt

plt.figure(figsize=(12,6))
plt.plot(data.index, data['bh_equity_pct'], label='Buy & Hold')
plt.plot(data.index, data['nn_equity_pct'], label='NN Strategy')
plt.title('Buy & Hold vs. NN Strategy (Percent Profit)')
plt.xlabel('Date')
plt.ylabel('Percent Profit (%)')
plt.legend()
plt.show()
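Explanation:
The plot overlays the percent profit of the buy-and-hold benchmark and of the neural network strategy over time, so the two equity curves can be compared directly.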
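In this article, we explored how neural networks and the Kalman filter can be integrated into a trading strategy: the Kalman filter smooths the price series and supplies the features, the neural network predicts the 7-day price direction, and the confidence-filtered signals are backtested against a buy-and-hold benchmark.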
This framework is a starting point for further research and refinement. Future enhancements might include improved feature engineering, more sophisticated risk management, and alternative model architectures. As always, caution is advised when applying these techniques to live trading due to the challenges of market dynamics and overfitting.