The Kalman filter is a powerful tool for estimating the hidden state of a dynamic system from noisy observations. Originally developed for aerospace applications, it has since been used in many fields, including finance, where it can smooth or denoise price time series. In this article, we explain how the Kalman filter works and provide a step-by-step Python implementation for smoothing hourly BTCUSDC data from Binance.
The Kalman filter works recursively in two stages:
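Prediction Step:
The filter uses a dynamic model to predict the next state of the system, together with the uncertainty of that prediction:
\[\hat{x}_{t|t-1} = \phi \, \hat{x}_{t-1|t-1} + C\]
\[P_{t|t-1} = \phi \, P_{t-1|t-1} \, \phi^T + Q\]
Here \(\hat{x}\) is the state estimate, \(P\) its covariance, \(\phi\) the state transition matrix, \(C\) a constant drift (control) vector, and \(Q\) the process noise covariance.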
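Update (Correction) Step:
Once a new observation is available, the filter corrects its prediction using the measurement:
\[K_t = P_{t|t-1} H^T \left(H \, P_{t|t-1} H^T + R\right)^{-1}\]
\[\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \left(Y_t - (H \, \hat{x}_{t|t-1} + D)\right)\]
\[P_{t|t} = \left(I - K_t H\right) P_{t|t-1}\]
Here \(K_t\) is the Kalman gain, \(Y_t\) the observation, \(H\) the measurement matrix, \(D\) a measurement offset, \(R\) the measurement noise covariance, and \(I\) the identity matrix. The term in parentheses (the innovation) is the gap between the observation and what the model expected to see. The gain decides how much of that gap is added to the prediction.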
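As a quick numeric check of these equations, take a hypothetical scalar case (\(\phi = H = 1\), \(C = D = 0\)). Use made-up values \(\hat{x}_{t-1|t-1} = 100\), \(P_{t-1|t-1} = 4\), \(Q = 1\), \(R = 4\), and a new observation \(Y_t = 106\):
\[\hat{x}_{t|t-1} = 100, \qquad P_{t|t-1} = 4 + 1 = 5\]
\[K_t = \frac{5}{5 + 4} = \frac{5}{9} \approx 0.556\]
\[\hat{x}_{t|t} = 100 + \frac{5}{9}\,(106 - 100) = \frac{310}{3} \approx 103.33\]
\[P_{t|t} = \left(1 - \frac{5}{9}\right) \cdot 5 = \frac{20}{9} \approx 2.22\]
The estimate moves \(5/9\) of the way toward the observation. The updated variance ends up below both the prior variance (5) and the measurement noise (4).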
In our implementation, we maintain a two-dimensional state vector \(x = \begin{bmatrix} \text{price} \\ \text{velocity} \end{bmatrix}\). This lets the filter track not only the price but also its rate of change.

When tuning your Kalman filter, the main levers are the process noise covariance \(Q\) and the measurement noise covariance \(R\). Adjusting them changes how smooth the output is and, in turn, any trading signals derived from it.

- \(Q\) represents how much uncertainty or variability you expect in the state evolution (e.g., price dynamics). A larger \(Q\) makes the filter more flexible (more reactive to rapid changes) but can introduce more volatility in the smoothed signal. A smaller \(Q\) forces the filter to produce a smoother, more conservative estimate, assuming the process is more stable.
- \(R\) captures the uncertainty or noise level in the measurements (observed prices). A larger \(R\) tells the filter to trust the model's prediction over the noisy measurement, which gives a smoother output that reacts more slowly to new data. A smaller \(R\) makes the filter more responsive to the observed data, which may be noisy but reflects sudden market moves.

## Python Implementation Without a Class
Instead of encapsulating the filter into a class, we define two standalone functions for the prediction and update steps.
First, we fetch hourly BTCUSDC data from Binance for February 2024 using the python-binance client and preprocess it:
```python
import pandas as pd
from binance.client import Client

# --- Fetch Binance Hourly Data ---
client = Client()  # Kline (candlestick) data is public; API keys are not required for it
pair = 'BTCUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1h', '2024-02-01', '2024-03-01'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume',
                'close_time', 'quote_asset_volume', 'trades',
                'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)

# Shift the data to avoid potential lookahead bias in indicator calculations
data = data.shift()
data.dropna(inplace=True)
```
In this snippet, we convert the raw data into a DataFrame with properly formatted timestamps and numeric columns. We then shift the data by one period to avoid lookahead bias.
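To see what the one-bar shift does, here is a minimal sketch on a synthetic hourly series. The random-walk prices are an assumption that stands in for the Binance download, so the example runs offline.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic stand-in for the Binance data: 48 hourly closes
# drawn as a Gaussian random walk (illustrative values, not market data).
rng = np.random.default_rng(seed=0)
index = pd.date_range(start="2024-02-01", periods=48, freq=pd.Timedelta(hours=1))
close = 42_000 + np.cumsum(rng.normal(0.0, 50.0, size=len(index)))
raw = pd.DataFrame({"close": close}, index=index)

# Same preprocessing as in the article: shift by one bar, drop the empty first row.
shifted = raw.shift().dropna()

# The row stamped at hour t now holds the close observed at hour t-1.
print(raw["close"].iloc[:3].round(2).tolist())
print(shifted["close"].iloc[:2].round(2).tolist())
```

Any value computed on row t of `shifted` therefore depends only on closes up to t-1. The shift also costs one bar, because the first row becomes empty and is dropped.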
Here we implement the two key steps of the Kalman filter: prediction and update.
```python
import numpy as np


def kf_predict(x, P, phi, C, Q):
    """
    Prediction step of the Kalman filter.
    Parameters:
        x   - current state estimate (vector)
        P   - current covariance estimate (matrix)
        phi - state transition matrix
        C   - control (drift) vector
        Q   - process noise covariance
    Returns:
        x, P - predicted state and covariance estimates
    """
    x = phi @ x + C
    P = phi @ P @ phi.T + Q
    return x, P


def kf_update(x, P, Y, H, D, R):
    """
    Correction step of the Kalman filter.
    Parameters:
        x - predicted state estimate
        P - predicted covariance estimate
        Y - new measurement vector
        H - measurement matrix
        D - measurement offset vector
        R - measurement noise covariance
    Returns:
        x, P - updated state and covariance estimates
    """
    I = np.eye(x.shape[0])
    innovation = Y - (H @ x + D)        # measurement residual
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ innovation
    P = (I - K @ H) @ P
    return x, P
```
These functions encapsulate the mathematical operations of the Kalman filter, enabling you to update the state with each new measurement.
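`kf_update` forms \(S^{-1}\) explicitly with `np.linalg.inv`. A common alternative is to obtain the gain by solving a linear system and to re-symmetrize \(P\) to limit round-off drift. The sketch below is a hypothetical variant: the name `kf_update_solve` is ours and not part of any library. It has the same inputs and outputs as `kf_update` and checks itself against the inverse-based formula on a made-up measurement.

```python
import numpy as np

def kf_update_solve(x, P, Y, H, D, R):
    """Hypothetical variant of kf_update that avoids an explicit inverse.

    Same arguments and return values as kf_update. Assumes P and R are
    symmetric, so the innovation covariance S is symmetric as well.
    """
    innovation = Y - (H @ x + D)
    S = H @ P @ H.T + R                       # innovation covariance
    # K = P H^T S^{-1}; transposing gives S K^T = (P H^T)^T since S is symmetric
    K = np.linalg.solve(S, (P @ H.T).T).T
    x_new = x + K @ innovation
    P_new = (np.eye(x.shape[0]) - K @ H) @ P
    P_new = 0.5 * (P_new + P_new.T)           # re-symmetrize against round-off
    return x_new, P_new

# Made-up numbers in the article's 2D [price, velocity] setup.
x = np.array([[42_000.0], [0.0]])
P = np.eye(2)
H = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
R = np.array([[1e-2]])
Y = np.array([[42_010.0]])

# Reference result from the inverse-based formula used in kf_update.
S = H @ P @ H.T + R
K_ref = P @ H.T @ np.linalg.inv(S)
x_ref = x + K_ref @ (Y - (H @ x + D))

x_new, P_new = kf_update_solve(x, P, Y, H, D, R)
print(x_new.ravel().tolist(), np.allclose(x_new, x_ref))
```

With a one-dimensional measurement, as here, the two forms agree to floating-point precision. The solve-based form matters more when \(S\) is larger or poorly conditioned.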
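We now set up the initial state and matrices for our filter. Here, we model the state as \([ \text{price}, \text{velocity} ]\) with a constant-velocity model: at each step, the price advances by its velocity times \(\Delta t\), and only the price is observed.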
```python
# For a 2D state: [price, price_velocity]
delta_t = 1.0  # 1-hour time step
phi = np.array([[1, delta_t],
                [0, 1]])    # constant-velocity transition
C = np.array([[0],
              [0]])         # no drift
H = np.array([[1, 0]])      # only the price is observed
D = np.array([[0]])         # no measurement offset
# Process noise covariance (adjust based on your assumptions)
Q = np.array([[1e-5, 0],
              [0, 1e-5]])
# Measurement noise covariance
R = np.array([[1e-2]])

# Initialize the state with the first closing price and zero velocity.
prices = data['close'].values
x = np.array([[prices[0]],
              [0]])
P = np.eye(2)
```
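The transition matrix \(\phi\) encodes the constant-velocity assumption. The minimal sketch below shows one prediction step in isolation, using a made-up state with a price of 42,000 and a velocity of +25 per hour.

```python
import numpy as np

# Constant-velocity model with a 1-hour step, matching the setup above.
delta_t = 1.0
phi = np.array([[1.0, delta_t],
                [0.0, 1.0]])
Q = np.array([[1e-5, 0.0],
              [0.0, 1e-5]])

# Hypothetical state: price 42,000 moving at +25 per hour, with P = I.
x = np.array([[42_000.0],
              [25.0]])
P = np.eye(2)

# One prediction step (C = 0): price advances by velocity * delta_t,
# velocity is carried forward unchanged.
x_next = phi @ x
print(x_next.ravel().tolist())  # [42025.0, 25.0]

# Covariance: price variance grows and becomes correlated with velocity,
# which is what later lets a price measurement also correct the velocity.
P_next = phi @ P @ phi.T + Q
print(P_next)
```

The off-diagonal entries of `P_next` become nonzero (1.0 here). That correlation is why the update step can adjust the unobserved velocity even though \(H\) only reads the price.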
With our filter set up, we now loop through the price series, applying the prediction and update steps. For each bar, the predicted price (computed before that bar's close is seen) is stored as the smoothed value.
```python
smoothed_prices = []
for price in prices:
    # Prediction step
    x, P = kf_predict(x, P, phi, C, Q)
    # Store the predicted price (the first element of the state vector)
    smoothed_prices.append(x[0, 0])
    # Update step with the actual observed price
    measurement = np.array([[price]])
    x, P = kf_update(x, P, measurement, H, D, R)
```
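Here, for each new hourly price, we:
1. predict the next state and covariance with `kf_predict`;
2. record the predicted price (the first element of the state vector) as the smoothed value;
3. correct the state with the observed price using `kf_update`.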
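The loop stores the one-step-ahead prediction. An alternative is to store the post-update (filtered) estimate, which sits closer to the current price but also uses it. The sketch below records both. It assumes a synthetic trending price series and writes the predict and update steps inline, rather than calling `kf_predict`/`kf_update`, so it runs on its own.

```python
import numpy as np

# Synthetic stand-in for hourly closes: a linear trend plus noise (assumption).
rng = np.random.default_rng(seed=1)
n = 200
prices = 42_000 + 5.0 * np.arange(n) + rng.normal(0.0, 30.0, size=n)

# Same structure as the article's setup (C and D are zero, so they are omitted).
phi = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 1e-5 * np.eye(2)
R = np.array([[1e-2]])

x = np.array([[prices[0]], [0.0]])
P = np.eye(2)
predicted, filtered = [], []
for price in prices:
    # Predict: the stored value uses closes up to the previous bar only.
    x = phi @ x
    P = phi @ P @ phi.T + Q
    predicted.append(x[0, 0])
    # Update: the post-update estimate also incorporates the current close.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.array([[price]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    filtered.append(x[0, 0])

predicted, filtered = np.array(predicted), np.array(filtered)
print(np.mean(np.abs(prices - predicted)), np.mean(np.abs(prices - filtered)))
```

The filtered value is the prediction moved a fraction of the way toward the observed close, so it is never farther from the current price than the prediction. If you derive a signal on a bar from the filtered series, keep in mind that it already contains that bar's close.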
Finally, we visualize the raw closing prices and the Kalman filter–smoothed prices.
```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(data.index, prices, label='Raw Close Price', alpha=0.6)
plt.plot(data.index, smoothed_prices, label='Kalman Filter Smoothed', linewidth=2)
plt.title("BTCUSDC Hourly: Raw vs. Kalman Filter Smoothed Prices")
plt.xlabel("Time")
plt.ylabel("Price")
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
This plot lets you compare the noisy raw price data with the smoother signal produced by the Kalman filter.
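You can check the effect of \(Q\) and \(R\) on smoothness with synthetic data. The sketch below makes three simplifying choices:

- It swaps in a simpler scalar random-walk ("local level") model in place of the article's 2D price/velocity state.
- It measures roughness as the standard deviation of one-step changes.
- It uses illustrative helpers (`local_level_filter` and `roughness`) and a made-up noisy series.

```python
import numpy as np

def local_level_filter(y, q, r):
    """Hypothetical scalar Kalman filter for a random-walk (local level) model.

    Equivalent to phi = H = 1 and C = D = 0 in the equations above.
    Returns the filtered estimates and the final Kalman gain.
    """
    x, p = float(y[0]), 1.0
    out = np.empty(len(y))
    k = 0.0
    for t, obs in enumerate(y):
        p = p + q                 # predict: level unchanged, variance grows by Q
        k = p / (p + r)           # Kalman gain
        x = x + k * (obs - x)     # update: move a fraction k toward the observation
        p = (1.0 - k) * p         # posterior variance
        out[t] = x
    return out, k

def roughness(series):
    """Standard deviation of one-step changes: lower means a smoother line."""
    return float(np.std(np.diff(series)))

# Made-up data: a hidden random walk observed with heavy noise.
rng = np.random.default_rng(seed=42)
level = np.cumsum(rng.normal(0.0, 1.0, size=500))
y = level + rng.normal(0.0, 10.0, size=500)

for q, r in [(1.0, 1.0), (1.0, 100.0), (0.01, 1.0)]:
    smooth, k_final = local_level_filter(y, q=q, r=r)
    print(f"Q={q:<5} R={r:<6} gain={k_final:.3f} roughness={roughness(smooth):.2f}")
```

In this scalar model, the steady-state gain depends only on the ratio \(Q/R\). With \(Q = R = 1\) it settles near 0.618. The settings \(Q = 1, R = 100\) and \(Q = 0.01, R = 1\) both settle near 0.095 and give a smoother line.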
In this article, we explained the key concepts of the Kalman filter, including its prediction and update steps. We then provided a detailed Python implementation for smoothing price data, using the Binance API to fetch hourly BTCUSDC data. This approach can be extended to other time series and adapted for real-time trading strategies.
By following these steps and using the provided code snippets, you can implement your own Kalman filter to reduce noise in financial data and improve your analysis or trading models.