Smoothing Financial Time Series using Kalman Filter in Python


The Kalman filter is a powerful tool for estimating the hidden state of a dynamic system from noisy observations. Originally developed for aerospace applications, it has since found use in many fields, including finance, where it can smooth or denoise price time series. In this article, we explain the Kalman filter concept and provide a step-by-step Python implementation for smoothing hourly BTCUSDC data from Binance.

Overview of the Kalman Filter

The Kalman filter works recursively in two stages:

  1. Prediction Step:
    The filter uses a dynamic model to predict the next state of the system and the uncertainty of that prediction:

    \[\hat{x}_{t|t-1} = \phi \, \hat{x}_{t-1|t-1} + C\]
    \[P_{t|t-1} = \phi \, P_{t-1|t-1} \, \phi^T + Q\]

    • \(\hat{x}\) is the state estimate.
    • \(P\) is the error covariance.
    • \(\phi\) is the state transition matrix.
    • \(C\) is a control or drift vector.
    • \(Q\) is the process noise covariance.
  2. Update (Correction) Step:
    Once a new observation is available, the filter corrects its prediction using the measurement:

    \[K_t = P_{t|t-1} H^T \left(H \, P_{t|t-1} H^T + R\right)^{-1}\]
    \[\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \left(Y_t - (H \, \hat{x}_{t|t-1} + D)\right)\]
    \[P_{t|t} = \left(I - K_t H\right) P_{t|t-1}\]

    • \(K_t\) is the Kalman gain.
    • \(Y_t\) is the measurement.
    • \(H\) is the measurement matrix.
    • \(D\) is a measurement offset.
    • \(R\) is the measurement noise covariance.
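To make the two steps concrete, here is a minimal sketch of a single predict/update cycle for a scalar state, where \(\phi = H = 1\) and \(C = D = 0\). All numbers are illustrative assumptions chosen to keep the arithmetic easy to follow; they are not taken from the BTCUSDC data.

```python
# One predict/update cycle for a scalar state (phi = H = 1, C = D = 0).
# All values are made-up illustrations, not market data.
x_prev, P_prev = 100.0, 1.0   # previous state estimate and its variance
Q, R = 0.5, 1.5               # process and measurement noise variances
y = 102.0                     # new observation

# Prediction step
x_pred = x_prev               # phi * x_prev + C
P_pred = P_prev + Q           # phi * P_prev * phi + Q

# Update step
K = P_pred / (P_pred + R)             # Kalman gain
x_new = x_pred + K * (y - x_pred)     # move the prediction toward the observation
P_new = (1 - K) * P_pred              # uncertainty shrinks after the update

print(K, x_new, P_new)
```

With these numbers the predicted variance equals the measurement variance, so the gain is 0.5 and the updated estimate lands halfway between the prediction and the observation. Raising \(R\) lowers the gain and pulls the estimate back toward the prediction.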

In our implementation, we maintain a two-dimensional state vector \(x = \begin{bmatrix} \text{price} \\ \text{velocity} \end{bmatrix}\). This lets us track not only the price but also its rate of change.

When tuning the filter, several parameters can be adjusted to obtain different smoothing behaviors and trading signal outcomes. The most important are the process and measurement noise covariances \(Q\) and \(R\):

    • \(Q\) represents how much uncertainty or variability you expect in the state evolution (e.g., price dynamics). A larger \(Q\) makes the filter more flexible (more reactive to rapid changes) but can introduce more volatility in the smoothed signal. A smaller \(Q\) forces the filter to produce a smoother, more conservative estimate, assuming the process is more stable.
    • \(R\) captures the uncertainty or noise level in the measurements (observed prices). A larger \(R\) tells the filter to trust the model’s prediction over the noisy measurement, leading to a smoother output with slower reaction to new data. A smaller \(R\) makes the filter more responsive to the observed data, which might be noisy but reflects sudden market moves.

Python Implementation Without a Class

Instead of encapsulating the filter into a class, we define two standalone functions for the prediction and update steps.

1. Fetching and Preprocessing Binance Data

First, we fetch hourly BTCUSDC data from Binance for February 2024 and preprocess it:

import pandas as pd
from binance.client import Client

# --- Fetch Binance Hourly Data ---
client = Client()  # No API keys are needed for public market-data endpoints such as klines
pair = 'BTCUSDC'
data = pd.DataFrame(client.get_historical_klines(pair, '1h', '2024-02-01', '2024-03-01'))
data.columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume', 
                'close_time', 'quote_asset_volume', 'trades', 
                'taker_buy_base', 'taker_buy_quote', 'ignore']
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
ohlcv_columns = ['open', 'high', 'low', 'close', 'volume']
data[ohlcv_columns] = data[ohlcv_columns].astype(float)
data.set_index('timestamp', inplace=True)

# Shift the data to avoid potential lookahead bias in indicator calculations
data = data.shift()
data.dropna(inplace=True)

In this snippet, we convert the raw data into a DataFrame with properly formatted timestamps and numeric columns. We then shift the data by one period to avoid lookahead bias.
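The effect of the one-period shift is easiest to see on a toy frame. The sketch below uses made-up closing prices and assumes, as in the Binance kline format, that each row is labeled with the bar's open time.

```python
import pandas as pd

# Toy hourly bars labeled by open time; the prices are made up for illustration.
idx = pd.to_datetime([
    "2024-02-01 00:00", "2024-02-01 01:00",
    "2024-02-01 02:00", "2024-02-01 03:00",
])
bars = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0]}, index=idx)

# Move every value down one row, then drop the first row, which is now empty.
shifted = bars.shift().dropna()
print(shifted)
```

After the shift, the row labeled 01:00 carries the close of the bar that opened at 00:00, a value that is already known at 01:00. Without the shift, the 01:00 row would hold a close that only becomes available at 02:00.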

2. Defining the Kalman Filter Functions

Here we implement the two key steps of the Kalman filter: prediction and update.

import numpy as np

def kf_predict(x, P, phi, C, Q):
    """
    Prediction step of the Kalman filter.
    
    Parameters:
      x   - current state estimate (vector)
      P   - current covariance estimate (matrix)
      phi - state transition matrix
      C   - control (drift) vector
      Q   - process noise covariance
      
    Returns:
      x, P - predicted state and covariance estimates
    """
    x = np.matmul(phi, x) + C
    P = np.matmul(np.matmul(phi, P), phi.T) + Q
    return x, P

def kf_update(x, P, Y, H, D, R):
    """
    Correction step of the Kalman filter.
    
    Parameters:
      x   - predicted state estimate
      P   - predicted covariance estimate
      Y   - new measurement vector
      H   - measurement matrix
      D   - measurement offset vector
      R   - measurement noise covariance
      
    Returns:
      x, P - updated state and covariance estimates
    """
    I = np.eye(x.shape[0])
    innovation = Y - (np.matmul(H, x) + D)
    S = np.matmul(np.matmul(H, P), H.T) + R
    K = np.matmul(np.matmul(P, H.T), np.linalg.inv(S))
    x = x + np.matmul(K, innovation)
    P = np.matmul((I - np.matmul(K, H)), P)
    return x, P

These functions encapsulate the mathematical operations of the Kalman filter, enabling you to update the state with each new measurement.
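As an optional variation, the update step can avoid the explicit matrix inverse and use the Joseph form for the covariance, which is often preferred for numerical robustness. The sketch below is an alternative to `kf_update`, not the version used in the rest of this article; the name `kf_update_stable` is hypothetical.

```python
import numpy as np

def kf_update_stable(x, P, Y, H, D, R):
    """Update step using a linear solve and the Joseph-form covariance update."""
    I = np.eye(x.shape[0])
    innovation = Y - (H @ x + D)
    S = H @ P @ H.T + R
    # Solve S K^T = H P for K^T. Because S and P are symmetric,
    # this yields K = P H^T S^{-1} without forming the inverse.
    K = np.linalg.solve(S, H @ P).T
    x = x + K @ innovation
    # Joseph form: algebraically equal to (I - K H) P for the optimal gain,
    # but it keeps P symmetric in floating-point arithmetic.
    A = I - K @ H
    P = A @ P @ A.T + K @ R @ K.T
    return x, P

# Example call with illustrative values
x0 = np.array([[100.0], [0.0]])
P0 = np.eye(2)
H = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
R = np.array([[1e-2]])
Y = np.array([[101.0]])
x1, P1 = kf_update_stable(x0, P0, Y, H, D, R)
print(x1.ravel(), P1)
```

For a one-dimensional measurement like ours the difference is negligible, so the simpler `kf_update` is adequate here. The variant matters more when the measurement vector is large or when \(P\) becomes ill-conditioned.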

3. Initializing Filter Parameters

We now set up the initial state and matrices for our filter. Here, we model the state as \([ \text{price}, \text{velocity} ]\).
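Written out, this state corresponds to a constant-velocity model. With \(\phi\) and \(H\) as defined in the code for this step, and with \(C = 0\) and \(D = 0\), the system reads as follows. The symbols \(p_t\), \(v_t\), \(w_t\), and \(\varepsilon_t\) are notation introduced here for illustration:

\[\begin{bmatrix} p_t \\ v_t \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_{t-1} \\ v_{t-1} \end{bmatrix} + w_t, \qquad w_t \sim \mathcal{N}(0, Q)\]
\[Y_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} p_t \\ v_t \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, R)\]

The first row of \(\phi\) says that the price advances by \(\Delta t\) times the velocity, the second row says that the velocity persists from one step to the next, and \(H\) says that only the price is observed.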

# For a 2D state: [price, price_velocity]
delta_t = 1.0  # 1-hour time step
phi = np.array([[1, delta_t],
                [0, 1]])
C = np.array([[0],
              [0]])
H = np.array([[1, 0]])
D = np.array([[0]])
# Process noise covariance (adjust based on your assumptions)
Q = np.array([[1e-5, 0],
              [0, 1e-5]])
# Measurement noise covariance
R = np.array([[1e-2]])

# Initialize the state with the first closing price and zero velocity.
prices = data['close'].values
x = np.array([[prices[0]],
              [0]])
P = np.eye(2)

4. Applying the Kalman Filter

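With our filter set up, we now loop through the price series, applying the prediction and update steps. At each step we store the predicted price, that is, the one-step-ahead estimate made before the current observation is incorporated, as the smoothed value.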

smoothed_prices = []
for price in prices:
    # Prediction step
    x, P = kf_predict(x, P, phi, C, Q)
    # Store the predicted price (the first element of state vector)
    smoothed_prices.append(x[0, 0])
    # Update step with the actual observed price
    measurement = np.array([[price]])
    x, P = kf_update(x, P, measurement, H, D, R)

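Here, for each new hourly price, we:

  1. Predict the next state and covariance with kf_predict.
  2. Store the predicted price (the first element of the state vector) as the smoothed value.
  3. Update the state and covariance with the observed price using kf_update.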

5. Plotting the Results

Finally, we visualize the raw closing prices and the Kalman filter–smoothed prices.

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(data.index, prices, label='Raw Close Price', alpha=0.6)
plt.plot(data.index, smoothed_prices, label='Kalman Filter Smoothed', linewidth=2)
plt.title("BTCUSDC Hourly: Raw vs. Kalman Filter Smoothed Prices")
plt.xlabel("Time")
plt.ylabel("Price")
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

This plot helps you compare the noisy raw price data against the smoother signal produced by the Kalman filter.
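To check the tuning guidance for \(R\) without touching the network, the sketch below reruns the same predict/update loop on a synthetic random walk with two measurement noise levels. The helper names `run_filter` and `roughness`, the synthetic series, and the value `r=1e2` are assumptions made for illustration; `q=1e-5` and `r=1e-2` mirror the settings used above.

```python
import numpy as np

def run_filter(prices, q, r):
    """Return one-step-ahead predicted prices for noise scales q and r."""
    phi = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity model, delta_t = 1
    H = np.array([[1.0, 0.0]])
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[prices[0]], [0.0]])
    P = np.eye(2)
    out = []
    for price in prices:
        # Prediction step
        x = phi @ x
        P = phi @ P @ phi.T + Q
        out.append(x[0, 0])
        # Update step
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([[price]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
    return np.array(out)

def roughness(series):
    """Standard deviation of step-to-step changes; lower means smoother."""
    return np.std(np.diff(series))

# Synthetic random walk standing in for a price series (not market data)
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 500))

reactive = run_filter(prices, q=1e-5, r=1e-2)   # settings used in this article
smooth = run_filter(prices, q=1e-5, r=1e2)      # much larger R, for comparison

print("raw:     ", roughness(prices))
print("R = 1e-2:", roughness(reactive))
print("R = 1e2: ", roughness(smooth))
```

The larger \(R\) should yield a visibly smoother but slower-reacting series. The same harness can be used to vary `q` while holding `r` fixed.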


Conclusion

In this article, we explained the key concepts of the Kalman filter, including its prediction and update steps. We then provided a detailed Python implementation for smoothing price data, using the Binance API to fetch hourly BTCUSDC data. This approach can be extended to other time series data and adapted for real-time trading strategies.

By following these steps and using the provided code snippets, you can implement your own Kalman filter to reduce noise in financial data and improve your analysis or trading models.