Article

Market Regime Detection using Hidden Markov Models

This article explores a Python script that leverages Hidden Markov Models (HMMs) to identify distinct market regimes (specifically strong bull and strong bear phases) within financial time series data. It then utilizes the backtrader library to visualize these regime shifts on a price chart.

Core Concepts:

Hidden Markov Models (HMMs): HMMs are statistical models assuming a system transitions through a sequence of unobservable (“hidden”) states. Each state has a probability distribution governing the observable outputs (or features). In finance, we can think of market regimes (bull, bear, ranging) as hidden states, and price movements (like returns and volatility) as observable features.
Backtrader: A popular Python framework for backtesting trading strategies and creating financial visualizations. It handles data loading, indicator calculations, strategy logic, and plotting.
Market Regimes: Distinct periods in the market characterized by different price behavior (e.g., strong upward trend, sharp downward trend, low-volatility sideways movement). Identifying the current regime can be crucial for adjusting trading strategies.

Prerequisites:

You’ll need the following Python libraries installed:

Bash

pip install backtrader yfinance numpy pandas hmmlearn matplotlib

Code Breakdown

Let’s dissect the provided script section by section.

1. Imports:

Python

import backtrader as bt
import yfinance as yf
import numpy as np
import pandas as pd
import warnings
from hmmlearn import hmm
import matplotlib.pyplot as plt
# Optional: Configure matplotlib backend if needed
# %matplotlib qt5

warnings.filterwarnings("ignore") # Suppress common warnings

backtrader (bt): The core backtesting and plotting engine.
yfinance (yf): Used to download historical stock/crypto data from Yahoo Finance.
numpy (np): For numerical operations, especially array manipulations.
pandas (pd): For data manipulation using DataFrames.
warnings: To control how warnings are handled (here, they are suppressed).
hmmlearn.hmm: Provides the GaussianHMM class for implementing Hidden Markov Models with Gaussian emissions.
matplotlib.pyplot (plt): Used by backtrader (and potentially directly) for plotting.

2. HMM Training and State Identification (train_hmm_and_identify_states):

This is the heart of the regime detection logic.

Python

# --- MODIFICATIONS ONLY WITHIN THIS FUNCTION ---
def train_hmm_and_identify_states(df, n_states=5, n_iter=500, tol=1e-4, vol_window=20):
    """
    Train an HMM on [Log Return, Volatility of Log Return], label each bar with its state,
    and identify strong/weak bull & bear plus ranging regimes based on mean log return.
    # ... (docstring continues) ...
    """
    df_hmm = df.copy() # Work on a copy to avoid modifying the original DataFrame

    # --- Feature Calculation using Log Returns ---
    # Log returns are often preferred in finance as they are additive over time
    # and approximate percentage changes for small values.
    df_hmm['Log Return'] = np.log(df_hmm['Close'] / df_hmm['Close'].shift(1))
    df_hmm['Log Return'].fillna(0, inplace=True) # Handle the first NaN value

    # Calculate rolling standard deviation of log returns as a measure of volatility
    df_hmm['Volatility'] = df_hmm['Log Return'].rolling(vol_window).std()
    df_hmm['Volatility'].fillna(0, inplace=True) # Handle initial NaNs from rolling window

    # --- Select features for HMM ---
    # The HMM will learn hidden states based on these observable features.
    # More features could potentially improve state differentiation.
    X = df_hmm[['Log Return', 'Volatility']].values

    # Handle potential numerical issues before fitting
    if np.any(np.isnan(X)) or np.any(np.isinf(X)):
        print("Warning: NaNs or Infs detected in HMM features. Replacing with 0.")
        X = np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)

    # --- HMM Training ---
    # GaussianHMM assumes the features within each hidden state follow a Gaussian distribution.
    # 'n_components': The number of hidden states to find (a key tuning parameter).
    # 'covariance_type="diag"': Assumes features are independent within a state (simpler, less prone to overfitting).
    # 'n_iter', 'tol': Control the convergence of the training algorithm.
    model = hmm.GaussianHMM(
        n_components=n_states,
        covariance_type='diag',
        n_iter=n_iter,
        tol=tol,
        random_state=42, # For reproducibility
        verbose=False
    )

    print(f"\nFitting HMM with {n_states} states...")
    try:
        # Fit the HMM model to the feature data (X)
        with warnings.catch_warnings(): # Suppress specific warnings during fitting
            warnings.filterwarnings("ignore", category=DeprecationWarning)
            warnings.filterwarnings("ignore", category=RuntimeWarning)
            model.fit(X)
    except ValueError as e:
        print(f"Error fitting HMM: {e}")
        print("Check input data X for issues.")
        raise e

    if not model.monitor_.converged:
        print(f"Warning: HMM did not converge after {n_iter} iterations.")

    # Predict the most likely hidden state for each data point
    states = model.predict(X)
    df_hmm['HMM_State'] = states # Add the predicted states back to the DataFrame

    # --- State Interpretation ---
    # Analyze the characteristics of each predicted state
    stats = []
    for i in range(n_states):
        mask = (states == i)
        if mask.sum() == 0: # Check if a state was even predicted
            print(f"Warning: State {i} was not predicted for any data point.")
            continue
        # Calculate average log return and volatility for data points belonging to this state
        stats.append({
            'State': i,
            'Mean Log Return': df_hmm.loc[mask, 'Log Return'].mean(),
            'Mean Volatility': df_hmm.loc[mask, 'Volatility'].mean(),
            'Count': mask.sum() # How many data points belong to this state
        })

    if not stats:
        raise ValueError("HMM training resulted in no predictable states.")

    # Sort states by their average log return (descending)
    # Assumption: Highest mean return = Strong Bull, Lowest mean return = Strong Bear
    stats_df = pd.DataFrame(stats).sort_values('Mean Log Return', ascending=False).reset_index(drop=True)

    print("\nHMM State Summary (sorted by Mean Log Return):")
    print(stats_df.to_string(index=False, float_format='{:.6f}'.format))

    # Assign regimes based on sorted order (assuming 5 states initially)
    state_indices = stats_df['State'].tolist()
    s_bull_strong = -1 # Initialize with invalid index
    s_bear_strong = -1

    # Adjust assignment based on how many distinct states were actually found
    if len(state_indices) > 0:
        s_bull_strong = state_indices[0]      # State with highest mean log return
        s_bear_strong = state_indices[-1]     # State with lowest mean log return
    # (The code handles cases with < 5 states by only assigning strong bull/bear)

    print(f"\nRegime mapping (based on Mean Log Return sort):")
    print(f"  Strong Bull State = {s_bull_strong} (Highest Mean Log Return)")
    print(f"  Strong Bear State = {s_bear_strong} (Lowest Mean Log Return)")

    # Basic check for valid state assignment
    if s_bull_strong < 0 or s_bear_strong < 0:
          print("\nError: Could not reliably assign Strong Bull or Strong Bear state index.")
          # The indicator initialization will later catch these invalid indices

    print("\nReturning states for Strong Bull and Strong Bear signals.")
    # Return the DataFrame with HMM states and the identified indices for strong bull/bear
    return df_hmm, s_bull_strong, s_bear_strong
# --- END OF MODIFICATIONS ---

Feature Engineering: Calculates log returns and rolling volatility of log returns. These serve as the observable inputs for the HMM.
HMM Instantiation: Creates a GaussianHMM model. n_states=5 is a crucial parameter – it dictates how many distinct market patterns the model should try to find.
Training: The model.fit(X) method trains the HMM using the Expectation-Maximization algorithm to find the parameters (transition probabilities between states, emission probabilities for each state) that best explain the observed feature data (X).
State Prediction: model.predict(X) determines the most likely hidden state for each time step based on the trained model and the observed features.
State Interpretation: After training, the script analyzes the average log return and volatility associated with each identified state. It sorts the states based on mean log return, assuming the state with the highest mean return corresponds to a “Strong Bull” regime and the state with the lowest mean return corresponds to a “Strong Bear” regime.
Return Values: The function returns the original DataFrame augmented with the HMM_State column, and the integer indices corresponding to the identified strong bull and strong bear states.

3. Custom Backtrader Data Feed (HMMData):

Python

class HMMData(bt.feeds.PandasData):
    """Custom PandasData that carries the HMM_State column through as `hmm_state`."""
    lines = ('hmm_state',) # Declare the new data line
    params = (
        # Map standard OHLCV columns
        ('datetime', None), # Use index for datetime
        ('open', 'Open'),
        ('high', 'High'),
        ('low', 'Low'),
        ('close', 'Close'),
        ('volume', 'Volume'),
        ('openinterest', None), # Not used here
        # Map our custom column 'HMM_State' from the DataFrame to the 'hmm_state' line
        ('hmm_state', 'HMM_State'),
    )

This class inherits from bt.feeds.PandasData.
It tells backtrader how to read the Pandas DataFrame prepared earlier.
Crucially, it adds a custom data line hmm_state and maps it to the HMM_State column created by the HMM function. This makes the HMM state available within backtrader strategies and indicators.

4. Custom Backtrader Indicator (HMMRegimeStartSignal):

Python

class HMMRegimeStartSignal(bt.Indicator):
    """
    Signals the first bar of each new strong bull or strong bear regime
    by comparing the current HMM state to the prior bar.
    """
    lines = ('bull_start', 'bear_start',) # Output lines for signals
    params = (
        ('bull_state_idx', None), # Parameter to receive the strong bull state index
        ('bear_state_idx', None), # Parameter to receive the strong bear state index
    )
    plotinfo = dict(subplot=False) # Plot directly on the price chart
    plotlines = dict(
        # Define how the signals should be plotted (green up triangles, red down triangles)
        bull_start=dict(marker='^', markersize=8, color='green', linestyle='None'),
        bear_start=dict(marker='v', markersize=8, color='red',   linestyle='None'),
    )

    def __init__(self):
        # Validate that valid state indices were passed from the main script
        if self.p.bull_state_idx is None or self.p.bear_state_idx is None or \
           self.p.bull_state_idx < 0 or self.p.bear_state_idx < 0:
            raise ValueError("Must pass valid non-negative bull_state_idx and bear_state_idx to HMMRegimeStartSignal")
        # Access the custom hmm_state line from the data feed
        self.hmm_state = self.data.hmm_state

    def next(self):
        # Called for each bar of data (once enough data is available)
        if len(self.data) < 2: # Need at least two bars to compare current and previous state
            return

        # Default signal values to NaN (no signal)
        self.lines.bull_start[0] = float('nan')
        self.lines.bear_start[0] = float('nan')

        # Get current and previous HMM state
        curr = int(self.data.hmm_state[0])
        prev = int(self.data.hmm_state[-1])
        b = self.p.bull_state_idx # Convenience alias for bull state index
        r = self.p.bear_state_idx # Convenience alias for bear state index (renamed from 'r' for clarity)

        # --- Signal Logic ---
        # Strong bull entry: Current state is strong bull, previous was not.
        if curr == b and prev != b:
            # Place a green marker slightly below the low of the current bar
            self.lines.bull_start[0] = self.data.low[0] * 0.99

        # Exit strong bull: Previous state was strong bull, current is not.
        # This is treated as a potential sell/bearish signal.
        elif prev == b and curr != b:
            # Place a red marker slightly above the high of the current bar
            self.lines.bear_start[0] = self.data.high[0] * 1.01

        # Strong bear entry: Current state is strong bear, previous was not.
        elif curr == r and prev != r:
            # Place a red marker slightly above the high of the current bar
            self.lines.bear_start[0] = self.data.high[0] * 1.01

This class inherits from bt.Indicator.
It takes the identified bull_state_idx and bear_state_idx as parameters.
The __init__ method validates these parameters and gets access to the hmm_state data line.
The next method contains the core logic:
- It compares the hmm_state of the current bar ([0]) with the hmm_state of the previous bar ([-1]).
- It generates a bull_start signal (plots a green marker) only on the first bar where the state transitions into the bull_state_idx.
- It generates a bear_start signal (plots a red marker) on the first bar where the state transitions into the bear_state_idx OR when the state transitions out of the bull_state_idx. This treats both entering a bear state and exiting a bull state as potentially bearish signals for visualization.
The plotinfo and plotlines dictionaries configure how these signals appear on the chart.

5. Main Execution Block (if __name__ == '__main__':)

Python

if __name__ == '__main__':
    ticker, start, end = 'BTC-USD', '2022-01-01', '2023-12-31'

    print(f"\nDownloading {ticker} data from {start} to {end}...")
    df = yf.download(ticker, start=start, end=end, progress=False)
    if df.empty:
        raise ValueError(f"No data downloaded for {ticker}.")

    # Optional: Flatten MultiIndex columns if yfinance returns them
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.droplevel(1)

    # --- Run HMM ---
    # Call the function to train HMM and get the state-augmented data + regime indices
    data_with_hmm, bull_state, bear_state = train_hmm_and_identify_states(df)

    # --- Backtrader Setup ---
    cerebro = bt.Cerebro(stdstats=False) # Create the main backtrader engine instance

    # --- Add Data ---
    # Ensure DataFrame index is DatetimeIndex (usually true for yfinance)
    if not isinstance(data_with_hmm.index, pd.DatetimeIndex):
         data_with_hmm.index = pd.to_datetime(data_with_hmm.index)
    # Create the custom data feed using the HMM-augmented DataFrame
    data_feed = HMMData(dataname=data_with_hmm)
    cerebro.adddata(data_feed) # Add the data feed to Cerebro

    # --- Add Indicators ---
    # Add the custom HMM signal indicator, passing the identified state indices
    cerebro.addindicator(HMMRegimeStartSignal,
                         bull_state_idx=bull_state,
                         bear_state_idx=bear_state)
    # Add standard Moving Average indicators for context
    cerebro.addindicator(bt.indicators.SimpleMovingAverage, period=30)
    cerebro.addindicator(bt.indicators.SimpleMovingAverage, period=90)

    # --- Run and Plot ---
    print("\n--- Running Cerebro (for plotting) ---")
    cerebro.run() # Run the engine (calculates indicators)

    # Configure plot appearance
    plt.rcParams['figure.figsize'] = (10, 6)
    plt.rcParams['figure.dpi'] = 100
    print("\n--- Generating Plot ---")
    # Generate the plot: includes price, volume, SMAs, and HMM signals
    cerebro.plot(style='line', volume=True, iplot=False) # iplot=False for static plot

Data Download: Fetches historical data for BTC-USD using yfinance.
HMM Training: Calls the train_hmm_and_identify_states function.
Cerebro Initialization: Creates a backtrader Cerebro engine.
Data Addition: Adds the data (including HMM states) to Cerebro using the custom HMMData feed.
Indicator Addition: Adds the custom HMMRegimeStartSignal indicator (providing it with the bull_state and bear_state indices) and two standard Simple Moving Averages (SMAs) for visual context.
Execution: cerebro.run() processes the data and calculates the indicator values. Note that no trading strategy is added here; Cerebro is used primarily for its indicator calculation and plotting capabilities in this script.
Plotting: cerebro.plot() generates the final chart, displaying the price, volume, SMAs, and the HMM regime start signals (green and red markers).

### How it Works - Summary

Download historical price data (e.g., BTC-USD).
Calculate features relevant to market behavior (log returns, volatility).
Train a Gaussian Hidden Markov Model on these features to identify a predefined number of hidden market states (n_states).
Analyze the characteristics (mean return, mean volatility) of each state found by the HMM.
Designate the state with the highest average return as the “Strong Bull” regime and the state with the lowest average return as the “Strong Bear” regime.
Feed the price data and the corresponding predicted HMM state for each bar into backtrader using a custom data feed.
Use a custom backtrader indicator to detect when the market transitions into the strong bull state (plot green marker) or transitions into the strong bear state / out of the strong bull state (plot red marker).
Display the price chart with standard indicators (like SMAs) and the HMM regime transition markers overlaid.

This script provides a powerful way to visualize potential market regime shifts identified by an HMM, which could be a valuable input for discretionary trading or the development of regime-aware automated strategies.