In the quest to build profitable trading strategies, particularly in volatile markets like Bitcoin, identifying which technical indicators genuinely provide predictive information is a crucial first step. With dozens of indicators available, how do you sift through them to find those most relevant to future price movements?
This article guides you through a Python script designed for exactly this purpose. It provides a systematic approach to analyzing the historical predictive power of approximately 30 different technical indicators for forecasting Bitcoin’s next-day price direction. We’ll break down the code step-by-step, explain how to run it, and discuss how to interpret the results using two distinct feature importance methods: Mutual Information and Random Forest Importance.
Objective:
The goal of this script is not to create a trading bot, but rather to perform feature analysis. It helps answer the question: “Based on historical data, which of these technical indicators had the strongest relationship with whether Bitcoin’s price went up or down the next day?”
Prerequisites:
Before running the script, you need a Python environment with the following libraries installed:
pandas: For data manipulation.
numpy: For numerical operations.
yfinance: To download market data from Yahoo Finance.
matplotlib & seaborn: For plotting the results.
scikit-learn: For preprocessing, feature selection (Mutual Information), and the Random Forest model.
TA-Lib: A crucial library for calculating technical indicators. Important: Installing TA-Lib can sometimes be tricky, as it requires the underlying TA-Lib C library to be installed first. Follow the official instructions carefully: https://mrjbq7.github.io/ta-lib/install.html
Once the C library is set up, you can typically install all the Python packages using pip:
pip install pandas numpy yfinance matplotlib seaborn scikit-learn TA-Lib
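If you use conda, the C library and the Python wrapper can usually be installed together from the conda-forge channel (an alternative route, not part of the official instructions linked above):
conda install -c conda-forge ta-lib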
How the Script Works: Step-by-Step Breakdown
The script follows a logical workflow from data acquisition to analysis:
A. Configuration
The script begins with a configuration section where you can easily modify key parameters for your analysis.
Python
# ==============================================================================
# Configuration
# ==============================================================================
TICKER = 'BTC-USD' # Asset to analyze (e.g., 'ETH-USD', 'AAPL')
START_DATE = '2018-01-01' # Start date for historical data
END_DATE = None # End date (None uses latest data) or 'YYYY-MM-DD'
PREDICTION_HORIZON = 1 # How many days ahead to predict direction (e.g., 1 = next day)
TEST_SIZE = 0.2 # Proportion of data reserved (chronologically) for potential later testing
# Note: This analysis primarily uses the training portion
TICKER: The symbol recognized by Yahoo Finance for the asset you want to analyze.
START_DATE / END_DATE: Define the period for historical data. A longer period (several years) is generally better for robustness.
PREDICTION_HORIZON: Sets the timeframe for the direction prediction (e.g., 1 means predicting whether the close price tomorrow will be higher than today’s close).
TEST_SIZE: Reserves the final 20% of the data chronologically. While this script focuses the importance analysis on the training part (the first 80%), reserving a test set is good practice for later model validation if you build upon this analysis.
B. Data Loading
The load_data function fetches the necessary OHLCV (Open, High, Low, Close, Volume) data using yfinance.
Python
# In main execution block:
data = load_data(TICKER, START_DATE, END_DATE)
It includes error handling and basic column name standardization.
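The article does not reproduce the body of load_data, but a minimal sketch under the same assumptions (lowercase, underscore-separated column names; None returned on failure) might look like this; the original script’s details may differ:
Python
# Hypothetical reconstruction of load_data (illustrative only)
import pandas as pd
import yfinance as yf

def load_data(ticker, start_date, end_date=None):
    """Downloads OHLCV data from Yahoo Finance and standardizes column names."""
    try:
        data = yf.download(ticker, start=start_date, end=end_date, progress=False)
    except Exception as e:
        print(f"Error downloading {ticker}: {e}")
        return None
    if data is None or data.empty:
        print(f"No data returned for {ticker}.")
        return None
    # Newer yfinance versions can return MultiIndex columns; flatten them first
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    # Standardize names, e.g., 'Adj Close' -> 'adj_close'
    data.columns = [str(c).lower().replace(' ', '_') for c in data.columns]
    return data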
C. Indicator Calculation
The calculate_indicators function is the workhorse for feature engineering. It takes the raw OHLCV data and computes approximately 30 different technical indicators using the installed TA-Lib library.
Python
# Example snippets from inside the calculate_indicators function:
# Trend
df['SMA_20'] = talib.SMA(close, timeperiod=20)
df['ADX_14'] = talib.ADX(high, low, close, timeperiod=14)
# Momentum
df['RSI_14'] = talib.RSI(close, timeperiod=14)
df['MACD'], df['MACD_signal'], df['MACD_hist'] = talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)
# Volatility
df['ATR_14'] = talib.ATR(high, low, close, timeperiod=14)
df['BB_upper'], df['BB_middle'], df['BB_lower'] = talib.BBANDS(close, timeperiod=20, nbdevup=2, nbdevdn=2, matype=0)
# Volume (if available)
if volume is not None and not (volume == 0).all():
df['OBV'] = talib.OBV(close, volume.astype(float))
# Other
df['High_Low'] = df['high'] - df['low']
# ... plus many others covering different indicator types ...
This function generates a wide range of potential predictor variables.
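For orientation, the snippets above sit inside a function whose scaffolding first converts the DataFrame columns into the NumPy arrays TA-Lib expects. A sketch (the structure is assumed, not verbatim from the script):
Python
# Sketch of the surrounding function (details assumed)
import talib

def calculate_indicators(df):
    """Appends ~30 TA-Lib indicator columns to an OHLCV DataFrame."""
    df = df.copy()
    # TA-Lib functions expect 1-D float64 NumPy arrays
    close = df['close'].astype(float).values
    high = df['high'].astype(float).values
    low = df['low'].astype(float).values
    volume = df['volume'] if 'volume' in df.columns else None
    # ... indicator calculations like those shown above go here ...
    df['SMA_20'] = talib.SMA(close, timeperiod=20)
    return df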
D. Target Variable Definition
The create_target function defines what we are trying to predict. It creates a binary Target column.
Python
# Inside create_target function:
def create_target(df, horizon=1):
    """Creates binary target variable: 1 if future price > current, 0 otherwise."""
    df['Future_Close'] = df['close'].shift(-horizon)  # Look ahead 'horizon' days
    # Target is 1 if the future price increased, 0 otherwise
    df['Target'] = (df['Future_Close'] > df['close']).astype(int)
    print(f"Target variable created for {horizon}-day future direction.")
    return df
Here, Target = 1 if the closing price PREDICTION_HORIZON days later is higher than the current day’s closing price, and 0 otherwise.
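In the main execution block these functions are chained together; condensed, the flow is roughly as follows (the intermediate variable name is illustrative):
Python
# Condensed main-block flow; data_target is the name used in the next step
data = load_data(TICKER, START_DATE, END_DATE)
data_ind = calculate_indicators(data)
data_target = create_target(data_ind, horizon=PREDICTION_HORIZON)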
E. Preprocessing
This stage prepares the data for analysis:
Python
# In main execution block:
# Drop rows with NaNs (essential after indicator/target calculation)
print(f"Shape before dropping NaNs: {data_target.shape}")
data_processed = data_target.dropna()
print(f"Shape after dropping NaNs: {data_processed.shape}")
# Separate Features (X) and Target (Y)
original_cols = ['open', 'high', 'low', 'close', 'adj_close', 'volume', 'Future_Close', 'Target']
features = [col for col in data_processed.columns if col not in original_cols]
X = data_processed[features]
Y = data_processed['Target']
# Split data chronologically (using first 80% for importance analysis)
split_index = int(len(X) * (1 - TEST_SIZE))
X_train, X_test = X[:split_index], X[split_index:]
Y_train, Y_test = Y[:split_index], Y[split_index:]
# Scale features (important for some analyses, good practice)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# X_test_scaled = scaler.transform(X_test) # Scale test set if needed later
dropna(): Removes rows containing NaN values, which are introduced by indicators requiring a lookback period (like moving averages) and by the target variable’s shift.
Feature/target split: Separates the feature columns (X) from the Target column (Y).
StandardScaler: Standardizes the features (mean=0, variance=1). This is helpful for the mutual information calculation and often beneficial for machine learning models.
F. Predictive Power Analysis 1: Mutual Information
This analysis uses mutual_info_classif from scikit-learn to estimate the mutual information between each scaled feature and the binary target variable on the training data. Mutual information measures the reduction in uncertainty about the target variable given knowledge of the feature, capturing both linear and non-linear dependencies.
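Formally (the standard information-theoretic definition, not code from the script), the mutual information between a feature X and the target Y is

$$I(X;Y) = H(Y) - H(Y \mid X) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

A score of zero means the feature and the target are statistically independent; higher scores mean knowing the feature reduces uncertainty about the target more.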
Python
# In main execution block:
print("\n--- Analyzing Feature Importance using Mutual Information ---")
try:
    # Ensure Y_train has enough samples and variance
    if len(Y_train.unique()) > 1 and len(Y_train) > 5:
        mi_scores = mutual_info_classif(X_train_scaled, Y_train, discrete_features=False, random_state=42)
        mi_series = pd.Series(mi_scores, index=features).sort_values(ascending=False)
        # Plotting using seaborn
        plt.figure(figsize=(12, 10))
        sns.barplot(x=mi_series.values, y=mi_series.index, palette='viridis')
        plt.title(f'Mutual Information Scores vs. Target (Next {PREDICTION_HORIZON}-Day Direction)')
        plt.xlabel('Mutual Information Score')
        # ... rest of plotting ...
        plt.show()
        print("Top 15 Features (Mutual Information):\n", mi_series.head(15))
# ... error handling ...
The bar chart visualizes these scores. Higher scores suggest a stronger statistical relationship between the indicator and the next day’s price direction in the training data.
G. Predictive Power Analysis 2: Random Forest Importance
This method trains a RandomForestClassifier model on the scaled training data and then extracts the feature importances calculated by the model itself. For Random Forests, this is typically the “mean decrease in impurity” (Gini importance): it measures how much, on average, splitting on a particular feature reduces the impurity (improves the classification) across all the trees in the forest.
Python
# In main execution block:
print("\n--- Analyzing Feature Importance using Random Forest ---")
try:
    if len(Y_train.unique()) > 1 and len(Y_train) > 5:
        rf_model = RandomForestClassifier(n_estimators=200,
                                          random_state=42,
                                          n_jobs=-1,
                                          max_depth=10,
                                          min_samples_leaf=5,
                                          class_weight='balanced')  # Helps if Ups/Downs are imbalanced
        rf_model.fit(X_train_scaled, Y_train)
        rf_importances = rf_model.feature_importances_
        rf_series = pd.Series(rf_importances, index=features).sort_values(ascending=False)
        # Plotting using seaborn
        plt.figure(figsize=(12, 10))
        sns.barplot(x=rf_series.values, y=rf_series.index, palette='magma')
        plt.title(f'Random Forest Feature Importance vs. Target (Next {PREDICTION_HORIZON}-Day Direction)')
        plt.xlabel('Importance Score (Mean Decrease in Impurity)')
        # ... rest of plotting ...
        plt.show()
        print("Top 15 Features (Random Forest):\n", rf_series.head(15))
# ... error handling ...
The bar chart visualizes these model-specific importance scores. Higher scores mean the Random Forest relied more heavily on that feature to make its predictions on the training data.
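One convenient way to compare the two perspectives side by side (a small addition beyond what the script prints) is to join the two Series into a single ranking table:
Python
# Combine both rankings (mi_series and rf_series come from the steps above)
importance_df = pd.DataFrame({'mutual_info': mi_series, 'rf_importance': rf_series})
importance_df['mi_rank'] = importance_df['mutual_info'].rank(ascending=False)
importance_df['rf_rank'] = importance_df['rf_importance'].rank(ascending=False)
# Features near the top of both rankings are the strongest candidates
print(importance_df.sort_values('rf_rank').head(15))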
How to Use the Script
1. Save the code as a Python file (e.g., indicator_analysis.py).
2. Adjust the configuration variables (TICKER, START_DATE, etc.) for your desired analysis.
3. Run the script from your terminal: python indicator_analysis.py
Interpreting the Results
Compare the rankings produced by the two methods. Indicators that score highly under both Mutual Information and Random Forest importance are the strongest candidates for further investigation, while large disagreements are themselves informative: Mutual Information captures any statistical dependency, whereas the Random Forest score reflects what one particular model actually used.
Limitations and Critical Next Steps
Keep in mind that this is an in-sample feature analysis, not a validated strategy: the scores describe historical relationships within the training window and say nothing about out-of-sample profitability. Before building on these rankings, check that they are stable across different periods, for example by repeating the analysis over multiple time-ordered folds (e.g., using TimeSeriesSplit within the training data).
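A minimal sketch of that idea, assuming the X_train_scaled array, Y_train Series, and features list from the preprocessing step:
Python
# Check how stable the Random Forest importances are across time-ordered folds
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
fold_importances = []
for fold_train_idx, _ in tscv.split(X_train_scaled):
    model = RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1)
    model.fit(X_train_scaled[fold_train_idx], Y_train.iloc[fold_train_idx])
    fold_importances.append(model.feature_importances_)
# Features whose mean importance stays high across folds are more trustworthy
mean_importance = pd.Series(np.mean(fold_importances, axis=0), index=features)
print(mean_importance.sort_values(ascending=False).head(15))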
Conclusion
This Python script offers a sophisticated starting point for quantitatively assessing which technical indicators might hold predictive value for Bitcoin’s short-term price direction. By using both Mutual Information and Random Forest importance, it provides two valuable perspectives. However, remember that this is an analytical tool for research, not a trading system. The insights gained must be rigorously validated through proper model building and backtesting before ever considering real-world application.