Predicting Bitcoin’s Weekly Moves with 68% Accuracy using Random Forests in Python

Predicting the direction of volatile assets like Bitcoin is a central challenge in quantitative finance. While daily noise can make short-term predictions resemble random walks, analyzing trends over slightly longer horizons, like a week, might offer more traction. This article details a Python-based approach using a Random Forest classifier and a rolling forecast methodology to predict whether Bitcoin’s price will be higher or lower seven days from the present, leveraging a pre-selected set of technical indicators. We’ll cover the theory, the implementation with code snippets, and how to interpret the results.

1. Theoretical Background

Before diving into the code, let’s understand the core concepts:

a) Random Forest Classifier

Ensemble Learning: Random Forest is an ensemble machine learning method primarily used for classification and regression. It operates by constructing a multitude of individual decision trees during training.
How it Works:
1. Bagging (Bootstrap Aggregating): It creates multiple random subsets of the original training data (with replacement). A separate decision tree is trained on each subset.
2. Feature Randomness: When splitting a node in a decision tree, the algorithm considers only a random subset of the available features, rather than all of them. This decorrelates the trees.
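3. Voting: For classification, the final prediction is determined by a majority vote among all the individual trees in the forest. The class predicted by the most trees wins. (scikit-learn’s RandomForestClassifier, used below, averages the trees’ predicted class probabilities and picks the most probable class, which usually gives the same answer.)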
Advantages:
- Handles non-linear relationships between features and the target well.
- Generally more robust to overfitting than an individual decision tree, especially when well-tuned.
- Can handle high-dimensional data (many features).
- Provides useful estimates of Feature Importance, indicating which features contributed most to the model’s decisions.
Equations: While the implementation is complex, the core idea relies on aggregating simple decision trees. The prediction \(\hat{y}\) for an input \(x\) is often represented conceptually as: \(\hat{y} = \operatorname*{majority\_vote}_{b=1}^{B} \{ \hat{y}_b(x) \}\) where B is the number of trees and \(\hat{y}_b(x)\) is the prediction of the \(b^{th}\) tree trained on a bootstrap sample and considering random feature subsets.
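
The majority-vote formula above can be illustrated with a hand-rolled ensemble. This is only a minimal sketch, assuming binary 0/1 labels and an odd number of trees so ties cannot occur; the helper names `fit_bagged_trees` and `majority_vote` and the synthetic data are hypothetical and are not part of the article's script.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_rows = len(X)
    trees = []
    for b in range(n_trees):
        # Bagging: draw n_rows row indices with replacement
        idx = rng.integers(0, n_rows, size=n_rows)
        # Feature randomness: each split considers sqrt(n_features) features
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def majority_vote(trees, X):
    """Return the class (0 or 1) predicted by more than half of the trees."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (B, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic demo data (hypothetical): the label depends on the first two features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

trees = fit_bagged_trees(X_demo, y_demo, n_trees=25)
y_hat = majority_vote(trees, X_demo)
print("training accuracy:", (y_hat == y_demo).mean())
```

In practice RandomForestClassifier does all of this internally; the sketch only makes the roles of \(B\) and \(\hat{y}_b(x)\) concrete.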

b) Feature Selection (Context)

This script assumes that a preliminary analysis has been performed to identify potentially predictive features. In our development process, Mutual Information scores were used to rank ~30 technical indicators based on their statistical relationship with the 1-day price direction. We will use the top 15 features identified in that analysis as inputs to our Random Forest model, assuming they might also hold relevance for the 7-day horizon.
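
The ranking step described above is not shown in this article. A minimal sketch of how such a ranking could be produced with scikit-learn's `mutual_info_classif` follows; the helper name `rank_features_by_mi` and the synthetic columns are hypothetical, and the original analysis may have been set up differently.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def rank_features_by_mi(X: pd.DataFrame, y: pd.Series, top_k: int, seed: int = 42) -> pd.Series:
    """Return the top_k features sorted by estimated mutual information with y."""
    scores = mutual_info_classif(X.values, y.values, random_state=seed)
    return pd.Series(scores, index=X.columns).sort_values(ascending=False).head(top_k)


# Synthetic demo (hypothetical): one column determines the label, two are pure noise
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
demo_X = pd.DataFrame({
    "informative": signal,
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
demo_y = pd.Series((signal > 0).astype(int))

mi_ranking = rank_features_by_mi(demo_X, demo_y, top_k=2)
print(mi_ranking)
```

With real indicator data, `X` would hold the ~30 indicator columns and `y` the 1-day direction label, and `top_k=15` would reproduce the size of the list used here.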

c) Rolling Forecast Evaluation

Why Use It: Financial markets evolve. A model trained on data from years ago might not perform well today. A simple train-test split doesn’t capture this dynamic. A rolling forecast provides a more realistic simulation of how a model might perform when periodically retrained on recent data and used to predict the near future.
How it Works:
1. Define a fixed-size training window (e.g., the last 30 days).
2. Train the model on the data within this window.
3. Make a prediction for the target period (e.g., 7 days ahead).
4. Slide the window forward by one time step (e.g., one day).
5. Repeat steps 2-4 until the end of the dataset is reached.
6. Evaluate the model based on the aggregated predictions made across all windows.
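
The window arithmetic in these steps can be sketched as a small generator. One assumption goes beyond the list above: because a row's label is only known `horizon` days after the row's date, the sketch ends each training window `horizon - 1` rows before the prediction row, so every training label is already realized. The function name `rolling_windows` is hypothetical, and `n_rows` is assumed to count only rows that have a label.

```python
from typing import Iterator, Tuple


def rolling_windows(n_rows: int, train_size: int, horizon: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (train_start, train_end, predict_idx); train_end is exclusive."""
    first_predict_idx = train_size + horizon - 1
    for predict_idx in range(first_predict_idx, n_rows):
        # Last training row is predict_idx - horizon: its label is known by predict_idx
        train_end = predict_idx - horizon + 1
        train_start = train_end - train_size
        yield train_start, train_end, predict_idx


windows = list(rolling_windows(n_rows=50, train_size=30, horizon=7))
print(len(windows), windows[0], windows[-1])
```

Each tuple can be used directly with `DataFrame.iloc[train_start:train_end]` for training and `DataFrame.iloc[[predict_idx]]` for the single prediction row.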

d) Classification Metrics

Since we’re predicting direction (Up=1, Down=0), we use classification metrics:

Accuracy: Overall percentage of correct predictions. \(\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}\)
Precision (for class 1): Of the times the model predicted ‘Up’, how often was it right? High precision means few False Positives (FP). \(\text{Precision} = \frac{TP}{TP + FP}\)
Recall (Sensitivity, for class 1): Of all the actual ‘Up’ movements, how many did the model catch? High recall means few False Negatives (FN). \(\text{Recall} = \frac{TP}{TP + FN}\)
F1-Score (for class 1): Harmonic mean of Precision and Recall, useful for imbalanced datasets. \(F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)
AUC-ROC: Area Under the Receiver Operating Characteristic Curve. Measures the model’s ability to distinguish between classes across different probability thresholds (0.5 = random, 1.0 = perfect).
Confusion Matrix: A table visualizing performance:

	Predicted Down (0)	Predicted Up (1)
Actual Down (0)	True Negative (TN)	False Positive (FP)
Actual Up (1)	False Negative (FN)	True Positive (TP)
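
As a small worked example of these definitions, the snippet below builds a confusion matrix for eight made-up labels and derives each metric from the four cells. The toy labels are hypothetical; the scikit-learn functions are the same ones imported later in the script.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels (hypothetical): 1 = Up, 0 = Down
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# The hand-computed values agree with scikit-learn's helpers
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```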

2. Python Implementation Details

Let’s walk through the key parts of the Python script.

a) Setup and Configuration

Import libraries and set up parameters. Critically, set PREDICTION_HORIZON = 7 and define the TRAINING_WINDOW_DAYS and the list of TOP_FEATURES derived from previous analysis.

Python

# ==============================================================================
# Imports
# ==============================================================================
import pandas as pd
import numpy as np
import yfinance as yf
import talib # Make sure TA-Lib is installed
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, ConfusionMatrixDisplay,
                             roc_auc_score)
import warnings
# ... (warnings configuration) ...

# ==============================================================================
# Configuration
# ==============================================================================
TICKER = 'BTC-USD'
START_DATE = '2021-01-01' # Needs enough data for rolling
END_DATE = None
INTERVAL = '1d' # Daily data

# --- Rolling Window Parameters ---
TRAINING_WINDOW_DAYS = 30 # Approx 1 month training window
PREDICTION_HORIZON = 7    # Predict direction 7 days ahead

# --- Feature Selection ---
# Using Top 15 features identified previously from MI analysis
TOP_FEATURES = [
    'ROC_10', 'STOCHRSI_d', 'ADX_14', 'STOCHRSI_k', 'RSI_14',
    'STOCH_k', 'ATR_14', 'EMA_20', 'STOCH_d', 'MACD',
    'ULTOSC', 'BB_upper', 'SAR', 'Open_Close', 'MACD_hist'
]

# --- Random Forest Model Parameters ---
N_ESTIMATORS = 150
MAX_DEPTH = 8
MIN_SAMPLES_LEAF = 5
CLASS_WEIGHT = 'balanced'
RANDOM_STATE = 42

b) Data Loading and Indicator Calculation

Standard functions using yfinance and talib are used to fetch OHLCV data and compute the full set of ~30 technical indicators.

Python

# load_data and calculate_indicators are defined elsewhere in the full script
# (their definitions are not reproduced in this article)

# In main execution block:
data = load_data(TICKER, START_DATE, END_DATE, INTERVAL)
if data is not None:
    data_indicators = calculate_indicators(data.copy())
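
Since calculate_indicators is not reproduced here, the following is only a rough pandas-only stand-in for two of the listed features, ROC_10 and EMA_20. The function name `basic_indicators_sketch` is hypothetical, a `Close` column is assumed, and the EMA here is seeded with the first close, so its early values will differ from TA-Lib's output.

```python
import pandas as pd


def basic_indicators_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Add pandas approximations of ROC_10 and EMA_20 to a copy of df."""
    out = df.copy()
    # Rate of change: percentage change of the close versus 10 rows earlier
    out["ROC_10"] = (out["Close"] / out["Close"].shift(10) - 1.0) * 100.0
    # Exponential moving average with a 20-period span
    out["EMA_20"] = out["Close"].ewm(span=20, adjust=False).mean()
    return out


# Hypothetical demo prices: 100, 101, ..., 129
demo_prices = pd.DataFrame({"Close": [100.0 + k for k in range(30)]})
demo_indicators = basic_indicators_sketch(demo_prices)
print(demo_indicators.tail(3))
```

The first 10 rows of ROC_10 are NaN because there is no earlier close to compare against, which is one reason the script drops NaN rows before modeling.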

c) Target Variable and Feature Preparation

The 7-day target variable (1 if price is higher 7 days later, 0 otherwise) is created. The data is cleaned of NaNs, and only the TOP_FEATURES columns are selected into the X_all_features DataFrame, while the Target column becomes Y_all.

Python

# create_target (called with horizon=PREDICTION_HORIZON) is defined elsewhere
# in the full script (its definition is not reproduced in this article)

# In main execution block:
data_target = create_target(data_indicators, horizon=PREDICTION_HORIZON)
data_processed = data_target.dropna()

available_features = [f for f in TOP_FEATURES if f in data_processed.columns]
# ... (Error handling if features are missing) ...

X_all_features = data_processed[available_features]
Y_all = data_processed['Target']
Dates_all = data_processed.index # Keep dates for plotting results
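
Because create_target is not shown, here is one plausible way to build the label described above. It is a sketch under stated assumptions: the name `create_target_sketch`, the `Close` column, and the choice to leave the last `horizon` rows as NaN (so that dropna() removes them) are all hypothetical and may not match the original function.

```python
import numpy as np
import pandas as pd


def create_target_sketch(df: pd.DataFrame, horizon: int, price_col: str = "Close") -> pd.DataFrame:
    """Add a 'Target' column: 1.0 if the price is higher `horizon` rows later, else 0.0."""
    out = df.copy()
    future_price = out[price_col].shift(-horizon)
    out["Target"] = (future_price > out[price_col]).astype(float)
    # Rows without a future price have no realized label yet
    out.loc[future_price.isna(), "Target"] = np.nan
    return out


# Hypothetical demo prices
demo = pd.DataFrame({"Close": [10.0, 11.0, 9.0, 12.0, 8.0]})
labelled = create_target_sketch(demo, horizon=2)
print(labelled)
```

Marking the unlabeled tail as NaN matters: a bare comparison against a shifted NaN would silently evaluate to False and produce spurious "Down" labels for the most recent rows.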

d) The Rolling Forecast Loop

This is the core logic change from a simple train/test split.

Python

# --- Rolling Forecast Loop ---
all_predictions = []
all_actuals = []
all_predict_dates = []
all_probabilities = []

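# The first prediction needs a full training window whose labels are all
# already realized, so start PREDICTION_HORIZON - 1 rows after the window size.
start_index = TRAINING_WINDOW_DAYS + PREDICTION_HORIZON - 1
end_index = len(X_all_features) - PREDICTION_HORIZON

print(f"\nStarting rolling forecast from index {start_index} to {end_index-1}...")

for i in range(start_index, end_index):
    # 1. Define window boundaries
    # The label of row j uses the price at j + PREDICTION_HORIZON, so on day i
    # only rows up to i - PREDICTION_HORIZON have a known label. Ending the
    # training window at i would leak future prices into training.
    train_end_idx = i - PREDICTION_HORIZON + 1  # exclusive upper bound
    train_start_idx = train_end_idx - TRAINING_WINDOW_DAYS
    predict_feature_idx = i
    actual_target_idx = i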

    # 2. Extract current window data
    X_train_window = X_all_features.iloc[train_start_idx:train_end_idx]
    Y_train_window = Y_all.iloc[train_start_idx:train_end_idx]
    X_predict_point = X_all_features.iloc[[predict_feature_idx]]
    Y_actual_point = Y_all.iloc[actual_target_idx]

    # 3. Scale features WITHIN the loop
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_window)
    X_predict_scaled = scaler.transform(X_predict_point)

    # 4. Build and Train Model WITHIN the loop
    rf_model = RandomForestClassifier(
        n_estimators=N_ESTIMATORS,
        max_depth=MAX_DEPTH,
        min_samples_leaf=MIN_SAMPLES_LEAF,
        random_state=RANDOM_STATE,
        n_jobs=-1,
        class_weight=CLASS_WEIGHT
    )
    rf_model.fit(X_train_scaled, Y_train_window)

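    # 5. Predict and Store Results
    prediction = rf_model.predict(X_predict_scaled)[0]
    # predict_proba returns one column per class seen in training; a window
    # containing a single class has no column for the other class, so look up
    # the column for class 1 explicitly instead of assuming index 1 exists.
    proba_row = rf_model.predict_proba(X_predict_scaled)[0]
    classes_seen = list(rf_model.classes_)
    probability = proba_row[classes_seen.index(1)] if 1 in classes_seen else 0.0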

    all_predictions.append(prediction)
    all_actuals.append(Y_actual_point)
    all_probabilities.append(probability)
    all_predict_dates.append(Dates_all[actual_target_idx])

    # ... (Optional progress print) ...

print("Rolling forecast complete.")

Crucially, the StandardScaler and RandomForestClassifier are initialized and fitted inside the loop on each window’s data.
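
The reason for fitting the scaler inside the loop can be shown in isolation. In this sketch a StandardScaler is fitted on one training window only and then applied to the next point, so the point is scaled with statistics that were available at the time. The random-walk feature is synthetic and hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic drifting feature (hypothetical), shaped (n_rows, n_features)
rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=(100, 1)), axis=0) + 50.0

window = series[40:70]       # training window
next_point = series[70:71]   # row to predict, kept two-dimensional

# Fit on the window only, then reuse those statistics for the new row
window_scaler = StandardScaler().fit(window)
scaled_window = window_scaler.transform(window)
scaled_point = window_scaler.transform(next_point)

print("window mean used for scaling:", window_scaler.mean_[0])
print("scaled prediction point:", scaled_point[0, 0])
```

Tree splits are unaffected by monotonic rescaling of a feature, so for a Random Forest this step mainly matters if the model is later swapped for a scale-sensitive one; fitting it per window keeps the pipeline free of look-ahead either way.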

e) Aggregated Evaluation

After the loop completes, the collected predictions and actual values are used to calculate the overall performance metrics.

Python

# --- Evaluate Aggregated Results ---
if not all_actuals:
    print("No predictions were made.")
else:
    print("\n--- Aggregated Rolling Forecast Metrics ---")
    accuracy = accuracy_score(all_actuals, all_predictions)
    precision = precision_score(all_actuals, all_predictions, zero_division=0)
    recall = recall_score(all_actuals, all_predictions, zero_division=0)
    f1 = f1_score(all_actuals, all_predictions, zero_division=0)
    try:
        roc_auc = roc_auc_score(all_actuals, all_probabilities)
    except ValueError:
        roc_auc = float('nan')
        # ... (print warning) ...

    print(f"Accuracy:         {accuracy:.4f}")
    print(f"Precision (for 1):{precision:.4f}")
    # ... (print other metrics) ...

    # Baseline comparison
    majority_class_overall = Y_all.value_counts().idxmax()
    baseline_accuracy = accuracy_score(all_actuals, np.full(len(all_actuals), majority_class_overall))
    print(f"\nBaseline Accuracy (...): {baseline_accuracy:.4f}")

    # Confusion Matrix Plotting
    print("\n--- Confusion Matrix (Aggregated Rolling Forecast) ---")
    cm = confusion_matrix(all_actuals, all_predictions)
    print(cm)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
    # ... (Plotting code for CM) ...
    plt.show()

    # Optional: Plot actual vs predicted directions over time
    # ... (Plotting code for results_df) ...
    plt.show()

3. Results and Interpretation

One run of this rolling Random Forest approach yielded:

Accuracy: ~0.6793 (vs. Baseline ~0.5136)
Precision (Up): ~0.6919
Recall (Up): ~0.6772
F1-Score (Up): ~0.6845
AUC-ROC: ~0.7524
Confusion Matrix: [[122, 57], [61, 128]] (rows: actual Down, Up; columns: predicted Down, Up)
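
The reported metrics can be cross-checked directly from the four confusion-matrix cells, using the metric definitions from section 1d. The short check below uses only the numbers listed above.

```python
# Cells of the reported confusion matrix
tn, fp = 122, 57   # actual Down row
fn, tp = 61, 128   # actual Up row

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# Always predicting the more frequent actual class (Up, 189 of 368 cases)
baseline = max(tp + fn, tn + fp) / total

print(f"n={total} acc={accuracy:.4f} prec={precision:.4f} "
      f"rec={recall:.4f} f1={f1:.4f} baseline={baseline:.4f}")
```

The values round to 0.6793, 0.6919, 0.6772, 0.6845 and 0.5136, matching the listed figures, so the summary is internally consistent over 368 predictions.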

Interpretation:

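These results show a clear improvement over the baseline of always predicting the majority class (~51%), which is itself close to random chance. The model achieved ~68% accuracy in predicting the 7-day direction over the rolling test period. Precision and Recall are reasonably balanced (around 68-69%), indicating the model identifies ‘Up’ moves moderately well without excessively predicting ‘Up’ incorrectly. The AUC of ~0.75 suggests a decent discriminatory ability. While not perfect, these results indicate that the combination of selected features, the Random Forest model, and the rolling approach captured a predictive signal in the historical data tested. Two points temper that reading. Consecutive daily predictions of a 7-day target overlap heavily, so the 368 predictions are not independent samples. And the figures are only meaningful if every training label was already observable on the prediction date: a 7-day label for a row within the last PREDICTION_HORIZON - 1 days before the prediction depends on prices that are not yet known, and training on such rows inflates the metrics.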

4. How to Use the Code

Install Prerequisites: Ensure Python, pandas, numpy, yfinance, matplotlib, seaborn, scikit-learn, and crucially, TA-Lib (C library + Python wrapper) are installed.
Save: Save the complete code as a Python file (e.g., rolling_rf_btc.py).
Configure: Modify settings like TICKER, START_DATE, TRAINING_WINDOW_DAYS, PREDICTION_HORIZON, or Random Forest parameters if desired.
Run: Execute from your terminal: python rolling_rf_btc.py. It will take some time as the model retrains repeatedly.
Analyze: Review the printed metrics and the confusion matrix plot. Compare accuracy to the baseline. Assess if the Precision/Recall/F1/AUC meet your requirements for considering a signal potentially useful.

5. Limitations and Conclusion

Historical Performance: Success on past data doesn’t guarantee future results. Markets change.
Not a Trading Strategy: This analyzes predictive accuracy ONLY. It lacks entry/exit rules, risk management, cost simulation, etc.
Need for Tuning/Testing: Results depend heavily on the chosen features, hyperparameters, and time period. Extensive testing and tuning are required for any real application.
Feature Stability: The selected TOP_FEATURES might lose predictive power over time.

In conclusion, this script provides a robust framework for evaluating the predictive power of technical indicators for Bitcoin’s weekly direction using a Random Forest model and a realistic rolling forecast method. The results achieved (~68% accuracy, ~0.75 AUC historically) demonstrate a potential edge worthy of further investigation, but require critical interpretation and significant further development before any practical trading application.