Forecasting Crypto Prices with Gradient Boosting


Gradient Boosting

Gradient boosting is a machine learning technique used for regression and classification tasks, among others. It produces a prediction model in the form of an ensemble of weak prediction models, i.e., simple models that individually perform only slightly better than chance, typically shallow decision trees. Gradient boosting minimizes a loss function by iteratively fitting new models to the negative gradient of the loss function with respect to the current predictions of the ensemble.

The goal is to find the function \(\hat F\) that minimizes the expected loss:
\[\hat{F}=\underset{F}{\arg \min } \mathbb{E}_{x, y}[L(y, F(x))]\]
We approximate \(\hat F(x)\) by a weighted sum of functions \(h_m(x)\) from a class \(\mathcal{H}\) of base (weak) learners:
\[\hat{F}(x)=\sum_{m=1}^M \gamma_m h_m(x)+ \text{const}\]
We find the approximation \(\hat F(x)\) by minimizing the average loss on the training set \(\{(x_i, y_i)\}_{i=1}^n\), starting from a constant model and adding one learner at a time:
\[\begin{aligned} & F_0(x)=\underset{\gamma}{\arg \min } \sum_{i=1}^n L\left(y_i, \gamma\right), \\ & F_m(x)=F_{m-1}(x)+\left(\underset{h_m \in \mathcal{H}}{\arg \min } \left[\sum_{i=1}^n L\left(y_i, F_{m-1}\left(x_i\right)+h_m\left(x_i\right)\right)\right]\right)(x). \end{aligned}\]
Solving the inner minimization exactly is generally infeasible for an arbitrary loss, so in practice each \(h_m\) is fitted to the pseudo-residuals, i.e., the negative gradient evaluated at the current model:
\[r_{i m}=-\left[\frac{\partial L\left(y_i, F\left(x_i\right)\right)}{\partial F\left(x_i\right)}\right]_{F=F_{m-1}}, \quad i=1, \ldots, n.\]
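To make the iteration concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, where the negative gradient is simply the residual \(y_i - F_{m-1}(x_i)\). It is an illustration under stated assumptions, not how scikit-learn implements it: the helper names `fit_boosting` and `predict_boosting`, the synthetic sine data, and the fixed learning rate standing in for \(\gamma_m\) are all choices made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Boosting sketch for squared-error loss L(y, F) = (y - F)^2 / 2."""
    f0 = float(np.mean(y))              # F_0: the constant that minimizes squared loss
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred            # negative gradient of the loss at F_{m-1}
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)          # weak learner h_m fitted to the residuals
        pred = pred + learning_rate * tree.predict(X)   # F_m = F_{m-1} + lr * h_m
        trees.append(tree)
    return f0, learning_rate, trees


def predict_boosting(model, X):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)


# Synthetic data (an assumption for illustration): noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

model = fit_boosting(X, y)
mse_const = float(np.mean((y - np.mean(y)) ** 2))
mse_boost = float(np.mean((y - predict_boosting(model, X)) ** 2))
print('MSE of constant model:', mse_const)
print('MSE after boosting:   ', mse_boost)
```

Each round only has to correct what the current ensemble still gets wrong, which is why very shallow trees are enough as base learners.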

Python Code

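Let's download the hourly price of Ethereum for the past 3 months and calculate log returns:
# Imports used in this section
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

# Download and preprocess data
close = yf.download('ETH-USD', period='3mo', interval='1h')['Close']
# squeeze() also handles yfinance versions that return a one-column DataFrame here
data = pd.DataFrame({'price': close.squeeze()})
data['ret'] = np.log(data['price'] / data['price'].shift(1))
data.dropna(inplace=True)

# Plot price and log returns in separate panels
data[['price','ret']].plot(subplots=True)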


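Next, we build some features for training and predicting price movements: the lagged return, the 24-hour minimum and maximum price, momentum (24-hour mean return) and volatility (24-hour standard deviation of returns). Every feature is shifted by one bar so that it only uses information available before the bar being predicted:
# Feature Engineering
data['r'] = data['ret'].shift(1)                         # previous hour's log return
data['min'] = data['price'].rolling(24).min().shift(1)   # 24-hour low
data['max'] = data['price'].rolling(24).max().shift(1)   # 24-hour high
data['mom'] = data['ret'].rolling(24).mean().shift(1)    # momentum
data['vol'] = data['ret'].rolling(24).std().shift(1)     # volatility
data['d'] = np.where(data['ret'] > 0, 1, -1)             # target: direction of the current bar

# Price together with its rolling 24-hour range
data[['price', 'min', 'max']].plot()

# Momentum and volatility in separate panels
data[['mom', 'vol']].plot(subplots=True)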
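The `.shift(1)` on each rolling feature is what keeps the current bar's price out of its own predictors. The following toy sketch shows the effect; the six prices and the 3-bar window are made up for illustration (the article uses a 24-bar window on real data).

```python
import pandas as pd

# Toy prices (hypothetical values) and a short window so the numbers are easy to follow
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 103.0, 110.0])
window = 3

max_unshifted = prices.rolling(window).max()          # includes the current bar
max_lagged = prices.rolling(window).max().shift(1)    # uses only earlier bars

# Change only the last price and recompute both versions of the feature
bumped = prices.copy()
bumped.iloc[-1] = 200.0
bumped_unshifted = bumped.rolling(window).max()
bumped_lagged = bumped.rolling(window).max().shift(1)

print('unshifted, last bar:', max_unshifted.iloc[-1], '->', bumped_unshifted.iloc[-1])
print('lagged, last bar:   ', max_lagged.iloc[-1], '->', bumped_lagged.iloc[-1])
```

The unshifted maximum reacts to the last price, so using it to predict that same bar's direction would leak the answer. The lagged version stays unchanged.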




Now, let's split our data into training and test sets and use a gradient boosting classifier to model and then predict market movements:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Drop rows with missing rolling features before computing the split point
data.dropna(inplace=True)

# Split data into train and test sets (chronological 80/20, no shuffling)
split = int(len(data) * 0.8)
train = data.iloc[:split].copy()
test = data.iloc[split:].copy()

# Feature Scaling (fit on the training set only, then apply to the test set)
features = ['r', 'min', 'max', 'mom', 'vol']
scaler = StandardScaler()
scaled_train_features = scaler.fit_transform(train[features])
scaled_test_features = scaler.transform(test[features])

# Create and train Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
gb_model.fit(scaled_train_features, train['d'])

# Make predictions and evaluate
test['gb_pred'] = gb_model.predict(scaled_test_features)
gb_score = accuracy_score(test['d'], test['gb_pred'])
print('Gradient Boosting accuracy:', gb_score)
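An accuracy figure is easier to judge next to a naive baseline. The sketch below compares a gradient boosting classifier with a model that always predicts the most frequent training class. It runs on synthetic data, not ETH prices: the random features, the artificial signal in the first column, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the feature matrix and the +1/-1 direction labels
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=n) > 0, 1, -1)

# Chronological 80/20 split, as in the article
split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Baseline: always predict the most frequent class seen in training
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, model.predict(X_test))
print('Baseline accuracy:         ', baseline_acc)
print('Gradient Boosting accuracy:', model_acc)
```

On real hourly returns the gap to the baseline is typically far smaller than on this synthetic example, which is exactly why the comparison is worth printing.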
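Finally, we backtest a simple strategy that goes long when the model predicts an up move and short when it predicts a down move:
# Plot the results
plt.figure()
# Strategy log return per bar = predicted direction (+1/-1) * realized log return
plt.plot((test['gb_pred'] * test['ret']).cumsum().apply(np.exp),
         label='strategy cumulative returns')
# Buy-and-hold benchmark over the same test period
plt.plot(test['ret'].cumsum().apply(np.exp), '-k',
         label='ETH cumulative returns')
plt.xlabel('date')
plt.ylabel('returns')
plt.xticks(rotation=30)
plt.legend()
plt.show()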
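The cumulative return calculation can be checked by hand on a tiny example. In this sketch the four returns and predictions are made-up numbers. As in the plotting code, a short position is treated as earning the negative log return, which is a simplification.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): gross returns of +2%, -1%, +1%, -3% as log returns
ret = pd.Series(np.log([1.02, 0.99, 1.01, 0.97]))
pred = pd.Series([1, -1, -1, -1])   # predicted direction for each bar

# Log returns add up over time, so cumsum followed by exp gives the growth of 1 unit
strategy_growth = np.exp((pred * ret).cumsum())
buy_hold_growth = np.exp(ret.cumsum())

# Share of bars where the predicted direction matched the realized one
hit_rate = (np.sign(ret) == pred).mean()

print('Buy-and-hold final value:', buy_hold_growth.iloc[-1])
print('Strategy final value:    ', strategy_growth.iloc[-1])
print('Hit rate:                ', hit_rate)
```

The strategy is right on three of four bars here. The one wrong call (short during a +1% bar) costs it that bar's return, which shows that the final value depends on the size of the moves as well as on accuracy.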


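We can see that, in this run, the strategy returns about 10% over the test period of roughly 3 weeks. Note that this is a single out-of-sample window and the backtest ignores transaction costs. Machine learning methods such as gradient boosting have real potential and can be extended further to develop successful trading strategies.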
