
What if Darwin Traded Crypto? An Experiment with Evolutionary AI & Neural Nets

Algorithmic trading, the use of computer programs to execute trading strategies, has revolutionized financial markets. Designing profitable strategies, however, remains a significant challenge. It often involves navigating complex market dynamics, identifying predictive patterns, and managing risk effectively. Machine learning and optimization techniques offer powerful tools to tackle this complexity.

This article delves into one such approach: using an Evolution Strategy (ES), a type of optimization algorithm inspired by natural evolution, to train a simple Neural Network (NN) based trading agent. We will explore the underlying theory of ES and NNs in this context, walk through a Python implementation using yfinance for Bitcoin data, and emphasize the importance of realistic backtesting with train/test splits.

Background Theory

1. Evolution Strategies (ES)

Evolution Strategies are a class of optimization algorithms belonging to the broader field of Evolutionary Computation. Unlike Genetic Algorithms (GAs), which often work with discrete representations (like binary strings) and rely heavily on crossover, ES typically operates directly on real-valued parameter vectors (like the weights of a neural network) and relies mainly on mutation (usually Gaussian noise) and selection to guide the search toward good solutions.

Core Idea: ES maintains a central parameter vector and, in each generation (iteration), samples a “population” of candidate solutions around it by adding random Gaussian perturbations (mutations). It then evaluates the “fitness” (performance) of each candidate using an objective function (in our case, a trading simulation reward). Finally, it updates the central vector by a fitness-weighted average of the perturbations: perturbations that produced better fitness pull the update toward themselves, and, once rewards are normalized, poorly scoring ones push it away.
Simplified ES Update: A common, basic form of the ES update rule for a parameter vector (or weight matrix) W can be written as:

\[ W_{t+1}=W_t+\frac{\alpha}{N\sigma}\sum_{k=1}^{N}R_k\epsilon_k \]

Where:
- \( W_t \) is the parameter vector at iteration \( t \).
- \( \alpha \) is the learning rate (step size).
- \( N \) is the population size.
- \( \sigma \) is the standard deviation of the Gaussian noise (mutation strength).
- \( \epsilon_k \) is the standard Gaussian noise vector used to create the \( k \)-th population member, \( W_t+\sigma\epsilon_k \).
- \( R_k \) is the fitness (reward) obtained by the \( k \)-th population member, often normalized (e.g., converted to standard scores).
With normalized rewards, this update moves the current parameters \( W_t \) toward the perturbations that scored above average and away from those that scored below average.
Advantages: ES can be very effective for optimizing parameters of complex, non-differentiable systems where gradients are hard or impossible to compute (like the overall profit of a multi-step trading simulation). It’s a black-box optimization technique.
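One way to see why this rule works, stated here as standard ES background rather than something the code computes explicitly: let \( F(W) \) be the trading reward of weights \( W \) and \( J \) its Gaussian-smoothed version. The gradient of \( J \) can be written as an expectation that needs only reward evaluations, and estimating that expectation with the \( N \) population members gives exactly the update above:

\[ J(W)=\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\big[F(W+\sigma\epsilon)\big],\qquad \nabla_W J(W)=\frac{1}{\sigma}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\big[F(W+\sigma\epsilon)\,\epsilon\big] \]

\[ \nabla_W J(W_t)\approx\frac{1}{N\sigma}\sum_{k=1}^{N}R_k\,\epsilon_k \quad\text{with}\quad R_k=F(W_t+\sigma\epsilon_k),\qquad W_{t+1}=W_t+\alpha\,\nabla_W J(W_t) \]

Subtracting the mean reward leaves the expected estimate unchanged because \( \mathbb{E}[\epsilon]=0 \), and dividing by the reward's standard deviation only rescales the step. This is also why ES copes when \( F \) has no useful gradient: \( J \) is smooth even when \( F \) is piecewise constant, as a profit driven by argmax action choices is.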

2. Neural Networks (NNs) for Policy Representation

In our agent, the neural network acts as the “brain” or the policy. It maps the current market state to a preferred action.

Function: It’s a function approximator. Given an input vector representing the market state, it outputs scores indicating the desirability of each possible action (Buy, Sell, Hold).
Simple Structure: We use a basic feedforward network with one hidden layer:
- Input Layer: Receives the state vector (e.g., recent price changes).
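- Hidden Layer: Performs a linear transformation \( \text{Input}\cdot W_{\text{hidden}} + b_{\text{hidden}} \), which is normally followed by a non-linear activation. Our implementation applies no activation (hidden_output = hidden_input), so this layer stays linear.
- Output Layer: Performs another linear transformation \( \text{Hidden}\cdot W_{\text{output}} \) to produce the final action scores.
Parameters: The network’s behavior is determined by its weight matrices (\( W_{\text{hidden}}, W_{\text{output}} \)) and the hidden-layer bias (\( b_{\text{hidden}} \)). These are the parameters that the Evolution Strategy optimizes. Because there is no activation, the two layers compose into a single linear map, \( \text{Input}\cdot(W_{\text{hidden}}W_{\text{output}}) + b_{\text{hidden}}W_{\text{output}} \), so the network is effectively a linear scoring function of the state.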
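To confirm that a network without an activation reduces to one linear map, the NumPy check below builds random weights with the article's default shapes (30 inputs, 128 hidden units, 3 outputs) and compares the two-layer forward pass with the collapsed matrix. The forward function and the tanh variant are hypothetical illustrations, not the article's code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random weights with the shapes SimpleModel uses (WINDOW_SIZE=30, LAYER_SIZE=128, OUTPUT_SIZE=3)
W_hidden = rng.standard_normal((30, 128)) * 0.1
W_output = rng.standard_normal((128, 3)) * 0.1
b_hidden = rng.standard_normal((1, 128)) * 0.1

def forward(x, activation=None):
    """Two-layer forward pass; activation=None reproduces SimpleModel.predict."""
    h = x @ W_hidden + b_hidden
    if activation is not None:
        h = activation(h)
    return h @ W_output

# Collapse the linear network into a single weight matrix and bias row
W_eff = W_hidden @ W_output   # shape (30, 3)
b_eff = b_hidden @ W_output   # shape (1, 3)

x = rng.standard_normal((5, 30))          # five example states
linear_scores = forward(x)
collapsed_scores = x @ W_eff + b_eff
print(np.allclose(linear_scores, collapsed_scores))  # True

# Hypothetical non-linear variant: no longer equal to any single linear map
tanh_scores = forward(x, activation=np.tanh)
```

Adding a non-linearity such as tanh or ReLU between the layers is what lets the hidden layer learn features a single linear map cannot; in SimpleModel.predict that would be a one-line change to hidden_output.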

3. Trading Agent Framework

We can frame the trading problem in terms similar to Reinforcement Learning (RL), although ES optimizes differently: it scores whole simulated episodes rather than learning from individual step rewards.

Agent: The program making trading decisions.
Environment: The financial market (Bitcoin price time series).
State (\( S_t \)): A representation of the market at time \( t \). Choosing a good state representation is crucial. Raw prices are problematic because they are non-stationary. Price changes over a lookback window are a common improvement; our implementation uses the last WINDOW_SIZE absolute daily price differences, while percentage returns would additionally remove the dependence on the price level.
Action (\( A_t \)): The decision made by the agent at time \( t \) (e.g., Buy, Sell, Hold).
Reward (\( R_t \)): A measure of how good the outcome was after taking actions. In ES applied to trading, the reward is typically sparse: it is calculated only at the end of a simulation episode (here, the total percentage profit/loss over the training period).
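The implementation's _get_state feeds absolute price differences (np.diff of prices) to the network, and the size of a one-day dollar move grows with the price level. Percentage returns are a common scale-free alternative. The sketch below is a hypothetical drop-in variant (pct_return_state is not part of the article's code) that keeps the same window length, the same "up to and including index t" convention, and the same zero-padding near the start of the series.

```python
import numpy as np

def pct_return_state(prices, t, window_size):
    """Hypothetical alternative to _get_state: the last `window_size` percentage
    returns ending at index t, left-padded with zeros near the series start."""
    prices = np.asarray(prices, dtype=float)
    start = max(0, t - window_size)
    window = prices[start : t + 1]               # only prices known at time t
    returns = np.diff(window) / window[:-1]      # (p[i] - p[i-1]) / p[i-1]
    state = np.zeros(window_size)
    if len(returns) > 0:
        state[-len(returns):] = returns
    return state.reshape(1, -1)                  # same (1, window_size) shape as _get_state

prices = [100.0, 110.0, 99.0, 99.0]
state = pct_return_state(prices, t=3, window_size=3)  # returns of +10%, -10%, 0%
```

A $1,000 move and a 1% move are the same signal only at one price level; feeding returns lets weights learned early in the training period mean the same thing later on.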

The Implemented Method: ES Optimizing NN Weights via Simulated Trading

Our approach uses the Evolution Strategy to directly optimize the weights of the neural network policy.

The ES generates variations (population members) of the current NN weights.
For each set of weights, the agent’s _calculate_reward_on_train method is called. It simulates the agent trading over the entire training dataset, using the NN with those specific weights to choose an action (Buy/Sell/Hold) at each step.
The simulation result (final percentage profit/loss on the training data) is returned as the fitness score (reward) for that set of weights.
The ES uses these rewards to update the central NN weights according to its update rule, aiming to find weights that maximize the simulated profit on the training data.

Implementation Details (Python)

Let’s look at the key parts of the Python code (using the version with the train/test split).

1. Data Handling and Splitting

We fetch historical Bitcoin data using yfinance and then split it chronologically into training and testing sets. This ensures we train the agent on one period and evaluate it on a completely separate, later period.

Python

import yfinance as yf
import numpy as np
import pandas as pd

ticker = 'BTC-USD'
try:
    # Fetch 3 years data for a reasonable split
    df = yf.download(ticker, period='3y')
    if df.empty:
        raise ValueError(f"No data fetched for {ticker}.")
    print(f"Fetched {len(df)} rows of data for {ticker}")
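    df = df.sort_index()
    # Recent yfinance releases can return MultiIndex columns even for one ticker,
    # making df['Close'] a one-column DataFrame; ravel() flattens it to a 1-D array.
    all_prices = df['Close'].to_numpy().ravel()
    all_dates = df.index
except Exception as e:
    raise SystemExit(f"Error fetching data: {e}")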

# Split data: 80% train, 20% test
test_size_percentage = 0.20
split_index = int(len(all_prices) * (1 - test_size_percentage))

train_prices = all_prices[:split_index]
test_prices = all_prices[split_index:]
train_dates = all_dates[:split_index]
test_dates = all_dates[split_index:]

print(f"Data split: {len(train_prices)} training samples, {len(test_prices)} testing samples.")

Explanation: We get 3 years of daily closing prices for BTC-USD. We calculate an index (split_index) to divide the data, assigning the earlier 80% to train_prices and the later 20% to test_prices. Corresponding dates are also separated.

2. Neural Network Model (SimpleModel)

This class defines the structure and prediction logic of our simple neural network.

Python

class SimpleModel:
    """ A simple neural network model with one hidden layer. """
    def __init__(self, input_size, layer_size, output_size):
        # Initialize weights randomly with small values
        self.weights = [
            np.random.randn(input_size, layer_size) * 0.1,  # Input -> Hidden
            np.random.randn(layer_size, output_size) * 0.1, # Hidden -> Output
            np.random.randn(1, layer_size) * 0.1            # Hidden layer bias
        ]

    def predict(self, inputs):
        """ Makes a prediction based on the inputs and current weights. """
        if inputs.ndim == 1:
            inputs = inputs.reshape(1, -1)  # Ensure input is 2D
        # Linear transformation for hidden layer + bias
        hidden_input = np.dot(inputs, self.weights[0]) + self.weights[2]
        # Linear activation (no non-linearity applied in this version)
        hidden_output = hidden_input
        # Linear transformation for output layer
        final_output = np.dot(hidden_output, self.weights[1])
        return final_output # Returns raw scores for actions

    def get_weights(self):
        return [w.copy() for w in self.weights] # Return copies

    def set_weights(self, weights):
        self.weights = [w.copy() for w in weights] # Use copies

Explanation: The model stores two weight matrices (input-to-hidden and hidden-to-output) and a bias for the hidden layer; the output layer has no bias. The predict method performs the matrix multiplications that turn an input state into three action scores. get_weights and set_weights exchange copies, so the ES and the agent never share, and accidentally mutate, the same arrays.
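Every entry of these arrays is one coordinate the ES has to perturb, so it helps to know the size of the search space. The short calculation below uses the shapes from SimpleModel.__init__ with the settings from the main script (WINDOW_SIZE=30, LAYER_SIZE=128, OUTPUT_SIZE=3).

```python
import numpy as np

# Shapes used by SimpleModel with the article's settings
WINDOW_SIZE, LAYER_SIZE, OUTPUT_SIZE = 30, 128, 3
shapes = [(WINDOW_SIZE, LAYER_SIZE),   # input -> hidden
          (LAYER_SIZE, OUTPUT_SIZE),   # hidden -> output
          (1, LAYER_SIZE)]             # hidden-layer bias

n_params = sum(int(np.prod(s)) for s in shapes)
print(n_params)  # 4352
```

ES gradient estimates tend to get noisier as the number of parameters grows, so a smaller LAYER_SIZE (or a larger population) is one lever to try if training rewards plateau or fluctuate.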

3. Evolution Strategy (EvolutionStrategy)

This class implements the optimization algorithm.

Python

class EvolutionStrategy:
    # ... (init, _get_perturbed_weights) ...

    def train(self, iterations=100, print_every=10):
        # ... (setup) ...
        for i in range(iterations):
            # 1. Generate population noise vectors (epsilon_k)
            population_noise = []
            rewards = np.zeros(self.population_size)
            for _ in range(self.population_size):
                member_noise = [np.random.randn(*w.shape) for w in self.weights]
                population_noise.append(member_noise)

            # 2. Evaluate population fitness (R_k)
            for k in range(self.population_size):
                perturbed_weights = self._get_perturbed_weights(self.weights, population_noise[k])
                # This calls Agent._calculate_reward_on_train
                rewards[k] = self.reward_function(perturbed_weights)

            # 3. Normalize rewards (standard scores); zero them if all rewards are equal
            if np.std(rewards) > 1e-7:
                rewards = (rewards - np.mean(rewards)) / np.std(rewards)
            else:
                rewards = np.zeros_like(rewards)

            # 4. Calculate weighted sum of noise
            weighted_noise_sum = [np.zeros_like(w) for w in self.weights]
            for k in range(self.population_size):
                member_noise = population_noise[k]
                for j in range(len(self.weights)):
                    # Summing R_k * epsilon_k for each weight matrix/vector
                    weighted_noise_sum[j] += member_noise[j] * rewards[k]

            # 5. Update central weights (W_t+1 = W_t + update)
            update_factor = self.learning_rate / (self.population_size * self.sigma)
            for j in range(len(self.weights)):
                self.weights[j] += update_factor * weighted_noise_sum[j]

            # ... (logging) ...
        # ... (end timing) ...

    def get_weights(self):
        return self.weights

Explanation: The train method implements the ES loop: generate random noise (population_noise), create perturbed weights, evaluate them using the reward_function (which simulates trading on the training set), normalize rewards, compute the weighted sum of noise based on rewards, and finally update the central weights using the learning rate and population parameters.
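The class above elides __init__ and _get_perturbed_weights. The sketch below is a self-contained, runnable version of the same loop under stated assumptions: the class name MinimalES, its constructor signature, the default hyperparameters, and the toy quadratic reward are all hypothetical, not the article's code. Checking the loop on a problem with a known optimum is a cheap way to catch bugs before trusting it on a trading simulation.

```python
import numpy as np

class MinimalES:
    """Hypothetical self-contained version of the ES loop described above."""

    def __init__(self, weights, reward_function, population_size=50,
                 sigma=0.1, learning_rate=0.01, seed=None):
        self.weights = [np.array(w, dtype=float) for w in weights]  # private copies
        self.reward_function = reward_function
        self.population_size = population_size
        self.sigma = sigma
        self.learning_rate = learning_rate
        self.rng = np.random.default_rng(seed)

    def _get_perturbed_weights(self, weights, noise):
        # W_t + sigma * epsilon_k, applied to each weight array
        return [w + self.sigma * n for w, n in zip(weights, noise)]

    def step(self):
        # 1. One list of noise arrays (epsilon_k) per population member
        noise = [[self.rng.standard_normal(w.shape) for w in self.weights]
                 for _ in range(self.population_size)]
        # 2. Fitness R_k of each perturbed candidate
        rewards = np.array([self.reward_function(self._get_perturbed_weights(self.weights, n))
                            for n in noise])
        # 3. Standardize rewards (zero them if every candidate scored the same)
        std = rewards.std()
        rewards = (rewards - rewards.mean()) / std if std > 1e-7 else np.zeros_like(rewards)
        # 4-5. W_{t+1} = W_t + alpha / (N * sigma) * sum_k R_k * epsilon_k
        factor = self.learning_rate / (self.population_size * self.sigma)
        for j in range(len(self.weights)):
            weighted_sum = sum(r * n[j] for r, n in zip(rewards, noise))
            self.weights[j] += factor * weighted_sum

    def train(self, iterations=100):
        for _ in range(iterations):
            self.step()
        return self.weights

# Toy check: maximize -||w - target||^2, whose optimum is w = target
target = np.array([[1.0, -2.0, 0.5]])

def toy_reward(weights):
    return -np.sum((weights[0] - target) ** 2)

es = MinimalES([np.zeros((1, 3))], toy_reward, population_size=50,
               sigma=0.1, learning_rate=0.01, seed=0)
final = es.train(iterations=300)
print(np.round(final[0], 2))  # expected to land near target
```

Because the rewards are standardized, the size of each step is roughly proportional to learning_rate / sigma regardless of the reward's scale, so the two are best tuned together.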

4. Trading Agent (TradingAgent)

This class orchestrates the process, connecting the model, the ES, and the environment simulation.

Python

class TradingAgent:
    # ... (constants, __init__) ...

    def _get_state(self, t):
        """ Returns the state (price changes) at index t using all_prices. """
        start_index = max(0, t - self.window_size)
        end_index = t + 1
        window_prices = self.all_prices[start_index : end_index]
        # Absolute price differences (dollar changes, not percentage returns)
        price_diffs = np.diff(window_prices)
        # Left-pad with zeros so the state always has window_size entries
        padded_diffs = np.zeros(self.window_size)
        if len(price_diffs) > 0:
            padded_diffs[-len(price_diffs):] = price_diffs
        return padded_diffs.reshape(1, -1)

    def _decide_action(self, state):
        """ Uses the model to decide action (0=hold, 1=buy, 2=sell). """
        prediction_scores = self.model.predict(state)
        return np.argmax(prediction_scores[0]) # Action with highest score

    def _calculate_reward_on_train(self, weights):
        """ Fitness function for ES: Simulates trading ONLY on training data. """
        self.model.set_weights(weights) # Use candidate weights
        money = self.initial_money
        inventory = 0.0
        # Simulate only within the training data indices
        start_sim_index = self.window_size
        end_sim_index = self.train_end_index
        for t in range(start_sim_index, end_sim_index, self.skip):
            state = self._get_state(t)
            action = self._decide_action(state)
            price_now = self.all_prices[t]
            # Simplified fractional buy/sell logic
            if action == 1 and money > self.min_order_size * price_now:
                buy_units = (money * 0.5) / price_now  # Example: invest 50% of cash
                if buy_units >= self.min_order_size:
                    inventory += buy_units
                    money -= buy_units * price_now
            elif action == 2 and inventory >= self.min_order_size:
                sell_units = inventory * 0.5  # Example: sell 50% of inventory
                if sell_units >= self.min_order_size:
                    money += sell_units * price_now
                    inventory -= sell_units
        # Calculate final value based on last training price
        final_value = money + inventory * self.all_prices[end_sim_index - 1]
        reward = ((final_value - self.initial_money) / self.initial_money) * 100
        return reward

    def train_agent(self, iterations, checkpoint):
        """ Trains the agent using ES on the training data. """
        self.es.train(iterations, print_every=checkpoint)
        self.model.set_weights(self.es.get_weights()) # Use the final weights

    def run_test_simulation(self, test_dates_param, return_logs=True):
        """ Evaluates the TRAINED agent ONLY on the test data. """
        print("\nRunning final simulation on UNSEEN TEST DATA...")
        money = self.initial_money
        inventory = 0.0
        states_buy_test, states_sell_test, log = [], [], []
        # Simulate only within the test data indices
        start_test_sim_index = self.train_end_index
        end_test_sim_index = len(self.all_prices) - 1
        for t in range(start_test_sim_index, end_test_sim_index, self.skip):
            state = self._get_state(t)
            action = self._decide_action(state) # Use trained model
            price_now = self.all_prices[t]
            test_set_index = t - start_test_sim_index
            timestamp = test_dates_param[test_set_index]
            # ... (Execute buy/sell logic as in _calculate_reward_on_train) ...
            # ... (Logging actions) ...
        # Calculate final value based on last test price
        final_value = money + inventory * self.all_prices[-1]
        total_gains = final_value - self.initial_money
        invest_percent = ((final_value - self.initial_money) / self.initial_money) * 100
        # ... (Print results) ...
        return states_buy_test, states_sell_test, total_gains, invest_percent, log

Explanation: The agent manages the overall process. _get_state prepares the NN input. _decide_action gets the NN prediction. _calculate_reward_on_train simulates trading only on the training price range to provide the fitness score for the ES. train_agent runs the ES optimization. run_test_simulation uses the final, trained weights to simulate trading only on the unseen test price range, providing a realistic performance evaluation.
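The buy/sell rules in _calculate_reward_on_train ignore trading costs. As a sketch of how a proportional fee could be folded in, the hypothetical helper execute_action below restates the same 50%-of-cash and 50%-of-inventory rules as a pure function. The fee_rate of 0.001 (0.1%) and the min_order_size default are illustrative assumptions, not values from the article; with fee_rate=0 the helper behaves like the original logic.

```python
def execute_action(action, price, money, inventory, min_order_size=1e-6,
                   fraction=0.5, fee_rate=0.001):
    """Hypothetical helper: 0=hold, 1=buy, 2=sell. Returns (money, inventory).
    min_order_size and fee_rate are illustrative assumptions."""
    if action == 1 and money > min_order_size * price:          # buy
        spend = money * fraction
        units = spend * (1 - fee_rate) / price                   # fee taken from the cash spent
        if units >= min_order_size:
            return money - spend, inventory + units
    elif action == 2 and inventory >= min_order_size:            # sell
        units = inventory * fraction
        if units >= min_order_size:
            proceeds = units * price * (1 - fee_rate)            # fee taken from the proceeds
            return money + proceeds, inventory - units
    return money, inventory                                      # hold, or order too small

# Buy with half of $10,000 at $50,000, then sell half of the position at the same price
money, inv = execute_action(1, price=50_000.0, money=10_000.0, inventory=0.0)
money, inv = execute_action(2, price=50_000.0, money=money, inventory=inv)
```

Using a helper like this inside the fitness function would make the ES pay for every trade, which would likely penalize the frequent small buys and sells visible in the test log below.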

5. Main Execution and Plotting

This part sets up the parameters, creates the objects, runs the training, runs the test simulation, and plots the results focusing on the test period.

Python

import matplotlib.pyplot as plt  # used by the plotting section below

# --- Main Execution ---
WINDOW_SIZE = 30
SKIP = 1
INITIAL_MONEY = 10000
LAYER_SIZE = 128
OUTPUT_SIZE = 3
ITERATIONS = 200
CHECKPOINT = 20

# Create Model and Agent
model = SimpleModel(input_size=WINDOW_SIZE, layer_size=LAYER_SIZE, output_size=OUTPUT_SIZE)
agent = TradingAgent(model=model,
                     all_prices=all_prices,
                     train_end_index=split_index, # Pass split index
                     window_size=WINDOW_SIZE,
                     initial_money=INITIAL_MONEY,
                     skip=SKIP)

# Train the Agent (uses training data internally)
agent.train_agent(iterations=ITERATIONS, checkpoint=CHECKPOINT)

# Evaluate the Agent (uses test data internally)
states_buy_test, states_sell_test, total_gains_test, invest_percent_test, logs_test = agent.run_test_simulation(test_dates_param=test_dates)

# --- Plotting (Focus on Test Set Performance) ---
fig, ax = plt.subplots(figsize=(15, 7))
# Plot train data (grayed out)
ax.plot(train_dates, train_prices, color='gray', lw=1.0, label='Train Price', alpha=0.5)
# Plot test data
ax.plot(test_dates, test_prices, color='lightblue', lw=1.5, label='Test Price')
# Plot buy/sell markers on test data
buy_marker_dates = test_dates[states_buy_test]
# ... (rest of plotting code) ...
plt.show()

Explanation: We define hyperparameters, instantiate the model and agent (passing the full price list and the training end index). We call train_agent, then run_test_simulation. The plot visualizes both price series but highlights the trades made during the test period.

Sample Results

The results below come from three years of Bitcoin data, with the earlier 80% used to train the model and the most recent 20% used to test it:

Data split: 877 training samples, 220 testing samples.
Training data from 2022-05-04 to 2024-09-26
Testing data from 2024-09-27 to 2025-05-04
Starting Evolution Strategy training for 200 iterations...
Iteration 20/200. Current Reward (on train set): 432.9548
Iteration 40/200. Current Reward (on train set): 625.9546
Iteration 60/200. Current Reward (on train set): 920.6560
Iteration 80/200. Current Reward (on train set): 1004.2816
Iteration 100/200. Current Reward (on train set): 1125.9427
Iteration 120/200. Current Reward (on train set): 1108.8394
Iteration 140/200. Current Reward (on train set): 1231.4112
Iteration 160/200. Current Reward (on train set): 1212.2319
Iteration 180/200. Current Reward (on train set): 1282.9231
Iteration 200/200. Current Reward (on train set): 1377.4133
Training finished in 87.46 seconds.
Final Reward on training set: 1377.4133

Running final simulation on UNSEEN TEST DATA...
Test Day 2 (2024-09-29): Buy 0.076179 units at $65,635.30, Bal: $5,000.00, Inv: 0.076179
Test Day 3 (2024-09-30): Buy 0.039476 units at $63,329.50, Bal: $2,500.00, Inv: 0.115655
Test Day 4 (2024-10-01): Sell 0.057827 units at $60,837.01, Bal: $6,018.04, Inv: 0.057827
Test Day 5 (2024-10-02): Buy 0.049627 units at $60,632.79, Bal: $3,009.02, Inv: 0.107454
Test Day 9 (2024-10-06): Buy 0.023950 units at $62,818.95, Bal: $1,504.51, Inv: 0.131404
Test Day 10 (2024-10-07): Buy 0.012087 units at $62,236.66, Bal: $752.25, Inv: 0.143491
Test Day 11 (2024-10-08): Sell 0.071746 units at $62,131.97, Bal: $5,209.95, Inv: 0.071746
Test Day 12 (2024-10-09): Sell 0.035873 units at $60,582.10, Bal: $7,383.20, Inv: 0.035873
Test Day 14 (2024-10-11): Buy 0.059118 units at $62,445.09, Bal: $3,691.60, Inv: 0.094990
Test Day 16 (2024-10-13): Sell 0.047495 units at $62,851.38, Bal: $6,676.74, Inv: 0.047495
Test Day 17 (2024-10-14): Buy 0.050546 units at $66,046.12, Bal: $3,338.37, Inv: 0.098041
Test Day 18 (2024-10-15): Buy 0.024898 units at $67,041.11, Bal: $1,669.18, Inv: 0.122939
Test Day 19 (2024-10-16): Buy 0.012344 units at $67,612.72, Bal: $834.59, Inv: 0.135283
Test Day 20 (2024-10-17): Buy 0.006191 units at $67,399.84, Bal: $417.30, Inv: 0.141474
Test Day 27 (2024-10-24): Sell 0.070737 units at $68,161.05, Bal: $5,238.81, Inv: 0.070737
Test Day 30 (2024-10-27): Buy 0.038561 units at $67,929.30, Bal: $2,619.40, Inv: 0.109298
Test Day 31 (2024-10-28): Buy 0.018735 units at $69,907.76, Bal: $1,309.70, Inv: 0.128033
Test Day 32 (2024-10-29): Buy 0.009005 units at $72,720.49, Bal: $654.85, Inv: 0.137038
Test Day 33 (2024-10-30): Buy 0.004526 units at $72,339.54, Bal: $327.43, Inv: 0.141564
Test Day 34 (2024-10-31): Sell 0.070782 units at $70,215.19, Bal: $5,297.39, Inv: 0.070782
Test Day 35 (2024-11-01): Sell 0.035391 units at $69,482.47, Bal: $7,756.44, Inv: 0.035391
Test Day 36 (2024-11-02): Sell 0.017695 units at $69,289.27, Bal: $8,982.55, Inv: 0.017695
Test Day 37 (2024-11-03): Sell 0.008848 units at $68,741.12, Bal: $9,590.75, Inv: 0.008848
Test Day 38 (2024-11-04): Buy 0.070716 units at $67,811.51, Bal: $4,795.38, Inv: 0.079564
Test Day 39 (2024-11-05): Buy 0.034569 units at $69,359.56, Bal: $2,397.69, Inv: 0.114133
Test Day 40 (2024-11-06): Buy 0.015850 units at $75,639.08, Bal: $1,198.84, Inv: 0.129982
Test Day 41 (2024-11-07): Buy 0.007897 units at $75,904.86, Bal: $599.42, Inv: 0.137880
Test Day 42 (2024-11-08): Buy 0.003915 units at $76,545.48, Bal: $299.71, Inv: 0.141795
Test Day 43 (2024-11-09): Buy 0.001952 units at $76,778.87, Bal: $149.86, Inv: 0.143747
Test Day 47 (2024-11-13): Sell 0.071873 units at $90,584.16, Bal: $6,660.45, Inv: 0.071873
Test Day 48 (2024-11-14): Buy 0.038169 units at $87,250.43, Bal: $3,330.22, Inv: 0.110042
Test Day 49 (2024-11-15): Sell 0.055021 units at $91,066.01, Bal: $8,340.76, Inv: 0.055021
Test Day 50 (2024-11-16): Buy 0.046052 units at $90,558.48, Bal: $4,170.38, Inv: 0.101073
Test Day 52 (2024-11-18): Buy 0.023030 units at $90,542.64, Bal: $2,085.19, Inv: 0.124103
Test Day 54 (2024-11-20): Buy 0.011052 units at $94,339.49, Bal: $1,042.60, Inv: 0.135154
Test Day 55 (2024-11-21): Buy 0.005292 units at $98,504.73, Bal: $521.30, Inv: 0.140446
Test Day 57 (2024-11-23): Buy 0.002666 units at $97,777.28, Bal: $260.65, Inv: 0.143112
Test Day 58 (2024-11-24): Sell 0.071556 units at $98,013.82, Bal: $7,274.13, Inv: 0.071556
Test Day 59 (2024-11-25): Sell 0.035778 units at $93,102.30, Bal: $10,605.14, Inv: 0.035778
Test Day 60 (2024-11-26): Sell 0.017889 units at $91,985.32, Bal: $12,250.67, Inv: 0.017889
Test Day 62 (2024-11-28): Buy 0.064037 units at $95,652.47, Bal: $6,125.34, Inv: 0.081926
Test Day 63 (2024-11-29): Buy 0.031424 units at $97,461.52, Bal: $3,062.67, Inv: 0.113351
Test Day 64 (2024-11-30): Buy 0.015877 units at $96,449.05, Bal: $1,531.33, Inv: 0.129228
Test Day 65 (2024-12-01): Buy 0.007871 units at $97,279.79, Bal: $765.67, Inv: 0.137099
Test Day 66 (2024-12-02): Buy 0.003993 units at $95,865.30, Bal: $382.83, Inv: 0.141092
Test Day 68 (2024-12-04): Buy 0.001938 units at $98,768.53, Bal: $191.42, Inv: 0.143030
Test Day 70 (2024-12-06): Sell 0.071515 units at $99,920.71, Bal: $7,337.25, Inv: 0.071515
Test Day 72 (2024-12-08): Buy 0.036238 units at $101,236.02, Bal: $3,668.63, Inv: 0.107753
Test Day 73 (2024-12-09): Buy 0.018826 units at $97,432.72, Bal: $1,834.31, Inv: 0.126580
Test Day 74 (2024-12-10): Sell 0.063290 units at $96,675.43, Bal: $7,952.90, Inv: 0.063290
Test Day 75 (2024-12-11): Sell 0.031645 units at $101,173.03, Bal: $11,154.52, Inv: 0.031645
Test Day 77 (2024-12-13): Buy 0.054970 units at $101,459.26, Bal: $5,577.26, Inv: 0.086615
Test Day 78 (2024-12-14): Buy 0.027509 units at $101,372.97, Bal: $2,788.63, Inv: 0.114124
Test Day 79 (2024-12-15): Buy 0.013368 units at $104,298.70, Bal: $1,394.31, Inv: 0.127492
Test Day 80 (2024-12-16): Buy 0.006575 units at $106,029.72, Bal: $697.16, Inv: 0.134068
Test Day 84 (2024-12-20): Sell 0.067034 units at $97,755.93, Bal: $7,250.11, Inv: 0.067034
Test Day 87 (2024-12-23): Sell 0.033517 units at $94,686.24, Bal: $10,423.70, Inv: 0.033517
Test Day 88 (2024-12-24): Buy 0.052818 units at $98,676.09, Bal: $5,211.85, Inv: 0.086335
Test Day 89 (2024-12-25): Buy 0.026243 units at $99,299.20, Bal: $2,605.92, Inv: 0.112578
Test Day 90 (2024-12-26): Sell 0.056289 units at $95,795.52, Bal: $7,998.15, Inv: 0.056289
Test Day 91 (2024-12-27): Buy 0.042469 units at $94,164.86, Bal: $3,999.07, Inv: 0.098758
Test Day 93 (2024-12-29): Sell 0.049379 units at $93,530.23, Bal: $8,617.49, Inv: 0.049379
Test Day 94 (2024-12-30): Sell 0.024689 units at $92,643.21, Bal: $10,904.80, Inv: 0.024689
Test Day 96 (2025-01-01): Buy 0.057746 units at $94,419.76, Bal: $5,452.40, Inv: 0.082436
Test Day 97 (2025-01-02): Buy 0.028138 units at $96,886.88, Bal: $2,726.20, Inv: 0.110574
Test Day 98 (2025-01-03): Buy 0.013894 units at $98,107.43, Bal: $1,363.10, Inv: 0.124468
Test Day 99 (2025-01-04): Buy 0.006938 units at $98,236.23, Bal: $681.55, Inv: 0.131406
Test Day 100 (2025-01-05): Buy 0.003466 units at $98,314.96, Bal: $340.78, Inv: 0.134872
Test Day 103 (2025-01-08): Sell 0.067436 units at $95,043.52, Bal: $6,750.12, Inv: 0.067436
Test Day 108 (2025-01-13): Buy 0.035709 units at $94,516.52, Bal: $3,375.06, Inv: 0.103145
Test Day 109 (2025-01-14): Buy 0.017481 units at $96,534.05, Bal: $1,687.53, Inv: 0.120626
Test Day 110 (2025-01-15): Sell 0.060313 units at $100,504.49, Bal: $7,749.25, Inv: 0.060313
Test Day 111 (2025-01-16): Buy 0.038841 units at $99,756.91, Bal: $3,874.62, Inv: 0.099154
Test Day 112 (2025-01-17): Buy 0.018546 units at $104,462.04, Bal: $1,937.31, Inv: 0.117699
Test Day 114 (2025-01-19): Sell 0.058850 units at $101,089.61, Bal: $7,886.39, Inv: 0.058850
Test Day 115 (2025-01-20): Sell 0.029425 units at $102,016.66, Bal: $10,888.21, Inv: 0.029425
Test Day 117 (2025-01-22): Buy 0.052522 units at $103,653.07, Bal: $5,444.10, Inv: 0.081947
Test Day 118 (2025-01-23): Buy 0.026184 units at $103,960.17, Bal: $2,722.05, Inv: 0.108131
Test Day 119 (2025-01-24): Buy 0.012984 units at $104,819.48, Bal: $1,361.03, Inv: 0.121115
Test Day 120 (2025-01-25): Buy 0.006499 units at $104,714.65, Bal: $680.51, Inv: 0.127614
Test Day 121 (2025-01-26): Buy 0.003314 units at $102,682.50, Bal: $340.26, Inv: 0.130928
Test Day 125 (2025-01-30): Sell 0.065464 units at $104,735.30, Bal: $7,196.63, Inv: 0.065464
Test Day 127 (2025-02-01): Sell 0.032732 units at $100,655.91, Bal: $10,491.29, Inv: 0.032732
Test Day 128 (2025-02-02): Buy 0.053697 units at $97,688.98, Bal: $5,245.64, Inv: 0.086429
Test Day 129 (2025-02-03): Buy 0.025865 units at $101,405.42, Bal: $2,622.82, Inv: 0.112294
Test Day 131 (2025-02-05): Sell 0.056147 units at $96,615.45, Bal: $8,047.49, Inv: 0.056147
Test Day 132 (2025-02-06): Buy 0.041657 units at $96,593.30, Bal: $4,023.75, Inv: 0.097804
Test Day 133 (2025-02-07): Buy 0.020842 units at $96,529.09, Bal: $2,011.87, Inv: 0.118646
Test Day 134 (2025-02-08): Buy 0.010426 units at $96,482.45, Bal: $1,005.94, Inv: 0.129072
Test Day 135 (2025-02-09): Buy 0.005212 units at $96,500.09, Bal: $502.97, Inv: 0.134284
Test Day 136 (2025-02-10): Buy 0.002581 units at $97,437.55, Bal: $251.48, Inv: 0.136865
Test Day 137 (2025-02-11): Buy 0.001313 units at $95,747.43, Bal: $125.74, Inv: 0.138178
Test Day 138 (2025-02-12): Sell 0.069089 units at $97,885.86, Bal: $6,888.59, Inv: 0.069089
Test Day 140 (2025-02-14): Buy 0.035323 units at $97,508.97, Bal: $3,444.29, Inv: 0.104412
Test Day 142 (2025-02-16): Sell 0.052206 units at $96,175.03, Bal: $8,465.20, Inv: 0.052206
Test Day 143 (2025-02-17): Buy 0.044194 units at $95,773.38, Bal: $4,232.60, Inv: 0.096400
Test Day 146 (2025-02-20): Buy 0.021522 units at $98,333.94, Bal: $2,116.30, Inv: 0.117921
Test Day 148 (2025-02-22): Buy 0.010956 units at $96,577.76, Bal: $1,058.15, Inv: 0.128878
Test Day 151 (2025-02-25): Sell 0.064439 units at $88,736.17, Bal: $6,776.22, Inv: 0.064439
Test Day 153 (2025-02-27): Sell 0.032219 units at $84,704.23, Bal: $9,505.34, Inv: 0.032219
Test Day 155 (2025-03-01): Sell 0.016110 units at $86,031.91, Bal: $10,891.30, Inv: 0.016110
Test Day 156 (2025-03-02): Buy 0.057780 units at $94,248.35, Bal: $5,445.65, Inv: 0.073890
Test Day 157 (2025-03-03): Buy 0.031637 units at $86,065.67, Bal: $2,722.82, Inv: 0.105526
Test Day 158 (2025-03-04): Sell 0.052763 units at $87,222.20, Bal: $7,324.93, Inv: 0.052763
Test Day 159 (2025-03-05): Buy 0.040414 units at $90,623.56, Bal: $3,662.47, Inv: 0.093177
Test Day 161 (2025-03-07): Sell 0.046589 units at $86,742.67, Bal: $7,703.68, Inv: 0.046589
Test Day 163 (2025-03-09): Sell 0.023294 units at $80,601.04, Bal: $9,581.22, Inv: 0.023294
Test Day 164 (2025-03-10): Buy 0.061002 units at $78,532.00, Bal: $4,790.61, Inv: 0.084296
Test Day 165 (2025-03-11): Sell 0.042148 units at $82,862.21, Bal: $8,283.10, Inv: 0.042148
Test Day 166 (2025-03-12): Buy 0.049468 units at $83,722.36, Bal: $4,141.55, Inv: 0.091616
Test Day 167 (2025-03-13): Buy 0.025544 units at $81,066.70, Bal: $2,070.78, Inv: 0.117160
Test Day 168 (2025-03-14): Buy 0.012331 units at $83,969.10, Bal: $1,035.39, Inv: 0.129491
Test Day 169 (2025-03-15): Buy 0.006138 units at $84,343.11, Bal: $517.69, Inv: 0.135628
Test Day 173 (2025-03-19): Sell 0.067814 units at $86,854.23, Bal: $6,407.65, Inv: 0.067814
Test Day 174 (2025-03-20): Sell 0.033907 units at $84,167.20, Bal: $9,261.51, Inv: 0.033907
Test Day 175 (2025-03-21): Sell 0.016954 units at $84,043.24, Bal: $10,686.35, Inv: 0.016954
Test Day 176 (2025-03-22): Buy 0.063736 units at $83,832.48, Bal: $5,343.17, Inv: 0.080690
Test Day 177 (2025-03-23): Buy 0.031045 units at $86,054.38, Bal: $2,671.59, Inv: 0.111735
Test Day 178 (2025-03-24): Buy 0.015266 units at $87,498.91, Bal: $1,335.79, Inv: 0.127002
Test Day 179 (2025-03-25): Buy 0.007636 units at $87,471.70, Bal: $667.90, Inv: 0.134637
Test Day 183 (2025-03-29): Buy 0.004043 units at $82,597.59, Bal: $333.95, Inv: 0.138680
Test Day 184 (2025-03-30): Sell 0.069340 units at $82,334.52, Bal: $6,043.03, Inv: 0.069340
Test Day 186 (2025-04-01): Sell 0.034670 units at $85,169.17, Bal: $8,995.85, Inv: 0.034670
Test Day 188 (2025-04-03): Buy 0.054125 units at $83,102.83, Bal: $4,497.93, Inv: 0.088795
Test Day 189 (2025-04-04): Buy 0.026823 units at $83,843.80, Bal: $2,248.96, Inv: 0.115618
Test Day 190 (2025-04-05): Buy 0.013466 units at $83,504.80, Bal: $1,124.48, Inv: 0.129084
Test Day 191 (2025-04-06): Sell 0.064542 units at $78,214.48, Bal: $6,172.61, Inv: 0.064542
Test Day 192 (2025-04-07): Sell 0.032271 units at $79,235.34, Bal: $8,729.62, Inv: 0.032271
Test Day 194 (2025-04-09): Sell 0.016136 units at $82,573.95, Bal: $10,061.99, Inv: 0.016136
Test Day 196 (2025-04-11): Buy 0.060320 units at $83,404.84, Bal: $5,031.00, Inv: 0.076456
Test Day 197 (2025-04-12): Buy 0.029494 units at $85,287.11, Bal: $2,515.50, Inv: 0.105950
Test Day 198 (2025-04-13): Buy 0.015030 units at $83,684.98, Bal: $1,257.75, Inv: 0.120980
Test Day 199 (2025-04-14): Sell 0.060490 units at $84,542.39, Bal: $6,371.71, Inv: 0.060490
Test Day 201 (2025-04-16): Sell 0.030245 units at $84,033.87, Bal: $8,913.31, Inv: 0.030245
Test Day 203 (2025-04-18): Sell 0.015122 units at $84,450.80, Bal: $10,190.41, Inv: 0.015122
Test Day 204 (2025-04-19): Buy 0.059899 units at $85,063.41, Bal: $5,095.21, Inv: 0.075021
Test Day 205 (2025-04-20): Buy 0.029910 units at $85,174.30, Bal: $2,547.60, Inv: 0.104932
Test Day 206 (2025-04-21): Sell 0.052466 units at $87,518.91, Bal: $7,139.36, Inv: 0.052466
Test Day 207 (2025-04-22): Buy 0.038202 units at $93,441.89, Bal: $3,569.68, Inv: 0.090668
Test Day 209 (2025-04-24): Buy 0.018999 units at $93,943.80, Bal: $1,784.84, Inv: 0.109667
Test Day 212 (2025-04-27): Sell 0.054834 units at $93,754.84, Bal: $6,925.75, Inv: 0.054834
Test Day 214 (2025-04-29): Sell 0.027417 units at $94,284.79, Bal: $9,510.74, Inv: 0.027417
Test Day 216 (2025-05-01): Sell 0.013708 units at $96,492.34, Bal: $10,833.49, Inv: 0.013708
Test Day 217 (2025-05-02): Buy 0.055895 units at $96,910.07, Bal: $5,416.75, Inv: 0.069603
Test Day 218 (2025-05-03): Buy 0.028244 units at $95,891.80, Bal: $2,708.37, Inv: 0.097847

Test Set Simulation Results:
Total Gains: $2,035.86
Total Investment Return: 20.36%
Ending Cash: $2,708.37
Ending Inventory: 0.097847 units (@ $95,327.29 = $9,327.49)
Final Portfolio Value: $12,035.86
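A return figure is easier to judge next to a benchmark. The hypothetical helper below computes a fee-free buy-and-hold return. The log does not show the price on the first test date (2024-09-27), so the example enters at the first logged trade price (Test Day 2, $65,635.30) and exits at the final valuation price ($95,327.29).

```python
def buy_and_hold_return(entry_price, exit_price):
    """Percentage return from buying at entry_price and holding to exit_price
    (no fees). Hypothetical benchmark helper, not part of the article's code."""
    return (exit_price / entry_price - 1) * 100

# Prices taken from the test log above
benchmark = buy_and_hold_return(65_635.30, 95_327.29)
print(f"Buy-and-hold from Test Day 2: {benchmark:.2f}%")  # about 45.24%
print("ES agent on the test set:     20.36%")
```

On those figures, simply holding Bitcoin from Test Day 2 would have returned about 45%, more than twice the agent's 20.36%. The agent's sizing rule always keeps part of its capital in cash, so some of the gap reflects lower exposure during a rising period; a fair comparison would use the same entry date, include fees, and report risk measures such as drawdown alongside returns.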

Realistic Backtesting: The Importance of Train/Test Split

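As noted in the introduction, testing a trading strategy on the same data used to optimize it produces inflated, unrealistic performance metrics because of overfitting: the agent learns the specific patterns of the training data, including its noise. The sample run shows the gap: the final reward on the training set was 1,377%, while the same weights returned 20.36% on the test set (the test period is also about four times shorter, so the two figures are not directly comparable).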

By splitting the data:

Training Set: Used exclusively by the Evolution Strategy (_calculate_reward_on_train) to search for the best-performing neural network weights.
Test Set: A completely separate period used only once (run_test_simulation) to evaluate how well the strategy, optimized on past data, performs on new, unseen data.

This mimics real-world trading where strategies are developed on historical data and deployed on future, unknown data. The performance on the test set gives a much more reliable (though still not guaranteed) indication of potential real-world viability.

Further Considerations and Limitations

Even with a train/test split, this implementation is still simplified:

Transaction Costs: Real trading involves commissions and potential slippage (difference between expected and execution price), which are ignored here but reduce profits.
Market Regimes: The strategy’s performance might vary drastically depending on whether the market is trending, ranging, or volatile. The train/test split helps, but longer periods or walk-forward analysis might be needed for more robustness.
Parameter Sensitivity: The performance heavily depends on WINDOW_SIZE, LAYER_SIZE, ES hyperparameters (POPULATION_SIZE, SIGMA, LEARNING_RATE), and the number of ITERATIONS. These often require careful tuning (hyperparameter optimization), potentially using a third dataset (validation set) separate from train and test.
Trading Logic: The buy/sell logic (e.g., “invest 50% cash”) is arbitrary. More sophisticated position sizing and risk management rules are essential in real trading.
Feature Engineering: Using only price changes is basic. Incorporating volume, volatility measures, or other indicators could potentially improve performance.
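Walk-forward analysis repeats the train/test procedure over rolling windows: the ES is re-run on each training window and evaluated on the period immediately after it. The hypothetical generator below produces the index ranges; the one-year training and three-month test windows are illustrative choices, and 1,097 is the total row count from the sample run (877 + 220).

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_range, test_range) pairs for a rolling walk-forward evaluation.
    Hypothetical helper; window sizes are illustrative assumptions."""
    step = step or test_size          # by default, advance by one test window
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

# 1,097 daily rows, ~1 year of training, ~3 months of testing per fold
for train_idx, test_idx in walk_forward_splits(1097, train_size=365, test_size=90):
    print(f"train {train_idx.start}-{train_idx.stop - 1}, "
          f"test {test_idx.start}-{test_idx.stop - 1}")
```

Each test window starts right after its training window ends, so no fold sees future data, and averaging results across folds gives a less regime-dependent estimate than a single split. scikit-learn's TimeSeriesSplit provides a similar library implementation if you prefer not to roll your own.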

Conclusion

This article demonstrated how Evolution Strategies can be combined with a simple neural network to optimize a trading agent. We implemented this approach in Python, emphasizing the crucial step of separating training and testing data for realistic backtesting. While ES provides a powerful method for optimizing complex strategies where gradients are unavailable, building a consistently profitable trading bot requires careful consideration of data handling, model complexity, realistic simulation (including costs), robust validation techniques, and rigorous risk management. This example serves as an educational foundation for exploring these advanced concepts in algorithmic trading.