This context discusses using genetic algorithms for trading strategy optimization in Python, specifically for a MACD crossover strategy.
Abstract
The context begins with an introduction to genetic algorithms and their potential application in trading strategy optimization. The author explains the importance of optimization in systematic or algorithmic trading and the computational challenges it presents. Genetic algorithms are introduced as a probabilistic and heuristic searching algorithm inspired by Darwin's theory of natural selection. The author then delves into the key concepts of genetic algorithms, including genes, individuals, populations, fitness functions, selection strategies, crossover strategies, and mutation strategies. The author also provides a crash course on genetic algorithms and explains how they can be applied to trading strategy optimization. The context then moves on to the preparation stage, where the author lists the required libraries and provides code snippets for their installation. The author also discusses data acquisition from Alpha Vantage and adjusting stock prices according to splits and dividend events. The context then introduces the Backtrader framework and provides a code snippet for defining a simple MACD crossover strategy. The author also explains how to feed the strategy to the Backtrader engine and run a backtest. The context concludes with a discussion on using genetic algorithms for parameter optimization of the MACD strategy. The author provides an implementation of the genetic algorithm with the defined settings and presents the results.
Bullet points
Genetic algorithms can be used for trading strategy optimization.
Genetic algorithms are inspired by Darwin's theory of natural selection.
Key concepts of genetic algorithms include genes, individuals, populations, fitness functions, selection strategies, crossover strategies, and mutation strategies.
Data acquisition for trading strategy optimization can be done using Alpha Vantage.
Stock prices need to be adjusted according to splits and dividend events.
The Backtrader framework can be used to define and backtest trading strategies.
A simple MACD crossover strategy can be defined using the Backtrader framework.
Genetic algorithms can be used for parameter optimization of the MACD strategy.
The author provides an implementation of the genetic algorithm with defined settings.
The results of the genetic algorithm optimization are presented.
Data Science, Optimization, Programming
Genetic Algorithm for Trading Strategy Optimization in Python
How can GA help cut down problem space and converge towards a better solution?
If you have heard of systematic trading or algorithmic trading, then you must know that optimization of strategy is among one of the most important factors that dictate whether the strategy would even break even. And the worst part is: optimization is very computationally heavy. Imagine a simple MACD crossover strategy, there will be at least 3 parameters: fast, slow and signal moving average period, and hundreds of possible values for each, making it more than a million possible combinations.
Incomes genetic algorithm (GA): a probabilistic & heuristic searching algorithm inspired by Darwin’s theory on natural selection that the fittest survive through generations. In this blog, we are going to use GA as an optimization algorithm for identifying the best set of parameters. We will be illustrating it with a simple MACD crossover on Nvidia. Remember, this is only a demonstration of the application of GA for optimizing trading strategy and should not be copied nor followed blindly.
Note: We will be covering the code section by section. If you find it hard to follow or glue them together, no worries, the full script will be available at the end of the blog.
If you would like to learn more about how to avoid overfitting Genetic Algorithm, here is a sequel to this blog that focuses on some techniques to a more robust Genetic Algorithm:
Inspired by Darwin’s Theory of Evolution, the Genetic Algorithm is an iterative process for search the global optimal solution to a problem statement, from getting the best gene to survive in the harsh world to identify the best parameters to a trading strategy in this blog’s context. To better understand how a genetic algorithm works, here is a short crash course of the key concepts:
Gene: This refers to a parameter/variable of the solution. It is quite usual for a gene to be represented as a bit (i.e. 0 or 1), but that can be changed based on the underlying problem statement.
Individual / Chromosome: A string of genes that represents a solution.
Population: One generation of individuals.
Fitness Function: A function for calculating how successful (fitness score) an individual is. Depending on the underlying problem statement, we can search for individuals with the highest fitness score or the lowest. To embrace the concept of survival of the fittest, the better the fitness score, the higher the chance that an individual can survive and reproduce to form the next generation.
Selection Strategy: This defines how we compare individuals against each other for selection the seeds for the next generation. Some common examples are tournaments (a random sample of individuals face each other 1 v 1, and the winner wins a place in the next generation), roulette (probability be chosen is proportional to the fitness score), or double tournaments for more complex scenarios
Crossover Strategy: This is essentially how parent genes are passed to offspring when reproducing. Crossover is supposed to mimic sexual reproduction, where two parents are needed for reproducing. Genes of the parents will then be recombined to form offspring. Although different strategies are depending on scenario and data types, the most common two are k-point crossover and uniform crossover. In the k-point crossover, k crossover point(s) will be selected randomly, where the genes to the right of a point are swapped. Uniform crossover is relatively more straightforward: instead of having sections of genes swapped, every gene will have the same chance to be swapped.
1-point crossover — Credit to R0oland2-point crossover — Credit to R0oland
Mutation Strategy: As a way to maintain gene diversity and prevent premature convergence, genes of the children will have a random chance of mutating, meaning that the actual value will deviate from that of the parents. Mutation usually comes in the form of bit-flipping, index-shuffling, or bounded and unbounded statistical distributions.
These key concepts will then be combined to form an iterative algorithm:
Parameterise the problem statement
Define a fitness function
Define a crossover, mutation, and a selection strategy
Generate initial population
Evaluate the fitness of the individuals of the population
Select the individuals, crossover and mutate to form the next generation of population
Repeat step 5, and 6 until convergence or until end conditions are met
Now that we have the crash course done, let’s see how this can be applied to trading strategy optimization.
Preparation
Before we start, let’s make sure that we have all our libraries installed and ready. Apart from the usual Pandas, Numpy, etc, we will also be using the following:
Alpha Vantage: A Free-mium data provider. Rest-assured. A free license is good enough.
Backtrader: An awesome open-source python framework that allows you to focus on writing reusable trading strategies, indicators, and analyzers instead of having to spend time building infrastructure. It supports backtesting for you to evaluate the strategy you come up with too!
DEAP: Distributed Evolutionary Algorithms in Python, a novel evolutionary computation framework for rapid prototyping and testing of ideas.
To install them all, simply run the following line:
Note: It is always a good idea to start a new project in an isolated environment, be it virtual environment, conda environment, or a docker container. It helps keep code and environment clean, reproducible, and portable.
Data Acquisition
With a key from Alpha Vantage, we can then use the snippet below for getting the historical stock price of Nvidia under the ticker ‘NVDA’. One thing to keep in mind when working with financial stock data is price adjustments. For example, Tesla went through a five-for-one split on 31st August. So one share before the split should be about five times that after the split. That’s why we will need to adjust prices according to splits and dividend events.
Fortunately, Alpha Vantage’s Daily Adjusted endpoint, we will have a close price adjusted according to the Center for Research in Security Prices’ formulae. We can then estimate the adjusted Open, High, and Low using the percentage change from unadjusted Open to unadjusted Close: if the percentage change in price for the day is +10%, then Close should be 110% that of Open, whether adjusted or not.
Once you have run through the script, read_alpha_vantage should return you a data frame with the first and last 5 rows similar to the followings depending on when you queried Alpha Vantage:
Backtrader Framework
In Backtrader, a strategy needs to follow the interface of backtrader.strategy. The most important components in the interface are:
params: parameters used by the strategy
__init__: where we do data prep and define indicators
next(): which decides what should the strategy do in the next step
import backtrader as bt
classCrossoverStrategy(bt.Strategy):
params = dict()
def__init__(self):
# initialise strategy# do data prep# define indicators
pass
MACD line, the difference between the fast moving average and the slow-moving average
Signal line, a 9-day moving average of the MACD line
We will also need to put in the trading logic under next(). To keep it simple, this strategy will be long-only:
Long entry: When the MACD line crosses above the Signal line
Long exit: When the MACD line crosses below the Signal line
The strategy would now need to be fed to the main engine of Backtrader — backtrader.cerebro, alongside our data, trackers (bt.observers) and analyzers (bt.analyzers) of the strategy, and other brokerage level settings like commissions and account balance.
When you execute run_backtest(plot=True, **STRATEGY_PARAMS), you should get something like this:
Results from Backtrader — Image by Author
Genetic Algorithm Parameter Optimisation
Making only 73.16 dollars out of the rocket of Nvidia with default parameters, that does not look promising at all. Let’s try to optimize the parameter to see if we can make it more viable.
Based on the crash course earlier, let’s define our algorithm as follows:
Gene: This should be the params for our MACD strategy. Unfortunately, DEAP does not work well with keyword arguments as crossing over would require index slicing. Instead, we will be using a list for storing the parameters.
Initial Population: Our initial population will have 100 individuals, each with random integers fast_period in the range of [1, 151), slow_period in the range of [10, 251), and signal_period in the range of [1, 301). This means there are a total of 150 x 240 x 300 = 10.8 million possible combinations.
Fitness function: To balance rewards and risks, we will be using Total Profit / Maximum Draw Down as the only objective of our fitness function. As DEAP is a generalized framework, it supports fitness functions with multiple objectives. As a result, the function needs to return a list of fitness score(s).
Selection Strategy: We will be using the classic tournament method where the winner of each round of the tournament will be selected as the seeds for the next generation.
Crossover Strategy: We will be using uniform crossover with a 50% chance for each gene to crossover.
Mutation Strategy: We will be using a uniform distribution of integers in the range of [1, 101) with a mutation probability of 30% for each gene.
End Conditions: The algorithm will stop once it has finished 20 iterations.
The snippet below is the implementation of our genetic algorithm with the above settings. We have also added in a Hall of Fame which will keep track of the best individuals across the iterations of the algorithm.
After running the snippet, we have the following results
HALL OF FAME:0: [36, 84, 5],Fitness:5.3319139788039281: [36, 87, 5],Fitness:5.3296098844013892: [5, 143, 44],Fitness:5.309777394151804
Fitness scores converge within 20 iterations — Image by Author
From the graph above, we can see how performance converges towards the solution very quickly in about 10 iterations. This means that instead of trying out the entirety of 10.8 million combinations, the algorithm has yielded us a candidate of the optimal solution with just a thousand of backtest runs, which is less than 0.1% of the problem space. And this is already a worst-case estimate for our case assuming that all the individuals are different in every generation of the population.
If we plug the optimal solution of dict(fast_period=5, slow_period=82, signal_period=38) back to the backtest function covered earlier with the following lines, we can see that the profit is now just over 7 times that of the default MACD crossover strategy:
OPTIMISED_STRATEGY_PARAMS = {
k: v for k, v inzip(PARAM_NAMES, hall_of_fame[0])}
run_backtest(**OPTIMISED_STRATEGY_PARAMS)
Performance of Genetic algorithm optimised strategy — Image by Author
Conclusion
In this blog, we have covered the key concepts of genetic algorithms and demonstrated how we can use them to optimize a trading strategy.
Historically, parameter optimization is very demanding on computational power. In our example, the simple MACD crossover strategy with just three parameters was already generating millions of possible combinations. Even if each backtest can finish in 0.01 second, that would already be 30 hours of computational time in total. This will only get grow exponentially as the number of parameters increases.
With a genetic algorithm, we can converge to a candidate of global optima after about 10 iterations, which is less than 0.1% of the problem space. The optimized MACD strategy is also performing a lot better than that with default parameters, with over 7 times the original profit.
That being said, please remember that this blog is merely showcasing how we can leverage genetic algorithm as an optimization tool, and hence has been focusing on the underlying concepts and the simple code structure (available here) for us all to give it a try.
If you would like to learn about how to avoid overfitting genetic algorithms, here a sequel to this blog post focusing on how we can leverage random subset selection and Coefficient of Variation to train a more robust Genetic Algorithm.
This is about it for this blog. Hope you find the blog or the code useful! Feel free to reply or pm if I have missed anything, or if you have any questions. If you are interested in tricks to become a better Python programmer, I have put together a list of these short blogs for you: