Transcript:
Last week’s video was about genetic algorithms and what they are in this week’s video, we will code a genetic algorithm in python. I highly recommend you watch last week’s video first. Since I will be referring to it a lot, let’s get started. We will put together a small framework to play around with and solve. The knapsack problem discussed in the last video. What do we need first? A genetic representation of a solution we need to generate a genome for one solution. A genome in our case is nothing more than a list of ones and zeros, where a one refers to an item being included inside the backpack and a0 means the respected item at this index is left out. I define the type here. I know that Python is an untyped language, but using type-ins doesn’t harm anyone and enhances the readability of your code, a lot in my opinion to generate a genome for one solution, we just generate a random list of ones and zeros of length K, which should be of exactly the same length as the list of items we want to choose from what’s next the function to generate new solutions. A population is simply a list of genomes to generate it. We call our generate genome function multiple times until our population has the desired size, a fitness function to evaluate solutions. All right, past self, here we go. Our fitness function takes a genome the list of things we can choose from and a weight limit as input parameters and simply returns a fitness value to calculate the fitness. We iterate over all things and check if it is part of the solution by checking if the genome has a 1. At the given index if it has, we add the weight and the value of this item to our accumulation variables as soon as the weight exceeds the given weight limit, we abort the iteration and return a fitness value of 0 because this solution is invalid. If we manage to get to the end of the list of things without exceeding the weight limit, we are returning the accumulated value. Ah, and a smart precondition is to assume that the given genome should have the same length as our list of things but key. What exactly is the thing? I hear you ask good question. A thing is a structure with a name, a value and a weight with it. We can put together different list of things, and also I like how referring to old videos makes everything much easier and now the select function to select the solutions to generate the next generation. This function selects a pair of solutions, which will be the parents of two new solutions for the next generation solutions with a higher fitness should be more likely to be chosen. Luckily, the function choices from Python’s random module is giving us the possibility to hand over weights for each element it can choose from by handing over the fitness of a genome as its weight. The fittest solutions are most likely to be chosen for reproduction, the last parameter simply states that we draw twice from our population to get a pair, but what is this fitness function? Why not simply call our fitness function? This comes down to software architecture and separation of concerns the fitness function we defined earlier, wants to have a list of things and a weight limit, which are both problem specific parameters, which means they are only relevant to solving the knapsack problem. As soon as we want to use this implementation for a different problem, we would need to implement a different selection function. All we need inside. Our selection function is a fitness function, which takes a genome and returns a fitness value to make the correct choice. Let’s go on in order to get a new solution for our next generation. We need a crossover function. The single point crossover function takes two genomes as parameters and returns two genomes as output. We randomly choose an index to cut the genomes in half at and take the first half of the first genome A and the rest of the second genome B put them together and return this as our first new solution for the second solution, we take the first part of genome B and the second part of genome a for this to work. We need our genomes to be the same length and a crossover function only makes sense if the length of the genome is at least two, otherwise there would be no point to cut them in half at and the last puzzle piece is mutation and a mutation function too late. The mutation function takes a genome and with a certain probability changes once to zeros and zeros to ones at random positions for that we choose a random index and if random returns a value higher than probability, we leave it alone, otherwise it is in our mutation probability, and we need to change it to be the absolute value of the current value. Minus one. What, yes, if the genome is 1. At the index, we calculate 1. Minus 1. Which is 0 and the absolute value of 0 is still 0 but if it is a 0 we subtract 0-1 which results in -1 and the absolute value of -1 is 1. Voila, we just target a bit in our genome. The most complicated way just to write it down in one line and look more intelligent. Don’t do this at work. Your colleagues will hate you, but like and subscribe, so in order to put all of this together, we need the actual evolution to be happening and in order to keep the separation between the problem and the algorithm, we will use a lot of functions as parameters, a populate function takes nothing and spits out new solutions. A selection function takes a population and a fitness function to select two solutions to be the parents of our next generation solution. I know we could generalize that even more, but that’s your homework. I’d say a crossover function takes two genomes and returns two new genomes. The mutation function takes one genome and sometimes returns a modified one. Let’s put it all to action now, and let’s define the function that actually runs the evolution run. Evolution takes a ton of parameters. First we hand over the populate and the fitness function. The fitness limit defines one end condition. If the fitness of the best solution exceeds this limit, we are done and have reached our goal Next. We hand over the selection crossover and mutation functions, which we can initialize with our default implementations. The generation limit is the maximum number of generations. Our evolution runs for if it is not reaching the fitness limit before that first we get ourselves the first ever generation by calling the populate function now we need to loop for generation limit times. The first thing we do is to sort our population by fitness this way. We know that our top solutions are inhabiting the first indices of our list of genomes, which comes in handy when we want to check if we have already reached the fitness limit and returned early from our loop or if we want to implement elitism and just keep our top two solutions for our next generation. Now it is time to generate all other new solutions for our next generation, we pick two parents and get two new solutions every time, so we loop for half the length of a generation to get as many solutions in our next generation as before, but because we already copied the two top solutions from our last generation, we can save one more loop here in each loop. We call the selection function to get our parents and by putting them into the crossover function, we get two child solutions for our next generation in order to expand the variety of solutions we generate. We also apply the mutation function for each offspring. That’s how we generate our next generation after we are done. We replace the current population with our next generation and start into the next round of the algorithm by sorting the population and checking. If we reached our fitness limit this time after we reached the fitness limit or looped for generation limit times. We need to return something. I chose to return the current population, which needs to be sorted one. Last time, in case, we ran out of generations and to also return the number of generations. We ran! The second! Return value is interesting to quickly distinguish whether the algorithm exhausted the generation limit or actually found a solution that meets our fitness criterion. Easy, let’s put it all together. We call run evolution and store the last population and the number of generation inside of two variables next we want to hand over our populate and fitness function, but as explained earlier, the signature of these two functions differ from what run evolution expects it to be therefore, we use partial to preset the parameters, which are specific to our current problem. That’s how we can adjust our populate function without handing the population size and genome length to the run evolution function and without the need to write a completely new populate function, we hand over the list of things to our fitness function and predefined the weight limit to be 3 kilograms. We use the first example from last week’s video and know that the correct solution is laptop, headphones, coffee mug and water bottle for a value of 740. Let’s see how many generations we need to find the correct solution and let’s print the solution as well. I have prepared a little helper function to print the things based on a genome, Let’s run it. The genetic algorithm found the correct solution in two generations. Let’s run it a couple of times, but this time let’s also measure the time 10 generations in 0.0019 seconds, 11 generations and 0.0023 seconds and 0 generations in 0.000 seconds. Let’s add more things. [MUSIC] You also need to adjust the fitness limit to 1310 and also change the output mapping here. The best solution should be phone baseball, cap, laptop, headphones, coffee mug or it could be mint’s, socks, tissues, phone baseball, cap, laptop headphones, Waterbot, which has the same value. Let’s check this time. It needs more generations to find the correct solution, 22 generations in 0.006873 seconds, or 6 generations in 0.00161 seconds. I hope this video helped you to understand how to write your own genetic algorithms. If not, I also put a link to a Github repository into the description below, so you can re-examine and experiment with this code, a little bit more for an almost weekly video about programming, subscribe and hit the notification Bell now and while you are already here. Check out one of my other videos. I will see you in the next one.