Transcript:
What’s up, guys? Welcome back to this series on reinforcement learning. Over the next couple of videos, we’re going to be building and playing our very first game with reinforcement learning. We’re going to use the knowledge we gained last time about Q-learning to teach an agent how to play a game called Frozen Lake. We’ll be using Python and OpenAI’s Gym toolkit to develop our algorithm, so let’s get to it.

So, as mentioned, we’ll be using Python and OpenAI Gym to develop our reinforcement learning algorithm. The Gym library is a collection of environments that we can use with the reinforcement learning algorithms we develop. Gym has a ton of environments ranging from simple text-based games to Atari games like Breakout and Space Invaders. The library is intuitive to use and simple to install: just run pip install gym, and you’re good to go. It’s really as easy as that. The link to Gym’s installation instructions, requirements, and documentation is included in the description, so go ahead and get that installed now because we’ll need it in just a moment.

We’ll be making use of Gym to provide us with an environment for a simple game called Frozen Lake. We’ll then train an agent to play the game using Q-learning, and then we’ll get a playback of how the agent does after being trained. So let’s jump into the details for Frozen Lake. Wait, Frozen Lake? Like the frozen lake in... [Music] Sorry, but no, the frozen lake we’ll be playing won’t have us fighting any White Walkers. And seriously, if no one gets this reference, then you’re spending way too much time learning deep learning and not enough time vegging out on... well, let me know in the comments if you know where this scene is from.

Alright, let’s get into the real details for the actual Frozen Lake game we’ll be playing. I’ve grabbed the description of the game directly from Gym’s website. Let’s read through it together, but with an accent, you know, to add dramatic effect. “Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you’ll fall into the freezing water. At this time, there’s an international frisbee shortage, so it’s absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won’t always move in the direction you intend. The surface of the lake is described using a grid like you see here.” Well, that was fun.

This grid is our environment, where S is the agent’s starting point, and it’s considered safe for the agent to be here. F represents the frozen surface and is also safe. H represents a hole, and if our agent steps in a hole in the middle of a frozen lake, well, yeah, you know, that’s not good. Finally, G represents the goal, which is the space on the grid where the prized frisbee is located. The agent can navigate left, right, up, and down, and the episode ends when the agent reaches the goal or falls in a hole. It receives a reward of 1 if it reaches the goal and 0 otherwise. So pretty much, our agent has to navigate the grid by staying on the frozen surface without falling into any holes until it reaches the frisbee. If it reaches the frisbee, it wins with a reward of +1. If it falls in a hole, it loses and receives no points for the entire episode. Cool. Alright, let’s jump into the code.
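As a quick aside, here is a minimal sketch of the install step and a sanity check that prints the grid just described. It assumes the classic gym package and the FrozenLake-v0 environment name used in this series; newer Gym releases renamed the environment FrozenLake-v1 and changed parts of the API, so adjust accordingly if you’re on a newer version.

```python
# Install the library first (run in a terminal, not inside Python):
#   pip install gym
import gym

# Create the Frozen Lake environment and print the grid so we can see the
# S (start), F (frozen), H (hole), and G (goal) tiles described above.
# Note: "FrozenLake-v0" is the name used in this series; newer Gym versions use "FrozenLake-v1".
env = gym.make("FrozenLake-v0")
env.reset()
env.render()
```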
First, we’re importing all the libraries we’ll be using. Not many, really: we have numpy, gym, random, time, and clear_output from IPython’s display. Next, to create our environment, we just call gym.make() and pass a string with the name of the environment we want to set up. We’ll be using the environment called FrozenLake-v0. All the environments with their corresponding names you can use are available on Gym’s website. With this env object, we can do several things: we can query for information about the environment, we can sample states and actions, retrieve rewards, and have our agent navigate the frozen lake.

We’re now going to construct our Q-table and initialize all the Q-values to zero for each state-action pair. Remember, the number of rows in the table is equivalent to the size of the state space in the environment, and the number of columns is equivalent to the size of the action space. We can get this information using env.observation_space.n and env.action_space.n. We can then use this information to build the Q-table and fill it with zeros. If you’re foggy about Q-tables at all, be sure to check out the earlier videos where we covered all the details you need. Alright, so here’s what our Q-table looks like.

Now we’re going to create and initialize all the parameters needed to implement the Q-learning algorithm. Let’s step through each of these; a sketch of the full setup code follows below. First, with num_episodes, we define the total number of episodes we want our agent to play during training. Then, with max_steps_per_episode, we define the maximum number of steps that our agent is allowed to take within a single episode, so if by the 100th step the agent hasn’t reached the frisbee or fallen through a hole, then the episode will terminate with the agent receiving 0 points. Next, we set our learning rate, which was mathematically shown using the symbol alpha in the previous video. Then we also set our discount rate, which was represented with the symbol gamma previously. Now, the last four parameters are all related to the exploration-exploitation trade-off we talked about last time in regards to the epsilon-greedy strategy. We’re initializing our exploration rate, which we previously referred to as epsilon, to 1, and we set the max exploration rate to 1 and a min exploration rate to 0.01. The max and min are just bounds on how large and how small our exploration rate can be. Lastly, we set the exploration decay rate to 0.01, the rate at which the exploration rate will decay. Now, all these parameters can change. These are parameters you’ll want to play with and tune yourself to see how they influence and change the performance of the algorithm when we get there.

Speaking of which, in the next video we’re going to jump right into the code we’ll write to implement the actual Q-learning algorithm for playing Frozen Lake. For now, go ahead and make sure your environment is set up with Python and Gym, and that you’ve got the initial code written that we went through so far. Also, come check out the corresponding blog for this video on deeplizard.com to make sure you didn’t miss anything, and while you’re at it, check out the exclusive perks and rewards available for members of the deeplizard hivemind. Let me know in the comments if you’re able to get everything up and running, and leave us a thumbs up to let us know you’re learning. Thanks for contributing to collective intelligence, and I’ll see you in the next one. Well, that agent lost. [Music]
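To recap, here is a minimal sketch of the setup code walked through above. The exploration parameters and the 100-step episode limit match the values stated in the transcript; the values for num_episodes, learning_rate, and discount_rate are not given explicitly in this video, so the numbers below are placeholder assumptions you would tune yourself.

```python
import numpy as np
import gym
import random
import time
from IPython.display import clear_output

# Create the Frozen Lake environment (renamed FrozenLake-v1 in newer Gym releases)
env = gym.make("FrozenLake-v0")

# Build the Q-table: one row per state, one column per action, all initialized to zero
state_space_size = env.observation_space.n
action_space_size = env.action_space.n
q_table = np.zeros((state_space_size, action_space_size))
print(q_table)

# Q-learning parameters
num_episodes = 10000           # assumption: total training episodes (tune this)
max_steps_per_episode = 100    # episode terminates after 100 steps

learning_rate = 0.1            # assumption: alpha from the previous video (tune this)
discount_rate = 0.99           # assumption: gamma from the previous video (tune this)

exploration_rate = 1           # epsilon, initialized to 1
max_exploration_rate = 1       # upper bound on epsilon
min_exploration_rate = 0.01    # lower bound on epsilon
exploration_decay_rate = 0.01  # rate at which epsilon decays
```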