Transcript:

What is going on, everybody, and welcome to part 15 of the Machine Learning with Python tutorial series. In this tutorial, we're going to be building on the last couple, where we were talking about K Nearest Neighbors. We talked about the intuition: basically, how close is this point to the K closest points? Whatever the majority class of those K closest points is, we say this new point is that class. In the previous tutorial, what we did was apply K Nearest Neighbors to a real-world data set, and we found that it's actually fairly accurate, which was very cool. So now we're going to break down the K Nearest Neighbors algorithm and rewrite it ourselves from scratch in code. But first we have to cover what everything hinges on, right? It hinges on this distance. So what is that distance? It is Euclidean distance.
So what is Euclidean distance? It is, of course, named after Euclid, the famous mathematician popularly referred to as the father of geometry. He definitely wrote the book on it, right? Euclid's Elements, which is arguably the Bible for mathematicians and scientists. Also, fun fact: whenever someone would create a printing press, the first thing they'd start popping out was the Bible, of course, and then the second thing was most likely Euclid's Elements.
So anyway, what is Euclidean distance? First, we have the sum to n, and in this case n represents the number of dimensions in your data. Really, this means the sum where i starts off equal to 1 and goes up to n, so i is just indexing your dimensions. If you just have one dimension, you would just do this one time. And it's the sum of what? In parentheses here, it's going to be q sub i minus p sub i, squared. And then we take the square root of this entire calculation, and this is Euclidean distance. So i is just the dimension, and q is one point.
p is a different point, right? So in theory, if you got rid of n and i, got rid of the whole sum, and just left the parentheses part squared (I'd hate to circle it and mess it up), with the square root still around it, that would be the calculation for a one-dimensional distance between two points in Euclidean space.
Anyway, now let's actually break this down into simple mathematics. I always like to do it by hand. For some things, we won't always do it by hand, and we won't actually do the calculation here, but I'll show you how you would plug it in, at least. So we'll start off and say q is equal to (1, 3), so these are the coordinates for our data point, and then the coordinates for p, x and y, so this is two dimensions, are (2, 5). So then how would we calculate the Euclidean distance? Well, it's going to be the square root of a couple of things. We know we have two dimensions, so we know we're going to have at least two of these parenthesized terms under the square root. Recall it's the summation of these, so there will be a plus between them, and this one will be squared and this one will be squared. Then we just need to fill in the subtractions. Initially it's q1 minus p1, so it would be 1 minus 2, and then over here we'll put a 3, and it's minus 5. So that's the square root of (1 minus 2) squared plus (3 minus 5) squared, and that would be the Euclidean distance. Okay, simple enough.
Let's head over to Python and actually create this. In Python, let's just recreate exactly what we just did by hand. Instead of q and p, let's say plot1 equals [1, 3] and plot2 equals [2, 5].
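
For reference, the formula and the hand calculation described above can be written out as follows. This is a sketch in standard notation, assuming q and p are the two points and n is the number of dimensions, as in the explanation; the intermediate arithmetic is filled in here and was not worked through in the video.

```latex
% General Euclidean distance between points q and p in n dimensions
d(q, p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}

% Worked example with q = (1, 3) and p = (2, 5), so n = 2
d(q, p) = \sqrt{(1 - 2)^2 + (3 - 5)^2}
        = \sqrt{1 + 4}
        = \sqrt{5}
        \approx 2.236
```

With n = 1 the sum has a single term, so the distance reduces to the absolute difference between the two values.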
Okay, now let's go up to the top and say from math import sqrt, which is just importing the square root function. Coming back down here, calculating the Euclidean distance between these two plots is the following: euclidean_distance equals sqrt of something. Remember, it's the square root of the sum, over each of the dimensions, of one plot's value in that dimension minus the other plot's value in that same dimension, squared. You're always calculating the distance between two plots. So in this case it would be plot1[0], the zeroth element, so the x of plot1, minus the x of plot2, so minus plot2[0]. Okay, so that's one term, and remember it was the sum of all of these, so it would be that plus basically the exact same thing, except instead of the zeroth element it would be the first: plot1[1] minus plot2[1]. You can think of these indexes as your dimensions, right? This is dimension 0 and this is dimension 1, so this is two dimensions, as indeed it is. That would be the i in that equation, just for the record, except that the equation counts from 1 and Python counts from 0.
So anyway, Euclidean distance, boom, done. Let's go ahead and, oh, these also need to be squared, so that's squared and this is squared, and then for the entire operation we grab the square root. So now let's print euclidean_distance, and we get 2.2360 and so on. That is your Euclidean distance.
So now that we know how to calculate Euclidean distance, we basically have the crux of everything we need to do K Nearest Neighbors. But we still have a lot of framework to create regardless, so that's what we'll be doing in the next tutorial: creating the framework that will take a data set and use K Nearest Neighbors to classify points.
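
The code dictated above would look roughly like the following. The first part follows the walkthrough (plot1, plot2, euclidean_distance); the euclidean helper at the end is a hypothetical addition, not something from the video, showing one way the same formula might be generalized to any number of dimensions.

```python
from math import sqrt

# The two points from the hand calculation (q and p in the formula).
plot1 = [1, 3]
plot2 = [2, 5]

# Square root of the sum of squared differences, one term per dimension.
euclidean_distance = sqrt((plot1[0] - plot2[0])**2 + (plot1[1] - plot2[1])**2)
print(euclidean_distance)  # roughly 2.236, the square root of 5


# Hypothetical helper (an assumption, not from the video): the same
# calculation for points with any matching number of dimensions.
def euclidean(q, p):
    return sqrt(sum((qi - pi)**2 for qi, pi in zip(q, p)))


print(euclidean(plot1, plot2))  # same value as euclidean_distance
```

Writing each dimension out by hand works for two dimensions, while the zip-based version avoids repeating a term per dimension when the data has more features.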
So if you have any questions or comments up to this point, feel free to leave them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.