Hey guys, welcome to a new Machine Learning From Scratch tutorial. Today we are going to implement the Linear Discriminant Analysis algorithm, or LDA for short, using only Python and numpy. LDA is a dimensionality reduction technique and a popular pre-processing step in machine learning pipelines. LDA is similar to the PCA technique that I already showed in a previous tutorial, and the approach and the implementation of PCA and LDA have a lot in common, so I highly recommend that you watch that video first. Now let's talk quickly about the concept of LDA before we jump to the code. The goal, as I already said, is feature reduction: we want to project our dataset onto a lower-dimensional space while keeping a good class separation. Here we have the difference between PCA and LDA. In PCA, or Principal Component Analysis, we want to find new axes onto which we project our data such that the variance along the new axes is maximized. In LDA, the big difference is that we know the class labels, so this is a supervised technique, and here we want to find new axes such that the class separation is maximized. If you have a look at this image, we have two different classes, and we could project our data either onto the y-axis or onto the x-axis. In this case the y-axis would not be a good choice, but the x-axis is, because there we still have a good class separation. So this is the concept of LDA, and here I listed the differences between PCA and LDA again: in PCA we want to find the component axes that maximize the variance of our data. In LDA we want this too, so within one class, within the green cluster and within the blue cluster, we still want a good variance between the single features; but additionally, we are interested in the axes that maximize the separation between the classes. So this difference here should basically be maximized along the new axes.
LDA is supervised learning, so we know our labels, while PCA is unsupervised; this is an important thing to remember. Now let's jump to the math. Here we have the so-called scatter matrices, and there are two of them: the within-class scatter and the between-class scatter. This basically represents what I was talking about before: the within-class scatter describes how the features within one class are spread, and the between-class scatter makes sure that the classes themselves are well separated. If we translate this to math, we have to deal with the mean values and the variances. The within-class scatter S_W is the sum over the scatter matrices of the single classes, and the scatter of one class is the sum, over all samples in that class, of the feature vector minus the mean vector of that class, times the same term transposed. This is basically the same as in the PCA algorithm where we compute the covariance matrix; it is almost the same formula as for the covariance matrix, except that we don't have the scaling factor at the beginning. So this is the within-class scatter. For the between-class scatter S_B, the formula is a sum over all classes, and for each class we take the number of samples in that class, times the difference between the class mean (this x-bar is the mean of the features in this class) and the total mean of all samples, times the same term transposed. These are the two matrices we have to compute; then we calculate the inverse of the within-class scatter, multiply it with the between-class scatter, and this gives us the eigenvalue and eigenvector problem that we have to solve. This is the same as in the PCA, so I will not go into detail again; please make sure that you know what eigenvalues and eigenvectors are.
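To write down what was just described verbally (the notation here is my own shorthand for what the video shows on a slide: x̄_c is the mean of class c, x̄ the overall mean, n_c the number of samples in class c):

```latex
S_W = \sum_{c} \sum_{x \in X_c} (x - \bar{x}_c)(x - \bar{x}_c)^T
\qquad
S_B = \sum_{c} n_c \,(\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^T
```

and the problem we then solve is the eigendecomposition of $S_W^{-1} S_B$, i.e. finding $v$ and $\lambda$ with $S_W^{-1} S_B \, v = \lambda v$.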
So basically, what we have to do for this formula is to calculate the eigenvalues and eigenvectors. Let's have a look at the whole approach again; here I summarized it. First we calculate the between-class scatter and the within-class scatter. Then we calculate the inverse of the within-class scatter and multiply it with the between-class scatter. Of this matrix we calculate the eigenvalues and eigenvectors, then we sort the eigenvectors according to their eigenvalues in decreasing order, and then we choose only the first k eigenvectors, so only the k dimensions that we want to keep. These eigenvectors are called the linear discriminants; that's why the method has this name. Then we transform our original data points onto these k dimensions, and this transformation is basically just a projection with the dot product. This whole approach is the same as in the PCA algorithm, except that we have to solve the eigenvalue and eigenvector problem for a different matrix at the beginning. So that's the approach; now let's jump to the code. Of course, we import numpy as np, and then we define our class and call it LDA. Here we define our __init__, which has self and also gets the number of components that we want to keep, and we simply store that: self.n_components = n_components. We also create a variable that we call self.linear_discriminants, which is None in the beginning; here we want to store the eigenvectors that we compute. Then we define our fit method, which gets self, X, and also y, because remember, this is a supervised technique. And then we also implement not a predict method, but a method that we call transform.
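The skeleton just described looks roughly like this (a sketch with the method bodies still to be filled in over the next sections):

```python
import numpy as np


class LDA:
    def __init__(self, n_components):
        self.n_components = n_components  # number of dimensions to keep
        self.linear_discriminants = None  # eigenvectors, filled in by fit

    def fit(self, X, y):
        ...  # compute S_W and S_B, solve the eigenproblem

    def transform(self, X):
        ...  # project X onto the stored linear discriminants


lda = LDA(2)
print(lda.n_components)
```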
This is the same as in the PCA, and here we want to get the newly projected features. So let's implement the fit method. First we want to get the number of features, which we get by saying X.shape with index 1; index 0 is the number of samples, and here we only want the number of features. Then we also want to get all the different class labels, so let's call this class_labels, and this is np.unique(y), which will return only the unique values in our labels. Now we want to calculate the two scatter matrices: S_W for the within-class scatter and S_B for the between-class scatter. First of all, I want to calculate the mean of all our samples, because we need this later for one of the formulas: mean_overall = np.mean(X, axis=0). Then let's initialize our two matrices: S_W = np.zeros, so we fill it with zeros, with a size of the number of features times the number of features, and the same for the between-class scatter, which we also initialize with zeros. Later we want to test this with the features of the iris dataset, which I think has 150 samples and 4 features, so both matrices have size 4 by 4. Now we have to apply the two formulas, so we have to sum over all the classes, and we can do both in one for loop: for c in class_labels. The first thing we want to get is only the samples of this class, so we say X_c = X[y == c], so the samples where the label equals the class of the current iteration.
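The per-class selection the loop relies on is plain numpy boolean masking; a tiny self-contained example with made-up data and labels:

```python
import numpy as np

# 4 hypothetical samples with 2 features each, and a label per sample
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

class_labels = np.unique(y)  # -> array([0, 1])
for c in class_labels:
    X_c = X[y == c]  # boolean mask: only the samples of class c
    print(c, X_c.shape)  # each class here has 2 samples, 2 features
```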
Then we want to get the mean of these features: mean_c = np.mean(X_c, axis=0), the same as we did before, but only for the samples in this class. Now let's have a look at the within-class formula: we take our feature vector, subtract the mean value, and then this is basically a dot product with the transposed term. So we say S_W += (plus-equals, because we sum over all the classes), and then (X_c - mean_c) transposed, dot (X_c - mean_c). Here we have to be careful: if we look at the formula again, the transpose is on the second term at the end, but in the code I transpose the first term. This is because the formula has one more sum, over all the samples in this class, and we do that sum in one operation with the numpy dot product, so we have to be careful with the sizes. What we want at the end is a 4 by 4 matrix, because we add this to our scatter matrix, and X_c and the centered X_c - mean_c have the size (number of samples in this class) by 4. So we have to turn the first term around to size 4 by (number of samples in this class), because when we compute the dot product with the second term, which is not transposed and has size (number of samples in this class) by 4, we get a matrix of size 4 by 4. These are basic rules of matrix multiplication, so be sure that you understand them: the last dimension of the first matrix must match the first dimension of the second matrix, and the final output size is composed of the two outer dimensions. This is why we have to transpose the first term here. It might be a little bit confusing, so make sure to double-check this for yourself.
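The shape argument above can be checked in a few lines. This sketch uses hypothetical sizes (5 samples of one class, 4 features) and also verifies that the single dot product really equals the per-sample sum of outer products from the formula:

```python
import numpy as np

rng = np.random.default_rng(42)
X_c = rng.normal(size=(5, 4))  # 5 samples of one class, 4 features
mean_c = X_c.mean(axis=0)      # shape (4,)

centered = X_c - mean_c        # shape (5, 4), mean is broadcast per row
# (4, 5) @ (5, 4) -> (4, 4): inner dimensions match, outer ones remain
scatter_c = centered.T @ centered

# the same thing written as the formula's explicit sum over samples
manual = sum(np.outer(x - mean_c, x - mean_c) for x in X_c)
print(scatter_c.shape)  # (4, 4)
print(np.allclose(scatter_c, manual))  # True
```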
Then we have the within-class scatter, and now for the between-class scatter. First we want to get the number of samples in this class: n_c = X_c.shape[0], with index 0 because we want the number of samples. Here again we have to be careful, because we have to reshape our vector. Let's have a look at the formula again: we calculate the mean of this class minus the total mean, so we say mean_diff = mean_c - mean_overall. This is only one-dimensional; if you look at the shape it would say (4,), but we want it to be 4 by 1, so we have to call reshape with the number of features by 1. This is because, if we look at the final multiplication, the same way as we did before, we want to multiply a matrix of size 4 by 1 with a matrix of size 1 by 4, which is basically the 4 by 1 vector transposed, and then we get a 4 by 4 output; this is why we have to apply the reshape here. Then we say S_B += n_c times mean_diff dot mean_diff transposed. So we finally have both matrices now. And now, as I said, we have to get the inverse of the within-class scatter and multiply it with the between-class scatter. We get the inverse with numpy by saying np.linalg.inv(S_W), then .dot to multiply it with the between-class scatter, and let's call the result A and store it in this matrix. For this we have to solve the eigenvalue and eigenvector problem, so we have to calculate the eigenvalues and eigenvectors. And now the following code is exactly the same as in the PCA algorithm.
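The reshape step is easy to get wrong, so here is the shape arithmetic in isolation, with hypothetical mean vectors:

```python
import numpy as np

mean_c = np.array([5.0, 3.4, 1.5, 0.2])        # hypothetical class mean
mean_overall = np.array([5.8, 3.1, 3.7, 1.2])  # hypothetical overall mean

mean_diff = mean_c - mean_overall  # shape (4,), one-dimensional
mean_diff = mean_diff.reshape(4, 1)  # now a (4, 1) column vector
outer = mean_diff @ mean_diff.T      # (4, 1) @ (1, 4) -> (4, 4)
print(outer.shape)  # (4, 4)
```

Without the reshape, mean_diff @ mean_diff.T on a (4,) array would just be a scalar dot product, not the 4 by 4 matrix the formula needs.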
So please check that out. We get the eigenvalues and the eigenvectors by saying np.linalg.eig(A). Then we sort the eigenvectors and the eigenvalues, the same as we do in the PCA algorithm: first we transpose the eigenvectors by saying eigenvectors = eigenvectors.T, which makes the calculation easier. Then we sort the eigenvalues: idxs = np.argsort of the eigenvalues, and to make it a little bit nicer, we actually take the absolute value of the eigenvalues. We want to sort in decreasing order, so we use slicing with the little trick [::-1], from start to end with a step of minus one, which turns the indices around so that we have them in decreasing order. Now we get our eigenvalues in decreasing order by saying eigenvalues = eigenvalues[idxs], and the same with the eigenvectors: eigenvectors = eigenvectors[idxs]. Then we want to store only the first n eigenvectors in the linear discriminants attribute that we created earlier: self.linear_discriminants = eigenvectors from the start, so beginning with the eigenvector with the highest eigenvalue, up to self.n_components that we specified, so the number of dimensions that we keep. Now we are finally done with the fit method; this is the whole fit method. Then in transform, the only thing we do is project our data onto these new components, and the transformation is nothing else than the dot product, so we can write this in one line: return np.dot of X with self.linear_discriminants. Since we transposed the eigenvectors earlier, we have to transpose them again here. This is again the same as in the PCA, so please double-check it for yourself. And now we are done and can run the script.
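The argsort-and-reverse trick in isolation, with some made-up eigenvalues (note the absolute value, so a large negative eigenvalue still sorts first):

```python
import numpy as np

eigenvalues = np.array([0.9, -2.5, 0.1, 1.7])  # hypothetical values
idxs = np.argsort(np.abs(eigenvalues))[::-1]   # indices by decreasing |lambda|
print(eigenvalues[idxs])  # [-2.5  1.7  0.9  0.1]
```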
I have a little test script, and it is basically the same as in the PCA test. The only thing that I exchanged is that instead of a PCA, we create the LDA and want to keep two components; then we call fit and transform, and we do this for the iris dataset, and then I plot the features projected onto the two new dimensions. So let's run this: python lda_test.py, and hope that everything is working. And yeah, here we see our transformed features in only two dimensions now, and we see that the classes are very well separated; here we have the three different iris classes, so our LDA feature reduction method works. Please again compare this with the PCA algorithm. I hope you enjoyed this tutorial; if you liked it, then please subscribe to the channel, and see you next time. Bye!
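Putting it all together, here is the class as described above, assembled into one runnable sketch. The video tests it on the iris dataset via scikit-learn and plots the result; to keep this snippet dependency-free, the quick check below uses made-up three-class blob data with the same 4-feature shape instead:

```python
import numpy as np


class LDA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.linear_discriminants = None

    def fit(self, X, y):
        n_features = X.shape[1]
        class_labels = np.unique(y)

        mean_overall = np.mean(X, axis=0)
        S_W = np.zeros((n_features, n_features))  # within-class scatter
        S_B = np.zeros((n_features, n_features))  # between-class scatter
        for c in class_labels:
            X_c = X[y == c]
            mean_c = np.mean(X_c, axis=0)
            # (n_features, n_c) @ (n_c, n_features) -> (n_features, n_features)
            S_W += (X_c - mean_c).T.dot(X_c - mean_c)

            n_c = X_c.shape[0]
            mean_diff = (mean_c - mean_overall).reshape(n_features, 1)
            S_B += n_c * mean_diff.dot(mean_diff.T)

        A = np.linalg.inv(S_W).dot(S_B)
        eigenvalues, eigenvectors = np.linalg.eig(A)
        eigenvectors = eigenvectors.T  # rows are eigenvectors now
        idxs = np.argsort(np.abs(eigenvalues))[::-1]
        eigenvectors = eigenvectors[idxs]
        self.linear_discriminants = eigenvectors[: self.n_components]

    def transform(self, X):
        # projection onto the linear discriminants is just a dot product
        return np.dot(X, self.linear_discriminants.T)


# quick check on synthetic 3-class data (a stand-in for the iris features)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i * 3.0, size=(20, 4)) for i in range(3)])
y = np.repeat([0, 1, 2], 20)

lda = LDA(2)
lda.fit(X, y)
X_projected = lda.transform(X)
print(X_projected.shape)  # (60, 2)
```

With the real iris features (150 samples, 4 features), the projected output would have shape (150, 2), which is what gets scatter-plotted in the video.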