Transcript:

Hello, and welcome to the building blocks for deep learning, where we go over different neural network layers and functions and show how and why they are used. Today we're talking about categorical data, so it should be a shorter one than before. Categorical data is data that's non-numeric. An example of this might be a color, or how happy you are today, like happy or sad, or whatever emotional state you're in. These things don't necessarily have a numeric interpretation, and so we call them categorical. There are two types of categorical variables: ordinal categorical variables and non-ordinal categorical variables. If you're interested in learning more about this, I do have a video on it, so you can go check it out.
Now, the problem with categorical data, and how it was treated by machine learning algorithms before neural networks, is as follows. Say you were using a linear model, so doing, you know, SVM or linear regression or something like that, and you had a high-cardinality categorical variable. This is a categorical variable with lots of different categories, for example zip code. The problem was that, at least for linear models, you would need to make a feature for each different zip code and then also have a coefficient for each of those, so it would sort of blow up the feature space, which made it really, really hard for these things to train. So what people would do is have this sort of pre-processing beforehand to make sure that there were fewer categories. Same with decision trees, so gradient boosting or random forests. The problem is that there would be too many splits, and because there'd be so many splits, you would either always overfit or it'd be very, very hard to get at the intricacies of these variables. And so you had to do pre-processing before.
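To make the feature blow-up described above concrete, here is a minimal sketch in Python using pandas. The zip codes and the column name are made up for illustration and do not come from the video. The only point is that one-hot encoding creates one column, and so one coefficient in a linear model, per distinct category.

```python
import pandas as pd

# Hypothetical data: six rows, five distinct zip codes (values are illustrative only).
df = pd.DataFrame(
    {"zip_code": ["10001", "10002", "60601", "94103", "94107", "10001"]}
)

# One-hot encoding: one indicator column per distinct zip code.
one_hot = pd.get_dummies(df["zip_code"], prefix="zip")

n_categories = df["zip_code"].nunique()
print(one_hot.shape)    # (number of rows, number of distinct zip codes)
print(n_categories)
```

With tens of thousands of real zip codes, the same call would produce tens of thousands of columns, which is the blow-up the speaker is describing.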
I think the root of this problem is that some categories are very similar to each other and they should almost be treated similarly. So for example, zip codes that are in cities are pretty similar to each other, and zip codes that are nearby are really similar to each other. So why were they treated as completely different categories? Unfortunately, these previous models would always treat them as different categories. Now, fortunately, with neural networks and embeddings, you don't need to treat them as entirely different categories. In fact, when you go ahead and train neural networks with these different categories, you'll find that specific categories group together, and I do have a video on representational learning which sort of goes over this. So how do we use neural networks, how do we use embeddings, with categorical data? That's basically what we're going to be exploring today, and what are the general good rules of thumb?
It's pretty simple, so I'll go ahead and sort of bull-rush forward. I start with a numeric dataset, because of the initial scikit-learn make_classification. And if you want to learn more about that, I also have another video. Thanks, Nate, for the shameless self-promotion. I'm always there for you guys. Because this can only give us numeric features, I go ahead and manually make some non-numeric features. I use pd.cut, and there's another video for that too. Oh, yes: it's there if you're interested in learning about pandas, in case you couldn't figure that out. So I use pd.cut to go ahead and do this, and you can at least look at these variables. Initially we've got a couple of categories. The first five columns are categories, so here it's category number 54, number 52, number 31, number 41, number 39. Now, for neural networks, you're going to need to turn the categories into just their numeric representation, so it's just a map from whatever the actual category was to a number.
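The data preparation described above can be sketched roughly as follows. This is not the notebook from the video: the sample count, the 25 total features, the 100 bins, and the column names are assumptions chosen to line up with the numbers mentioned in the talk (20 numeric inputs, 5 categorical inputs).

```python
import pandas as pd
from sklearn.datasets import make_classification

# Assumed sizes: 25 numeric features, of which the first 5 get turned into categories.
X, y = make_classification(
    n_samples=1000, n_features=25, n_informative=10, random_state=0
)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(25)])

n_bins = 100  # assumed number of categories per column
cat_cols = [f"f{i}" for i in range(5)]
for col in cat_cols:
    # labels=False returns the integer bin index (0 .. n_bins - 1)
    # instead of interval labels, which is the form an embedding needs.
    df[col] = pd.cut(df[col], bins=n_bins, labels=False)

print(df[cat_cols].head())
```

If the categories were strings rather than bins, the same integer mapping could be obtained with something like `pd.factorize`.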
The category might be a word like red, which maps to the number 54. So the embedding needs to accept a number, and the number needs to go from 0 up to whatever the maximum number of categories is, but it doesn't matter which number it is. Just because 54 and 52 are close to each other numerically, it doesn't mean that these categories are going to be close to each other. That's based on the implementation of embeddings, which I'll talk about a little bit later on. The rest of the dataset is numeric, so all these other columns are numeric. So I go ahead and separate the features into numeric and categorical features. I mean, you'll need to do this for your dataset beforehand. OK, so now we can throw our data into a neural network, right? No, no, no. Remember, we need to go ahead and standard-scale our data. Now notice what I did here: I only scaled the numeric data. The categorical data doesn't need to be standardized, so that's kind of nice. It means that you can sort of skip some steps on that, but you still need to standardize your numeric data. So I've standardized it here, and now we are ready.
In this case I've got two different inputs, and I need to feed them as two different inputs into our tf.keras model. We've got numeric inputs, of which there are 20, and then we've got categorical inputs, of which there are 5, so I go ahead and make these two inputs. Now, the categorical inputs need to be treated differently, and the way they are treated differently is that they need to be thrown into an embedding layer first. What does an embedding layer do? There are a couple of things that it specifies. To make an embedding layer, we need to know the number of categories in total, and we need to know the size of the vector that you want to represent each category as. As for what embedding layers do, you can get a lot more on that in the previous video on representational learning.
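The scaling step described above, where only the numeric columns are standardized, might look like this. The arrays here are random stand-ins with the shapes mentioned in the talk (20 numeric columns, 5 categorical columns); the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in data: 20 numeric columns and 5 columns of integer category codes.
X_num = rng.normal(loc=3.0, scale=2.0, size=(1000, 20))
X_cat = rng.integers(0, 100, size=(1000, 5))  # codes in 0..99

# Standardize only the numeric block. The integer codes are left untouched,
# because they are indices into an embedding table, not magnitudes.
scaler = StandardScaler()
X_num_scaled = scaler.fit_transform(X_num)

print(X_num_scaled.mean(axis=0).round(3)[:3])
print(X_cat[:2])
```

In a real workflow the scaler would normally be fit on the training split only and then applied to the validation and test splits.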
I will give you the long and the short of it now. What embedding layers do is they take each categorical input and map it into a vector of weights. That's it. Now, a vector of weights is nice because the neural network can go ahead and change them, because they're weights, so it can learn what weights are appropriate there. It can learn what vector best represents this category: what series of numbers, five numbers in this case. They can be negative, they can be positive, they can be decimal numbers. What best represents this category? And the nice thing is that it doesn't necessarily matter if this category is represented as, like, negative 5, negative 2, and negative 1, as long as categories that are similar to it are represented in a similar way, so maybe negative 5, negative 2, and negative 2. Then the neural network can express similarity of categories, and therefore use its subsequent layers to treat these categories similarly. So we basically say, let's go ahead and figure out what numbers represent these categories. Categories that are similar should be represented by similar numbers, so layers later on can learn a single set of weights, and similar categories will be treated in a similar way.
There is a sort of rule of thumb that I've gone ahead and included here. This is the embedding size rule. Technically, I just copied this from fast.ai, which is generally good for these empirical types of rules. But we don't need to talk anything more about it right now. It will tell you what size of embedding you should use based on the number of categories that you have. It's a good rule of thumb. You can just use it; it's probably fine for most applications.
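As a rough illustration of the two ideas above, the sketch below treats an embedding as a plain lookup table in NumPy and includes one embedding size heuristic. The heuristic shown is the `emb_sz_rule` formula as it appears in the fastai library; whether it is the exact version shown in the video is an assumption, and the video itself uses five numbers per category. The random table stands in for weights that a network would learn during training.

```python
import numpy as np

def emb_sz_rule(n_cat: int) -> int:
    # Empirical heuristic from the fastai library; may differ from the video's rule.
    return min(600, round(1.6 * n_cat ** 0.56))

n_categories = 100
emb_size = 5  # the video uses five numbers per category

rng = np.random.default_rng(0)
# An embedding is a trainable table: one row of weights per category.
# Random values here stand in for weights that training would adjust.
embedding_table = rng.normal(scale=0.05, size=(n_categories, emb_size))

codes = np.array([54, 52, 31, 41, 39])  # integer category codes from the example
vectors = embedding_table[codes]        # lookup: each code selects its row

print(vectors.shape)
print(emb_sz_rule(n_categories))
```

Because the lookup only uses the code as a row index, codes 54 and 52 being numerically close says nothing about how close their vectors are.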
So I do a couple of things here just to make the embedded categories kind of fit. I flatten it. Not super important; if you want to learn more about that, I do a little bit of deep learning in Keras in a subsequent video. Boy, that's, like, six references so far. God, so many videos that you need to watch. So we've made our embedding layer, and this is basically all there is to it, you guys. If you watched the previous video in this series, which you should see in the playlist on the side, you'll actually know what happens next. One might guess there are three steps: standardize, regularize, linear regression, a.k.a. dense. So we do one thing first: we concatenate the categorical inputs together. Then we regularize and we do dense, because remember, we standardized before this, and the initial categorical input doesn't need to be standardized. And then I use the exact same neural network as I did before, so there is really nothing complex about it. Let me make sure I go ahead and run these things. If you want to learn more about this neural network and why it's structured this way, please watch the previous one. We go over this in quite some detail. It's tabular data again. We're doing the same problem: binary cross-entropy, RMSprop.
So we go ahead and run this here, and then we can check out our model.summary(). Oh, well, that's kind of nasty that it does this little bit of trimming here. Sorry, I'm not used to working in such a blown-out, sort of big notebook. And so again we've got lots of layers. You can kind of look at these. The thing that I always check out is down here: the total number of parameters. We notice the total number of parameters is up, because again we represent each of our categories with trainable parameters.
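The model described above can be sketched in tf.keras roughly as follows. This is not the network from the video: the dropout rate, the hidden layer width, the layer names, and the choice to share one embedding table across the five categorical columns are all assumptions. Only the input widths (20 and 5), the 100 categories with 5 weights each, the flatten step, the loss, and the optimizer come from the talk.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_numeric, n_categorical = 20, 5   # input widths mentioned in the video
n_categories, emb_size = 100, 5    # 100 categories, 5 weights each

numeric_in = layers.Input(shape=(n_numeric,), name="numeric")
categorical_in = layers.Input(shape=(n_categorical,), dtype="int32", name="categorical")

# Each integer code is looked up in a (100 x 5) table of trainable weights.
embedded = layers.Embedding(
    input_dim=n_categories, output_dim=emb_size, name="cat_embedding"
)(categorical_in)
embedded = layers.Flatten()(embedded)             # (batch, 5, 5) -> (batch, 25)

x = layers.Concatenate()([numeric_in, embedded])  # (batch, 45)
x = layers.Dropout(0.2)(x)                        # "regularize" (rate is an assumption)
x = layers.Dense(32, activation="relu")(x)        # "dense" (width is an assumption)
out = layers.Dense(1, activation="sigmoid")(x)    # binary classification output

model = tf.keras.Model(inputs=[numeric_in, categorical_in], outputs=out)
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

In the summary, the embedding layer alone accounts for 100 × 5 trainable parameters, which is why the total goes up compared with a purely numeric model.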
So in this case, we've got, what, 100 categories and then five weights for each, so, you know, 500, and each of those will be represented in the input layer. This is also where the bootstrap sample generator becomes really, really nice. Because we're using fit_generator, we can really easily have numeric inputs and categorical inputs, and Keras makes it really nice for this. So I really like this, actually. If you want to check out how you input this data, it's pretty cool. And then we can go ahead and start fitting, and that's it, really. This lesson was more about: how do you use an embedding layer, what does it do under the hood, and when should you use it? For a neural network, you should basically just always use it. You can be lazy, and with categorical inputs that are really, really small, like maybe two, three, or four categories, you can treat them the exact same way that you treat them in linear regression. We actually get a higher accuracy than we did in the previous example.
So hopefully that was pretty informative for what you need to do in order to use categorical data with neural networks. If there's anything sort of clutch here, it's looking at this slide here. If you understand what's going on in this one cell of the neural network, you'll understand what you need to do for categories. I hope this was interesting. I hope you liked it. Please leave comments or likes below. Otherwise, check out the next video, where we get into even more complex neural networks. Thanks.