Transcript:
All right, really quickly. What I'd like to do is go over a numpy function that looks like this: np.c_. It's basically a concatenate function. Let's first import numpy, and we're going to create two quick arrays. We'll call the first one x1. Think of these as rows in a spreadsheet: we have a spreadsheet with columns and rows, and we're going to pull some of this data and add it together in different ways to show how to use this concatenate. Let's start with one row of data. We're just going to create a numpy array here, and we're going to put in 2, 5, and let's do 450. All right, so now we have this array, and we can review it really quickly: x1 is literally an array of those numbers. We're going to create one more array, x2. All right, now we have two arrays. If we just did something like this, you'd see you have two separate arrays, two different objects that are each an array right now. Usually you have rows stacked on top of each other, so to do that we can really quickly create an object called data. It'll be another array, an array of arrays, so we have multiple arrays in a row. What we're doing is basically creating a spreadsheet type of data object. So let's call data, and you can see here that you've now got an array of 2, 5, and 450, then a comma, then another array, and all of that is in brackets, which makes it an array of arrays. Jupyter notebook automatically lays that out for you with a spreadsheet feel, so you can see the first array is a row, and x2, the second array we created, is another row. So we have two rows, and as you get rows, you automatically start to have columns. As you can see, in order to add rows, you can literally just create a bracket and add your different rows in there. Creating a column is another story. In case it isn't quite clear how this is coming together, I'll do one more thing to maybe make it a little more clear.
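A minimal sketch of the setup described so far. The values in x1 come from the walkthrough; the values in x2 are never stated, so the ones below are made up purely for illustration.

```python
import numpy as np

# x1 uses the numbers from the walkthrough.
x1 = np.array([2, 5, 450])
# x2's values are hypothetical; the walkthrough does not state them.
x2 = np.array([3, 6, 950])

# Wrapping both rows in one list builds a 2-D array:
# each inner array becomes a row, and columns appear automatically.
data = np.array([x1, x2])

print(data)
print(data.shape)  # (number of rows, number of columns)
```

Here each 1-D array becomes one row, which is why adding rows is as simple as adding another entry inside the outer brackets.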
Let's give each column a name. Say the first one is people. Let's consider that we're looking at houses: people, rooms, and square feet. Then we're going to create something called a data frame, which is basically a spreadsheet. We're going to create a data frame using our data object, which is our array of arrays, and the columns are just going to be the features. So again, we've got rows: this is row 0, which is the first row, then row 1. Those are the indexes. We've got how many people can fit in it, how many rooms it has, and how many square feet it has, and we start to build this kind of spreadsheet of data. Let's say, though, that you had another feature, another column that you wanted to add. How would you do that? This is where the function we're going over comes into play. Let's create something called feature four. Our first feature is people, our second feature is rooms, and our third feature is square feet. Let's say we've done some data analysis, we've pulled some more figures, and we want to tack on this fourth column. We'll create another array in this case, and we'll just put in 1 and 2, nothing complex, which will help us find it quickly. This is where np.c_ comes in. You create a new object, updated data, and we're going to do np.c_ (that's c underscore), and then all you do is put in your original array of arrays and then add, in this case, feature four. Now when we look at updated data, you can see that where we had three columns, we now have a fourth, which is 1 and 2. This also changes each row in the result. If you look up here, the original rows had only three values, so now if we pull updated data 0, which is the 0th index, you can see that this array is now longer. The original data object itself is left as it was, because np.c_ returns a new array. All right, and that's really the main purpose of np.c_. The question is, when would you use this?
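A small sketch of adding a fourth column with np.c_. The variable names follow the spoken names (data, feature four, updated data), but their exact spelling is assumed, and the second row of data is hypothetical.

```python
import numpy as np

# Rows: people, rooms, square feet. The second row is hypothetical.
data = np.array([[2, 5, 450],
                 [3, 6, 950]])

# The new column: one value per row, as in the walkthrough (1 and 2).
feature4 = np.array([1, 2])

# np.c_ uses square brackets and joins its arguments along the second axis,
# so the 1-D feature4 is treated as a column.
updated_data = np.c_[data, feature4]

print(updated_data)
print(updated_data[0])  # the first row now has four values
print(data.shape)       # the original array is unchanged
```

Note that np.c_ is indexed with square brackets rather than called with parentheses, and it returns a new array instead of modifying data in place.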
Well, there are two main places where I use it. If I have data such as this, with these rooms, and I'm running some kind of regression analysis, I might use it to add my bias feature. To do that, I would create a bias array. If you're a little fuzzy on what the bias is, remember we've got this thing, let's go down here: y equals mx plus b. That b actually stands for bias. The m is how much we're going to weight x to help us predict y; it's also known as the slope. But b is the bias, which is kind of the initial starting figure. It's where we can move our weights up or down to help us get a more accurate version of y. This bias is basically just a constant. So if we have a number of x's, or features (in this case people, rooms, and square feet), and we want a shift that we can move, then we need a bias array that we can also adjust and use in our regression. Let's go back here. We're going to go to the bias array and just create a starting point. We want two, one for each row, and this is going to give us an array of arrays. So basically, you can see we've got this column that we want to add. Now, to add that column, we can take our updated data, or you can even just use our original data. Our data with the bias column is going to be np.c_ again: we can add in our bias right up front, then add in our data, and now you'll see data bias. We've got our bias right here. So again, these will be weighted from a baseline when you do your regression.
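One way to sketch the bias column, assuming a column of ones as the starting point (the walkthrough does not say which starting value it uses). The weights below are hypothetical numbers chosen only to show that the first weight plays the role of b in y = mx + b.

```python
import numpy as np

# Rows: people, rooms, square feet. The second row is hypothetical.
data = np.array([[2, 5, 450],
                 [3, 6, 950]], dtype=float)

# One bias value per row, shaped as a column (an array of arrays).
bias = np.ones((data.shape[0], 1))

# Put the bias up front, then the original features.
data_bias = np.c_[bias, data]

# Hypothetical weights: weights[0] acts as b, the rest act as slopes (m).
weights = np.array([10.0, 1.0, 2.0, 0.5])

# With the bias column in place, one matrix product computes m*x + b per row.
predictions = data_bias @ weights
print(data_bias)
print(predictions)
```

Because the bias column is all ones, multiplying it by its weight simply adds that weight to every prediction, which is why the constant can be folded into the data instead of being handled separately.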
But this is going to be the baseline bias feature that we can add right into our data, so we don't have to handle them separately. The other time that I use it, and this is actually where I use it more often, is when I'm building a pipeline and pre-processing data. These are all numbers here, but let's say you've got data that says whether or not it's an apartment or a condo. So let's put in, actually, it'll be feature five, and you've got a 1 if it's an apartment, a 2 if it's a condo, and a 3 if it's a house. In this case we've got two of them, so let's do something like this. We've now got this fifth feature that tells us whether it's, in this case, an apartment, a condo (which it isn't), or a house. When you pre-process this, this is called categorical data. It's a category. Up here, we've got our actual numbers and figures, so these are all numerical. In other words, as the square footage gets higher, this number gets higher; they're directly related. As the number of rooms grows, this number grows. But as this number grows higher, it has nothing to do with anything getting higher, right? It's just representing a category. So these are going to be pre-processed differently. Often what you'll do is split the data. If your original data has it all together, you're going to split feature five off and pre-process it differently from the way you pre-process the actual numerical data. At the end, in order to build your regression, you're going to have to put them back together, and that's where I use it a lot. When I've finished, I'll have pulled in the original data, pre-processed it, cleaned it, filled in all the empties, adjusted my outliers, whatever I'm doing to pre-process it and get it ready to build my regression or my category analysis. Then I'll put it back together in here.
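The split, pre-process, and recombine idea can be sketched as follows. The specific pre-processing steps here (min-max scaling for the numeric columns, one-hot encoding for the category) are assumptions chosen for illustration; the walkthrough does not say which steps it uses. The data values other than the first row's 2, 5, 450 are hypothetical.

```python
import numpy as np

# Columns: people, rooms, square feet, home type
# (1 = apartment, 2 = condo, 3 = house). Values are illustrative.
raw = np.array([[2, 5, 450, 1],
                [3, 6, 950, 3]], dtype=float)

# Split: numeric features on one side, the categorical code on the other.
numeric = raw[:, :3]
category = raw[:, 3].astype(int)

# Numeric pre-processing (assumed): scale each column to the 0..1 range.
# This works here because no column is constant.
col_min = numeric.min(axis=0)
col_max = numeric.max(axis=0)
numeric_scaled = (numeric - col_min) / (col_max - col_min)

# Categorical pre-processing (assumed): one-hot encode the codes 1..3.
one_hot = np.eye(3)[category - 1]

# Recombine everything column-wise, with a bias column up front.
bias = np.ones((raw.shape[0], 1))
model_input = np.c_[bias, numeric_scaled, one_hot]

print(model_input)
```

The point of the sketch is the last step: however differently the pieces were processed, np.c_ lines them back up side by side as long as they have the same number of rows.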
So I take the data with the bias, and I also add in my pre-processed categorical data, which was processed differently. Basically, I'm splitting them and putting them back together, and when I'm done, I'll have all my data back in one clean format to use for my regression analysis. So again, I just wanted to show you this np.c_, which you can think of as a kind of concatenate. What it actually does is add, or concatenate, on the second axis. The first axis is the rows, so as you can see, if I just put a comma and add this, it adds it on the first axis: it adds another row. The second axis is the columns, so as you can see, when I use this, it adds whatever data I have in here on the second axis. That's the technical definition of what c underscore does, and it's part of numpy. All right, I hope this was somewhat helpful, just to give you an idea of what this does and why it's used if you see it in examples and tutorials. That's all.
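To make the first-axis versus second-axis distinction concrete, here is a short comparison using np.concatenate alongside np.c_. The extra row and the second data row are hypothetical values.

```python
import numpy as np

# Rows: people, rooms, square feet. The second row is hypothetical.
data = np.array([[2, 5, 450],
                 [3, 6, 950]])

new_row = np.array([4, 7, 1200])  # hypothetical extra row
new_col = np.array([1, 2])        # one value per existing row

# First axis (axis=0): stacking adds a row.
more_rows = np.concatenate([data, new_row.reshape(1, -1)], axis=0)

# Second axis (axis=1): np.c_ adds a column.
more_cols = np.c_[data, new_col]

# The same column-wise result written out with np.concatenate.
same_cols = np.concatenate([data, new_col.reshape(-1, 1)], axis=1)

print(more_rows.shape)  # one more row
print(more_cols.shape)  # one more column
```

Seen this way, np.c_ is mostly a convenience: it saves the reshape and the axis argument when the goal is to attach columns.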