Hey there, how’s it going, everybody? In this video, we’re going to be going over scatter plots. Scatter plots are great when you want to show the relationship between two sets of values and see if they are correlated. First, we’re going to look at a basic example of a scatter plot using some data within our Python script, and then we’ll look at some real-world data that I have within a CSV file. The data within the CSV file are the views and likes of YouTube videos on the trending page the day that I made this video, so that might be interesting to plot out.

Now, I would like to mention that we have a sponsor for this series of videos, and that is brilliant.org. I really want to thank Brilliant for sponsoring this series, and it would be great if you all could go check them out using the link in the description section below to support the sponsors. I’ll talk more about their services in just a bit. So with that said, let’s go ahead and get started.

Okay, so I’ve got some sample code pulled up here in my script. First, we’ll see how to use these scatter plots using this list of data directly within my script, and then we’ll look at a real-world example with data that I’ll load in from a CSV file. If you’ve been following along with the series, then you’ll likely recognize the other Matplotlib code that I’ve got here at the moment, but if not, then let me go ahead and go over this real quick. Here at the top of the file, I’m importing pandas. I’m also importing pyplot from the Matplotlib library. We are using a plot style here of seaborn. We’ve been using fivethirtyeight a lot throughout the series, but I want to show what some of these different styles look like. We just have some x and y data here of some random points between 1 and 10. I’ve got some other code commented out here for now. Don’t worry about that; we’ll talk more about that when we get to it.
We also have a title for our plot and x and y labels; those are commented out at the moment. We also have a tight layout, which just adds some automatic padding to our plots, and we’re calling plt.show, which will actually show our plot.

Okay, so now let’s look at how to create a basic scatter plot. I’ve got a random list of values here for an x-axis and a random list of values for a y-axis, and some of these values are repeated. To create a scatter plot out of these values, it’s as simple as saying plt.scatter and passing in our x and y values. If I run this, and let me make this a little larger here, then we can see that we have a scatter plot of these random values. Scatter plots are really nice for seeing different trends or outliers or things like that. Since these are random, we don’t really have any trends here, but sometimes it’s important to know that there’s not a trend. Since our scatter plot looks random like this, that likely tells us that there’s no correlation between our two lists of random values. We’ll see a data set in a bit that is more correlated, but for now let’s look at some basic customizations that we can make to our scatter plot.

First of all, I feel like the sizes of the dots are a bit small on this plot. You can change that by setting the size, and that is the s argument, so we’ll set s to a larger value. These sizes are always a bit weird; I always have to look up in the documentation how they’re related. But if we run this, then now we can see that these dots are a little bit larger on our scatter plot.
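As a rough sketch of the script up to this point, the basic scatter call with a larger marker size might look like the following. The x and y lists are placeholder values rather than the exact lists from the video, the size of 100 is an arbitrary pick, and the style lookup is an added assumption because the seaborn style goes by different names in different Matplotlib versions.

```python
import matplotlib.pyplot as plt

# Placeholder data: 20 arbitrary points between 1 and 10 (not the video's exact lists).
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]

# The seaborn style name depends on the Matplotlib version, so use whichever exists.
for style_name in ("seaborn", "seaborn-v0_8"):
    if style_name in plt.style.available:
        plt.style.use(style_name)
        break

# s sets the marker area in points squared; 100 is an arbitrary choice here.
points = plt.scatter(x, y, s=100)

plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

Because s is an area rather than a diameter, doubling the value makes a dot look noticeably less than twice as wide, which may be why the numbers feel unintuitive at first.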
Now, if you bump that up to like 500 or something, then it would really be noticeable. We can also change the color and the marker styles of these plots as well. To change the color, we can simply pass in a c argument, so we’ll say c is equal to green. There are all kinds of different marker styles that we can use as well. I’ll leave a link in the description section below to the page where you can find the different styles. For example, if I wanted to have an X symbol as our marker, then I could simply say marker is equal to X. If we save that and run it, then we can see that now we have green markers, and these green markers are shaped like X’s. So let me close that out. I almost never use anything other than the default markers, but the option is there if you want it. I’m going to remove that for now and just go back to those default circle markers.

Now, another thing about scatter plots is that I think they look a lot nicer if we add edges to the circles and also give them some alpha so that we can see through them a bit. To show you what this looks like, let me add an edge color and a line width. I’ll say edgecolor is equal to black; that’s going to be the edges of the circles. Let’s also set the linewidth of that edge equal to 1. And to give an alpha to our color so that it softens it up a little bit, it’s as easy as saying alpha is equal to, let’s do, 0.75. If I run this, then we can see that this looks a bit nicer. I think that these dots with these black edges look a lot better. If your colors are a bit too soft, then you can always play around with that alpha.

Okay, so another thing that I want to show you is how the colors and sizes can actually be set on a per-mark basis, rather than applying them to all of the marks.
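The styling options covered so far can be combined in a single call, sketched below. The data is placeholder values and s=100 is an arbitrary size; the keyword arguments themselves (c, edgecolor, linewidth, alpha) are the ones named in the video.

```python
import matplotlib.pyplot as plt

# Placeholder data: 20 arbitrary points between 1 and 10.
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]

# Green circles with a thin black outline, slightly see-through.
# Adding marker="X" would switch the circles to X-shaped markers.
points = plt.scatter(
    x, y,
    s=100,              # arbitrary marker size
    c="green",          # one color applied to every point
    edgecolor="black",  # outline color of each marker
    linewidth=1,        # outline thickness
    alpha=0.75,         # 1.0 is fully opaque, 0.0 is invisible
)

plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

The transparency helps most where points overlap, since repeated values show up as darker spots instead of hiding behind each other.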
So why would you want multiple colors or sizes? Well, having the ability to use multiple colors and sizes actually allows us to add additional data sets into our plot. For example, let’s pretend that our current plot is some survey data about a bunch of people, and we wanted to break down the data further into something more specific. Let’s say that we had these people rate something from 1 to 10, and we wanted to somehow plot their rating as well. To do that, we could simply assign different numbers to these different possibilities, and those will then give you different colors on your scatter plot as long as you pass them into your method. I have a colors variable commented out here, so let me uncomment this and move it above our scatter plot. I think this will make a bit more sense once we plot this out.

Okay, so this colors list holds numbers between 1 and 10. Like I said before, maybe this could be a person’s answer to how satisfied they are with a certain product or something like that. Each of these values will correspond to a data point in our x and y variables. So now let’s pass this into our scatter method as the color argument. Whoops, I accidentally pasted that. Instead of c being equal to green, I want it to be equal to colors. If I run this, then we can see that we get different colored dots. What this is doing is that whenever we plot this x and y value here of 5 and 7, it also has a color value of 7, and these different color values here, 7, 5, 9, are all going to correspond to different colors on our chart.

Now, I really don’t like the colors that we’re getting here. These are just shades of gray. We can actually change these by using a color map, and just like with the marker styles, you have plenty to choose from.
There are a ton of built-in color maps that we can use, and I’ll be sure to leave a link in the description section below to all of the color map options if you’d like to play around with these as well. One that I personally like is called Greens. So if I come down here after colors and I say cmap, that is the argument, cmap is equal to Greens with a capital G. Be sure to put in a comma there. Now let me run that, and we can see that now we’re getting these different shades of green as the intensity. I think the lighter ones are closer to 0 and the dark ones are closer to 10, but we really don’t know based on how this is represented here. You’re probably going to want to add a label for your color map too, so that people viewing the chart know what these colors represent. To do that, we can add a color bar legend. I’m going to close this down, and below our scatter plot we can say cbar is equal to plt.colorbar, and that is a method. Now I’m going to say cbar.set_label, and we can set a label for this. Like I said, maybe this could be a satisfaction level or something, so I’ll just say Satisfaction. Now let me run that, and we can see that we have a color bar here on the right side. Now we have multiple points of information: we can see our x and y data, and depending on how these are colored, we can also see that person’s satisfaction level. These dark ones are very satisfied, and the lighter colored ones were not satisfied.

Now, we can also change the sizes of our data points. Just like with the color, this adds another way of explaining our data even further. For example, I see a lot of scatter plots that use the size of the dot for things like population, or maybe even the sample size for that data point. I’ve got a random list of sizes commented out down here as well.
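Putting the per-point colors, the color map, and the color bar together gives something like the sketch below. The x, y, colors, and sizes lists are illustrative placeholders (20 values each, chosen so the first point matches the 5, 7, 7 example), not data confirmed from the video. The sizes list is passed through the same s argument used earlier, only with one value per point instead of a single number.

```python
import matplotlib.pyplot as plt

# Placeholder survey-style data, one entry per point in every list.
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]
colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]
sizes = [209, 486, 381, 255, 191, 315, 185, 228, 174, 538,
         239, 394, 399, 153, 273, 293, 436, 501, 397, 539]

# Numbers passed to c are mapped onto the color map named by cmap.
points = plt.scatter(x, y, s=sizes, c=colors, cmap="Greens",
                     edgecolor="black", linewidth=1, alpha=0.75)

# The color bar acts as the legend for the color values.
cbar = plt.colorbar()
cbar.set_label("Satisfaction")

plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

With the color bar in place, a reader no longer has to guess which end of the green range means a high rating.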
So let me grab these and move them up underneath colors. Just like with the colors, this is a list of 20 different sizes that correspond to the data points in the x and y variables that we’re plotting. So let me pass in this sizes list as my sizes. Our size argument is right here, sorry, so I will paste that in as the sizes there. Now if we run this, then we can see that each of these individual points has a size as well, and you can use that for different types of data.

Okay, so now that we’ve seen how to do these things with our simple sample data, let’s look at how we can plot out some real-world data from a CSV file that I have here in my current directory. In the CSV file, I pulled down some data from the YouTube API. These were the top 200 trending videos on the day that I recorded this video. I wanted to do a scatter plot of their total views and their total likes, and I also calculated the ratio of likes to dislikes as well. In a future video, I’ll actually cover the YouTube API and show how I grab data like that, but that’s really a different topic, so let’s just continue focusing on Matplotlib for now and save that for a future video.

So let me pull up this CSV file and see what this looks like. I’ve got that CSV file open here, and we can see that these are just the top 200 trending videos on YouTube the day that I recorded this. I didn’t grab the titles or anything like that. I just wanted to see if there was a correlation between their view count and their likes. Here we have the view count of the video, and each one of these rows is a different video. This one had 8 million views, this one had 9 million views, and so on. We have the likes for that video in the second column, and we also have the ratio of likes to dislikes.
Now, since this is on the trending page, a lot of these have a high like-to-dislike ratio. This one has 96% likes to dislikes, this one 98%, and so on. We can probably take a guess that the more views a video has, the more likes it’s going to have, but to see exactly what that correlation looks like, we’re going to have to plot it out. I’ve got some code commented out down here at the bottom that will pull in some data from that CSV, so let’s remove the sample code that we were working with earlier and uncomment that other code. I’m going to keep our scatter plot, so I’m going to cut that out and paste it down here between the ratio and the title. For everything else, I’m just going to remove all of this sample data that we were using before. Okay, and now let me uncomment where I’m loading in that data, and also the title and x and y labels.

So let me describe what we’re doing when we’re loading in this data. I’ve been using the pandas read_csv method throughout this series, but for those of you who haven’t seen the rest of the series and are just watching this video, let me quickly explain what this is doing. I’m reading in this CSV file, and this is the name of the CSV file here. Again, this will be included in the description section below if you’d like to download it and follow along. So it reads in that CSV and grabs all that data, and then we’re setting this view count variable equal to data with the view count key. What that does is set this variable equal to that entire column, so it’s equal to all of these view counts. The same goes for likes: it’s set equal to the likes column, so the first values are, you know, three hundred thousand, five hundred and sixty thousand, and so on. And lastly, ratio is the same thing: it’s getting that ratio key and setting the variable equal to that ratio column.
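The loading step described above can be sketched as follows. To keep the example self-contained, it reads from an in-memory string instead of the actual file, and the three rows and the column names view_count, likes, and ratio are assumptions made for illustration, not the real contents of the CSV.

```python
import io

import pandas as pd

# Stand-in for the CSV file: three made-up rows with assumed column names.
csv_text = """view_count,likes,ratio
8000000,300000,96.0
9000000,560000,98.0
1200000,45000,91.5
"""

# read_csv accepts a file path or a file-like object; the video passes a filename.
data = pd.read_csv(io.StringIO(csv_text))

# Indexing the DataFrame by column name returns that whole column as a Series.
view_count = data["view_count"]
likes = data["likes"]
ratio = data["ratio"]

print(view_count.tolist())  # [8000000, 9000000, 1200000]
```

With a real file on disk, the only change would be passing its path as a string to read_csv in place of the StringIO object.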
Okay, so to plot this out using the same scatter plot that we used before, we can simply pass in the view count, which I’ll put on the x-axis, and the likes, which I’ll put on the y-axis. Let me remove the sizes, colors, and the color map for now. I will leave the edge color set to black, along with the line width and the alpha. Okay, so let me run this.

We can see here that we get a scatter plot. Like I said, this is the top 200 videos on the trending page, so there should be 200 dots here. It looks like some of our data is bunched up here in the bottom left, and that’s because we have one outlier up here at the top right. That was a video that had a lot more views and a lot more likes than the other videos on the trending page. I actually went back to the original data to see what video was messing up my nice little scatter plot, and it was the new Old Town Road music video by Lil Nas X and Billy Ray Cyrus. So that’s who to blame for that outlier. But I’m actually glad that there was an outlier, because it reminded me that we can use a log scale with scatter plots as well to lessen how much those outliers skew the plot.

So let’s make this look a bit better and use a log scale for our axes. Down here below our scatter method, let’s say plt.xscale, and we will use a log scale for our x-axis, and we will also use a log scale for our y-axis with plt.yscale. So I’ve got those put in there. Now if I run this, since it’s using a log scale instead of a regular scale, we can see that those outliers don’t skew the plot so much, and we can see the correlation better. The correlation between how many views a video has and how many likes it has really stands out in this plot. So now let’s also use the ratio of likes to dislikes in this plot.
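Here is a minimal sketch of the log-scale step. The view and like counts are made-up numbers with one deliberately large outlier, standing in for the CSV columns, and the title and axis label text is assumed rather than taken from the video.

```python
import matplotlib.pyplot as plt

# Made-up view and like counts, including one large outlier at the end.
view_count = [120_000, 450_000, 900_000, 2_500_000, 8_000_000, 150_000_000]
likes = [4_000, 21_000, 38_000, 110_000, 300_000, 7_500_000]

points = plt.scatter(view_count, likes,
                     edgecolor="black", linewidth=1, alpha=0.75)

# On a log scale each step is a factor of ten, so the outlier no longer
# squeezes the remaining points into one corner.
plt.xscale("log")
plt.yscale("log")

# Assumed label text, for illustration only.
plt.title("Trending YouTube Videos")
plt.xlabel("View Count")
plt.ylabel("Total Likes")

plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

Commenting out the two scale calls and running it again shows the bunching effect on a linear scale.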
I think that would be a good metric to use for the color of our points. We could also try to use that for size as well, but I think that the ratios might be a little bit too close for us to really tell the difference in sizes like we can with the colors. So I’m just going to use the colors and not worry about the sizes. To do this, I can close down our current plot, and I’m also going to use another color map so that we can see another example. Here within our scatter call, right after our likes (this doesn’t have to be in any particular order, I just want to put them here), I will say c is equal to ratio. That will set the colors on a color map scale using these ratio values for each of our markers. Now that we have that, I’m also going to pass in a cmap, a color map, and let’s use a color map of summer. I think that’s a color map that I liked.

All right, and below scatter, let’s also put a color bar so that we know what this represents. I’m going to say cbar is equal to plt.colorbar, and we want to set a label for that as well, so I’ll say cbar.set_label, and we will say Like/Dislike Ratio. Okay, whoops, not ration, ratio. So let me run this and make this a little larger here. I think that this looks really nice. We can see now that we get those colors representing the like-to-dislike ratio, and we have our color bar here telling us what these numbers actually represent. The ones that are more bluish green have, you know, performed less well, and the ones that are bright yellow performed better. The bright ones are up in the 90s, and the ones that are a little darker and bluish green are kind of down in the 50s and 60s, which would mean that they had almost as many dislikes as likes on that video. Keep in mind, though, that I got these from the trending page.
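The finished plot, with the ratio driving the colors, could be sketched like this. All three lists are made-up stand-ins for the CSV columns, and the label text is assumed.

```python
import matplotlib.pyplot as plt

# Made-up stand-ins for the CSV columns (views, likes, like percentage).
view_count = [120_000, 450_000, 900_000, 2_500_000, 8_000_000, 150_000_000]
likes = [4_000, 21_000, 38_000, 110_000, 300_000, 7_500_000]
ratio = [58.0, 91.5, 96.0, 88.2, 98.0, 94.3]

# c takes one number per point; cmap decides how those numbers become colors.
# The summer color map runs from green at the low end to yellow at the high end.
points = plt.scatter(view_count, likes, c=ratio, cmap="summer",
                     edgecolor="black", linewidth=1, alpha=0.75)

plt.xscale("log")
plt.yscale("log")

# The color bar documents what the colors mean.
cbar = plt.colorbar()
cbar.set_label("Like/Dislike Ratio")

plt.tight_layout()
# plt.show()  # uncomment to display the figure
```

In this sketch the scale calls come before the color bar is created; the keyword arguments to scatter can appear in any order.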
Most of these are actually going to be on the higher end, since those are more popular videos anyway, but we can see that we do have some dark ones mixed in here, and most of those dark ones fall on the bottom side of our plot. So using a scatter plot like this is a great way to see the correlation for the values that you’re plotting, and using colors and sizes lets you add in even more metrics and put more information into your plots.

Okay, so we’re just about finished up here, but before we end, I’d like to mention the sponsor of this video, and that is brilliant.org. In this series we’ve been learning about Matplotlib and how to plot data in Python, and Brilliant would be an excellent way to supplement what you learn here with their hands-on courses. They have some excellent courses covering the fundamentals of statistics, and these lessons do a deep dive on how to think about and analyze data correctly. They even use Python in their statistics courses and will quiz you on how to correctly analyze the data within the language. Their guided lessons will challenge you, but you also have the ability to get hints or even solutions if you need them. It’s really tailored towards understanding the material. They’ve also recently released a Programming with Python course, and they even have a coding environment built into their website so that you can run code directly in the browser. That is a great way to complement watching my tutorials, because you can apply what you’ve learned in their active problem-solving environment, and that helps to solidify that knowledge. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free. Also, the first 200 people that go to that link will get 20% off the annual premium subscription. You can find that link in the description section below. Again, that’s brilliant.org/cms.

Okay, so I think that is going to do it for this video. I hope you feel like you got a good understanding of how to use scatter plots and the kind of data that this type of plot is good for. Like I said, it’s really nice for seeing these correlations in the data, like how the views and likes were related for the trending page. In the next video, we’ll be going over time series plots. These are very similar to the line plots that we saw before, but they’re focused on data over a certain amount of time instead. So definitely be sure to check that out.

But if anyone has any questions about what we covered in this video, then feel free to ask in the comment section below, and I’ll do my best to answer those. If you enjoy these tutorials and would like to support them, then there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It’s also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon, and there’s a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.