Transcript:

When the equation seemed too complicated, StatQuest to the rescue! Hello and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the genetics department at the University of North Carolina at Chapel Hill. Today we’re going to talk about p-values. People often think that a p-value is the same thing as a probability. The two are related, but they are not the same. Let’s look at a simple example to learn more.
Our simple example is just going to be flipping a coin two times. That’s about as simple as it gets. The first time we flip the coin, 50% of the time we’ll get heads and 50% of the time we’ll get tails. The second time we flip the coin, again, 50% of the time we’ll get heads and 50% of the time we’ll get tails. What’s the probability of getting two heads in a row, and what’s the p-value for getting two heads in a row? We’re going to answer both of those questions real soon.
Let’s start with the first one: what’s the probability of getting two heads in a row? Like we just said, the first flip will give us heads 50% of the time and tails 50% of the time. If we get heads the first time we flip the coin and then we flip the coin again, there’s a 50% chance we’ll get heads and a 50% chance we’ll get tails. If we get tails the first time we flip the coin, the second time we flip it we still have a 50% chance of getting heads and a 50% chance of getting tails. After flipping a coin twice, we have four possible outcomes: HH, HT, TH, and TT. In this case, each outcome is equally likely. Because each one is equally likely, we can use the following equation to calculate probability: we take the number of times two heads occurred, divided by the total number of outcomes. Of the four outcomes, two heads in a row only occurred once. Doing the math gives us the probability of getting two heads in a row. In this case, it’s 1/4, or 0.25. What if we got two tails in a row?
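The counting argument for two flips can be checked by listing the outcomes directly. This is only a minimal sketch, assuming a fair coin; the helper name `probability` is made up for illustration and does not come from the video.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping a fair coin twice: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def probability(event):
    """Hypothetical helper: (outcomes matching `event`) / (total outcomes)."""
    matches = [o for o in outcomes if event(o)]
    return Fraction(len(matches), len(outcomes))

p_two_heads = probability(lambda o: o == ("H", "H"))
p_two_tails = probability(lambda o: o == ("T", "T"))

print(float(p_two_heads))  # 0.25
print(float(p_two_tails))  # 0.25
```

Using `Fraction` keeps the result as an exact 1/4 until it is printed, which mirrors the "count divided by total" formula.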
The probability of getting two tails is, just like before, the number of times two tails occurred in the outcomes divided by the total number of outcomes. This gives us 1/4, or 0.25, the exact same probability we got for two heads in a row. That’s kind of a no-brainer, but this will come in handy in a few more slides, so just bear with me.
What about getting one heads and one tails? And don’t you think it’s a little funny that in English we say “one heads”, where “heads” is plural? Weird. The probability of getting one heads and one tails, regardless of order, is the number of times HT or TH occurred divided by the total number of outcomes. In this case, we got one HT plus one TH, so that’s 2/4, and the probability of getting one heads and one tails is 0.5. Thus, we are twice as likely to get a heads and a tails as we are to get two heads in a row.
Why don’t we care about the order? Usually the order of things doesn’t matter. For example, if we are interested in mouse weights, it doesn’t matter if we weigh mouse X before or after mouse Y. Weighing mouse X first or second won’t change mouse Y’s weight.
Here’s a biological example. Instead of H meaning heads, assume H is one allele for a gene and T is the other allele for that gene. If the mother is a heterozygote and has both alleles for the gene, then 50% of her gametes will have the H allele and the other 50% of her gametes will have the T allele. Similarly, if Dad is also a heterozygote for the gene, then 50% of his gametes will have the H allele and 50% of his gametes will have the T allele. Knowing what the gametes are, we can figure out the possible genotypes for the offspring. Just as before, the probability of one of the offspring getting two H alleles from its parents is the number of times HH occurred in the list of possible genotypes of the offspring, divided by the total number of outcomes. In this case, that means 1/4, or 0.25.
What if Mom was a homozygote for this gene? That means she has two H alleles.
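The gamete bookkeeping follows the same counting rule as the coin flips, and it can be sketched in code for both the heterozygous cross and the homozygous-mother cross that was just set up. The function name `genotype_probability` and the string encoding of each parent's alleles are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def genotype_probability(mom, dad, target):
    """Hypothetical helper: fraction of equally likely offspring genotypes
    that match `target`, ignoring which parent supplied which allele."""
    # Pair every gamete from Mom with every gamete from Dad.
    offspring = [tuple(sorted(pair)) for pair in product(mom, dad)]
    hits = offspring.count(tuple(sorted(target)))
    return Fraction(hits, len(offspring))

# Both parents heterozygous: each passes on H or T with equal chance.
both_het_hh = genotype_probability("HT", "HT", "HH")  # 1 of 4 genotypes
both_het_ht = genotype_probability("HT", "HT", "HT")  # 2 of 4 genotypes

# Mom homozygous (two H alleles), Dad still heterozygous.
hom_mom_hh = genotype_probability("HH", "HT", "HH")   # 2 of 4 genotypes

print(float(both_het_hh), float(both_het_ht), float(hom_mom_hh))
```

Sorting each pair is what makes HT and TH count as the same genotype, which is the "order doesn't matter" point from the mouse example.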
If Dad was still a heterozygote, then our list of possible genotypes for the offspring would be slightly different than before. This time, HH occurs twice in the list of possible outcomes. Thus, the probability of one of the offspring getting HH from its parents uses the same formula as before, but now we have two in the numerator and the probability is 0.5.
Let’s go back to assuming both parents are heterozygotes. Or, if this genetics gobbledygook doesn’t mean anything to you, you can imagine we’re flipping a coin again. We’ve solved for the probability of getting HH. Now let’s solve for the p-value. A p-value is the probability that random chance generated the data, or something else that is equal or rarer. A p-value consists of three parts. The first part is the probability that random chance generated the data that you observed. We’ve already taken care of this first part: we’ve worked out the probability for getting two heads, or two H alleles. The second part of a p-value is to add anything else in the outcomes that has equal probability. That is to say, getting two T’s is considered equal to getting two H’s, since it has the same probability of occurring. We already worked out that probability as well. The last part of a p-value is to add on anything that’s even rarer than what you observed. Since there are no outcomes that are rarer than getting HH, this part is equal to zero. Adding up the three parts (0.25 + 0.25 + 0), the p-value for HH is 0.5.
To summarize this example: the probability of getting HH is 0.25, and the p-value for getting HH is 0.5. In this case, these two values are not equal.
Okay, so now that we’ve looked at a simple example, let’s move on to a slightly more complex example. Now we’re going to flip the coin five times. If we get heads each time, what’s the probability of that? Well, as we saw before, the probability is the number of outcomes of interest divided by the total number of outcomes.
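As a recap of the two-flip case, the three parts of the p-value can be written out explicitly. This is a sketch under the assumption of a fair coin; the variable names `part1`, `part2`, and `part3` are labels chosen here to match the three parts described in the talk.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Group the four equally likely two-flip outcomes by number of heads,
# because the order of the flips does not matter.
flips = list(product("HT", repeat=2))
counts = Counter(outcome.count("H") for outcome in flips)
prob = {heads: Fraction(n, len(flips)) for heads, n in counts.items()}

observed = 2  # we observed two heads

# Part 1: probability of what we observed.
part1 = prob[observed]
# Part 2: anything else that is exactly as likely (here, two tails).
part2 = sum(p for heads, p in prob.items() if heads != observed and p == part1)
# Part 3: anything rarer than what we observed (nothing, in this example).
part3 = sum(p for p in prob.values() if p < part1)

p_value = part1 + part2 + part3
print(float(part1), float(p_value))  # 0.25 0.5
```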
We need to figure out what the outcomes are and then plug them into this equation. Here, I’ve listed all the outcomes of flipping a coin five times in a row. This is the outcome of getting all heads. These outcomes represent getting four heads and one tails. These outcomes represent getting three heads and two tails. These outcomes represent getting two heads and three tails. These outcomes represent getting one heads and four tails. Lastly, this outcome represents getting five tails. Whew, that’s a lot of outcomes: 32 in total.
Here’s our equation for calculating probability. Since there is only one occurrence of five H’s in a row, we put 1 in the numerator, and since there are 32 different outcomes, we put 32 in the denominator. The resulting value is a small number: 0.03-something-something-something.
Now let’s calculate the p-value for getting HHHHH. I hope that was five H’s. Just to remind you, a p-value is the probability that random chance generated the data (in this case, five H’s) or something else that is equal or rarer. In this case, there is nothing rarer than HHHHH, so we’ll ignore this last part, just like before. So part one of the p-value is the probability of five heads, which equals 1/32. Part two of the p-value is the probability of five tails, which is also equal to 1/32. Since there is nothing rarer, this is what we get: 1/32 + 1/32 = 0.0625.
For all you statisticians out there, you’ll notice that this p-value is not less than 0.05, which is the traditional threshold for significance. Even though the probability of getting five heads in a row is smaller than 0.05, the p-value isn’t, so we wouldn’t think it’s all that unusual to get five heads in a row.
What about getting four heads and one tails? Well, calculating the probability is easy. We already know how to do that.
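The five-heads numbers can be reproduced without listing all 32 outcomes by hand. The sketch below swaps the explicit list for binomial coefficients (`math.comb`), which count how many outcomes have a given number of heads; it assumes a fair coin, and the helper name `coin_p_value` is hypothetical.

```python
from fractions import Fraction
from math import comb

def coin_p_value(n_flips, n_heads):
    """Hypothetical helper: returns (probability, p-value) for seeing
    `n_heads` heads in `n_flips` fair flips. The p-value adds up every
    head count that is as likely as, or less likely than, the observed one."""
    total = 2 ** n_flips  # number of equally likely outcomes
    prob = [Fraction(comb(n_flips, k), total) for k in range(n_flips + 1)]
    observed = prob[n_heads]
    return observed, sum(p for p in prob if p <= observed)

prob_5h, p_5h = coin_p_value(5, 5)
print(float(prob_5h), float(p_5h))  # 0.03125 0.0625
```

Calling `coin_p_value(5, 4)` applies the same rule to four heads and one tails.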
We just count up the number of outcomes of interest and put that number (in this case, five) in the numerator, and then count up the total number of outcomes and put that in the denominator. The result is 0.15-something-something-something.
What’s the p-value? By now you’ve probably got this memorized. A p-value is the probability that random chance generated the data (in this case, the data is four heads and one tails) or something else that is equal or rarer. In this case, there are two outcomes that are rarer than getting four heads and one tails: five heads and five tails. So we start with the probability of the outcome of interest, we add to that the probability of an equivalent outcome, which is one heads and four tails, and then we add the probability of the two rarer outcomes. Plugging in the numbers (5/32 + 5/32 + 1/32 + 1/32), we get a p-value equal to 0.375.
Okay, enough about coin tosses. What about measuring height? With coin tosses, it’s easy to list all the possible outcomes, but what if we want to calculate probabilities for heights? Do we list every value for height? If so, how many decimal places should we use? The good news is that we don’t have to list all the possible values or worry about how many decimal places to use. Instead, we use something called a density.
For these examples, I’ve got height measurements of Brazilian women between the ages of 15 and 49, and these people were measured in 1996. The area under the curve indicates the probability that a person will have a height within a range of possible values. 95% of the area under the curve is between 142 centimeters and 169 centimeters, indicating that most Brazilian women are between these two values. In other words, there is a 95% probability that each time you measure someone, their height will be between 142 and 169 centimeters. 2.5% of the total area under the curve is greater than 169 centimeters. In other words, there’s a 2.5% probability that each time we measure a Brazilian woman, her height will be greater than 169 centimeters. Also, 2.5% of the total area under the curve is less than 142 centimeters. In other words, there’s a 2.5% probability that each time you measure a Brazilian woman, her height will be less than 142 centimeters.
To calculate p-values, you add up the percentages of areas under the curve. For example, the p-value for someone who is 142 centimeters tall starts with the 2.5% of the area for people that are 142 centimeters or shorter. This accounts for the first half of the “equal or rarer” part of calculating a p-value. We add to that the 2.5% of the area for people that are 169 centimeters or taller. This accounts for the other half of the “equal or rarer” part of calculating a p-value. Thus, the p-value is 0.05, or 5%. Bam!
All right, one last example. What’s the p-value for someone who’s between 155.4 centimeters tall and 156 centimeters tall? Note that the probability of someone being between 155.4 and 156 centimeters is only 0.04, or 4%. The red area under the curve is pretty small; it’s barely a line. Note that this is the first part of calculating a p-value: it is the probability of the event of interest. The next part is to calculate the probability of rarer events happening. We’ll do this in two steps. First, we see that 48% of the area is for people shorter than 155.4 centimeters. Second, we see that 48% of the area is for people taller than 156 centimeters. Now we can put these three things together. We’ve got 4% of the people between 155.4 and 156 centimeters tall, 48% of the area is for people shorter than that, and 48% of the area is for people taller than that. That makes the p-value equal to 1. That means there’s nothing all that special about measuring someone who has the average height, even though that event, in particular, is fairly rare.
In this example, the probability of measuring someone between 155.4 and 156 centimeters tall is tiny, 0.04, but the p-value is huge: 1. Like I just said, that means that there’s nothing special about measuring someone who has the average height, even though that particular event is relatively rare. Double bam!
Hooray! We made it to the end. Tune in next time, when we talk about the multiple testing problem and false discovery rates. It’s going to be a crazy quest.
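The two height p-values can be approximated in code. The talk does not say what shape the density has, so this sketch loudly assumes a normal curve fitted so that 95% of its area lies between 142 and 169 centimeters; the mean and standard deviation below are derived from that assumption, not taken from the data.

```python
from statistics import NormalDist

# ASSUMPTION: heights follow a normal curve chosen so that 95% of the area
# falls between 142 cm and 169 cm. The real density need not be normal.
low, high = 142.0, 169.0
mean = (low + high) / 2
sd = (high - mean) / NormalDist().inv_cdf(0.975)
heights = NormalDist(mean, sd)

# p-value for 142 cm: area at or below 142 cm, plus the equally rare
# area at or above 169 cm on the other side of the curve.
p_142 = heights.cdf(low) + (1 - heights.cdf(high))

# p-value for a height between 155.4 cm and 156 cm: the interval itself,
# plus everything shorter and everything taller.
p_interval = heights.cdf(156.0) - heights.cdf(155.4)
p_middle = heights.cdf(155.4) + p_interval + (1 - heights.cdf(156.0))

print(round(p_142, 3))     # 0.05
print(round(p_middle, 3))  # 1.0
```

Under this assumed curve the interval probability `p_interval` comes out a little under the 0.04 quoted from the real data, but the conclusion is the same: a small probability for the event, and a p-value of 1.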