It is critical to keep in mind that, in this simple example, the P value depends on three quantities: the difference between the means (the bigger the difference, the smaller the P value), the SD of the individual observations (the smaller the SD, the smaller the P value), and the sample size (the bigger the sample size, the smaller the P value). In short, big differences are “more significant” (ie, yield a smaller P value) than little differences (and “more significant” is placed in quotation marks for a very good reason; stay tuned), and small differences arising from large samples are “more significant” than the same differences arising from small samples. The P value, then, is a way to separate real effects from effects due to random fluctuations in the data and sampling error. As such, it is not a bad thing. After all, it is the nature of the world that when you divide people into two groups, by coin flip or any other strategy, the groups will never come out exactly the same on any measure (height, weight, BMI, or anything else), so without some help from our statistician friends, we would be unable to distinguish differences arising from random variation from “real” differences.
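The three-way dependence described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' method: it uses a two-sample z-test (a large-sample simplification of the usual t-test) with two equal-sized groups of hypothetical sizes and SDs, just to show that the P value shrinks when the mean difference grows, when the SD shrinks, or when the sample size grows.

```python
from math import sqrt
from statistics import NormalDist  # standard normal, for the z-test P value


def p_value(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided P value for a two-sample z-test.

    Assumes two groups of equal size n with the same SD, so the
    standard error of the difference in means is sd * sqrt(2 / n).
    """
    se = sd * sqrt(2 / n)
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Baseline: difference of 2 units, SD of 10, 50 subjects per group.
baseline = p_value(mean_diff=2, sd=10, n=50)

# Bigger difference -> smaller P value.
bigger_diff = p_value(mean_diff=5, sd=10, n=50)

# Smaller SD -> smaller P value.
smaller_sd = p_value(mean_diff=2, sd=5, n=50)

# Bigger sample -> smaller P value.
bigger_n = p_value(mean_diff=2, sd=10, n=200)

print(f"baseline:    P = {baseline:.3f}")
print(f"bigger diff: P = {bigger_diff:.3f}")
print(f"smaller SD:  P = {smaller_sd:.3f}")
print(f"bigger n:    P = {bigger_n:.3f}")
```

Running it shows the baseline comparison is far from significant while each of the three changes, on its own, pulls the P value down, which is exactly the pattern the paragraph describes.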