Download complete chapters as PDF files.

  • Chapter 1: Statistics and probability are not intuitive 
  • Chapter 19: Interpreting a result that is not statistically significant  
  • Chapter 22: Multiple comparisons concepts 


Short extracts

Statistics means being uncertain (chapter 3, page 19)

The whole idea of statistics is to make general conclusions from limited amounts of data. All that statistical calculations can do is quantify probabilities, so every conclusion must include words like “probably,” “most likely,” or “almost certainly.” Be wary if you ever encounter statistical conclusions that seem 100% definitive. The analysis, or your understanding of it, is probably wrong. Be especially wary of the conclusion that a result is statistically significant, because that phrase is often misunderstood.

Q and A about confidence intervals (chapter 4, pages 35-36)

Q. What’s the difference between a 95% CI and a 99% CI?

A. To be more certain that an interval contains the true population value, you must generate a wider interval. A 99% CI is wider than a 95% CI. See Figure 4.2.
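As a rough illustration of why the 99% interval is wider, the sketch below assumes a CI built with the normal approximation (estimate plus or minus a multiplier times the standard error). That construction and the helper name `z_multiplier` are assumptions made for illustration, not necessarily the method used in the book.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided critical value of the standard normal distribution.

    Hypothetical helper: under the normal approximation, the CI is
    estimate +/- z_multiplier * standard_error.
    """
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z95 = z_multiplier(0.95)   # about 1.96
z99 = z_multiplier(0.99)   # about 2.58

# With the same data (same standard error), the widths differ only by
# the multiplier, so the 99% CI is about 1.3 times as wide as the 95% CI.
width_ratio = z99 / z95
print(f"z95 = {z95:.3f}, z99 = {z99:.3f}, width ratio = {width_ratio:.2f}")
```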

Q. Is it possible to generate a 100% CI?

A. A 100% CI would have to include every possible value, so for a proportion it would extend from 0.0% to 100.0%. That is always the same, regardless of the data, so it isn’t at all useful.

Q. How do CIs change if you increase the sample size?

A. The width of the CI is approximately proportional to the reciprocal of the square root of the sample size. So, if you increase the sample size by a factor of 4, you can expect to cut the width of the CI in half. Figure 4.3 illustrates how the CI gets narrower as the sample size gets larger.
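The square-root rule can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the simple normal-approximation (Wald) interval for a proportion, where the rule holds exactly. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci_width(p, n, confidence=0.95):
    """Full width of the Wald (normal-approximation) CI for a proportion.

    Assumed formula: width = 2 * z * sqrt(p * (1 - p) / n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(p * (1 - p) / n)

# Same observed proportion, sample size increased by a factor of 4.
width_n100 = wald_ci_width(p=0.3, n=100)
width_n400 = wald_ci_width(p=0.3, n=400)

ratio = width_n400 / width_n100   # 0.5: four times the data, half the width
print(f"n=100: {width_n100:.3f}  n=400: {width_n400:.3f}  ratio: {ratio:.2f}")
```

With other CI methods the factor is only approximately one half, which is why the answer above says "approximately proportional."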

Q. Why isn’t the CI symmetrical around the observed proportion?

A. Because a proportion cannot go below 0.0 or above 1.0, the CI will be lopsided when the sample proportion is far from 0.50 or the sample size is small. See Figure 4.4.
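The lopsidedness is easy to see numerically. The sketch below uses the Wilson score interval, which is one common method that respects the 0.0 and 1.0 limits. It is chosen here for illustration and is not necessarily the method behind the book's figures. The counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (illustrative choice of method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Proportion far from 0.50 with a small sample: 2 of 20 (hypothetical data).
p_obs = 2 / 20
low, high = wilson_ci(2, 20)
below = p_obs - low    # distance from the observed proportion to the lower limit
above = high - p_obs   # distance to the upper limit (larger here)
print(f"CI: {low:.3f} to {high:.3f}; reaches {below:.3f} below, {above:.3f} above")

# Proportion of exactly 0.50: the interval is symmetrical.
low_mid, high_mid = wilson_ci(10, 20)
```

For 2 of 20, the interval cannot extend far below the observed 0.10 without hitting 0.0, so most of its width lies above the observed value.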

A misconception about P values (chapter 18, page 136)

Many scientists and students misunderstand the definition of statistical significance (and P values).

Table 18.1 shows the results of many hypothetical statistical analyses, each analyzed to reach a decision to reject or not reject the null hypothesis. The top row tabulates results for experiments where the null hypothesis is really true.

The second row tabulates experiments where the null hypothesis is not true. This kind of table is useful only for understanding statistical theory. When you analyze data, you don’t know whether the null hypothesis is true, so you could never create this table from an actual series of experiments. Table 18.2 reviews the definitions of Type I and Type II errors.

The significance level (usually set to 5%) is defined to equal the ratio A/(A + B). The significance level is the answer to these two equivalent questions:

  • If the null hypothesis is true, what is the probability of incorrectly rejecting that null hypothesis?
  • Of all experiments you could conduct when the null hypothesis is true, in what fraction will you reach a conclusion that the results are statistically significant?

Many people mistakenly think that the significance level is the ratio A/(A + C). This ratio, called the false discovery rate (FDR), is quite different. The FDR, which we’ll return to in Chapter 22, answers these two equivalent questions:

  • If a result is statistically significant, what is the probability that the null hypothesis is really true?
  • Of all experiments that reach a statistically significant conclusion, in what fraction is the null hypothesis true?
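A small numerical sketch shows how far apart the two ratios can be. Table 18.1 is not reproduced in this extract, so the cell layout below is inferred from the two ratios as defined above and should be read as an assumption: A and B are the experiments where the null hypothesis is true (rejected and not rejected), and C and D are the experiments where it is false (rejected and not rejected). The counts are hypothetical.

```python
def error_rates(a, b, c):
    """Return (significance level, false discovery rate) from table counts.

    Assumed cell layout (inferred, not copied from the book's table):
      a: null true,  null rejected      b: null true,  null not rejected
      c: null false, null rejected      (d, null false and not rejected,
                                         is not needed for either ratio)
    """
    significance_level = a / (a + b)     # conditions on the null being true
    false_discovery_rate = a / (a + c)   # conditions on the result being significant
    return significance_level, false_discovery_rate

# Hypothetical series of 1000 experiments: the null is true in 900 of them,
# and 45 of those are rejected; the null is false in 100, and 80 are rejected.
alpha, fdr = error_rates(a=45, b=855, c=80)
print(f"significance level = {alpha:.2f}, FDR = {fdr:.2f}")   # 0.05 and 0.36
```

In this made-up scenario the significance level is 5%, yet 36% of the statistically significant results come from experiments where the null hypothesis is true.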


An analogy to understand power (chapter 20, pages 147-148) 

This analogy helps illustrate the concept of statistical power (Hartung, 2005).

You send your child into the basement to find a tool. He comes back and says, “It isn’t there.” What do you conclude? Is the tool there or not? There is no way to be sure, so the answer must be a probability. The question you really want to answer is, “What is the probability that the tool is in the basement?” But that question can’t really be answered without knowing the prior probability and using Bayesian thinking (see Chapter 18). Instead, let’s ask a different question: “If the tool really is in the basement, what is the chance your child would have found it?” The answer, of course, is “it depends.” To estimate the probability, you’d want to know three things:

  • How long did he spend looking? If he looked for a long time, he is more likely to have found the tool. This is analogous to sample size. An experiment with a large sample size has high power to find an effect.
  • How big is the tool? It is easier to find a snow shovel than the tiny screwdriver used to fix eyeglasses. This is analogous to the size of the effect you are looking for. An experiment has more power to find a big effect than a small one.
  • How messy is the basement? If the basement is a real mess, he was less likely to find the tool than if it is carefully organized. This is analogous to experimental scatter. An experiment has more power when the data are very tight (little variation).

If the child spent a long time looking for a large tool in an organized basement, there is a high chance that he would have found the tool if it were there. So you can be quite confident of his conclusion that the tool isn’t there. Similarly, an experiment has high power when you have a large sample size, are looking for a large effect, and have data with little scatter (small standard deviation). In this situation, there is a high chance that you would have obtained a statistically significant result if the effect existed.

If the child spent a short time looking for a small tool in a messy basement, his conclusion that “the tool isn’t there” doesn’t really mean very much. Even if the tool were there, he probably would not have found it. Similarly, an experiment has little power when you use a small sample size, are looking for a small effect, and the data have lots of scatter. In this situation, there is a high chance of obtaining a conclusion of “not statistically significant” even if the effect exists.
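The three factors in the analogy can be turned into numbers. The sketch below is an approximation under stated assumptions: two groups of equal size, a known common standard deviation, and a two-sided test based on the normal distribution. The function name `approx_power` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided, two-group z test.

    n_per_group: sample size in each group   (how long the child looked)
    difference:  true difference in means    (how big the tool is)
    sd:          standard deviation          (how messy the basement is)
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # True difference expressed in units of its standard error.
    shift = difference / (sd * sqrt(2 / n_per_group))
    # Probability that the test statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Long search, big tool, tidy basement: power is close to 1.
high_power = approx_power(n_per_group=100, difference=10, sd=10)

# Short search, small tool, messy basement: power is very low.
low_power = approx_power(n_per_group=5, difference=2, sd=10)

print(f"high-power design: {high_power:.3f}")
print(f"low-power design:  {low_power:.3f}")
```

Changing one input at a time (more subjects, a larger difference, or a smaller standard deviation) raises the computed power, mirroring the three questions asked about the child's search.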