Regression slope – Null worlds

viewof sample_size = Inputs.range([30, 2000], {
  step: 10,
  value: 100,
  label: "Sample size:"
})

viewof slope_size = Inputs.range([-5, 5], {
  step: 0.1,
  value: 0.8,
  label: "Slope:"
})

The sample statistic (δ) is the regression slope—the estimated change in y for a one-unit increase in x.
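To make δ concrete, here is a minimal sketch of how a regression slope can be computed by hand with ordinary least squares (covariance of x and y divided by the variance of x). The function name `slope` and the toy data are my own assumptions for illustration, not the code behind this page.

```javascript
// Ordinary least squares slope: sum of cross-deviations / sum of squared x deviations.
function slope(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0; // running sum of (x - meanX) * (y - meanY)
  let sxx = 0; // running sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  return sxy / sxx;
}

// Toy data (made up): y = 1 + 2x exactly, so a one-unit increase in x adds 2 to y.
const xToy = [0, 1, 2, 3, 4];
const yToy = xToy.map((v) => 1 + 2 * v);
const deltaToy = slope(xToy, yToy);
console.log(deltaToy); // 2
```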

We create a null distribution by shuffling (or “permuting,” to use the official stats term) the values of x. This simulates a world where all the real, measured values of x and y are still the same, but where each x is paired with a randomly chosen y. That eliminates any association between x and y.

Think of this as being a world where there is no relationship between x and y. Importantly, this doesn’t mean that the slope is exactly 0. There is variation in the data, and that variation is reflected in the null world. What it means is that in the null world, the slope is 0 ± some amount.
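The shuffling idea described above can be sketched in a few lines. This is only an illustration under some assumptions of mine: the data are made up (x uniform on 0 to 10, y equal to 0.8 times x plus uniform noise), and the helper names (`mulberry32`, `shuffled`, `slope`) and the seed are hypothetical, chosen so the result is reproducible. The page's own simulation may work differently.

```javascript
// Seeded random number generator (mulberry32), used only so the sketch is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Ordinary least squares slope of y on x.
function slope(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
  }
  return sxy / sxx;
}

// Fisher-Yates shuffle that returns a shuffled copy and leaves the input untouched.
function shuffled(values, rand) {
  const out = values.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Made-up data (assumption): x uniform on [0, 10), y = 0.8 * x + uniform noise in [-1, 1).
const rand = mulberry32(42);
const sampleSize = 100;
const x = Array.from({ length: sampleSize }, () => rand() * 10);
const y = x.map((v) => 0.8 * v + (rand() * 2 - 1));

// Each null world: shuffle x, keep y as measured, and refit the slope.
const nReps = 500;
const nullSlopes = Array.from({ length: nReps }, () => slope(shuffled(x, rand), y));

// The null slopes should center on 0 with some spread ("0 ± some amount").
const nullMean = nullSlopes.reduce((a, b) => a + b, 0) / nReps;
const nullSd = Math.sqrt(
  nullSlopes.reduce((a, b) => a + (b - nullMean) ** 2, 0) / (nReps - 1)
);
console.log({ observed: slope(x, y), nullMean, nullSd });
```

The same x and y values appear in every null world; only the pairing changes, which is why the null slopes scatter around zero instead of sitting exactly on it.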

Here’s what this null world looks like:

viewof n_reps = Inputs.range([100, 2000], {
  step: 100,
  value: 500,
  label: "Number of simulations:"
})

Here’s another way to see the null world slopes. Each thin line is a regression line for one of the simulated worlds where x and y are unrelated.

Next we put δ inside that null world and see how comfortably it fits there.

Is it surprising to see the red line in this null world? Is the line way out to one of the sides, or is it near the middle with the rest of the null world?

Alternatively, we can put the observed regression line from the actual data into the scatterplot with the null world regression lines. Is it surprising to see the red line in this null world?

We can actually quantify the probability of seeing that red line in a null world. This is a p-value: the probability of seeing a δ at least that extreme in a world where there’s no relationship between x and y.
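As a rough sketch, the p-value is just the share of null world slopes that are at least as far from zero as δ. I'm assuming a two-sided comparison here (absolute values), since the slope can be negative; a one-sided version would drop the `Math.abs` calls. The function name `pValue` and the ten hand-written null slopes are made up to keep the arithmetic easy to follow.

```javascript
// Two-sided permutation p-value: proportion of null slopes at least as extreme as delta.
function pValue(nullSlopes, delta) {
  const extreme = nullSlopes.filter((s) => Math.abs(s) >= Math.abs(delta)).length;
  return extreme / nullSlopes.length;
}

// Ten made-up null world slopes (a real run would use hundreds of shuffles).
const toyNull = [-0.3, -0.1, 0.0, 0.05, 0.1, 0.2, 0.25, -0.2, 0.4, -0.05];

// Three of the ten (-0.3, 0.25, 0.4) are at least 0.25 away from zero.
const pModest = pValue(toyNull, 0.25);
console.log(pModest); // 0.3

// No null slope is as extreme as 0.8, so it never shows up in this null world.
const pLarge = pValue(toyNull, 0.8);
console.log(pLarge); // 0
```

With only a finite number of shuffles, a result of 0 really means "smaller than 1 divided by the number of simulations," not literally impossible.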

Finally, we have to decide if the p-value meets an evidentiary standard or threshold that would provide us with enough evidence that we aren’t in the null world (or, in more statsy terms, enough evidence to reject the null hypothesis).

There are lots of possible thresholds. By convention, most people use a threshold (often shortened to α) of 0.05, or 5%. But that’s not required! You could have a lower standard with an α of 0.1 (10%), or a higher standard with an α of 0.01 (1%).

viewof alpha = Inputs.select([0.10, 0.05, 0.01], {
  label: "Significance threshold (α):",
  value: 0.05
})

Evidentiary standards

When thinking about p-values and thresholds, I like to imagine myself as a judge or a member of a jury. Many legal systems around the world have formal evidentiary thresholds or standards of proof. If prosecutors provide evidence that meets a threshold (i.e. goes beyond a reasonable doubt, or shows evidence on a balance of probabilities), the judge or jury can rule guilty. If there’s not enough evidence to clear the standard or threshold, the judge or jury has to rule not guilty.

With p-values:

- If the probability of seeing an effect or difference (or δ) in a null world is less than 5% (or whatever the threshold is), we rule it statistically significant and say that the difference does not fit in that world. We’re pretty confident that it’s not zero.
- If the p-value is larger than the threshold, we do not have enough evidence to claim that δ doesn’t come from a world where there’s no difference. We don’t know if it’s not zero.
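The two rules above boil down to a single comparison. Here is a small sketch, with a hypothetical `decide` function and a made-up p-value of 0.03, showing how the same evidence leads to different rulings under different thresholds.

```javascript
// Compare a p-value to a threshold (alpha) and return the ruling as text.
function decide(p, alpha) {
  return p < alpha
    ? "significant: δ does not fit in the null world"
    : "not significant: not enough evidence that δ is outside the null world";
}

// Made-up p-value, checked against the three thresholds offered in the dropdown.
const pExample = 0.03;
const rulings = [0.1, 0.05, 0.01].map((alpha) => ({
  alpha,
  ruling: decide(pExample, alpha),
}));
console.log(rulings);
```

A p-value of 0.03 clears the 10% and 5% standards but not the stricter 1% standard, which is why the threshold has to be chosen before looking at the result.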

Importantly, if the difference is not significant, that does not mean that there is no difference. It just means that we can’t detect one if there is. If a prosecutor doesn’t provide sufficient evidence to clear a standard or threshold, it does not mean that the defendant didn’t do whatever they’re charged with*—it means that the judge or jury can’t detect guilt.

* Kind of—in common law systems, defendants are presumed innocent until proven guilty, so if there’s not enough evidence to prove guilt, they are innocent by definition.

Different evidentiary standards

Many legal systems have different levels of evidentiary standards:

Standards of proof in common law systems (juries):
- Balance of probabilities (civil cases)
- Beyond a reasonable doubt (criminal cases)
Evidentiary thresholds in the United States (juries):
- Preponderance of the evidence (civil cases)
- Clear and convincing evidence (more important civil cases)
- Beyond a reasonable doubt (criminal cases)
Levels of doubt in Sharia systems (judges):
- غلبة الظن [ghalabat al-zann] / preponderance of assumption (ta’zir cases and family matters)
- اليقين [yaqin] / certainty (hudud/qisas cases)
Standard of proof in the International Criminal Court (judges):
- Beyond reasonable doubt (genocide, crimes against humanity, or war crimes)