Difference in proportions

Penguin sex ratios across species

For this example, we want to know whether the proportion of female penguins is the same in the Adelie and Gentoo species. Here’s what the sex breakdown looks like:

We can look at this more formally. First, we’ll load some packages and drop the penguins whose sex wasn’t recorded:

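library(tidyverse)
library(infer)
library(parameters)

# `penguins` ships with R 4.5+ in {datasets}; on older versions of R,
# load it with library(palmerpenguins) first
# Keep only the penguins with a recorded sex
penguins <- penguins |> drop_na(sex)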

The proportions look pretty similar across the two species:

penguins |>
  filter(species %in% c("Adelie", "Gentoo")) |>
  count(species, sex) |>
  group_by(species) |>
  mutate(proportion = n / sum(n)) |>
  filter(sex == "female")
# A tibble: 2 × 4
# Groups:   species [2]
  species sex        n proportion
  <fct>   <fct>  <int>      <dbl>
1 Adelie  female    73      0.5  
2 Gentoo  female    58      0.487

Is there actually a difference, or is it just noise? We need to do some hypothesis testing.

Null hypothesis inference with {infer}
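
Here’s a rough sketch of what simulating the null world could look like with {infer}’s permutation workflow. It’s an illustration, not necessarily the exact analysis used here: the data frame is rebuilt from the counts in the table above (the male counts, 73 and 61, are implied by the reported proportions), and the names sex_counts, two_species, obs_diff, and null_world, along with the seed and the 1,000 repetitions, are arbitrary choices.

```r
library(tidyr)
library(infer)

# Counts from the proportions table; the male counts are implied by
# 73 / 0.5 = 146 Adelie and 58 / 0.487 = 119 Gentoo penguins in total
sex_counts <- data.frame(
  species = factor(c("Adelie", "Adelie", "Gentoo", "Gentoo")),
  sex = factor(c("female", "male", "female", "male")),
  n = c(73, 73, 58, 61)
)

# Expand the counts into one row per penguin
two_species <- uncount(sex_counts, n)

# Observed difference in the proportion of females (Adelie minus Gentoo)
obs_diff <- two_species |>
  specify(sex ~ species, success = "female") |>
  calculate(stat = "diff in props", order = c("Adelie", "Gentoo"))

# Null world: shuffle the species labels so that sex and species are unrelated
set.seed(1234)  # arbitrary seed, only for reproducibility
null_world <- two_species |>
  specify(sex ~ species, success = "female") |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in props", order = c("Adelie", "Gentoo"))

# Share of shuffled worlds with a difference at least as extreme as ours
p_value <- null_world |>
  get_p_value(obs_stat = obs_diff, direction = "two-sided")
```

The p-value here is just a count: the fraction of the 1,000 shuffled datasets where the difference in proportions is at least as far from zero as the observed one.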

Null hypothesis inference with prop.test()

In practice, most people do not simulate null worlds. Instead, they use a proportion test (prop.test()), which approximates the null world mathematically using a χ² distribution. The intuition is the same: a p-value is still the probability of seeing a difference at least that extreme in a world where the proportions are equal.
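
To make the χ² approximation a little less mysterious, here’s a rough sketch of the arithmetic behind a two-sample proportion test with continuity correction. It uses the female counts and group sizes implied by the table above (73 of 146 Adelie, 58 of 119 Gentoo). The variable names are made up for illustration, and the calculation follows the textbook formula rather than quoting the internals of prop.test().

```r
# Female counts and group sizes implied by the proportions table
females <- c(Adelie = 73, Gentoo = 58)
totals <- c(Adelie = 146, Gentoo = 119)

# Under the null, both species share one pooled proportion of females
p_pooled <- sum(females) / sum(totals)

# Expected number of females in each species in the null world
expected <- totals * p_pooled

# Yates continuity correction: shrink each gap by 0.5
# (or by the smallest gap, if that is less than 0.5)
yates <- min(0.5, abs(females - expected))

# Chi-squared statistic: squared corrected gaps, scaled by their null variance
chi_sq <- sum(
  (abs(females - expected) - yates)^2 /
    (totals * p_pooled * (1 - p_pooled))
)

# p-value: upper tail of a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

chi_sq   # roughly 0.0065
p_value  # roughly 0.936
```

A tiny χ² value means the observed counts sit very close to what the null world expects, which is why the p-value ends up so large.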

tab <- penguins |>
  filter(species %in% c("Adelie", "Gentoo")) |>
  mutate(species = fct_drop(species)) |>
  count(species, sex) |>
  pivot_wider(names_from = sex, values_from = n) |>
  column_to_rownames("species") |>
  as.matrix()

prop.test(tab)

    2-sample test for equality of proportions with continuity correction

data:  tab
X-squared = 0.0065013, df = 1, p-value = 0.9357
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1160296  0.1412397
sample estimates:
  prop 1   prop 2 
0.500000 0.487395 

Buried in that output is the p-value: 0.936. That’s huge. In a world where the two species have the same sex ratio, there’s a 93.6% probability of seeing a difference in proportions at least as large as the 1.3 percentage points we observed. We don’t have enough evidence to declare that there’s a difference between the two species. That doesn’t necessarily mean that there’s no difference. It means that if there is a difference, this sample can’t distinguish it from noise.

If you don’t like all that text output, you can feed the results of prop.test() to the model_parameters() function from the {parameters} package:

prop.test(tab) |>
  model_parameters() |>
  display(caption = "")
Proportion       Difference  95% CI         Chi2(1)   p
50.00% / 48.74%  1.26%       (-0.12, 0.14)  6.50e-03  0.936
Alternative hypothesis: two.sided
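
If you need individual numbers instead of a formatted table, the object that prop.test() returns is a list you can pull values from directly. Here’s a minimal sketch, assuming a hand-built counts matrix (sex_tab, a name chosen here) that matches the tab matrix above:

```r
# Rows are species, columns are sex; prop.test() treats the first
# column as "successes" and the second as "failures"
sex_tab <- matrix(
  c(73, 58, 73, 61),
  nrow = 2,
  dimnames = list(c("Adelie", "Gentoo"), c("female", "male"))
)

result <- prop.test(sex_tab)

# Individual pieces of the test
result$p.value    # the p-value
result$statistic  # the chi-squared statistic
result$estimate   # proportion of females in each species
result$conf.int   # 95% confidence interval for the difference

# Difference in proportions (Adelie minus Gentoo)
diff_props <- unname(result$estimate[1] - result$estimate[2])
```

This is handy for inline reporting, where you want to drop a single value into a sentence rather than print the whole test.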

Footnotes

  1. Kind of. In common law systems, defendants are presumed innocent until proven guilty, so if there’s not enough evidence to prove guilt, they are innocent by definition.