Drawing Inference

Lucy D’Agostino McGowan

One continuous variable

How can we numerically summarize a single continuous variable?

library(dplyr)  # provides the starwars data and summarise()

starwars |>
  summarise(mean = mean(height, na.rm = TRUE))

Assumptions

When is the mean an appropriate summary measure to calculate?

What assumptions need to be true in order to use a mean to represent your single continuous variable?

What if we want to draw inference on another sample?

Inference

So far we’ve only been able to describe our sample
For example, we’ve just been describing \(\bar{y}\), the estimated mean of a variable, or \(\hat{\beta}_0\), the estimated intercept of \(y\)
What if we want to extend these claims to the population?

`Application Exercise`

Today we are going to work with letter beads.

We have two sets of letter beads. Which is the better deal in terms of common letter frequency?

Each set has over 1,000 beads. Instead of counting them all, you will each select a sample of 20.

Bead data

How can I calculate the average frequency score in my sample?

\[ \Large\bar{y} =\sum_{i=1}^n \frac{y_i}{n} \]

Bead data

How can I calculate the average frequency score in my sample?

\[ \Large\bar{y} =\sum_{i=1}^{20} \frac{y_i}{20} \]
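As a rough sketch of this calculation in R: the `scores` vector below is made up for illustration (it is not real bead data), and the result is compared with the built-in `mean()`.

```r
# Hypothetical frequency scores for a sample of 20 beads (made-up values)
scores <- c(3, 7, 1, 9, 4, 6, 2, 8, 5, 7, 3, 6, 9, 1, 4, 8, 2, 5, 7, 6)

n <- length(scores)        # 20 beads in the sample
y_bar <- sum(scores / n)   # the formula above: sum of y_i / n

# The hand calculation should agree with R's built-in mean()
all.equal(y_bar, mean(scores))
```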

Bead data

What if I want to know the average frequency score for the whole population of beads?

How can we quantify how much we’d expect the mean to differ from one random sample to another?

We need a measure of uncertainty
How about the standard error of the mean?
The standard error is how much we expect the sample mean to vary from one random sample to another.

Standard Error

How can we quantify how much we’d expect the mean to differ from one random sample to another?

\[ \Large\frac{s}{\sqrt{n}}\]

where \(s\) is the sample standard deviation of \(y\).

Standard Error

How can we quantify how much we’d expect the mean to differ from one random sample to another?

\[ \Large\frac{s}{\sqrt{20}}\]

where \(s\) is the sample standard deviation of \(y\).
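A minimal sketch of the standard error in R, again using a made-up `scores` vector as a stand-in for one sample of 20 beads:

```r
# Hypothetical frequency scores for a sample of 20 beads (made-up values)
scores <- c(3, 7, 1, 9, 4, 6, 2, 8, 5, 7, 3, 6, 9, 1, 4, 8, 2, 5, 7, 6)

s <- sd(scores)      # sample standard deviation of y
se <- s / sqrt(20)   # standard error of the mean for n = 20
se
```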

Sample standard deviation

\[ \Large s = \sqrt{\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}} \]

Sample standard deviation

\[ \Large s = \sqrt{\frac{\sum_{i=1}^{20} (y_i - \bar{y})^2}{20-1}} \]
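To see that this formula matches what R computes, here is a small sketch with made-up scores (hypothetical values, not real bead data):

```r
# Hypothetical frequency scores for a sample of 20 beads (made-up values)
scores <- c(3, 7, 1, 9, 4, 6, 2, 8, 5, 7, 3, 6, 9, 1, 4, 8, 2, 5, 7, 6)

n <- length(scores)
y_bar <- mean(scores)

# Squared deviations from the mean, summed, divided by n - 1, then square-rooted
s_by_hand <- sqrt(sum((scores - y_bar)^2) / (n - 1))

# sd() uses the same n - 1 denominator, so the two should agree
all.equal(s_by_hand, sd(scores))
```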

Confidence intervals

If we used the same sampling method to select many different samples and computed an interval estimate for each one, we would expect about 95% of those intervals to contain the true population parameter (the average frequency score).

Confidence interval

\[\bar{y} \pm t^* \times SE_{\bar{y}}\]

\(t^*\) is the critical value for the \(t_{n-1}\) density curve to obtain the desired confidence level
Often we want a 95% confidence level.
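Putting the pieces together, one possible sketch of a 95% confidence interval in R. The `scores` vector is made up for illustration, and the comparison with `t.test()` is only a sanity check, not part of the by-hand method.

```r
# Hypothetical frequency scores for a sample of 20 beads (made-up values)
scores <- c(3, 7, 1, 9, 4, 6, 2, 8, 5, 7, 3, 6, 9, 1, 4, 8, 2, 5, 7, 6)

n <- length(scores)
y_bar <- mean(scores)        # point estimate
se <- sd(scores) / sqrt(n)   # standard error of the mean

# Critical value t* for a 95% interval with n - 1 degrees of freedom
t_star <- qt(0.025, df = n - 1, lower.tail = FALSE)

# y_bar plus or minus t* times the standard error
ci <- c(lower = y_bar - t_star * se, upper = y_bar + t_star * se)
ci

# Sanity check: t.test() reports the same interval by default
t.test(scores)$conf.int
```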

Let’s do it in R!

qt(0.025, df = 20 - 1, lower.tail = FALSE)

Let’s do it in R!

Why 0.025?

qt(0.025, df = 20 - 1, lower.tail = FALSE)

Let’s do it in R!

Why lower.tail = FALSE?

qt(0.025, df = 20 - 1, lower.tail = FALSE)

qt(0.975, df = 20 - 1)
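One way to check the two questions above: a 95% interval leaves 5% outside it, split evenly into 2.5% in each tail, and `lower.tail = FALSE` asks for the cutoff with that 2.5% above it. The variable names below are illustrative.

```r
conf_level <- 0.95
alpha <- 1 - conf_level   # 0.05 total, so 0.025 in each tail

# Cutoff with 2.5% of the t distribution above it
upper_tail_form <- qt(alpha / 2, df = 20 - 1, lower.tail = FALSE)

# Same cutoff, described as having 97.5% of the distribution below it
lower_tail_form <- qt(1 - alpha / 2, df = 20 - 1)

all.equal(upper_tail_form, lower_tail_form)
```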

Confidence intervals

If we used the same sampling method to select many different samples and computed an interval estimate for each one, we would expect about 95% of those intervals to contain the true population parameter (the mean).
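This repeated-sampling idea can be illustrated with a small simulation. Everything here is an assumption made for illustration: the `population` of 1,000 scores is randomly generated rather than taken from the real beads, and the seed and number of repetitions are arbitrary.

```r
set.seed(112)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 1,000 bead scores (made up, not the real beads)
population <- sample(1:10, size = 1000, replace = TRUE)
true_mean <- mean(population)

n <- 20
t_star <- qt(0.975, df = n - 1)

# Draw 2,000 samples of 20 and record whether each interval contains the true mean
covers <- replicate(2000, {
  y <- sample(population, size = n)
  se <- sd(y) / sqrt(n)
  abs(mean(y) - true_mean) <= t_star * se
})

# Proportion of intervals that captured the true mean; expected to be near 0.95
coverage <- mean(covers)
coverage
```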

`Application Exercise`

Come draw 20 beads from each of the two groups
Using the frequency score table provided on the handout, write out the data set for each of your samples in the space provided
Calculate the mean, standard deviation, and confidence intervals “by hand” using the \(t^*\) value provided

Make sure to use the standard error in your confidence interval: this is the standard deviation divided by \(\sqrt{20}\).

Fill out the form here with your values: bit.ly/sta-112-s24-beads

Example sheet
