```
# A tibble: 1 × 1
   mean
  <dbl>
1  175.
```
How can we visualize a single continuous variable?
Histogram
Density
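Here is a minimal sketch of both plots. It uses base R's `hist()` and `density()` rather than the ggplot2 code used elsewhere in these slides, and it takes R's built-in `faithful` data set (waiting times between Old Faithful eruptions) as an example continuous variable. The data set is an illustrative choice and does not come from these slides.

```r
# Example continuous variable (illustrative choice): waiting time in minutes
# between eruptions, from R's built-in faithful data set
y <- faithful$waiting

# Histogram: counts of observations falling into each bin
h <- hist(y, main = "Histogram", xlab = "waiting (minutes)")

# Density: a smoothed estimate of the shape of the distribution
dens <- density(y)
plot(dens, main = "Density", xlab = "waiting (minutes)")
```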
How can we numerically summarize a single continuous variable?
Why do we calculate a mean?
[Figure: two example distributions, panels “Symmetric” and “Bimodal”]
Guess the mean for each of these variables.
[Figure: panels “Symmetric” and “Bimodal”]
Does this value represent a “typical” observation?
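The following sketch uses a small, made-up bimodal vector to show how the mean can miss every observation. The values are hypothetical and do not come from the data in these slides.

```r
# Hypothetical bimodal data: two clusters, one near 1.5 and one near 9.5
y <- c(1, 1, 2, 2, 9, 9, 10, 10)

y_bar <- mean(y)   # 44 / 8 = 5.5

# Distance from each observation to the mean
dist_to_mean <- abs(y - y_bar)

# The closest observation is still 3.5 units from the mean,
# so no observation is "typical" of the mean
min(dist_to_mean)  # → 3.5
```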
\[\Large\bar{y} =\sum_{i=1}^n \frac{y_i}{n}\]
\[\Large{\require{color}\colorbox{#86a293}{$\bar{y}$}} =\sum_{i=1}^n \frac{y_i}{n}\]
the mean of the variable \(y\)
\[\Large\bar{y} ={\require{color}\colorbox{#86a293}{$\sum$}}_{i=1}^n \frac{y_i}{n}\]
add up the observations
\[\Large\bar{y} =\sum_{{\require{color}\colorbox{#86a293}{$i=1$}}}^n \frac{y_i}{n}\]
starting from the first observation, \(i=1\)
\[\Large\bar{y} =\sum_{i=1}^{\require{color}\colorbox{#86a293}{$n$}} \frac{y_i}{{\require{color}\colorbox{#86a293}{$n$}}}\]
total number of observations
\[\Large\bar{y} =\sum_{i=1}^n \frac{{\require{color}\colorbox{#86a293}{$y_i$}}}{n}\]
the value of the continuous variable for observation \(i\)
\[\Large\bar{y} =\sum_{i=1}^n \frac{y_i}{\require{color}\colorbox{#86a293}{${n}$}}\]
divide by the total number of observations
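The formula translates directly into R. This minimal sketch uses a hypothetical vector `y` and checks that adding up each \(y_i / n\) gives the same value as R's built-in `mean()`.

```r
# Hypothetical observations
y <- c(2, 4, 9)

n <- length(y)       # total number of observations
y_bar <- sum(y / n)  # add up y_i / n for i = 1, ..., n

# Agrees with the built-in mean, up to floating-point rounding
all.equal(y_bar, mean(y))
```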
Application Exercise
observation | value |
---|---|
\(y_1\) | 3 |
\(y_2\) | 5 |
\(y_3\) | 1 |
\(y_4\) | 7 |
\(y_5\) | 8 |
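Applying the formula to the five observations in the table:

\[\Large\bar{y} = \sum_{i=1}^5 \frac{y_i}{5} = \frac{3 + 5 + 1 + 7 + 8}{5} = \frac{24}{5} = 4.8\]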
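```r
# Packages: ggplot2 for plotting, geomtextpath for labelled reference lines
library(ggplot2)
library(geomtextpath)

# The five observations from the application exercise table
d <- data.frame(i = 1:5, y = c(3, 5, 1, 7, 8))

# Points, a labelled line at the mean, and a segment from each point to it
ggplot(d, aes(x = i, y = y)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$y), label = "mean = 4.8") +
  geom_segment(aes(y = y, yend = mean(y), x = i, xend = i), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
```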
\[\Large y = \beta_0 + \varepsilon\]
\[\Large {\require{color}\colorbox{#86a293}{$y$}} = \beta_0 + \varepsilon\]
This is the vector \(y=\{y_1,\dots,y_n\}\)
\[\Large y = {\require{color}\colorbox{#86a293}{$\beta_0$}} + \varepsilon\]
We call this the “intercept”. When there are no other variables in the model, it is just the mean, \(\bar{y}\).
\[\Large y = \beta_0 + {\require{color}\colorbox{#86a293}{$\varepsilon$}}\]
the error: how far each observation falls from \(\beta_0\)
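This minimal sketch fits the intercept-only model with `lm()`. It assumes `d` holds the five values from the application exercise table, and it uses a base R `data.frame` so that no extra packages are needed. The formula `y ~ 1` asks for an intercept and nothing else, so the estimate of \(\beta_0\) should equal \(\bar{y}\) and the residuals should equal \(y_i - \bar{y}\).

```r
# Assumed data: the five observations from the application exercise table
d <- data.frame(i = 1:5, y = c(3, 5, 1, 7, 8))

# Intercept-only linear model: y = beta_0 + error
fit <- lm(y ~ 1, data = d)

coef(fit)         # estimate of beta_0 (the intercept)
mean(d$y)         # the sample mean, for comparison

# Residuals estimate the errors: each y_i minus the estimated intercept
resid(fit)
d$y - mean(d$y)
```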
```r
# Same plot, but the mean line is now labelled beta_0 (the intercept)
ggplot(d, aes(x = i, y = y)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$y), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = y, yend = mean(y), x = i, xend = i), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
```
```r
# Each blue segment (the error for one observation) is labelled epsilon
ggplot(d, aes(x = i, y = y)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$y), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_textsegment(aes(y = y, yend = mean(y), x = i, xend = i), color = "blue",
                   label = as.character(expression(epsilon)), parse = TRUE,
                   lwd = 5) +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
```
```r
# All observations placed at x = 1, so they line up along one vertical axis
ggplot(d, aes(x = 1, y = y)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$y), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = y, yend = mean(y), x = 1, xend = 1), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank())
```
```r
# As above, but keeping the default panel grid lines
ggplot(d, aes(x = 1, y = y)) +
  geom_point() +
  geom_texthline(yintercept = mean(d$y), lwd = 5, hjust = 0.1,
                 label = as.character(expression(beta[0])), parse = TRUE) +
  geom_segment(aes(y = y, yend = mean(y), x = 1, xend = 1), color = "blue") +
  theme(axis.ticks.x = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank())
```
`lm`: linear model

Application Exercise
Open your `04-appex.qmd` file. Load the packages by running the top R chunk of code.
What do you think this code does? Try typing `?tibble` in the Console. What does this function do?
Calculate the `mean` of `y`. Do this two ways: using the `summarize` function and using the `lm` function.

Add a new variable called `error` to the data set `d` that is equal to `y` minus the mean of `y`.
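Here is one possible solution sketch. It assumes that `d` is a tibble with a `y` column holding the table's values and that dplyr is installed. The exercise's own setup code may differ.

```r
library(dplyr)

# Assumed setup: the five observations from the table
d <- tibble(y = c(3, 5, 1, 7, 8))

# Way 1: summarize() collapses the data to a one-row summary
d |>
  summarize(mean = mean(y))

# Way 2: the intercept of an intercept-only linear model is the mean
lm(y ~ 1, data = d)

# New variable: each observation's deviation from the mean
d <- d |>
  mutate(error = y - mean(y))
d
```

Because `error` measures deviations from the mean, its values add up to zero.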
When is the mean an appropriate summary measure to calculate?
What assumptions need to be true in order to use a mean to represent your single continuous variable?