\[SSTotal = \sum (y - \bar{y})^2\]
\[SSE = \sum (y - \hat{y})^2\]
\[SSModel = \sum (\hat{y}-\bar{y})^2\]
What will this be?
data |>
  summarise(
    sstotal = sum((frequency_score - mean(frequency_score))^2),
    ssmodel = sum((fitted(mod) - mean(frequency_score))^2),
    sse = sum(residuals(mod)^2),
    ssmodel + sse,
    sstotal - ssmodel
  )
# A tibble: 1 × 5
  sstotal ssmodel    sse `ssmodel + sse` `sstotal - ssmodel`
    <dbl>   <dbl>  <dbl>           <dbl>               <dbl>
1  46372.     640 45732.          46372.              45732.
What will this be?
data |>
  summarise(
    sstotal = sum((frequency_score - mean(frequency_score))^2),
    ssmodel = sum((fitted(mod) - mean(frequency_score))^2),
    sse = sum(residuals(mod)^2),
    ssmodel + sse,
    sstotal - ssmodel,
    sstotal - sse
  )
# A tibble: 1 × 6
  sstotal ssmodel    sse `ssmodel + sse` `sstotal - ssmodel` `sstotal - sse`
    <dbl>   <dbl>  <dbl>           <dbl>               <dbl>           <dbl>
1  46372.     640 45732.          46372.              45732.            640.
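The same decomposition can be checked on any fitted lm. The sketch below uses simulated data, not the lecture's data set: the data frame toy, the model toy_mod, and the group label "circle" are hypothetical names made up for illustration, and base R is used so the block runs without extra packages.

```r
# Hypothetical data: 40 simulated observations in two groups
# (an assumption for illustration, NOT the lecture's data)
set.seed(1)
toy <- data.frame(
  group = rep(c("circle", "square"), each = 20),
  frequency_score = rnorm(40, mean = 40, sd = 35)
)
toy_mod <- lm(frequency_score ~ group, data = toy)

y <- toy$frequency_score
sstotal <- sum((y - mean(y))^2)                 # total variation around the mean
ssmodel <- sum((fitted(toy_mod) - mean(y))^2)   # variation explained by the model
sse     <- sum(residuals(toy_mod)^2)            # variation left in the residuals

# SSTotal = SSModel + SSE, up to floating-point error
all.equal(sstotal, ssmodel + sse)
```

Because the model includes an intercept, the two pieces should add back up to the total, which is why sstotal - ssmodel reproduces sse and sstotal - sse reproduces ssmodel in the output above.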
\[SSTotal = \sum_{i=1}^n (y - \bar{y})^2\]
How many observations?
\[SSTotal = \sum_{i=1}^{\require{color}\colorbox{#86a293}{$n$}} (y - \bar{y})^2\]
How many things are “estimated”?
\[SSTotal = \sum_{i=1}^{n} (y - \require{color}\colorbox{#86a293}{$\bar{y}$})^2\]
How many degrees of freedom?
\[\Large df_{SSTotal} = n - 1\]
\[SSE = \sum_{i=1}^{n} (y - \hat{y})^2\]
How many observations?
\[SSE = \sum_{i=1}^{\require{color}\colorbox{#86a293}{$n$}} (y - \hat{y})^2\]
How is \(\hat{y}\) estimated with simple linear regression?
\[SSE = \sum_{i=1}^{n} (y - (\hat{\beta}_0+\hat{\beta}_1x))^2\]
How many things are “estimated”?
\[SSE = \sum_{i=1}^{n} (y - (\require{color}\colorbox{#86a293}{$\hat{\beta}_0$}+\colorbox{#86a293}{$\hat{\beta}_1$}x))^2\]
How many degrees of freedom?
\[\Large df_{SSE} = n - 2\]
\[SSTotal = SSModel + SSE\]
\[df_{SSTotal} = df_{SSModel} + df_{SSE} \]
\[n - 1 = df_{SSModel} + (n - 2)\]
Application Exercise
How many degrees of freedom does SSModel have?
\[MSE = \frac{SSE}{n - 2}\]
\[MSModel = \frac{SSModel}{1}\]
What is the pattern?
\[\Large F = \frac{MSModel}{MSE}\]
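As a worked example, plugging in the sums of squares computed earlier (SSModel = 640 and SSE ≈ 45732), and assuming n = 40 observations so that n − 2 = 38, the mean squares and the F statistic come out roughly as follows. The value n = 40 is inferred from the 38 residual degrees of freedom reported in the model output, not stated directly.

\[MSModel = \frac{640}{1} = 640, \qquad MSE = \frac{45732}{38} \approx 1203.5, \qquad F = \frac{640}{1203.5} \approx 0.5318\]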
Under the null hypothesis, the F statistic follows an F distribution with 1 and n - 2 degrees of freedom.
We can see all of these statistics by using the anova() function on the output of lm()
Analysis of Variance Table

Response: frequency_score
          Df Sum Sq Mean Sq F value Pr(>F)
group      1    640   640.0  0.5318 0.4703
Residuals 38  45732  1203.5
What is the SSModel?
What is the MSModel?
What is the SSE?
What is the MSE?
What is the SSTotal?
What is the F statistic?
Is the F-statistic statistically significant?
The p-value is the probability of getting a statistic as extreme or more extreme than the observed test statistic, given that the null hypothesis is true.
To calculate the p-value under the t-distribution we use pt(). What do you think we use to calculate the p-value under the F-distribution?
We use pf(). It takes the arguments q, df1, and df2. What do you think we would plug in for q? For df1? For df2?
Why don’t we multiply this p-value by 2 when we use pf()?
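One way to check the p-value in the anova table is sketched below. It assumes the rounded values printed in the output (F = 0.5318 with 1 and 38 degrees of freedom, t = -0.729); the object names f_stat, p_f, and p_t are made up for illustration. The F statistic only takes non-negative values, and large values are the "extreme" ones, so only the upper tail is used.

```r
# Rounded values copied from the printed output (assumed inputs)
f_stat <- 0.5318
t_stat <- -0.729

# Upper-tail area under the F distribution with df1 = 1, df2 = 38
p_f <- pf(f_stat, df1 = 1, df2 = 38, lower.tail = FALSE)

# Two-sided p-value under the t distribution with 38 degrees of freedom
p_t <- 2 * pt(-abs(t_stat), df = 38)

p_f
p_t
```

Both values should land close to the 0.47 shown in the output. The small gap between them comes from using rounded inputs.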
Analysis of Variance Table

Response: frequency_score
          Df Sum Sq Mean Sq F value Pr(>F)
group      1    640   640.0  0.5318 0.4703
Residuals 38  45732  1203.5

Call:
lm(formula = frequency_score ~ group, data = data)

Residuals:
   Min     1Q Median     3Q    Max
-38.80 -22.55 -11.80  30.20  95.20

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   39.800      7.757   5.131 8.82e-06 ***
groupsquare   -8.000     10.970  -0.729     0.47
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 34.69 on 38 degrees of freedom
Multiple R-squared: 0.0138, Adjusted R-squared: -0.01215
F-statistic: 0.5318 on 1 and 38 DF, p-value: 0.4703
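The two outputs agree because, in simple linear regression with a single predictor, the F statistic equals the square of the slope's t statistic. Using the rounded t value printed above, a quick check gives:

\[t^2 = (-0.729)^2 \approx 0.531 \approx F = 0.5318\]

The small difference is due to rounding in the printed t value, and both tests report the same p-value of about 0.47.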
Application Exercise
appex-06.qmd