Model Comparisons

Lucy D’Agostino McGowan

Comparing models

  • Goal: Trying to predict the weight of fish based on their length and width
data("Perch")
model1 <- lm(
  Weight ~ Length + Width + Length * Width,
  data = Perch
  )
model2 <- lm(
  Weight ~ Length + Width + Length * Width + I(Length ^ 2) + I(Width ^ 2),
  data = Perch
  )
  • What is the equation for model1?
  • What is the equation for model2?
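  • One way to write these out (a sketch, with coefficients numbered in one common order):

\[Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 (Length \times Width) + \epsilon\]

\[Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 (Length \times Width) + \beta_4 Length^2 + \beta_5 Width^2 + \epsilon\]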

🛠 \(R^2\) for Multiple Linear Regression

  • \(R^2= \frac{SSModel}{SSTotal}\)
  • \(R^2= 1 - \frac{SSE}{SSTotal}\)
  • As written, \(R^2\) never decreases when you add a predictor (it essentially always increases). Therefore, we have \(R^2_{adj}\), which includes a small “penalty” for adding more predictors
  • \(R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)
  • \(\frac{SSTotal}{n-1} = \frac{\sum(y - \bar{y})^2}{n-1}\) What is this?
    • Sample variance! \(S_Y^2\)
  • \(R^2_{adj} = 1 - \frac{\hat\sigma^2_\epsilon}{S_Y^2}\)
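  • Quick check in R, applying the formula above to model1 from the first slide (nothing new, just the algebra done by hand):
n   <- nobs(model1)                # 56 fish
k   <- length(coef(model1)) - 1    # 3 predictors (intercept not counted)
SSE <- sum(resid(model1)^2)
SST <- sum((Perch$Weight - mean(Perch$Weight))^2)
1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # matches the Adjusted R-squared from summary(model1)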

🛠 \(R^2_{adj}\) for Multiple Linear Regression

  • \(R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)
  • The denominator stays the same for all models fit to the same response variable and data
  • The numerator can actually increase when a new predictor is added to a model, if the decrease in the SSE is not sufficient to offset the decrease in the error degrees of freedom
  • So \(R^2_{adj}\) can 👇 when a weak predictor is added to a model

🛠 \(R^2_{adj}\) for Multiple Linear Regression

summary(model1)

Call:
lm(formula = Weight ~ Length + Width + Length * Width, data = Perch)

Residuals:
    Min      1Q  Median      3Q     Max 
-140.11  -12.23    1.23    8.49  181.41 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   113.935     58.784    1.94    0.058 .  
Length         -3.483      3.152   -1.10    0.274    
Width         -94.631     22.295   -4.24  9.1e-05 ***
Length:Width    5.241      0.413   12.69  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44.2 on 52 degrees of freedom
Multiple R-squared:  0.985, Adjusted R-squared:  0.984 
F-statistic: 1.11e+03 on 3 and 52 DF,  p-value: <2e-16
summary(model2)

Call:
lm(formula = Weight ~ Length + Width + Length * Width + I(Length^2) + 
    I(Width^2), data = Perch)

Residuals:
    Min      1Q  Median      3Q     Max 
-117.17  -11.90    2.82   11.56  157.60 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)   156.349     61.415    2.55    0.014 *
Length        -25.001     14.273   -1.75    0.086 .
Width          20.977     82.588    0.25    0.801  
I(Length^2)     1.572      0.724    2.17    0.035 *
I(Width^2)     34.406     18.745    1.84    0.072 .
Length:Width   -9.776      7.145   -1.37    0.177  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 43.1 on 50 degrees of freedom
Multiple R-squared:  0.986, Adjusted R-squared:  0.985 
F-statistic:  705 on 5 and 50 DF,  p-value: <2e-16
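  • The two \(R^2_{adj}\) values can also be pulled out directly rather than read off the full summaries (same numbers as above):
summary(model1)$adj.r.squared
summary(model2)$adj.r.squared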

Model Comparison criteria

  • We are looking for reasonable ways to balance “goodness of fit” (how well the model fits the data) with “parsimony”
  • \(R^2_{adj}\) gets at this by adding a penalty for adding variables
  • AIC and BIC are two more methods that balance goodness of fit and parsimony

Log Likelihood

  • Both AIC and BIC are calculated using the log likelihood
  • \(\log(\mathcal{L}) = -\frac{n}{2}[\log(2\pi) + \log(SSE/n) + 1]\)
    • \(\log = \log_e\), log() in R
logLik(model1)
'log Lik.' -290 (df=5)
-56 / 2 * (log(2 * pi) + log(101765 / 56) + 1)
[1] -290
  • “goodness of fit” measure
  • higher log likelihood is better
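  • Where do the numbers in the hand calculation come from? Both can be recovered from the fitted model (the SSE above is a rounded version of this):
n   <- nobs(model1)          # 56 fish
SSE <- sum(resid(model1)^2)  # approximately 101765
-n / 2 * (log(2 * pi) + log(SSE / n) + 1)  # matches logLik(model1)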

Log Likelihood

What I want you to remember

  • Both AIC and BIC are calculated using the log likelihood

\[\log(\mathcal{L}) = -\frac{n}{2}\log(SSE/n) + \textrm{some constant}\]

  • \(\log = \log_e\), log() in R
  • “goodness of fit” measure
  • higher log likelihood is better

AIC

  • Akaike’s Information Criterion
  • \(AIC = 2(k+1) - 2\log(\mathcal{L})\)
  • \(k\) is the number of predictors in the model
  • lower AIC values are better
AIC(model1)
[1] 589
AIC(model2)
[1] 588
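  • Note: R’s AIC() also counts the error variance \(\hat\sigma^2_\epsilon\) as an estimated parameter, so for these models its penalty is \(2(k+2)\) rather than \(2(k+1)\); that extra constant is the same for every model, so the rankings are unchanged. A quick check:
ll <- logLik(model1)
-2 * as.numeric(ll) + 2 * attr(ll, "df")  # df = k + 2 here; matches AIC(model1)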

BIC

  • Bayesian Information Criterion
  • \(BIC = \log(n)(k+1) - 2\log(\mathcal{L})\)
  • \(k\) is the number of predictors in the model
  • lower BIC values are better
BIC(model1)
[1] 599
BIC(model2)
[1] 602
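  • As with AIC, R’s BIC() can be reproduced from the log likelihood (again counting the error variance as an estimated parameter):
ll <- logLik(model1)
log(attr(ll, "nobs")) * attr(ll, "df") - 2 * as.numeric(ll)  # matches BIC(model1)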

AIC and BIC can disagree!

AIC(model1)
[1] 589
AIC(model2)
[1] 588
BIC(model1)
[1] 599
BIC(model2)
[1] 602
  • the penalty term is larger in BIC than in AIC (here \(\log(56) \approx 4.03\) per parameter versus 2)
  • What to do? Both are valid; pre-specify which one you will use in the methods section of your analysis before running your models

🛠 toolkit for comparing models

👉 \(\Large R^2\)

👉 \(AIC\)

👉 \(BIC\)

Application Exercise

  1. Copy the following template into RStudio Pro:
https://github.com/sta-112-s24/appex-13.git

  2. Fit a model predicting GPA using high school GPA (HSGPA) and verbal SAT score (SATV). Save this as model1.

  3. Fit a model predicting GPA using high school GPA (HSGPA), verbal SAT score (SATV), and math SAT score (SATM). Save this as model2.

  4. Fit a model predicting GPA using high school GPA (HSGPA), verbal SAT score (SATV), math SAT score (SATM), and number of humanities credits taken in high school (HU). Save this as model3.

  5. Choose AIC or BIC to compare models 1, 2, and 3. Rank the models. (A sketch of one possible approach follows below.)
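A minimal sketch of what the code might look like, assuming the template loads the FirstYearGPA data from the Stat2Data package (check the template for the actual data object and variable names):

library(Stat2Data)
data("FirstYearGPA")

model1 <- lm(GPA ~ HSGPA + SATV, data = FirstYearGPA)
model2 <- lm(GPA ~ HSGPA + SATV + SATM, data = FirstYearGPA)
model3 <- lm(GPA ~ HSGPA + SATV + SATM + HU, data = FirstYearGPA)

# pick one criterion ahead of time, then rank the models (lower is better)
AIC(model1); AIC(model2); AIC(model3)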
