Model Comparisons

Lucy D’Agostino McGowan

Comparing models

  • Goal: Trying to predict the weight of fish based on their length and width
data("Perch")
model1 <- lm(
  Weight ~ Length + Width + Length * Width,
  data = Perch
  )
model2 <- lm(
  Weight ~ Length + Width + Length * Width + I(Length ^ 2) + I(Width ^ 2),
  data = Perch
  )
  • What is the equation for model1?
  • What is the equation for model2?
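  • One way to write these out (a sketch, with coefficients numbered in one common order):

\[Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 (Length \times Width) + \epsilon\]

\[Weight = \beta_0 + \beta_1 Length + \beta_2 Width + \beta_3 (Length \times Width) + \beta_4 Length^2 + \beta_5 Width^2 + \epsilon\]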

🛠 \(R^2\) for Multiple Linear Regression

  • \(R^2= \frac{SSModel}{SSTotal}\)
  • \(R^2= 1 - \frac{SSE}{SSTotal}\)
  • As written, \(R^2\) never decreases when you add a predictor (it essentially always increases). Therefore, we have \(R^2_{adj}\), which includes a small “penalty” for adding more predictors
  • \(R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)
  • \(\frac{SSTotal}{n-1} = \frac{\sum(y - \bar{y})^2}{n-1}\) What is this?
    • Sample variance! \(S_Y^2\)
  • \(R^2_{adj} = 1 - \frac{\hat\sigma^2_\epsilon}{S_Y^2}\)
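  • Quick check in R, applying the formula above to model1 from the first slide (nothing new, just the algebra done by hand):
n   <- nobs(model1)                # 56 fish
k   <- length(coef(model1)) - 1    # 3 predictors (intercept not counted)
SSE <- sum(resid(model1)^2)
SST <- sum((Perch$Weight - mean(Perch$Weight))^2)
1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # matches the Adjusted R-squared from summary(model1)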

🛠 \(R^2_{adj}\) for Multiple Linear Regression

  • \(R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SSTotal / (n-1)}\)
  • The denominator stays the same for all models fit to the same response variable and data
  • The numerator can actually increase when a new predictor is added to a model, if the decrease in the SSE is not sufficient to offset the decrease in the error degrees of freedom
  • So \(R^2_{adj}\) can 👇 when a weak predictor is added to a model

🛠 \(R^2_{adj}\) for Multiple Linear Regression

summary(model1)

Call:
lm(formula = Weight ~ Length + Width + Length * Width, data = Perch)

Residuals:
    Min      1Q  Median      3Q     Max 
-140.11  -12.23    1.23    8.49  181.41 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   113.935     58.784    1.94    0.058 .  
Length         -3.483      3.152   -1.10    0.274    
Width         -94.631     22.295   -4.24  9.1e-05 ***
Length:Width    5.241      0.413   12.69  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44.2 on 52 degrees of freedom
Multiple R-squared:  0.985, Adjusted R-squared:  0.984 
F-statistic: 1.11e+03 on 3 and 52 DF,  p-value: <2e-16
summary(model2)

Call:
lm(formula = Weight ~ Length + Width + Length * Width + I(Length^2) + 
    I(Width^2), data = Perch)

Residuals:
    Min      1Q  Median      3Q     Max 
-117.17  -11.90    2.82   11.56  157.60 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)   156.349     61.415    2.55    0.014 *
Length        -25.001     14.273   -1.75    0.086 .
Width          20.977     82.588    0.25    0.801  
I(Length^2)     1.572      0.724    2.17    0.035 *
I(Width^2)     34.406     18.745    1.84    0.072 .
Length:Width   -9.776      7.145   -1.37    0.177  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 43.1 on 50 degrees of freedom
Multiple R-squared:  0.986, Adjusted R-squared:  0.985 
F-statistic:  705 on 5 and 50 DF,  p-value: <2e-16
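  • The two \(R^2_{adj}\) values can also be pulled out directly rather than read off the full summaries (same numbers as above):
summary(model1)$adj.r.squared
summary(model2)$adj.r.squared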

Model Comparison criteria

  • We are looking for reasonable ways to balance “goodness of fit” (how well the model fits the data) with “parsimony”
  • \(R^2_{adj}\) gets at this by adding a penalty for adding variables
  • AIC and BIC are two more methods that balance goodness of fit and parsimony

Log Likelihood

  • Both AIC and BIC are calculated using the log likelihood
  • \(\log(\mathcal{L}) = -\frac{n}{2}[\log(2\pi) + \log(SSE/n) + 1]\)
    • \(\log = \log_e\), log() in R
logLik(model1)
'log Lik.' -290 (df=5)
-56 / 2 * (log(2 * pi) + log(101765 / 56) + 1)
[1] -290
  • “goodness of fit” measure
  • higher log likelihood is better
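  • Where do the numbers in the hand calculation come from? Both can be recovered from the fitted model (the SSE above is a rounded version of this):
n   <- nobs(model1)          # 56 fish
SSE <- sum(resid(model1)^2)  # approximately 101765
-n / 2 * (log(2 * pi) + log(SSE / n) + 1)  # matches logLik(model1)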

Log Likelihood

What I want you to remember

  • Both AIC and BIC are calculated using the log likelihood

\[\log(\mathcal{L}) = -\frac{n}{2}\log(SSE/n) + \textrm{some constant}\]

  • \(\log = \log_e\), log() in R
  • “goodness of fit” measure
  • higher log likelihood is better

AIC

  • Akaike’s Information Criterion
  • \(AIC = 2(k+1) - 2\log(\mathcal{L})\)
  • \(k\) is the number of predictors in the model
  • lower AIC values are better
AIC(model1)
[1] 589
AIC(model2)
[1] 588
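  • Note: R’s AIC() also counts the error variance \(\hat\sigma^2_\epsilon\) as an estimated parameter, so for these models its penalty is \(2(k+2)\) rather than \(2(k+1)\); that extra constant is the same for every model, so the rankings are unchanged. A quick check:
ll <- logLik(model1)
-2 * as.numeric(ll) + 2 * attr(ll, "df")  # df = k + 2 here; matches AIC(model1)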

BIC

  • Bayesian Information Criterion
  • \(BIC = \log(n)(k+1) - 2\log(\mathcal{L})\)
  • \(k\) is the number of predictors in the model
  • lower BIC values are better
BIC(model1)
[1] 599
BIC(model2)
[1] 602
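  • As with AIC, R’s BIC() can be reproduced from the log likelihood (again counting the error variance as an estimated parameter):
ll <- logLik(model1)
log(attr(ll, "nobs")) * attr(ll, "df") - 2 * as.numeric(ll)  # matches BIC(model1)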

AIC and BIC can disagree!

AIC(model1)
[1] 589
AIC(model2)
[1] 588
BIC(model1)
[1] 599
BIC(model2)
[1] 602
  • the penalty term is larger in BIC than in AIC (here \(\log(56) \approx 4.03\) per parameter versus 2)
  • What to do? Both are valid; pre-specify which one you will use in the methods section of your analysis before running your models

🛠 toolkit for comparing models

👉 \(\Large R^2\)

👉 \(AIC\)

👉 \(BIC\)

Application Exercise

  1. Copy the following template into RStudio Pro:
https://github.com/sta-112-s24/appex-13.git

  2. Fit a model predicting GPA using high school GPA (HSGPA) and verbal SAT score (SATV). Save this as model1.

  3. Fit a model predicting GPA using high school GPA (HSGPA), verbal SAT score (SATV), and math SAT score (SATM). Save this as model2.

  4. Fit a model predicting GPA using high school GPA (HSGPA), verbal SAT score (SATV), math SAT score (SATM), and number of humanities credits taken in high school (HU). Save this as model3.

  5. Choose AIC or BIC to compare models 1, 2, and 3. Rank the models. (A sketch of one possible approach follows below.)
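A minimal sketch of what the code might look like, assuming the template loads the FirstYearGPA data from the Stat2Data package (check the template for the actual data object and variable names):

library(Stat2Data)
data("FirstYearGPA")

model1 <- lm(GPA ~ HSGPA + SATV, data = FirstYearGPA)
model2 <- lm(GPA ~ HSGPA + SATV + SATM, data = FirstYearGPA)
model3 <- lm(GPA ~ HSGPA + SATV + SATM + HU, data = FirstYearGPA)

# pick one criterion ahead of time, then rank the models (lower is better)
AIC(model1); AIC(model2); AIC(model3)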
