Nested F-tests for groups of variables

Lucy D’Agostino McGowan

🛠 F-test for Multiple Linear Regression

  • Comparing the full model to the intercept-only model
    • \(H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0\)
    • \(H_A: \textrm{at least one } \beta_i \neq 0\)

🛠 F-test for Multiple Linear Regression

  • \(\Large F = \frac{MSModel}{MSE}\)
  • df for the Model?
    • k (k: number of predictors in model)
    • p - 1 (p: number of parameters in model)
  • df for the errors?
    • n - k - 1
    • n - p
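
As a rough check of the degrees of freedom above, the overall F statistic can be rebuilt by hand and compared with what `summary()` reports. This is only a sketch: the built-in `mtcars` data and the predictors `wt` and `hp` are stand-ins chosen for illustration, not part of these slides.

```r
# Stand-in data: built-in mtcars (an assumption, not the data used in the slides)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)        # number of observations
p <- length(coef(fit))   # number of parameters (intercept + slopes)
k <- p - 1               # number of predictors

ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
sse      <- sum(resid(fit)^2)
ss_model <- ss_total - sse

ms_model <- ss_model / k     # df for the model: k = p - 1
mse      <- sse / (n - p)    # df for the errors: n - k - 1 = n - p
f_stat   <- ms_model / mse

f_stat
unname(summary(fit)$fstatistic["value"])  # should agree with f_stat
```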

🛠 Nested F-test for Multiple Linear Regression

  • What does “nested” mean?
    • You have a “small” model and a “large” model where the “small” model is completely contained in the “large” model
  • The F-test we have learned so far is one example of this, comparing:
    • \(y = \beta_0 + \epsilon\) (small)
    • \(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots +\beta_kx_k + \epsilon\) (large)
  • The full (large) model has \(p_{Full}\) parameters, the reduced (small) model has \(p_{Reduced}\) parameters

🛠 Nested F-test for Multiple Linear Regression

  • The full (large) model has \(p_{Full}\) parameters, the reduced (small) model has \(p_{Reduced}\) parameters
  • What is \(H_0\)?
    • \(H_0:\) \(\beta_i=0\) for all \(p_{Full}-p_{Reduced}\) predictors being dropped from the full model
  • What is \(H_A\)?
    • \(H_A:\) \(\beta_i\neq 0\) for at least one of the \(p_{Full}-p_{Reduced}\) predictors dropped from the full model
  • Does the full model do a (statistically significantly) better job of explaining the variability in the response than the reduced model?

🛠 Nested F-test for Multiple Linear Regression

  • The full (large) model has \(p_{Full}\) parameters, the reduced (small) model has \(p_{Reduced}\) parameters
  • \(F = \frac{(SSMODEL_{Full} - SSMODEL_{Reduced})/(p_{Full}-p_{Reduced})}{SSE_{Full}/(n-p_{Full})}\)
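
It may help to note that the numerator can be written in terms of error sums of squares. Both models are fit to the same response, so they share the same total sum of squares (written here as \(SSTotal\), a label not used elsewhere in these slides):

```latex
\[
SSTotal = SSMODEL_{Full} + SSE_{Full} = SSMODEL_{Reduced} + SSE_{Reduced}
\]
\[
\Rightarrow \quad SSMODEL_{Full} - SSMODEL_{Reduced} = SSE_{Reduced} - SSE_{Full}
\]
\[
F = \frac{(SSE_{Reduced} - SSE_{Full})/(p_{Full}-p_{Reduced})}{SSE_{Full}/(n-p_{Full})}
\]
```

Either form gives the same F statistic; the second is the one that lines up with residual sums of squares in software output.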

🛠 Nested F-test for Multiple Linear Regression

  • Which of these are nested models?
  1. \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\)
  2. \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 * x_2 + \epsilon\)
  3. \(y = \beta_0 + \beta_1 x_3 + \epsilon\)
  4. \(y = \beta_0 + \beta_1 x_1 + \epsilon\)
  5. \(y = \beta_0 + \beta_1 x_4 + \epsilon\)

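Answer: model 4 is nested in model 1, and model 1 is nested in model 2 (so model 4 is also nested in model 2). Models 3 and 5 are not nested in any of the others.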

🛠 Nested F-test for Multiple Linear Regression

  1. \(y = \beta_0 + \beta_1 x_1 + \epsilon\)
  2. \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 * x_2 + \epsilon\)
  • Comparing these two models, what is \(p_{Full}-p_{Reduced}\)?
    • \(p_{Full}: 4\)
    • \(p_{Reduced}: 2\)
    • \(p_{Full}-p_{Reduced} = 2\)
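
One way to double-check a parameter count is to fit both models and count the coefficients. The sketch below uses simulated data with made-up variable names (`x1`, `x2`, `y`), purely for illustration.

```r
set.seed(1)  # simulated data, purely illustrative
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$y <- 1 + 2 * d$x1 + rnorm(20)

reduced <- lm(y ~ x1, data = d)
full    <- lm(y ~ x1 + x2 + x1:x2, data = d)

p_reduced <- length(coef(reduced))  # intercept + x1
p_full    <- length(coef(full))     # intercept + x1 + x2 + x1:x2

p_full - p_reduced  # number of terms dropped from the full model
```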

🛠 Nested F-test for Multiple Linear Regression

Why is this useful?

  • You may want to test the impact of some group of variables (like all “lab values”)
  • When a variable enters the model through several non-linear terms, you might want to test the effect of that variable overall (not just its “linear” or “quadratic” part, for example)

🛠 Nested F-test for Multiple Linear Regression

  • Goal: Predict the weight of fish from their length and width
library(Stat2Data)  # assumption: the Perch data set comes from the Stat2Data package
data("Perch")
model1 <- lm(
  Weight ~ Length,
  data = Perch
  )
model2 <- lm(
  Weight ~ Length + Width + I(Width ^ 2),
  data = Perch
  )
  • What is the equation for model1?
  • What is the equation for model2?

🛠 Nested F-test for Multiple Linear Regression

data("Perch")
model1 <- lm(
  Weight ~ Length,
  data = Perch
  )
model2 <- lm(
  Weight ~ Length + Width + I(Width ^ 2),
  data = Perch
  )
  • If we want to do a nested F-test, what is \(H_0\)?
    • \(H_0: \beta_{width} = \beta_{width^2} = 0\)
  • What is \(H_A\)?
    • \(H_A: \beta_{width}\neq 0\) or \(\beta_{width^2}\neq 0\)
  • What are the degrees of freedom of this test? (n = 56)
    • 2, 52
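
The degrees of freedom follow directly from the parameter counts. A small sketch of the arithmetic, which also looks up the 5% critical value for this F distribution (the 0.05 level is an assumption, not something fixed by the slides):

```r
n         <- 56  # sample size given on the slide
p_full    <- 4   # intercept, Length, Width, Width^2
p_reduced <- 2   # intercept, Length

df_num <- p_full - p_reduced  # numerator df: number of terms being tested
df_den <- n - p_full          # denominator df: error df of the full model
c(df_num, df_den)

# F values above this cutoff would be significant at the (assumed) 5% level
crit <- qf(0.95, df_num, df_den)
crit
```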

🛠 Nested F-test for Multiple Linear Regression

anova(model1)
Analysis of Variance Table

Response: Weight
          Df  Sum Sq Mean Sq F value Pr(>F)    
Length     1 6118739 6118739     627 <2e-16 ***
Residuals 54  527355    9766                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(SSModel1 <- 6118739)
[1] 6118739

🛠 Nested F-test for Multiple Linear Regression

anova(model2)
Analysis of Variance Table

Response: Weight
           Df  Sum Sq Mean Sq F value  Pr(>F)    
Length      1 6118739 6118739  2608.3 < 2e-16 ***
Width       1  110593  110593    47.1 8.1e-09 ***
I(Width^2)  1  294775  294775   125.7 1.7e-15 ***
Residuals  52  121987    2346                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(SSModel1 <- 6118739)
[1] 6118739
(SSModel2 <- 6118739 + 110593 + 294775)
[1] 6524107

🛠 Nested F-test for Multiple Linear Regression

  • \(F = \frac{(SSMODEL_{Full} - SSMODEL_{Reduced})/(p_{Full} - p_{Reduced})}{SSE_{Full}/(n - p_{Full})}\)
  • \(SSMODEL_{Full} - SSMODEL_{Reduced}\):
SSModel2 - SSModel1
[1] 405368
  • What is \(p_{Full}-p_{Reduced}\)?

🛠 Nested F-test for Multiple Linear Regression

  • \(F = \frac{(SSMODEL_{Full} - SSMODEL_{Reduced})/(p_{Full}-p_{Reduced})}{SSE_{Full}/(n - p_{Full})}\)
  • \((SSMODEL_{Full} - SSMODEL_{Reduced}) / (p_{Full}-p_{Reduced})\):
(SSModel2 - SSModel1) / 2
[1] 202684

🛠 Nested F-test for Multiple Linear Regression

  • \(F = \frac{(SSMODEL_{Full} - SSMODEL_{Reduced})/(p_{Full}-p_{Reduced})}{SSE_{Full}/(n - p_{Full})}\)
  • \(SSE_{Full}/(n - p_{Full})\)
anova(model2)
Analysis of Variance Table

Response: Weight
           Df  Sum Sq Mean Sq F value  Pr(>F)    
Length      1 6118739 6118739  2608.3 < 2e-16 ***
Width       1  110593  110593    47.1 8.1e-09 ***
I(Width^2)  1  294775  294775   125.7 1.7e-15 ***
Residuals  52  121987    2346                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

🛠 Nested F-test for Multiple Linear Regression

  • \(F = \frac{(SSMODEL_{Full} - SSMODEL_{Reduced})/(p_{Full}-p_{Reduced})}{SSE_{Full}/(n - p_{Full})}\)
((SSModel2 - SSModel1) / 2) /
  2346
[1] 86.4
  • What are the degrees of freedom for this test?
    • 2, 52
pf(86.4, 2, 52, lower.tail = FALSE)
[1] 2.95e-17
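
The whole hand calculation can be collected in one place. The sketch below only reuses the sums of squares printed in the ANOVA tables on these slides; the variable names are made up for illustration.

```r
# Values copied from the anova(model1) and anova(model2) tables
ss_model_reduced <- 6118739
ss_model_full    <- 6118739 + 110593 + 294775
sse_reduced      <- 527355
sse_full         <- 121987

n         <- 56
p_full    <- 4
p_reduced <- 2

numerator   <- (ss_model_full - ss_model_reduced) / (p_full - p_reduced)
denominator <- sse_full / (n - p_full)  # MSE of the full model
f_stat      <- numerator / denominator
p_value     <- pf(f_stat, p_full - p_reduced, n - p_full, lower.tail = FALSE)

f_stat
p_value

# The same numerator, written with error sums of squares
sse_reduced - sse_full == ss_model_full - ss_model_reduced
```

Using the unrounded MSE (121987 / 52) rather than 2346 changes the F statistic only slightly.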

🛠 Nested F-test for Multiple Linear Regression

An easier way

anova(model1, model2)
Analysis of Variance Table

Model 1: Weight ~ Length
Model 2: Weight ~ Length + Width + I(Width^2)
  Res.Df    RSS Df Sum of Sq    F Pr(>F)    
1     54 527355                             
2     52 121987  2    405368 86.4 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
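
The table returned by `anova()` for two models can also be indexed to pull out the F statistic and p-value. This is a sketch on stand-in data: the built-in `mtcars` data set and the terms `wt`, `hp` and `I(hp^2)` are assumptions chosen so the example runs without extra packages. The manual check uses the drop in residual sum of squares, which equals the gain in model sum of squares because both models share the same total.

```r
# Stand-in models on built-in data (not the Perch models from the slides)
reduced <- lm(mpg ~ wt, data = mtcars)
full    <- lm(mpg ~ wt + hp + I(hp^2), data = mtcars)

cmp <- anova(reduced, full)

f_anova <- cmp$F[2]          # F statistic is on the second row
p_anova <- cmp$`Pr(>F)`[2]   # p-value is on the second row

# Manual version of the same nested F-test
sse_reduced <- sum(resid(reduced)^2)
sse_full    <- sum(resid(full)^2)
df_num      <- df.residual(reduced) - df.residual(full)
df_den      <- df.residual(full)

f_manual <- ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
p_manual <- pf(f_manual, df_num, df_den, lower.tail = FALSE)

all.equal(f_anova, f_manual)
all.equal(p_anova, p_manual)
```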