Logistic Regression in Practice

Lucy D’Agostino McGowan

Logistic vs Linear

Logistic in R

glm(scored ~ distance, data = data, family = binomial)


Call:  glm(formula = scored ~ distance, family = binomial, data = data)

Coefficients:
(Intercept)     distance  
   0.309152    -0.007256  

Degrees of Freedom: 24 Total (i.e. Null);  23 Residual
Null Deviance:      34.3 
Residual Deviance: 34.29    AIC: 38.29

glm stands for “Generalized Linear Model”: it extends the linear models we’ve learned about to outcomes with other distributions (so far we have been assuming the normal, AKA “Gaussian”, distribution).
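One way to see the “generalized” part is that glm() with family = gaussian gives the same fit as lm(). The sketch below uses simulated data; the variables x and y are made up for illustration and are not the shot data.

```r
# Simulated data for illustration only (not the shot data from the slides)
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian)

# The two fits give the same coefficients, up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_glm))
```

Switching family to binomial keeps the same formula interface but changes the assumed distribution of the outcome.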

family = binomial: this lets R know which distribution we expect the outcome to follow, given the predictors. So far we’ve been assuming this is normal, but now we have binary data.

The binomial distribution is used to model the number of successes in a series of independent trials. What we actually have is a special case of the binomial, the “Bernoulli”, where there is a single trial and we want to know whether a “success” (outcome) occurs or not.
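A small sketch of the Bernoulli as a binomial with a single trial (size = 1). The probability p below is an arbitrary value chosen for illustration.

```r
p <- 0.56  # arbitrary success probability, for illustration only

# With size = 1, the binomial probabilities are just P(failure) and P(success)
dbinom(c(0, 1), size = 1, prob = p)

# Each simulated draw is a single 0/1 outcome, like scored in the slides
set.seed(1)
shots <- rbinom(10, size = 1, prob = p)
shots
```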

How do you think you interpret this coefficient?

For every one-inch increase in distance, the expected change in the log odds of scoring is -0.007.
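Written out, the fitted model is linear on the log-odds scale. Here p̂ denotes the estimated probability of scoring (this notation is an assumption, it does not appear on the slides), and the coefficients are rounded from the glm() output:

$$
\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 0.309 - 0.007 \times \text{distance}
$$

Increasing distance by one inch changes the right-hand side by -0.007, which is why the coefficient is read as a change in log odds.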

What if I want it on the odds scale instead of log(odds)?

For every one-inch increase in distance, the expected odds of scoring are multiplied by exp(-0.007256) ≈ 0.993, a decrease of about 0.7%.
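The conversion is just exponentiating the coefficient. A minimal sketch using the numbers copied from the glm() output; the helper odds_at() and the distances 10 and 11 are chosen here for illustration.

```r
# Coefficients copied from the glm() output
b0 <- 0.309152
b1 <- -0.007256

# Multiplicative change in the odds for a one-inch increase in distance
or_increase <- exp(b1)
round(or_increase, 3)  # 0.993

# Check: odds at 11 inches divided by odds at 10 inches give the same ratio
odds_at <- function(d) exp(b0 + b1 * d)
odds_at(11) / odds_at(10)
```

With a fitted model object, say fit, exp(coef(fit)) applies the same transformation to every coefficient at once.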

For every one-inch decrease in distance, the odds of scoring are expected to be 1.007 times as high.
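Reversing the direction of the comparison inverts the ratio, which is the same as flipping the sign of the coefficient before exponentiating:

$$
e^{-(-0.007256)} = \frac{1}{e^{-0.007256}} \approx \frac{1}{0.9928} \approx 1.007
$$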

What if the predictor is binary?

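library(dplyr)  # provides mutate()
# distance_9 is 1 for shots from less than 9 inches, 0 otherwise
data <- data |>
  mutate(distance_9 = ifelse(distance < 9, 1, 0))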

glm(scored ~ distance_9, data, family = binomial)


Call:  glm(formula = scored ~ distance_9, family = binomial, data = data)

Coefficients:
(Intercept)   distance_9  
     0.1542       0.1823  

Degrees of Freedom: 24 Total (i.e. Null);  23 Residual
Null Deviance:      34.3 
Residual Deviance: 34.25    AIC: 38.25

How do you think you interpret this coefficient?

Being less than 9 inches away from the goal increases the expected log odds of scoring by 0.182 compared to being 9 or more inches away.

What if I wanted the odds?

Being less than 9 inches away from the goal multiplies the expected odds of scoring by exp(0.1823) ≈ 1.2 compared to being 9 or more inches away.
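To see where the 1.2 comes from, the odds in each group can be computed from the two coefficients. This is a sketch using the rounded values from the glm() output; the variable names are chosen here for illustration.

```r
# Coefficients copied from the glm() output
b0 <- 0.1542  # log odds of scoring when distance_9 = 0
b1 <- 0.1823  # change in log odds when distance_9 = 1

odds_far  <- exp(b0)       # 9 or more inches away
odds_near <- exp(b0 + b1)  # less than 9 inches away

# The ratio of the two odds is exp(b1), whatever the intercept is
odds_near / odds_far
```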

exp(0.182) ≈ 1.2 is known as the odds ratio. It is the ratio of the odds of scoring if you were less than 9 inches away to the odds of scoring if you were 9 or more inches away.
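With a binary predictor, the odds ratio can also be computed directly from counts. The sketch below uses a hypothetical data frame, shots, with counts chosen for illustration (12 close shots with 7 scored, 13 far shots with 7 scored). It is not the original data.

```r
# Hypothetical data, chosen for illustration (not the original shot data)
shots <- data.frame(
  distance_9 = rep(c(1, 0), times = c(12, 13)),
  scored     = c(rep(c(1, 0), times = c(7, 5)),   # close: 7 scored, 5 missed
                 rep(c(1, 0), times = c(7, 6)))   # far:   7 scored, 6 missed
)

# Odds of scoring in each group, straight from the counts
odds_near <- 7 / 5
odds_far  <- 7 / 6
odds_near / odds_far

# The exponentiated glm() coefficient matches this ratio
fit <- glm(scored ~ distance_9, data = shots, family = binomial)
exp(coef(fit))["distance_9"]
```

With a single binary predictor the model has one parameter per group, so the fitted odds reproduce the observed odds in each group and the two calculations agree.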