Lab 06 - Logistic

Due: 2024-04-16 at 11:59pm Turn your .html file in on Canvas

Getting started

Go to RStudio Pro and click:

Step 1. File > New Project
Step 2. “Version Control”
Step 3. Git
Step 4. Copy the following into the “Repository URL”:

https://github.com/sta-112-s24/lab-06-logistic-regression.git

Set up

We will use two packages, tidyverse and Stat2Data. Additionally, if you need to create an empirical logit plot, below is example code to do so:

library(tidyverse)
library(Stat2Data)
data(MedGPA)
# you can change ngroups as you see necessary, this is kind of like setting the number of bins in a histogram
emplogitplot1(Acceptance ~ GPA, data = MedGPA, ngroups = 5)

Exercises

We are analyzing the Leukemia data from the Stat2Data package. Be sure to learn about the data and variables. We are doing inference on a model predicting whether the patient responded to treatment from Age.

  1. Describe the data (number of observations, what the observations are, number of variables, what the variables are, and whether there is any missing data).

  2. Run the following code to drop Time (the survival time) and Status (an indicator for whether the patient survived), from your dataset.

Leukemia <- Leukemia |>
  select(-Time, -Status)
  1. We are interested in whether age at diagnosis influences the probability of responding to treatment. I propose that the differential percentage of blasts, the percentage of absolute marrow leukemia infiltrate and the highest temperature of the patient prior to treatment in degrees Fahrenheit are important to control for when assessing the relationship between age at diagnosis and the probability of responding to treatment. Write out the equation for the generic model we would fit to assess this relationship.

  2. Fit the model proposed in Exercise 3. Check that this model fits the assumption(s) for logistic regression, particularly focusing on the Age variable. Be sure to include a figure to demonstrate the fit of the assumptions for the Age variable.

  3. Report the coefficient for Age as well as the associated odds ratio with a 95% confidence interval. Interpret this in the context of this data.