Some datasets come bundled with R packages and become available once you load the package. To access one of these, call the data() function with the name of the dataset.
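A minimal sketch of the data() call. The palmerpenguins package is used here only as an assumption (it must be installed first); any package that ships datasets works the same way.

```r
# Assumes palmerpenguins is installed: install.packages("palmerpenguins")
library(palmerpenguins)

# Load the bundled penguins dataset into your environment
data("penguins", package = "palmerpenguins")

# Peek at the first few rows
head(penguins)
```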
Other times you’ll have data in a file, such as a .csv or Excel file. You can read these in with read_* functions. read_csv() becomes available when you load the tidyverse package; read_excel() comes from the readxl package, which is installed with the tidyverse but has to be loaded separately. For example, to read a .csv file in, you could run read_csv("movie_data.csv").
Note that movie_data.csv would need to be saved in your RStudio project folder for this code to run. We will practice this in a few weeks.
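Here is a small sketch of reading a .csv file that runs anywhere. The file contents and column names are made up for illustration, and the file is written to a temporary folder first so there is something to read.

```r
library(tidyverse)  # attaches readr, which provides read_csv() and write_csv()

# Write a tiny, made-up movie_data.csv to a temporary folder (hypothetical contents)
csv_path <- file.path(tempdir(), "movie_data.csv")
write_csv(
  tibble(title = c("Movie A", "Movie B"), year = c(1999, 2010)),
  csv_path
)

# Read the file back in as a tibble
movie_data <- read_csv(csv_path)
movie_data
```

Inside an RStudio project, the file name alone is enough, because relative paths start from the project folder.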
glimpse at your data
Rows: 1,846
Columns: 3
$ dataset <chr> "dino", "dino", "dino", "dino", "dino", "d…
$ x <dbl> 55.3846, 51.5385, 46.1538, 42.8205, 40.769…
$ y <dbl> 97.1795, 96.0256, 94.4872, 91.4103, 88.333…
How many rows are in this dataset? How many columns?
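Output like the above can be produced with a sketch along these lines, assuming the tidyverse and datasauRus packages are installed.

```r
library(tidyverse)   # provides glimpse() via dplyr
library(datasauRus)  # provides the datasaurus_dozen dataset

# One line per column: name, type, and the first few values
glimpse(datasaurus_dozen)

# Row and column counts can also be requested directly
nrow(datasaurus_dozen)
ncol(datasaurus_dozen)
```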
glimpse at your data
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, …
$ island <fct> Torgersen, Torgersen, Torgersen,…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6…
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650…
$ sex <fct> male, female, female, NA, female…
$ year <int> 2007, 2007, 2007, 2007, 2007, 20…
What type of variable is species? How many numeric variables are there?
double: numbers, e.g. c(1.5, 2, 3.12, 4)
integer: whole numbers, e.g. c(1L, 2L, 3L, 4L)
character: text, e.g. c("red", "banana")
factor: categories, e.g. factor(c("Yes", "Maybe", "No"))
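You can check these types yourself. This sketch builds one small vector of each kind (the variable names are just for illustration) and asks R what it is.

```r
dbl <- c(1.5, 2, 3.12, 4)
int <- c(1L, 2L, 3L, 4L)                 # the L suffix stores whole numbers as integers
chr <- c("red", "banana")
fct <- factor(c("Yes", "Maybe", "No"))   # factor() turns text into categories

typeof(dbl)   # "double"
typeof(int)   # "integer"
class(chr)    # "character"
class(fct)    # "factor"

# Whole numbers typed without the L suffix are stored as doubles
typeof(c(1, 2, 3, 4))   # "double"
```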
Let’s grab one of the datasaurus_dozen datasets.
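One way to do this is sketched below. Picking the "dino" dataset and naming the result dino are assumptions made for illustration.

```r
library(tidyverse)
library(datasauRus)

# Keep only the rows where the dataset column is "dino"
dino <- datasaurus_dozen |>
  filter(dataset == "dino")

glimpse(dino)
```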
What does filter do? Why ==?
The geom_* functions in ggplot2 describe the type of plot you want to create. What do you think would create a histogram?
geom_histogram
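A minimal sketch of a histogram, assuming we plot the x column of the "dino" subset (both choices are assumptions for illustration).

```r
library(tidyverse)
library(datasauRus)

dino <- datasaurus_dozen |>
  filter(dataset == "dino")

# Map x to the horizontal axis and draw a histogram of its values
p_hist <- ggplot(dino, aes(x = x)) +
  geom_histogram()

p_hist
```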
What does this warning mean? How do you think we can get rid of it?
What does this plot tell us about the shape of this data?
What geom_ do you think would create a density plot?
geom_density
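The same idea as a density plot, again assuming the x column of the "dino" subset.

```r
library(tidyverse)
library(datasauRus)

dino <- datasaurus_dozen |>
  filter(dataset == "dino")

# Swap the geom to draw a smoothed density curve instead of bars
p_dens <- ggplot(dino, aes(x = x)) +
  geom_density()

p_dens
```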
What geom_ do you think would create a boxplot?
geom_boxplot
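And as a boxplot, under the same assumptions (x column, "dino" subset).

```r
library(tidyverse)
library(datasauRus)

dino <- datasaurus_dozen |>
  filter(dataset == "dino")

# A single horizontal boxplot summarising the x values
p_box <- ggplot(dino, aes(x = x)) +
  geom_boxplot()

p_box
```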
Does this give us as much information as the histogram?
What does this plot tell us?
What does %in% do?
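Running a small sketch like this is one way to explore the question. The dataset names "dino" and "star" are picked for illustration, and two_sets is a hypothetical name.

```r
library(tidyverse)
library(datasauRus)

# Try %in% on two small vectors first and look at what comes back
c("dino", "heart") %in% c("dino", "star")

# Then use it inside filter() and check which datasets remain
two_sets <- datasaurus_dozen |>
  filter(dataset %in% c("dino", "star"))

count(two_sets, dataset)
```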
What is missing?
How can we make this more legible?
Application Exercise
Open the Welcome Penguins folder from the previous application exercise
Replace format: html with the following in your YAML:
format:
  html:
    self-contained: true
Create a boxplot examining the relationship between the body mass of a penguin and their species.
Add jittered points to this plot
Add labels and a title to this plot
Upload this to Canvas under Application Exercise 3
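If you get stuck on the layering, here is a sketch of the same steps on different data (two of the datasaurus_dozen datasets rather than the penguins), so the exercise itself is still yours to do. The dataset choice, labels, and title are all made up for illustration.

```r
library(tidyverse)
library(datasauRus)

two_sets <- datasaurus_dozen |>
  filter(dataset %in% c("dino", "star"))

p_layers <- ggplot(two_sets, aes(x = dataset, y = x)) +
  # Hide the boxplot's own outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Jitter spreads the points sideways so they do not overlap
  geom_jitter(width = 0.2, alpha = 0.5) +
  # Hypothetical labels and title
  labs(
    title = "Distribution of x in two datasets",
    x = "Dataset",
    y = "x value"
  )

p_layers
```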