Week 8 practice: Linear modeling and broom

To practice the new tests we’ve learned, we’ll use the built-in mtcars dataset.

Test Practice

correlation

What is the correlation between horsepower and weight? Is it significant?

mtcars %$% cor.test(hp, wt) %>% tidy()
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr> 
## 1    0.659      4.80 4.15e-5        30    0.403     0.819 Pears…
## # ... with 1 more variable: alternative <chr>


linear model

Plot

Plot a scatterplot of the rear axle ratio vs. fuel displacement. Does the relationship look linear?

ggplot(mtcars, aes(x = drat, y = disp)) + 
  geom_point(size = 3) +
  labs(x = 'Rear axle ratio', y = 'Displacement (cu. in.)') +
  theme_classic() 


Test

Since the relationship looks approximately linear, test it with lm()

lm(disp ~ drat, data = mtcars) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)     823.     108.       7.60 0.0000000179
## 2 drat           -165.      29.8     -5.53 0.00000528


ANOVA

Do an ANOVA of quarter mile time over the number of cylinders in the the engine.

aov(qsec ~ cyl, data = mtcars) %>% tidy()
## # A tibble: 2 x 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 cyl           1  34.6  34.6       16.1  0.000366
## 2 Residuals    30  64.4   2.15      NA   NA


Now take the same ANOVA from the chunk above and add a post-hoc Tukey test to it.

aov(qsec ~ as.factor(cyl), data = mtcars) %>% TukeyHSD() %>% tidy()
## # A tibble: 3 x 6
##   term           comparison estimate conf.low conf.high adj.p.value
##   <chr>          <chr>         <dbl>    <dbl>     <dbl>       <dbl>
## 1 as.factor(cyl) 6-4           -1.16    -2.94     0.619     0.257  
## 2 as.factor(cyl) 8-4           -2.37    -3.85    -0.883     0.00133
## 3 as.factor(cyl) 8-6           -1.20    -2.91     0.498     0.205


broom

Pick one of the three tests you did above and practice tidying it up with the three broom functions!


tidy()

Add tidy() onto end of your test.

aov(qsec ~ cyl, data = mtcars) %>% tidy()
## # A tibble: 2 x 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 cyl           1  34.6  34.6       16.1  0.000366
## 2 Residuals    30  64.4   2.15      NA   NA


glance()

Add glance() onto the end of your test to look at the model parameters.

aov(cyl ~ qsec, data = mtcars) %>% glance()
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
## *     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.350         0.328  1.46      16.1 3.66e-4     2  -56.6  119.  124.
## # ... with 2 more variables: deviance <dbl>, df.residual <int>


augment()

Add the results of the test back to mtcars using augment()

aov(cyl ~ qsec, data = mtcars) %>% augment(mtcars)
## # A tibble: 32 x 19
##    .rownames   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
##  * <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Mazda RX4  21       6  160    110  3.9   2.62  16.5     0     1     4
##  2 Mazda RX…  21       6  160    110  3.9   2.88  17.0     0     1     4
##  3 Datsun 7…  22.8     4  108     93  3.85  2.32  18.6     1     1     4
##  4 Hornet 4…  21.4     6  258    110  3.08  3.22  19.4     1     0     3
##  5 Hornet S…  18.7     8  360    175  3.15  3.44  17.0     0     0     3
##  6 Valiant    18.1     6  225    105  2.76  3.46  20.2     1     0     3
##  7 Duster 3…  14.3     8  360    245  3.21  3.57  15.8     0     0     3
##  8 Merc 240D  24.4     4  147.    62  3.69  3.19  20       1     0     4
##  9 Merc 230   22.8     4  141.    95  3.92  3.15  22.9     1     0     4
## 10 Merc 280   19.2     6  168.   123  3.92  3.44  18.3     1     0     4
## # ... with 22 more rows, and 8 more variables: carb <dbl>, .fitted <dbl>,
## #   .se.fit <dbl>, .resid <dbl>, .hat <dbl>, .sigma <dbl>, .cooksd <dbl>,
## #   .std.resid <dbl>


Use the augmented table from the chunk above to plot something.

aov(cyl ~ qsec, data = mtcars) %>% augment(mtcars) %>%
  ggplot(aes(x = qsec, y = .resid, color = cyl)) +
  geom_point()



Come Up with Your Own Questions

Use the tests talked about this week, cor(), cor.test(), lm(), aov(), and/or TukeyHSD() to ask two more questions about the mtcars dataset.


Question 1

Plot

Look at mtcars again. What’s another question you could ask about it? Plot the variable(s) you’re interested in below

ggplot(mtcars, aes(x = mpg, color = as.factor(vs))) +
  geom_density(size = 1) +
  labs(x = 'miles per gallon', y = '') + 
  scale_color_discrete(name = 'engine ehape', labels = c('v-shaped', 'straight')) +
  theme_classic()


Ask a Question

Based on the variables you just plotted, what’s your question?

Does engine shape affect the number of miles per gallon a car gets?


Test

Pick one of the tests to answer your question and run it in the chunk below.

aov(mpg ~ as.factor(vs), data = mtcars) %>% tidy()
## # A tibble: 2 x 6
##   term             df sumsq meansq statistic    p.value
##   <chr>         <dbl> <dbl>  <dbl>     <dbl>      <dbl>
## 1 as.factor(vs)     1  497.  497.       23.7  0.0000342
## 2 Residuals        30  630.   21.0      NA   NA


Answer the Question

What do you conclude from your test?

With a p-value of 0.0000342, it’s unlikely that the difference in mpg between the different engine types is due to chance.


Question 2

Plot

Look at mtcars again. What’s another question you could ask about it? Plot the variable(s) you’re interested in below

ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(size = 3) +
  labs(x = 'displacement (cu. in.)', y = 'gross horsepower') +
  theme_classic()


Ask a Question

Based on the variables you just plotted, what’s your question?

Can displacement predict horsepower?


Test

Pick one of the tests to answer your question and run it in the chunk below.

lm(hp ~ disp, data = mtcars) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)   45.7     16.1         2.84 0.00811     
## 2 disp           0.438    0.0618      7.08 0.0000000714


Answer the Question

What do you conclude from your test?

With a p-value of 0.00811 for the y-intercept and a p-value of 7.14e-8 for the slope, displacement can make a good prediction of the car’s horsepower.