{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"library(repr) ; options(repr.plot.res = 100, repr.plot.width = 6, repr.plot.height = 5) # Change plot sizes (in cm) - this bit of code is only relevant if you are using a jupyter notebook - ignore otherwise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear Models: Multiple explanatory variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"In this chapter we will explore fitting a linear model to data when you have multiple explanatory (predictor) variables. \n",
"\n",
"The aims of this chapter are[$^{[1]}$](#fn1):\n",
"\n",
"* Learning to build and fit a linear model that includes several explanatory variables\n",
"\n",
"* Learning to interpret the summary tables and diagnostics after fitting a linear model with multiple explanatory variables\n",
"\n",
"## An example\n",
"\n",
"The models we looked at in the [ANOVA chapter](15-anova.ipynb) explored whether the log genome size (C value, in picograms) of terrestrial mammals varied with trophic level and whether or not the species is ground dwelling. We will now look at a single model that includes both explanatory variables.\n",
"\n",
"The first thing to do is look at the data again. \n",
"\n",
"### Exploring the data\n",
"\n",
"$\\star$ Create a new blank script called `MulExpl.R` in your `Code` directory and add some introductory comments.\n",
"\n",
"$\\star$ Load the data saved at the end of the [ANOVA chapter](15-anova.ipynb):"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"load('../data/mammals.Rdata')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look back at the end of the previous chapter to see how you saved the RData file. If `mammals.Rdata` is missing, just import the data again using `read.csv` and add the `log C Value` column to the imported data frame again (go back to the [ANOVA chapter](15-anova.ipynb) and have a look if you have forgotten how).\n",
"\n",
"Use `ls()`, and then `str` to check that the data has loaded correctly:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'data.frame':\t379 obs. of 10 variables:\n",
" $ Binomial : Factor w/ 379 levels \"Acinonyx jubatus\",..: 1 2 3 4 5 6 7 8 9 10 ...\n",
" $ meanCvalue : num 2.56 2.64 3.75 3.7 3.98 4.69 2.15 2.43 2.73 2.92 ...\n",
" $ Order : Factor w/ 21 levels \"Artiodactyla\",..: 2 17 17 17 1 1 4 17 17 17 ...\n",
" $ AdultBodyMass_g: num 50500 41.2 130 96.5 94700 52300 15 25.3 50.5 33 ...\n",
" $ DietBreadth : int 1 NA 2 NA 5 2 NA 4 NA NA ...\n",
" $ HabitatBreadth : int 1 NA 2 2 1 1 1 2 NA 1 ...\n",
" $ LitterSize : num 2.99 2.43 3.07 NA 1 1 0.99 4.59 3.9 3.77 ...\n",
" $ GroundDwelling : Factor w/ 2 levels \"No\",\"Yes\": 2 NA 2 2 2 2 1 2 NA 2 ...\n",
" $ TrophicLevel : Factor w/ 3 levels \"Carnivore\",\"Herbivore\",..: 1 NA 2 NA 2 2 NA 3 NA NA ...\n",
" $ logCvalue : num 0.94 0.971 1.322 1.308 1.381 ...\n"
]
}
],
"source": [
"str(mammals)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Previously](14-regress.ipynb), we asked if carnivores or herbivores had larger genomes. Now we want to ask questions like: do ground-dwelling carnivores have larger genomes than arboreal or flying omnivores? We need to look at plots within groups.\n",
"\n",
"Before we do that, there is a lot of missing data in the data frame and we should make sure that we are using the same data for our plots and models. We will subset the data down to the complete data for the three variables:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'data.frame':\t259 obs. of 3 variables:\n",
" $ GroundDwelling: Factor w/ 2 levels \"No\",\"Yes\": 2 2 2 2 2 1 2 1 1 1 ...\n",
" $ TrophicLevel : Factor w/ 3 levels \"Carnivore\",\"Herbivore\",..: 1 2 2 2 3 3 3 2 2 3 ...\n",
" $ logCvalue : num 0.94 1.322 1.381 1.545 0.888 ...\n"
]
}
],
"source": [
"mammals <- subset(mammals, select = c(GroundDwelling, TrophicLevel, \n",
"logCvalue))\n",
"mammals <- na.omit(mammals)\n",
"str(mammals)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boxplots within groups\n",
"\n",
"[Previously](14-regress.ipynb), we used the `subset` option to fit a model just to dragonflies. You can use `subset` with plots too.\n",
"\n",
"$\\star$ Add `par(mfrow=c(1,2))` to your script to split the graphics into two panels.\n",
"\n",
"$\\star$ Copy over and modify the code from the [ANOVA chapter](15-anova.ipynb) to create a boxplot of genome size by trophic level into your script.\n",
"\n",
"$\\star$ Now further modify the code to generate the plots shown in the figure below (you will have to `subset` your data for this, and also use the subset option of the `plot` command).\n",
"\n",
"---\n",
"\n",
"\n",
"
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| | <int> | <dbl> | <dbl> | <dbl> | <dbl> |
| TrophicLevel | 2 | 0.8141063 | 0.40705316 | 7.859815 | 4.870855e-04 |
| GroundDwelling | 1 | 2.7469218 | 2.74692183 | 53.040485 | 4.062981e-12 |
| Residuals | 255 | 13.2062341 | 0.05178915 | NA | NA |