This page shows how to analyze relationships among more than two variables using R.
Use the tabyl() function from the janitor package. Several adorn_*() functions let you adjust the table to your liking. You can force the table to display the values 'none', 'some' and 'many' in the correct order by making sure the variable is stored as an ordered factor (see chapter 2).
library(janitor)

mtcars %>%
  tabyl(am, gear, cyl, show_na = FALSE) %>%
  adorn_title("combined") %>%          # both variable names in the title
  adorn_totals("col") %>%              # column totals
  adorn_totals("row") %>%              # row totals
  adorn_percentages("col") %>%         # column-wise ("col"), row-wise ("row") or total ("all") percentages
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns()                           # show the numbers of cases
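A minimal sketch of the ordered-factor step mentioned above (the variable name and its values 'none', 'some' and 'many' are hypothetical):

```r
# hypothetical character variable with a natural order
amount <- c("some", "none", "many", "none")

# store it as an ordered factor so tabyl() respects that order
amount <- factor(amount,
                 levels = c("none", "some", "many"),
                 ordered = TRUE)

levels(amount)  # "none" "some" "many"
```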
If you want to visualize different groups in the scatterplot, you can add extra variables to the aesthetics of a ggplot. In addition to x and y, you can use color (the “outside” color of points), fill (the “inside” color of the points), shape (of the points) and size.
mtcars %>%
  ggplot(aes(x = cyl,
             y = mpg,
             color = factor(gear))) +
  geom_point()
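The other aesthetics mentioned above can be combined in a single plot; this sketch maps color, shape and size at once (the choice of variables is purely illustrative):

```r
library(dplyr)
library(ggplot2)

mtcars %>%
  ggplot(aes(x = wt,
             y = mpg,
             color = factor(gear),   # "outside" color of the points
             shape = factor(am),     # shape of the points
             size  = hp)) +          # size of the points
  geom_point()
```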
For separate subplots per group, add the layer facet_wrap(~ variable). This is another way to introduce a third variable into a data visualization.
mpg %>%
  ggplot(aes(x = factor(cyl), y = hwy)) +
  geom_boxplot() +
  facet_wrap(~ year)
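A related option, not shown above, is facet_grid(), which arranges the subplots along two faceting variables at once (the variables chosen here are illustrative):

```r
library(ggplot2)

# rows of subplots by drive type, columns by model year
ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot() +
  facet_grid(drv ~ year)
```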
For the ordinary linear model, we use lm(). For a clear presentation of the regression table, we use the tidy() function from the broom package.
library(broom)

model <- mtcars %>%
  lm(qsec ~ wt + cyl, data = .)

# the regression table
model %>%
  tidy()

# the ANOVA table
model %>%
  anova() %>%
  tidy()

# overall output, including R squared, is provided by
model %>%
  summary()
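If you prefer the model-level statistics (R squared, F statistic, and so on) as a tidy one-row data frame instead of printed summary output, broom also offers glance():

```r
library(broom)

model <- lm(qsec ~ wt + cyl, data = mtcars)
glance(model)  # one row: r.squared, adj.r.squared, statistic, p.value, ...
```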
There are various ways to create plots with the residuals and/or the predicted values.
# always create the model you want to diagnose first
model <- mtcars %>%
  lm(qsec ~ wt + cyl, data = .)

# the object 'model' now contains the residuals and the predicted values;
# they can be added to the data frame in two ways:

# directly
mtcars$residuals <- model$residuals
mtcars$predictions <- model$fitted.values

# or by using the modelr package
library(modelr)
mtcars <- mtcars %>%
  add_predictions(model) %>%
  add_residuals(model)

# after adding the residuals and the predicted values, a residual plot
# is an ordinary ggplot
mtcars %>%
  ggplot(aes(x = pred, y = resid)) +
  geom_point()
There are various ways to test for equal variances in a model. These are available in several packages, including car and lmtest. Levene's test is introduced in chapter 4.
# Breusch-Pagan Test
library(lmtest)
# estimate the model first and then:
bptest(model)
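Levene's test (see chapter 4) is the car counterpart; a sketch with an illustrative grouping variable:

```r
library(car)

# test for equal variances of qsec across the cyl groups
# (the grouping variable is chosen for illustration only)
leveneTest(qsec ~ factor(cyl), data = mtcars)
```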
For outliers and influential cases, use:
# after estimating and storing a model, leverage as measured by the hat
# values can be added to the original data frame:
data$leverage_cases_model_1 <- hatvalues(model_1)

# influence as measured by Cook's distance can be added the same way:
data$influential_cases_model_1 <- cooks.distance(model_1)

# one of the base R diagnostic plots of the model also shows Cook's distances
plot(model_1)
After estimating the model, multicollinearity can be detected using the car package.
library(car)
vif(model)
For the linear mixed model, we use the lmer() function from the lme4 package. Note that the output shows neither p-values nor residual degrees of freedom for the fixed effects. This is for a good reason.
library(lme4)
library(broom.mixed)  # tidy() methods for mixed models live here

mtcars %>%
  lmer(qsec ~ wt + (1|gear), data = .) %>%
  tidy()

# when we have factors as fixed variables:
mtcars %>%
  lmer(qsec ~ wt + factor(cyl) + (1|gear), data = .) %>%
  anova() %>%
  tidy()
If you want approximate p-values, you can use Satterthwaite's degrees-of-freedom method, implemented in the lmerTest package. The same method is used in SPSS.
library(lmerTest)

mtcars %>%
  lmer(qsec ~ wt + (1|gear), data = .) %>%
  summary()
For a residual plot, we can use syntax similar to that of the linear model, again using modelr:
model <- mtcars %>%
  lmer(qsec ~ wt + (1|gear), data = .)

mtcars %>%
  add_predictions(model) %>%
  add_residuals(model) %>%
  ggplot(aes(x = pred, y = resid)) +
  geom_point()
For a logistic regression model, we use the glm() function.
model_lr <- mtcars %>%
  glm(am ~ wt, family = binomial, data = .)

broom::tidy(model_lr)
summary(model_lr)

# to find the predicted probabilities for units with scores 2, 3 and 5
# on the wt variable:
wt <- c(2, 3, 5)  # the vector must have the same name as the predictor in the model
new_data <- as.data.frame(wt)
predict(model_lr, new_data, type = "response")
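To read the logistic coefficients as odds ratios rather than log-odds, tidy() can exponentiate them:

```r
library(broom)

model_lr <- glm(am ~ wt, family = binomial, data = mtcars)

# estimates (and confidence intervals) on the odds-ratio scale
tidy(model_lr, exponentiate = TRUE, conf.int = TRUE)
```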
Similarly for a Poisson regression.
mtcars %>%
  glm(carb ~ wt, family = poisson, data = .) %>%
  tidy()
For a non-parametric test for repeated measures, we use friedman.test():

iris %>%
  select(Sepal.Length, Petal.Length) %>%
  as.matrix() %>%
  friedman.test()