This page will show you how you can analyze and visualize 2 variables and their relationships using R.

Explanation

Contingency tables (a.k.a. crosstabs): tabyl

Use the tabyl() function from the janitor package. The first line creates the table. The rest formats the table (giving it a title, adding totals etc…).

library(janitor)
mtcars %>% 
  tabyl(am, gear, show_na = FALSE) %>% 
  adorn_title("combined") %>% #both var names in the title
  adorn_totals("col") %>% #column totals
  adorn_totals("row") %>% #row totals
  adorn_percentages("col") %>% #columnwise, rowwise (row), or total percentages (all)
  adorn_pct_formatting(digits = 1) %>%
  adorn_ns() #show the numbers of cases

Visualizing relationships between variables: ggplot

For a bivariate visualizations:

# A boxplot comparing different species (nominal variable)
iris %>% 
  ggplot(aes(x = Species, y = Petal.Width)) +
  geom_boxplot()

# two scale variables, creating a scatterplot
mtcars %>% 
  ggplot(aes(x = cyl, y = mpg)) + 
  geom_point()

# adding a regression line
mtcars %>% 
  ggplot(aes(x = cyl, y = mpg)) + 
  geom_point() +
  geom_smooth(method = "lm", se = F)

Descriptive and inferential statistics for associations (parametric)

For Pearson’s r, we use

cor.test(x = mtcars$mpg, 
    y = mtcars$disp,
    method = "pearson")

For an independent sample t-test (a linear model with a dummy as independent variable), in which both groups can be assumed to have an equal variance, we use:

# vs is a dummy and disp is a ratio variable in the data set mtcars
t.test(disp ~ vs, mtcars, var.equal = TRUE) 
# or use
lm(disp ~ vs, mtcars) %>% summary()

For an independent sample t-test in which both groups cannot be assumed to have equal variances, we use:

# vs is a dummy and disp is a ratio variable in the data set mtcars
t.test(disp ~ vs, mtcars, var.equal = FALSE) 

When studying the effect of nominal variable on a ratio variable (ANOVA), use:

# spray is a nominal variable (stored as a factor) and 
# count can be treated as a ratio variable 
# in the data set InsectSprays

oneway.test(count ~ spray, InsectSprays, var.equal = TRUE)

# or use
lm(count ~ spray, InsectSprays) %>% summary()

For a test in which groups cannot be assumed to have equal variances, use:

oneway.test(count ~ spray, InsectSprays, var.equal = FALSE)

Testing whether variances are the same: Levene’s test

Sometimes it may be wise to actually TEST whether two different variances of two (or more) different groups may come from the same population. One of the tests use for this is Levene’s Test. In R, you use the leveneTest() function from the car package. You define the dependent variable, here mpg, and the independent variable, here am.

# library(car)
mtcars %>% leveneTest(mpg ~ factor( am ), 
                      data = .)

When Levene’s Test is NOT significant, you can assume the groups have equal variances.

Descriptive and inferential statistics for associations (non parametric)

For Cramer’s V, we use

# install and load the package "rcompanion" first, this contains the command cramerV.
# this function works on tables, so create a table first

library("rcompanion")
my_table <- table(mtcars$vs, mtcars$am)
cramerV(my_table, ci = TRUE)

For Kendall’s tau b, we use

cor.test(x = mtcars$mpg, 
    y = mtcars$gear,
    method = "kendall")

For Spearman’s rho, we use

cor.test(x = mtcars$mpg, 
    y = mtcars$disp,
    method = "spearman")

For a chi-square test, we use

# assuming the data are in vector x and vector y2
csq <- chisq.test(data$x, data$y2)
csq

# after calculating chi square by hand, 
# you can find the associated p-value by using:
chisq <- 3.86 # put the number you find here
df <- 1 # put the degrees of freedom here
pchisq(chisq, df, lower.tail = FALSE) # this gives the p-value

For a non-parametric test for comparing the medians of two groups, we use

mtcars %>% 
  wilcox.test(mpg ~ am, 
              data = .,
              exact = FALSE)

For a non-parametric test for comparing the medians of three or more groups, we use

airquality %>% 
  kruskal.test(Ozone ~ Month, data = .)

Examples

No examples available yet

Functions

No function overview yet

FAQ

No Frequently Asked Questions yet

Resources

No resources yet