One-Way Analysis of Variance
AKA: One-Way ANOVA
I. Purpose:
A one-way analysis of variance compares the means of two or more independent (unrelated) groups of data. The null hypothesis of a one-way ANOVA states that the group means are all equal.
The ANOVA uses the F distribution, which is directly related to the t distribution: when only two groups are compared, the F statistic is the square of the t statistic (F = t²). However, the F test gives us more flexibility in how many group means we can compare. The t test is limited to two groups, while the ANOVA can evaluate two or more groups.
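The relationship between the two statistics can be checked numerically. The sketch below is a minimal illustration on invented data (the vectors group_a and group_b and the data frame two_groups are hypothetical, not part of the original data set). It runs a pooled-variance t test and a one-way ANOVA on the same two groups and compares the squared t statistic with F.

```r
# Hypothetical data: two independent groups (values invented for illustration)
group_a <- c(5, 7, 8, 6, 9)
group_b <- c(10, 12, 9, 11, 13)

two_groups <- data.frame(
  score = c(group_a, group_b),
  group = factor(rep(c("a", "b"), each = 5))
)

# Pooled-variance (Student's) t test on the two groups
t_stat <- unname(t.test(score ~ group, data = two_groups, var.equal = TRUE)$statistic)

# One-way ANOVA on the same two groups
f_stat <- summary(aov(score ~ group, data = two_groups))[[1]][["F value"]][1]

# Compare the squared t statistic with the F statistic
all.equal(t_stat^2, f_stat)
```

The equality relies on var.equal = TRUE; R's default t test (Welch) does not pool the variances and will generally not match the classical F.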
The ANOVA can only describe whether there is a difference among the groups, not where that difference lies. In an ANOVA run on two groups, any difference must be between those two groups. However, when the ANOVA is run on three or more groups, it is not obvious whether the difference is between the first and second groups, the second and third groups, the first and third groups, or among all of the groups. Because of this, post hoc tests are run after the ANOVA to determine where the difference is.
In this analysis, we will be using the Tukey HSD (honestly significant difference) post hoc test. The Tukey HSD test is run on each pair of groups (so three groups require three tests) to determine whether the two groups in that pair differ. The null hypothesis of the Tukey HSD test is that there is no difference between the two groups being analyzed.
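As a quick check of how many pairwise tests a given number of groups requires, base R's choose() and combn() can be used. The group labels below are taken from the scenario later in this document; the variable names are illustrative only.

```r
# Group labels (illustrative; they match the scenario's three student groups)
groups <- c("Undergrad", "Master", "Doctoral")

# Number of pairwise comparisons among k groups is k * (k - 1) / 2
n_pairs <- choose(length(groups), 2)
n_pairs                # 3

# The pairs themselves, one per column
pairs <- combn(groups, 2)
pairs
```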
II. Formula:
One-Way ANOVA
Tukey HSD
Where Ma is the larger mean and Mb is the smaller mean of the two groups that are being compared. SE is standard error.
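The two statistics named under this heading can be sketched in their standard textbook form. The symbols MS, SS, k, N, n, and q are assumptions introduced here and do not appear in the original; the standard error shown assumes equal group sizes.

```latex
% One-way ANOVA (k groups, N total observations)
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}, \qquad
MS_{\text{between}} = \frac{SS_{\text{between}}}{k - 1}, \qquad
MS_{\text{within}} = \frac{SS_{\text{within}}}{N - k}

% Tukey HSD (n observations per group)
q = \frac{M_a - M_b}{SE}, \qquad
SE = \sqrt{\frac{MS_{\text{within}}}{n}}
```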
III. Code in R:
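stacked <- stack(mydata)
Stacks the data into long format (one column of values, one column of group labels) so it can be used in the one-way test.
aov(VAR1 ~ VAR2, data = stacked)
This code runs the one-way ANOVA, where VAR1 is the outcome variable and VAR2 is the grouping variable.
aov.out <- aov(VAR1 ~ VAR2, data = stacked)
This code saves the output so we can run post hoc tests.
TukeyHSD(aov.out)
This code runs the Tukey HSD test on every pair of groups.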
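To make the stacking step concrete, here is a minimal sketch on a small invented data frame (the columns GroupA and GroupB and their values are hypothetical). It shows that stack() returns a column named values and a factor named ind, which are the names that take the place of VAR1 and VAR2 unless they are reassigned.

```r
# Hypothetical wide data: one column per group (values invented)
mydata <- data.frame(GroupA = c(1, 2, 3), GroupB = c(4, 5, 6))

# Long format: all measurements in one column, group labels in another
stacked <- stack(mydata)
stacked

names(stacked)        # column names produced by stack()
levels(stacked$ind)   # group labels, taken from the original column names

# The ANOVA formula can then refer to those columns directly
aov.out <- aov(values ~ ind, data = stacked)
```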
IV. Scenario:
Dr. Blank wants to examine the differences in the amount of time spent studying among his undergraduate, master's, and doctoral students. He asks them to report the average number of hours spent studying per week over the semester. Using these data, he wants to run a one-way ANOVA to determine whether there is a difference among the three groups of students. If there is a difference, he wants to run a Tukey HSD post hoc test to determine where the differences are.
Hypotheses being tested:
H0: All group means are equal.
μ undergrad = μ master = μ doctoral
H1: At least two of the group means differ, so at least one of the following holds:
μ undergrad ≠ μ master
μ master ≠ μ doctoral
μ undergrad ≠ μ doctoral
Instructions
1. Open the onewayaov.csv file in R
onewayaov <- read.csv("onewayaov.csv", header=TRUE)
2. View data in R
onewayaov
3. Run descriptive statistics
summary(onewayaov$Undergrad)
summary(onewayaov$Master)
summary(onewayaov$Doctoral)
4. Run descriptive statistics for standard deviation
sd(onewayaov$Undergrad)
sd(onewayaov$Master)
sd(onewayaov$Doctoral)
5. Stack the data
stacked<-stack(onewayaov)
6. Define the variables
hours <- stacked$values
student <- stacked$ind
7. Run a one-way ANOVA (by default, oneway.test() applies Welch's correction and does not assume equal variances)
oneway.test(hours ~ student)
8. Fit the ANOVA model with aov() and save the output (TukeyHSD() needs a fitted aov object)
aov.out <- aov(hours ~ student, data = stacked)
9. Run a Tukey HSD test
TukeyHSD(aov.out)
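Steps 7 through 9 use two different fits: oneway.test() defaults to Welch's ANOVA, while aov() fits the classical equal-variance model that TukeyHSD() builds on. The sketch below is one way to see the distinction and to reproduce a Tukey p-value by hand. The data frame is an invented stand-in for onewayaov.csv (its values are hypothetical and will not reproduce the results reported in this document), and the hand calculation assumes equal group sizes.

```r
# Hypothetical stand-in for onewayaov.csv (values invented for illustration)
onewayaov <- data.frame(
  Undergrad = c(10, 12, 9, 11, 13),
  Master    = c(14, 15, 13, 16, 12),
  Doctoral  = c(15, 22, 17, 25, 20)
)
stacked <- stack(onewayaov)
hours   <- stacked$values
student <- stacked$ind

# Step 7 as written: Welch's ANOVA (var.equal = FALSE is the default)
welch <- oneway.test(hours ~ student)
welch$parameter   # the denominator df is generally not a whole number

# Classical ANOVA: var.equal = TRUE gives the same F as aov()
classic   <- oneway.test(hours ~ student, var.equal = TRUE)
aov.out   <- aov(hours ~ student, data = stacked)
classic_f <- unname(classic$statistic)
aov_f     <- summary(aov.out)[[1]][["F value"]][1]
same_f    <- isTRUE(all.equal(classic_f, aov_f))

# Tukey HSD by hand for Doctoral vs. Undergrad (equal group sizes assumed)
k         <- nlevels(student)              # number of groups
n         <- nrow(onewayaov)               # observations per group
df_within <- df.residual(aov.out)          # N - k
ms_within <- sum(residuals(aov.out)^2) / df_within
means     <- tapply(hours, student, mean)
q_stat    <- abs(means[["Doctoral"]] - means[["Undergrad"]]) / sqrt(ms_within / n)
p_manual  <- ptukey(q_stat, nmeans = k, df = df_within, lower.tail = FALSE)

# Compare with the adjusted p-value reported by TukeyHSD()
p_builtin <- TukeyHSD(aov.out)$student["Doctoral-Undergrad", "p adj"]
same_p    <- isTRUE(all.equal(p_manual, p_builtin))
```

If the write-up should report the classical F test, passing var.equal = TRUE in step 7 keeps the omnibus test and the post hoc test on the same model.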
V. Results Write-Up
One-Way Analysis of Variance
A one-way analysis of variance (with Welch's correction, the oneway.test default) was run to determine whether undergraduate, master's, and doctoral students differed in the number of hours they spent studying per week. A significant difference was found among the three groups, F(2, 17.89) = 10.95, p < .01.
Tukey HSD Test
A Tukey HSD test was run to determine where the difference in study hours lay among undergraduate, master's, and doctoral students. There was a significant difference in the number of hours studied between undergraduate and doctoral students (p < .01) and between master's and doctoral students (p = .04). No significant difference was found between undergraduate and master's students (p = .06).
Reference/Citation:
Caddick, Z., Leonard, M., and Laraway, S. Statistics in R. 2014.