## Calculating Analysis of Variance (ANOVA) and Post Hoc Analyses Following ANOVA

Analysis of variance (ANOVA) is a statistical procedure that compares data between two or more groups or conditions to investigate the presence of differences between those groups on some continuous dependent variable (see Exercise 18). In this exercise, we will focus on the one-way ANOVA, which involves testing one independent variable and one dependent variable (as opposed to other types of ANOVAs, such as factorial ANOVAs that incorporate multiple independent variables).

Why ANOVA and not a t-test? Remember that a t-test is formulated to compare two sets of data or two groups at one time (see Exercise 23 for guidance on selecting appropriate statistics). Thus, data generated from a clinical trial that involves four experimental groups (Treatment 1, Treatment 2, Treatments 1 and 2 combined, and a Control) would require six t-tests, one for each pair of groups. Consequently, the chance of making a Type I error (alpha error) increases substantially (or is inflated) because so many comparisons are being performed. Specifically, the chance of making at least one Type I error can be as high as the number of comparisons multiplied by the alpha level (here, 6 × 0.05 = 0.30). Thus, ANOVA is the recommended statistical technique for examining differences between more than two groups (Zar, 2010).
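The inflation of the Type I error rate described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original exercise: the function name `familywise_error` is hypothetical, and the second rate it returns assumes the comparisons are independent, which pairwise t-tests on shared groups are not, so it should be read only as a rough guide.

```python
from math import comb


def familywise_error(n_groups, alpha=0.05):
    """Return (pairwise comparisons, upper bound, rate if tests were independent).

    Hypothetical helper for illustration only.
    """
    # Number of distinct pairs of groups, e.g. 4 groups -> 6 t-tests.
    k = comb(n_groups, 2)
    # Upper bound on the chance of at least one Type I error: k * alpha.
    upper_bound = min(1.0, k * alpha)
    # Chance of at least one Type I error ASSUMING independent tests.
    independent_rate = 1 - (1 - alpha) ** k
    return k, upper_bound, independent_rate


k, bound, rate = familywise_error(4)
print(k, round(bound, 2), round(rate, 3))
```

For the four-group trial, this gives six comparisons, an upper bound of 0.30, and a rate of roughly 0.26 under the independence assumption, both far above the nominal 0.05.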

ANOVA is a procedure that culminates in a statistic called the F statistic. It is this value that is compared against an F distribution (see Appendix C) in order to determine whether the groups significantly differ from one another on the dependent variable. The formulas for ANOVA actually compute two estimates of variance: One estimate represents differences between the groups/conditions, and the other estimate represents differences among (within) the data.

### Research Designs Appropriate for the One-Way ANOVA

Research designs that may utilize the one-way ANOVA include the randomized experimental, quasi-experimental, and comparative designs (Gliner, Morgan, & Leech, 2009). The independent variable (the “grouping” variable for the ANOVA) may be active or attributional. An active independent variable refers to an intervention, treatment, or program. An attributional independent variable refers to a characteristic of the participant, such as gender, diagnosis, or ethnicity. The ANOVA can compare two groups or more. In the case of a two-group design, the researcher can either select an independent samples t-test or a one-way ANOVA to answer the research question. The results will always yield the same conclusion, regardless of which test is computed; however, when examining differences between more than two groups, the one-way ANOVA is the preferred statistical test.

Example 1: A researcher conducts a randomized experimental study wherein she randomizes participants to receive a high-dosage weight loss pill, a low-dosage weight loss pill, or a placebo. She assesses the number of pounds lost from baseline to post-treatment for the three groups. Her research question is: “Is there a difference between the three groups in weight lost?” The independent variable is the treatment condition (high-dose weight loss pill, low-dose weight loss pill, or placebo) and the dependent variable is number of pounds lost over the treatment span.

Null hypothesis: There is no difference in weight lost among the high-dose weight loss pill, low-dose weight loss pill, and placebo groups in a population of overweight adults.

Example 2: A nurse researcher working in dermatology conducts a retrospective comparative study wherein she conducts a chart review of patients and divides them into three groups: psoriasis, psoriatic symptoms, or control. The dependent variable is health status and the independent variable is disease group (psoriasis, psoriatic symptoms, and control). Her research question is: “Is there a difference between the three groups in levels of health status?”

Null hypothesis: There is no difference between the three groups in health status.

### Statistical Formula and Assumptions

Use of the ANOVA involves the following assumptions (Zar, 2010):

1. Sample means from the population are normally distributed.

2. The groups are mutually exclusive.

3. The dependent variable is measured at the interval/ratio level.

4. The groups should have equal variance, termed “homogeneity of variance.”

The dependent variable in an ANOVA must be scaled as interval or ratio. If the dependent variable is measured with a Likert scale and the frequency distribution is approximately normally distributed, these data are usually considered interval-level measurements and are appropriate for an ANOVA (de Winter & Dodou, 2010; Rasmussen, 1989).

The basic formula for the F without numerical symbols is:

$$F=\frac{\text{Mean Square Between Groups}}{\text{Mean Square Within Groups}}$$

The term “mean square” (MS) is used interchangeably with the word “variance.” The formulas for ANOVA compute two estimates of variance: the between groups variance and the within groups variance. The between groups variance represents differences between the groups/conditions being compared, and the within groups variance represents differences among (within) each group’s data. Therefore, the formula is F = MS between/MS within.

### Hand Calculations

The following example uses data from a study of students enrolled in an RN to BSN program, in which a subset of graduates was examined (Mancini, Ashwill, & Cipher, 2014). The data are presented in Table 33-1. A simulated subset was selected for this example so that the computations would be small and manageable. In actuality, studies involving one-way ANOVAs need to be adequately powered (Aberson, 2010; Cohen, 1988). See Exercises 24 and 25 for more information regarding statistical power.

TABLE 33-1

MONTHS FOR COMPLETION OF RN TO BSN PROGRAM BY HIGHEST DEGREE STATUS

| Participant # | Associate’s Degree | Participant # | Bachelor’s Degree | Participant # | Master’s Degree |
|---|---|---|---|---|---|
| 1 | 17 | 10 | 16 | 19 | 17 |
| 2 | 19 | 11 | 15 | 20 | 21 |
| 3 | 24 | 12 | 16 | 21 | 20 |
| 4 | 18 | 13 | 12 | 22 | 21 |
| 5 | 24 | 14 | 16 | 23 | 12 |
| 6 | 24 | 15 | 12 | 24 | 16 |
| 7 | 16 | 16 | 16 | 25 | 20 |
| 8 | 16 | 17 | 12 | 26 | 18 |
| 9 | 20 | 18 | 10 | 27 | 12 |

The independent variable in this example is highest degree obtained prior to enrollment (Associate’s, Bachelor’s, or Master’s degree), and the dependent variable is the number of months it took for the student to complete the RN to BSN program. The null hypothesis is “There is no difference between the groups (highest degree of Associate’s, Bachelor’s, or Master’s) in the months these nursing students require to complete an RN to BSN program.”

The computations for the ANOVA are as follows:

Step 1: Compute correction term, C.

Square the grand sum (G), and divide by total N:

$$C=\frac{460^2}{27}=7{,}837.04$$

Step 2: Compute Total Sum of Squares.

Square every value in dataset, sum, and subtract C:

$$(17^2+19^2+24^2+18^2+24^2+24^2+16^2+16^2+\dots+12^2)-7{,}837.04=8{,}234-7{,}837.04=396.96$$

Step 3: Compute Between Groups Sum of Squares.

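Square the sum of each column and divide by the number of participants in that group (n = 9). Add these values together, and then subtract C:

$$\frac{178^2}{9}+\frac{125^2}{9}+\frac{157^2}{9}-7{,}837.04=(3{,}520.44+1{,}736.11+2{,}738.78)-7{,}837.04=158.29$$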

Step 4: Compute Within Groups Sum of Squares.

Subtract the Between Groups Sum of Squares (Step 3) from Total Sum of Squares (Step 2):

396.96−158.29=238.67

Step 5: Create ANOVA Summary Table (see Table 33-2).

a. Insert the sum of squares values in the first column.

b. The degrees of freedom are in the second column. Because the F is a ratio of two separate statistics (mean square between groups and mean square within groups), each has its own df formula, one for the numerator and one for the denominator:

Mean square between groups df = number of groups − 1

Mean square within groups df = N − number of groups

For this example, the df for the numerator is 3 − 1 = 2.

The df for the denominator is 27 − 3 = 24.

c. The mean square between groups and mean square within groups are in the third column. These values are computed by dividing the SS by the df. Therefore, the MS between = 158.29 ÷ 2 = 79.15. The MS within = 238.67 ÷ 24 = 9.94.

d. The F is the final column and is computed by dividing the MS between by the MS within. Therefore, F = 79.15 ÷ 9.94 = 7.96.

TABLE 33-2

ANOVA SUMMARY TABLE

| Source of Variation | SS | df | MS | F |
|---|---|---|---|---|
| Between Groups | 158.29 | 2 | 79.15 | 7.96 |
| Within Groups | 238.67 | 24 | 9.94 | |
| Total | 396.96 | 26 | | |
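Steps 1 through 5 can be checked with a few lines of code. The sketch below simply mirrors the hand calculations using the Table 33-1 data. The variable names are illustrative and not part of the original exercise.

```python
# Months to completion from Table 33-1, keyed by highest degree at enrollment.
groups = {
    "Associate's": [17, 19, 24, 18, 24, 24, 16, 16, 20],
    "Bachelor's": [16, 15, 16, 12, 16, 12, 16, 12, 10],
    "Master's": [17, 21, 20, 21, 12, 16, 20, 18, 12],
}

all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)   # N = 27
n_groups = len(groups)      # 3 groups

# Step 1: correction term C = (grand sum)^2 / N.
correction = sum(all_values) ** 2 / n_total

# Step 2: total sum of squares = sum of squared values - C.
ss_total = sum(x ** 2 for x in all_values) - correction

# Step 3: between groups SS = sum of (group sum)^2 / group n, minus C.
ss_between = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction

# Step 4: within groups SS = total SS - between groups SS.
ss_within = ss_total - ss_between

# Step 5: degrees of freedom, mean squares, and the F ratio.
df_between = n_groups - 1
df_within = n_total - n_groups
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}")
print(f"F({df_between},{df_within}) = {f_stat:.2f}")
```

Because the code carries full precision through every step, the between groups sum of squares prints as 158.30 rather than the 158.29 obtained by hand from rounded intermediate values. The F ratio still rounds to 7.96.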

Step 6: Locate the critical F value on the F distribution table (see Appendix C) and compare it to our obtained F value of 7.96. The critical F value for 2 and 24 df at α = 0.05 is 3.40. Because the obtained F exceeds this critical value, the F value in this example is statistically significant. Researchers report ANOVA results in a study report using the following format: F(2,24) = 7.96, p < 0.05. Researchers often report the exact p value instead of “p < 0.05,” but this usually requires the use of computer software due to the tedious nature of p value computations.

Our obtained F = 7.96 exceeds the critical value in the table, which indicates that the F is statistically significant and that the population means are not all equal. Therefore, we can reject our null hypothesis that the three groups spent the same amount of time completing the RN to BSN program. However, the F value does not identify which groups differ significantly from one another. Further testing, termed multiple comparison tests or post hoc tests, is required to complete the ANOVA process and determine all the significant differences among the study groups.
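The Step 6 comparison can also be checked numerically. When the numerator df equals 2, as in this three-group example, the upper tail of the F distribution has a simple closed form, P(F ≥ f) = (1 + 2f/df within)^(−df within/2). The sketch below relies on that special case only. The function names are hypothetical, and for any other numerator df a general-purpose statistical routine would be needed instead.

```python
def f_upper_tail_df1_2(f_value, df_within):
    """P(F >= f_value) for an F distribution with 2 and df_within df.

    Valid ONLY when the numerator df is exactly 2 (three groups).
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_2(alpha, df_within):
    """Critical F for 2 and df_within df, found by inverting the formula above."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


f_obtained = 7.96   # from Table 33-2
df_within = 24

critical_f = f_critical_df1_2(0.05, df_within)
p_value = f_upper_tail_df1_2(f_obtained, df_within)

print(f"critical F = {critical_f:.2f}")
print(f"p = {p_value:.3f}")
print("significant" if f_obtained > critical_f else "not significant")
```

The critical value agrees with the tabled 3.40, and the p value rounds to 0.002.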

#### Post Hoc Tests

Post hoc tests have been developed specifically to determine the location of group differences after ANOVA is performed on data from more than two groups. These tests were developed to reduce the incidence of a Type I error. Frequently used post hoc tests are the Newman-Keuls test, the Tukey Honestly Significant Difference (HSD) test, the Scheffé test, and the Dunnett test (Zar, 2010; see Exercise 18 for examples). When these tests are calculated, the alpha level is reduced in proportion to the number of additional tests required to locate statistically significant differences. For example, for several of the aforementioned post hoc tests, if many groups’ mean values are being compared, the magnitude of the difference is set higher than if only two groups are being compared. Thus, post hoc tests are tedious to perform by hand and are best handled with statistical computer software programs. Accordingly, the rest of this example will be presented with the assistance of SPSS.

### SPSS Computations

The following screenshot is a replica of what your SPSS window will look like. The data for ID numbers 24 through 27 are viewable by scrolling down in the SPSS screen.

Step 1: From the “Analyze” menu, choose “Compare Means” and “One-Way ANOVA.” Move the dependent variable, Number of Months to Complete Program, over to the right, as in the window below.

Step 2: Move the independent variable, Highest Degree at Enrollment, to the right in the space labeled “Factor.”

Step 3: Click “Options.” Check the boxes next to “Descriptive” and “Homogeneity of variance test.” Click “Continue” and “OK.”

### Interpretation of SPSS Output

The following tables are generated from SPSS. The first table contains descriptive statistics for months to completion, separated by the three groups. The second table contains the Levene’s test of homogeneity of variances. The third table contains the ANOVA summary table, along with the F and p values.

The first table displays descriptive statistics that allow us to observe the means for the three groups. This table is important because it indicates that the students with an Associate’s degree took an average of 19.78 months to complete the program, compared to 13.89 months for students with a Bachelor’s and 17.44 months for students with a Master’s degree.


The second table contains the Levene’s test for equality of variances. The Levene’s test is a statistical test of the equal variances assumption. The p value is 0.488, indicating there was no significant difference among the three groups’ variances; thus, the data have met the equal variances assumption for ANOVA.
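To show what the Levene’s test is doing, the sketch below runs a one-way ANOVA on each score’s absolute deviation from its own group mean. It assumes the mean-centered form of the test, which is an assumption about how the SPSS statistic is computed. It also reuses the closed-form p value that holds only when the numerator df is 2. The helper name `one_way_f` is hypothetical.

```python
# Months to completion from Table 33-1 (Associate's, Bachelor's, Master's).
groups = [
    [17, 19, 24, 18, 24, 24, 16, 16, 20],
    [16, 15, 16, 12, 16, 12, 16, 12, 10],
    [17, 21, 20, 21, 12, 16, 20, 18, 12],
]


def mean(values):
    return sum(values) / len(values)


def one_way_f(samples):
    """Return (F, df between, df within) for a one-way ANOVA."""
    values = [x for s in samples for x in s]
    grand_mean = mean(values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Levene's test (mean-centered, ASSUMED): ANOVA on absolute deviations.
abs_deviations = [[abs(x - mean(s)) for x in s] for s in groups]
levene_w, df1, df2 = one_way_f(abs_deviations)

# Closed-form upper-tail probability, valid ONLY because df1 == 2 here.
levene_p = (1 + 2 * levene_w / df2) ** (-df2 / 2)

print(f"Levene W({df1},{df2}) = {levene_w:.2f}, p = {levene_p:.3f}")
```

Under these assumptions the p value comes out at about 0.488, in line with the SPSS output. Because it is well above 0.05, the equal variances assumption is retained.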

The last table contains the contents of the ANOVA summary table, which looks much like Table 33-2. This table contains an additional value that we did not compute by hand—the exact p value, which is 0.002. Because the SPSS output indicates that we have a significant ANOVA, post hoc testing must be performed.

Return to the ANOVA window and click “Post Hoc.” You will see a window similar to the one below. Select the “LSD” and “Tukey” options. Click “Continue” and “OK.”

The following output is added to the original output. This table contains post hoc test results for two different tests: the LSD (Least Significant Difference) test and the Tukey HSD (Honestly Significant Difference) test. The LSD test, the original post hoc test, explores all possible pairwise comparisons of means using the equivalent of multiple t-tests. However, the LSD test, in performing a set of multiple t-tests, reports inaccurate p values that have not been adjusted for multiple computations (Zar, 2010). Consequently, researchers should exercise caution when choosing the LSD post hoc test following an ANOVA.

The Tukey HSD comparison test, on the other hand, is a more “conservative” test, meaning that it requires a larger difference between two groups to indicate a significant difference than some of the other post hoc tests available. By requiring a larger difference between the groups, the Tukey HSD procedure yields p values that more accurately reflect the multiple comparisons, such as the p of 0.062 for the Bachelor’s versus Master’s comparison (Zar, 2010).

### Post Hoc Tests

Observe the “Mean Difference” column. Any difference noted with an asterisk (*) is significant at p < 0.05. The p values of each comparison are listed in the “Sig.” column, and values below 0.05 indicate a significant difference between the pair of groups. Observe the p values for the comparison of the Bachelor’s degree group versus the Master’s degree group. The Tukey HSD test indicates no significant difference between the groups, with a p of 0.062; however, the LSD test indicates that the groups significantly differed, with a p of 0.025. This example enables you to see the difference in results obtained when calculating a conservative versus a lenient post hoc test. However, it should be noted that because an a priori power analysis was not conducted, there is a possibility that these analyses are underpowered. See Exercises 24 and 25 for more information regarding the consequences of low statistical power.
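The contrast between the lenient LSD test and the conservative Tukey HSD test can be seen by comparing the smallest mean difference each test treats as significant. The sketch below is illustrative only. It assumes equal group sizes (n = 9), and the two critical values are hard-coded from standard t and studentized range tables for 24 df at α = 0.05. They are assumptions of this sketch, not values given in the exercise.

```python
from itertools import combinations
from math import sqrt

# Months to completion from Table 33-1.
groups = {
    "Associate's": [17, 19, 24, 18, 24, 24, 16, 16, 20],
    "Bachelor's": [16, 15, 16, 12, 16, 12, 16, 12, 10],
    "Master's": [17, 21, 20, 21, 12, 16, 20, 18, 12],
}

# ASSUMED table values (alpha = 0.05, df within = 24), not from the exercise:
T_CRITICAL = 2.064   # two-tailed t
Q_CRITICAL = 3.53    # studentized range for 3 groups

n = 9                # participants per group (equal n is assumed)
means = {name: sum(v) / len(v) for name, v in groups.items()}

# Mean square within groups, pooled across the three groups.
ss_within = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
ms_within = ss_within / (3 * n - 3)

# Smallest mean difference each test flags as significant.
lsd_threshold = T_CRITICAL * sqrt(2 * ms_within / n)
hsd_threshold = Q_CRITICAL * sqrt(ms_within / n)

results = {}
for a, b in combinations(groups, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = (diff > lsd_threshold, diff > hsd_threshold)
    print(f"{a} vs {b}: difference = {diff:.2f}, "
          f"LSD significant = {diff > lsd_threshold}, "
          f"HSD significant = {diff > hsd_threshold}")
```

With these table values, the Tukey HSD threshold (about 3.7 months) is larger than the LSD threshold (about 3.1 months). The Bachelor’s versus Master’s difference of about 3.6 months falls between the two, which is why only the LSD test flags it.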

### Final Interpretation in American Psychological Association (APA) Format

The following interpretation is written as it might appear in a research article, formatted according to APA guidelines (APA, 2010). A one-way ANOVA performed on months to program completion revealed significant differences among the three groups, F(2,24) = 7.96, p = 0.002. Post hoc comparisons using the Tukey HSD comparison test indicated that the students in the Associate’s degree group took significantly longer to complete the program than the students in the Bachelor’s degree group (19.8 versus 13.9 months, respectively) (APA, 2010). However, there were no significant differences in program completion time between the Associate’s degree group and the Master’s degree group or between the Bachelor’s degree group and the Master’s degree group.

### Study Questions

1. Is the dependent variable in the Mancini et al. (2014) example normally distributed? Provide a rationale for your answer.

2. What are the two instances that must occur to warrant post hoc testing following an ANOVA?

3. Do the data in this example meet criteria for homogeneity of variance? Provide a rationale for your answer.

4. What is the null hypothesis in the example?

5. What was the exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true?

6. Do the data meet criteria for “mutual exclusivity”? Provide a rationale for your answer.

7. What does the numerator of the F ratio represent?

8. What does the denominator of the F ratio represent?

9. How would our final interpretation of the results have changed if we had chosen to report the LSD post hoc test instead of the Tukey HSD test?

10. Was the sample size adequate to detect differences among the three groups in this example? Provide a rationale for your answer.