# Increasing testing severity in multi-group designs

# Introduction

A little over a year ago Patrick Forscher wrote a very nice blog post about a simple way to increase the severity of a hypothesis test in a multi-group experimental design (in this case one factor, 3-group design). He highlights the main problem as the following:

Multi-group designs are the workhorse of scientific psychology. Multi-group designs apply to any grouping of people, within-person states, situations, or stimuli; interest typically centers around the either the means or conditional means within each group. However, theory-testing using these means typically proceeds in just the way that Meehl criticized: scientists attempt to reject a null hypothesis of no group mean differences (which, due to Meehl’s “crud”, may be trivially false a priori), then follow up the rejected null hypothesis with tests intended to diagnose the pattern that produced the rejected null.

I agree that this is systemic problem. People sometimes default to just plugging numbers into a one-way ANOVA and usually conclude that their hypothesis is confirmed simply by a significant ANOVA-level effect. Patrick’s proposed solution to the problem is *a priori* orthogonal contrasts. As a quick aside, I am always shocked at how few people know about orthogonal contrasts. This was drilled into my head by my statistics professors at Arkansas (Thank you Ronna Turner and Sean Mulvenon!). For more details on orthogonal contrasts (and experimental design altogether), I highly recommend purchasing a copy of “Designing Experiments and Analyzing Data” by Maxwell, Delaney, and Kelley (2018). It covers this and numerous other topics; you won’t find a better value in a statistics textbook.

# R Code

Now I will go through the code in R for how to implement Patrick’s approach using the `afex`

and `emmeans`

package. For more details on *why* you should use this approach please read Patrick’s blogpost!

## Orthogonal Contrasts

There are many helper functions in R that make it easy to use orthogonal contrasts without having to write out the contrasts by hand. I will show these later but I quickly want to show a technique for checking your set of contrasts to ensure they are in fact orthogonal. Essentially, I have a function that takes the contrasts as matrix and calculates the cross-products of the matrix. If any off-diagonal element is not equal to zero then the contrasts are not orthogonal. Please note, that I made this function in ~2 minutes so I haven’t quite checked to ensure it works in all situations.

```
# build function
check_orthog = function(contrast_mat) {
check_off = crossprod(contrast_mat)
upper_check = sum(check_off*upper.tri(check_off))
lower_check = sum(check_off*lower.tri(check_off))
if (upper_check == 0 && lower_check == 0) {
return(TRUE)
} else{
return(FALSE)
}
}
# Test with Patrick's code -- should print TRUE
c1 <- c(-1/2,0,1/2)
c2 <- c(1/3,-2/3,1/3)
cons = cbind(c1,c2)
check_orthog(cons)
```

`## [1] TRUE`

```
# Create non-orthogonal contrasts -- should print FALSE
cons2 = cbind(c(-1,1,0),
c(0,-1,1),
c(-1,0,1))
check_orthog(cons2)
```

`## [1] FALSE`

```
# Another way of coding orthogonal polynomial contrasts.
c1 <- c(-1, 0, 1)
c2 <- c(0.5, -1, 0.5)
cons3 <- cbind(c1,c2)
# Again should print TRUE
check_orthog(cons3)
```

`## [1] TRUE`

I will use this at points in the blogpost just to check my own contrast coding (please email me if you notice a mistake!).

## Generate Data

First, I will generate some data. Please note for this portion of the post I will be directly replicating/copying Patrick’s data and code.

```
library(tidyverse)
library(afex)
library(emmeans)
library(broom)
set.seed(432432) # For reproducibility
# For all examples, I'm using the below sampling error sd and n per cell
err <- 1
n_per_cell <- 200
# I'm also assuming the following:
# (1) our smallest effect of substantive interest is .4
# (2) all contrasts are Helment (successively compare one group to the average of the others)
# (3) all contrasts are unit-weighted
# This method should generalize to other orthogonal contrasts
# The parameter estimates for unit-weighted Helmert contrasts, however, have a
# nice interpretation as a series of mean differences and are therefore easy to understand
#### THREE GROUP ####
# Ceofficients for this case. Modify as desired
# First is the intercept, second is the focal contrast, remainder are the residual contrasts
coefs <- c(0, .7, .2)
# Create the data
# The last line creates the outcome using the coefficients above and the desired sampling error, err
# %*% is matrix multiplication
dat_three <- data.frame(group = rep(paste("group", 1:3), n_per_cell))
dat_three <- mutate(
dat_three,
c1 = case_when(group == "group 1" ~ 2 / 3, TRUE ~ -1 /
3),
c2 = case_when(group == "group 1" ~ 0, group == "group 2" ~ 1 /
2, TRUE ~ -1 / 2),
outcome = as.vector(cbind(1, c1, c2) %*% coefs + rnorm(nrow(dat_three), sd =
err))
) %>%
mutate(group = factor(group,
levels = c("group 1",
"group 2",
"group 3"),
ordered = TRUE)) # creates and ordered & labeled factor for group
knitr::kable(head(dat_three),
caption = "3-group data")
```

group | c1 | c2 | outcome |
---|---|---|---|

group 1 | 0.6666667 | 0.0 | 0.1940993 |

group 2 | -0.3333333 | 0.5 | -0.5684096 |

group 3 | -0.3333333 | -0.5 | -0.3451584 |

group 1 | 0.6666667 | 0.0 | 1.4218414 |

group 2 | -0.3333333 | 0.5 | 0.2550905 |

group 3 | -0.3333333 | -0.5 | 0.3534099 |

Now, we have some data to work with. However, rather than simply using the `lm`

function to make the contrast comparisons I will use `afex`

to build the model and then `emmeans`

to apply the specific contrasts.

## Three Group Example

In Patrick’s post he used the `lm`

function to test the contrasts. I think this fine but may be difficult to understand for many beginners. Plus any addition comparisons will have to be made by creating a new model. This is why I prefer using `afex`

: it has an easier to use interface and the model can be passed onto `emmeans`

for specific additional comparisons. If Patrick’s approach works for you that is great, but I want to make sure people know of alternative approaches.

```
# Patrick's Example
m_three <- lm(outcome ~ c1 + c2, data=dat_three)
knitr::kable(tidy(m_three),
caption = "3-group Contrasts using lm")
```

term | estimate | std.error | statistic | p.value |
---|---|---|---|---|

(Intercept) | 0.0437823 | 0.0417829 | 1.047853 | 0.2951303 |

c1 | 0.6570966 | 0.0886348 | 7.413527 | 0.0000000 |

c2 | 0.1873883 | 0.1023467 | 1.830918 | 0.0676111 |

Now, let us go through the same process with `afex`

and `emmeans`

. We will need to create an `id`

column so that `afex`

knows that these are all between-subject comparisons. I also like to have the partial eta squared (\(\eta^2\)) for the default effect size output so I am also going to set this using the `afex_options`

function.

```
# Add subject id
dat_three = rowid_to_column(dat_three, var = "id")
afex_options(es_aov = "pes")
```

```
# Build model using afex; note we must have an error term "(1|id)"
afex_three = afex::aov_4(outcome ~ group + (1|id),
data = dat_three)
```

`## Contrasts set to contr.sum for the following variables: group`

```
# Output tyical ANOVA table with type-3 SS
knitr::kable(nice(afex_three))
```

Effect | df | MSE | F | pes | p.value |
---|---|---|---|---|---|

group | 2, 597 | 1.05 | 29.16 *** | .089 | <.001 |

Most people reading this should be familiar with the table above. It is a simple one-way ANOVA output.

Now I can make the same comparisons Patrick made with `lm`

using `emmeans`

. However, I can also make my tests directionl using the `side`

argument in the `contrast`

function.

```
# Now get emmeans
emm_three = emmeans(afex_three, ~group)
# Estimated marginal means
knitr::kable(tidy(emm_three) %>%
select(-df,-statistic,-p.value))
```

group | estimate | std.error |
---|---|---|

group 1 | 0.4818467 | 0.07237 |

group 2 | -0.0815557 | 0.07237 |

group 3 | -0.2689441 | 0.07237 |

```
# Create orthogonal contrasts
con_three = list(c1 = c(1,-.5,-.5),
c2 = c(0,1,-1))
check_orthog(cbind(con_three$c1,
con_three$c2))
```

`## [1] TRUE`

```
# Make comparisons with contrast function
knitr::kable(contrast(emm_three,con_three),
caption = "3-group emmeans contrasts")
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | 0.6570966 | 0.0886348 | 597 | 7.413527 | 0.0000000 |

c2 | 0.1873883 | 0.1023467 | 597 | 1.830918 | 0.0676111 |

```
knitr::kable(contrast(emm_three,con_three,
side = ">"),
caption = "3-group emmeans w/ directional contrasts")
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | 0.6570966 | 0.0886348 | 597 | 7.413527 | 0.0000000 |

c2 | 0.1873883 | 0.1023467 | 597 | 1.830918 | 0.0338055 |

Patrick also mentions that we would want to perform equivalence tests to rule out that the differences are within our equivalence bounds (answers the question “Are these contrast differences, if they exist, smaller than what we consider meaningful?”). Remember, `c1`

is “group 1 - group 2 & group 3” and c2 is “group 2 - group 3”.

```
# Create contrast
con_three_eq = contrast(emm_three,con_three)
# Perform equivalence test (takes absolute difference)
test_three_eq = test(con_three_eq,
delta = .4, # eq bound
side = "equivalence")
knitr::kable(test_three_eq,
caption = "Equivalence Tests for 3-group example")
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | 0.6570966 | 0.0886348 | 597 | 2.900628 | 0.9980694 |

c2 | 0.1873883 | 0.1023467 | 597 | -2.077368 | 0.0190973 |

## Four Group Example

Again, we will need to generate the same data that Patrick did in his post.

```
#### FOUR GROUP ####
# Ceofficients for this case. Modify as desired
# First is the intercept, second is the focal contrast, remainder are the residual contrasts
coefs <- c(0, .7, .2, .1)
# Create the data
# The last line creates the outcome using the coefficients above and the desired sampling error, err
# %*% is matrix multiplication
dat_four <- data.frame(group = rep(paste("group", 1:4),
n_per_cell))
dat_four <- mutate(
dat_four,
c1 = case_when(group == "group 1" ~ -3 / 4, TRUE ~ 1 /
4),
c2 = case_when(group == "group 1" ~ 0,
group == "group 2" ~ 2 /
3, TRUE ~ -1 / 3),
c3 = case_when(
group %in% c("group 1", "group 2") ~ 0,
group == "group 3" ~ 1 / 2,
TRUE ~ -1 / 2
),
outcome = as.vector(cbind(1,
c1,
c2,
c3) %*% coefs + rnorm(nrow(dat_four),
sd = err))
) %>%
mutate(group = factor(
group,
levels = c("group 1",
"group 2",
"group 3",
"group 4"),
ordered = TRUE
)) %>% # creates and ordered & labeled factor for group
rowid_to_column(var = "id")
knitr::kable(head(dat_four),
caption = "4-Groups Data")
```

id | group | c1 | c2 | c3 | outcome |
---|---|---|---|---|---|

1 | group 1 | -0.75 | 0.0000000 | 0.0 | -0.6004404 |

2 | group 2 | 0.25 | 0.6666667 | 0.0 | 1.4770844 |

3 | group 3 | 0.25 | -0.3333333 | 0.5 | 2.2156623 |

4 | group 4 | 0.25 | -0.3333333 | -0.5 | -0.1881092 |

5 | group 1 | -0.75 | 0.0000000 | 0.0 | 0.4617127 |

6 | group 2 | 0.25 | 0.6666667 | 0.0 | 0.0611009 |

Now, we replicate the process to build our base model.

```
# Build model using afex; note we must have an error term "(1|id)"
afex_four = afex::aov_4(outcome ~ group + (1|id),
data = dat_four)
```

`## Contrasts set to contr.sum for the following variables: group`

```
# Output tyical ANOVA table with type-3 SS
knitr::kable(nice(afex_four),
caption = "ANOVA: Four Groups")
```

Effect | df | MSE | F | pes | p.value |
---|---|---|---|---|---|

group | 3, 796 | 1.02 | 25.52 *** | .088 | <.001 |

And, we can then pass on this model to the `emmeans`

function to make our specific contrasts.

```
# Now get emmeans
emm_four = emmeans(afex_four, ~group,
adjust = "none")
# Estimated marginal means
knitr::kable(tidy(emm_four) %>%
select(-df,-statistic,-p.value))
```

group | estimate | std.error |
---|---|---|

group 1 | -0.5275826 | 0.0714616 |

group 2 | 0.3089254 | 0.0714616 |

group 3 | 0.1323786 | 0.0714616 |

group 4 | 0.0253503 | 0.0714616 |

```
# Create orthogonal contrasts
con_four = list(c1 = c(1,-1/3,-1/3,-1/3),
c2 = c(0,1,-0.5,-0.5),
c3 = c(0,0,.5,-.5))
check_orthog(cbind(con_four$c1,
con_four$c2,
con_four$c3))
```

`## [1] TRUE`

```
# Perform joint test
knitr::kable(test(contrast(emm_four,
con_four[2:3]),
joint = TRUE),
caption = "Joint Test of Residual Contrasts")
```

df1 | df2 | F.ratio | p.value |
---|---|---|---|

2 | 796 | 4.016 | 0.0183995 |

We start by performing “joint test” of the residual hypotheses (Note: `c1`

is the focal hypothesis so we only include `c2`

and `c3`

). Now, we observed that our residual contrasts actually account for a significant portion of the variance.

So, we can perform equivalence testing for these contrasts. We find that `c2`

and `c3`

are within our equivalence bounds.

```
# Make comparisons with contrast function
knitr::kable(test(contrast(emm_four,
con_four),
delta = .4,
side = "equivalence"),
caption = "4-group equivalence contrasts")
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | -0.6831340 | 0.0825167 | 796 | 3.431233 | 0.9996841 |

c2 | 0.2300610 | 0.0875222 | 796 | -1.941668 | 0.0262649 |

c3 | 0.0535141 | 0.0505310 | 796 | -6.856903 | 0.0000000 |

Then, we can finally test focal hypothesis `c1`

.

```
knitr::kable(tidy(contrast(emm_four,con_four[1])),
caption = "4-group: c1 contrast")
```

term | contrast | null.value | estimate | std.error | df | statistic | p.value |
---|---|---|---|---|---|---|---|

group | c1 | 0 | -0.683134 | 0.0825167 | 796 | -8.278736 | 0 |

From these results Patrick concludes the following (and I agree).

In this case, although the deviation from the focal contrast is significantly different from zero, we can say that the amount of deviation is smaller than our threshold for what constitutes a meaningful effect. If we think that any deviation from the focal contrast is reason for concern, we might think about amending our theory. If, on the other hand, we believe that the null hypothesis of no relationship is basically always false and that many of these non-zero relationships are “crud”, we might advance the claim that our theory has been corroborated. In either case our testing procedure is more severe than a mere test of the null hypothesis of no differences between group means.

Another way of saying this is the following: despite the data being incompatibility with the focal contrast (evidenced by the joint test), the differences observed in these other contrasts is small or within the established equivalence bounds. Meanwhile, we can reject the null hypothesis that our focal contrast is zero.

# Extension to Multilevel/Hierarchical Models

In many cases, the designs that Patrick laid out are as simple or clean as a 3-group one-way ANOVA. Very often we have multiple levels of variance we would like to take into account. The most common example, and one I will repeat here, is in education wherein we have students inside classes within schools. In addition, we often have covariates that (like gender, socioecononimc status, or age) and we need to include those in our models.

So, I will apply Patrick’s approach to a study where we have test scores on students within classes within schools. We also have informative covariates such as age and gender on the test scores. Let us say we are testing the hypothesis that our intended treatments (groups 2 and 3) will have a positive effect, *and* there will be an additional benefit for treatment group 3. Therefore, we will have similar contrasts to the original 3-group example except we will reverse the coding *and* only have one-sided tests because I specified directional hypotheses.

```
library(simstudy)
# taken from https://kgoldfeld.github.io/simstudy/articles/clustered.html
gen.school <- defData(
varname = "s0",
dist = "normal",
formula = 0,
variance = 3,
id = "idSchool"
)
gen.school <- defData(gen.school,
varname = "nClasses",
dist = "noZeroPoisson",
formula = 3)
set.seed(282721)
dtSchool <- genData(8, gen.school)
dtSchool <- trtAssign(dtSchool, nTrt = 3)
gen.class <-
defDataAdd(
varname = "c0",
dist = "normal",
formula = 0,
variance = 2
)
gen.class <-
defDataAdd(gen.class,
varname = "nStudents",
dist = "noZeroPoisson",
formula = 20)
dtClass <-
genCluster(dtSchool,
"idSchool",
numIndsVar = "nClasses",
level1ID = "idClass")
dtClass <- addColumns(gen.class, dtClass) %>%
mutate(t2 = ifelse(trtGrp == 2, 1, 0),
t3 = ifelse(trtGrp == 3, 1, 0))
gen.student <- defDataAdd(varname = "Male",
dist = "binary",
formula = 0.5)
gen.student <-
defDataAdd(gen.student,
varname = "age",
dist = "uniform",
formula = "9.5; 10.5")
gen.student <-
defDataAdd(
gen.student,
varname = "test",
dist = "normal",
formula = "50 - 2*Male + s0 + c0 + 4 * t2 + 12 * t3",
variance = 2
)
dtStudent <-
genCluster(
dtClass,
cLevelVar = "idClass",
numIndsVar = "nStudents",
level1ID = "idChild"
)
con_three = list(c1 = c(-1,.5,.5),
c2 = c(0,1,-1))
check_orthog(cbind(con_three$c1,
con_three$c2))
```

`## [1] TRUE`

```
dat_mlm <- addColumns(gen.student, dtStudent) %>%
select(-s0,-c0,-t2,-t3) %>%
mutate(trtGrp = factor(trtGrp,
levels = c(1,2,3),
ordered = TRUE))
```

Now we can utilize the `lme4`

package to build the mlm and use `emmeans`

plot function to take a glimpse at the difference between groups.

```
test_mlm = lme4::lmer(test ~ Male + trtGrp + (1|idClass:idSchool) + (1|idSchool),
data = dat_mlm)
knitr::kable(broom.mixed::tidy(test_mlm),
caption = "Summary Table of MLM")
```

```
## Registered S3 method overwritten by 'broom.mixed':
## method from
## tidy.gamlss broom
```

effect | group | term | estimate | std.error | statistic |
---|---|---|---|---|---|

fixed | NA | (Intercept) | 55.9993351 | 0.5561662 | 100.688138 |

fixed | NA | Male | -1.7990154 | 0.1491026 | -12.065621 |

fixed | NA | trtGrp.L | 7.6402443 | 0.7468733 | 10.229640 |

fixed | NA | trtGrp.Q | 2.0859008 | 1.1211914 | 1.860432 |

ran_pars | idClass:idSchool | sd__(Intercept) | 1.4617520 | NA | NA |

ran_pars | idSchool | sd__(Intercept) | 0.9791225 | NA | NA |

ran_pars | Residual | sd__Observation | 1.5290580 | NA | NA |

```
# Plot estimated differences between treatments
emm_mlm = emmeans(test_mlm, ~ trtGrp)
plot(emm_mlm)
```

Finally, we can apply `emmeans`

to look at the our specific contrasts. For these multi-level models `emmeans`

defaults to Kenword-Roger degrees of freedom.

```
con_mlm = list(c1 = c(-1,.5,.5),
c2 = c(0,-1,1))
check_orthog(cbind(con_mlm$c1,
con_mlm$c2))
```

`## [1] TRUE`

```
knitr::kable(test(contrast(emm_mlm, con_mlm),
side = ">"))
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | 6.826355 | 1.030733 | 5.934995 | 6.622817 | 0.0002986 |

c2 | 7.957165 | 1.489940 | 10.030437 | 5.340594 | 0.0001623 |

We see that our data is incompatible with the null hypotheses. We may also want to include “conditional equivalence tests” to ensure that our effects are not within a range we deem practically equivalent. Therefore, we can use almost the same code as the chunk above but add a `delta`

argument as well as change the `side`

argument to “equivalent”.

```
knitr::kable(test(contrast(emm_mlm, con_mlm),
delta = 2,
side = "equivalent"))
```

contrast | estimate | SE | df | t.ratio | p.value |
---|---|---|---|---|---|

c1 | 6.826355 | 1.030733 | 5.934995 | 4.682450 | 0.9982574 |

c2 | 7.957165 | 1.489940 | 10.030437 | 3.998258 | 0.9987449 |

# Conclusion

As you can see the process of creating specific contrasts is fairly straightforward in R and the hypothesis testing procedures are simplified by using the `emmeans`

package. I find contrast coding to be a refreshing alternative to the typical inspection of “ANOVA-level” effects that is often followed up pairwise comparisions between the levels of a factor where there is a significant effect. Instead, contrast coding demands that the user be specific in their hypotheses. As Patrick notes in his blog post, specifying these contrasts *a priori* in many cases may result in a more severe tests of your hypotheses which arguably increases the strength of your claims if your experiments support your hypotheses. In my opinion there is added advantage in how you can go about describign your results (no need for the mundane langauge about “significant main effects”). Also, orthogonal contrasts, by nature, do not require adjustments for multiplicity and therefore may be more statistical powerful than default ANOVA tests. I’ll admit that orthogonal contrasts are not a silver bullet (there is no free lunch in statistics), but I do believe there many experiments that would benefit from this type of analysis.

## Miscellaneous Notes

There are other ways to code contrasts. This post by Rose Maier notes a way of performing contrasts with the `summary`

function.
https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html