
The calculation of standardized mean differences (SMDs) can be helpful for interpreting data and is essential for meta-analysis. In psychology, effect sizes are very often reported as an SMD rather than in raw units (though either is fine: see Caldwell and Vigotsky (2020)). In most papers the SMD is reported as Cohen’s d. The simplest form involves reporting the mean difference (or the mean, in the case of a one-sample test) divided by the standard deviation.

\[ Cohen's \space d = \frac{Mean}{SD} = SMD \]

However, two major problems arise: bias and the calculation of the denominator. First, the Cohen’s d calculation is biased (meaning the effect is inflated), and a bias correction (often referred to as Hedges’ g) is applied to provide an unbiased estimate. Second, the denominator can influence the estimate of the SMD, and there are a multitude of choices for how to calculate the denominator. To make matters worse, the calculation (in most cases an approximation) of the confidence intervals involves the noncentral t distribution. This requires calculating a non-centrality parameter (lambda: \(\lambda\)), degrees of freedom (\(df\)), or even the standard error (sigma: \(\sigma\)) for the SMD. None of these are easy to determine and these calculations are hotly debated in the statistics literature (Cousineau and Goulet-Pelletier 2021).

In this package we originally made the default SMD confidence intervals follow the formulation outlined by Goulet-Pelletier and Cousineau (2018). We found that these calculations were simple to implement and provided fairly accurate coverage for any type of SMD (independent, paired, or one sample). However, even the authors have outlined some issues with the method in a newer publication (Cousineau and Goulet-Pelletier 2021). Other packages, such as MOTE (Buchanan et al. 2019) and effectsize (Ben-Shachar, Lüdecke, and Makowski 2020), use a simpler formulation based on the noncentral t-distribution (nct). The nct type of confidence interval is the default option in the package. We have created an argument for all TOST functions (tsum_TOST and t_TOST) named smd_ci which allows the user to specify “goulet” (the Cousineau and Goulet-Pelletier (2021) method), “nct” (which will approximately match the results of Buchanan et al. (2019) and Ben-Shachar, Lüdecke, and Makowski (2020)), “t” (central t method), or “z” (normal method). We would recommend “nct” as the default for any analysis, though, as Viechtbauer (2007) points out, the approximate methods usually suffice. It is important to remember that all of these methods are only approximations of confidence intervals (of varying degrees of quality) and should therefore be interpreted with caution.

It is my belief that SMDs provide another interesting quantitative description of an effect, but have dubious and limited inferential utility. You may disagree; if you are basing your inferences on the SMD and its associated confidence intervals, we recommend a bootstrapping approach (see boot_smd_calc; Kirby and Gerlanc 2013) as it is a more robust option.

In this section we detail the calculations involved in computing the SMDs, their associated degrees of freedom, non-centrality parameters, and variances. If these SMDs are being reported in a scientific manuscript, we strongly recommend that the formulas for the SMDs you report be included in the methods section.

Bias Correction (Hedges)

For all SMD calculative approaches, the bias correction is calculated as follows:

\[ J = \frac{\Gamma(\frac{df}{2})}{\sqrt{\frac{df}{2}} \cdot \Gamma(\frac{df-1}{2})} \]

The correction factor is calculated in R as follows:

J <- exp(lgamma(df/2) - log(sqrt(df/2)) - lgamma((df - 1)/2))

Hedges’ g (the bias-corrected SMD) can then be calculated by multiplying the SMD by J:

\[ g = SMD \cdot J \] When the bias correction is not applied, J is equal to 1.
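As a quick numerical sketch (the df value below is arbitrary), the exact correction can be compared against the well-known approximation \(J \approx 1 - \frac{3}{4 \cdot df - 1}\):

```r
df <- 18  # arbitrary example value

# Exact correction factor, computed on the log scale for numerical stability
J <- exp(lgamma(df/2) - log(sqrt(df/2)) - lgamma((df - 1)/2))

# Common large-sample approximation
J_approx <- 1 - 3 / (4 * df - 1)

round(c(exact = J, approx = J_approx), 4)
```

Both values are roughly 0.958 here; J is always less than 1 and approaches 1 as df grows, so the correction matters most in small samples.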

Independent Samples

For independent samples there are three calculative approaches supported by TOSTER: the denominator can be the pooled standard deviation (Cohen’s d), the average standard deviation (Cohen’s d(av)), or the standard deviation of the control group (Glass’s \(\Delta\)). Currently, d or d(av) is selected by whether or not variances are assumed to be equal. If the variances are not assumed to be equal then Cohen’s d(av) will be returned; if variances are assumed to be equal then Cohen’s d is returned. Glass’s delta can be selected by setting the glass argument to “glass1” or “glass2”.

Variances Assumed Unequal: Cohen’s d(av)

For this calculation, the denominator is simply the square root of the average variance.

\[ s_{av} = \sqrt \frac {s_{1}^2 + s_{2}^2}{2} \]

The SMD, Cohen’s d(av), is then calculated as the following:

\[ d_{av} = \frac {\bar{x}_1 - \bar{x}_2} {s_{av}} \]

Note: the x with the bar above it (pronounced “x-bar”) refers to the means of groups 1 and 2, respectively.

The degrees of freedom for Cohen’s d(av), derived from Delacre et al. (2021), are the following:

\[ df = \frac{(n_1-1)(n_2-1)(s_1^2+s_2^2)^2}{(n_2-1) \cdot s_1^4+(n_1-1) \cdot s_2^4} \]
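As a minimal sketch (the summary statistics below are made up for illustration), d(av) and its degrees of freedom can be computed directly from the formulas above:

```r
# Hypothetical summary statistics for two independent groups
n1 <- 20; n2 <- 25
m1 <- 10; m2 <- 12
s1 <- 2;  s2 <- 3

# Denominator: square root of the average variance
s_av <- sqrt((s1^2 + s2^2) / 2)
d_av <- (m1 - m2) / s_av

# Degrees of freedom from Delacre et al. (2021)
df <- ((n1 - 1) * (n2 - 1) * (s1^2 + s2^2)^2) /
  ((n2 - 1) * s1^4 + (n1 - 1) * s2^4)
```

When the two sample variances are equal and the groups are the same size, this df reduces to the familiar \(n_1 + n_2 - 2\).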

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d_{av} \times \sqrt{\frac{n_1 \cdot n_2(\sigma^2_1+\sigma^2_2)}{2 \cdot (n_2 \cdot \sigma^2_1+n_1 \cdot \sigma^2_2)}} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{2 \cdot (n_2 \cdot \sigma_1^2 + n_1 \cdot \sigma_2^2)} {n_1 \cdot n_2 \cdot (\sigma_1^2 + \sigma_2^2)} \]

The standard error (\(\sigma\)) of Cohen’s d(av) is calculated as the following from Bonett (2009):

\[ \sigma_{SMD} = \sqrt{\frac{d^2 \cdot \left( \frac{s_1^4}{n_1-1}+\frac{s_2^4}{n_2-1} \right)}{8 \cdot s_{av}^4} + \frac{\frac{s_1^2}{n_1-1}+\frac{s_2^2}{n_2-1}}{s_{av}^2}} \]

Variances Assumed Equal: Cohen’s d

For this calculation, the denominator is simply the pooled standard deviation.

\[ s_{p} = \sqrt \frac {(n_{1} - 1)s_{1}^2 + (n_{2} - 1)s_{2}^2}{n_{1} + n_{2} - 2} \]

\[ d = \frac {\bar{x}_1 - \bar{x}_2} {s_{p}} \]

The degrees of freedom for Cohen’s d is the following:

\[ df = n_1 + n_2 - 2 \]
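As a minimal sketch (with made-up summary statistics), Cohen’s d with a pooled denominator is:

```r
# Hypothetical summary statistics for two independent groups
n1 <- 20; n2 <- 25
m1 <- 10; m2 <- 12
s1 <- 2;  s2 <- 3

# Pooled standard deviation
s_p <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d   <- (m1 - m2) / s_p
df  <- n1 + n2 - 2
```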

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d \cdot \sqrt \frac{\tilde n}{2} \]

wherein, \(\tilde n\) is the harmonic mean of the 2 sample sizes which is calculated as the following:

\[ \tilde n = \frac{2 \cdot n_1 \cdot n_2}{n_1 + n_2} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{1}{n_1} +\frac{1}{n_2} \]

The standard error (\(\sigma\)) of Cohen’s d is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \sigma_{SMD} = \sqrt{\frac{df}{df-2} \cdot \frac{2}{\tilde n} (1+d^2 \cdot \frac{\tilde n}{2}) -\frac{d^2}{J^2}} \]

  2. Under all other methods (nct, t, or z) it is calculated using the “unbiased” estimate from Viechtbauer (2007) (table 1; equation 26):

\[ \sigma_{SMD} = \sqrt{\frac{1}{n_1} + \frac{1}{n_2} + (1 - \frac{df-2}{df \cdot J^2}) \cdot d^2} \]

Glass’s Delta

For this calculation, the denominator is simply the standard deviation of one of the groups (x for glass = "glass1", or y for glass = "glass2").

\[ s_{c} = SD_{control \space group} \]

\[ d = \frac {\bar{x}_1 - \bar{x}_2} {s_{c}} \]

The degrees of freedom for Glass’s delta is the following:

\[ df = n_c - 1 \]

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d \cdot \sqrt \frac{\tilde n}{2} \]

wherein, \(\tilde n\) is the harmonic mean of the 2 sample sizes which is calculated as the following:

\[ \tilde n = \frac{2 \cdot n_1 \cdot n_2}{n_1 + n_2} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{s_e^2}{n_e \cdot s_c^2} + \frac{1}{n_c} \]

The standard error (\(\sigma\)) of Glass’s delta is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \sigma_{SMD} = \sqrt{\frac{df}{df-2} \cdot \frac{2}{\tilde n} (1+d^2 \cdot \frac{\tilde n}{2}) -\frac{d^2}{J^2}} \]

  2. Under all other methods (nct, t, or z), derived from Bonett (2008):

\[ \sigma_{SMD} = \sqrt{\frac{s_e^2 \div s_c^2}{n_e-1}+ \frac{1}{n_c-1}+ \frac{d^2}{2 \cdot (n_c-1)}} \]

Paired Samples

For paired samples there are three calculative approaches supported by TOSTER: the denominator can be the standard deviation of the change scores (Cohen’s d(z)), a correlation-corrected effect size can be used (Cohen’s d(rm)), or the denominator can be the standard deviation of the control condition (Glass’s \(\Delta\)). Currently, the choice is made by the function based on whether or not the user sets rm_correction to TRUE. If rm_correction is set to TRUE then Cohen’s d(rm) will be returned; otherwise Cohen’s d(z) is returned. This can be overridden, and Glass’s delta returned, if the glass argument is set to “glass1” or “glass2”.

Cohen’s d(z): Change Scores

For this calculation, the denominator is the standard deviation of the difference scores which can be calculated from the standard deviations of the samples and the correlation between the paired samples.

\[ s_{diff} = \sqrt{sd_1^2 + sd_2^2 - 2 \cdot r_{12} \cdot sd_1 \cdot sd_2} \]

The SMD, Cohen’s d(z), is then calculated as the following:

\[ d_{z} = \frac {\bar{x}_1 - \bar{x}_2} {s_{diff}} \]
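A minimal sketch with made-up paired-sample summary statistics:

```r
# Hypothetical paired-sample summary statistics
m1 <- 100; m2 <- 104   # condition means
sd1 <- 9;  sd2 <- 10   # condition standard deviations
r12 <- 0.7             # correlation between the paired measurements

s_diff <- sqrt(sd1^2 + sd2^2 - 2 * r12 * sd1 * sd2)
d_z    <- (m1 - m2) / s_diff
```

The stronger the correlation, the smaller \(s_{diff}\), and hence the larger \(|d_z|\) for the same mean difference.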

The degrees of freedom for Cohen’s d(z) is the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ df = 2 \cdot (N_{pairs}-1) \]

  2. Under all other methods (nct, t, or z):

\[ df = N - 1 \]

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d_{z} \cdot \sqrt \frac{N_{pairs}}{2 \cdot (1-r_{12})} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{1}{n} \]

The standard error (\(\sigma\)) of Cohen’s d(z) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \sigma_{SMD} = \sqrt{\frac{df}{df-2} \cdot \frac{2 \cdot (1-r_{12})}{n} \cdot (1+d^2 \cdot \frac{n}{2 \cdot (1-r_{12})}) -\frac{d^2}{J^2}} \space \times \space \sqrt {2 \cdot (1-r_{12})} \]

  2. Under all other methods (nct, t, or z) it is calculated using the “unbiased” estimate from Viechtbauer (2007) (table 1; equation 26):

\[ \sigma_{SMD} = \sqrt{\frac{1}{N} + (1 - \frac{df-2}{df \cdot J^2}) \cdot d_z^2} \]

Cohen’s d(rm): Correlation Corrected

For this calculation, the same values used in the calculations above are adjusted for the correlation between measures. As Goulet-Pelletier and Cousineau (2018) mention, this is useful when effect sizes are being compared across studies that involve between- and within-subjects designs.

First, the standard deviation of the difference scores are calculated

\[ s_{diff} = \sqrt{sd_1^2 + sd_2^2 - 2 \cdot r_{12} \cdot sd_1 \cdot sd_2} \]

The SMD, Cohen’s d(rm), is then calculated with a small change to the denominator:

\[ d_{rm} = \frac {\bar{x}_1 - \bar{x}_2}{s_{diff}} \cdot \sqrt {2 \cdot (1-r_{12})} \]
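Using the same style of made-up summary values, d(rm) is simply d(z) rescaled by \(\sqrt{2 \cdot (1-r_{12})}\):

```r
# Hypothetical paired-sample summary statistics
m1 <- 100; m2 <- 104
sd1 <- 9;  sd2 <- 10
r12 <- 0.7

s_diff <- sqrt(sd1^2 + sd2^2 - 2 * r12 * sd1 * sd2)
d_z    <- (m1 - m2) / s_diff
d_rm   <- d_z * sqrt(2 * (1 - r12))
```

Note that \(|d_{rm}|\) is smaller than \(|d_z|\) whenever \(r_{12} > 0.5\), which places the effect closer to the between-subjects metric.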

The degrees of freedom for Cohen’s d(rm) is the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ df = 2 \cdot (N_{pairs}-1) \]

  2. Under all other methods (nct, t, or z):

\[ df = N - 1 \]

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d_{rm} \cdot \sqrt \frac{N_{pairs}}{2 \cdot (1-r_{12})} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{1}{n} \]

The standard error (\(\sigma\)) of Cohen’s d(rm) is calculated as the following:

\[ \sigma_{SMD} = \sqrt{\frac{df}{df-2} \cdot \frac{2 \cdot (1-r_{12})}{n} \cdot (1+d_{rm}^2 \cdot \frac{n}{2 \cdot (1-r_{12})}) -\frac{d_{rm}^2}{J^2}} \]

Glass’s Delta

For this calculation, the denominator is simply the standard deviation of one of the groups (x for glass = "glass1", or y for glass = "glass2").

\[ s_{c} = SD_{control \space condition} \]

\[ d = \frac {\bar{x}_1 - \bar{x}_2} {s_{c}} \]

The degrees of freedom for Glass’s delta is the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ df = 2 \cdot N - 1 \]

  2. Under all other methods (nct, t, or z):

\[ df = N - 1 \]

The non-centrality parameter (\(\lambda\)) is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \lambda = d \cdot \sqrt{\frac{N}{2 \cdot (1 - r_{12})}} \]

  2. Under all other methods (nct, t, or z):

\[ \lambda = \frac{1}{N} \]

The standard error (\(\sigma\)) of Glass’s delta is calculated as the following, derived from Bonett (2008):

\[ \sigma_{SMD} = \sqrt{\frac{s_{diff}^2}{s_c^2 \cdot df} + \frac{d^2}{2 \cdot df}} \]

One Sample

For a one-sample situation, the calculations are very straightforward.

For this calculation, the denominator is simply the standard deviation of the sample.

\[ s={\sqrt {{\frac {1}{N-1}}\sum _{i=1}^{N}\left(x_{i}-{\bar {x}}\right)^{2}}} \]

The SMD is then the mean of X divided by the standard deviation.

\[ d = \frac {\bar{x}} {s} \]

The degrees of freedom for Cohen’s d is the following:

\[ df = N - 1 \]

The non-centrality parameter (\(\lambda\)) is calculated as the following:

\[ \lambda = d \cdot \sqrt N \]

The standard error (\(\sigma\)) of Cohen’s d is calculated as the following:

  1. Under the Cousineau and Goulet-Pelletier (2021) method (smd_ci = "goulet"):

\[ \sigma_{SMD} = \sqrt{\frac{df}{df-2} \cdot \frac{1}{N} (1+d^2 \cdot N) -\frac{d^2}{J^2}} \]

  2. Under all other methods (nct, t, or z):

\[ \sigma_{SMD} = \sqrt{\frac{1}{n} + \frac{d^2}{(2 \cdot n)}} \]
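A minimal one-sample sketch (the data vector is made up for illustration):

```r
x <- c(0.2, 1.1, 0.5, -0.3, 0.9, 1.4, 0.7, 0.1)  # hypothetical sample

N  <- length(x)
d  <- mean(x) / sd(x)  # one-sample SMD
df <- N - 1
lambda <- d * sqrt(N)  # non-centrality parameter

# Standard error under the nct/t/z methods
sigma_smd <- sqrt(1 / N + d^2 / (2 * N))
```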

Confidence Intervals

For the SMDs calculated in this package we use the non-central t method, the method outlined by Goulet-Pelletier and Cousineau (2018), and approximate methods using the t and normal distributions. These calculations are only approximations, and newer formulations may provide better coverage (e.g., Cousineau and Goulet-Pelletier 2021). In any case, if the calculation of confidence intervals for SMDs is of the utmost importance, then I would strongly recommend using bootstrapping techniques rather than any of these approximations whenever possible (Kirby and Gerlanc 2013); these methods are available in the boot_smd_calc function.

The calculations of the confidence intervals for the goulet and nct methods involve a two-step process: 1) using the noncentral t-distribution to calculate the lower and upper bounds of \(\lambda\), and 2) transforming these back to the effect size estimate. These noncentrality methods are not without their detractors (Spanos 2021). The normal and t methods are much simpler and less computationally intensive, but may have varying levels of accuracy (see Viechtbauer 2007 for more details).

Cousineau and Goulet-Pelletier (2021) Method (smd_ci = “goulet”)

Calculate confidence intervals around \(\lambda\).

\[ t_L = t_{(1/2-(1-\alpha)/2,\space df, \space \lambda)} \\ t_U = t_{(1/2+(1-\alpha)/2,\space df, \space \lambda)} \]

Then transform back to the SMD.

\[ d_L = \frac{t_L}{\lambda} \cdot d \\ d_U = \frac{t_U}{\lambda} \cdot d \]

The non-central t-method (smd_ci = “nct”)

This method is commonly reported in the literature and in other R packages (e.g., effectsize). However, it is not without criticism (Spanos 2021; Viechtbauer 2007).

In this method, you calculate the non-centrality parameters necessary to form the confidence intervals, wherein the observed t-statistic (\(t_{obs}\)) is utilized.

\[ t_L = t_{(\alpha,\space df, \space t_{obs})} \\ t_U = t_{(1-\alpha,\space df, \space t_{obs})} \]

The confidence intervals can then be constructed using the non-centrality parameter and the bias correction.

\[ d_L = t_L \cdot \sqrt{\lambda} \cdot J \\ d_U = t_U \cdot \sqrt{\lambda} \cdot J \]
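In R, this style of interval can be sketched with qt() and its ncp argument (all numbers below are placeholder values, and alpha is the one-sided alpha as in the formulas above):

```r
# Hypothetical two-sample inputs
t_obs <- 2                 # observed t-statistic
df    <- 38
n1 <- 20; n2 <- 20
lambda <- 1/n1 + 1/n2      # scaling factor for Cohen's d
J      <- 1                # no bias correction in this sketch
alpha  <- 0.05

t_L <- qt(alpha,     df, ncp = t_obs)  # lower noncentral-t quantile
t_U <- qt(1 - alpha, df, ncp = t_obs)  # upper noncentral-t quantile

d_L <- t_L * sqrt(lambda) * J
d_U <- t_U * sqrt(lambda) * J
```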

Central t-method

This is less commonly used, though as Bonett (2008) and Viechtbauer (2007) report, it may perform adequately with an adequately large sample size.

The limits of the t-distribution at the given alpha-level and degrees of freedom (qt(1-alpha,df)) are multiplied by the standard error of the calculated SMD.

\[ CI = SMD \space \pm \space t_{(1-\alpha,df)} \cdot \sigma_{SMD} \]

Normal method

This method will often give the most liberal estimate of the confidence interval. But as Bonett (2008) and Viechtbauer (2007) report, it may perform adequately with an adequately large sample size. The use of the normal distribution can be justified by the fact that SMDs are asymptotically normal.

The limits of the z-distribution at the given alpha-level (qnorm(1-alpha)) are multiplied by the standard error of the calculated SMD.

\[ CI = SMD \space \pm \space z_{(1-\alpha)} \cdot \sigma_{SMD} \]
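For instance, with a made-up SMD of 0.43 and standard error of 0.33, the interval from the formula above is:

```r
d         <- 0.43  # hypothetical SMD
sigma_smd <- 0.33  # hypothetical standard error
alpha     <- 0.05

ci <- d + c(-1, 1) * qnorm(1 - alpha) * sigma_smd
```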

Just Estimate the SMD

It was requested that a function be provided that only calculates the SMD. Therefore, I created the smd_calc and boot_smd_calc functions. The interface is almost the same as t_TOST, but you don’t set an equivalence bound.

# For paired tests, use separate vectors
smd_calc(x = sleep$extra[sleep$group == 1],
         y = sleep$extra[sleep$group == 2],
         paired = TRUE,
         smd_ci = "nct",
         bias_correction = F)
#> 
#>  Paired Sample Standardized Mean Difference (SMD; Cohen's
#>  d[z]=(x-y)/SD_z)
#> 
#> data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -2.1180165 -0.4146278
#> sample estimates:
#> SMD (d[z]) 
#>  -1.284558

# Setting bootstrap replications low to
## reduce compiling time of vignette
boot_smd_calc(x = sleep$extra[sleep$group == 1],
              y = sleep$extra[sleep$group == 2],
         R = 199,
         paired = TRUE,
         boot_ci = "stud",
         bias_correction = F)
#> 
#>  Paired Sample bootstrapped Standardized Mean Difference (SMD; Cohen's
#>  d[z]=(x-y)/SD_z)
#> 
#> data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -1.696643  2.584537
#> sample estimates:
#> SMD (d[z]) 
#>  -1.284558

If you have a hypothesis that is specifically about the SMD, then you can use boot_smd_calc to directly test it via the null.value and alternative arguments. For example, you might have a hypothesis of equivalence regarding the SMD; you could then set null.value = c(-0.5,0.5) (if you hypothesize that the SMD is between -0.5 and 0.5) and alternative = "equivalent" to directly test this hypothesis.

boot_smd_calc(x = sleep$extra[sleep$group == 1],
              y = sleep$extra[sleep$group == 2],
         R = 199,
         paired = TRUE,
         boot_ci = "stud",
         bias_correction = TRUE,
         null.value = c(-0.5,0.5), 
         alternative = "equ")
#> 
#>  Paired Sample bootstrapped Standardized Mean Difference (SMD; Hedges's
#>  g[z]=(x-y)/SD_z)
#> 
#> data:  sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2]
#> z-observed = -2.5211, p-value = 0.7638
#> alternative hypothesis: equivalence
#> null values:
#> lower bound upper bound 
#>        -0.5         0.5 
#> 90 percent confidence interval:
#>  -1.4018610  0.7932114
#> sample estimates:
#> SMD (g[z]) 
#>  -1.173925

Plotting SMDs

Sometimes you may take a different approach to calculating the SMD, or you may only have the summary statistics from another study. For this reason, I have included a way to plot the SMD based on just three values: the estimate of the SMD, the degrees of freedom, and the non-centrality parameter/standard error. So long as all three are reported, or can be estimated, then a plot of the SMD can be produced. The non-centrality parameter is needed if you want to plot the estimates using the "goulet" method while the standard error is needed if you want to plot the z (normal) or t (t-distribution) methods.

Two types of plots can be produced: consonance (type = "c") and consonance density (type = "cd"); both are produced by default (type = c("c","cd")).

plot_smd(d = .43,
         df = 58,
         sigma = .33,
         smd_label = "Cohen's d",
         smd_ci = "z"
         )

Comparing SMDs

In some cases, researchers may want to compare the SMDs between original and replication studies. Rather than looking at whether or not a replication attempt is significant, a researcher could assess how compatible the SMDs from the two studies are.

For example, say an original study reports an effect of Cohen’s dz = 0.95 in a paired-samples design with 25 subjects. However, a replication doubled the sample size and found a non-significant effect with an SMD of 0.23. Are these two studies compatible? Or, to put it another way, should the replication be considered a failure to replicate?

We can use the compare_smd function to at least measure how often we would expect a discrepancy between the original and replication study if the same underlying effect was being measured (also assuming no publication bias or differences in protocol).

We can see from the results below that, if the null hypothesis were true, we would only expect to see a discrepancy in SMDs between studies, at least this large, ~1% of the time.

compare_smd(smd1 = 0.95,
            n1 = 25,
            smd2 = 0.23,
            n2 = 50,
            paired = TRUE)
#> 
#>  Difference in Cohen's dz (paired)
#> 
#> data:  Summary Statistics
#> z = 2.5685, p-value = 0.01021
#> alternative hypothesis: true difference in SMDs is not equal to 0
#> sample estimates:
#> difference in SMDs 
#>               0.72

Raw Data

The above results are only based on an approximation of the differences between the SMDs. If the raw data are available, then the optimal solution is to bootstrap the results. This can be accomplished with the boot_compare_smd function.

For this example, we will simulate some data.

set.seed(4522)
diff_study1 = rnorm(25,.95)
diff_study2 = rnorm(50)
boot_test = boot_compare_smd(x1 = diff_study1,
                             x2 = diff_study2,
                             paired = TRUE)

boot_test
#> 
#>  Bootstrapped Differences in SMDs (paired)
#> 
#> data:  Bootstrapped
#> z (observed) = 2.887, p-value = 0.006003
#> alternative hypothesis: true difference in SMDs is not equal to 0
#> 95 percent confidence interval:
#>  0.3161207 1.4831351
#> sample estimates:
#> difference in SMDs 
#>          0.8058872

# Table of bootstrapped CIs
knitr::kable(boot_test$df_ci, digits = 4)
                    estimate   lower.ci   upper.ci
Difference in SMD     0.8059     0.3161     1.4831
SMD1                  0.9418     0.5403     1.5318
SMD2                  0.1359    -0.1420     0.4187

Visualize

The results of the bootstrapping are stored in the returned object, so we can even visualize the differences in SMDs.

library(ggplot2)

list_res = boot_test$boot_res
df1 = data.frame(study = c(rep("original",length(list_res$smd1)),
                           rep("replication",length(list_res$smd2))),
                 smd = c(list_res$smd1,list_res$smd2))
ggplot(df1,
            aes(fill = study, color = study, x = smd)) + 
 geom_histogram(aes(y=..density..), alpha=0.5, 
                position="identity")+
 geom_density(alpha=.2) +
  labs(y = "", x = "SMD (bootstrapped estimates)") +
  theme_classic()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


df2 = data.frame(diff = list_res$d_diff)
ggplot(df2,
            aes(x = diff))+ 
 geom_histogram(aes(y=..density..), alpha=0.5, 
                position="identity")+
 geom_density(alpha=.2) +
  labs(y = "", x = "Difference in SMDs (bootstrapped estimates)") +
  theme_classic()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Robust Standardized Mean Differences with Trimming

Motivation

The standard SMD relies on the ordinary mean and standard deviation, which are sensitive to outliers and heavy-tailed distributions. When distributions are non-normal:

  • Heavy tails inflate the standard deviation, making the SMD appear smaller than the true distributional separation warrants.
  • Outliers can distort the mean difference, pulling the numerator in unpredictable directions.

Algina, Keselman, and Penfield (2005) proposed a robust alternative that uses trimmed means for location and Winsorized variances for scale, with a rescaling constant to maintain comparability with Cohen’s \(\delta\) under normality.

How Trimming Works

Trimmed mean. Given an ordered sample \(x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}\) and a trimming proportion \(\gamma\) (the tr argument), let \(g = \lfloor \gamma \cdot n \rfloor\) observations be removed from each tail. The trimmed mean is:

\[ \bar{X}_t = \frac{1}{n - 2g} \sum_{i=g+1}^{n-g} x_{(i)} \]

This reduces the influence of extreme values on the location estimate.

Winsorized variance. Rather than discarding the trimmed observations, Winsorization replaces them with the nearest remaining value. Define the Winsorized sample \(w_i\) as:

\[ w_i = \begin{cases} x_{(g+1)} & \text{if } x_i \leq x_{(g+1)} \\ x_i & \text{if } x_{(g+1)} < x_i < x_{(n-g)} \\ x_{(n-g)} & \text{if } x_i \geq x_{(n-g)} \end{cases} \]

The Winsorized variance is then computed as the ordinary sample variance of the Winsorized values:

\[ S_W^2 = \frac{1}{n-1} \sum_{i=1}^{n} (w_i - \bar{w})^2 \]

This provides a robust scale estimate that down-weights the influence of extreme observations.
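A small sketch of both robust ingredients (the data vector is made up and includes one extreme value; winsor_var is a hypothetical helper written for illustration, not a TOSTER function):

```r
x  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # hypothetical data with an outlier
tr <- 0.2

# Trimmed mean: R's mean() supports trimming directly
x_tbar <- mean(x, trim = tr)

# Winsorized variance: clamp the tails to the nearest retained value,
# then take the ordinary sample variance
winsor_var <- function(x, tr = 0.2) {
  n  <- length(x)
  g  <- floor(tr * n)
  xs <- sort(x)
  w  <- pmin(pmax(x, xs[g + 1]), xs[n - g])
  var(w)
}

c(ordinary_var = var(x), winsorized_var = winsor_var(x, tr))
```

The single outlier dominates the ordinary variance (~909 here) but barely moves the Winsorized variance (~4.7).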

Rescaling constant. To ensure the robust SMD equals Cohen’s \(\delta\) under normality, a rescaling constant \(c(\gamma)\) is applied (Algina, Keselman, and Penfield 2005). Let \(a = \Phi^{-1}(1 - \gamma)\) where \(\Phi^{-1}\) is the standard normal quantile function and \(\phi\) is the standard normal density. Then:

\[ c(\gamma) = \sqrt{1 - 2\gamma + 2\gamma a^2 - 2a\,\phi(a)} \]

When \(\gamma = 0\), \(c(0) = 1\) and the formula reduces to the ordinary Cohen’s \(d\). For \(\gamma = 0.2\) (20% trimming), \(c(0.2) \approx 0.642\).
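The constant is straightforward to compute; \(\gamma = 0\) is treated as the limiting case since \(\Phi^{-1}(1)\) is infinite:

```r
c_gamma <- function(gamma) {
  if (gamma == 0) return(1)  # limiting case: no trimming
  a <- qnorm(1 - gamma)      # standard normal quantile
  sqrt(1 - 2 * gamma + 2 * gamma * a^2 - 2 * a * dnorm(a))
}

round(c(c_gamma(0), c_gamma(0.1), c_gamma(0.2)), 3)
```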

Robust SMD. The robust effect size for two independent samples is then:

\[ d_R = c(\gamma) \cdot \frac{\bar{X}_{t2} - \bar{X}_{t1}}{S_W} \]

where \(S_W\) is based on a pooled, averaged, or single-group Winsorized standard deviation (depending on the denom argument), following the same denominator options as the standard SMD.

Effective sample size. Trimming reduces the effective sample size to \(h = n - 2g\), which replaces \(n\) in all degrees-of-freedom calculations. For example, in the two-sample pooled case with equal trimming, \(df = h_1 + h_2 - 2\).

Usage Examples

All trimming functionality is controlled by the tr parameter, which defaults to 0 (no trimming):

set.seed(8484)
group1 <- rnorm(40, mean = 100, sd = 15)
group2 <- rnorm(40, mean = 110, sd = 15)

# Standard Cohen's d
smd_calc(x = group1, y = group2, bias_correction = FALSE)
#> 
#>  Two Sample Standardized Mean Difference (SMD; Cohen's
#>  d[av]=(x-y)/SD_avg)
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -1.2513096 -0.3380927
#> sample estimates:
#> SMD (d[av]) 
#>  -0.7971844

# Robust version with 20% trimming (Algina et al., 2005)
smd_calc(x = group1, y = group2, bias_correction = FALSE, tr = 0.2)
#> 
#>  Two Sample Standardized Mean Difference (SMD; Cohen's
#>  d[av]=(x-y)/SD_avg, 20% trimmed)
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -1.2375852 -0.4333599
#> sample estimates:
#> SMD (d[av], 20% trimmed) 
#>                -0.839284

# Robust Hedges' g with 10% trimming
smd_calc(x = group1, y = group2, bias_correction = TRUE, tr = 0.1)
#> 
#>  Two Sample Standardized Mean Difference (SMD; Hedges's
#>  g[av]=(x-y)/SD_avg, 10% trimmed)
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -1.1585132 -0.3166612
#> sample estimates:
#> SMD (g[av], 10% trimmed) 
#>               -0.7405285

Bootstrap confidence intervals can also be computed with trimming:

boot_smd_calc(x = group1, y = group2, tr = 0.2, R = 999, boot_ci = "perc")
#> 
#>  Two Sample bootstrapped Standardized Mean Difference (SMD; Hedges's
#>  g[av]=(x-y)/SD_avg, 20% trimmed)
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -1.3211342 -0.3052718
#> sample estimates:
#> SMD (g[av], 20% trimmed) 
#>               -0.8252612

Choosing a Trimming Proportion

  • tr = 0.2 (20% trimming) is the most common recommendation in the robust statistics literature and is the default used by Algina et al.
  • tr = 0.1 offers a compromise between robustness and efficiency.
  • tr = 0 recovers the standard (non-robust) SMD.
  • Higher values provide more protection against outliers but lose more information from the tails.

Worked Example with Contaminated Data

To illustrate the benefit of trimming, consider data from a contaminated normal distribution where 10% of observations come from a distribution with much larger variance:

set.seed(7171)
n <- 50

# Clean data from known populations
x_clean <- rnorm(n, mean = 0, sd = 1)
y_clean <- rnorm(n, mean = 0.5, sd = 1)

# Contaminated version: replace 10% with outliers
n_contam <- ceiling(0.1 * n)
x_contam <- c(x_clean[1:(n - n_contam)], rnorm(n_contam, mean = 0, sd = 10))
y_contam <- c(y_clean[1:(n - n_contam)], rnorm(n_contam, mean = 0.5, sd = 10))

# Standard Cohen's d: attenuated by inflated SD
smd_calc(x = x_contam, y = y_contam, bias_correction = FALSE, smd_ci = "z")
#> 
#>  Two Sample Standardized Mean Difference (SMD; Cohen's
#>  d[av]=(x-y)/SD_avg)
#> 
#> data:  x_contam and y_contam
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -0.6777352  0.1182594
#> sample estimates:
#> SMD (d[av]) 
#>  -0.2797379

# Trimmed version: closer to the true separation
smd_calc(x = x_contam, y = y_contam, bias_correction = FALSE, tr = 0.2, smd_ci = "z")
#> 
#>  Two Sample Standardized Mean Difference (SMD; Cohen's
#>  d[av]=(x-y)/SD_avg, 20% trimmed)
#> 
#> data:  x_contam and y_contam
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -0.4386941  0.2234265
#> sample estimates:
#> SMD (d[av], 20% trimmed) 
#>               -0.1076338

# For reference: Cohen's d on the clean data
smd_calc(x = x_clean, y = y_clean, bias_correction = FALSE, smd_ci = "z")
#> 
#>  Two Sample Standardized Mean Difference (SMD; Cohen's
#>  d[av]=(x-y)/SD_avg)
#> 
#> data:  x_clean and y_clean
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -0.5792533  0.2143381
#> sample estimates:
#> SMD (d[av]) 
#>  -0.1824576

The trimmed SMD is more resistant to the contaminating observations and provides an estimate closer to what we would obtain from the clean data.

Limitations

  • The denom = "rm" option (repeated measures correction) is not currently supported with trimming.
  • The smd_ci = "goulet" CI method is not supported with trimming.
  • The noncentral-t CI with trimming may still have some coverage issues for very heavy-tailed distributions; the percentile or BCa bootstrap (via boot_smd_calc) is recommended in such cases, following Algina et al.’s findings.

References

Algina, James, H. J. Keselman, and Randall D. Penfield. 2005. “An Alternative to Cohen’s Standardized Mean Difference Effect Size: A Robust Parameter and Confidence Interval in the Two Independent Groups Case.” Psychological Methods 10 (3): 317–28. https://doi.org/10.1037/1082-989x.10.3.317.
Baguley, Thom. 2009. “Standardized or Simple Effect Size: What Should Be Reported?” British Journal of Psychology 100 (3): 603–17. https://doi.org/10.1348/000712608x377117.
Ben-Shachar, Mattan S., Daniel Lüdecke, and Dominique Makowski. 2020. “effectsize: Estimation of Effect Size Indices and Standardized Parameters.” Journal of Open Source Software 5 (56): 2815. https://doi.org/10.21105/joss.02815.
Bonett, Douglas G. 2008. “Confidence Intervals for Standardized Linear Contrasts of Means.” Psychological Methods 13 (2): 99.
———. 2009. “Meta-Analytic Interval Estimation for Standardized and Unstandardized Mean Differences.” Psychological Methods 14 (3): 225.
Borenstein, Michael, Larry V Hedges, Julian PT Higgins, and Hannah R Rothstein. 2021. Introduction to Meta-Analysis. John Wiley & Sons.
Buchanan, Erin M., Amber Gillenwaters, John E. Scofield, and K. D. Valentine. 2019. MOTE: Measure of the Effect: Package to Assist in Effect Size Calculations and Their Confidence Intervals. https://github.com/doomlab/MOTE.
Caldwell, Aaron, and Andrew D. Vigotsky. 2020. “A Case Against Default Effect Sizes in Sport and Exercise Science.” PeerJ 8 (November): e10314. https://doi.org/10.7717/peerj.10314.
Cousineau, Denis, and Jean-Christophe Goulet-Pelletier. 2021. “A Study of Confidence Intervals for Cohen’s Dp in Within-Subject Designs with New Proposals.” The Quantitative Methods for Psychology 17 (1): 51–75. https://doi.org/10.20982/tqmp.17.1.p051.
Delacre, Marie, Daniel Lakens, Christophe Ley, Limin Liu, and Christophe Leys. 2021. “Why Hedges’ gs Based on the Non-Pooled Standard Deviation Should Be Reported with Welch’s t-Test,” May. https://doi.org/10.31234/osf.io/tu6mp.
Goulet-Pelletier, Jean-Christophe, and Denis Cousineau. 2018. “A Review of Effect Sizes and Their Confidence Intervals, Part i: The Cohen’s d Family.” The Quantitative Methods for Psychology 14 (4): 242–65. https://doi.org/10.20982/tqmp.14.4.p242.
Kirby, Kris N., and Daniel Gerlanc. 2013. “BootES: An R Package for Bootstrap Confidence Intervals on Effect Sizes.” Behavior Research Methods 45 (4): 905–27. https://doi.org/10.3758/s13428-013-0330-5.
Lakens, Daniël. 2013. “Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-Tests and ANOVAs.” Frontiers in Psychology 4: 863.
Senn, Stephen. 2011. “U Is for Unease: Reasons for Mistrusting Overlap Measures for Reporting Clinical Trials.” Statistics in Biopharmaceutical Research 3 (2): 302–9. https://doi.org/10.1198/sbr.2010.10024.
Spanos, Aris. 2021. “Revisiting Noncentrality-Based Confidence Intervals, Error Probabilities and Estimation-Based Effect Sizes.” Journal of Mathematical Psychology 104: 102580.
Viechtbauer, Wolfgang. 2007. “Approximate Confidence Intervals for Standardized Effect Sizes in the Two-Independent and Two-Dependent Samples Design.” Journal of Educational and Behavioral Statistics 32 (1): 39–60.