
Standardized Effect Size (SES) Calculation
ses_calc.Rd
Calculates non-SMD standardized effect sizes for group comparisons. This function focuses on rank-based and probability-based effect size measures, which are especially useful for non-parametric analyses and when data do not meet normality assumptions.
Usage
ses_calc(
x,
...,
paired = FALSE,
ses = "rb",
alpha = 0.05,
se_method = c("score", "agresti", "fisher"),
correct = FALSE,
output = c("htest", "data.frame"),
null.value = NULL,
alternative = c("none", "two.sided", "less", "greater", "equivalence",
"minimal.effect")
)
# Default S3 method
ses_calc(
x,
y = NULL,
paired = FALSE,
ses = c("rb", "odds", "logodds", "cstat"),
alpha = 0.05,
mu = 0,
se_method = c("score", "agresti", "fisher"),
correct = FALSE,
output = c("htest", "data.frame"),
null.value = NULL,
alternative = c("none", "two.sided", "less", "greater", "equivalence",
"minimal.effect"),
...
)
# S3 method for class 'formula'
ses_calc(formula, data, subset, na.action, ...)
Arguments
- x
a (non-empty) numeric vector of data values.
- ...
further arguments to be passed to or from methods.
- paired
a logical indicating whether you want a paired comparison. Cannot be used with the formula method; use x and y vectors instead for paired tests.
- ses
a character string specifying the effect size measure to calculate:
- "rb": rank-biserial correlation (default)
- "odds": Wilcoxon-Mann-Whitney odds
- "logodds": Wilcoxon-Mann-Whitney log-odds
- "cstat": concordance statistic (C-statistic, equivalent to the area under the ROC curve)
- alpha
alpha level for confidence interval calculation (default = 0.05).
- se_method
a character string specifying the method for computing standard errors and confidence intervals:
- "score": (default) Uses a score-type approach where confidence intervals are constructed by test inversion (finding the values of the concordance probability where the score statistic equals the critical value), then transformed to the requested effect size scale. This method has better small-sample coverage than the Wald-based methods and produces confidence intervals that are coherent with the corresponding rank-based test. Available for all designs. For two-sample independent designs, uses the Fay-Malinovsky approach based on the proportional odds model (see Fay and Malinovsky, 2018). For paired/one-sample designs, uses a Wilson score approach based on the independent-signs model for the signed-rank statistic, producing p-values that match wilcox.test(..., paired = TRUE, exact = FALSE) at the standard null of 0.5.
- "agresti": Uses the Agresti/Lehmann placement-based variance estimation with confidence intervals computed on the log-odds scale and back-transformed. This method has better asymptotic properties and faster convergence to normality. Available for all designs (one-sample, paired, and two-sample independent). However, this method can produce degenerate intervals at the boundaries (when all pairwise comparisons favor one group), in which case a Haldane-type shrinkage correction is applied to enable interval construction.
- "fisher": Uses the legacy Fisher z-transformation method for confidence intervals. This method is retained for backward compatibility.
- correct
logical; whether to apply a continuity correction to the test statistic and confidence interval. When se_method = "score", setting correct = TRUE produces p-values that match wilcox.test(..., exact = FALSE, correct = TRUE). Default is FALSE. Only used with se_method = "score".
- output
a character string specifying the output format:
- "htest": (default) Returns an object of class "htest" compatible with standard R output.
- "data.frame": Returns a data frame with effect size estimates and confidence intervals.
- null.value
a number or vector specifying the null hypothesis value(s) on the scale of the effect size estimate. Only used when alternative != "none".
- For standard alternatives: a single value (default = 0 for rb/logodds, 0.5 for cstat, 1 for odds)
- For equivalence/minimal.effect: two values representing the lower and upper bounds
- alternative
a character string specifying the alternative hypothesis for optional hypothesis testing:
- "none": (default) No hypothesis test is performed; only the effect size and CI are returned.
- "two.sided": Test whether the effect differs from null.value
- "less": Test whether the effect is less than null.value
- "greater": Test whether the effect is greater than null.value
- "equivalence": Test whether the effect is between the specified bounds
- "minimal.effect": Test whether the effect is outside the specified bounds
- y
an optional (non-empty) numeric vector of data values.
- mu
number indicating the value around which asymmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated (default = 0).
- formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.
- data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
- subset
an optional vector specifying a subset of observations to be used.
- na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
Value
If output = "htest" (default), returns a list with class "htest" containing:
estimate: The effect size estimate
stderr: Standard error of the estimate
conf.int: Confidence interval with conf.level attribute
null.value: The null hypothesis value (if alternative != "none")
alternative: The alternative hypothesis
method: Description of the method used
data.name: Names of the input data
statistic: Test statistic (only if alternative != "none")
parameter: Degrees of freedom or other parameters (only if alternative != "none")
p.value: P-value for the test (only if alternative != "none")
If output = "data.frame", returns a data frame containing:
estimate: The effect size estimate
SE: Standard error of the estimate
lower.ci: Lower bound of the confidence interval
upper.ci: Upper bound of the confidence interval
conf.level: Confidence level (1-alpha)
se_method: The SE method used
Details
This function calculates standardized effect sizes that are not standardized mean differences (SMDs). These effect sizes are particularly useful to complement non-parametric analyses or when analyzing ordinal data.
The available effect size measures are:
Rank-biserial correlation ("rb"): A correlation coefficient based on ranks, ranging from -1 to 1. It can be interpreted as the difference between the proportion of favorable pairs and the proportion of unfavorable pairs. For independent samples, this is equivalent to Cliff's delta.
Wilcoxon-Mann-Whitney odds ("odds"): The ratio of the probability that a randomly selected observation from group 1 exceeds a randomly selected observation from group 2, to the probability of the reverse. Values range from 0 to infinity, with 1 indicating no effect.
Wilcoxon-Mann-Whitney log-odds ("logodds"): The natural logarithm of the WMW odds. This transforms the odds scale to range from negative infinity to positive infinity, with 0 indicating no effect.
Concordance statistic ("cstat"): The probability that a randomly selected observation from group 1 exceeds a randomly selected observation from group 2. Also known as the common language effect size or the area under the ROC curve. Values range from 0 to 1, with 0.5 indicating no effect.
Standard Error Methods
Three methods are available for computing standard errors and confidence intervals:
Agresti method (se_method = "agresti"): This method computes the variance of the concordance probability \(\hat{p} = \Pr(X > Y)\) using the Lehmann/Agresti placement-based formula. For two independent samples with sizes \(n_1\) and \(n_2\), the placement values \(V_i = \Pr(X > Y | X = x_i)\) and \(W_j = \Pr(X > Y | Y = y_j)\) are computed, and the variance is estimated as:
$$\widehat{\mathrm{Var}}(\hat{p}) = \frac{\bar{V^2} - \hat{p}^2}{n_1} + \frac{\bar{W^2} - \hat{p}^2}{n_2}$$
where \(\bar{V^2} = \frac{1}{n_1}\sum V_i^2\) and \(\bar{W^2} = \frac{1}{n_2}\sum W_j^2\). For paired samples, the variance is derived from the Wilcoxon signed-rank statistic using its null-distribution variance with a tie correction.
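As a hedged illustration of the placement-based formula above (not the package's internal code; the helper name placement_var is hypothetical), the variance can be computed directly from the two samples, with ties given half weight:

```r
# Sketch of the Agresti/Lehmann placement-based variance described above.
# Illustrative only; not TOSTER's internal implementation.
placement_var <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  # V_i: estimated Pr(X > Y) given X = x_i (ties count half)
  V <- vapply(x, function(xi) mean((xi > y) + 0.5 * (xi == y)), numeric(1))
  # W_j: estimated Pr(X > Y) given Y = y_j
  W <- vapply(y, function(yj) mean((x > yj) + 0.5 * (x == yj)), numeric(1))
  p_hat <- mean(V)  # concordance estimate; mean(W) gives the same value
  v <- (mean(V^2) - p_hat^2) / n1 + (mean(W^2) - p_hat^2) / n2
  c(p_hat = p_hat, se = sqrt(v))
}
placement_var(c(1, 3, 5), c(2, 4, 6))  # two interleaved samples
```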
Standard errors for other effect size scales are obtained via the delta method. Let \(\mathrm{SE}_p\) denote the standard error of \(\hat{p}\). Then:
\(\mathrm{SE}_{\mathrm{rb}} = 2 \cdot \mathrm{SE}_p\) (since \(r_b = 2p - 1\))
\(\mathrm{SE}_{\eta} = \mathrm{SE}_p / [\hat{p}(1 - \hat{p})]\) for the log-odds \(\eta = \log[\hat{p}/(1 - \hat{p})]\)
\(\mathrm{SE}_{\alpha} = \mathrm{SE}_p / (1 - \hat{p})^2\) for the odds \(\alpha = \hat{p}/(1 - \hat{p})\)
Confidence intervals are constructed on the log-odds scale and back-transformed to the requested effect size scale. This ensures that intervals respect the natural bounds of each measure (e.g., \([0, 1]\) for cstat, \([-1, 1]\) for rb).
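The delta-method formulas above can be sketched in a few lines of R (the helper name delta_se is illustrative, not part of the package):

```r
# Delta-method SE transformations from the concordance scale (p_hat, SE_p)
# to the other effect size scales, following the formulas above.
delta_se <- function(p_hat, se_p) {
  c(rb      = 2 * se_p,                      # r_b = 2p - 1
    logodds = se_p / (p_hat * (1 - p_hat)),  # eta = log(p / (1 - p))
    odds    = se_p / (1 - p_hat)^2)          # alpha = p / (1 - p)
}
delta_se(p_hat = 0.75, se_p = 0.05)
```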
Score method (se_method = "score"): Uses a score-type approach where the variance is evaluated at the candidate parameter value, not the estimate. Confidence intervals are constructed via test inversion: finding the values of \(\phi\) (concordance probability) where the score statistic equals the critical value. This method has better small-sample coverage than Wald-type methods and guarantees CI/p-value coherence for equivalence testing.
For two-sample independent designs, uses the Fay-Malinovsky V_LAPH variance function from the proportional odds model. The reported SE is descriptive (from V_LAPH at \(\hat{\phi}\)), and the CI comes from root-finding.
For paired/one-sample designs, uses a Wilson score approach based on the independent signs model. Under this model, each pair's sign is Bernoulli(\(\pi\)) independent of rank, giving \(T^+\) known mean \(\pi S\) and variance \(\pi(1-\pi)Q\), where \(S = N(N+1)/2\) and \(Q = \sum r_i^2\). The CI has a closed-form Wilson-score solution (no root-finding needed). At \(\pi_0 = 0.5\), the score test reproduces the Wilcoxon signed-rank z exactly.
The reported standard error is descriptive, but the confidence interval is not computed as estimate ± z × SE. Instead, it comes from test inversion.
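A sketch of the closed-form Wilson-score solution for the paired design described above (helper name illustrative; this reproduces the quadratic implied by setting \((T^+ - \pi S)^2 = z^2 \pi (1-\pi) Q\), not the package's exact internals):

```r
# Wilson-score CI for pi = T+/S in the paired design, per the description
# above: solve (T+ - pi * S)^2 = z^2 * pi * (1 - pi) * Q for pi.
wilson_signed_rank_ci <- function(d, alpha = 0.05) {
  d <- d[d != 0]                  # drop zero differences
  r <- rank(abs(d))
  Tplus <- sum(r[d > 0])          # Wilcoxon signed-rank statistic
  S <- sum(r)                     # equals N(N + 1)/2 absent ties in |d|
  Q <- sum(r^2)
  z <- qnorm(1 - alpha / 2)
  # Quadratic in pi: (S^2 + z^2 Q) pi^2 - (2 T S + z^2 Q) pi + T^2 = 0
  a <- S^2 + z^2 * Q
  b <- 2 * Tplus * S + z^2 * Q
  disc <- sqrt(b^2 - 4 * a * Tplus^2)
  c(lower = (b - disc) / (2 * a), upper = (b + disc) / (2 * a))
}
wilson_signed_rank_ci(c(1, 2, 3))  # all-positive differences: upper bound is 1
```

Note how the boundary case (all differences positive) still yields a usable interval, consistent with the boundary discussion below.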
When correct = TRUE, a continuity correction is applied that makes p-values match wilcox.test(..., exact = FALSE, correct = TRUE).
Fisher method (se_method = "fisher"): This legacy method uses Fisher's z-transformation (arctanh) for the rank-biserial correlation. Confidence intervals for other effect sizes are obtained by simple transformation of the rank-biserial CI bounds.
Boundary Case Handling (Complete Separation)
When there is complete separation between groups (i.e., all pairwise comparisons favor one group), the concordance probability \(\hat{p}\) equals exactly 0 or 1. This leads to undefined odds and log-odds, and a degenerate (zero) placement-based variance.
For se_method = "agresti", a Haldane-type shrinkage correction is applied:
$$\tilde{p} = \frac{C + 0.5}{N_{\mathrm{pairs}} + 1}$$
where \(C\) is the concordance count and \(N_{\mathrm{pairs}}\) is the total
number of pairwise comparisons (\(n_1 n_2\) for two-sample designs, or
\(N(N+1)/2\) for paired/one-sample designs where \(N\) is the number of
non-zero differences). This corresponds to the posterior mean under a Jeffreys
Beta(0.5, 0.5) prior and shrinks the estimate toward 0.5, with stronger shrinkage
for smaller samples.
The Agresti placement variance is then evaluated at the corrected estimate. A message
is printed when this correction is applied. For more reliable inference at boundaries,
consider se_method = "score" for score-type intervals that handle
boundaries naturally without correction.
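The shrinkage formula above is simple enough to verify by hand (the helper name is illustrative):

```r
# Haldane-type shrinkage at complete separation, per the formula above.
# With C = N_pairs (all comparisons favor group 1) the raw estimate is
# exactly 1; the corrected estimate is pulled toward 0.5, more strongly
# for small samples.
haldane_shrink <- function(C, n_pairs) (C + 0.5) / (n_pairs + 1)
haldane_shrink(C = 9, n_pairs = 9)      # n1 = n2 = 3: gives 0.95
haldane_shrink(C = 100, n_pairs = 100)  # larger sample: ~0.995
```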
For se_method = "score", no correction is needed. The score-type
CI is constructed via test inversion, where the variance function is
evaluated at candidate parameter values in the interior of (0, 1). When
\(\hat{p} = 1\), the upper CI bound is trivially 1 and the lower bound is found
by the score inversion. No modification of the point estimate is required.
This applies to both two-sample (via root-finding) and paired/one-sample
(via the closed-form Wilson quadratic) designs.
For two-sample designs using the Agresti method, a score-type CI is automatically used as a fallback at boundaries for better interval coverage.
Hypothesis Testing
When alternative != "none", a hypothesis test is performed. The approach depends on the
SE method:
Agresti method: All hypothesis tests are conducted on the log-odds scale, regardless of which effect size is requested. The log-odds scale \(\eta = \log[p / (1-p)]\) has superior asymptotic properties: it is unbounded (\(-\infty\) to \(+\infty\)), converges more quickly to normality, and the null hypothesis (\(\eta_0 = 0\)) is interior to the parameter space.
The user specifies null.value on the scale of their chosen effect size. These values are internally transformed to the log-odds scale for testing:
rb: \(\eta_0 = \log[(r_0 + 1) / (1 - r_0)]\)
cstat: \(\eta_0 = \log[p_0 / (1 - p_0)]\)
odds: \(\eta_0 = \log(\alpha_0)\)
logodds: \(\eta_0\) is used directly (no transformation needed)
The z-statistic is computed as \(z = (\hat{\eta} - \eta_0) / \mathrm{SE}_\eta\), and the p-value is obtained from the standard normal distribution. When the user specifies custom null values (not the default), a message reports the transformation to the log-odds scale.
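The transformations listed above can be sketched as follows (the helper name null_to_logodds is illustrative, not an exported function):

```r
# Mapping a user-supplied null.value to the log-odds scale, following the
# transformations described above.
null_to_logodds <- function(null.value, ses) {
  switch(ses,
         rb      = log((null.value + 1) / (1 - null.value)),
         cstat   = log(null.value / (1 - null.value)),
         odds    = log(null.value),
         logodds = null.value)
}
# The default nulls on every scale map to the same eta_0 = 0:
c(null_to_logodds(0, "rb"),
  null_to_logodds(0.5, "cstat"),
  null_to_logodds(1, "odds"))
```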
Fisher method: Hypothesis tests are conducted on the scale of the requested effect size, using the Wald z-statistic \(z = (\hat{\theta} - \theta_0) / \mathrm{SE}_\theta\).
For equivalence testing (alternative = "equivalence"), the TOST procedure is used: two
one-sided tests are performed against the lower and upper bounds, and the p-value is the
maximum of the two one-sided p-values. For minimal effect testing
(alternative = "minimal.effect"), the p-value is the minimum.
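The max/min combination described above can be sketched assuming one-sided Wald z-tests (helper name illustrative; the package's score-based tests invert the score statistic instead):

```r
# TOST / minimal-effect p-value combination as described above, assuming
# one-sided Wald z-tests against the lower and upper bounds.
interval_p <- function(est, se, lower, upper, alternative = "equivalence") {
  p_gt_lower <- pnorm((est - lower) / se, lower.tail = FALSE)  # H1: est > lower
  p_lt_upper <- pnorm((est - upper) / se)                      # H1: est < upper
  if (alternative == "equivalence") max(p_gt_lower, p_lt_upper)
  else min(p_gt_lower, p_lt_upper)                             # minimal.effect
}
interval_p(est = 0.1, se = 0.2, lower = -0.5, upper = 0.5)
```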
The function supports three study designs:
One-sample design: Compares a single sample to a specified value
Two-sample independent design: Compares two independent groups
Paired samples design: Compares paired observations
For detailed information on calculation methods, see vignette("robustTOST").
Purpose
Use this function when:
You want to report non-parametric effect size measures
You need to quantify the magnitude of differences using ranks or probabilities
Your outcome variable is ordinal
You want to complement results from Wilcoxon-Mann-Whitney type tests
References
Agresti, A. (1980). Generalized odds ratios for ordinal data. Biometrics, 36, 59-67.
Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12, 387-415.
Fay, M.P. and Malinovsky, Y. (2018). Confidence Intervals of the Mann-Whitney Parameter that are Compatible with the Wilcoxon-Mann-Whitney Test. Statistics in Medicine, 37, 3991-4006. doi:10.1002/sim.7890
Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.
O'Brien, R.G. & Castelloe, J. (2006). Exploiting the link between the Wilcoxon-Mann-Whitney test and a simple odds statistic. SUGI 31 Proceedings, Paper 209-31.
See also
Other effect sizes:
boot_ses_calc(),
boot_smd_calc(),
rank_diff(),
smd_calc(),
trans_rank_prob()
Examples
# Example 1: Independent groups comparison (rank-biserial correlation)
set.seed(123)
group1 <- c(1.2, 2.3, 3.1, 4.6, 5.2, 6.7)
group2 <- c(3.5, 4.8, 5.6, 6.9, 7.2, 8.5)
ses_calc(x = group1, y = group2, ses = "rb")
#>
#> Two Sample Rank-Biserial Correlation estimate with CI
#>
#> data: group1 and group2
#>
#> alternative hypothesis: none
#> 95 percent confidence interval:
#> -0.91638826 0.01326171
#> sample estimates:
#> P(X>Y) - P(X<Y)
#> -0.6666667
#>
# Example 2: Using formula notation to calculate WMW odds
data(mtcars)
ses_calc(formula = mpg ~ am, data = mtcars, ses = "odds")
#>
#> Two Sample WMW Odds estimate with CI
#>
#> data: mpg by am
#>
#> alternative hypothesis: none
#> 95 percent confidence interval:
#> 0.07771216 0.58217291
#> sample estimates:
#> odds(P('0'>'1') + .5*P('0'='1'))
#> 0.204878
#>
# Example 3: Paired samples with concordance statistic
data(sleep)
with(sleep, ses_calc(x = extra[group == 1],
y = extra[group == 2],
paired = TRUE,
ses = "cstat"))
#>
#> Paired Sample Concordance estimate with CI
#>
#> data: extra[group == 1] and extra[group == 2]
#>
#> alternative hypothesis: none
#> 95 percent confidence interval:
#> 0.0000000 0.3505234
#> sample estimates:
#> P(X - Y>0) + .5*P(X - Y=0)
#> 0
#>
# Example 4: With hypothesis testing
ses_calc(x = group1, y = group2, ses = "rb",
alternative = "two.sided", null.value = 0)
#>
#> Two Sample Rank-Biserial Correlation test
#>
#> data: group1 and group2
#> z = -1.9215, p-value = 0.05466
#> alternative hypothesis: true P(X>Y) - P(X<Y) is not equal to 0
#> 95 percent confidence interval:
#> -0.91638826 0.01326171
#> sample estimates:
#> P(X>Y) - P(X<Y)
#> -0.6666667
#>
# Example 5: Return as data frame (legacy format)
ses_calc(x = group1, y = group2, ses = "rb", output = "data.frame")
#> estimate SE lower.ci upper.ci conf.level
#> Rank-Biserial Correlation -0.6666667 0.2480483 -0.9163883 0.01326171 0.95
#> se_method
#> Rank-Biserial Correlation score
# Example 6: Using Fisher method for backward compatibility
ses_calc(x = group1, y = group2, ses = "rb", se_method = "fisher")
#>
#> Two Sample Rank-Biserial Correlation estimate with CI
#>
#> data: group1 and group2
#>
#> alternative hypothesis: none
#> 95 percent confidence interval:
#> -0.9023481 -0.1240779
#> sample estimates:
#> P(X>Y) - P(X<Y)
#> -0.6666667
#>
# Example 7: Using score method for WMW-compatible CIs
ses_calc(x = group1, y = group2, ses = "cstat", se_method = "score")
#>
#> Two Sample Concordance estimate with CI
#>
#> data: group1 and group2
#>
#> alternative hypothesis: none
#> 95 percent confidence interval:
#> 0.04180587 0.50663086
#> sample estimates:
#> P(X>Y) + .5*P(X=Y)
#> 0.1666667
#>
# Example 8: Score method with continuity correction (matches wilcox.test)
ses_calc(x = group1, y = group2, ses = "cstat", se_method = "score",
correct = TRUE, alternative = "two.sided", null.value = 0.5)
#>
#> Two Sample Concordance test
#>
#> data: group1 and group2
#> z = -1.8415, p-value = 0.06555
#> alternative hypothesis: true P(X>Y) + .5*P(X=Y) is not equal to 0.5
#> 95 percent confidence interval:
#> 0.03633436 0.52022884
#> sample estimates:
#> P(X>Y) + .5*P(X=Y)
#> 0.1666667
#>