Skip to contents

[Stable]

Calculates non-SMD standardized effect sizes for group comparisons. This function focuses on rank-based and probability-based effect size measures, which are especially useful for non-parametric analyses and when data do not meet normality assumptions.

Usage

ses_calc(
  x,
  ...,
  paired = FALSE,
  ses = "rb",
  alpha = 0.05,
  se_method = c("score", "agresti", "fisher"),
  correct = FALSE,
  output = c("htest", "data.frame"),
  null.value = NULL,
  alternative = c("none", "two.sided", "less", "greater", "equivalence",
    "minimal.effect")
)

# Default S3 method
ses_calc(
  x,
  y = NULL,
  paired = FALSE,
  ses = c("rb", "odds", "logodds", "cstat"),
  alpha = 0.05,
  mu = 0,
  se_method = c("score", "agresti", "fisher"),
  correct = FALSE,
  output = c("htest", "data.frame"),
  null.value = NULL,
  alternative = c("none", "two.sided", "less", "greater", "equivalence",
    "minimal.effect"),
  ...
)

# S3 method for class 'formula'
ses_calc(formula, data, subset, na.action, ...)

Arguments

x

a (non-empty) numeric vector of data values.

...

further arguments to be passed to or from methods.

paired

a logical indicating whether you want a paired t-test. Cannot be used with the formula method; use x and y vectors instead for paired tests.

ses

a character string specifying the effect size measure to calculate: - "rb": rank-biserial correlation (default) - "odds": Wilcoxon-Mann-Whitney odds - "logodds": Wilcoxon-Mann-Whitney log-odds - "cstat": concordance statistic (C-statistic, equivalent to the area under the ROC curve)

alpha

alpha level for confidence interval calculation (default = 0.05).

se_method

a character string specifying the method for computing standard errors and confidence intervals: - "score": (default) Uses a score-type approach where confidence intervals are constructed by test inversion (finding the values of the concordance probability where the score statistic equals the critical value), then transformed to the requested effect size scale. This method has better small-sample coverage than the Wald-based methods and produces confidence intervals that are coherent with the corresponding rank-based test. Available for all designs. For two-sample independent designs, uses the Fay-Malinovsky approach based on the proportional odds model (see Fay and Malinovsky, 2018). For paired/one-sample designs, uses a Wilson score approach based on the independent-signs model for the signed-rank statistic, producing p-values that match wilcox.test(..., paired = TRUE, exact = FALSE) at the standard null of 0.5. - "agresti": Uses the Agresti/Lehmann placement-based variance estimation with confidence intervals computed on the log-odds scale and back-transformed. This method has better asymptotic properties and faster convergence to normality. Available for all designs (one-sample, paired, and two-sample independent). However, this method can produce degenerate intervals at the boundaries (when all pairwise comparisons favor one group), in which case a Haldane-type shrinkage correction is applied to enable interval construction. - "fisher": Uses the legacy Fisher z-transformation method for confidence intervals. This method is retained for backward compatibility.

correct

logical; whether to apply a continuity correction to the test statistic and confidence interval. When se_method = "score", setting correct = TRUE produces p-values that match wilcox.test(..., exact = FALSE, correct = TRUE). Default is FALSE. Only used with se_method = "score".

output

a character string specifying the output format: - "htest": (default) Returns an object of class "htest" compatible with standard R output. - "data.frame": Returns a data frame with effect size estimates and confidence intervals.

null.value

a number or vector specifying the null hypothesis value(s) on the scale of the effect size estimate. Only used when alternative != "none". - For standard alternatives: a single value (default = 0 for rb/logodds, 0.5 for cstat, 1 for odds) - For equivalence/minimal.effect: two values representing the lower and upper bounds

alternative

a character string specifying the alternative hypothesis for optional hypothesis testing: - "none": (default) No hypothesis test is performed; only effect size and CI are returned. - "two.sided": Test whether effect differs from null.value - "less": Test whether effect is less than null.value - "greater": Test whether effect is greater than null.value - "equivalence": Test whether effect is between specified bounds - "minimal.effect": Test whether effect is outside specified bounds

y

an optional (non-empty) numeric vector of data values.

mu

number indicating the value around which asymmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated (default = 0).

formula

a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Value

If output = "htest" (default), returns a list with class "htest" containing:

  • estimate: The effect size estimate

  • stderr: Standard error of the estimate

  • conf.int: Confidence interval with conf.level attribute

  • null.value: The null hypothesis value (if alternative != "none")

  • alternative: The alternative hypothesis

  • method: Description of the method used

  • data.name: Names of the input data

  • statistic: Test statistic (only if alternative != "none")

  • parameter: Degrees of freedom or other parameters (only if alternative != "none")

  • p.value: P-value for the test (only if alternative != "none")

If output = "data.frame", returns a data frame containing:

  • estimate: The effect size estimate

  • SE: Standard error of the estimate

  • lower.ci: Lower bound of the confidence interval

  • upper.ci: Upper bound of the confidence interval

  • conf.level: Confidence level (1-alpha)

  • se_method: The SE method used

Details

This function calculates standardized effect sizes that are not standardized mean differences (SMDs). These effect sizes are particularly useful to compliment non-parametric analyses or when analyzing ordinal data.

The available effect size measures are:

  • Rank-biserial correlation ("rb"): A correlation coefficient based on ranks, ranging from -1 to 1. It can be interpreted as the difference between the proportion of favorable pairs and the proportion of unfavorable pairs. For independent samples, this is equivalent to Cliff's delta.

  • Wilcoxon-Mann-Whitney odds ("odds"): The ratio of the probability that a randomly selected observation from group 1 exceeds a randomly selected observation from group 2, to the probability of the reverse. Values range from 0 to infinity, with 1 indicating no effect.

  • Wilcoxon-Mann-Whitney log-odds ("logodds"): The natural logarithm of the WMW odds. This transforms the odds scale to range from negative infinity to positive infinity, with 0 indicating no effect.

  • Concordance statistic ("cstat"): The probability that a randomly selected observation from group 1 exceeds a randomly selected observation from group 2. Also known as the common language effect size or the area under the ROC curve. Values range from 0 to 1, with 0.5 indicating no effect.

Standard Error Methods

Three methods are available for computing standard errors and confidence intervals:

  • Agresti method (se_method = "agresti"): This method computes the variance of the concordance probability \(\hat{p} = \Pr(X > Y)\) using the Lehmann/Agresti placement-based formula. For two independent samples with sizes \(n_1\) and \(n_2\), the placement values \(V_i = \Pr(X > Y | X = x_i)\) and \(W_j = \Pr(X > Y | Y = y_j)\) are computed, and the variance is estimated as:

    $$\widehat{\mathrm{Var}}(\hat{p}) = \frac{\bar{V^2} - \hat{p}^2}{n_1} + \frac{\bar{W^2} - \hat{p}^2}{n_2}$$

    where \(\bar{V^2} = \frac{1}{n_1}\sum V_i^2\) and \(\bar{W^2} = \frac{1}{n_2}\sum W_j^2\). For paired samples, the variance is derived from the Wilcoxon signed-rank statistic using its null-distribution variance with a tie correction.

    Standard errors for other effect size scales are obtained via the delta method. Let \(\mathrm{SE}_p\) denote the standard error of \(\hat{p}\). Then:

    • \(\mathrm{SE}_{\mathrm{rb}} = 2 \cdot \mathrm{SE}_p\) (since \(r_b = 2p - 1\))

    • \(\mathrm{SE}_{\eta} = \mathrm{SE}_p / [\hat{p}(1 - \hat{p})]\) for the log-odds \(\eta = \log[\hat{p}/(1 - \hat{p})]\)

    • \(\mathrm{SE}_{\alpha} = \mathrm{SE}_p / (1 - \hat{p})^2\) for the odds \(\alpha = \hat{p}/(1 - \hat{p})\)

    Confidence intervals are constructed on the log-odds scale and back-transformed to the requested effect size scale. This ensures that intervals respect the natural bounds of each measure (e.g., \([0, 1]\) for cstat, \([-1, 1]\) for rb).

  • Score method (se_method = "score"): Uses a score-type approach where the variance is evaluated at the candidate parameter value, not the estimate. Confidence intervals are constructed via test inversion: finding the values of \(\phi\) (concordance probability) where the score statistic equals the critical value. This method has better small-sample coverage than Wald-type methods and guarantees CI/p-value coherence for equivalence testing.

    For two-sample independent designs, uses the Fay-Malinovsky V_LAPH variance function from the proportional odds model. The reported SE is descriptive (from V_LAPH at \(\hat{\phi}\)), and the CI comes from root-finding.

    For paired/one-sample designs, uses a Wilson score approach based on the independent signs model. Under this model, each pair's sign is Bernoulli(\(\pi\)) independent of rank, giving \(T^+\) known mean \(\pi S\) and variance \(\pi(1-\pi)Q\), where \(S = N(N+1)/2\) and \(Q = \sum r_i^2\). The CI has a closed-form Wilson-score solution (no root-finding needed). At \(\pi_0 = 0.5\), the score test reproduces the Wilcoxon signed-rank z exactly.

    The reported standard error is descriptive, but the confidence interval is not computed as estimate ± z × SE. Instead, it comes from test inversion.

    When correct = TRUE, a continuity correction is applied that makes p-values match wilcox.test(..., exact = FALSE, correct = TRUE).

  • Fisher method (se_method = "fisher"): This legacy method uses Fisher's z-transformation (arctanh) for the rank-biserial correlation. Confidence intervals for other effect sizes are obtained by simple transformation of the rank-biserial CI bounds.

Boundary Case Handling (Complete Separation)

When there is complete separation between groups (i.e., all pairwise comparisons favor one group), the concordance probability \(\hat{p}\) equals exactly 0 or 1. This leads to undefined odds and log-odds, and a degenerate (zero) placement-based variance.

For se_method = "agresti", a Haldane-type shrinkage correction is applied: $$\tilde{p} = \frac{C + 0.5}{N_{\mathrm{pairs}} + 1}$$ where \(C\) is the concordance count and \(N_{\mathrm{pairs}}\) is the total number of pairwise comparisons (\(n_1 n_2\) for two-sample designs, or \(N(N+1)/2\) for paired/one-sample designs where \(N\) is the number of non-zero differences). This corresponds to the posterior mean under a Jeffreys Beta(0.5, 0.5) prior and shrinks the estimate toward 0.5, with stronger shrinkage for smaller samples.

The Agresti placement variance is then evaluated at the corrected estimate. A message is printed when this correction is applied. For more reliable inference at boundaries, consider se_method = "score" for score-type intervals that handle boundaries naturally without correction.

For se_method = "score", no correction is needed. The score-type CI is constructed via test inversion, where the variance function is evaluated at candidate parameter values in the interior of (0, 1). When \(\hat{p} = 1\), the upper CI bound is trivially 1 and the lower bound is found by the score inversion. No modification of the point estimate is required. This applies to both two-sample (via root-finding) and paired/one-sample (via the closed-form Wilson quadratic) designs.

For two-sample designs using the Agresti method, a score-type CI is automatically used as a fallback at boundaries for better interval coverage.

Hypothesis Testing

When alternative != "none", a hypothesis test is performed. The approach depends on the SE method:

  • Agresti method: All hypothesis tests are conducted on the log-odds scale, regardless of which effect size is requested. The log-odds scale \(\eta = \log[p / (1-p)]\) has superior asymptotic properties: it is unbounded (\(-\infty\) to \(+\infty\)), converges more quickly to normality, and the null hypothesis (\(\eta_0 = 0\)) is interior to the parameter space.

    The user specifies null.value on the scale of their chosen effect size. These values are internally transformed to the log-odds scale for testing:

    • rb: \(\eta_0 = \log[(r_0 + 1) / (1 - r_0)]\)

    • cstat: \(\eta_0 = \log[p_0 / (1 - p_0)]\)

    • odds: \(\eta_0 = \log(\alpha_0)\)

    • logodds: \(\eta_0 = \eta_0\) (no transformation needed)

    The z-statistic is computed as \(z = (\hat{\eta} - \eta_0) / \mathrm{SE}_\eta\), and the p-value is obtained from the standard normal distribution. When the user specifies custom null values (not the default), a message reports the transformation to the log-odds scale.

  • Fisher method: Hypothesis tests are conducted on the scale of the requested effect size, using the Wald z-statistic \(z = (\hat{\theta} - \theta_0) / \mathrm{SE}_\theta\).

For equivalence testing (alternative = "equivalence"), the TOST procedure is used: two one-sided tests are performed against the lower and upper bounds, and the p-value is the maximum of the two one-sided p-values. For minimal effect testing (alternative = "minimal.effect"), the p-value is the minimum.

The function supports three study designs:

  • One-sample design: Compares a single sample to a specified value

  • Two-sample independent design: Compares two independent groups

  • Paired samples design: Compares paired observations

For detailed information on calculation methods, see vignette("robustTOST").

Purpose

Use this function when:

  • You want to report non-parametric effect size measures

  • You need to quantify the magnitude of differences using ranks or probabilities

  • Your outcome variable is ordinal

  • You want to complement results from Wilcoxon-Mann-Whitney type tests

References

Agresti, A. (1980). Generalized odds ratios for ordinal data. Biometrics, 36, 59-67.

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12, 387-415.

Fay, M.P. and Malinovsky, Y. (2018). Confidence Intervals of the Mann-Whitney Parameter that are Compatible with the Wilcoxon-Mann-Whitney Test. Statistics in Medicine, 37, 3991-4006. doi:10.1002/sim.7890

Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.

O'Brien, R.G. & Castelloe, J. (2006). Exploiting the link between the Wilcoxon-Mann-Whitney test and a simple odds statistic. SUGI 31 Proceedings, Paper 209-31.

Examples

# Example 1: Independent groups comparison (rank-biserial correlation)
set.seed(123)
group1 <- c(1.2, 2.3, 3.1, 4.6, 5.2, 6.7)
group2 <- c(3.5, 4.8, 5.6, 6.9, 7.2, 8.5)
ses_calc(x = group1, y = group2, ses = "rb")
#> 
#> 	Two Sample Rank-Biserial Correlation estimate with CI
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -0.91638826  0.01326171
#> sample estimates:
#> P(X>Y) - P(X<Y) 
#>      -0.6666667 
#> 

# Example 2: Using formula notation to calculate WMW odds
data(mtcars)
ses_calc(formula = mpg ~ am, data = mtcars, ses = "odds")
#> 
#> 	Two Sample WMW Odds estimate with CI
#> 
#> data:  mpg by am
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  0.07771216 0.58217291
#> sample estimates:
#> odds(P('0'>'1') + .5*P('0'='1')) 
#>                         0.204878 
#> 

# Example 3: Paired samples with concordance statistic
data(sleep)
with(sleep, ses_calc(x = extra[group == 1],
                     y = extra[group == 2],
                     paired = TRUE,
                     ses = "cstat"))
#> 
#> 	Paired Sample Concordance estimate with CI
#> 
#> data:  extra[group == 1] and extra[group == 2]
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  0.0000000 0.3505234
#> sample estimates:
#> P(X - Y>0) + .5*P(X - Y=0) 
#>                          0 
#> 

# Example 4: With hypothesis testing
ses_calc(x = group1, y = group2, ses = "rb",
         alternative = "two.sided", null.value = 0)
#> 
#> 	Two Sample Rank-Biserial Correlation test
#> 
#> data:  group1 and group2
#> z = -1.9215, p-value = 0.05466
#> alternative hypothesis: true P(X>Y) - P(X<Y) is not equal to 0
#> 95 percent confidence interval:
#>  -0.91638826  0.01326171
#> sample estimates:
#> P(X>Y) - P(X<Y) 
#>      -0.6666667 
#> 

# Example 5: Return as data frame (legacy format)
ses_calc(x = group1, y = group2, ses = "rb", output = "data.frame")
#>                             estimate        SE   lower.ci   upper.ci conf.level
#> Rank-Biserial Correlation -0.6666667 0.2480483 -0.9163883 0.01326171       0.95
#>                           se_method
#> Rank-Biserial Correlation     score

# Example 6: Using Fisher method for backward compatibility
ses_calc(x = group1, y = group2, ses = "rb", se_method = "fisher")
#> 
#> 	Two Sample Rank-Biserial Correlation estimate with CI
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  -0.9023481 -0.1240779
#> sample estimates:
#> P(X>Y) - P(X<Y) 
#>      -0.6666667 
#> 

# Example 7: Using score method for WMW-compatible CIs (two-sample only)
ses_calc(x = group1, y = group2, ses = "cstat", se_method = "score")
#> 
#> 	Two Sample Concordance estimate with CI
#> 
#> data:  group1 and group2
#> 
#> alternative hypothesis: none
#> 95 percent confidence interval:
#>  0.04180587 0.50663086
#> sample estimates:
#> P(X>Y) + .5*P(X=Y) 
#>          0.1666667 
#> 

# Example 8: Score method with continuity correction (matches wilcox.test)
ses_calc(x = group1, y = group2, ses = "cstat", se_method = "score",
         correct = TRUE, alternative = "two.sided", null.value = 0.5)
#> 
#> 	Two Sample Concordance test
#> 
#> data:  group1 and group2
#> z = -1.8415, p-value = 0.06555
#> alternative hypothesis: true P(X>Y) + .5*P(X=Y) is not equal to 0.5
#> 95 percent confidence interval:
#>  0.03633436 0.52022884
#> sample estimates:
#> P(X>Y) + .5*P(X=Y) 
#>          0.1666667 
#>