
[Maturing]

Calculates standardized effect sizes other than the standardized mean difference (SMD), with bootstrap confidence intervals and optional hypothesis testing. Resampling provides robust confidence intervals for rank-based and probability-based effect size measures.

Usage

boot_ses_calc(
  x,
  ...,
  paired = FALSE,
  ses = "rb",
  alpha = 0.05,
  mu = 0,
  boot_ci = c("bca", "stud", "basic", "perc"),
  R = 1999,
  se_method = c("agresti", "fisher"),
  output = c("htest", "data.frame"),
  alternative = c("none", "two.sided", "less", "greater", "equivalence",
    "minimal.effect"),
  null.value = NULL
)

# Default S3 method
boot_ses_calc(
  x,
  y = NULL,
  paired = FALSE,
  ses = c("rb", "odds", "logodds", "cstat"),
  alpha = 0.05,
  mu = 0,
  boot_ci = c("basic", "stud", "perc", "bca"),
  R = 1999,
  se_method = c("agresti", "fisher"),
  output = c("htest", "data.frame"),
  alternative = c("none", "two.sided", "less", "greater", "equivalence",
    "minimal.effect"),
  null.value = NULL,
  ...
)

# S3 method for class 'formula'
boot_ses_calc(formula, data, subset, na.action, ...)

Arguments

x

a (non-empty) numeric vector of data values.

...

further arguments to be passed to or from methods.

paired

a logical indicating whether the observations are paired (paired-samples design). Cannot be used with the formula method; for paired data, supply x and y vectors instead.

ses

a character string specifying the effect size measure to calculate:

  • "rb": rank-biserial correlation (default)

  • "odds": Wilcoxon-Mann-Whitney odds

  • "logodds": Wilcoxon-Mann-Whitney log-odds

  • "cstat": concordance statistic (C-statistic/AUC)

alpha

alpha level (default = 0.05)

mu

number indicating the value around which asymmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated (default = 0).

boot_ci

method for bootstrap confidence interval calculation: "stud" (studentized, default), "basic" (basic bootstrap), "bca" (bias-corrected and accelerated), or "perc" (percentile bootstrap).

R

number of bootstrap replications (default = 1999).

se_method

a character string specifying the method for computing standard errors within each bootstrap sample:

  • "agresti": (default) Uses the Agresti/Lehmann placement-based variance estimation with the log-odds working scale, which has better asymptotic properties (faster convergence to normality per Agresti, 1980).

  • "fisher": Uses the legacy Fisher z-transformation method. Retained for backward compatibility.

output

a character string specifying the output format:

  • "htest": (default) Returns an object of class "htest" compatible with standard R output.

  • "data.frame": Returns a data frame with effect size estimates and confidence intervals.

alternative

a character string specifying the alternative hypothesis for optional hypothesis testing:

  • "none": (default) No hypothesis test is performed; only the effect size and CI are returned.

  • "two.sided": Test whether the effect differs from null.value

  • "less": Test whether the effect is less than null.value

  • "greater": Test whether the effect is greater than null.value

  • "equivalence": Test whether the effect is between the specified bounds

  • "minimal.effect": Test whether the effect is outside the specified bounds

null.value

a number or vector specifying the null hypothesis value(s):

  • For standard alternatives: a single value (default = 0 for rb/logodds, 0.5 for cstat, 1 for odds)

  • For equivalence/minimal.effect: two values representing the lower and upper bounds

y

an optional (non-empty) numeric vector of data values.

formula

a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables in formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Value

If output = "htest" (default), returns a list with class "htest" containing:

  • estimate: The effect size estimate calculated from the original data

  • stderr: Standard error estimated from the bootstrap distribution

  • conf.int: Bootstrap confidence interval with conf.level attribute

  • alternative: A character string describing the alternative hypothesis

  • method: A character string indicating what type of test was performed

  • boot: The bootstrap samples of the effect size (on the requested scale)

  • data.name: A character string giving the name(s) of the data

  • call: The matched call

  • statistic: Test statistic (only if alternative != "none")

  • p.value: The bootstrapped p-value for the test (only if alternative != "none")

  • null.value: The specified hypothesized value(s) (only if alternative != "none")

If output = "data.frame", returns a data frame containing:

  • estimate: The effect size estimate

  • SE: Standard error from the bootstrap distribution

  • lower.ci: Lower bound of the bootstrap confidence interval

  • upper.ci: Upper bound of the bootstrap confidence interval

  • conf.level: Confidence level (1-alpha or 1-2*alpha for equivalence)

  • boot_ci: The bootstrap CI method used

Details

This function calculates bootstrapped confidence intervals for rank-based and probability-based effect size measures. It extends the ses_calc() function by using resampling to provide more robust confidence intervals, especially for small sample sizes.

The function implements the following bootstrap approach:

  • Calculate the raw effect size using the original data

  • Create R bootstrap samples by resampling with replacement from the original data

  • Calculate the effect size for each bootstrap sample

  • Transform bootstrap estimates to the working scale for CI construction: the log-odds scale when se_method = "agresti" (default), or Fisher z when se_method = "fisher"

  • Calculate confidence intervals using the specified method

  • Back-transform the confidence intervals to the original scale

  • Convert to the requested effect size measure (if not rank-biserial)
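The steps above can be sketched in base R for two independent samples. This is a minimal illustration, not the package's implementation: the helper names (rank_biserial, rb_to_logodds, logodds_to_rb) are hypothetical, the sign convention for the rank-biserial is an assumption, and a plain percentile interval stands in for whichever boot_ci method is selected.

```r
# Hypothetical helper: rank-biserial for independent samples, taken here as
# P(x > y) - P(x < y) over all pairwise comparisons (assumed sign convention).
rank_biserial <- function(x, y) {
  d <- outer(x, y, "-")
  mean(d > 0) - mean(d < 0)
}

# Working-scale transform: log-odds of the concordance probability p = (rb + 1) / 2,
# so odds = (1 + rb) / (1 - rb). The inverse is tanh(eta / 2).
rb_to_logodds <- function(rb) log((1 + rb) / (1 - rb))
logodds_to_rb <- function(eta) tanh(eta / 2)

set.seed(1)
x <- c(1.2, 2.3, 3.1, 4.6, 5.2, 6.7)
y <- c(3.5, 4.8, 5.6, 6.9, 7.2, 8.5)
R <- 499

# Step 1: effect size on the original data
est <- rank_biserial(x, y)

# Steps 2-3: resample each group with replacement and recompute the effect size
boot_rb <- replicate(R, {
  rank_biserial(sample(x, replace = TRUE), sample(y, replace = TRUE))
})

# Step 4: move to the working scale (replicates at rb = +/-1 become +/-Inf)
boot_eta <- rb_to_logodds(boot_rb)

# Step 5: percentile interval on the working scale; type = 1 avoids
# interpolating between infinite and finite replicates
ci_eta <- quantile(boot_eta, c(0.025, 0.975), type = 1, names = FALSE)

# Step 6: back-transform the limits to the rank-biserial scale
ci_rb <- logodds_to_rb(ci_eta)

est
ci_rb
```

With samples this small, some replicates show complete separation, so a limit can land exactly on the boundary of the rank-biserial scale.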

Standard Error Methods

Two methods are available for computing standard errors within each bootstrap sample:

  • Agresti method (se_method = "agresti"): Uses the Agresti/Lehmann placement-based variance estimation. This method computes the variance of the concordance probability and propagates it to other effect size scales using the delta method.

  • Fisher method (se_method = "fisher"): Uses the legacy formula based on the Wilcoxon statistic variance. This is retained for backward compatibility.

Bootstrap Confidence Interval Methods

Four bootstrap confidence interval methods are available via the boot_ci argument:

  • Basic bootstrap ("basic"): Reflects the bootstrap distribution of estimates around the observed value

  • Studentized bootstrap ("stud"): Uses the bootstrap distribution of pivotal t-statistics to account for variability in standard error estimates. This is the default and usually provides the most accurate coverage.

  • Percentile bootstrap ("perc"): Uses percentiles of the bootstrap distribution directly

  • Bias-corrected and accelerated ("bca"): Corrects for both bias and skewness in the bootstrap distribution using jackknife-based acceleration

All CI methods operate on a working scale that is better suited to symmetric bootstrap distributions: the log-odds scale when se_method = "agresti" (default), or the Fisher z scale when se_method = "fisher". Confidence limits are then back-transformed to the requested effect size scale.
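As a rough illustration of how the basic and percentile intervals differ once replicates are on a working scale, the sketch below applies the textbook definitions (Efron & Tibshirani) and back-transforms from the log-odds scale. The function names are hypothetical, the replicates are simulated stand-ins rather than real bootstrap output, and the package's internal computation may differ in detail.

```r
# Percentile interval: quantiles of the replicates taken directly
percentile_ci <- function(boot, alpha = 0.05) {
  quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Basic interval: replicate quantiles reflected around the observed estimate
basic_ci <- function(boot, est, alpha = 0.05) {
  2 * est - quantile(boot, c(1 - alpha / 2, alpha / 2), names = FALSE)
}

set.seed(2)
est_rb   <- 0.4                                  # illustrative observed rank-biserial
est_eta  <- log((1 + est_rb) / (1 - est_rb))     # same value on the log-odds scale
boot_eta <- est_eta + rnorm(999, sd = 0.5)       # simulated stand-in for replicates

# Intervals are built on the working scale, then mapped back to [-1, 1]
ci_perc  <- tanh(percentile_ci(boot_eta) / 2)
ci_basic <- tanh(basic_ci(boot_eta, est_eta) / 2)

ci_perc
ci_basic
```

Because the back-transformation is monotone, both intervals stay inside the valid range of the rank-biserial, which is the main practical reason for building them on the working scale.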

Hypothesis Testing

When an alternative other than "none" is specified, or when null.value is not the default, the function performs bootstrap hypothesis testing. For equivalence and minimal effect testing, specify null.value as a vector of two values (lower and upper bounds).

The p-value is computed using the method that matches the selected boot_ci, ensuring that p < alpha if and only if the corresponding confidence interval excludes the null value (CI inversion principle). Previously, all bootstrap CI methods used the studentized (pivot) p-value, which could produce p-values inconsistent with non-studentized CIs. The null value is converted to the working scale (log-odds or Fisher z) before computing the p-value, maintaining consistency with the CI construction.

For different alternatives, the p-values are calculated as follows:

  • "two.sided": Two-tailed p-value from the bootstrap distribution

  • "less": One-sided p-value for the hypothesis that the true value is less than the null

  • "greater": One-sided p-value for the hypothesis that the true value is greater than the null

  • "equivalence": Maximum of two one-sided p-values (for lower and upper bounds)

  • "minimal.effect": Minimum of two one-sided p-values (for lower and upper bounds)
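The combination rules in this list can be sketched as follows. The one-sided p-values here use a simple percentile-type inversion (the proportion of replicates on the wrong side of the null, with a +1 correction), which is an assumption made for illustration; the function names are hypothetical, and the package computes p-values with the method matching boot_ci.

```r
# One-sided percentile-type p-values (illustrative assumption)
p_greater <- function(boot, null) (sum(boot <= null) + 1) / (length(boot) + 1)
p_less    <- function(boot, null) (sum(boot >= null) + 1) / (length(boot) + 1)

# Combine one-sided p-values according to the requested alternative
boot_p <- function(boot, null, alternative) {
  switch(alternative,
    greater        = p_greater(boot, null),
    less           = p_less(boot, null),
    two.sided      = min(1, 2 * min(p_greater(boot, null), p_less(boot, null))),
    # equivalence: both one-sided tests must reject, so report the larger p-value
    equivalence    = max(p_greater(boot, min(null)), p_less(boot, max(null))),
    # minimal effect: either one-sided test may reject, so report the smaller
    minimal.effect = min(p_less(boot, min(null)), p_greater(boot, max(null))),
    stop("unknown alternative")
  )
}

# Deterministic stand-in for bootstrap replicates, all well inside (-0.3, 0.3)
boot <- seq(-0.2, 0.2, length.out = 99)

p_equ <- boot_p(boot, c(-0.3, 0.3), "equivalence")
p_met <- boot_p(boot, c(-0.3, 0.3), "minimal.effect")
p_equ
p_met
```

Every stand-in replicate lies within the bounds, so the equivalence p-value is as small as 99 replicates allow and the minimal-effect p-value is 1.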

The function supports three study designs:

  • One-sample design: Compares a single sample to a specified value

  • Two-sample independent design: Compares two independent groups

  • Paired samples design: Compares paired observations

Note that extreme values (perfect separation between groups) can produce infinite values during the bootstrapping process. The function will issue a warning if this occurs.

For detailed information on calculation methods, see vignette("robustTOST").

Edge Cases

  • Complete separation: When one group entirely dominates the other (concordance probability = 0 or 1), the bootstrap distribution collapses and CIs become degenerate. The function stops with an informative error directing users to ses_calc() with se_method = "agresti" for asymptotic inference. This condition is detected before resampling begins.

  • Near-complete separation: When the observed rb is close to +/-1 but not exactly at the boundary, the working-scale transformation (log-odds or Fisher z) helps stabilize the bootstrap but coverage may still degrade. The function issues a message when bootstrap replicates contain infinite values after transformation, which is a symptom of this problem.

  • Why the bootstrap fails at boundaries: The rank-biserial is bounded on [-1, 1]. When the observed value is near a boundary, resampled values pile up at the boundary, producing a distribution that is not well-approximated by the symmetric bootstrap CI methods (basic, percentile, studentized). The log-odds and Fisher z working scales mitigate this by mapping [-1, 1] to the real line, but the mapping itself becomes unstable as rb approaches +/-1.

  • Recommendation: For data with complete or near-complete separation, prefer the asymptotic Agresti/Lehmann interval from ses_calc(), which handles boundary behavior more gracefully through the placement-based variance estimator.
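To see why the working-scale mapping becomes unstable near the boundary, the sketch below tabulates the standard relationships between the four measures, together with Fisher z, as rb approaches 1. The conversions shown are the usual textbook identities and are assumed, not confirmed, to match what the package applies internally.

```r
rb <- c(0, 0.5, 0.9, 0.99, 0.999, 1)

cstat    <- (rb + 1) / 2          # concordance probability
odds     <- cstat / (1 - cstat)   # Wilcoxon-Mann-Whitney odds
logodds  <- log(odds)             # log-odds working scale
fisher_z <- atanh(rb)             # Fisher z working scale

scales <- data.frame(rb, cstat, odds, logodds, fisher_z)
scales
```

Each step toward rb = 1 produces a larger jump on both working scales, and at rb = 1 both are infinite, so a replicate with complete separation has no finite working-scale value. Note also that the log-odds is exactly twice Fisher z, so the two scales differ only by a constant factor.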

Purpose

Use this function when:

  • You need more robust confidence intervals for non-parametric effect sizes

  • You prefer resampling-based confidence intervals over asymptotic approximations

  • You need to quantify uncertainty in rank-based effect sizes more accurately

  • You want to perform hypothesis testing with bootstrap methods

References

Agresti, A. (1980). Generalized odds ratios for ordinal data. Biometrics, 36, 59-67.

Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12, 387-415.

Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. CRC Press.

Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.

Examples

# Example 1: Independent groups comparison with basic bootstrap CI
set.seed(123)
group1 <- c(1.2, 2.3, 3.1, 4.6, 5.2, 6.7)
group2 <- c(3.5, 4.8, 5.6, 6.9, 7.2, 8.5)

# Use fewer bootstrap replicates for a quick example
result <- boot_ses_calc(x = group1, y = group2,
                        ses = "rb",
                        boot_ci = "basic",
                        R = 99)
#> Complete separation detected (all pairwise comparisons favor one group). A Haldane-type shrinkage correction was applied to enable confidence interval construction on the log-odds scale. Score-type CIs used for better boundary behavior.

# Example 2: Hypothesis testing (two-sided)
result <- boot_ses_calc(x = group1, y = group2,
                        ses = "rb",
                        alternative = "two.sided",
                        null.value = 0,
                        R = 99)
#> Complete separation detected (all pairwise comparisons favor one group). A Haldane-type shrinkage correction was applied to enable confidence interval construction on the log-odds scale. Score-type CIs used for better boundary behavior.

# Example 3: Equivalence testing
result <- boot_ses_calc(x = group1, y = group2,
                        ses = "rb",
                        alternative = "equivalence",
                        null.value = c(-0.3, 0.3),
                        R = 99)
#> Complete separation detected (all pairwise comparisons favor one group). A Haldane-type shrinkage correction was applied to enable confidence interval construction on the log-odds scale. Score-type CIs used for better boundary behavior.

# Example 4: Paired samples
set.seed(42)
pre  <- c(4.5, 5.2, 3.8, 6.1, 4.9, 5.7, 3.6, 5.0, 4.3, 6.5)
post <- c(5.1, 4.9, 4.5, 5.8, 5.5, 5.2, 4.3, 5.4, 4.0, 6.2)
boot_ses_calc(x = pre, y = post,
              paired = TRUE,
              ses = "rb",
              alternative = "greater",
              R = 99)
#> 
#> 	Bootstrapped Paired Sample Rank-Biserial Correlation test
#> 
#> data:  pre and post
#> z-observed = -1.0385, p-value = 0.8687
#> alternative hypothesis: true Rank-Biserial Correlation is greater than 0
#> 95 percent confidence interval:
#>  -0.8099141  0.4894545
#> sample estimates:
#> Rank-Biserial Correlation 
#>                -0.4181818 
#> 

# Example 5: Using formula notation
data(mtcars)
result <- boot_ses_calc(formula = mpg ~ am,
                        data = mtcars,
                        ses = "cstat",
                        boot_ci = "perc",
                        R = 99)