[Experimental]

Performs hypothesis testing for rank-based effect sizes using a parametric bootstrap. This function is designed primarily for equivalence testing (TOST) and minimal effect testing with non-zero null hypotheses, where permutation-based approaches are not valid for rank-based effect sizes.

Usage

boot_ses_test(
  x,
  ...,
  paired = FALSE,
  ses = c("rb", "cstat", "odds", "logodds"),
  alpha = 0.05,
  mu = NULL,
  alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
  B = 2000L,
  keep_boot = TRUE
)

# Default S3 method
boot_ses_test(
  x,
  y = NULL,
  paired = FALSE,
  ses = c("rb", "cstat", "odds", "logodds"),
  alpha = 0.05,
  mu = NULL,
  alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
  B = 2000L,
  keep_boot = TRUE,
  ...
)

# S3 method for class 'formula'
boot_ses_test(formula, data, subset, na.action, ...)

Arguments

x

numeric vector of data values (first group or pre-treatment).

...

further arguments to be passed to or from methods.

paired

a logical indicating whether you want a paired test. Cannot be used with the formula method; use x and y vectors instead for paired tests.

ses

a character string specifying the effect size measure:

  • "rb": rank-biserial correlation (default)

  • "cstat": concordance statistic (C-statistic/AUC)

  • "odds": Wilcoxon-Mann-Whitney odds

  • "logodds": Wilcoxon-Mann-Whitney log-odds

alpha

significance level (default = 0.05).

mu

the null hypothesis value(s) on the scale of the chosen effect size:

  • For standard alternatives: a single value (default = 0 for rb/logodds, 0.5 for cstat, 1 for odds)

  • For equivalence/minimal.effect: two values representing the lower and upper bounds, or a single value for symmetric bounds

alternative

a character string specifying the alternative hypothesis:

  • "two.sided": test whether the effect differs from mu

  • "less": test whether the effect is less than mu

  • "greater": test whether the effect is greater than mu

  • "equivalence": TOST equivalence test (effect inside the bounds)

  • "minimal.effect": minimal effect test (effect outside the bounds)

B

integer; the number of bootstrap replicates (default = 2000). Increase to 5000+ for publication-quality results.

keep_boot

logical; if TRUE (default), return the bootstrap distributions in the output.

y

numeric vector of data values (second group or post-treatment).

formula

a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables in formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Value

A list with class "htest" containing:

estimate

Observed effect size on the requested scale.

p.value

Bootstrap p-value.

alternative

Character string describing the alternative hypothesis.

method

Description of the test performed.

null.value

Null hypothesis value(s) on the requested scale.

data.name

Character string giving the name(s) of the data.

call

The matched call.

model.param

The model parameter(s) used for null generation (for diagnostics). For two-sample: the Lehmann gamma parameter(s). For paired: the sign probability parameter(s).

boot.dist

Bootstrap null distribution (if keep_boot = TRUE and standard alternative).

boot.dist.low

Bootstrap distribution under lower bound (if keep_boot = TRUE and TOST).

boot.dist.high

Bootstrap distribution under upper bound (if keep_boot = TRUE and TOST).

Details

This function calculates p-values for rank-based effect sizes using a parametric bootstrap. It generates data under the null hypothesis and compares the observed effect size to the resulting null reference distribution.

Why Not Permutation?

Permutation tests are exact and assumption-free when testing the null \(\rho = 0\). However, for non-zero nulls — as required by equivalence (TOST) and minimal effect testing — the permutation distribution cannot be shifted to the correct null by arithmetic operations. The rank-biserial correlation is a nonlinear, bounded function of the data, so there is no data transformation that shifts rb by a fixed \(\Delta\). Studentization (as used in perm_t_test() and brunner_munzel()) cannot rescue this because rb is not a studentized statistic.
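The nonlinearity can be seen directly. The sketch below uses a minimal no-ties rank-biserial helper (written here for illustration, not an exported function): shifting one group by equal increments does not change rb by equal amounts, because rb is bounded and saturates near 1.

```r
# Minimal no-ties rank-biserial helper for demonstration only.
rb <- function(x, y) 2 * mean(outer(x, y, ">")) - 1

set.seed(1)
x <- rnorm(30)
y <- rnorm(30)

# Equal location shifts of x produce unequal changes in rb,
# so no fixed arithmetic shift of the data moves rb by a fixed Delta.
sapply(c(0, 1, 2, 4), function(shift) rb(x + shift, y))
```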

This function uses parametric models to generate data under the non-zero null. The tradeoff is that validity now depends on how well the model approximates the true data-generating process.

Users who need TOST for means should use boot_t_TOST(), which handles non-zero nulls correctly via studentization without any parametric assumption.

Why No Confidence Intervals?

This function intentionally omits confidence intervals. The parametric bootstrap here generates data under a specific null to produce a reference distribution for p-value computation. This is fundamentally different from the nonparametric bootstrap in boot_ses_calc(), which resamples from the observed data to characterize the sampling distribution of the estimator. Users who need confidence intervals should use boot_ses_calc() or the asymptotic intervals from ses_calc().

Algorithm

Two-sample (independent): Uses the Lehmann alternative (proportional hazards) model. Data are pooled and treated as a common empirical CDF \(F\). Under the null \(P(X > Y) = \text{target}\), group Y is resampled from \(F\) and group X is resampled from a transformed CDF \(G(t) = 1 - (1 - F(t))^{1/\gamma}\) where \(\gamma = \text{target\_cstat} / (1 - \text{target\_cstat})\). This produces data with the correct population rank-biserial under the proportional hazards assumption.
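A minimal sketch of a single two-sample bootstrap draw under this model, using stand-in data and a hypothetical target concordance (the package's internal implementation may differ in detail):

```r
# One parametric-bootstrap draw under the Lehmann null (sketch).
set.seed(1)
pooled <- rnorm(40)                  # stand-in for the pooled sample
target_cstat <- 0.6                  # hypothetical null value of P(X > Y)
gamma <- target_cstat / (1 - target_cstat)

n_x <- 20; n_y <- 20
y_star <- sample(pooled, n_y, replace = TRUE)  # Y* ~ empirical F

# X* ~ G with G(t) = 1 - (1 - F(t))^(1/gamma), by inverse sampling:
# if U ~ Unif(0, 1), then F^{-1}(1 - (1 - U)^gamma) has CDF G.
u <- runif(n_x)
x_star <- quantile(pooled, probs = 1 - (1 - u)^gamma,
                   type = 1, names = FALSE)

# Under the model, P(X* > Y*) should be near target_cstat on average.
mean(outer(x_star, y_star, ">"))
```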

Paired samples: Uses a sign-randomization model. The absolute differences \(|d_i| = |x_i - y_i|\) are resampled with replacement, and signs are assigned independently with probability \(P(\text{sign} = +) = (1 + \Delta) / 2\). This produces a bootstrap distribution with the correct expected signed-rank rb under the assumption of rank-independent sign probabilities.
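A sketch of one paired bootstrap draw under the sign-randomization model, with stand-in differences and a hypothetical null rb value:

```r
# One paired-sample bootstrap draw under the sign model (sketch).
set.seed(1)
d <- rnorm(25, mean = 0.2)           # stand-in observed differences x - y
delta <- 0.3                         # hypothetical null rank-biserial value
p_pos <- (1 + delta) / 2             # P(sign = +) implied by delta

abs_star <- sample(abs(d), length(d), replace = TRUE)  # resample magnitudes
signs <- ifelse(runif(length(d)) < p_pos, 1, -1)       # independent signs
d_star <- signs * abs_star

# Matched-pairs rank-biserial of the draw: (T+ - T-) / (T+ + T-)
r <- rank(abs(d_star))
rb_star <- (sum(r[d_star > 0]) - sum(r[d_star < 0])) / sum(r)
```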

For equivalence testing, two null distributions are generated (one per bound) and the TOST p-value is the maximum of the two one-sided p-values.
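Schematically, the combination looks as follows, where `null_low` and `null_high` stand in for the two bootstrap null distributions (normal draws are used here purely as placeholders):

```r
# TOST p-value from two one-sided bootstrap p-values (sketch with
# stand-in normal draws in place of real bootstrap distributions).
set.seed(1)
obs <- 0.15                            # hypothetical observed effect size
null_low  <- rnorm(2000, -0.4, 0.15)   # draws under the lower bound
null_high <- rnorm(2000,  0.4, 0.15)   # draws under the upper bound

p_low  <- mean(null_low  >= obs)  # H0: effect <= lower bound
p_high <- mean(null_high <= obs)  # H0: effect >= upper bound
p_tost <- max(p_low, p_high)      # equivalence needs both one-sided rejections
```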

Warning

This function is experimental. Important caveats:

  • Validity depends on parametric assumptions (Lehmann alternative for two-sample, sign-randomization for paired). Unlike permutation tests, this is not assumption-free.

  • The procedure is most reliable for continuous data, moderate n (>= 20), and equivalence bounds not too close to +/-1.

  • Results should be interpreted with caution and ideally cross-checked against asymptotic methods from ses_calc().

  • This function exists because no assumption-free alternative for non-zero null TOST is available for rank-based effect sizes. The choice is between this and no test at all, not between this and a better test.

Future Work

  • Confidence intervals: Not currently provided. Adding CIs would require test inversion across a grid of null values (Berger & Boos, 1994), which is computationally expensive. Use boot_ses_calc() or ses_calc() for interval estimation.

  • Rank-dependent sign model: The current paired bootstrap assigns signs independently of rank. A rank-weighted sign model could improve accuracy by allowing the sign probability to depend on the magnitude of the difference.

  • Copula-based generation: A normal copula model could provide an alternative data generation mechanism, especially for paired data where the joint distribution matters beyond marginals. This would require numerical calibration of the copula parameter to the target rb.

References

Berger, R.L. and Boos, D.D. (1994). P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association, 89, 1012-1016.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.

See also

ses_calc() for asymptotic inference, boot_ses_calc() for bootstrap confidence intervals, brunner_munzel() with test_method = "perm" for robust TOST on the probability scale.

Other Robust tests: boot_log_TOST(), boot_t_TOST(), boot_t_test(), brunner_munzel(), hodges_lehmann(), log_TOST(), perm_t_test(), wilcox_TOST()

Examples

# \donttest{
# Example 1: Two-sided test
set.seed(42)
x <- rnorm(30, mean = 0)
y <- rnorm(30, mean = 0.5)
boot_ses_test(x = x, y = y, ses = "rb",
              mu = 0, alternative = "two.sided", B = 599)
#> 
#> 	Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#> 
#> data:  x and y
#> p-value = 0.2905
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation 
#>                -0.1488889 
#> 

# Example 2: Equivalence test with rank-biserial
boot_ses_test(x = x, y = y, ses = "rb",
              mu = c(-0.4, 0.4), alternative = "equivalence", B = 599)
#> 
#> 	Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#> 
#> data:  x and y
#> p-value = 0.04007
#> alternative hypothesis: equivalence
#> null values:
#> lower bound upper bound 
#>        -0.4         0.4 
#> sample estimates:
#> Rank-Biserial Correlation 
#>                -0.1488889 
#> 

# Example 3: Paired samples
pre  <- c(4.5, 5.2, 3.8, 6.1, 4.9, 5.7, 3.6, 5.0, 4.3, 6.5,
          4.1, 5.5, 3.9, 6.0, 4.7, 5.3, 3.7, 5.1, 4.4, 6.3)
post <- c(5.1, 4.9, 4.5, 5.8, 5.5, 5.2, 4.3, 5.4, 4.0, 6.2,
          4.8, 5.3, 4.2, 5.7, 5.1, 5.0, 4.1, 5.3, 4.2, 6.1)
boot_ses_test(x = pre, y = post, paired = TRUE,
              ses = "rb", mu = 0, alternative = "two.sided", B = 599)
#> 
#> 	Parametric Bootstrap Paired Sample Rank-Biserial Correlation Test
#> 
#> data:  pre and post
#> p-value = 0.1636
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation 
#>                -0.3571429 
#> 

# Example 4: Using formula interface
data(mtcars)
boot_ses_test(formula = mpg ~ am, data = mtcars,
              ses = "rb", mu = 0,
              alternative = "two.sided", B = 599)
#> 
#> 	Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#> 
#> data:  mpg by am
#> p-value = 0.005008
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation 
#>                 -0.659919 
#> 
# }