
Parametric Bootstrap Test for Rank-Based Effect Sizes
boot_ses_test.Rd
Performs hypothesis testing for rank-based effect sizes using a parametric bootstrap. This function is designed primarily for equivalence testing (TOST) and minimal-effect testing with non-zero null hypotheses, where permutation-based approaches are not valid for rank-based effect sizes.
Usage
boot_ses_test(
  x,
  ...,
  paired = FALSE,
  ses = c("rb", "cstat", "odds", "logodds"),
  alpha = 0.05,
  mu = NULL,
  alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
  B = 2000L,
  keep_boot = TRUE
)

# Default S3 method
boot_ses_test(
  x,
  y = NULL,
  paired = FALSE,
  ses = c("rb", "cstat", "odds", "logodds"),
  alpha = 0.05,
  mu = NULL,
  alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
  B = 2000L,
  keep_boot = TRUE,
  ...
)
# S3 method for class 'formula'
boot_ses_test(formula, data, subset, na.action, ...)
Arguments
- x
numeric vector of data values (first group or pre-treatment).
- ...
further arguments to be passed to or from methods.
- paired
a logical indicating whether you want a paired test. Cannot be used with the formula method; use x and y vectors instead for paired tests.
- ses
a character string specifying the effect size measure:
- "rb": rank-biserial correlation (default)
- "cstat": concordance statistic (C-statistic/AUC)
- "odds": Wilcoxon-Mann-Whitney odds
- "logodds": Wilcoxon-Mann-Whitney log-odds
- alpha
significance level (default = 0.05).
- mu
the null hypothesis value(s) on the scale of the chosen effect size.
- For standard alternatives: a single value (default = 0 for rb/logodds, 0.5 for cstat, 1 for odds)
- For equivalence/minimal.effect: two values representing the lower and upper bounds, or a single value for symmetric bounds
- alternative
a character string specifying the alternative hypothesis:
- "two.sided": Test whether effect differs from mu
- "less": Test whether effect is less than mu
- "greater": Test whether effect is greater than mu
- "equivalence": TOST equivalence test (effect inside bounds)
- "minimal.effect": Minimal effect test (effect outside bounds)
- B
integer; the number of bootstrap replicates (default = 2000). Increase to 5000+ for publication-quality results.
- keep_boot
logical; if TRUE (default), return the bootstrap distributions in the output.
- y
numeric vector of data values (second group or post-treatment).
- formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.
- data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
- subset
an optional vector specifying a subset of observations to be used.
- na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
Value
A list with class "htest" containing:
- estimate
Observed effect size on the requested scale.
- p.value
Bootstrap p-value.
- alternative
Character string describing the alternative hypothesis.
- method
Description of the test performed.
- null.value
Null hypothesis value(s) on the requested scale.
- data.name
Character string giving the name(s) of the data.
- call
The matched call.
- model.param
The model parameter(s) used for null generation (for diagnostics). For two-sample: the Lehmann gamma parameter(s). For paired: the sign probability parameter(s).
- boot.dist
Bootstrap null distribution (if keep_boot = TRUE and a standard alternative).
- boot.dist.low
Bootstrap distribution under the lower bound (if keep_boot = TRUE and TOST).
- boot.dist.high
Bootstrap distribution under the upper bound (if keep_boot = TRUE and TOST).
Details
This function calculates p-values for rank-based effect sizes using a parametric bootstrap. It generates data under the null hypothesis and compares the observed effect size to the resulting null reference distribution.
Why Not Permutation?
Permutation tests are exact and assumption-free when testing the null
\(\rho = 0\). However, for non-zero nulls — as required by equivalence
(TOST) and minimal effect testing — the permutation distribution cannot be
shifted to the correct null by arithmetic operations. The rank-biserial
correlation is a nonlinear, bounded function of the data, so there is no
data transformation that shifts rb by a fixed \(\Delta\). Studentization
(as used in perm_t_test() and brunner_munzel()) cannot rescue this
because rb is not a studentized statistic.
This function uses parametric models to generate data under the non-zero null. The tradeoff is that validity now depends on how well the model approximates the true data-generating process.
Users who need TOST for means should use boot_t_TOST(), which handles
non-zero nulls correctly via studentization without any parametric
assumption.
Why No Confidence Intervals?
This function intentionally omits confidence intervals. The parametric
bootstrap here generates data under a specific null to produce a reference
distribution for p-value computation. This is fundamentally different from
the nonparametric bootstrap in boot_ses_calc(), which resamples from the
observed data to characterize the sampling distribution of the estimator.
Users who need confidence intervals should use boot_ses_calc() or the
asymptotic intervals from ses_calc().
Algorithm
Two-sample (independent): Uses the Lehmann alternative (proportional hazards) model. Data are pooled and treated as a common empirical CDF \(F\). Under the null \(P(X > Y) = \text{target}\), group Y is resampled from \(F\) and group X is resampled from a transformed CDF \(G(t) = 1 - (1 - F(t))^{1/\gamma}\) where \(\gamma = \text{target\_cstat} / (1 - \text{target\_cstat})\). This produces data with the correct population rank-biserial under the proportional hazards assumption.
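The two-sample null draw described above can be sketched as follows. This is an illustration of the Lehmann-alternative resampling scheme, not the package's internal code; the function name lehmann_null_draw is made up for this sketch.

```r
# Sketch: draw one bootstrap sample under the null P(X > Y) = target_cstat,
# using the Lehmann (proportional hazards) model on the pooled empirical CDF.
# Illustrative only; not the package's internal implementation.
lehmann_null_draw <- function(x, y, target_cstat) {
  pooled <- c(x, y)
  gamma <- target_cstat / (1 - target_cstat)  # Lehmann parameter
  # Group Y: plain resample from the pooled empirical CDF F
  y_star <- sample(pooled, length(y), replace = TRUE)
  # Group X: resample from G(t) = 1 - (1 - F(t))^(1/gamma) by inverse
  # transform: if U ~ Uniform(0, 1), then F^{-1}(1 - (1 - U)^gamma) has CDF G
  u <- runif(length(x))
  x_star <- quantile(pooled, probs = 1 - (1 - u)^gamma, type = 1, names = FALSE)
  list(x = x_star, y = y_star)
}
```

With target_cstat = 0.5 we get gamma = 1 and both groups are drawn from the same pooled CDF; larger targets stochastically shift the X draws upward, as the proportional hazards model requires.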
Paired samples: Uses a sign-randomization model. The absolute differences \(|d_i| = |x_i - y_i|\) are resampled with replacement, and signs are assigned independently with probability \(P(\text{sign} = +) = (1 + \Delta) / 2\). This produces a bootstrap distribution with the correct expected signed-rank rb under the assumption of rank-independent sign probabilities.
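The paired sign-randomization draw can be sketched similarly (again illustrative, with delta the target signed-rank rank-biserial; signrand_null_draw is a hypothetical name):

```r
# Sketch: one bootstrap draw of paired differences under the null that the
# signed-rank rank-biserial equals delta, via the sign-randomization model.
# Illustrative only; not the package's internal implementation.
signrand_null_draw <- function(x, y, delta) {
  d <- x - y
  abs_d <- sample(abs(d), length(d), replace = TRUE)  # resample magnitudes
  p_plus <- (1 + delta) / 2                           # P(sign = +)
  signs <- ifelse(runif(length(d)) < p_plus, 1, -1)   # rank-independent signs
  abs_d * signs
}
```

Because each sign has expectation 2 * p_plus - 1 = delta independently of the rank of |d_i|, the expected signed-rank rb of a draw equals delta under this model.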
For equivalence testing, two null distributions are generated (one per bound) and the TOST p-value is the maximum of the two one-sided p-values.
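The TOST combination rule can be sketched as below, under the assumption that boot_low and boot_high hold effect sizes simulated under the lower and upper bound respectively (tost_boot_p is a made-up name for this sketch):

```r
# Sketch: combine two one-sided bootstrap p-values into a TOST p-value.
# boot_low / boot_high: effect sizes simulated under the lower / upper bound.
# Illustrative only; not the package's internal implementation.
tost_boot_p <- function(est, boot_low, boot_high) {
  p_lower <- mean(boot_low >= est)   # H0: rb <= lower bound vs H1: rb > lower
  p_upper <- mean(boot_high <= est)  # H0: rb >= upper bound vs H1: rb < upper
  max(p_lower, p_upper)              # equivalence requires both rejections
}
```

Taking the maximum means equivalence is declared only when the estimate is simultaneously implausibly large under the lower-bound null and implausibly small under the upper-bound null.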
Warning
This function is experimental. Important caveats:
- Validity depends on parametric assumptions (Lehmann alternative for two-sample, sign-randomization for paired). Unlike permutation tests, this is not assumption-free.
- The procedure is most reliable for continuous data, moderate n (>= 20), and equivalence bounds not too close to +/-1.
- Results should be interpreted with caution and ideally cross-checked against asymptotic methods from ses_calc().
This function exists because no assumption-free alternative for non-zero null TOST is available for rank-based effect sizes. The choice is between this and no test at all, not between this and a better test.
Future Work
Confidence intervals: Not currently provided. Adding CIs would require test inversion across a grid of null values (Berger & Boos, 1994), which is computationally expensive. Use boot_ses_calc() or ses_calc() for interval estimation.
Rank-dependent sign model: The current paired bootstrap assigns signs independently of rank. A rank-weighted sign model could improve accuracy by allowing the sign probability to depend on the magnitude of the difference.
Copula-based generation: A normal copula model could provide an alternative data generation mechanism, especially for paired data where the joint distribution matters beyond marginals. This would require numerical calibration of the copula parameter to the target rb.
References
Berger, R.L. and Boos, D.D. (1994). P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association, 89, 1012-1016.
Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day.
See also
ses_calc() for asymptotic inference, boot_ses_calc() for
bootstrap confidence intervals, brunner_munzel() with
test_method = "perm" for robust TOST on the probability scale.
Other Robust tests:
boot_log_TOST(),
boot_t_TOST(),
boot_t_test(),
brunner_munzel(),
hodges_lehmann(),
log_TOST(),
perm_t_test(),
wilcox_TOST()
Examples
# \donttest{
# Example 1: Two-sided test
set.seed(42)
x <- rnorm(30, mean = 0)
y <- rnorm(30, mean = 0.5)
boot_ses_test(x = x, y = y, ses = "rb",
              mu = 0, alternative = "two.sided", B = 599)
#>
#> Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#>
#> data: x and y
#> p-value = 0.2905
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation
#> -0.1488889
#>
# Example 2: Equivalence test with rank-biserial
boot_ses_test(x = x, y = y, ses = "rb",
              mu = c(-0.4, 0.4), alternative = "equivalence", B = 599)
#>
#> Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#>
#> data: x and y
#> p-value = 0.04007
#> alternative hypothesis: equivalence
#> null values:
#> lower bound upper bound
#> -0.4 0.4
#> sample estimates:
#> Rank-Biserial Correlation
#> -0.1488889
#>
# Example 3: Paired samples
pre <- c(4.5, 5.2, 3.8, 6.1, 4.9, 5.7, 3.6, 5.0, 4.3, 6.5,
         4.1, 5.5, 3.9, 6.0, 4.7, 5.3, 3.7, 5.1, 4.4, 6.3)
post <- c(5.1, 4.9, 4.5, 5.8, 5.5, 5.2, 4.3, 5.4, 4.0, 6.2,
          4.8, 5.3, 4.2, 5.7, 5.1, 5.0, 4.1, 5.3, 4.2, 6.1)
boot_ses_test(x = pre, y = post, paired = TRUE,
              ses = "rb", mu = 0, alternative = "two.sided", B = 599)
#>
#> Parametric Bootstrap Paired Sample Rank-Biserial Correlation Test
#>
#> data: pre and post
#> p-value = 0.1636
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation
#> -0.3571429
#>
# Example 4: Using formula interface
data(mtcars)
boot_ses_test(formula = mpg ~ am, data = mtcars,
              ses = "rb", mu = 0,
              alternative = "two.sided", B = 599)
#>
#> Parametric Bootstrap Two Sample Rank-Biserial Correlation Test
#>
#> data: mpg by am
#> p-value = 0.005008
#> alternative hypothesis: true rank-biserial is not equal to 0
#> sample estimates:
#> Rank-Biserial Correlation
#> -0.659919
#>
# }