
Hodges-Lehmann Test
Performs location tests based on the Hodges-Lehmann estimator with permutation-based or asymptotic inference. This function supports one-sample, two-sample (independent), and paired designs, as well as equivalence and minimal effect testing.
Usage
hodges_lehmann(x, ...)
# Default S3 method
hodges_lehmann(
x,
y = NULL,
alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
mu = 0,
paired = FALSE,
alpha = 0.05,
R = NULL,
scale = c("S2", "S1"),
p_method = NULL,
keep_perm = TRUE,
...
)
# S3 method for class 'formula'
hodges_lehmann(formula, data, subset, na.action, ...)
Arguments
- x
a (non-empty) numeric vector of data values.
- ...
further arguments (currently ignored).
- y
an optional numeric vector of data values.
- alternative
the alternative hypothesis:
"two.sided": different from mu (default)
"less": less than mu
"greater": greater than mu
"equivalence": between specified bounds
"minimal.effect": outside specified bounds
- mu
a number or vector specifying the null hypothesis value(s):
For standard alternatives (two.sided, less, greater): a single value (default: 0)
For equivalence/minimal.effect: either a single value (symmetric bounds will be created) or a vector of two values representing the lower and upper bounds
- paired
a logical indicating whether this is a paired test. If TRUE, x and y must have the same length, and differences (x - y) are analyzed.
- alpha
significance level (default = 0.05). For standard alternatives, the confidence interval has level 1-alpha. For equivalence/minimal.effect, the confidence interval has level 1-2*alpha (90% when alpha = 0.05).
- R
the number of permutations. Default is NULL, which uses the asymptotic test. If R >= max_perms (the maximum number of possible permutations), an exact permutation test is computed. Otherwise, Monte Carlo sampling (permutations drawn with replacement; i.e., randomization testing) is used. Note: permutation tests are not supported for alternative = "equivalence" or alternative = "minimal.effect" (see Details).
- scale
the scale estimator for standardizing the test statistic in permutation tests. Options are:
"S2" (default): Median of absolute pairwise differences from the median-corrected combined sample. Recommended when samples come from the same distribution under H0.
"S1": Pooled within-sample absolute differences. Preferred when sample sizes are small or unequal.
- p_method
the method for computing permutation p-values. Options are:
NULL (default): Automatically selects "exact" for exact permutation tests and "plusone" for randomization tests.
"exact": Uses b/R where b is the count of permutation statistics at least as extreme as observed. Appropriate when all permutations are enumerated.
"plusone": Uses (b+1)/(R+1), which guarantees p > 0 and provides exact Type I error control for randomization tests where permutations are sampled with replacement (Phipson & Smyth, 2010).
- keep_perm
logical. If TRUE (default), the permutation distributions of the test statistic and effects are stored in the output. Set to FALSE for large datasets to save memory.
- formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.
- data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
- subset
an optional vector specifying a subset of observations to be used.
- na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
Value
A list with class "htest" containing the following components:
statistic: the value of the test statistic.
p.value: the p-value for the test.
conf.int: a confidence interval for the location parameter.
estimate: the Hodges-Lehmann estimate(s).
null.value: the specified hypothesized value(s).
alternative: a character string describing the alternative hypothesis.
method: a character string indicating the test type.
data.name: a character string giving the name(s) of the data.
call: the matched call.
R: the requested number of permutations (NULL for asymptotic).
R.used: the actual number of permutations used.
perm.stat: (if keep_perm = TRUE) the permutation distribution of test statistics.
perm.eff: (if keep_perm = TRUE) the permutation distribution of effects.
Details
Hodges-Lehmann Estimators
One-sample/paired (HL1): The median of all pairwise averages (Walsh averages): $$\hat{\theta}_{HL1} = \text{med}\left\{\frac{X_i + X_j}{2} : 1 \leq i \leq j \leq n\right\}$$
This estimator is consistent with the pseudomedian returned by stats::wilcox.test()
for one-sample and paired tests.
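The HL1 definition above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper name hl1_estimate is hypothetical, and the comparison with stats::wilcox.test() assumes a small sample without ties or zeros, so that wilcox.test() takes its exact branch.

```r
# Hypothetical helper (not part of the package): median of all Walsh
# averages (X_i + X_j) / 2 with i <= j.
hl1_estimate <- function(x) {
  sums <- outer(x, x, "+")                          # all pairwise sums
  walsh <- sums[upper.tri(sums, diag = TRUE)] / 2   # keep i <= j only
  median(walsh)
}

x <- c(1.1, 2.3, 0.7, 3.4, 1.9, 2.8)
hl1 <- hl1_estimate(x)

# Reference value: the (pseudo)median reported by wilcox.test()
ref <- unname(wilcox.test(x, conf.int = TRUE)$estimate)
all.equal(hl1, ref)
```

For a paired design, the same helper would be applied to the differences x - y.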
Two-sample (HL2): The median of all pairwise differences between samples: $$\hat{\Delta}_{HL2} = \text{med}\{Y_j - X_i : i = 1, \ldots, m; j = 1, \ldots, n\}$$
This estimator is consistent with the location shift estimate returned by
stats::wilcox.test() for two-sample tests.
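A minimal base R sketch of the two-sample estimator follows. Note the sign convention: the formula above is written as Y_j - X_i, whereas the example output on this page is labelled (x - y). The sketch assumes the x - y direction, which is what stats::wilcox.test() reports; negating the result gives the other convention. The helper name hl2_estimate is hypothetical.

```r
# Hypothetical helper (not part of the package): median of all m * n
# pairwise differences, taken here as x_i - y_j (assumed direction).
hl2_estimate <- function(x, y) {
  median(outer(x, y, "-"))
}

x <- c(1.2, 3.4, 2.2, 5.1, 2.9)
y <- c(4.8, 6.1, 3.9, 7.3)
hl2 <- hl2_estimate(x, y)

# Reference value: the location shift reported by wilcox.test()
# (small sample, no ties, so the exact branch is used)
ref <- unname(wilcox.test(x, y, conf.int = TRUE)$estimate)
all.equal(hl2, ref)
```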
Test Methods
Asymptotic test (R = NULL): Uses kernel density estimation (Fried & Dehling, 2011) to estimate the
variance of the Hodges-Lehmann estimator (note: this generates confidence intervals that will differ from stats::wilcox.test()). The test statistic follows an approximate
normal distribution. This method may have issues with very heavy-tailed distributions,
very skewed distributions, or small sample sizes (n < 30 per group). In these cases,
consider using the permutation test instead.
Exact permutation test (R >= max_perms): Enumerates all possible permutations and provides exact p-values. For one-sample/paired tests, there are 2^n possible sign-flipping permutations. For two-sample tests, there are choose(m+n, m) possible group reassignments.
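To see which regime a given R falls into, the two permutation counts can be computed directly. The variable names below are illustrative only; max_perms is used on this page as a description rather than as a documented argument.

```r
# One-sample / paired: 2^n sign-flipping permutations
n_paired <- 8
max_perms_one <- 2^n_paired           # 256

# Two-sample: choose(m + n, m) group reassignments
m <- 30
n <- 30
max_perms_two <- choose(m + n, m)     # roughly 1.18e17

R <- 1999
R >= max_perms_one   # TRUE: exact enumeration is feasible
R >= max_perms_two   # FALSE: a randomization test would be used
```

With eight pairs, R = 1999 already exceeds the 256 possible sign flips, while for two samples of 30 any practical R falls far short of full enumeration.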
Randomization test (R < max_perms): Samples R permutations randomly (with replacement) and computes approximate p-values. The (b+1)/(R+1) formula guarantees exact Type I error control (Phipson & Smyth, 2010).
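The two p-value formulas can be illustrated with a bare-bones randomization test. This sketch makes a simplifying assumption: it uses the raw two-sample Hodges-Lehmann estimate as the statistic, whereas the function standardizes by S1 or S2. All names are illustrative.

```r
set.seed(1)
x <- rnorm(12)
y <- rnorm(12, mean = 1)

# Unstandardized statistic (an assumption of this sketch)
hl_stat <- function(a, b) median(outer(a, b, "-"))
obs <- hl_stat(x, y)

pooled <- c(x, y)
m <- length(x)
R <- 999

# Each draw reshuffles the group labels independently, so the same
# relabelling may occur more than once (sampling with replacement).
perm <- replicate(R, {
  idx <- sample(length(pooled), m)
  hl_stat(pooled[idx], pooled[-idx])
})

# b = number of permutation statistics at least as extreme as observed
b <- sum(abs(perm) >= abs(obs))

p_exact_formula <- b / R          # appropriate only under full enumeration
p_plusone <- (b + 1) / (R + 1)    # never zero (Phipson & Smyth, 2010)
```

The smallest value p_plusone can take is 1 / (R + 1), which is why R bounds the resolution of a randomization p-value.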
Scale Estimators
For permutation tests, the test statistic is standardized using a robust scale estimator (Fried & Dehling, 2011):
S1 (pooled within-sample): $$S^{(1)}_{m,n} = \text{med}\{|X_i - X_j| : 1 \leq i < j \leq m, |Y_i - Y_j| : 1 \leq i < j \leq n\}$$
S2 (median-corrected joint): $$S^{(2)}_{m,n} = \text{med}\{|Z_i - Z_j| : 1 \leq i < j \leq m+n\}$$
where Z is the median-corrected combined sample.
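Both scale estimators can be sketched with stats::dist(), which returns the absolute pairwise differences for i < j when given a numeric vector. The helper names are hypothetical, and the sketch assumes that "median-corrected" means each sample is centred at its own median before pooling, as in Fried & Dehling (2011).

```r
# S1: median of the pooled within-sample absolute pairwise differences
scale_s1 <- function(x, y) {
  median(c(as.numeric(dist(x)), as.numeric(dist(y))))
}

# S2: median of absolute pairwise differences in the joint sample after
# centring each sample at its own median (assumed definition of Z)
scale_s2 <- function(x, y) {
  z <- c(x - median(x), y - median(y))
  median(as.numeric(dist(z)))
}

x <- c(1, 2, 4)
y <- c(10, 13)
scale_s1(x, y)   # within-sample differences 1, 3, 2 and 3: median 2.5
scale_s2(x, y)   # ten joint differences after centring: median 1.75
```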
Permutation Tests with Non-Zero Null Values
For standard alternatives (two.sided, less, greater) with mu != 0,
the permutation test uses an approximate approach: the observed test statistic
is centered at mu, but the permutation distribution is generated under
exchangeability (effectively mu = 0). Because the Hodges-Lehmann statistic
divided by the S1/S2 scale estimator is not a true pivot, this comparison is
approximate rather than exact. The approximation is generally adequate when
mu is moderate relative to the scale of the data, but may lose accuracy for
extreme null values. For the two-sample case, the scale estimator is
recomputed for each permutation, which partially mitigates this issue.
Equivalence and Minimal Effect Testing
Equivalence and minimal effect tests are only available with the asymptotic
method (R = NULL). Permutation tests are not supported for these
alternatives because the scale estimators (S1, S2) from Fried & Dehling
(2011) do not produce a pivotal test statistic for the Hodges-Lehmann
estimator. Without pivotality, the permutation distribution generated under
the exchangeability null is not a valid reference distribution for testing
at the equivalence bounds, and the resulting p-values can be unreliable.
This limitation compounds with the inherent conservatism of the naive
intersection-union procedure, potentially yielding substantial power loss.
The asymptotic method uses kernel density estimation to approximate the standard error, which provides a proper pivot and valid boundary-null inference.
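As a back-of-the-envelope check of the TOST logic, the equivalence example in the Examples section can be reproduced from its printed output. This sketch assumes the reported 90% interval is a symmetric normal-theory interval (estimate plus or minus a normal quantile times the standard error), which lets the standard error be backed out from the interval; it does not call the package or reproduce its kernel density step.

```r
# Values copied from the printed equivalence example on this page
est <- -0.4870435
ci90 <- c(-0.9048422, -0.0692449)
bounds <- c(-1, 1)
alpha <- 0.05

# Back out the standard error from the 1 - 2 * alpha interval (assumption)
se <- (ci90[2] - ci90[1]) / (2 * qnorm(1 - alpha))

# Two one-sided tests against the lower and upper equivalence bounds
z_low <- (est - bounds[1]) / se            # H0: location <= lower bound
z_high <- (est - bounds[2]) / se           # H0: location >= upper bound
p_low <- pnorm(z_low, lower.tail = FALSE)
p_high <- pnorm(z_high)

# Equivalence is declared only if both one-sided tests reject,
# so the overall p-value is the larger of the two.
p_tost <- max(p_low, p_high)
c(z = z_low, p = p_tost)
```

The resulting z and p should agree closely with the Z = 2.0195 and p-value = 0.02172 printed for that example, with the lower-bound test being the binding one.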
Alternatives
The function supports five alternative hypotheses:
"two.sided": Tests whether the location parameter differs from mu
"less": Tests whether the location parameter is less than mu
"greater": Tests whether the location parameter is greater than mu
"equivalence": Tests whether the location parameter lies within the bounds specified by mu (TOST procedure)
"minimal.effect": Tests whether the location parameter lies outside the bounds specified by mu
Purpose
The Hodges-Lehmann estimator provides a robust alternative to the mean for testing location differences. Because it is a median of pairwise averages or differences, it is less sensitive to outliers than the mean. This function offers:
Exact permutation tests for small samples
Randomization tests (permutation with replacement) for larger samples
Asymptotic tests using kernel density estimation of the scale parameter
Support for equivalence and minimal effect testing
An interface that mirrors wilcox.test and perm_t_test
References
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. Annals of Mathematical Statistics, 34, 598-611.
Fried, R., & Dehling, H. (2011). Robust nonparametric tests for the two-sample location problem. Statistical Methods & Applications, 20, 409-422.
Lehmann, E. L. (1963). Nonparametric confidence intervals for a shift parameter. Annals of Mathematical Statistics, 34, 1507-1512.
Phipson, B., & Smyth, G. K. (2010). Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39.
See also
Other Robust tests:
boot_log_TOST(),
boot_ses_test(),
boot_t_TOST(),
boot_t_test(),
brunner_munzel(),
log_TOST(),
perm_t_test(),
wilcox_TOST()
Examples
# Two-sample test (asymptotic)
set.seed(123)
x <- rnorm(30, mean = 0)
y <- rnorm(30, mean = 0.5)
hodges_lehmann(x, y)
#>
#> Asymptotic Hodges-Lehmann Two Sample Test
#>
#> data: x and y
#> Z = -2.9132, p-value = 0.003578
#> alternative hypothesis: true location is not equal to 0
#> 95 percent confidence interval:
#> -1.2738382 -0.2491673
#> sample estimates:
#> Hodges-Lehmann estimate (x - y)
#> -0.7615028
#>
# Two-sample test with permutation
hodges_lehmann(x, y, R = 1999)
#>
#> Randomization Hodges-Lehmann Two Sample Test
#>
#> data: x and y
#> Z = -0.85706, p-value = 0.0035
#> alternative hypothesis: true location is not equal to 0
#> 95 percent confidence interval:
#> -1.3053775 -0.2508359
#> sample estimates:
#> Hodges-Lehmann estimate (x - y)
#> -0.7615028
#>
# One-sample test
x <- rnorm(20, mean = 0.5)
hodges_lehmann(x, mu = 0)
#>
#> Asymptotic Hodges-Lehmann One Sample Test
#>
#> data: x
#> Z = 1.103, p-value = 0.27
#> alternative hypothesis: true location is not equal to 0
#> 95 percent confidence interval:
#> -0.1610828 0.5757722
#> sample estimates:
#> (pseudo)median of x
#> 0.2073447
#>
# Paired test
before <- c(5.1, 4.8, 6.2, 5.7, 6.0, 5.5, 4.9, 5.8)
after <- c(5.6, 5.2, 6.7, 6.1, 6.5, 5.8, 5.3, 6.2)
hodges_lehmann(before, after, paired = TRUE)
#>
#> Asymptotic Hodges-Lehmann Paired Test
#>
#> data: before and after
#> Z = -15.888, p-value < 2.2e-16
#> alternative hypothesis: true location is not equal to 0
#> 95 percent confidence interval:
#> -0.4774274 -0.3725726
#> sample estimates:
#> Hodges-Lehmann estimate (z = x - y)
#> -0.425
#>
# Equivalence test (asymptotic only); x is now the n = 20 sample from the one-sample example
hodges_lehmann(x, y, alternative = "equivalence", mu = c(-1, 1))
#>
#> Asymptotic Hodges-Lehmann Two Sample Test
#>
#> data: x and y
#> Z = 2.0195, p-value = 0.02172
#> alternative hypothesis: equivalence
#> null values:
#> location location
#> -1 1
#> 90 percent confidence interval:
#> -0.9048422 -0.0692449
#> sample estimates:
#> Hodges-Lehmann estimate (x - y)
#> -0.4870435
#>
# Formula interface
hodges_lehmann(extra ~ group, data = sleep)
#>
#> Asymptotic Hodges-Lehmann Two Sample Test
#>
#> data: extra by group
#> Z = -1.3179, p-value = 0.1875
#> alternative hypothesis: true location is not equal to 0
#> 95 percent confidence interval:
#> -3.3576425 0.6576425
#> sample estimates:
#> Hodges-Lehmann estimate ('1' - '2')
#> -1.35
#>