[Stable]

Performs t-tests with bootstrapped p-values and confidence intervals, with optional trimmed means (Yuen's approach) for robust inference. This function supports standard hypothesis testing alternatives as well as equivalence and minimal effect testing, all with the familiar htest output structure.

Usage

boot_t_test(x, ...)

# Default S3 method
boot_t_test(
  x,
  y = NULL,
  var.equal = FALSE,
  paired = FALSE,
  alternative = c("two.sided", "less", "greater", "equivalence", "minimal.effect"),
  mu = 0,
  alpha = 0.05,
  tr = 0,
  boot_ci = c("stud", "basic", "perc", "bca"),
  R = 1999,
  ...
)

# S3 method for class 'formula'
boot_t_test(formula, data, subset, na.action, ...)

Arguments

x

a (non-empty) numeric vector of data values.

...

further arguments to be passed to or from the underlying test functions.

y

an optional (non-empty) numeric vector of data values.

var.equal

a logical variable indicating whether to treat the two variances as being equal. If TRUE, the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.

paired

a logical indicating whether you want a paired t-test. Cannot be used with the formula method; use x and y vectors instead for paired tests.

alternative

the alternative hypothesis:

  • "two.sided": different from mu (default)

  • "less": less than mu

  • "greater": greater than mu

  • "equivalence": between specified bounds

  • "minimal.effect": outside specified bounds

mu

a number or vector specifying the null hypothesis value(s):

  • For standard alternatives: a single value (default = 0)

  • For equivalence/minimal.effect: two values representing the lower and upper bounds

alpha

alpha level (default = 0.05).

tr

the fraction (0 to 0.5) of observations to be trimmed from each end before computing the mean and winsorized variance. Default is 0 (no trimming). When tr > 0, the function performs a bootstrapped Yuen's trimmed t-test.

boot_ci

method for bootstrap confidence interval calculation: "stud" (studentized, default), "basic" (basic bootstrap), "bca" (bias-corrected and accelerated), or "perc" (percentile bootstrap).

R

number of bootstrap replications (default = 1999).

formula

a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs either 1 for a one-sample test or a factor with two levels giving the corresponding groups. For paired tests, use the default method with x and y vectors instead of the formula method.

data

an optional matrix or data frame (or similar: see model.frame) containing the variables named in formula. By default the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

Value

A list with class "htest" containing the following components:

  • "statistic": the observed t-statistic (note: the p-value is derived from the bootstrap distribution, not from this statistic and the degrees of freedom).

  • "parameter": the degrees of freedom for the t-statistic.

  • "p.value": the bootstrapped p-value for the test.

  • "stderr": the bootstrapped standard error.

  • "conf.int": a bootstrapped confidence interval for the mean appropriate to the specified alternative hypothesis.

  • "estimate": the estimated mean or difference in means.

  • "null.value": the specified hypothesized value(s) of the mean or mean difference.

  • "alternative": a character string describing the alternative hypothesis.

  • "method": a character string indicating what type of bootstrapped t-test was performed.

  • "boot": the bootstrap samples of the mean or mean difference.

  • "data.name": a character string giving the name(s) of the data.

  • "call": the matched call.

Details

This function performs bootstrapped t-tests, providing more robust inference than standard parametric t-tests. It supports one-sample, two-sample (independent), and paired designs, as well as five different alternative hypotheses.

The bootstrap procedure follows these steps:

  • Calculate the test statistic from the original data

  • Generate R bootstrap samples by resampling with replacement

  • Calculate the test statistic for each bootstrap sample

  • Compute the p-value by comparing the original test statistic to the bootstrap distribution

  • Calculate confidence intervals using the specified bootstrap method
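The steps above can be sketched in base R for a one-sample mean. This is an illustrative sketch, not the package's internal code; the variable names (t_obs, t_boot, p_boot) are our own. Note that each bootstrap t-statistic is centered on the observed mean, so the resampled statistics mimic the null distribution of the pivot:

```r
set.seed(42)
x <- rnorm(30, mean = 0.4)
mu <- 0
R <- 1999

# Step 1: test statistic from the original data
t_obs <- (mean(x) - mu) / (sd(x) / sqrt(length(x)))

# Steps 2-3: resample with replacement, recompute the statistic each time
t_boot <- replicate(R, {
  xb <- sample(x, replace = TRUE)
  # center on the observed mean so the resampled pivot reflects the null
  (mean(xb) - mean(x)) / (sd(xb) / sqrt(length(xb)))
})

# Step 4: two-sided p-value -- how often the resampled |t| exceeds |t_obs|
p_boot <- mean(abs(t_boot) >= abs(t_obs))
```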

Bootstrap Confidence Interval Methods

Four bootstrap confidence interval methods are available via the boot_ci argument:

  • Studentized bootstrap ("stud"): Uses the bootstrap distribution of pivotal t-statistics to account for variability in standard error estimates. This is the default and usually provides the most accurate coverage.

  • Basic bootstrap ("basic"): Reflects the bootstrap distribution of estimates around the observed value.

  • Percentile bootstrap ("perc"): Uses percentiles of the bootstrap distribution directly.

  • Bias-corrected and accelerated ("bca"): Corrects for both bias and skewness in the bootstrap distribution using jackknife-based acceleration. Most accurate when the bootstrap distribution is skewed, but computationally more expensive.
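To see how two of these intervals differ, the "perc" and "basic" limits can both be built from the same bootstrap draws of the mean. This is a hedged sketch of the textbook formulas, not the package implementation; skewed data are used so the two intervals visibly disagree:

```r
set.seed(1)
x <- rexp(40)                    # skewed data: percentile and basic CIs differ
theta_hat  <- mean(x)
theta_boot <- replicate(1999, mean(sample(x, replace = TRUE)))

q <- unname(quantile(theta_boot, c(0.025, 0.975)))

ci_perc  <- q                       # percentile: bootstrap quantiles used directly
ci_basic <- 2 * theta_hat - rev(q)  # basic: bootstrap quantiles reflected
                                    # around the observed estimate
```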

Bootstrap P-values

The p-value is computed using the method that matches the selected boot_ci, ensuring that p < alpha if and only if the corresponding confidence interval excludes the null value (CI inversion principle). Previously, all bootstrap CI methods used the studentized (pivot) p-value, which could produce p-values inconsistent with non-studentized CIs.

For different alternatives, the p-values are calculated as follows:

  • "two.sided": Two-tailed p-value from the bootstrap distribution

  • "less": One-sided p-value for the hypothesis that the true value is less than the null

  • "greater": One-sided p-value for the hypothesis that the true value is greater than the null

  • "equivalence": Maximum of two one-sided p-values (for lower and upper bounds)

  • "minimal.effect": Minimum of two one-sided p-values (for lower and upper bounds)
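The equivalence and minimal-effect combinations can be illustrated with ordinary one-sided t.test() calls standing in for the bootstrap p-values (the data and the ±0.5 bounds here are made up for illustration; boot_t_test applies the same max/min logic to its bootstrap p-values):

```r
x <- c(0.1, -0.2, 0.3, 0.05, -0.1, 0.15, 0.0, 0.2)
bounds <- c(-0.5, 0.5)

# equivalence: both one-sided tests must reject, so take the maximum
p_equiv <- max(t.test(x, mu = bounds[1], alternative = "greater")$p.value,
               t.test(x, mu = bounds[2], alternative = "less")$p.value)

# minimal effect: either one-sided test may reject, so take the minimum
p_mineff <- min(t.test(x, mu = bounds[1], alternative = "less")$p.value,
                t.test(x, mu = bounds[2], alternative = "greater")$p.value)
```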

For two-sample tests, the test is of \(\bar x - \bar y\) (mean of x minus mean of y). For paired samples, the test is of the difference scores (z), wherein \(z = x - y\), and the test is of \(\bar z\) (mean of the difference scores). For one-sample tests, the test is of \(\bar x\) (mean of x).

When tr > 0, the function uses Yuen's trimmed t-test approach: trimmed means are computed by removing the fraction tr of observations from each tail, and winsorized variances are used in place of standard variances. This provides robustness against outliers and heavy-tailed distributions. The bootstrap procedure recomputes trimmed means and winsorized standard errors for each bootstrap replicate.
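A minimal sketch of the trimmed mean and winsorized variance that underlie Yuen's approach (win_var here is a hypothetical helper written for illustration, not an exported function of the package):

```r
# winsorized variance: pull the g smallest/largest values in to the
# adjacent order statistics before computing the variance
win_var <- function(x, tr) {
  xs <- sort(x)
  n  <- length(xs)
  g  <- floor(tr * n)              # observations winsorized from each tail
  if (g > 0) {
    xs[seq_len(g)]    <- xs[g + 1]       # low tail pulled up
    xs[(n - g + 1):n] <- xs[n - g]       # high tail pulled down
  }
  var(xs)
}

x  <- c(1.2, 0.8, 1.5, 0.9, 1.1, 7.0)  # one clear outlier
tm <- mean(x, trim = 0.2)              # trimmed mean, robust to the outlier
wv <- win_var(x, tr = 0.2)             # much smaller than the raw variance
```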

Unlike the t_TOST function, this function returns a standard htest object for compatibility with other R functions, while still providing the benefits of bootstrapping.

For detailed information on calculation methods, see vignette("robustTOST").

Purpose

Use this function when:

  • You need more robust inference than provided by standard t-tests

  • Your data don't meet the assumptions of normality or homogeneity

  • You want to perform equivalence or minimal effect testing with bootstrap methods

  • Sample sizes are small or standard parametric approaches may be unreliable

  • You prefer the standard htest output format for compatibility with other R functions

References

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC press.

Yuen, K. K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61(1), 165-170.

Examples


# Example 1: Basic two-sample test with formula notation
data(sleep)
result <- boot_t_test(extra ~ group, data = sleep)
result  # Standard htest output format
#> 
#> 	Bootstrapped Welch Two Sample t-test (studentized)
#> 
#> data:  extra by group
#> t-observed = -1.8608, df = 17.776, p-value = 0.08004
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -3.431076  0.223132
#> sample estimates:
#>           mean of group '1'           mean of group '2' 
#>                        0.75                        2.33 
#> mean difference ('1' - '2') 
#>                       -1.58 
#> 

# Example 2: One-sample bootstrapped t-test
set.seed(123)
x <- rnorm(20, mean = 0.5, sd = 1)
boot_t_test(x, mu = 0, R = 999) # Using fewer replicates for demonstration
#> 
#> 	Bootstrapped One Sample t-test (studentized)
#> 
#> data:  x
#> t-observed = 2.9501, df = 19, p-value = 0.008008
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  0.184273 1.112973
#> sample estimates:
#> mean of x 
#> 0.6416238 
#> 

# Example 3: Paired samples test with percentile bootstrap CI
before <- c(5.1, 4.8, 6.2, 5.7, 6.0, 5.5, 4.9, 5.8)
after <- c(5.6, 5.2, 6.7, 6.1, 6.5, 5.8, 5.3, 6.2)
boot_t_test(x = before, y = after,
            paired = TRUE,
            alternative = "less",  # Testing if before < after
            boot_ci = "perc",
            R = 999)
#> 
#> 	Bootstrapped Paired t-test (percentile)
#> 
#> data:  x and y
#> t-observed = -17, df = 7, p-value < 2.2e-16
#> alternative hypothesis: true mean difference is less than 0
#> 95 percent confidence interval:
#>  -0.4625 -0.3875
#> sample estimates:
#> mean of the differences (z = x - y) 
#>                              -0.425 
#> 

# Example 4: Equivalence testing with bootstrapped t-test
# Testing if the effect is within ±0.5 units
data(mtcars)
boot_t_test(mpg ~ am, data = mtcars,
            alternative = "equivalence",
            mu = c(-0.5, 0.5),
            boot_ci = "stud",
            R = 999)
#> 
#> 	Bootstrapped Welch Two Sample t-test (studentized)
#> 
#> data:  mpg by am
#> t-observed = -3.5071, df = 18.332, p-value = 0.999
#> alternative hypothesis: equivalence
#> null values:
#> difference in means difference in means 
#>                -0.5                 0.5 
#> 90 percent confidence interval:
#>  -10.568744  -4.177433
#> sample estimates:
#>           mean of group '0'           mean of group '1' 
#>                   17.147368                   24.392308 
#> mean difference ('0' - '1') 
#>                   -7.244939 
#> 

# Example 5: Minimal effect testing with bootstrapped t-test
# Testing if the effect is outside ±3 units
boot_t_test(mpg ~ am, data = mtcars,
            alternative = "minimal.effect",
            mu = c(-3, 3),
            R = 999)
#> 
#> 	Bootstrapped Welch Two Sample t-test (studentized)
#> 
#> data:  mpg by am
#> t-observed = -5.327, df = 18.332, p-value = 0.01502
#> alternative hypothesis: minimal.effect
#> null values:
#> difference in means difference in means 
#>                  -3                   3 
#> 90 percent confidence interval:
#>  -10.598981  -4.175593
#> sample estimates:
#>           mean of group '0'           mean of group '1' 
#>                   17.147368                   24.392308 
#> mean difference ('0' - '1') 
#>                   -7.244939 
#> 

# Example 6: Bootstrapped Yuen's trimmed t-test (10% trimming)
boot_t_test(extra ~ group, data = sleep, tr = 0.1, R = 999)
#> 
#> 	Bootstrapped Welch Yuen Two Sample t-test
#> 
#> data:  extra by group
#> t-observed = -1.568, df = 13.896, p-value = 0.1562
#> alternative hypothesis: true trimmed mean difference (tr = 0.1) is not equal to 0
#> 95 percent confidence interval:
#>  -3.7916990  0.6614114
#> sample estimates:
#>                           trimmed mean of '1' 
#>                                        0.6750 
#>                           trimmed mean of '2' 
#>                                        2.2375 
#> trimmed mean difference ('1' - '2', tr = 0.1) 
#>                                       -1.5625 
#>