Correlations
Aaron R. Caldwell
2025-03-28
The TOSTER package provides several functions for calculating and analyzing correlations. These functions extend beyond traditional correlation tests by offering equivalence testing capabilities and robust correlation methods. The included functions are based on research by Goertzen and Cribbie (2010) (z_cor_test and compare_cor) and Wilcox (2011) (boot_cor_test).
Simple Correlation Test
Basic tests of association can be performed with the z_cor_test function. This function is styled after R’s built-in cor.test function but uses Fisher’s z transformation as the basis for all significance tests (p-values). Despite this difference in methodology, the confidence intervals are typically very similar to those produced by cor.test.
# Base R correlation test, shown for comparison
cor.test(mtcars$mpg, mtcars$qsec)
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$qsec
## t = 2.5252, df = 30, p-value = 0.01708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08195487 0.66961864
## sample estimates:
## cor
## 0.418684
# TOSTER's z-transformed correlation test
z_cor_test(mtcars$mpg, mtcars$qsec)
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$qsec
## z = 2.4023, N = 32, p-value = 0.01629
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08195487 0.66961864
## sample estimates:
## cor
## 0.418684
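The numbers in the z_cor_test output can be traced by hand. The following is a minimal base R sketch of a Fisher z test for a Pearson correlation, assuming the usual standard error of 1 / sqrt(n - 3) on the z scale; the variable names are illustrative and this is not the package's source code.

```r
# Fisher z test for a Pearson correlation, using base R only
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)

r  <- cor(x, y)          # sample Pearson correlation
zr <- atanh(r)           # Fisher's z transformation of r
se <- 1 / sqrt(n - 3)    # assumed standard error on the z scale

z_stat  <- zr / se                                   # statistic for H0: rho = 0
p_value <- 2 * pnorm(-abs(z_stat))                   # two-sided p-value
ci      <- tanh(zr + c(-1, 1) * qnorm(0.975) * se)   # 95% CI, back-transformed

round(c(z = z_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

Under these assumptions the statistic, p-value, and interval agree with the z_cor_test output printed above, and the interval is the same one cor.test reports because both are built on the z scale.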
Like cor.test, the z_cor_test function supports Spearman and Kendall correlation coefficients:
# Spearman correlation
z_cor_test(mtcars$mpg,
mtcars$qsec,
method = "spear") # Short form accepted; "spearman" also works
##
## Spearman's rank correlation rho
##
## data: mtcars$mpg and mtcars$qsec
## z = 2.6474, N = 32, p-value = 0.008111
## alternative hypothesis: true rho is not equal to 0
## 95 percent confidence interval:
## 0.1306771 0.7068501
## sample estimates:
## rho
## 0.4669358
# Kendall correlation
z_cor_test(mtcars$mpg,
mtcars$qsec,
method = "kendall")
##
## Kendall's rank correlation tau
##
## data: mtcars$mpg and mtcars$qsec
## z = 2.6134, N = 32, p-value = 0.008964
## alternative hypothesis: true tau is not equal to 0
## 95 percent confidence interval:
## 0.08145572 0.51634821
## sample estimates:
## tau
## 0.3153652
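For the rank-based coefficients, the printed z statistics are consistent with Fisher-z standard errors of sqrt(1.06 / (n - 3)) for Spearman's rho and sqrt(0.437 / (n - 4)) for Kendall's tau. That the package uses exactly these constants is an inference from the output above, not something confirmed from its source, so treat the sketch below as an assumption-laden illustration.

```r
# Rank correlations and their Fisher z statistics, base R only
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)

rho <- cor(x, y, method = "spearman")
tau <- cor(x, y, method = "kendall")

# Assumed standard errors on the z scale (inferred from the printed output)
se_rho <- sqrt(1.06 / (n - 3))
se_tau <- sqrt(0.437 / (n - 4))

z_rho <- atanh(rho) / se_rho   # statistic for H0: rho = 0
z_tau <- atanh(tau) / se_tau   # statistic for H0: tau = 0

round(c(rho = rho, z_rho = z_rho, tau = tau, z_tau = z_tau), 4)
```

The larger variance constants reflect that rank-based estimates are noisier on the z scale than Pearson's r.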
Advantages of z_cor_test
The main advantage of z_cor_test over the standard cor.test is its ability to perform equivalence testing (TOST) or any hypothesis test where the null hypothesis isn’t zero. This makes it particularly useful for research questions focused on demonstrating practical equivalence or testing against specific correlation thresholds.
# Equivalence test with null boundary of 0.4
z_cor_test(mtcars$mpg,
mtcars$qsec,
alternative = "e", # e for equivalence
null = .4)
##
## Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$qsec
## z = 0.12088, N = 32, p-value = 0.5481
## alternative hypothesis: equivalence
## null values:
## correlation correlation
## 0.4 -0.4
## 90 percent confidence interval:
## 0.1397334 0.6360650
## sample estimates:
## cor
## 0.418684
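In this example, we test whether the correlation is equivalent to zero within bounds of ±0.4, that is, whether it lies between -0.4 and 0.4. The p-value of 0.5481 is well above 0.05, and the 90% confidence interval (0.14 to 0.64) extends past the upper bound, so equivalence cannot be concluded.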
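An equivalence test at level alpha can also be read off a 1 - 2 * alpha confidence interval: equivalence is supported only if the whole interval sits inside the bounds. The base R sketch below illustrates that reading for the mtcars example, assuming the Fisher z standard error 1 / sqrt(n - 3); the names bound, alpha, and equivalent are illustrative.

```r
# Confidence-interval view of the equivalence test, base R only
r     <- cor(mtcars$mpg, mtcars$qsec)
n     <- nrow(mtcars)
bound <- 0.4     # equivalence bounds are -bound and +bound
alpha <- 0.05

se   <- 1 / sqrt(n - 3)                                      # assumed SE on the z scale
ci90 <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha) * se)    # 90% interval

# Equivalence requires the entire interval to lie strictly inside the bounds
equivalent <- ci90[1] > -bound && ci90[2] < bound

round(ci90, 4)
equivalent
```

Because the upper limit exceeds 0.4, the flag is FALSE, which matches the non-significant equivalence p-value.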
Using Summary Statistics
A key advantage of TOSTER is the ability to perform correlation tests using only summary statistics, which is particularly useful when reviewing published literature or working with limited data access. The corsum_test function enables this functionality:
# Testing a correlation of 0.121 from a sample of 105 paired observations
corsum_test(r = .121,
n = 105,
alternative = "e",
null = .4)
##
## Pearson's product-moment correlation
##
## data: x and y
## z = -3.0506, N = 105, p-value = 0.001142
## alternative hypothesis: equivalence
## null values:
## correlation correlation
## 0.4 -0.4
## 90 percent confidence interval:
## -0.0412456 0.2770284
## sample estimates:
## cor
## 0.121
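This example tests whether a correlation of 0.121 from a sample of 105 paired observations is equivalent to zero within bounds of ±0.4. Here the p-value is 0.001142 and the 90% confidence interval (-0.04 to 0.28) lies entirely inside the bounds, so the correlation can be considered statistically equivalent to zero for this choice of bounds.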
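Since only r and n are needed, the two one-sided tests can be sketched in a few lines of base R. The helper fisher_tost below is hypothetical (it is not a TOSTER function) and assumes the Fisher z standard error 1 / sqrt(n - 3); it reports the one-sided test with the larger p-value, which is the one that decides the equivalence test.

```r
# Two one-sided tests (TOST) for a correlation from summary statistics
fisher_tost <- function(r, n, bound) {
  se <- 1 / sqrt(n - 3)                          # assumed SE on the z scale
  z_lower <- (atanh(r) - atanh(-bound)) / se     # tests H0: rho <= -bound
  z_upper <- (atanh(r) - atanh(bound)) / se      # tests H0: rho >= +bound
  p_lower <- pnorm(z_lower, lower.tail = FALSE)  # evidence that rho > -bound
  p_upper <- pnorm(z_upper)                      # evidence that rho < +bound
  # The equivalence p-value is the larger of the two one-sided p-values
  if (p_upper >= p_lower) {
    c(z = z_upper, p = p_upper)
  } else {
    c(z = z_lower, p = p_lower)
  }
}

res <- fisher_tost(r = 0.121, n = 105, bound = 0.4)
round(res, 4)
```

With r = 0.121 and n = 105 the deciding test is the one against the upper bound, and the statistic and p-value agree with the corsum_test output above.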
Bootstrapped Correlation Test
For more robust analyses when raw data is available, TOSTER provides the boot_cor_test function. This bootstrapping approach generally produces more reliable results than Fisher’s z-based tests, especially when outliers are present or distribution assumptions are violated.
set.seed(993) # Setting seed for reproducibility
boot_cor_test(mtcars$mpg,
mtcars$qsec,
alternative = "e",
null = .4)
##
## Bootstrapped Pearson's product-moment correlation
##
## data: mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6088
## alternative hypothesis: equivalence
## null values:
## correlation correlation
## 0.4 -0.4
## 90 percent confidence interval:
## 0.2512992 0.5930089
## sample estimates:
## cor
## 0.418684
# Bootstrapped Spearman correlation
boot_cor_test(mtcars$mpg,
mtcars$qsec,
method = "spear",
alternative = "e",
null = .4)
##
## Bootstrapped Spearman's rank correlation rho
##
## data: mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6713
## alternative hypothesis: equivalence
## null values:
## rho rho
## 0.4 -0.4
## 90 percent confidence interval:
## 0.2676957 0.7365611
## sample estimates:
## rho
## 0.4669358
# Bootstrapped Kendall correlation
boot_cor_test(mtcars$mpg,
mtcars$qsec,
method = "ken", # Short form accepted
alternative = "e",
null = .4)
##
## Bootstrapped Kendall's rank correlation tau
##
## data: mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.2276
## alternative hypothesis: equivalence
## null values:
## tau tau
## 0.4 -0.4
## 90 percent confidence interval:
## 0.1442440 0.5090649
## sample estimates:
## tau
## 0.3153652
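To make the resampling idea concrete, here is a minimal base R sketch of a plain percentile bootstrap for the Pearson correlation. It is an illustration only: boot_cor_test may use a different bootstrap interval, so the limits from this sketch are not expected to match the output above.

```r
# Naive percentile bootstrap for a Pearson correlation, base R only
set.seed(993)
x <- mtcars$mpg
y <- mtcars$qsec
n <- length(x)
R <- 1999   # number of bootstrap resamples (illustrative choice)

boot_r <- replicate(R, {
  idx <- sample.int(n, replace = TRUE)   # resample (x, y) pairs together
  cor(x[idx], y[idx])
})

# 90% percentile interval, the level used for a 5% equivalence test
ci90 <- quantile(boot_r, probs = c(0.05, 0.95), names = FALSE)
ci90
```

Resampling pairs rather than x and y separately preserves the dependence between the two variables, which is what the correlation measures.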
Robust Correlation Methods
The boot_cor_test function also provides access to robust correlation methods that are less sensitive to outliers and violations of normality:
# Winsorized correlation with 10% trimming
boot_cor_test(mtcars$mpg,
mtcars$qsec,
method = "win",
alternative = "e",
null = .4,
tr = .1) # Set trim amount (default is 0.2)
##
## Bootstrapped Winsorized correlation wincor
##
## data: mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6878
## alternative hypothesis: equivalence
## null values:
## wincor wincor
## 0.4 -0.4
## 90 percent confidence interval:
## 0.2651060 0.7125775
## sample estimates:
## wincor
## 0.464062
# Percentage bend correlation
boot_cor_test(mtcars$mpg,
mtcars$qsec,
method = "bend",
alternative = "e",
null = .4,
beta = .15) # Beta parameter controlling resistance to outliers
##
## Bootstrapped percentage bend correlation pb
##
## data: mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6933
## alternative hypothesis: equivalence
## null values:
## pb pb
## 0.4 -0.4
## 90 percent confidence interval:
## 0.2509131 0.6634791
## sample estimates:
## pb
## 0.4484488
The Winsorized correlation reduces the impact of outliers by replacing extreme values with less extreme values. The percentage bend correlation is another robust method that downweights the influence of outliers in the calculation.
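As a rough illustration of Winsorizing, the base R sketch below pulls the most extreme floor(tr * n) values in each tail in to the nearest retained value and then computes an ordinary Pearson correlation. The helpers winsorize and win_cor are hypothetical names, and the details may differ from the implementation behind method = "win".

```r
# Winsorized correlation sketch, base R only
winsorize <- function(v, tr = 0.2) {
  s  <- sort(v)
  n  <- length(v)
  k  <- floor(tr * n)     # number of values pulled in on each tail
  lo <- s[k + 1]          # smallest value that is kept as-is
  hi <- s[n - k]          # largest value that is kept as-is
  pmin(pmax(v, lo), hi)   # clamp everything outside [lo, hi]
}

win_cor <- function(x, y, tr = 0.2) {
  cor(winsorize(x, tr), winsorize(y, tr))
}

wc <- win_cor(mtcars$mpg, mtcars$qsec, tr = 0.1)
wc
```

With tr = 0 nothing is clamped and win_cor reduces to the ordinary Pearson correlation.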
Comparing Correlations
TOSTER provides tools for comparing correlations between independent groups or studies. This is useful for testing differences in relationships across populations or for evaluating replication studies.
Summary Statistics Approach
When only summary statistics are available, the compare_cor function can be used:
# Comparing correlation r1=0.8 from n=40 with r2=0.2 from n=100
compare_cor(r1 = .8,
df1 = 38, # df = n-2
r2 = .2,
df2 = 98) # df = n-2
##
## Difference between two independent correlations (Fisher's z transform)
##
## data: Summary Statistics
## z = 4.6364, p-value = 3.545e-06
## alternative hypothesis: true difference between correlations is not equal to 0
## sample estimates:
## difference between correlations
## 0.6
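The z statistic in this output can be reproduced by hand. The sketch below assumes the standard Fisher z comparison of two independent correlations, where each transformed correlation has variance 1 / (n - 3); it is a base R illustration, not the package's code.

```r
# Fisher z test for the difference between two independent correlations
r1 <- 0.8; n1 <- 40
r2 <- 0.2; n2 <- 100

z_diff  <- atanh(r1) - atanh(r2)                 # difference on the z scale
se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))     # assumed SE of that difference

z_stat  <- z_diff / se_diff
p_value <- 2 * pnorm(-abs(z_stat))               # two-sided p-value

c(z = z_stat, p = p_value)
```

Note that the test is carried out on the transformed scale, while the reported estimate (0.6) is the raw difference r1 - r2.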
The compare_cor function supports different methods for comparing correlations:
# Testing equivalence using Fisher's method
compare_cor(r1 = .8,
df1 = 38,
r2 = .2,
df2 = 98,
null = .2,
method = "f", # Fisher (can also use "fisher")
alternative = "e") # Equivalence test
##
## Difference between two independent correlations (Fisher's z transform)
##
## data: Summary Statistics
## z = 0.69315, p-value = 0.9998
## alternative hypothesis: equivalence
## null values:
## difference between correlations difference between correlations
## 0.2 -0.2
## sample estimates:
## difference between correlations
## 0.6
Available methods include:
- Fisher’s z transformation (method = "fisher" or "f"): Tests the difference between correlations on the z-transformed scale. This is generally recommended for most applications.
- Kraatz’s method (method = "kraatz" or "k"): Directly measures the difference between correlation coefficients.
While both methods are appropriate for general significance testing, they may have limited statistical power in some scenarios (Counsell and Cribbie 2015).
Bootstrapped Comparison
When raw data is available for both correlations, the boot_compare_cor function offers a more robust approach through bootstrapping:
set.seed(8922) # Setting seed for reproducibility
# Generating example data
x1 <- rnorm(40)
y1 <- rnorm(40)
x2 <- rnorm(100)
y2 <- rnorm(100)
# Bootstrap comparison with winsorized correlation
boot_compare_cor(
x1 = x1,
x2 = x2,
y1 = y1,
y2 = y2,
null = .2,
alternative = "e", # Equivalence test
method = "win" # Winsorized correlation
)
##
## Bootstrapped difference in Winsorized correlation wincor
##
## data: x1 and y1 vs. x2 and y2
## n1 = 40, n2 = 100, p-value = 0.7739
## alternative hypothesis: true differnce in wincor is 0.2
## 90 percent confidence interval:
## -0.2970547 0.3978333
## sample estimates:
## wincor
## 0.06383164
This approach has several advantages:
- It does not rely on Fisher’s z-transformation approximation
- It can incorporate robust correlation methods
- It can provide more accurate confidence intervals, especially when typical assumptions are violated
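The idea can be sketched in base R by resampling each group independently and recording the difference in correlations. This is a simplified percentile bootstrap using Pearson's r, written for illustration; it is not expected to reproduce the boot_compare_cor output, which here uses a Winsorized correlation and may construct its interval differently.

```r
# Percentile bootstrap for a difference between two independent correlations
set.seed(8922)
x1 <- rnorm(40);  y1 <- rnorm(40)
x2 <- rnorm(100); y2 <- rnorm(100)

boot_diff <- replicate(1999, {
  i1 <- sample.int(length(x1), replace = TRUE)   # resample pairs in group 1
  i2 <- sample.int(length(x2), replace = TRUE)   # resample pairs in group 2
  cor(x1[i1], y1[i1]) - cor(x2[i2], y2[i2])
})

ci90 <- quantile(boot_diff, probs = c(0.05, 0.95), names = FALSE)

# Equivalence within +/- 0.2 requires the whole 90% interval inside the bounds
inside <- ci90[1] > -0.2 && ci90[2] < 0.2
ci90
inside
```

Swapping cor for a robust estimator inside the replicate call is all that is needed to bootstrap a different correlation measure.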
Practical Recommendations
When choosing which correlation method to use in TOSTER:
- If raw data is available:
  - For most cases, use boot_cor_test with Pearson, Spearman, or Kendall methods
  - When outliers or distribution assumptions are concerns, consider the robust methods (winsorized or percentage bend)
- If only summary statistics are available:
  - Use corsum_test for single correlation analysis
  - Use compare_cor with the Fisher method for comparing correlations
- For equivalence testing:
  - Carefully select meaningful boundaries (null values) based on your research context
  - Consider what effect size would be practically insignificant in your field
Advanced Usage
Custom Bootstrap Methods
The bootstrapped functions in TOSTER allow customization of the bootstrap procedure:
# Customizing the bootstrap procedure
boot_cor_test(
x = mtcars$mpg,
y = mtcars$qsec,
method = "pearson",
R = 2000, # Increasing number of bootstrap samples
alpha = 0.01, # Using 99% confidence interval
alternative = "t" # Two-sided test
)
Working with Missing Data
By default, the correlation functions in TOSTER use pairwise complete observations:
# Example with missing data
x_with_na <- c(mtcars$mpg, NA, NA)
y_with_na <- c(mtcars$qsec, 10, NA)
# Default behavior handles NAs with pairwise deletion
z_cor_test(x_with_na, y_with_na)
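What pairwise deletion means for this example can be checked in base R. The sketch below assumes, as stated above, that any pair with a missing value in either variable is dropped before the correlation is computed.

```r
# Effect of dropping incomplete pairs, base R only
x_with_na <- c(mtcars$mpg, NA, NA)
y_with_na <- c(mtcars$qsec, 10, NA)

ok     <- complete.cases(x_with_na, y_with_na)   # TRUE where both values are present
n_used <- sum(ok)                                # number of pairs that remain

r_complete <- cor(x_with_na[ok], y_with_na[ok])
c(n = n_used, r = r_complete)
```

Both appended pairs are incomplete, so the 32 original observations remain and the estimate equals the correlation from the full mtcars data.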