correlations.Rmd
TOSTER has a few different functions to calculate correlations. All the included functions are based on a few papers by Goertzen and Cribbie (2010) (`z_cor_test` & `compare_cor`) and Wilcox (2011) (`boot_cor_test`).¹
Simple tests of association can be accomplished with the `z_cor_test` function. This function was stylized after the `cor.test` function, but you will notice that the results may differ. This is caused by the fact that `z_cor_test` uses Fisher's z transformation as the basis for all significance tests (i.e., p-values). However, notice that the confidence intervals are the same.
```r
cor.test(mtcars$mpg,
         mtcars$qsec)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$qsec
## t = 2.5252, df = 30, p-value = 0.01708
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08195487 0.66961864
## sample estimates:
##      cor 
## 0.418684

z_cor_test(mtcars$mpg,
           mtcars$qsec)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$qsec
## z = 2.4023, N = 32, p-value = 0.01629
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08195487 0.66961864
## sample estimates:
##      cor 
## 0.418684
```
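The z statistic reported by `z_cor_test` can be reproduced by hand. The correlation is transformed with Fisher's z (the inverse hyperbolic tangent), which is approximately normal with a standard error that depends only on the sample size:

\[
z_r = \text{atanh}(r) = \frac{1}{2} \ln\!\left(\frac{1 + r}{1 - r}\right), \qquad z = z_r \sqrt{N - 3}
\]

With \(r = 0.4187\) and \(N = 32\), \(z = 0.4461 \times \sqrt{29} \approx 2.40\), matching the output above. The confidence interval is computed on the z scale and back-transformed, \(\tanh\!\left(z_r \pm z_{1-\alpha/2}/\sqrt{N-3}\right)\), which is also how `cor.test` constructs its interval for Pearson's r; this is why the intervals agree even though the p-values differ.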
But, just as with `cor.test`, the Spearman and Kendall correlation coefficients can also be estimated.
```r
z_cor_test(mtcars$mpg,
           mtcars$qsec,
           method = "spear") # Don't need to spell full name
## 
##  Spearman's rank correlation rho
## 
## data:  mtcars$mpg and mtcars$qsec
## z = 2.6474, N = 32, p-value = 0.008111
## alternative hypothesis: true rho is not equal to 0
## 95 percent confidence interval:
##  0.1306771 0.7068501
## sample estimates:
##       rho 
## 0.4669358
```
```r
z_cor_test(mtcars$mpg,
           mtcars$qsec,
           method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  mtcars$mpg and mtcars$qsec
## z = 2.6134, N = 32, p-value = 0.008964
## alternative hypothesis: true tau is not equal to 0
## 95 percent confidence interval:
##  0.08145572 0.51634821
## sample estimates:
##       tau 
## 0.3153652
```
The main advantage of `z_cor_test` is that it can perform equivalence testing (TOST), or any hypothesis test where the null isn't zero.
```r
z_cor_test(mtcars$mpg,
           mtcars$qsec,
           alternative = "e", # e for equivalence
           null = .4)
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$qsec
## z = 0.12088, N = 32, p-value = 0.5481
## alternative hypothesis: equivalence
## null values:
## correlation correlation 
##         0.4        -0.4 
## 90 percent confidence interval:
##  0.1397334 0.6360650
## sample estimates:
##      cor 
## 0.418684
```
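Under the hood, the equivalence test is two one-sided Fisher's z tests against the bounds \(\pm 0.4\):

\[
z = \left[\text{atanh}(r) - \text{atanh}(r_{\text{null}})\right]\sqrt{N - 3}
\]

Here the bound closer to the estimate gives \((0.4461 - 0.4236)\sqrt{29} \approx 0.12\), matching the z above, and the reported p-value is the larger of the two one-sided p-values.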
If you only have the summary statistics, you can perform the same tests with the `corsum_test` function. Just imagine you are reviewing a study with an observed correlation of 0.121 with a sample size of 105 paired observations. You could then perform an equivalence test with the following code.
```r
corsum_test(r = .121,
            n = 105,
            alternative = "e",
            null = .4)
## 
##  Pearson's product-moment correlation
## 
## data:  x and y
## z = -3.0506, N = 105, p-value = 0.001142
## alternative hypothesis: equivalence
## null values:
## correlation correlation 
##         0.4        -0.4 
## 90 percent confidence interval:
##  -0.0412456  0.2770284
## sample estimates:
##   cor 
## 0.121
```
If the raw data is available, I would strongly recommend using the bootstrapping function, which should be more robust than the Fisher's z based function. Further, the `boot_cor_test` function also has 2 other correlations that can be estimated: a Winsorized correlation and the percentage bend correlation. The input for the function is fairly similar to the `z_cor_test` function.
```r
set.seed(993)

boot_cor_test(mtcars$mpg,
              mtcars$qsec,
              alternative = "e",
              null = .4)
## 
##  Bootstrapped Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6088
## alternative hypothesis: equivalence
## null values:
## correlation correlation 
##         0.4        -0.4 
## 90 percent confidence interval:
##  0.2512992 0.5930089
## sample estimates:
##      cor 
## 0.418684
```
```r
boot_cor_test(mtcars$mpg,
              mtcars$qsec,
              method = "spear",
              alternative = "e",
              null = .4)
## 
##  Bootstrapped Spearman's rank correlation rho
## 
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6713
## alternative hypothesis: equivalence
## null values:
##  rho  rho 
##  0.4 -0.4 
## 90 percent confidence interval:
##  0.2676957 0.7365611
## sample estimates:
##       rho 
## 0.4669358
```
```r
boot_cor_test(mtcars$mpg,
              mtcars$qsec,
              method = "ken",
              alternative = "e",
              null = .4)
## 
##  Bootstrapped Kendall's rank correlation tau
## 
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.2276
## alternative hypothesis: equivalence
## null values:
##  tau  tau 
##  0.4 -0.4 
## 90 percent confidence interval:
##  0.1442440 0.5090649
## sample estimates:
##       tau 
## 0.3153652
```
Robust correlations, such as a winsorized correlation coefficient or percentage bend correlation, can also be tested.
```r
boot_cor_test(mtcars$mpg,
              mtcars$qsec,
              method = "win",
              alternative = "e",
              null = .4,
              tr = .1) # set trim
## 
##  Bootstrapped Winsorized correlation wincor
## 
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6878
## alternative hypothesis: equivalence
## null values:
## wincor wincor 
##    0.4   -0.4 
## 90 percent confidence interval:
##  0.2651060 0.7125775
## sample estimates:
##   wincor 
## 0.464062
```
```r
boot_cor_test(mtcars$mpg,
              mtcars$qsec,
              method = "bend",
              alternative = "e",
              null = .4,
              beta = .15) # bend argument
## 
##  Bootstrapped percentage bend correlation pb
## 
## data:  mtcars$mpg and mtcars$qsec
## N = 32, p-value = 0.6933
## alternative hypothesis: equivalence
## null values:
##   pb   pb 
##  0.4 -0.4 
## 90 percent confidence interval:
##  0.2509131 0.6634791
## sample estimates:
##        pb 
## 0.4484488
```
In some cases, researchers may want to compare two independent correlations. This may be used to compare the correlation between two variables across two groups (e.g., male versus female subjects) or between two independent studies (e.g., an original study and its replication).
When only summary statistics are available, the `compare_cor` function can be used. All the user needs is the correlations (r1 and r2) and the degrees of freedom for each correlation. The degrees of freedom in most cases would be the number of pairs minus 2 (\(df = N-2\)). Note: this function, similar to `z_cor_test`, is an approximation.
```r
compare_cor(r1 = .8,
            df1 = 38,
            r2 = .2,
            df2 = 98)
## 
##  Difference between two independent correlations (Fisher's z transform)
## 
## data:  Summary Statistics
## z = 4.6364, p-value = 3.545e-06
## alternative hypothesis: true difference between correlations is not equal to 0
## sample estimates:
## difference between correlations 
##                             0.6
```
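This z statistic is the difference of the Fisher's z transformed correlations divided by its standard error. Since the variance of Fisher's z is \(1/(n-3)\) and \(n = df + 2\):

\[
z = \frac{\text{atanh}(r_1) - \text{atanh}(r_2)}{\sqrt{\dfrac{1}{df_1 - 1} + \dfrac{1}{df_2 - 1}}}
\]

For the example above, \((1.0986 - 0.2027)/\sqrt{1/37 + 1/97} \approx 4.64\), matching the reported z.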
The methods included to compare correlations are Fisher's z transformation ("fisher") and Kraatz's method ("kraatz"). The Fisher and Kraatz methods are appropriate for general significance tests, but may have low statistical power (Counsell and Cribbie 2015). Fisher's method tests the difference between correlations on the z-transformed scale, while Kraatz's method directly measures the difference between the correlation coefficients. My personal recommendation would be Fisher's method.
```r
compare_cor(r1 = .8,
            df1 = 38,
            r2 = .2,
            df2 = 98,
            null = .2,
            method = "f", # Fisher
            alternative = "e") # Equivalence
## 
##  Difference between two independent correlations (Fisher's z transform)
## 
## data:  Summary Statistics
## z = 0.69315, p-value = 0.9998
## alternative hypothesis: equivalence
## null values:
## difference between correlations difference between correlations 
##                             0.2                            -0.2 
## sample estimates:
## difference between correlations 
##                             0.6
```
When raw data are available for both correlations, the `boot_compare_cor` function can be utilized.
```r
set.seed(8922)
x1 = rnorm(40)
y1 = rnorm(40)
x2 = rnorm(100)
y2 = rnorm(100)

boot_compare_cor(
  x1 = x1,
  x2 = x2,
  y1 = y1,
  y2 = y2,
  null = .2,
  alternative = "e", # Equivalence
  method = "win" # Winsorized correlation
)
## 
##  Bootstrapped difference in Winsorized correlation wincor
## 
## data:  x1 and y1 vs. x2 and y2
## n1 = 40, n2 = 100, p-value = 0.7739
## alternative hypothesis: true differnce in wincor is 0.2
## 90 percent confidence interval:
##  -0.2970547  0.3978333
## sample estimates:
##     wincor 
## 0.06383164
```
¹ Bootstrapped functions were based off code posted by Rand Wilcox on his website, and were modified after looking at Guillaume Rousselet's code in the `bootcorci` R package on GitHub: https://github.com/GRousselet↩︎