Title: | Modeling Correlational Magnitude Transformations in Discretization Contexts |
---|---|
Description: | Modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts. The details of the method are explained in Demirtas, H. and Vardar-Acar, C. (2017) <DOI:10.1007/978-981-10-3307-0_4>. |
Authors: | Rawan Allozi, Hakan Demirtas, Ran Gao |
Maintainer: | Ran Gao <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.6.4 |
Built: | 2025-03-09 04:10:49 UTC |
Source: | https://github.com/cran/CorrToolBox |
This package implements the computational algorithms for modeling the correlation transitions under specified distributional assumptions within the realm of discretization in the context of the latency and threshold concepts. Functions that compute the correlational magnitude changes in both directions (identification of the pre-discretization correlation value in order to attain a specified post-discretization magnitude, and the other way around) are provided.
This package consists of eight main functions. The tetrachoric correlation is computed from the phi coefficient, and vice versa, by phi2tet and tet2phi, respectively. The polychoric correlation is computed from the ordinal phi coefficient, and vice versa, by ophi2poly and poly2ophi, respectively. The biserial correlation is computed from the point-biserial correlation, and vice versa, by pbs2bs and bs2pbs, respectively. The polyserial correlation is computed from the point-polyserial correlation, and vice versa, by pps2ps and ps2pps, respectively.
Auxiliary functions are also provided. corrY2corrZ, corrZ2corrY, corrZ2ophi, corrZ2phi, and ophi2corrZ are intermediate functions used within the main functions, but they can also be used as stand-alone functions. ordY discretizes a continuous variable, and mps2cps provides cumulative probabilities for each set of marginal probabilities in a list. Additional intermediate functions from imported packages include phi2tetra from the psych package, ordcont and contord from the GenOrd package, skewness and kurtosis from the moments package, validation.skewness.kurtosis from the BinNonNor package, and pmvnorm from the mvtnorm package.
Within each correlation transition function, the specified input correlation is compared with the correlation bounds implied by the given marginal distributions, to ensure that it does not violate them (Demirtas and Hedeker, 2011). The function valid.limits.BinOrdNN in the package BinOrdNonNor is used for this step. Additionally, Fleishman.coef.NN in the package BinOrdNonNor is used wherever Fleishman coefficients need to be calculated for a continuous variable.
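The bound check can be illustrated with a minimal base-R sketch of the generate, sort and correlate idea: draw large samples from each marginal distribution, then correlate them after sorting both in the same order (approximate upper bound) and in opposite orders (approximate lower bound). The helper name gsc_bounds() is hypothetical; this is not the code of valid.limits.BinOrdNN.

```r
# gsc_bounds() is a hypothetical helper sketching the generate, sort and
# correlate (GSC) idea; it is not valid.limits.BinOrdNN() from BinOrdNonNor.
gsc_bounds <- function(x, y) {
  n <- min(length(x), length(y))                 # use equal-length samples
  x <- sort(x[seq_len(n)])
  y <- y[seq_len(n)]
  c(lower = cor(x, sort(y, decreasing = TRUE)),  # opposite orderings
    upper = cor(x, sort(y)))                     # same ordering
}

set.seed(1)
y1 <- rexp(100000)                      # skewed continuous variable
y2 <- as.numeric(runif(100000) < 0.3)   # binary variable, P(y2 = 1) = 0.3
gsc_bounds(y1, y2)
```

For an exponential variable and a binary variable with P(1) = 0.3, the bounds come out at roughly -0.55 and 0.79, so a specified correlation outside that interval cannot be attained by any joint distribution with these margins.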
Package: | CorrToolBox |
Type: | Package |
Version: | 1.6.4 |
Date: | 2022-02-21 |
License: | GPL-2 | GPL-3 |
Demirtas, H. (2016). A note on the relationship between the phi coefficient and the tetrachoric correlation under nonnormal underlying distributions. The American Statistician, 70(2), 143-148.
Demirtas, H., Ahmadian, R., Atis, S., Can, F.E., and Ercan, I. (2016). A nonnormal look at polychoric correlations: modeling the change in correlations before and after discretization. Computational Statistics, 31(4), 1385-1401.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Demirtas, H. and Hedeker, D. (2016). Computing the point-biserial correlation under any underlying continuous distribution. Communications in Statistics-Simulation and Computation, 45(8), 2744-2751.
Demirtas, H., Hedeker, D., and Mermelstein, R. J. (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27), 3337-3346.
Demirtas, H. and Vardar-Acar, C. (2017). Anatomy of correlational magnitude transformations in latency and discretization contexts in Monte-Carlo studies. In ICSA Book Series in Statistics, John Dean Chen and Ding-Geng (Din) Chen (Eds): Monte-Carlo Simulation-Based Statistical Modeling. Singapore: Springer, 59-84.
Ferrari, P.A. and Barbiero, A. (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4), 566-589.
Fleishman, A.I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.
Vale, C.D. and Maurelli, V.A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465-471.
This function computes the point-biserial correlation, that is, the correlation between two variables after one of them is dichotomized, from the correlation before dichotomization (the biserial correlation), as described in Demirtas and Hedeker (2016). Before the point-biserial correlation is computed, the specified biserial correlation is compared with the lower and upper correlation bounds of the two continuous variables, obtained with the generate, sort and correlate (GSC) algorithm of Demirtas and Hedeker (2011).
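As we read Demirtas and Hedeker (2016), the two quantities are linked through the correlation between the continuous variable and its own dichotomized version. A minimal sketch, assuming the linearity-based identity pbs = bs * cor(X, Y1), where Y1 is bin.var, X is its binary version, and P(X = 1) = p; the helper bs2pbs_sketch() is hypothetical and skips the bound check.

```r
# bs2pbs_sketch() is hypothetical and assumes the identity
# pbs = bs * cor(X, Y1), where X is the binary version of Y1.
bs2pbs_sketch <- function(bs, y, p) {
  cutpoint <- quantile(y, probs = 1 - p, names = FALSE)  # so that P(X = 1) = p
  x <- as.numeric(y > cutpoint)                          # dichotomized y
  bs * cor(x, y)
}

set.seed(123)
y1 <- rweibull(n = 100000, scale = 1, shape = 1.2)
bs2pbs_sketch(bs = 0.6, y = y1, p = 0.55)
```

In this sketch the second continuous variable plays no role; in bs2pbs, cont.var is needed at least for the bound check. The empirical 45th percentile of y1 should be close to the cutpoint 0.65484 used in the bs2pbs examples, which is consistent with p being the proportion of observations above the cutpoint.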
bs2pbs(bs, bin.var, cont.var, p=NULL, cutpoint=NULL)
bs | The biserial correlation. |
---|---|
bin.var | A numeric vector of the continuous variable before dichotomization. |
cont.var | A numeric vector of the continuous variable that is not transformed. |
p | The expected value of the numeric vector |
cutpoint | The value at which the numeric vector |
The point-biserial correlation.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Demirtas, H. and Hedeker, D. (2016). Computing the point-biserial correlation under any underlying continuous distribution. Communications in Statistics-Simulation and Computation, 45(8), 2744-2751.
library(CorrToolBox)
set.seed(123)
y1<-rweibull(n=100000, scale=1, shape=1.2)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.6)
bs2pbs(bs=0.6, bin.var=y1, cont.var=y2, p=0.55)
bs2pbs(bs=0.6, bin.var=y1, cont.var=y2, cutpoint=0.65484)
This is an intermediate function that computes the correlation of bivariate standard normal variables from the correlation of continuous nonnormal variables. Fleishman coefficients for each nonnormal variable with the specified skewness and excess kurtosis are found. The Fleishman coefficients and correlation of nonnormal variables are used to find the correlation of the two respective standard normal variables as seen in Demirtas, Hedeker, and Mermelstein (2012).
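The transformation rests on the power-polynomial (Fleishman) representation Y = a + bZ + cZ^2 + dZ^3 with a = -c, under which the correlation of two such variables is a cubic polynomial in the correlation r of the underlying standard normals (Vale and Maurelli, 1983). A minimal base-R sketch, assuming the Fleishman coefficients c(b, c, d) are already known (the package obtains them through Fleishman.coef.NN); vm_corr() and corrY2corrZ_sketch() are hypothetical names, and the exponential coefficients are rounded approximations.

```r
# Correlation of two Fleishman-transformed standard normals as a cubic in r.
# coef1 and coef2 are c(b, c, d); they are assumed known, not computed here.
vm_corr <- function(r, coef1, coef2) {
  b1 <- coef1[[1]]; c1 <- coef1[[2]]; d1 <- coef1[[3]]
  b2 <- coef2[[1]]; c2 <- coef2[[2]]; d2 <- coef2[[3]]
  r * (b1 * b2 + 3 * b1 * d2 + 3 * d1 * b2 + 9 * d1 * d2) +
    r^2 * (2 * c1 * c2) + r^3 * (6 * d1 * d2)
}

# Hypothetical helper: solve vm_corr(r) = corrY for r numerically.
corrY2corrZ_sketch <- function(corrY, coef1, coef2) {
  f <- function(r) vm_corr(r, coef1, coef2) - corrY
  if (f(-1) * f(1) > 0) stop("corrY is outside the attainable range")
  uniroot(f, lower = -1, upper = 1, tol = 1e-10)$root
}

# Rounded Fleishman coefficients for an exponential (skewness 2, excess kurtosis 6)
expo <- c(b = 0.8262, c = 0.3134, d = 0.0232)
corrY2corrZ_sketch(0.3, expo, expo)
```

For two exponential variables the solution is about 0.34, slightly larger than the target 0.3, because the nonlinear transformation attenuates the correlation. If the polynomial is not monotone on [-1, 1], uniroot() returns only one of possibly several roots.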
corrY2corrZ(corrY, skew.vec, kurto.vec)
corrY | The correlation of two continuous nonnormal variables. |
---|---|
skew.vec | The skewness vector for the continuous variables. |
kurto.vec | The excess kurtosis vector for the continuous variables. |
The correlation of the two respective standard normal variables.
Demirtas, H., Hedeker, D., and Mermelstein, R. J. (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27), 3337-3346.
Fleishman, A.I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.
library(CorrToolBox)
set.seed(987)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=1)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.5)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
corrY2corrZ(corrY=-0.4, skew.vec=c(y1.skew, y2.skew), kurto.vec=c(y1.exkurt, y2.exkurt))
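This is an intermediate function that computes the correlation of two continuous nonnormal variables from the correlation of bivariate standard normal variables. Fleishman coefficients for each nonnormal variable with the specified skewness and excess kurtosis are found, and these coefficients, together with the correlation of the two standard normal variables, are used to find the correlation of the two nonnormal variables, as described in Demirtas, Hedeker, and Mermelstein (2012).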
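The forward direction only requires evaluating the Fleishman correlation polynomial. The sketch below does that and compares the result with a Monte Carlo estimate obtained by transforming correlated standard normals. corrZ2corrY_sketch() and fleishman() are hypothetical helpers, and the coefficients are rounded values for an exponential distribution (skewness 2, excess kurtosis 6); the package computes coefficients itself from the skewness and excess kurtosis vectors.

```r
# corrZ2corrY_sketch() is hypothetical: it evaluates the cubic relation
# between the normal correlation and the correlation of the transformed
# variables. Coefficient vectors are c(b, c, d), with a = -c.
corrZ2corrY_sketch <- function(corrZ, coef1, coef2) {
  b1 <- coef1[[1]]; c1 <- coef1[[2]]; d1 <- coef1[[3]]
  b2 <- coef2[[1]]; c2 <- coef2[[2]]; d2 <- coef2[[3]]
  corrZ * (b1 * b2 + 3 * b1 * d2 + 3 * d1 * b2 + 9 * d1 * d2) +
    corrZ^2 * (2 * c1 * c2) + corrZ^3 * (6 * d1 * d2)
}

# Fleishman transform Y = a + bZ + cZ^2 + dZ^3; a = -c gives E[Y] = 0
fleishman <- function(z, coef) {
  -coef[[2]] + coef[[1]] * z + coef[[2]] * z^2 + coef[[3]] * z^3
}

set.seed(42)
n <- 200000
rho_z <- 0.6
z1 <- rnorm(n)
z2 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)  # cor(z1, z2) is about rho_z
expo <- c(0.8262, 0.3134, 0.0232)  # rounded coefficients, exponential
normal <- c(1, 0, 0)               # identity transform
c(formula   = corrZ2corrY_sketch(rho_z, expo, normal),
  simulated = cor(fleishman(z1, expo), fleishman(z2, normal)))
```

Both numbers should be close to 0.54; the small gap reflects sampling error and the rounded coefficients.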
corrZ2corrY(corrZ, skew.vec, kurto.vec)
corrZ | The correlation of two standard normal variables. |
---|---|
skew.vec | The skewness vector for the continuous variables. |
kurto.vec | The excess kurtosis vector for the continuous variables. |
The correlation of two continuous nonnormal variables as defined by the skewness and excess kurtosis vectors.
Demirtas, H., Hedeker, D., and Mermelstein, R. J. (2012). Simulation of massive public health data by power polynomials. Statistics in Medicine, 31(27), 3337-3346.
Fleishman, A.I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.
library(CorrToolBox)
set.seed(987)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=1)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.5)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
corrZ2corrY(corrZ=-0.849, skew.vec=c(y1.skew, y2.skew), kurto.vec=c(y1.exkurt, y2.exkurt))
This is an intermediate function that uses mps2cps to transform the specified marginal probabilities into cumulative probabilities, and then uses the contord function in the GenOrd package to compute the ordinal phi coefficient derived from discretizing bivariate standard normal variables.
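The discretization step can be sketched in base R: compute the joint cell probabilities of the discretized bivariate normal pair, then the Pearson correlation of the category scores. This assumes categories scored 1, ..., k and thresholds at normal quantiles of the cumulative marginal probabilities; corrZ2ophi_sketch() is hypothetical and uses numerical integration in place of GenOrd's contord.

```r
# Hypothetical helper: ordinal phi coefficient of a discretized standard
# bivariate normal pair with correlation corrZ (|corrZ| < 1).
corrZ2ophi_sketch <- function(corrZ, p1, p2) {
  t1 <- c(-Inf, qnorm(cumsum(p1)[-length(p1)]), Inf)  # thresholds, variable 1
  t2 <- c(-Inf, qnorm(cumsum(p2)[-length(p2)]), Inf)  # thresholds, variable 2
  s <- sqrt(1 - corrZ^2)
  # P(a1 < Z1 <= a2, b1 < Z2 <= b2), integrating the conditional law of Z2
  cell <- function(a1, a2, b1, b2) {
    integrate(function(z) dnorm(z) *
                (pnorm((b2 - corrZ * z) / s) - pnorm((b1 - corrZ * z) / s)),
              lower = a1, upper = a2, rel.tol = 1e-8)$value
  }
  k1 <- length(p1); k2 <- length(p2)
  joint <- matrix(0, k1, k2)
  for (i in seq_len(k1)) for (j in seq_len(k2))
    joint[i, j] <- cell(t1[i], t1[i + 1], t2[j], t2[j + 1])
  x <- seq_len(k1); y <- seq_len(k2)  # category scores 1..k
  m1 <- sum(x * p1); m2 <- sum(y * p2)
  v1 <- sum(x^2 * p1) - m1^2; v2 <- sum(y^2 * p2) - m2^2
  (sum(outer(x, y) * joint) - m1 * m2) / sqrt(v1 * v2)
}

corrZ2ophi_sketch(0.502, p1 = c(0.4, 0.3, 0.2, 0.1), p2 = c(0.2, 0.2, 0.6))
```

With two binary variables split at the median, the result reduces to 2 * asin(corrZ) / pi, for example 1/3 when corrZ = 0.5, which gives a quick check.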
corrZ2ophi(corrZ, p1, p2)
corrZ | The correlation of two standard normal variables. |
---|---|
p1 | A numeric vector containing marginal probabilities defining categories for the first ordinal variable. |
p2 | A numeric vector containing marginal probabilities defining categories for the second ordinal variable. |
The ordinal phi coefficient.
Demirtas, H., Ahmadian, R., Atis, S., Can, F.E., and Ercan, I. (2016). A nonnormal look at polychoric correlations: modeling the change in correlations before and after discretization. Computational Statistics, 31(4), 1385-1401.
Ferrari, P.A. and Barbiero, A. (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4), 566-589.
library(CorrToolBox)
set.seed(567)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=3.6)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.3)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
corrZ2ophi(corrZ=0.502, p1=c(0.4, 0.3, 0.2, 0.1), p2=c(0.2, 0.2, 0.6))
This function computes the phi coefficient derived from dichotomizing bivariate standard normal variables.
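For two binary variables the computation reduces to a single orthant probability. A minimal sketch, assuming X_i = 1 when Z_i exceeds qnorm(1 - p_i), so that P(X_i = 1) = p_i. The helper corrZ2phi_sketch() is hypothetical and need not match the package's internal computation; the package imports pmvnorm from mvtnorm, which could evaluate the same probability.

```r
# Hypothetical helper: phi coefficient from dichotomizing a standard
# bivariate normal pair with correlation corrZ (|corrZ| < 1).
corrZ2phi_sketch <- function(corrZ, p1, p2) {
  t1 <- qnorm(1 - p1); t2 <- qnorm(1 - p2)   # P(Z_i > t_i) = p_i
  s <- sqrt(1 - corrZ^2)
  # P(Z1 > t1, Z2 > t2): integrate the conditional upper tail of Z2 over Z1
  p11 <- integrate(function(z) dnorm(z) *
                     pnorm((t2 - corrZ * z) / s, lower.tail = FALSE),
                   lower = t1, upper = Inf, rel.tol = 1e-8)$value
  (p11 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))
}

corrZ2phi_sketch(corrZ = -0.456, p1 = 0.85, p2 = 0.15)
```

By the symmetry of the bivariate normal, coding both variables from their lower tails instead gives the same phi.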
corrZ2phi(corrZ, p1, p2)
corrZ | The correlation of two standard normal variables. |
---|---|
p1 | The expected value of the first variable after dichotomization. |
p2 | The expected value of the second variable after dichotomization. |
The phi coefficient.
Demirtas, H. (2016). A note on the relationship between the phi coefficient and the tetrachoric correlation under nonnormal underlying distributions. The American Statistician, 70(2), 143-148.
library(CorrToolBox)
set.seed(987)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=1)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.5)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
corrZ2phi(corrZ=-0.456, p1=0.85, p2=0.15)
This function computes cumulative probabilities for each ordinal variable as defined by marginal probabilities provided in a list.
mps2cps(mps)
mps | A list of marginal probability vectors corresponding to each ordinal variable. Each vector within the list |
---|---|
A list of vectors containing cumulative probabilities for each set of marginal probabilities specified in mps. The i-th element of the list is a vector of the cumulative probabilities defining the marginal distribution of the i-th element of mps. If the i-th variable has k categories, the i-th vector in the output contains (k-1) probability values; the k-th cumulative probability is implicitly 1.
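The transformation is a cumulative sum per vector with the final value of 1 dropped. A one-line base-R equivalent, with the hypothetical name mps2cps_sketch():

```r
# Hypothetical base-R equivalent: cumulative sums of each marginal vector,
# dropping the last element (which is always 1).
mps2cps_sketch <- function(mps) {
  lapply(mps, function(p) cumsum(p)[-length(p)])
}

mps2cps_sketch(list(c(0.4, 0.3, 0.2, 0.1), c(0.2, 0.2, 0.6)))
# [[1]] 0.4 0.7 0.9   [[2]] 0.2 0.4
```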
mps2cps(list(c(0.4, 0.3, 0.2, 0.1), c(0.2, 0.2, 0.6)))
This is an intermediate function that transforms marginal probabilities into cumulative probabilities and uses the ordcont function in the GenOrd package to compute the correlation of bivariate standard normal variables from the ordinal phi coefficient.
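The inverse step can also be approximated by simulation, which makes the logic explicit: fix a large sample of normal deviates, then search for the correlation whose discretized version reproduces the target ordinal phi. This is a hypothetical sketch (ophi2corrZ_sim()), not the procedure used by GenOrd's ordcont; it assumes category scores 1, ..., k, and its accuracy is limited by the Monte Carlo sample size n.

```r
# Hypothetical simulation-based sketch: find r such that discretizing a
# standard bivariate normal pair with correlation r yields ordinal phi = ophi.
ophi2corrZ_sim <- function(ophi, p1, p2, n = 200000, seed = 1) {
  set.seed(seed)                        # note: resets the global RNG state
  z1 <- rnorm(n); e <- rnorm(n)         # common random numbers for every r
  t1 <- qnorm(cumsum(p1)[-length(p1)])  # interior thresholds, variable 1
  t2 <- qnorm(cumsum(p2)[-length(p2)])  # interior thresholds, variable 2
  x1 <- findInterval(z1, t1) + 1        # categories 1..k1
  target <- function(r) {
    z2 <- r * z1 + sqrt(1 - r^2) * e    # cor(z1, z2) is about r
    cor(x1, findInterval(z2, t2) + 1) - ophi
  }
  uniroot(target, lower = -0.999, upper = 0.999, tol = 1e-6)$root
}

ophi2corrZ_sim(ophi = -0.7, p1 = c(0.4, 0.3, 0.2, 0.1), p2 = c(0.2, 0.2, 0.6))
```

Reusing the same draws for every trial value of r keeps the objective function stable, so uniroot() can bracket the root even though, for finite n, the objective moves in tiny steps.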
ophi2corrZ(ophi, p1, p2)
ophi | The ordinal phi coefficient. |
---|---|
p1 | A numeric vector containing marginal probabilities defining categories for the first ordinal variable. |
p2 | A numeric vector containing marginal probabilities defining categories for the second ordinal variable. |
The correlation of standard normal variables.
Demirtas, H., Ahmadian, R., Atis, S., Can, F.E., and Ercan, I. (2016). A nonnormal look at polychoric correlations: modeling the change in correlations before and after discretization. Computational Statistics, 31(4), 1385-1401.
Ferrari, P.A. and Barbiero, A. (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4), 566-589.
library(CorrToolBox)
set.seed(567)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=3.6)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.3)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
ophi2corrZ(ophi=-0.7, p1=c(0.4, 0.3, 0.2, 0.1), p2=c(0.2, 0.2, 0.6))
This function computes the polychoric correlation between two continuous variables given the correlation after ordinalization of both variables (ordinal phi coefficient) as seen in Demirtas et al. (2016). Before computation of the polychoric correlation, the specified ordinal phi coefficient is compared to the lower and upper correlation bounds of the two ordinal variables using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
ophi2poly(ophicoef, dist1, dist2)
ophicoef | The ordinal phi coefficient. |
---|---|
dist1 | A list of length 3 containing the skewness, excess kurtosis, and a numeric vector of marginal probabilities after ordinalization for the first continuous variable, with names skewness, exkurtosis, and p, respectively. |
dist2 | A list of length 3 containing the skewness, excess kurtosis, and a numeric vector of marginal probabilities after ordinalization for the second continuous variable, with names skewness, exkurtosis, and p, respectively. |
The polychoric correlation.
Demirtas, H., Ahmadian, R., Atis, S., Can, F.E., and Ercan, I. (2016). A nonnormal look at polychoric correlations: modeling the change in correlations before and after discretization. Computational Statistics, 31(4), 1385-1401.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Ferrari, P.A. and Barbiero, A. (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4), 566-589.
See also: corrZ2corrY, ophi2corrZ, mps2cps.
library(CorrToolBox)
set.seed(567)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=3.6)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.3)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
ophi2poly(ophicoef=-0.7, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=c(0.4, 0.3, 0.2, 0.1)), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=c(0.2, 0.2, 0.6)))
ophi2poly(ophicoef=0.2, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=c(0.1, 0.1, 0.1, 0.7)), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=c(0.8, 0.1, 0.1)))
This function creates an ordinalized version of a continuous variable.
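Ordinalization by marginal probabilities amounts to cutting y at its empirical quantiles. A minimal sketch with the hypothetical name ordY_sketch(), returning columns y and x like ordY; how ordY itself places cutpoints and handles ties is not described here, so treat this as an approximation.

```r
# Hypothetical sketch: cut y at empirical quantiles so that category cat[j]
# receives (approximately) proportion mp[j]; lowest values go to cat[1].
ordY_sketch <- function(mp, cat, y) {
  cuts <- quantile(y, probs = cumsum(mp)[-length(mp)], names = FALSE)
  x <- cat[findInterval(y, cuts) + 1]
  data.frame(y = y, x = x)
}

set.seed(1)
dat <- ordY_sketch(mp = c(0.25, 0.5, 0.25), cat = c(1, 2, 3), y = rnorm(100000))
prop.table(table(dat$x))
```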
ordY(mp, cat, y)
mp | A vector of marginal probabilities defining the ordinalized variable. |
---|---|
cat | A numeric vector containing the categories for each respective marginal probability in |
y | A continuous variable to be ordinalized into categories in |
A data frame containing the given continuous variable and the ordinalized variable with names y and x, respectively.
library(CorrToolBox)
y<-rnorm(100000)
dat<-ordY(mp=c(0.25, 0.5, 0.25), cat=c(1,2,3), y=y)
This function computes the biserial correlation between two continuous variables given the correlation after dichotomization of one of the variables (point-biserial correlation) as seen in Demirtas and Hedeker (2016). Before computation of the biserial correlation, the specified point-biserial correlation is compared to the lower and upper correlation bounds of the continuous variable and binary variable using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
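Read in reverse, the linearity-based identity assumed for bs2pbs gives bs = pbs / cor(X, Y1), where X is the binary version of bin.var (Y1). A hypothetical sketch that accepts either p or cutpoint, mirroring the pbs2bs interface but omitting the bound check:

```r
# pbs2bs_sketch() is hypothetical; it inverts the assumed identity
# pbs = bs * cor(X, Y1), with X the binary version of Y1.
pbs2bs_sketch <- function(pbs, y, p = NULL, cutpoint = NULL) {
  if (is.null(cutpoint)) {
    if (is.null(p)) stop("supply either p or cutpoint")
    cutpoint <- quantile(y, probs = 1 - p, names = FALSE)  # P(X = 1) = p
  }
  x <- as.numeric(y > cutpoint)  # dichotomized y
  pbs / cor(x, y)
}

set.seed(123)
y1 <- rweibull(n = 100000, scale = 1, shape = 1.2)
pbs2bs_sketch(pbs = 0.25, y = y1, p = 0.55)
pbs2bs_sketch(pbs = 0.25, y = y1, cutpoint = 0.65484)
```

Because 0.65484 lies close to the empirical 45th percentile of y1, the two calls should return nearly the same value.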
pbs2bs(pbs, bin.var, cont.var, p=NULL, cutpoint=NULL)
pbs | The point-biserial correlation. |
---|---|
bin.var | A numeric vector of the continuous variable before dichotomization. |
cont.var | A numeric vector of the continuous variable that is not transformed. |
p | The expected value of the numeric vector |
cutpoint | The value at which the vector |
The biserial correlation.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Demirtas, H. and Hedeker, D. (2016). Computing the point-biserial correlation under any underlying continuous distribution. Communications in Statistics-Simulation and Computation, 45(8), 2744-2751.
library(CorrToolBox)
set.seed(123)
y1<-rweibull(n=100000, scale=1, shape=1.2)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.6)
pbs2bs(pbs=0.25, bin.var=y1, cont.var=y2, p=0.55)
pbs2bs(pbs=0.25, bin.var=y1, cont.var=y2, cutpoint=0.65484)
This function computes the tetrachoric correlation between two continuous variables given the correlation after dichotomization of both variables (phi coefficient) as seen in Demirtas (2016). Before computation of the tetrachoric correlation, the specified phi coefficient is compared to the lower and upper correlation bounds for the two binary variables using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
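In one special case the relationship has a closed form: when the underlying variables are standard bivariate normal and both are split at the median (p = 0.5 for each), phi = (2/pi) * asin(tet), which is Sheppard's relation. The helpers below are hypothetical and apply only to that case; nonnormal distributions and other splits are the general case that phi2tet and tet2phi address.

```r
# Special case only: standard bivariate normal, both variables split at the
# median. Sheppard's relation: phi = (2 / pi) * asin(tet).
tet2phi_median <- function(tet) 2 * asin(tet) / pi
phi2tet_median <- function(phi) sin(pi * phi / 2)

phi2tet_median(1/3)   # 0.5
tet2phi_median(0.5)   # 0.3333333
```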
phi2tet(phicoef, dist1, dist2)
phicoef | The phi coefficient. |
---|---|
dist1 | A list of length 3 containing the skewness, excess kurtosis, and expected value after dichotomization for the first continuous variable, with names skewness, exkurtosis, and p, respectively. |
dist2 | A list of length 3 containing the skewness, excess kurtosis, and expected value after dichotomization for the second continuous variable, with names skewness, exkurtosis, and p, respectively. |
The tetrachoric correlation.
Demirtas, H. (2016). A note on the relationship between the phi coefficient and the tetrachoric correlation under nonnormal underlying distributions. The American Statistician, 70(2), 143-148.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
library(CorrToolBox)
set.seed(987)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=1)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.5)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
phi2tet(phicoef=0.1, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=0.85), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=0.15))
phi2tet(phicoef=0.5, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=0.10), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=0.30))
This function computes the ordinal phi coefficient between two variables after both of the variables are ordinalized given the correlation before ordinalization (polychoric correlation) as seen in Demirtas et al. (2016). Before computation of the ordinal phi coefficient, the specified polychoric correlation is compared to the lower and upper correlation bounds of the two continuous variables as defined by the respective skewness and excess kurtosis using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
poly2ophi(polycorr, dist1, dist2)
polycorr | The polychoric correlation. |
---|---|
dist1 | A list of length 3 containing the skewness, excess kurtosis, and a numeric vector of marginal probabilities for the first continuous variable, with names skewness, exkurtosis, and p, respectively. |
dist2 | A list of length 3 containing the skewness, excess kurtosis, and a numeric vector of marginal probabilities for the second continuous variable, with names skewness, exkurtosis, and p, respectively. |
The ordinal phi coefficient.
Demirtas, H., Ahmadian, R., Atis, S., Can, F.E., and Ercan, I. (2016). A nonnormal look at polychoric correlations: modeling the change in correlations before and after discretization. Computational Statistics, 31(4), 1385-1401.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Ferrari, P.A. and Barbiero, A. (2012). Simulating ordinal data. Multivariate Behavioral Research, 47(4), 566-589.
See also: corrY2corrZ, corrZ2ophi, mps2cps.
library(CorrToolBox)
set.seed(567)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=3.6)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.3)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
poly2ophi(polycorr=0.5, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=c(0.4, 0.3, 0.2, 0.1)), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=c(0.2, 0.2, 0.6)))
poly2ophi(polycorr=0.5, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=c(0.1, 0.1, 0.1, 0.7)), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=c(0.8, 0.1, 0.1)))
This function computes the polyserial correlation between two continuous variables given the correlation after ordinalization of one of the variables (point-polyserial correlation) as seen in Demirtas and Hedeker (2016). Before computation of the polyserial correlation, the specified point-polyserial correlation is compared to the lower and upper correlation bounds of the continuous variable and ordinalized variable using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
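A sketch for the ordinal case, assuming the identity used for the biserial case carries over with ordinal category scores, so that ps = pps / cor(X, Y1), where Y1 is ord.var and X is its ordinalized version. pps2ps_sketch() is hypothetical, derives cutpoints from p via empirical quantiles, and omits the bound check.

```r
# pps2ps_sketch() is hypothetical. Assumed identity: pps = ps * cor(X, Y1),
# where X is Y1 ordinalized into cats with marginal probabilities p.
pps2ps_sketch <- function(pps, y, cats, p) {
  cuts <- quantile(y, probs = cumsum(p)[-length(p)], names = FALSE)
  x <- cats[findInterval(y, cuts) + 1]  # lowest values go to cats[1]
  pps / cor(x, y)
}

set.seed(234)
y1 <- rweibull(n = 100000, scale = 1, shape = 25)
pps2ps_sketch(pps = 0.3, y = y1, cats = c(1, 2, 3, 4), p = c(0.4, 0.3, 0.2, 0.1))
```

For this Weibull variable the empirical cutpoints should be close to c(0.97341, 1.00750, 1.03421), the values passed as cutpoint in the pps2ps examples. Multiplying by cor(X, Y1) instead of dividing gives the corresponding ps2pps direction under the same assumption.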
pps2ps(pps, ord.var, cont.var, cats, p=NULL, cutpoint=NULL)
pps | The point-polyserial correlation. |
---|---|
ord.var | A numeric vector of the continuous variable before ordinalization. |
cont.var | A numeric vector of the continuous variable that is not transformed. |
cats | A numeric vector of the categories in the ordinalization of |
p | A numeric vector of the marginal probabilities corresponding to each category in |
cutpoint | A numeric vector of the cutpoints used to define the categories |
The polyserial correlation.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Demirtas, H. and Hedeker, D. (2016). Computing the point-biserial correlation under any underlying continuous distribution. Communications in Statistics-Simulation and Computation, 45(8), 2744-2751.
library(CorrToolBox)
set.seed(234)
y1<-rweibull(n=100000, scale=1, shape=25)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.5)
pps2ps(pps=0.3, ord.var=y1, cont.var=y2, cats=c(1,2,3,4), p=c(0.4, 0.3, 0.2, 0.1))
pps2ps(pps=0.3, ord.var=y1, cont.var=y2, cats=c(1,2,3,4), cutpoint=c(0.97341, 1.00750, 1.03421))
This function computes the point-polyserial correlation between two variables after one of the variables is ordinalized given the correlation before ordinalization (polyserial correlation) as seen in Demirtas and Hedeker (2016). Before computation of the point-polyserial correlation, the specified polyserial correlation is compared to the lower and upper correlation bounds of the two continuous variables using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
ps2pps(ps, ord.var, cont.var, cats, p=NULL, cutpoint=NULL)
ps | The polyserial correlation. |
---|---|
ord.var | A numeric vector of the continuous variable before ordinalization. |
cont.var | A numeric vector of the continuous variable that is not transformed. |
cats | A numeric vector of the categories in the ordinalization of |
p | A numeric vector of the marginal probabilities corresponding to each category in |
cutpoint | A numeric vector of the cutpoints used to define the categories in |
The point-polyserial correlation.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
Demirtas, H. and Hedeker, D. (2016). Computing the point-biserial correlation under any underlying continuous distribution. Communications in Statistics-Simulation and Computation, 45(8), 2744-2751.
library(CorrToolBox)
set.seed(234)
y1<-rweibull(n=100000, scale=1, shape=25)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=2, s2=1, pi=0.5)
ps2pps(ps=0.6, ord.var=y1, cont.var=y2, cats=c(1,2,3,4), p=c(0.4, 0.3, 0.2, 0.1))
ps2pps(ps=0.6, ord.var=y1, cont.var=y2, cats=c(1,2,3,4), cutpoint=c(0.97341, 1.00750, 1.03421))
This function computes the phi coefficient between two variables after both of the variables are dichotomized given the correlation before dichotomization (tetrachoric correlation) as seen in Demirtas (2016). Before computation of the phi coefficient, the specified tetrachoric correlation is compared to the lower and upper correlation bounds of the two continuous variables as defined by the respective skewness and excess kurtosis using the generate, sort and correlate (GSC) algorithm in Demirtas and Hedeker (2011).
tet2phi(tetcorr, dist1, dist2)
tetcorr | The tetrachoric correlation. |
---|---|
dist1 | A list of length 3 containing the skewness, excess kurtosis, and expected value after dichotomization for the first continuous variable, with names skewness, exkurtosis, and p, respectively. |
dist2 | A list of length 3 containing the skewness, excess kurtosis, and expected value after dichotomization for the second continuous variable, with names skewness, exkurtosis, and p, respectively. |
The phi coefficient.
Demirtas, H. (2016). A note on the relationship between the phi coefficient and the tetrachoric correlation under nonnormal underlying distributions. The American Statistician, 70(2), 143-148.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician, 65(2), 104-109.
library(CorrToolBox)
set.seed(987)
library(moments)
y1<-rweibull(n=100000, scale=1, shape=1)
y1.skew<-round(skewness(y1), 5)
y1.exkurt<-round(kurtosis(y1)-3, 5)
gaussmix <- function(n,m1,m2,s1,s2,pi) {
  I <- runif(n)<pi
  rnorm(n,mean=ifelse(I,m1,m2),sd=ifelse(I,s1,s2))
}
y2<-gaussmix(n=100000, m1=0, s1=1, m2=3, s2=1, pi=0.5)
y2.skew<-round(skewness(y2), 5)
y2.exkurt<-round(kurtosis(y2)-3, 5)
tet2phi(tetcorr=-0.4, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=0.85), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=0.15))
tet2phi(tetcorr=0.7, dist1=list(skewness=y1.skew, exkurtosis=y1.exkurt, p=0.10), dist2=list(skewness=y2.skew, exkurtosis=y2.exkurt, p=0.30))