Title: | Gaussian Graphical Models Using Ridge Penalty Followed by Thresholding and Reestimation |
---|---|
Description: | Estimation of partial correlation matrix using ridge penalty followed by thresholding and reestimation. Under multivariate Gaussian assumption, the matrix constitutes an Gaussian graphical model (GGM). |
Authors: | Min Jin Ha [aut, cre], Shannon T. Holloway [ctb] |
Maintainer: | Shannon T. Holloway <[email protected]> |
License: | GPL-2 |
Version: | 1.4 |
Built: | 2024-11-18 05:23:54 UTC |
Source: | https://github.com/cran/GGMridge |
Estimation of the parameters, null proportion, and degrees of freedom of the exact null density in the mixture distribution.
EM.mixture(p, eta0, df, tol)
EM.mixture(p, eta0, df, tol)
p |
A numeric vector representing partial correlation coefficients. |
eta0 |
An initial value for the null proportion; 1-eta0 is the non-null proportion. |
df |
An initial value for the degrees of freedom of the exact null density. |
tol |
The tolerance level for convergence. |
A list object containing
df |
Estimated degrees of freedom of the null density. |
eta0 |
Estimated null proportion. |
iter |
The number of iterations required to reach convergence. |
Min Jin Ha
Schafer, J. and Strimmer, K. (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21, 754–764.
Estimation of empirical null distribution using Efron's central matching.
getEfronp( z, bins = 120L, maxQ = 9, pct = 0, pct0 = 0.25, cc = 1.2, plotIt = FALSE )
getEfronp( z, bins = 120L, maxQ = 9, pct = 0, pct0 = 0.25, cc = 1.2, plotIt = FALSE )
z |
A numeric vector of z values following the theoretical normal null distribution. |
bins |
The number of intervals for density estimation of the marginal density of z. |
maxQ |
The maximum degree of the polynomial to be considered for density estimation of the marginal density of z. |
pct |
Low and top (pct*100) f(z). |
pct0 |
Low and top (pct0*100) estimate f0(z). |
cc |
The central parts
of the empirical distribution z are used for an estimate of the null proportion (eta). |
plotIt |
TRUE if density plot is to be produced. |
A list containing
correctz |
The corrected z values to follow empirically standard normal distribution. |
correctp |
The corrected p values using the correct z values. |
q |
The chosen degree of polynomial for the estimated marginal density. |
mu0hat |
The location parameter for the normal null distribution. |
sigma0hat |
The scale parameter for the normal null distribution. |
eta |
The estimated null proportion. |
Min Jin Ha
Efron, B. (2004). Large-scale simultaneous hypothesis testing. Journal of the American Statistical Association, 99, 96–104.
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n - 1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)] / (n - 1.0) ############################### # calculate partial correlation # using ridge inverse ############################### w.upper <- which(upper.tri(diag(p))) partial <- solve(lambda * diag(p) + cor(data)) partial <- (-scaledMat(x = partial))[w.upper] ############################### # get p-values from empirical # null distribution ############################### efron.fit <- getEfronp(z = transFisher(x = partial))
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n - 1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)] / (n - 1.0) ############################### # calculate partial correlation # using ridge inverse ############################### w.upper <- which(upper.tri(diag(p))) partial <- solve(lambda * diag(p) + cor(data)) partial <- (-scaledMat(x = partial))[w.upper] ############################### # get p-values from empirical # null distribution ############################### efron.fit <- getEfronp(z = transFisher(x = partial))
Calculates the Kolmogorov-Smirnov statistic for p-values
ksStat(p)
ksStat(p)
p |
A numeric vector with p-values. |
Kolmogorov-Smirnov statistic
Min Jin Ha
p <- stats::runif(100) ksStat(p = p) ks.test(p, y = "punif") # compare with ks.test
p <- stats::runif(100) ksStat(p = p) ks.test(p, y = "punif") # compare with ks.test
Choose the tuning parameter of the ridge inverse by minimizing cross validation estimates of the total prediction errors of the p separate ridge regressions.
lambda.cv(x, lambda, fold)
lambda.cv(x, lambda, fold)
x |
An n by p data matrix. |
lambda |
A numeric vector of candidate tuning parameters. |
fold |
fold-cross validation is performed. |
A list containing
lambda |
The selected tuning parameter, which minimizes the total prediction errors. |
spe |
The total prediction error for all the candidate lambda values. |
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n - 1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)] / (n - 1.0) ############################### # calculate partial correlation # using ridge inverse ############################### partial <- solve(lambda*diag(p) + cor(data)) partial <- -scaledMat(x = partial)
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n - 1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)] / (n - 1.0) ############################### # calculate partial correlation # using ridge inverse ############################### partial <- solve(lambda*diag(p) + cor(data)) partial <- -scaledMat(x = partial)
Choose the tuning parameter of the ridge inverse and p-value cutoff by minimizing cross validation estimates of the total prediction errors of the p separate ridge regressions.
lambda.pcut.cv(x, lambda, pcut, fold = 10L)
lambda.pcut.cv(x, lambda, pcut, fold = 10L)
x |
n by p data matrix. |
lambda |
A vector of candidate tuning parameters. |
pcut |
A vector of candidate cutoffs of pvalues. |
fold |
fold-cross validation is performed. |
The total prediction errors for all lambda (row-wise) and pcut (column-wise)
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # Selection of a lambda and a # p-value cutoff ############################### lambda.array <- seq(from = 0.1, to = 5, length = 10) * (n-1.0) pcut.array <- seq(from = 0.01, to = 0.05, by = 0.01) tpe <- lambda.pcut.cv(x = stddata, lambda = lambda.array, pcut = pcut.array, fold = 3L) w.mintpe <- which(tpe == min(tpe), arr.ind = TRUE) lambda <- lambda.array[w.mintpe[1L]] alpha <- pcut.array[w.mintpe[2L]]
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # Selection of a lambda and a # p-value cutoff ############################### lambda.array <- seq(from = 0.1, to = 5, length = 10) * (n-1.0) pcut.array <- seq(from = 0.01, to = 0.05, by = 0.01) tpe <- lambda.pcut.cv(x = stddata, lambda = lambda.array, pcut = pcut.array, fold = 3L) w.mintpe <- which(tpe == min(tpe), arr.ind = TRUE) lambda <- lambda.array[w.mintpe[1L]] alpha <- pcut.array[w.mintpe[2L]]
Choose the Tuning Parameter of the Ridge Inverse and Thresholding Level of the Empirical p-Values.
Calculate total prediction error for test data after fitting partial correlations from train data for all values of lambda and pcut.
lambda.pcut.cv1(train, test, lambda, pcut)
lambda.pcut.cv1(train, test, lambda, pcut)
train |
An n x p data matrix from which the model is fitted. |
test |
An m x p data matrix from which the model is evaluated. |
lambda |
A vector of candidate tuning parameters. |
pcut |
A vector of candidate cutoffs of pvalues. |
Total prediction error for all the candidate lambda and pvalue cutoff values.
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] ############################### # Split into train/test sets ############################### testindex <- sample(1L:n, 10L) train <- data[-testindex,,drop = FALSE] stdTrain <- scale(x = train, center = TRUE, scale = TRUE) test <- data[testindex,,drop = FALSE] stdTest <- scale(x = test, center = TRUE, scale = TRUE) ############################### # Calculate total prediction # errors for all candidate # lambda and p-value cutoffs ############################### lambda.array <- seq(from = 0.1, to = 5, length = 10) * (n - 1.0) pcut.array <- seq(from = 0.01, to = 0.05, by = 0.01) tpe <- lambda.pcut.cv1(train = stdTrain, test = stdTest, lambda = lambda.array, pcut = pcut.array)
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] ############################### # Split into train/test sets ############################### testindex <- sample(1L:n, 10L) train <- data[-testindex,,drop = FALSE] stdTrain <- scale(x = train, center = TRUE, scale = TRUE) test <- data[testindex,,drop = FALSE] stdTest <- scale(x = test, center = TRUE, scale = TRUE) ############################### # Calculate total prediction # errors for all candidate # lambda and p-value cutoffs ############################### lambda.array <- seq(from = 0.1, to = 5, length = 10) * (n - 1.0) pcut.array <- seq(from = 0.01, to = 0.05, by = 0.01) tpe <- lambda.pcut.cv1(train = stdTrain, test = stdTest, lambda = lambda.array, pcut = pcut.array)
Estimation of a weighted average of a sample covariance (correlation) matrix and an identity matrix.
lambda.TargetD(x)
lambda.TargetD(x)
x |
Centered data for covariance shrinkage and standardized data for correlation shrinkage. |
An analytical approach to the estimate ridge parameter.
The estimates of shrinkage intensity.
Min Jin Ha
Schafer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4, 32.
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10) dat <- simulation$data[[1L]] stddat <- scale(x = dat, center = TRUE, scale = TRUE) shrinkage.lambda <- lambda.TargetD(x = stddat) ############################### # the ridge parameter ############################### ridge.lambda <- shrinkage.lambda / (1.0 - shrinkage.lambda) ############################### # partial correlation matrix ############################### partial <- solve(cor(dat) + ridge.lambda * diag(ncol(dat))) partial <- -scaledMat(x = partial)
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10) dat <- simulation$data[[1L]] stddat <- scale(x = dat, center = TRUE, scale = TRUE) shrinkage.lambda <- lambda.TargetD(x = stddat) ############################### # the ridge parameter ############################### ridge.lambda <- shrinkage.lambda / (1.0 - shrinkage.lambda) ############################### # partial correlation matrix ############################### partial <- solve(cor(dat) + ridge.lambda * diag(ncol(dat))) partial <- -scaledMat(x = partial)
Choose the tuning parameter of a ridge regression using cross-validation.
ne.lambda.cv(y, x, lambda, fold)
ne.lambda.cv(y, x, lambda, fold)
y |
Length n response vector. |
x |
n x p matrix for covariates with p variables and n sample size. |
lambda |
A numeric vector for candidate tuning parameters for a ridge regression. |
fold |
fold-cross validation used to choose the tuning parameter. |
A list containing
lambda |
The selected tuning parameter, which minimizes the prediction error. |
spe |
The prediction error for all of the candidate lambda values. |
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddat <- scale(x = data, center = TRUE, scale = TRUE) X <- stddat[,-1L,drop = FALSE] y <- stddat[,1L] fit.lambda <- ne.lambda.cv(y = y, x = X, lambda = seq(from = 0.01, to = 1,by = 0.1), fold = 10L) lambda <- fit.lambda$lambda[which.min(fit.lambda$spe)]
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddat <- scale(x = data, center = TRUE, scale = TRUE) X <- stddat[,-1L,drop = FALSE] y <- stddat[,1L] fit.lambda <- ne.lambda.cv(y = y, x = X, lambda = seq(from = 0.01, to = 1,by = 0.1), fold = 10L) lambda <- fit.lambda$lambda[which.min(fit.lambda$spe)]
The partial correlation matrix is estimated by p separate ridge regressions with the parameters selected by cross validation.
R.separate.ridge(x, fold, lambda, verbose = FALSE)
R.separate.ridge(x, fold, lambda, verbose = FALSE)
x |
n x p data matrix; n is the # of samples and p is the # of variables. |
fold |
Ridge parameters are selected by fold-cross validations separately for each regression. |
lambda |
The candidate ridge parameters for all p ridge regressions. |
verbose |
TRUE/FALSE; if TRUE, print the procedure. |
A list containing
R |
The partial correlation matrix. |
lambda.sel |
The selected tuning parameters for p ridge regressions. |
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = FALSE) ############################### # estimate ridge parameter ############################### w.upper <- which(upper.tri(diag(p))) lambda.array <- seq(from = 0.1, to = 20, by=0.1) * (n-1.0) partial.sep <- R.separate.ridge(x = stddata, lambda = lambda.array, fold = 5L, verbose = TRUE)$R[w.upper]
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = FALSE) ############################### # estimate ridge parameter ############################### w.upper <- which(upper.tri(diag(p))) lambda.array <- seq(from = 0.1, to = 20, by=0.1) * (n-1.0) partial.sep <- R.separate.ridge(x = stddata, lambda = lambda.array, fold = 5L, verbose = TRUE)$R[w.upper]
Scale a square matrix to have unit diagonal elements.
scaledMat(x)
scaledMat(x)
x |
A square matrix with positive diagonal elements |
Scaled matrix of x.
Min Jin Ha
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10) dat <- simulation$data[[1L]] correlation <- scaledMat(x = stats::cov(dat))
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10) dat <- simulation$data[[1L]] correlation <- scaledMat(x = stats::cov(dat))
Generate a random network where both the network structure and the partial correlation coefficients are random. The data matrices are generated from multivariate normal distribution with the covariance matrix corresponding to the network.
simulateData(G, etaA, n, r, dist = "mvnorm")
simulateData(G, etaA, n, r, dist = "mvnorm")
G |
The number of variables (vertices). |
etaA |
The proportion of non-null edges among all the G(G-1)/2 edges. |
n |
The sample size. |
r |
The number of replicated G by N data matrices. |
dist |
A function which indicates the distribution of sample. "mvnorm" is multivariate normal distribution and "mvt" is multivariate t distribution with df=2. The default is set by "mvnorm". |
A list containing
data |
a list, each element containing an n X G matrix of simulated data. |
true.partialcor |
The partial correlation matrix which the datasets are generated from. |
truecor.scaled |
The covariance matrix calculted from the partial correlation matrix. |
sig.node |
The indices of nonzero upper triangle elements of partial correlation matrix. |
Min Jin Ha
Schafer, J. and Strimmer, K. (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21, 754–764.
simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10)
simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 10)
Estimation of nonzero entries of the partial correlation matrix given zero structure.
structuredEstimate(x, E)
structuredEstimate(x, E)
x |
n by p data matrix with the number of variables p and sample size n. |
E |
The row and column indices of zero entries of the partial correlation matrix. |
A list containing
R |
The partial correlation matrix. |
K |
The inverse covariance matrix. |
RSS |
The residual sum of squares. |
Min Jin Ha
Ha, M. J. and Sun, W. (2014). Partial correlation matrix estimation using ridge penalty followed by thresholding and re-estimation. Biometrics, 70, 762–770.
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n-1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)]/(n-1) ############################### # calculate partial correlation # using ridge inverse ############################### w.upper <- which(upper.tri(diag(p))) partial <- solve(lambda * diag(p) + cor(data)) partial <- (-scaledMat(x = partial))[w.upper] ############################### # get p-values from empirical # null distribution ############################### efron.fit <- getEfronp(z = transFisher(x = partial), bins = 50L, maxQ = 13) ############################### # estimate the edge set of # partial correlation graph with # FDR control at level 0.01 ############################### w.array <- which(upper.tri(diag(p)),arr.ind=TRUE) th <- 0.01 wsig <- which(p.adjust(efron.fit$correctp, method="BH") < th ) E <- w.array[wsig,] dim(E) ############################### # structured estimation ############################### fit <- structuredEstimate(x = stddata, E = E) th.partial <- fit$R
p <- 100 # number of variables n <- 50 # sample size ############################### # Simulate data ############################### simulation <- simulateData(G = p, etaA = 0.02, n = n, r = 1) data <- simulation$data[[1L]] stddata <- scale(x = data, center = TRUE, scale = TRUE) ############################### # estimate ridge parameter ############################### lambda.array <- seq(from = 0.1, to = 20, by = 0.1) * (n-1.0) fit <- lambda.cv(x = stddata, lambda = lambda.array, fold = 10L) lambda <- fit$lambda[which.min(fit$spe)]/(n-1) ############################### # calculate partial correlation # using ridge inverse ############################### w.upper <- which(upper.tri(diag(p))) partial <- solve(lambda * diag(p) + cor(data)) partial <- (-scaledMat(x = partial))[w.upper] ############################### # get p-values from empirical # null distribution ############################### efron.fit <- getEfronp(z = transFisher(x = partial), bins = 50L, maxQ = 13) ############################### # estimate the edge set of # partial correlation graph with # FDR control at level 0.01 ############################### w.array <- which(upper.tri(diag(p)),arr.ind=TRUE) th <- 0.01 wsig <- which(p.adjust(efron.fit$correctp, method="BH") < th ) E <- w.array[wsig,] dim(E) ############################### # structured estimation ############################### fit <- structuredEstimate(x = stddata, E = E) th.partial <- fit$R
Fisher's Z-transformation of (partial) correlation.
transFisher(x)
transFisher(x)
x |
A vector having entries between -1 and 1. |
Fisher's Z-transformed values.
Min Jin Ha
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 1) dat <- simulation$data[[1L]] stddat <- scale(x = dat, center = TRUE, scale = TRUE) shrinkage.lambda <- lambda.TargetD(x = stddat) ############################### # the ridge parameter ############################### ridge.lambda <- shrinkage.lambda / (1.0 - shrinkage.lambda) ############################### # partial correlation matrix ############################### partial <- solve(cor(dat) + ridge.lambda * diag(ncol(dat))) partial <- -scaledMat(x = partial) ############################### # Fisher's Z transformation of # upper diagonal of the partial # correlation matrix ############################### w.upper <- which(upper.tri(diag(nrow(dat)))) psi <- transFisher(x = partial[w.upper])
############################### # Simulate data ############################### simulation <- simulateData(G = 100, etaA = 0.02, n = 50, r = 1) dat <- simulation$data[[1L]] stddat <- scale(x = dat, center = TRUE, scale = TRUE) shrinkage.lambda <- lambda.TargetD(x = stddat) ############################### # the ridge parameter ############################### ridge.lambda <- shrinkage.lambda / (1.0 - shrinkage.lambda) ############################### # partial correlation matrix ############################### partial <- solve(cor(dat) + ridge.lambda * diag(ncol(dat))) partial <- -scaledMat(x = partial) ############################### # Fisher's Z transformation of # upper diagonal of the partial # correlation matrix ############################### w.upper <- which(upper.tri(diag(nrow(dat)))) psi <- transFisher(x = partial[w.upper])