This page shows an example of how to generate conditional knockoffs for a multivariate gaussian graphical model as in Section 3.2 of the accompanying paper, which allows the dimension of covariates to be much larger than the sample size.
library(knockoff)
source('src/knockoff_measure.R')
source('src/util.R')
source('src/cknock_ldg.R')
source('src/cknock_ggm.R')
In this simulation we assume that the distribution of the covariates belongs to a Gaussian graphical model and a superset of its edges is known. Examples include autoregressive models and spatial-lattice models. Here we use AR(10) as an example.
First, we set up the distributional parameters. Specifically, the covariance matrix is obtained by renormalizing a banded precision matrix \(\mathbf{\Omega}_0\), whose entries satisfy \[-\mathbf{\Omega}_0 (i,j) / \sqrt{\mathbf{\Omega}_0 (i,i) \mathbf{\Omega}_0 (j,j) }= \left\{ \begin{array}{c l} 0.05 \quad &\text{, if } |i-j|\le 10, \\ 0 \quad &\text{, if } |i-j|>10. \end{array} \right.\]
set.seed(2019)
# Problem parameters
p = 2000 # number of covariates
n = 400 # number of observations
k = 60 # number of non-zero regression coefficients
A = 60 # signal amptitude
nonzero = sample(p, k)
beta = 1 * (1:p %in% nonzero) * sample(c(-1,1),p,rep = T)
# setup the banded precison matrix
nn=10 # bandwidth
rho0=-0.05 # the non-zero partial correlation
Omega0 = (1 - rho0) * diag(p) + rho0
Omega0[abs(outer(1:p, 1:p, '-')) > nn] = 0
S=solve(Omega0)
# renormalize so that the marginal variance = 1
ds=diag(S)
Omega=sqrt(ds)%diag*%Omega0%*diag%sqrt(ds)
Sigma=sqrt(1/ds)%diag*%S%*diag%sqrt(1/ds)
Sigma.chol = chol(Sigma)
Then we generate the covariates and response.
# Generate the covariate
X = matrix(rnorm(n * p), n) %*% Sigma.chol
# Known graphical information: adjacent matrix
Adj = (Omega!=0 )
# Generate the response from a linear model
Y = X%*%beta*A/sqrt(n) + rnorm(n)
The conditional knockoffs can be generated by blocking and two-fold data spliting, where the input threshold \(n'=50\), the number of folds \(K=2\) and the method for computing \(\mathbf{s}\) is mix
(as discussed in Appendix B.1.1 of the paper). This function runs Algorithm 12 in Appendix B.2.2 to search for blocking sets and then runs Algorithm 4 to generate knockoffs for each fold.
Xk.cond = cknockoff.ggm(X,Adj,threshold=50,K=2,method='mix')
Once we have generated conditional knockoffs, all the usual knockoffs machinery applies.
fdr.level = 0.2
knockoff.stat.fun = mod.stat.lasso_coefdiff
filter.cond = knockoff.filter(X,Y,
knockoffs = function(x) {
Xk.cond
},
statistic = knockoff.stat.fun,
fdr = fdr.level,offset = 1)
# false positive and false negative
c(fp(filter.cond$selected, beta),
fn(filter.cond$selected, beta))
## [1] 0.1800000 0.6833333