This page shows an example of how to generate conditional knockoffs for a multivariate gaussian graphical model as in Section 3.2 of the accompanying paper, which allows the dimension of covariates to be much larger than the sample size.


In this simulation we assume that the distribution of the covariates belongs to a Gaussian graphical model and a superset of its edges is known. Examples include autoregressive models and spatial-lattice models. Here we use AR(10) as an example.

First, we set up the distributional parameters. Specifically, the covariance matrix is obtained by renormalizing a banded precision matrix \(\mathbf{\Omega}_0\), whose entries satisfy \[-\mathbf{\Omega}_0 (i,j) / \sqrt{\mathbf{\Omega}_0 (i,i) \mathbf{\Omega}_0 (j,j) }= \left\{ \begin{array}{c l} 0.05 \quad &\text{, if } |i-j|\le 10, \\ 0 \quad &\text{, if } |i-j|>10. \end{array} \right.\]

# Problem parameters
p = 2000    # number of covariates
n = 400     # number of observations
k = 60      # number of non-zero regression coefficients
A = 60      # signal amptitude 
nonzero = sample(p, k) 
beta = 1 * (1:p %in% nonzero) * sample(c(-1,1),p,rep = T) 

# setup the banded precison matrix
nn=10            # bandwidth
rho0=-0.05       # the non-zero partial correlation
Omega0 = (1 - rho0) * diag(p) + rho0
Omega0[abs(outer(1:p, 1:p, '-')) > nn] = 0
# renormalize so that the marginal variance = 1
Sigma.chol = chol(Sigma)

Then we generate the covariates and response.

# Generate the covariate 
X = matrix(rnorm(n * p), n) %*% Sigma.chol
# Known graphical information: adjacent matrix
Adj = (Omega!=0 )

# Generate the response from a linear model
Y = X%*%beta*A/sqrt(n) + rnorm(n)

The conditional knockoffs can be generated by blocking and two-fold data spliting, where the input threshold \(n'=50\), the number of folds \(K=2\) and the method for computing \(\mathbf{s}\) is mix (as discussed in Appendix B.1.1 of the paper). This function runs Algorithm 12 in Appendix B.2.2 to search for blocking sets and then runs Algorithm 4 to generate knockoffs for each fold.

Xk.cond = cknockoff.ggm(X,Adj,threshold=50,K=2,method='mix')

Once we have generated conditional knockoffs, all the usual knockoffs machinery applies.

fdr.level = 0.2 =  mod.stat.lasso_coefdiff
filter.cond = knockoff.filter(X,Y,
    knockoffs = function(x) {
    statistic =,
    fdr = fdr.level,offset = 1)
# false positive and false negative
c(fp(filter.cond$selected, beta),
 fn(filter.cond$selected, beta))
## [1] 0.1800000 0.6833333