This page shows an example of how to generate conditional knockoffs for a multivariate gaussian graphical model as in Section 3.2 of the accompanying paper, which allows the dimension of covariates to be much larger than the sample size.

library(knockoff)
source('src/knockoff_measure.R')
source('src/util.R')
source('src/cknock_ldg.R')
source('src/cknock_ggm.R')

In this simulation we assume that the distribution of the covariates belongs to a Gaussian graphical model and a superset of its edges is known. Examples include autoregressive models and spatial-lattice models. Here we use AR(10) as an example.

First, we set up the distributional parameters. Specifically, the covariance matrix is obtained by renormalizing a banded precision matrix \(\mathbf{\Omega}_0\), whose entries satisfy \[-\mathbf{\Omega}_0 (i,j) / \sqrt{\mathbf{\Omega}_0 (i,i) \mathbf{\Omega}_0 (j,j) }= \left\{ \begin{array}{c l} 0.05 \quad &\text{, if } |i-j|\le 10, \\ 0 \quad &\text{, if } |i-j|>10. \end{array} \right.\]

set.seed(2019)
# Problem parameters
p = 2000    # number of covariates
n = 400     # number of observations
k = 60      # number of non-zero regression coefficients
A = 60      # signal amptitude 
nonzero = sample(p, k) 
beta = 1 * (1:p %in% nonzero) * sample(c(-1,1),p,rep = T) 


# setup the banded precison matrix
nn=10            # bandwidth
rho0=-0.05       # the non-zero partial correlation
Omega0 = (1 - rho0) * diag(p) + rho0
Omega0[abs(outer(1:p, 1:p, '-')) > nn] = 0
S=solve(Omega0)
# renormalize so that the marginal variance = 1
ds=diag(S)
Omega=sqrt(ds)%diag*%Omega0%*diag%sqrt(ds)
Sigma=sqrt(1/ds)%diag*%S%*diag%sqrt(1/ds)
Sigma.chol = chol(Sigma)

Then we generate the covariates and response.

# Generate the covariate 
X = matrix(rnorm(n * p), n) %*% Sigma.chol
# Known graphical information: adjacent matrix
Adj = (Omega!=0 )

# Generate the response from a linear model
Y = X%*%beta*A/sqrt(n) + rnorm(n)

The conditional knockoffs can be generated by blocking and two-fold data spliting, where the input threshold \(n'=50\), the number of folds \(K=2\) and the method for computing \(\mathbf{s}\) is mix (as discussed in Appendix B.1.1 of the paper). This function runs Algorithm 12 in Appendix B.2.2 to search for blocking sets and then runs Algorithm 4 to generate knockoffs for each fold.

Xk.cond = cknockoff.ggm(X,Adj,threshold=50,K=2,method='mix')

Once we have generated conditional knockoffs, all the usual knockoffs machinery applies.

fdr.level = 0.2
knockoff.stat.fun =  mod.stat.lasso_coefdiff
filter.cond = knockoff.filter(X,Y,
    knockoffs = function(x) {
     Xk.cond
    },
    statistic = knockoff.stat.fun,
    fdr = fdr.level,offset = 1)
# false positive and false negative
c(fp(filter.cond$selected, beta),
 fn(filter.cond$selected, beta))
## [1] 0.1800000 0.6833333