hidimstat.knockoff_aggregation

hidimstat.knockoff_aggregation(X, y, centered=True, shrink=False, construct_method='equi', fdr=0.1, fdr_control='bhq', reshaping_function=None, offset=1, method='quantile', statistic='lasso_cv', cov_estimator='ledoit_wolf', joblib_verbose=0, n_bootstraps=25, n_jobs=1, adaptive_aggregation=False, gamma=0.5, gamma_min=0.05, verbose=False, memory=None, random_state=None)

This function implements the aggregation of multiple knockoffs introduced by Nguyen et al.[1].

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples.

y: array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

centered: bool, default=True

Whether to standardize the data before doing the inference procedure.

shrink: bool, default=False

Whether to shrink the empirical covariance matrix.

construct_method: str, default="equi"

The knockoff construction method. The options include:
- "equi" for equi-correlated knockoffs
- "sdp" for the semidefinite programming (SDP) optimization scheme

fdr: float, default=0.1

The desired controlled FDR level.

fdr_control: str, default="bhq"

The control method for the False Discovery Rate (FDR). The options include:
- "bhq" for the standard Benjamini-Hochberg procedure
- "bhy" for the Benjamini-Hochberg-Yekutieli procedure
- "ebh" for the e-BH procedure
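
As an illustration of what the "bhq" option refers to, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule written with NumPy. The function name `bhq_threshold` and the convention of returning -1.0 when nothing is rejected are assumptions made for this example; they are not taken from the library's code.

```python
import numpy as np

def bhq_threshold(pvals, fdr=0.1):
    """Benjamini-Hochberg p-value cutoff (hypothetical helper, for illustration).

    Returns -1.0 when no hypothesis can be rejected.
    """
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    sorted_p = np.sort(pvals)
    # Find the largest k such that p_(k) <= k * fdr / n.
    below = sorted_p <= fdr * np.arange(1, n + 1) / n
    if not below.any():
        return -1.0
    k = np.max(np.nonzero(below)[0])
    return sorted_p[k]

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
cutoff = bhq_threshold(pvals, fdr=0.1)
# Variables whose p-value is at or below the cutoff are selected.
selected = np.where(np.asarray(pvals) <= cutoff)[0]
```

In this toy example the first four p-values fall under their step-up bounds, so the first four variables are selected.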

reshaping_function: callable, default=None

The reshaping function defined in Benjamini and Yekutieli[2].

offset: int, 0 or 1, default=1

The offset to calculate knockoff threshold, offset = 1 is equivalent to knockoff+.
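
To make the role of `offset` concrete, here is a minimal sketch of the usual knockoff threshold rule: the smallest t for which (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) does not exceed the target FDR. The function name `knockoff_threshold` and the toy scores are assumptions for this example, and the library's own implementation may differ in its details.

```python
import numpy as np

def knockoff_threshold(test_scores, fdr=0.1, offset=1):
    """Knockoff threshold (hypothetical helper, for illustration).

    offset=1 gives the knockoff+ rule, offset=0 the plain knockoff rule.
    Returns np.inf when no threshold meets the target FDR.
    """
    test_scores = np.asarray(test_scores, dtype=float)
    # Candidate thresholds are the non-zero absolute scores, in increasing order.
    candidates = np.sort(np.abs(test_scores[test_scores != 0]))
    for t in candidates:
        n_neg = np.sum(test_scores <= -t)
        n_pos = max(1, np.sum(test_scores >= t))
        # Estimated proportion of false discoveries at threshold t.
        if (offset + n_neg) / n_pos <= fdr:
            return t
    return np.inf

scores = [3.0, 2.5, 2.0, 1.8, 1.5, -0.4, 0.3, -0.2]
t_plus = knockoff_threshold(scores, fdr=0.25, offset=1)   # knockoff+
t_plain = knockoff_threshold(scores, fdr=0.25, offset=0)  # knockoff
```

With these scores the knockoff+ rule (offset=1) yields a stricter threshold than offset=0, which is the expected effect of adding one to the numerator.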

method: str, default="quantile"

The method to compute the statistical measures. The options include:
- "quantile" for p-values
- "e-values" for e-values
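
For readers unfamiliar with knockoff e-values, the sketch below shows one common definition from the derandomized-knockoffs literature: e_j = p * 1{W_j >= T} / (1 + #{W_k <= -T}), where p is the number of features and T is a knockoff threshold. Whether this function uses exactly this formula is an assumption; the helper name `knockoff_evalues` and the toy data are hypothetical.

```python
import numpy as np

def knockoff_evalues(test_scores, threshold):
    """E-values from one set of knockoff scores (hypothetical helper)."""
    test_scores = np.asarray(test_scores, dtype=float)
    n_features = test_scores.size
    # Number of scores at or below the negative threshold.
    n_neg = np.sum(test_scores <= -threshold)
    return n_features * (test_scores >= threshold) / (1 + n_neg)

scores = [3.0, 2.5, 2.0, 1.8, 1.5, -0.4, 0.3, -0.2]
evals = knockoff_evalues(scores, threshold=1.5)
```

E-values computed on several knockoff draws can be combined by simple averaging, which is what makes them convenient for aggregation.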

statistic: str, default="lasso_cv"

The method to calculate knockoff test score.

cov_estimator: str, default="ledoit_wolf"

The method of empirical covariance matrix estimation.

joblib_verbose: int, default=0

The verbosity level of joblib: if non-zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported.

n_bootstraps: int, default=25

The number of bootstrapping iterations.

n_jobs: int, default=1

The number of workers for parallel processing.

adaptive_aggregation: bool, default=False

Whether to apply the adaptive version of the quantile aggregation method as in Meinshausen et al.[3].

gamma: float, default=0.5

The percentile value used for aggregation.
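
As a rough illustration of how `gamma` enters quantile aggregation with a fixed level, the sketch below takes the gamma-quantile of the p-values across repetitions, divides it by gamma, and caps the result at 1. The function name `quantile_aggregation` and the input layout (one row per repetition, one column per variable) are assumptions for this example, and the library may use a different quantile interpolation rule.

```python
import numpy as np

def quantile_aggregation(pvals, gamma=0.5):
    """Fixed-gamma quantile aggregation of p-values (hypothetical helper).

    pvals has shape (n_repetitions, n_features).
    """
    pvals = np.asarray(pvals, dtype=float)
    # Empirical gamma-quantile over repetitions, rescaled by 1 / gamma.
    scaled_quantile = np.quantile(pvals, gamma, axis=0) / gamma
    # Aggregated p-values cannot exceed 1.
    return np.minimum(1.0, scaled_quantile)

pvals = np.array([
    [0.01, 0.40],
    [0.02, 0.60],
    [0.03, 0.90],
])
aggregated = quantile_aggregation(pvals, gamma=0.5)
```

With gamma=0.5 this amounts to twice the median p-value per variable, capped at 1.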

gamma_min: float, default=0.05

The minimum percentile value used for aggregation.

verbose: bool, default=False

Whether to return the corresponding p-values of the variables along with the list of selected variables.

memory: str or joblib.Memory object, default=None

Used to cache the output of the computation of the clustering and the inference. By default, no caching is done. If a string is given, it is the path to the caching directory.

random_state: int, default=None

The seed of the random number generator, used to make results reproducible.

Returns:
selected: 1D array, int

The vector of indices of the selected variables.

aggregated_pval: 1D array, float

The vector of aggregated p-values.

pvals: 1D array, float

The vector of the corresponding p-values.

aggregated_eval: 1D array, float

The vector of aggregated e-values.

evals: 1D array, float

The vector of the corresponding e-values.

References