hidimstat.knockoff_aggregation
- hidimstat.knockoff_aggregation(X, y, centered=True, shrink=False, construct_method='equi', fdr=0.1, fdr_control='bhq', reshaping_function=None, offset=1, method='quantile', statistic='lasso_cv', cov_estimator='ledoit_wolf', joblib_verbose=0, n_bootstraps=25, n_jobs=1, adaptive_aggregation=False, gamma=0.5, gamma_min=0.05, verbose=False, memory=None, random_state=None)
This function implements the aggregation of multiple knockoffs introduced by Nguyen et al.[1].
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The input samples.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- centeredbool, default=True
Whether to standardize the data before doing the inference procedure.
- shrinkbool, default=False
Whether to shrink the empirical covariance matrix.
- construct_methodstr, default="equi"
The knockoff construction method. The options include:
  - "equi" for equi-correlated knockoffs
  - "sdp" for the semidefinite programming (SDP) optimization scheme
- fdrfloat, default=0.1
The level at which the false discovery rate (FDR) should be controlled.
- fdr_controlstr, default="bhq"
The control method for the False Discovery Rate (FDR). The options include:
  - "bhq" for the standard Benjamini-Hochberg procedure
  - "bhy" for the Benjamini-Hochberg-Yekutieli procedure
  - "ebh" for the e-BH procedure
- reshaping_function<class ‘function’>, default=None
The reshaping function defined in Benjamini and Yekutieli[2].
- offsetint, 0 or 1, default=1
The offset used to compute the knockoff threshold; offset=1 is equivalent to knockoff+.
- methodstr, default="quantile"
The method to compute the statistical measures. The options include:
  - "quantile" for p-values
  - "e-values" for e-values
- statisticstr, default="lasso_cv"
The method used to compute the knockoff test scores.
- cov_estimatorstr, default="ledoit_wolf"
The method used to estimate the empirical covariance matrix.
- joblib_verboseint, default=0
The verbosity level of joblib: if non-zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported.
- n_bootstrapsint, default=25
The number of bootstrapping iterations.
- n_jobsint, default=1
The number of workers for parallel processing.
- adaptive_aggregationbool, default=False
Whether to apply the adaptive version of the quantile aggregation method, as in Meinshausen and Bühlmann[3].
- gamma: float, default=0.5
The quantile level, between 0 and 1, used for aggregation.
- gamma_minfloat, default=0.05
The minimum quantile level considered for aggregation.
- verbosebool, default=False
Whether to return the corresponding p-values of the variables along with the list of selected variables.
- memorystr or joblib.Memory object, default=None
Used to cache the output of the computation of the clustering and the inference. By default, no caching is done. If a string is given, it is the path to the caching directory.
- random_stateint, default=None
Seed of the random number generator, fixed for reproducibility.
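To make the roles of gamma, gamma_min and adaptive_aggregation more concrete, here is a minimal pure-Python sketch of quantile aggregation of p-values across bootstraps, following the usual formulation of Meinshausen et al.[3]. It is an illustration only, not the library's implementation: the helper names (empirical_quantile, quantile_aggregation, adaptive_quantile_aggregation, aggregate_bootstraps), the interpolation rule and the search grid over gamma are all assumptions made for this example.

```python
import math


def empirical_quantile(values, q):
    """Quantile of a non-empty list by linear interpolation, with q in [0, 1]."""
    q = min(max(q, 0.0), 1.0)  # guard against floating-point overshoot
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def quantile_aggregation(pvals, gamma=0.5):
    """Aggregate the p-values of one variable at a fixed quantile level gamma."""
    return min(1.0, empirical_quantile(pvals, gamma) / gamma)


def adaptive_quantile_aggregation(pvals, gamma_min=0.05, n_grid=20):
    """Search gamma over a grid in [gamma_min, 1] and apply the log correction."""
    step = (1.0 - gamma_min) / (n_grid - 1)
    gammas = [min(1.0, gamma_min + step * k) for k in range(n_grid)]
    best = min(empirical_quantile(pvals, g) / g for g in gammas)
    return min(1.0, (1.0 - math.log(gamma_min)) * best)


def aggregate_bootstraps(pval_matrix, gamma=0.5):
    """Aggregate column-wise: rows are bootstraps, columns are variables."""
    n_features = len(pval_matrix[0])
    return [
        quantile_aggregation([row[j] for row in pval_matrix], gamma)
        for j in range(n_features)
    ]


# Hypothetical p-values: 3 bootstraps (rows) for 2 variables (columns).
pval_matrix = [[0.01, 0.5], [0.03, 0.7], [0.02, 0.9]]
aggregated = aggregate_bootstraps(pval_matrix, gamma=0.5)
```

With a fixed gamma, the aggregated value is the gamma-quantile of the bootstrap p-values divided by gamma and capped at 1. The adaptive variant searches for the most favorable gamma above gamma_min and pays for that search with the factor 1 - log(gamma_min).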
- Returns:
- selected1D array, int
The indices of the selected variables.
- aggregated_pval: 1D array, float
The vector of aggregated p-values.
- pvals: 1D array, float
The vector of the corresponding p-values.
- aggregated_eval: 1D array, float
The vector of aggregated e-values.
- evals: 1D array, float
The vector of the corresponding e-values.
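As an illustration of how a vector of aggregated p-values can be turned into selected indices at a target FDR level, the following sketch applies the standard Benjamini-Hochberg step-up rule, which is what fdr_control="bhq" refers to. The function name bhq_select and the example p-values are hypothetical, and the library's own implementation may differ in its details.

```python
def bhq_select(pvals, fdr=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg rule."""
    m = len(pvals)
    # Indices of the variables, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda j: pvals[j])
    n_reject = 0
    for rank, j in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if pvals[j] <= fdr * rank / m:
            n_reject = rank
    return sorted(order[:n_reject])


# Hypothetical aggregated p-values for five variables.
aggregated_pval = [0.001, 0.8, 0.02, 0.5, 0.03]
selected = bhq_select(aggregated_pval, fdr=0.1)
```

Because the rule is step-up, a variable whose p-value exceeds its own threshold is still selected when a variable with a larger p-value passes its threshold.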
References