hidimstat.dcrt_pvalue

hidimstat.dcrt_pvalue(selection_features, X_res, sigma2, y_res, fdr=0.1, fdr_control='bhq', reshaping_function=None, scaled_statistics=False)

Calculate p-values and identify significant features using the dCRT test statistics.

This function processes the results from dCRT to identify statistically significant features while controlling for false discoveries. It assumes test statistics follow a Gaussian distribution.

Parameters:
selection_features : ndarray of shape (n_features,)

Boolean mask indicating which features were selected for testing

X_res : ndarray of shape (n_selected, n_samples)

Residuals from feature distillation

sigma2 : ndarray of shape (n_selected,)

Estimated residual variances for each tested feature

y_res : ndarray of shape (n_selected, n_samples)

Response residuals for each tested feature

fdr : float, default=0.1

Target false discovery rate level (0 < fdr < 1)

fdr_control : {‘bhq’, ‘bhy’, ‘ebh’}, default=‘bhq’

Method for FDR control:

- ‘bhq’: Benjamini-Hochberg procedure
- ‘bhy’: Benjamini-Hochberg-Yekutieli procedure
- ‘ebh’: e-BH procedure

reshaping_function : callable, optional

Reshaping function for the ‘bhy’ method

scaled_statistics : bool, default=False

Whether to standardize test statistics before computing p-values

Returns:
selected_variables : ndarray

Indices of features deemed significant

pvals : ndarray of shape (n_features,)

P-values for all features (including unselected ones)

ts : ndarray of shape (n_features,)

Test statistics for all features, assumed to follow a standard normal distribution under the null

Notes

The function computes test statistics as correlations between the feature residuals and the response residuals, optionally standardizes them, and converts them to p-values using a Gaussian null. A multiple testing correction is then applied to control the FDR at the specified level.
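The steps described in these notes can be illustrated with a minimal, standard-library-only sketch. It is not the hidimstat source: the helper names `gaussian_pvalues` and `benjamini_hochberg` are hypothetical, the normalization of the statistic is an assumption chosen so that it is roughly standard normal under the null, and only the Benjamini-Hochberg (‘bhq’-style) correction is shown.

```python
import math


def gaussian_pvalues(x_res, y_res, sigma2):
    """Two-sided Gaussian p-values from per-feature residual correlations.

    x_res, y_res: sequences of n_selected rows, each holding n_samples values.
    sigma2: sequence of n_selected residual variances.
    """
    ts, pvals = [], []
    for x, y, s2 in zip(x_res, y_res, sigma2):
        n = len(x)
        dot = sum(xi * yi for xi, yi in zip(x, y))
        mean_x2 = sum(xi * xi for xi in x) / n
        # Assumed normalization (illustrative, not taken from the library).
        t = dot / math.sqrt(n * s2 * mean_x2)
        ts.append(t)
        # Two-sided tail of a standard normal: P(|Z| >= |t|).
        pvals.append(math.erfc(abs(t) / math.sqrt(2)))
    return ts, pvals


def benjamini_hochberg(pvals, fdr=0.1):
    """Return the sorted indices rejected by the Benjamini-Hochberg procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest 1-based rank k such that p_(k) <= k * fdr / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])


# Toy residuals: feature 0 is perfectly aligned with its response residual,
# feature 1 is orthogonal to it.
x_res = [[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
y_res = [[2.0, -2.0, 2.0, -2.0], [1.0, 1.0, -1.0, -1.0]]
sigma2 = [1.0, 1.0]

ts, pvals = gaussian_pvalues(x_res, y_res, sigma2)
selected = benjamini_hochberg(pvals, fdr=0.1)
```

In this toy case only the first feature has a residual correlation far from zero, so it is the only index the correction retains. The actual function additionally maps results back to all `n_features` positions using `selection_features`, which this sketch omits.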

Examples using hidimstat.dcrt_pvalue

Distilled Conditional Randomization Test (dCRT) using Lasso vs Random Forest learners