hidimstat.dcrt_zero

hidimstat.dcrt_zero(X, y, estimated_coef=None, sigma_X=None, params_lasso_screening={'alpha': np.float64(0.029080693703091007), 'alpha_max_fraction': 0.5, 'alphas': None, 'cv': 5, 'fit_intercept': False, 'max_iter': 1000, 'n_alphas': 10, 'selection': 'cyclic', 'tol': 1e-06}, params_lasso_distillation_x=None, params_lasso_distillation_y=None, refit=False, screening=True, screening_threshold=0.1, statistic='residual', centered=True, n_jobs=1, joblib_verbose=0, fit_y=False, n_tree=100, problem_type='regression', random_state=2022)

Implements the distilled conditional randomization test (dCRT) without interactions.

A faster version of the Conditional Randomization Test (Candes et al. [1]) that uses the distillation process from Liu et al. [2]. Based on the original implementation at moleibobliu/Distillation-CRT.
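
To make the distillation idea concrete, here is a minimal NumPy sketch for a single feature. It is not the hidimstat implementation: ordinary least squares stands in for the Lasso fits, and the helper name `distill_feature` is hypothetical. The feature X_j and the response y are each regressed on the remaining columns, and only the two residual vectors are kept.

```python
import numpy as np


def distill_feature(X, y, j):
    """Residuals of X[:, j] and y after regressing each on the other columns.

    Hypothetical helper: least squares replaces the Lasso used by dCRT.
    """
    X_minus_j = np.delete(X, j, axis=1)
    # Distill X_j: remove the part of feature j explained by the other features.
    beta_x, *_ = np.linalg.lstsq(X_minus_j, X[:, j], rcond=None)
    # Distill y: remove the part of the response explained by the other features.
    beta_y, *_ = np.linalg.lstsq(X_minus_j, y, rcond=None)
    x_res = X[:, j] - X_minus_j @ beta_x
    y_res = y - X_minus_j @ beta_y
    return x_res, y_res


rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.standard_normal((n_samples, n_features))
y = 2.0 * X[:, 0] + rng.standard_normal(n_samples)  # only feature 0 matters

x_res, y_res = distill_feature(X, y, j=0)
```

Any remaining association between `x_res` and `y_res` is then evidence that feature j matters beyond what the other features already explain.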

Parameters:
X : array-like of shape (n_samples, n_features)

Training data

y : array-like of shape (n_samples,)

Target values

estimated_coef : array-like of shape (n_features,), optional

Pre-computed feature coefficients

sigma_X : array-like of shape (n_features, n_features), optional

Covariance matrix of X

params_lasso_screening : dict

Parameters for the main Lasso estimation or cross-validated Lasso, including:

- alpha : float, optional. L1 regularization strength. If None, determined by CV.
- n_alphas : int, default=0. Number of alphas for cross-validation.
- alphas : array-like, default=None. List of alpha values to try in CV.
- alpha_max_fraction : float, default=0.5. Scale factor for alpha_max.

For other parameters, see LassoCV. Some suggested settings:

- cv : int, default=5. Number of cross-validation folds.
- tol : float, default=1e-6. Tolerance for optimization.
- max_iter : int, default=1000. Maximum iterations.
- fit_intercept : bool, default=False. Whether to fit an intercept.
- selection : str, default='cyclic'. Coordinate update order used by the solver ('cyclic' or 'random').

params_lasso_distillation_x : dict, optional

Parameters for X distillation Lasso. Defaults to params_lasso_screening.

params_lasso_distillation_y : dict, optional

Parameters for y distillation Lasso. Defaults to params_lasso_screening.

refit : bool, default=False

Whether to refit on estimated support set

screening : bool, default=True

Whether to screen variables

screening_threshold : float, default=0.1

Threshold for variable screening (0-100)

statistic : {'residual', 'random_forest'}, default='residual'

Learning method for outcome distillation

centered : bool, default=True

Whether to standardize features

n_jobs : int, default=1

Number of parallel jobs

joblib_verbose : int, default=0

Verbosity level

fit_y : bool, default=False

Whether to fit y using selected features

n_tree : int, default=100

Number of trees for random forest

problem_type : {'regression', 'classification'}, default='regression'

Type of learning problem

random_state : int, default=2022

Random seed
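
This page does not spell out how screening_threshold is applied. One plausible reading of the 0-100 scale, shown here purely as an assumption, is a percentile cut on the absolute screening coefficients: only features whose coefficient magnitude exceeds the (100 - screening_threshold)-th percentile are kept. The helper name `screen_by_percentile` is hypothetical and is not part of hidimstat.

```python
import numpy as np


def screen_by_percentile(coef, screening_threshold):
    """Boolean mask of features kept by a percentile cut (assumed rule)."""
    abs_coef = np.abs(np.asarray(coef, dtype=float))
    # Cutoff is the (100 - threshold)-th percentile of the coefficient magnitudes.
    cutoff = np.percentile(abs_coef, 100 - screening_threshold)
    return abs_coef > cutoff


coef = np.array([0.0, 0.5, -2.0, 0.1, 3.0])
mask = screen_by_percentile(coef, screening_threshold=40)
print(mask)
```

Under this assumed rule, a larger threshold keeps more features.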

Returns:
selection_features : ndarray of shape (n_features,)

Boolean mask of selected features

X_res : ndarray of shape (n_selected, n_samples)

Residuals after X distillation

sigma2 : ndarray of shape (n_selected,)

Estimated residual variances

y_res : ndarray of shape (n_selected, n_samples)

Response residuals
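
The returned residuals are the ingredients of a per-feature test statistic. As a hedged sketch (the exact formula used by hidimstat is not given on this page), one natural choice is the inner product of the two residual vectors, scaled so that it is approximately standard normal when a feature carries no extra information about y. The helper names `residual_statistics` and `two_sided_pvalues` are hypothetical, and the inputs are simulated arrays laid out with the shapes listed above.

```python
import math

import numpy as np


def residual_statistics(X_res, y_res, sigma2):
    """Scaled inner product of residuals, one value per selected feature."""
    n_samples = X_res.shape[1]
    numerator = np.sum(X_res * y_res, axis=1)
    # If X_res[j] ~ N(0, sigma2[j]) independently of y_res[j], the numerator
    # has variance sigma2[j] * sum(y_res[j] ** 2), hence this scaling.
    scale = np.sqrt(n_samples * sigma2 * np.mean(y_res**2, axis=1))
    return numerator / scale


def two_sided_pvalues(stats):
    """Two-sided p-values under a standard normal reference distribution."""
    return np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in stats])


rng = np.random.default_rng(0)
n_selected, n_samples = 2, 200
X_res = rng.standard_normal((n_selected, n_samples))
sigma2 = np.ones(n_selected)
y_res = rng.standard_normal((n_selected, n_samples))
y_res[0] += 1.5 * X_res[0]  # feature 0 is associated with y, feature 1 is not

stats = residual_statistics(X_res, y_res, sigma2)
pvalues = two_sided_pvalues(stats)
```

In this simulation the first feature should receive a very small p-value, while the second has no built-in association with its response residuals.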

References

Examples using hidimstat.dcrt_zero

Distilled Conditional Randomization Test (dCRT) using Lasso vs Random Forest learners
