hidimstat.dcrt_zero
- hidimstat.dcrt_zero(X, y, estimated_coef=None, sigma_X=None, params_lasso_screening={'alpha': np.float64(0.029080693703091007), 'alpha_max_fraction': 0.5, 'alphas': None, 'cv': 5, 'fit_intercept': False, 'max_iter': 1000, 'n_alphas': 10, 'selection': 'cyclic', 'tol': 1e-06}, params_lasso_distillation_x=None, params_lasso_distillation_y=None, refit=False, screening=True, screening_threshold=0.1, statistic='residual', centered=True, n_jobs=1, joblib_verbose=0, fit_y=False, n_tree=100, problem_type='regression', random_state=2022)
Implements the distilled conditional randomization test (dCRT) without interactions.
A faster version of the Conditional Randomization Test (Candes et al. [1]) that uses the distillation process from Liu et al. [2]. Based on the original implementation at moleibobliu/Distillation-CRT.
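The distillation idea can be illustrated with a minimal sketch. It is not the library's implementation: ordinary least squares (via `numpy.linalg.lstsq`) stands in for the Lasso fits, and the helper names `distill_feature` and `residual_correlation` are hypothetical. For each feature, both the feature and the target are regressed on the remaining features, and the two residuals are then compared.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.standard_normal((n_samples, n_features))
# Synthetic target: only feature 0 carries signal.
y = 2.0 * X[:, 0] + rng.standard_normal(n_samples)


def distill_feature(X, y, j):
    """Residuals of X[:, j] and y after regressing out the other columns.

    Assumption: least squares replaces the Lasso used by dCRT.
    """
    X_minus_j = np.delete(X, j, axis=1)
    coef_x, *_ = np.linalg.lstsq(X_minus_j, X[:, j], rcond=None)
    coef_y, *_ = np.linalg.lstsq(X_minus_j, y, rcond=None)
    x_res = X[:, j] - X_minus_j @ coef_x
    y_res = y - X_minus_j @ coef_y
    return x_res, y_res


def residual_correlation(x_res, y_res):
    """Correlation between the two (zero-mean by construction) residuals."""
    return float(x_res @ y_res / (np.linalg.norm(x_res) * np.linalg.norm(y_res)))


corr = [residual_correlation(*distill_feature(X, y, j)) for j in range(n_features)]
```

In this toy setting the residual correlation is large for feature 0 and close to zero for the others, which is the signal a residual-based test statistic picks up.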
- Parameters:
- X : array-like of shape (n_samples, n_features)
Training data
- y : array-like of shape (n_samples,)
Target values
- estimated_coef : array-like of shape (n_features,), optional
Pre-computed feature coefficients
- sigma_X : array-like of shape (n_features, n_features), optional
Covariance matrix of X
- params_lasso_screening : dict
Parameters for the main Lasso estimation or the cross-validated Lasso, including:
  - alpha : float, optional - L1 regularization strength. If None, determined by cross-validation.
  - n_alphas : int, default=10 - Number of alphas for cross-validation.
  - alphas : array-like, default=None - List of alpha values to try in cross-validation.
  - alpha_max_fraction : float, default=0.5 - Scale factor for alpha_max.
For other parameters, see LassoCV. Suggested settings:
  - cv : int, default=5 - Number of cross-validation folds.
  - tol : float, default=1e-6 - Tolerance for optimization.
  - max_iter : int, default=1000 - Maximum number of iterations.
  - fit_intercept : bool, default=False - Whether to fit an intercept.
  - selection : str, default='cyclic' - Coefficient update order in coordinate descent ('cyclic' or 'random').
- params_lasso_distillation_x : dict, optional
Parameters for X distillation Lasso. Defaults to params_lasso_screening.
- params_lasso_distillation_y : dict, optional
Parameters for y distillation Lasso. Defaults to params_lasso_screening.
- refit : bool, default=False
Whether to refit on estimated support set
- screening : bool, default=True
Whether to screen variables
- screening_threshold : float, default=0.1
Threshold for variable screening (0-100)
- statistic : {'residual', 'random_forest'}, default='residual'
Learning method for outcome distillation
- centered : bool, default=True
Whether to standardize features
- n_jobs : int, default=1
Number of parallel jobs
- joblib_verbose : int, default=0
Verbosity level
- fit_y : bool, default=False
Whether to fit y using selected features
- n_tree : int, default=100
Number of trees for random forest
- problem_type : {'regression', 'classification'}, default='regression'
Type of learning problem
- random_state : int, default=2022
Random seed
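One plausible reading of the 0-100 scale of `screening_threshold` is a percentile cutoff on the absolute Lasso coefficients. The sketch below illustrates that reading only; the rule, the coefficient values, and the threshold of 30 are assumptions made for illustration and are not confirmed by this page.

```python
import numpy as np

# Hypothetical Lasso coefficients for 10 features (illustrative values only).
coef = np.array([0.0, 1.5, 0.0, -0.2, 0.0, 0.0, 3.0, 0.0, 0.05, 0.0])
screening_threshold = 30  # on the documented 0-100 scale

# Assumed rule: keep features whose absolute coefficient exceeds the
# (100 - screening_threshold)-th percentile of all absolute coefficients.
cutoff = np.percentile(np.abs(coef), 100 - screening_threshold)
selection_features = np.abs(coef) > cutoff

selected_indices = np.flatnonzero(selection_features)
```

Under this assumed rule, a smaller threshold keeps fewer features, so fewer distillation fits are needed downstream.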
- Returns:
- selection_features : ndarray of shape (n_features,)
Boolean mask of selected features
- X_res : ndarray of shape (n_selected, n_samples)
Residuals after X distillation
- sigma2 : ndarray of shape (n_selected,)
Estimated residual variances
- y_res : ndarray of shape (n_selected, n_samples)
Response residuals
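The returned arrays can be turned into per-feature test statistics and p-values. The sketch below uses synthetic stand-ins with the documented shapes rather than an actual call to `dcrt_zero`, and it assumes a residual-based statistic: a normalized inner product of `X_res` and `y_res`, compared against a standard normal distribution. The exact normalization used by the library is not stated on this page, so treat the formula as an assumption.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n_selected, n_samples = 3, 500

# Synthetic stand-ins with the documented shapes (n_selected, n_samples).
X_res = rng.standard_normal((n_selected, n_samples))
y_res = rng.standard_normal((n_selected, n_samples))
y_res[0] += 0.5 * X_res[0]  # make the first selected feature informative
sigma2 = X_res.var(axis=1)  # stand-in for the returned residual variances

# Assumed statistic: inner product of the residuals, scaled per feature.
ts = np.sum(X_res * y_res, axis=1) / np.sqrt(
    n_samples * sigma2 * np.mean(y_res**2, axis=1)
)

# Two-sided p-values under a standard normal reference: 2 * P(Z > |t|).
pvalues = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in ts])
```

Because `X_res` and `y_res` hold one row per selected feature, the statistic is computed row-wise and the p-values align with the `True` entries of `selection_features`.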
References
Examples using hidimstat.dcrt_zero

Distilled Conditional Randomization Test (dCRT) using Lasso vs Random Forest learners