hidimstat.ensemble_clustered_inference
- hidimstat.ensemble_clustered_inference(X_init, y, ward, n_clusters, train_size=0.3, groups=None, inference_method='desparsified-lasso', seed=0, ensembling_method='quantiles', gamma_min=0.2, n_bootstraps=25, n_jobs=1, memory=None, verbose=1, **kwargs)
Ensemble clustered inference algorithm
- Parameters:
- X_init: ndarray, shape (n_samples, n_features)
Original data (uncompressed).
- y: ndarray, shape (n_samples,) or (n_samples, n_times)
Target.
- ward: sklearn.cluster.FeatureAgglomeration
Scikit-learn object that computes Ward hierarchical clustering.
- n_clusters: int
Number of clusters used for the compression.
- train_size: float, optional (default=0.3)
Fraction of samples used to compute the clustering. If train_size = 1, all samples are used, so the clustering is not random.
- groups: ndarray, shape (n_samples,), optional (default=None)
Group labels for every sample. If not None, groups is used to build the subsamples on which the clustering is computed.
- inference_method: str, optional (default='desparsified-lasso')
Method used for the inference. The two available methods are 'desparsified-lasso' and 'group-desparsified-lasso'. Use 'desparsified-lasso' for non-temporal data and 'group-desparsified-lasso' for temporal data.
- seed: int, optional (default=0)
Seed used for generating the first random subsample of the data. This seed controls the clustering randomness.
- ensembling_method: str, optional (default='quantiles')
Method used for the ensembling. The two available methods are 'quantiles' and 'median'.
- gamma_min: float, optional (default=0.2)
Lowest gamma-quantile considered when computing the adaptive quantile aggregation formula. This parameter is used only if ensembling_method is 'quantiles'.
- n_bootstraps: int, optional (default=25)
Number of clustered inference solutions to compute before ensembling.
- n_jobs: int or None, optional (default=1)
Number of CPUs used to run several clustered inference algorithms in parallel.
- memory: str, optional (default=None)
Used to cache the output of the computation of the clustering and the inference. By default, no caching is done. If a string is given, it is the path to the caching directory.
- verbose: int, optional (default=1)
The verbosity level. If verbose > 0, a message is printed before running the clustered inference.
- **kwargs:
Arguments passed to the statistical inference function.
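The ensembling_method and gamma_min parameters above govern how the n_bootstraps sets of p-values are merged into one. As a rough illustration, the sketch below implements a standard form of quantile aggregation (as described by Meinshausen, Meier and Bühlmann, 2009) in plain Python. It is not hidimstat's code: the helper names `empirical_quantile`, `fixed_quantile_aggregation` and `adaptive_quantile_aggregation`, the grid size `n_grid`, the interpolation rule, and the reading of 'median' as the fixed case gamma = 0.5 are all assumptions made for illustration.

```python
import math


def empirical_quantile(values, gamma):
    """Empirical gamma-quantile, interpolating linearly between order statistics."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Clamp gamma to [0, 1] so the index never leaves the list.
    position = min(max(gamma, 0.0), 1.0) * last
    lower = math.floor(position)
    upper = min(math.ceil(position), last)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def fixed_quantile_aggregation(pvals, gamma=0.5):
    """Q(gamma) = min(1, gamma-quantile of {p_b / gamma})."""
    scaled = [p / gamma for p in pvals]
    return min(1.0, empirical_quantile(scaled, gamma))


def adaptive_quantile_aggregation(pvals, gamma_min=0.2, n_grid=50):
    """min(1, (1 - log(gamma_min)) * min over gamma in [gamma_min, 1] of Q(gamma))."""
    # Assumption: the search over gamma is approximated on a regular grid.
    step = (1.0 - gamma_min) / (n_grid - 1)
    gammas = [min(1.0, gamma_min + i * step) for i in range(n_grid)]
    best = min(fixed_quantile_aggregation(pvals, g) for g in gammas)
    return min(1.0, (1.0 - math.log(gamma_min)) * best)


# Hypothetical p-values for one feature across 5 clustered inference runs.
pvals_one_feature = [0.001, 0.004, 0.002, 0.03, 0.003]
median_like = fixed_quantile_aggregation(pvals_one_feature, gamma=0.5)
adaptive = adaptive_quantile_aggregation(pvals_one_feature, gamma_min=0.2)
```

Applied feature by feature, either function turns the per-run p-values into a single aggregated p-value. In the adaptive variant, the factor 1 - log(gamma_min) is the correction paid for searching over gamma, so a smaller gamma_min widens the search but increases the penalty.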
- Returns:
- beta_hat: ndarray, shape (n_features,) or (n_features, n_times)
Estimated parameter vector or matrix.
- pval: ndarray, shape (n_features,)
p-value, with numerically accurate values for positive effects (i.e., for p-values close to zero).
- pval_corr: ndarray, shape (n_features,)
p-value corrected for multiple testing.
- one_minus_pval: ndarray, shape (n_features,)
One minus the p-value, with numerically accurate values for negative effects (i.e., for p-values close to one).
- one_minus_pval_corr: ndarray, shape (n_features,)
One minus the p-value corrected for multiple testing.
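The reason both pval and one_minus_pval are returned can be illustrated with floating-point arithmetic: once a p-value is far below machine precision relative to 1, subtracting it from 1 discards it. The sketch below assumes one-sided p-values derived from a standard normal z-score, which is an assumption about the underlying test rather than something stated here; the helper `one_sided_pvalues` is hypothetical and only shows why each tail is best computed directly.

```python
import math


def one_sided_pvalues(z_score):
    """Return (pval, one_minus_pval) for a standard normal z-score.

    Each tail is computed directly with erfc, so neither value is obtained
    by subtracting the other from 1.
    """
    pval = 0.5 * math.erfc(z_score / math.sqrt(2))             # P(Z >= z)
    one_minus_pval = 0.5 * math.erfc(-z_score / math.sqrt(2))  # P(Z <= z)
    return pval, one_minus_pval


# Strong positive effect: pval is tiny but still nonzero.
pval_pos, one_minus_pval_pos = one_sided_pvalues(10.0)
# Recovering pval from one_minus_pval alone would lose it entirely.
recovered_pos = 1.0 - one_minus_pval_pos

# Strong negative effect: the roles are swapped.
pval_neg, one_minus_pval_neg = one_sided_pvalues(-10.0)
recovered_neg = 1.0 - pval_neg
```

For the positive effect, `pval_pos` keeps its magnitude while `recovered_pos` collapses to zero; for the negative effect, the same holds for `one_minus_pval_neg` versus `recovered_neg`. Keeping both arrays therefore preserves accuracy in both tails.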
References
[1] Chevalier, J. A., Nguyen, T. B., Thirion, B., & Salmon, J. (2021). Spatially relaxed inference on high-dimensional linear models. arXiv preprint arXiv:2106.02590.