hidimstat.empirical_thresholding

hidimstat.empirical_thresholding(X, y, linear_estimator=GridSearchCV(estimator=LinearSVR(), param_grid={'C': array([1.e-07, 1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01])}))

Perform empirical thresholding on the input data and target using a linear estimator.

This function fits a linear estimator to the input data and target, then uses the estimated coefficients to perform empirical thresholding. The threshold is chosen so that only extreme coefficients are kept. For more details, see Section 6.3.2 of [].

Parameters:
X : ndarray, shape (n_samples, n_features)

The input data.

y : ndarray, shape (n_samples,)

The target values.

linear_estimator : estimator object, optional (default=GridSearchCV(LinearSVR(), param_grid={"C": np.logspace(-7, 1, 9)}, n_jobs=None))

The linear estimator to use for thresholding. It should be a scikit-learn estimator object that implements the fit method and has a coef_ attribute or a best_estimator_ attribute with a coef_ attribute (e.g., a GridSearchCV object).

Returns:
beta_hat : ndarray, shape (n_features,)

The estimated coefficients of the linear estimator.

scale : ndarray, shape (n_features,)

The threshold values for each feature.

Raises:
ValueError

If the linear_estimator does not have a coef_ attribute or a best_estimator_ attribute with a coef_ attribute.
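The attribute lookup described for linear_estimator can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name extract_coef and the two stand-in objects are hypothetical, and the lookup order (coef_ first, then best_estimator_.coef_) is an assumption consistent with the description above.

```python
from types import SimpleNamespace


def extract_coef(fitted_estimator):
    """Hypothetical helper: return the coefficients of a fitted estimator."""
    # Plain linear estimators expose coef_ directly.
    if hasattr(fitted_estimator, "coef_"):
        return fitted_estimator.coef_
    # Search wrappers such as GridSearchCV expose the refit model
    # as best_estimator_, which in turn carries coef_.
    best = getattr(fitted_estimator, "best_estimator_", None)
    if best is not None and hasattr(best, "coef_"):
        return best.coef_
    raise ValueError(
        "linear_estimator must have a coef_ attribute or a "
        "best_estimator_ attribute with a coef_ attribute."
    )


# Stand-ins for fitted estimators (assumed shapes, not scikit-learn objects).
direct = SimpleNamespace(coef_=[0.5, -1.0])
wrapped = SimpleNamespace(best_estimator_=SimpleNamespace(coef_=[2.0, 0.0]))

print(extract_coef(direct))
print(extract_coef(wrapped))
```

An object exposing neither attribute, such as an unfitted or non-linear estimator, would reach the final branch and raise ValueError.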

Notes

The threshold is calculated as the standard deviation of the estimated coefficients multiplied by the square root of the number of features. This is based on the assumption that the coefficients follow a normal distribution with mean zero.
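Taken at face value, the note above corresponds to a computation like the one below. This is a sketch under stated assumptions, not the library's code: the helper name threshold_scale is hypothetical, the coefficients are made up, and the use of the population standard deviation (NumPy's default, ddof=0) is an assumption, since the note does not say which estimator of the standard deviation is used.

```python
import numpy as np


def threshold_scale(beta_hat):
    """Hypothetical helper: per-feature threshold as described in the note."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    n_features = beta_hat.size
    # Assumption: population standard deviation (ddof=0).
    std = np.std(beta_hat)
    # The same value is repeated so the result has shape (n_features,).
    return np.full(n_features, std * np.sqrt(n_features))


beta_hat = np.array([2.0, -2.0, 2.0, -2.0])  # made-up coefficients
scale = threshold_scale(beta_hat)
print(scale)
```

For these four coefficients the standard deviation is 2 and the square root of the number of features is 2, so every entry of scale equals 4.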