cinnamon.drift.adversarial_drift_explainer.AdversarialDriftExplainer
- class cinnamon.drift.adversarial_drift_explainer.AdversarialDriftExplainer(n_splits: int = 2, feature_subset: Optional[List[str]] = None, seed: Optional[int] = None, verbosity: bool = True, max_depth: int = 6, learning_rate: float = 0.1, tree_method: str = 'auto')
Study data drift using an adversarial learning approach, i.e. by training a classifier to discriminate between dataset 1 and dataset 2. An XGBClassifier is used as the adversarial classifier.
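The core idea can be illustrated with a minimal NumPy sketch. It is not cinnamon's implementation: XGBClassifier is swapped for a hand-written logistic regression, cross-validation is replaced by a single random half split, reading the result as an AUC is an illustrative choice, and the helper names (`fit_logistic`, `predict_proba`, `roc_auc`, `adversarial_auc`) are hypothetical. The sketch assumes rows of dataset 1 are labeled 0 and rows of dataset 2 are labeled 1. A held-out AUC near 0.5 means the classifier cannot tell the datasets apart; an AUC well above 0.5 suggests drift.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Batch gradient descent on the logistic loss (stand-in for XGBClassifier)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability that each row comes from dataset 2."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def roc_auc(y, score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos, neg = score[y == 1], score[y == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

def adversarial_auc(X1, X2, seed=0):
    # Assumed labeling: dataset 1 rows -> 0, dataset 2 rows -> 1, stacked on axis 0.
    X = np.vstack([X1, X2])
    y = np.r_[np.zeros(len(X1)), np.ones(len(X2))]
    # Hold out a random half so the AUC is measured on rows the model never saw.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    train, val = idx[: len(X) // 2], idx[len(X) // 2 :]
    w = fit_logistic(X[train], y[train])
    return roc_auc(y[val], predict_proba(w, X[val]))

rng = np.random.default_rng(42)
X1 = rng.normal(0.0, 1.0, size=(500, 3))
X2_same = rng.normal(0.0, 1.0, size=(500, 3))
X2_shift = X2_same + np.array([1.5, 0.0, 0.0])  # drift on feature 0 only

print(adversarial_auc(X1, X2_same))   # expected to be near 0.5: no detectable drift
print(adversarial_auc(X1, X2_shift))  # expected to be well above 0.5: drift detected
```

A gradient-boosted model such as XGBClassifier can also pick up non-linear and interaction effects that this linear stand-in would miss.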
Parameters
- n_splits : int (must be >= 2), optional (default=2)
Number of folds in the cross-validation.
- feature_subset : List[Union[int, str]], optional (default=None)
Subset of features to consider in the training of the adversarial classifier.
- seed : int, optional (default=None)
Random seed to set in order to get reproducible results.
- verbosity : bool, optional (default=True)
Whether to print the training logs of the adversarial classifiers.
- max_depth : int, optional (default=6)
“max_depth” parameter passed to XGBClassifier for each cross-validation model.
- learning_rate : float, optional (default=0.1)
“learning_rate” parameter passed to XGBClassifier for each cross-validation model.
- tree_method : str, optional (default=“auto”)
“tree_method” parameter passed to XGBClassifier for each cross-validation model.
Attributes
- cv_adversarial_models : List[XGBClassifier]
List of cross-validated XGBClassifier models.
- kf_splits : List[Tuple[np.array, np.array]]
List of the training-validation indexes (train_idx, val_idx) used in the cross-validation.
Note: to train the adversarial classifier, X1 and X2 are concatenated on axis=0 and the index is reset after the concatenation, so train_idx and val_idx are row positions in the concatenated data.
- feature_drifts : list of dict
Drift measures for each input feature in X.
- target_drift : dict
Drift measures for the labels y.
- task : string
Task corresponding to the (X, y) data. Either “regression”, “classification”, or “ranking”.
- n_features : int
Number of features in input X.
- feature_names : list of string
Feature names for input X.
- class_names : list of string
Class names of the target when task is “classification”. Otherwise equal to None.
- cat_feature_indices : list of int
Indexes of categorical features in input X.
- X1, X2 : pandas dataframes
X1 and X2 inputs passed to the “fit” method.
- y1, y2 : numpy arrays
y1 and y2 targets passed to the “fit” method.
- sample_weights1, sample_weights2 : numpy arrays
sample_weights1 and sample_weights2 arrays passed to the “fit” method.
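The note on `kf_splits` says the index is reset after concatenation, so the stored indexes are row positions in the stacked data, not original DataFrame labels. The following sketch builds splits with the same (train_idx, val_idx) shape and maps a validation fold back to rows of X1 and X2. It assumes X1's rows come first in the concatenation and uses a hand-rolled shuffled K-fold; the name `kfold_splits` is hypothetical, and cinnamon may build its folds differently (for example with scikit-learn). The `n_splits >= 2` check mirrors the documented constraint: with a single fold there would be no held-out rows.

```python
import numpy as np

def kfold_splits(n_samples, n_splits=2, seed=None):
    """Shuffled K-fold (train_idx, val_idx) pairs over positions 0..n_samples-1."""
    if n_splits < 2:
        raise ValueError("n_splits must be >= 2 so that every row lands in a validation fold")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    splits = []
    for k in range(n_splits):
        val_idx = np.sort(folds[k])
        train_idx = np.sort(np.concatenate([folds[j] for j in range(n_splits) if j != k]))
        splits.append((train_idx, val_idx))
    return splits

n1, n2 = 6, 4  # number of rows in X1 and X2
splits = kfold_splits(n1 + n2, n_splits=2, seed=0)
for train_idx, val_idx in splits:
    # Assumption: positions < n1 are X1 rows, the remaining positions are X2 rows.
    from_x1 = val_idx[val_idx < n1]
    from_x2 = val_idx[val_idx >= n1] - n1  # shift back to X2's own row numbers
    print("X1 rows:", from_x1, "X2 rows:", from_x2)
```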
- __init__(n_splits: int = 2, feature_subset: Optional[List[str]] = None, seed: Optional[int] = None, verbosity: bool = True, max_depth: int = 6, learning_rate: float = 0.1, tree_method: str = 'auto')
- fit(X1: DataFrame, X2: DataFrame, y1: Optional[array] = None, y2: Optional[array] = None, sample_weights1: Optional[array] = None, sample_weights2: Optional[array] = None, cat_feature_indices: Optional[List[int]] = None)
Fit the adversarial drift explainer to dataset 1 and dataset 2. Only X1, X2, sample_weights1 and sample_weights2 are used to build the adversarial drift explainer. y1 and y2 are only used if the get_target_drift method is called.
Parameters
- X1 : pandas dataframe of shape (n_samples, n_features)
Dataset 1 inputs.
- X2 : pandas dataframe of shape (n_samples, n_features)
Dataset 2 inputs.
- y1 : numpy array of shape (n_samples,), optional (default=None)
Dataset 1 labels. If None, data drift is only analyzed based on inputs X1 and X2.
- y2 : numpy array of shape (n_samples,), optional (default=None)
Dataset 2 labels. If None, data drift is only analyzed based on inputs X1 and X2.
- sample_weights1 : numpy array of shape (n_samples,), optional (default=None)
Array of weights assigned to individual samples of dataset 1. If None, each sample of dataset 1 is given unit weight.
- sample_weights2 : numpy array of shape (n_samples,), optional (default=None)
Array of weights assigned to individual samples of dataset 2. If None, each sample of dataset 2 is given unit weight.
- cat_feature_indices : list of int, optional (default=None)
Indexes of categorical features in input X.
Returns
- AdversarialDriftExplainer
The fitted adversarial drift explainer.
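Since fit uses sample_weights1 and sample_weights2 to build the adversarial classifier, with unit weight as the default, here is a minimal sketch of assembling one weight vector aligned with the concatenated rows. The helper name is hypothetical. Two further assumptions are not stated on this page: that X1's rows come first, and that the combined vector is handed to the classifier (for instance through the `sample_weight` argument of XGBClassifier's `fit`).

```python
import numpy as np

def adversarial_sample_weights(n1, n2, sample_weights1=None, sample_weights2=None):
    """Stack per-sample weights in the same order as the concatenated X1, X2 rows.

    Missing weights default to 1.0 (unit weight), as documented for fit().
    """
    w1 = np.ones(n1) if sample_weights1 is None else np.asarray(sample_weights1, dtype=float)
    w2 = np.ones(n2) if sample_weights2 is None else np.asarray(sample_weights2, dtype=float)
    if w1.shape != (n1,) or w2.shape != (n2,):
        raise ValueError("each weight array must have shape (n_samples,)")
    return np.concatenate([w1, w2])

# Dataset 1 has custom weights, dataset 2 falls back to unit weights.
w = adversarial_sample_weights(3, 2, sample_weights1=[0.5, 2.0, 1.0])
```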
- get_adversarial_correction_weights(max_ratio: int = 10) → array
Compute weights for the dataset 1 samples that correct data drift (more specifically, covariate shift).
Given an adversarial classifier c: X -> [0, 1], where c(X_i) is the predicted probability that sample X_i belongs to dataset 2, the weight of a dataset 1 sample X_i is c(X_i) / (1 - c(X_i)). Cross-validation is used so that weights can be computed for all dataset 1 samples, each sample being scored by the model whose validation fold contains it.
See the README documentation, especially the slide presentation, for an explanation of how the weights are computed.
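The weighting rule c(X_i) / (1 - c(X_i)) combined with out-of-fold scoring can be sketched as follows. This sketch makes several assumptions. Dataset 1 rows occupy the first n1 positions of the concatenated data. Each fold model is represented by a plain callable returning P(dataset 2); with XGBClassifier this would be something like `predict_proba(X)[:, 1]`. Probabilities are clipped by a small `eps` to avoid dividing by zero, which the source does not mention. The helper names are hypothetical.

```python
import numpy as np

def oof_probabilities(splits, models, X):
    """Out-of-fold scores: each row is scored by the model whose validation fold holds it."""
    oof = np.empty(len(X))
    for (_train_idx, val_idx), model in zip(splits, models):
        oof[val_idx] = model(X[val_idx])
    return oof

def odds_weights(oof_proba, n1, eps=1e-6):
    """Weights c / (1 - c) for the first n1 rows (the dataset 1 samples)."""
    c = np.clip(oof_proba[:n1], eps, 1 - eps)  # keep 1 - c away from zero
    return c / (1 - c)

# Toy setup: dataset 1 rows at positions 0, 1 and dataset 2 rows at positions 2, 3.
X = np.zeros((4, 1))
splits = [(np.array([0, 2]), np.array([1, 3])), (np.array([1, 3]), np.array([0, 2]))]
# Hypothetical fold models that return a constant P(dataset 2) for every row.
models = [lambda rows: np.full(len(rows), 0.5), lambda rows: np.full(len(rows), 0.8)]

oof = oof_probabilities(splits, models, X)
weights = odds_weights(oof, n1=2)
```

A row scored 0.8 gets weight 0.8 / 0.2 = 4, so dataset 1 samples that look like dataset 2 are up-weighted. A row scored 0.5 keeps weight 1.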
Parameters
- max_ratio : int, optional (default=10)
Maximum ratio between two weights returned in correction_weights (weights are thresholded so that the ratio between any two weights does not exceed max_ratio).
Returns
- correction_weights : np.array
Array of correction weights for the samples of dataset 1.
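The max_ratio threshold can be sketched with one possible rule: keep the smallest weight and clip every weight to at most `min(weights) * max_ratio`. The source only states the constraint, not the rule, so this rule and the function name are assumptions. The sketch also assumes all weights are positive, which holds for odds c / (1 - c) with 0 < c < 1.

```python
import numpy as np

def cap_weight_ratio(weights, max_ratio=10):
    """Cap weights so that max(weights) / min(weights) <= max_ratio.

    Hypothetical rule: the smallest weight is unchanged and every weight is
    clipped to at most min(weights) * max_ratio.
    """
    weights = np.asarray(weights, dtype=float)
    cap = weights.min() * max_ratio
    return np.minimum(weights, cap)

capped = cap_weight_ratio([0.2, 1.0, 4.0, 30.0], max_ratio=10)
```

Clipping from the top limits the influence of a few extreme samples. An alternative that also satisfies the constraint is to raise small weights to `max(weights) / max_ratio`.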
- get_adversarial_drift_importances()
Compute drift importances using the adversarial method. The drift importance of a feature is its feature importance averaged over the cross-validated adversarial classifiers.
See the README documentation, especially the slide presentation, for an explanation of how the drift importances are computed.
Returns
drift_importances : numpy array
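Averaging importances over the fold models can be sketched in a few lines. With XGBClassifier, each fitted model exposes per-feature importances through its `feature_importances_` attribute, but this page does not say which importance type cinnamon uses. The importance values below are hypothetical, and the helper name is illustrative.

```python
import numpy as np

def mean_drift_importances(per_model_importances):
    """Average per-feature importances over the cross-validated adversarial models."""
    stacked = np.vstack(per_model_importances)  # shape (n_models, n_features)
    return stacked.mean(axis=0)

# Hypothetical importances from two fold models over three features.
importances = [np.array([0.6, 0.3, 0.1]), np.array([0.8, 0.1, 0.1])]
drift_importances = mean_drift_importances(importances)
```

Here feature 0 dominates both fold models, so it ends up with the largest averaged drift importance.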