cinnamon.drift.adversarial_drift_explainer.AdversarialDriftExplainer

class cinnamon.drift.adversarial_drift_explainer.AdversarialDriftExplainer(n_splits: int = 2, feature_subset: Optional[List[str]] = None, seed: Optional[int] = None, verbosity: bool = True, max_depth: int = 6, learning_rate: float = 0.1, tree_method: str = 'auto')

Study data drift using an adversarial learning approach (i.e. training a classifier to discriminate between dataset 1 and dataset 2). XGBClassifier is used as the adversarial classifier.

Parameters

n_splits: int (must be >= 2), optional (default=2)

Number of folds in the cross validation.

feature_subset: List[Union[int, str]], optional (default=None)

Subset of features to consider in the training of the adversarial classifier.

seed: int, optional (default=None)

Random seed to set in order to get reproducible results.

verbosity: bool, optional (default=True)

Whether to print training logs of adversarial classifiers or not.

max_depth: int, optional (default=6)

“max_depth” parameter passed to XGBClassifier for each cross-validation model.

learning_rate: float, optional (default=0.1)

“learning_rate” parameter passed to XGBClassifier for each cross-validation model.

tree_method: str, optional (default="auto")

“tree_method” parameter passed to XGBClassifier for each cross-validation model.

Attributes

cv_adversarial_models: List[XGBClassifier]

List of cross-validated XGBClassifier models.

kf_splits: List[Tuple[np.array, np.array]]

List of the training-validation indexes (train_idx, val_idx) used in the cross-validation.

Note: In order to learn the adversarial classifier, X1 and X2 are concatenated on axis=0, and indexes are reset after the concatenation.

feature_drifts: list of dict

Drift measures for each input feature in X.

target_drift: dict

Drift measures for the labels y.

task: string

Task corresponding to the (X, Y) data. Either “regression”, “classification”, or “ranking”.

n_features: int

Number of features in input X.

feature_names: list of string

Feature names for input X.

class_names: list of string

Class names of the target when task is “classification”. Otherwise equal to None.

cat_feature_indices: list of int

Indexes of categorical features in input X.

X1, X2: pandas dataframes

X1 and X2 inputs passed to the “fit” method.

y1, y2: numpy arrays

y1 and y2 targets passed to the “fit” method.

sample_weights1, sample_weights2: numpy arrays

sample_weights1 and sample_weights2 arrays passed to the “fit” method.
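
The note on kf_splits above says that X1 and X2 are concatenated on axis=0 and that indexes are reset. A minimal sketch of what this implies for interpreting the stored indexes is given below, using only the standard library. It assumes that dataset 1 rows come first in the concatenation and that the adversarial label is 0 for dataset 1 and 1 for dataset 2. The helpers adversarial_labels, simple_kfold and locate are hypothetical names for illustration and are not part of cinnamon; in particular, simple_kfold is a stand-in splitter, not the one the library uses.

```python
from typing import List, Tuple


def adversarial_labels(n1: int, n2: int) -> List[int]:
    """Label dataset 1 rows 0 and dataset 2 rows 1, in concatenation order (assumed)."""
    return [0] * n1 + [1] * n2


def simple_kfold(n_rows: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical stand-in splitter: row i is validated in fold i % n_splits."""
    if n_splits < 2:
        raise ValueError("n_splits must be >= 2")
    splits = []
    for fold in range(n_splits):
        val_idx = [i for i in range(n_rows) if i % n_splits == fold]
        train_idx = [i for i in range(n_rows) if i % n_splits != fold]
        splits.append((train_idx, val_idx))
    return splits


def locate(idx: List[int], n1: int) -> Tuple[List[int], List[int]]:
    """Map positions in the concatenated data back to row positions in X1 and X2."""
    in_x1 = [i for i in idx if i < n1]
    in_x2 = [i - n1 for i in idx if i >= n1]
    return in_x1, in_x2


# Three rows in dataset 1 and two rows in dataset 2.
n1, n2 = 3, 2
labels = adversarial_labels(n1, n2)
kf_splits = simple_kfold(n1 + n2, n_splits=2)

# Validation rows of the first fold, expressed per original dataset.
train_idx, val_idx = kf_splits[0]
rows_x1, rows_x2 = locate(val_idx, n1)
```

Under these assumptions, any index smaller than len(X1) refers to a row of X1, and any other index refers to row (index - len(X1)) of X2.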

__init__(n_splits: int = 2, feature_subset: Optional[List[str]] = None, seed: Optional[int] = None, verbosity: bool = True, max_depth: int = 6, learning_rate: float = 0.1, tree_method: str = 'auto')
fit(X1: DataFrame, X2: DataFrame, y1: Optional[array] = None, y2: Optional[array] = None, sample_weights1: Optional[array] = None, sample_weights2: Optional[array] = None, cat_feature_indices: Optional[List[int]] = None)

Fit the adversarial drift explainer to dataset 1 and dataset 2. Only X1, X2, sample_weights1 and sample_weights2 are used to build the adversarial drift explainer; y1 and y2 are only used if the get_target_drift method is called.

Parameters

X1: pandas dataframe of shape (n_samples, n_features)

Dataset 1 inputs.

X2: pandas dataframe of shape (n_samples, n_features)

Dataset 2 inputs.

y1: numpy array of shape (n_samples,), optional (default=None)

Dataset 1 labels. If None, data drift is only analyzed based on inputs X1 and X2.

y2: numpy array of shape (n_samples,), optional (default=None)

Dataset 2 labels. If None, data drift is only analyzed based on inputs X1 and X2.

sample_weights1: numpy array of shape (n_samples,), optional (default=None)

Array of weights assigned to individual samples of dataset 1. If None, each sample of dataset 1 is given unit weight.

sample_weights2: numpy array of shape (n_samples,), optional (default=None)

Array of weights assigned to individual samples of dataset 2. If None, each sample of dataset 2 is given unit weight.

cat_feature_indices: list of int, optional (default=None)

Indexes of categorical features in input X.

Returns

AdversarialDriftExplainer

The fitted adversarial drift explainer.

get_adversarial_correction_weights(max_ratio: int = 10) → array

Compute weights for dataset 1 samples in order to correct data drift (more specifically in order to correct covariate shift).

Given an adversarial classifier c: X -> [0, 1], the formula used to compute weights for a sample X_i is c(X_i) / (1 - c(X_i)) (cross-validation is used in order to compute weights for all dataset 1 samples).

See the documentation in README for explanations about how it is computed, especially the slide presentation.

Parameters

max_ratio: int, optional (default=10)

Maximum ratio between any two weights returned in correction_weights (weights are thresholded so that the ratio between two weights does not exceed max_ratio).

Returns

correction_weights: np.array

Array of correction weights for the samples of dataset 1.
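
The weight formula c(X_i) / (1 - c(X_i)) and the max_ratio constraint can be illustrated with a small standard-library sketch. The function correction_weights below is a hypothetical helper, not the library's implementation. It takes out-of-fold probabilities of belonging to dataset 2 as input, and it enforces the ratio by capping large weights at max_ratio times the smallest weight. That capping rule is an assumption: the source only states that the ratio between two weights does not exceed max_ratio, not how the thresholding is done.

```python
from typing import List, Sequence


def correction_weights(
    proba_dataset2: Sequence[float],
    max_ratio: float = 10,
    eps: float = 1e-12,
) -> List[float]:
    """Turn classifier outputs c(X_i) into weights c / (1 - c), then bound their ratio."""
    if max_ratio < 1:
        raise ValueError("max_ratio must be >= 1")

    weights = []
    for c in proba_dataset2:
        # Keep c strictly inside (0, 1) so the odds are finite and positive.
        c = min(max(c, eps), 1.0 - eps)
        weights.append(c / (1.0 - c))

    if not weights:
        return []

    # Assumed thresholding rule: cap every weight at max_ratio * smallest weight.
    cap = min(weights) * max_ratio
    return [min(w, cap) for w in weights]


# Hypothetical probabilities for three dataset 1 samples.
example_weights = correction_weights([0.5, 0.2, 0.99], max_ratio=10)
```

In this example the raw odds are roughly 1, 0.25 and 99, so the third weight is the only one affected by the cap.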

get_adversarial_drift_importances()

Compute drift importances using the adversarial method. Here, the drift importance of each feature is the mean of its feature importance over the cross-validated adversarial classifiers.

See the documentation in README for explanations about how it is computed, especially the slide presentation.

Returns

drift_importances : numpy array
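
The averaging described for this method can be sketched as follows. The helper mean_importances is hypothetical and is not part of cinnamon. It assumes that one importance vector per cross-validated model has already been extracted, with features in the same order in every vector; which importance measure the library reads from each XGBClassifier is not stated here.

```python
from typing import List, Sequence


def mean_importances(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average feature importances feature by feature across models."""
    if not per_model:
        raise ValueError("at least one importance vector is required")

    n_features = len(per_model[0])
    if any(len(row) != n_features for row in per_model):
        raise ValueError("all importance vectors must have the same length")

    # zip(*per_model) groups the values of each feature across models.
    return [sum(column) / len(per_model) for column in zip(*per_model)]


# Hypothetical importances from two cross-validation models, three features.
example_importances = mean_importances([[0.6, 0.4, 0.0], [0.2, 0.6, 0.2]])
```

A feature that every fold's classifier relies on to separate the two datasets keeps a high mean, while a feature used by only one fold is averaged down.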