cinnamon.drift.model_drift_explainer.ModelDriftExplainer

class cinnamon.drift.model_drift_explainer.ModelDriftExplainer(model, iteration_range: Optional[Tuple[int, int]] = None, task: Optional[str] = None)

Study data drift through the lens of an ML model or ML pipeline.

Parameters

model : an ML model or ML pipeline (see the “Supported Model” section)

The model used to make predictions.

iteration_range : Tuple[int, int], optional (default=None)

Only for tree-based models. Specifies which layers of trees are used. For example, if XGBoost is trained with 100 rounds and iteration_range=(10, 20), then only the trees built during iterations [10, 20) are used. If None, all trees are used.

task : string, optional (default=None)

Task corresponding to the (X, Y) data. Either “regression”, “classification”, or “ranking”. “task” is mandatory if the model is treated as a black-box predictor.
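The half-open interval used by iteration_range can be illustrated without any model. The sketch below is only an illustration of the [start, end) convention described above; select_rounds is a hypothetical helper and is not part of the library.

```python
def select_rounds(n_rounds, iteration_range=None):
    """Return the boosting-round indices kept by a half-open [start, end) range.

    Hypothetical helper for illustration; it mirrors the documented convention
    (None keeps every round) but is not the library's code.
    """
    if iteration_range is None:
        return list(range(n_rounds))
    start, end = iteration_range
    if not 0 <= start < end <= n_rounds:
        raise ValueError("iteration_range must satisfy 0 <= start < end <= n_rounds")
    return list(range(start, end))


kept = select_rounds(100, iteration_range=(10, 20))
# First kept round, last kept round, and number of kept rounds.
print(kept[0], kept[-1], len(kept))  # 10 19 10
```

Round 20 itself is excluded, so a range of (10, 20) keeps exactly ten rounds.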

Attributes

predictions1 : numpy array

Array of predictions of “model” on X1 dataset. For classification, corresponds to raw (logit or log-softmax) predictions.

predictions2 : numpy array

Array of predictions of “model” on X2 dataset. For classification, corresponds to raw (logit or log-softmax) predictions.

pred_proba1 : numpy array

Array of predicted probabilities of “model” on X1 (equal to None if task is regression or ranking).

pred_proba2 : numpy array

Array of predicted probabilities of “model” on X2 (equal to None if task is regression or ranking).

iteration_range : tuple of integers

Layer of trees used.

feature_drifts : list of Union[DriftMetricsCat, DriftMetricsNum]

Drift measures for each input feature in X.

target_drift : Union[DriftMetricsCat, DriftMetricsNum]

Drift measures for the labels y.

n_features : int

Number of features in input X.

feature_names : list of string

Feature names for input X.

class_names : list of string

Class names of the target when task is “classification”. Otherwise equal to None.

cat_feature_indices : list of int

Indexes of categorical features in input X.

X1, X2 : pandas dataframes

X1 and X2 inputs passed to the “fit” method.

y1, y2 : numpy arrays

y1 and y2 targets passed to the “fit” method.

sample_weights1, sample_weights2 : numpy arrays

sample_weights1 and sample_weights2 arrays passed to the “fit” method.

__init__(model, iteration_range: Optional[Tuple[int, int]] = None, task: Optional[str] = None)
fit(X1: DataFrame, X2: DataFrame, y1: Optional[array] = None, y2: Optional[array] = None, sample_weights1: Optional[array] = None, sample_weights2: Optional[array] = None, cat_feature_indices: Optional[List[int]] = None)

Fit the model drift explainer to dataset 1 and dataset 2.

Parameters

X1 : pandas dataframe of shape (n_samples, n_features)

Dataset 1 inputs.

X2 : pandas dataframe of shape (n_samples, n_features)

Dataset 2 inputs.

y1 : numpy array of shape (n_samples,), optional (default=None)

Dataset 1 labels. If None, data drift is only analyzed based on inputs X1 and X2.

y2 : numpy array of shape (n_samples,), optional (default=None)

Dataset 2 labels. If None, data drift is only analyzed based on inputs X1 and X2.

sample_weights1 : numpy array of shape (n_samples,), optional (default=None)

Array of weights that are assigned to individual samples of dataset 1. If None, then each sample of dataset 1 is given unit weight.

sample_weights2 : numpy array of shape (n_samples,), optional (default=None)

Array of weights that are assigned to individual samples of dataset 2. If None, then each sample of dataset 2 is given unit weight.

cat_feature_indices : list of int, optional (default=None)

Indexes of categorical features in input X.

Returns

ModelDriftExplainer

The fitted model drift explainer.
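A minimal usage sketch of the constructor and fit, using only the import path and signatures listed on this page. The wrapper fit_drift_explainer and its argument handling are illustrative assumptions; the model, dataframes, and labels must be supplied by the caller.

```python
def fit_drift_explainer(model, X1, X2, y1=None, y2=None,
                        task=None, cat_feature_indices=None):
    """Build a ModelDriftExplainer and fit it on two datasets.

    Hypothetical convenience wrapper: it only forwards arguments to the
    constructor and to fit() as documented above.
    """
    # Imported lazily so this sketch can be defined without the package installed.
    from cinnamon.drift.model_drift_explainer import ModelDriftExplainer

    explainer = ModelDriftExplainer(model, task=task)
    # fit() is documented to return the fitted explainer itself.
    return explainer.fit(
        X1=X1,
        X2=X2,
        y1=y1,
        y2=y2,
        cat_feature_indices=cat_feature_indices,
    )
```

Passing y1 and y2 is optional; without them, only the inputs X1 and X2 are compared.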

get_model_agnostic_drift_importances(type: str = 'mean', prediction_type: str = 'raw', max_ratio: float = 10, max_n_cat: int = 20) -> array

Compute drift importances using the model agnostic method.

See the README, especially the slide presentation, for an explanation of how these importances are computed.

Parameters

type : str, optional (default=“mean”)

Method used for drift importances computation. Choose among:
- “mean”
- “wasserstein”

See details in slide presentation.

prediction_type : str, optional (default=“raw”)

Choose among:
- “raw”
- “proba”: predicted probability if task == ‘classification’
- “class”: predicted class if task == ‘classification’

max_ratio : float, optional (default=10)

Only used for categorical features.

max_n_cat : int, optional (default=20)

Only used for categorical features.

Returns

drift_importances : numpy array
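The exact computation is deferred to the slide presentation and is not reproduced here. As background only, and not as the library's algorithm, the sketch below compares the two statistics that the “mean” and “wasserstein” option names evoke on two one-dimensional samples. Both helper functions are hypothetical.

```python
from statistics import fmean


def mean_shift(a, b):
    """Difference between the means of two samples."""
    return fmean(b) - fmean(a)


def wasserstein_1d_equal_size(a, b):
    """1-Wasserstein distance between two equal-size empirical samples.

    For equal sizes it reduces to the mean absolute difference between
    the sorted values of the two samples.
    """
    if len(a) != len(b):
        raise ValueError("this simplified formula needs samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


sample_1 = [0.0, 1.0, 2.0, 3.0]
sample_2 = [1.5, 1.5, 1.5, 1.5]
print(mean_shift(sample_1, sample_2))                 # 0.0
print(wasserstein_1d_equal_size(sample_1, sample_2))  # 1.0
```

The two samples share the same mean, so a mean-based statistic sees no change, while the Wasserstein distance registers the change in spread.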

get_performance_metrics_drift() -> PerformanceMetricsDrift

Compute performance metrics on dataset 1 and dataset 2.

Returns

performance_metrics_drift : PerformanceMetricsDrift object

Comparison of either RegressionMetrics or ClassificationMetrics objects.

get_prediction_drift(prediction_type: str = 'raw') -> List[DriftMetricsNum]

Compute drift measures based on model predictions.

Parameters

prediction_type : str, optional (default=“raw”)

Type of predictions to consider. Choose among:
- “raw”: logit predictions (binary classification), log-softmax predictions (multiclass classification), regular predictions (regression)
- “proba”: predicted probabilities (only for classification model)
- “class”: predicted classes (only for classification model)

Returns

prediction_drift : list of DriftMetricsNum or DriftMetricsCat objects

Drift measures for each predicted dimension.
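The three prediction types are related by standard transformations. The sketch below shows how a raw logit (binary case) or a raw log-softmax vector (multiclass case) maps to probabilities and to a class. The helper names and the 0.5 decision threshold are assumptions made for illustration, not the library's code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_views(logit):
    """Raw logit -> probability of the positive class -> class (0.5 threshold assumed)."""
    proba = sigmoid(logit)
    return {"raw": logit, "proba": proba, "class": int(proba >= 0.5)}


def multiclass_views(log_softmax):
    """Raw log-softmax vector -> class probabilities -> index of the most likely class."""
    proba = [math.exp(v) for v in log_softmax]
    predicted = max(range(len(proba)), key=proba.__getitem__)
    return {"raw": log_softmax, "proba": proba, "class": predicted}


print(binary_views(0.0)["proba"])  # 0.5
raw = [math.log(0.2), math.log(0.5), math.log(0.3)]
print(multiclass_views(raw)["class"])  # 1
```

Drift on “raw” predictions is measured on an unbounded scale, whereas “proba” values are confined to [0, 1] and “class” values are categorical.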

get_tree_based_correction_weights(max_depth: Optional[int] = None, max_ratio: int = 10) -> array

Compute correction weights for data drift using the tree structure of the model. This method is not recommended and is provided for research purposes only; AdversarialDriftExplainer should be preferred. The approach reuses ideas from get_tree_based_drift_importances to estimate correction weights, but first experiments show poor performance.

Parameters

max_depth : int, optional (default=None)

Depth at which the ratio of node weights is computed. If None, ratios are computed in terminal leaves.

max_ratio : int, optional (default=10)

Maximum ratio between two weights returned in correction_weights (weights are thresholded so that the ratio between two weights does not exceed max_ratio).

Returns

correction_weights : np.array

Array of correction weights for the samples of dataset 1.
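The effect of max_ratio can be illustrated with one simple thresholding rule: cap every weight at max_ratio times the smallest weight. This is only one way to satisfy the stated constraint; how the library actually thresholds is not documented here, and cap_weight_ratio is a hypothetical helper.

```python
def cap_weight_ratio(weights, max_ratio=10):
    """Cap weights so that max(weights) / min(weights) <= max_ratio.

    Assumed rule for illustration: weights above min(weights) * max_ratio
    are lowered to that ceiling. The library may threshold differently.
    """
    if max_ratio < 1:
        raise ValueError("max_ratio must be at least 1")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    ceiling = min(weights) * max_ratio
    return [min(w, ceiling) for w in weights]


capped = cap_weight_ratio([0.5, 1.0, 4.0, 20.0], max_ratio=10)
print(capped)  # [0.5, 1.0, 4.0, 5.0]
```

Only the largest weight is changed in this example, since it was 40 times the smallest one.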

get_tree_based_drift_importances(type: str = 'mean') -> array

Compute drift importances using the tree structure of the model.

See the README, especially the slide presentation, for an explanation of how these importances are computed.

Parameters

type : str, optional (default=“mean”)

Method used for drift importances computation. Choose among:
- “node_size”
- “mean”
- “mean_norm”

See details in slide presentation.

Returns

drift_importances : numpy array
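Once an explainer is fitted, the methods on this page can be combined into a single report. The sketch below uses only method and attribute names documented above. The collect_drift_report helper, its tree_based flag, and the choice to skip performance metrics when labels are missing are illustrative assumptions.

```python
def collect_drift_report(explainer, prediction_type="raw", tree_based=False):
    """Gather the drift outputs of a fitted explainer into a dictionary.

    Hypothetical helper: it makes no assumption about the shape of the
    returned arrays and simply stores each result under a descriptive key.
    """
    report = {
        "prediction_drift": explainer.get_prediction_drift(
            prediction_type=prediction_type
        ),
        "model_agnostic_importances": explainer.get_model_agnostic_drift_importances(
            type="mean", prediction_type=prediction_type
        ),
    }
    if tree_based:
        # Relies on the tree structure, so request it only for tree-based models.
        report["tree_based_importances"] = explainer.get_tree_based_drift_importances(
            type="mean"
        )
    # Assumption: performance metrics are meaningful only when labels were given to fit().
    if explainer.y1 is not None and explainer.y2 is not None:
        report["performance_metrics_drift"] = explainer.get_performance_metrics_drift()
    return report
```

A typical call would be collect_drift_report(explainer, tree_based=True) for a gradient-boosted model fitted with labels.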