cinnamon.drift.model_drift_explainer.ModelDriftExplainer

class cinnamon.drift.model_drift_explainer.ModelDriftExplainer(model, iteration_range: Optional[Tuple[int, int]] = None, task: Optional[str] = None)

Tool to study data drift between two datasets, in a context where “model” is used to make predictions.

Parameters

model: an XGBoost model (XGBClassifier, XGBRegressor, XGBRanker, or Booster): The model used to make predictions.
iteration_range: Tuple[int, int], optional (default=None): Specifies which layers of trees are used. For example, if XGBoost is trained with 100 rounds, specifying iteration_range=(10, 20) means that only the trees built during iterations [10, 20) (half-open interval) are used. If None, all trees are used.
task: string, optional (default=None): Task corresponding to the (X, Y) data. Either “regression”, “classification”, or “ranking”. “task” must be provided if the model is treated as a black-box predictor (no specific parser for the model).
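
The half-open convention of iteration_range can be illustrated with plain Python indexing. This is a minimal sketch of the indexing only: `select_rounds` is a hypothetical helper written for this example and is not part of cinnamon or XGBoost.

```python
from typing import List, Optional, Tuple


def select_rounds(n_rounds: int,
                  iteration_range: Optional[Tuple[int, int]] = None) -> List[int]:
    """Return the indices of the boosting rounds covered by iteration_range.

    Hypothetical helper for illustration only: it mimics the half-open
    [start, end) convention described above, with None meaning "all rounds".
    """
    if iteration_range is None:
        return list(range(n_rounds))
    start, end = iteration_range
    # The end bound is excluded, and never exceeds the number of rounds.
    return list(range(start, min(end, n_rounds)))


# A model trained with 100 rounds, restricted to rounds 10 through 19.
rounds = select_rounds(100, (10, 20))
print(rounds[0], rounds[-1], len(rounds))  # 10 19 10
```

With iteration_range=(10, 20), round 20 is excluded, so exactly ten rounds of trees are used.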

Attributes

predictions1: numpy array: Array of predictions of “model” on X1 (for classification, corresponds to raw predictions).
predictions2: numpy array: Array of predictions of “model” on X2 (for classification, corresponds to raw predictions).
pred_proba1: numpy array: Array of predicted probabilities of “model” on X1 (equal to None if regression or ranking).
pred_proba2: numpy array: Array of predicted probabilities of “model” on X2 (equal to None if regression or ranking).
iteration_range: tuple of integers: Layers of trees used.
feature_drifts: list of dict: Drift measures for each input feature in X.
target_drift: dict: Drift measures for the labels y.
n_features: int: Number of features in input X.
feature_names: list of string: Feature names for input X.
class_names: list of string: Class names of the target when task is “classification”. Otherwise equal to None.
cat_feature_indices: list of int: Indexes of categorical features in input X (not implemented yet: only numerical features are currently allowed).
X1, X2: pandas dataframes: X1 and X2 inputs passed to the “fit” method.
y1, y2: numpy arrays: y1 and y2 targets passed to the “fit” method.
sample_weights1, sample_weights2: numpy arrays: sample_weights1 and sample_weights2 arrays passed to the “fit” method.

__init__(model, iteration_range: Optional[Tuple[int, int]] = None, task: Optional[str] = None)

fit(X1: DataFrame, X2: DataFrame, y1: Optional[array] = None, y2: Optional[array] = None, sample_weights1: Optional[array] = None, sample_weights2: Optional[array] = None, cat_feature_indices: Optional[List[int]] = None)

Fit the model drift explainer to dataset 1 and dataset 2.

Parameters

X1: pandas dataframe of shape (n_samples, n_features): Dataset 1 inputs.
X2: pandas dataframe of shape (n_samples, n_features): Dataset 2 inputs.
y1: numpy array of shape (n_samples,), optional (default=None): Dataset 1 labels. If None, data drift is analyzed based only on inputs X1 and X2.
y2: numpy array of shape (n_samples,), optional (default=None): Dataset 2 labels. If None, data drift is analyzed based only on inputs X1 and X2.
sample_weights1: numpy array of shape (n_samples,), optional (default=None): Array of weights assigned to the individual samples of dataset 1. If None, each sample of dataset 1 is given unit weight.
sample_weights2: numpy array of shape (n_samples,), optional (default=None): Array of weights assigned to the individual samples of dataset 2. If None, each sample of dataset 2 is given unit weight.

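cat_feature_indices: list of int, optional (default=None): Indexes of categorical features in input X (not implemented yet: only numerical features are currently allowed).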

Returns

ModelDriftExplainer: The fitted model drift explainer.
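
To make the None default for sample weights concrete, the sketch below shows that omitting the weights is equivalent to passing a weight of 1 for every sample. `weighted_mean` is a hypothetical helper written for this illustration; it does not reproduce how cinnamon aggregates weights internally.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float],
                  sample_weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean where sample_weights=None means unit weight per sample."""
    if sample_weights is None:
        sample_weights = [1.0] * len(values)
    if len(sample_weights) != len(values):
        raise ValueError("values and sample_weights must have the same length")
    total = sum(sample_weights)
    return sum(v * w for v, w in zip(values, sample_weights)) / total


y1 = [1.0, 2.0, 3.0, 4.0]
unweighted = weighted_mean(y1)                           # no weights given
explicit_unit = weighted_mean(y1, [1.0, 1.0, 1.0, 1.0])  # same result
reweighted = weighted_mean(y1, [3.0, 1.0, 1.0, 1.0])     # first sample counts 3x
print(unweighted, explicit_unit, reweighted)  # 2.5 2.5 2.0
```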

get_performance_metrics_drift() → PerformanceMetricsDrift

Compute performance metrics on dataset 1 and dataset 2.

Returns

Dictionary of performance metrics.

get_prediction_drift(prediction_type: str = 'raw') → List[DriftMetricsNum]

Compute drift measures based on model predictions.

See the README, in particular the slide presentation, for an explanation of how these measures are computed.

Parameters

prediction_type: str, optional (default=“raw”): Type of predictions to consider. Choose among:
- “raw”: logit predictions (binary classification), log-softmax predictions (multiclass classification), regular predictions (regression)
- “proba”: predicted probabilities (only for classification models)
- “class”: predicted classes (only for classification models)

Returns

prediction_drift: list of DriftMetricsNum objects: Drift measures for each predicted dimension.
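
The three prediction types are related by standard transformations, which the sketch below illustrates with the standard library only. It is not cinnamon's implementation: the helper names are hypothetical, and the 0.5 decision threshold for the binary case is an assumption made for this example.

```python
import math
from typing import List


def sigmoid(logit: float) -> float:
    """Map a binary-classification logit ("raw") to a probability ("proba")."""
    return 1.0 / (1.0 + math.exp(-logit))


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax of per-class scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


# Binary case: raw logit -> probability -> class.
raw_binary = 2.0
proba_binary = sigmoid(raw_binary)
class_binary = int(proba_binary >= 0.5)  # assumed threshold, for illustration

# Multiclass case: log-softmax ("raw") -> probabilities -> class.
raw_multi = log_softmax([1.0, 2.0, 3.0])
proba_multi = [math.exp(r) for r in raw_multi]
class_multi = max(range(len(proba_multi)), key=proba_multi.__getitem__)

print(round(proba_binary, 4), class_binary)  # 0.8808 1
print(round(sum(proba_multi), 6), class_multi)  # 1.0 2
```

Because “raw” predictions are unbounded while “proba” predictions are confined to [0, 1], drift measured on the two scales can differ in magnitude even for the same pair of datasets.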

get_tree_based_correction_weights(max_depth: Optional[int] = None, max_ratio: int = 10) → array

Compute correction weights for data drift. This method is not recommended and is provided for research purposes only; AdversarialDriftExplainer should be preferred. The approach reuses ideas similar to those in get_tree_based_drift_importances to estimate correction weights, but first experiments show that it performs poorly.

Parameters

max_depth: int, optional (default=None): Depth at which the ratios of node weights are computed. If None, the ratios are computed in the terminal leaves.
max_ratio: int, optional (default=10): Maximum ratio between any two weights returned in correction_weights (weights are thresholded so that the ratio between two weights does not exceed max_ratio).

Returns

correction_weights: np.array: Array of correction weights for the samples of dataset 1.
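
The idea of node-weight ratios thresholded by max_ratio can be sketched as follows. This is an illustration under stated assumptions, not the library's algorithm: `leaf_ratio_weights` is a hypothetical function, the ratio is assumed to be the share of dataset 2 in a leaf divided by the share of dataset 1 in the same leaf, and the thresholding is assumed to clip weights to [1/sqrt(max_ratio), sqrt(max_ratio)], which is one simple way to guarantee the bound.

```python
import math
from collections import Counter
from typing import List


def leaf_ratio_weights(leaves1: List[int], leaves2: List[int],
                       max_ratio: float = 10.0) -> List[float]:
    """Hypothetical leaf-ratio weights for the samples of dataset 1.

    leaves1 and leaves2 hold, for each sample of dataset 1 and dataset 2,
    the index of the leaf the sample falls in (assumed inputs).
    """
    count1, count2 = Counter(leaves1), Counter(leaves2)
    n1, n2 = len(leaves1), len(leaves2)
    # Clipping to these bounds keeps the ratio of any two weights <= max_ratio.
    low, high = 1.0 / math.sqrt(max_ratio), math.sqrt(max_ratio)
    weights = []
    for leaf in leaves1:
        # Share of dataset 2 in this leaf over share of dataset 1 in this leaf.
        ratio = (count2[leaf] / n2) / (count1[leaf] / n1)
        weights.append(min(max(ratio, low), high))
    return weights


# Leaf 0 is over-represented in dataset 1, leaf 1 in dataset 2.
w = leaf_ratio_weights([0, 0, 0, 1], [0, 1, 1, 1], max_ratio=4.0)
print(w)  # [0.5, 0.5, 0.5, 2.0]
```

In this toy case the unclipped ratios would be 1/3 and 3; with max_ratio=4 they are clipped to 0.5 and 2.0, so the largest weight is exactly four times the smallest.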

get_tree_based_drift_importances(type: str = 'mean') → array

Compute drift values using the tree structures present in the model.

See the README, in particular the slide presentation, for an explanation of how these values are computed.

Parameters

type: str, optional (default=“mean”): Method used for drift values computation. Choose among:
- “node_size” (recommended)
- “mean”
- “mean_norm”

See details in the slide presentation.

Returns

drift_importances: numpy array
