cinnamon.drift.model_drift_explainer.ModelDriftExplainer
- class cinnamon.drift.model_drift_explainer.ModelDriftExplainer(model, iteration_range: Optional[Tuple[int, int]] = None, task: Optional[str] = None)
Tool to study data drift between two datasets, in a context where “model” is used to make predictions.
Parameters
- model: an XGBoost model (XGBClassifier, XGBRegressor, XGBRanker, or Booster)
The model used to make predictions.
- iteration_range: Tuple[int, int], optional (default=None)
Specifies which layers of trees are used. For example, if XGBoost is trained with 100 rounds, specifying iteration_range=(10, 20) means that only the trees built during iterations [10, 20) (half-open interval) are used. If None, all trees are used.
- task: string, optional (default=None)
Task corresponding to the (X, Y) data. Either “regression”, “classification”, or “ranking”. “task” must be provided if the model is treated as a black box predictor (no specific parser for the model).
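The half-open convention of iteration_range can be illustrated with a small sketch. The helper rounds_used below is hypothetical (it is not part of cinnamon or XGBoost) and only mimics how a (start, end) pair selects boosting rounds, assuming one entry per round.

```python
from typing import List, Optional, Tuple


def rounds_used(n_rounds: int,
                iteration_range: Optional[Tuple[int, int]] = None) -> List[int]:
    """Return the boosting rounds selected by a half-open iteration_range.

    Hypothetical helper for illustration only: it is not part of cinnamon.
    """
    if iteration_range is None:
        # None means that every round is used.
        return list(range(n_rounds))
    start, end = iteration_range
    # Half-open interval: start is included, end is excluded.
    return list(range(start, min(end, n_rounds)))


selected = rounds_used(100, (10, 20))
print(selected[0], selected[-1], len(selected))
```

With 100 rounds and iteration_range=(10, 20), rounds 10 through 19 are selected, that is 10 rounds in total.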
Attributes
- predictions1: numpy array
Array of predictions of “model” on X1 (for classification, corresponds to raw predictions).
- predictions2: numpy array
Array of predictions of “model” on X2 (for classification, corresponds to raw predictions).
- pred_proba1: numpy array
Array of predicted probabilities of “model” on X1 (equal to None if regression or ranking).
- pred_proba2: numpy array
Array of predicted probabilities of “model” on X2 (equal to None if regression or ranking).
- iteration_range: tuple of integers
Layer of trees used.
- feature_drifts: list of dict
Drift measures for each input feature in X.
- target_drift: dict
Drift measures for the labels y.
- n_features: int
Number of features in input X.
- feature_names: list of string
Feature names for input X.
- class_names: list of string
Class names of the target when task is “classification”. Otherwise equal to None.
- cat_feature_indices: list of int
Indexes of categorical features in input X (not implemented yet: only numerical features are allowed currently).
- X1, X2: pandas dataframes
X1 and X2 inputs passed to the “fit” method.
- y1, y2: numpy arrays
y1 and y2 targets passed to the “fit” method.
- sample_weights1, sample_weights2: numpy arrays
sample_weights1 and sample_weights2 arrays passed to the “fit” method.
- fit(X1: DataFrame, X2: DataFrame, y1: Optional[array] = None, y2: Optional[array] = None, sample_weights1: Optional[array] = None, sample_weights2: Optional[array] = None, cat_feature_indices: Optional[List[int]] = None)
Fit the model drift explainer to dataset 1 and dataset 2.
Parameters
- X1: pandas dataframe of shape (n_samples, n_features)
Dataset 1 inputs.
- X2: pandas dataframe of shape (n_samples, n_features)
Dataset 2 inputs.
- y1: numpy array of shape (n_samples,), optional (default=None)
Dataset 1 labels. If None, data drift is analyzed based on inputs X1 and X2 only.
- y2: numpy array of shape (n_samples,), optional (default=None)
Dataset 2 labels. If None, data drift is analyzed based on inputs X1 and X2 only.
- sample_weights1: numpy array of shape (n_samples,), optional (default=None)
Array of weights assigned to the individual samples of dataset 1. If None, each sample of dataset 1 is given unit weight.
- sample_weights2: numpy array of shape (n_samples,), optional (default=None)
Array of weights assigned to the individual samples of dataset 2. If None, each sample of dataset 2 is given unit weight.
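- cat_feature_indices: list of int, optional (default=None)
Indexes of categorical features in input X (not implemented yet: only numerical features are allowed currently).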
Returns
- ModelDriftExplainer
The fitted model drift explainer.
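The role of sample_weights1 and sample_weights2, and of their None default, can be sketched in plain Python. The helper weighted_mean is hypothetical, and the shift in feature mean is only a stand-in for a drift measure: it is not how cinnamon computes its own metrics.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float],
                  sample_weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean where None falls back to unit weights (illustrative helper)."""
    if sample_weights is None:
        # Same convention as described for fit: each sample gets unit weight.
        sample_weights = [1.0] * len(values)
    if len(sample_weights) != len(values):
        raise ValueError("values and sample_weights must have the same length")
    total = sum(sample_weights)
    return sum(v * w for v, w in zip(values, sample_weights)) / total


x1 = [1.0, 2.0, 3.0]  # one feature, dataset 1
x2 = [2.0, 3.0, 7.0]  # same feature, dataset 2

# Unit weights on both datasets.
mean_shift = weighted_mean(x2) - weighted_mean(x1)
# Up-weighting the last sample of dataset 2 changes the measured shift.
mean_shift_weighted = weighted_mean(x2, [1.0, 1.0, 2.0]) - weighted_mean(x1)
print(mean_shift, mean_shift_weighted)
```

Passing None and passing an array of ones give the same result, which is the behavior the parameter descriptions above state for both datasets.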
- get_performance_metrics_drift() → PerformanceMetricsDrift
Compute performance metrics on dataset 1 and dataset 2.
Returns
Dictionary of performance metrics.
- get_prediction_drift(prediction_type: str = 'raw') → List[DriftMetricsNum]
Compute drift measures based on model predictions.
See the README, especially the slide presentation, for an explanation of how these measures are computed.
Parameters
- prediction_type: str, optional (default=“raw”)
Type of predictions to consider. Choose among:
    - “raw”: logit predictions (binary classification), log-softmax predictions (multiclass classification), regular predictions (regression)
    - “proba”: predicted probabilities (only for classification models)
    - “class”: predicted classes (only for classification models)
Returns
- prediction_drift: list of DriftMetricsNum objects
Drift measures for each predicted dimension.
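The relationship between the three prediction types can be sketched with the standard library. The functions below are illustrative and not part of cinnamon; they only show the usual mathematical link between logits or log-softmax values (“raw”), probabilities (“proba”) and classes (“class”). How cinnamon derives these values internally is not described here.

```python
import math
from typing import List


def sigmoid(logit: float) -> float:
    """Map a binary-classification logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax of a list of class scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


# Binary classification: "raw" is a logit, "proba" is its sigmoid.
raw_binary = 0.0
proba_binary = sigmoid(raw_binary)

# Multiclass classification: "raw" is a log-softmax vector,
# "proba" is its elementwise exponential, "class" is the argmax.
raw_multi = log_softmax([2.0, 1.0, 0.0])
proba_multi = [math.exp(r) for r in raw_multi]
predicted_class = max(range(len(proba_multi)), key=proba_multi.__getitem__)
print(proba_binary, proba_multi, predicted_class)
```

Each prediction type yields one or more predicted dimensions, and the returned list holds one drift measure per dimension.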
- get_tree_based_correction_weights(max_depth: Optional[int] = None, max_ratio: int = 10) → array
Compute correction weights for data drift. This approach is not recommended and is provided for research purposes only; AdversarialDriftExplainer should be preferred. It reuses ideas similar to those in get_tree_based_drift_importances to estimate correction weights, but first experiments show that it performs poorly.
Parameters
- max_depth: int, optional (default=None)
Depth at which the ratios of node weights are computed. If None, the ratios are computed in the terminal leaves.
- max_ratio: int, optional (default=10)
Maximum ratio between two weights returned in correction_weights (weights are thresholded so that the ratio between any two weights does not exceed max_ratio).
Returns
- correction_weights: np.array
Array of correction weights for the samples of dataset 1.
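The max_ratio constraint can be illustrated with a minimal sketch. The clipping scheme below (a band centered on 1) is an assumption chosen for illustration: the documentation only states that the ratio between any two returned weights does not exceed max_ratio, not how the thresholding is done. The helper clip_weight_ratio is hypothetical.

```python
import math
from typing import List


def clip_weight_ratio(weights: List[float], max_ratio: float = 10.0) -> List[float]:
    """Clip positive weights into [1/sqrt(max_ratio), sqrt(max_ratio)].

    Assumed scheme, for illustration only: any two clipped weights then
    differ by a factor of at most max_ratio.
    """
    if max_ratio < 1:
        raise ValueError("max_ratio must be at least 1")
    upper = math.sqrt(max_ratio)
    lower = 1.0 / upper
    return [min(max(w, lower), upper) for w in weights]


raw_weights = [0.01, 0.5, 1.0, 4.0, 250.0]
clipped = clip_weight_ratio(raw_weights, max_ratio=10)
print(clipped, max(clipped) / min(clipped))
```

Weights already inside the band are left unchanged, while extreme values are pulled back to the nearest bound.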
- get_tree_based_drift_importances(type: str = 'mean') → array
Compute drift values using the tree structures present in the model.
See the README, especially the slide presentation, for an explanation of how these values are computed.
Parameters
- type: str, optional (default=“mean”)
Method used for drift values computation. Choose among:
    - “node_size” (recommended)
    - “mean”
    - “mean_norm”
See details in slide presentation.
Returns
- drift_importances: numpy array