4.2. Permutation feature importance

Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled [1]. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature. This technique benefits from being model agnostic and can be calculated many times with different permutations of the feature.
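
As a concrete illustration of this definition, the following sketch shuffles one validation column at a time and measures the resulting drop in score (the synthetic dataset and linear model are arbitrary choices for illustration):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Synthetic data: only 2 of the 5 features carry signal.
    X, y = make_regression(n_samples=500, n_features=5, n_informative=2,
                           random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    rng = np.random.default_rng(0)
    baseline = model.score(X_val, y_val)  # R^2 on the held-out set
    for j in range(X_val.shape[1]):
        X_perm = X_val.copy()
        # Shuffling one column breaks its relationship with the target.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        print(f"feature {j}: score drop = "
              f"{baseline - model.score(X_perm, y_val):.3f}")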

The permutation_importance function calculates the feature importance of estimators for a given dataset. The n_repeats parameter sets the number of times a feature is randomly shuffled, yielding a sample of feature importances. Permutation importances can be computed either on the training set or on a held-out testing or validation set. Using a held-out set makes it possible to highlight which features contribute the most to the generalization power of the inspected model. Features that are important on the training set but not on the held-out set might cause the model to overfit.
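
A minimal usage sketch on a held-out validation set might look as follows (the diabetes dataset and Ridge estimator are illustrative choices):

    from sklearn.datasets import load_diabetes
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = Ridge(alpha=1e-2).fit(X_train, y_train)

    # Each feature is shuffled n_repeats times; the result holds the full
    # sample of score decreases plus their mean and standard deviation.
    result = permutation_importance(model, X_val, y_val, n_repeats=30,
                                    random_state=0)
    for i in result.importances_mean.argsort()[::-1]:
        print(f"feature {i}: {result.importances_mean[i]:.3f} "
              f"+/- {result.importances_std[i]:.3f}")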

Note that features that are deemed non-important for some model with a low predictive performance could be highly predictive for a model that generalizes better. The conclusions should always be drawn in the context of the specific model under inspection and cannot be automatically generalized to the intrinsic predictive value of the features by themselves. Therefore it is always important to evaluate the predictive power of a model using a held-out set (or better, with cross-validation) prior to computing importances.
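
A quick generalization check with cross-validation could precede the importance computation; the sketch below (dataset and estimator are illustrative choices) shows the kind of sanity check meant here:

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)

    # Check generalization first: permutation importances of a model with
    # near-zero predictive power carry little information.
    scores = cross_val_score(Ridge(alpha=1e-2), X, y, cv=5)
    print(f"mean cross-validated R^2: {scores.mean():.3f} "
          f"+/- {scores.std():.3f}")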

4.2.1. Relation to impurity-based importance in trees

Tree-based models provide a different measure of feature importances based on the mean decrease in impurity (MDI, the splitting criterion). This can give importance to features that may not be predictive on unseen data. Permutation feature importance avoids this issue, since it can be computed on unseen data. Furthermore, impurity-based feature importances for trees are strongly biased and favor high-cardinality features (typically numerical features). Permutation-based feature importances do not exhibit such a bias. Additionally, permutation feature importance may use an arbitrary metric on the tree's predictions. These two methods of obtaining feature importance are explored in: Permutation Importance vs Random Forest Feature Importance (MDI).
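
The contrast can be seen by computing both quantities for the same forest. The sketch below (dataset, hyperparameters and metric are illustrative choices) reads the impurity-based importances from feature_importances_ and computes permutation importances on held-out data with a custom metric:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Impurity-based (MDI) importances: derived from the training data only.
    mdi = rf.feature_importances_

    # Permutation importances: computed on held-out data, here with ROC AUC
    # instead of the default accuracy.
    perm = permutation_importance(rf, X_val, y_val, scoring="roc_auc",
                                  n_repeats=10, random_state=0)
    print("top MDI feature:", mdi.argmax())
    print("top permutation feature:", perm.importances_mean.argmax())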

4.2.2. Strongly correlated features

When two features are correlated and one of the features is permuted, the model still has access to the same information through its correlated feature. This results in a lower reported importance for both features, even though they might actually be important. One way to handle this is to cluster features that are correlated and only keep one feature from each cluster. This use case is explored in: Permutation Importance with Multicollinear or Correlated Features.
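
One possible sketch of this strategy uses hierarchical (Ward) clustering on Spearman rank correlations, as in the referenced example (the dataset and distance threshold below are illustrative choices):

    import numpy as np
    from scipy.cluster import hierarchy
    from scipy.spatial.distance import squareform
    from scipy.stats import spearmanr
    from sklearn.datasets import load_breast_cancer

    X, _ = load_breast_cancer(return_X_y=True)

    # Turn Spearman rank correlations into a distance matrix.
    corr = spearmanr(X).correlation
    corr = (corr + corr.T) / 2      # enforce exact symmetry
    np.fill_diagonal(corr, 1)
    dist = 1 - np.abs(corr)

    # Ward clustering on the condensed distance matrix.
    linkage = hierarchy.ward(squareform(dist))

    # t=1 is an arbitrary threshold for this illustration; tune per dataset.
    cluster_ids = hierarchy.fcluster(linkage, t=1, criterion="distance")
    keep = [np.where(cluster_ids == c)[0][0] for c in np.unique(cluster_ids)]
    print(f"keeping {len(keep)} of {X.shape[1]} features")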

Examples:

- Permutation Importance vs Random Forest Feature Importance (MDI)
- Permutation Importance with Multicollinear or Correlated Features

References:

[1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.