In this article, I compare model explainability techniques for feature interactions. In a surprising twist, two commonly used tools, SHAP and ALE, produce opposing results.
Perhaps I should not have been surprised. After all, explainability tools measure specific responses in distinct ways, and interpreting them requires an understanding of the test methodology, the data characteristics, and the problem context. Just because something is called an explainer doesn’t mean it generates an explanation, at least not if you define an explanation as a human understanding how a model works.
This post focuses on explainability techniques for feature interactions. I use a common project dataset derived from real loans [1] and a typical model type (a boosted tree model). Even in this everyday situation, explanations require thoughtful interpretation.
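For concreteness, here is roughly what that setup looks like in code. The file name, the column names, the target, and the XGBoost hyperparameters are illustrative placeholders rather than the exact configuration I used:

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical loan-level features and target; the real dataset [1] has its own schema.
loans = pd.read_csv("loans.csv")  # placeholder file name
X = loans[["loan_amount", "interest_rate", "income", "debt_to_income"]]
y = loans["defaulted"]  # binary target: 1 if the loan defaulted

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A typical boosted tree model; the hyperparameters are placeholders.
model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)
```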
If methodology details are overlooked, explainability tools can impede understanding or even undermine efforts to ensure model fairness.
Below, I show disparate SHAP and ALE curves and demonstrate that the disagreement between the techniques arises from differences in the responses they measure and the feature perturbations they perform. But first, I’ll introduce some concepts.
Feature interactions occur when two variables act in concert, resulting in an effect that is different from the sum of their individual contributions. For example, the impact of a poor night’s sleep on a test score would be greater the next day than a week later. In this case, a feature representing time would interact with, or modify, a sleep quality feature.
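To make that concrete, here is a tiny made-up scoring function in which the sleep effect depends on timing. The product term in the code is exactly the kind of interaction I mean; all of the numbers are invented purely for illustration:

```python
# Made-up scoring function: the effect of a poor night's sleep on a test score
# fades as the test moves further away. All numbers are invented.
def expected_score(poor_sleep: int, days_until_test: int) -> float:
    """poor_sleep is 1 for a bad night of sleep, 0 otherwise."""
    base = 80.0
    main_effect = -10.0 * poor_sleep                  # sleep effect on its own
    interaction = 1.2 * poor_sleep * days_until_test  # recovery grows with time
    return base + main_effect + min(interaction, 10.0 * poor_sleep)

print(expected_score(poor_sleep=1, days_until_test=1))  # 71.2: big hit the next day
print(expected_score(poor_sleep=1, days_until_test=7))  # 78.4: much smaller a week later
print(expected_score(poor_sleep=0, days_until_test=7))  # 80.0: no sleep effect at all
```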
In a linear model, an interaction is expressed as the product of two features. Nonlinear machine learning models typically contain numerous interactions. In fact, interactions are fundamental to the logic of advanced machine learning models, yet many common explainability techniques focus on the contributions of isolated features. Methods for examining interactions include 2-way ALE plots, Friedman’s H, partial dependence plots, and SHAP interaction values [2]. This blog explores two of them: SHAP and ALE.
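As a rough sketch of how two of these methods (SHAP interaction values and a 2-way ALE plot) can be queried in code, continuing from the model above: this assumes the shap and PyALE packages, and the hypothetical column names and exact arguments should be checked against each library’s documentation.

```python
import shap
from PyALE import ale

# SHAP interaction values: for tree ensembles, TreeExplainer can decompose each
# prediction into per-feature and per-feature-pair contributions.
explainer = shap.TreeExplainer(model)
interaction_values = explainer.shap_interaction_values(X_test)
# Typically an array of shape (n_samples, n_features, n_features); the
# off-diagonal entries attribute part of each prediction to a feature pair.


class ProbWrapper:
    """PyALE calls model.predict, so expose the positive-class probability there."""

    def __init__(self, clf):
        self.clf = clf

    def predict(self, X):
        return self.clf.predict_proba(X)[:, 1]


# 2-way ALE for a (hypothetical) pair of features.
ale_2d = ale(
    X=X_test,
    model=ProbWrapper(model),
    feature=["interest_rate", "debt_to_income"],
    grid_size=50,
)
```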