SHAP & LIME
Most production AI models are black boxes: a set of inputs goes in, a prediction comes out, and the intermediate reasoning is not directly observable. Post-hoc explainability methods — applied after the model is trained — attempt to explain individual predictions without requiring access to or modification of the model architecture. SHAP and LIME are the two most widely used post-hoc methods and represent different philosophical approaches to the problem.
LIME — Local Interpretable Model-Agnostic Explanations
LIME (Ribeiro et al., 2016) generates an explanation for a single prediction by fitting a simple, interpretable model (typically linear regression) to locally sampled perturbations of the input. The intuition: even if the model is globally non-linear and complex, it can often be approximated by a linear function in the small neighbourhood around a specific input.
LIME algorithm
- Take the input instance x to be explained
- Generate a set of perturbed samples z₁, z₂, ..., zₙ in the neighbourhood of x (e.g., randomly mask features or tokens)
- Query the black-box model f for predictions on each perturbed sample: f(z₁), f(z₂), ..., f(zₙ)
- Weight each sample by its proximity to x (closer samples get higher weight)
- Fit a sparse linear model g to the weighted samples: g is the explanation
- The coefficients of g represent feature importances for this specific prediction
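The steps above can be sketched from scratch in a few lines (the `lime` library packages the same procedure with more care). The toy model, the perturbation scale, and the kernel width below are illustrative assumptions, not library defaults:

```python
# Minimal from-scratch sketch of the LIME procedure for a tabular model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy black-box model f: depends strongly on feature 0, weakly on feature 1
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)
f = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

x = X[0]                                          # instance to explain

# 1. Perturb: sample points in the neighbourhood of x
Z = x + rng.normal(scale=0.5, size=(1000, 4))
# 2. Query the black box on each perturbed sample
preds = f.predict(Z)
# 3. Weight samples by proximity to x (Gaussian kernel on distance)
dists = np.linalg.norm(Z - x, axis=1)
weights = np.exp(-dists**2 / 0.5)
# 4. Fit a regularised linear surrogate g to the weighted samples
g = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
# 5. Coefficients of g are the local feature importances
print(dict(enumerate(g.coef_.round(2))))
```

The surrogate's largest coefficient lands on feature 0, matching the toy model's true dominant feature; the irrelevant features 2 and 3 receive near-zero weight.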
LIME strengths
- Model-agnostic — works on any classifier or regressor
- Produces human-readable feature attribution (e.g., highlighted text spans, image superpixels)
- Fast for a single-instance explanation
- Easy to implement and widely supported in Python (lime library)
LIME limitations
- Instability: the random perturbation sampling can produce different explanations on repeated runs
- Local fidelity does not guarantee global consistency
- Neighbourhood definition (what counts as "close"?) is arbitrary and affects results
- No axiomatic guarantees: for example, attributions are not required to sum to the model's prediction
SHAP — SHapley Additive exPlanations
SHAP (Lundberg & Lee, 2017) is grounded in cooperative game theory. It assigns each feature a contribution value (Shapley value) that represents the average marginal contribution of that feature across all possible subsets of features. SHAP values satisfy a set of desirable mathematical properties that LIME does not guarantee.
SHAP axioms
- Efficiency: Feature contributions sum to the difference between the model output and the expected model output (f(x) − E[f(X)]). The full prediction is accounted for.
- Symmetry: Two features with identical marginal contributions to every feature subset receive equal SHAP values
- Dummy: A feature that has no influence on any prediction receives a SHAP value of 0
- Linearity / Additivity: For a model that is a sum of two sub-models, the SHAP values are the sum of the SHAP values for each sub-model
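To make the axioms concrete, here is a brute-force computation of exact Shapley values for a toy three-feature function (the function, instance, and baseline are invented for illustration). The efficiency and symmetry axioms can be verified directly on the result:

```python
# Exact Shapley values by enumerating all feature subsets (exponential cost).
from itertools import combinations
from math import factorial
import numpy as np

def f(x):                                  # toy black box: x0 + 2*x1*x2
    return x[0] + 2 * x[1] * x[2]

x = np.array([1.0, 2.0, 3.0])              # instance to explain
background = np.zeros(3)                   # baseline feature values

n = len(x)
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            # Output with features in S (and then S ∪ {i}) taken from x,
            # everything else from the baseline
            z = background.copy()
            z[list(S)] = x[list(S)]
            without_i = f(z)
            z[i] = x[i]
            with_i = f(z)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[i] += weight * (with_i - without_i)

# Efficiency: contributions sum to f(x) - f(baseline)
print(phi, phi.sum(), f(x) - f(background))
```

Features 1 and 2 play symmetric roles in the interaction term, so they receive equal values (symmetry), and the three contributions sum exactly to f(x) − f(baseline) (efficiency).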
Because computing exact Shapley values requires summing over all 2ⁿ feature subsets (exponential in the number of features), SHAP provides efficient approximations for specific model types:
| SHAP variant | Model type | Speed |
|---|---|---|
| TreeSHAP | Tree ensembles (XGBoost, LightGBM, Random Forest) | O(TLD²) for T trees, L leaves, depth D — very fast |
| DeepSHAP | Deep neural networks | Fast — backpropagation-based (builds on DeepLIFT) |
| LinearSHAP | Linear models (logistic regression) | Exact, very fast |
| KernelSHAP | Any model (model-agnostic) | Slow — uses LIME-style sampling |
SHAP vs LIME: Practical Comparison
| Dimension | SHAP | LIME |
|---|---|---|
| Mathematical guarantees | Yes — efficiency, symmetry, dummy, additivity | No formal guarantees |
| Consistency | Deterministic for tree/linear models | Stochastic — can vary between runs |
| Model support | Fast for trees/linear; slow (KernelSHAP) for arbitrary models | Fully model-agnostic; similar speed everywhere |
| Global summaries | Yes — beeswarm, bar, and waterfall plots across full dataset | Primarily local; no native global view |
| Interaction effects | SHAP interaction values available for trees | Not supported |
| Recommended for | Production explainability; regulatory audits; model debugging | Quick local explanations; NLP/text classification; prototyping |
Explainability for Large Language Models
Applying SHAP and LIME to LLMs is significantly more complex than for tabular or image models. The "features" in text are tokens, but token-level attribution is often not meaningful to non-technical audiences. Key approaches:
- Attention visualisation: Plots which input tokens the model attended to. Easy to implement but critiqued as not reliably representing causal importance — high attention does not guarantee causal contribution.
- SHAP for text classifiers: Works well for classification heads on top of LLMs (e.g., sentiment, intent). Token masking + prediction comparison produces token attributions.
- Gradient-based attribution: Integrated Gradients (Sundararajan et al., 2017) accumulates gradients of the output with respect to the input embeddings along a straight-line path from a baseline to the input — theoretically grounded and typically faster than sampling-based methods.
- Logit lens / probing: Analyses intermediate layer representations to understand what information is encoded at each layer of the transformer — more of a model debugging tool than a user-facing explanation.
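The token-masking idea behind SHAP-style text attribution can be sketched without any LLM machinery, using a toy scikit-learn classifier (the corpus, labels, and function name here are invented for illustration; a real setup would pair `shap` with a transformers pipeline):

```python
# Token-masking attribution for a text classifier: score each token by how
# much the positive-class probability drops when that token is removed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie loved it", "terrible film hated it",
         "loved the acting", "hated the plot",
         "great acting", "terrible plot"]
labels = [1, 0, 1, 0, 1, 0]                # 1 = positive sentiment
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

def token_attributions(sentence):
    """Attribution per token: drop in P(positive) when the token is masked."""
    tokens = sentence.split()
    base = clf.predict_proba([sentence])[0, 1]
    attributions = {}
    for i, tok in enumerate(tokens):
        masked = " ".join(tokens[:i] + tokens[i + 1:])   # drop one token
        attributions[tok] = base - clf.predict_proba([masked])[0, 1]
    return attributions

scores = token_attributions("loved the great acting")
print(scores)
```

Sentiment-bearing tokens like "loved" receive larger attributions than filler tokens like "the", which is exactly the kind of output that can be rendered as highlighted text spans for end users.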
Visualising Explanations for Non-Technical Audiences
SHAP visualisations (from the shap library)
- Waterfall plot: Shows how each feature pushes the prediction up or down from the base value — best for a single-instance explanation
- Beeswarm plot: Shows distribution of SHAP values for all features across the full dataset — best for global feature importance
- Dependence plot: Shows how SHAP value for one feature varies with feature value — surfaces non-linear relationships
Communication to stakeholders
- For regulators: "The three most influential factors in this credit decision were..."
- For affected individuals: highlight which inputs led to a negative outcome and whether any were within their control
- For developers: waterfall + dependence plots to debug unexpected feature usage
- For executives: bar chart of global feature importances, not per-instance detail
Checklist: Do You Understand This?
- What is the core idea behind LIME — what kind of model does it fit and to what data?
- Why does SHAP use game-theoretic Shapley values rather than simple gradient-based attribution?
- Name the four SHAP axioms and explain what the efficiency axiom means in practical terms.
- When would you choose LIME over SHAP, and when would you choose SHAP over LIME?
- What is the main criticism of attention visualisation as an explanation method for LLMs?
- Which SHAP visualisation would you use to explain a single prediction to a regulator?