🧠 All Things AI
Advanced

SHAP & LIME

Most production AI models are black boxes: a set of inputs goes in, a prediction comes out, and the intermediate reasoning is not directly observable. Post-hoc explainability methods — applied after the model is trained — attempt to explain individual predictions without requiring access to or modification of the model architecture. SHAP and LIME are the two most widely used post-hoc methods and represent different philosophical approaches to the problem.

LIME — Local Interpretable Model-Agnostic Explanations

LIME (Ribeiro et al., 2016) generates an explanation for a single prediction by fitting a simple, interpretable model (typically linear regression) to locally sampled perturbations of the input. The intuition: even if the model is globally non-linear and complex, it can often be approximated by a linear function in the small neighbourhood around a specific input.

LIME algorithm

  1. Take the input instance x to be explained
  2. Generate a set of perturbed samples z₁, z₂, ... zₙ in the neighbourhood of x (e.g., randomly mask features or tokens)
  3. Query the black-box model f for predictions on each perturbed sample: f(z₁), f(z₂), ..., f(zₙ)
  4. Weight each sample by its proximity to x (closer samples get higher weight)
  5. Fit a sparse linear model g to the weighted samples: g is the explanation
  6. The coefficients of g represent feature importances for this specific prediction
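The six steps above can be sketched from scratch in a few lines of NumPy. This is an illustrative toy version, not the lime library: the black-box model f, the kernel width, and the sample count are all made-up choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy black-box model: globally non-linear in its two features.
def f(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

x = np.array([0.5, -0.2])                 # step 1: instance to explain

# Step 2: perturb in the neighbourhood of x.
Z = x + rng.normal(scale=0.1, size=(500, 2))

# Step 3: query the black box on each perturbed sample.
y = f(Z)

# Step 4: proximity weights (RBF kernel around x; width is a free choice).
dist2 = ((Z - x) ** 2).sum(axis=1)
w = np.exp(-dist2 / 0.1 ** 2)

# Step 5: fit a weighted linear surrogate g(z) = b0 + b . z.
A = np.hstack([np.ones((len(Z), 1)), Z])   # intercept column + features
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)

# Step 6: the coefficients are the local feature attributions. Near x they
# should track the true local gradient: 3*cos(1.5) ~ 0.21 and 2*(-0.2) = -0.4.
print(coef[1:])
```

Note how the explanation is only locally faithful: the sine term is approximated by its tangent slope at x, which says nothing about the model's behaviour elsewhere.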

LIME strengths

  • Model-agnostic — works on any classifier or regressor
  • Produces human-readable feature attribution (e.g., highlighted text spans, image superpixels)
  • Fast for a single instance explanation
  • Easy to implement and widely supported in Python (lime library)

LIME limitations

  • Instability: the random perturbation sampling can produce different explanations on repeated runs
  • Local fidelity does not guarantee global consistency
  • Neighbourhood definition (what counts as "close"?) is arbitrary and affects results
  • Provides no axiomatic guarantees: attributions are not required to sum to the prediction, and consistency is not guaranteed

SHAP — SHapley Additive exPlanations

SHAP (Lundberg & Lee, 2017) is grounded in cooperative game theory. It assigns each feature a contribution value (Shapley value) that represents the average marginal contribution of that feature across all possible subsets of features. SHAP values satisfy a set of desirable mathematical properties that LIME does not guarantee.

SHAP axioms

  • Efficiency: Feature contributions sum to the difference between the model output and the expected model output (f(x) − E[f(X)]). The full prediction is accounted for.
  • Symmetry: Two features that contribute equally to the model receive equal SHAP values, regardless of order
  • Dummy: A feature that has no influence on any prediction receives a SHAP value of 0
  • Linearity / Additivity: For a model that is a sum of two sub-models, the SHAP values are the sum of the SHAP values for each sub-model
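For a handful of features, exact Shapley values can be enumerated directly, which also makes the efficiency axiom easy to verify. A from-scratch sketch with a toy three-feature model; absent features are replaced by the background-data mean, one common convention (under which contributions sum to f(x) − f(E[X])):

```python
import itertools
import math

import numpy as np

# Toy model over three features; one interaction term (x1 * x2).
def f(x):
    return 2.0 * x[0] + x[1] * x[2]

# Tiny "background" dataset whose mean stands in for absent features.
background = np.array([[0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0]])
baseline = background.mean(axis=0)
x = np.array([1.0, 2.0, 3.0])              # instance to explain
n = len(x)

def value(subset):
    """Model output when only features in `subset` take their value from x."""
    z = baseline.copy()
    for i in subset:
        z[i] = x[i]
    return f(z)

# Shapley value of feature i: average marginal contribution over all
# subsets S not containing i, weighted by |S|! (n - |S| - 1)! / n!.
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
        for S in itertools.combinations(others, size):
            phi[i] += weight * (value(S + (i,)) - value(S))

# Efficiency axiom: the attributions account for the full prediction gap.
print(phi, phi.sum(), f(x) - f(baseline))
```

The linear feature x0 gets exactly its standalone contribution (phi[0] = 1.0), while the interacting features x1 and x2 split the credit for the product term; their sum still matches f(x) − f(baseline) exactly.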

Because computing exact Shapley values requires summing over all 2ⁿ feature subsets (exponential in the number of features), SHAP provides efficient approximations for specific model types:

SHAP variant | Model type                                        | Speed
TreeSHAP     | Tree ensembles (XGBoost, LightGBM, Random Forest) | O(TLD²) — very fast
DeepSHAP     | Deep neural networks                              | Fast — uses backpropagation
LinearSHAP   | Linear models (logistic regression)               | Exact, very fast
KernelSHAP   | Any model (model-agnostic)                        | Slow — uses LIME-style sampling

SHAP vs LIME: Practical Comparison

Dimension               | SHAP                                                           | LIME
Mathematical guarantees | Yes — efficiency, symmetry, dummy, additivity                  | No formal guarantees
Consistency             | Deterministic for tree/linear models                           | Stochastic — can vary between runs
Model support           | Fast for trees/linear; slow (KernelSHAP) for arbitrary models  | Fully model-agnostic; similar speed everywhere
Global summaries        | Yes — beeswarm, bar, and waterfall plots across full dataset   | Primarily local; no native global view
Interaction effects     | SHAP interaction values available for trees                    | Not supported
Recommended for         | Production explainability; regulatory audits; model debugging  | Quick local explanations; NLP/text classification; prototyping

Explainability for Large Language Models

Applying SHAP and LIME to LLMs is significantly more complex than for tabular or image models. The "features" in text are tokens, but token-level attribution is often not meaningful to non-technical audiences. Key approaches:

  • Attention visualisation: Plots which input tokens the model attended to. Easy to implement but critiqued as not reliably representing causal importance — high attention does not guarantee causal contribution.
  • SHAP for text classifiers: Works well for classification heads on top of LLMs (e.g., sentiment, intent). Token masking + prediction comparison produces token attributions.
  • Gradient-based attribution: Integrated Gradients (Sundararajan et al., 2017) computes attribution by integrating the gradient of the output with respect to the input embeddings along a straight-line path from a baseline to the input. It is theoretically grounded (satisfying a completeness axiom analogous to SHAP's efficiency) and faster than sampling-based methods.
  • Logit lens / probing: Analyses intermediate layer representations to understand what information is encoded at each layer of the transformer — more of a model debugging tool than a user-facing explanation.
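Integrated Gradients can be sketched without autodiff for a model whose gradient is known in closed form. Everything here is a toy assumption: a logistic model standing in for a network, an all-zeros baseline, and a 50-step midpoint Riemann sum approximating the path integral.

```python
import numpy as np

# Toy differentiable model: f(x) = sigmoid(w . x), with analytic gradient.
w = np.array([1.5, -2.0, 0.5])

def f(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def grad_f(x):
    s = f(x)
    return s * (1.0 - s) * w              # derivative of sigmoid(w . x)

x = np.array([1.0, 0.5, 1.0])             # input to attribute
baseline = np.zeros_like(x)               # common "no signal" baseline

# IG_i = (x_i - b_i) * integral over alpha in [0, 1] of
# df/dx_i evaluated at b + alpha * (x - b), here a midpoint Riemann sum.
steps = 50
alphas = (np.arange(steps) + 0.5) / steps
avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
ig = (x - baseline) * avg_grad

# Completeness: attributions sum to f(x) - f(baseline).
print(ig, ig.sum(), f(x) - f(baseline))
```

The second feature receives a negative attribution (positive input, negative weight), and the attributions jointly account for the prediction shift away from the baseline, which is what makes IG "theoretically grounded" in a way raw attention maps are not.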

Visualising Explanations for Non-Technical Audiences

SHAP visualisations (from shap library)

  • Waterfall plot: Shows how each feature pushes the prediction up or down from the base value — best for a single instance explanation
  • Beeswarm plot: Shows distribution of SHAP values for all features across the full dataset — best for global feature importance
  • Dependence plot: Shows how SHAP value for one feature varies with feature value — surfaces non-linear relationships

Communication to stakeholders

  • For regulators: "The three most influential factors in this credit decision were..."
  • For affected individuals: highlight which inputs led to a negative outcome and whether any were within their control
  • For developers: waterfall + dependence plots to debug unexpected feature usage
  • For executives: bar chart of global feature importances, not per-instance detail

Checklist: Do You Understand This?

  • What is the core idea behind LIME — what kind of model does it fit and to what data?
  • Why does SHAP use game-theoretic Shapley values rather than simple gradient-based attribution?
  • Name the four SHAP axioms and explain what the efficiency axiom means in practical terms.
  • When would you choose LIME over SHAP, and when would you choose SHAP over LIME?
  • What is the main criticism of attention visualisation as an explanation method for LLMs?
  • Which SHAP visualisation would you use to explain a single prediction to a regulator?