09. June 2024

Integrated gradients

Interpreting the black box of deep neural networks

Introduction

Interpretability has become an essential concern in machine learning, especially for deep learning models, which often operate as black boxes. Integrated gradients is one of the most prominent methods developed to address this need.

What are Integrated Gradients?

Integrated gradients is a technique for attributing the prediction of a neural network to its input features. It is designed to be used with differentiable models, particularly neural networks, to understand how each input feature contributes to the output prediction. The method was proposed by Mukund Sundararajan, Ankur Taly and Qiqi Yan in their 2017 paper, "Axiomatic Attribution for Deep Networks".

The Concept Behind Integrated Gradients

Integrated gradients aims to attribute a model's prediction to its input features in a way that satisfies certain axioms. These axioms include:

✔ Sensitivity

If the input and the baseline differ in a single feature and the model's predictions for them differ, that feature must receive a non-zero attribution.

✔ Implementation Invariance

Two models that are functionally equivalent should produce the same attributions for the same input.

To achieve these goals, integrated gradients computes the integral of gradients of the model’s output with respect to the input along a straight path from a baseline input to the actual input.
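
Beyond these two axioms, the construction has a useful completeness property, proved in the original paper: the attributions sum exactly to the difference between the model's output at the input and at the baseline,
\[ \sum_i IG_i(x) = F(x) - F(x_{\text{baseline}}) \]
This gives the attributions a natural accounting interpretation and provides a built-in sanity check for implementations.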

Methodology

The methodology of integrated gradients can be broken down into the following steps (a runnable sketch of the whole procedure follows the list):

  1. Baseline Selection

    The choice of baseline is crucial to the effectiveness of integrated gradients. The baseline is typically a neutral input, such as a zero vector (an all-black image, in the case of image models) or an input with every feature set to its mean or median value. Because attributions are measured relative to the baseline, this choice can significantly influence the results.

  2. Path Integration

    Consider a straight-line path from the baseline to the input. Mathematically, this can be represented as:
    \[ x'(\alpha) = x_{\text{baseline}} + \alpha \, (x - x_{\text{baseline}}), \quad \alpha \in [0, 1] \] where \(x\) is the actual input and \(x_{\text{baseline}}\) is the baseline input.

  3. Gradient Computation

    For each point along the path defined by the above equation, the gradient of the model's output with respect to the input is computed. These gradients provide insights into how sensitive the model's output is to changes in each input feature at different points along the path.

  4. Integral Calculation

    Integrate these gradients over the interval from 0 to 1. The integrated gradient for each input feature \(i\) is given by:
    \[ IG_i(x) = (x_i - x_{\text{baseline},i}) \int_{0}^{1} \frac{\partial F(x'(\alpha))}{\partial x_i} d\alpha \] where \(F\) is the model's output function. This integral essentially accumulates the contributions of each input feature to the model's output along the specified path.
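
Below is a minimal sketch of these four steps in PyTorch. The function name, tensor shapes, and the assumption that the model returns a matrix of per-class scores are illustrative choices, not prescribed by the original paper:

    import torch

    def integrated_gradients(model, x, baseline, target, steps=50):
        """Approximate the integrated gradients of `model` for one input.

        model    -- differentiable PyTorch module returning (batch, classes) scores
        x        -- input tensor of shape (1, num_features)
        baseline -- baseline tensor with the same shape as x (step 1)
        target   -- index of the output class to attribute
        steps    -- number of interpolation points along the path
        """
        grads = []
        for alpha in torch.linspace(0.0, 1.0, steps):
            # Step 2: a point x'(alpha) on the straight-line path.
            x_alpha = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            # Step 3: gradient of the target score w.r.t. the input at x'(alpha).
            score = model(x_alpha)[0, target]
            grads.append(torch.autograd.grad(score, x_alpha)[0])
        # Step 4: average the gradients (a Riemann approximation of the integral)
        # and scale by the input-baseline difference.
        return (x - baseline) * torch.stack(grads).mean(dim=0)

Thanks to the completeness property noted earlier, the returned attributions should sum approximately to \(F(x) - F(x_{\text{baseline}})\) for the target class, which is a convenient way to test an implementation.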

Computational Considerations
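
In practice, the integral cannot be evaluated in closed form for a neural network, so it is approximated numerically with a Riemann sum over \(m\) points along the path:
\[ IG_i(x) \approx (x_i - x_{\text{baseline},i}) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(x'(k/m))}{\partial x_i} \]
Each term requires one forward and one backward pass, so the cost grows linearly with \(m\). The original paper reports that between 20 and 300 steps is usually sufficient, and suggests using the completeness property as a check: if the attributions do not sum close to \(F(x) - F(x_{\text{baseline}})\), increase \(m\).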

Practical Implementation

✔ Libraries and Tools

Several machine learning libraries provide implementations of integrated gradients, making the technique easy to apply in practice. For example:

  - Captum, the model-interpretability library for PyTorch, ships an IntegratedGradients attribution class.
  - Alibi provides an IntegratedGradients explainer for TensorFlow/Keras models.
  - The official TensorFlow tutorials include a step-by-step implementation of the method.
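
As a brief illustration, the following sketch applies Captum's IntegratedGradients to a toy PyTorch classifier; the model architecture and tensor shapes are placeholders:

    import torch
    from captum.attr import IntegratedGradients

    # Placeholder model: any differentiable torch.nn.Module will do.
    model = torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.ReLU(),
        torch.nn.Linear(8, 3),
    )
    model.eval()

    x = torch.rand(1, 4)          # actual input
    baseline = torch.zeros(1, 4)  # zero-vector baseline

    ig = IntegratedGradients(model)
    # n_steps controls the resolution of the numerical approximation.
    attributions, delta = ig.attribute(
        x, baselines=baseline, target=0, n_steps=50,
        return_convergence_delta=True,
    )
    print(attributions)  # per-feature contributions to the class-0 score
    print(delta)         # convergence error implied by the completeness property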

✔ Use Cases

Integrated gradients can be used in various scenarios, such as:

  - identifying the pixels that drive an image classifier's prediction;
  - highlighting the words or tokens most responsible for a text model's output;
  - debugging unexpected model behavior by inspecting per-feature attributions;
  - supporting model review in high-stakes domains such as medical imaging.

Together, these tools and applications make integrated gradients a practical way to improve the interpretability of deep learning models.

Advantages and Limitations

✔ Advantages

  - Strong theoretical grounding: the method satisfies sensitivity and implementation invariance by construction, and the completeness property makes implementations easy to verify.
  - Broad applicability: it works with any differentiable model and needs only the gradient machinery that standard frameworks already provide.
  - The model itself requires no modification or retraining.

✔ Limitations

  - The attributions depend strongly on the choice of baseline, and no single baseline is correct for every task.
  - The path integral must be approximated numerically, so the cost grows with the number of interpolation steps.
  - The method applies only to differentiable models.
  - Attributions can be noisy in practice, which has motivated variants such as guided integrated gradients.

References

  1. (Article) Axiomatic Attribution for Deep Networks, Mukund Sundararajan, Ankur Taly, Qiqi Yan | Website
  2. (Article) Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study, A. Jha, J. K. Aicher, M. R. Gazzara | Website
  3. (Article) Guided Integrated Gradients: an Adaptive Path Method for Removing Noise, A. Kapishnikov, S. Venugopalan, B. Avci, B. Wedin, M. Terry, T. Bolukbasi | Website
  4. (Article) A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions, Daniel Lundstrom, Tianjian Huang, Meisam Razaviyayn | Website