Sobol sensitivity analysis

What it is

A global sensitivity analysis of the per-query estimate, run on each model × region combination. We use SALib (with the Saltelli sampler) to identify which input parameters dominate the variance of the estimate; the full Sobol analysis runs quarterly. Results are published in the methodology repo and inform where to focus future calibration effort.

Method

We run two flavours, in order of computational cost:

  1. Morris elementary effects (cheap; weekly): coarse ranking of which inputs matter
  2. Sobol indices (expensive; quarterly): variance decomposition with first-order and total-order indices
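The cheap screening step can be illustrated with a minimal sketch of elementary effects. This is a simplified one-at-a-time variant written with the standard library only, not the SALib Morris routines and not the production code; the function name `elementary_effects`, the toy model and its inputs `a`, `b`, `c` are all hypothetical, and inputs are assumed to be scaled to the unit interval.

```python
import random
import statistics


def elementary_effects(model, names, repetitions=20, levels=4, seed=0):
    """Return {name: (mu_star, sigma)} for inputs scaled to the unit interval."""
    rng = random.Random(seed)
    delta = levels / (2 * (levels - 1))  # standard Morris step size
    # Base levels from which a +delta step stays inside [0, 1].
    grid = [k / (levels - 1) for k in range(levels)]
    base_levels = [g for g in grid if g + delta <= 1 + 1e-12]
    effects = {name: [] for name in names}
    for _ in range(repetitions):
        base = {name: rng.choice(base_levels) for name in names}
        y_base = model(base)
        for name in names:
            stepped = dict(base)
            stepped[name] += delta  # move one input, keep the rest fixed
            effects[name].append((model(stepped) - y_base) / delta)
    return {
        name: (statistics.fmean([abs(e) for e in ee]), statistics.pstdev(ee))
        for name, ee in effects.items()
    }


def toy_model(x):
    # Hypothetical stand-in for the estimator: one strong input, one weak interaction.
    return 10 * x["a"] + x["b"] * x["c"]


morris = elementary_effects(toy_model, ["a", "b", "c"])
ranking = sorted(morris, key=lambda name: morris[name][0], reverse=True)
print(ranking)
```

Here `mu_star` (the mean absolute elementary effect) gives the coarse ranking, and a non-zero `sigma` hints at non-linearity or interactions. Each repetition costs one model run per input plus one for the base point, which is why this kind of screening is affordable to run often.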

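Sobol's first-order index S_i for input i is the share of output variance attributable to varying input i alone, averaged over the distributions of the other inputs. The total-order index ST_i additionally captures all interaction effects involving input i, so ST_i ≥ S_i, and the gap between the two indicates how much of input i's influence comes through interactions.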
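To make the two indices concrete, the following is a minimal sketch of the estimators described in Saltelli et al. (2010): the first-order estimator from that paper and the Jansen form for the total-order index. It is an illustration under stated assumptions rather than the contents of sobol.py: it uses plain pseudo-random draws from the standard library instead of SALib's quasi-random Saltelli sampler, and the toy model, its inputs `x1`, `x2`, `x3` and the function name `sobol_indices` are hypothetical.

```python
import random
import statistics


def sobol_indices(model, samplers, n=20000, seed=0):
    """Estimate {name: (S_i, ST_i)} from two independent sample matrices A and B."""
    rng = random.Random(seed)
    names = list(samplers)

    def draw_row():
        return [samplers[name](rng) for name in names]

    def evaluate(row):
        return model(dict(zip(names, row)))

    a_rows = [draw_row() for _ in range(n)]
    b_rows = [draw_row() for _ in range(n)]
    f_a = [evaluate(row) for row in a_rows]
    f_b = [evaluate(row) for row in b_rows]
    var_y = statistics.pvariance(f_a + f_b)

    indices = {}
    for i, name in enumerate(names):
        # A_B^(i): every row of A with column i swapped in from B.
        f_ab = [evaluate(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(a_rows, b_rows)]
        # First-order: mean of f(B) * (f(A_B^(i)) - f(A)), divided by Var(Y).
        s1 = statistics.fmean(
            [fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)]
        ) / var_y
        # Total-order (Jansen): half the mean squared change when only input i moves.
        st = statistics.fmean(
            [(fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)]
        ) / (2 * var_y)
        indices[name] = (s1, st)
    return indices


def toy_model(x):
    # Hypothetical model: x1 dominates, x3 matters only through its interaction with x1.
    return 3 * x["x1"] + x["x2"] + x["x1"] * x["x3"]


standard_normal = lambda rng: rng.gauss(0.0, 1.0)
indices = sobol_indices(toy_model, {name: standard_normal for name in ("x1", "x2", "x3")})
for name, (s1, st) in indices.items():
    print(f"{name}: S_i={s1:.2f}  ST_i={st:.2f}")
```

For this toy model the exact values are S = (9/11, 1/11, 0) and ST = (10/11, 1/11, 1/11), so `x3` shows up only in the total-order index. The sketch needs n × (k + 2) model runs for k inputs, which is the cost that makes the Sobol run the expensive one of the two flavours.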

Inputs we vary

For tier-2 estimates, the sensitivity sweep varies the following input parameters (GSD is the geometric standard deviation of the log-normal distribution):

| Input | Distribution | Notes |
| --- | --- | --- |
| GPU energy per token | Log-normal, GSD from pedigree | Dominant for most workloads |
| Host CPU/DRAM share | Log-normal | Material at low batch sizes |
| PUE | Log-normal, GSD 1.05 | Modest contribution |
| Embodied amortisation | Log-normal, GSD 1.50 (high uncertainty) | Material at long context |
| Grid intensity | Log-normal, GSD 1.10 (live data) | Material in high-carbon regions |
| Batch size estimate | Log-normal | Less material; usually well-estimated |
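As a rough guide to what these GSD values mean, a log-normal with median m and geometric standard deviation g has ln(X) normally distributed with mean ln(m) and standard deviation ln(g). The sketch below converts the GSDs stated in the table into those ln-space parameters and into a central 95% interval. The medians are placeholders (1.0 stands for "the central estimate"), the helper names are hypothetical, and inputs whose GSD is not stated in the table are left out rather than guessed.

```python
import math
from statistics import NormalDist

# GSDs as stated in the table above; inputs without a stated GSD are left out.
STATED_GSD = {
    "PUE": 1.05,
    "Embodied amortisation": 1.50,
    "Grid intensity": 1.10,
}


def lognormal_params(median, gsd):
    """Return (mu, sigma) of ln(X) for a log-normal with this median and GSD."""
    return math.log(median), math.log(gsd)


def central_interval(median, gsd, coverage=0.95):
    """Return the (low, high) bounds of the central interval of the log-normal."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return median / gsd ** z, median * gsd ** z


# Medians are placeholders (1.0 = "the central estimate"), not published values.
for name, gsd in STATED_GSD.items():
    mu, sigma = lognormal_params(1.0, gsd)
    low, high = central_interval(1.0, gsd)
    print(f"{name}: sigma={sigma:.3f}, 95% interval x{low:.2f} to x{high:.2f}")
```

A GSD of 1.50 corresponds to a 95% interval of roughly 0.45 to 2.2 times the median, whereas a GSD of 1.05 stays within about ±10%. As far as we understand the SALib documentation, its `lognorm` distribution takes the ln-space mean and standard deviation as its two bounds, so (mu, sigma) pairs like these would be the values to pass; that convention is worth checking against the installed SALib version.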

Typical results

For Mistral Medium 3 on Scaleway PAR-1, Q1 2026:

| Input | First-order S_i | Total-order ST_i |
| --- | --- | --- |
| GPU energy per token | 0.52 | 0.61 |
| Grid intensity | 0.18 | 0.21 |
| Embodied amortisation | 0.12 | 0.18 |
| Host CPU/DRAM share | 0.06 | 0.09 |
| PUE | 0.05 | 0.07 |
| Batch size | 0.04 | 0.05 |
| (interactions) | (residual) | |

GPU energy per token dominates, as expected. Grid intensity is the second-largest contributor; in atNorth regions, where grid intensity varies less, it drops below embodied amortisation. The interaction term is small (≈ 5%; the rounded first-order indices above sum to 0.97), so the inputs act largely additively on the estimate.
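The reading of the table can be reproduced with a few lines of arithmetic. The sketch below copies the rounded values from the table, computes the first-order sum and the residual left for interactions, and lists the per-input gap ST_i − S_i; the variable names are illustrative only.

```python
# (S_i, ST_i) pairs copied from the table above (rounded published values).
RESULTS = {
    "GPU energy per token": (0.52, 0.61),
    "Grid intensity": (0.18, 0.21),
    "Embodied amortisation": (0.12, 0.18),
    "Host CPU/DRAM share": (0.06, 0.09),
    "PUE": (0.05, 0.07),
    "Batch size": (0.04, 0.05),
}

# Share of variance explained by inputs acting alone, and what is left over.
first_order_sum = sum(s1 for s1, _ in RESULTS.values())
residual = 1.0 - first_order_sum

# ST_i - S_i: how much of each input's influence comes through interactions.
interaction_gap = {name: st - s1 for name, (s1, st) in RESULTS.items()}

# Ranking by total-order index, the usual basis for prioritising calibration.
by_total_order = sorted(RESULTS, key=lambda name: RESULTS[name][1], reverse=True)

print(f"sum of S_i = {first_order_sum:.2f}, residual = {residual:.2f}")
for name in by_total_order:
    print(f"{name}: ST_i - S_i = {interaction_gap[name]:.2f}")
```

Because every interaction is counted once in the ST_i of each input it involves, the total-order indices sum to more than 1 whenever interactions are present; the residual 1 − ΣS_i is the quantity to compare against the interaction share quoted above.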

How we use the results

  • Calibration prioritisation. The dominant inputs become the targets for the next round of calibration data collection. If GPU energy per token dominates, we run more tier-3 measurements at varied batch sizes to reduce its uncertainty.
  • Auditor communication. The Sobol decomposition is part of the annual evidence pack; it lets the auditor see which inputs the methodology is most sensitive to and where the methodology team is investing.
  • Customer guidance. For customers under ISAE 3000 review, knowing which inputs dominate lets them anticipate which methodology changes would affect their reported aggregates.

Publication

Sobol results for the prior quarter are published as part of the quarterly methodology changelog at docs.vettedinference.com/changelog/. The full notebooks are in methodology/sensitivity/.

Where this is implemented

methodology/sensitivity/sobol.py

Citations

  • Saltelli, A., et al. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications 181(2).
  • Herman, J., & Usher, W. (2017). SALib: An open-source Python library for Sensitivity Analysis. Journal of Open Source Software 2(9).