
Methodology

The Vetted Inference per-query environmental accounting methodology, version-pinned to the receipt that cites it.

Source repo (AGPL-3.0) | Methodology paper (PDF)

The four-tier ladder

We compute four estimates per query at four levels of evidential strength. The receipt declares which tier was returned and the 90% confidence interval at that tier.

Tier | Method | Inputs | Typical 90% CI | Best for
1 | Proxy | Public benchmarks | ±60% | Unsupported models, sketch estimates
2 | Parametric | BoaviztAPI + ecoinvent + ADEME | ±25–35% | Default; CSRD reporting
3 | Telemetry | NVML + vLLM + live grid | ±10–18% | Enterprise, calibration
4 | Audited | ISAE 3000 limited assurance | tier-3 + opinion | Regulatory disclosure

Uncertainty propagation

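Uncertainty propagates through the calculation graph using the methods cited in the References: pedigree-matrix uncertainty scoring (Weidema & Wesnæs 1996; Ciroth et al. 2016), log-normal Monte Carlo sampling (Lloyd & Ries 2007), and conformal prediction (Angelopoulos & Bates 2021).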
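
As an illustration only, here is a minimal sketch of log-normal Monte Carlo propagation in the style of Lloyd & Ries (2007), which the References cite. The three inputs, their medians, and their geometric standard deviations are hypothetical placeholders, not values from the methodology, and the product formula is a simplified stand-in for the real calculation graph.

```python
import math
import random

def sample_lognormal(rng, median, gsd):
    """Draw from a log-normal given its median and geometric standard deviation."""
    return rng.lognormvariate(math.log(median), math.log(gsd))

def propagate(n_draws=20000, seed=0):
    """Return the (5th, 50th, 95th) percentiles of a per-query gCO2e estimate."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # All three inputs below are hypothetical, for illustration only.
        energy_kwh = sample_lognormal(rng, median=0.0004, gsd=1.20)
        pue = sample_lognormal(rng, median=1.2, gsd=1.05)
        grid_g_per_kwh = sample_lognormal(rng, median=60.0, gsd=1.30)
        draws.append(energy_kwh * pue * grid_g_per_kwh)
    draws.sort()

    def percentile(p):
        # Simple empirical percentile on the sorted draws.
        return draws[min(n_draws - 1, int(p * n_draws))]

    return percentile(0.05), percentile(0.50), percentile(0.95)

low, median, high = propagate()
print(f"median={median:.4f} gCO2e, 90% interval=[{low:.4f}, {high:.4f}]")
```

Because the inputs are log-normal and combined by multiplication, the resulting interval is asymmetric around the median, with a longer upper tail.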

Three indicators, two boundaries

We report three indicators (climate, water, resource depletion) following the Mistral × Carbone 4 × ADEME LCA convention and AFNOR Frugal AI methodology. We report two boundaries (narrow accelerator-only, comprehensive accelerator+host+idle+PUE) following Google Gemini 2025 disclosure conventions.

The default indicator is climate (gCO₂e); the default boundary is comprehensive. Both can be requested via extra_body.vetted.boundary.
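
A minimal sketch of building that request field, assuming the service is reached through an OpenAI-compatible client that accepts an `extra_body` argument. The value spellings `"narrow"` and `"comprehensive"` are assumptions taken from the boundary names above; the accepted values are not confirmed by this page.

```python
# Assumed spellings, taken from the boundary names used on this page.
ALLOWED_BOUNDARIES = {"narrow", "comprehensive"}

def vetted_extra_body(boundary="comprehensive"):
    """Build the extra_body payload carrying the extra_body.vetted.boundary field."""
    if boundary not in ALLOWED_BOUNDARIES:
        raise ValueError(f"unknown boundary: {boundary!r}")
    return {"vetted": {"boundary": boundary}}

print(vetted_extra_body("narrow"))
```

With the OpenAI Python client, a payload like this would typically be passed as `extra_body=vetted_extra_body("narrow")` to `client.chat.completions.create(...)`.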

Fresh versus live

The methodology makes a strict distinction between fresh and live regional grid evidence.

  • Live means the source observation falls within the strict ±15-minute receipt-matching window.
  • Fresh means the signal is recent enough to remain operationally useful, but it does not meet that strict live rule.
  • Fallback means the receipt had to use a lower-fidelity source class such as prior-week or annual factors.
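
The three definitions above can be sketched as a small classifier. Only the 15-minute live window comes from the text. The 3-hour freshness limit, the source-class labels, the inclusive window edge, and the choice to treat anything older than the freshness limit as fallback are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

LIVE_WINDOW = timedelta(minutes=15)      # from the text: strict +/-15 minute window
FRESH_WINDOW = timedelta(hours=3)        # hypothetical: the page gives no number
FALLBACK_CLASSES = {"prior_week", "annual"}  # hypothetical labels for lower-fidelity sources

def classify_grid_evidence(observed_at, requested_at, source_class="observation"):
    """Classify grid evidence as 'live', 'fresh', or 'fallback'."""
    if source_class in FALLBACK_CLASSES:
        return "fallback"
    age = abs(requested_at - observed_at)
    if age <= LIVE_WINDOW:               # assumed inclusive at exactly 15 minutes
        return "live"
    if age <= FRESH_WINDOW:
        return "fresh"
    return "fallback"                    # assumed: too old to be operationally useful

t = datetime(2026, 3, 17, 12, 0, tzinfo=timezone.utc)
print(classify_grid_evidence(t - timedelta(minutes=10), t))
print(classify_grid_evidence(t - timedelta(minutes=45), t))
```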

That distinction matters more than marketing tone. A green operational status page does not mean every region is live, and a useful near-time region is not automatically strict live. Current region-bucket posture is published in the carbon intensity sources page and should be read alongside the receipt provenance fields:

  • grid_intensity_observed_at
  • grid_intensity_requested_at
  • grid_intensity_age_minutes
  • grid_temporal_match
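
A reader can cross-check these four fields against each other. The sketch below assumes the timestamps are ISO-8601 strings, that `grid_intensity_age_minutes` is a number, and that `grid_temporal_match` uses the string `"live"`; none of these encodings is confirmed by this page.

```python
from datetime import datetime

def check_provenance(receipt, tolerance_minutes=1.0):
    """Return a list of inconsistencies found in the grid provenance fields."""
    observed = datetime.fromisoformat(receipt["grid_intensity_observed_at"])
    requested = datetime.fromisoformat(receipt["grid_intensity_requested_at"])
    recomputed = abs((requested - observed).total_seconds()) / 60.0

    problems = []
    if abs(recomputed - receipt["grid_intensity_age_minutes"]) > tolerance_minutes:
        problems.append("reported age does not match the two timestamps")
    # Assumed encoding: the strict-live label is the string "live".
    if receipt["grid_temporal_match"] == "live" and recomputed > 15:
        problems.append("marked live but outside the 15-minute window")
    return problems

example = {
    "grid_intensity_observed_at": "2026-03-17T11:52:00+00:00",
    "grid_intensity_requested_at": "2026-03-17T12:00:00+00:00",
    "grid_intensity_age_minutes": 8,
    "grid_temporal_match": "live",
}
print(check_provenance(example))
```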

Current calibration status

Methodology v0.2.0 work is now grounded in real telemetry and legitimate truth-bearing observations, but the evidence is not all of one kind.

What is calibrated now:

  • a real calibration dataset assembled from production audit-ledger snapshots plus legitimate truth overlays
  • a held-out conformal artifact generated in stratified cell mode
  • a real hierarchical fit on the current dataset
  • a real Sobol sensitivity report on the current calibrated evidence base
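
For readers unfamiliar with the conformal step, here is a minimal sketch of split conformal prediction with one quantile per cell, following the standard construction in Angelopoulos & Bates (2021). It assumes "stratified cell mode" means one calibration quantile per cell and uses absolute residuals as the score; the actual artifact may differ on both points.

```python
import math
from collections import defaultdict

def conformal_quantile(scores, alpha=0.10):
    """Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few held-out points for this coverage level
    return sorted(scores)[k - 1]

def fit_per_cell(calibration, alpha=0.10):
    """calibration: iterable of (cell, predicted, observed) held-out triples."""
    by_cell = defaultdict(list)
    for cell, predicted, observed in calibration:
        by_cell[cell].append(abs(observed - predicted))
    return {cell: conformal_quantile(s, alpha) for cell, s in by_cell.items()}

def interval(cell, predicted, radii):
    """Symmetric 90% interval around a new prediction for a calibrated cell."""
    r = radii[cell]
    return predicted - r, predicted + r

# Hypothetical held-out data: cell "a" has 19 points, cell "b" only 5.
data = [("a", 0.0, float(i)) for i in range(1, 20)]
data += [("b", 0.0, float(i)) for i in range(1, 6)]
radii = fit_per_cell(data)
print(radii)
```

A cell with too few held-out points gets an infinite radius rather than an overconfident interval, which is one way a sparse cell can be kept visibly uncalibrated.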

What is still shadow-only:

  • the strongest truth-bearing calibration cells today come from self-hosted or production-like shadow domains
  • those cells are valid for methodology calibration and explanatory power
  • they are not equivalent to exact hosted provider truth for a specific hosted production cell
  • the current reference-cell transferability report does not show strong or moderate transfer support from the calibrated shadow cells into the hosted FR H100 cells
  • that means the reference-cell lane is already scientifically useful, but it is not yet a substitute for hosted exact-cell closure

Which exact hosted cells remain blocked:

  • the current hosted FR H100 cells are still truth-empty at exact-cell level
  • the lead blocked cell is mistral-medium-3|nvidia_h100_sxm|scaleway_par2_fr
  • that cell now has enough hosted receipt corpus to be useful immediately once a qualifying provider artifact appears, but it does not yet have legitimate provider_published_kwh
  • that same lead cell is still not fully join-ready today because provider-request-id coverage is incomplete in the current hosted validation report
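
The two blockers described above can be expressed as a small check. This is a sketch under stated assumptions: the `model|accelerator|site` reading of the cell key is inferred from the example key, and the `request_id_coverage` field, its fractional encoding, and the full-coverage threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HostedCell:
    key: str                                  # e.g. "model|accelerator|site" (inferred layout)
    provider_published_kwh: Optional[float]   # None while no provider artifact exists
    request_id_coverage: float                # hypothetical: fraction of receipts with a provider request id

    def parts(self):
        model, accelerator, site = self.key.split("|")
        return {"model": model, "accelerator": accelerator, "site": site}

def blockers(cell, min_coverage=1.0):
    """List the reasons a hosted cell cannot yet be closed at exact-cell level."""
    reasons = []
    if cell.provider_published_kwh is None:
        reasons.append("no legitimate provider_published_kwh")
    if cell.request_id_coverage < min_coverage:
        reasons.append("provider-request-id coverage incomplete")
    return reasons

# The 0.8 coverage figure is made up for the example.
lead = HostedCell("mistral-medium-3|nvidia_h100_sxm|scaleway_par2_fr", None, 0.8)
print(lead.parts())
print(blockers(lead))
```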

This is deliberate. The methodology pages should distinguish:

  • calibrated now
  • shadow-only today
  • exact hosted-cell blocked

instead of collapsing them into one maturity bucket.

Worked example: 400 tokens, four ways

We publish a complete worked example reproducing the same prompt through all four tiers, with intermediate calculations, BoaviztAPI request payloads, NVML samples, and conformal-calibration fits, at github.com/vetted-inference/methodology/examples/2026-03-17-400-tokens. The full write-up is published as a Journal entry on the marketing site.

Headline result for that example:

Tier | Median (gCO₂e) | 90% CI (gCO₂e) | Boundary
1 (proxy) | 1.62 | 0.85–3.10 | comprehensive
2 (parametric) | 1.07 | 0.71–1.58 | comprehensive
3 (telemetry, region-adjusted) | 0.97 | 0.83–1.11 | comprehensive

The agreement is by construction (the calibration that produces the conformal interval forces it). What matters is that the methodology surfaces the conditions under which it would not agree.
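
The headline figures can be checked with simple arithmetic. The sketch below uses only the numbers from the table above. It converts each interval to percentage offsets from its median and tests whether every pair of intervals overlaps; the helper names are illustrative.

```python
from itertools import combinations

# (median, low, high) in gCO2e, copied from the headline table.
HEADLINE = {
    "tier 1": (1.62, 0.85, 3.10),
    "tier 2": (1.07, 0.71, 1.58),
    "tier 3": (0.97, 0.83, 1.11),
}

def relative_bounds(median, low, high):
    """Interval bounds as percentage offsets from the median."""
    return (low - median) / median * 100, (high - median) / median * 100

def overlaps(a, b):
    """True when two (median, low, high) intervals share at least one point."""
    return a[1] <= b[2] and b[1] <= a[2]

for name, row in HEADLINE.items():
    down, up = relative_bounds(*row)
    print(f"{name}: {down:+.1f}% / {up:+.1f}%")

all_overlap = all(overlaps(HEADLINE[x], HEADLINE[y])
                  for x, y in combinations(HEADLINE, 2))
print("all intervals overlap:", all_overlap)
```

The tier 1 and tier 2 intervals are asymmetric around their medians, with a longer upper tail, while the tier 3 interval is close to symmetric.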

License

The methodology code is AGPL-3.0. The methodology document is CC-BY-4.0. Closed-source forks of the methodology code are grounds for Foundation veto-share intervention per Articles § 11(d).

References

These pages reference, throughout:

  • ISO 14040:2006 / 14044:2006 — LCA principles and framework
  • ISO/IEC 21031:2024 — Software Carbon Intensity (SCI)
  • ISO/IEC 42001:2023 — AI Management Systems
  • ESRS E1 / E3 / E5 (post-Omnibus simplified ESRS, mid-2026)
  • ecoinvent v3.10 (Zürich, 2024)
  • ADEME Base Empreinte 2024 (Paris)
  • JRC NEEFE 2024 (Joint Research Centre)
  • IPCC AR6 GWP100 (2021)
  • Weidema & Wesnæs (1996); Ciroth et al. (2016) — pedigree matrix
  • Lloyd & Ries (2007) — log-normal Monte Carlo for LCA
  • Angelopoulos & Bates (2021) — conformal prediction
  • Han et al. (ISCA 2025) — Shapley attribution for multi-tenant inference
  • Mistral × Carbone 4 × ADEME LCA (2025) — Mistral Large 2 LCA
  • Google Gemini Production Disclosure (2025) — comprehensive boundary methodology

Specific page citations are listed at the bottom of each tier page.