Pedigree (Weidema/Ciroth)
What it is
A five-axis quality score attached to every emission factor used in the calculation. Originally proposed by Weidema and Wesnæs (1996), with empirically based uncertainty factors added by Ciroth et al. (2016), the pedigree matrix scores each axis from 1 (best) to 5 (worst):
| Axis | 1 (best) | 2 | 3 | 4 | 5 (worst) |
|---|---|---|---|---|---|
| Reliability | Verified data, measurement | Verified data, partly assumption | Non-verified data, partly assumption | Qualified estimate | Non-qualified estimate |
| Completeness | Representative, sufficient sample | Representative, smaller set | Representative, > 50% sites | Representative, < 50% sites | Unknown |
| Temporal correlation | < 3 years | < 6 years | < 10 years | < 15 years | Unknown / older |
| Geographic correlation | Area under study | Similar area | Different area | Unknown | Unrelated |
| Technological correlation | Same technology | Related technology | Different technology, same materials | Different processes, same technology | Unrelated |
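As a minimal sketch of how such a score could be represented in code, the block below models the five axes as a frozen dataclass and rejects anything outside the 1-to-5 scale. The class name `PedigreeScore`, its field names, and the `as_list` helper are illustrative assumptions, not the contents of the project's actual module.

```python
from dataclasses import dataclass

# Axis order follows the matrix above; the identifiers are hypothetical.
AXES = ("reliability", "completeness", "temporal", "geographic", "technological")


@dataclass(frozen=True)
class PedigreeScore:
    """One pedigree score: an integer from 1 (best) to 5 (worst) per axis."""

    reliability: int
    completeness: int
    temporal: int
    geographic: int
    technological: int

    def __post_init__(self):
        for axis in AXES:
            value = getattr(self, axis)
            # bool is a subclass of int, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 1 <= value <= 5:
                raise ValueError(f"{axis} must be an integer from 1 to 5, got {value!r}")

    def as_list(self):
        """Return the scores in matrix order, e.g. [2, 2, 1, 1, 2]."""
        return [getattr(self, axis) for axis in AXES]


example = PedigreeScore(2, 2, 1, 1, 2)
print(example.as_list())
```

Freezing the dataclass keeps a score immutable once it has been recorded, which suits a value that is meant to be auditable.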
Why we use it
The pedigree score is a structured, auditable representation of "how confident are we in this emission factor for this query?" It feeds two downstream operations:
- Monte Carlo prior dispersion: each pedigree score is mapped to a log-normal standard deviation using the Ciroth et al. lookup table, and these standard deviations set the spread of the priors used in the Monte Carlo uncertainty propagation.
- Conformal interval width: wider pedigree priors produce wider conformal intervals at calibration time.
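To illustrate the first point, here is a rough sketch of drawing Monte Carlo samples for one emission factor from a log-normal prior. It assumes the lookup value is a geometric standard deviation (GSD), so the sigma of the underlying normal is `ln(GSD)`, and that the point estimate is used as the median. The function name `sample_emission_factor` and the example numbers are hypothetical.

```python
import math
import random


def sample_emission_factor(median, gsd, n, seed=0):
    """Draw n samples from a log-normal with the given median and GSD.

    Assumption: `gsd` is a geometric standard deviation (>= 1.0), so the
    underlying normal distribution has mu = ln(median), sigma = ln(gsd).
    """
    if median <= 0:
        raise ValueError("median must be positive")
    if gsd < 1.0:
        raise ValueError("gsd must be >= 1.0")
    if gsd == 1.0:
        # No dispersion: every draw equals the point estimate.
        return [median] * n
    rng = random.Random(seed)
    mu = math.log(median)
    sigma = math.log(gsd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Hypothetical point estimate of 100.0 (units omitted) with two prior widths.
narrow = sample_emission_factor(100.0, 1.05, n=1000)
wide = sample_emission_factor(100.0, 1.50, n=1000)
print(min(narrow), max(narrow))
print(min(wide), max(wide))
```

A larger GSD stretches the sampled range multiplicatively around the median, which is how a worse pedigree score turns into a wider prior.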
Mapping example
For a Mistral Medium 3 query running on Scaleway PAR-1 with live ENTSO-E grid data:
| Emission factor | Reliability | Completeness | Temporal | Geographic | Technological |
|---|---|---|---|---|---|
| GPU energy (BoaviztAPI) | 2 | 2 | 1 | 1 | 2 |
| Host CPU+DRAM share | 2 | 3 | 1 | 1 | 2 |
| Datacentre PUE | 2 | 2 | 2 | 1 | 1 |
| Grid intensity (ENTSO-E live) | 1 | 1 | 1 | 1 | 1 |
| Embodied amortisation | 3 | 3 | 2 | 2 | 2 |
The composite pedigree on the receipt is the per-axis median across emission factors, weighted by each factor's share of the total impact. For tier 2, the typical composite is [2, 2, 1, 1, 2]: verified data, partly based on assumptions, recent, and geographically and technologically aligned.
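The weighted median can be sketched as follows. The scores are copied from the mapping table above, but the impact shares are invented for illustration only (the document does not state them), and the function names are hypothetical. This version takes the lower weighted median: the smallest score at which the cumulative weight reaches half of the total.

```python
AXES = ("reliability", "completeness", "temporal", "geographic", "technological")

# Scores from the mapping table, one tuple per emission factor, in AXES order.
SCORES = {
    "GPU energy (BoaviztAPI)": (2, 2, 1, 1, 2),
    "Host CPU+DRAM share": (2, 3, 1, 1, 2),
    "Datacentre PUE": (2, 2, 2, 1, 1),
    "Grid intensity (ENTSO-E live)": (1, 1, 1, 1, 1),
    "Embodied amortisation": (3, 3, 2, 2, 2),
}

# HYPOTHETICAL impact shares, made up for this example; they sum to 1.0.
IMPACT_SHARES = {
    "GPU energy (BoaviztAPI)": 0.55,
    "Host CPU+DRAM share": 0.15,
    "Datacentre PUE": 0.10,
    "Grid intensity (ENTSO-E live)": 0.10,
    "Embodied amortisation": 0.10,
}


def weighted_median(values, weights):
    """Lower weighted median: first value whose cumulative weight reaches half."""
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("no values given")
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # only reachable through floating-point rounding


def composite_pedigree(scores, shares):
    """Apply the weighted median independently on each of the five axes."""
    names = list(scores)
    weights = [shares[name] for name in names]
    return [
        weighted_median([scores[name][i] for name in names], weights)
        for i in range(len(AXES))
    ]


composite = composite_pedigree(SCORES, IMPACT_SHARES)
print(composite)  # [2, 2, 1, 1, 2] with the hypothetical shares above
```

With these made-up shares the result matches the typical tier 2 composite quoted above; different shares could shift individual axes.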
Pedigree score → log-normal SD lookup
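We use a simplified version of the Ciroth et al. (2016) lookup. Each value is a multiplicative (geometric) factor, so 1.00 adds no uncertainty: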
| Score | Reliability SD | Completeness SD | Temporal SD | Geographic SD | Technological SD |
|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 1.05 | 1.02 | 1.03 | 1.01 | 1.18 |
| 3 | 1.10 | 1.05 | 1.10 | 1.02 | 1.50 |
| 4 | 1.20 | 1.10 | 1.20 | 1.10 | 2.00 |
| 5 | 1.50 | 1.20 | 1.50 | 1.50 | 3.00 |
The composite log-normal SD is the geometric combination of the per-axis SDs; this becomes the prior for that emission factor in the Monte Carlo simulation.
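One common reading of "geometric combination" is the root-sum-of-squares in log space used with pedigree factors in ecoinvent:

$$\mathrm{SD}_{\text{composite}} = \exp\sqrt{\sum_{i=1}^{5} \left(\ln \mathrm{SD}_i\right)^2}$$

The sketch below applies that formula to the lookup table above. Treat the formula as an assumption about what the implementation does, not a confirmed description of it; the names `LOOKUP` and `composite_sd` are hypothetical.

```python
import math

# Per-axis factors copied from the lookup table above, keyed by score.
LOOKUP = {
    "reliability": {1: 1.00, 2: 1.05, 3: 1.10, 4: 1.20, 5: 1.50},
    "completeness": {1: 1.00, 2: 1.02, 3: 1.05, 4: 1.10, 5: 1.20},
    "temporal": {1: 1.00, 2: 1.03, 3: 1.10, 4: 1.20, 5: 1.50},
    "geographic": {1: 1.00, 2: 1.01, 3: 1.02, 4: 1.10, 5: 1.50},
    "technological": {1: 1.00, 2: 1.18, 3: 1.50, 4: 2.00, 5: 3.00},
}


def composite_sd(scores):
    """Combine per-axis factors as exp(sqrt(sum(ln(SD_i)^2))).

    ASSUMPTION: this log-space root-sum-of-squares is one interpretation of
    "geometric combination"; the real module may combine factors differently.
    `scores` maps each axis name to an integer score from 1 to 5.
    """
    sum_of_squares = 0.0
    for axis, score in scores.items():
        factor = LOOKUP[axis][score]
        sum_of_squares += math.log(factor) ** 2
    return math.exp(math.sqrt(sum_of_squares))


tier2 = {
    "reliability": 2,
    "completeness": 2,
    "temporal": 1,
    "geographic": 1,
    "technological": 2,
}
print(round(composite_sd(tier2), 2))
```

For the [2, 2, 1, 1, 2] composite this gives roughly 1.19, dominated by the technological axis, whose factor (1.18) is far larger than the others at score 2.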
Auditor expectations
Assurance partners under ISAE 3000 review the pedigree-score worksheet as evidence of methodological rigour. We provide:
- The pedigree score for every emission factor used in the period
- The justification for each score
- The lookup table version used to convert pedigree to log-normal SD
- The composite score as it appears on each receipt
Where this is implemented
methodology/uncertainty/pedigree.py
Citations
- Weidema, B. P., & Wesnæs, M. S. (1996). Data quality management for life cycle inventories — an example of using data quality indicators. J. Cleaner Production 4(3-4).
- Ciroth, A., Muller, S., Weidema, B. P., & Lesage, P. (2016). Empirically based uncertainty factors for the pedigree matrix in ecoinvent. Int. J. Life Cycle Assessment 21(9).