SIGMA Data Schema

This document describes the data schema used in the SIGMA Unified Global Standards Index.

Master Schema — 22 Fields per Entry

Every entry in the SIGMA index carries the following fields:

| # | Field Name | Type | Description | Example |
|---|---|---|---|---|
| 1 | sigma_id | String | Unique SIGMA identifier | HL-ISO-15189-2022 |
| 2 | entry_type | Enum | Standards Body / Standard / Framework / Treaty / Guideline / Regulation / Classification / Code of Practice / Recommendation | Standard |
| 3 | meta_layer | Enum | L1 Life Sciences / L2 Physical Sciences / L3 Society & Governance / L4 Economy & Trade / L5 Technology & Infrastructure / L6 Environment | L1 Life Sciences |
| 4 | domain | String (from 40-domain taxonomy) | Primary domain | Health & Medical |
| 5 | sub_domain | String | Sub-category within domain | Clinical Laboratories |
| 6 | name_full | String | Complete official name | Medical laboratories — Requirements for quality and competence |
| 7 | name_short | String | Common name / acronym | ISO 15189 |
| 8 | standard_id | String | Official identifier from issuing body | ISO 15189:2022 |
| 9 | issuer | String | Name of issuing body | ISO (TC 212) |
| 10 | issuer_type | Enum | UN Agency / Treaty Body / ISO / IEC / ITU / Industry SDO / Professional Body / NGO / Intergovernmental / National Government | ISO |
| 11 | governance_layer | Enum | International / Regional / National | International |
| 12 | geographic_scope | String | Countries / regions where formally applicable | Global — 175 ISO member countries |
| 13 | year_published | Integer | Year of current edition | 2022 |
| 14 | year_first | Integer | Year first published | 2003 |
| 15 | status | Enum | Active / Withdrawn / Superseded / Under Development / Under Review | Active |
| 16 | mandate | Enum | Mandatory / Voluntary / Voluntary-with-regulatory-adoption / Treaty-binding | Voluntary |
| 17 | sector_applicability | String | Who must/should use this standard | Healthcare laboratories / accreditation bodies / regulators |
| 18 | why_it_matters | String | Plain-language explanation of significance | Defines quality requirements for medical labs; basis for lab accreditation in 100+ countries |
| 19 | key_outputs | String | Main standards/versions/elements | ISO 15189:2022 (third edition); covers pre-examination, examination, post-examination processes |
| 20 | official_url | URL | Primary source URL (authoritative, stable) | https://www.iso.org/standard/76677.html |
| 21 | data_source | String | Where this entry's data was obtained | ISO Open Data CSV + manual verification |
| 22 | notes | String | Any additional contextual information | Replaced 2012 edition; significant restructuring of management requirements |
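
The Enum and Integer fields in the table above lend themselves to a simple automated check. The following is a minimal sketch, not the project's actual validator: the names ENUM_FIELDS, YEAR_FIELDS, and check_entry are hypothetical, and it assumes entries arrive as dictionaries of strings (as they would from a CSV reader) and that the six Enum fields and both year fields are always populated.

```python
# Allowed values are copied from the schema table. The names ENUM_FIELDS,
# YEAR_FIELDS, and check_entry are hypothetical, not part of the SIGMA tooling.
ENUM_FIELDS = {
    "entry_type": {
        "Standards Body", "Standard", "Framework", "Treaty", "Guideline",
        "Regulation", "Classification", "Code of Practice", "Recommendation",
    },
    "meta_layer": {
        "L1 Life Sciences", "L2 Physical Sciences", "L3 Society & Governance",
        "L4 Economy & Trade", "L5 Technology & Infrastructure", "L6 Environment",
    },
    "issuer_type": {
        "UN Agency", "Treaty Body", "ISO", "IEC", "ITU", "Industry SDO",
        "Professional Body", "NGO", "Intergovernmental", "National Government",
    },
    "governance_layer": {"International", "Regional", "National"},
    "status": {
        "Active", "Withdrawn", "Superseded", "Under Development", "Under Review",
    },
    "mandate": {
        "Mandatory", "Voluntary", "Voluntary-with-regulatory-adoption",
        "Treaty-binding",
    },
}
YEAR_FIELDS = ("year_published", "year_first")


def check_entry(entry: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for field, allowed in ENUM_FIELDS.items():
        value = entry.get(field, "")
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not an allowed value")
    for field in YEAR_FIELDS:
        # Assumption: years are stored as plain digit strings such as "2022".
        if not entry.get(field, "").isdigit():
            problems.append(f"{field}: expected an integer year")
    return problems
```

Calling check_entry on a row built from the Example column should return an empty list, while a row with an unknown status value would yield one problem message for that field.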

Google Sheet Curation Sync

The shared Google Sheet may include curation-only metadata columns after the 22 master fields, including related_sigma_ids, llm_enriched, and last_updated. These columns support editorial workflow and future enrichment, but they are not yet part of the published master release schema.

Run make sync-google-sheet to export the Sheet into data/processed/google_sheet_master.csv. The sync script keeps only the 22 master fields so existing validation, release builds, and GitHub Pages downloads remain compatible.
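
The column filtering described above can be sketched with the standard csv module. This is an illustration under stated assumptions rather than the actual sync script: keep_master_fields is a hypothetical name, the field order is taken from the master schema table, and the sketch works on open file-like objects instead of fetching the Sheet.

```python
import csv
import io

# The 22 master fields, in the order given by the schema table.
MASTER_FIELDS = [
    "sigma_id", "entry_type", "meta_layer", "domain", "sub_domain",
    "name_full", "name_short", "standard_id", "issuer", "issuer_type",
    "governance_layer", "geographic_scope", "year_published", "year_first",
    "status", "mandate", "sector_applicability", "why_it_matters",
    "key_outputs", "official_url", "data_source", "notes",
]


def keep_master_fields(src, dst) -> None:
    """Copy CSV rows from src to dst, dropping any column outside MASTER_FIELDS.

    Hypothetical helper: src and dst are text file-like objects.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops curation-only columns such as
    # llm_enriched; master fields missing from the input are written as "".
    writer = csv.DictWriter(dst, fieldnames=MASTER_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

Writing every master field on every row, even when the input lacks it, keeps the output header stable, which is what downstream validation and release builds would rely on.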

Supplementary Entity Tables

Standards Bodies Register

One record per issuing organisation.

Fields: org_id, org_name, org_acronym, org_type, founding_year, hq_country, hq_city, geographic_scope, governance_structure, iso_member (Y/N), wikidata_qid, official_url, linkedin_url, twitter_handle, standard_count, ics_scope, parent_org_id.

Relationships Map

Captures inter-standard and inter-organisation relationships.

Fields: from_id, to_id, relationship_type, confidence, source_url, notes.

Allowed relationship_type values: references, supersedes, adopted_by, implements, aligned_with, referenced_by, harmonised_with, national_adoption_of, inspires.

Allowed confidence values: source-confirmed, curator-reviewed, llm-suggested. LLM-suggested relationships must not be published as final graph edges until a human reviewer confirms the source.

Relationship CSVs are validated by scripts/validate_relationships.py. Empty header-only templates are allowed, but published rows require from_id, to_id, relationship_type, confidence, and source_url.
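
The rules above can be illustrated with a short check. This is a sketch only and does not reproduce scripts/validate_relationships.py: the function names validate_relationships and publishable_edges are hypothetical, and the sketch assumes the CSV is passed in as text.

```python
import csv
import io

REQUIRED_FIELDS = ("from_id", "to_id", "relationship_type", "confidence", "source_url")
RELATIONSHIP_TYPES = {
    "references", "supersedes", "adopted_by", "implements", "aligned_with",
    "referenced_by", "harmonised_with", "national_adoption_of", "inspires",
}
CONFIDENCE_VALUES = {"source-confirmed", "curator-reviewed", "llm-suggested"}


def validate_relationships(csv_text: str) -> list[str]:
    """Return error messages; a header-only template yields an empty list."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                errors.append(f"row {row_no}: missing {field}")
        rel = (row.get("relationship_type") or "").strip()
        if rel and rel not in RELATIONSHIP_TYPES:
            errors.append(f"row {row_no}: unknown relationship_type {rel!r}")
        conf = (row.get("confidence") or "").strip()
        if conf and conf not in CONFIDENCE_VALUES:
            errors.append(f"row {row_no}: unknown confidence {conf!r}")
    return errors


def publishable_edges(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only rows a human has confirmed (drops llm-suggested edges)."""
    return [row for row in rows if row.get("confidence") != "llm-suggested"]
```

Separating validation from the publish filter means an llm-suggested row can still pass structural checks while staying out of the final graph until a reviewer upgrades its confidence.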

Ratification & Adoption Tracker

Applies to treaty and convention entries only, with one row per country per treaty.

Fields: sigma_id, country_iso3, country_name, status (signatory / ratified / acceded / not party), date, reservations, source_url.
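
A tracker row can be checked in a similar way. The sketch below is hypothetical (check_tracker_row is not an existing SIGMA helper) and assumes that country_iso3 holds an uppercase three-letter code in the style of ISO 3166-1 alpha-3. It checks only the shape of the code, not whether the code is actually assigned, and it leaves the date field alone because its format is not specified here.

```python
import re

# Status values as listed in the tracker's field description.
PARTY_STATUSES = {"signatory", "ratified", "acceded", "not party"}

# Assumption: three uppercase letters, e.g. "FRA". Shape check only.
ISO3_PATTERN = re.compile(r"[A-Z]{3}")


def check_tracker_row(row: dict[str, str]) -> list[str]:
    """Return problems found in one country-per-treaty row."""
    problems = []
    if not ISO3_PATTERN.fullmatch(row.get("country_iso3", "")):
        problems.append("country_iso3 should be a three-letter uppercase code")
    status = row.get("status", "")
    if status not in PARTY_STATUSES:
        problems.append(f"status {status!r} is not an allowed value")
    return problems
```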

ID Convention

SIGMA IDs follow a deterministic, human-readable pattern:

[DOMAIN_CODE]-[ISSUER_CODE]-[STD_NUMBER]-[YEAR]
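
Because the pattern is deterministic, an ID can be split back into its parts. The following is a minimal sketch with several assumptions that the convention above does not state: domain and issuer codes are uppercase letters or digits, the year has four digits, and the standard number may itself contain hyphens or dots. SigmaId and parse_sigma_id are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class SigmaId(NamedTuple):
    domain_code: str
    issuer_code: str
    std_number: str
    year: int


# Assumed character classes; adjust if the real codes differ.
SIGMA_ID_PATTERN = re.compile(
    r"([A-Z0-9]+)-"                              # DOMAIN_CODE
    r"([A-Z0-9]+)-"                              # ISSUER_CODE
    r"([A-Za-z0-9.]+(?:-[A-Za-z0-9.]+)*)-"       # STD_NUMBER, may contain hyphens
    r"(\d{4})"                                   # YEAR
)


def parse_sigma_id(value: str) -> Optional[SigmaId]:
    """Split a SIGMA ID into its four parts, or return None if it does not match."""
    match = SIGMA_ID_PATTERN.fullmatch(value)
    if match is None:
        return None
    domain, issuer, number, year = match.groups()
    return SigmaId(domain, issuer, number, int(year))
```

With these assumptions, parse_sigma_id("HL-ISO-15189-2022") returns the parts HL, ISO, 15189, and 2022; the final four-digit group is always read as the year, so any hyphens before it stay with the standard number.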

Examples: