SIGMA Project Status Report
- As of: 2026-05-02 19:26:18 +06 (+0600)
- Repository: sigma-standards/sigma-index
- Primary source of truth: RESEARCH_PROJECT_PLAN_Global_Standards_Index.md
- Execution roadmap: docs/superpowers/plans/2026-05-02-roadmap-to-100-percent-global-standards-index.md
- Project owner and lead curator: Mohammad Ariful Islam
- Public contact path: https://github.com/sigma-standards/sigma-index/issues
---
1. Executive Summary
SIGMA is now a working public MVP for a unified global standards index. The project has moved from a small landing-page-and-seed-data repository into a reproducible data engineering pipeline with release artifacts, rendered GitHub Pages documentation, public search, quality gates, and a growing set of source-backed priority ingestors.
Current release baseline:
| Metric | Current value |
|---|---|
| Release entries | 88,204 |
| Relationship edges | 20,140 |
| Canonical domains represented | 40 |
| Research tasks tracked | 35 |
| Done tasks | 5 |
| Active tasks | 25 |
| Planned tasks | 5 |
The project should be considered about 22 percent complete against the full 100 percent global vision and about 70 percent complete against the public MVP layer. That distinction matters: SIGMA already has visible public value, but the complete project still requires deeper source expansion, richer metadata, broader relationship modeling, and formal publication packages.
---
2. Journey Accomplishments So Far
2.1 Infrastructure and Publication
Completed accomplishments:
- Established a reproducible repository structure for raw, reference, processed, staging, relationship, report, and publication layers.
- Added deterministic validation through `make validate`.
- Added release artifact generation through `make release`.
- Added GitHub Pages static-site generation through `scripts/build_static_site.py`.
- Added rendered project documentation so public links no longer open as raw Markdown.
- Added public search through Pagefind plus a JSON fallback search index.
- Added progress and owner/contact signals to the public site.
- Added a quality gate report in `docs/QUALITY_GATE.md`.
- Added a project knowledge graph in `docs/PROJECT_KNOWLEDGE_GRAPH.md`.
- Incorporated external feedback into `docs/SIGMA_GAP_ANALYSIS_AND_ENHANCEMENT_PLAN.md`.
2.2 Data and Source Coverage
Completed or active data families:
| Phase | Source family | Current state |
|---|---|---|
| Phase 1 | ISO metadata, ICS, and technical committees | Bulk seed integrated |
| Phase 1 | IETF RFC metadata | Bulk source integrated |
| Phase 1 | ILO standards metadata | Bulk source integrated |
| Phase 1 | Wikidata standards-body metadata | Reference source integrated |
| Phase 1 | Google Sheet curation sync | Active |
| Phase 2A | WHO/Sphere/WASH health priority records | Active processed ingestor |
| Phase 2B | Codex Alimentarius | Active processed ingestor |
| Phase 2C | CHS, INEE, IASC, UNHCR, WHO EMT | Active humanitarian ingestor |
| Phase 2D | WHO IRIS/OAI metadata | Staging harvester |
| Phase 2E | UN Treaty Collection and OHCHR core treaties | Staging pipeline |
| Phase 3A | IAEA Safety Standards | Active processed ingestor |
| Phase 4A | GRI/SASB sustainability reporting | Active processed ingestor |
| Phase 5A | NIST cybersecurity and AI | Active processed ingestor |
| Phase 5B | W3C web standards | Active processed ingestor |
| Phase 5C | ITU telecommunications | Active processed ingestor |
| Phase 5D | ETSI ICT standards | Active processed ingestor |
| Phase 5E | OASIS, Ecma, and GS1 | Active processed ingestor |
| Phase 6A | IEC electrotechnical standards | Active processed ingestor |
| Phase 6B | CCSDS and ECSS space standards | Active processed ingestor |
| Phase 7A | UNESCO, ICOMOS, ICOM, and ICCROM culture and heritage records | Active processed ingestor |
| Phase 7B | WADA, IOC, IFAB, World Athletics, CAS, FIBA, and ITF sports records | Active processed ingestor |
| Phase 8A | National standards body registry | Active processed ingestor |
2.3 Documentation and Governance
The repository now has:
- A research plan for the full project vision.
- A roadmap to 100 percent completion.
- A machine-readable research task matrix.
- A schema document for the 22-field master entry contract.
- Contributing and code-of-conduct documents.
- A project knowledge graph.
- A friend-reviewed gap analysis incorporated into the working roadmap.
- A public GitHub Issues contact path.
---
3. Previous Experience and Lessons Learned
3.1 What Worked Well
The strongest pattern has been small, source-backed ingestion slices. Each successful slice followed the same shape:
- Confirm the source family belongs in the roadmap.
- Store official source evidence in `data/reference/`.
- Write a deterministic processor or stager.
- Write tests around the expected rows and schema behavior.
- Generate either `data/processed/` rows or `data/staging/` candidates.
- Update `data/reference/research_tasks.csv`.
- Update `data/reference/source_registry.csv`.
- Update public documentation.
- Run validation and publish locally and remotely.
This pattern avoided overreaching while still steadily expanding the project.
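The slice pattern can be sketched as a small deterministic processor. This is a minimal illustration under stated assumptions, not the repository's actual code: the column names (`title`, `issuing_body`, `official_url`) and both function names are hypothetical, and the real 22-field entry contract is not reproduced here.

```python
import csv
import io

# Hypothetical column subset; the real schema has more fields.
FIELDNAMES = ["title", "issuing_body", "official_url"]


def process_reference_rows(reference_csv_text):
    """Turn reference rows into processed rows in a stable, sorted order."""
    reader = csv.DictReader(io.StringIO(reference_csv_text))
    rows = []
    for row in reader:
        # Assumed rule for this sketch: skip rows without an official URL.
        if not row["official_url"].strip():
            continue
        rows.append({name: row[name].strip() for name in FIELDNAMES})
    # Sorting makes the output independent of input order.
    rows.sort(key=lambda r: (r["issuing_body"], r["title"]))
    return rows


def write_processed_csv(rows):
    """Serialize rows with a CSV-aware writer and a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the rows are sorted and the column order is fixed, running the processor twice on the same reference file should yield byte-identical output, which keeps diffs reviewable.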
3.2 Main Challenges
| Challenge | What happened | Strategy used |
|---|---|---|
| Local dependency mismatch | Some earlier scripts assumed dependencies that were not always available in the shell. | Prefer standard-library CSV validation where practical and keep .venv/bin/python as the stable test runner. |
| CSV fragility | Domain names and meta-layer names sometimes contain commas. | Use CSV-aware writers and validators instead of manual string assembly. |
| Source flood risk | Broad harvesters like WHO IRIS can include many general publications that are not standards. | Keep broad harvests in data/staging/ until filters and curator promotion are proven. |
| Public Markdown rendering | GitHub Pages links initially opened raw Markdown. | Convert key documents to rendered HTML through the static-site builder. |
| Search scale | Tens of thousands of records need a static, low-maintenance search surface. | Use Pagefind-readable generated record pages plus a compact JSON fallback. |
| Remote synchronization | Earlier pushes could fail when GitHub CLI/auth state was not ready. | Check branch, remote, and authentication before push and verify Actions after push. |
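The CSV fragility row is easy to reproduce with the standard library. The sketch below uses a made-up domain name containing commas (an assumption for illustration, not a value from the dataset) and compares manual string assembly with `csv.writer`.

```python
import csv
import io

# Hypothetical row: the first field contains commas, like some domain names.
row = ["Energy, Utilities, and Power", "Phase 6", "active"]

# Manual string assembly: the embedded commas become extra columns.
naive_line = ",".join(row)
naive_parsed = next(csv.reader(io.StringIO(naive_line)))

# CSV-aware writing: the writer quotes the field that contains commas.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
safe_line = buffer.getvalue()
safe_parsed = next(csv.reader(io.StringIO(safe_line)))

print(len(naive_parsed), len(safe_parsed))
```

The naive line parses back into five fields instead of three, while the writer-produced line round-trips to the original row.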
3.3 Best Practices Established
- Staging before promotion: broad harvesters write to `data/staging/` until a curator approves promotion.
- Reference before processed: every promoted record should trace back to a source row or source family.
- Validation before publication: `make validate` and tests run before release or push.
- Generated artifacts stay generated: `dist/` and `public/` are rebuilt, not treated as hand-edited source.
- Small ingestors beat giant scrapers: one clear source family at a time keeps quality high.
- Official URLs are mandatory: every new row must include a durable source link where possible.
- Task matrix stays current: every phase change updates `research_tasks.csv`.
- Source registry stays current: every new source family updates `source_registry.csv`.
- Docs move with data: README, index page, roadmap, and knowledge graph are updated alongside pipelines.
- Local and remote remain aligned: each accepted change is committed and pushed.
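The "reference before processed" practice lends itself to an automated check. A minimal sketch follows, assuming both layers share a join column; the name `source_id` is hypothetical and may not match the real schema.

```python
def find_untraceable_rows(processed_rows, reference_rows, key="source_id"):
    """Return processed rows whose key does not match any reference row.

    `key` is an assumed shared column name used only for this sketch.
    """
    known = {row[key] for row in reference_rows}
    return [row for row in processed_rows if row.get(key) not in known]
```

A validation step could fail the build whenever this function returns a non-empty list.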
---
4. Current State by Roadmap Phase
Phase 0 - Infrastructure Hardening
Status: active and largely operational.
Completed:
- Reproducible validation.
- Release artifact build.
- Static public site.
- Rendered project references.
- Public search.
- Quality gate.
- Owner/contact placement.
Remaining:
- Add live URL health summaries to the public site.
- Add duplicate/adoption reports by source family.
- Add release notes automation.
- Add versioned release tags and archival package workflow.
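As a starting point for the URL health item, one possible shape is a checker plus a pure summary function. This is a sketch only: the function names and the three buckets (`ok`, `broken`, `unreachable`) are assumptions, and some servers reject `HEAD` requests, so a production version may need a `GET` fallback.

```python
import urllib.error
import urllib.request
from collections import Counter


def check_url(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (OSError, ValueError):
        # Covers DNS failures, timeouts, refused connections, malformed URLs.
        return None


def summarize_health(status_by_url):
    """Count URLs per health bucket from a {url: status_or_None} mapping."""
    counts = Counter()
    for status in status_by_url.values():
        if status is None:
            counts["unreachable"] += 1
        elif 200 <= status < 400:
            counts["ok"] += 1
        else:
            counts["broken"] += 1
    return dict(counts)
```

Keeping the summary separate from the network call means the report logic can be tested without internet access.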
Phase 1 - Free Bulk Source Completion
Status: strong foundation, still needs refresh and enrichment.
Completed:
- ISO seed metadata.
- IETF RFC metadata.
- ILO metadata.
- Wikidata standards-body reference data.
- Google Sheet sync path.
Remaining:
- Refresh ISO, IETF, and ILO from official current exports.
- Add richer relationship edges: replacements, updates, committees, classifications, and national adoptions.
- Improve `why_it_matters`, mandate notes, and sub-domain normalization for bulk rows.
Phase 2 - Human Rights, Humanitarian, Health, Labour, Education, and Development
Status: high-impact priority layer started.
Completed or active:
- WHO/Sphere/WASH priority records.
- Codex priority records.
- Humanitarian standards expansion.
- WHO IRIS staging.
- UN/OHCHR treaty staging.
Remaining:
- Promote reviewed UN treaty candidates into processed records.
- Harden WHO IRIS filters and promote only normative standards/guidelines.
- Expand Codex into a fuller catalogue.
- Add IPPC ISPMs, WOAH codes/manuals, FAOLEX staging, and education/development frameworks.
Important official resources:
- WHO IRIS: https://iris.who.int/
- WHO publications: https://www.who.int/publications
- Codex standards list: https://www.fao.org/fao-who-codexalimentarius/codex-texts/list-standards/
- UN Treaty Collection: https://treaties.un.org/
- OHCHR core instruments: https://www.ohchr.org/en/core-international-human-rights-instruments-and-their-monitoring-bodies
Phase 3 - Environment, Climate, and Natural Systems
Status: early but important source-backed work has begun through IAEA.
Completed or active:
- IAEA Safety Standards priority records.
Remaining:
- Expand IAEA beyond the priority slice.
- Add climate and environment frameworks.
- Add biodiversity, disaster risk, geospatial, water, and environmental management standards.
Important official resources:
- IAEA Safety Standards: https://www.iaea.org/resources/safety-standards
- UNFCCC: https://unfccc.int/
- IPCC: https://www.ipcc.ch/
- Convention on Biological Diversity: https://www.cbd.int/
Phase 4 - Finance, Trade, and Economic Governance
Status: first sustainability reporting slice is active.
Completed or active:
- GRI/SASB sustainability reporting records.
Remaining:
- Add ISSB.
- Add ESRS.
- Add GHG Protocol.
- Add WTO agreements and trade-related standards.
- Add financial market, accounting, audit, and anti-corruption frameworks.
Important official resources:
- GRI Standards: https://www.globalreporting.org/standards/
- SASB Standards: https://sasb.ifrs.org/standards/
- ISSB/IFRS sustainability standards: https://www.ifrs.org/issued-standards/ifrs-sustainability-standards-navigator/
- WTO legal texts: https://www.wto.org/english/docs_e/legal_e/legal_e.htm
Phase 5 - ICT, Digital, AI, and Cybersecurity
Status: strongest expansion area after the initial seed layer.
Completed or active:
- NIST cybersecurity and AI priority records.
- W3C web standards.
- ITU telecommunications.
- ETSI ICT standards.
- OASIS, Ecma, and GS1 priority records.
Remaining:
- Expand NIST beyond the priority slice.
- Add IANA registry relationships.
- Add IEEE priority metadata where lawful open metadata is available.
- Add 3GPP and additional ETSI release metadata.
- Add AI governance and data protection frameworks.
Important official resources:
- NIST CSRC: https://csrc.nist.gov/publications
- W3C standards: https://www.w3.org/TR/
- ITU Recommendations: https://www.itu.int/rec/
- ETSI standards search: https://www.etsi.org/standards
- OASIS standards: https://www.oasis-open.org/standards/
- Ecma standards: https://ecma-international.org/publications-and-standards/standards/
- GS1 standards: https://www.gs1.org/standards
Phase 6 - Transport, Energy, Manufacturing, and Built Environment
Status: priority slices started for electrotechnical and space standards.
Completed or active:
- IEC priority metadata.
- CCSDS and ECSS space standards.
Remaining:
- Expand IEC coverage from priority rows toward a larger catalogue.
- Add ICAO aviation standards.
- Add IMO maritime instruments.
- Add ASTM, ASME, CEN, CENELEC, building, fire, transport, and manufacturing standards metadata.
Important official resources:
- IEC standards: https://webstore.iec.ch/
- CCSDS publications: https://public.ccsds.org/Publications/
- ECSS standards: https://ecss.nl/standards/
- ICAO: https://www.icao.int/
- IMO: https://www.imo.org/
Phase 7 - Society, Culture, Sports, Legal, and Specialised Domains
Status: culture and heritage plus sports priority work are active.
Current and planned high-impact slices:
- Culture and heritage standards.
- Sports and recreation standards.
Recommended execution order:
- Expand culture and heritage beyond the first UNESCO, ICOMOS, ICOM, and ICCROM priority records.
- Expand sports and recreation beyond the first WADA, IOC, IFAB, World Athletics, CAS, FIBA, and ITF priority records.
Important official resources:
- UNESCO conventions: https://www.unesco.org/en/legal-affairs/conventions
- ICOMOS doctrinal texts: https://www.icomos.org/en/resources/charters-and-texts
- ICOM Code of Ethics: https://icom.museum/en/resources/standards-guidelines/code-of-ethics/
- WADA Code: https://www.wada-ama.org/en/what-we-do/world-anti-doping-code
- Olympic Charter: https://olympics.com/ioc/olympic-charter
- IFAB Laws of the Game: https://www.theifab.com/laws/latest/
Phase 8 - National Standards Bodies and Regional Networks
Status: initial registry slice active.
Completed or active:
- First national standards body registry slice.
Remaining:
- Expand toward all ISO national member bodies.
- Add regional networks such as ARSO, ASEAN, COPANT, CEN, CENELEC, and Gulf/MENA/Pacific networks.
- Add relationships among national bodies, ISO membership, and regional bodies.
Important official resources:
- ISO members: https://www.iso.org/members.html
- CEN: https://www.cencenelec.eu/
- ARSO: https://www.arso-oran.org/
- COPANT: https://copant.org/
Phase 9 - Verification, Publication, and Community Launch
Status: quality gate active; formal launch packaging remains.
Completed or active:
- Static Pages publication.
- Search layer.
- Release artifacts.
- Deterministic quality gate.
Remaining:
- Add HDX publication package.
- Add Zenodo archival DOI workflow.
- Add release notes and versioning.
- Add contributor issue templates for new source families and record corrections.
- Add public quality dashboard and URL-health reporting.
Important resources:
- HDX: https://data.humdata.org/
- Zenodo GitHub integration: https://help.zenodo.org/docs/github/
Phase 19 - Enhanced Integration Roadmap
Status: planned.
Remaining:
- Hybrid tabular plus graph architecture.
- Knowledge graph exports.
- GraphRAG-ready query layer.
- Scheduled source refresh automation.
- Multilingual labels and summaries.
- API and faceted browsing.
- Sustainability, governance, and partnership model.
---
5. Recommended Next Execution Steps
Step 1 - Phase 7A Culture and Heritage Expansion
Deliverables:
- Expand `data/reference/culture_priority_sources.csv`.
- Expand `data/processed/culture_heritage_standards.csv`.
- Add more ICOMOS doctrinal texts and ICCROM publication families.
- Add relationship edges to humanitarian response, education, indigenous rights, environment, and disaster risk.
- Keep README, index, roadmap, knowledge graph, and gap analysis synchronized.
Candidate records:
- Burra Charter.
- Valletta Principles.
- Historic Urban Landscape Recommendation.
- Additional ICCROM conservation and collections-care resources.
- Additional UNESCO cultural property and restitution instruments.
Step 2 - Phase 7B Sports and Recreation Expansion
Deliverables:
- Expand `data/reference/sports_priority_sources.csv`.
- Expand `data/processed/sports_recreation_standards.csv`.
- Add UNESCO anti-doping treaty and FIFA legal instruments.
- Add sports safeguarding and athlete-rights frameworks.
- Add relationship edges to health, law, human rights, safety, and event management.
Candidate records:
- UNESCO International Convention Against Doping in Sport.
- FIFA Statutes and legal instruments.
- ISO TC 83 sports equipment domain tagging.
- ICC cricket playing conditions.
- International safeguarding frameworks.
Step 3 - Promote UN Treaty Staging
Tasks:
- Review `data/staging/un_treaty_candidates.csv`.
- Confirm official treaty status URLs and entry-into-force metadata.
- Promote high-confidence records into `data/processed/`.
- Add treaty-protocol and treaty-body relationship edges.
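One way to make "high-confidence" concrete is a promotion gate that requires curator approval plus the two pieces of metadata named above. The sketch below is illustrative only: the column names `curator_approved`, `status_url`, and `entry_into_force` are assumptions about the staging file, not its confirmed schema.

```python
def split_promotable(candidates):
    """Split staged treaty candidates into (promoted, held) lists.

    A row is promoted only if a curator approved it and it carries both
    a status URL and an entry-into-force value. Column names are assumed.
    """
    promoted, held = [], []
    for row in candidates:
        approved = row.get("curator_approved", "").strip().lower() == "yes"
        has_url = bool(row.get("status_url", "").strip())
        has_date = bool(row.get("entry_into_force", "").strip())
        if approved and has_url and has_date:
            promoted.append(row)
        else:
            held.append(row)
    return promoted, held
```

Rows in `held` would stay in `data/staging/` for another review pass rather than being dropped.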
Step 4 - Harden WHO IRIS Staging
Tasks:
- Add stricter normative metadata filters.
- Add tests preventing general reports from entering release data.
- Promote only curator-approved WHO guidelines, classifications, and technical standards.
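The filter and its guard test might look like the following sketch. The `document_type` and `curator_approved` columns and the set of normative types are assumptions for illustration; the actual WHO IRIS metadata fields may differ.

```python
# Assumed set of document types treated as normative in this sketch.
NORMATIVE_TYPES = {"guideline", "classification", "technical standard"}


def is_normative(record):
    """True if the record's (assumed) document_type is a normative type."""
    doc_type = record.get("document_type", "").strip().lower()
    return doc_type in NORMATIVE_TYPES


def filter_release_candidates(records):
    """Keep only normative records that a curator has approved."""
    return [
        r
        for r in records
        if is_normative(r)
        and r.get("curator_approved", "").strip().lower() == "yes"
    ]


def test_general_reports_are_excluded():
    """Pytest-style guard: general reports must never reach release data."""
    records = [
        {"title": "Example guideline", "document_type": "Guideline", "curator_approved": "yes"},
        {"title": "Example annual report", "document_type": "Report", "curator_approved": "yes"},
        {"title": "Unreviewed classification", "document_type": "Classification", "curator_approved": ""},
    ]
    released = filter_release_candidates(records)
    assert [r["title"] for r in released] == ["Example guideline"]
```

An allow-list is used deliberately: unknown document types are held back by default instead of slipping through.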
Step 5 - Expand Phase 5 and Phase 6 Catalogues
Tasks:
- Expand IEC beyond priority metadata.
- Expand ETSI/3GPP-related references.
- Expand ITU beyond the initial recommendations.
- Add IEEE metadata where public metadata is lawful and stable.
- Add ICAO and IMO priority records.
Step 6 - Build Publication Packages
Tasks:
- Add HDX dataset metadata and upload checklist.
- Add Zenodo citation metadata.
- Add release notes template.
- Add version tag checklist.
- Add public quality dashboard links.
Step 7 - Enrichment and Quality Upgrade
Tasks:
- Prioritize top standards for `why_it_matters` enrichment.
- Add mandate notes and regulatory references.
- Normalize sub-domain labels.
- Add URL health reports.
- Add duplicate/adoption detection.
- Add relationship edge expansion.
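For the duplicate detection task, a first pass could group rows by a normalized identifier. This is a rough sketch: the `identifier` column name is an assumption, and this approach catches only formatting variants of the same identifier. National adoptions that add a prefix would need separate, relationship-aware logic.

```python
import re
from collections import defaultdict


def normalize_identifier(identifier):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", identifier.lower())


def find_duplicate_groups(rows, key="identifier"):
    """Group rows whose normalized identifiers collide; return groups of 2+."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize_identifier(row[key])].append(row)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `ISO 9001:2015` and `iso-9001-2015` normalize to the same string and would be reported as one group for curator review.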
---
6. Operational Checklist for Every Future Slice
For each source family:
- Confirm it is in the research plan or add it to the roadmap.
- Use official source URLs only.
- Add a reference CSV first.
- Write or update tests before implementation where behavior changes.
- Generate processed rows only for curated, standards-relevant records.
- Put broad or noisy metadata in `data/staging/`.
- Run the source-specific make target.
- Run `.venv/bin/python -m pytest -q`.
- Run `make validate`.
- Run `make pagefind-search` when public search or site surfaces change.
- Run `git diff --check`.
- Commit with a clear message.
- Push to `origin/main`.
- Verify GitHub Actions and Pages deployment.
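The verification commands in this checklist could be chained by a small runner that stops at the first failure. The sketch below is hypothetical and not part of the repository: it covers only the always-run commands (tests, validation, whitespace check) and leaves the source-specific make target, commit, and push to the operator.

```python
import subprocess

# Commands taken from the checklist; the runner itself is a hypothetical helper.
CHECKLIST_COMMANDS = [
    [".venv/bin/python", "-m", "pytest", "-q"],
    ["make", "validate"],
    ["git", "diff", "--check"],
]


def run_checklist(commands=CHECKLIST_COMMANDS, runner=subprocess.run):
    """Run each command in order; return the first failing command or None."""
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            return command
    return None


if __name__ == "__main__":
    failed = run_checklist()
    if failed is not None:
        raise SystemExit("Checklist stopped at: " + " ".join(failed))
```

The `runner` parameter exists so the sequence can be exercised in tests with a stand-in instead of real subprocesses.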
---
7. Definition of Success for the Next Milestone
The next milestone should be considered complete when:
- Phase 7A culture and heritage records are expanded beyond the first priority slice.
- Phase 7B sports and recreation records are expanded beyond the first priority slice.
- UN treaty candidates have a promotion pathway.
- WHO IRIS staging has stricter filters.
- The public site shows the current report, roadmap, task matrix, knowledge graph, gap analysis, and quality gate as rendered HTML.
- Local `main` and `origin/main` remain synchronized.
---
8. Closing Note
The project has already crossed the most difficult early threshold: it is no longer just an idea or a static list. It is a living, validated, searchable standards-index pipeline. The next stage is about depth, trust, and polish: more authoritative sources, richer explanations, cleaner relationships, visible quality evidence, and formal publication channels.
The safest path to 100 percent is not one giant scrape. It is steady, source-backed, tested expansion by domain and phase.