Client: Multiple (EBI, LBNL, Research Data Alliance)
Duration: 2020 - Present
Focus Areas: standards, mapping, FAIR, community

The Challenge

Mappings between ontologies, databases, and vocabularies are essential for data integration, but there was no standard way to share them. Different projects used different formats, which made it hard to:

  • Share mappings between collaborators
  • Validate mapping quality
  • Track provenance and justifications
  • Aggregate mappings from multiple sources

My Role

I led the development of the Simple Standard for Sharing Ontological Mappings (SSSOM), a community-driven specification that defines:

A Standard Format

  • TSV-based format readable by humans and machines
  • Clear column specifications for subjects, objects, predicates
  • Rich metadata for provenance, confidence, and justification
  • Support for both exact and approximate mappings (for example close, broad, or narrow matches)
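
To make the format concrete, here is a minimal sketch of reading such a file using only the Python standard library. It is not the sssom-py implementation. It assumes the common SSSOM TSV layout, where metadata sits in "#"-prefixed lines above the table, and it treats those lines as simple "key: value" pairs rather than full YAML. The sample identifiers (EX:0001, OTHER:1234, the example.org URL) and the function name read_mapping_table are hypothetical.

```python
import csv
import io

# Hypothetical sample: a '#'-prefixed metadata header followed by two mappings.
SAMPLE = """\
#mapping_set_id: https://example.org/mappings/demo
#license: https://creativecommons.org/publicdomain/zero/1.0/
subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence
EX:0001\tskos:exactMatch\tOTHER:1234\tsemapv:ManualMappingCuration\t0.95
EX:0002\tskos:closeMatch\tOTHER:5678\tsemapv:LexicalMatching\t0.70
"""


def read_mapping_table(text):
    """Separate the '#'-prefixed header from the TSV body and parse both.

    Simplification: header lines are read as flat 'key: value' pairs,
    which covers this sample but not nested YAML metadata.
    """
    metadata = {}
    body_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            body_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(body_lines)), delimiter="\t")
    return metadata, list(reader)


metadata, mappings = read_mapping_table(SAMPLE)
for row in mappings:
    print(row["subject_id"], row["predicate_id"], row["object_id"], row["confidence"])
```

Each row comes back as a dictionary keyed by column name, so the subject, predicate, object, justification, and confidence of a mapping stay together as one record.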

A Semantic Model

  • LinkML-based schema with formal semantics
  • Integration with OWL and RDF ecosystems
  • Extensible for domain-specific needs

Tooling Ecosystem

I developed supporting tools including:

  • sssom-py: Python library for reading, writing, validating, and transforming SSSOM
  • Validation rules: Automated quality checks for mapping files
  • Converters: Transform from legacy formats to SSSOM
  • Registry: Central discovery point for published mapping sets
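
As an illustration of what automated quality checks on a mapping file can look like, the sketch below validates individual rows. It is a hypothetical, simplified stand-in and not the rule set shipped with sssom-py. The assumptions are that every mapping needs subject_id, predicate_id, object_id, and mapping_justification, that these values are CURIEs of the form prefix:local_id, and that confidence, when present, lies between 0 and 1. The names check_mapping, REQUIRED_COLUMNS, and CURIE_PATTERN are invented for this example.

```python
import re

# Assumed required columns for one mapping row.
REQUIRED_COLUMNS = ("subject_id", "predicate_id", "object_id", "mapping_justification")

# Loose CURIE shape: a prefix, a colon, and a non-empty local part.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")


def check_mapping(row, line_number):
    """Return a list of human-readable problems found in one mapping row."""
    problems = []
    for column in REQUIRED_COLUMNS:
        value = row.get(column, "")
        if not value:
            problems.append(f"line {line_number}: missing {column}")
        elif not CURIE_PATTERN.match(value):
            problems.append(f"line {line_number}: {column} is not a CURIE: {value!r}")

    confidence = row.get("confidence", "")
    if confidence:
        try:
            score = float(confidence)
        except ValueError:
            problems.append(f"line {line_number}: confidence is not a number: {confidence!r}")
        else:
            if not 0.0 <= score <= 1.0:
                problems.append(f"line {line_number}: confidence out of range: {score}")
    return problems


good_row = {
    "subject_id": "EX:0001",
    "predicate_id": "skos:exactMatch",
    "object_id": "OTHER:1234",
    "mapping_justification": "semapv:ManualMappingCuration",
    "confidence": "0.95",
}
bad_row = {
    "subject_id": "EX:0002",
    "predicate_id": "",
    "object_id": "no-prefix",
    "mapping_justification": "semapv:LexicalMatching",
    "confidence": "1.7",
}

good_problems = check_mapping(good_row, 1)
bad_problems = check_mapping(bad_row, 2)
for problem in bad_problems:
    print(problem)
```

Returning a list of problems per row, rather than raising on the first error, lets a reviewer see every issue in a mapping file in one pass.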

Outcomes

  • FAIR-compliant mapping standard recognized by funding agencies
  • Hundreds of mapping sets now published in SSSOM format
  • Adopted by major projects: OBO Foundry, Monarch Initiative, EMBL-EBI, Research Data Alliance
  • Active community maintaining and extending the standard
  • Interoperability across previously siloed mapping efforts

Why This Matters

Before SSSOM, sharing mappings between projects meant ad-hoc spreadsheets with no provenance and no way to validate quality. Now, a mapping set published by one group can be directly consumed by another, with a clear record of where each mapping came from and why it was made. This makes data integration practical at a scale that ad-hoc formats could not support.

Technologies Used

  • LinkML (semantic schema)
  • Python
  • GitHub for collaborative development
  • TSV/YAML formats