Toward a Global Standard
Non-normative. This page is a design discussion, not part of the R1 conformance surface. Nothing here changes what makes a Bundle valid (Conformance). Every change proposed below would enter through the RFC + ballot process and start low on the OMIR Maturity Model — at OMM-0/1 — and earn its level through independent implementation.
OMIR R1 is honest about its origin: its schemas were derived from one production memory engine, Veld. That is a strength — the core is grounded in a system that actually ships calibrated confidence, decay, Hebbian edges, and tiering — and a risk. A standard authored by a single implementation is, until proven otherwise, that implementation’s export format with a logo. The work of becoming a global standard is the work of shedding the assumptions that are true only for Veld while keeping the cognitive substance that makes OMIR worth adopting.
This page names those assumptions and proposes the generalizations that would let a
robotics stack, a healthcare agent, a multi-tenant assistant platform, and a coding agent
all read and write the same .omir files without loss. It is the OMIR equivalent of
FHIR’s long migration from “HL7’s resources” to “everyone’s resources.”
What already generalizes (keep it)
Before the critique, the parts of R1 that are not Veld-specific and should be preserved:
- Resource + typed-reference model. Everything is a Resource; links are ResourceType/id. This is FHIR-proven and domain-neutral.
- 80/20 core + typed extension[]. The escape hatch is exactly what lets the long tail of proprietary data travel without bloating the core. See Extensions.
- Honest maturity (OMM). A promise-about-change per resource type. Domain-neutral.
- Calibrated confidence as a distribution, not a bare float (Memory Semantics).
- Portable forgetting state — decay, anchoring, tiers — recorded as state, not as a mandated algorithm.
- Bitemporal-ish timestamps (eventTime vs createdAt) and temporal invalidation.
- Two lossless encodings and the Profiles mechanism for domain tightening.
The generalizations below are mostly additive — new optional structure and extension points — precisely so they do not break the parts that already work.
Generalization themes
Each theme states what is Veld-specific today, why it limits interchange, and a concrete proposal. The FHIR precedent is cited where one exists, because OMIR is FHIR-modeled and should borrow FHIR’s solved problems rather than reinvent them.
A. Open vocabularies, not closed enums (highest interchange leverage)
Today. MemoryRecord.kind, experienceType, Entity.labels, and Episode.source
are closed enums, and they lean toward coding/dev-agent life: code_edit,
file_access, command, prompt. A robot cannot say “object grasped”; a healthcare
agent cannot say “symptom reported”; a research agent cannot say “hypothesis formed.”
relationType is already an open string — good — but it has no way to say which
vocabulary a term comes from.
Risk. Closed enums hard-code one domain’s worldview into the core. Every other domain
is forced into other + an extension, which means their primary semantics are invisible
to a generic consumer — defeating interchange for everyone but Veld-likes.
Proposal. Adopt a FHIR-style CodeableConcept for these fields: a structure
carrying an optional system (vocabulary URI), a code, and human text, e.g.
"experienceType": { "system": "https://omir.io/vocab/robotics", "code": "object_grasped", "text": "Picked up the red block" }
R1’s bare-string/enum forms remain valid as the degenerate case (text only). OMIR ships
core vocabularies (the current enums, promoted to published code systems) and lets
domains register their own. This single change is the difference between “a memory format
for coding agents” and “a memory format.”
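A consumer-side reading of this proposal can be sketched in Python. The helper name `as_concept` is hypothetical, and lifting a bare string into a text-only concept follows the degenerate-case wording above rather than any published OMIR API.

```python
ALLOWED_KEYS = ("system", "code", "text")

def as_concept(value):
    """Return a CodeableConcept-shaped dict for a bare string or an object.

    Assumption: an R1 bare string is carried as the text-only degenerate case.
    """
    if isinstance(value, str):
        return {"text": value}
    if not isinstance(value, dict):
        raise TypeError("expected a string or a CodeableConcept object")
    unknown = set(value) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    # Keep only the three proposed members, in a stable order.
    return {key: value[key] for key in ALLOWED_KEYS if key in value}

legacy = as_concept("code_edit")
robotics = as_concept({
    "system": "https://omir.io/vocab/robotics",
    "code": "object_grasped",
    "text": "Picked up the red block",
})
```

A generic consumer could then treat every vocabulary-bearing field uniformly, whether the producer wrote R1 strings or coded values.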
B. Identity, ownership, and multi-agent memory
Today. Memory is implicitly single-agent. There is no first-class notion of whose
memory a record is, who authored it, or how two agents share a memory store. Veld carries
optional agent_id/actor_id tags, but they are tags, not a model.
Risk. The 2026 reality is fleets of agents and multi-tenant platforms. Without an ownership/authorship model, a Bundle exchanged between agents cannot answer “can I trust this? who wrote it? am I allowed to read it?” — the questions that matter most when memory crosses a trust boundary.
Proposal. Introduce an Agent (or Principal) resource and optional
authoredBy / owner references on records and episodes; let Provenance carry the
chain of agents a memory passed through (see Theme E). Access control itself stays a
profile/extension concern (it is policy, not data shape), but the identity hooks the
policy needs belong in the core. Grade Agent at OMM-0 and let multi-agent stacks prove
it.
C. Entity resolution across systems
Today. Entity.id is bundle-local; there is no canonical external identifier, no
alias set, and no “this entity is the same as that one.” Within Veld a UUID suffices
because there is one graph.
Risk. Interchange is entity resolution. If Acme’s “Alice Smith (employee 123)” and Globex’s “@asmith” cannot be declared the same person, two memory stores can never merge — the whole point of a portable format collapses to per-pair adapters.
Proposal. Add FHIR-style Entity.identifier[] ({ system, value } pairs, e.g.
{ "system": "mailto", "value": "alice@acme.com" }), an aliases[] list of surface
forms, and a sameAs link type so a Bundle can assert cross-system identity. This is the
backbone of federation (Theme I).
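As a rough illustration of what identifier[] enables, the Python sketch below treats two entities as candidates for the same real-world thing when they share at least one { system, value } pair. The matching policy and the function names are assumptions made for the example; the proposal itself does not prescribe a resolution algorithm.

```python
def identifier_keys(entity):
    """Collect the (system, value) pairs carried in an entity's identifier[]."""
    return {(i["system"], i["value"]) for i in entity.get("identifier", [])}

def same_entity(a, b):
    """Assumed policy: one shared (system, value) pair is enough to match."""
    return bool(identifier_keys(a) & identifier_keys(b))

acme_alice = {
    "id": "e1",  # bundle-local id, different in every store
    "identifier": [{"system": "mailto", "value": "alice@acme.com"}],
    "aliases": ["Alice Smith"],
}
globex_alice = {
    "id": "e9",
    "identifier": [{"system": "mailto", "value": "alice@acme.com"}],
    "aliases": ["@asmith"],
}
bob = {"id": "e2", "identifier": [{"system": "mailto", "value": "bob@acme.com"}]}
```

Here the two Alice records match on the shared mailto identifier even though their bundle-local ids and aliases differ, which is the case a sameAs link would then record explicitly.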
D. Beyond text: modality and embeddings
Today. content is a single string. Multimodal data and embeddings are
extension-only. Veld carries image/audio/video vectors internally, but the at-rest core
sees text.
Risk. Agent memory in 2026 is increasingly multimodal (vision, audio, sensor). A text-only core relegates non-text memory to opaque extensions that no generic consumer can interpret.
Proposal. Generalize content to an optional multi-part content model —
contentType (a media type) plus either inline content or a MediaReference
({ uri, contentType, hash }) for out-of-line bytes — and define an algorithm-neutral,
efficiency-bearing Embedding that any vendor can emit without mandating a model. Embeddings
stay optional (derived, not source) but become interpretable rather than vendor-opaque. The
block below is the single canonical definition of the embedding object: it folds in the
efficiency-bearing fields (Matryoshka prefixes, sparse codes, a dual-trace role, and reserved
drift / VSA spaces). The Efficiency & Information-Bearing Codes page (Theme D
reframed from “neutral embeddings” to efficiency-bearing codes) supplies the rationale —
watts / inference / canon — for each efficiency field but does not redefine it.
"Embedding": {
"type": "object",
"description": "Algorithm-neutral, optional embedding code (derived, not source). Carries a dense, sparse, Matryoshka-nested, or out-of-line code; comparable only within one 'space'. Efficiency rationale: efficiency.md.",
"properties": {
"space": { "type": "string", "description": "Opaque equality key — codes are comparable IFF 'space' strings are byte-equal. Reserved WG spaces: 'omir:temporal-context' (a slow-drift contiguity vector, efficiency.md EP-5a), 'omir:vsa' (a bind/bundle structural hypervector, efficiency.md EP-F)." },
"model": { "type": "string", "description": "Producer model id that generated the code." },
"dims": { "type": "integer", "minimum": 1, "description": "Full dimensionality; MUST equal a dense 'vector' length and the sparse index space width." },
"dtype": { "type": "string", "description": "Element byte form, e.g. 'f32' | 'int8' | 'binary'. Pins the .omirb encoding; excluded from signed images by default (Phase 3)." },
"vector": { "type": "array", "items": { "type": "number" }, "description": "Dense code. Mutually exclusive with 'sparse'." },
"sparse": {
"type": "object",
"description": "Sparse code for CPU inverted-index search + one-step associative completion (efficiency.md EP-3). Mutually exclusive with 'vector'.",
"properties": {
"indices": { "type": "array", "items": { "type": "integer", "minimum": 0 }, "description": "Active dimension indices, strictly ascending." },
"values": { "type": "array", "items": { "type": "number" }, "description": "Weights parallel to 'indices' (equal length)." }
},
"required": ["indices", "values"],
"additionalProperties": false
},
"ref": { "$ref": "#/$defs/MediaReference", "description": "Out-of-line code — an offloaded / verbatim payload (efficiency.md EP-2)." },
"matryoshka": { "type": "boolean", "description": "True if 'vector' is a nested (Matryoshka) code: any prefix whose length is in 'nestedDims' is itself a valid, rankable embedding — coarse-to-fine shortlisting without re-embedding (efficiency.md EP-2)." },
"nestedDims": { "type": "array", "items": { "type": "integer", "minimum": 1 }, "description": "Ascending valid prefix lengths, e.g. [64,128,256,512,768]. Rank on any listed prefix, re-rank on a longer one; prefixes compare only within one 'space' (efficiency.md EP-2)." },
"role": { "enum": ["gist", "verbatim"], "description": "Dual-trace role (efficiency.md EP-2): 'gist' = durable, compact, anchorable, ranked cheaply; 'verbatim' = fast-decaying surface trace, usually offloaded via 'ref' and fetched only for top-k." }
},
"additionalProperties": false
}
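The constraints stated in the descriptions above (mutual exclusion of vector and sparse, dims matching the dense length and bounding the sparse index space, strictly ascending indices, ascending nestedDims, comparability only within one space) can be checked with a few lines of Python. This is an illustrative sketch, not the reference validator; the function names are hypothetical.

```python
def check_embedding(e):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if "vector" in e and "sparse" in e:
        problems.append("'vector' and 'sparse' are mutually exclusive")
    dims = e.get("dims")
    if "vector" in e and dims is not None and len(e["vector"]) != dims:
        problems.append("'dims' must equal the dense vector length")
    if "sparse" in e:
        idx, vals = e["sparse"]["indices"], e["sparse"]["values"]
        if len(idx) != len(vals):
            problems.append("'indices' and 'values' must have equal length")
        if any(b <= a for a, b in zip(idx, idx[1:])):
            problems.append("'indices' must be strictly ascending")
        if dims is not None and any(i >= dims for i in idx):
            problems.append("sparse index outside the 'dims' index space")
    nested = e.get("nestedDims", [])
    if any(b < a for a, b in zip(nested, nested[1:])):
        problems.append("'nestedDims' must be ascending")
    return problems

def comparable(a, b):
    """Codes compare only when both carry the same 'space' string."""
    return isinstance(a.get("space"), str) and a.get("space") == b.get("space")

dense = {"space": "acme:text-v1", "dims": 4, "vector": [0.1, 0.2, 0.3, 0.4],
         "matryoshka": True, "nestedDims": [2, 4]}
broken = {"space": "acme:text-v1", "dims": 8,
          "sparse": {"indices": [3, 1], "values": [0.5]}}
```

The space value "acme:text-v1" is a made-up example; any opaque string works as long as producers and consumers compare it for exact equality.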
E. Provenance and trust as a chain
Today. Provenance is a flat { source, sourceType, credibility, externalId }. Veld’s
internal model is richer (a relay chain with per-hop credibility and a verified flag);
the core flattens it.
Risk. When memory crosses vendors, “where did this come from, through whom, and is it signed?” is a trust-critical question a flat source string cannot answer.
Proposal. Align with W3C PROV: let Provenance carry a chain[] of derivation
steps (agent, activity, time, per-hop credibility) and an optional attestation/signature
so a consumer can verify a record was not tampered with in transit. Keep the flat form as
the common case.
F. Privacy, sensitivity, and retention
Today. Absent. R1 has no notion of data sensitivity, consent, legal basis,
retention/expiry policy, or redaction. validUntil is for contradiction, not governance.
The omir-personal-assistant profile gestures at a PII-class extension, but it is opt-in
and shallow.
Risk. A global memory format will carry personal and regulated data. Without a governance vocabulary, OMIR cannot be adopted where GDPR/CCPA/HIPAA-style obligations apply — which is most of the interesting market.
Proposal. Define an optional governance block (sensitivity classification, legal
basis, retention policy / deleteAfter, redaction markers, consent reference). Likely a
core extension family with published URLs first, promoted to core once proven. This is
the theme most likely to decide whether OMIR is adoptable by enterprises at all.
G. Uncertainty beyond a point estimate
Today. Confidence is Beta(α, β) + a calibrated point — already better than most. But
it assumes one uncertainty model and conflates evidence count with belief.
Proposal. Keep Beta as the recommended default; allow a more general uncertainty representation (credible interval, or a named distribution + parameters) and optionally distinguish epistemic (lack of evidence) from aleatoric (inherent variability) uncertainty for agents that reason about it.
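The standard Beta formulas make the distinction between belief and evidence concrete. In the Python sketch below, two records share the same mean but differ widely in how much evidence backs it; the function names are illustrative only.

```python
def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta): the point belief."""
    return alpha / (alpha + beta)

def beta_variance(alpha, beta):
    """Variance of Beta(alpha, beta): shrinks as evidence (alpha + beta) grows."""
    n = alpha + beta
    return (alpha * beta) / (n * n * (n + 1))

weak = (2, 2)      # little evidence
strong = (50, 50)  # much more evidence, same mean

same_belief = beta_mean(*weak) == beta_mean(*strong)
more_certain = beta_variance(*strong) < beta_variance(*weak)
```

Both parameter pairs yield a mean of 0.5, yet the variance drops from 0.05 to roughly 0.0025, which is the information a bare float would lose.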
H. A complete temporal model
Today. eventTime vs createdAt plus validUntil / validAt / invalidatedAt is a
partial bitemporal model, and time is always a precise RFC 3339 instant.
Proposal. Make bitemporality explicit and uniform (a valid-time interval and a transaction-time interval), and support imprecise time (“sometime in 2024”, “before the incident”) via interval/precision-qualified instants. Memory is frequently vague about when; the format should be able to say so.
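One possible encoding of imprecise time is a half-open interval with a precision marker. The Python sketch below represents "sometime in 2024" that way; the field names start, end, and precision are hypothetical, as is the choice of a half-open interval in UTC.

```python
from datetime import datetime, timezone

def year_interval(year):
    """Represent 'sometime in <year>' as the half-open interval [start, end)."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return {"start": start.isoformat(), "end": end.isoformat(), "precision": "year"}

def may_overlap(a, b):
    """True if two half-open intervals share at least one instant."""
    parse = datetime.fromisoformat
    return parse(a["start"]) < parse(b["end"]) and parse(b["start"]) < parse(a["end"])

vague = year_interval(2024)
exact = {
    "start": "2024-06-01T12:00:00+00:00",
    "end": "2024-06-01T12:00:01+00:00",
    "precision": "second",
}
```

A precise RFC 3339 instant remains expressible as a very short interval, so the two forms can be compared with the same overlap test.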
I. Federation and cross-Bundle references
Today. R1 is deliberately closed-world: every reference must resolve inside the Bundle. That is the right call for R1 (it makes conformance decidable), but it blocks linking a small export to a large shared graph.
Proposal. Define an optional, higher conformance level for resolvable external references (a reference may target a resource in another, addressable Bundle/store), with clear rules so the closed-world guarantee remains the default and the validator can still decide conformance. Federation is how a memory standard scales past one file.
J. Structured knowledge and typed attributes
Today. Entity.attributes is string → string; Episode.metadata likewise. All
structure degrades to stringly-typed key/values.
Proposal. Allow typed attribute values (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference) so a global graph can carry real structured knowledge — a measurement with units, a date, a link — without serializing everything to strings.
K. Language and internationalization
Today. Text is implicitly English; there are no language tags, and the reference pipeline’s tokenization/NER assume English.
Proposal. Add optional language tags (BCP 47) on textual content and entity names,
and allow multiple language variants of a name/summary. A global standard cannot assume
one language.
A prioritized path
Not all of these are equal. For interchange specifically — the existential goal — the order is clear:
| Priority | Theme | Why first | Likely entry |
|---|---|---|---|
| 1 | A. Open vocabularies | Unblocks every non-coding domain at once; small, additive change. | R1.x · OMM-1 |
| 2 | C. Entity resolution | Interchange is identity; without it, stores can’t merge. | R2 · OMM-1 |
| 3 | F. Privacy / retention | Gate to enterprise & regulated adoption. | R2 · OMM-0 |
| 4 | B. Identity / multi-agent | Matches the 2026 fleet reality; trust across boundaries. | R2 · OMM-0 |
| 5 | E. Provenance chain + attestation | Trust when memory crosses vendors. | R2 · OMM-1 |
| 6 | D. Modality + neutral embeddings | Multimodal memory becomes interpretable, not opaque. | R2 · OMM-0 |
| — | H, G, J, K, I | Valuable, lower interchange-urgency; fold in as domains demand. | R2+ |
The first item is the highest-leverage and lowest-cost: turning four closed enums into CodeableConcepts, with the current values published as the seed vocabularies, would by itself move OMIR from “a coding-agent memory format” to “a memory format that coding agents use,” at OMM-1, without breaking a single R1 document.
The adoption plan
The prioritization above is a ranking; this is the plan. The Working Group’s stated
intent is to adopt all of the proposals, in a deliberate sequence set by dependency and
leverage — vocabularies (A) first, identity & resolution (C) next — with W3C PROV (E)
as the trust backbone the rest hangs from, and the refinements that fix G, J, and K folded
in once the load-bearing pieces land. Every phase enters at low OMM and is gated by independent
implementation. “Additive and non-breaking” is a claim with teeth here, not a slogan: every
R1 resource schema sets additionalProperties: false (Conformance §Document rules, rule CR-6),
so a new core field is rejected by the very R1 schema we promise to keep honoring. We
therefore label each change honestly against GOVERNANCE §5.1/§5.2 and route it through the
correct release vehicle below.
A note on the §5.2 boundary, stated once and reused. Adding a new, unreferenced $defs
member to common.schema.json is purely additive under §5.2 (no existing bundle becomes
invalid). It is the widening of an existing field or def to reference it, or the widening of
the shared Reference pattern, that is non-additive. This is the principle that lets
CodeableConcept (Phase 1) and TypedValue (Phase 5) be added cleanly as defs while the union
widening (Phase 1) and the Agent/Reference widening (Phase 2) are correctly breaking. Wherever
the plan calls a shared-schema change “breaking” or “additive,” it means this. A corollary the
plan applies throughout: any field whose value can only be expressed by widening the shared
Reference pattern (authoredBy, owner, an Agent-targeting sameAs) is NOT §5.1-additive and
is R2/RFC-gated, full stop — it cannot ride an R1.x increment.
Four preconditions before any phase ships
Each precondition is itself a §3.1 normative RFC (it amends GOVERNANCE/principles/CONTRIBUTING,
the common.schema.json shared defs, or process), so each needs a 14-day-minimum comment window
and a two-thirds TSC ballot (§3.3) before Phase 1 opens. Bundle P0+P0a into one
governance-reconciliation RFC (they are coupled) and ship P1 as a second RFC; P-1 (the
founding-TSC bootstrap) is procedurally first because no ballot is callable without a seated TSC.
P-1 — Seat a founding TSC (the true first domino). GOVERNANCE §2.3 targets a 3–7 seat TSC filled “by community nomination and confirmation by the sitting TSC”, but no TSC is seated, and §2.3 has no bootstrap clause for the first one, so the balloting body is self-perpetuating by design and cannot start. “Confirm it before P0” is not a mechanism. P-1 (a GOVERNANCE amendment, in scope) adds a one-time founding-TSC procedure that does not require a sitting TSC: a public call-for-nominations seating an initial 3-seat TSC with at least two non-Veld seats (so Veld holds ≤ one-third from day one and the §2.4 cap holds; at 3 seats, one-third is exactly one seat, so a Veld-majority bootstrap would be unlawful), confirmed by lazy consensus over a fixed comment window rather than by a nonexistent sitting TSC. P-1 is therefore where external participation is first recruited, not an administrative formality, and it couples directly to P2: you cannot seat a neutral TSC without at least one non-Veld party, the same party the second-implementer gate needs.
P0 — Reconcile the OMM ladder and re-grade the published R1 table. The repo carries three inconsistent OMM-2 definitions: GOVERNANCE §4.1 OMM-2 = “Exercised in real bundles by at least one implementation” (GOVERNANCE.md:233); principles.md §4 OMM-2 = “implemented in more than one system” (principles.md:55); and CONTRIBUTING.md §8 collapses OMM-1 and OMM-2 into one band — “Trial use — implemented somewhere, but limited field experience” (CONTRIBUTING.md:372-378) — putting “multiple implementations” only at OMM-3. The reconciliation RFC MUST rewrite the ladder identically across all THREE documents (GOVERNANCE §4.1+§4.3, principles.md §4, CONTRIBUTING.md §8, splitting its merged “1–2” row), and update the §3.4 Draft/Trial-Use/Normative band table in the same RFC:
- OMM-1 = implemented in ≥1 system; field set volatile.
- OMM-2 = ≥2 independent implementations; shape settling, not yet broad field use.
- OMM-3 = ≥2 independent implementations plus real-world cross-validated usage/conformance evidence (the discriminator from OMM-2 is settling vs cross-validated field experience).
The reconciliation is monotonic in BOTH directions — it lowers any existing grade the new independence gate no longer supports. Today GOVERNANCE §4.3 grades MemoryRecord at OMM-4 and Entity/Relationship/Episode at OMM-3 with Veld as sole implementer (GOVERNANCE.md:255-258), repeated in principles.md §4 (62-64), CONTRIBUTING.md §8 (382-384), and the README grade table (README.md:48-51). Under the reconciled gate those grades are unsupportable: there is no second independent implementation. The same RFC MUST re-grade the R1 table — demoting MemoryRecord and Entity/Relationship/Episode to OMM-1 (“settled shape in the reference implementation; awaiting a second independent implementation”) across all four artifacts in one ballot. Leaving four overclaimed grades standing while reforming the OMM in the name of honesty is exactly the dishonesty §4.2 forbids and the loudest possible “the OMM is marketing” signal to a prospective second implementer. Until P0 lands, no phase may claim OMM-2, and every OMM target below is read against the reconciled, re-graded ladder. P0 also assigns stable rule anchors CR-1..CR-8 to Conformance §“Document rules (MUST)” (today a positional 1–8 list with no stable IDs), so every cross-reference in this plan and its RFCs survives renumbering.
P0a — Define “independent” so it is recorded, neutral, and not gameable. An implementation counts toward the gate iff (a) it has a separate codebase/maintainer, developed without copying reference-implementation code under any license other than the Apache-2.0 grant — re-implementing from the CC-BY spec and using the validator as a conformance oracle explicitly COUNTS as independent (otherwise the licensing invariant that anyone may re-implement from the spec contradicts the gate); (b) it ships under a separate product/funding line with its own users; and (c) it passes the reference validator while PRODUCING the feature (not merely ignoring it). The gate is decoupled from TSC seats and employer: a committed second adopter who joins the TSC is exactly what we want and MUST NOT be disqualified for it. An anti-sock-puppet clause (common control, shared funding, or shared engineering team → counts as one) replaces any unenforceable string-match on “shared employer.” To keep the gate from being decided by the party that benefits from declaring it met: the counts-as-one determination uses the SAME “organization/common control” definition as the §2.4 seat cap (one bright line governs both seat math and implementation counting), the TSC MUST publish written findings against the three criteria on the ballot record, and the determination is taken by a supermajority EXCLUDING any TSC member employed by or contracted to the reference implementer (extending the §2.3 recusal rule — the convening vendor has a material non-interoperability interest in the gate being declared met). The WG validator and any Veld-authored tool never count.
P1 — A versioning contract for the R1→R2 line, with a named validator-rewrite scope. The R1
version pin is enforced at four points today; the R2 schema set and the validator MUST handle
each: (i) Bundle.omirVersion const:"R1" is neutralized to a plain string in the validator
(schemas.rs:103) — the literal-R1 envelope gate actually lives in Rust at version_presence
(lib.rs:390-402 → E300), so making it version-relative is a code change, not a schema const
edit; (ii) Meta.omirVersion const:"R1" (common.schema.json:39 → E301); (iii) Meta is
additionalProperties:false (common.schema.json:58); (iv) every resource schema AND the
Bundle envelope set additionalProperties:false (CR-6; Bundle.schema.json:30). Specify:
- R2 Bundles and resources carry omirVersion:"R2"; the R2 common.schema.json (under the R2 namespace) sets the Meta.omirVersion site to const:"R2" (or an enum of accepted majors), leaving R1 schemas frozen. E300/E301 become version-relative, not literal-R1.
- A consumer reads the major version and applies that version’s schema set, and MAY reject a newer major it does not implement. Forward-read of unknown core fields is constrained by closed schemas and is stated honestly: because R2.0 resource schemas are additionalProperties:false, an already-shipped R2.0 validator does NOT silently forward-read a new core field introduced by a later R2.x minor; it would hard-reject it (E120), exactly as “R1 readers do not tolerate R2.” Intra-major additivity therefore rides the extension[] lane (always tolerated by ignore-unknown-extensions), not undeclared new top-level properties. Any genuinely new optional core field that we want same-major consumers to skip is shipped only when a minor schema bump relaxes the relevant object to a controlled, pattern-gated extension lane; absent that, it is a new-release item. The earlier blanket “SHOULD ignore unknown newer-minor fields” claim is dropped as unimplementable against closed schemas; this corrects the Overlook table’s “R1.x-additive ⇒ forward-readable” assumption.
- Forward-read across majors is a property of R2+ tools about older majors, never of already-shipped R1 tools about newer ones. A published R1 validator WILL reject any R2 bundle at the envelope (E300) by design. The honest compatibility guarantee is exactly “R1 bundles stay valid forever” + “R2 readers accept R1,” not “R1 readers tolerate R2.”
- The reference validator becomes version-aware as a named, scoped rewrite, not a config switch: (a) parameterize SchemaFiles, COMMON_ID, and RESOURCE_TYPES (schemas.rs) by declared major version (an R1 set and an R2 set, each with its own common.schema.json $id); (b) build a version-keyed Registry and dispatch in Schemas::build; (c) the validator reads the declared Bundle-level omirVersion at PARSE time, before any structural / reference / version check, and selects the version-keyed registry; E110/E120/E200/E300/E301 then all run against the selected set. (Selecting “before E300” is insufficient: today structural E110/E120 run first (lib.rs:31-132) and E300 last (lib.rs:142), so an R2 bundle would be double-reported as wrong-version AND malformed-against-R1.) An unknown newer major short-circuits to a single E3xx at the envelope and runs NO resource-schema checks; (d) the embedded-schema include_str! mechanism carries both releases’ schemas. DoD: P1 is done only when the same binary validates a published R1 example AND an R2 example, and reports an R2-declared bundle as a single envelope-level E300 under the R1 set.
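The parse-time dispatch in (c) can be sketched in a few lines. The reference validator is written in Rust; the Python below is only an illustration of the control flow, with hypothetical names (`select_schema_set`, `validate`, the registry dict) and with "E300" standing in for whichever E3xx code the envelope finding ends up using.

```python
def select_schema_set(bundle, registry):
    """Pick the schema set for the declared major, or None if unsupported."""
    return registry.get(bundle.get("omirVersion"))

def validate(bundle, registry, run_checks):
    """Select the schema set first, then run every other check against it.

    An unsupported version yields exactly one envelope-level finding and
    runs no resource-schema checks, so nothing is double-reported.
    """
    schema_set = select_schema_set(bundle, registry)
    if schema_set is None:
        return ["E300"]
    return run_checks(bundle, schema_set)

# Hypothetical registries: an R1-only build and a build carrying both releases.
r1_only = {"R1": "r1-schema-set"}
both = {"R1": "r1-schema-set", "R2": "r2-schema-set"}

findings = validate({"omirVersion": "R2"}, r1_only, lambda b, s: [])
```

Under the R1-only registry the R2-declared bundle produces a single finding, matching the definition of done above; under the combined registry the same bundle would reach the R2 checks instead.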
The phases
Phase 1 — Vocabulary (A). Split into 1a (the cheap adoption surface) and 1b (closed-vocabulary enforcement); 1b does NOT gate 1a and is not on the second-implementer critical path.
Phase 1a, in two decoupled moves so namespace-opening costs nothing on day one. (1) Open the
namespace first, additively. Publish the seed code systems and the vocabulary registry under
https://omir.io/spec/R2/vocab/<field> as an R1.x-additive documentation + JSON-LD artifact
(no schema type change), so other domains can NAME their own concepts immediately with no
bundle becoming R2. This is the real payoff of A-first under the directive: opening the
namespace, not paying a major-version envelope break before any consumer benefits. (2) Widen the
union when a coded value is demanded. Add a CodeableConcept def ({ system, code, text })
to common.schema.json (purely-additive per the §5.2 note) and widen kind, experienceType,
Entity.labels, Episode.source, and Relationship.relationType (the spec’s one already-open
vocabulary — CONTRIBUTING §3.10:137-140, GOVERNANCE §5.1:282 — and the single largest real-world
fragmentation vector; widening it is the least breaking because its bare branch is already
type:string). Widen each closed enum’s string branch to the EXISTING enum, not to type:string:
anyOf: [ <the R1 enum>, CodeableConcept ] (for relationType, anyOf: [ {type:string}, CodeableConcept ]). This keeps the legacy closed set hard-validated at Core (a bad kind
like "banana" still fails E120). Make the seed/object mapping STRUCTURAL, not prose: in the
R2 schema, constrain the CodeableConcept branch of each widened field so its system MUST NOT
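be that field’s own seed-system URL (not:{required:[system], properties:{system:{const:<seedURL>}}}; the required clause keeps a system-less, text-only concept valid, since properties alone also matches an object with no system), so a seed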
value is expressible only as the bare string and the union is genuinely bijective at the Core
schema layer — no Phase-1b dependency, no honor-system. Move (2) ships as the first R2 change;
move (1) does not. Adding or retiring a seed value is governed per the operation, below.
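The acceptance rule for a widened field can be paraphrased in Python. This is a sketch of the intended behavior, not the R2 schema itself: the enum set is a placeholder containing only the default value named in this plan, the seed URL follows the https://omir.io/spec/R2/vocab/<field> pattern, the robotics URL is invented, and a concept with no system is assumed to remain valid.

```python
KIND_SEED_URL = "https://omir.io/spec/R2/vocab/kind"
R1_KIND_ENUM = {"memory"}  # placeholder subset; the real R1 enum has more members

def valid_widened(value, r1_enum, seed_url):
    """Accept a legacy enum string, or a CodeableConcept outside the seed system."""
    if isinstance(value, str):
        # Bare branch: still the closed R1 enum, so a bad string fails.
        return value in r1_enum
    if isinstance(value, dict):
        if set(value) - {"system", "code", "text"}:
            return False
        # Object branch: a seed value may not be re-expressed as a coding.
        return value.get("system") != seed_url
    return False

examples = {
    "legacy": valid_widened("memory", R1_KIND_ENUM, KIND_SEED_URL),
    "bad string": valid_widened("banana", R1_KIND_ENUM, KIND_SEED_URL),
    "seed as object": valid_widened(
        {"system": KIND_SEED_URL, "code": "memory"}, R1_KIND_ENUM, KIND_SEED_URL),
    "domain coding": valid_widened(
        {"system": "https://example.org/vocab/robotics", "code": "object_grasped"},
        R1_KIND_ENUM, KIND_SEED_URL),
}
```

Because each seed value has exactly one accepted spelling, two producers cannot disagree on how to write it, which is what makes the union bijective at the schema layer.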
Seed-registry governance, split by direction (so additive growth stays cheap):
- ADDING a seed value to an existing seed system is §5.1-additive open-vocabulary growth (it matches GOVERNANCE §5.1’s existing “new open-vocabulary values” carve-out): an ordinary PR under Maintainer review, NOT a full RFC. Routing every new concept through a 14-day ballot would make A heavier than today’s open-vocab path — the opposite of cheap adoption.
- RETIRING, narrowing, or re-meaning a seed value is enum-narrowing: §3.1 RFC + ballot, and (see Phase 3) a canonicalization-version bump under §6 so historical signatures are verified under the frozen seed table of their own version, never retroactively flipped.
Normalization, stated precisely and per-field cardinality:
- A bare seed string s and the seeded coding {system:<seed URL>, code:s} are semantically equivalent for matching/comparison; the bare-string seed form is the canonical at-rest form. R2 introduces a NEW consumer rule (not a restatement of the existing extensions-scoped SHOULD): an R2 consumer MUST NOT canonicalize between bare-string and object form for seed-coded fields on re-export (lossless pass-through). This is a consumer-side tightening listed in the R2 migration note; equality is a consumer obligation, not a structural check (the core validates union shape only, with the seed-system exclusion above making the shape itself bijective).
- For fields carrying a schema default (kind, tier): do NOT mutate the R1 default in place (MemoryRecord.kind:18 default:"memory", tier:31 default:"working" live on an OMM-1 type and are read by default-applying codegen/normalizers). The R2 schemas keep default; the canonical at-rest form is omission when a field equals its default (not the explicit value). Phase 3’s canonicalizer normalizes a defaulted field to its omitted form before building the signed map M, so absent and explicit-default sign identically. Removing/relocating a default would change the value seen by default-applying consumers, so this is a documented R2 migration item under “behavior of absent fields” for both kind and tier (“an absent kind MUST be interpreted as memory; an absent tier as working”), not a silent edit.
- For scalar optional fields with no default (experienceType, Episode.source): absent stays absent and is distinct from {text:s}.
- For the array field Entity.labels: the legacy "other" + label-extension idiom is DEPRECATED under §6 in favor of a CodeableConcept item. Update context.jsonld labels/term mappings accordingly.
- A non-seed coded value MUST use the object form; a seed value MUST use the bare string. This is now enforced structurally (above), so Phase-3 attestation canonicalization collapses the union to one byte sequence.
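The default-handling rule (omit a field that equals its default before signing, interpret an absent field as its default when reading) can be sketched as follows. The helper names are hypothetical and the record is reduced to a flat dict; the Phase 3 canonicalizer itself is not specified here.

```python
DEFAULTS = {"kind": "memory", "tier": "working"}  # defaults named in the text

def canonical_form(record):
    """Drop any defaulted field whose value equals its schema default."""
    return {
        key: value
        for key, value in record.items()
        if not (key in DEFAULTS and value == DEFAULTS[key])
    }

def effective(record, field):
    """Read a defaulted field, treating absence as the documented default."""
    return record.get(field, DEFAULTS[field])

explicit = {"id": "m1", "kind": "memory", "tier": "working"}
implicit = {"id": "m1"}
archived = {"id": "m2", "kind": "memory", "tier": "archive"}  # 'archive' is illustrative
```

Both `explicit` and `implicit` reduce to the same canonical dict, so a signature computed over the canonical form would not depend on which spelling the producer chose, while a non-default tier is preserved.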
Companion item (atomic with the widening, NOT deferred to 1b): re-key every behavioral MUST that
references a widened enum value. profiles.md:88-90 makes prospective-memory filtering a
behavioral MUST keyed on the literal experienceType:"intention". The moment 1a admits the object
form, an unmodified profile MUST is silent on experienceType:{system:<seed>, code:"intention"} —
a safety-relevant recall gap. The normative re-key (“any experienceType whose canonical form
is the seeded intention coding MUST be filtered from ordinary recall”) lands in the SAME RFC as
the experienceType widening, with a 1a-DoD fixture proving an object-form intention is still
filtered. Only the machine-readable binding engine stays in 1b.
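A re-keyed filter that recognizes both spellings might look like the Python sketch below. The seed URL is derived from the https://omir.io/spec/R2/vocab/<field> pattern and is an assumption, as are the function names and the second experienceType value used in the sample data.

```python
EXPERIENCE_SEED_URL = "https://omir.io/spec/R2/vocab/experienceType"  # assumed URL

def is_intention(experience_type):
    """True for the bare string and for the seeded 'intention' coding."""
    if experience_type == "intention":
        return True
    return (
        isinstance(experience_type, dict)
        and experience_type.get("system") == EXPERIENCE_SEED_URL
        and experience_type.get("code") == "intention"
    )

def ordinary_recall(records):
    """Filter prospective-memory records out of ordinary recall."""
    return [r for r in records if not is_intention(r.get("experienceType"))]

records = [
    {"id": "a", "experienceType": "intention"},
    {"id": "b", "experienceType": {"system": EXPERIENCE_SEED_URL, "code": "intention"}},
    {"id": "c", "experienceType": "observation"},  # illustrative value
    {"id": "d"},
]
kept = [r["id"] for r in ordinary_recall(records)]
```

A fixture along these lines, with one bare-string and one object-form intention, is the kind of evidence the 1a definition of done asks for.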
JSON-LD @context audit (a DoD blocker, not an afterthought): widening a field that shares an
overloaded @context term silently corrupts the RDF lift of existing bundles. In R1, one term
source is bound to a single bare omir:source with no per-field @type (context.jsonld:33), yet
four fields use it: Episode.source (being widened), Bundle.source, Meta.source,
Provenance.source (all plain strings). A coded Episode.source would lift a {system,code,text}
node onto a predicate that elsewhere carries a literal, and system/code/text (unmapped) would
fall to accidental @vocab predicates. Before widening any field, split every overloaded term in
the R2 context.jsonld: give the widened Episode.source its own term (episodeSource → omir:episodeSource) typed for an embedded coding, add CodeableConcept’s system/code/text
with explicit typing, and keep the three string source fields on a separate literal predicate.
DoD: a Turtle/N-Quads golden-file test that the R1 minimal bundle and an R2 coded bundle both
lift to the intended triples.
Honest status: this is a type change to existing fields, two of which live on MemoryRecord (now OMM-1 under P0). A bare-string producer stays valid; an object value is rejected by an R1 consumer; an R2 bundle is opaque to an unmodified R1 consumer at the envelope (per P1). Domain vocabularies (robotics, healthcare) live under their own non-omir.io URLs; the omir.io namespace is reserved for WG-published seed systems only. Target: R2 · OMM-1. A-first is about opening the namespace (move 1, day-one, additive) so other domains can name their own concepts; the union widening (move 2) is held to R2 and is the cheapest additive spec change — but milestone zero (below) is the literally cheapest adoption.
Phase 1b — closed-vocabulary enforcement (separate work item, off the 1a critical path).
Extend profiles.md with a machine-readable terminology-binding artifact (a profile schema,
not prose) binding a CodeableConcept field to a code system/value set with a binding strength
(required/extensible/preferred), mirroring FHIR. Implement profile-constraint checking in
the reference validator — today there is no code path (lib.rs:144 is a bare comment): this is a
net-new subsystem (profile loader, meta.profile dispatch, value-set membership check, new
finding codes: required → error, extensible/preferred → warning), with passing and failing
examples. Re-issue the three R1 reference profiles, splitting by kind:
- Genuinely REQUIRED enums (omir-coding-agent.kind) → required-strength bindings. The required-strength binding requires the field to be PRESENT: an absent kind relying on the Phase-1a normative default does not satisfy the profile (a profile narrows the base; presence is the thing it guarantees). Cross-referenced from the 1a “default” bullet.
- SHOULD value fields (omir-coding-agent provenance.sourceType (profiles.md:41), Entity.labels) → preferred/extensible bindings reported as warnings, never errors. Binding strength MUST match the original RFC-2119 keyword; promoting a SHOULD to an enforced required binding is a new-release narrowing, not a re-expression.
- Behavioral MUSTs are NOT terminology bindings and stay as prose RFC-2119 requirements. Re-issuing a profile MUST preserve every behavioral MUST verbatim (the intention-filtering re-key already landed in 1a); terminology bindings constrain values, not consumer behavior.
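The strength-to-severity mapping and the presence rule for required bindings can be sketched as follows. This is a minimal illustration, not the reference validator: the `Binding` class, the `check_binding` function, the flat (non-nested) field lookup, and any value-set contents are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Finding severity per binding strength, as described above.
SEVERITY = {"required": "error", "extensible": "warning", "preferred": "warning"}

@dataclass(frozen=True)
class Binding:
    field: str            # top-level field name (nested paths omitted in this sketch)
    strength: str         # "required" | "extensible" | "preferred"
    value_set: frozenset  # allowed codes (hypothetical contents)

def check_binding(binding: Binding, resource: dict) -> Optional[Tuple[str, str]]:
    """Return (severity, message) for a violation, or None if the binding holds."""
    value = resource.get(binding.field)
    if value is None:
        # A required binding guarantees presence; a schema default does not satisfy it.
        if binding.strength == "required":
            return ("error", f"{binding.field}: required binding needs the field present")
        return None
    # Accept either a bare string or a {system, code, text} object.
    code = value["code"] if isinstance(value, dict) else value
    if code in binding.value_set:
        return None
    return (SEVERITY[binding.strength], f"{binding.field}: {code!r} not in value set")
```

Under this shape an absent field fails only a required binding, and a value outside the set is an error for required but a warning for extensible or preferred, so a SHOULD is never promoted by accident.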
Phase 2 — Identity (B: Agent) and Resolution (C). B (Agent) is a prerequisite ONLY for E2
(agent-attribution hops); the headline trust deliverable E1 depends only on A and the existing
Provenance and ships without Agent — so Agent’s adoption gate never holds the trust pillar
hostage. B: introduce an Agent (Principal) resource and optional authoredBy / owner
references. C: add optional Entity.identifier[] ({ system, value }), aliases[], and a
sameAs link so two stores can declare cross-system identity and merge.
Honest status, partitioned by whether a field touches the Reference pattern:
- §5.1-additive (R1.x-eligible): Entity.identifier[] and aliases[] (a new non-ref def {system,value} and a string array; no Reference widening).
- §5.2/§3.1-breaking (R2/RFC-gated): any field that can reference Agent (authoredBy, owner, and sameAs insofar as it permits Agent targets), because each forces the common.schema.json#/$defs/Reference.pattern widening. These are NOT R1.x-additive. Adding Agent does not invalidate any existing R1 bundle, but the conformance/bundle prose enumerating “the four core resources” must change in the same PR. Required, atomic work items:
  - (a1) author schemas/Agent.schema.json; (a2) add its $ref to Bundle.entry.items.oneOf (a oneOf of resource-schema $refs, not a regex); (a3) widen the common.schema.json#/$defs/Reference.pattern regex (today ^(MemoryRecord|Entity|Relationship|Episode)/..., common.schema.json:18) to include Agent (the single load-bearing pattern change).
  - (a4) extend the validator’s SchemaFiles struct + from_dir/embedded loaders + by_type map (schemas.rs is a hard-coded 6-file / 4-type shape that will not pick up a new type automatically); update RESOURCE_TYPES (item b) and both “four core resources” message sites: the validator E101 string (lib.rs:124) and validator/README.md:93.
  - (b) add Agent to RESOURCE_TYPES.
  - (c) replace the hand-coded per-field reference walk with a registry-driven walker. The mandate is not “resolve every {ref:"Type/id"} anywhere.” It resolves (i) every Reference-object {ref:...} at a schema-declared Reference-typed field, NEVER inside extension[].valueJson or any free-form/additionalProperties map (a vendor may legitimately put a {"ref":...}-shaped object in valueJson); and (ii) every closed-world bare-id field enumerated in CR-5 (currently MemoryRecord.parentId, special-cased at lib.rs:219-230). Drive the walk from an explicit registry of (field-path → target-type-SET, ref-shape) entries (a set, because some R2 slots are poly-typed; see (c′)). Seed the registry with the FULL existing closed-world set as ground truth from lib.rs:217-239 and CR-5: entityRefs → {Entity} (on MemoryRecord and Episode), Relationship.from → {Entity}, Relationship.to → {Entity}, Relationship.sourceEpisode → {Episode} (silently dropped in earlier drafts; its E201 expected-type=Episode check must survive the refactor), MemoryRecord.parentId → {MemoryRecord} (bare-id). Adding authoredBy/owner/sameAs/provenance.chain[] then extends the registry rather than relying on shape-sniffing, and the per-field target type is preserved (E201 must still reject, e.g., Agent/x in Relationship.from).
  - (c′) Change E201’s check from ref_type == expected to expected_set.contains(ref_type) (lib.rs:337), since Phase 3’s provenance.chain[].from is poly-typed (any in-Bundle resource) while chain[].agent is mono-typed to {Agent}. The current single-type tuple cannot model this; the set form can.
  - (d) amend Conformance CR-5 by appending the new reference-bearing fields to the existing closed-world list (never re-deriving it from the Reference type); state that the closed-world set is exactly the schema’s Reference-typed fields plus the enumerated bare-id fields, and record the per-field target-type table in the migration note.
  - (e) register Agent in the JSON-LD @context; (f) ship a migration note.
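Work items (c) and (c′) can be illustrated with a small registry-driven walker. This is a sketch in Python rather than the validator's Rust, and several details are assumptions made for the example: the `REGISTRY` layout, the shape of `entityRefs` as a list of `{ref}` objects, the `check_refs` name, and the `UNRESOLVED` label (only E201 is a finding code named in this document).

```python
# Hypothetical registry: (resource type, field) -> (allowed target types, ref shape).
REGISTRY = {
    ("MemoryRecord", "entityRefs"): ({"Entity"}, "ref-list"),
    ("Episode", "entityRefs"): ({"Entity"}, "ref-list"),
    ("Relationship", "from"): ({"Entity"}, "ref"),
    ("Relationship", "to"): ({"Entity"}, "ref"),
    ("Relationship", "sourceEpisode"): ({"Episode"}, "ref"),
    ("MemoryRecord", "parentId"): ({"MemoryRecord"}, "bare-id"),
}

def check_refs(resources):
    """Walk only registered fields; extension[].valueJson is never inspected."""
    index = {(r["resourceType"], r["id"]) for r in resources}
    findings = []
    for res in resources:
        for (rtype, field), (targets, shape) in REGISTRY.items():
            if res["resourceType"] != rtype or field not in res:
                continue
            values = res[field] if shape == "ref-list" else [res[field]]
            for value in values:
                if shape == "bare-id":
                    # Bare ids are mono-typed, so the single target is the type.
                    (ref_type,) = targets
                    ref_id = value
                else:
                    ref_type, _, ref_id = value["ref"].partition("/")
                ref = f"{ref_type}/{ref_id}"
                if ref_type not in targets:            # set membership, per (c')
                    findings.append(("E201", res["id"], field, ref))
                elif (ref_type, ref_id) not in index:  # closed-world resolution
                    findings.append(("UNRESOLVED", res["id"], field, ref))
    return findings
```

Because the walk is keyed on registry entries rather than on the shape of a value, a `{"ref": ...}` object inside a vendor's `valueJson` is ignored, and adding a poly-typed slot is a one-line registry entry with a larger target set.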
For Phase 5’s federation form, do NOT relax the shared Reference pattern: define the
external-reference form as a distinct $def (ExternalReference, its own property, not ref)
so the Core closed-world pattern is untouched. Target: R2 · B (Agent): OMM-0 / C: OMM-0 —
C is graded OMM-0 “mechanism present, cross-system merge unverifiable single-party”, not OMM-1:
identifier[]/aliases/sameAs are inert until either a second store exists or federation (I)
lands. C’s identifier fields and I’s federation mechanism are a matched pair whose combined
value requires the second implementer; neither is claimed as delivered interop value at single-party.
Phase 3 — Provenance & trust: W3C PROV (E), the pillar. Re-shape Provenance into a
PROV-aligned derivation chain with optional, redaction-aware attestation. Detailed below. E splits:
E1 (chain over prior resources + per-hop credibility) depends only on A and the existing
Provenance; E2 (agent-attribution hops referencing Agent) depends on B. The critical
path is A → E1-chain — pure additive optional structure, immediately reviewable by a second
party. The heavyweight attestation subsystem (det-CBOR / M / inclusion allow-list / redaction
commitments / key resolution) is a PARALLEL track explicitly OFF the critical path, because
cross-verification is its only meaningful test and is worthless with one implementer: it cannot
reach OMM-1 or interop until the P2 second-verifier milestone is met. This honors “E emphasized” as
the trust design pillar without front-loading a multi-quarter cryptographic interop project
ahead of any second implementer. The E1/E2 discriminator is normative: a record is E1 only
if its chain contains no agent operand and no agent-naming attestation; the moment any hop
carries agent or the attestation’s signed subset includes an Agent reference, it is E2
and capped at the Agent OMM floor. E2 MUST NOT advance past OMM-0 until Agent is at least
OMM-1 with Phase-2 Reference support. Target: R2 · E1-chain: OMM-1 / attestation & E2: OMM-0
(capped at the Agent floor; non-interoperable until P2).
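The E1/E2 discriminator is simple enough to state as a predicate. The sketch below is an illustration under one simplifying assumption: "the attestation's signed subset includes an Agent reference" is approximated by the presence of an `agent` key on the attestation object. The `classify` name is invented for this example.

```python
def classify(provenance: dict) -> str:
    """Classify a chained provenance block as "E1" or "E2"."""
    chain = provenance.get("chain", [])
    attestation = provenance.get("attestation")
    # Any hop carrying an agent operand makes the record E2.
    hop_names_agent = any("agent" in hop for hop in chain)
    # Approximation: an attestation with an agent field names an Agent.
    attestation_names_agent = attestation is not None and "agent" in attestation
    return "E2" if (hop_names_agent or attestation_names_agent) else "E1"
```

A chain made only of wasDerivedFrom and wasGeneratedBy hops with no attestation classifies as E1 and stays on the critical path; adding a single wasAttributedTo hop moves the whole record to E2.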
Phase 4 — Sensitivity & modality (F, D). F: an optional governance block (classification,
legal basis, retention / deleteAfter, redaction markers, consent reference) — plus a typed
redactionCommitments[] array ({ fieldPath, commitment, salt }) so per-field signed
commitments have a declared schema home under additionalProperties:false (without it there is no
legal place to store a commitment for an arbitrary redactable field). The redaction mechanism is
co-designed with E’s attestation envelope (Phase 3). On introducing the core governance block,
deprecate the WG-published, omir.io-namespaced, profile-REQUIRED pii-class extension
(https://omir.io/spec/R1/extension/pii-class, profiles.md:91-93) under §6 (deprecated R2, earliest
removal R3): because it is required by omir-personal-assistant, during the R2 window that
profile MUST accept EITHER the deprecated extension OR the core classification field (so no
existing producer is instantly non-conformant), and the R2 migration note MUST carry an explicit
field-by-field value-shape mapping (pii-class → core classification). D: a multi-part
content model (contentType + inline-or-MediaReference) and an algorithm-neutral Embedding
representation { space, model, dims, dtype, vector?|ref? } where space is an opaque equality
key (vectors comparable iff space strings are byte-equal; no cross-space comparability
claim), dims MUST equal the vector length, and dtype pins the byte form. On-the-wire: a
JSON decimal array in .omir, a typed CBOR array in .omirb. Because .omir decimal and
.omirb dtype bytes are not byte-reconstructable from each other, embedding vectors are EXCLUDED
from signed images by default; if ever committed, the commitment is over a single dtype-independent
canonical-decimal (shortest-round-trip) form, never the dtype byte form (see Phase 3). Honest
status: F and D are new optional fields; absent the Agent/ref dependency they qualify for
R1.x-additive (footnote ¹). Target: R2 · OMM-0 (D grades OMM-0 until two vendors demonstrate a
cross-store similarity round-trip).
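The two Embedding rules that a consumer can check mechanically (dims equals the vector length, and comparability only under byte-equal space strings) might look like this. It is a sketch over plain dicts; the function names are invented for the example, and the cosine helper is included only to show where the comparability guard sits.

```python
import math

def check_embedding(emb: dict) -> list:
    """Return a list of problems for an inline-vector embedding (empty if fine)."""
    problems = []
    vector = emb.get("vector")
    if vector is not None and emb["dims"] != len(vector):
        problems.append(f"dims={emb['dims']} but vector has {len(vector)} elements")
    return problems

def comparable(a: dict, b: dict) -> bool:
    """space is an opaque equality key: compare bytes, with no normalization."""
    return a["space"].encode("utf-8") == b["space"].encode("utf-8")

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity, refused across spaces."""
    if not comparable(a, b):
        raise ValueError("embeddings are in different spaces; no comparability claim")
    x, y = a["vector"], b["vector"]
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))
```

Comparing the encoded bytes rather than normalized text means two space strings that merely look alike (for example, a precomposed versus a decomposed accented character) are treated as different spaces, which matches the "opaque equality key" wording.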
Phase 5 — Refinements: fix G, J, and K (and fold in H, I).
- G (uncertainty): keep Beta as default; add a general uncertainty value (credible interval, or named distribution + params) and optionally separate epistemic from aleatoric. New optional fields, R1.x-additive-eligible.
- J (typed attributes): do not replace the string maps. Add an additive optional sibling channel, Entity.typedAttributes[] / Episode.typedMetadata[], each { key, value } where value is a shared TypedValue def in common.schema.json (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference; the def-add is purely additive), and mark attributes/metadata DEPRECATED under §6 (deprecated R2, earliest removal R3), updating context.jsonld. Adding the typed channel AND marking the maps deprecated are both §5.1-additive and R1.x-eligible (per §6.4 validators only warn on deprecated-but-present items); only the eventual R3 REMOVAL of the maps is the breaking boundary. CodeableConcept (from A) is one member of the union; keep the dependency A → J, and J’s typed channel caps at A’s grade. TypedValue is NOT reused in the Extension value slots: constraining them to a closed union would remove arbitrary valueJson (a §5.2 break violating the typed-extension escape-hatch invariant). Extension.valueJson stays maximally permissive; if typed extension values are later wanted, add an optional valueTyped ($ref TypedValue) alongside the existing value* fields (purely additive), and never retro-type valueJson.
- K (i18n): BCP-47 language tags on textual content and entity names, with optional multilingual variants. New optional fields, R1.x-additive-eligible.
- Fold in H (full bitemporality + imprecise/interval time; new optional fields) and I (federation). I requires a mechanism before a target: introduce the syntactically distinct ExternalReference $def (its own property, an absolute/identifier-based locator built on Theme C’s identifiers, never the bare ResourceType/id and never a relaxed Core pattern); define a third conformance level “Federated” in Conformance.md (via RFC) that relaxes CR-5 only for explicitly-external references; closed-world remains the Core default and the R1/R2 invariant is preserved for anything not claiming Federated. I is blocked by (1) the widened Reference (Phase 2 B-work), (2) the new ExternalReference $def, and (3) the CR-5 carve-out RFC. C’s identifiers are an input to how external references are expressed, not the gate. Target: R2+ · OMM-0.
Phase 3 in depth — aligning Provenance with W3C PROV
W3C PROV models the world with Entity (a thing), Activity (something that acts over
time), and Agent (who/what is responsible), related by a type-constrained verb set whose
relations take incompatible operands: wasAttributedTo is Entity→Agent; wasDerivedFrom is
Entity→Entity; wasGeneratedBy is Entity→Activity (generated entity is the subject); used is
Activity→Entity and wasInformedBy is Activity→Activity — both Activity-subject. OMIR’s flat
{ source, sourceType, credibility, externalId } is the common case; the proposal lets it
expand into a chain when trust matters. The chain’s implicit subject is the resource carrying
the provenance block, lifting to prov:Entity. Each step is a discriminated union keyed on
role that pins its legal operands. To avoid a fatal @context term collision, the chain does
NOT reuse two globally-bound terms: from (the Relationship operand, context.jsonld:158) and
credibility (context.jsonld:91). It uses the distinct field names derivedFrom and hopCredibility
instead, because JSON-LD term definitions are document-global, not subtree-scoped: the same term
cannot mean two things under one merged context.
"provenance": {
"source": "design-review",
"credibility": 0.92,
"aggregateCredibility": { "value": 0.75, "model": "product" },
"chain": [
{ "role": "wasAttributedTo", "agent": { "ref": "Agent/agent-a" }, "at": "2026-05-30T11:42:00Z", "hopCredibility": 0.95 },
{ "role": "wasDerivedFrom", "derivedFrom": { "ref": "Episode/ep-launch-chat" }, "at": "2026-05-30T11:42:03Z", "hopCredibility": 0.90 },
{ "role": "wasGeneratedBy", "activity": { "id": "act-consolidation", "label": "consolidation", "at": "2026-05-31T03:00:00Z" }, "hopCredibility": 0.88 }
],
"attestation": { "alg": "ed25519", "keyId": "did:web:example.org#k1", "key": "<inline JWK>", "agent": "Agent/agent-a", "at": "2026-05-31T03:00:01Z", "signature": "base64url…" }
}
- Role/operand validation is normative and enforced. With the bearing resource as subject, the Entity-subject verbs are well-formed and enumerated: wasAttributedTo → MUST carry agent, MUST NOT carry derivedFrom; wasDerivedFrom → MUST carry derivedFrom (a prior resource), MUST NOT carry agent; wasGeneratedBy → MUST carry an activity operand, MUST NOT carry agent. A validator rule rejects role/operand mismatches. The Activity-subject verbs used, wasAssociatedWith, and wasInformedBy are ALL reserved (none can take the bearing prov:Entity as subject) until a first-class Activity referent exists. (wasInformedBy is Activity→Activity, the same subject-type problem as the others; it was previously and contradictorily permitted with an inline activity. That permission is dropped, leaving wasGeneratedBy as the only Activity-touching verb in R2.)
- derivedFrom is poly-typed; agent is mono-typed. wasDerivedFrom.derivedFrom may resolve to any in-Bundle resource type (Episode, MemoryRecord, Entity, Relationship, Agent), so its registry entry carries the full resource-type set and role/operand validation is what narrows it per hop; wasAttributedTo.agent is mono-typed to {Agent}. The widened E201 set-membership check (Phase 2 c′) covers both.
- Activity is an inline shape, not a literal and not a top-level resource. R2 carries Activity within the hop as activity: { id, label?, at? } (no new Bundle.entry/oneOf type, so the “five core resources” prose is untouched). The PROV-O lift mints a blank-node prov:Activity from it. Activity operands carry no ref and are EXEMPT from closed-world resolution; the walker keys strictly on ref/derivedFrom/agent and skips them. Promoting Activity to a first-class resource is deferred.
- Closed-world applies to the resource-typed chain refs. chain[].agent and chain[].derivedFrom are closed-world references in R2 and MUST resolve within the Bundle; the registry-driven walker covers them. Chain-ref integrity is a DOCUMENT property: the validator walks it unconditionally. A producer that cannot include an upstream Agent/Episode MUST inline a minimal resource, satisfy the hop with a placeholder tombstone of the right type, or omit the hop, and never emit a dangling ref. A placeholder tombstone Agent MUST carry an explicit placeholder:true marker. Cross-Bundle provenance is deferred to Federated, never permitted at Core; until I lands, attested records carry their provenance closure.
- Per-hop credibility is the only normative trust number; any roll-up is optional and non-normative. chain[].hopCredibility is a UnitInterval, added to the CR-7 [0,1] enumeration and bound to common.schema.json#/$defs/UnitInterval. The legacy provenance.credibility keeps its R1 meaning; it is not silently redefined as a derived product. Any roll-up lives in a new optional provenance.aggregateCredibility { value: UnitInterval, model: "product"|"min"|… }. product is a series-reliability / weakest-chain heuristic, NOT an independent-evidence probability; the “independent-evidence” label is dropped as false. A SHOULD-level check warns when aggregateCredibility.value is inconsistent with model over the present hops (rounding-tolerant).
- Attestation: explicitly an OMM-0, off-critical-path, parallel track until two independent verifiers demonstrate cross-encoding verification AND the A-normalization and J-migration canonicalization versions are pinned. The validator cannot enforce any of this today (Outcome is a closed {Pass, Fail} enum, report.rs:64-69; Check is {Structural, ReferenceIntegrity, VersionPresence, Profile}, report.rs:45-52; lib.rs has no crypto, no CBOR, and reads only parsed .omir JSON). The attestation track therefore includes an honest validator work item paralleling Phase 1b’s “net-new subsystem”: (a) add an .omirb/CBOR reader with a declared sub-profile tag byte; (b) add a Check::Attestation variant and a per-attestation finding triplet verified/tampered/unverifiable that is SEPARATE from the document-level Pass/Fail (an unverifiable attestation MUST NOT flip core_conformant to false; it is orthogonal to schema conformance); (c) scope the ed25519/key code as OMM-0 and OUT of the Core conformance path, so a Core-R2 consumer that does not verify is still conformant. Until (a)-(c) land, the bincode-under-attestation rule is documentation only and attestation is OMM-0/non-interoperable; no validator-enforced MUST is claimed. The mechanism, when built:
  - Signing input is a canonical typed map M defined by a closed, version-tagged INCLUSION allow-list, enumerated per signable resource type. For MemoryRecord, M includes id, resourceType, a content-commitment, the ordered chain (roles, operand ref/inline-activity-id, per-hop hopCredibility), the asserting Agent ref + keyId + the inline signing key + signing-time at, and classification once Phase 4 lands; M excludes mutable/operational fields (decay.*, meta.lastUpdated, version). Entity/Relationship subsets, if signable, are enumerated separately (Entity excludes mentionCount, salience, lastSeenAt; Relationship excludes strength, validAt, invalidatedAt).
  - Scope. An attestation signs the bearing resource’s enumerated subset plus a hash-commitment of each referenced resource’s id+resourceType (not their mutable bodies).
  - One canonical byte form. The signature is over SHA-256(det-CBOR(M)) (RFC 8949 §4.2). A JSON (.omir) signer/verifier MUST construct the identical typed map M and the identical det-CBOR(M) bytes; JCS is at most an aid to building M, never an alternate signed form. No field whose .omir and .omirb representations are not provably byte-reconstructable from each other may enter a signed commitment (dtype-pinned binary byte forms MUST NOT); this clause is also written into encodings.md so encoding-neutrality (Principle 5) and one-canonical-form hold simultaneously.
  - Number normalization is a single pre-CBOR step applying to ALL signed numeric fields (not only UnitInterval/score): every JSON number in M maps to its shortest round-tripping decimal, so 9, 9.0, 9.00 collapse (the shipped corpus already drifts: {alpha:9,beta:1} in encodings.md:36 vs {alpha:9.0,beta:1.0} in minimal-bundle.omir:82) and CBOR floats are forbidden for signed numeric fields. A KAT fixture proves {alpha:9} and {alpha:9.0} sign identically. Defaulted fields are normalized to their omitted form before M is built, so absent and explicit-default sign identically (Phase 1a). A KAT fixture proves two MemoryRecords differing only in presence/absence of a default-valued kind produce the identical digest.
  - CodeableConcept-union fields are pre-canonicalized into M (seed values → bare string, enforced bijective by the Phase-1a seed-system exclusion; non-seed → {system,code} with text dropped) before CBOR. The canonicalization-profile version is carried INSIDE M (not merely on the envelope), and the verifier selects canonicalization rules by the version recorded in M, never by its current profile. A seed retirement (Phase 1a) is a canonicalization-version bump; attestations are verified under their own version’s frozen, per-release-published seed table (§5.3), so seed evolution is non-retroactive to historical signatures.
  - Ordered chain, not a set. M commits to the ordered chain (a det-CBOR array of per-hop commitments, equivalently a Merkle root), so reorder/drop/duplicate is detected.
  - Key authority is self-contained: verification is decidable from the Bundle alone. Because OMIR is an at-rest format with a closed-world invariant, a verified result MUST be a pure function of the bundle bytes. M therefore inlines the full public key (JWK) used and MAY inline a short Agent-signed key-authorization assertion binding keyId at signing-time at; verification = signature valid over M + key-binding valid + at within validity. Out-of-band DID/HTTPS key resolution is demoted to an OPTIONAL Federated-level enhancement (it belongs with Theme I, which already gates open-world). A sameAs merge MUST NOT rewrite a signed Agent id. Lawful key rotation/revocation does not retroactively flip historical attestations.
  - Redaction is cryptographically honest, and the security claim is stated honestly. For each redactable field M contains a commitment = H(field-bytes ∥ per-field-salt) (stored in the Phase-4 redactionCommitments[]) and MUST NOT contain the plaintext. Redaction deletes only the plaintext and leaves the commitment, so the signature still verifies; absent plaintext + intact signature-covered commitment = “lawfully redacted, chain intact,” NOT tampered. Because the salt is retained in the at-rest document, it provides ZERO brute-force resistance for low-entropy fields; the honest property is unlinkability (commitments are non-correlatable across resources/documents), not brute-force resistance. For genuine brute-force resistance a producer MUST use a high-entropy per-field nonce that is itself deleted at redaction time (accepting that such fields can never be re-verified against plaintext) or explicit out-of-band salt escrow; the chosen model is recorded. The impossible “tombstone bearing the same hash” phrasing is dropped: the tombstone bears a redacted:true marker; the sibling commitment carries the hash.
  - Erasure dominates the trust pillar; the conflict is acknowledged, not legislated away. When a data subject’s erasure is legally compelled and the subject is the signing Agent, retaining id + key-commitment + an attributing signature can itself be unlawful personal data. So: Agent identity enters M via the erasable salted commitment (above), not a raw resolvable ref; erasing the Agent’s PII deletes the plaintext while the commitment + signature survive, and closed-world is preserved by a placeholder tombstone carrying that commitment. The earlier “MUST NOT sign over a placeholder Agent reference” is relaxed to “MUST NOT sign over a placeholder that carries no key-commitment”, i.e. sign over the commitment, not the resolvable id. When erasure nonetheless forces signature invalidation, the verifier reports the record as unverifiable-by-erasure (a defined sub-outcome), NOT tampered. The hard privacy invariant is not subordinated to the trust pillar.
  - Encoding cannot silently break a signature, and the rule lives in the SPEC, not only the validator. A producer emitting .omirb for a bundle containing any provenance.attestation MUST use the CBOR sub-profile, never bincode (bincode is not self-describing and cannot reconstruct M). This narrows an existing release-published allowance, so it is an R2 normative edit to encodings.md §“.omirb binary profile” (encodings.md:84-101) under §3.1/§5.2: amend the “bincode permitted as an internal sub-profile” sentence (encodings.md:90) to “…permitted EXCEPT when the bundle carries any provenance.attestation, in which case the CBOR sub-profile is REQUIRED.” The validator Check::Attestation then enforces a rule the spec actually states. An attestation whose canonical form cannot be reconstructed is reported unverifiable, a third outcome distinct from verified/tampered. A cross-verifier known-answer test vector (canonical det-CBOR bytes + digest over the worked example) ships as a DoD artifact, gated on the P2 second verifier.
- The PROV-O lift is a SEPARATE opt-in context that does NOT silently compose under @vocab. The base context.jsonld sets @vocab: https://omir.io/ns# (line 4) and resourceType → @type, so unmapped chain terms would otherwise mint accidental omir:role/omir:at predicates and the chain hops (which carry no resourceType) have nothing for @type to attach to. The https://omir.io/spec/R2/prov-context.jsonld therefore: (a) re-declares the provenance term, dropping the inherited @type:@id (context.jsonld:144) and declaring it an embedded node with typed interior chain/agent/derivedFrom/activity terms; (b) sets @vocab: null within the chain/attestation sub-context (or gives every chain/attestation property an explicit @id) so unmapped terms are DROPPED, not coerced to omir:; (c) attaches PROV to the bearing resource via explicit PROV relations whose subject is the omir:MemoryRecord node (the resource keeps its single omir: @type and gains PROV edges) rather than giving it a conflicting second @type of prov:Entity. The disjointness assertion is narrowed: “OMIR Entity-as-subject-matter is NOT prov:Entity; any OMIR resource appearing as a provenance derivation operand lifts to prov:Entity for the derivation graph” (publish that, not a blanket owl:disjointWith that would make a wasDerivedFrom Episode triple ill-formed or the graph inconsistent). A normative omir-role → prov: predicate table accompanies the chain. DoD: a worked RDF-output test vector proving two implementers produce identical triples under base-only and base+prov, with no spurious coercion artifacts.
- Redaction mechanics (Phase 4) are concrete. A redacted resource retains resourceType/id and all reference targets (graph stays valid per CR-5), sets content to a defined sentinel ("[redacted]" + redacted:true in the governance block) while its signed commitment sibling is preserved, and MUST NOT be removed while anything references it. Whole-resource deletion uses a distinct tombstone (a present, conformant resource of the right type). A redaction round-trip example ships with the phase.
- Vendor-neutral by construction. The worked example uses placeholder agents (Agent/agent-a) and a placeholder issuer. Any Veld-specific signing convention lives in a vendor extension under veld.dev, never in the normative core example.
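Two of the chain rules above are small enough to sketch directly: the role/operand table and the SHOULD-level roll-up consistency check. The code is illustrative, not the reference validator; the function names, the message strings, and the 0.005 tolerance (two-decimal rounding) are assumptions made for this example.

```python
import math

# role -> (required operand, operands that must not appear), per the rules above.
ROLE_OPERANDS = {
    "wasAttributedTo": ("agent", {"derivedFrom"}),
    "wasDerivedFrom": ("derivedFrom", {"agent"}),
    "wasGeneratedBy": ("activity", {"agent"}),
}
RESERVED_ROLES = {"used", "wasAssociatedWith", "wasInformedBy"}

def check_hop(hop: dict) -> list:
    """Return role/operand errors for one chain hop (empty list if well-formed)."""
    role = hop.get("role")
    if role in RESERVED_ROLES:
        return [f"{role}: reserved in R2"]
    if role not in ROLE_OPERANDS:
        return [f"{role}: unknown role"]
    required, forbidden = ROLE_OPERANDS[role]
    errors = []
    if required not in hop:
        errors.append(f"{role}: missing {required}")
    errors += [f"{role}: must not carry {op}" for op in sorted(forbidden) if op in hop]
    return errors

# Roll-up models this sketch knows how to recompute.
ROLLUPS = {"product": math.prod, "min": min}

def aggregate_consistent(provenance: dict, tol: float = 0.005) -> bool:
    """True if aggregateCredibility.value matches its model over the present hops."""
    agg = provenance.get("aggregateCredibility")
    hops = [h["hopCredibility"] for h in provenance.get("chain", []) if "hopCredibility" in h]
    if agg is None or not hops or agg["model"] not in ROLLUPS:
        return True  # nothing to check, or a model this sketch does not know
    return abs(ROLLUPS[agg["model"]](hops) - agg["value"]) <= tol
```

Applied to the worked example, the three hops (0.95, 0.90, 0.88) multiply to roughly 0.7524, which is within tolerance of the declared product value 0.75, while the same value declared under the min model would draw a warning.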
Overlook — the sequence (lens 1)
| Phase | Themes | Release · OMM | Breaking? | Unlocks |
|---|---|---|---|---|
| 1a / 1b | A vocabularies | R2 · OMM-1 | Union = type-change; R2, RFC-gated. Vocab registry (move 1) ships R1.x-additive; 1b binding-engine off the critical path | Every non-coding domain can name concepts |
| 2 | B Agent + C identity | R2 · OMM-0/0 | New resource + Reference widening; R2, RFC-gated, non-invalidating to R1 bundles. C inert single-party → OMM-0 | Agents to attribute to; cross-store merge (needs 2nd store) |
| 3 | E PROV (E1-chain on path; attestation parallel) | R2 · E1-chain OMM-1 / attestation OMM-0 | New optional fields (R2 line) | Verifiable trust across vendors (at P2) |
| 4 | F + D governance, modality | R1.x where additive¹ | New optional fields | Regulated & multimodal adoption |
| 5 | G J K + H, I | R1.x additive / R2+ structural¹ | I’s CR-5 carve-out → R2+; G/K/H/F/D and J’s typed-channel + deprecation markers additive → R1.x¹ | Structured, multilingual, federated memory |
¹ Genuinely §5.1-additive themes ship as R1.x increments as they land (K, G, H, F, D, the
non-Reference C fields only — identifier[]/aliases[], never authoredBy/owner/
Agent-sameAs — and J’s typed channel + deprecation markers, since §6.4 makes a deprecation
marker a warn-only additive change). The R2 boundary is reserved strictly for the actually
non-additive items: the CodeableConcept union (type change), the Agent resource + Reference
widening (§5.2/§3.1), the eventual R3 REMOVAL of J’s deprecated string maps, and I’s CR-5
carve-out. Holding a purely-additive optional field for a major release works against the
cheap-adoption invariant. Caveat from P1: “R1.x-additive” means “additive to the data model and
deprecation-policy-clean,” NOT “forward-readable by an already-shipped same-major closed-schema
validator” — closed schemas reject undeclared new core fields, so intra-major novelty rides
extension[] until a minor schema bump admits it.
Overlook — dependencies (lens 2)
A (vocabularies) ──► everything (domains can finally name their own concepts)
A ──► J (typed values reuse CodeableConcept via the shared TypedValue def; NOT via Extension)
A ──► E1-chain (chain over prior resources; needs no Agent)
B (Agent) ──► E2 (agent-attribution hops attribute memory to Agents) ──► (only) the agent case
B (Reference widening) ──► I (the ExternalReference $def extends, never relaxes, the widened Reference)
C (identifier/sameAs) ──► I (input to how external refs are expressed; not the gate)
I (Federated CR-5 carve-out) ──► attested cross-store E2 (a real Agent or an external ref)
D, F ──► E (close the attestation inclusion allow-list; F's classification enters M)
E (attestation, redaction-aware) ──► F (signed retention/consent; redaction preserves the commitment)
P2 (a second implementer / verifier) ──► OMM-2 anywhere; ──► attestation interop & E maturity at all
The critical path is A → E1-chain, a pure additive structure reviewable by a second party; the
attestation subsystem and E2 are a parallel track gated on P2 (cross-verification is their only
test). C runs in parallel after A per the directive, but its value (cross-system merge) and I’s
mechanism are a matched pair that also need P2. Maturity floor rule: a dependent feature’s OMM
is min(grade(its dependencies), grade(its own mechanism), gate(P2 where interop is the test)) —
while Agent is OMM-0, E2 and any agent-signing attestation are at most OMM-0; J caps at A’s grade;
I and C at single-party OMM-0 until P2. I’s true gate is the CR-5 carve-out RFC + the
ExternalReference $def, not C.
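The maturity floor rule is a plain minimum, which a few lines make concrete. This is a sketch under stated assumptions: OMM levels are modeled as integers, the P2 gate is modeled as a cap of 0 when a feature's only meaningful test is interop and P2 is unmet, and the `effective_omm` name is invented for the example.

```python
def effective_omm(own: int, deps=(), p2_gated: bool = False, p2_met: bool = False) -> int:
    """min(grade of dependencies, grade of own mechanism, P2 gate where it applies)."""
    # Assumption: an unmet P2 gate pins an interop-tested feature at OMM-0.
    gate = 0 if (p2_gated and not p2_met) else own
    return min(own, gate, *deps)
```

For example, with Agent at OMM-0 an E2 mechanism that would otherwise grade 1 evaluates to `effective_omm(1, deps=[0]) == 0`, and J with A at OMM-1 evaluates to `effective_omm(1, deps=[1]) == 1`.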
Breakers — adversarial stress-test (3 passes)
Pass 1 — semantic collisions & compatibility.
- PROV Entity vs OMIR Entity, the base context, and term collisions. Mitigation: distinct chain field names (derivedFrom/hopCredibility, never the globally-bound from/credibility); a separate opt-in PROV context that re-declares provenance, suppresses @vocab inside the subtree, attaches PROV via explicit relations on the omir: node (not a second @type), and publishes the narrowed disjointness (“subject-matter Entity ≠ prov:Entity; derivation operands DO lift to prov:Entity”). The published context + RDF golden-file vector is what makes the lift deterministic.
- CodeableConcept softens validation. Mitigation: the widened branch is the existing enum (not free type:string), and the seed/object mapping is made structurally bijective by excluding the field’s own seed-system URL from the object branch, so a seed value is expressible only as the bare string at the Core schema layer, with no Phase-1b/honor-system dependency. Phase 1b adds the machine-readable binding construct + net-new validator subsystem for closed-vocabulary enforcement.
- Vocabulary fragmentation, including the field that actually fragments. Mitigation: WG-published seed code systems (release-governed) + a public registry (add = additive PR; retire = RFC) + a mandatory text fallback. relationType is included in A (the one already-open, highest-fragmentation field) so the namespace discipline lands where collisions actually happen. Domain vocabularies live under non-omir.io URLs.
Pass 2 — trust & governance hard parts.
- Attestation brittle under re-serialization / across encodings. Mitigation: one canonical form SHA-256(det-CBOR(M)) over a versioned inclusion allow-list, JSON signers build identical M, numbers normalized to shortest-round-trip decimal pre-CBOR (floats forbidden), CodeableConcept folded bijectively, dtype-pinned binary forms barred from signed commitments (and embeddings excluded by default), forbid bincode under attestation (encodings.md edit + validator enforcement), ship a KAT vector; all gated on P2.
- Retention vs immutable provenance; redaction vs signatures; erasure of a signing Agent. Mitigation: signed subset holds salted per-field commitments, not plaintext (with an honest unlinkability-not-brute-force claim and an erasable-nonce option for real resistance); redaction deletes only plaintext; erasure dominates: Agent identity enters via an erasable commitment so a compelled deletion yields unverifiable-by-erasure, never tampered, and never a dangling ref.
- Key authority across rotation/merge/redaction, and the closed-world invariant. Mitigation: inline the key + key-binding into M so verified is decidable from the bundle bytes alone; demote out-of-band DID/HTTPS resolution to Federated; bind to signing-time at; never rewrite a signed Agent id on merge.
- Classification across a trust boundary. Mitigation: classification is declarative; enforcement is consumer policy; a profile may require the core governance block (and the legacy pii-class extension is deprecated with a dual-accept window + value-shape mapping).
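The salted-commitment idea in the second bullet can be sketched in a few lines. This is an illustration under stated assumptions, not the attestation format: it hashes the salt plus the UTF-8 bytes of one field, where the proposal commits over det-CBOR of a message M; the function names and the "redacted" status label are invented here, while verified, tampered, and unverifiable-by-erasure are the outcomes named above.

```python
import hashlib
import secrets
from typing import Optional


def commit(value: str, salt: bytes) -> str:
    """Salted commitment to one field: SHA-256(salt || utf8(value))."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def check_field(commitment: str, value: Optional[str],
                salt: Optional[bytes]) -> str:
    """Classify one committed field after possible redaction or erasure."""
    if salt is None:
        # The nonce was erased: nothing can be proven either way.
        return "unverifiable-by-erasure"
    if value is None:
        # Plaintext was redacted; the commitment itself is untouched.
        return "redacted"
    return "verified" if commit(value, salt) == commitment else "tampered"


salt = secrets.token_bytes(16)          # fixed-length per-field nonce
signed = commit("agent-7", salt)        # this digest is what gets signed
print(check_field(signed, "agent-7", salt))   # verified
print(check_field(signed, "agent-8", salt))   # tampered
print(check_field(signed, "agent-7", None))   # unverifiable-by-erasure
```

Because the signature covers only the commitment, deleting the plaintext or the nonce never changes the signed bytes, which is why erasure cannot be mistaken for tampering in this model.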
Pass 3 — adoption & process risks.
- The whole plan is gated on a second implementer — and nobody is tasked with producing one. Mitigation: P2 makes the gate a named, dated deliverable, not a deferred “later.” Milestone zero is reframed: a NON-Veld party consumes a published minimal Bundle unchanged and publishes a conformance statement (Conformance §“Declaring conformance”). A is the cheapest additive spec change, sequenced first to open the namespace (move 1, additive, day-one). The independence test (P0a) — technical/economic, recusal-bounded — prevents “Veld twice.” Co-design C and E with the first external adopter, named on the RFC record.
- Backwards compatibility is not a slogan. Adding any new core field is a new-release change because R1/R2 schemas are closed (additionalProperties:false). Honest guarantee: “R<n> bundles stay valid forever” + “R<n+1> readers accept R<n>,” not “older readers tolerate newer.” A published R1 reader rejects any R2 bundle at the envelope (E300) by design. Until a field is promoted, data rides extension[] under a non-omir.io URL.
- Scope creep / 80-20 violation, measured by the SPEC SURFACE a second implementer must read, not the required-field count. Mitigation: the Adopter floor (below) defines a normatively-labeled Core-R2 mandatory-to-understand tier (the four+Agent core resources + the CodeableConcept text-fallback + ignore-unknown rules) and an Optional-capability tier (B-semantics, D, E, F, G, H, I, J, K) a Core-R2 consumer MAY ignore wholesale without reading their specs. WG effort is tied to the gate: no phase past Phase 2 begins schema work until a candidate second implementer has cleared the Phase-1a floor (P2). “Adopt all” stays the destination without spending the cheap-adoption budget before the gate is met. Extension.valueJson stays arbitrary JSON. OMM is graded per resource TYPE; new volatile fields on a type do not inherit or drag the type’s grade, but the field-level signal must be machine-readable, not buried in prose: a new field on a mature type either rides extension[] until it earns promotion or carries an x-omir-maturity annotation the validator surfaces as an INFO finding when a bundle uses a below-type-grade field (prose description is governed as editorial §5.1 and would let stability claims escape the OMM ballot gate, so it is not the vehicle).
- Velocity vs honesty. Mitigation: the OMM rule (no OMM-2 without ≥2 independent implementations) is the brake, real only once P0 reconciles the three conflicting definitions and re-grades the four overclaimed R1 types. A falling trigger keeps the gate visible: any feature holding OMM-1 across a full release cycle (or fixed calendar window) with no recorded independent implementation MUST be flagged “OMM-1 (single-party; no independent implementation as of <date>)” and becomes a candidate for §4.2 TSC review, so a standard stuck single-party forever stays distinguishable in the grade table from a healthy one.
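Why a closed schema turns every new core field into a new-release change can be shown with a toy check. A minimal sketch, assuming a hypothetical closed field set for one resource type and an extension entry shaped as a url plus valueJson; the field salience and the example.org URL are placeholders, and real validation would run the published JSON Schemas, not a hand-written key check:

```python
# Hypothetical closed field set, standing in for additionalProperties:false.
R1_ALLOWED_FIELDS = {"resourceType", "id", "extension"}


def check_closed(resource: dict) -> list[str]:
    """Return one finding per top-level key outside the closed field set."""
    return [
        f"unknown core field: {key}"
        for key in resource
        if key not in R1_ALLOWED_FIELDS
    ]


# The same datum, first as an unpromoted core field, then as an extension.
as_core_field = {"resourceType": "MemoryRecord", "id": "m1", "salience": 0.7}
as_extension = {
    "resourceType": "MemoryRecord",
    "id": "m1",
    "extension": [
        {"url": "https://example.org/ext/salience", "valueJson": 0.7}
    ],
}

print(check_closed(as_core_field))   # ['unknown core field: salience']
print(check_closed(as_extension))    # []
```

The extension form passes against the unchanged schema, which is the sense in which data can travel before a field is promoted.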
Adopter conformance floor
So a second implementer can scope the work, the Core-R2 floor is explicit, split into a mandatory-to-understand tier and an optional-capability tier.
Mandatory to understand (the whole of Core-R2 reading): the five core resources (MemoryRecord,
Entity, Relationship, Episode, Agent), the CodeableConcept union + text fallback, and the
ignore-unknown rules. A Core-R2 consumer:
- MUST parse the CodeableConcept union and read text; MAY ignore system/code it does not know.
- MUST ignore the semantics of unknown Agents, chain hops, governance blocks, embeddings, typed attributes, and i18n variants without rejecting (extending “ignore unknown extensions” to new optional core blocks).
- MUST satisfy reference integrity as a document property for Bundles it AUTHORS: a Bundle a Core-R2 implementation authors MUST have a closed chain (every chain[].agent/chain[].derivedFrom resolves in-Bundle), exactly as CR-5 is unconditional. When RELAYING a Bundle it did not author, it MUST pass the chain through unmodified (lossless pass-through) and MUST NOT introduce a new dangling ref: a trust-agnostic relay is never forced to mint placeholder Agents into a chain it cannot interpret, and the Phase-3 “producer MUST emit a closed chain” wording means author, not relay.
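The first rule, parse the union, always read text, and ignore unknown codes, can be sketched as a small reader. This assumes the union is either a bare string or an object with system, code, and text keys; the function name, the seed value "causes", and the example.org code system are placeholders rather than published vocabulary:

```python
from typing import Optional, Set, Tuple, Union

Coded = Optional[Tuple[str, str]]


def read_concept(value: Union[str, dict],
                 known_systems: Set[str]) -> Tuple[str, Coded]:
    """Return (display_text, coded) for either branch of the union."""
    if isinstance(value, str):
        # Bare-string branch: the seed value doubles as the display text.
        return value, None
    if isinstance(value, dict):
        text = value.get("text")
        if not isinstance(text, str):
            raise ValueError("object form requires a text fallback")
        system, code = value.get("system"), value.get("code")
        # An unknown system is ignored, never rejected.
        coded = (system, code) if system in known_systems else None
        return text, coded
    raise ValueError("CodeableConcept must be a string or an object")


clinical = {
    "system": "https://example.org/codes/clinical",
    "code": "C42",
    "text": "contraindicates",
}
print(read_concept("causes", set()))
print(read_concept(clinical, set()))
print(read_concept(clinical, {"https://example.org/codes/clinical"}))
```

A consumer that knows no code systems at all still gets a usable label from every value, which is what keeps the mandatory tier small.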
Optional capability (may be ignored wholesale, specs unread): attestation verification, federation references, Agent/B semantics beyond ignore-and-preserve, D modality, F governance enforcement, G/H/J/K. A Core-R2 implementation is NOT required to verify attestations, resolve federation references, or implement any profile it does not claim.
A Core-R2 producer MUST emit the backward-compatible (bare-string) form where it has no coded
value, and MUST place implementation-specific data in extension[] under a non-omir.io URL.
Definition of done (per phase)
A phase reaches its stated target maturity when, for each theme it lands: the schema and the
version-aware reference validator support it — including that every new ResourceType/id
reference field (and every closed-world bare-id field, new and pre-existing) is in the
registry-driven reference walk and exercised by a dangling-reference invalid example in
examples/invalid/. The non-regression clause is split by the phase that introduces the field,
because a field cannot have a fixture before it exists:
- Phase 2 DoD (the walker-refactor guard): a dangling-parentId fixture AND a dangling Relationship.sourceEpisode fixture MUST be added and MUST still be rejected, asserted in validator/tests/conformance.rs before the registry-walker refactor merges (parentId is the lib.rs:219-230 special case the refactor risks dropping; sourceEpisode was the silently-omitted pre-existing field).
- Phase 3 DoD: dangling chain[].derivedFrom and chain[].agent fixtures, plus a wrong-shape chain[].agent (e.g. resolving to Episode/x) fixture proving the broadened set-membership E201 still rejects a non-Agent in the Agent slot.
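What the Phase 3 fixtures are meant to exercise can be sketched as a registry-driven walk over chain hops. This is a Python illustration, not the reference validator: the registry contents, the bundle-as-list shape, the placement of chain on a resource, and the finding labels are all assumptions made for the example.

```python
# Hypothetical registry: reference field -> resource types allowed in it.
REGISTRY = {
    "agent": {"Agent"},
    "derivedFrom": {"MemoryRecord", "Episode"},
}


def walk_chain(bundle: list[dict]) -> list[tuple[str, str, str]]:
    """Return (resource id, field, problem) for every bad chain reference."""
    index = {f"{r['resourceType']}/{r['id']}" for r in bundle}
    findings = []
    for res in bundle:
        for hop in res.get("chain", []):
            for field, allowed in REGISTRY.items():
                ref = hop.get(field)
                if ref is None:
                    continue
                target_type = ref.split("/", 1)[0]
                if target_type not in allowed:
                    # Resolves or not, the slot holds the wrong type.
                    findings.append((res["id"], field, "wrong-type"))
                elif ref not in index:
                    findings.append((res["id"], field, "dangling"))
    return findings


bundle = [
    {"resourceType": "Agent", "id": "a1"},
    {"resourceType": "Episode", "id": "x"},
    {"resourceType": "MemoryRecord", "id": "m1",
     "chain": [{"agent": "Agent/a1", "derivedFrom": "MemoryRecord/m0"}]},
    {"resourceType": "MemoryRecord", "id": "m2",
     "chain": [{"agent": "Episode/x"}]},
]
print(walk_chain(bundle))
```

Driving the walk from a registry means a newly added reference field is checked as soon as it is registered, and a fixture per field guards against one being silently dropped during a refactor.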
For A: a forward-compat fixture PAIR, each with a single unambiguous expected code —
(i) an R2-declared bundle with object-form CodeableConcept that the R1 validator rejects
with E300 at the envelope (proves “R1 readers reject R2 by design”); and (ii) an
R1-declared bundle that nonetheless contains an object-form CodeableConcept, which the R1
schema set rejects with E120 at per-entry dispatch (proves the legacy enum stays hard-validated).
Plus the @context Turtle golden-file (the source-split audit) and the intention-filtering
object-form fixture. For 1b: profile-constraint checking enforces code-system membership with
passing/failing examples. For E: the cross-verifier known-answer attestation vector, the
bincode-under-attestation failing example, the default-vs-omitted and 9-vs-9.0 digest-equality
KATs, and the RDF golden-file — the interop DoD items gated on P2. At least one profile
exercises each theme; docs + examples exist; an OMM grade is assigned honestly under the
reconciled, re-graded P0 ladder.
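The forward-compat fixture pair for A amounts to two checks at two different layers. A minimal sketch of an R1-only validator, assuming hypothetical envelope and entry keys (version, entry, kind) and a placeholder enum; only the codes E300 and E120 and the layer at which each fires come from the text above:

```python
# Placeholder seed enum for one hypothetical field.
R1_KIND_ENUM = {"fact", "preference"}


def r1_validate(bundle: dict) -> str:
    """Return the single expected code from an R1-only validator, or 'OK'."""
    # Envelope layer: anything not declared R1 is rejected outright.
    if bundle.get("version") != "R1":
        return "E300"
    # Per-entry layer: the legacy enum is still hard-validated.
    for entry in bundle.get("entry", []):
        kind = entry.get("kind")
        if not isinstance(kind, str) or kind not in R1_KIND_ENUM:
            return "E120"
    return "OK"


object_form = {
    "system": "https://example.org/codes",
    "code": "k1",
    "text": "fact",
}
print(r1_validate({"version": "R2", "entry": [{"kind": object_form}]}))  # E300
print(r1_validate({"version": "R1", "entry": [{"kind": object_form}]}))  # E120
print(r1_validate({"version": "R1", "entry": [{"kind": "fact"}]}))       # OK
```

Checking the envelope first is what gives each fixture a single unambiguous code: the R2 bundle never reaches per-entry dispatch, and the mislabeled R1 bundle never gets the benefit of the widened union.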
The independent-implementation requirement applies to claiming OMM-2+, not to completing a phase — but P2 binds at least one phase to a real second party. OMM-0/1 = the reference implementation alone; OMM-2 = ≥2 independent (per P0a). Phases 1–5 may ship at OMM-0/1 on the reference implementation alone, so the standard ships before a second implementer materializes — directly serving “cheap early adoption matters more than feature count.” But Phase 1a MUST NOT be declared done until a candidate independent implementer (named on the RFC/ballot record) has either (a) consumed an R2 CodeableConcept-union bundle and round-tripped it, or (b) recorded a public statement of intent — so the one existential invariant is a gate, not an owner-less “later.” Process preconditions (P-1, P0, P0a, P1) must land before the phases they gate — no phase may claim OMM-2 before the OMM reconciliation, no R2 traffic is testable before the validator is version-aware, and no ballot is callable before the founding TSC is seated.
Process
None of this is unilateral. Every change here is a candidate, to be proposed as an RFC,
debated by the Working Group, and balloted (see CONTRIBUTING.md
and GOVERNANCE.md).
New resources and fields enter at OMM-0/1 and earn maturity through independent
implementation — the same honesty rule the rest of the spec lives by. The fastest way to
move any of these forward is the project’s stated existential need: a second implementer
whose requirements turn one of these themes from a hypothesis into a ballot.