Overview

OMIR (Open Memory Interoperability Resources, pronounced “OH-meer”) is an open, vendor-neutral data format for portable AI agent and cognitive memory. This document describes OMIR R1, the first release.

OMIR is modeled, deliberately and openly, on HL7 FHIR (Fast Healthcare Interoperability Resources). Where FHIR made clinical data portable across hospital systems, OMIR makes memory portable across agent systems. The mapping is direct: Open stands in for Fast, Memory stands in for Healthcare, and “Interoperability Resources” is kept verbatim.

What OMIR is

OMIR is an at-rest data format — a document/file standard. It defines how a unit of agent memory looks when it is written to disk, exported, backed up, or handed from one system to another.

OMIR is:

  • A file format. The canonical serialization is JSON / JSON-LD with the .omir extension; a compact binary profile uses .omirb.
  • Vendor-neutral. No single implementation owns the format. Veld convenes the working group as the reference implementation, but the spec and schemas are free (see Conformance and the governance notes below).
  • Resource-oriented. Borrowing FHIR’s model, everything in OMIR is a Resource.

OMIR is not:

  • Not a wire protocol. OMIR does not define how memory is transported, queried live, or synchronized between running agents. It describes the bytes at rest.
  • Not a product. OMIR is a standard. Implementations are products; the format is not.
  • Not a database. OMIR says nothing about indexing, vector search, or storage engines. Those are implementation concerns.

Where OMIR sits: MCP, A2A, and OMIR

Two transport standards already exist for agent systems, both now stewarded by the Linux Foundation:

  • MCP (Model Context Protocol, originally Anthropic) — connects agents to tools and context sources.
  • A2A (Agent-to-Agent, originally Google) — lets agents talk to each other.

Both are transport concerns: they move data between live endpoints. Neither defines what memory is once it has been written down.

MCP and A2A transport memory. OMIR is the memory at rest.

OMIR is complementary, never competing. An agent can serve memory over MCP, hand it to a peer over A2A, and persist or export it as an .omir Bundle. The transport moves the bytes; OMIR defines the bytes. A clean separation of “in motion” from “at rest” is exactly the separation FHIR drew between its RESTful API (motion) and its Resources (rest).

The resource model

Every piece of data in OMIR is a Resource. R1 defines five resource types:

| Resource | Purpose | OMM |
|---|---|---|
| MemoryRecord | The atomic unit of memory: a remembered experience, plan, prompt, or learning. | 4 |
| Entity | A named thing extracted from memory: person, place, concept, technology, skill. | 3 |
| Relationship | A directed, weighted (Hebbian) edge between two entities. | 3 |
| Episode | A bounded experience: the raw event the other resources are derived from. | 3 |
| Bundle | The container. A Bundle is the .omir document. | |

The maturity column refers to the OMIR Maturity Model (OMM) — see Design Principles. It is surfaced per resource in meta.maturity.

References

Resources do not nest each other arbitrarily; they link by typed reference. A reference is an object carrying a single string of the form ResourceType/id:

{ "ref": "Entity/john" }

The reference target’s ResourceType MUST be one of MemoryRecord, Entity, Relationship, or Episode, and the id MUST match an id of that type. References resolve within the Bundle: a MemoryRecord carrying entityRefs: [{ "ref": "Entity/john" }] is satisfied by an Entity with id: "john" in the same Bundle’s entry array. Reference integrity is a conformance rule (Conformance).

This is how the graph is expressed at rest: a Relationship points from one Entity to another; an Episode lists the entityRefs it produced; a MemoryRecord links the entities it mentions. No pointers, no foreign keys — just typed string references.
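
The reference rule above can be checked mechanically. The following is a minimal sketch, not the reference validator: the helper names `iter_refs` and `dangling_refs` are hypothetical, and it assumes that any string stored under a "ref" key outside of extension[] is a reference.

```python
from typing import Any, Iterator

# The four resource types a reference may target, per the rule above.
REFERENCEABLE = {"MemoryRecord", "Entity", "Relationship", "Episode"}


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every string stored under a "ref" key, at any depth.

    Assumption: extension payloads are opaque, so they are not searched.
    """
    if isinstance(node, dict):
        ref = node.get("ref")
        if isinstance(ref, str):
            yield ref
        for key, value in node.items():
            if key != "extension":
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def dangling_refs(bundle: dict) -> list[str]:
    """Return references that no resource in the Bundle's entry array satisfies."""
    entries = bundle.get("entry", [])
    known = {f'{r["resourceType"]}/{r["id"]}' for r in entries}
    bad = []
    for ref in iter_refs(entries):
        rtype, _, rid = ref.partition("/")
        if rtype not in REFERENCEABLE or not rid or ref not in known:
            bad.append(ref)
    return bad


bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resourceType": "MemoryRecord", "id": "m-001",
         "entityRefs": [{"ref": "Entity/john"}, {"ref": "Entity/jane"}]},
        {"resourceType": "Entity", "id": "john"},
    ],
}
print(dangling_refs(bundle))  # ['Entity/jane']
```

Because resolution is by ResourceType/id only, the check is independent of the order of the entry array.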

The 80/20 rule

OMIR does not try to model every field every memory system has ever invented. It follows FHIR’s 80/20 rule: the core of each resource captures the ~80% of fields that ~80% of implementations actually need — content, timestamps, importance, confidence, decay, provenance, references. The remaining long tail of implementation-specific data does not bloat the core.

Instead, every resource carries a typed extension[] array: a structured, namespaced escape hatch for proprietary or experimental data. Veld’s 20-signal retrieval scores, external isotropy/closure dimensions, and name embeddings all ride in extensions — never in the core. A consumer that does not understand an extension MUST ignore it and still process the resource. See Extensions.

Where an implementation needs to constrain the core (require certain fields, forbid others, mandate specific extensions), it publishes a Profile. See Profiles.

Heritage

OMIR’s core faithfully carries the cognitive-memory features proven in its reference implementation, Veld — Agentic Memory:

  • Calibrated Bayesian confidence — a Beta(α, β) posterior plus a point estimate, not a bare float.
  • Multi-time-scale decay and anchoring — “better forgetting”: records decay on a half-life unless anchored.
  • Tiered memory: working → session → longterm → archive.
  • Hebbian relationship strength — edges strengthen with co-activation, decay without use.
  • Temporal invalidation: validUntil on records, invalidatedAt on relationships.
  • Provenance and source credibility — where a memory came from, and how much to trust the origin.
  • Prospective memory: experienceType: "intention" for future-directed records.
  • Entity salience — how strongly an entity pulls on retrieval.

These are not optional add-ons bolted onto a flat record; they are the core OMIR resource fields. The format encodes how memory behaves over time, not just what was said.

Governance and licensing

OMIR is stewarded by a vendor-neutral OMIR Working Group in a neutral GitHub organization. Veld convenes the group but does not own the standard. The group operates an open RFC + ballot process, a Technical Steering Committee (TSC), semantic release versioning (R1, R2, …), and a published deprecation policy.

Licensing is deliberately decoupled from any implementation:

  • The specification and schemas are licensed CC-BY-4.0.
  • The reference code (validator, generators) is licensed Apache-2.0.

A standard under a restrictive license is dead on arrival. The OMIR spec is, and will remain, free.

Design Principles

The OMIR design is governed by six principles. The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this specification are to be interpreted as described in RFC 2119.

1. Open and neutral

OMIR is an open standard owned by no vendor.

  • The specification and JSON Schemas MUST remain freely licensed (CC-BY-4.0 for the spec, Apache-2.0 for the reference code). A standard behind a restrictive license is dead on arrival.
  • No single implementation’s behavior is normative. Where this spec describes cognitive behavior (decay, Hebbian strengthening, tiering), it describes the data shape that records that behavior, not a required algorithm.
  • The format MUST NOT privilege one vendor’s identifiers, URLs, or schemas in its core. Vendor-specific data lives in extension[] under vendor-controlled URLs.

2. At-rest, not a protocol

OMIR describes memory at rest — bytes on disk, in a backup, in an export, in a hand-off file.

  • OMIR MUST NOT be read as a transport or query protocol. It defines no endpoints, no request/response shapes, no synchronization semantics.
  • Live transport is the job of MCP and A2A. OMIR is complementary: those protocols move memory; OMIR is the memory once it lands. An implementation MAY serve an OMIR Bundle over any transport, but the Bundle’s validity does not depend on how it travelled.

3. 80/20 core plus extensions

Each resource defines a small, stable core and an open extension mechanism.

  • The core of a resource captures the ~80% of fields that ~80% of memory systems need. New core fields are added conservatively and only through the ballot process.
  • Implementation-specific, experimental, or proprietary data MUST be carried in the typed extension[] array under a canonical URL, not by adding properties to the core. Every resource schema sets additionalProperties: false, so undeclared top-level fields are non-conformant by construction.
  • A consumer MUST ignore extensions it does not recognize and MUST still process the resource. Unknown extensions are never an error. See Extensions.

4. Honest maturity via OMM

OMIR grades the stability of each resource type with the OMIR Maturity Model (OMM), an integer 0–5 surfaced in meta.maturity.

These are the canonical OMM levels (identical to GOVERNANCE §4.1), assigned per resource type:

| OMM | Name | Criterion |
|---|---|---|
| 0 | Draft | Newly proposed; shape may change freely; not safe to depend on. |
| 1 | Trial Use (early) | Implemented in at least one system; minimal field experience; breaking changes expected. |
| 2 | Trial Use (proven) | Exercised in real bundles by at least one implementation; shape settling but not yet cross-validated. |
| 3 | Established | Multiple independent implementations and real-world usage; stabilizing, with conservative change. |
| 4 | Mature | Broad field experience across implementations; changes rare and strictly backward-compatible within a release. |
| 5 | Normative | Stable and authoritative; breaking changes only across releases (deprecation policy). |

R1 grades honestly and MUST NOT overclaim:

  • MemoryRecord: OMM-4.
  • Entity, Relationship, Episode: OMM-3.
  • New resources introduced in later releases start lower and earn their level through real, independent implementation.

Maturity is a promise about change, not a quality score. A consumer SHOULD treat lower-OMM resources as more likely to change between releases.

5. Encoding-neutral

OMIR defines a single logical data model with more than one byte-level encoding.

  • The canonical encoding is JSON / JSON-LD (.omir), chosen for ubiquity and human-readability.
  • A compact binary profile (.omirb, CBOR/bincode) exists for edge and robotics deployments where size and parse cost matter.
  • The two encodings MUST be lossless round-trips of the same logical model. A resource’s meaning MUST NOT depend on its encoding. See Encodings.

The file extensions .omir and .omirb are the canonical, collision-free identifiers for the format. Implementations MUST NOT use .mf or .mif for OMIR documents; those extensions are heavily collided and are not part of this standard.

6. Forgetting is first-class

Most data formats assume data is permanent. Memory is not. OMIR treats forgetting as core, not as an afterthought.

  • Every MemoryRecord MAY carry a decay block: a half-life, last-access time, access count, and an anchored flag. Anchored records resist decay.
  • Relationship strength is Hebbian: it rises with co-activation and falls without use. The strength field records the current synaptic weight.
  • Temporal invalidation is explicit: validUntil supersedes a record after an instant; invalidatedAt retires a relationship.
  • tier (working → session → longterm → archive) records where a memory sits in the consolidation hierarchy.

These fields let a faithful implementation reconstruct better forgetting from an at-rest document — decay curves, anchors, and tier transitions survive export. OMIR does not mandate a forgetting algorithm; it standardizes the state that algorithm reads and writes.

Memory Semantics

The resource pages define what fields a record carries. This page defines what those fields mean over time — the cognitive model OMIR encodes. It is the answer to a fair question about the field tables: “why is confidence two numbers and a third? what is a half-life doing in a data format? what makes an edge’s strength go up?”

OMIR’s guiding rule, restated from Design Principles §6:

OMIR standardizes the state a memory algorithm reads and writes — not the algorithm.

Two faithful implementations may decay, consolidate, and re-rank memory with completely different code. What they must agree on is the meaning of the state in the file. The mechanisms below are drawn from the reference implementation (Veld); the spec records the state, and an implementation is free to reproduce the behavior however it likes.

1. Confidence as calibrated belief

MemoryRecord.confidence is a Confidence object — { alpha, beta, calibrated } — not a bare float, because an agent needs to distinguish “90% sure, having checked many times” from “90% sure, having seen this once.” A single number cannot.

OMIR models belief as a Beta(α, β) posterior:

  • alpha accumulates confirming evidence (the memory proved helpful/correct), plus a prior.
  • beta accumulates disconfirming evidence (the memory proved misleading/wrong), plus a prior.
  • calibrated is the point estimate a consumer should act on. The natural estimate is the posterior mean, α / (α + β), but a faithful producer damps it toward 0.5 when evidence is thin — at one or two observations the prior should dominate, so an unproven memory does not masquerade as a confident one. As α + β grows, calibrated approaches the raw mean. R1 does not mandate the exact shrinkage, so calibrated is a producer-asserted point estimate: two faithful producers MAY emit a different calibrated for the same (α, β). The α/β pair is the portable, reproducible quantity; a consumer that needs a reproducible or cross-producer-comparable estimate SHOULD recompute it from α/β under its own rule rather than rely on calibrated being identical across producers (see Theory & Scope).

A consumer that only reads calibrated behaves correctly within a single producer’s records; the α/β pair is there for consumers that want to keep updating belief as new feedback arrives. This is why confidence is a distribution rather than a bare float — it carries its own evidence count, so belief can be revised rather than merely averaged. One caveat on merging: adding two stores’ α/β yields a coherent posterior only when the two evidence streams are independent. When both stores derived their belief from the same upstream source — common when memories are exported and re-imported — summing the counts double-counts the shared evidence and overstates confidence. A consumer that merges confidence across stores SHOULD account for shared provenance (e.g. via provenance.externalId) rather than blindly add counts.
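
To make the distinction between thin and thick evidence concrete, here is a minimal sketch of a Beta update with damping toward 0.5. R1 does not mandate a shrinkage rule, so `damped_estimate` and its constant `k` are purely illustrative assumptions, as are the function names; a real producer may damp differently.

```python
def posterior_mean(alpha: float, beta: float) -> float:
    """Raw mean of a Beta(alpha, beta) posterior."""
    return alpha / (alpha + beta)


def damped_estimate(alpha: float, beta: float, k: float = 4.0) -> float:
    """Pull the posterior mean toward 0.5 while alpha + beta is small.

    k is a hypothetical damping constant, not a value defined by R1.
    """
    n = alpha + beta
    weight = n / (n + k)  # approaches 1 as evidence accumulates
    return 0.5 + weight * (posterior_mean(alpha, beta) - 0.5)


def observe(conf: dict, helpful: bool) -> dict:
    """Return a new Confidence object after one piece of feedback."""
    alpha = conf["alpha"] + (1.0 if helpful else 0.0)
    beta = conf["beta"] + (0.0 if helpful else 1.0)
    return {"alpha": alpha, "beta": beta,
            "calibrated": damped_estimate(alpha, beta)}


# Same raw mean (0.9), very different amounts of evidence.
thin = damped_estimate(0.9, 0.1)
thick = damped_estimate(90.0, 10.0)
print(thin < thick < 0.9)

# One disconfirming observation moves beta, not alpha.
print(observe({"alpha": 9.0, "beta": 1.0, "calibrated": 0.9}, helpful=False))
```

Under this rule the thinly evidenced record stays close to 0.5 while the well-evidenced one sits near its raw mean, which is the behavior the calibrated field is meant to capture. A consumer comparing producers would still recompute from alpha and beta rather than trust either producer's damping.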

2. Forgetting as a curve

MemoryRecord.decay ({ halfLifeHours, lastAccess, accessCount, anchored }) records a memory’s forgetting state so that “better forgetting” survives export. The intent:

  • A record’s retrievability falls over time on a half-life — recent, frequently accessed memories stay sharp; stale ones fade. lastAccess and accessCount are the inputs a decay function reads; each access pushes retrievability back up.
  • Decay is naturally multi-time-scale: fast in the first hours/days (filtering noise), then much slower for what survives — empirically closer to a power law than a single exponential. OMIR does not mandate the curve; it records the parameters (halfLifeHours) and the access signal a curve consumes.
  • anchored: true marks a memory that resists decay — a pinned fact, a user-stated preference, a safety constraint. Anchored memories have a floor below which retrievability does not fall.

The spec stores forgetting state, not a forgetting algorithm, precisely so a robot on a power-cycle budget and a cloud assistant can both reconstruct sensible decay from the same fields.
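
As an illustration of how a consumer might read a decay block, the sketch below computes retrievability with a single exponential half-life. This is one possible curve, chosen for brevity; the text above notes that real decay is closer to a power law, and OMIR mandates neither. The names `parse_instant`, `retrievability`, and the `anchor_floor` value are assumptions, and accessCount is deliberately ignored here.

```python
from datetime import datetime


def parse_instant(text: str) -> datetime:
    """Parse an RFC 3339 timestamp; map a trailing Z to +00:00 for older Pythons."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def retrievability(decay: dict, now: datetime, anchor_floor: float = 0.5) -> float:
    """Single-exponential retrievability in [0, 1] computed from a decay block.

    anchor_floor is a hypothetical floor applied to anchored records.
    """
    elapsed_h = (now - parse_instant(decay["lastAccess"])).total_seconds() / 3600.0
    value = 0.5 ** (max(elapsed_h, 0.0) / decay["halfLifeHours"])
    if decay.get("anchored", False):
        value = max(value, anchor_floor)
    return value


now = parse_instant("2026-06-01T18:00:00Z")  # 48 hours after lastAccess
block = {"halfLifeHours": 24, "lastAccess": "2026-05-30T18:00:00Z",
         "accessCount": 3, "anchored": False}
print(retrievability(block, now))                        # two half-lives
print(retrievability({**block, "anchored": True}, now))  # held at the floor
```

Swapping the exponent line for a power-law term changes the curve without touching the stored fields, which is the point of standardizing state rather than algorithm.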

3. Tiering and consolidation

MemoryRecord.tier (working → session → longterm → archive) records where a memory sits in a consolidation hierarchy — the at-rest analogue of working vs. long-term memory:

  • working — active, in-focus, short-lived; the smallest, hottest set.
  • session — bounded to a task/conversation; indexed for fast recall.
  • longterm — consolidated knowledge, retrieved by semantic cue.
  • archive — cold, near-permanent, batch-retrieved.

Promotion is driven by age × importance × access: a memory accessed repeatedly, or marked important, or linked to long-term knowledge, migrates inward; an untouched low-importance memory drifts outward and eventually compresses. OMIR records the tier a record currently occupies; the promotion policy is the implementation’s.

4. Hebbian relationship strength

Relationship.strength is a synaptic weight, not a static label. The cognitive model is Hebbian — cells that fire together wire together:

  • When two entities are co-activated (retrieved or mentioned together), the edge between them strengthens.
  • An edge that is not used decays toward zero, the same “better forgetting” applied to structure rather than content.
  • validAt records when the relationship was last observed to hold; invalidatedAt retires an edge that has been contradicted without deleting it, so the graph keeps its history rather than silently rewriting it.

Heavier machinery a faithful engine may run on top of this — long-term potentiation, asymmetric forward/backward strengths, consolidation tiers on the edge itself — is implementation state and rides in extension[], not the core. The core carries the one number every graph engine agrees on: current strength.
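
The strengthen-on-use, decay-without-use behavior can be sketched in a few lines. The update rules, the learning rate, and the half-life below are hypothetical choices for illustration; OMIR only stores the resulting strength.

```python
def co_activate(strength: float, learning_rate: float = 0.1) -> float:
    """Move strength toward 1.0; the step shrinks as the edge saturates."""
    return strength + learning_rate * (1.0 - strength)


def decay_unused(strength: float, idle_hours: float,
                 half_life_hours: float = 168.0) -> float:
    """Halve strength for every half_life_hours the edge goes unused."""
    return strength * 0.5 ** (idle_hours / half_life_hours)


s = 0.5
s = co_activate(s)           # entities retrieved together once
s = decay_unused(s, 336.0)   # then two idle weeks at a one-week half-life
print(round(s, 4))
```

Both rules keep the value inside [0, 1], so the result can be written back to Relationship.strength without clamping.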

5. Salience

Entity.salience is how strongly an entity pulls on retrieval — its gravitational mass in the memory space. A high-salience entity is more likely to be surfaced and to drag related memories up with it via spreading activation. Salience rises with frequency (mentionCount), recency (lastSeenAt), whether the entity is a proper noun (properNoun), and explicit user importance. As with decay, OMIR stores the salience value and its inputs; how an engine computes and uses it is its own business.

6. The temporal model

OMIR is bi-temporal in spirit: it separates when something happened from when it was recorded.

  • eventTime — when the described event actually occurred. It may precede createdAt (you can record a memory of last week today).
  • createdAt — encoding time: when the record was written.
  • validUntil (on MemoryRecord) and invalidatedAt (on Relationship) express temporal invalidation — a fact superseded after an instant, or an edge that stopped holding — so consumers can filter stale knowledge rather than trust it forever.

Entry order is not significant; everything resolves by ResourceType/id. A consumer must never infer meaning from position (Conformance). A richer, fully interval-based bitemporal model is a candidate generalization — see Toward a Global Standard §H.
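
A consumer filtering stale knowledge might apply the invalidation fields as follows. This is a sketch under one assumption the text does not settle: a resource is treated as stale from the invalidation instant onward (now >= limit). The function names are hypothetical.

```python
from datetime import datetime


def parse_instant(text: str) -> datetime:
    """Parse an RFC 3339 timestamp; map a trailing Z to +00:00 for older Pythons."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def is_current(resource: dict, now: datetime) -> bool:
    """True unless the resource carries an invalidation instant at or before now."""
    if resource["resourceType"] == "MemoryRecord":
        limit = resource.get("validUntil")
    elif resource["resourceType"] == "Relationship":
        limit = resource.get("invalidatedAt")
    else:
        limit = None  # no invalidation field described for other types
    return limit is None or now < parse_instant(limit)


now = parse_instant("2026-06-01T00:00:00Z")
entries = [
    {"resourceType": "MemoryRecord", "id": "old", "validUntil": "2026-05-31T00:00:00Z"},
    {"resourceType": "MemoryRecord", "id": "open"},
    {"resourceType": "Relationship", "id": "r1", "invalidatedAt": "2026-07-01T00:00:00Z"},
]
current_ids = [e["id"] for e in entries if is_current(e, now)]
print(current_ids)  # ['open', 'r1']
```

Stale resources are filtered from the result, not deleted from the Bundle, so the history remains available to consumers that want it.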

7. Provenance and source credibility

MemoryRecord.provenance ({ source, sourceType, credibility, externalId }) answers where a memory came from and how much to trust the origin. credibility (a [0,1] UnitInterval) lets a consumer weight a memory by the trustworthiness of its source — a user statement, a verified document, and an inferred guess are not equally reliable, and a retrieval ranker should be able to say so. externalId (system:id, e.g. github:pr-123, linear:SHO-39) keeps a memory traceable back to the artifact that produced it. A multi-hop, signable provenance chain is a candidate generalization — see Toward a Global Standard §E.

8. Prospective memory

Most memory is about the past. Prospective memory is about the future: a remembered intention to do something later. OMIR models it with experienceType: "intention" on an otherwise ordinary MemoryRecord — future-directed records live in the same store as retrospective ones, so the same decay, confidence, and provenance machinery applies. A conforming consumer SHOULD keep intentions out of ordinary retrospective recall and surface them when their trigger condition is met (a time, a context); the omir-personal-assistant profile makes this explicit (Profiles).
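
Keeping intentions out of retrospective recall amounts to a partition on experienceType. The sketch below shows only that partition; evaluating trigger conditions is left out because this page does not name the field that would carry them. `split_recall` is a hypothetical helper.

```python
def split_recall(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate retrospective records from prospective intentions."""
    retrospective = [r for r in records if r.get("experienceType") != "intention"]
    intentions = [r for r in records if r.get("experienceType") == "intention"]
    return retrospective, intentions


records = [
    {"resourceType": "MemoryRecord", "id": "m-001",
     "content": "User prefers execution-first responses."},
    {"resourceType": "MemoryRecord", "id": "m-002",
     "experienceType": "intention", "content": "Follow up on the release notes."},
]
past, future = split_recall(records)
print([r["id"] for r in past], [r["id"] for r in future])
```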

What OMIR deliberately does not specify

The following are implementation concerns. They are real and important, but they are not part of the at-rest contract, and a conformant document says nothing about them:

  • Retrieval scoring. How memories are ranked at query time (similarity, BM25, graph spreading, cross-encoders, RRF fusion, multi-signal blends) is engine-specific. Score vectors ride in extension[]; they are never core. See the worked example in Extensions.
  • Embeddings. Which model, which dimensionality, which vector — all implementation. R1 carries embeddings only as extensions; a neutral embedding representation is a candidate generalization (Toward a Global Standard §D).
  • Consolidation schedules, replay, “sleep” phases. When and how memory is reorganized is an algorithm, not a state.
  • Storage and indexing. Vector indexes, key-value engines, graph databases — none of it is OMIR’s concern. OMIR is the bytes at rest, not the engine.

This boundary is the whole design: standardize the state so any engine can read another engine’s memory, and leave the behavior free so engines still have something to compete on.

Theory & Scope

Non-normative. This page explains the theory of memory OMIR R1 encodes and states the boundaries of that theory. It adds no conformance requirements beyond those in Conformance; the RFC-2119 keywords below restate consumer guidance already implied by the resource semantics. It is the conceptual companion to Memory Semantics, which defines how individual fields behave over time.

The theory OMIR encodes

OMIR R1 standardizes the state a memory algorithm reads and writes, not the algorithm (Design Principles §6). The state it makes first-class — calibrated belief, half-life decay with anchoring, consolidation tiers, Hebbian edge strength, entity salience, bi-temporal timestamps, prospective intentions, and source credibility — places OMIR in a specific, well-established tradition: the rational analysis of memory, in which a memory’s strength tracks the statistics of the environment (recency and frequency predict future need), realized in cognitive architectures such as ACT-R and complemented by the hippocampal/neocortical picture of consolidation.

Naming the lineage matters because “state, not algorithm” does not make the choice of state neutral. The fields OMIR blesses are an opinionated, defensible theory of what memory is. This page states the boundaries of that theory so they read as deliberate design positions, not as omissions.

Scope: declarative memory

OMIR R1 models declarative memory — the episodic and semantic memory of facts, experiences, and their relationships. In the common taxonomy of agent memory (a working store plus long-term episodic, semantic, and procedural memory), R1 covers the episodic and semantic long-term store, with tier standing in for a working/activation notion.

The following are out of scope in R1 and have no first-class representation:

  • Procedural / skill memory — learned tool-use policies, workflows, and macros.
  • Parametric memory — knowledge held in model weights, adapters (e.g. LoRA), or fine-tunes.
  • Activation / KV-cache memory — transient in-context state below the record level.
  • Priming and other implicit (non-declarative) memory.

Such state MAY be carried opaquely (inside content, or under extension[]), but it is not interpretable by a generic consumer and is not what R1 standardizes. A later release MAY introduce first-class resources for these systems; per Design Principles §4 they would start low on the OMM and earn their level through independent implementation. Until then, “OMIR memory” means declarative memory — stated here so adopters self-select rather than discover the boundary after building an adapter.

State, not dynamics — and the snapshot it implies

Because OMIR records state and not the algorithm that evolves it, every time-dependent value is a snapshot as of its timestamp. A decay block, an Entity.salience, a Relationship.strength, and the retrievability they feed are all functions of time, frozen at the moment of export. R1 defines no procedure for aging them forward.

A consumer SHOULD treat these values as valid as of their associated timestamp (decay.lastAccess, Entity.lastSeenAt, Relationship.validAt, meta.lastUpdated) and MAY recompute them under its own decay / salience / strength model on import. A consumer MUST NOT assume an exported retrievability, salience, or strength is current at read time. This is the price of encoding-neutral, algorithm-free portability: the record survives the hand-off; the dynamics are reconstructed by the importer.

Stored scalars are producer-relative

The normalized scores OMIR carries — importance, Entity.salience, Relationship.strength, provenance.credibility, and confidence.calibrated — are producer-asserted and producer-normalized. They are comparable within one producer’s output, where they share a normalization regime, and are not guaranteed comparable across producers.

R1 defines no cross-producer scale for these fields. Consequently:

  • A consumer MUST NOT assume two producers’ scalars share a scale: one store’s importance: 0.8 need not mean what another store’s 0.8 means.
  • A consumer that ranks or merges records from more than one producer SHOULD renormalize per producer (keyed by meta.source) rather than compare raw values.
  • For belief specifically, prefer the evidence-bearing confidence.alpha / beta over the derived calibrated when comparing or merging across producers (Memory Semantics §1).

This reflects a fact about memory the literature is explicit on: salience and relevance are cue-dependent and emergent at retrieval, not context-free stored properties. OMIR must freeze them into stored scalars to serialize them at all; this section bounds what that freezing does and does not guarantee. A future release MAY add an optional normalizationRef so cross-producer comparability becomes detectable rather than assumed (see Toward a Global Standard).
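
One way to follow the per-producer guidance is to rescale each producer's scores within its own group before comparing them. The sketch below uses min-max rescaling keyed by meta.source; that choice, the fallback of 0.5 for a group with a single distinct value, and the name `renormalize` are all assumptions. Rank-based schemes would serve equally well and are less sensitive to outliers.

```python
from collections import defaultdict


def renormalize(records: list[dict], field: str = "importance") -> dict:
    """Map (source, 'ResourceType/id') to a score rescaled within each producer."""
    groups = defaultdict(list)
    for rec in records:
        if field in rec:
            source = rec.get("meta", {}).get("source", "")
            groups[source].append(rec)
    scaled = {}
    for source, recs in groups.items():
        values = [r[field] for r in recs]
        lo, hi = min(values), max(values)
        for r in recs:
            key = (source, f'{r["resourceType"]}/{r["id"]}')
            # A group with one distinct value carries no ordering information.
            scaled[key] = 0.5 if hi == lo else (r[field] - lo) / (hi - lo)
    return scaled


records = [
    {"resourceType": "MemoryRecord", "id": "a1", "importance": 0.2, "meta": {"source": "producer-a"}},
    {"resourceType": "MemoryRecord", "id": "a2", "importance": 0.4, "meta": {"source": "producer-a"}},
    {"resourceType": "MemoryRecord", "id": "b1", "importance": 0.8, "meta": {"source": "producer-b"}},
    {"resourceType": "MemoryRecord", "id": "b2", "importance": 0.9, "meta": {"source": "producer-b"}},
]
scores = renormalize(records)
print(scores[("producer-a", "MemoryRecord/a2")], scores[("producer-b", "MemoryRecord/b1")])
```

In the raw data producer-b's lowest record (0.8) outranks producer-a's highest (0.4); after rescaling, each record is ranked by its position within its own producer's range.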

Records, not reconstructions

Human recall is reconstructive: a trace is rebuilt, and often altered, each time it is retrieved. OMIR deliberately does not model this. A MemoryRecord is a stable, auditable record with an explicit version counter and temporal invalidation (validUntil, Relationship.invalidatedAt), not a trace that reconsolidates on access. Accessing a record updates its retrievability (decay.lastAccess, decay.accessCount); it does not rewrite its content.

This is a deliberate trade: OMIR exchanges reconstructive fidelity for auditability, diffability, and non-confabulation — properties a portable, archivable, hand-it-between-vendors format needs and a reconstructive store cannot offer. It also means OMIR’s “forgetting” is decay of retrievability, never deletion of data; governance-driven deletion and retention are a separate concern, deferred to a future governance vocabulary (see Toward a Global Standard).

These are bounds, not gaps

Each boundary above is a stated position, not an oversight. Several have candidate generalizations already named in Toward a Global Standard — open vocabularies, identity and multi-agent memory, a richer temporal model, modality and neutral embeddings, and a governance / retention vocabulary. They enter, if at all, through the RFC + ballot process at low OMM. R1’s contract is the narrower, honest one: a portable serialization of declarative memory state.

Encodings

OMIR defines one logical data model and more than one byte-level encoding. The two encodings MUST be lossless round-trips of the same model: a resource’s meaning MUST NOT depend on how it is encoded (see Design Principles §5).

| Encoding | Extension | Media type (provisional) | Use |
|---|---|---|---|
| JSON / JSON-LD | .omir | application/omir+json | Canonical. Interchange, export, backup, hand-off. |
| Binary profile | .omirb | application/omir+cbor | Edge, robotics, high-volume. Compact, fast to parse. |

JSON-LD canonical form

The canonical encoding is JSON, validated against the OMIR R1 JSON Schemas (draft 2020-12) published at https://omir.io/spec/R1/schemas/. A conforming .omir document is a Bundle: a JSON object whose entry array carries the resources.

{
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "id": "export-2026-05-30",
  "generatedAt": "2026-05-30T18:00:00Z",
  "source": "veld/0.7.6 (MIF adapter)",
  "entry": [
    {
      "resourceType": "MemoryRecord",
      "id": "m-001",
      "content": "User prefers execution-first responses, minimal hedging.",
      "createdAt": "2026-05-30T17:55:12Z",
      "kind": "learning",
      "tier": "longterm",
      "importance": 0.82,
      "confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
      "entityRefs": [{ "ref": "Entity/john" }]
    },
    {
      "resourceType": "Entity",
      "id": "john",
      "name": "John",
      "labels": ["person"],
      "salience": 0.7,
      "properNoun": true
    }
  ]
}

JSON-LD compatibility

OMIR is JSON-LD compatible without forcing linked-data tooling on anyone. A Bundle MAY carry an optional @context:

{
  "@context": "https://omir.io/spec/R1/context.jsonld",
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "entry": []
}

  • The @context SHOULD be the canonical URL https://omir.io/spec/R1/context.jsonld, or an object that includes it.
  • Adding @context MUST NOT change the core shape of any resource. A plain-JSON consumer that ignores @context reads exactly the same data as a JSON-LD processor.
  • The @context maps OMIR property names to IRIs so that OMIR Bundles can participate in RDF graphs, SPARQL queries, and linked-data pipelines for implementations that want them. This is opt-in.
  • Episode.source is mapped to its own predicate (omir:episodeSource, via a JSON-LD 1.1 type-scoped context) so the kind of input an episode came from is not conflated with the producer/origin source carried by Bundle, Meta, and Provenance. Broader per-field disambiguation of shared term names arrives with the R2 vocabulary work (see Toward a Global Standard).

Constraints on the canonical form

  • Timestamps (Instant) MUST be RFC 3339 / ISO 8601. UTC is RECOMMENDED.
  • Normalized scores (UnitInterval: importance, salience, strength, credibility, confidence.calibrated) MUST lie in the closed interval [0, 1].
  • Identifiers (id) MUST match ^[A-Za-z0-9._:-]{1,128}$ and MUST be unique within their resourceType inside a Bundle.
  • Resources MUST NOT carry undeclared top-level properties; every R1 resource schema sets additionalProperties: false. Implementation-specific data goes in extension[] (see Extensions).
  • JSON numbers are values, not lexical forms: 9 and 9.0 denote the same number, and R1 does not mandate a canonical numeric spelling. A future binary or attestation profile may pin one (see Toward a Global Standard); within R1 the two are equivalent.
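
The constraints above lend themselves to a quick pre-flight check. The following is a minimal sketch and not a substitute for validating against the published JSON Schemas: `check_bundle` and its helpers are hypothetical names, only createdAt is checked as an Instant, and the timestamp test relies on Python's ISO 8601 parser, which is looser than RFC 3339.

```python
import re
from datetime import datetime

ID_PATTERN = re.compile(r"[A-Za-z0-9._:-]{1,128}")
TOP_LEVEL_UNIT_FIELDS = ("importance", "salience", "strength")


def in_unit_interval(value) -> bool:
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and 0.0 <= value <= 1.0


def is_instant(text) -> bool:
    """Loose check: parseable by the standard library and carrying a UTC offset."""
    if not isinstance(text, str):
        return False
    try:
        parsed = datetime.fromisoformat(text[:-1] + "+00:00" if text.endswith("Z") else text)
    except ValueError:
        return False
    return parsed.tzinfo is not None


def check_bundle(bundle: dict) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems, seen = [], set()
    for res in bundle.get("entry", []):
        key = (res.get("resourceType"), res.get("id"))
        label = f"{key[0]}/{key[1]}"
        if not isinstance(key[1], str) or not ID_PATTERN.fullmatch(key[1]):
            problems.append(f"{label}: id does not match the R1 pattern")
        if key in seen:
            problems.append(f"{label}: duplicate id within resourceType")
        seen.add(key)
        scores = {f: res[f] for f in TOP_LEVEL_UNIT_FIELDS if f in res}
        if "calibrated" in res.get("confidence", {}):
            scores["confidence.calibrated"] = res["confidence"]["calibrated"]
        if "credibility" in res.get("provenance", {}):
            scores["provenance.credibility"] = res["provenance"]["credibility"]
        problems += [f"{label}: {name} outside [0, 1]"
                     for name, value in scores.items() if not in_unit_interval(value)]
        if "createdAt" in res and not is_instant(res["createdAt"]):
            problems.append(f"{label}: createdAt is not an RFC 3339 instant")
    return problems


bad = {"entry": [{"resourceType": "MemoryRecord", "id": "bad id",
                  "importance": 1.2, "createdAt": "yesterday"}]}
for problem in check_bundle(bad):
    print(problem)
```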

The .omirb binary profile

For edge and robotics deployments — where an .omir JSON document is too large or too slow to parse — OMIR defines a compact binary profile with the .omirb extension.

  • .omirb is a CBOR (RFC 8949) encoding of the same logical model as the canonical JSON, with bincode permitted as an internal sub-profile for tightly-coupled producer/consumer pairs.
  • The binary profile MUST be a lossless round-trip: decoding an .omirb document and re-encoding it as .omir JSON MUST yield a Bundle equivalent to the original (modulo insignificant ordering and whitespace).
  • Field names, enums, and reference strings are preserved; the binary profile is a re-encoding, not a re-modeling. There are no binary-only fields and no JSON-only fields.
  • The binary profile is OPTIONAL. An implementation that reads and writes only .omir JSON is fully conformant. An implementation that emits .omirb MUST be able to emit the equivalent .omir on request, so that the canonical form is always reachable.
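
A round-trip test needs a working definition of "equivalent modulo insignificant ordering". The sketch below is one reading, stated as an assumption: object key order and entry order are ignored, while the order of arrays inside a resource (such as entityRefs) is treated as significant. It compares decoded Python objects only and does not perform CBOR decoding; `canonical_view` and `equivalent` are hypothetical names.

```python
def canonical_view(bundle: dict) -> dict:
    """Index entries by ResourceType/id so that entry order no longer matters."""
    top = {k: v for k, v in bundle.items() if k != "entry"}
    entries = {f'{r["resourceType"]}/{r["id"]}': r for r in bundle.get("entry", [])}
    return {"top": top, "entries": entries}


def equivalent(a: dict, b: dict) -> bool:
    """Compare two decoded Bundles; dict comparison already ignores key order."""
    return canonical_view(a) == canonical_view(b)


original = {
    "resourceType": "Bundle", "omirVersion": "R1",
    "entry": [
        {"resourceType": "MemoryRecord", "id": "m-001",
         "confidence": {"alpha": 9.0, "beta": 1.0, "calibrated": 0.9}},
        {"resourceType": "Entity", "id": "john", "name": "John"},
    ],
}
round_tripped = {
    "omirVersion": "R1", "resourceType": "Bundle",
    "entry": [
        {"id": "john", "name": "John", "resourceType": "Entity"},
        {"resourceType": "MemoryRecord", "id": "m-001",
         "confidence": {"alpha": 9, "beta": 1, "calibrated": 0.9}},
    ],
}
print(equivalent(original, round_tripped))  # True
```

Python compares 9 and 9.0 as equal, which matches the rule that JSON numbers are values and not lexical forms.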

Choosing an encoding

  • Use .omir for interchange: exports, backups, hand-offs between vendors, human inspection, version control.
  • Use .omirb for transport-adjacent persistence on constrained devices: a robot writing memory to flash over a Zenoh link, an embedded agent with a tight parse budget. The omir-robotics profile (Profiles) is built around this case.

Extensions

OMIR’s core deliberately captures only the ~80% of fields that ~80% of memory systems need (see Design Principles §3). The remaining long tail — vendor-specific scores, experimental signals, model-specific metadata — is carried in the typed extension[] array. This is OMIR’s escape hatch: it lets implementations round-trip their proprietary data through a standard format without breaking core conformance and without every other consumer needing to understand it.

The extension mechanism

Every R1 resource (MemoryRecord, Entity, Relationship, Episode) carries an optional extension[] array. Each element is an Extension object:

| Field | Type | Required | Meaning |
| --- | --- | --- | --- |
| url | URI | yes | Canonical URL that defines this extension. |
| valueString | string | no | Scalar string value. |
| valueNumber | number | no | Scalar numeric value. |
| valueBoolean | boolean | no | Scalar boolean value. |
| valueJson | any JSON | no | Arbitrary structured value. |

Rules:

  • An Extension MUST carry a url. The url is the extension’s identity: it tells a consumer what this is and where the definition lives.
  • An Extension SHOULD carry exactly one value* field. Use valueJson when the payload is structured (an object or array); use the scalar value* fields for simple values.
  • A consumer MUST ignore any extension whose url it does not recognize, and MUST still process the resource. Unknown extensions are never an error. This is the rule that makes the format forward-compatible: a producer can add new extensions and old consumers keep working.
  • Extensions MUST NOT be used to override or contradict a core field. They add data; they do not replace it.
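The "ignore what you do not recognize" rule is easy to get wrong by raising on an unknown url. A minimal consumer-side sketch follows; the function names and the example.org URL are illustrative assumptions, while the value* field names come from the table above.

```python
VALUE_FIELDS = ("valueString", "valueNumber", "valueBoolean", "valueJson")


def extension_value(ext: dict):
    """Return the value* payload of an Extension, or None if it carries none."""
    for field in VALUE_FIELDS:
        if field in ext:
            return ext[field]
    return None


def known_extensions(resource: dict, known_urls: set) -> dict:
    """Map url -> value for recognized extensions; unknown urls are skipped, never an error."""
    return {
        ext["url"]: extension_value(ext)
        for ext in resource.get("extension", [])
        if ext["url"] in known_urls
    }


record = {
    "resourceType": "MemoryRecord",
    "id": "m-042",
    "content": "example",
    "createdAt": "2026-05-30T16:10:00Z",
    "extension": [
        {"url": "https://veld.dev/omir/ext/scoring-signals", "valueJson": {"recency": 0.91}},
        {"url": "https://example.org/ext/not-understood", "valueString": "opaque"},
    ],
}

recognized = known_extensions(record, {"https://veld.dev/omir/ext/scoring-signals"})
```

The unrecognized extension stays in the record untouched; the consumer simply does not act on it.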

The extension URL registry

An extension url is a stable, dereferenceable identifier for an extension definition. OMIR adopts a registry model (as FHIR does):

  • Standard extensions defined by the OMIR Working Group live under the OMIR namespace, e.g. https://omir.io/spec/R1/extension/<name>.
  • Vendor extensions live under a URL the vendor controls, e.g. https://veld.dev/omir/ext/<name>. A vendor MUST NOT mint extensions under the omir.io namespace.
  • The URL SHOULD resolve to a human- and machine-readable definition (name, description, value type, and the resources it applies to).
  • The Working Group maintains a public index of standard extension URLs. Vendor extensions MAY be listed there for discoverability but are not owned by the WG.

This keeps the core small and stable while letting the ecosystem innovate at the edges: two implementations can each carry rich proprietary data, exchange Bundles, and lose nothing — each ignores the other’s extensions and reads the shared core.
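A producer can guard against minting under the reserved namespace with a host check. The sketch below is an assumption-laden illustration: it treats subdomains of omir.io as part of the reserved namespace, which the text above does not state, and the function names are hypothetical.

```python
from urllib.parse import urlparse

OMIR_HOST = "omir.io"


def is_omir_namespace(url: str) -> bool:
    """True when the URL's host is omir.io (or, by assumption, a subdomain of it)."""
    host = (urlparse(url).hostname or "").lower()
    return host == OMIR_HOST or host.endswith("." + OMIR_HOST)


def vendor_may_mint(url: str) -> bool:
    """Vendors MUST NOT mint extension URLs under the omir.io namespace."""
    return not is_omir_namespace(url)
```

Comparing the parsed host, rather than testing for a substring, avoids being fooled by look-alike hosts that merely start with "omir.io".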

Worked example: Veld’s 20-signal retrieval scores

Veld’s retrieval pipeline computes a 20-signal score vector per memory (recency, arousal, source credibility, graph strength, cross-encoder blend, entity match, and so on). None of that belongs in the OMIR core; it is a Veld implementation detail, so it rides in an extension instead. The example below lists only nine of the twenty signals.

{
  "resourceType": "MemoryRecord",
  "id": "m-042",
  "content": "Switched the cross-encoder to track HF main and pre-warm on startup.",
  "createdAt": "2026-05-30T16:10:00Z",
  "kind": "learning",
  "importance": 0.74,
  "confidence": { "alpha": 7, "beta": 2, "calibrated": 0.78 },
  "extension": [
    {
      "url": "https://veld.dev/omir/ext/scoring-signals",
      "valueJson": {
        "schema": "veld-20-signal",
        "version": "0.7.6",
        "signals": {
          "recency": 0.91,
          "arousal": 0.40,
          "sourceCredibility": 0.85,
          "graphStrength": 0.62,
          "crossEncoder": 0.71,
          "entityMatch": 0.55,
          "tagMatch": 0.33,
          "episodeCoherence": 0.48,
          "activationLevel": 0.66
        }
      }
    },
    {
      "url": "https://veld.dev/omir/ext/external-dimensions",
      "valueJson": {
        "density": 0.58,
        "coherence": 0.72,
        "closure": 0.41,
        "confidence": 0.79,
        "isotropy": 0.63
      }
    }
  ]
}

What this buys each party:

  • Veld round-trips its full retrieval state through .omir and reconstructs it on import — nothing is lost in export.
  • A generic consumer (another agent, a viewer, a different memory engine) ignores both extension URLs it does not recognize and processes the record using only the core fields (content, importance, confidence, …). It still works.
  • A second Veld-aware tool recognizes https://veld.dev/omir/ext/scoring-signals, pulls the signal vector out of valueJson, and uses it.

This is the 80/20 rule in action: the shared 80% is interoperable by everyone; the proprietary 20% travels safely alongside it without becoming everyone’s problem.

Profiles

The OMIR core is intentionally permissive: almost every field is optional so that any memory system can map onto it. That generality is a problem for any particular domain, where a consumer needs guarantees — “every record will have provenance,” “every entity will carry an embedding,” “decay state will always be present.”

A Profile closes that gap. A profile is a published constraint on the OMIR core:

A profile = a constrained subset of the core + a set of required extensions.

Specifically, a profile MAY:

  • mark otherwise-optional core fields as required;
  • forbid core fields or enum values that are meaningless in its domain;
  • require specific extensions (by url) to be present;
  • tighten value ranges or cardinalities within what the base schema allows.

A profile MUST NOT:

  • add new top-level core fields (use extensions);
  • relax a base-schema requirement (a profile only ever narrows);
  • contradict the base schema (a profiled resource is always also a valid base resource).

A resource declares the profiles it claims via meta.profile, an array of canonical profile URLs. A Bundle is profile-conformant when every resource that claims a profile satisfies that profile’s constraints (see Conformance).
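The profile-conformance rule for a Bundle can be expressed as a filter plus a predicate. This is a sketch only: the function names are hypothetical, the example.org profile URL is invented for illustration, and the predicate stands in for whatever constraints a real profile defines.

```python
from typing import Callable


def claims_profile(resource: dict, profile_url: str) -> bool:
    """True when the resource lists profile_url in meta.profile."""
    return profile_url in resource.get("meta", {}).get("profile", [])


def bundle_conforms_to_profile(
    bundle: dict, profile_url: str, satisfies: Callable[[dict], bool]
) -> bool:
    """True when every resource that claims the profile passes its constraint check."""
    return all(
        satisfies(resource)
        for resource in bundle.get("entry", [])
        if claims_profile(resource, profile_url)
    )


# Hypothetical profile that only requires provenance to be present.
NEEDS_PROVENANCE = "https://example.org/omir/profile/needs-provenance"


def has_provenance(resource: dict) -> bool:
    return "provenance" in resource
```

Resources that do not claim the profile are not checked against it, which matches the wording "every resource that claims a profile".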

R1 ships three reference profiles.

omir-coding-agent

URL: https://omir.io/spec/R1/profile/omir-coding-agent

For coding agents (Claude Code, Copilot agents, Cursor) that remember decisions, patterns, edits, and project context.

Constraints:

  • MemoryRecord.provenance is REQUIRED, and provenance.sourceType SHOULD be one of conversation, document, command, or observation.
  • MemoryRecord.kind is REQUIRED (one of memory, plan, prompt, learning).
  • For records whose experienceType is code_edit, file_access, or command, provenance.externalId SHOULD be present (e.g. github:pr-123, linear:SHO-39) for traceability back to the artifact.
  • Entity.labels SHOULD include technology, project, or skill where applicable, so a project graph is reconstructable.

Required extensions: none mandated; tool-call telemetry, if carried, SHOULD use a vendor extension URL.
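The MemoryRecord constraints of this profile could be checked roughly as follows. The sketch separates REQUIRED failures (errors) from unmet SHOULDs (warnings); that split, the function name, and the message strings are illustrative choices, and the Entity.labels recommendation is not covered.

```python
KINDS = {"memory", "plan", "prompt", "learning"}
SOURCE_TYPES = {"conversation", "document", "command", "observation"}
TRACEABLE = {"code_edit", "file_access", "command"}


def check_coding_agent(record: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a MemoryRecord claiming omir-coding-agent."""
    errors, warnings = [], []
    provenance = record.get("provenance")
    if provenance is None:
        errors.append("provenance is REQUIRED")
        provenance = {}
    elif provenance.get("sourceType") not in SOURCE_TYPES:
        warnings.append("provenance.sourceType SHOULD be a listed value")
    if record.get("kind") not in KINDS:
        errors.append("kind is REQUIRED")
    if record.get("experienceType") in TRACEABLE and "externalId" not in provenance:
        warnings.append("provenance.externalId SHOULD be present")
    return errors, warnings


edit_record = {
    "resourceType": "MemoryRecord",
    "id": "m-100",
    "content": "Renamed the config loader.",
    "createdAt": "2026-05-30T16:10:00Z",
    "kind": "memory",
    "experienceType": "code_edit",
    "provenance": {"source": "editor", "sourceType": "command"},
}

errors, warnings = check_coding_agent(edit_record)
```

Here the record satisfies every REQUIRED constraint but draws one warning, because a code_edit record carries no provenance.externalId.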

omir-robotics (edge / Zenoh)

URL: https://omir.io/spec/R1/profile/omir-robotics

For embodied agents and robots persisting memory on constrained edge hardware, often over a Zenoh/ROS2 transport.

Constraints:

  • The .omirb binary encoding (Encodings) is RECOMMENDED for persistence under this profile; producers MUST still be able to emit .omir JSON on request.
  • Episode.source SHOULD be observation or event (robots ingest sensor and event streams, not chat messages).
  • MemoryRecord.eventTime is REQUIRED — physical events are timestamped at occurrence, distinct from ingestion time.
  • A spatial extension is REQUIRED on location-bearing records and entities, under a canonical URL (e.g. https://omir.io/spec/R1/extension/spatial-pose) carrying a pose/frame payload in valueJson. Spatial coordinates are not core in R1.
  • decay.halfLifeHours SHOULD be present so on-device forgetting survives power cycles.

omir-personal-assistant

URL: https://omir.io/spec/R1/profile/omir-personal-assistant

For personal-assistant agents that hold long-lived facts and preferences about a single user, and must handle PII responsibly.

Constraints:

  • MemoryRecord.confidence is REQUIRED — a personal assistant must know how sure it is before acting on a remembered fact.
  • MemoryRecord.validUntil SHOULD be set on facts known to be time-bounded (a current address, an employer), so stale facts are detectable.
  • Prospective memory is supported: records with experienceType: "intention" carry future-directed reminders and MUST be filtered out of ordinary recall by a conforming consumer.
  • A PII-classification extension is REQUIRED on records and entities that may carry personal data, under a canonical URL (e.g. https://omir.io/spec/R1/extension/pii-class), so downstream handling can honor it.
  • Entity.salience SHOULD be present so the assistant can rank what matters to its user.
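Two of these constraints translate directly into consumer logic: filtering prospective records out of ordinary recall, and detecting facts past their validUntil. The sketch below uses hypothetical function names, and how a stale fact is then handled (dropped, flagged, re-confirmed) is left open because the profile does not say.

```python
from datetime import datetime, timezone


def parse_instant(text: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def ordinary_recall(records: list[dict]) -> list[dict]:
    """Drop prospective memory (experienceType 'intention') from ordinary recall."""
    return [r for r in records if r.get("experienceType") != "intention"]


def stale_facts(records: list[dict], now: datetime) -> list[dict]:
    """Return records whose validUntil instant has passed at time `now`."""
    return [
        r for r in records
        if "validUntil" in r and parse_instant(r["validUntil"]) <= now
    ]


records = [
    {"id": "r1", "experienceType": "intention", "content": "Remind user to renew passport."},
    {"id": "r2", "content": "User lives at the old address.", "validUntil": "2026-05-01T00:00:00Z"},
    {"id": "r3", "content": "User prefers tea."},
]
now = datetime(2026, 6, 1, tzinfo=timezone.utc)

recallable_ids = [r["id"] for r in ordinary_recall(records)]
stale_ids = [r["id"] for r in stale_facts(records, now)]
```

The `now` argument must be timezone-aware, since the parsed instants are.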

Defining your own profile

Implementations and domains MAY publish additional profiles. A profile definition MUST:

  1. have a stable canonical URL;
  2. state the base resource(s) it constrains;
  3. enumerate its added requirements, forbidden fields/values, and required extension URLs;
  4. guarantee that any resource conforming to the profile is also a valid base R1 resource.

Profiles are versioned with the release (R1, R2, …) and follow the same deprecation policy as the core.

Conformance

This section defines what it means for a document, and for an implementation, to conform to OMIR R1. The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.

Conformance levels

OMIR defines two levels of conformance.

Core conformance

A document is Core-conformant when it is a valid Bundle whose every resource validates against its R1 JSON Schema and satisfies the structural rules below. Core conformance makes no claim about any domain profile.

Profiled conformance

A document is Profiled-conformant to profile P when it is Core-conformant and every resource that claims P in meta.profile satisfies all of P’s additional constraints (Profiles). Profiled conformance is always in addition to, never instead of, core conformance.

Document rules (MUST)

A Core-conformant document MUST satisfy the following rules. Each carries a stable identifier (CR-1 … CR-8) that other documents cite; the identifier is stable even if the list is later renumbered.

  1. CR-1 — Be a Bundle. The top-level object’s resourceType is Bundle and omirVersion is R1. The entry array is present.
  2. CR-2 — Validate against the schemas. Every entry validates against its resource schema (JSON Schema draft 2020-12), published at https://omir.io/spec/R1/schemas/.
  3. CR-3 — Carry required fields per resource. At minimum:
    • MemoryRecord: resourceType, id, content, createdAt.
    • Entity: resourceType, id, name.
    • Relationship: resourceType, id, from, to, relationType.
    • Episode: resourceType, id, content, createdAt.
  4. CR-4 — Use unique ids per type. Each id matches ^[A-Za-z0-9._:-]{1,128}$ and is unique within its resourceType inside the Bundle.
  5. CR-5 — Satisfy reference integrity. Every typed reference of the form ResourceType/id (in entityRefs and in Relationship.from / .to / .sourceEpisode) MUST resolve to a resource of that type and id present in the same Bundle’s entry array. MemoryRecord.parentId is a bare Id rather than a typed reference, and MUST resolve to a MemoryRecord with that id in the same Bundle. A dangling reference is non-conformant.
  6. CR-6 — Carry no undeclared core fields. Every R1 resource schema sets additionalProperties: false. Implementation-specific data MUST be carried in extension[] (see Extensions), not as new top-level properties.
  7. CR-7 — Keep scores in range. Every UnitInterval field (importance, salience, strength, provenance.credibility, confidence.calibrated) lies in [0, 1].
  8. CR-8 — Use RFC 3339 timestamps. Every Instant is a valid RFC 3339 / ISO 8601 date-time.
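Several of these rules can be checked without a JSON Schema library. The sketch below covers CR-1, CR-3, CR-4, and CR-5 only, plus a rejection of unknown resource types; it is not the reference validator, the function names and message strings are illustrative, and full CR-2 schema validation is out of scope here.

```python
import re

ID_PATTERN = re.compile(r"^[A-Za-z0-9._:-]{1,128}$")
REQUIRED = {
    "MemoryRecord": ("resourceType", "id", "content", "createdAt"),
    "Entity": ("resourceType", "id", "name"),
    "Relationship": ("resourceType", "id", "from", "to", "relationType"),
    "Episode": ("resourceType", "id", "content", "createdAt"),
}


def outgoing_refs(resource: dict) -> list[str]:
    """Collect 'ResourceType/id' strings from the reference-bearing fields."""
    refs = [r["ref"] for r in resource.get("entityRefs", [])]
    for field in ("from", "to", "sourceEpisode"):
        if isinstance(resource.get(field), dict):
            refs.append(resource[field]["ref"])
    # parentId is a bare Id, so it is resolved among MemoryRecord resources.
    if resource.get("resourceType") == "MemoryRecord" and "parentId" in resource:
        refs.append("MemoryRecord/" + resource["parentId"])
    return refs


def check_core(bundle: dict) -> list[str]:
    """Return rule violations found (an empty list means none were found)."""
    problems = []
    if bundle.get("resourceType") != "Bundle" or bundle.get("omirVersion") != "R1":
        problems.append("CR-1: not an R1 Bundle")
    entries = bundle.get("entry")
    if not isinstance(entries, list):
        return problems + ["CR-1: entry array missing"]
    seen = set()
    for res in entries:
        rtype, rid = res.get("resourceType"), res.get("id")
        if rtype not in REQUIRED:
            problems.append(f"CR-2: unknown resourceType {rtype!r}")
        missing = [f for f in REQUIRED.get(rtype, ()) if f not in res]
        if missing:
            problems.append(f"CR-3: {rtype}/{rid} missing {missing}")
        if not isinstance(rid, str) or not ID_PATTERN.fullmatch(rid):
            problems.append(f"CR-4: invalid id on {rtype}: {rid!r}")
        elif (rtype, rid) in seen:
            problems.append(f"CR-4: duplicate id {rtype}/{rid}")
        seen.add((rtype, rid))
    for res in entries:
        for ref in outgoing_refs(res):
            rtype, _, rid = ref.partition("/")
            if (rtype, rid) not in seen:
                problems.append(f"CR-5: dangling reference {ref}")
    return problems
```

Ids are collected in a first pass and references resolved in a second, so a reference may point at an entry that appears later in the array.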

Producer / consumer rules

Producers

A conforming producer (an implementation that emits OMIR):

  • MUST emit Core-conformant Bundles.
  • MUST place implementation-specific data in extension[] under a URL it controls, never under the omir.io namespace.
  • MUST be able to emit the canonical .omir JSON form, even if it primarily emits .omirb (Encodings).
  • SHOULD populate meta.source and meta.maturity so consumers can reason about provenance and stability.
  • MUST NOT emit dangling references, since CR-5 makes them non-conformant; if a referenced resource is intentionally out of scope, the reference SHOULD be omitted rather than left dangling.

Consumers

A conforming consumer (an implementation that reads OMIR):

  • MUST accept any Core-conformant Bundle.
  • MUST ignore unknown extensions and still process the resource. Encountering an unrecognized extension url MUST NOT cause rejection.
  • MUST ignore any meta.profile URL it does not implement, while still reading the core fields.
  • SHOULD preserve extensions it does not understand when re-exporting, so data survives a read-modify-write round trip (“lossless pass-through”).
  • MUST NOT infer meaning from the order of entries; references resolve by ResourceType/id, not position.
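Lossless pass-through and order-independent lookup can both be shown in a few lines. This is a sketch under assumptions: the function names are hypothetical, and updating importance is just an example of a read-modify-write that touches one core field.

```python
import copy


def index_by_ref(bundle: dict) -> dict:
    """Map 'ResourceType/id' -> resource so lookups never depend on entry order."""
    return {f"{r['resourceType']}/{r['id']}": r for r in bundle["entry"]}


def update_importance(bundle: dict, ref: str, importance: float) -> dict:
    """Return a copy with one core field changed; extension[] is carried through untouched."""
    updated = copy.deepcopy(bundle)
    index_by_ref(updated)[ref]["importance"] = importance
    return updated


bundle = {
    "resourceType": "Bundle",
    "omirVersion": "R1",
    "entry": [
        {
            "resourceType": "MemoryRecord",
            "id": "m-1",
            "content": "example",
            "createdAt": "2026-05-30T11:42:05Z",
            "importance": 0.2,
            # An extension this consumer does not understand.
            "extension": [{"url": "https://example.org/ext/unknown", "valueString": "opaque"}],
        }
    ],
}

updated = update_importance(bundle, "MemoryRecord/m-1", 0.9)
```

Because the consumer copies the whole resource and edits only the field it understands, the unknown extension survives re-export.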

Declaring conformance

An implementation declares its conformance by publishing a short conformance statement that lists:

  1. the OMIR release it targets (R1);
  2. its role(s): producer, consumer, or both;
  3. the conformance level: Core, and any Profiles it implements (by URL);
  4. for .omirb support, whether it produces, consumes, or both;
  5. the canonical extension URLs it emits.

A producer SHOULD also stamp meta.source on the resources it emits (e.g. "veld/0.7.6") so a Bundle is self-describing.

Self-test

Conformance is verifiable, not just asserted. An implementation SHOULD pass the OMIR Working Group’s reference validator (Apache-2.0) against:

  • its emitted Bundles (producer conformance), and
  • the published R1 example Bundles (consumer conformance).

The validator checks schema validity, reference integrity, id uniqueness, score ranges, timestamp formats, and — when a profile URL is claimed — that profile’s added constraints.

“Powered by OMIR” badge

An implementation that publishes a conformance statement and passes the reference validator for the role(s) it claims MAY display the “Powered by OMIR” badge.

  • The badge asserts Core conformance at minimum. An implementation MAY annotate the badge with the profiles it additionally satisfies (e.g. “Powered by OMIR — omir-coding-agent”).
  • The badge MUST NOT be displayed by an implementation that fails the reference validator for its claimed role, or that claims a profile it does not satisfy.
  • The OMIR Working Group stewards the badge and its usage guidelines. Misrepresentation of conformance is grounds for the WG to request the badge’s removal.

MemoryRecord

Generated artifact. This page is generated from schemas/MemoryRecord.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.

The atomic unit of agent memory: a remembered experience, plan, prompt, or learning.

Maturity: OMM-4

Purpose

A MemoryRecord is the smallest retrievable item of cognitive state — one remembered thing, with the context needed to score, decay, and re-surface it. One core record type carries several lifecycle classes (memory, plan, prompt, learning) so that retrieval stays unified across them.

See Memory Semantics for how confidence, decay, tier, and provenance behave over time.

Fields

| Field | Type | Card. | Description |
| --- | --- | --- | --- |
| resourceType | "MemoryRecord" (const) | 1..1 | Discriminator. MUST be MemoryRecord. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among MemoryRecord resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| content | string | 1..1 | WHAT — the primary human-readable content of the memory. |
| createdAt | Instant (date-time) | 1..1 | Encoding time: when the record was written. |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| kind | enum memory \| plan \| prompt \| learning | 0..1 | Record lifecycle class. Default memory. |
| experienceType | enum (14 values, see below) | 0..1 | Finer-grained nature of the experience. intention denotes prospective (future-directed) memory. |
| tier | enum working \| session \| longterm \| archive | 0..1 | Position in the memory hierarchy. Default working. Promotion is driven by age × importance × access. |
| eventTime | Instant (date-time) | 0..1 | WHEN the described event actually happened. MAY precede createdAt. |
| importance | UnitInterval ([0,1]) | 0..1 | Normalized importance score. |
| confidence | Confidence | 0..1 | Calibrated Bayesian Beta(α, β) posterior plus derived point estimate (calibrated). |
| decay | Decay | 0..1 | Forgetting state (halfLifeHours, lastAccess, accessCount, anchored). Anchored records resist decay. |
| provenance | Provenance | 0..1 | Origin and trust (source, sourceType, credibility, externalId). |
| entityRefs | array of Reference | 0..* | References to Entity resources mentioned by this memory. Enables spreading activation without a graph lookup. |
| parentId | Id (string) | 0..1 | Parent MemoryRecord id, for hierarchical knowledge trees. |
| validUntil | Instant (date-time) | 0..1 | Temporal invalidation: this record is considered superseded after this instant. |
| version | integer ≥ 1 | 0..1 | Record revision counter. Default 1. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |

additionalProperties is false: a conformant MemoryRecord carries only the fields above.

experienceType vocabulary

conversation, decision, error, learning, discovery, pattern, context, task, code_edit, file_access, search, command, observation, intention.

Minimal

The required set is resourceType, id, content, createdAt.

{
  "resourceType": "MemoryRecord",
  "id": "mem-001",
  "content": "Position OMIR as the at-rest format MCP transports, not a competing protocol.",
  "createdAt": "2026-05-30T11:42:05Z"
}

Full

{
  "resourceType": "MemoryRecord",
  "id": "mem-positioning",
  "meta": {
    "omirVersion": "R1",
    "profile": ["https://omir.io/spec/R1/profile/omir-coding-agent"],
    "source": "veld/0.7.6",
    "createdAt": "2026-05-30T11:42:05Z",
    "lastUpdated": "2026-05-30T11:42:05Z",
    "maturity": 4
  },
  "content": "Position OMIR as the at-rest data format that MCP/A2A transport, not a competing protocol.",
  "kind": "learning",
  "experienceType": "decision",
  "tier": "longterm",
  "createdAt": "2026-05-30T11:42:05Z",
  "eventTime": "2026-05-30T11:42:00Z",
  "importance": 0.95,
  "confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
  "decay": {
    "halfLifeHours": 8760,
    "lastAccess": "2026-05-30T11:42:05Z",
    "accessCount": 1,
    "anchored": true
  },
  "provenance": {
    "source": "design-review",
    "sourceType": "conversation",
    "credibility": 0.92,
    "externalId": "linear:OMIR-1"
  },
  "entityRefs": [
    { "ref": "Entity/omir" },
    { "ref": "Entity/varun" }
  ],
  "parentId": "mem-standardization-thread",
  "validUntil": "2027-05-30T00:00:00Z",
  "version": 1,
  "extension": [
    {
      "url": "https://veld.dev/omir/ext/scoring-signals",
      "valueJson": { "graphStrength": 0.84, "arousal": 0.6, "feedbackMomentum": 0.71 }
    }
  ]
}
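The confidence objects in the examples on this page are consistent with calibrated being the mean of the Beta(α, β) posterior, α / (α + β). That reading is an assumption drawn from the example values, not a definition quoted from Memory Semantics, and the function name below is illustrative.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha / (alpha + beta)


# Values taken from the Full example above: alpha 9.0, beta 1.0, calibrated 0.9.
point_estimate = beta_mean(9.0, 1.0)
```

Under the same assumption, a record with alpha 7 and beta 2 has a mean of 7/9, which rounds to 0.78.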

References

A MemoryRecord points to other resources by typed Reference (ResourceType/id):

  • entityRefs[] → Entity — entities mentioned by this memory, used for spreading activation.

It also carries an intra-type link:

  • parentId → MemoryRecord — parent record in a hierarchical knowledge tree (a local Id, not a Reference).

No MemoryRecord field references Relationship, Episode, or Bundle directly.
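Because parentId is a bare id, walking a knowledge tree is a dictionary lookup per step. The sketch below is illustrative: the function name is hypothetical, and stopping quietly on a missing parent or a cycle is a defensive choice rather than specified behavior.

```python
def ancestors(record_id: str, records_by_id: dict) -> list[str]:
    """Follow parentId links upward; stops at a root, a missing parent, or a cycle."""
    chain, seen = [], {record_id}
    current = records_by_id.get(record_id, {}).get("parentId")
    while current is not None and current not in seen and current in records_by_id:
        chain.append(current)
        seen.add(current)
        current = records_by_id[current].get("parentId")
    return chain


records_by_id = {
    "leaf": {"id": "leaf", "parentId": "branch"},
    "branch": {"id": "branch", "parentId": "root"},
    "root": {"id": "root"},
}

path_to_root = ancestors("leaf", records_by_id)
```

A validator would report the missing parent as a CR-5 violation; a reader that only needs the chain can simply stop.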

Entity

Generated artifact. This page is generated from schemas/Entity.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.

A named thing extracted from memory — a person, place, organization, concept, technology, product, skill, or discriminative keyword.

Maturity: OMM-3

Purpose

An Entity is a canonicalized node in the knowledge graph. It is the target of MemoryRecord.entityRefs and Episode.entityRefs, and the endpoint type for every Relationship. Salience governs how strongly an entity pulls on retrieval.

See Memory Semantics §5 for how salience is computed and used in retrieval.

Fields

| Field | Type | Card. | Description |
| --- | --- | --- | --- |
| resourceType | "Entity" (const) | 1..1 | Discriminator. MUST be Entity. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Entity resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| name | string | 1..1 | Canonical surface form, e.g. "John", "Paris", "Rust programming". |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| labels | array of enum (12 values, see below) | 0..* | One or more type labels. Open at the edges via other + extension. |
| summary | string | 0..1 | Context summary built from surrounding relationships. |
| mentionCount | integer ≥ 0 | 0..1 | Number of times this entity has been mentioned. |
| salience | UnitInterval ([0,1]) | 0..1 | How important this entity is. Higher salience = stronger gravitational pull in retrieval. |
| properNoun | boolean | 0..1 | Proper nouns carry higher base salience than common nouns. |
| attributes | object (string → string) | 0..1 | Type-specific key/value attributes. |
| createdAt | Instant (date-time) | 0..1 | When the entity was first recorded. |
| lastSeenAt | Instant (date-time) | 0..1 | When the entity was most recently observed. |
| extension | array of Extension | 0..* | Implementation-specific data. Name embeddings (model + vector), for example, ride here; embeddings are NOT core in R1. |

additionalProperties is false: a conformant Entity carries only the fields above.

labels vocabulary

person, organization, location, technology, concept, event, date, product, skill, keyword, project, other.

Minimal

The required set is resourceType, id, name.

{
  "resourceType": "Entity",
  "id": "varun",
  "name": "Varun"
}

Full

{
  "resourceType": "Entity",
  "id": "omir",
  "meta": {
    "omirVersion": "R1",
    "source": "veld/0.7.6",
    "maturity": 3
  },
  "name": "OMIR",
  "labels": ["project", "concept"],
  "summary": "Open Memory Interoperability Resources — a FHIR-style standard for portable agent memory.",
  "mentionCount": 34,
  "salience": 0.97,
  "properNoun": true,
  "attributes": { "kind": "standard", "release": "R1" },
  "createdAt": "2026-05-30T10:00:00Z",
  "lastSeenAt": "2026-05-30T11:42:00Z",
  "extension": [
    {
      "url": "https://veld.dev/omir/ext/name-embedding",
      "valueJson": { "model": "minilm-384", "dim": 384 }
    }
  ]
}

References

An Entity holds no outbound references to other OMIR resources — it is a leaf node by design. The graph edges around it are expressed elsewhere:

  • Relationship.from / Relationship.to → Entity — entities are the endpoints of every relationship.
  • MemoryRecord.entityRefs[] → Entity and Episode.entityRefs[] → Entity — records and episodes point to entities.
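Since an Entity carries no edges itself, finding its neighbors means scanning the Relationship resources in the Bundle. The following sketch uses a hypothetical function name, and skipping edges that have an invalidatedAt value is a design choice made here, not a rule from this page.

```python
def neighbors(bundle: dict, entity_id: str) -> list[tuple[str, str]]:
    """Return (relationType, other entity ref) for in-force edges touching the entity."""
    me = f"Entity/{entity_id}"
    result = []
    for res in bundle["entry"]:
        if res.get("resourceType") != "Relationship":
            continue
        if res.get("invalidatedAt") is not None:
            continue  # superseded edge: retained in the Bundle, skipped here
        src, dst = res["from"]["ref"], res["to"]["ref"]
        if src == me:
            result.append((res["relationType"], dst))
        elif dst == me:
            result.append((res["relationType"], src))
    return result


bundle = {
    "resourceType": "Bundle",
    "omirVersion": "R1",
    "entry": [
        {"resourceType": "Entity", "id": "varun", "name": "Varun"},
        {"resourceType": "Entity", "id": "omir", "name": "OMIR"},
        {
            "resourceType": "Relationship",
            "id": "rel-varun-omir",
            "from": {"ref": "Entity/varun"},
            "to": {"ref": "Entity/omir"},
            "relationType": "maintains",
        },
    ],
}

omir_neighbors = neighbors(bundle, "omir")
```

The edge is directed, but the lookup reports it from either endpoint so that retrieval can spread activation in both directions.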

Relationship

Generated artifact. This page is generated from schemas/Relationship.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.

A directed, weighted edge between two Entity resources. Strength is dynamic (Hebbian): it increases with co-activation and decays without use.

Maturity: OMM-3

Purpose

A Relationship is a typed graph edge linking two entities. Its strength carries Hebbian synaptic weight, and validAt / invalidatedAt give it a temporal lifecycle so superseded edges can be retained rather than deleted.

See Memory Semantics §4 for how strength rises with co-activation and decays with disuse.

Fields

| Field | Type | Card. | Description |
| --- | --- | --- | --- |
| resourceType | "Relationship" (const) | 1..1 | Discriminator. MUST be Relationship. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Relationship resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| from | Reference | 1..1 | Source Entity reference, e.g. { "ref": "Entity/john" }. |
| to | Reference | 1..1 | Target Entity reference. |
| relationType | string (open vocabulary) | 1..1 | Edge type. Common values below. Implementations MAY mint new lowercase snake_case values. |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| strength | UnitInterval ([0,1]) | 0..1 | Synaptic weight. Dynamic under Hebbian plasticity. |
| context | string | 0..1 | Free-text note describing the relationship. |
| createdAt | Instant (date-time) | 0..1 | When the edge was first recorded. |
| validAt | Instant (date-time) | 0..1 | When this relationship was last observed to hold (temporal tracking). |
| invalidatedAt | Instant (date-time) | 0..1 | When this relationship was invalidated, if ever (temporal edge invalidation). |
| sourceEpisode | Reference | 0..1 | Episode reference that produced this relationship. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |

additionalProperties is false: a conformant Relationship carries only the fields above.

Reference constraint. A Reference.ref matches ^(MemoryRecord|Entity|Relationship|Episode)/…. By convention from and to point to Entity resources and sourceEpisode points to an Episode; the schema’s Reference shape does not itself narrow the target type, so producers MUST honor these conventions.
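Because the schema does not narrow the target type, a producer may want its own check before emitting. The sketch below is illustrative: the names are hypothetical, and it inspects only the resource-type prefix of each ref, leaving the id portion to the schema.

```python
RESOURCE_TYPES = ("MemoryRecord", "Entity", "Relationship", "Episode")
EXPECTED_TARGET = {"from": "Entity", "to": "Entity", "sourceEpisode": "Episode"}


def target_type(reference: dict) -> str:
    """Return the resource type named by a Reference such as {'ref': 'Entity/john'}."""
    rtype, sep, rid = reference["ref"].partition("/")
    if rtype not in RESOURCE_TYPES or not sep or not rid:
        raise ValueError(f"malformed reference: {reference['ref']!r}")
    return rtype


def convention_violations(relationship: dict) -> list[str]:
    """List the Relationship fields whose reference points at an unexpected type."""
    return [
        field
        for field, expected in EXPECTED_TARGET.items()
        if field in relationship and target_type(relationship[field]) != expected
    ]


edge = {
    "resourceType": "Relationship",
    "id": "rel-varun-omir",
    "from": {"ref": "Entity/varun"},
    "to": {"ref": "Entity/omir"},
    "relationType": "maintains",
    "sourceEpisode": {"ref": "Episode/ep-launch-chat"},
}

violations = convention_violations(edge)
```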

relationType common values

works_with, works_at, employed_by, part_of, contains, owned_by, located_in, located_at, related_to. The vocabulary is open.

Minimal

The required set is resourceType, id, from, to, relationType.

{
  "resourceType": "Relationship",
  "id": "rel-varun-omir",
  "from": { "ref": "Entity/varun" },
  "to": { "ref": "Entity/omir" },
  "relationType": "maintains"
}

Full

{
  "resourceType": "Relationship",
  "id": "rel-varun-omir",
  "meta": {
    "omirVersion": "R1",
    "source": "veld/0.7.6",
    "maturity": 3
  },
  "from": { "ref": "Entity/varun" },
  "to": { "ref": "Entity/omir" },
  "relationType": "maintains",
  "strength": 0.88,
  "context": "Convenes the OMIR working group.",
  "createdAt": "2026-05-30T10:05:00Z",
  "validAt": "2026-05-30T11:42:00Z",
  "invalidatedAt": null,
  "sourceEpisode": { "ref": "Episode/ep-launch-chat" },
  "extension": [
    {
      "url": "https://veld.dev/omir/ext/hebbian",
      "valueJson": { "coActivations": 17, "lastStrengthened": "2026-05-30T11:42:00Z" }
    }
  ]
}

The null for invalidatedAt above only illustrates an edge that is still in force. The field is typed as an Instant, so producers SHOULD omit it entirely when an edge has not been invalidated rather than emit null.

References

A Relationship points to other resources by typed Reference (ResourceType/id):

  • from → Entity — the source endpoint of the edge.
  • to → Entity — the target endpoint of the edge.
  • sourceEpisode → Episode — the episode that produced this relationship.

No Relationship field references MemoryRecord or Bundle.

Episode

Generated artifact. This page is generated from schemas/Episode.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.

A bounded experience — the raw event from which MemoryRecord, Entity, and Relationship resources are derived. The episodic backbone of the graph.

Maturity: OMM-3

Purpose

An Episode is the raw, time-bounded input — a message, document, event, or observation — that downstream resources are extracted from. It preserves both event time and ingestion time, and lists the entities it produced, so derived records remain traceable to their origin.

See Memory Semantics §6 for the event-time vs. ingestion-time model.

Fields

| Field | Type | Card. | Description |
| --- | --- | --- | --- |
| resourceType | "Episode" (const) | 1..1 | Discriminator. MUST be Episode. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Episode resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| content | string | 1..1 | The actual experience data. |
| createdAt | Instant (date-time) | 1..1 | When this episode was ingested (ingestion time). |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| name | string | 0..1 | Human-readable title for the episode. |
| source | enum message \| document \| event \| observation | 0..1 | What kind of input produced this episode. |
| eventTime | Instant (date-time) | 0..1 | When the original event occurred (event time). |
| entityRefs | array of Reference | 0..* | Entities extracted from this episode. |
| metadata | object (string → string) | 0..1 | Free-form string key/value metadata. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |

additionalProperties is false: a conformant Episode carries only the fields above.

Note on source. On Episode this is an enum (message | document | event | observation) and is distinct from the free-text source string inside meta/Provenance used on other resources.

Minimal

The required set is resourceType, id, content, createdAt.

{
  "resourceType": "Episode",
  "id": "ep-launch-chat",
  "content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
  "createdAt": "2026-05-30T11:42:03Z"
}

Full

{
  "resourceType": "Episode",
  "id": "ep-launch-chat",
  "meta": {
    "omirVersion": "R1",
    "source": "veld/0.7.6",
    "maturity": 3
  },
  "name": "Standardization discussion",
  "content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
  "source": "message",
  "eventTime": "2026-05-30T11:42:00Z",
  "createdAt": "2026-05-30T11:42:03Z",
  "entityRefs": [
    { "ref": "Entity/varun" },
    { "ref": "Entity/omir" }
  ],
  "metadata": { "channel": "design-review", "thread": "OMIR-positioning" },
  "extension": [
    {
      "url": "https://veld.dev/omir/ext/wavelet-session",
      "valueJson": { "sessionId": "sess-2026-05-30", "segment": 3 }
    }
  ]
}
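The two timestamps in the example above differ by three seconds: the event occurred at 11:42:00 and was ingested at 11:42:03. A consumer can compute that lag as sketched below; the function names are hypothetical, and returning None when eventTime is absent is a choice made for this illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_instant(text: str) -> datetime:
    """Parse an RFC 3339 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def ingestion_lag(episode: dict) -> Optional[timedelta]:
    """createdAt (ingestion time) minus eventTime, or None when eventTime is absent."""
    if "eventTime" not in episode:
        return None
    return parse_instant(episode["createdAt"]) - parse_instant(episode["eventTime"])


episode = {
    "resourceType": "Episode",
    "id": "ep-launch-chat",
    "content": "example",
    "eventTime": "2026-05-30T11:42:00Z",
    "createdAt": "2026-05-30T11:42:03Z",
}

lag = ingestion_lag(episode)
```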

References

An Episode points to other resources by typed Reference (ResourceType/id):

  • entityRefs[] → Entity — the entities extracted from this episode.

Episode is itself referenced by Relationship.sourceEpisode. No Episode field references MemoryRecord, Relationship, or Bundle.

Bundle

Generated artifact. This page is generated from schemas/Bundle.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.

A serialized collection of OMIR resources — the on-disk .omir document itself.

Maturity: n/a — Bundle is the container/transport envelope, not a graded memory resource.

Purpose

The Bundle is the container resource: a .omir file is a Bundle. It carries a flat entry[] of core resources whose typed references (ResourceType/id) resolve within the bundle, and it is JSON-LD compatible — an optional @context enables linked-data processing without changing the core shape.

Fields

| Field | Type | Card. | Description |
| --- | --- | --- | --- |
| resourceType | "Bundle" (const) | 1..1 | Discriminator. MUST be Bundle. |
| omirVersion | "R1" (const) | 1..1 | Spec release the bundle conforms to. MUST be R1. |
| entry | array of resource (oneOf MemoryRecord \| Entity \| Relationship \| Episode) | 1..* | The resources carried by this bundle. Order is not significant; references resolve by ResourceType/id. |
| @context | string or object (JSON-LD context) | 0..1 | Optional JSON-LD context. SHOULD be https://omir.io/spec/R1/context.jsonld or an object that includes it. |
| id | Id (string) | 0..1 | Resource-local identifier for the bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| generatedAt | Instant (date-time) | 0..1 | When the bundle was serialized. |
| source | string | 0..1 | Producing implementation, e.g. "veld/0.7.6 (MIF adapter)". |

additionalProperties is false: a conformant Bundle carries only the fields above. The schema marks required: ["resourceType", "omirVersion", "entry"], and entry has cardinality 1..*, so a valid bundle contains at least one resource.

File profiles. The canonical encoding is JSON / JSON-LD with extension .omir. The compact binary profile (CBOR/bincode) uses extension .omirb and carries the identical resource model.

Minimal

The required set is resourceType, omirVersion, entry.

{
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "entry": [
    {
      "resourceType": "MemoryRecord",
      "id": "mem-001",
      "content": "OMIR is the at-rest format MCP transports.",
      "createdAt": "2026-05-30T11:42:05Z"
    }
  ]
}

Full

{
  "@context": "https://omir.io/spec/R1/context.jsonld",
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "id": "demo-bundle-001",
  "generatedAt": "2026-05-30T12:00:00Z",
  "source": "veld/0.7.6 (MIF adapter)",
  "entry": [
    {
      "resourceType": "Episode",
      "id": "ep-launch-chat",
      "content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
      "source": "message",
      "eventTime": "2026-05-30T11:42:00Z",
      "createdAt": "2026-05-30T11:42:03Z",
      "entityRefs": [
        { "ref": "Entity/varun" },
        { "ref": "Entity/omir" }
      ]
    },
    {
      "resourceType": "Entity",
      "id": "varun",
      "name": "Varun",
      "labels": ["person"],
      "salience": 0.91,
      "properNoun": true
    },
    {
      "resourceType": "Entity",
      "id": "omir",
      "name": "OMIR",
      "labels": ["project", "concept"],
      "salience": 0.97,
      "properNoun": true
    },
    {
      "resourceType": "Relationship",
      "id": "rel-varun-omir",
      "from": { "ref": "Entity/varun" },
      "to": { "ref": "Entity/omir" },
      "relationType": "maintains",
      "strength": 0.88,
      "sourceEpisode": { "ref": "Episode/ep-launch-chat" }
    },
    {
      "resourceType": "MemoryRecord",
      "id": "mem-positioning",
      "content": "Position OMIR as the at-rest data format that MCP/A2A transport, not a competing protocol.",
      "kind": "learning",
      "experienceType": "decision",
      "tier": "longterm",
      "createdAt": "2026-05-30T11:42:05Z",
      "importance": 0.95,
      "confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
      "entityRefs": [
        { "ref": "Entity/omir" },
        { "ref": "Entity/varun" }
      ]
    }
  ]
}

References

The Bundle does not hold typed Reference fields of its own; it is the resolution scope for everyone else’s. Within a single bundle:

  • entry[] may contain MemoryRecord, Entity, Relationship, and Episode resources.
  • Cross-resource references (MemoryRecord.entityRefs, Episode.entityRefs, Relationship.from / to / sourceEpisode) resolve against the entry[] members by their ResourceType/id.

A Bundle MUST NOT appear inside another Bundle’s entry[] — the oneOf admits only the four core resources (MemoryRecord, Entity, Relationship, Episode).
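
A minimal sketch of this closed-world resolution, in Python. It assumes every entry carries an id and that references appear only in the fields named above; iter_refs and unresolved_refs are hypothetical helper names, not part of any OMIR tooling.

```python
CORE_TYPES = {"MemoryRecord", "Entity", "Relationship", "Episode"}


def iter_refs(resource: dict):
    """Yield every typed reference string ("ResourceType/id") a resource carries."""
    for ref in resource.get("entityRefs", []):
        yield ref["ref"]
    if resource.get("resourceType") == "Relationship":
        for field in ("from", "to", "sourceEpisode"):
            if field in resource:
                yield resource[field]["ref"]


def unresolved_refs(bundle: dict) -> list[str]:
    """Return the sorted references that do not resolve to a member of entry[]."""
    entries = bundle["entry"]
    for res in entries:
        # Only the four core resources may appear; a nested Bundle is rejected here.
        if res.get("resourceType") not in CORE_TYPES:
            raise ValueError(f"entry[] may not contain {res.get('resourceType')!r}")
    known = {f"{res['resourceType']}/{res['id']}" for res in entries if "id" in res}
    missing = {ref for res in entries for ref in iter_refs(res) if ref not in known}
    return sorted(missing)
```

Applied to the Full example above, every reference resolves and the function returns an empty list; removing the Entity with id "omir" would leave "Entity/omir" unresolved.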

Toward a Global Standard

Non-normative. This page is a design discussion, not part of the R1 conformance surface. Nothing here changes what makes a Bundle valid (Conformance). Every change proposed below would enter through the RFC + ballot process and start low on the OMIR Maturity Model — at OMM-0/1 — and earn its level through independent implementation.

OMIR R1 is honest about its origin: its schemas were derived from one production memory engine, Veld. That is a strength — the core is grounded in a system that actually ships calibrated confidence, decay, Hebbian edges, and tiering — and a risk. A standard authored by a single implementation is, until proven otherwise, that implementation’s export format with a logo. The work of becoming a global standard is the work of shedding the assumptions that are true only for Veld while keeping the cognitive substance that makes OMIR worth adopting.

This page names those assumptions and proposes the generalizations that would let a robotics stack, a healthcare agent, a multi-tenant assistant platform, and a coding agent all read and write the same .omir files without loss. It is the OMIR equivalent of FHIR’s long migration from “HL7’s resources” to “everyone’s resources.”

What already generalizes (keep it)

Before the critique, the parts of R1 that are not Veld-specific and should be preserved:

  • Resource + typed-reference model. Everything is a Resource; links are ResourceType/id. This is FHIR-proven and domain-neutral.
  • 80/20 core + typed extension[]. The escape hatch is exactly what lets the long tail of proprietary data travel without bloating the core. See Extensions.
  • Honest maturity (OMM). A promise-about-change per resource type. Domain-neutral.
  • Calibrated confidence as a distribution, not a bare float (Memory Semantics).
  • Portable forgetting state — decay, anchoring, tiers — recorded as state, not as a mandated algorithm.
  • Bitemporal-ish timestamps (eventTime vs createdAt) and temporal invalidation.
  • Two lossless encodings and the Profiles mechanism for domain tightening.

The generalizations below are mostly additive — new optional structure and extension points — precisely so they do not break the parts that already work.

Generalization themes

Each theme states what is Veld-specific today, why it limits interchange, and a concrete proposal. The FHIR precedent is cited where one exists, because OMIR is FHIR-modeled and should borrow FHIR’s solved problems rather than reinvent them.

A. Open vocabularies, not closed enums (highest interchange leverage)

Today. MemoryRecord.kind, experienceType, Entity.labels, and Episode.source are closed enums, and they lean toward coding/dev-agent life: code_edit, file_access, command, prompt. A robot cannot say “object grasped”; a healthcare agent cannot say “symptom reported”; a research agent cannot say “hypothesis formed.” relationType is already an open string — good — but it has no way to say which vocabulary a term comes from.

Risk. Closed enums hard-code one domain’s worldview into the core. Every other domain is forced into other + an extension, which means their primary semantics are invisible to a generic consumer — defeating interchange for everyone but Veld-likes.

Proposal. Adopt a FHIR-style CodeableConcept for these fields: a structure carrying an optional system (vocabulary URI), a code, and human text, e.g.

"experienceType": { "system": "https://omir.io/vocab/robotics", "code": "object_grasped", "text": "Picked up the red block" }

R1’s bare-string/enum forms remain valid as the degenerate case (text only). OMIR ships core vocabularies (the current enums, promoted to published code systems) and lets domains register their own. This single change is the difference between “a memory format for coding agents” and “a memory format.”

B. Identity, ownership, and multi-agent memory

Today. Memory is implicitly single-agent. There is no first-class notion of whose memory a record is, who authored it, or how two agents share a memory store. Veld carries optional agent_id/actor_id tags, but they are tags, not a model.

Risk. The 2026 reality is fleets of agents and multi-tenant platforms. Without an ownership/authorship model, a Bundle exchanged between agents cannot answer “can I trust this? who wrote it? am I allowed to read it?” — the questions that matter most when memory crosses a trust boundary.

Proposal. Introduce an Agent (or Principal) resource and optional authoredBy / owner references on records and episodes; let Provenance carry the chain of agents a memory passed through (see Theme E). Access control itself stays a profile/extension concern (it is policy, not data shape), but the identity hooks the policy needs belong in the core. Grade Agent at OMM-0 and let multi-agent stacks prove it.

C. Entity resolution across systems

Today. Entity.id is bundle-local; there is no canonical external identifier, no alias set, and no “this entity is the same as that one.” Within Veld a UUID suffices because there is one graph.

Risk. Interchange is entity resolution. If Acme’s “Alice Smith (employee 123)” and Globex’s “@asmith” cannot be declared the same person, two memory stores can never merge — the whole point of a portable format collapses to per-pair adapters.

Proposal. Add FHIR-style Entity.identifier[] ({ system, value } pairs, e.g. { "system": "mailto", "value": "alice@acme.com" }), an aliases[] list of surface forms, and a sameAs link type so a Bundle can assert cross-system identity. This is the backbone of federation (Theme I).
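
To illustrate why shared identifiers make merging tractable, here is a hedged Python sketch that clusters entities sharing any { system, value } pair using a union-find. The identifier[] field is the proposal above, not part of R1; cluster_by_identifier is a hypothetical name, and the sketch assumes entity ids are distinct across the stores being merged.

```python
def cluster_by_identifier(entities: list[dict]) -> list[list[str]]:
    """Group entity ids that share at least one (system, value) identifier."""
    parent = {e["id"]: e["id"] for e in entities}

    def find(x: str) -> str:
        # Walk to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (system, value) -> id of the first entity that carried it
    for e in entities:
        for ident in e.get("identifier", []):
            key = (ident["system"], ident["value"])
            if key in first_seen:
                # Same identifier seen before: merge the two clusters.
                parent[find(e["id"])] = find(first_seen[key])
            else:
                first_seen[key] = e["id"]

    clusters = {}
    for e in entities:
        clusters.setdefault(find(e["id"]), []).append(e["id"])
    return sorted(sorted(ids) for ids in clusters.values())
```

A real resolver would likely weigh identifier systems differently (an email address is stronger evidence than a display name); this sketch treats every shared pair as conclusive.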

D. Beyond text: modality and embeddings

Today. content is a single string. Multimodal data and embeddings are extension-only. Veld carries image/audio/video vectors internally, but the at-rest core sees text.

Risk. Agent memory in 2026 is increasingly multimodal (vision, audio, sensor). A text-only core relegates non-text memory to opaque extensions that no generic consumer can interpret.

Proposal. Generalize content to an optional multi-part content model: contentType (a media type) plus either inline content or a MediaReference ({ uri, contentType, hash }) for out-of-line bytes — and define an algorithm-neutral, efficiency-bearing Embedding that any vendor can emit without mandating a model. Embeddings stay optional (derived, not source) but become interpretable rather than vendor-opaque. The block below is the single canonical definition of the embedding object: it folds in the efficiency-bearing fields (Matryoshka prefixes, sparse codes, a dual-trace role, and reserved drift / VSA spaces). The Efficiency & Information-Bearing Codes page (Theme D reframed from “neutral embeddings” to efficiency-bearing codes) supplies the rationale — watts / inference / canon — for each efficiency field but does not redefine it.

"Embedding": {
  "type": "object",
  "description": "Algorithm-neutral, optional embedding code (derived, not source). Carries a dense, sparse, Matryoshka-nested, or out-of-line code; comparable only within one 'space'. Efficiency rationale: efficiency.md.",
  "properties": {
    "space":  { "type": "string", "description": "Opaque equality key — codes are comparable IFF 'space' strings are byte-equal. Reserved WG spaces: 'omir:temporal-context' (a slow-drift contiguity vector, efficiency.md EP-5a), 'omir:vsa' (a bind/bundle structural hypervector, efficiency.md EP-F)." },
    "model":  { "type": "string", "description": "Producer model id that generated the code." },
    "dims":   { "type": "integer", "minimum": 1, "description": "Full dimensionality; MUST equal a dense 'vector' length and the sparse index space width." },
    "dtype":  { "type": "string", "description": "Element byte form, e.g. 'f32' | 'int8' | 'binary'. Pins the .omirb encoding; excluded from signed images by default (Phase 3)." },
    "vector": { "type": "array", "items": { "type": "number" }, "description": "Dense code. Mutually exclusive with 'sparse'." },
    "sparse": {
      "type": "object",
      "description": "Sparse code for CPU inverted-index search + one-step associative completion (efficiency.md EP-3). Mutually exclusive with 'vector'.",
      "properties": {
        "indices": { "type": "array", "items": { "type": "integer", "minimum": 0 }, "description": "Active dimension indices, strictly ascending." },
        "values":  { "type": "array", "items": { "type": "number" }, "description": "Weights parallel to 'indices' (equal length)." }
      },
      "required": ["indices", "values"],
      "additionalProperties": false
    },
    "ref":    { "$ref": "#/$defs/MediaReference", "description": "Out-of-line code — an offloaded / verbatim payload (efficiency.md EP-2)." },
    "matryoshka": { "type": "boolean", "description": "True if 'vector' is a nested (Matryoshka) code: any prefix whose length is in 'nestedDims' is itself a valid, rankable embedding — coarse-to-fine shortlisting without re-embedding (efficiency.md EP-2)." },
    "nestedDims": { "type": "array", "items": { "type": "integer", "minimum": 1 }, "description": "Ascending valid prefix lengths, e.g. [64,128,256,512,768]. Rank on any listed prefix, re-rank on a longer one; prefixes compare only within one 'space' (efficiency.md EP-2)." },
    "role":   { "enum": ["gist", "verbatim"], "description": "Dual-trace role (efficiency.md EP-2): 'gist' = durable, compact, anchorable, ranked cheaply; 'verbatim' = fast-decaying surface trace, usually offloaded via 'ref' and fetched only for top-k." }
  },
  "additionalProperties": false
}
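
A consumer-side reading of three constraints from the schema descriptions (space equality, sparse well-formedness, and ranking on a declared Matryoshka prefix) might look like the Python sketch below. The function names are hypothetical, and cosine similarity is an assumption chosen for illustration; the schema does not mandate a metric.

```python
import math


def comparable(a: dict, b: dict) -> bool:
    """Codes are comparable only when their 'space' strings are byte-equal."""
    return a["space"].encode("utf-8") == b["space"].encode("utf-8")


def sparse_is_well_formed(sparse: dict, dims: int) -> bool:
    """Check equal lengths, in-range indices, and strictly ascending order."""
    idx, vals = sparse["indices"], sparse["values"]
    return (
        len(idx) == len(vals)
        and all(0 <= i < dims for i in idx)
        and all(x < y for x, y in zip(idx, idx[1:]))
    )


def prefix_cosine(a: dict, b: dict, length: int) -> float:
    """Cosine similarity on a declared Matryoshka prefix of two dense codes."""
    if not comparable(a, b):
        raise ValueError("embeddings live in different spaces")
    for emb in (a, b):
        if not emb.get("matryoshka") or length not in emb.get("nestedDims", []):
            raise ValueError(f"{length} is not a declared Matryoshka prefix")
    u, v = a["vector"][:length], b["vector"][:length]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

A coarse-to-fine search would call prefix_cosine with a short prefix to shortlist candidates and again with a longer one to re-rank them, without re-embedding anything.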

E. Provenance and trust as a chain

Today. Provenance is a flat { source, sourceType, credibility, externalId }. Veld’s internal model is richer (a relay chain with per-hop credibility and a verified flag); the core flattens it.

Risk. When memory crosses vendors, “where did this come from, through whom, and is it signed?” is a trust-critical question a flat source string cannot answer.

Proposal. Align with W3C PROV: let Provenance carry a chain[] of derivation steps (agent, activity, time, per-hop credibility) and an optional attestation/signature so a consumer can verify a record was not tampered with in transit. Keep the flat form as the common case.

F. Privacy, sensitivity, and retention

Today. Absent. R1 has no notion of data sensitivity, consent, legal basis, retention/expiry policy, or redaction. validUntil is for contradiction, not governance. The omir-personal-assistant profile gestures at a PII-class extension, but it is opt-in and shallow.

Risk. A global memory format will carry personal and regulated data. Without a governance vocabulary, OMIR cannot be adopted where GDPR/CCPA/HIPAA-style obligations apply — which is most of the interesting market.

Proposal. Define an optional governance block (sensitivity classification, legal basis, retention policy / deleteAfter, redaction markers, consent reference). Likely a core extension family with published URLs first, promoted to core once proven. This is the theme most likely to decide whether OMIR is adoptable by enterprises at all.

G. Uncertainty beyond a point estimate

Today. Confidence is Beta(α, β) + a calibrated point — already better than most. But it assumes one uncertainty model and conflates evidence count with belief.

Proposal. Keep Beta as the recommended default; allow a more general uncertainty representation (credible interval, or a named distribution + parameters) and optionally distinguish epistemic (lack of evidence) from aleatoric (inherent variability) uncertainty for agents that reason about it.
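
The point about evidence count versus belief can be made concrete with the standard Beta moments. The sketch below is illustrative Python; beta_summary is a hypothetical helper, and reading α + β as an evidence count is a common interpretation rather than something R1 defines. The Full example earlier on this page (alpha 9.0, beta 1.0, calibrated 0.9) happens to agree with the Beta mean, but treating calibrated as that mean is an assumption.

```python
def beta_summary(alpha: float, beta: float) -> dict:
    """Mean, variance, and pseudo-count of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return {"mean": mean, "variance": variance, "evidence": total}


# Two records with the same point belief but very different amounts of evidence.
weak = beta_summary(9.0, 1.0)
strong = beta_summary(90.0, 10.0)
```

Both summaries have a mean of 0.9, but the second has roughly a tenth of the variance; a bare float would make the two records indistinguishable.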

H. A complete temporal model

Today. eventTime vs createdAt plus validUntil / validAt / invalidatedAt is a partial bitemporal model, and time is always a precise RFC 3339 instant.

Proposal. Make bitemporality explicit and uniform (a valid-time interval and a transaction-time interval), and support imprecise time (“sometime in 2024”, “before the incident”) via interval/precision-qualified instants. Memory is frequently vague about when; the format should be able to say so.
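
One hedged way to picture imprecise time is as an interval rather than an instant. The Python sketch below represents “sometime in 2024” as a half-open UTC interval; year_interval and may_contain are hypothetical names, and the half-open convention is an assumption of this sketch, not something the proposal fixes.

```python
from datetime import datetime, timezone


def year_interval(year: int) -> tuple[datetime, datetime]:
    """Half-open UTC interval [start, end) covering one calendar year."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return start, end


def may_contain(interval: tuple[datetime, datetime], instant: datetime) -> bool:
    """True if a precise instant is consistent with the imprecise interval."""
    start, end = interval
    return start <= instant < end
```

A precise RFC 3339 instant is then just the degenerate case where the consumer needs no interval at all.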

I. Federation and cross-Bundle references

Today. R1 is deliberately closed-world: every reference must resolve inside the Bundle. That is the right call for R1 (it makes conformance decidable), but it blocks linking a small export to a large shared graph.

Proposal. Define an optional, higher conformance level for resolvable external references (a reference may target a resource in another, addressable Bundle/store), with clear rules so the closed-world guarantee remains the default and the validator can still decide conformance. Federation is how a memory standard scales past one file.

J. Structured knowledge and typed attributes

Today. Entity.attributes is string → string; Episode.metadata likewise. All structure degrades to stringly-typed key/values.

Proposal. Allow typed attribute values (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference) so a global graph can carry real structured knowledge — a measurement with units, a date, a link — without serializing everything to strings.

K. Language and internationalization

Today. Text is implicitly English; there are no language tags, and the reference pipeline’s tokenization/NER assume English.

Proposal. Add optional language tags (BCP 47) on textual content and entity names, and allow multiple language variants of a name/summary. A global standard cannot assume one language.

A prioritized path

Not all of these are equal. For interchange specifically — the existential goal — the order is clear:

| Priority | Theme | Why first | Likely entry |
|---|---|---|---|
| 1 | A. Open vocabularies | Unblocks every non-coding domain at once; small, additive change. | R1.x · OMM-1 |
| 2 | C. Entity resolution | Interchange is identity; without it, stores can’t merge. | R2 · OMM-1 |
| 3 | F. Privacy / retention | Gate to enterprise & regulated adoption. | R2 · OMM-0 |
| 4 | B. Identity / multi-agent | Matches the 2026 fleet reality; trust across boundaries. | R2 · OMM-0 |
| 5 | E. Provenance chain + attestation | Trust when memory crosses vendors. | R2 · OMM-1 |
| 6 | D. Modality + neutral embeddings | Multimodal memory becomes interpretable, not opaque. | R2 · OMM-0 |
|  | H, G, J, K, I | Valuable, lower interchange-urgency; fold in as domains demand. | R2+ |

The first item is the highest-leverage and lowest-cost: turning four closed enums into CodeableConcepts, with the current values published as the seed vocabularies, would by itself move OMIR from “a coding-agent memory format” to “a memory format that coding agents use,” at OMM-1, without breaking a single R1 document.

The adoption plan

The prioritization above is a ranking; this is the plan. The Working Group’s stated intent is to adopt all of the proposals, in a deliberate sequence set by dependency and leverage — vocabularies (A) first, identity & resolution (C) next — with W3C PROV (E) as the trust backbone the rest hangs from, and the refinements that fix G, J, and K folded in once the load-bearing pieces land. Every phase enters at low OMM and is gated by independent implementation. “Additive and non-breaking” is a claim with teeth here, not a slogan: every R1 resource schema sets additionalProperties: false (Conformance §Document rules, rule CR-6), so a new core field is rejected by the very R1 schema we promise to keep honoring. We therefore label each change honestly against GOVERNANCE §5.1/§5.2 and route it through the correct release vehicle below.

A note on the §5.2 boundary, stated once and reused. Adding a new, unreferenced $defs member to common.schema.json is purely additive under §5.2 (no existing bundle becomes invalid). It is the widening of an existing field or def to reference it, or the widening of the shared Reference pattern, that is non-additive. This is the principle that lets CodeableConcept (Phase 1) and TypedValue (Phase 5) be added cleanly as defs while the union widening (Phase 1) and the Agent/Reference widening (Phase 2) are correctly breaking. Wherever the plan calls a shared-schema change “breaking” or “additive,” it means this. A corollary the plan applies throughout: any field whose value can only be expressed by widening the shared Reference pattern (authoredBy, owner, an Agent-targeting sameAs) is NOT §5.1-additive and is R2/RFC-gated, full stop — it cannot ride an R1.x increment.

Four preconditions before any phase ships

Each precondition is itself a §3.1 normative RFC (it amends GOVERNANCE/principles/CONTRIBUTING, the common.schema.json shared defs, or process), so each needs a 14-day-minimum comment window and a two-thirds TSC ballot (§3.3) before Phase 1 opens. Bundle P0+P0a into one governance-reconciliation RFC (they are coupled) and ship P1 as a second RFC; P-1 (the founding-TSC bootstrap) is procedurally first because no ballot is callable without a seated TSC.

P-1 — Seat a founding TSC (the true first domino). GOVERNANCE §2.3 targets a 3–7 seat TSC filled “by community nomination and confirmation by the sitting TSC” — but no TSC is seated, and §2.3 has no bootstrap clause for the first one, so the balloting body is self-perpetuating by design and cannot start. “Confirm it before P0” is not a mechanism. P-1 (a GOVERNANCE amendment, in scope) adds a one-time founding-TSC procedure that does not require a sitting TSC: a public call-for-nominations seating an initial 3-seat TSC with at least two non-Veld seats (so Veld holds ≤ one-third from day one and the §2.4 cap holds — at 3 seats, one-third is exactly one seat, so a Veld-majority bootstrap would be unlawful), confirmed by lazy consensus over a fixed comment window rather than by a nonexistent sitting TSC. P-1 is therefore where external participation is first recruited, not an administrative formality — and it couples directly to P2: you cannot seat a neutral TSC without at least one non-Veld party, the same party the second-implementer gate needs.

P0 — Reconcile the OMM ladder and re-grade the published R1 table. The repo carries three inconsistent OMM-2 definitions: GOVERNANCE §4.1 OMM-2 = “Exercised in real bundles by at least one implementation” (GOVERNANCE.md:233); principles.md §4 OMM-2 = “implemented in more than one system” (principles.md:55); and CONTRIBUTING.md §8 collapses OMM-1 and OMM-2 into one band, “Trial use — implemented somewhere, but limited field experience” (CONTRIBUTING.md:372-378) — putting “multiple implementations” only at OMM-3. The reconciliation RFC MUST rewrite the ladder identically across all THREE documents (GOVERNANCE §4.1+§4.3, principles.md §4, CONTRIBUTING.md §8, splitting its merged “1–2” row), and update the §3.4 Draft/Trial-Use/Normative band table in the same RFC:

  • OMM-1 = implemented in ≥1 system; field set volatile.
  • OMM-2 = ≥2 independent implementations; shape settling, not yet broad field use.
  • OMM-3 = ≥2 independent implementations plus real-world cross-validated usage/conformance evidence (the discriminator from OMM-2 is settling vs cross-validated field experience).
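
Read as a gate, the reconciled ladder above reduces to a small decision function. This Python sketch is illustrative only: max_supported_omm is a hypothetical name, returning 0 for an unimplemented resource is an assumption, and the function stops at OMM-3 because the list above defines no higher rung.

```python
def max_supported_omm(independent_impls: int, cross_validated: bool) -> int:
    """Highest grade the reconciled ladder supports for the given evidence."""
    if independent_impls >= 2:
        # Two independent implementations: OMM-2, or OMM-3 with field evidence.
        return 3 if cross_validated else 2
    # A single implementation caps the grade at OMM-1, whatever its field use.
    return 1 if independent_impls >= 1 else 0
```

With a single implementer the function returns 1 regardless of field experience, which is the reading under which a sole-implementer grade above OMM-1 cannot be supported.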

The reconciliation is monotonic in BOTH directions — it lowers any existing grade the new independence gate no longer supports. Today GOVERNANCE §4.3 grades MemoryRecord at OMM-4 and Entity/Relationship/Episode at OMM-3 with Veld as sole implementer (GOVERNANCE.md:255-258), repeated in principles.md §4 (62-64), CONTRIBUTING.md §8 (382-384), and the README grade table (README.md:48-51). Under the reconciled gate those grades are unsupportable: there is no second independent implementation. The same RFC MUST re-grade the R1 table — demoting MemoryRecord and Entity/Relationship/Episode to OMM-1 (“settled shape in the reference implementation; awaiting a second independent implementation”) across all four artifacts in one ballot. Leaving four overclaimed grades standing while reforming the OMM in the name of honesty is exactly the dishonesty §4.2 forbids and the loudest possible “the OMM is marketing” signal to a prospective second implementer. Until P0 lands, no phase may claim OMM-2, and every OMM target below is read against the reconciled, re-graded ladder. P0 also assigns stable rule anchors CR-1..CR-8 to Conformance §“Document rules (MUST)” (today a positional 1–8 list with no stable IDs), so every cross-reference in this plan and its RFCs survives renumbering.

P0a — Define “independent” so it is recorded, neutral, and not gameable. An implementation counts toward the gate iff (a) it has a separate codebase/maintainer, developed without copying reference-implementation code under any license other than the Apache-2.0 grant; re-implementing from the CC-BY spec and using the validator as a conformance oracle explicitly COUNTS as independent (otherwise the licensing invariant that anyone may re-implement from the spec contradicts the gate); (b) it ships under a separate product/funding line with its own users; and (c) it passes the reference validator while PRODUCING the feature (not merely ignoring it). The gate is decoupled from TSC seats and employer: a committed second adopter who joins the TSC is exactly what we want and MUST NOT be disqualified for it. An anti-sock-puppet clause (common control, shared funding, or shared engineering team → counts as one) replaces any unenforceable string-match on “shared employer.” To keep the gate from being decided by the party that benefits from declaring it met: the counts-as-one determination uses the SAME “organization/common control” definition as the §2.4 seat cap (one bright line governs both seat math and implementation counting), the TSC MUST publish written findings against the three criteria on the ballot record, and the determination is taken by a supermajority EXCLUDING any TSC member employed by or contracted to the reference implementer (extending the §2.3 recusal rule — the convening vendor has a material non-interoperability interest in the gate being declared met). The WG validator and any Veld-authored tool never count.

P1 — A versioning contract for the R1→R2 line, with a named validator-rewrite scope. The R1 version pin is enforced at four points today; the R2 schema set and the validator MUST handle each: (i) Bundle.omirVersion const:"R1" is neutralized to a plain string in the validator (schemas.rs:103) — the literal-R1 envelope gate actually lives in Rust at version_presence (lib.rs:390-402 → E300), so making it version-relative is a code change, not a schema const edit; (ii) Meta.omirVersion const:"R1" (common.schema.json:39 → E301); (iii) Meta is additionalProperties:false (common.schema.json:58); (iv) every resource schema AND the Bundle envelope set additionalProperties:false (CR-6; Bundle.schema.json:30). Specify:

  1. R2 Bundles and resources carry omirVersion:"R2"; the R2 common.schema.json (under the R2 namespace) sets the Meta.omirVersion site to const:"R2" (or an enum of accepted majors), leaving R1 schemas frozen. E300/E301 become version-relative, not literal-R1.
  2. A consumer reads the major version and applies that version’s schema set, and MAY reject a newer major it does not implement. Forward-read of unknown core fields is constrained by closed schemas and is stated honestly: because R2.0 resource schemas are additionalProperties:false, an already-shipped R2.0 validator does NOT silently forward-read a new core field introduced by a later R2.x minor — it would hard-reject it (E120), exactly as “R1 readers do not tolerate R2.” Intra-major additivity therefore rides the extension[] lane (always tolerated by ignore-unknown-extensions), not undeclared new top-level properties. Any genuinely new optional core field that we want same-major consumers to skip is shipped only when a minor schema bump relaxes the relevant object to a controlled, pattern-gated extension lane; absent that, it is a new-release item. The earlier blanket “SHOULD ignore unknown newer-minor fields” claim is dropped as unimplementable against closed schemas — this corrects the Overlook table’s “R1.x-additive ⇒ forward-readable” assumption.
  3. Forward-read across majors is a property of R2+ tools about older majors, never of already-shipped R1 tools about newer ones. A published R1 validator WILL reject any R2 bundle at the envelope (E300) by design. The honest compatibility guarantee is exactly “R1 bundles stay valid forever” + “R2 readers accept R1,” not “R1 readers tolerate R2.”
  4. The reference validator becomes version-aware as a named, scoped rewrite, not a config switch: (a) parameterize SchemaFiles, COMMON_ID, and RESOURCE_TYPES (schemas.rs) by declared major version (an R1 set and an R2 set, each with its own common.schema.json $id); (b) build a version-keyed Registry and dispatch in Schemas::build; (c) the validator reads the declared Bundle-level omirVersion at PARSE time, before any structural / reference / version check, and selects the version-keyed registry; E110/E120/E200/E300/E301 then all run against the selected set. (Selecting “before E300” is insufficient: today structural E110/E120 run first (lib.rs:31-132) and E300 last (lib.rs:142), so an R2 bundle would be double-reported as wrong-version AND malformed-against-R1.) An unknown newer major short-circuits to a single E3xx at the envelope and runs NO resource-schema checks; (d) the embedded-schema include_str! mechanism carries both releases’ schemas. DoD: P1 is done only when the same binary validates a published R1 example AND an R2 example, and reports an R2-declared bundle as a single envelope-level E300 under the R1 set.
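
The dispatch order in item 4 (read the declared version first, select that version's schema set, and short-circuit an unknown major to one envelope error) can be sketched as follows. This is illustrative Python, not the Rust validator: closed_schema, validate, the toy field sets, and the use of authoredBy as an R2-only field are all hypothetical, and mapping a missing omirVersion to E300 is an assumption.

```python
R1_FIELDS = {"resourceType", "id", "content", "createdAt"}
R2_FIELDS = R1_FIELDS | {"authoredBy"}  # hypothetical R2 addition


def closed_schema(allowed: set):
    """Build a toy resource check that reports E120 for any undeclared field."""
    def check(resource: dict) -> list[str]:
        return ["E120"] if set(resource) - allowed else []
    return check


def validate(bundle: dict, registries: dict) -> list[str]:
    # Read the declared version first, before any structural check runs.
    checks = registries.get(bundle.get("omirVersion"))
    if checks is None:
        # Unknown (or missing) version: one envelope-level error, no resource checks.
        return ["E300"]
    errors = []
    for resource in bundle.get("entry", []):
        errors.extend(checks(resource))
    return errors


R1_ONLY = {"R1": closed_schema(R1_FIELDS)}
R1_AND_R2 = {"R1": closed_schema(R1_FIELDS), "R2": closed_schema(R2_FIELDS)}
```

Because selection happens before any resource check, an R2-declared bundle validated against R1_ONLY yields a single E300 rather than E300 plus a spurious E120, which is the double-reporting the parenthetical in item 4 warns about.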

The phases

Phase 1 — Vocabulary (A). Split into 1a (the cheap adoption surface) and 1b (closed-vocabulary enforcement); 1b does NOT gate 1a and is not on the second-implementer critical path.

Phase 1a, in two decoupled moves so namespace-opening costs nothing on day one.

(1) Open the namespace first, additively. Publish the seed code systems and the vocabulary registry under https://omir.io/spec/R2/vocab/<field> as an R1.x-additive documentation + JSON-LD artifact (no schema type change), so other domains can NAME their own concepts immediately with no bundle becoming R2. This is the real payoff of A-first under the directive — opening the namespace, not paying a major-version envelope break before any consumer benefits.

(2) Widen the union when a coded value is demanded. Add a CodeableConcept def ({ system, code, text }) to common.schema.json (purely-additive per the §5.2 note) and widen kind, experienceType, Entity.labels, Episode.source, and Relationship.relationType (the spec’s one already-open vocabulary — CONTRIBUTING §3.10:137-140, GOVERNANCE §5.1:282 — and the single largest real-world fragmentation vector; widening it is the least breaking because its bare branch is already type:string). Widen each closed enum’s string branch to the EXISTING enum, not to type:string: anyOf: [ <the R1 enum>, CodeableConcept ] (for relationType, anyOf: [ {type:string}, CodeableConcept ]). This keeps the legacy closed set hard-validated at Core (a bad kind like "banana" still fails E120). Make the seed/object mapping STRUCTURAL, not prose: in the R2 schema, constrain the CodeableConcept branch of each widened field so its system MUST NOT be that field’s own seed-system URL (not:{properties:{system:{const:<seedURL>}}}), so a seed value is expressible only as the bare string and the union is genuinely bijective at the Core schema layer — no Phase-1b dependency, no honor-system.

Move (2) ships as the first R2 change; move (1) does not. Adding or retiring a seed value is governed per the operation, below.

Seed-registry governance, split by direction (so additive growth stays cheap):

  • ADDING a seed value to an existing seed system is §5.1-additive open-vocabulary growth (it matches GOVERNANCE §5.1’s existing “new open-vocabulary values” carve-out): an ordinary PR under Maintainer review, NOT a full RFC. Routing every new concept through a 14-day ballot would make A heavier than today’s open-vocab path — the opposite of cheap adoption.
  • RETIRING, narrowing, or re-meaning a seed value is enum-narrowing: §3.1 RFC + ballot, and (see Phase 3) a canonicalization-version bump under §6 so historical signatures are verified under the frozen seed table of their own version, never retroactively flipped.

Normalization, stated precisely and per-field cardinality:

  • A bare seed string s and the seeded coding {system:<seed URL>, code:s} are semantically equivalent for matching/comparison; the bare-string seed form is the canonical at-rest form. R2 introduces a NEW consumer rule (not a restatement of the existing extensions-scoped SHOULD): an R2 consumer MUST NOT canonicalize between bare-string and object form for seed-coded fields on re-export — lossless pass-through. This is a consumer-side tightening listed in the R2 migration note; equality is a consumer obligation, not a structural check (the core validates union shape only, with the seed-system exclusion above making the shape itself bijective).
  • For fields carrying a schema default (kind, tier): do NOT mutate the R1 default in place (MemoryRecord.kind:18 default:"memory", tier:31 default:"working" live on an OMM-1 type and are read by default-applying codegen/normalizers). The R2 schemas keep default; the canonical at-rest form is omission when a field equals its default (not the explicit value). Phase 3’s canonicalizer normalizes a defaulted field to its omitted form before building the signed map M, so absent and explicit-default sign identically. Removing/relocating a default would change the value seen by default-applying consumers — so this is a documented R2 migration item under “behavior of absent fields” for both kind and tier (“an absent kind MUST be interpreted as memory; an absent tier as working”), not a silent edit.
  • For scalar optional fields with no default (experienceType, Episode.source): absent stays absent and is distinct from {text:s}.
  • For the array field Entity.labels: the legacy "other" + label-extension idiom is DEPRECATED under §6 in favor of a CodeableConcept item. Update context.jsonld labels/term mappings accordingly.
  • A non-seed coded value MUST use the object form; a seed value MUST use the bare string — now enforced structurally (above), so Phase-3 attestation canonicalization collapses the union to one byte sequence.
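The normalization rules above can be sketched as two small helpers: one builds a comparison key without touching the stored value, the other drops default-valued fields before signing. This is a minimal illustration, not the reference canonicalizer. SEED_SYSTEM is a hypothetical URL, the helper names are invented here, and only bare strings and {system, code} objects are handled.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical seed-system URL; the real one is published with the seed table.
SEED_SYSTEM = "https://omir.io/example/seed-system"
# Schema defaults named in the text: kind -> "memory", tier -> "working".
DEFAULTS: Dict[str, str] = {"kind": "memory", "tier": "working"}


def match_key(value: Any) -> Optional[Tuple[str, str]]:
    """Comparison key for a coded field; it is never written back to the document."""
    if value is None:
        return None  # absent stays absent
    if isinstance(value, str):
        return (SEED_SYSTEM, value)  # bare seed string
    # Object form: compare on system + code only (text is ignored in this sketch).
    return (value["system"], value["code"])


def signing_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Copy of the record with default-valued fields omitted, as input to M."""
    return {
        k: v for k, v in record.items()
        if not (k in DEFAULTS and DEFAULTS[k] == v)
    }
```

Because match_key only produces a key, a consumer can compare the two forms as equal and still pass the stored value through unchanged on re-export.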

Companion item (atomic with the widening, NOT deferred to 1b): re-key every behavioral MUST that references a widened enum value. profiles.md:88-90 makes prospective-memory filtering a behavioral MUST keyed on the literal experienceType:"intention". The moment 1a admits the object form, an unmodified profile MUST is silent on experienceType:{system:<seed>, code:"intention"} — a safety-relevant recall gap. The normative re-key (“any experienceType whose canonical form is the seeded intention coding MUST be filtered from ordinary recall”) lands in the SAME RFC as the experienceType widening, with a 1a-DoD fixture proving an object-form intention is still filtered. Only the machine-readable binding engine stays in 1b.
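A minimal sketch of the re-keyed filter, assuming a hypothetical seed-system URL (EXPERIENCE_SEED) and invented helper names. It shows the property the 1a fixture is meant to prove: an object-form intention is filtered from ordinary recall exactly like the bare string.

```python
from typing import Any, Dict, List

# Hypothetical URL for the experienceType seed system (not taken from the spec).
EXPERIENCE_SEED = "https://omir.io/example/seed/experience-type"


def is_intention(experience_type: Any) -> bool:
    """True for the bare seed string and for the seeded object coding."""
    if isinstance(experience_type, str):
        return experience_type == "intention"
    if isinstance(experience_type, dict):
        return (experience_type.get("system") == EXPERIENCE_SEED
                and experience_type.get("code") == "intention")
    return False  # absent or any other shape


def ordinary_recall(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop prospective-memory records from ordinary recall."""
    return [r for r in records if not is_intention(r.get("experienceType"))]
```

A coding with code "intention" under a different system is deliberately not filtered here; whether a domain vocabulary can opt in to the same behavior is left to the RFC.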

JSON-LD @context audit (a DoD blocker, not an afterthought): widening a field that shares an overloaded @context term silently corrupts the RDF lift of existing bundles. In R1, one term source is bound to a single bare omir:source with no per-field @type (context.jsonld:33), yet four fields use it: Episode.source (being widened), Bundle.source, Meta.source, Provenance.source (all plain strings). A coded Episode.source would lift a {system,code,text} node onto a predicate that elsewhere carries a literal, and system/code/text (unmapped) would fall to accidental @vocab predicates. Before widening any field, split every overloaded term in the R2 context.jsonld: give the widened Episode.source its own term (episodeSource → omir:episodeSource) typed for an embedded coding, add CodeableConcept’s system/code/text with explicit typing, and keep the three string source fields on a separate literal predicate. DoD: a Turtle/N-Quads golden-file test that the R1 minimal bundle and an R2 coded bundle both lift to the intended triples.

Honest status: this is a type change to existing fields, two of which live on MemoryRecord (now OMM-1 under P0). A bare-string producer stays valid; an object value is rejected by an R1 consumer; an R2 bundle is opaque to an unmodified R1 consumer at the envelope (per P1). Domain vocabularies (robotics, healthcare) live under their own non-omir.io URLs; the omir.io namespace is reserved for WG-published seed systems only. Target: R2 · OMM-1. A-first is about opening the namespace (move 1, day-one, additive) so other domains can name their own concepts; the union widening (move 2) is held to R2 and is the cheapest additive spec change, although milestone zero (below) remains the cheapest adoption step overall.

Phase 1b — closed-vocabulary enforcement (separate work item, off the 1a critical path). Extend profiles.md with a machine-readable terminology-binding artifact (a profile schema, not prose) binding a CodeableConcept field to a code system/value set with a binding strength (required/extensible/preferred), mirroring FHIR. Implement profile-constraint checking in the reference validator — today there is no code path (lib.rs:144 is a bare comment): this is a net-new subsystem (profile loader, meta.profile dispatch, value-set membership check, new finding codes: required → error, extensible/preferred → warning), with passing and failing examples. Re-issue the three R1 reference profiles, splitting by kind:

  • Genuinely REQUIRED enums (omir-coding-agent.kind) → required-strength bindings. The required-strength binding requires the field to be PRESENT — an absent kind relying on the Phase-1a normative default does not satisfy the profile (a profile narrows the base; presence is the thing it guarantees). Cross-referenced from the 1a “default” bullet.
  • SHOULD value fields (omir-coding-agent provenance.sourceType (profiles.md:41), Entity.labels) → preferred/extensible bindings reported as warnings, never errors. Binding strength MUST match the original RFC-2119 keyword; promoting a SHOULD to an enforced required binding is a new-release narrowing, not a re-expression.
  • Behavioral MUSTs are NOT terminology bindings and stay as prose RFC-2119 requirements. Re-issuing a profile MUST preserve every behavioral MUST verbatim (the intention-filtering re-key already landed in 1a); terminology bindings constrain values, not consumer behavior.
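The strength-to-severity mapping described for Phase 1b can be sketched as follows. The Binding class, the dotted-path lookup, and the value sets used in the usage example are all hypothetical; only the mapping itself (required → error, extensible/preferred → warning, required implies presence) comes from the text.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Binding:
    field: str                 # dotted path, e.g. "provenance.sourceType"
    value_set: FrozenSet[str]  # allowed codes (illustrative, not the real sets)
    strength: str              # "required" | "extensible" | "preferred"


def get_path(record: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path through nested dicts; None when any part is absent."""
    node: Any = record
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_binding(record: Dict[str, Any], binding: Binding) -> Optional[Tuple[str, str]]:
    """Return (severity, message) for a violation, or None when the binding holds."""
    value = get_path(record, binding.field)
    required = binding.strength == "required"
    if value is None:
        # A required binding guarantees presence; weaker strengths allow absence.
        return ("error", f"{binding.field} must be present") if required else None
    code = value if isinstance(value, str) else value.get("code")
    if code in binding.value_set:
        return None
    severity = "error" if required else "warning"
    return (severity, f"{binding.field}={code!r} is outside the bound value set")
```

For example, `check_binding({}, Binding("kind", frozenset({"decision"}), "required"))` yields an error, while the same miss under a preferred binding yields nothing for an absent field and a warning for an out-of-set value.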

Phase 2 — Identity (B: Agent) and Resolution (C). B (Agent) is a prerequisite ONLY for E2 (agent-attribution hops); the headline trust deliverable E1 depends only on A and the existing Provenance and ships without Agent — so Agent’s adoption gate never holds the trust pillar hostage. B: introduce an Agent (Principal) resource and optional authoredBy / owner references. C: add optional Entity.identifier[] ({ system, value }), aliases[], and a sameAs link so two stores can declare cross-system identity and merge. Honest status, partitioned by whether a field touches the Reference pattern:

  • §5.1-additive (R1.x-eligible): Entity.identifier[] and aliases[] (a new non-ref def {system,value} and string array — no Reference widening).
  • §5.2/§3.1-breaking (R2/RFC-gated): any field that can reference Agent (authoredBy, owner, and sameAs insofar as it permits Agent targets), because each forces the common.schema.json#/$defs/Reference.pattern widening. These are NOT R1.x-additive. Adding Agent does not invalidate any existing R1 bundle, but the conformance/bundle prose enumerating “the four core resources” must change in the same PR. Required, atomic work items:
  • (a1) author schemas/Agent.schema.json; (a2) add its $ref to Bundle.entry.items.oneOf (a oneOf of resource-schema $refs, not a regex); (a3) widen the common.schema.json#/$defs/Reference.pattern regex (today ^(MemoryRecord|Entity|Relationship|Episode)/..., common.schema.json:18) to include Agent (the single load-bearing pattern change).
  • (a4) extend the validator’s SchemaFiles struct + from_dir/embedded loaders + by_type map (schemas.rs is a hard-coded 6-file / 4-type shape that will not pick up a new type automatically); update RESOURCE_TYPES (item b) and both “four core resources” message sites: the validator E101 string (lib.rs:124) and validator/README.md:93.
  • (b) add Agent to RESOURCE_TYPES.
  • (c) replace the hand-coded per-field reference walk with a registry-driven walker. The mandate is not “resolve every {ref:"Type/id"} anywhere.” It resolves (i) every Reference-object {ref:...} at a schema-declared Reference-typed field, NEVER inside extension[].valueJson or any free-form/additionalProperties map (a vendor may legitimately put a {"ref":...}-shaped object in valueJson); and (ii) every closed-world bare-id field enumerated in CR-5 (currently MemoryRecord.parentId, special-cased at lib.rs:219-230). Drive the walk from an explicit registry of (field-path → target-type-SET, ref-shape) entries (a set, because some R2 slots are poly-typed; see (c′)). Seed the registry with the FULL existing closed-world set as ground truth from lib.rs:217-239 and CR-5: entityRefs → {Entity} (on MemoryRecord and Episode), Relationship.from → {Entity}, Relationship.to → {Entity}, Relationship.sourceEpisode → {Episode} (silently dropped in earlier drafts; its E201 expected-type=Episode check must survive the refactor), MemoryRecord.parentId → {MemoryRecord} (bare-id). Adding authoredBy/owner/sameAs/provenance.chain[] then extends the registry rather than relying on shape-sniffing, and the per-field target type is preserved (E201 must still reject, e.g., Agent/x in Relationship.from).
  • (c′) Change E201’s check from ref_type == expected to expected_set.contains(ref_type) (lib.rs:337), since Phase 3’s provenance.chain[].from is poly-typed (any in-Bundle resource) while chain[].agent is mono-typed to {Agent}. The current single-type tuple cannot model this; the set form can.
  • (d) amend Conformance CR-5 by appending the new reference-bearing fields to the existing closed-world list (never re-deriving it from the Reference type); state that the closed-world set is exactly the schema’s Reference-typed fields plus the enumerated bare-id fields, and record the per-field target-type table in the migration note.
  • (e) register Agent in the JSON-LD @context; (f) ship a migration note.
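Items (c) and (c′) can be sketched as a small registry-driven walker. This is a Python illustration, not the Rust validator. It assumes a flat list of resources rather than Bundle.entry, assumes entityRefs is a list of {ref} objects, and uses a made-up "UNRESOLVED" label for dangling references, since only E201 is named in the text.

```python
from typing import Any, Dict, List, Set, Tuple

# (resourceType, field) -> (allowed target types, reference shape).
# Seeded with the closed-world set listed in item (c).
REGISTRY: Dict[Tuple[str, str], Tuple[Set[str], str]] = {
    ("MemoryRecord", "entityRefs"): ({"Entity"}, "ref"),
    ("Episode", "entityRefs"): ({"Entity"}, "ref"),
    ("Relationship", "from"): ({"Entity"}, "ref"),
    ("Relationship", "to"): ({"Entity"}, "ref"),
    ("Relationship", "sourceEpisode"): ({"Episode"}, "ref"),
    ("MemoryRecord", "parentId"): ({"MemoryRecord"}, "bare-id"),
}


def walk(resources: List[Dict[str, Any]], registry=REGISTRY) -> List[Tuple[str, str]]:
    """Check registered reference fields only; extension payloads are never visited."""
    known = {(r["resourceType"], r["id"]) for r in resources}
    findings: List[Tuple[str, str]] = []
    for res in resources:
        for (rtype, field), (targets, shape) in registry.items():
            if res["resourceType"] != rtype or field not in res:
                continue
            raw = res[field]
            for item in raw if isinstance(raw, list) else [raw]:
                if shape == "bare-id":
                    # Bare ids are mono-typed, so the single target type applies.
                    ttype, tid = next(iter(targets)), item
                else:
                    # Assumes a structurally valid {"ref": "Type/id"} object.
                    ttype, _, tid = item["ref"].partition("/")
                path = f"{rtype}/{res['id']}.{field}"
                if ttype not in targets:  # set membership, per (c')
                    findings.append(("E201", path))
                elif (ttype, tid) not in known:
                    findings.append(("UNRESOLVED", path))  # hypothetical label
    return findings
```

Adding a new reference-bearing field is then a registry entry, for example `{**REGISTRY, ("MemoryRecord", "authoredBy"): ({"Agent"}, "ref")}`, with no change to the walk itself.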

For Phase 5’s federation form, do NOT relax the shared Reference pattern: define the external-reference form as a distinct $def (ExternalReference, its own property, not ref) so the Core closed-world pattern is untouched. Target: R2 · B (Agent): OMM-0 / C: OMM-0 — C is graded OMM-0 “mechanism present, cross-system merge unverifiable single-party”, not OMM-1: identifier[]/aliases/sameAs are inert until either a second store exists or federation (I) lands. C’s identifier fields and I’s federation mechanism are a matched pair whose combined value requires the second implementer; neither is claimed as delivered interop value at single-party.

Phase 3 — Provenance & trust: W3C PROV (E), the pillar. Re-shape Provenance into a PROV-aligned derivation chain with optional, redaction-aware attestation. Detailed below. E splits: E1 (chain over prior resources + per-hop credibility) depends only on A and the existing Provenance; E2 (agent-attribution hops referencing Agent) depends on B. The critical path is A → E1-chain — pure additive optional structure, immediately reviewable by a second party. The heavyweight attestation subsystem (det-CBOR / M / inclusion allow-list / redaction commitments / key resolution) is a PARALLEL track explicitly OFF the critical path, because cross-verification is its only meaningful test and is worthless with one implementer: it cannot reach OMM-1 or interop until the P2 second-verifier milestone is met. This honors “E emphasized” as the trust design pillar without front-loading a multi-quarter cryptographic interop project ahead of any second implementer. The E1/E2 discriminator is normative: a record is E1 only if its chain contains no agent operand and no agent-naming attestation; the moment any hop carries agent or the attestation’s signed subset includes an Agent reference, it is E2 and capped at the Agent OMM floor. E2 MUST NOT advance past OMM-0 until Agent is at least OMM-1 with Phase-2 Reference support. Target: R2 · E1-chain: OMM-1 / attestation & E2: OMM-0 (capped at the Agent floor; non-interoperable until P2).
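The E1/E2 discriminator can be sketched as a pure function of the provenance block. Two simplifications are assumptions of this sketch: the presence of an agent field on the attestation stands in for "the signed subset includes an Agent reference", and an absent chain is treated as empty.

```python
from typing import Any, Dict


def classify(provenance: Dict[str, Any]) -> str:
    """Return "E2" as soon as any hop or the attestation names an Agent, else "E1"."""
    chain = provenance.get("chain", [])  # absent chain treated as empty (assumption)
    if any("agent" in hop for hop in chain):
        return "E2"
    attestation = provenance.get("attestation")
    if attestation is not None and "agent" in attestation:
        return "E2"  # agent-naming attestation (approximated by field presence)
    return "E1"
```

Under this reading, a single wasAttributedTo hop is enough to move a record to E2 and cap it at the Agent floor.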

Phase 4 — Sensitivity & modality (F, D). F: an optional governance block (classification, legal basis, retention / deleteAfter, redaction markers, consent reference) — plus a typed redactionCommitments[] array ({ fieldPath, commitment, salt }) so per-field signed commitments have a declared schema home under additionalProperties:false (without it there is no legal place to store a commitment for an arbitrary redactable field). The redaction mechanism is co-designed with E’s attestation envelope (Phase 3). On introducing the core governance block, deprecate the WG-published, omir.io-namespaced, profile-REQUIRED pii-class extension (https://omir.io/spec/R1/extension/pii-class, profiles.md:91-93) under §6 (deprecated R2, earliest removal R3): because it is required by omir-personal-assistant, during the R2 window that profile MUST accept EITHER the deprecated extension OR the core classification field (so no existing producer is instantly non-conformant), and the R2 migration note MUST carry an explicit field-by-field value-shape mapping (pii-class → core classification). D: a multi-part content model (contentType + inline-or-MediaReference) and an algorithm-neutral Embedding representation { space, model, dims, dtype, vector?|ref? } where space is an opaque equality key (vectors comparable iff space strings are byte-equal; no cross-space comparability claim), dims MUST equal the vector length, and dtype pins the byte form. On-the-wire: a JSON decimal array in .omir, a typed CBOR array in .omirb. Because .omir decimal and .omirb dtype bytes are not byte-reconstructable from each other, embedding vectors are EXCLUDED from signed images by default; if ever committed, the commitment is over a single dtype-independent canonical-decimal (shortest-round-trip) form, never the dtype byte form (see Phase 3). Honest status: F and D are new optional fields; absent the Agent/ref dependency they qualify for R1.x-additive (footnote ¹). 
Target: R2 · OMM-0 (D grades OMM-0 until two vendors demonstrate a cross-store similarity round-trip).
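The two Embedding rules stated for D (dims equals the vector length, and comparability only when the space strings are byte-equal) can be sketched as below. The function names are invented for illustration, and the check applies only when the vector is carried inline.

```python
from typing import Any, Dict, List


def check_embedding(embedding: Dict[str, Any]) -> List[str]:
    """Return a list of problems; empty when the inline vector matches dims."""
    problems: List[str] = []
    vector = embedding.get("vector")
    if vector is not None and embedding.get("dims") != len(vector):
        problems.append(
            f"dims={embedding.get('dims')} but vector has {len(vector)} components"
        )
    return problems


def comparable(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Vectors are comparable only when the space keys are byte-equal."""
    # No case folding or Unicode normalization: space is an opaque key.
    return a["space"].encode("utf-8") == b["space"].encode("utf-8")
```

Treating space as opaque means "s1" and "S1" name different spaces, even if a vendor would consider them the same model family.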

Phase 5 — Refinements: fix G, J, and K (and fold in H, I).

  • G — uncertainty: keep Beta as default; add a general uncertainty value (credible interval, or named distribution + params) and optionally separate epistemic from aleatoric. New optional fields — R1.x-additive-eligible.
  • J — typed attributes: do not replace the string maps. Add an additive optional sibling channel — Entity.typedAttributes[] / Episode.typedMetadata[], each { key, value } where value is a shared TypedValue def in common.schema.json (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference; the def-add is purely additive) — and mark attributes/metadata DEPRECATED under §6 (deprecated R2, earliest removal R3), updating context.jsonld. Adding the typed channel AND marking the maps deprecated are both §5.1-additive and R1.x-eligible (per §6.4 validators only warn on deprecated-but-present items); only the eventual R3 REMOVAL of the maps is the breaking boundary. CodeableConcept (from A) is one member of the union; keep the dependency A → J, and J’s typed channel caps at A’s grade. TypedValue is NOT reused in the Extension value slots — constraining them to a closed union would remove arbitrary valueJson (a §5.2 break violating the typed-extension escape-hatch invariant). Extension.valueJson stays maximally permissive; if typed extension values are later wanted, add an optional valueTyped ($ref TypedValue) alongside the existing value* fields (purely additive) — never retro-type valueJson.
  • K — i18n: BCP-47 language tags on textual content and entity names, with optional multilingual variants. New optional fields — R1.x-additive-eligible.
  • Fold in H (full bitemporality + imprecise/interval time — new optional fields) and I (federation). I requires a mechanism before a target: introduce the syntactically distinct ExternalReference $def (its own property, an absolute/identifier-based locator built on Theme C’s identifiers, never the bare ResourceType/id and never a relaxed Core pattern); define a third conformance level “Federated” in Conformance.md (via RFC) that relaxes CR-5 only for explicitly-external references; closed-world remains the Core default and the R1/R2 invariant is preserved for anything not claiming Federated. I is blocked by (1) the widened Reference (Phase 2 B-work), (2) the new ExternalReference $def, and (3) the CR-5 carve-out RFC — C’s identifiers are an input to how external references are expressed, not the gate. Target: R2+ · OMM-0.

Phase 3 in depth — aligning Provenance with W3C PROV

W3C PROV models the world with Entity (a thing), Activity (something that occurs over a period of time and acts upon or with entities), and Agent (who/what is responsible), related by a type-constrained verb set whose relations take incompatible operands: wasAttributedTo is Entity→Agent; wasDerivedFrom is Entity→Entity; wasGeneratedBy is Entity→Activity (the generated entity is the subject); used is Activity→Entity and wasInformedBy is Activity→Activity (both Activity-subject). OMIR’s flat { source, sourceType, credibility, externalId } is the common case; the proposal lets it expand into a chain when trust matters. The chain’s implicit subject is the resource carrying the provenance block, lifting to prov:Entity. Each step is a discriminated union keyed on role that pins its legal operands. To avoid a fatal @context term collision, the chain does NOT reuse the globally-bound terms “from” (the Relationship operand, context.jsonld:158) or “credibility” (context.jsonld:91): it uses the distinct field names derivedFrom and hopCredibility, because JSON-LD term definitions are document-global, not subtree-scoped (the same term cannot mean two things under one merged context).

"provenance": {
  "source": "design-review",
  "credibility": 0.92,
  "aggregateCredibility": { "value": 0.75, "model": "product" },
  "chain": [
    { "role": "wasAttributedTo", "agent": { "ref": "Agent/agent-a" },         "at": "2026-05-30T11:42:00Z", "hopCredibility": 0.95 },
    { "role": "wasDerivedFrom",  "derivedFrom": { "ref": "Episode/ep-launch-chat" }, "at": "2026-05-30T11:42:03Z", "hopCredibility": 0.90 },
    { "role": "wasGeneratedBy",  "activity": { "id": "act-consolidation", "label": "consolidation", "at": "2026-05-31T03:00:00Z" }, "hopCredibility": 0.88 }
  ],
  "attestation": { "alg": "ed25519", "keyId": "did:web:example.org#k1", "key": "<inline JWK>", "agent": "Agent/agent-a", "at": "2026-05-31T03:00:01Z", "signature": "base64url…" }
}
  • Role/operand validation is normative and enforced. With the bearing resource as subject, the Entity-subject verbs are well-formed and enumerated: wasAttributedTo → MUST carry agent, MUST NOT carry derivedFrom; wasDerivedFrom → MUST carry derivedFrom (a prior resource), MUST NOT carry agent; wasGeneratedBy → MUST carry an activity operand, MUST NOT carry agent. A validator rule rejects role/operand mismatches. The Activity-subject verbs used, wasAssociatedWith, and wasInformedBy are ALL reserved (none can take the bearing prov:Entity as subject) until a first-class Activity referent exists. (wasInformedBy is Activity→Activity, the same subject-type problem as the others — it was previously and contradictorily permitted with an inline activity; that permission is dropped, leaving wasGeneratedBy as the only Activity-touching verb in R2.)

  • derivedFrom is poly-typed; agent is mono-typed. wasDerivedFrom.derivedFrom may resolve to any in-Bundle resource type (Episode, MemoryRecord, Entity, Relationship, Agent), so its registry entry carries the full resource-type set and role/operand validation is what narrows it per hop; wasAttributedTo.agent is mono-typed to {Agent}. The widened E201 set-membership check (Phase 2 c′) covers both.

  • Activity is an inline shape, not a literal and not a top-level resource. R2 carries Activity within the hop as activity: { id, label?, at? } (no new Bundle.entry/oneOf type, so the “five core resources” prose is untouched). The PROV-O lift mints a blank-node prov:Activity from it. Activity operands carry no ref and are EXEMPT from closed-world resolution; the walker keys strictly on ref/derivedFrom/agent and skips them. Promoting Activity to a first-class resource is deferred.

  • Closed-world applies to the resource-typed chain refs. chain[].agent and chain[].derivedFrom are closed-world references in R2 and MUST resolve within the Bundle; the registry-driven walker covers them. Chain-ref integrity is a DOCUMENT property: the validator walks it unconditionally. A producer that cannot include an upstream Agent/Episode MUST inline a minimal resource, satisfy the hop with a placeholder tombstone of the right type, or omit the hop — never emit a dangling ref. A placeholder tombstone Agent MUST carry an explicit placeholder:true marker. Cross-Bundle provenance is deferred to Federated, never permitted at Core; until I lands, attested records carry their provenance closure.

  • Per-hop credibility is the only normative trust number; any roll-up is optional and non-normative. chain[].hopCredibility is a UnitInterval, added to the CR-7 [0,1] enumeration and bound to common.schema.json#/$defs/UnitInterval. The legacy provenance.credibility keeps its R1 meaning — it is not silently redefined as a derived product. Any roll-up lives in a new optional provenance.aggregateCredibility { value: UnitInterval, model: "product"|"min"|… }. product is a series-reliability / weakest-chain heuristic, NOT an independent-evidence probability — the “independent-evidence” label is dropped as false. A SHOULD-level check warns when aggregateCredibility.value is inconsistent with model over the present hops (rounding-tolerant).

  • Attestation — explicitly an OMM-0, off-critical-path, parallel track until two independent verifiers demonstrate cross-encoding verification AND the A-normalization and J-migration canonicalization versions are pinned. The validator cannot enforce any of this today (Outcome is a closed {Pass, Fail} enum, report.rs:64-69; Check is {Structural, ReferenceIntegrity, VersionPresence, Profile}, report.rs:45-52; lib.rs has no crypto, no CBOR, and reads only parsed .omir JSON). The attestation track therefore includes an honest validator work item paralleling Phase 1b’s “net-new subsystem”: (a) add an .omirb/CBOR reader with a declared sub-profile tag byte; (b) add a Check::Attestation variant and a per-attestation finding triplet verified / tampered / unverifiable that is SEPARATE from the document-level Pass/Fail — an unverifiable attestation MUST NOT flip core_conformant to false (it is orthogonal to schema conformance); (c) scope the ed25519/key code as OMM-0 and OUT of the Core conformance path, so a Core-R2 consumer that does not verify is still conformant. Until (a)-(c) land, the bincode-under-attestation rule is documentation only and attestation is OMM-0/non-interoperable — no validator-enforced MUST is claimed. The mechanism, when built:

    • Signing input is a canonical typed map M defined by a closed, version-tagged INCLUSION allow-list, enumerated per signable resource type. For MemoryRecord, M includes id, resourceType, a content-commitment, the ordered chain (roles, operand ref/inline-activity-id, per-hop hopCredibility), the asserting Agent ref + keyId + the inline signing key + signing-time at, and classification once Phase 4 lands; M excludes mutable/operational fields (decay.*, meta.lastUpdated, version). Entity/Relationship subsets, if signable, are enumerated separately (Entity excludes mentionCount, salience, lastSeenAt; Relationship excludes strength, validAt, invalidatedAt).
    • Scope. An attestation signs the bearing resource’s enumerated subset plus a hash-commitment of each referenced resource’s id+resourceType (not their mutable bodies).
    • One canonical byte form. The signature is over SHA-256(det-CBOR(M)) (RFC 8949 §4.2). A JSON (.omir) signer/verifier MUST construct the identical typed map M and the identical det-CBOR(M) bytes; JCS is at most an aid to building M, never an alternate signed form. No field whose .omir and .omirb representations are not provably byte-reconstructable from each other may enter a signed commitment (dtype-pinned binary byte forms MUST NOT) — this clause is also written into encodings.md so encoding-neutrality (Principle 5) and one-canonical-form hold simultaneously.
    • Number normalization is a single pre-CBOR step applying to ALL signed numeric fields (not only UnitInterval/score): every JSON number in M maps to its shortest round-tripping decimal, so 9, 9.0, 9.00 collapse (the shipped corpus already drifts — {alpha:9,beta:1} in encodings.md:36 vs {alpha:9.0,beta:1.0} in minimal-bundle.omir:82) and CBOR floats are forbidden for signed numeric fields. A KAT fixture proves {alpha:9} and {alpha:9.0} sign identically. Defaulted fields are normalized to their omitted form before M is built, so absent and explicit-default sign identically (Phase 1a). A KAT fixture proves two MemoryRecords differing only in presence/absence of a default-valued kind produce the identical digest.
    • CodeableConcept-union fields are pre-canonicalized into M (seed values → bare string — enforced bijective by the Phase-1a seed-system exclusion — non-seed → {system,code} with text dropped) before CBOR. The canonicalization-profile version is carried INSIDE M (not merely on the envelope), and the verifier selects canonicalization rules by the version recorded in M, never by its current profile. A seed retirement (Phase 1a) is a canonicalization-version bump; attestations are verified under their own version’s frozen, per-release-published seed table (§5.3) — so seed evolution is non-retroactive to historical signatures.
    • Ordered chain, not a set. M commits to the ordered chain (a det-CBOR array of per-hop commitments, equivalently a Merkle root), so reorder/drop/duplicate is detected.
    • Key authority is self-contained: verification is decidable from the Bundle alone. Because OMIR is an at-rest format with a closed-world invariant, a verified result MUST be a pure function of the bundle bytes. M therefore inlines the full public key (JWK) used and MAY inline a short Agent-signed key-authorization assertion binding keyId as of the signing time (at); verification = signature valid over M + key-binding valid + signing time (at) within validity. Out-of-band DID/HTTPS key resolution is demoted to an OPTIONAL Federated-level enhancement (it belongs with Theme I, which already gates open-world). A sameAs merge MUST NOT rewrite a signed Agent id. Lawful key rotation/revocation does not retroactively flip historical attestations.
    • Redaction is cryptographically honest — and the security claim is stated honestly. For each redactable field M contains a commitment = H(field-bytes ∥ per-field-salt) (stored in the Phase-4 redactionCommitments[]) and MUST NOT contain the plaintext. Redaction deletes only the plaintext and leaves the commitment, so the signature still verifies; absent plaintext + intact signature-covered commitment = “lawfully redacted, chain intact,” NOT tampered. Because the salt is retained in the at-rest document, it provides ZERO brute-force resistance for low-entropy fields — the honest property is unlinkability (commitments are non-correlatable across resources/documents), not brute-force resistance. For genuine brute-force resistance a producer MUST use a high-entropy per-field nonce that is itself deleted at redaction time (accepting that such fields can never be re-verified against plaintext) or explicit out-of-band salt escrow; the chosen model is recorded. The impossible “tombstone bearing the same hash” phrasing is dropped: the tombstone bears a redacted:true marker; the sibling commitment carries the hash.
    • Erasure dominates the trust pillar — the conflict is acknowledged, not legislated away. When a data subject’s erasure is legally compelled and the subject is the signing Agent, retaining id + key-commitment + an attributing signature can itself be unlawful personal data. So: Agent identity enters M via the erasable salted commitment (above), not a raw resolvable ref; erasing the Agent’s PII deletes the plaintext while the commitment + signature survive, and closed-world is preserved by a placeholder tombstone carrying that commitment. The earlier “MUST NOT sign over a placeholder Agent reference” is relaxed to “MUST NOT sign over a placeholder that carries no key-commitment” — i.e. sign over the commitment, not the resolvable id. When erasure nonetheless forces signature invalidation, the verifier reports the record as unverifiable-by-erasure (a defined sub-outcome), NOT tampered. The hard privacy invariant is not subordinated to the trust pillar.
    • Encoding cannot silently break a signature — and the rule lives in the SPEC, not only the validator. A producer emitting .omirb for a bundle containing any provenance.attestation MUST use the CBOR sub-profile, never bincode (bincode is not self-describing and cannot reconstruct M). This narrows an existing release-published allowance, so it is an R2 normative edit to encodings.md §“.omirb binary profile” (encodings.md:84-101) under §3.1/§5.2: amend the “bincode permitted as an internal sub-profile” sentence (encodings.md:90) to “…permitted EXCEPT when the bundle carries any provenance.attestation, in which case the CBOR sub-profile is REQUIRED.” The validator Check::Attestation then enforces a rule the spec actually states. An attestation whose canonical form cannot be reconstructed is reported unverifiable — a third outcome distinct from verified/tampered. A cross-verifier known-answer test vector (canonical det-CBOR bytes + digest over the worked example) ships as a DoD artifact, gated on the P2 second verifier.
  • The PROV-O lift is a SEPARATE opt-in context that does NOT silently compose under @vocab. The base context.jsonld sets @vocab: https://omir.io/ns# (line 4) and resourceType → @type, so unmapped chain terms would otherwise mint accidental omir:role/omir:at predicates and the chain hops (which carry no resourceType) have nothing for @type to attach to. The https://omir.io/spec/R2/prov-context.jsonld therefore: (a) re-declares the provenance term, dropping the inherited @type:@id (context.jsonld:144) and declaring it an embedded node with typed interior chain/agent/derivedFrom/activity terms; (b) sets @vocab: null within the chain/attestation sub-context (or gives every chain/attestation property an explicit @id) so unmapped terms are DROPPED, not coerced to omir:; (c) attaches PROV to the bearing resource via explicit PROV relations whose subject is the omir:MemoryRecord node (the resource keeps its single omir: @type and gains PROV edges) rather than giving it a conflicting second @type of prov:Entity. The disjointness assertion is narrowed: “OMIR Entity-as-subject-matter is NOT prov:Entity; any OMIR resource appearing as a provenance derivation operand lifts to prov:Entity for the derivation graph” (publish that, not a blanket owl:disjointWith that would make a wasDerivedFrom Episode triple ill-formed or the graph inconsistent). A normative omir-role → prov: predicate table accompanies the chain. DoD: a worked RDF-output test vector proving two implementers produce identical triples under base-only and base+prov, with no spurious coercion artifacts.

  • Redaction mechanics (Phase 4) are concrete. A redacted resource retains resourceType/id and all reference targets (graph stays valid per CR-5), sets content to a defined sentinel ("[redacted]" + redacted:true in the governance block) while its signed commitment sibling is preserved, and MUST NOT be removed while anything references it. Whole-resource deletion uses a distinct tombstone (a present, conformant resource of the right type). A redaction round-trip example ships with the phase.

  • Vendor-neutral by construction. The worked example uses placeholder agents (Agent/agent-a) and a placeholder issuer. Any Veld-specific signing convention lives in a vendor extension under veld.dev, never in the normative core example.
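Two of the checks described in this list lend themselves to a short sketch: role/operand validation per hop, and the SHOULD-level consistency check on aggregateCredibility. The rule table is taken from the role/operand bullet; the 0.01 tolerance and the function names are assumptions of this sketch, since the text only says "rounding-tolerant".

```python
import math
from typing import Any, Dict, List, Set, Tuple

# role -> (operands that MUST be present, operands that MUST NOT be present)
RULES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "wasAttributedTo": ({"agent"}, {"derivedFrom"}),
    "wasDerivedFrom": ({"derivedFrom"}, {"agent"}),
    "wasGeneratedBy": ({"activity"}, {"agent"}),
}
# Activity-subject verbs, reserved until a first-class Activity referent exists.
RESERVED = {"used", "wasAssociatedWith", "wasInformedBy"}


def check_hop(hop: Dict[str, Any]) -> List[str]:
    """Return role/operand problems for one chain hop (empty list when valid)."""
    role = hop.get("role")
    if role in RESERVED:
        return [f"{role} is reserved in R2"]
    if role not in RULES:
        return [f"unknown role {role!r}"]
    required, forbidden = RULES[role]
    problems = [f"{role} MUST carry {k}" for k in sorted(required) if k not in hop]
    problems += [f"{role} MUST NOT carry {k}" for k in sorted(forbidden) if k in hop]
    return problems


def aggregate_consistent(provenance: Dict[str, Any], tol: float = 0.01) -> bool:
    """Warn-level check: does aggregateCredibility.value agree with its model?"""
    agg = provenance.get("aggregateCredibility")
    hops = [h["hopCredibility"] for h in provenance.get("chain", [])
            if "hopCredibility" in h]
    if agg is None or not hops:
        return True  # nothing to compare
    if agg["model"] == "product":
        expected = math.prod(hops)
    elif agg["model"] == "min":
        expected = min(hops)
    else:
        return True  # unknown model: this sketch does not judge it
    return abs(agg["value"] - expected) <= tol  # tol is an assumed tolerance
```

Applied to the worked example, the product of the three hop values is 0.95 × 0.90 × 0.88 = 0.7524, which is within the assumed tolerance of the stated 0.75.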

Overlook — the sequence (lens 1)

| Phase | Themes | Release · OMM | Breaking? | Unlocks |
| --- | --- | --- | --- | --- |
| 1a / 1b | A vocabularies | R2 · OMM-1 | Union = type-change; R2, RFC-gated. Vocab registry (move 1) ships R1.x-additive; 1b binding-engine off the critical path | Every non-coding domain can name concepts |
| 2 | B Agent + C identity | R2 · OMM-0/0 | New resource + Reference widening; R2, RFC-gated, non-invalidating to R1 bundles. C inert single-party → OMM-0 | Agents to attribute to; cross-store merge (needs 2nd store) |
| 3 | E PROV (E1-chain on path; attestation parallel) | R2 · E1-chain OMM-1 / attestation OMM-0 | New optional fields (R2 line) | Verifiable trust across vendors (at P2) |
| 4 | F + D governance, modality | R1.x where additive¹ | New optional fields | Regulated & multimodal adoption |
| 5 | G J K + H, I | R1.x additive / R2+ structural¹ | I’s CR-5 carve-out → R2+; G/K/H/F/D and J’s typed-channel + deprecation markers additive → R1.x¹ | Structured, multilingual, federated memory |

¹ Genuinely §5.1-additive themes ship as R1.x increments as they land: K, G, H, F, D, the non-Reference C fields only (identifier[]/aliases[], never authoredBy/owner/Agent-sameAs), and J’s typed channel + deprecation markers, since §6.4 makes a deprecation marker a warn-only additive change. The R2 boundary is reserved strictly for the actually non-additive items: the CodeableConcept union (type change), the Agent resource + Reference widening (§5.2/§3.1), the eventual R3 REMOVAL of J’s deprecated string maps, and I’s CR-5 carve-out. Holding a purely-additive optional field for a major release works against the cheap-adoption invariant. Caveat from P1: “R1.x-additive” means “additive to the data model and deprecation-policy-clean,” NOT “forward-readable by an already-shipped same-major closed-schema validator”: closed schemas reject undeclared new core fields, so intra-major novelty rides extension[] until a minor schema bump admits it.

Overlook — dependencies (lens 2)

A (vocabularies) ──► everything (domains can finally name their own concepts)
A ──► J (typed values reuse CodeableConcept via the shared TypedValue def; NOT via Extension)
A ──► E1-chain (chain over prior resources; needs no Agent)
B (Agent) ──► E2 (agent-attribution hops attribute memory to Agents) ──► (only) the agent case
B (Reference widening) ──► I (the ExternalReference $def extends, never relaxes, the widened Reference)
C (identifier/sameAs) ──► I (input to how external refs are expressed; not the gate)
I (Federated CR-5 carve-out) ──► attested cross-store E2 (a real Agent or an external ref)
D, F ──► E (close the attestation inclusion allow-list; F's classification enters M)
E (attestation, redaction-aware) ──► F (signed retention/consent; redaction preserves the commitment)
P2 (a second implementer / verifier) ──► OMM-2 anywhere; ──► attestation interop & E maturity at all

The critical path is A → E1-chain, a pure additive structure reviewable by a second party; the attestation subsystem and E2 are a parallel track gated on P2 (cross-verification is their only test). C runs in parallel after A per the directive, but its value (cross-system merge) and I’s mechanism are a matched pair that also need P2. Maturity floor rule: a dependent feature’s OMM is min(grade(its dependencies), grade(its own mechanism), gate(P2 where interop is the test)) — while Agent is OMM-0, E2 and any agent-signing attestation are at most OMM-0; J caps at A’s grade; I and C at single-party OMM-0 until P2. I’s true gate is the CR-5 carve-out RFC + the ExternalReference $def, not C.
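The maturity floor rule can be sketched as a small recursive minimum. This is an illustrative sketch only: the function name `effective_omm`, the grade values, and the dictionary shapes are assumptions made for the example, and it assumes the dependency map passed in is acyclic.

```python
def effective_omm(feature, own, deps, interop_gated, p2_met, single_party_cap=0):
    """Effective grade = min(dependency grades, own mechanism grade, P2 gate).

    own:           {feature: grade of its own mechanism}
    deps:          {feature: [features it depends on]}  (assumed acyclic)
    interop_gated: features whose only test is cross-party interop (P2)
    """
    memo = {}

    def grade(name):
        if name not in memo:
            candidates = [own[name]]
            candidates += [grade(dep) for dep in deps.get(name, [])]
            # Until a second implementer exists, interop-gated features are
            # held at the single-party cap regardless of their own grade.
            if name in interop_gated and not p2_met:
                candidates.append(single_party_cap)
            memo[name] = min(candidates)
        return memo[name]

    return grade(feature)


# Illustrative grades, not the ballot record.
own = {"A": 1, "Agent": 0, "E2": 1, "J": 2, "C": 1}
deps = {"E2": ["Agent"], "J": ["A"]}
print(effective_omm("E2", own, deps, {"C"}, p2_met=False))
print(effective_omm("J", own, deps, {"C"}, p2_met=False))
```

With these inputs E2 is held at Agent's grade and J at A's grade, which is the capping behaviour the rule describes.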

Breakers — adversarial stress-test (3 passes)

Pass 1 — semantic collisions & compatibility.

  • PROV Entity vs OMIR Entity, the base context, and term collisions. Mitigation: distinct chain field names (derivedFrom/hopCredibility, never the globally-bound from/credibility); a separate opt-in PROV context that re-declares provenance, suppresses @vocab inside the subtree, attaches PROV via explicit relations on the omir: node (not a second @type), and publishes the narrowed disjointness (“subject-matter Entity ≠ prov:Entity; derivation operands DO lift to prov:Entity”). The published context + RDF golden-file vector is what makes the lift deterministic.
  • CodeableConcept softens validation. Mitigation: the widened branch is the existing enum (not free type:string), and the seed/object mapping is made structurally bijective by excluding the field’s own seed-system URL from the object branch — so a seed value is expressible only as the bare string at the Core schema layer, with no Phase-1b/honor-system dependency. Phase 1b adds the machine-readable binding construct + net-new validator subsystem for closed-vocabulary enforcement.
  • Vocabulary fragmentation — including the field that actually fragments. Mitigation: WG-published seed code systems (release-governed) + a public registry (add = additive PR; retire = RFC) + a mandatory text fallback. relationType is included in A — the one already-open, highest-fragmentation field — so the namespace discipline lands where collisions actually happen. Domain vocabularies live under non-omir.io URLs.

Pass 2 — trust & governance hard parts.

  • Attestation brittle under re-serialization / across encodings. Mitigation: one canonical form SHA-256(det-CBOR(M)) over a versioned inclusion allow-list, JSON signers build identical M, numbers normalized to shortest-round-trip decimal pre-CBOR (floats forbidden), CodeableConcept folded bijectively, dtype-pinned binary forms barred from signed commitments (and embeddings excluded by default), forbid bincode under attestation (encodings.md edit + validator enforcement), ship a KAT vector — gated on P2.
  • Retention vs immutable provenance; redaction vs signatures; erasure of a signing Agent. Mitigation: signed subset holds salted per-field commitments, not plaintext (with an honest unlinkability-not-brute-force claim and an erasable-nonce option for real resistance); redaction deletes only plaintext; erasure dominates — Agent identity enters via an erasable commitment so a compelled deletion yields unverifiable-by-erasure, never tampered, and never a dangling ref.
  • Key authority across rotation/merge/redaction — and the closed-world invariant. Mitigation: inline the key + key-binding into M so verified is decidable from the bundle bytes alone; demote out-of-band DID/HTTPS resolution to Federated; bind to signing-time at; never rewrite a signed Agent id on merge.
  • Classification across a trust boundary. Mitigation: classification is declarative; enforcement is consumer policy; a profile may require the core governance block (and the legacy pii-class extension is deprecated with a dual-accept window + value-shape mapping).
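The salted per-field commitment idea from Pass 2 can be illustrated roughly as follows. This is a sketch under stated assumptions, not the attestation construction itself: the record layout and every function name are hypothetical, and sorted-key JSON stands in for the det-CBOR canonical form the mitigation actually names.

```python
import hashlib
import hmac
import json
import os

NONCE_BYTES = 16  # fixed length keeps nonce || value unambiguous


def commit(value, nonce):
    """Salted commitment to one field value."""
    return hashlib.sha256(nonce + value.encode("utf-8")).hexdigest()


def seal(fields):
    """Build a record whose signed subset holds commitments, not plaintext."""
    nonces = {name: os.urandom(NONCE_BYTES) for name in fields}
    return {
        "plaintext": dict(fields),
        "nonces": nonces,
        "commitments": {n: commit(v, nonces[n]) for n, v in fields.items()},
    }


def signed_digest(record):
    """Digest over commitments only (JSON here is a stand-in for det-CBOR)."""
    canonical = json.dumps(record["commitments"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def redact(record, name):
    """Redaction deletes only the plaintext; the commitment stays."""
    record["plaintext"].pop(name, None)


def erase(record, name):
    """Erasure also drops the nonce, so the value can no longer be checked."""
    record["plaintext"].pop(name, None)
    record["nonces"].pop(name, None)


def check(record, name):
    value = record["plaintext"].get(name)
    nonce = record["nonces"].get(name)
    if value is None or nonce is None:
        return "unverifiable"  # by redaction or erasure, never "tampered"
    ok = hmac.compare_digest(commit(value, nonce), record["commitments"][name])
    return "verified" if ok else "tampered"
```

In this sketch a redacted field whose nonce survives can still be confirmed by anyone who guesses the value, which matches the unlinkability-not-brute-force caveat; dropping the nonce is the erasable-nonce option.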

Pass 3 — adoption & process risks.

  • The whole plan is gated on a second implementer — and nobody is tasked with producing one. Mitigation: P2 makes the gate a named, dated deliverable, not a deferred “later.” Milestone zero is reframed: a NON-Veld party consumes a published minimal Bundle unchanged and publishes a conformance statement (Conformance §“Declaring conformance”). A is the cheapest additive spec change, sequenced first to open the namespace (move 1, additive, day-one). The independence test (P0a) — technical/economic, recusal-bounded — prevents “Veld twice.” Co-design C and E with the first external adopter, named on the RFC record.
  • Backwards compatibility is not a slogan. Adding any new core field is a new-release change because R1/R2 schemas are closed (additionalProperties:false). Honest guarantee: “R<n> bundles stay valid forever” + “R<n+1> readers accept R<n>,” not “older readers tolerate newer.” A published R1 reader rejects any R2 bundle at the envelope (E300) by design. Until a field is promoted, data rides extension[] under a non-omir.io URL.
  • Scope creep / 80-20 violation — measured by the SPEC SURFACE a second implementer must read, not the required-field count. Mitigation: the Adopter floor (below) defines a normatively-labeled Core-R2 mandatory-to-understand tier (the four+Agent core resources + the CodeableConcept text-fallback + ignore-unknown rules) and an Optional-capability tier (B-semantics, D, E, F, G, H, I, J, K) a Core-R2 consumer MAY ignore wholesale without reading their specs. WG effort is tied to the gate: no phase past Phase 2 begins schema work until a candidate second implementer has cleared the Phase-1a floor (P2). “Adopt all” stays the destination without spending the cheap-adoption budget before the gate is met. Extension.valueJson stays arbitrary JSON. OMM is graded per resource TYPE; new volatile fields on a type do not inherit or drag the type’s grade — but the field-level signal must be machine-readable, not buried in prose: a new field on a mature type either rides extension[] until it earns promotion or carries an x-omir-maturity annotation the validator surfaces as an INFO finding when a bundle uses a below-type-grade field (prose description is governed as editorial §5.1 and would let stability claims escape the OMM ballot gate — so it is not the vehicle).
  • Velocity vs honesty. Mitigation: the OMM rule — no OMM-2 without ≥2 independent implementations — is the brake, real only once P0 reconciles the three conflicting definitions and re-grades the four overclaimed R1 types. A falling trigger keeps the gate visible: any feature holding OMM-1 across a full release cycle (or fixed calendar window) with no recorded independent implementation MUST be flagged “OMM-1 (single-party; no independent implementation as of <date>)” and becomes a candidate for §4.2 TSC review — so a standard stuck single-party forever is not indistinguishable in the grade table from a healthy one.

Adopter conformance floor

So a second implementer can scope the work, the Core-R2 floor is explicit, split into a mandatory-to-understand tier and an optional-capability tier.

Mandatory to understand (the whole of Core-R2 reading): the five core resources (MemoryRecord, Entity, Relationship, Episode, Agent), the CodeableConcept union + text fallback, and the ignore-unknown rules. A Core-R2 consumer:

  • MUST parse the CodeableConcept union and read text; MAY ignore system/code it does not know.
  • MUST ignore the semantics of unknown Agents, chain hops, governance blocks, embeddings, typed attributes, and i18n variants without rejecting (extending “ignore unknown extensions” to new optional core blocks).
  • MUST satisfy reference integrity as a document property for Bundles it AUTHORS: a Bundle a Core-R2 implementation authors MUST have a closed chain (every chain[].agent / chain[].derivedFrom resolves in-Bundle), exactly as CR-5 is unconditional. When RELAYING a Bundle it did not author, it MUST pass the chain through unmodified (lossless pass-through) and MUST NOT introduce a new dangling ref. A trust-agnostic relay is therefore never forced to mint placeholder Agents into a chain it cannot interpret, and the Phase-3 “producer MUST emit a closed chain” wording means author, not relay.
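The closed-chain requirement for authored Bundles amounts to a reference walk. The sketch below assumes a simplified Bundle shape (a flat `entry` list of resources, each hop carrying single string references in `agent` and `derivedFrom`); that shape and the function name are assumptions for illustration, not the normative layout.

```python
def dangling_chain_refs(bundle):
    """Return (owner, field, ref) for every chain reference that does not
    resolve to a resource inside the same Bundle."""
    present = {f'{r["resourceType"]}/{r["id"]}' for r in bundle["entry"]}
    dangling = []
    for resource in bundle["entry"]:
        owner = f'{resource["resourceType"]}/{resource["id"]}'
        for hop in resource.get("chain", []):
            for field in ("agent", "derivedFrom"):
                ref = hop.get(field)
                if ref is not None and ref not in present:
                    dangling.append((owner, field, ref))
    return dangling


bundle = {
    "entry": [
        {"resourceType": "Agent", "id": "agent-a"},
        {"resourceType": "MemoryRecord", "id": "m1"},
        {
            "resourceType": "MemoryRecord",
            "id": "m2",
            "chain": [{"agent": "Agent/agent-a", "derivedFrom": "MemoryRecord/m1"}],
        },
    ]
}
print(dangling_chain_refs(bundle))  # an authored Bundle should yield no entries
```

An author would refuse to emit a Bundle for which this list is non-empty; a relay would not run it as a gate, since it passes the chain through unmodified.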

Optional capability (may be ignored wholesale, specs unread): attestation verification, federation references, Agent/B semantics beyond ignore-and-preserve, D modality, F governance enforcement, G/H/J/K. A Core-R2 implementation is NOT required to verify attestations, resolve federation references, or implement any profile it does not claim.

A Core-R2 producer MUST emit the backward-compatible (bare-string) form where it has no coded value, and MUST place implementation-specific data in extension[] under a non-omir.io URL.
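As a rough illustration of the producer and consumer halves of the CodeableConcept floor, the sketch below emits the bare-string form when no coded value exists and reads `text` from either form. The flat `system`/`code`/`text` object layout and both function names are assumptions for the example.

```python
def emit_concept(text, system=None, code=None):
    """Producer side: bare string unless a coded value is actually available."""
    if system is None or code is None:
        return text
    return {"system": system, "code": code, "text": text}


def concept_text(value):
    """Consumer side: accept both branches of the union and read the text;
    system/code the consumer does not know are simply ignored."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and isinstance(value.get("text"), str):
        return value["text"]
    raise ValueError("not a CodeableConcept: expected a string or an object with 'text'")


print(emit_concept("topic_shift"))
print(concept_text({"system": "https://example.org/vocab", "code": "ts", "text": "topic shift"}))
```

The example URL is a placeholder; domain vocabularies would live under their own non-omir.io URLs.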

Definition of done (per phase)

A phase reaches its stated target maturity when, for each theme it lands: the schema and the version-aware reference validator support it — including that every new ResourceType/id reference field (and every closed-world bare-id field, new and pre-existing) is in the registry-driven reference walk and exercised by a dangling-reference invalid example in examples/invalid/. The non-regression clause is split by the phase that introduces the field, because a field cannot have a fixture before it exists:

  • Phase 2 DoD (the walker-refactor guard): a dangling-parentId fixture AND a dangling Relationship.sourceEpisode fixture MUST be added and MUST still be rejected, asserted in validator/tests/conformance.rs before the registry-walker refactor merges (parentId is the lib.rs:219-230 special case the refactor risks dropping; sourceEpisode was the silently-omitted pre-existing field).
  • Phase 3 DoD: dangling chain[].derivedFrom and chain[].agent fixtures, plus a wrong-shape chain[].agent (e.g. resolving to Episode/x) fixture proving the broadened set-membership E201 still rejects a non-Agent in the Agent slot.

For A: a forward-compat fixture PAIR, each with a single unambiguous expected code — (i) an R2-declared bundle with object-form CodeableConcept that the R1 validator rejects with E300 at the envelope (proves “R1 readers reject R2 by design”); and (ii) an R1-declared bundle that nonetheless contains an object-form CodeableConcept, which the R1 schema set rejects with E120 at per-entry dispatch (proves the legacy enum stays hard-validated). Plus the @context Turtle golden-file (the source-split audit) and the intention-filtering object-form fixture. For 1b: profile-constraint checking enforces code-system membership with passing/failing examples. For E: the cross-verifier known-answer attestation vector, the bincode-under-attestation failing example, the default-vs-omitted and 9-vs-9.0 digest-equality KATs, and the RDF golden-file — the interop DoD items gated on P2. At least one profile exercises each theme; docs + examples exist; an OMM grade is assigned honestly under the reconciled, re-graded P0 ladder.
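The forward-compat fixture pair for A can be pictured as two inputs to an R1-only validator, each producing exactly one code. The sketch below is hypothetical throughout: the envelope key `omirVersion`, the concept-bearing field `category`, and the function itself are invented for illustration and are not the reference validator.

```python
def r1_validate(bundle, concept_fields=("category",)):
    """Return the first error code an R1-only validator would raise, or None.

    The envelope check runs first, so an R2-declared bundle never reaches
    per-entry dispatch and yields a single unambiguous E300.
    """
    if bundle.get("omirVersion") != "R1":
        return "E300"  # rejected at the envelope
    for resource in bundle.get("entry", []):
        for field in concept_fields:
            # R1 schemas accept only the bare-string (enum) form here.
            if isinstance(resource.get(field), dict):
                return "E120"  # rejected at per-entry dispatch
    return None


object_form = {"system": "https://example.org/vocab", "code": "x", "text": "x"}
fixture_i = {"omirVersion": "R2", "entry": [{"category": object_form}]}
fixture_ii = {"omirVersion": "R1", "entry": [{"category": object_form}]}
print(r1_validate(fixture_i), r1_validate(fixture_ii))
```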

The independent-implementation requirement applies to claiming OMM-2+, not to completing a phase — but P2 binds at least one phase to a real second party. OMM-0/1 = the reference implementation alone; OMM-2 = ≥2 independent (per P0a). Phases 1–5 may ship at OMM-0/1 on the reference implementation alone, so the standard ships before a second implementer materializes — directly serving “cheap early adoption matters more than feature count.” But Phase 1a MUST NOT be declared done until a candidate independent implementer (named on the RFC/ballot record) has either (a) consumed an R2 CodeableConcept-union bundle and round-tripped it, or (b) recorded a public statement of intent — so the one existential invariant is a gate, not an owner-less “later.” Process preconditions (P-1, P0, P0a, P1) must land before the phases they gate — no phase may claim OMM-2 before the OMM reconciliation, no R2 traffic is testable before the validator is version-aware, and no ballot is callable before the founding TSC is seated.

Process

None of this is unilateral. Every change here is a candidate, to be proposed as an RFC, debated by the Working Group, and balloted (see CONTRIBUTING.md and GOVERNANCE.md). New resources and fields enter at OMM-0/1 and earn maturity through independent implementation — the same honesty rule the rest of the spec lives by. The fastest way to move any of these forward is the project’s stated existential need: a second implementer whose requirements turn one of these themes from a hypothesis into a ballot.

Efficiency & Information-Bearing Codes

Non-normative. This page is a design discussion, not part of the R1 conformance surface. Nothing here changes what makes a Bundle valid (Conformance). Every change below would enter through the RFC + ballot process, start low on the OMM (OMM-0/1), and earn its level through independent implementation. It is the efficiency-first companion to Toward a Global Standard (the interchange-first schema roadmap) and the forward-looking counterpart to memory_theory.md (the backward-looking divergence map).

Status: draft-schema landing (this iteration). The four R1.x-additive proposals are now applied to the draft schemas/ as optional fields, each annotated x-omir-maturity: 0. They are: EP-1 (InformationContent def + MemoryRecord.informationContent); EP-4 (Episode.boundaryStrength/boundaryReason, MemoryRecord.replayPriority, Interference def + MemoryRecord.interference); EP-5b (FamiliaritySketch def + Bundle.familiaritySketch); and EP-6c, the D5 edge-normalization fix (Relationship.reverseStrength/normalizedStrength/normalization + EdgeNormalization, with dependentRequired: normalizedStrength → normalization). These are non-breaking (no existing bundle becomes invalid; additionalProperties:false still holds because the fields are now declared), remain RFC-gated for ratification, and are reversible: until a TSC ballots them they are draft and OMM-0. The R2 / breaking proposals (EP-2/EP-3/EP-5a/EP-F on the Theme-D Embedding; EP-6a Chunk; EP-6b schemaType) are not applied; they widen the shared Reference pattern or Bundle.entry and stay candidates here.

Where Toward a Global Standard asks “what blocks interchange across domains?”, this page asks a different question: “what state would let an engine store less, search cheaper, and infer more — at the same fidelity?” The seven proposals below come from the efficient-/predictive-coding, fuzzy-trace, hippocampal-indexing, event-segmentation, temporal-context, and chunking lineages. Each is scored on three axes the interchange roadmap does not track: ⚡ watts (store/search less), 🎯 inference (signal-to-noise), and capability (what becomes expressible).

The one-paragraph thesis

These are not seven scattered fields. Four of the seven (EP-2, EP-3, EP-5a, EP-F) are variants of a single object — the algorithm-neutral Embedding representation that global-standard §D already proposes. The highest-leverage move is therefore a reframing the divergence map already named: extend Theme D from “neutral embeddings” to efficiency-bearing codes — an Embedding that can carry a Matryoshka prefix (rank coarse-to-fine), a sparse code (CPU inverted-index search), a reserved drift space (cheap temporal cues), and a VSA structural code (compositional ops). That fold is realized: those four fields now live in the single canonical §D Embedding definition; this page supplies their watts/inference rationale and does not re-specify the schema. The remaining three are: two information-theoretic scalars OMIR has no field for (EP-1 surprisal, EP-4 replay/interference), and one graph fix that makes spreading activation portable (EP-6c, the D5 edge-normalization remediation). Two proposals add genuinely new structure: an Episode event-boundary (EP-4) and a Chunk consolidation-product resource (EP-6a). One adds a producer-level metamemory primitive (EP-5b, the familiarity sketch).

Proposal index

| EP | Prescription (from the principle) | Concrete delta | Additivity | Vehicle · OMM | Theme | Divergence |
| --- | --- | --- | --- | --- | --- | --- |
| EP-1 | surprisal/novelty scalar + model-redundancy flag | new InformationContent def + MemoryRecord.informationContent | §5.1-additive | RFC-gated · R1.x · OMM-0 | new | none (new lever) |
| EP-2 | Matryoshka gist + offloadable verbatim | §D Embedding fields matryoshka/nestedDims/role + verbatim MediaReference (defined in §D) | additive to the D def | rides D · R2 · OMM-0 | D | D2 (partial) |
| EP-3 | sparse index layer (indices+values) | Embedding.sparse + Theme-I external-content pointer | additive to the D def | rides D · R2 (+ I · R2+) · OMM-0 | D + I | |
| EP-4 | boundary metadata + replay priority + interference | Episode.boundaryStrength/boundaryReason; MemoryRecord.replayPriority; new Interference def + field | §5.1-additive | RFC-gated · R1.x · OMM-0 | new | D10 (closes), D8/D4 |
| EP-5 | temporalContext drift vector + familiarity sketch | reserved Embedding space omir:temporal-context (D); new FamiliaritySketch def + Bundle.familiaritySketch | sketch §5.1-additive; tc rides D | sketch R1.x · OMM-0; tc rides D · R2 | D + new | |
| EP-6 | chunk(composedOf) + schema typing + edge-norm fix | new Chunk resource (Templates = reusable Chunks); MemoryRecord.schemaType (CodeableConcept); Relationship.reverseStrength/normalizedStrength/normalization + Check::GraphNormalization (E250/E251) | Chunk = §3.1/§5.2 breaking; edge-norm §5.1-additive; schemaType rides A | Chunk R2 · OMM-0 (§4 RFC); edge-norm R1.x · OMM-0; schemaType rides A · R2 | A + graph + new | D5 (closes), D8 |
| EP-F | VSA / HRR structural code (frontier) | Embedding space:"vsa" + structure tags; extension-first | extension (no RFC) → promotable | extension now; D · R2+ · OMM-0 later | D + .omirb profile | |

Routing convention (from GOVERNANCE §3.1/§5.1/§5.2): a new optional field is §5.1-additive (no existing bundle becomes invalid → ships as an R1.x increment) but still touches the normative surface, so it is RFC-gated (the RFC authorizes the schema edit; R1.x is the release lane). A change that widens the shared Reference pattern or the Bundle.entry oneOf (a new resource type) is §5.2-breaking → R2, full RFC per CONTRIBUTING §4. Adding a new, unreferenced $defs member to common.schema.json is itself purely additive; it is the field that references it that carries the additivity class. New fields ride at OMM-0 under the per-field x-omir-maturity signal (global-standard, Pass-3), never inheriting their host type’s grade.


EP-1 — Information content (don’t store what the model can regenerate)

Principle 1 (efficient coding · predictive coding). “A surprisal/novelty scalar at encoding (information content vs. the producer’s model) and a model-redundancy flag (reconstructable from the base model?). This is not importance (value) or confidence (belief) — it’s information content, which OMIR has no field for.”

The orthogonality is the whole point: importance is value, confidence is belief, this is information (−log p against a generative prior). A record the base model already knows is near-zero information regardless of how important or believed it is — and is the single biggest watt lever, because most “memories” are model-knowable and need not be persisted or searched at all (cf. Titans’ ‖∇loss/∇input‖ write-gate, EM-LLM’s Bayesian surprise).

New common.schema.json#/$defs member (purely additive):

"InformationContent": {
  "type": "object",
  "description": "Information content of a record against a generative prior — what is NOVEL, distinct from what is valued (importance) or believed (confidence). Operationalizes efficient/predictive coding: store the surprising, regenerate the predictable.",
  "properties": {
    "novelty": {
      "$ref": "#/$defs/UnitInterval",
      "description": "Producer-relative normalized novelty in [0,1]: how surprising this record was against the producer's model at encoding. Comparable WITHIN one producer only (Theory & Scope: stored scalars are producer-relative)."
    },
    "surprisalBits": {
      "type": "number",
      "minimum": 0,
      "description": "Raw Shannon surprisal -log2 p(x) in bits against the model named in 'model'. Unbounded and model-relative; the reproducible quantity when 'model' is shared."
    },
    "model": {
      "type": "string",
      "description": "Identifier of the generative model whose p(x) defines 'novelty'/'surprisalBits', e.g. 'minilm-l6-v2' or a base-LLM id."
    },
    "reconstructable": {
      "type": "boolean",
      "description": "True if this content is regenerable from the model named in 'reconstructableBy' and is therefore a candidate to DROP and regenerate rather than store/search."
    },
    "reconstructableBy": {
      "type": "string",
      "description": "Identifier of the model that can regenerate this content when 'reconstructable' is true."
    }
  },
  "additionalProperties": false
}

MemoryRecord additive field: "informationContent": { "$ref": "common.schema.json#/$defs/InformationContent" }.

Classification. New optional field on MemoryRecord; no enum/required/Reference change → §5.1-additive, ships as R1.x, RFC-gated for the normative meaning. OMM-0. novelty is a UnitInterval (CR-7 conformant) and a producer-relative snapshot → on landing it joins the Theory & Scope producer-relative list; surprisalBits is the model-relative reproducible form.

Hits. ⚡⚡⚡ watts (persist/search only the genuinely novel) · 🎯 inference (signal-to-noise) · capability (compression). Open questions: does reconstructable:true license a consumer to omit content entirely (ties EP-2 verbatim-eviction)? bits vs nats — pin one (surprisalBits, log2, here).
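To make the surprisalBits definition concrete, here is a minimal sketch that computes Shannon surprisal in bits and fills an InformationContent-shaped object. The mapping from bits to the [0,1] novelty value (a clamp against `cap_bits`) is purely an assumption of this example; the proposal leaves the normalization to the producer.

```python
import math


def surprisal_bits(p):
    """Shannon surprisal -log2 p(x) in bits, for a model probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return max(0.0, -math.log2(p))  # max() avoids returning -0.0 when p == 1


def information_content(p, model, cap_bits=32.0):
    """Build an InformationContent-shaped dict.

    novelty is a hypothetical producer-relative normalization:
    surprisal clamped to [0, 1] against cap_bits.
    """
    bits = surprisal_bits(p)
    return {
        "novelty": min(1.0, bits / cap_bits),
        "surprisalBits": bits,
        "model": model,
    }


print(information_content(0.25, "minilm-l6-v2", cap_bits=8.0))
```

A record the model predicts with probability near 1 lands near zero bits, which is the case the principle flags as a candidate to drop rather than store.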


EP-2 — Matryoshka gist + offloadable verbatim (store the gist, fetch the words)

Principle 2 (fuzzy-trace · rate-distortion). “A dual-trace model: a compact, durable, anchorable gist code (cheap to rank, slow decay) + an optional, fast-decaying, offloadable verbatim payload (via a MediaReference). Crucially, mandate the gist be prefix-truncatable (Matryoshka-style) so coarse-to-fine works across producers.”

This is the efficiency-bearing-codes extension of Theme D, stated literally. Matryoshka representation learning (2205.13147) is the exact engineering analog: rank millions on a truncated prefix, re-rank survivors on more dims. The dual trace maps the gist to a durable anchorable Embedding and the verbatim surface to an offloaded MediaReference (Theme D) with its own faster decay.

Schema: the matryoshka, nestedDims, and role fields of the canonical §D Embedding, which are defined there and not duplicated here. matryoshka + nestedDims give the prefix-truncatable gist; role splits the durable, anchorable gist from the offloadable verbatim (carried out-of-line via ref, a MediaReference).

Decay split. The gist reuses the existing Decay block (anchored:true, long halfLifeHours); the verbatim trace carries a short halfLifeHours and is the first thing dropped under memory pressure — graceful degradation that keeps the rankable gist.

Classification. Additive to the (not-yet-landed) Theme-D Embedding def → rides Theme D · R2 · OMM-0. No Reference-pattern impact (MediaReference is {uri,contentType,hash}, not a typed Reference). Closes part of D2 (storage/retrieval-strength duality): the durable gist is the storage-strength-bearing trace, the evictable verbatim is retrieval-strength-bearing.

Hits. ⚡⚡ watts (coarse-to-fine + verbatim eviction) · 🎯 inference (gist generalizes) · capability (summary recall). Open question: make nestedDims ascending + power-of-two by convention so two producers’ prefixes align for cross-store shortlisting.
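The coarse-to-fine ranking that a prefix-truncatable gist enables can be sketched in a few lines: score everything on a short prefix, then re-score only the shortlist on the full vector. The function names and the toy vectors are illustrative assumptions; a real engine would use an ANN index and normalized embeddings.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine(query, corpus, prefix_dims, shortlist, top_k):
    """corpus: list of (record_id, full_vector) pairs.

    Stage 1 ranks every record on the first prefix_dims dimensions only.
    Stage 2 re-ranks the surviving shortlist on the full vectors.
    """
    q_prefix = query[:prefix_dims]
    coarse = sorted(
        corpus,
        key=lambda item: dot(q_prefix, item[1][:prefix_dims]),
        reverse=True,
    )[:shortlist]
    fine = sorted(coarse, key=lambda item: dot(query, item[1]), reverse=True)
    return [record_id for record_id, _ in fine[:top_k]]


corpus = [
    ("a", [0.9, 0.0, 0.0, 0.9]),
    ("b", [1.0, 0.0, 0.0, -1.0]),
    ("c", [0.0, 1.0, 0.0, 0.0]),
]
print(coarse_to_fine([1.0, 0.0, 0.0, 1.0], corpus, prefix_dims=2, shortlist=2, top_k=1))
```

Here record "b" wins the cheap prefix pass but loses the full re-rank to "a", which is why the shortlist has to be wider than the final result set.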


EP-3 — Sparse index layer (index, don’t scan)

Principle 3 (hippocampal indexing · SDM · sparse codes). “Separate a cheap index layer (sparse keys/pointers) from content (offloadable); standardize a sparse-code representation (indices+values). The index may point at content living elsewhere (ties to federation, §I).”

A sparse code turns ANN-on-GPU into an exact inverted-index lookup on CPU (≈ an order of magnitude cheaper), and modern-Hopfield / SDM gives one-step associative completion. The “content lives elsewhere” half is exactly global-standard §I’s federation: the index entry resolves to remote content via the ExternalReference $def (never the closed-world bare ResourceType/id).

Schema: the sparse field ({indices, values}) of the canonical §D Embedding, mutually exclusive with the dense vector — defined there, not duplicated here.

Classification. Additive to the Theme-D Embedding def → rides D · R2 · OMM-0; the pointer-to-remote-content rides Theme I · R2+ (its true gate is I’s CR-5 carve-out + ExternalReference $def, not this field). The index/content split composes with EP-2’s gist/verbatim and EP-1’s reconstructable drop.

Hits. ⚡⚡⚡ watts (CPU sparse search + lazy content load) · 🎯 inference (DG-style pattern-separation, anti-interference) · capability (recall from fragments). Open question: indices length cap / a density hint so a consumer can pick inverted-index vs dense path.
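A sparse {indices, values} code makes search an inverted-index lookup rather than a scan. The following is a minimal CPU-only sketch; the record layout and function names are assumptions for illustration, and only records sharing at least one active index with the query are ever touched.

```python
from collections import defaultdict


def build_inverted_index(records):
    """records: {record_id: {"indices": [...], "values": [...]}}"""
    index = defaultdict(list)
    for record_id, sparse in records.items():
        for i, v in zip(sparse["indices"], sparse["values"]):
            index[i].append((record_id, v))
    return index


def search(index, query, top_k):
    """Exact sparse dot product, computed posting list by posting list."""
    scores = defaultdict(float)
    for i, qv in zip(query["indices"], query["values"]):
        for record_id, v in index.get(i, ()):
            scores[record_id] += qv * v
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


records = {
    "r1": {"indices": [3, 17], "values": [1.0, 0.5]},
    "r2": {"indices": [17, 42], "values": [2.0, 1.0]},
    "r3": {"indices": [99], "values": [1.0]},
}
index = build_inverted_index(records)
print(search(index, {"indices": [17], "values": [1.0]}, top_k=2))
```

Record "r3" shares no active index with the query, so it is never scored at all; that skipped work is where the saving comes from.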


EP-4 — Boundary, replay priority, interference (encode by surprise, prune by interference)

Principle 4 (event segmentation · prioritized replay · rational forgetting). “Event-boundary metadata on episodes (location + boundary strength + why); a replay/consolidation priority (surprise × value × recency); an eviction/interference signal (need-probability or local embedding density).”

Three deltas, the third of which closes divergence D10 — the doc’s explicit reframe of D10 from “missing fidelity” (a bounded non-goal) to “missing the rational-eviction efficiency mechanism” (a load-bearing watt lever: a smaller hot index makes every query cheaper).

(a) Episode additive fields — the EM-LLM / Event-Segmentation boundary:

"boundaryStrength": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Prediction-error / Bayesian-surprise magnitude at this episode's boundary. Where memory is structured; a segmentation cue for downstream consolidation."
},
"boundaryReason": {
  "type": "string",
  "description": "Open vocabulary: why the cut was made (e.g. 'topic_shift', 'temporal_gap', 'actor_change'). Promotable to a CodeableConcept under Theme A."
}

(b) MemoryRecord.replayPriority — the prioritized-replay (Schaul 2015) weight:

"replayPriority": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Producer-relative consolidation/replay priority (surprise x value x recency snapshot). Prioritizes amortized OFFLINE re-embedding/consolidation/decay to idle time. Snapshot as of meta.lastUpdated; producer-relative (Theory & Scope)."
}

(c) New Interference def + MemoryRecord.interference field — the D10 remediation:

"Interference": {
  "type": "object",
  "description": "Rational-eviction / interference signal: why a record competes for retrieval and how prunable it is. Closes divergence D10 — forgetting as interference (retroactive/proactive competition among similar traces), not pure time-decay.",
  "properties": {
    "needProbability": {
      "$ref": "#/$defs/UnitInterval",
      "description": "Anderson need-probability: estimated P(this record is needed soon). The rational eviction key — evict lowest need, NOT oldest (LRU). Distinct from decay (time) and importance (value)."
    },
    "localDensity": {
      "type": "number",
      "minimum": 0,
      "description": "Nearest-neighbour crowding in embedding space. High density = high interference from similar traces (the dominant real forgetting mechanism in similarity-based stores)."
    },
    "competesWith": {
      "type": "array",
      "items": { "$ref": "#/$defs/Reference" },
      "description": "MemoryRecord references this record competes with — similar traces that degrade each other's retrievability. Reuses the existing Reference pattern (MemoryRecord already in it); no widening."
    }
  },
  "additionalProperties": false
}

Classification. All §5.1-additive (new optional fields; competesWith reuses the existing Reference pattern → no widening) → R1.x, RFC-gated, OMM-0. On landing, replayPriority joins the producer-relative + snapshot enumerations in Theory & Scope. Closes D10; advances D8/D4 (the boundary is the consolidation-as-process / continuous-segmentation half).

Hits. ⚡⚡⚡ watts (encode less, small hot set, amortized offline replay) · 🎯 inference (clean event retrieval, fewer distractors) · capability (continual learning without catastrophic forgetting). Open question: is competesWith producer-authored or derivable from localDensity at import — carry both and let the consumer choose.
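The replay weight and the need-based eviction key can be illustrated together. This sketch assumes the three replay factors are already unit-interval scalars and that each record carries an `interference.needProbability`; the function names and the record shape are hypothetical.

```python
def replay_priority(surprise, value, recency):
    """Product of three unit-interval factors (surprise x value x recency)."""
    for factor in (surprise, value, recency):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("factors must lie in [0, 1]")
    return surprise * value * recency


def evict_to_capacity(records, capacity):
    """Keep the `capacity` records with the highest need probability.

    Returns (kept, evicted). Age plays no role: the lowest need goes first,
    which is the difference from LRU.
    """
    ranked = sorted(
        records,
        key=lambda r: r["interference"]["needProbability"],
        reverse=True,
    )
    return ranked[:capacity], ranked[capacity:]


records = [
    {"id": "old-but-needed", "interference": {"needProbability": 0.9}},
    {"id": "new-but-idle", "interference": {"needProbability": 0.1}},
    {"id": "middling", "interference": {"needProbability": 0.5}},
]
kept, evicted = evict_to_capacity(records, capacity=2)
print([r["id"] for r in kept], [r["id"] for r in evicted])
```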


EP-5 — Temporal-context drift + familiarity sketch (cheap cues, skip retrieval)

Principle 5 (Temporal Context Model · feeling-of-knowing). “A per-record/episode temporalContext drift vector (timestamps give recency but not cheap similarity-based contiguity); a producer-level familiarity sketch over entities/cues.”

(a) temporalContext is a vector → it rides the canonical §D Embedding under the reserved space "space": "omir:temporal-context" (registered there). Timestamps already give recency; the drift vector gives cheap contiguity (recall the neighbours-in-time of a hit) as a dot product, no scan (EM-LLM adds exactly this temporal-contiguity stage on top of similarity). No new field — a reserved space (documented in §D) + the contiguity-retrieval semantics. Rides Theme D · R2 · OMM-0.

(b) familiarity sketch is genuinely new: producer-level aggregate state OMIR has no home for. It answers “do I plausibly hold this?” before paying for retrieval — and in production the skipped retrievals are often the dominant cost. A negative is authoritative (skip RAG entirely); a positive is probabilistic. It lives at the Bundle level (Bundle has no OMM of its own; it tracks its resources):

"FamiliaritySketch": {
  "type": "object",
  "description": "Producer-level approximate-membership sketch over entities/cues, for metamemory gating: answer 'do I plausibly hold this?' before paying for retrieval (feeling-of-knowing). A negative is authoritative (skip retrieval); a positive is probabilistic. Complements confidence's in-weights-vs-retrieve gate.",
  "properties": {
    "kind":   { "enum": ["bloom", "count_min"], "description": "Sketch family." },
    "domain": { "enum": ["entity", "cue", "content"], "description": "What the sketch is built over." },
    "hashes": { "type": "integer", "minimum": 1, "description": "Number of hash functions k." },
    "bits":   { "type": "integer", "minimum": 1, "description": "Filter width m in bits (bloom) / table width (count-min)." },
    "data":   { "type": "string", "contentEncoding": "base64", "description": "Serialized filter bytes." }
  },
  "required": ["kind", "domain", "hashes", "bits", "data"],
  "additionalProperties": false
}

Bundle additive field: "familiaritySketch": { "$ref": "common.schema.json#/$defs/FamiliaritySketch" }.

Classification. temporalContext rides D · R2. The sketch is a new optional Bundle field → §5.1-additive · R1.x, RFC-gated, OMM-0. (A Bundle-level field, unlike a resource field, does not interact with any resource’s OMM grade.)

Hits. ⚡⚡⚡ watts (short-circuit/skip retrieval; cheap temporal cue) · 🎯 inference (contiguity recall) · capability (temporal-neighbourhood recall). Open question: sketch hash-function identity must be pinned (a named, versioned hash) or a consumer cannot test membership — carry a hashAlg field if EP-5b is balloted.
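A bloom-kind FamiliaritySketch and its membership test might look like the sketch below. The hashing scheme (SHA-256 over a seed-prefixed key, first 8 bytes, modulo the filter width) is an assumption made only for this example; as the open question notes, a real sketch needs a pinned, named hash before any consumer can test membership.

```python
import base64
import hashlib


def _positions(item, k, m):
    """k bit positions for an item (hypothetical hashing scheme)."""
    for seed in range(k):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % m


def build_sketch(items, k, m, domain="entity"):
    """Build a FamiliaritySketch-shaped dict holding a Bloom filter."""
    bits = bytearray((m + 7) // 8)
    for item in items:
        for pos in _positions(item, k, m):
            bits[pos // 8] |= 1 << (pos % 8)
    return {
        "kind": "bloom",
        "domain": domain,
        "hashes": k,
        "bits": m,
        "data": base64.b64encode(bytes(bits)).decode("ascii"),
    }


def plausibly_held(sketch, item):
    """False is authoritative (skip retrieval); True is only probabilistic."""
    bits = base64.b64decode(sketch["data"])
    return all(
        (bits[pos // 8] >> (pos % 8)) & 1
        for pos in _positions(item, sketch["hashes"], sketch["bits"])
    )


sketch = build_sketch(["Entity/alice", "Entity/bob"], k=3, m=256)
print(plausibly_held(sketch, "Entity/alice"))
```

Every inserted item tests positive by construction; a non-member usually tests negative but may collide, which is why only the negative answer can short-circuit retrieval.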


EP-6 — Chunks, schema typing, and portable spreading activation

Principle 6 (chunking/expertise · schema · spreading activation). “Chunk/template resources (composedOf) produced by consolidation; schema typing new memories attach to; and — load-bearing — make the graph spreading-activation-ready by fixing the edge-strength normalization (divergence D5) so PPR gives the same answer across importers, with salience as seed weights.”

Three deltas; (c) is the one the principle flags load-bearing.

(a) The Chunk resource: Templates are reusable Chunks. A Chunk is a consolidation product: a compressed/abstracted unit (composedOf the memories/episodes/entities it consolidates), the episodic→semantic derivation made first-class (divergence D8). A reusable Chunk (reusable: true) is a Template: a schema/pattern that new memories instantiate (via MemoryRecord.schemaType, EP-6b) rather than a one-off abstraction. The two are one resource distinguished by the flag, not two resource types; Template is not minted separately. This is the proposal’s only new core resource type: it widens the Reference pattern and Bundle.entry oneOf, which makes it §3.1/§5.2 breaking → R2 and requires a full CONTRIBUTING §4 RFC (spec/rfcs/RFC-<nnnn>-chunk.md, number TSC-assigned).

The consolidation event is not a resource. HANDOFF §5’s floated ConsolidationEvent/Reflection re-imports the consolidation process that semantics.md (“what OMIR deliberately does not specify”) puts out of scope, and duplicates what Theme E already models. The derivation is carried as a Theme-E provenance hop on the Chunk: wasGeneratedBy { activity: "consolidation" } + wasDerivedFrom over its composedOf set, never a second resource. Veld concurs structurally: its event-sourced journal IntentPayload (src/intent_log/payload.rs) is Remember/Forget/Update/Anchor, which is pure memory CRUD with no consolidation-event variant; consolidation lands its output as Remember/Update of records, so even Veld’s journal models the product, not the event. Draft model:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://omir.io/spec/R2/schemas/Chunk.schema.json",
  "title": "OMIR Chunk (R2 candidate)",
  "type": "object",
  "properties": {
    "resourceType": { "const": "Chunk" },
    "id":   { "$ref": "common.schema.json#/$defs/Id" },
    "meta": { "$ref": "common.schema.json#/$defs/Meta" },
    "content": { "type": "string", "description": "The compressed / abstracted unit — a named template, schema, or expert chunk (MDL: search fewer, denser units)." },
    "composedOf": {
      "type": "array",
      "items": { "$ref": "common.schema.json#/$defs/Reference" },
      "description": "MemoryRecord / Episode / Entity references this chunk consolidates. The episodic->semantic derivation, first-class (divergence D8)."
    },
    "reusable": {
      "type": "boolean",
      "default": false,
      "description": "True => this Chunk is a TEMPLATE: a reusable schema/pattern that NEW memories instantiate (via MemoryRecord.schemaType), not a one-off abstraction. Templates are reusable Chunks, never a separate resource."
    },
    "schemaType": {
      "$comment": "CodeableConcept once Theme A lands; bare string until then.",
      "type": "string",
      "description": "The schema/pattern class this Chunk represents (open vocabulary, Theme A). A MemoryRecord.schemaType equal to this value attaches that memory to this Template (meaningful when reusable=true)."
    },
    "confidence":  { "$ref": "common.schema.json#/$defs/Confidence" },
    "provenance":  { "$ref": "common.schema.json#/$defs/Provenance" },
    "createdAt":   { "$ref": "common.schema.json#/$defs/Instant" },
    "extension":   { "type": "array", "items": { "$ref": "common.schema.json#/$defs/Extension" } }
  },
  "required": ["resourceType", "id", "content", "createdAt"],
  "additionalProperties": false
}

Required atomic work items mirror global-standard Phase 2’s new-resource checklist: author the schema; add to Bundle.entry.items.oneOf; widen common.schema.json#/$defs/Reference.pattern to include Chunk; extend the validator SchemaFiles/RESOURCE_TYPES/registry-walker; register in the JSON-LD @context; ship a migration note (no R1 bundle becomes invalid — older consumers ignore unknown entry items).

(b) MemoryRecord.schemaType — schema-consistent fast integration (Tse et al. 2007): a new memory attaches to a schema it instantiates. Best as a CodeableConcept (Theme A) since schema-types are an open, per-domain vocabulary → rides Theme A · R2, caps at A’s grade.
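One way an importer could resolve schema attachment, sketched under assumptions: resources are plain dicts trimmed to the fields that matter here, schemaType is still a bare string (pre-Theme A), and the ids and the attach_to_templates helper are hypothetical. When two Templates share a schemaType this sketch keeps the first, which the draft does not specify.

```python
from collections import defaultdict


def attach_to_templates(resources: list[dict]) -> dict[str, list[str]]:
    """Map each Template id to the MemoryRecord ids whose schemaType equals the Template's."""
    templates: dict[str, str] = {}
    for res in resources:
        is_template = res.get("resourceType") == "Chunk" and res.get("reusable", False)
        if is_template and "schemaType" in res:
            # ASSUMPTION: the first Template seen for a schemaType wins.
            templates.setdefault(res["schemaType"], res["id"])

    attached: dict[str, list[str]] = defaultdict(list)
    for res in resources:
        if res.get("resourceType") != "MemoryRecord":
            continue
        template_id = templates.get(res.get("schemaType"))
        if template_id is not None:
            attached[template_id].append(res["id"])
    return dict(attached)


# Trimmed, hypothetical resources: required fields such as content/createdAt are omitted.
resources = [
    {"resourceType": "Chunk", "id": "chunk-restaurant", "reusable": True, "schemaType": "restaurant-visit"},
    {"resourceType": "Chunk", "id": "chunk-oneoff", "reusable": False, "schemaType": "restaurant-visit"},
    {"resourceType": "MemoryRecord", "id": "mem-1", "schemaType": "restaurant-visit"},
    {"resourceType": "MemoryRecord", "id": "mem-2", "schemaType": "commute"},
    {"resourceType": "MemoryRecord", "id": "mem-3"},
]
attachments = attach_to_templates(resources)
print(attachments)  # {'chunk-restaurant': ['mem-1']}
```

The non-reusable Chunk is ignored even though its schemaType matches, reflecting the rule that attachment is meaningful only when reusable is true.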

(c) D5 edge-normalization fix: the load-bearing one. Today Relationship.strength is a single [0,1] scalar; fan normalization is not representable, so two importers running spreading activation / Personalized PageRank over the same exported strengths get different answers, because the normalization that makes a strength mean something is producer-private. HippoRAG reports single-step PPR multi-hop retrieval as 10–30× cheaper and 6–13× faster than iterative retrieval; that advantage carries across importers only if the weights are portable. Relationship additive fields:

"reverseStrength": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Backward association weight P(from|to); asymmetric counterpart to 'strength' = P(to|from). Resolves the symmetric-scalar half of D5 without two independent edges."
},
"normalizedStrength": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Fan-normalized strength: within a source Entity's outgoing edge set sharing one 'normalization', these values sum to <= 1 + epsilon (ACT-R fan S - ln(fan): source activation conserved and divided). THIS is the portable spreading-activation weight; raw 'strength' is producer-private. MUST co-occur with 'normalization' (dependentRequired)."
},
"normalization": { "$ref": "common.schema.json#/$defs/EdgeNormalization" }

"EdgeNormalization": {
  "type": "object",
  "description": "Declares the regime that makes 'normalizedStrength' portable. Without it, PPR/spreading-activation diverges across importers (divergence D5).",
  "properties": {
    "scheme": { "enum": ["fan", "softmax", "none"], "description": "fan = source activation divided among associates; softmax = exp-normalized; none = raw (not portable)." },
    "over":   { "enum": ["source", "target"], "description": "Normalized over a node's outgoing (source) or incoming (target) edge set." }
  },
  "additionalProperties": false
}

Seed weights. With normalizedStrength + normalization present, PPR is fully specified across importers when seeded by Entity.salience (the documented seed-weight convention) — salience as seed, fan-normalized edges as the transition matrix, one pass.
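A minimal sketch of that convention, assuming power iteration with a damping factor of 0.85 and 50 iterations (neither value is fixed by the draft), and assuming that activation not carried by outgoing edges (a node with no edges, or a fan sum below 1) returns to the seed distribution. Edges are (from, to, normalizedStrength) tuples and the function name is hypothetical.

```python
def personalized_pagerank(edges, salience, damping=0.85, iterations=50):
    """edges: [(from, to, normalizedStrength)]; salience: {entity: seed weight}."""
    total = sum(salience.values())
    if total <= 0:
        raise ValueError("at least one entity needs positive salience")
    nodes = sorted(set(salience) | {n for f, t, _ in edges for n in (f, t)})
    seed = {n: salience.get(n, 0.0) / total for n in nodes}

    outgoing = {n: [] for n in nodes}
    for f, t, w in edges:
        outgoing[f].append((t, w))

    rank = dict(seed)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed[n] for n in nodes}
        leaked = 0.0
        for n in nodes:
            spent = 0.0
            for t, w in outgoing[n]:
                nxt[t] += damping * rank[n] * w
                spent += w
            # ASSUMPTION: mass the fan does not distribute goes back to the seeds.
            leaked += damping * rank[n] * max(0.0, 1.0 - spent)
        for n in nodes:
            nxt[n] += leaked * seed[n]
        rank = nxt
    return rank


edges = [("A", "B", 0.7), ("A", "C", 0.3), ("B", "C", 1.0), ("C", "A", 0.5)]
salience = {"A": 1.0}

# Two "importers" read the same exported weights in a different edge order.
rank_one = personalized_pagerank(edges, salience)
rank_two = personalized_pagerank(list(reversed(edges)), salience)
order_one = sorted(rank_one, key=lambda n: (-rank_one[n], n))
order_two = sorted(rank_two, key=lambda n: (-rank_two[n], n))
print(order_one, order_one == order_two)
```

Both runs use only exported values (normalizedStrength and salience), so the ranking does not depend on any producer-private normalization. Damping and the handling of leaked mass would still need to be agreed for scores, not just rankings, to match exactly.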

Validator rule — fan-normalization is enforced, not producer-asserted (new Check::GraphNormalization, codes E250/E251; verified and sapper-hardened):

  • E250 — fan-normalization overflow (error). For each Entity E, partition its edges by (normalization.scheme, normalization.over). Within each partition where scheme ∈ {fan, softmax}: the sum of normalizedStrength over edges with from = E (over: source) — or to = E (over: target) — MUST be ≤ 1 + ε (ε = 1e-6, rounding-tolerant, per global-standard’s aggregateCredibility precedent). Edges with normalization absent or scheme: none are excluded; reverseStrength is excluded (asymmetry hint, not part of the fan sum).
  • E251 — normalizedStrength without normalization (error). Also enforced in-schema via dependentRequired: { normalizedStrength: ["normalization"] }. Without the regime the value is unportable — the exact D5 defect the field cures.

Why it survives the sapper pass:

  • Upper-bound only. A partial export’s subset of fan weights sums to ≤ the full sum ≤ 1, so ≤ 1 + ε holds on partial graphs; a lower bound ≈ 1 is deliberately NOT checked.
  • Partitioned. Mixed regimes never cross-contaminate the sum.
  • No retroactive invalidation. These are new R1.x fields (per the classification for the edge-norm fields) and the rule ships with them: §5.2-clean, not a tightening of an existing field.
  • Decidable and cheap. The check is a pure function of the closed-world bundle, an O(edges) group-by on from/to with no new index. It is a fifth Check variant beside {Structural, ReferenceIntegrity, VersionPresence, Profile}, mirroring global-standard’s Check::Attestation.
  • Not a silent hole. Dodging via scheme: none self-labels the value non-portable, which is honest degradation rather than a bypass.

Final code numbers are TSC-assigned.
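The rule can be sketched as a single group-by over the bundle's Relationship resources. This is an illustration rather than the reference validator: Relationships are plain dicts, the id field and the function name are assumptions, and skipping an edge whose normalization omits over is this sketch's choice (the draft schema does not mark it required).

```python
from collections import defaultdict

EPSILON = 1e-6  # rounding tolerance from the E250 rule


def check_graph_normalization(relationships: list[dict]) -> list[tuple[str, str]]:
    """Return (code, subject) findings for E250 (fan overflow) and E251 (missing regime)."""
    findings: list[tuple[str, str]] = []
    sums: dict[tuple[str, str, str], float] = defaultdict(float)

    for rel in relationships:
        if "normalizedStrength" not in rel:
            continue  # raw strength and reverseStrength never enter the fan sum
        norm = rel.get("normalization")
        if norm is None:
            findings.append(("E251", rel["id"]))
            continue
        scheme, over = norm.get("scheme"), norm.get("over")
        if scheme not in ("fan", "softmax") or over not in ("source", "target"):
            continue  # scheme 'none' is excluded by the rule; missing 'over' skipped by assumption
        anchor = rel["from"] if over == "source" else rel["to"]
        sums[(anchor, scheme, over)] += rel["normalizedStrength"]

    for (anchor, _scheme, _over), total in sorted(sums.items()):
        if total > 1.0 + EPSILON:
            findings.append(("E250", anchor))
    return findings


fan = {"scheme": "fan", "over": "source"}
bundle = [
    {"id": "r1", "from": "Entity/a", "to": "Entity/b", "normalizedStrength": 0.6, "normalization": fan},
    {"id": "r2", "from": "Entity/a", "to": "Entity/c", "normalizedStrength": 0.5, "normalization": fan},
    {"id": "r3", "from": "Entity/b", "to": "Entity/c", "normalizedStrength": 0.4},
    {"id": "r4", "from": "Entity/a", "to": "Entity/d", "normalizedStrength": 0.9,
     "normalization": {"scheme": "none", "over": "source"}},
]
print(check_graph_normalization(bundle))  # [('E251', 'r3'), ('E250', 'Entity/a')]
```

Dropping r2 from the bundle (a partial export) removes the E250 finding, which is the upper-bound-only property in action: a subset of a valid fan can never overflow.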

Classification. Chunk (+reusable Templates) = R2 · OMM-0 (§4 RFC, Reference-widening). schemaType rides A · R2. Edge-norm fields = §5.1-additive (no widening) → R1.x, RFC-gated, OMM-0, on Relationship (OMM-3), plus the Check::GraphNormalization validator variant (E250/E251). Closes D5; advances D8.

Hits. ⚡⚡ watts (fewer, denser units; single-pass vs iterative multi-hop) · 🎯 inference (portable multi-hop/transitive) · capability (abstraction, expertise).


EP-F — VSA / HRR structural code (frontier; lowest priority, highest ceiling)

Frontier (Vector Symbolic Architectures · Holographic Reduced Representations · HDC). An optional VSA structural code — bind (circular convolution) + bundle (superpose) a whole relational structure into one low-precision hypervector, cleanup via an item memory — gives OMIR analogical/compositional retrieval in cheap vector ops. Low-precision and edge-friendly: a natural fit for the .omirb robotics profile. Speculative — propose extension-first (no RFC), under a WG/vendor URL, promotable to the reserved omir:vsa space of the canonical §D Embedding once a second implementer exercises it:

{
  "url": "https://omir.io/spec/R2/ext/vsa-code",
  "valueJson": {
    "space": "vsa", "op": "hrr", "dims": 10000, "dtype": "int8",
    "vector": "<base64 or array>", "cleanupRef": "Entity/item-memory"
  }
}

Classification. Extension now (prefer-an-extension-over-an-RFC, the 80/20 rule); promotable to a Theme-D Embedding space at R2+ · OMM-0 when field-exercised. Ties to the .omirb binary profile (low-precision bytes). Hits. capability (compositional/analogical recall) · ⚡ watts (low-precision edge ops); 🎯 neutral. No divergence — pure capability frontier.
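For readers new to HRR, a small pure-Python sketch of bind, bundle, and cleanup. It uses 512-dimensional float vectors and an O(n²) convolution to stay dependency-free, whereas the extension example above carries 10000 int8 dimensions and a practical implementation would likely use an FFT. The role and filler names and all function names are illustrative assumptions.

```python
import math
import random


def random_vector(rng: random.Random, dims: int) -> list[float]:
    # Elements drawn from N(0, 1/dims), the usual HRR choice (expected norm near 1).
    return [rng.gauss(0.0, 1.0 / math.sqrt(dims)) for _ in range(dims)]


def bind(a: list[float], b: list[float]) -> list[float]:
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a: list[float]) -> list[float]:
    """Approximate inverse under circular convolution: a_inv[k] = a[-k mod n]."""
    n = len(a)
    return [a[(-k) % n] for k in range(n)]


def bundle(*vectors: list[float]) -> list[float]:
    """Superpose vectors by element-wise sum."""
    return [sum(xs) for xs in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cleanup(noisy: list[float], item_memory: dict[str, list[float]]) -> str:
    """Return the item-memory entry most similar to a noisy decoded vector."""
    return max(item_memory, key=lambda name: cosine(noisy, item_memory[name]))


rng = random.Random(0)
dims = 512
items = {name: random_vector(rng, dims) for name in ("agent", "object", "alice", "coffee")}

# Encode the structure {agent: alice, object: coffee} as one vector.
trace = bundle(bind(items["agent"], items["alice"]), bind(items["object"], items["coffee"]))

# Query: who fills the 'agent' role? Unbind, then clean up against the item memory.
who = cleanup(bind(involution(items["agent"]), trace), items)
what = cleanup(bind(involution(items["object"]), trace), items)
print(who, what)
```

Unbinding returns the filler plus noise, which is why cleanupRef points at an item memory: the decoded vector is only meaningful after a nearest-neighbour lookup.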


Overlook — sequence

| Step | EPs | Release · OMM | Breaking? | Unlocks |
|---|---|---|---|---|
| 1 | EP-1 info-content · EP-4 replay/interference · EP-6c edge-norm | R1.x · OMM-0 | No (additive optional fields) | The watt levers that need no new object: store-less, prune-by-interference, portable PPR. Closes D5 & D10. |
| 2 | EP-5b familiarity sketch · EP-4a boundary | R1.x · OMM-0 | No | Skip-retrieval gating; segmentation cue. |
| 3 | EP-2 / EP-3 / EP-5a / EP-F (all on the Theme-D Embedding) | R2 · OMM-0 | Rides Theme D’s R2 line | Efficiency-bearing codes: Matryoshka, sparse, drift, VSA. |
| 4 | EP-6a Chunk · EP-6b schemaType | R2 · OMM-0 | Yes (new resource / CodeableConcept) | Consolidation product + schema attachment. Needs Themes A & E. |

Overlook — dependencies

Theme D (neutral Embedding) ──► EP-2 (matryoshka), EP-3 (sparse), EP-5a (drift space), EP-F (vsa)
Theme A (CodeableConcept)   ──► EP-6b (schemaType), EP-4a boundaryReason (promotion)
Theme E (PROV wasGeneratedBy)──► EP-6a Chunk's consolidation *event* (vs a 2nd resource)
Theme I (ExternalReference)  ──► EP-3 (index points at remote content)
EP-1 reconstructable         ──► EP-2 verbatim-eviction / EP-3 lazy content load (compose)
P2 (a second implementer)    ──► OMM-2 anywhere; required for the Embedding-code interop tests

The critical path is Step 1 — three pure-additive R1.x field sets that need no new object, no Reference widening, and no other theme, yet close the two divergences this track inherits (D5, D10) and land the biggest watt lever (EP-1). Steps 3–4 are gated on Theme D / A / E landing first, and the cross-store value of the efficiency-bearing codes (like all interchange value) needs the global-standard P2 second implementer.

Definition of done (per EP)

A proposal reaches its stated OMM when, for each field it lands: the schema and the version-aware reference validator support it; every new ResourceType/id reference field is in the registry-driven reference walk with a dangling-ref negative fixture in examples/invalid/ (EP-4 competesWith, EP-6a Chunk refs); and UnitInterval fields (EP-1 novelty, EP-4 replayPriority/needProbability, EP-6 reverseStrength/normalizedStrength) are in the CR-7 range check. EP-6c additionally ships a worked PPR golden-file proving two importers produce the same ranking from the same normalizedStrength + salience seeds — the portability claim made testable, the same discipline global-standard applies to the RDF and attestation vectors.

Process

None of this is unilateral. Every change here is a candidate — an RFC, debated by the Working Group and balloted (CONTRIBUTING, GOVERNANCE) — entering at OMM-0/1 and earning maturity through independent implementation. The efficiency framing changes the motivation (watts, not just interchange), not the gate.