Overview
OMIR — Open Memory Interoperability Resources (pronounced “OH-meer”) — is an open, vendor-neutral data format for portable AI agent and cognitive memory. This document describes OMIR R1, the first release.
OMIR is modeled, deliberately and openly, on HL7 FHIR (Fast Healthcare Interoperability Resources). Where FHIR made clinical data portable across hospital systems, OMIR makes memory portable across agent systems. The mapping is direct: Open ↔ Fast, Memory ↔ Healthcare, and “Interoperability Resources” is kept verbatim.
What OMIR is
OMIR is an at-rest data format — a document/file standard. It defines how a unit of agent memory looks when it is written to disk, exported, backed up, or handed from one system to another.
OMIR is:
- A file format. The canonical serialization is JSON / JSON-LD with the .omir extension; a compact binary profile uses .omirb.
- Vendor-neutral. No single implementation owns the format. Veld, the reference implementation, convenes the working group, but the spec and schemas are free (see Conformance and the governance notes below).
- Resource-oriented. Borrowing FHIR’s model, everything in OMIR is a Resource.
OMIR is not:
- Not a wire protocol. OMIR does not define how memory is transported, queried live, or synchronized between running agents. It describes the bytes at rest.
- Not a product. OMIR is a standard. Implementations are products; the format is not.
- Not a database. OMIR says nothing about indexing, vector search, or storage engines. Those are implementation concerns.
Where OMIR sits: MCP, A2A, and OMIR
Two transport standards already exist for agent systems, both now stewarded by the Linux Foundation:
- MCP (Model Context Protocol, originally Anthropic) — connects agents to tools and context sources.
- A2A (Agent-to-Agent, originally Google) — lets agents talk to each other.
Both are transport concerns: they move data between live endpoints. Neither defines what memory is once it has been written down.
MCP and A2A transport memory. OMIR is the memory at rest.
OMIR is complementary, never competing. An agent can serve memory over MCP, hand
it to a peer over A2A, and persist or export it as an .omir Bundle. The transport
moves the bytes; OMIR defines the bytes. A clean separation of “in motion” from
“at rest” is exactly the separation FHIR drew between its RESTful API (motion) and its
Resources (rest).
The resource model
Every piece of data in OMIR is a Resource. R1 defines five resource types:
| Resource | Purpose | OMM |
|---|---|---|
| MemoryRecord | The atomic unit of memory: a remembered experience, plan, prompt, or learning. | 4 |
| Entity | A named thing extracted from memory — person, place, concept, technology, skill. | 3 |
| Relationship | A directed, weighted (Hebbian) edge between two entities. | 3 |
| Episode | A bounded experience — the raw event the other resources are derived from. | 3 |
| Bundle | The container. A Bundle is the .omir document. | — |
The maturity column refers to the OMIR Maturity Model (OMM) — see
Design Principles. It is surfaced per
resource in meta.maturity.
References
Resources do not nest each other arbitrarily; they link by typed reference. A
reference is an object carrying a single string of the form ResourceType/id:
```json
{ "ref": "Entity/john" }
```
The reference target’s ResourceType MUST be one of MemoryRecord, Entity,
Relationship, or Episode, and the id MUST match an id of that type. References
resolve within the Bundle: a MemoryRecord carrying entityRefs: [{ "ref": "Entity/john" }] is satisfied by an Entity with id: "john" in the same Bundle’s
entry array. Reference integrity is a conformance rule
(Conformance).
This is how the graph is expressed at rest: a Relationship points from one Entity
to another; an Episode lists the entityRefs it produced; a MemoryRecord links
the entities it mentions. No pointers, no foreign keys — just typed string references.
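Reference integrity can be checked with a single index keyed by (resourceType, id). Below is a minimal Python sketch. It assumes every entry carries resourceType and id. The helper names index_bundle and resolve_ref are illustrative and not part of the specification. A real validator would also run the JSON Schemas.

```python
import re

# Reference targets permitted in R1 (a Bundle is a container, never a target).
REF_TARGET_TYPES = {"MemoryRecord", "Entity", "Relationship", "Episode"}
# "ResourceType/id", where id follows the R1 identifier pattern.
REF_PATTERN = re.compile(r"([A-Za-z]+)/([A-Za-z0-9._:-]{1,128})")


def index_bundle(bundle: dict) -> dict:
    """Map (resourceType, id) -> resource for every entry in the Bundle."""
    return {(r["resourceType"], r["id"]): r for r in bundle.get("entry", [])}


def resolve_ref(ref: dict, index: dict) -> dict:
    """Resolve a {"ref": "Type/id"} object within the same Bundle.

    Raises ValueError for a malformed string, a disallowed target type,
    or an id with no matching resource (a dangling reference).
    """
    match = REF_PATTERN.fullmatch(ref.get("ref", ""))
    if match is None:
        raise ValueError(f"malformed reference: {ref!r}")
    rtype, rid = match.groups()
    if rtype not in REF_TARGET_TYPES:
        raise ValueError(f"reference to disallowed type: {rtype}")
    target = index.get((rtype, rid))
    if target is None:
        raise ValueError(f"dangling reference: {rtype}/{rid}")
    return target
```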
The 80/20 rule
OMIR does not try to model every field every memory system has ever invented. It follows FHIR’s 80/20 rule: the core of each resource captures the ~80% of fields that ~80% of implementations actually need — content, timestamps, importance, confidence, decay, provenance, references. The remaining long tail of implementation-specific data does not bloat the core.
Instead, every resource carries a typed extension[] array: a structured,
namespaced escape hatch for proprietary or experimental data. Veld’s 20-signal
retrieval scores, external isotropy/closure dimensions, and name embeddings all ride
in extensions — never in the core. A consumer that does not understand an extension
MUST ignore it and still process the resource. See Extensions.
Where an implementation needs to constrain the core (require certain fields, forbid others, mandate specific extensions), it publishes a Profile. See Profiles.
Heritage
OMIR’s core faithfully carries the cognitive-memory features proven in its reference implementation, Veld — Agentic Memory:
- Calibrated Bayesian confidence — a Beta(α, β) posterior plus a point estimate, not a bare float.
- Multi-time-scale decay and anchoring — “better forgetting”: records decay on a half-life unless anchored.
- Tiered memory — working → session → longterm → archive.
- Hebbian relationship strength — edges strengthen with co-activation, decay without use.
- Temporal invalidation — validUntil on records, invalidatedAt on relationships.
- Provenance and source credibility — where a memory came from, and how much to trust the origin.
- Prospective memory — experienceType: "intention" for future-directed records.
- Entity salience — how strongly an entity pulls on retrieval.
These are not optional add-ons bolted onto a flat record; they are the core OMIR resource fields. The format encodes how memory behaves over time, not just what was said.
Governance and licensing
OMIR is stewarded by a vendor-neutral OMIR Working Group in a neutral GitHub organization. Veld convenes the group but does not own the standard. The group operates an open RFC + ballot process, a Technical Steering Committee (TSC), semantic release versioning (R1, R2, …), and a published deprecation policy.
Licensing is deliberately decoupled from any implementation:
- The specification and schemas are licensed CC-BY-4.0.
- The reference code (validator, generators) is licensed Apache-2.0.
A standard under a restrictive license is dead on arrival. The OMIR spec is, and will remain, free.
Design Principles
The OMIR design is governed by six principles. The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this specification are to be interpreted as described in RFC 2119.
1. Open and neutral
OMIR is an open standard owned by no vendor.
- The specification and JSON Schemas MUST remain freely licensed (CC-BY-4.0 for the spec, Apache-2.0 for the reference code). A standard behind a restrictive license is dead on arrival.
- No single implementation’s behavior is normative. Where this spec describes cognitive behavior (decay, Hebbian strengthening, tiering), it describes the data shape that records that behavior, not a required algorithm.
- The format MUST NOT privilege one vendor’s identifiers, URLs, or schemas in its core. Vendor-specific data lives in extension[] under vendor-controlled URLs.
2. At-rest, not a protocol
OMIR describes memory at rest — bytes on disk, in a backup, in an export, in a hand-off file.
- OMIR MUST NOT be read as a transport or query protocol. It defines no endpoints, no request/response shapes, no synchronization semantics.
- Live transport is the job of MCP and A2A. OMIR is complementary: those protocols move memory; OMIR is the memory once it lands. An implementation MAY serve an OMIR Bundle over any transport, but the Bundle’s validity does not depend on how it travelled.
3. 80/20 core plus extensions
Each resource defines a small, stable core and an open extension mechanism.
- The core of a resource captures the ~80% of fields that ~80% of memory systems need. New core fields are added conservatively and only through the ballot process.
- Implementation-specific, experimental, or proprietary data MUST be carried in the typed extension[] array under a canonical URL, not by adding properties to the core. Every resource schema sets additionalProperties: false, so undeclared top-level fields are non-conformant by construction.
- A consumer MUST ignore extensions it does not recognize and MUST still process the resource. Unknown extensions are never an error. See Extensions.
4. Honest maturity via OMM
OMIR grades the stability of each resource type with the OMIR Maturity Model
(OMM), an integer 0–5 surfaced in meta.maturity.
These are the canonical OMM levels (identical to GOVERNANCE §4.1), assigned per resource type:
| OMM | Name | Criterion |
|---|---|---|
| 0 | Draft | Newly proposed; shape may change freely; not safe to depend on. |
| 1 | Trial Use (early) | Implemented in at least one system; minimal field experience; breaking changes expected. |
| 2 | Trial Use (proven) | Exercised in real bundles by at least one implementation; shape settling but not yet cross-validated. |
| 3 | Established | Multiple independent implementations and real-world usage; stabilizing, with conservative change. |
| 4 | Mature | Broad field experience across implementations; changes rare and strictly backward-compatible within a release. |
| 5 | Normative | Stable and authoritative; breaking changes only across releases (deprecation policy). |
R1 grades honestly and MUST NOT overclaim:
- MemoryRecord — OMM-4.
- Entity, Relationship, Episode — OMM-3.
- New resources introduced in later releases start lower and earn their level through real, independent implementation.
Maturity is a promise about change, not a quality score. A consumer SHOULD treat lower-OMM resources as more likely to change between releases.
5. Encoding-neutral
OMIR defines a single logical data model with more than one byte-level encoding.
- The canonical encoding is JSON / JSON-LD (.omir), chosen for ubiquity and human-readability.
- A compact binary profile (.omirb, CBOR/bincode) exists for edge and robotics deployments where size and parse cost matter.
- The two encodings MUST be lossless round-trips of the same logical model. A resource’s meaning MUST NOT depend on its encoding. See Encodings.
The file extensions .omir and .omirb are the canonical, collision-free identifiers
for the format. Implementations MUST NOT use .mf or .mif for OMIR documents;
those extensions are heavily collided and are not part of this standard.
6. Forgetting is first-class
Most data formats assume data is permanent. Memory is not. OMIR treats forgetting as core, not as an afterthought.
- Every MemoryRecord MAY carry a decay block: a half-life, last-access time, access count, and an anchored flag. Anchored records resist decay.
- Relationship strength is Hebbian: it rises with co-activation and falls without use. The strength field records the current synaptic weight.
- Temporal invalidation is explicit: validUntil supersedes a record after an instant; invalidatedAt retires a relationship.
- tier (working → session → longterm → archive) records where a memory sits in the consolidation hierarchy.
These fields let a faithful implementation reconstruct better forgetting from an at-rest document — decay curves, anchors, and tier transitions survive export. OMIR does not mandate a forgetting algorithm; it standardizes the state that algorithm reads and writes.
Memory Semantics
The resource pages define what fields a record carries. This page defines what those fields mean over time — the cognitive model OMIR encodes. It is the answer to a fair question about the field tables: “why is confidence two numbers and a third? what is a half-life doing in a data format? what makes an edge’s strength go up?”
OMIR’s guiding rule, restated from Design Principles §6:
OMIR standardizes the state a memory algorithm reads and writes — not the algorithm.
Two faithful implementations may decay, consolidate, and re-rank memory with completely different code. What they must agree on is the meaning of the state in the file. The mechanisms below are drawn from the reference implementation (Veld); the spec records the state, and an implementation is free to reproduce the behavior however it likes.
1. Confidence as calibrated belief
MemoryRecord.confidence is a Confidence object — { alpha, beta, calibrated } — not a
bare float, because an agent needs to distinguish “90% sure, having checked many times”
from “90% sure, having seen this once.” A single number cannot.
OMIR models belief as a Beta(α, β) posterior:
- alpha accumulates confirming evidence (the memory proved helpful/correct), plus a prior.
- beta accumulates disconfirming evidence (the memory proved misleading/wrong), plus a prior.
- calibrated is the point estimate a consumer should act on. The natural estimate is the posterior mean, α / (α + β), but a faithful producer damps it toward 0.5 when evidence is thin — at one or two observations the prior should dominate, so an unproven memory does not masquerade as a confident one. As α + β grows, calibrated approaches the raw mean. R1 does not mandate the exact shrinkage, so calibrated is a producer-asserted point estimate: two faithful producers MAY emit a different calibrated for the same (α, β). The α/β pair is the portable, reproducible quantity; a consumer that needs a reproducible or cross-producer-comparable estimate SHOULD recompute it from α/β under its own rule rather than rely on calibrated being identical across producers (see Theory & Scope).
A consumer that only reads calibrated behaves correctly within a single producer’s
records; the α/β pair is there for consumers that want to keep updating belief as new
feedback arrives. This is why confidence is a distribution rather than a bare float — it
carries its own evidence count, so belief can be revised rather than merely averaged.
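The update-and-estimate cycle above can be sketched in Python. The shrinkage rule in calibrated is an assumption chosen for illustration: it puts weight n / (n + k) on the posterior mean and the remainder on 0.5, with a hypothetical default of k = 2. R1 deliberately leaves the rule to the producer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confidence:
    alpha: float  # confirming evidence plus prior
    beta: float   # disconfirming evidence plus prior

    def mean(self) -> float:
        """Raw Beta posterior mean, alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)

    def calibrated(self, k: float = 2.0) -> float:
        """Posterior mean damped toward 0.5 when evidence is thin.

        Hypothetical shrinkage (R1 mandates none): weight the mean by
        n / (n + k), with n = alpha + beta, and give the rest to 0.5.
        """
        n = self.alpha + self.beta
        weight = n / (n + k)
        return weight * self.mean() + (1.0 - weight) * 0.5

    def update(self, helpful: bool) -> "Confidence":
        """Return the posterior after one new piece of feedback."""
        if helpful:
            return Confidence(self.alpha + 1.0, self.beta)
        return Confidence(self.alpha, self.beta + 1.0)
```

For alpha 9.0 and beta 1.0 the raw mean is 0.9, while this particular rule reports about 0.83. A producer that emits the raw mean, as the JSON example under Encodings does, is equally conformant. That disagreement is exactly why cross-producer consumers should recompute from α/β.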
One caveat on merging: adding two stores’ α/β yields a coherent posterior only when
the two evidence streams are independent. When both stores derived their belief from the
same upstream source — common when memories are exported and re-imported — summing the counts
double-counts the shared evidence and overstates confidence. A consumer that merges
confidence across stores SHOULD account for shared provenance (e.g. via
provenance.externalId) rather than blindly add counts.
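One way a consumer might honor the merging caveat is sketched below. The sketch makes three assumptions:

- Both stores began from the same Beta prior. Beta(1, 1) is a hypothetical default.
- Shared provenance has already been detected, for example by matching provenance.externalId.
- When the source is shared, the better-informed posterior is kept. This is one conservative policy, not a rule from this specification.

```python
Posterior = tuple[float, float]  # (alpha, beta)


def merge_confidence(a: Posterior, b: Posterior, shared_source: bool,
                     prior: Posterior = (1.0, 1.0)) -> Posterior:
    """Combine two stores' Beta posteriors for the same memory.

    Assumes both stores started from the same prior (Beta(1, 1) here, a
    hypothetical default). shared_source should be True when the two
    records trace to the same upstream artifact.
    """
    if shared_source:
        # The same evidence was counted in both stores; summing would
        # double-count it. Keep the better-informed posterior instead.
        return max(a, b, key=lambda ab: ab[0] + ab[1])
    # Independent evidence: add the counts, then remove the extra copy of
    # the prior that each posterior already carries.
    return (a[0] + b[0] - prior[0], a[1] + b[1] - prior[1])
```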
2. Forgetting as a curve
MemoryRecord.decay ({ halfLifeHours, lastAccess, accessCount, anchored }) records a
memory’s forgetting state so that “better forgetting” survives export. The intent:
- A record’s retrievability falls over time on a half-life — recent, frequently
accessed memories stay sharp; stale ones fade.
- lastAccess and accessCount are the inputs a decay function reads; each access pushes retrievability back up.
- Decay is naturally multi-time-scale: fast in the first hours/days (filtering noise), then much slower for what survives — empirically closer to a power law than a single exponential. OMIR does not mandate the curve; it records the parameters (halfLifeHours) and the access signal a curve consumes.
- anchored: true marks a memory that resists decay — a pinned fact, a user-stated preference, a safety constraint. Anchored memories have a floor below which retrievability does not fall.
The spec stores forgetting state, not a forgetting algorithm, precisely so a robot on a power-cycle budget and a cloud assistant can both reconstruct sensible decay from the same fields.
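A minimal decay reader over the four decay fields might look like this. For brevity it assumes a single exponential curve, even though, as noted above, real forgetting is closer to a power law. Two parameters are hypothetical and are not R1 values: the log(1 + accessCount) stretch of the half-life, and the anchor_floor of 0.5.

```python
import math
from datetime import datetime, timezone


def retrievability(decay: dict, now: datetime, anchor_floor: float = 0.5) -> float:
    """Estimate retrievability in [0, 1] from an OMIR decay block.

    Hypothetical model: a single exponential on halfLifeHours, with the
    half-life stretched by log(1 + accessCount) so often-used memories fade
    more slowly, and a floor (anchor_floor) for anchored records.
    """
    last = datetime.fromisoformat(decay["lastAccess"].replace("Z", "+00:00"))
    elapsed_hours = max(0.0, (now - last).total_seconds() / 3600.0)
    half_life = decay["halfLifeHours"] * (1.0 + math.log1p(decay.get("accessCount", 0)))
    r = 2.0 ** (-elapsed_hours / half_life)
    if decay.get("anchored", False):
        r = max(r, anchor_floor)  # anchored memories never fall below the floor
    return r
```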
3. Tiering and consolidation
MemoryRecord.tier (working → session → longterm → archive) records where a memory sits
in a consolidation hierarchy — the at-rest analogue of working vs. long-term memory:
- working — active, in-focus, short-lived; the smallest, hottest set.
- session — bounded to a task/conversation; indexed for fast recall.
- longterm — consolidated knowledge, retrieved by semantic cue.
- archive — cold, near-permanent, batch-retrieved.
Promotion is driven by age × importance × access: a memory accessed repeatedly, or marked important, or linked to long-term knowledge, migrates inward; an untouched low-importance memory drifts outward and eventually compresses. OMIR records the tier a record currently occupies; the promotion policy is the implementation’s.
4. Hebbian relationship strength
Relationship.strength is a synaptic weight, not a static label. The cognitive model is
Hebbian — cells that fire together wire together:
- When two entities are co-activated (retrieved or mentioned together), the edge between them strengthens.
- An edge that is not used decays toward zero, the same “better forgetting” applied to structure rather than content.
- validAt records when the relationship was last observed to hold; invalidatedAt retires an edge that has been contradicted without deleting it, so the graph keeps its history rather than silently rewriting it.
Heavier machinery a faithful engine may run on top of this — long-term potentiation,
asymmetric forward/backward strengths, consolidation tiers on the edge itself — is
implementation state and rides in extension[], not the core. The core carries the one
number every graph engine agrees on: current strength.
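The Hebbian rule can be sketched with two small updates, assuming strength lives in [0, 1]. Two details are illustrative choices: the bounded increment s + rate · (1 − s), and the one-week idle half-life. R1 records only the resulting strength, and an engine may use any update it likes.

```python
def co_activate(strength: float, rate: float = 0.1) -> float:
    """Strengthen an edge after its two entities fire together.

    Moves strength a fraction `rate` of the way toward 1.0, so it rises
    quickly from low values, saturates near 1.0, and never leaves [0, 1].
    """
    return strength + rate * (1.0 - strength)


def decay_unused(strength: float, idle_hours: float,
                 half_life_hours: float = 168.0) -> float:
    """Let an unused edge fade toward 0 on a half-life (one week here)."""
    return strength * 2.0 ** (-idle_hours / half_life_hours)
```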
5. Salience
Entity.salience is how strongly an entity pulls on retrieval — its gravitational mass in
the memory space. A high-salience entity is more likely to be surfaced and to drag related
memories up with it via spreading activation. Salience rises with frequency
(mentionCount), recency (lastSeenAt), whether the entity is a proper noun
(properNoun), and explicit user importance. As with decay, OMIR stores the salience value
and its inputs; how an engine computes and uses it is its own business.
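Here is one hypothetical way an engine could fold mentionCount, recency since lastSeenAt, properNoun, and user importance into a single salience score. Three sets of parameters are invented for illustration:

- the weights 0.4, 0.3, 0.1, and 0.2;
- the saturation constant 5;
- the 72-hour recency half-life.

R1 stores only the resulting value.

```python
def salience(mention_count: int, hours_since_seen: float, proper_noun: bool,
             user_importance: float = 0.0) -> float:
    """Blend the salience inputs into one score in [0, 1].

    Hypothetical weights and shapes; R1 stores the result, not this formula.
    """
    frequency = mention_count / (mention_count + 5.0)  # saturating; 0 when never mentioned
    recency = 2.0 ** (-hours_since_seen / 72.0)        # halves every 3 days
    score = (0.4 * frequency + 0.3 * recency
             + 0.1 * (1.0 if proper_noun else 0.0) + 0.2 * user_importance)
    return min(1.0, max(0.0, score))
```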
6. The temporal model
OMIR is bi-temporal in spirit: it separates when something happened from when it was recorded.
- eventTime — when the described event actually occurred. It may precede createdAt (you can record a memory of last week today).
- createdAt — encoding time: when the record was written.
- validUntil (on MemoryRecord) and invalidatedAt (on Relationship) express temporal invalidation — a fact superseded after an instant, or an edge that stopped holding — so consumers can filter stale knowledge rather than trust it forever.
Entry order is not significant; everything resolves by ResourceType/id. A consumer must
never infer meaning from position (Conformance). A richer, fully
interval-based bitemporal model is a candidate generalization — see
Toward a Global Standard §H.
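A consumer that filters stale knowledge needs only the two invalidation fields. The sketch below assumes timestamps are RFC 3339 strings with an offset. Its boundary convention is also an assumption, since this page does not pin one down: a record is still valid at its validUntil instant, and an edge is already retired at its invalidatedAt instant.

```python
from datetime import datetime, timezone


def parse_instant(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_current(resource: dict, at: datetime) -> bool:
    """True if the resource has not been invalidated as of `at`.

    Boundary choice (an assumption): a record is still valid at its
    validUntil instant; an edge is already retired at its invalidatedAt.
    """
    rtype = resource["resourceType"]
    if rtype == "MemoryRecord" and "validUntil" in resource:
        return at <= parse_instant(resource["validUntil"])
    if rtype == "Relationship" and "invalidatedAt" in resource:
        return at < parse_instant(resource["invalidatedAt"])
    return True  # no invalidation field: nothing marks it stale
```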
7. Provenance and source credibility
MemoryRecord.provenance ({ source, sourceType, credibility, externalId }) answers where
a memory came from and how much to trust the origin. credibility (a [0,1]
UnitInterval) lets a consumer weight a memory by the trustworthiness of its source — a
user statement, a verified document, and an inferred guess are not equally reliable, and a
retrieval ranker should be able to say so. externalId (system:id, e.g. github:pr-123,
linear:SHO-39) keeps a memory traceable back to the artifact that produced it. A
multi-hop, signable provenance chain is a candidate generalization — see
Toward a Global Standard §E.
8. Prospective memory
Most memory is about the past. Prospective memory is about the future: a remembered
intention to do something later. OMIR models it with experienceType: "intention" on an
otherwise ordinary MemoryRecord — future-directed records live in the same store as
retrospective ones, so the same decay, confidence, and provenance machinery applies. A
conforming consumer SHOULD keep intentions out of ordinary retrospective recall and
surface them when their trigger condition is met (a time, a context); the
omir-personal-assistant profile makes this explicit (Profiles).
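Keeping intentions out of ordinary recall amounts to a partition on experienceType. This page names no trigger field, so in the sketch below the caller supplies the trigger check, for example one that reads a time or context from an extension. The helper names split_prospective and due_intentions are hypothetical.

```python
from typing import Callable, Iterable


def split_prospective(records: Iterable[dict]) -> tuple[list, list]:
    """Separate retrospective memories from future-directed intentions."""
    retrospective, intentions = [], []
    for r in records:
        target = intentions if r.get("experienceType") == "intention" else retrospective
        target.append(r)
    return retrospective, intentions


def due_intentions(intentions: list, is_triggered: Callable[[dict], bool]) -> list:
    """Intentions whose trigger condition holds now, per the caller's check."""
    return [r for r in intentions if is_triggered(r)]
```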
What OMIR deliberately does not specify
The following are implementation concerns. They are real and important, but they are not part of the at-rest contract, and a conformant document says nothing about them:
- Retrieval scoring. How memories are ranked at query time (similarity, BM25, graph spreading, cross-encoders, RRF fusion, multi-signal blends) is engine-specific. Score vectors ride in extension[]; they are never core. See the worked example in Extensions.
- Embeddings. Which model, which dimensionality, which vector — all implementation. R1 carries embeddings only as extensions; a neutral embedding representation is a candidate generalization (Toward a Global Standard §D).
- Consolidation schedules, replay, “sleep” phases. When and how memory is reorganized is an algorithm, not a state.
- Storage and indexing. Vector indexes, key-value engines, graph databases — none of it is OMIR’s concern. OMIR is the bytes at rest, not the engine.
This boundary is the whole design: standardize the state so any engine can read another engine’s memory, and leave the behavior free so engines still have something to compete on.
Theory & Scope
Non-normative. This page explains the theory of memory OMIR R1 encodes and states the boundaries of that theory. It adds no conformance requirements beyond those in Conformance; the RFC-2119 keywords below restate consumer guidance already implied by the resource semantics. It is the conceptual companion to Memory Semantics, which defines how individual fields behave over time.
The theory OMIR encodes
OMIR R1 standardizes the state a memory algorithm reads and writes, not the algorithm (Design Principles §6). The state it makes first-class — calibrated belief, half-life decay with anchoring, consolidation tiers, Hebbian edge strength, entity salience, bi-temporal timestamps, prospective intentions, and source credibility — places OMIR in a specific, well-established tradition: the rational analysis of memory, in which a memory’s strength tracks the statistics of the environment (recency and frequency predict future need), realized in cognitive architectures such as ACT-R and complemented by the hippocampal/neocortical picture of consolidation.
Naming the lineage matters because “state, not algorithm” does not make the choice of state neutral. The fields OMIR blesses are an opinionated, defensible theory of what memory is. This page states the boundaries of that theory so they read as deliberate design positions, not as omissions.
Scope: declarative memory
OMIR R1 models declarative memory — the episodic and semantic memory of facts,
experiences, and their relationships. In the common taxonomy of agent memory (a working
store plus long-term episodic, semantic, and procedural memory), R1 covers the
episodic and semantic long-term store, with tier standing in for a working/activation
notion.
The following are out of scope in R1 and have no first-class representation:
- Procedural / skill memory — learned tool-use policies, workflows, and macros.
- Parametric memory — knowledge held in model weights, adapters (e.g. LoRA), or fine-tunes.
- Activation / KV-cache memory — transient in-context state below the record level.
- Priming and other implicit (non-declarative) memory.
Such state MAY be carried opaquely (inside content, or under extension[]), but it is
not interpretable by a generic consumer and is not what R1 standardizes. A later release
MAY introduce first-class resources for these systems; per
Design Principles §4 they would start low on the
OMM and earn their level through independent implementation. Until then, “OMIR memory” means
declarative memory — stated here so adopters self-select rather than discover the boundary
after building an adapter.
State, not dynamics — and the snapshot it implies
Because OMIR records state and not the algorithm that evolves it, every time-dependent value
is a snapshot as of its timestamp. A decay block, an Entity.salience, a
Relationship.strength, and the retrievability they feed are all functions of time, frozen
at the moment of export. R1 defines no procedure for aging them forward.
A consumer SHOULD treat these values as valid as of their associated timestamp
(decay.lastAccess, Entity.lastSeenAt, Relationship.validAt, meta.lastUpdated) and
MAY recompute them under its own decay / salience / strength model on import. A consumer
MUST NOT assume an exported retrievability, salience, or strength is current at read
time. This is the price of encoding-neutral, algorithm-free portability: the record survives
the hand-off; the dynamics are reconstructed by the importer.
Stored scalars are producer-relative
The normalized scores OMIR carries — importance, Entity.salience,
Relationship.strength, provenance.credibility, and confidence.calibrated — are
producer-asserted and producer-normalized. They are comparable within one producer’s
output, where they share a normalization regime, and are not guaranteed comparable
across producers.
R1 defines no cross-producer scale for these fields. Consequently:
- A consumer MUST NOT assume two producers’ scalars share a scale: one store’s importance: 0.8 need not mean what another store’s 0.8 means.
- A consumer that ranks or merges records from more than one producer SHOULD renormalize per producer (keyed by meta.source) rather than compare raw values.
- For belief specifically, prefer the evidence-bearing confidence.alpha/beta over the derived calibrated when comparing or merging across producers (Memory Semantics §1).
This reflects a fact about memory the literature is explicit on: salience and relevance are
cue-dependent and emergent at retrieval, not context-free stored properties. OMIR must
freeze them into stored scalars to serialize them at all; this section bounds what that
freezing does and does not guarantee. A future release MAY add an optional
normalizationRef so cross-producer comparability becomes detectable rather than assumed
(see Toward a Global Standard).
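Per-producer renormalization can be as simple as a min-max rescale within each meta.source group. The sketch assumes every record carries meta.source and the field being compared. Min-max is one choice among several; rank and z-score scaling are common alternatives. Results are keyed by source together with id, because two producers may reuse the same id.

```python
from collections import defaultdict


def renormalize_per_producer(records: list, field: str = "importance") -> dict:
    """Min-max rescale `field` separately for each producer (meta.source).

    Returns {(source, id): rescaled value}. A producer whose values are all
    equal maps them to 0.5, since they carry no ordering information.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r["meta"]["source"]].append(r)
    out = {}
    for source, recs in groups.items():
        values = [r[field] for r in recs]
        lo, hi = min(values), max(values)
        for r in recs:
            out[(source, r["id"])] = (r[field] - lo) / (hi - lo) if hi > lo else 0.5
    return out
```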
Records, not reconstructions
Human recall is reconstructive: a trace is rebuilt, and often altered, each time it is
retrieved. OMIR deliberately does not model this. A MemoryRecord is a stable, auditable
record with an explicit version counter and temporal invalidation (validUntil,
Relationship.invalidatedAt), not a trace that reconsolidates on access. Accessing a record
updates its retrievability (decay.lastAccess, decay.accessCount); it does not rewrite its
content.
This is a deliberate trade: OMIR exchanges reconstructive fidelity for auditability, diffability, and non-confabulation — properties a portable, archivable, hand-it-between-vendors format needs and a reconstructive store cannot offer. It also means OMIR’s “forgetting” is decay of retrievability, never deletion of data; governance-driven deletion and retention are a separate concern, deferred to a future governance vocabulary (see Toward a Global Standard).
These are bounds, not gaps
Each boundary above is a stated position, not an oversight. Several have candidate generalizations already named in Toward a Global Standard — open vocabularies, identity and multi-agent memory, a richer temporal model, modality and neutral embeddings, and a governance / retention vocabulary. They enter, if at all, through the RFC + ballot process at low OMM. R1’s contract is the narrower, honest one: a portable serialization of declarative memory state.
Encodings
OMIR defines one logical data model and more than one byte-level encoding. The two encodings MUST be lossless round-trips of the same model: a resource’s meaning MUST NOT depend on how it is encoded (see Design Principles §5).
| Encoding | Extension | Media type (provisional) | Use |
|---|---|---|---|
| JSON / JSON-LD | .omir | application/omir+json | Canonical. Interchange, export, backup, hand-off. |
| Binary profile | .omirb | application/omir+cbor | Edge, robotics, high-volume. Compact, fast to parse. |
JSON-LD canonical form
The canonical encoding is JSON, validated against the OMIR R1 JSON Schemas
(draft 2020-12) published at https://omir.io/spec/R1/schemas/. A conforming
.omir document is a Bundle: a JSON object whose entry
array carries the resources.
```json
{
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "id": "export-2026-05-30",
  "generatedAt": "2026-05-30T18:00:00Z",
  "source": "veld/0.7.6 (MIF adapter)",
  "entry": [
    {
      "resourceType": "MemoryRecord",
      "id": "m-001",
      "content": "User prefers execution-first responses, minimal hedging.",
      "createdAt": "2026-05-30T17:55:12Z",
      "kind": "learning",
      "tier": "longterm",
      "importance": 0.82,
      "confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
      "entityRefs": [{ "ref": "Entity/john" }]
    },
    {
      "resourceType": "Entity",
      "id": "john",
      "name": "John",
      "labels": ["person"],
      "salience": 0.7,
      "properNoun": true
    }
  ]
}
```
JSON-LD compatibility
OMIR is JSON-LD compatible without forcing linked-data tooling on anyone. A Bundle
MAY carry an optional @context:
```json
{
  "@context": "https://omir.io/spec/R1/context.jsonld",
  "resourceType": "Bundle",
  "omirVersion": "R1",
  "entry": []
}
```
- The @context SHOULD be the canonical URL https://omir.io/spec/R1/context.jsonld, or an object that includes it.
- Adding @context MUST NOT change the core shape of any resource. A plain-JSON consumer that ignores @context reads exactly the same data as a JSON-LD processor.
- The @context maps OMIR property names to IRIs so that OMIR Bundles can participate in RDF graphs, SPARQL queries, and linked-data pipelines for implementations that want them. This is opt-in.
- Episode.source is mapped to its own predicate (omir:episodeSource, via a JSON-LD 1.1 type-scoped context) so the kind of input an episode came from is not conflated with the producer/origin source carried by Bundle, Meta, and Provenance. Broader per-field disambiguation of shared term names arrives with the R2 vocabulary work (see Toward a Global Standard).
Constraints on the canonical form
- Timestamps (Instant) MUST be RFC 3339 / ISO 8601. UTC is RECOMMENDED.
- Normalized scores (UnitInterval: importance, salience, strength, credibility, confidence.calibrated) MUST lie in the closed interval [0, 1].
- Identifiers (id) MUST match ^[A-Za-z0-9._:-]{1,128}$ and MUST be unique within their resourceType inside a Bundle.
- Resources MUST NOT carry undeclared top-level properties; every R1 resource schema sets additionalProperties: false. Implementation-specific data goes in extension[] (see Extensions).
- JSON numbers are values, not lexical forms: 9 and 9.0 denote the same number, and R1 does not mandate a canonical numeric spelling. A future binary or attestation profile may pin one (see Toward a Global Standard); within R1 the two are equivalent.
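The R1 JSON Schemas enforce most of these constraints, but a lightweight pre-check is easy to write. The Python sketch below only approximates them, with three known gaps:

- datetime.fromisoformat accepts some ISO 8601 forms that RFC 3339 does not.
- Nested timestamps such as decay.lastAccess are not checked.
- additionalProperties is left to the schemas.

The helper names are illustrative.

```python
import re
from datetime import datetime

ID_RE = re.compile(r"[A-Za-z0-9._:-]{1,128}")
UNIT_FIELDS = ("importance", "salience", "strength")
INSTANT_FIELDS = ("createdAt", "eventTime", "validUntil", "validAt", "invalidatedAt", "lastSeenAt")


def _in_unit(x) -> bool:
    """True for a non-boolean number in the closed interval [0, 1]."""
    return isinstance(x, (int, float)) and not isinstance(x, bool) and 0.0 <= x <= 1.0


def check_resource(r: dict) -> list:
    """Return canonical-form violations for one resource (empty if none)."""
    errors = []
    if not isinstance(r.get("id"), str) or not ID_RE.fullmatch(r["id"]):
        errors.append(f"bad id: {r.get('id')!r}")
    scores = {f: r[f] for f in UNIT_FIELDS if f in r}
    if "calibrated" in r.get("confidence", {}):
        scores["confidence.calibrated"] = r["confidence"]["calibrated"]
    if "credibility" in r.get("provenance", {}):
        scores["provenance.credibility"] = r["provenance"]["credibility"]
    for name, value in scores.items():
        if not _in_unit(value):
            errors.append(f"{name} outside [0, 1]: {value!r}")
    for f in INSTANT_FIELDS:
        if f not in r:
            continue
        ts = None
        if isinstance(r[f], str):
            try:
                ts = datetime.fromisoformat(r[f].replace("Z", "+00:00"))
            except ValueError:
                pass
        if ts is None:
            errors.append(f"{f} is not a timestamp: {r[f]!r}")
        elif ts.tzinfo is None:  # RFC 3339 date-times carry an offset
            errors.append(f"{f} lacks a UTC offset: {r[f]!r}")
    return errors


def duplicate_ids(bundle: dict) -> list:
    """(resourceType, id) pairs that occur more than once in the Bundle."""
    seen, dups = set(), []
    for e in bundle.get("entry", []):
        key = (e["resourceType"], e["id"])
        if key in seen:
            dups.append(key)
        seen.add(key)
    return dups
```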
The .omirb binary profile
For edge and robotics deployments — where an .omir JSON document is too large or too
slow to parse — OMIR defines a compact binary profile with the .omirb extension.
- .omirb is a CBOR (RFC 8949) encoding of the same logical model as the canonical JSON, with bincode permitted as an internal sub-profile for tightly-coupled producer/consumer pairs.
- The binary profile MUST be a lossless round-trip: decoding an .omirb document and re-encoding it as .omir JSON MUST yield a Bundle equivalent to the original (modulo insignificant ordering and whitespace).
- Field names, enums, and reference strings are preserved; the binary profile is a re-encoding, not a re-modeling. There are no binary-only fields and no JSON-only fields.
- The binary profile is OPTIONAL. An implementation that reads and writes only .omir JSON is fully conformant. An implementation that emits .omirb MUST be able to emit the equivalent .omir on request, so that the canonical form is always reachable.
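The lossless round-trip rule can be tested at the logical-model level once both files are decoded to plain dictionaries. The .omirb side needs a CBOR decoder, which is not shown. The sketch below assumes resource ids are unique per type, as the canonical-form constraints require. It treats entry order as insignificant, consistent with the rule that everything resolves by ResourceType/id.

```python
def bundles_equivalent(a: dict, b: dict) -> bool:
    """Compare two decoded Bundles as one logical model.

    Entries are compared by (resourceType, id), so entry order is ignored;
    dict key order and whitespace vanish on decoding; numbers compare by
    value, so 9 equals 9.0. Note: Python also treats True == 1, so a
    stricter check would compare value types as well.
    """
    def by_key(bundle: dict) -> dict:
        return {(e["resourceType"], e["id"]): e for e in bundle.get("entry", [])}

    top_a = {k: v for k, v in a.items() if k != "entry"}
    top_b = {k: v for k, v in b.items() if k != "entry"}
    return top_a == top_b and by_key(a) == by_key(b)
```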
Choosing an encoding
- Use .omir for interchange: exports, backups, hand-offs between vendors, human inspection, version control.
- Use .omirb for transport-adjacent persistence on constrained devices: a robot writing memory to flash over a Zenoh link, an embedded agent with a tight parse budget. The omir-robotics profile (Profiles) is built around this case.
Extensions
OMIR’s core deliberately captures only the ~80% of fields that ~80% of memory systems
need (see Design Principles §3). The
remaining long tail — vendor-specific scores, experimental signals, model-specific
metadata — is carried in the typed extension[] array. This is OMIR’s escape hatch:
it lets implementations round-trip their proprietary data through a standard format
without breaking core conformance and without every other consumer needing to
understand it.
The extension mechanism
Every R1 resource (MemoryRecord, Entity, Relationship, Episode) carries an
optional extension[] array. Each element is an Extension object:
| Field | Type | Required | Meaning |
|---|---|---|---|
| url | URI | yes | Canonical URL that defines this extension. |
| valueString | string | no | Scalar string value. |
| valueNumber | number | no | Scalar numeric value. |
| valueBoolean | boolean | no | Scalar boolean value. |
| valueJson | any JSON | no | Arbitrary structured value. |
Rules:
- An Extension MUST carry a url. The url is the extension’s identity: it tells a consumer what this is and where the definition lives.
- An Extension SHOULD carry exactly one value* field. Use valueJson when the payload is structured (an object or array); use the scalar value* fields for simple values.
- A consumer MUST ignore any extension whose url it does not recognize, and MUST still process the resource. Unknown extensions are never an error. This is the rule that makes the format forward-compatible: a producer can add new extensions and old consumers keep working.
- Extensions MUST NOT be used to override or contradict a core field. They add data; they do not replace it.
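The url and value* rules can be checked mechanically. Below is a minimal Python sketch with a hypothetical function name. Tagging a wrongly typed value* field as MUST is an assumption, because schema validation under the Conformance rules would reject it. The url is deliberately never looked up, because an unrecognized extension is valid.

```python
from typing import Any, Callable


def _is_number(value: Any) -> bool:
    # JSON true/false decode to Python bool, a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Expected JSON type for each value* field in the Extension table.
VALUE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "valueString": lambda v: isinstance(v, str),
    "valueNumber": _is_number,
    "valueBoolean": lambda v: isinstance(v, bool),
    "valueJson": lambda v: True,  # any JSON value is allowed
}


def extension_problems(ext: dict[str, Any]) -> list[tuple[str, str]]:
    """Check one Extension object; return (level, message) pairs, empty if clean."""
    problems: list[tuple[str, str]] = []
    url = ext.get("url")
    if not isinstance(url, str) or not url:
        problems.append(("MUST", "missing url"))
    present = [name for name in VALUE_CHECKS if name in ext]
    if len(present) != 1:
        problems.append(("SHOULD", f"expected exactly one value* field, found {len(present)}"))
    for name in present:
        if not VALUE_CHECKS[name](ext[name]):
            problems.append(("MUST", f"{name} has the wrong JSON type"))
    return problems


print(extension_problems({"url": "https://veld.dev/omir/ext/scoring-signals",
                          "valueJson": {"recency": 0.91}}))  # []
print(extension_problems({"valueString": "x", "valueNumber": 1}))
# [('MUST', 'missing url'), ('SHOULD', 'expected exactly one value* field, found 2')]
```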
The extension URL registry
An extension url is a stable, dereferenceable identifier for an extension
definition. OMIR adopts a registry model (as FHIR does):
- Standard extensions defined by the OMIR Working Group live under the OMIR namespace, e.g. https://omir.io/spec/R1/extension/<name>.
- Vendor extensions live under a URL the vendor controls, e.g. https://veld.dev/omir/ext/<name>. A vendor MUST NOT mint extensions under the omir.io namespace.
- The URL SHOULD resolve to a human- and machine-readable definition (name, description, value type, and the resources it applies to).
- The Working Group maintains a public index of standard extension URLs. Vendor extensions MAY be listed there for discoverability but are not owned by the WG.
This keeps the core small and stable while letting the ecosystem innovate at the edges: two implementations can each carry rich proprietary data, exchange Bundles, and lose nothing — each ignores the other’s extensions and reads the shared core.
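A producer can enforce the namespace rule before emitting. This sketch assumes the standard-extension path shape shown above and treats subdomains of omir.io as part of the reserved namespace. vendor_prefix and both function names are illustrative. The sketch checks URL shape only, not whether the definition actually resolves.

```python
from urllib.parse import urlparse

OMIR_HOST = "omir.io"
STANDARD_PATH = "/spec/R1/extension/"


def _in_omir_namespace(host: str) -> bool:
    # Assumption: subdomains of omir.io belong to the reserved namespace too.
    return host == OMIR_HOST or host.endswith("." + OMIR_HOST)


def is_standard_extension(url: str) -> bool:
    """True if url has the shape of a Working Group extension under the OMIR namespace."""
    parts = urlparse(url)
    return (parts.scheme == "https" and parts.netloc == OMIR_HOST
            and parts.path.startswith(STANDARD_PATH))


def vendor_may_mint(url: str, vendor_prefix: str) -> bool:
    """True if a vendor owning vendor_prefix may mint this extension url."""
    if _in_omir_namespace(urlparse(url).hostname or ""):
        return False  # never under omir.io
    return url.startswith(vendor_prefix)


VELD = "https://veld.dev/omir/ext/"
print(is_standard_extension("https://omir.io/spec/R1/extension/pii-class"))  # True
print(vendor_may_mint("https://veld.dev/omir/ext/scoring-signals", VELD))      # True
print(vendor_may_mint("https://omir.io/spec/R1/extension/veld-scores", VELD))  # False
```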
Worked example: Veld’s 20-signal retrieval scores
Veld’s retrieval pipeline computes a 20-signal score vector per memory (recency, arousal, source credibility, graph strength, cross-encoder blend, entity match, and so on). None of that belongs in the OMIR core — it is a Veld implementation detail. It rides in an extension instead.
{
"resourceType": "MemoryRecord",
"id": "m-042",
"content": "Switched the cross-encoder to track HF main and pre-warm on startup.",
"createdAt": "2026-05-30T16:10:00Z",
"kind": "learning",
"importance": 0.74,
"confidence": { "alpha": 7, "beta": 2, "calibrated": 0.78 },
"extension": [
{
"url": "https://veld.dev/omir/ext/scoring-signals",
"valueJson": {
"schema": "veld-20-signal",
"version": "0.7.6",
"signals": {
"recency": 0.91,
"arousal": 0.40,
"sourceCredibility": 0.85,
"graphStrength": 0.62,
"crossEncoder": 0.71,
"entityMatch": 0.55,
"tagMatch": 0.33,
"episodeCoherence": 0.48,
"activationLevel": 0.66
}
}
},
{
"url": "https://veld.dev/omir/ext/external-dimensions",
"valueJson": {
"density": 0.58,
"coherence": 0.72,
"closure": 0.41,
"confidence": 0.79,
"isotropy": 0.63
}
}
]
}
What this buys each party:
- Veld round-trips its full retrieval state through .omir and reconstructs it on import — nothing is lost in export.
- A generic consumer (another agent, a viewer, a different memory engine) ignores both extension URLs it does not recognize and processes the record using only the core fields (content, importance, confidence, …). It still works.
- A second Veld-aware tool recognizes https://veld.dev/omir/ext/scoring-signals, pulls the signal vector out of valueJson, and uses it.
This is the 80/20 rule in action: the shared 80% is interoperable by everyone; the proprietary 20% travels safely alongside it without becoming everyone’s problem.
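The two consumer behaviours described above can be sketched side by side. The record is a trimmed copy of the worked example. The helper names are hypothetical, and the schema/signals layout inside valueJson is Veld's own convention, not something OMIR defines.

```python
from typing import Any, Optional

SCORING_SIGNALS = "https://veld.dev/omir/ext/scoring-signals"
VALUE_FIELDS = ("valueJson", "valueString", "valueNumber", "valueBoolean")


def extension_value(resource: dict[str, Any], url: str) -> Optional[Any]:
    """Return the value of the first extension carrying url, or None if there is none."""
    for ext in resource.get("extension", []):
        if ext.get("url") == url:
            for field in VALUE_FIELDS:
                if field in ext:
                    return ext[field]
    return None


def generic_view(record: dict[str, Any]) -> tuple[str, Optional[float]]:
    """A consumer that recognizes no extensions still has the core fields."""
    return record["content"], record.get("importance")


def veld_signals(record: dict[str, Any]) -> dict[str, float]:
    """A Veld-aware tool pulls the signal vector out of valueJson."""
    payload = extension_value(record, SCORING_SIGNALS)
    if not isinstance(payload, dict):
        return {}
    return payload.get("signals", {})


record = {
    "resourceType": "MemoryRecord",
    "id": "m-042",
    "content": "Switched the cross-encoder to track HF main and pre-warm on startup.",
    "createdAt": "2026-05-30T16:10:00Z",
    "importance": 0.74,
    "extension": [
        {"url": SCORING_SIGNALS,
         "valueJson": {"schema": "veld-20-signal", "signals": {"recency": 0.91, "arousal": 0.40}}},
        {"url": "https://veld.dev/omir/ext/external-dimensions", "valueJson": {"density": 0.58}},
    ],
}

print(generic_view(record)[1])          # 0.74
print(veld_signals(record)["recency"])  # 0.91
```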
Profiles
The OMIR core is intentionally permissive: almost every field is optional so that any memory system can map onto it. That generality is a problem for any particular domain, where a consumer needs guarantees — “every record will have provenance,” “every entity will carry an embedding,” “decay state will always be present.”
A Profile closes that gap. A profile is a published constraint on the OMIR core:
A profile = a constrained subset of the core + a set of required extensions.
Specifically, a profile MAY:
- mark otherwise-optional core fields as required;
- forbid core fields or enum values that are meaningless in its domain;
- require specific extensions (by url) to be present;
- tighten value ranges or cardinalities within what the base schema allows.
A profile MUST NOT:
- add new top-level core fields (use extensions);
- relax a base-schema requirement (a profile only ever narrows);
- contradict the base schema (a profiled resource is always also a valid base resource).
A resource declares the profiles it claims via meta.profile, an array of canonical
profile URLs. A Bundle is profile-conformant when every resource that claims a
profile satisfies that profile’s constraints (see Conformance).
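Profile conformance can be layered on top of a core validator as a registry of per-profile checks keyed by URL. Below is a minimal Python sketch, assuming core conformance has already been established. The needs-importance profile and its example.org URL are invented for illustration. The checker skips profiles it does not implement, in line with the consumer rules under Conformance. An empty report therefore means "conformant as far as this consumer can tell".

```python
from typing import Any, Callable

# A profile check returns violations for one resource; an empty list means satisfied.
ProfileCheck = Callable[[dict[str, Any]], list[str]]


def profile_report(bundle: dict[str, Any], checks: dict[str, ProfileCheck]) -> dict[str, list[str]]:
    """Check each resource against the profiles it claims in meta.profile.

    Profile URLs missing from `checks` are skipped rather than rejected.
    """
    report: dict[str, list[str]] = {}
    for res in bundle["entry"]:
        violations: list[str] = []
        for url in res.get("meta", {}).get("profile", []):
            check = checks.get(url)
            if check is not None:
                violations.extend(check(res))
        if violations:
            report[f"{res['resourceType']}/{res['id']}"] = violations
    return report


# A made-up profile for illustration: it only makes importance mandatory.
NEEDS_IMPORTANCE = "https://example.org/profile/needs-importance"


def needs_importance(res: dict[str, Any]) -> list[str]:
    return [] if "importance" in res else ["importance is required"]


bundle = {
    "resourceType": "Bundle",
    "omirVersion": "R1",
    "entry": [
        {"resourceType": "MemoryRecord", "id": "a", "content": "x",
         "createdAt": "2026-05-30T11:42:05Z", "meta": {"profile": [NEEDS_IMPORTANCE]}},
        {"resourceType": "MemoryRecord", "id": "b", "content": "y",
         "createdAt": "2026-05-30T11:42:05Z",
         "meta": {"profile": ["https://example.org/profile/not-implemented"]}},
    ],
}

print(profile_report(bundle, {NEEDS_IMPORTANCE: needs_importance}))
# {'MemoryRecord/a': ['importance is required']}
```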
R1 ships three reference profiles.
omir-coding-agent
URL: https://omir.io/spec/R1/profile/omir-coding-agent
For coding agents (Claude Code, Copilot agents, Cursor) that remember decisions, patterns, edits, and project context.
Constraints:
- MemoryRecord.provenance is REQUIRED, and provenance.sourceType SHOULD be one of conversation, document, command, or observation.
- MemoryRecord.kind is REQUIRED (one of memory, plan, prompt, learning).
- For records whose experienceType is code_edit, file_access, or command, provenance.externalId SHOULD be present (e.g. github:pr-123, linear:SHO-39) for traceability back to the artifact.
- Entity.labels SHOULD include technology, project, or skill where applicable, so a project graph is reconstructable.
Required extensions: none mandated; tool-call telemetry, if carried, SHOULD use a vendor extension URL.
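The record-level constraints above can be checked mechanically. This Python sketch uses a hypothetical function name and a sample record, and reports MUST and SHOULD findings separately. It leaves out the Entity.labels guidance, since "where applicable" is a judgment a checker cannot make.

```python
from typing import Any

SOURCE_TYPES = {"conversation", "document", "command", "observation"}
KINDS = {"memory", "plan", "prompt", "learning"}
TRACEABLE = {"code_edit", "file_access", "command"}


def coding_agent_issues(rec: dict[str, Any]) -> list[tuple[str, str]]:
    """(level, message) pairs for a MemoryRecord claiming omir-coding-agent."""
    issues: list[tuple[str, str]] = []
    prov = rec.get("provenance")
    if prov is None:
        issues.append(("MUST", "provenance is required"))
    elif prov.get("sourceType") not in SOURCE_TYPES:
        issues.append(("SHOULD", "provenance.sourceType outside the recommended set"))
    if rec.get("kind") not in KINDS:
        issues.append(("MUST", "kind is required: memory, plan, prompt, or learning"))
    if rec.get("experienceType") in TRACEABLE and not (prov or {}).get("externalId"):
        issues.append(("SHOULD", "provenance.externalId missing, e.g. github:pr-123"))
    return issues


# Hypothetical sample record.
edit = {
    "resourceType": "MemoryRecord", "id": "mem-edit-7", "content": "Renamed the retry helper.",
    "createdAt": "2026-05-30T12:00:00Z", "kind": "memory", "experienceType": "code_edit",
    "provenance": {"source": "cli", "sourceType": "command", "externalId": "github:pr-123"},
}
print(coding_agent_issues(edit))  # []
```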
omir-robotics (edge / Zenoh)
URL: https://omir.io/spec/R1/profile/omir-robotics
For embodied agents and robots persisting memory on constrained edge hardware, often over a Zenoh/ROS2 transport.
Constraints:
- The .omirb binary encoding (Encodings) is RECOMMENDED for persistence under this profile; producers MUST still be able to emit .omir JSON on request.
- Episode.source SHOULD be observation or event (robots ingest sensor and event streams, not chat messages).
- MemoryRecord.eventTime is REQUIRED — physical events are timestamped at occurrence, distinct from ingestion time.
- A spatial extension is REQUIRED on location-bearing records and entities, under a canonical URL (e.g. https://omir.io/spec/R1/extension/spatial-pose) carrying a pose/frame payload in valueJson. Spatial coordinates are not core in R1.
- decay.halfLifeHours SHOULD be present so on-device forgetting survives power cycles.
omir-personal-assistant
URL: https://omir.io/spec/R1/profile/omir-personal-assistant
For personal-assistant agents that hold long-lived facts and preferences about a single user, and must handle PII responsibly.
Constraints:
- MemoryRecord.confidence is REQUIRED — a personal assistant must know how sure it is before acting on a remembered fact.
- MemoryRecord.validUntil SHOULD be set on facts known to be time-bounded (a current address, an employer), so stale facts are detectable.
- Prospective memory is supported: records with experienceType: "intention" carry future-directed reminders and MUST be filtered out of ordinary recall by a conforming consumer.
- A PII-classification extension is REQUIRED on records and entities that may carry personal data, under a canonical URL (e.g. https://omir.io/spec/R1/extension/pii-class), so downstream handling can honor it.
- Entity.salience SHOULD be present so the assistant can rank what matters to its user.
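The intention rule is a filter a consumer applies before ranking. This Python sketch drops intention records, as the profile requires. It also drops records whose validUntil has passed, which is an additional design choice here; a consumer could down-rank them instead. Function names are illustrative. The sample records are trimmed to the fields the sketch reads.

```python
from datetime import datetime, timezone
from typing import Any, Iterable


def parse_instant(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recall_candidates(records: Iterable[dict[str, Any]], now: datetime) -> list[dict[str, Any]]:
    """Records eligible for ordinary recall under omir-personal-assistant."""
    out = []
    for rec in records:
        if rec.get("experienceType") == "intention":
            continue  # prospective memory: MUST NOT surface in ordinary recall
        until = rec.get("validUntil")
        if until is not None and now > parse_instant(until):
            continue  # superseded fact (design choice: dropped rather than down-ranked)
        out.append(rec)
    return out


# Trimmed records: only the fields this sketch reads.
records = [
    {"id": "remind-dentist", "content": "Book the dentist", "experienceType": "intention"},
    {"id": "old-employer", "content": "Works at Acme", "validUntil": "2026-01-01T00:00:00Z"},
    {"id": "likes-tea", "content": "Prefers green tea",
     "confidence": {"alpha": 6, "beta": 1, "calibrated": 0.86}},
]
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
print([r["id"] for r in recall_candidates(records, now)])  # ['likes-tea']
```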
Defining your own profile
Implementations and domains MAY publish additional profiles. A profile definition MUST:
- have a stable canonical URL;
- state the base resource(s) it constrains;
- enumerate its added requirements, forbidden fields/values, and required extension URLs;
- guarantee that any resource conforming to the profile is also a valid base R1 resource.
Profiles are versioned with the release (R1, R2, …) and follow the same deprecation policy as the core.
Conformance
This section defines what it means for a document, and for an implementation, to conform to OMIR R1. The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as described in RFC 2119.
Conformance levels
OMIR defines two levels of conformance.
Core conformance
A document is Core-conformant when it is a valid Bundle whose every resource validates against its R1 JSON Schema and satisfies the structural rules below. Core conformance makes no claim about any domain profile.
Profiled conformance
A document is Profiled-conformant to profile P when it is Core-conformant and
every resource that claims P in meta.profile satisfies all of P’s additional
constraints (Profiles). Profiled conformance is always in addition
to, never instead of, core conformance.
Document rules (MUST)
A Core-conformant document MUST satisfy the following rules. Each carries a stable identifier (CR-1 … CR-8) that other documents cite; the identifier is stable even if the list is later renumbered.
- CR-1 — Be a Bundle. The top-level object’s resourceType is Bundle and omirVersion is R1. The entry array is present.
- CR-2 — Validate against the schemas. Every entry validates against its resource schema (draft 2020-12) at https://omir.io/spec/R1/schemas/.
- CR-3 — Carry required fields per resource. At minimum:
  - MemoryRecord — resourceType, id, content, createdAt.
  - Entity — resourceType, id, name.
  - Relationship — resourceType, id, from, to, relationType.
  - Episode — resourceType, id, content, createdAt.
- CR-4 — Use unique ids per type. Each id matches ^[A-Za-z0-9._:-]{1,128}$ and is unique within its resourceType inside the Bundle.
- CR-5 — Satisfy reference integrity. Every typed reference of the form ResourceType/id (in entityRefs and in Relationship.from, .to, and .sourceEpisode) MUST resolve to a resource of that type and id present in the same Bundle’s entry array. Every MemoryRecord.parentId, which is a bare id rather than a typed reference, MUST likewise resolve to a MemoryRecord in that array. A dangling reference is non-conformant.
- CR-6 — Carry no undeclared core fields. Every R1 resource schema sets additionalProperties: false. Implementation-specific data MUST be carried in extension[] (see Extensions), not as new top-level properties.
- CR-7 — Keep scores in range. Every UnitInterval field (importance, salience, strength, provenance.credibility, confidence.calibrated) lies in [0, 1].
- CR-8 — Use RFC 3339 timestamps. Every Instant is a valid RFC 3339 / ISO 8601 date-time.
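Most of these rules can be checked without the JSON Schemas. The Python sketch below covers CR-1, CR-3, CR-4, CR-5, CR-7 and CR-8. It does not attempt CR-2 or CR-6, which need a draft 2020-12 schema validator. Its timestamp check is a shape-only regular expression and does not reject, say, month 13. The list of Instant fields is limited to those appearing in this document. core_violations is a hypothetical name, not the Working Group's reference validator.

```python
import re
from typing import Any, Iterator, Optional

ID_RE = re.compile(r"[A-Za-z0-9._:-]{1,128}")
# RFC 3339 date-time shape: date, "T", time, optional fraction, mandatory UTC offset.
INSTANT_RE = re.compile(r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})")
REQUIRED = {  # CR-3, beyond resourceType itself
    "MemoryRecord": ("id", "content", "createdAt"),
    "Entity": ("id", "name"),
    "Relationship": ("id", "from", "to", "relationType"),
    "Episode": ("id", "content", "createdAt"),
}
UNIT_PATHS = ("importance", "salience", "strength", "provenance.credibility", "confidence.calibrated")
INSTANT_PATHS = ("createdAt", "eventTime", "validUntil", "lastSeenAt", "validAt",
                 "invalidatedAt", "decay.lastAccess", "meta.createdAt", "meta.lastUpdated")


def _get(res: dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as "confidence.calibrated"; None if any step is absent."""
    node: Any = res
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def _targets(res: dict[str, Any]) -> Iterator[tuple[str, str, str]]:
    """Yield (field, targetType, targetId) for every reference the resource holds."""
    refs = [("entityRefs", ref) for ref in res.get("entityRefs", [])]
    if res.get("resourceType") == "Relationship":
        refs += [(f, res[f]) for f in ("from", "to", "sourceEpisode") if f in res]
    for field, ref in refs:
        rtype, _, rid = ref.get("ref", "").partition("/")
        yield field, rtype, rid
    if res.get("resourceType") == "MemoryRecord" and "parentId" in res:
        yield "parentId", "MemoryRecord", res["parentId"]  # a bare id, not ResourceType/id


def core_violations(bundle: dict[str, Any]) -> list[str]:
    """Check CR-1, CR-3, CR-4, CR-5, CR-7 and CR-8; CR-2 and CR-6 need the JSON Schemas."""
    if bundle.get("resourceType") != "Bundle" or bundle.get("omirVersion") != "R1":
        return ["CR-1: top level is not an R1 Bundle"]
    entries = bundle.get("entry")
    if not isinstance(entries, list):
        return ["CR-1: entry array missing"]
    errors: list[str] = []
    seen: set[tuple[str, str]] = set()
    for res in entries:
        rtype, rid = res.get("resourceType"), res.get("id")
        for field in REQUIRED.get(rtype, ()):
            if field not in res:
                errors.append(f"CR-3: {rtype} missing {field}")
        if not isinstance(rid, str) or not ID_RE.fullmatch(rid):
            errors.append(f"CR-4: {rtype} has invalid id {rid!r}")
        elif (rtype, rid) in seen:
            errors.append(f"CR-4: duplicate {rtype}/{rid}")
        else:
            seen.add((rtype, rid))
        for path in UNIT_PATHS:
            value = _get(res, path)
            if value is not None and not (isinstance(value, (int, float)) and 0 <= value <= 1):
                errors.append(f"CR-7: {rtype}/{rid} {path}={value!r} outside [0, 1]")
        for path in INSTANT_PATHS:
            value = _get(res, path)
            if value is not None and not (isinstance(value, str) and INSTANT_RE.fullmatch(value)):
                errors.append(f"CR-8: {rtype}/{rid} {path} is not an RFC 3339 date-time")
    # CR-5 runs once every id is known, because entry order carries no meaning.
    for res in entries:
        for field, ttype, tid in _targets(res):
            if (ttype, tid) not in seen:
                errors.append(f"CR-5: {res.get('resourceType')}/{res.get('id')} {field} -> {ttype}/{tid} dangles")
    return errors


doc = {"resourceType": "Bundle", "omirVersion": "R1", "entry": [
    {"resourceType": "MemoryRecord", "id": "m1", "content": "x",
     "createdAt": "2026-05-30T11:42:05Z", "entityRefs": [{"ref": "Entity/ghost"}]}]}
print(core_violations(doc))  # ['CR-5: MemoryRecord/m1 entityRefs -> Entity/ghost dangles']
```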
Producer / consumer rules
Producers
A conforming producer (an implementation that emits OMIR):
- MUST emit Core-conformant Bundles.
- MUST place implementation-specific data in extension[] under a URL it controls, never under the omir.io namespace.
- MUST be able to emit the canonical .omir JSON form, even if it primarily emits .omirb (Encodings).
- SHOULD populate meta.source and meta.maturity so consumers can reason about provenance and stability.
meta.sourceandmeta.maturityso consumers can reason about provenance and stability. - SHOULD NOT emit dangling references; if a referenced resource is intentionally out of scope, the reference SHOULD be omitted rather than left dangling.
Consumers
A conforming consumer (an implementation that reads OMIR):
- MUST accept any Core-conformant Bundle.
- MUST ignore unknown extensions and still process the resource. Encountering an unrecognized extension url MUST NOT cause rejection.
- MUST ignore unknown meta.profile URLs it does not implement, while still reading the core fields.
- SHOULD preserve extensions it does not understand when re-exporting, so data survives a read-modify-write round trip (“lossless pass-through”).
- MUST NOT infer meaning from the order of entries; references resolve by ResourceType/id, not position.
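The last two consumer rules fit into one small wrapper. The wrapper indexes resources by (resourceType, id) and re-serializes the same decoded objects on export. BundleView is a hypothetical name, and the sample is trimmed from the Relationship example.

```python
import json
from typing import Any, Optional


class BundleView:
    """Read-modify-write access to a decoded Bundle, resolving references by ResourceType/id."""

    def __init__(self, bundle: dict[str, Any]) -> None:
        self.bundle = bundle
        # Keyed by (resourceType, id); list position is never consulted.
        self._index = {(r["resourceType"], r["id"]): r for r in bundle["entry"]}

    def get(self, rtype: str, rid: str) -> Optional[dict[str, Any]]:
        return self._index.get((rtype, rid))

    def resolve(self, ref: dict[str, str]) -> Optional[dict[str, Any]]:
        rtype, _, rid = ref["ref"].partition("/")
        return self.get(rtype, rid)

    def export(self) -> str:
        # The same dicts are serialized back, so extensions and meta.profile URLs this
        # consumer never interpreted survive the round trip unchanged.
        return json.dumps(self.bundle, indent=2, ensure_ascii=False)


bundle = {"resourceType": "Bundle", "omirVersion": "R1", "entry": [
    {"resourceType": "Relationship", "id": "rel-varun-omir", "from": {"ref": "Entity/varun"},
     "to": {"ref": "Entity/omir"}, "relationType": "maintains",
     "extension": [{"url": "https://veld.dev/omir/ext/hebbian", "valueJson": {"coActivations": 17}}]},
    {"resourceType": "Entity", "id": "varun", "name": "Varun"},
    {"resourceType": "Entity", "id": "omir", "name": "OMIR"},
]}
bundle["entry"].reverse()  # entry order must not matter

view = BundleView(bundle)
edge = view.get("Relationship", "rel-varun-omir")
print(view.resolve(edge["to"])["name"])  # OMIR
edge["strength"] = 0.9  # modify a core field in place
print("https://veld.dev/omir/ext/hebbian" in view.export())  # True
```

In Python, pass-through comes almost for free because the decoded dicts are kept as-is. A consumer that maps resources onto typed structures has to hold on to the raw extension[] array (and any meta.profile URLs) explicitly to get the same guarantee.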
Declaring conformance
An implementation declares its conformance by publishing a short conformance statement that lists:
- the OMIR release it targets (R1);
- its role(s): producer, consumer, or both;
- the conformance level: Core, and any Profiles it implements (by URL);
- for .omirb support, whether it produces, consumes, or both;
- the canonical extension URLs it emits.
A producer SHOULD also stamp meta.source on the resources it emits (e.g.
"veld/0.7.6") so a Bundle is self-describing.
Self-test
Conformance is verifiable, not just asserted. An implementation SHOULD pass the OMIR Working Group’s reference validator (Apache-2.0) against:
- its emitted Bundles (producer conformance), and
- the published R1 example Bundles (consumer conformance).
The validator checks schema validity, reference integrity, id uniqueness, score ranges, timestamp formats, and — when a profile URL is claimed — that profile’s added constraints.
“Powered by OMIR” badge
An implementation that publishes a conformance statement and passes the reference validator for the role(s) it claims MAY display the “Powered by OMIR” badge.
- The badge asserts Core conformance at minimum. An implementation MAY annotate the badge with the profiles it additionally satisfies (e.g. “Powered by OMIR — omir-coding-agent”).
- The badge MUST NOT be displayed by an implementation that fails the reference validator for its claimed role, or that claims a profile it does not satisfy.
- The OMIR Working Group stewards the badge and its usage guidelines. Misrepresentation of conformance is grounds for the WG to request the badge’s removal.
MemoryRecord
Generated artifact. This page is generated from schemas/MemoryRecord.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.
The atomic unit of agent memory: a remembered experience, plan, prompt, or learning.
Maturity: OMM-4
Purpose
A MemoryRecord is the smallest retrievable item of cognitive state — one remembered thing, with the context needed to score, decay, and re-surface it. One core record type carries several lifecycle classes (memory, plan, prompt, learning) so that retrieval stays unified across them.
See Memory Semantics for how confidence, decay, tier, and provenance behave over time.
Fields
| Field | Type | Card. | Description |
|---|---|---|---|
| resourceType | "MemoryRecord" (const) | 1..1 | Discriminator. MUST be MemoryRecord. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among MemoryRecord resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| content | string | 1..1 | WHAT — the primary human-readable content of the memory. |
| createdAt | Instant (date-time) | 1..1 | Encoding time: when the record was written. |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| kind | enum memory \| plan \| prompt \| learning | 0..1 | Record lifecycle class. Default memory. |
| experienceType | enum (14 values, see below) | 0..1 | Finer-grained nature of the experience. intention denotes prospective (future-directed) memory. |
| tier | enum working \| session \| longterm \| archive | 0..1 | Position in the memory hierarchy. Default working. Promotion is driven by age × importance × access. |
| eventTime | Instant (date-time) | 0..1 | WHEN the described event actually happened. MAY precede createdAt. |
| importance | UnitInterval ([0,1]) | 0..1 | Normalized importance score. |
| confidence | Confidence | 0..1 | Calibrated Bayesian Beta(α, β) posterior plus derived point estimate (calibrated). |
| decay | Decay | 0..1 | Forgetting state (halfLifeHours, lastAccess, accessCount, anchored). Anchored records resist decay. |
| provenance | Provenance | 0..1 | Origin and trust (source, sourceType, credibility, externalId). |
| entityRefs | array of Reference | 0..* | References to Entity resources mentioned by this memory. Enables spreading activation without a graph lookup. |
| parentId | Id (string) | 0..1 | Parent MemoryRecord id, for hierarchical knowledge trees. |
| validUntil | Instant (date-time) | 0..1 | Temporal invalidation: this record is considered superseded after this instant. |
| version | integer ≥ 1 | 0..1 | Record revision counter. Default 1. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |
additionalProperties is false: a conformant MemoryRecord carries only the fields above.
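The confidence and decay objects carry enough state for simple arithmetic. In the R1 examples, calibrated matches the Beta posterior mean α/(α+β) rounded to two places: 9/(9+1) = 0.9 in the Full example below, and 7/(7+2) ≈ 0.78 in the Extensions worked example. This Python sketch assumes that mapping. Memory Semantics remains the normative source, and a real calibrator may differ. The exponential half-life model for decay and the treatment of anchored as a full exemption are also assumptions. The function names are hypothetical.

```python
from datetime import datetime, timezone


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) posterior, used here as the point estimate."""
    return alpha / (alpha + beta)


def observe(confidence: dict, confirmed: bool) -> dict:
    """Conjugate Beta update: a confirmation adds 1 to alpha, a contradiction adds 1 to beta."""
    alpha = confidence["alpha"] + (1 if confirmed else 0)
    beta = confidence["beta"] + (0 if confirmed else 1)
    return {"alpha": alpha, "beta": beta, "calibrated": round(beta_mean(alpha, beta), 2)}


def retention(decay: dict, now: datetime) -> float:
    """Share of strength left under exponential half-life decay (an assumed model)."""
    if decay.get("anchored"):
        return 1.0  # assumption: anchored records are fully exempt
    last = datetime.fromisoformat(decay["lastAccess"].replace("Z", "+00:00"))
    hours = (now - last).total_seconds() / 3600
    return 0.5 ** (hours / decay["halfLifeHours"])


print(round(beta_mean(9.0, 1.0), 2))  # 0.9
print(round(beta_mean(7, 2), 2))      # 0.78
print(observe({"alpha": 9.0, "beta": 1.0, "calibrated": 0.9}, confirmed=False))
# {'alpha': 9.0, 'beta': 2.0, 'calibrated': 0.82}
print(retention({"halfLifeHours": 24, "lastAccess": "2026-05-30T00:00:00Z"},
                datetime(2026, 5, 31, tzinfo=timezone.utc)))  # 0.5
```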
experienceType vocabulary
conversation, decision, error, learning, discovery, pattern, context, task, code_edit, file_access, search, command, observation, intention.
Minimal
The required set is resourceType, id, content, createdAt.
{
"resourceType": "MemoryRecord",
"id": "mem-001",
"content": "Position OMIR as the at-rest format MCP transports, not a competing protocol.",
"createdAt": "2026-05-30T11:42:05Z"
}
Full
{
"resourceType": "MemoryRecord",
"id": "mem-positioning",
"meta": {
"omirVersion": "R1",
"profile": ["https://omir.io/spec/R1/profiles/agent-decision"],
"source": "veld/0.7.6",
"createdAt": "2026-05-30T11:42:05Z",
"lastUpdated": "2026-05-30T11:42:05Z",
"maturity": 4
},
"content": "Position OMIR as the at-rest data format that MCP/A2A transport, not a competing protocol.",
"kind": "learning",
"experienceType": "decision",
"tier": "longterm",
"createdAt": "2026-05-30T11:42:05Z",
"eventTime": "2026-05-30T11:42:00Z",
"importance": 0.95,
"confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
"decay": {
"halfLifeHours": 8760,
"lastAccess": "2026-05-30T11:42:05Z",
"accessCount": 1,
"anchored": true
},
"provenance": {
"source": "design-review",
"sourceType": "conversation",
"credibility": 0.92,
"externalId": "linear:OMIR-1"
},
"entityRefs": [
{ "ref": "Entity/omir" },
{ "ref": "Entity/varun" }
],
"parentId": "mem-standardization-thread",
"validUntil": "2027-05-30T00:00:00Z",
"version": 1,
"extension": [
{
"url": "https://veld.dev/omir/ext/scoring-signals",
"valueJson": { "graphStrength": 0.84, "arousal": 0.6, "feedbackMomentum": 0.71 }
}
]
}
References
A MemoryRecord points to other resources by typed Reference (ResourceType/id):
- entityRefs[] → Entity — entities mentioned by this memory, used for spreading activation.
It also carries an intra-type link:
- parentId → MemoryRecord — parent record in a hierarchical knowledge tree (a local Id, not a Reference).
No MemoryRecord field references Relationship, Episode, or Bundle directly.
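Because parentId is a bare id scoped to MemoryRecord, walking up the knowledge tree takes one dictionary lookup per step. This Python sketch uses a hypothetical function name and sample records. It treats a cycle as an error, on the assumption that a tree cannot contain one. It raises on an unresolved parentId, which CR-5 already forbids.

```python
from typing import Any


def ancestors(bundle: dict[str, Any], record_id: str) -> list[str]:
    """Follow MemoryRecord.parentId upward from record_id; return ids from parent to root."""
    parent_of = {
        r["id"]: r.get("parentId")
        for r in bundle["entry"]
        if r["resourceType"] == "MemoryRecord"
    }
    chain: list[str] = []
    seen = {record_id}
    current = parent_of.get(record_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"parentId cycle through {current!r}")
        if current not in parent_of:
            raise KeyError(f"parentId {current!r} does not resolve")
        chain.append(current)
        seen.add(current)
        current = parent_of[current]
    return chain


# Hypothetical sample records.
bundle = {"resourceType": "Bundle", "omirVersion": "R1", "entry": [
    {"resourceType": "MemoryRecord", "id": "mem-followup", "content": "Draft the MCP mapping note.",
     "createdAt": "2026-05-30T12:10:00Z", "parentId": "mem-positioning"},
    {"resourceType": "MemoryRecord", "id": "mem-positioning", "content": "Position OMIR as the at-rest format.",
     "createdAt": "2026-05-30T11:42:05Z", "parentId": "mem-standardization-thread"},
    {"resourceType": "MemoryRecord", "id": "mem-standardization-thread", "content": "Standardization thread.",
     "createdAt": "2026-05-30T10:00:00Z"},
]}
print(ancestors(bundle, "mem-followup"))
# ['mem-positioning', 'mem-standardization-thread']
```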
Entity
Generated artifact. This page is generated from schemas/Entity.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.
A named thing extracted from memory — a person, place, organization, concept, technology, product, skill, or discriminative keyword.
Maturity: OMM-3
Purpose
An Entity is a canonicalized node in the knowledge graph. It is the target of MemoryRecord.entityRefs and Episode.entityRefs, and the endpoint type for every Relationship. Salience governs how strongly an entity pulls on retrieval.
See Memory Semantics §5 for how salience is computed and used in retrieval.
Fields
| Field | Type | Card. | Description |
|---|---|---|---|
| resourceType | "Entity" (const) | 1..1 | Discriminator. MUST be Entity. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Entity resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| name | string | 1..1 | Canonical surface form, e.g. "John", "Paris", "Rust programming". |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| labels | array of enum (12 values, see below) | 0..* | One or more type labels. Open at the edges via other + extension. |
| summary | string | 0..1 | Context summary built from surrounding relationships. |
| mentionCount | integer ≥ 0 | 0..1 | Number of times this entity has been mentioned. |
| salience | UnitInterval ([0,1]) | 0..1 | How important this entity is. Higher salience = stronger gravitational pull in retrieval. |
| properNoun | boolean | 0..1 | Proper nouns carry higher base salience than common nouns. |
| attributes | object (string → string) | 0..1 | Type-specific key/value attributes. |
| createdAt | Instant (date-time) | 0..1 | When the entity was first recorded. |
| lastSeenAt | Instant (date-time) | 0..1 | When the entity was most recently observed. |
| extension | array of Extension | 0..* | Implementation-specific data (e.g. name embeddings — model + vector — ride here; embeddings are NOT core in R1). |
additionalProperties is false: a conformant Entity carries only the fields above.
labels vocabulary
person, organization, location, technology, concept, event, date, product, skill, keyword, project, other.
Minimal
The required set is resourceType, id, name.
{
"resourceType": "Entity",
"id": "varun",
"name": "Varun"
}
Full
{
"resourceType": "Entity",
"id": "omir",
"meta": {
"omirVersion": "R1",
"source": "veld/0.7.6",
"maturity": 3
},
"name": "OMIR",
"labels": ["project", "concept"],
"summary": "Open Memory Interoperability Resources — a FHIR-style standard for portable agent memory.",
"mentionCount": 34,
"salience": 0.97,
"properNoun": true,
"attributes": { "kind": "standard", "release": "R1" },
"createdAt": "2026-05-30T10:00:00Z",
"lastSeenAt": "2026-05-30T11:42:00Z",
"extension": [
{
"url": "https://veld.dev/omir/ext/name-embedding",
"valueJson": { "model": "minilm-384", "dim": 384 }
}
]
}
References
An Entity holds no outbound references to other OMIR resources — it is a leaf node by design. The graph edges around it are expressed elsewhere:
- Relationship.from / Relationship.to → Entity — entities are the endpoints of every relationship.
- MemoryRecord.entityRefs[] → Entity and Episode.entityRefs[] → Entity — records and episodes point to entities.
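Since an Entity stores no outbound references, the question "what points at this entity?" is answered by inverting the edges listed above. Below is a Python sketch over a trimmed copy of the Bundle example; backlinks is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any


def backlinks(bundle: dict[str, Any]) -> dict[str, list[str]]:
    """Map each referenced "Entity/<id>" to the resources (and fields) that point at it."""
    index: dict[str, list[str]] = defaultdict(list)
    for res in bundle["entry"]:
        source = f"{res['resourceType']}/{res['id']}"
        for ref in res.get("entityRefs", []):
            index[ref["ref"]].append(f"{source} (entityRefs)")
        if res["resourceType"] == "Relationship":
            for end in ("from", "to"):
                index[res[end]["ref"]].append(f"{source} ({end})")
    return dict(index)


bundle = {"resourceType": "Bundle", "omirVersion": "R1", "entry": [
    {"resourceType": "Episode", "id": "ep-launch-chat", "content": "Standardization discussion.",
     "createdAt": "2026-05-30T11:42:03Z", "entityRefs": [{"ref": "Entity/varun"}, {"ref": "Entity/omir"}]},
    {"resourceType": "Entity", "id": "varun", "name": "Varun"},
    {"resourceType": "Entity", "id": "omir", "name": "OMIR"},
    {"resourceType": "Relationship", "id": "rel-varun-omir", "from": {"ref": "Entity/varun"},
     "to": {"ref": "Entity/omir"}, "relationType": "maintains"},
    {"resourceType": "MemoryRecord", "id": "mem-positioning", "content": "Position OMIR as the at-rest format.",
     "createdAt": "2026-05-30T11:42:05Z", "entityRefs": [{"ref": "Entity/omir"}, {"ref": "Entity/varun"}]},
]}
print(backlinks(bundle)["Entity/omir"])
# ['Episode/ep-launch-chat (entityRefs)', 'Relationship/rel-varun-omir (to)', 'MemoryRecord/mem-positioning (entityRefs)']
```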
Relationship
Generated artifact. This page is generated from schemas/Relationship.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.
A directed, weighted edge between two Entity resources. Strength is dynamic (Hebbian): it increases with co-activation and decays without use.
Maturity: OMM-3
Purpose
A Relationship is a typed graph edge linking two entities. Its strength carries Hebbian synaptic weight, and validAt / invalidatedAt give it a temporal lifecycle so superseded edges can be retained rather than deleted.
See Memory Semantics §4 for how strength rises with co-activation and decays with disuse.
Fields
| Field | Type | Card. | Description |
|---|---|---|---|
| resourceType | "Relationship" (const) | 1..1 | Discriminator. MUST be Relationship. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Relationship resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| from | Reference | 1..1 | Source Entity reference, e.g. { "ref": "Entity/john" }. |
| to | Reference | 1..1 | Target Entity reference. |
| relationType | string (open vocabulary) | 1..1 | Edge type. Common values below. Implementations MAY mint new lowercase snake_case values. |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| strength | UnitInterval ([0,1]) | 0..1 | Synaptic weight. Dynamic under Hebbian plasticity. |
| context | string | 0..1 | Free-text note describing the relationship. |
| createdAt | Instant (date-time) | 0..1 | When the edge was first recorded. |
| validAt | Instant (date-time) | 0..1 | When this relationship was last observed to hold (temporal tracking). |
| invalidatedAt | Instant (date-time) | 0..1 | When this relationship was invalidated, if ever (temporal edge invalidation). |
| sourceEpisode | Reference | 0..1 | Episode reference that produced this relationship. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |
additionalProperties is false: a conformant Relationship carries only the fields above.
Reference constraint. A Reference.ref matches ^(MemoryRecord|Entity|Relationship|Episode)/…. By convention from and to point to Entity resources and sourceEpisode points to an Episode; the schema’s Reference shape does not itself narrow the target type, so producers MUST honor these conventions.
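The target-type convention is easy to check on the producer side. Below is a minimal Python sketch with hypothetical names. It validates only the ResourceType prefix of each ref and leaves resolution of the id to the reference-integrity rule (CR-5).

```python
from typing import Any

# Target type each Relationship reference field is expected to carry, per the convention above.
EXPECTED_TARGET = {"from": "Entity", "to": "Entity", "sourceEpisode": "Episode"}
REF_TYPES = {"MemoryRecord", "Entity", "Relationship", "Episode"}


def relationship_ref_problems(rel: dict[str, Any]) -> list[str]:
    """Flag malformed refs and refs that point at the wrong resource type."""
    problems = []
    for field, expected in EXPECTED_TARGET.items():
        if field not in rel:
            continue  # sourceEpisode is optional; a missing from/to is a CR-3 matter
        raw = rel[field].get("ref", "")
        rtype, sep, rid = raw.partition("/")
        if not sep or rtype not in REF_TYPES or not rid:
            problems.append(f"{field}: malformed ref {raw!r}")
        elif rtype != expected:
            problems.append(f"{field}: points to {rtype}, expected {expected}")
    return problems


rel = {"resourceType": "Relationship", "id": "rel-x", "relationType": "related_to",
       "from": {"ref": "Entity/varun"}, "to": {"ref": "Episode/ep-launch-chat"}}
print(relationship_ref_problems(rel))  # ['to: points to Episode, expected Entity']
```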
relationType common values
works_with, works_at, employed_by, part_of, contains, owned_by, located_in, located_at, related_to. The vocabulary is open.
Minimal
The required set is resourceType, id, from, to, relationType.
{
"resourceType": "Relationship",
"id": "rel-varun-omir",
"from": { "ref": "Entity/varun" },
"to": { "ref": "Entity/omir" },
"relationType": "maintains"
}
Full
{
"resourceType": "Relationship",
"id": "rel-varun-omir",
"meta": {
"omirVersion": "R1",
"source": "veld/0.7.6",
"maturity": 3
},
"from": { "ref": "Entity/varun" },
"to": { "ref": "Entity/omir" },
"relationType": "maintains",
"strength": 0.88,
"context": "Convenes the OMIR working group.",
"createdAt": "2026-05-30T10:05:00Z",
"validAt": "2026-05-30T11:42:00Z",
"invalidatedAt": null,
"sourceEpisode": { "ref": "Episode/ep-launch-chat" },
"extension": [
{
"url": "https://veld.dev/omir/ext/hebbian",
"valueJson": { "coActivations": 17, "lastStrengthened": "2026-05-30T11:42:00Z" }
}
]
}
The null for invalidatedAt above is illustrative of an edge still in force; producers SHOULD simply omit the field when an edge has not been invalidated.
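Because superseded edges are retained rather than deleted, a consumer can ask which edges were in force at a given instant. This Python sketch reads "in force at t" as "not invalidated at or before t". It treats an omitted field and a null the same way and does not consult validAt. Those readings and the function names are assumptions, and the sample edges are trimmed to the fields used.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Optional


def _instant(value: Optional[str]) -> Optional[datetime]:
    # None covers both an omitted field and the illustrative JSON null above.
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def edges_in_force(rels: Iterable[dict[str, Any]], at: datetime) -> list[str]:
    """Ids of relationships not yet invalidated at instant `at`."""
    live = []
    for rel in rels:
        invalidated = _instant(rel.get("invalidatedAt"))
        if invalidated is None or at < invalidated:
            live.append(rel["id"])
    return live


rels = [
    {"id": "rel-varun-omir", "invalidatedAt": None},
    {"id": "rel-old-employer", "invalidatedAt": "2026-03-01T00:00:00Z"},
]
print(edges_in_force(rels, datetime(2026, 5, 30, tzinfo=timezone.utc)))  # ['rel-varun-omir']
print(edges_in_force(rels, datetime(2026, 2, 1, tzinfo=timezone.utc)))   # ['rel-varun-omir', 'rel-old-employer']
```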
References
A Relationship points to other resources by typed Reference (ResourceType/id):
- from → Entity — the source endpoint of the edge.
- to → Entity — the target endpoint of the edge.
- sourceEpisode → Episode — the episode that produced this relationship.
No Relationship field references MemoryRecord or Bundle.
Episode
Generated artifact. This page is generated from schemas/Episode.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.
A bounded experience — the raw event from which MemoryRecord, Entity, and Relationship resources are derived. The episodic backbone of the graph.
Maturity: OMM-3
Purpose
An Episode is the raw, time-bounded input — a message, document, event, or observation — that downstream resources are extracted from. It preserves both event time and ingestion time, and lists the entities it produced, so derived records remain traceable to their origin.
See Memory Semantics §6 for the event-time vs. ingestion-time model.
Fields
| Field | Type | Card. | Description |
|---|---|---|---|
| resourceType | "Episode" (const) | 1..1 | Discriminator. MUST be Episode. |
| id | Id (string) | 1..1 | Resource-local identifier, unique among Episode resources within a Bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| content | string | 1..1 | The actual experience data. |
| createdAt | Instant (date-time) | 1..1 | When this episode was ingested (ingestion time). |
| meta | Meta | 0..1 | Metadata envelope (omirVersion, profile[], source, createdAt, lastUpdated, maturity). |
| name | string | 0..1 | Human-readable title for the episode. |
| source | enum message \| document \| event \| observation | 0..1 | What kind of input produced this episode. |
| eventTime | Instant (date-time) | 0..1 | When the original event occurred (event time). |
| entityRefs | array of Reference | 0..* | Entities extracted from this episode. |
| metadata | object (string → string) | 0..1 | Free-form string key/value metadata. |
| extension | array of Extension | 0..* | Typed, namespaced extensions carrying implementation-specific data without breaking core conformance. |
additionalProperties is false: a conformant Episode carries only the fields above.
Note on source. On Episode this is an enum (message | document | event | observation). It is distinct from the free-text source strings in meta.source, which Episode also carries, and in Provenance on MemoryRecord.
Minimal
The required set is resourceType, id, content, createdAt.
{
"resourceType": "Episode",
"id": "ep-launch-chat",
"content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
"createdAt": "2026-05-30T11:42:03Z"
}
Full
{
"resourceType": "Episode",
"id": "ep-launch-chat",
"meta": {
"omirVersion": "R1",
"source": "veld/0.7.6",
"maturity": 3
},
"name": "Standardization discussion",
"content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
"source": "message",
"eventTime": "2026-05-30T11:42:00Z",
"createdAt": "2026-05-30T11:42:03Z",
"entityRefs": [
{ "ref": "Entity/varun" },
{ "ref": "Entity/omir" }
],
"metadata": { "channel": "design-review", "thread": "OMIR-positioning" },
"extension": [
{
"url": "https://veld.dev/omir/ext/wavelet-session",
"valueJson": { "sessionId": "sess-2026-05-30", "segment": 3 }
}
]
}
References
An Episode points to other resources by typed Reference (ResourceType/id):
- entityRefs[] → Entity — the entities extracted from this episode.
Episode is itself referenced by Relationship.sourceEpisode. No Episode field references MemoryRecord, Relationship, or Bundle.
Bundle
Generated artifact. This page is generated from schemas/Bundle.schema.json and schemas/common.schema.json. Do not hand-edit; regenerate from the schema and the field tables will stay authoritative.
A serialized collection of OMIR resources — the on-disk .omir document itself.
Maturity: n/a — Bundle is the container/transport envelope, not a graded memory resource.
Purpose
The Bundle is the container resource: a .omir file is a Bundle. It carries a flat entry[] of core resources whose typed references (ResourceType/id) resolve within the bundle, and it is JSON-LD compatible — an optional @context enables linked-data processing without changing the core shape.
Fields
| Field | Type | Card. | Description |
|---|---|---|---|
| resourceType | "Bundle" (const) | 1..1 | Discriminator. MUST be Bundle. |
| omirVersion | "R1" (const) | 1..1 | Spec release the bundle conforms to. MUST be R1. |
| entry | array of resource (oneOf MemoryRecord \| Entity \| Relationship \| Episode) | 1..* | The resources carried by this bundle. Order is not significant; references resolve by ResourceType/id. |
| @context | string or object (JSON-LD context) | 0..1 | Optional JSON-LD context. SHOULD be https://omir.io/spec/R1/context.jsonld or an object that includes it. |
| id | Id (string) | 0..1 | Resource-local identifier for the bundle. Pattern ^[A-Za-z0-9._:-]{1,128}$. |
| generatedAt | Instant (date-time) | 0..1 | When the bundle was serialized. |
| source | string | 0..1 | Producing implementation, e.g. "veld/0.7.6 (MIF adapter)". |
additionalProperties is false: a conformant Bundle carries only the fields above. The schema marks required: ["resourceType", "omirVersion", "entry"], and entry has cardinality 1..*, so a valid bundle contains at least one resource.
File profiles. The canonical encoding is JSON / JSON-LD with extension .omir. The compact binary profile (CBOR/bincode) uses extension .omirb and carries the identical resource model.
Minimal
The required set is resourceType, omirVersion, entry.
{
"resourceType": "Bundle",
"omirVersion": "R1",
"entry": [
{
"resourceType": "MemoryRecord",
"id": "mem-001",
"content": "OMIR is the at-rest format MCP transports.",
"createdAt": "2026-05-30T11:42:05Z"
}
]
}
Full
{
"@context": "https://omir.io/spec/R1/context.jsonld",
"resourceType": "Bundle",
"omirVersion": "R1",
"id": "demo-bundle-001",
"generatedAt": "2026-05-30T12:00:00Z",
"source": "veld/0.7.6 (MIF adapter)",
"entry": [
{
"resourceType": "Episode",
"id": "ep-launch-chat",
"content": "Varun and the agent agreed OMIR should be positioned as the at-rest format MCP transports, not a competitor to it.",
"source": "message",
"eventTime": "2026-05-30T11:42:00Z",
"createdAt": "2026-05-30T11:42:03Z",
"entityRefs": [
{ "ref": "Entity/varun" },
{ "ref": "Entity/omir" }
]
},
{
"resourceType": "Entity",
"id": "varun",
"name": "Varun",
"labels": ["person"],
"salience": 0.91,
"properNoun": true
},
{
"resourceType": "Entity",
"id": "omir",
"name": "OMIR",
"labels": ["project", "concept"],
"salience": 0.97,
"properNoun": true
},
{
"resourceType": "Relationship",
"id": "rel-varun-omir",
"from": { "ref": "Entity/varun" },
"to": { "ref": "Entity/omir" },
"relationType": "maintains",
"strength": 0.88,
"sourceEpisode": { "ref": "Episode/ep-launch-chat" }
},
{
"resourceType": "MemoryRecord",
"id": "mem-positioning",
"content": "Position OMIR as the at-rest data format that MCP/A2A transport, not a competing protocol.",
"kind": "learning",
"experienceType": "decision",
"tier": "longterm",
"createdAt": "2026-05-30T11:42:05Z",
"importance": 0.95,
"confidence": { "alpha": 9.0, "beta": 1.0, "calibrated": 0.9 },
"entityRefs": [
{ "ref": "Entity/omir" },
{ "ref": "Entity/varun" }
]
}
]
}
References
The Bundle does not hold typed Reference fields of its own; it is the resolution scope for everyone else’s. Within a single bundle:
- entry[] may contain MemoryRecord, Entity, Relationship, and Episode resources.
- Cross-resource references (MemoryRecord.entityRefs, Episode.entityRefs, Relationship.from/to/sourceEpisode) resolve against the entry[] members by their ResourceType/id.
A Bundle MUST NOT appear inside another Bundle’s entry[] — the oneOf admits only the four core resources (MemoryRecord, Entity, Relationship, Episode).
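The closed-world resolution rule can be sketched as an index of ResourceType/id keys built from entry[], against which every reference is looked up. This is a hypothetical illustration: it handles only the reference fields named above (the full schemas may define others) and assumes each reference is an object with a ref string, as in the Full example.

```python
CORE_TYPES = {"MemoryRecord", "Entity", "Relationship", "Episode"}

def refs_of(resource: dict):
    """Yield the ResourceType/id strings from the R1 reference fields listed above."""
    rtype = resource.get("resourceType")
    if rtype in ("MemoryRecord", "Episode"):
        for r in resource.get("entityRefs", []):
            yield r["ref"]
    elif rtype == "Relationship":
        for field in ("from", "to", "sourceEpisode"):
            if field in resource:
                yield resource[field]["ref"]

def reference_errors(bundle: dict) -> list:
    errors, index = [], set()
    for res in bundle["entry"]:
        rtype = res.get("resourceType")
        if rtype in CORE_TYPES:
            index.add(f"{rtype}/{res.get('id')}")
        else:
            # The entry oneOf admits only the four core resources, so a nested Bundle fails here.
            errors.append(f"entry may not contain {rtype}")
    for res in bundle["entry"]:
        if res.get("resourceType") in CORE_TYPES:
            for ref in refs_of(res):
                if ref not in index:  # closed world: every reference must resolve in this bundle
                    errors.append(f"{res['resourceType']}/{res.get('id')}: unresolved {ref}")
    return errors
```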
Toward a Global Standard
Non-normative. This page is a design discussion, not part of the R1 conformance surface. Nothing here changes what makes a Bundle valid (Conformance). Every change proposed below would enter through the RFC + ballot process and start low on the OMIR Maturity Model — at OMM-0/1 — and earn its level through independent implementation.
OMIR R1 is honest about its origin: its schemas were derived from one production memory engine, Veld. That is a strength — the core is grounded in a system that actually ships calibrated confidence, decay, Hebbian edges, and tiering — and a risk. A standard authored by a single implementation is, until proven otherwise, that implementation’s export format with a logo. The work of becoming a global standard is the work of shedding the assumptions that are true only for Veld while keeping the cognitive substance that makes OMIR worth adopting.
This page names those assumptions and proposes the generalizations that would let a
robotics stack, a healthcare agent, a multi-tenant assistant platform, and a coding agent
all read and write the same .omir files without loss. It is the OMIR equivalent of
FHIR’s long migration from “HL7’s resources” to “everyone’s resources.”
What already generalizes (keep it)
Before the critique, the parts of R1 that are not Veld-specific and should be preserved:
- Resource + typed-reference model. Everything is a Resource; links are ResourceType/id. This is FHIR-proven and domain-neutral.
- 80/20 core + typed extension[]. The escape hatch is exactly what lets the long tail of proprietary data travel without bloating the core. See Extensions.
- Honest maturity (OMM). A promise-about-change per resource type. Domain-neutral.
- Calibrated confidence as a distribution, not a bare float (Memory Semantics).
- Portable forgetting state — decay, anchoring, tiers — recorded as state, not as a mandated algorithm.
- Bitemporal-ish timestamps (eventTime vs createdAt) and temporal invalidation.
- Two lossless encodings and the Profiles mechanism for domain tightening.
The generalizations below are mostly additive — new optional structure and extension points — precisely so they do not break the parts that already work.
Generalization themes
Each theme states what is Veld-specific today, why it limits interchange, and a concrete proposal. The FHIR precedent is cited where one exists, because OMIR is FHIR-modeled and should borrow FHIR’s solved problems rather than reinvent them.
A. Open vocabularies, not closed enums (highest interchange leverage)
Today. MemoryRecord.kind, experienceType, Entity.labels, and Episode.source
are closed enums, and they lean toward coding/dev-agent life: code_edit,
file_access, command, prompt. A robot cannot say “object grasped”; a healthcare
agent cannot say “symptom reported”; a research agent cannot say “hypothesis formed.”
relationType is already an open string — good — but it has no way to say which
vocabulary a term comes from.
Risk. Closed enums hard-code one domain’s worldview into the core. Every other domain
is forced into other + an extension, which means their primary semantics are invisible
to a generic consumer — defeating interchange for everyone but Veld-likes.
Proposal. Adopt a FHIR-style CodeableConcept for these fields: a structure
carrying an optional system (vocabulary URI), a code, and human text, e.g.
"experienceType": { "system": "https://omir.io/vocab/robotics", "code": "object_grasped", "text": "Picked up the red block" }
R1’s bare-string/enum forms remain valid as the degenerate case (text only). OMIR ships
core vocabularies (the current enums, promoted to published code systems) and lets
domains register their own. This single change is the difference between “a memory format
for coding agents” and “a memory format.”
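A consumer that accepts both forms can read either into one structure. The sketch below is hypothetical (read_concept is not part of any published API) and follows the degenerate-case rule just stated: a bare R1 string becomes a concept carrying text only.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class CodeableConcept:
    system: Optional[str] = None  # vocabulary URI
    code: Optional[str] = None
    text: Optional[str] = None    # human-readable text

def read_concept(value: Union[str, dict]) -> CodeableConcept:
    """Read either the R1 bare string or the proposed object form."""
    if isinstance(value, str):
        return CodeableConcept(text=value)  # degenerate case: text only
    if isinstance(value, dict):
        return CodeableConcept(value.get("system"), value.get("code"), value.get("text"))
    raise TypeError(f"expected string or object, got {type(value).__name__}")
```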
B. Identity, ownership, and multi-agent memory
Today. Memory is implicitly single-agent. There is no first-class notion of whose
memory a record is, who authored it, or how two agents share a memory store. Veld carries
optional agent_id/actor_id tags, but they are tags, not a model.
Risk. The 2026 reality is fleets of agents and multi-tenant platforms. Without an ownership/authorship model, a Bundle exchanged between agents cannot answer “can I trust this? who wrote it? am I allowed to read it?” — the questions that matter most when memory crosses a trust boundary.
Proposal. Introduce an Agent (or Principal) resource and optional
authoredBy / owner references on records and episodes; let Provenance carry the
chain of agents a memory passed through (see Theme E). Access control itself stays a
profile/extension concern (it is policy, not data shape), but the identity hooks the
policy needs belong in the core. Grade Agent at OMM-0 and let multi-agent stacks prove
it.
C. Entity resolution across systems
Today. Entity.id is bundle-local; there is no canonical external identifier, no
alias set, and no “this entity is the same as that one.” Within Veld a UUID suffices
because there is one graph.
Risk. Interchange is entity resolution. If Acme’s “Alice Smith (employee 123)” and Globex’s “@asmith” cannot be declared the same person, two memory stores can never merge — the whole point of a portable format collapses to per-pair adapters.
Proposal. Add FHIR-style Entity.identifier[] ({ system, value } pairs, e.g.
{ "system": "mailto", "value": "alice@acme.com" }), an aliases[] list of surface
forms, and a sameAs link type so a Bundle can assert cross-system identity. This is the
backbone of federation (Theme I).
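Merging two stores then reduces to grouping entities that share an identifier[] pair or are linked by sameAs. A union-find pass is one way to do that. In this sketch the input shapes (a dict of entity id to identifier list, and sameAs as id pairs) are assumptions, and both ids of every sameAs pair must be present in the entity dict.

```python
def resolve_entities(entities: dict, same_as=()) -> list:
    """Group entity ids that denote the same thing; returns sorted groups of sorted ids."""
    parent = {eid: eid for eid in entities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_holder = {}  # (system, value) -> first entity seen with that identifier
    for eid, identifiers in entities.items():
        for ident in identifiers:
            key = (ident["system"], ident["value"])
            if key in first_holder:
                union(first_holder[key], eid)
            else:
                first_holder[key] = eid
    for a, b in same_as:  # explicit cross-system identity assertions
        union(a, b)

    groups = {}
    for eid in entities:
        groups.setdefault(find(eid), []).append(eid)
    return sorted(sorted(g) for g in groups.values())
```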
D. Beyond text: modality and embeddings
Today. content is a single string. Multimodal data and embeddings are
extension-only. Veld carries image/audio/video vectors internally, but the at-rest core
sees text.
Risk. Agent memory in 2026 is increasingly multimodal (vision, audio, sensor). A text-only core relegates non-text memory to opaque extensions that no generic consumer can interpret.
Proposal. Generalize content to an optional multi-part content model —
contentType (a media type) plus either inline content or a MediaReference
({ uri, contentType, hash }) for out-of-line bytes — and define an algorithm-neutral,
efficiency-bearing Embedding that any vendor can emit without mandating a model. Embeddings
stay optional (derived, not source) but become interpretable rather than vendor-opaque. The
block below is the single canonical definition of the embedding object: it folds in the
efficiency-bearing fields (Matryoshka prefixes, sparse codes, a dual-trace role, and reserved
drift / VSA spaces). The Efficiency & Information-Bearing Codes page (Theme D
reframed from “neutral embeddings” to efficiency-bearing codes) supplies the rationale —
watts / inference / canon — for each efficiency field but does not redefine it.
"Embedding": {
"type": "object",
"description": "Algorithm-neutral, optional embedding code (derived, not source). Carries a dense, sparse, Matryoshka-nested, or out-of-line code; comparable only within one 'space'. Efficiency rationale: efficiency.md.",
"properties": {
"space": { "type": "string", "description": "Opaque equality key — codes are comparable IFF 'space' strings are byte-equal. Reserved WG spaces: 'omir:temporal-context' (a slow-drift contiguity vector, efficiency.md EP-5a), 'omir:vsa' (a bind/bundle structural hypervector, efficiency.md EP-F)." },
"model": { "type": "string", "description": "Producer model id that generated the code." },
"dims": { "type": "integer", "minimum": 1, "description": "Full dimensionality; MUST equal a dense 'vector' length and the sparse index space width." },
"dtype": { "type": "string", "description": "Element byte form, e.g. 'f32' | 'int8' | 'binary'. Pins the .omirb encoding; excluded from signed images by default (Phase 3)." },
"vector": { "type": "array", "items": { "type": "number" }, "description": "Dense code. Mutually exclusive with 'sparse'." },
"sparse": {
"type": "object",
"description": "Sparse code for CPU inverted-index search + one-step associative completion (efficiency.md EP-3). Mutually exclusive with 'vector'.",
"properties": {
"indices": { "type": "array", "items": { "type": "integer", "minimum": 0 }, "description": "Active dimension indices, strictly ascending." },
"values": { "type": "array", "items": { "type": "number" }, "description": "Weights parallel to 'indices' (equal length)." }
},
"required": ["indices", "values"],
"additionalProperties": false
},
"ref": { "$ref": "#/$defs/MediaReference", "description": "Out-of-line code — an offloaded / verbatim payload (efficiency.md EP-2)." },
"matryoshka": { "type": "boolean", "description": "True if 'vector' is a nested (Matryoshka) code: any prefix whose length is in 'nestedDims' is itself a valid, rankable embedding — coarse-to-fine shortlisting without re-embedding (efficiency.md EP-2)." },
"nestedDims": { "type": "array", "items": { "type": "integer", "minimum": 1 }, "description": "Ascending valid prefix lengths, e.g. [64,128,256,512,768]. Rank on any listed prefix, re-rank on a longer one; prefixes compare only within one 'space' (efficiency.md EP-2)." },
"role": { "enum": ["gist", "verbatim"], "description": "Dual-trace role (efficiency.md EP-2): 'gist' = durable, compact, anchorable, ranked cheaply; 'verbatim' = fast-decaying surface trace, usually offloaded via 'ref' and fetched only for top-k." }
},
"additionalProperties": false
}
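Several constraints in the Embedding definition live only in its descriptions (vector/sparse exclusivity, dims agreement, strictly ascending indices, comparability within one space). The sketch below checks them and ranks on a Matryoshka prefix. The function names are hypothetical, and cosine similarity is an assumed metric: the schema does not name one.

```python
import math

def embedding_errors(e: dict) -> list:
    """Structural checks implied by the Embedding field descriptions."""
    errs = []
    if "vector" in e and "sparse" in e:
        errs.append("vector and sparse are mutually exclusive")
    dims = e.get("dims")
    if "vector" in e and dims is not None and len(e["vector"]) != dims:
        errs.append("dims must equal len(vector)")
    if "sparse" in e:
        idx, vals = e["sparse"]["indices"], e["sparse"]["values"]
        if len(idx) != len(vals):
            errs.append("indices and values must have equal length")
        if any(b <= a for a, b in zip(idx, idx[1:])):
            errs.append("indices must be strictly ascending")
        if dims is not None and any(i >= dims for i in idx):
            errs.append("sparse index outside dims")
    nested = e.get("nestedDims", [])
    if any(b <= a for a, b in zip(nested, nested[1:])):
        errs.append("nestedDims must be ascending")
    return errs

def prefix_similarity(a: dict, b: dict, k: int) -> float:
    """Cosine similarity (assumed metric) on a shared Matryoshka prefix of length k."""
    if a["space"] != b["space"]:
        raise ValueError("codes are comparable only within one space")
    if k not in a.get("nestedDims", []) or k not in b.get("nestedDims", []):
        raise ValueError("k must be a listed prefix length in both codes")
    x, y = a["vector"][:k], b["vector"][:k]
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm  # raises ZeroDivisionError for an all-zero prefix
```

A shortlist can be ranked on the short prefix and re-ranked on a longer one, without re-embedding.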
E. Provenance and trust as a chain
Today. Provenance is a flat { source, sourceType, credibility, externalId }. Veld’s
internal model is richer (a relay chain with per-hop credibility and a verified flag);
the core flattens it.
Risk. When memory crosses vendors, “where did this come from, through whom, and is it signed?” is a trust-critical question a flat source string cannot answer.
Proposal. Align with W3C PROV: let Provenance carry a chain[] of derivation
steps (agent, activity, time, per-hop credibility) and an optional attestation/signature
so a consumer can verify a record was not tampered with in transit. Keep the flat form as
the common case.
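One way to make a chain[] tamper-evident is to hash-link its hops, so altering any earlier hop changes every later digest. This sketch is only an illustration under assumptions: the digest field and the sorted-key JSON encoding are hypothetical, and a hash chain alone is not authenticated (anyone can recompute it). The proposal's attestation/signature would additionally sign the final digest.

```python
import hashlib
import json

def _link(prev: str, step: dict) -> str:
    # Hash the previous digest together with a deterministic encoding of this hop.
    payload = json.dumps(step, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev.encode("utf-8") + payload).hexdigest()

def seal_chain(steps: list) -> list:
    digest, sealed = "", []
    for step in steps:
        digest = _link(digest, step)
        sealed.append({**step, "digest": digest})
    return sealed

def verify_chain(sealed: list) -> bool:
    digest = ""
    for hop in sealed:
        body = {k: v for k, v in hop.items() if k != "digest"}
        digest = _link(digest, body)
        if digest != hop.get("digest"):
            return False
    return True
```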
F. Privacy, sensitivity, and retention
Today. Absent. R1 has no notion of data sensitivity, consent, legal basis,
retention/expiry policy, or redaction. validUntil is for contradiction, not governance.
The omir-personal-assistant profile gestures at a PII-class extension, but it is opt-in
and shallow.
Risk. A global memory format will carry personal and regulated data. Without a governance vocabulary, OMIR cannot be adopted where GDPR/CCPA/HIPAA-style obligations apply — which is most of the interesting market.
Proposal. Define an optional governance block (sensitivity classification, legal
basis, retention policy / deleteAfter, redaction markers, consent reference). Likely a
core extension family with published URLs first, promoted to core once proven. This is
the theme most likely to decide whether OMIR is adoptable by enterprises at all.
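A retention policy such as deleteAfter is simple to honor mechanically once it is a field rather than prose. The sketch assumes a hypothetical governance object on each record holding an RFC 3339 deleteAfter instant; neither name is defined in R1.

```python
from datetime import datetime, timezone

def _parse_instant(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def is_expired(record: dict, now: datetime) -> bool:
    delete_after = (record.get("governance") or {}).get("deleteAfter")
    return delete_after is not None and _parse_instant(delete_after) <= now

def apply_retention(records: list, now: datetime) -> list:
    """Drop records whose hypothetical governance.deleteAfter has passed."""
    return [r for r in records if not is_expired(r, now)]
```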
G. Uncertainty beyond a point estimate
Today. Confidence is Beta(α, β) + a calibrated point — already better than most. But
it assumes one uncertainty model and conflates evidence count with belief.
Proposal. Keep Beta as the recommended default; allow a more general uncertainty representation (credible interval, or a named distribution + parameters) and optionally distinguish epistemic (lack of evidence) from aleatoric (inherent variability) uncertainty for agents that reason about it.
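The standard Beta moments show what a single point estimate hides. Beta(9, 1), the confidence in the Full example, and Beta(90, 10) share a mean of 0.9. However, α + β (a pseudo-count of evidence) differs tenfold, and the variance falls from roughly 0.0082 to roughly 0.00089. beta_summary is a hypothetical helper.

```python
def beta_summary(alpha: float, beta: float) -> tuple:
    """Mean, variance, and pseudo-count (alpha + beta) of a Beta(alpha, beta) belief."""
    n = alpha + beta
    mean = alpha / n
    variance = alpha * beta / (n * n * (n + 1))
    return mean, variance, n
```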
H. A complete temporal model
Today. eventTime vs createdAt plus validUntil / validAt / invalidatedAt is a
partial bitemporal model, and time is always a precise RFC 3339 instant.
Proposal. Make bitemporality explicit and uniform (a valid-time interval and a transaction-time interval), and support imprecise time (“sometime in 2024”, “before the incident”) via interval/precision-qualified instants. Memory is frequently vague about when; the format should be able to say so.
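Reduced-precision dates (as FHIR's date type allows: year, year-month, or full date) map naturally onto half-open intervals, which can then be compared for possible overlap. This is a sketch under the assumption that imprecise times are written in that reduced-precision form. Qualifiers like "before the incident" would need a separate relative-time construct.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

def precision_interval(value: str) -> tuple:
    """Half-open UTC interval [start, end) covered by 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (y,) = parts
        return datetime(y, 1, 1, tzinfo=UTC), datetime(y + 1, 1, 1, tzinfo=UTC)
    if len(parts) == 2:
        y, m = parts
        end = datetime(y + 1, 1, 1, tzinfo=UTC) if m == 12 else datetime(y, m + 1, 1, tzinfo=UTC)
        return datetime(y, m, 1, tzinfo=UTC), end
    if len(parts) == 3:
        start = datetime(parts[0], parts[1], parts[2], tzinfo=UTC)
        return start, start + timedelta(days=1)
    raise ValueError(f"unsupported precision: {value!r}")

def may_overlap(a: tuple, b: tuple) -> bool:
    # Two half-open intervals intersect iff each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]
```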
I. Federation and cross-Bundle references
Today. R1 is deliberately closed-world: every reference must resolve inside the Bundle. That is the right call for R1 (it makes conformance decidable), but it blocks linking a small export to a large shared graph.
Proposal. Define an optional, higher conformance level for resolvable external references (a reference may target a resource in another, addressable Bundle/store), with clear rules so the closed-world guarantee remains the default and the validator can still decide conformance. Federation is how a memory standard scales past one file.
J. Structured knowledge and typed attributes
Today. Entity.attributes is string → string; Episode.metadata likewise. All
structure degrades to stringly-typed key/values.
Proposal. Allow typed attribute values (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference) so a global graph can carry real structured knowledge — a measurement with units, a date, a link — without serializing everything to strings.
K. Language and internationalization
Today. Text is implicitly English; there are no language tags, and the reference pipeline’s tokenization/NER assume English.
Proposal. Add optional language tags (BCP 47) on textual content and entity names,
and allow multiple language variants of a name/summary. A global standard cannot assume
one language.
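With several tagged variants of a name or summary, a consumer needs a deterministic way to pick one. The sketch below is a simplified form of the BCP 47 "lookup" scheme (RFC 4647): it tries each preferred tag and then progressively drops trailing subtags. The variants shape (a dict of tag to text) is a hypothetical encoding, and the RFC's extra rule about single-character subtags is omitted.

```python
from typing import Optional

def pick_variant(variants: dict, preferred: list) -> Optional[str]:
    """Simplified RFC 4647 lookup over a hypothetical {tag: text} map."""
    by_tag = {tag.lower(): text for tag, text in variants.items()}  # tags compare case-insensitively
    for tag in preferred:
        subtags = tag.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_tag:
                return by_tag[candidate]
            subtags.pop()  # en-GB -> en
    return None
```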
A prioritized path
Not all of these are equal. For interchange specifically — the existential goal — the order is clear:
| Priority | Theme | Why first | Likely entry |
|---|---|---|---|
| 1 | A. Open vocabularies | Unblocks every non-coding domain at once; small, additive change. | R1.x · OMM-1 |
| 2 | C. Entity resolution | Interchange is identity; without it, stores can’t merge. | R2 · OMM-1 |
| 3 | F. Privacy / retention | Gate to enterprise & regulated adoption. | R2 · OMM-0 |
| 4 | B. Identity / multi-agent | Matches the 2026 fleet reality; trust across boundaries. | R2 · OMM-0 |
| 5 | E. Provenance chain + attestation | Trust when memory crosses vendors. | R2 · OMM-1 |
| 6 | D. Modality + neutral embeddings | Multimodal memory becomes interpretable, not opaque. | R2 · OMM-0 |
| — | H, G, J, K, I | Valuable, lower interchange-urgency; fold in as domains demand. | R2+ |
The first item is the highest-leverage and lowest-cost: turning four closed enums into CodeableConcepts, with the current values published as the seed vocabularies, would by itself move OMIR from “a coding-agent memory format” to “a memory format that coding agents use,” at OMM-1, without breaking a single R1 document.
The adoption plan
The prioritization above is a ranking; this is the plan. The Working Group’s stated
intent is to adopt all of the proposals, in a deliberate sequence set by dependency and
leverage — vocabularies (A) first, identity & resolution (C) next — with W3C PROV (E)
as the trust backbone the rest hangs from, and the refinements that fix G, J, and K folded
in once the load-bearing pieces land. Every phase enters at low OMM and is gated by independent
implementation. “Additive and non-breaking” is a claim with teeth here, not a slogan: every
R1 resource schema sets additionalProperties: false (Conformance §Document rules, rule CR-6),
so a new core field is rejected by the very R1 schema we promise to keep honoring. We
therefore label each change honestly against GOVERNANCE §5.1/§5.2 and route it through the
correct release vehicle below.
A note on the §5.2 boundary, stated once and reused. Adding a new, unreferenced $defs
member to common.schema.json is purely additive under §5.2 (no existing bundle becomes
invalid). It is the widening of an existing field or def to reference it, or the widening of
the shared Reference pattern, that is non-additive. This is the principle that lets
CodeableConcept (Phase 1) and TypedValue (Phase 5) be added cleanly as defs while the union
widening (Phase 1) and the Agent/Reference widening (Phase 2) are correctly breaking. Wherever
the plan calls a shared-schema change “breaking” or “additive,” it means this. A corollary the
plan applies throughout: any field whose value can only be expressed by widening the shared
Reference pattern (authoredBy, owner, an Agent-targeting sameAs) is NOT §5.1-additive and
is R2/RFC-gated, full stop — it cannot ride an R1.x increment.
Four preconditions before any phase ships
Each precondition is itself a §3.1 normative RFC (it amends GOVERNANCE/principles/CONTRIBUTING,
the common.schema.json shared defs, or process), so each needs a 14-day-minimum comment window
and a two-thirds TSC ballot (§3.3) before Phase 1 opens. Bundle P0+P0a into one
governance-reconciliation RFC (they are coupled) and ship P1 as a second RFC; P-1 (the
founding-TSC bootstrap) is procedurally first because no ballot is callable without a seated TSC.
P-1 — Seat a founding TSC (the true first domino). GOVERNANCE §2.3 targets a 3–7 seat TSC filled “by community nomination and confirmation by the sitting TSC” — but no TSC is seated, and §2.3 has no bootstrap clause for the first one, so the balloting body is self-perpetuating by design and cannot start. “Confirm it before P0” is not a mechanism. P-1 (a GOVERNANCE amendment, in scope) adds a one-time founding-TSC procedure that does not require a sitting TSC: a public call-for-nominations seating an initial 3-seat TSC with at least two non-Veld seats (so Veld holds at most one seat, exactly one-third, from day one and the §2.4 cap holds; a Veld-majority bootstrap of two or more seats would be unlawful), confirmed by lazy consensus over a fixed comment window rather than by a nonexistent sitting TSC. P-1 is therefore where external participation is first recruited, not an administrative formality — and it couples directly to P2: you cannot seat a neutral TSC without at least one non-Veld party, the same party the second-implementer gate needs.
P0 — Reconcile the OMM ladder and re-grade the published R1 table. The repo carries three inconsistent OMM-2 definitions: GOVERNANCE §4.1 OMM-2 = “Exercised in real bundles by at least one implementation” (GOVERNANCE.md:233); principles.md §4 OMM-2 = “implemented in more than one system” (principles.md:55); and CONTRIBUTING.md §8 collapses OMM-1 and OMM-2 into one band — “Trial use — implemented somewhere, but limited field experience” (CONTRIBUTING.md:372-378) — putting “multiple implementations” only at OMM-3. The reconciliation RFC MUST rewrite the ladder identically across all THREE documents (GOVERNANCE §4.1+§4.3, principles.md §4, CONTRIBUTING.md §8, splitting its merged “1–2” row), and update the §3.4 Draft/Trial-Use/Normative band table in the same RFC:
- OMM-1 = implemented in ≥1 system; field set volatile.
- OMM-2 = ≥2 independent implementations; shape settling, not yet broad field use.
- OMM-3 = ≥2 independent implementations plus real-world cross-validated usage/conformance evidence (the discriminator from OMM-2 is settling vs cross-validated field experience).
The reconciliation is monotonic in BOTH directions — it lowers any existing grade the new independence gate no longer supports. Today GOVERNANCE §4.3 grades MemoryRecord at OMM-4 and Entity/Relationship/Episode at OMM-3 with Veld as sole implementer (GOVERNANCE.md:255-258), repeated in principles.md §4 (62-64), CONTRIBUTING.md §8 (382-384), and the README grade table (README.md:48-51). Under the reconciled gate those grades are unsupportable: there is no second independent implementation. The same RFC MUST re-grade the R1 table — demoting MemoryRecord and Entity/Relationship/Episode to OMM-1 (“settled shape in the reference implementation; awaiting a second independent implementation”) across all four artifacts in one ballot. Leaving four overclaimed grades standing while reforming the OMM in the name of honesty is exactly the dishonesty §4.2 forbids and the loudest possible “the OMM is marketing” signal to a prospective second implementer. Until P0 lands, no phase may claim OMM-2, and every OMM target below is read against the reconciled, re-graded ladder. P0 also assigns stable rule anchors CR-1..CR-8 to Conformance §“Document rules (MUST)” (today a positional 1–8 list with no stable IDs), so every cross-reference in this plan and its RFCs survives renumbering.
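The reconciled OMM-1 to OMM-3 rungs reduce to two inputs: the count of independent implementations and whether cross-validated field evidence exists. The sketch below is a hypothetical helper that models only those rungs; it ignores field-set volatility and the higher levels. It shows why a sole implementer lands at OMM-1 regardless of field use.

```python
from typing import Optional

def omm_grade(independent_impls: int, cross_validated: bool) -> Optional[int]:
    """Grade against the reconciled OMM-1..OMM-3 rungs only (higher levels not modeled)."""
    if independent_impls >= 2:
        return 3 if cross_validated else 2
    if independent_impls == 1:
        return 1  # e.g. Veld as sole implementer, however much field use it has
    return None  # below OMM-1: not defined by the rungs listed above
```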
P0a — Define “independent” so it is recorded, neutral, and not gameable. An implementation counts toward the gate iff (a) it has a separate codebase/maintainer, developed without copying reference-implementation code under any license other than the Apache-2.0 grant — re-implementing from the CC-BY spec and using the validator as a conformance oracle explicitly COUNTS as independent (otherwise the licensing invariant that anyone may re-implement from the spec contradicts the gate); (b) it ships under a separate product/funding line with its own users; and (c) it passes the reference validator while PRODUCING the feature (not merely ignoring it). The gate is decoupled from TSC seats and employer: a committed second adopter who joins the TSC is exactly what we want and MUST NOT be disqualified for it. An anti-sock-puppet clause (common control, shared funding, or shared engineering team → counts as one) replaces any unenforceable string-match on “shared employer.” To keep the gate from being decided by the party that benefits from declaring it met: the counts-as-one determination uses the SAME “organization/common control” definition as the §2.4 seat cap (one bright line governs both seat math and implementation counting), the TSC MUST publish written findings against the three criteria on the ballot record, and the determination is taken by a supermajority EXCLUDING any TSC member employed by or contracted to the reference implementer (extending the §2.3 recusal rule — the convening vendor has a material non-interoperability interest in the gate being declared met). The WG validator and any Veld-authored tool never count.
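The counting rule can be stated as a set of qualifying controlling organizations. In this sketch the Implementation record and its boolean criteria are hypothetical stand-ins for the TSC's written findings. Counting the reference implementation itself as one implementation is a modeling assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Implementation:
    name: str
    controlling_org: str           # same "organization/common control" line as the §2.4 seat cap
    separate_codebase: bool        # criterion (a)
    own_product_line: bool         # criterion (b): separate product/funding line, own users
    produces_feature: bool         # criterion (c): passes the validator while producing it
    wg_or_veld_tool: bool = False  # the WG validator and Veld-authored tools never count

def independent_count(impls) -> int:
    qualifying_orgs = {
        i.controlling_org
        for i in impls
        if not i.wg_or_veld_tool
        and i.separate_codebase and i.own_product_line and i.produces_feature
    }
    return len(qualifying_orgs)  # common control collapses several implementations into one
```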
P1 — A versioning contract for the R1→R2 line, with a named validator-rewrite scope. The R1
version pin is enforced at four points today; the R2 schema set and the validator MUST handle
each: (i) Bundle.omirVersion const:"R1" is neutralized to a plain string in the validator
(schemas.rs:103) — the literal-R1 envelope gate actually lives in Rust at version_presence
(lib.rs:390-402 → E300), so making it version-relative is a code change, not a schema const
edit; (ii) Meta.omirVersion const:"R1" (common.schema.json:39 → E301); (iii) Meta is
additionalProperties:false (common.schema.json:58); (iv) every resource schema AND the
Bundle envelope set additionalProperties:false (CR-6; Bundle.schema.json:30). Specify:
- R2 Bundles and resources carry omirVersion:"R2"; the R2 common.schema.json (under the R2 namespace) sets the Meta.omirVersion site to const:"R2" (or an enum of accepted majors), leaving R1 schemas frozen. E300/E301 become version-relative, not literal-R1.
- A consumer reads the major version and applies that version’s schema set, and MAY reject a newer major it does not implement. Forward-read of unknown core fields is constrained by closed schemas and is stated honestly: because R2.0 resource schemas are additionalProperties:false, an already-shipped R2.0 validator does NOT silently forward-read a new core field introduced by a later R2.x minor; it would hard-reject it (E120), exactly as “R1 readers do not tolerate R2.” Intra-major additivity therefore rides the extension[] lane (always tolerated by ignore-unknown-extensions), not undeclared new top-level properties. Any genuinely new optional core field that we want same-major consumers to skip is shipped only when a minor schema bump relaxes the relevant object to a controlled, pattern-gated extension lane; absent that, it is a new-release item. The earlier blanket “SHOULD ignore unknown newer-minor fields” claim is dropped as unimplementable against closed schemas; this corrects the Overlook table’s “R1.x-additive ⇒ forward-readable” assumption.
- Forward-read across majors is a property of R2+ tools about older majors, never of already-shipped R1 tools about newer ones. A published R1 validator WILL reject any R2 bundle at the envelope (E300) by design. The honest compatibility guarantee is exactly “R1 bundles stay valid forever” + “R2 readers accept R1,” not “R1 readers tolerate R2.”
- The reference validator becomes version-aware as a named, scoped rewrite, not a config switch: (a) parameterize SchemaFiles, COMMON_ID, and RESOURCE_TYPES (schemas.rs) by declared major version (an R1 set and an R2 set, each with its own common.schema.json $id); (b) build a version-keyed Registry and dispatch in Schemas::build; (c) the validator reads the declared Bundle-level omirVersion at PARSE time, before any structural / reference / version check, and selects the version-keyed registry; E110/E120/E200/E300/E301 then all run against the selected set. (Selecting “before E300” is insufficient: today structural E110/E120 run first (lib.rs:31-132) and E300 last (lib.rs:142), so an R2 bundle would be double-reported as wrong-version AND malformed-against-R1.) An unknown newer major short-circuits to a single E3xx at the envelope and runs NO resource-schema checks; (d) the embedded-schema include_str! mechanism carries both releases’ schemas. DoD: P1 is done only when the same binary validates a published R1 example AND an R2 example, and reports an R2-declared bundle as a single envelope-level E300 under the R1 set.
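The dispatch order in step (c) is the crux: the version is read first, and only the selected set's checks run. The Python sketch below models that control flow. The reference validator is Rust, and the registry of callables here is a hypothetical stand-in for version-keyed schema sets. A missing version is folded into the same envelope finding.

```python
from typing import Callable, Dict, List, Tuple

Finding = Tuple[str, str]
Check = Callable[[dict], List[Finding]]

def validate(bundle: dict, registries: Dict[str, List[Check]]) -> List[Finding]:
    declared = bundle.get("omirVersion")  # read at parse time, before any other check
    checks = registries.get(declared)
    if checks is None:
        # Unknown or missing major: one envelope-level finding, no resource-schema checks.
        return [("E300", f"unsupported omirVersion: {declared!r}")]
    findings: List[Finding] = []
    for check in checks:  # structural, reference and version checks of the selected set
        findings.extend(check(bundle))
    return findings
```

Run with an R1-only registry, an R2-declared bundle yields exactly one E300 and never reaches the R1 structural checks, which is the DoD condition above.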
The phases
Phase 1 — Vocabulary (A). Split into 1a (the cheap adoption surface) and 1b (closed-vocabulary enforcement); 1b does NOT gate 1a and is not on the second-implementer critical path.
Phase 1a, in two decoupled moves so namespace-opening costs nothing on day one. (1) Open the
namespace first, additively. Publish the seed code systems and the vocabulary registry under
https://omir.io/spec/R2/vocab/<field> as an R1.x-additive documentation + JSON-LD artifact
(no schema type change), so other domains can NAME their own concepts immediately with zero
bundle becoming R2. This is the real payoff of A-first under the directive — opening the
namespace, not paying a major-version envelope break before any consumer benefits. (2) Widen the
union when a coded value is demanded. Add a CodeableConcept def ({ system, code, text })
to common.schema.json (purely-additive per the §5.2 note) and widen kind, experienceType,
Entity.labels, Episode.source, and Relationship.relationType (the spec’s one already-open
vocabulary — CONTRIBUTING §3.10:137-140, GOVERNANCE §5.1:282 — and the single largest real-world
fragmentation vector; widening it is the least breaking because its bare branch is already
type:string). Widen each closed enum’s string branch to the EXISTING enum, not to type:string:
anyOf: [ <the R1 enum>, CodeableConcept ] (for relationType, anyOf: [ {type:string}, CodeableConcept ]). This keeps the legacy closed set hard-validated at Core (a bad kind
like "banana" still fails E120). Make the seed/object mapping STRUCTURAL, not prose: in the
R2 schema, constrain the CodeableConcept branch of each widened field so its system MUST NOT
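be that field’s own seed-system URL (not:{required:["system"], properties:{system:{const:<seedURL>}}}; without the required clause, a system-less object such as {text:s} would also be rejected, because properties passes vacuously when the key is absent), so a seed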
value is expressible only as the bare string and the union is genuinely bijective at the Core
schema layer — no Phase-1b dependency, no honor-system. Move (2) ships as the first R2 change;
move (1) does not. Adding or retiring a seed value is governed per the operation, below.
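The widened union, anyOf of the R1 enum and a CodeableConcept whose system is not the field's seed URL, can be mirrored in a few lines. In this sketch the seed URL fills the <field> placeholder as an assumption. The enum passed in is a hypothetical subset rather than the real R1 kind enum, and the object branch is assumed closed to system, code and text.

```python
from typing import Optional

def valid_widened(value, r1_enum: Optional[set], seed_url: str) -> bool:
    """anyOf: [<the R1 enum> (or any string when r1_enum is None), CodeableConcept minus seed system]."""
    if isinstance(value, str):
        return r1_enum is None or value in r1_enum  # legacy closed set stays hard-validated
    if isinstance(value, dict):
        if not set(value) <= {"system", "code", "text"}:
            return False  # assumes CodeableConcept is closed to these three keys
        return value.get("system") != seed_url  # seed values must use the bare string
    return False

KIND_SEED = "https://omir.io/spec/R2/vocab/kind"  # assumed instance of .../vocab/<field>
KIND_SUBSET = {"memory", "learning"}  # hypothetical subset of the R1 kind enum
```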
Seed-registry governance, split by direction (so additive growth stays cheap):
- ADDING a seed value to an existing seed system is §5.1-additive open-vocabulary growth (it matches GOVERNANCE §5.1’s existing “new open-vocabulary values” carve-out): an ordinary PR under Maintainer review, NOT a full RFC. Routing every new concept through a 14-day ballot would make A heavier than today’s open-vocab path — the opposite of cheap adoption.
- RETIRING, narrowing, or re-meaning a seed value is enum-narrowing: §3.1 RFC + ballot, and (see Phase 3) a canonicalization-version bump under §6 so historical signatures are verified under the frozen seed table of their own version, never retroactively flipped.
Normalization, stated precisely for each field and its cardinality:
- A bare seed string s and the seeded coding {system:<seed URL>, code:s} are semantically equivalent for matching/comparison; the bare-string seed form is the canonical at-rest form. R2 introduces a NEW consumer rule (not a restatement of the existing extensions-scoped SHOULD): an R2 consumer MUST NOT canonicalize between bare-string and object form for seed-coded fields on re-export (lossless pass-through). This is a consumer-side tightening listed in the R2 migration note; equality is a consumer obligation, not a structural check (the core validates union shape only, with the seed-system exclusion above making the shape itself bijective).
- For fields carrying a schema default (kind, tier): do NOT mutate the R1 default in place (MemoryRecord.kind:18 default:"memory" and tier:31 default:"working" live on an OMM-1 type and are read by default-applying codegen/normalizers). The R2 schemas keep default; the canonical at-rest form is omission when a field equals its default (not the explicit value). Phase 3’s canonicalizer normalizes a defaulted field to its omitted form before building the signed map M, so absent and explicit-default sign identically. Removing/relocating a default would change the value seen by default-applying consumers, so this is a documented R2 migration item under “behavior of absent fields” for both kind and tier (“an absent kind MUST be interpreted as memory; an absent tier as working”), not a silent edit.
- For scalar optional fields with no default (experienceType, Episode.source): absent stays absent and is distinct from {text:s}.
- For the array field Entity.labels: the legacy "other" + label-extension idiom is DEPRECATED under §6 in favor of a CodeableConcept item. Update context.jsonld labels/term mappings accordingly.
- A non-seed coded value MUST use the object form; a seed value MUST use the bare string. This is now enforced structurally (above), so Phase-3 attestation canonicalization collapses the union to one byte sequence.
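The defaulted-field rule means an explicit default and an absent field must produce the same signed bytes. The sketch below shows that property using the two MemoryRecord defaults quoted above. Sorted-key compact JSON is a stand-in for whatever canonical encoding Phase 3 specifies (it is not RFC 8785 JCS), and only top-level fields are normalized.

```python
import hashlib
import json

DEFAULTS = {"kind": "memory", "tier": "working"}  # MemoryRecord schema defaults cited above
_NO_DEFAULT = object()

def canonical_bytes(record: dict) -> bytes:
    # Omit any field whose value equals its schema default, then encode deterministically.
    kept = {k: v for k, v in record.items() if DEFAULTS.get(k, _NO_DEFAULT) != v}
    return json.dumps(kept, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def signing_digest(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()
```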
Companion item (atomic with the widening, NOT deferred to 1b): re-key every behavioral MUST that
references a widened enum value. profiles.md:88-90 makes prospective-memory filtering a
behavioral MUST keyed on the literal experienceType:"intention". The moment 1a admits the object
form, an unmodified profile MUST is silent on experienceType:{system:<seed>, code:"intention"} —
a safety-relevant recall gap. The normative re-key (“any experienceType whose canonical form
is the seeded intention coding MUST be filtered from ordinary recall”) lands in the SAME RFC as
the experienceType widening, with a 1a-DoD fixture proving an object-form intention is still
filtered. Only the machine-readable binding engine stays in 1b.
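The re-keyed rule amounts to matching the canonical seeded intention coding in either form, which is exactly what the 1a-DoD fixture must exercise. In this sketch the seed URL fills the <field> placeholder as an assumption, and ordinary_recall is a hypothetical name for the recall path the profile constrains.

```python
SEED_EXPERIENCE_TYPE = "https://omir.io/spec/R2/vocab/experienceType"  # assumed seed URL

def is_intention(record: dict) -> bool:
    et = record.get("experienceType")
    if et == "intention":  # bare seed string (canonical at-rest form)
        return True
    return (isinstance(et, dict)
            and et.get("system") == SEED_EXPERIENCE_TYPE
            and et.get("code") == "intention")  # object form of the same seeded coding

def ordinary_recall(records: list) -> list:
    """Prospective-memory filtering: intentions never surface in ordinary recall."""
    return [r for r in records if not is_intention(r)]
```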
JSON-LD @context audit (a DoD blocker, not an afterthought): widening a field that shares an
overloaded @context term silently corrupts the RDF lift of existing bundles. In R1, one term
source is bound to a single bare omir:source with no per-field @type (context.jsonld:33), yet
four fields use it: Episode.source (being widened), Bundle.source, Meta.source,
Provenance.source (all plain strings). A coded Episode.source would lift a {system,code,text}
node onto a predicate that elsewhere carries a literal, and system/code/text (unmapped) would
fall to accidental @vocab predicates. Before widening any field, split every overloaded term in
the R2 context.jsonld: give the widened Episode.source its own term (episodeSource → omir:episodeSource) typed for an embedded coding, add CodeableConcept’s system/code/text
with explicit typing, and keep the three string source fields on a separate literal predicate.
DoD: a Turtle/N-Quads golden-file test that the R1 minimal bundle and an R2 coded bundle both
lift to the intended triples.
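The pre-widening audit can be approximated by grouping field paths by their @context term and flagging any term whose fields disagree on value kind. This is a sketch under stated assumptions:
- The hand-written table stands in for reading context.jsonld and the schemas.
- The "literal"/"coding" labels are this example's own vocabulary.
- The audit is run against the proposed R2 field types, since all four fields agree in R1.

```python
from collections import defaultdict

# Field path -> (@context term, value kind). The four "source" fields come from
# the text; Episode.source is shown already widened but still on the shared term.
WIDENED_SHARED = {
    "Episode.source": ("source", "coding"),
    "Bundle.source": ("source", "literal"),
    "Meta.source": ("source", "literal"),
    "Provenance.source": ("source", "literal"),
}


def overloaded_terms(bindings: dict) -> dict:
    """Map each term used with more than one value kind to the fields using it."""
    kinds = defaultdict(set)
    users = defaultdict(list)
    for path, (term, kind) in bindings.items():
        kinds[term].add(kind)
        users[term].append(path)
    return {term: sorted(users[term]) for term, ks in kinds.items() if len(ks) > 1}


# The R2 split: the widened field gets its own term, so nothing is overloaded.
R2_SPLIT = {**WIDENED_SHARED, "Episode.source": ("episodeSource", "coding")}

print(overloaded_terms(WIDENED_SHARED))
print(overloaded_terms(R2_SPLIT))  # {}
```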
Honest status: this is a type change to existing fields, two of which live on MemoryRecord (now OMM-1 under P0). A bare-string producer stays valid; an object value is rejected by an R1 consumer; an R2 bundle is opaque to an unmodified R1 consumer at the envelope (per P1). Domain vocabularies (robotics, healthcare) live under their own non-omir.io URLs; the omir.io namespace is reserved for WG-published seed systems only. Target: R2 · OMM-1. A-first is about opening the namespace (move 1, day-one, additive) so other domains can name their own concepts; the union widening (move 2) is held to R2 and is the cheapest additive spec change — but milestone zero (below) is the literally cheapest adoption.
Phase 1b — closed-vocabulary enforcement (separate work item, off the 1a critical path).
Extend profiles.md with a machine-readable terminology-binding artifact (a profile schema,
not prose) binding a CodeableConcept field to a code system/value set with a binding strength
(required/extensible/preferred), mirroring FHIR. Implement profile-constraint checking in
the reference validator — today there is no code path (lib.rs:144 is a bare comment): this is a
net-new subsystem (profile loader, meta.profile dispatch, value-set membership check, new
finding codes: required → error, extensible/preferred → warning), with passing and failing
examples. Re-issue the three R1 reference profiles, splitting by kind:
- Genuinely REQUIRED enums (omir-coding-agent.kind) → required-strength bindings. A required-strength binding requires the field to be PRESENT: an absent kind relying on the Phase-1a normative default does not satisfy the profile (a profile narrows the base; presence is the thing it guarantees). Cross-referenced from the 1a “default” bullet.
- SHOULD value fields (omir-coding-agent provenance.sourceType (profiles.md:41), Entity.labels) → preferred/extensible bindings, reported as warnings, never errors. Binding strength MUST match the original RFC-2119 keyword; promoting a SHOULD to an enforced required binding is a new-release narrowing, not a re-expression.
- Behavioral MUSTs are NOT terminology bindings and stay as prose RFC-2119 requirements. Re-issuing a profile MUST preserve every behavioral MUST verbatim (the intention-filtering re-key already landed in 1a); terminology bindings constrain values, not consumer behavior.
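The sketch below shows how binding strength could map to finding severity, including the presence requirement of a required binding. Only the strength → severity mapping comes from the text. The Binding class, the get_path helper, the message text and the value sets are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# From the text: required -> error; extensible/preferred -> warning.
SEVERITY = {"required": "error", "extensible": "warning", "preferred": "warning"}


@dataclass(frozen=True)
class Binding:
    path: str              # dotted field path, e.g. "provenance.sourceType"
    strength: str          # "required" | "extensible" | "preferred"
    value_set: frozenset   # allowed bare seed codes (illustrative)


def get_path(obj, path: str):
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def check_binding(resource: dict, b: Binding) -> Optional[Tuple[str, str]]:
    """Return (severity, message) if the binding is violated, else None."""
    value = get_path(resource, b.path)
    if value is None:
        # A required binding guarantees presence; the base default does not count.
        if b.strength == "required":
            return ("error", f"{b.path} is absent but bound with required strength")
        return None
    # Object-form codings from other systems never match a seed value set here.
    if not (isinstance(value, str) and value in b.value_set):
        return (SEVERITY[b.strength], f"{b.path}={value!r} is outside the bound value set")
    return None


kind_binding = Binding("kind", "required", frozenset({"memory"}))
source_binding = Binding("provenance.sourceType", "preferred", frozenset({"user", "tool"}))
record = {"resourceType": "MemoryRecord", "provenance": {"sourceType": "web"}}
print(check_binding(record, kind_binding))
print(check_binding(record, source_binding))
```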
Phase 2 — Identity (B: Agent) and Resolution (C). B (Agent) is a prerequisite ONLY for E2
(agent-attribution hops); the headline trust deliverable E1 depends only on A and the existing
Provenance and ships without Agent — so Agent’s adoption gate never holds the trust pillar
hostage. B: introduce an Agent (Principal) resource and optional authoredBy / owner
references. C: add optional Entity.identifier[] ({ system, value }), aliases[], and a
sameAs link so two stores can declare cross-system identity and merge.
Honest status, partitioned by whether a field touches the Reference pattern:
- §5.1-additive (R1.x-eligible): Entity.identifier[] and aliases[] (a new non-ref def {system, value} and a string array; no Reference widening).
- §5.2/§3.1-breaking (R2/RFC-gated): any field that can reference Agent (authoredBy, owner, and sameAs insofar as it permits Agent targets), because each forces the common.schema.json#/$defs/Reference.pattern widening. These are NOT R1.x-additive. Adding Agent does not invalidate any existing R1 bundle, but the conformance/bundle prose enumerating “the four core resources” must change in the same PR. Required, atomic work items:
  - (a1) author schemas/Agent.schema.json; (a2) add its $ref to Bundle.entry.items.oneOf (a oneOf of resource-schema $refs, not a regex); (a3) widen the common.schema.json#/$defs/Reference.pattern regex (today ^(MemoryRecord|Entity|Relationship|Episode)/..., common.schema.json:18) to include Agent (the single load-bearing pattern change).
  - (a4) extend the validator’s SchemaFiles struct, the from_dir/embedded loaders, and the by_type map (schemas.rs has a hard-coded 6-file / 4-type shape that will not pick up a new type automatically); update RESOURCE_TYPES (item b) and both “four core resources” message sites: the validator E101 string (lib.rs:124) and validator/README.md:93.
  - (b) add Agent to RESOURCE_TYPES.
  - (c) replace the hand-coded per-field reference walk with a registry-driven walker. The mandate is not “resolve every {ref:"Type/id"} anywhere.” The walker resolves (i) every Reference object {ref:...} at a schema-declared Reference-typed field, NEVER inside extension[].valueJson or any free-form/additionalProperties map (a vendor may legitimately put a {"ref":...}-shaped object in valueJson); and (ii) every closed-world bare-id field enumerated in CR-5 (currently MemoryRecord.parentId, special-cased at lib.rs:219-230). Drive the walk from an explicit registry of (field-path → target-type-SET, ref-shape) entries (a set, because some R2 slots are poly-typed; see (c′)). Seed the registry with the FULL existing closed-world set, taking lib.rs:217-239 and CR-5 as ground truth: entityRefs → {Entity} (on MemoryRecord and Episode), Relationship.from → {Entity}, Relationship.to → {Entity}, Relationship.sourceEpisode → {Episode} (silently dropped in earlier drafts; its E201 expected-type=Episode check must survive the refactor), and MemoryRecord.parentId → {MemoryRecord} (bare-id). Adding authoredBy/owner/sameAs/provenance.chain[] then extends the registry instead of relying on shape-sniffing, and the per-field target type is preserved (E201 must still reject, e.g., Agent/x in Relationship.from).
  - (c′) change E201’s check from ref_type == expected to expected_set.contains(ref_type) (lib.rs:337), since Phase 3’s provenance.chain[].derivedFrom is poly-typed (any in-Bundle resource) while chain[].agent is mono-typed to {Agent}. The current single-type tuple cannot model this; the set form can.
  - (d) amend Conformance CR-5 by appending the new reference-bearing fields to the existing closed-world list (never re-deriving it from the Reference type); state that the closed-world set is exactly the schema’s Reference-typed fields plus the enumerated bare-id fields, and record the per-field target-type table in the migration note.
  - (e) register Agent in the JSON-LD @context; (f) ship a migration note.
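The following is a minimal sketch of the registry-driven walker and the set-valued E201 check. The registry rows mirror the seeded closed-world set listed in (c); everything else is illustrative. It makes two assumptions:
- entityRefs holds a list of {ref} objects.
- A dangling reference is reported under a placeholder code, UNRESOLVED. The real validator's code for that case is not given here.

```python
from typing import Iterator, Tuple

# (resourceType, field) -> (allowed target-type set, ref shape); seeded from (c).
REGISTRY = {
    ("MemoryRecord", "entityRefs"): ({"Entity"}, "ref-list"),
    ("Episode", "entityRefs"): ({"Entity"}, "ref-list"),
    ("Relationship", "from"): ({"Entity"}, "ref"),
    ("Relationship", "to"): ({"Entity"}, "ref"),
    ("Relationship", "sourceEpisode"): ({"Episode"}, "ref"),
    ("MemoryRecord", "parentId"): ({"MemoryRecord"}, "bare-id"),
}


def refs_at(value, targets: set, shape: str) -> list:
    """Extract "Type/id" strings from a registered field according to its shape."""
    if shape == "bare-id":
        (target,) = targets           # bare ids are mono-typed by construction
        return [f"{target}/{value}"]
    if shape == "ref":
        return [value["ref"]]
    return [item["ref"] for item in value]  # "ref-list"


def walk(resources: list) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (code, resource id, field, ref) findings. Only registered fields are
    visited, so extension[].valueJson is never inspected."""
    present = {f"{r['resourceType']}/{r['id']}" for r in resources}
    for r in resources:
        for (rtype, field), (targets, shape) in REGISTRY.items():
            if r["resourceType"] != rtype or field not in r:
                continue
            for ref in refs_at(r[field], targets, shape):
                ref_type = ref.split("/", 1)[0]
                if ref_type not in targets:       # (c') set membership, not ==
                    yield ("E201", r["id"], field, ref)
                elif ref not in present:
                    yield ("UNRESOLVED", r["id"], field, ref)


bundle = [
    {"resourceType": "Entity", "id": "e1"},
    {"resourceType": "Episode", "id": "ep1"},
    {"resourceType": "Relationship", "id": "r1", "from": {"ref": "Entity/e1"},
     "to": {"ref": "Agent/x"}, "sourceEpisode": {"ref": "Episode/ep1"}},
    {"resourceType": "MemoryRecord", "id": "m1", "parentId": "m0",
     "extension": [{"url": "https://vendor.example/x", "valueJson": {"ref": "Nope/1"}}]},
]
for finding in walk(bundle):
    print(finding)
```

Adding the R2 fields would only add registry rows. The chain[].derivedFrom row, for example, would carry the full resource-type set.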
For Phase 5’s federation form, do NOT relax the shared Reference pattern: define the
external-reference form as a distinct $def (ExternalReference, its own property, not ref)
so the Core closed-world pattern is untouched. Target: R2 · B (Agent): OMM-0 / C: OMM-0 —
C is graded OMM-0 “mechanism present, cross-system merge unverifiable single-party”, not OMM-1:
identifier[]/aliases/sameAs are inert until either a second store exists or federation (I)
lands. C’s identifier fields and I’s federation mechanism are a matched pair whose combined
value requires the second implementer; neither is claimed as delivered interop value at single-party.
Phase 3 — Provenance & trust: W3C PROV (E), the pillar. Re-shape Provenance into a
PROV-aligned derivation chain with optional, redaction-aware attestation. Detailed below. E splits:
E1 (chain over prior resources + per-hop credibility) depends only on A and the existing
Provenance; E2 (agent-attribution hops referencing Agent) depends on B. The critical
path is A → E1-chain — pure additive optional structure, immediately reviewable by a second
party. The heavyweight attestation subsystem (det-CBOR / M / inclusion allow-list / redaction
commitments / key resolution) is a PARALLEL track explicitly OFF the critical path, because
cross-verification is its only meaningful test and is worthless with one implementer: it cannot
reach OMM-1 or interop until the P2 second-verifier milestone is met. This honors “E emphasized” as
the trust design pillar without front-loading a multi-quarter cryptographic interop project
ahead of any second implementer. The E1/E2 discriminator is normative: a record is E1 only
if its chain contains no agent operand and no agent-naming attestation; the moment any hop
carries agent or the attestation’s signed subset includes an Agent reference, it is E2
and capped at the Agent OMM floor. E2 MUST NOT advance past OMM-0 until Agent is at least
OMM-1 with Phase-2 Reference support. Target: R2 · E1-chain: OMM-1 / attestation & E2: OMM-0
(capped at the Agent floor; non-interoperable until P2).
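The E1/E2 discriminator is mechanical enough to sketch directly. The sketch assumes provenance is a parsed dict shaped like the worked example in the Phase 3 section below. It also treats any agent key in the attestation as "agent-naming"; a real implementation would inspect the attestation's signed subset.

```python
def e_class(provenance: dict) -> str:
    """Return "E1" if no hop carries an agent operand and the attestation names
    no agent; otherwise "E2" (capped at the Agent OMM floor)."""
    if any("agent" in hop for hop in provenance.get("chain", [])):
        return "E2"
    attestation = provenance.get("attestation") or {}
    if "agent" in attestation:
        return "E2"
    return "E1"


derived_only = {"chain": [{"role": "wasDerivedFrom", "derivedFrom": {"ref": "Episode/ep1"}}]}
attributed = {"chain": derived_only["chain"]
              + [{"role": "wasAttributedTo", "agent": {"ref": "Agent/agent-a"}}]}
signed_by_agent = dict(derived_only, attestation={"alg": "ed25519", "agent": "Agent/agent-a"})

print(e_class(derived_only), e_class(attributed), e_class(signed_by_agent))  # E1 E2 E2
```

Only the first record can ship on the A → E1-chain critical path; the other two wait on Agent.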
Phase 4 — Sensitivity & modality (F, D). F: an optional governance block (classification,
legal basis, retention / deleteAfter, redaction markers, consent reference) — plus a typed
redactionCommitments[] array ({ fieldPath, commitment, salt }) so per-field signed
commitments have a declared schema home under additionalProperties:false (without it there is no
legal place to store a commitment for an arbitrary redactable field). The redaction mechanism is
co-designed with E’s attestation envelope (Phase 3). On introducing the core governance block,
deprecate the WG-published, omir.io-namespaced, profile-REQUIRED pii-class extension
(https://omir.io/spec/R1/extension/pii-class, profiles.md:91-93) under §6 (deprecated R2, earliest
removal R3): because it is required by omir-personal-assistant, during the R2 window that
profile MUST accept EITHER the deprecated extension OR the core classification field (so no
existing producer is instantly non-conformant), and the R2 migration note MUST carry an explicit
field-by-field value-shape mapping (pii-class → core classification). D: a multi-part
content model (contentType + inline-or-MediaReference) and an algorithm-neutral Embedding
representation { space, model, dims, dtype, vector?|ref? } where space is an opaque equality
key (vectors comparable iff space strings are byte-equal; no cross-space comparability
claim), dims MUST equal the vector length, and dtype pins the byte form. On-the-wire: a
JSON decimal array in .omir, a typed CBOR array in .omirb. Because .omir decimal and
.omirb dtype bytes are not byte-reconstructable from each other, embedding vectors are EXCLUDED
from signed images by default; if ever committed, the commitment is over a single dtype-independent
canonical-decimal (shortest-round-trip) form, never the dtype byte form (see Phase 3). Honest
status: F and D are new optional fields; absent the Agent/ref dependency they qualify for
R1.x-additive (footnote ¹). Target: R2 · OMM-0 (D grades OMM-0 until two vendors demonstrate a
cross-store similarity round-trip).
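The sketch below implements the Embedding checks described above. dims must match the vector length. Two vectors are comparable only when their space strings are byte-equal. No Unicode normalization is applied, so visually identical strings can still differ. Two points are this example's own choices, not the text's:
- Cosine similarity is used as the metric; the text does not prescribe one.
- The rule that vector and ref must not both appear is an assumption.

```python
import math
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Embedding:
    space: str                      # opaque equality key
    model: str
    dims: int
    dtype: str                      # pins the binary byte form in .omirb
    vector: Optional[List[float]] = None
    ref: Optional[str] = None

    def validate(self) -> None:
        if self.vector is not None and self.ref is not None:
            raise ValueError("assumed: inline vector and ref are alternatives")
        if self.vector is not None and len(self.vector) != self.dims:
            raise ValueError(f"dims={self.dims} but vector has {len(self.vector)} entries")


def comparable(a: Embedding, b: Embedding) -> bool:
    """Byte-equality of the UTF-8 space strings; no other equivalence is claimed."""
    return a.space.encode("utf-8") == b.space.encode("utf-8")


def cosine(a: Embedding, b: Embedding) -> float:
    if not comparable(a, b):
        raise ValueError("different spaces: no cross-space comparability claim")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    return dot / (math.hypot(*a.vector) * math.hypot(*b.vector))


u = Embedding("acme/text-v1", "m", 2, "float32", [3.0, 4.0])
v = Embedding("acme/text-v1", "m", 2, "float32", [3.0, 4.0])
nfc = Embedding(unicodedata.normalize("NFC", "café"), "m", 2, "float32", [1.0, 0.0])
nfd = Embedding(unicodedata.normalize("NFD", "café"), "m", 2, "float32", [1.0, 0.0])
print(cosine(u, v), comparable(nfc, nfd))  # 1.0 False
```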
Phase 5 — Refinements: fix G, J, and K (and fold in H, I).
- G (uncertainty): keep Beta as the default; add a general uncertainty value (credible interval, or named distribution + params) and optionally separate epistemic from aleatoric uncertainty. New optional fields; R1.x-additive-eligible.
- J (typed attributes): do not replace the string maps. Add an additive optional sibling channel, Entity.typedAttributes[] / Episode.typedMetadata[], each { key, value } where value is a shared TypedValue def in common.schema.json (number, boolean, dateTime, quantity-with-unit, CodeableConcept, reference; the def-add is purely additive), and mark attributes/metadata DEPRECATED under §6 (deprecated R2, earliest removal R3), updating context.jsonld. Adding the typed channel AND marking the maps deprecated are both §5.1-additive and R1.x-eligible (per §6.4, validators only warn on deprecated-but-present items); only the eventual R3 REMOVAL of the maps is the breaking boundary. CodeableConcept (from A) is one member of the union; keep the dependency A → J, and J’s typed channel caps at A’s grade. TypedValue is NOT reused in the Extension value slots: constraining them to a closed union would remove arbitrary valueJson (a §5.2 break violating the typed-extension escape-hatch invariant). Extension.valueJson stays maximally permissive; if typed extension values are wanted later, add an optional valueTyped ($ref TypedValue) alongside the existing value* fields (purely additive), and never retro-type valueJson.
- K (i18n): BCP-47 language tags on textual content and entity names, with optional multilingual variants. New optional fields; R1.x-additive-eligible.
- Fold in H (full bitemporality + imprecise/interval time; new optional fields) and I (federation). I requires a mechanism before a target: introduce the syntactically distinct ExternalReference $def (its own property, an absolute/identifier-based locator built on Theme C’s identifiers, never the bare ResourceType/id and never a relaxed Core pattern); define a third conformance level, “Federated”, in Conformance.md (via RFC) that relaxes CR-5 only for explicitly external references. Closed-world remains the Core default, and the R1/R2 invariant is preserved for anything not claiming Federated. I is blocked by (1) the widened Reference (Phase 2 B-work), (2) the new ExternalReference $def, and (3) the CR-5 carve-out RFC; C’s identifiers are an input to how external references are expressed, not the gate. Target: R2+ · OMM-0.
Phase 3 in depth — aligning Provenance with W3C PROV
W3C PROV models the world with Entity (a thing), Activity (something that acts over
time), and Agent (who/what is responsible), related by a type-constrained verb set whose
relations take incompatible operands: wasAttributedTo is Entity→Agent; wasDerivedFrom is
Entity→Entity; wasGeneratedBy is Entity→Activity (generated entity is the subject); used is
Activity→Entity and wasInformedBy is Activity→Activity — both Activity-subject. OMIR’s flat
{ source, sourceType, credibility, externalId } is the common case; the proposal lets it
expand into a chain when trust matters. The chain’s implicit subject is the resource carrying
the provenance block, lifting to prov:Entity. Each step is a discriminated union keyed on
role that pins its legal operands. To avoid a fatal @context term collision, the chain does
NOT reuse the globally-bound terms "from" (the Relationship operand, context.jsonld:158) or
"credibility" (context.jsonld:91): it uses the distinct field names derivedFrom and hopCredibility
(JSON-LD term definitions are document-global, not subtree-scoped — the same term cannot mean two
things under one merged context).
```json
"provenance": {
  "source": "design-review",
  "credibility": 0.92,
  "aggregateCredibility": { "value": 0.75, "model": "product" },
  "chain": [
    { "role": "wasAttributedTo", "agent": { "ref": "Agent/agent-a" }, "at": "2026-05-30T11:42:00Z", "hopCredibility": 0.95 },
    { "role": "wasDerivedFrom", "derivedFrom": { "ref": "Episode/ep-launch-chat" }, "at": "2026-05-30T11:42:03Z", "hopCredibility": 0.90 },
    { "role": "wasGeneratedBy", "activity": { "id": "act-consolidation", "label": "consolidation", "at": "2026-05-31T03:00:00Z" }, "hopCredibility": 0.88 }
  ],
  "attestation": { "alg": "ed25519", "keyId": "did:web:example.org#k1", "key": "<inline JWK>", "agent": "Agent/agent-a", "at": "2026-05-31T03:00:01Z", "signature": "base64url…" }
}
```
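The role/operand rules listed below and the SHOULD-level roll-up check can be exercised against this worked example. This is a minimal sketch under stated assumptions:
- hop_errors and aggregate_consistent are hypothetical names.
- The 0.01 tolerance is an assumed value for "rounding-tolerant".
- Only the product and min models are handled.

```python
import math

# role -> (operand that MUST be present, operands that MUST NOT be present)
ROLE_RULES = {
    "wasAttributedTo": ("agent", {"derivedFrom"}),
    "wasDerivedFrom": ("derivedFrom", {"agent"}),
    "wasGeneratedBy": ("activity", {"agent"}),
}
RESERVED_IN_R2 = {"used", "wasAssociatedWith", "wasInformedBy"}


def hop_errors(hop: dict) -> list:
    role = hop.get("role")
    if role in RESERVED_IN_R2:
        return [f"{role} is reserved in R2"]
    if role not in ROLE_RULES:
        return [f"unknown role {role!r}"]
    required, forbidden = ROLE_RULES[role]
    errors = [] if required in hop else [f"{role} MUST carry {required}"]
    errors += [f"{role} MUST NOT carry {op}" for op in sorted(forbidden.intersection(hop))]
    return errors


def aggregate_consistent(prov: dict, tol: float = 0.01) -> bool:
    """SHOULD-level check: does aggregateCredibility.value match its model?"""
    agg = prov.get("aggregateCredibility")
    hops = [h["hopCredibility"] for h in prov.get("chain", []) if "hopCredibility" in h]
    if agg is None or not hops or agg["model"] not in ("product", "min"):
        return True  # nothing to check, or a model this sketch does not know
    expected = math.prod(hops) if agg["model"] == "product" else min(hops)
    return abs(agg["value"] - expected) <= tol


provenance = {
    "aggregateCredibility": {"value": 0.75, "model": "product"},
    "chain": [
        {"role": "wasAttributedTo", "agent": {"ref": "Agent/agent-a"}, "hopCredibility": 0.95},
        {"role": "wasDerivedFrom", "derivedFrom": {"ref": "Episode/ep-launch-chat"}, "hopCredibility": 0.90},
        {"role": "wasGeneratedBy", "activity": {"id": "act-consolidation"}, "hopCredibility": 0.88},
    ],
}
print([hop_errors(h) for h in provenance["chain"]])  # [[], [], []]
print(aggregate_consistent(provenance))              # True: 0.95 * 0.90 * 0.88 is about 0.7524
print(hop_errors({"role": "wasDerivedFrom", "agent": {"ref": "Agent/agent-a"}}))
```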
- Role/operand validation is normative and enforced. With the bearing resource as subject, the Entity-subject verbs are well-formed and enumerated: wasAttributedTo → MUST carry agent, MUST NOT carry derivedFrom; wasDerivedFrom → MUST carry derivedFrom (a prior resource), MUST NOT carry agent; wasGeneratedBy → MUST carry an activity operand, MUST NOT carry agent. A validator rule rejects role/operand mismatches. The Activity-subject verbs used, wasAssociatedWith, and wasInformedBy are ALL reserved (none can take the bearing prov:Entity as subject) until a first-class Activity referent exists. (wasInformedBy is Activity→Activity, the same subject-type problem as the others. It was previously, and contradictorily, permitted with an inline activity; that permission is dropped, leaving wasGeneratedBy as the only Activity-touching verb in R2.)
- derivedFrom is poly-typed; agent is mono-typed. wasDerivedFrom.derivedFrom may resolve to any in-Bundle resource type (Episode, MemoryRecord, Entity, Relationship, Agent), so its registry entry carries the full resource-type set, and role/operand validation is what narrows it per hop; wasAttributedTo.agent is mono-typed to {Agent}. The widened E201 set-membership check (Phase 2 c′) covers both.
- Activity is an inline shape, not a literal and not a top-level resource. R2 carries Activity within the hop as activity: { id, label?, at? } (no new Bundle.entry/oneOf type, so the “five core resources” prose is untouched). The PROV-O lift mints a blank-node prov:Activity from it. Activity operands carry no ref and are EXEMPT from closed-world resolution; the walker keys strictly on ref/derivedFrom/agent and skips them. Promoting Activity to a first-class resource is deferred.
- Closed-world applies to the resource-typed chain refs. chain[].agent and chain[].derivedFrom are closed-world references in R2 and MUST resolve within the Bundle; the registry-driven walker covers them. Chain-ref integrity is a DOCUMENT property: the validator walks it unconditionally. A producer that cannot include an upstream Agent/Episode MUST inline a minimal resource, satisfy the hop with a placeholder tombstone of the right type, or omit the hop; it never emits a dangling ref. A placeholder tombstone Agent MUST carry an explicit placeholder:true marker. Cross-Bundle provenance is deferred to Federated and never permitted at Core; until I lands, attested records carry their provenance closure.
- Per-hop credibility is the only normative trust number; any roll-up is optional and non-normative. chain[].hopCredibility is a UnitInterval, added to the CR-7 [0,1] enumeration and bound to common.schema.json#/$defs/UnitInterval. The legacy provenance.credibility keeps its R1 meaning; it is not silently redefined as a derived product. Any roll-up lives in a new optional provenance.aggregateCredibility { value: UnitInterval, model: "product"|"min"|… }. product is a series-reliability / weakest-chain heuristic, NOT an independent-evidence probability, so the “independent-evidence” label is dropped as false. A SHOULD-level check warns when aggregateCredibility.value is inconsistent with model over the present hops (rounding-tolerant).
- Attestation is explicitly an OMM-0, off-critical-path, parallel track until two independent verifiers demonstrate cross-encoding verification AND the A-normalization and J-migration canonicalization versions are pinned. The validator cannot enforce any of this today (Outcome is a closed {Pass, Fail} enum, report.rs:64-69; Check is {Structural, ReferenceIntegrity, VersionPresence, Profile}, report.rs:45-52; lib.rs has no crypto, no CBOR, and reads only parsed .omir JSON). The attestation track therefore includes an honest validator work item paralleling Phase 1b’s “net-new subsystem”: (a) add an .omirb/CBOR reader with a declared sub-profile tag byte; (b) add a Check::Attestation variant and a per-attestation finding triplet verified/tampered/unverifiable that is SEPARATE from the document-level Pass/Fail (an unverifiable attestation MUST NOT flip core_conformant to false; it is orthogonal to schema conformance); (c) scope the ed25519/key code as OMM-0 and OUT of the Core conformance path, so a Core-R2 consumer that does not verify is still conformant. Until (a)-(c) land, the bincode-under-attestation rule is documentation only and attestation is OMM-0/non-interoperable; no validator-enforced MUST is claimed. The mechanism, when built:
  - Signing input is a canonical typed map M defined by a closed, version-tagged INCLUSION allow-list, enumerated per signable resource type. For MemoryRecord, M includes id, resourceType, a content-commitment, the ordered chain (roles, operand ref/inline-activity-id, per-hop hopCredibility), the asserting Agent ref + keyId + the inline signing key + signing time at, and classification once Phase 4 lands; M excludes mutable/operational fields (decay.*, meta.lastUpdated, version). Entity/Relationship subsets, if signable, are enumerated separately (Entity excludes mentionCount, salience, lastSeenAt; Relationship excludes strength, validAt, invalidatedAt).
  - Scope. An attestation signs the bearing resource’s enumerated subset plus a hash-commitment of each referenced resource’s id + resourceType (not their mutable bodies).
  - One canonical byte form. The signature is over SHA-256(det-CBOR(M)) (RFC 8949 §4.2). A JSON (.omir) signer/verifier MUST construct the identical typed map M and the identical det-CBOR(M) bytes; JCS is at most an aid to building M, never an alternate signed form. No field whose .omir and .omirb representations are not provably byte-reconstructable from each other may enter a signed commitment (dtype-pinned binary byte forms MUST NOT). This clause is also written into encodings.md so that encoding-neutrality (Principle 5) and one-canonical-form hold simultaneously.
  - Number normalization is a single pre-CBOR step applying to ALL signed numeric fields (not only UnitInterval/score): every JSON number in M maps to its shortest round-tripping decimal, so 9, 9.0, and 9.00 collapse (the shipped corpus already drifts: {alpha:9,beta:1} in encodings.md:36 vs {alpha:9.0,beta:1.0} in minimal-bundle.omir:82), and CBOR floats are forbidden for signed numeric fields. A KAT fixture proves {alpha:9} and {alpha:9.0} sign identically. Defaulted fields are normalized to their omitted form before M is built, so absent and explicit-default values sign identically (Phase 1a). A KAT fixture proves that two MemoryRecords differing only in the presence or absence of a default-valued kind produce the identical digest.
  - CodeableConcept-union fields are pre-canonicalized into M before CBOR: seed values → bare string (enforced bijective by the Phase-1a seed-system exclusion); non-seed → {system, code} with text dropped. The canonicalization-profile version is carried INSIDE M (not merely on the envelope), and the verifier selects canonicalization rules by the version recorded in M, never by its current profile. A seed retirement (Phase 1a) is a canonicalization-version bump; attestations are verified under their own version’s frozen, per-release-published seed table (§5.3), so seed evolution is non-retroactive to historical signatures.
  - Ordered chain, not a set. M commits to the ordered chain (a det-CBOR array of per-hop commitments, or equivalently a Merkle root over them), so reordering, dropping, or duplicating a hop is detected.
  - Key authority is self-contained: verification is decidable from the Bundle alone. Because OMIR is an at-rest format with a closed-world invariant, a verified result MUST be a pure function of the bundle bytes. M therefore inlines the full public key (JWK) used and MAY inline a short Agent-signed key-authorization assertion binding keyId at the signing time at; verification = signature valid over M + key binding valid + at within the binding's validity. Out-of-band DID/HTTPS key resolution is demoted to an OPTIONAL Federated-level enhancement (it belongs with Theme I, which already gates open-world). A sameAs merge MUST NOT rewrite a signed Agent id. Lawful key rotation/revocation does not retroactively flip historical attestations.
  - Redaction is cryptographically honest, and the security claim is stated honestly. For each redactable field, M contains a commitment = H(field-bytes ∥ per-field-salt) (stored in the Phase-4 redactionCommitments[]) and MUST NOT contain the plaintext. Redaction deletes only the plaintext and leaves the commitment, so the signature still verifies; absent plaintext + an intact, signature-covered commitment = “lawfully redacted, chain intact,” NOT tampered. Because the salt is retained in the at-rest document, it provides ZERO brute-force resistance for low-entropy fields: the honest property is unlinkability (commitments are non-correlatable across resources/documents), not brute-force resistance. For genuine brute-force resistance, a producer MUST use a high-entropy per-field nonce that is itself deleted at redaction time (accepting that such fields can never be re-verified against plaintext) or explicit out-of-band salt escrow; the chosen model is recorded. The impossible “tombstone bearing the same hash” phrasing is dropped: the tombstone bears a redacted:true marker, and the sibling commitment carries the hash.
  - Erasure dominates the trust pillar; the conflict is acknowledged, not legislated away. When a data subject’s erasure is legally compelled and the subject is the signing Agent, retaining id + key-commitment + an attributing signature can itself be unlawful personal data. So Agent identity enters M via the erasable salted commitment (above), not a raw resolvable ref; erasing the Agent’s PII deletes the plaintext while the commitment + signature survive, and closed-world is preserved by a placeholder tombstone carrying that commitment. The earlier “MUST NOT sign over a placeholder Agent reference” is relaxed to “MUST NOT sign over a placeholder that carries no key-commitment”, i.e. sign over the commitment, not the resolvable id. When erasure nonetheless forces signature invalidation, the verifier reports the record as unverifiable-by-erasure (a defined sub-outcome), NOT tampered. The hard privacy invariant is not subordinated to the trust pillar.
  - Encoding cannot silently break a signature, and the rule lives in the SPEC, not only the validator. A producer emitting .omirb for a bundle containing any provenance.attestation MUST use the CBOR sub-profile, never bincode (bincode is not self-describing and cannot reconstruct M). This narrows an existing release-published allowance, so it is an R2 normative edit to encodings.md §“.omirb binary profile” (encodings.md:84-101) under §3.1/§5.2: amend the “bincode permitted as an internal sub-profile” sentence (encodings.md:90) to “…permitted EXCEPT when the bundle carries any provenance.attestation, in which case the CBOR sub-profile is REQUIRED.” The validator’s Check::Attestation then enforces a rule the spec actually states. An attestation whose canonical form cannot be reconstructed is reported unverifiable, a third outcome distinct from verified/tampered. A cross-verifier known-answer test vector (canonical det-CBOR bytes + digest over the worked example) ships as a DoD artifact, gated on the P2 second verifier.
- The PROV-O lift is a SEPARATE opt-in context that does NOT silently compose under @vocab. The base context.jsonld sets @vocab: https://omir.io/ns# (line 4) and resourceType → @type. Without a separate context, unmapped chain terms would mint accidental omir:role/omir:at predicates, and the chain hops (which carry no resourceType) would have nothing for @type to attach to. The https://omir.io/spec/R2/prov-context.jsonld therefore: (a) re-declares the provenance term, dropping the inherited @type:@id (context.jsonld:144) and declaring it an embedded node with typed interior chain/agent/derivedFrom/activity terms; (b) sets @vocab: null within the chain/attestation sub-context (or gives every chain/attestation property an explicit @id) so unmapped terms are DROPPED, not coerced to omir:; (c) attaches PROV to the bearing resource via explicit PROV relations whose subject is the omir:MemoryRecord node (the resource keeps its single omir: @type and gains PROV edges) rather than giving it a conflicting second @type of prov:Entity. The disjointness assertion is narrowed: “OMIR Entity-as-subject-matter is NOT prov:Entity; any OMIR resource appearing as a provenance derivation operand lifts to prov:Entity for the derivation graph”. The narrowed statement is what gets published, not a blanket owl:disjointWith, which would make a wasDerivedFrom Episode triple ill-formed or the graph inconsistent. A normative omir-role → prov: predicate table accompanies the chain. DoD: a worked RDF-output test vector proving two implementers produce identical triples under base-only and base+prov, with no spurious coercion artifacts.
- Redaction mechanics (Phase 4) are concrete. A redacted resource retains resourceType/id and all reference targets, so the graph stays valid per CR-5. It sets content to a defined sentinel ("[redacted]" + redacted:true in the governance block) while its signed commitment sibling is preserved, and it MUST NOT be removed while anything references it. Whole-resource deletion uses a distinct tombstone (a present, conformant resource of the right type). A redaction round-trip example ships with the phase.
- Vendor-neutral by construction. The worked example uses placeholder agents (Agent/agent-a) and a placeholder issuer. Any Veld-specific signing convention lives in a vendor extension under veld.dev, never in the normative core example.
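The following is a minimal sketch of the signing-input pipeline. It covers only what the bullets above pin down:
- Shortest round-trip decimals, so 9 and 9.0 encode identically and no CBOR float appears.
- Omission of a default-valued kind.
- Redaction that leaves a salted commitment in place.

Several choices are assumptions, labeled in the code:
- The decimal is encoded as a CBOR decimal fraction (tag 4, RFC 8949 §3.4.4).
- field-bytes is det_cbor(value), and H is SHA-256.
- The map keys contentCommitment and canonVersion, and the all-zero salt, are placeholders.

The encoder covers only the types used here; it is not a full det-CBOR implementation.

```python
import hashlib
from decimal import Decimal


def _head(major: int, n: int) -> bytes:
    """Shortest-form CBOR head for a major type and argument (RFC 8949 §4.2.1)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | ai]) + n.to_bytes(size, "big")
    raise ValueError("argument too large for this sketch")


def _int(n: int) -> bytes:
    return _head(0, n) if n >= 0 else _head(1, -1 - n)


def _number(x) -> bytes:
    """Assumed rule: shortest round-trip decimal as tag 4 [exponent, mantissa]."""
    d = (Decimal(x) if isinstance(x, int) else Decimal(repr(x))).normalize()
    sign, digits, exponent = d.as_tuple()
    mantissa = int("".join(map(str, digits))) * (-1 if sign else 1)
    return _head(6, 4) + _head(4, 2) + _int(exponent) + _int(mantissa)


def det_cbor(v) -> bytes:
    if isinstance(v, bool):
        return b"\xf5" if v else b"\xf4"
    if v is None:
        return b"\xf6"
    if isinstance(v, (int, float)):
        return _number(v)                 # no CBOR floats in signed fields
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(v, bytes):
        return _head(2, len(v)) + v
    if isinstance(v, list):
        return _head(4, len(v)) + b"".join(det_cbor(i) for i in v)
    if isinstance(v, dict):
        # Keys sorted by the bytewise order of their encodings (RFC 8949 §4.2.1).
        pairs = sorted((det_cbor(k), det_cbor(val)) for k, val in v.items())
        return _head(5, len(pairs)) + b"".join(k + val for k, val in pairs)
    raise TypeError(f"unsupported type {type(v).__name__}")


def commitment(value, salt: bytes) -> bytes:
    return hashlib.sha256(det_cbor(value) + salt).digest()  # assumed H and field-bytes


def build_m(record: dict) -> dict:
    """Illustrative allow-list: plaintext content and mutable meta never enter M."""
    m = {"resourceType": record["resourceType"], "id": record["id"],
         "contentCommitment": record["contentCommitment"],
         "chain": record.get("chain", []), "canonVersion": "example-1"}
    if record.get("kind", "memory") != "memory":   # default-valued kind is omitted
        m["kind"] = record["kind"]
    return m


def digest(record: dict) -> str:
    return hashlib.sha256(det_cbor(build_m(record))).hexdigest()


salt = bytes(16)  # all-zero placeholder; a real producer uses a fresh per-field salt
rec = {
    "resourceType": "MemoryRecord", "id": "m1", "kind": "memory",
    "content": "launch moved to June",
    "meta": {"lastUpdated": "2026-05-31T03:00:00Z"},
    "chain": [{"role": "wasDerivedFrom", "derivedFrom": {"ref": "Episode/ep1"},
               "hopCredibility": 0.9}],
}
rec["contentCommitment"] = commitment(rec["content"], salt)

no_kind = {k: v for k, v in rec.items() if k != "kind"}
redacted = dict(rec, content="[redacted]")         # commitment sibling kept
tampered = dict(rec, chain=[dict(rec["chain"][0], hopCredibility=0.95)])

assert det_cbor({"alpha": 9, "beta": 1}) == det_cbor({"alpha": 9.0, "beta": 1.0})
assert digest(rec) == digest(no_kind) == digest(redacted)
assert digest(rec) != digest(tampered)
```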
Overlook — the sequence (lens 1)
| Phase | Themes | Release · OMM | Breaking? | Unlocks |
|---|---|---|---|---|
| 1a / 1b | A vocabularies | R2 · OMM-1 | Union = type-change; R2, RFC-gated. Vocab registry (move 1) ships R1.x-additive; 1b binding-engine off the critical path | Every non-coding domain can name concepts |
| 2 | B Agent + C identity | R2 · OMM-0/0 | New resource + Reference widening; R2, RFC-gated, non-invalidating to R1 bundles. C inert single-party → OMM-0 | Agents to attribute to; cross-store merge (needs 2nd store) |
| 3 | E PROV (E1-chain on path; attestation parallel) | R2 · E1-chain OMM-1 / attestation OMM-0 | New optional fields (R2 line) | Verifiable trust across vendors (at P2) |
| 4 | F + D governance, modality | R1.x where additive¹ | New optional fields | Regulated & multimodal adoption |
| 5 | G J K + H, I | R1.x additive / R2+ structural¹ | I’s CR-5 carve-out → R2+; G/K/H/F/D and J’s typed-channel + deprecation markers additive → R1.x¹ | Structured, multilingual, federated memory |
¹ Genuinely §5.1-additive themes ship as R1.x increments as they land (K, G, H, F, D, the
non-Reference C fields only — identifier[]/aliases[], never authoredBy/owner/
Agent-sameAs — and J’s typed channel + deprecation markers, since §6.4 makes a deprecation
marker a warn-only additive change). The R2 boundary is reserved strictly for the actually
non-additive items: the CodeableConcept union (type change), the Agent resource + Reference
widening (§5.2/§3.1), the eventual R3 REMOVAL of J’s deprecated string maps, and I’s CR-5
carve-out. Holding a purely-additive optional field for a major release works against the
cheap-adoption invariant. Caveat from P1: “R1.x-additive” means “additive to the data model and
deprecation-policy-clean,” NOT “forward-readable by an already-shipped same-major closed-schema
validator” — closed schemas reject undeclared new core fields, so intra-major novelty rides
extension[] until a minor schema bump admits it.
Overlook — dependencies (lens 2)
```text
A (vocabularies) ──► everything (domains can finally name their own concepts)
A ──► J (typed values reuse CodeableConcept via the shared TypedValue def; NOT via Extension)
A ──► E1-chain (chain over prior resources; needs no Agent)
B (Agent) ──► E2 (agent-attribution hops attribute memory to Agents) ──► (only) the agent case
B (Reference widening) ──► I (the ExternalReference $def extends, never relaxes, the widened Reference)
C (identifier/sameAs) ──► I (input to how external refs are expressed; not the gate)
I (Federated CR-5 carve-out) ──► attested cross-store E2 (a real Agent or an external ref)
D, F ──► E (close the attestation inclusion allow-list; F's classification enters M)
E (attestation, redaction-aware) ──► F (signed retention/consent; redaction preserves the commitment)
P2 (a second implementer / verifier) ──► OMM-2 anywhere; ──► attestation interop & E maturity at all
```
The critical path is A → E1-chain, a pure additive structure reviewable by a second party; the
attestation subsystem and E2 are a parallel track gated on P2 (cross-verification is their only
test). C runs in parallel after A per the directive, but its value (cross-system merge) and I’s
mechanism are a matched pair that also need P2. Maturity floor rule: a dependent feature’s OMM
is min(grade(its dependencies), grade(its own mechanism), gate(P2 where interop is the test)) —
while Agent is OMM-0, E2 and any agent-signing attestation are at most OMM-0; J caps at A’s grade;
I and C at single-party OMM-0 until P2. I’s true gate is the CR-5 carve-out RFC + the
ExternalReference $def, not C.
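The maturity floor rule is a nested minimum, and a short sketch makes it concrete. The sketch assumes OMM grades are plain integers and that the dependency graph handed to it is acyclic. It also assumes that an unmet P2 gate caps an interop-tested feature at 0, which matches the text's "I and C at single-party OMM-0 until P2". The `Feature` type and every grade in the registry are hypothetical illustrations, not the WG's grade table.

```python
from dataclasses import dataclass, field

# Assumption (hypothetical value): an interop-tested feature without a second
# independent implementation (P2) is capped at OMM-0, matching "I and C at
# single-party OMM-0 until P2".
P2_CAP_UNMET = 0


@dataclass
class Feature:
    name: str
    mechanism_grade: int                      # grade of the feature's own mechanism
    depends_on: list = field(default_factory=list)
    interop_tested: bool = False              # is cross-party interop its real test?


def omm_floor(feature, registry, p2_met):
    """min(grade(dependencies), grade(own mechanism), gate(P2)), applied recursively.

    Assumes the dependency graph passed in is acyclic.
    """
    grades = [feature.mechanism_grade]
    grades += [omm_floor(registry[d], registry, p2_met) for d in feature.depends_on]
    if feature.interop_tested and not p2_met:
        grades.append(P2_CAP_UNMET)
    return min(grades)


# Hypothetical grades, chosen only to mirror the examples in the text.
registry = {
    "A": Feature("A", mechanism_grade=1),
    "B": Feature("B", mechanism_grade=0),                 # Agent still OMM-0
    "E2": Feature("E2", 1, ["B"], interop_tested=True),   # capped by Agent
    "J": Feature("J", 2, ["A"]),                          # caps at A's grade
}

for name in ("E2", "J"):
    print(name, omm_floor(registry[name], registry, p2_met=False))
```

Because dependencies contribute their own effective floor rather than their raw grade, a cap anywhere upstream propagates to every dependent feature.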
Breakers — adversarial stress-test (3 passes)
Pass 1 — semantic collisions & compatibility.
- PROV Entity vs OMIR Entity, the base context, and term collisions. Mitigation: distinct chain field names (derivedFrom/hopCredibility, never the globally-bound from/credibility); a separate opt-in PROV context that re-declares provenance, suppresses @vocab inside the subtree, attaches PROV via explicit relations on the omir: node (not a second @type), and publishes the narrowed disjointness (“subject-matter Entity ≠ prov:Entity; derivation operands DO lift to prov:Entity”). The published context + RDF golden-file vector is what makes the lift deterministic.
- CodeableConcept softens validation. Mitigation: the widened branch is the existing enum (not free type:string), and the seed/object mapping is made structurally bijective by excluding the field’s own seed-system URL from the object branch. A seed value is therefore expressible only as the bare string at the Core schema layer, with no Phase-1b/honor-system dependency. Phase 1b adds the machine-readable binding construct + net-new validator subsystem for closed-vocabulary enforcement.
- Vocabulary fragmentation, including the field that actually fragments. Mitigation: WG-published seed code systems (release-governed) + a public registry (add = additive PR; retire = RFC) + a mandatory text fallback. relationType is included in A (the one already-open, highest-fragmentation field), so the namespace discipline lands where collisions actually happen. Domain vocabularies live under non-omir.io URLs.
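A minimal sketch of the CodeableConcept union rule from the Pass 1 mitigations. The bare-string branch stays the legacy enum. The object branch needs a text fallback and refuses the field's own seed-system URL, so every seed value has exactly one spelling. The seed URL, the codes and the messages are placeholders, not the published vocabulary.

```python
# Hypothetical seed code system for one field (relationType); the real URL and
# codes are WG-published, these placeholders only illustrate the rule.
SEED_SYSTEM = "https://omir.io/cs/relation-type"
SEED_CODES = {"related_to", "causes", "part_of"}


def check_codeable(value):
    """Problems with a CodeableConcept-union value; an empty list means valid."""
    if isinstance(value, str):
        # Bare-string branch: exactly the legacy enum, still hard-validated.
        return [] if value in SEED_CODES else [f"{value!r} is not a seed code"]
    if isinstance(value, dict):
        problems = []
        text = value.get("text")
        if not isinstance(text, str) or not text:
            problems.append("object form needs a non-empty text fallback")
        if value.get("system") == SEED_SYSTEM:
            # Bijectivity: a seed value has exactly one spelling, the bare string.
            problems.append("seed system is excluded from the object branch")
        return problems
    return ["expected a string or an object"]


def display_text(value):
    """Core-R2 consumer view: always readable; unknown system/code is ignored."""
    return value if isinstance(value, str) else value["text"]


legal = {"system": "https://example.org/cs/legal", "code": "cites",
         "text": "cites precedent"}
print(check_codeable("causes"), check_codeable(legal), display_text(legal))
```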
Pass 2 — trust & governance hard parts.
- Attestation brittle under re-serialization / across encodings. Mitigation: one canonical form, SHA-256(det-CBOR(M)), over a versioned inclusion allow-list; JSON signers build identical M; numbers are normalized to shortest-round-trip decimal pre-CBOR (floats forbidden); CodeableConcept is folded bijectively; dtype-pinned binary forms are barred from signed commitments (and embeddings are excluded by default); bincode is forbidden under attestation (encodings.md edit + validator enforcement); a KAT vector ships. All of this is gated on P2.
- Retention vs immutable provenance; redaction vs signatures; erasure of a signing Agent. Mitigation: the signed subset holds salted per-field commitments, not plaintext (with an honest unlinkability-not-brute-force claim and an erasable-nonce option for real resistance); redaction deletes only plaintext; erasure dominates: Agent identity enters via an erasable commitment, so a compelled deletion yields unverifiable-by-erasure, never tampered, and never a dangling ref.
- Key authority across rotation/merge/redaction, and the closed-world invariant. Mitigation: inline the key + key-binding into M so verified is decidable from the bundle bytes alone; demote out-of-band DID/HTTPS resolution to Federated; bind to the signing time (at); never rewrite a signed Agent id on merge.
- Classification across a trust boundary. Mitigation: classification is declarative; enforcement is consumer policy; a profile may require the core governance block (and the legacy pii-class extension is deprecated with a dual-accept window + value-shape mapping).
Pass 3 — adoption & process risks.
- The whole plan is gated on a second implementer — and nobody is tasked with producing one. Mitigation: P2 makes the gate a named, dated deliverable, not a deferred “later.” Milestone zero is reframed: a NON-Veld party consumes a published minimal Bundle unchanged and publishes a conformance statement (Conformance §“Declaring conformance”). A is the cheapest additive spec change, sequenced first to open the namespace (move 1, additive, day-one). The independence test (P0a) — technical/economic, recusal-bounded — prevents “Veld twice.” Co-design C and E with the first external adopter, named on the RFC record.
- Backwards compatibility is not a slogan. Adding any new core field is a new-release change because R1/R2 schemas are closed (additionalProperties:false). Honest guarantee: “R<n> bundles stay valid forever” + “R<n+1> readers accept R<n>,” not “older readers tolerate newer.” A published R1 reader rejects any R2 bundle at the envelope (E300) by design. Until a field is promoted, data rides extension[] under a non-omir.io URL.
- Scope creep / 80-20 violation, measured by the SPEC SURFACE a second implementer must read, not the required-field count. Mitigation: the Adopter floor (below) defines a normatively-labeled Core-R2 mandatory-to-understand tier (the four+Agent core resources + the CodeableConcept text-fallback + ignore-unknown rules) and an Optional-capability tier (B-semantics, D, E, F, G, H, I, J, K) that a Core-R2 consumer MAY ignore wholesale without reading their specs. WG effort is tied to the gate: no phase past Phase 2 begins schema work until a candidate second implementer has cleared the Phase-1a floor (P2). “Adopt all” stays the destination without spending the cheap-adoption budget before the gate is met.
Extension.valueJson stays arbitrary JSON. OMM is graded per resource TYPE; new volatile fields on a type do not inherit or drag the type’s grade, but the field-level signal must be machine-readable, not buried in prose. A new field on a mature type either rides extension[] until it earns promotion or carries an x-omir-maturity annotation that the validator surfaces as an INFO finding when a bundle uses a below-type-grade field. (Prose description is governed as editorial §5.1 and would let stability claims escape the OMM ballot gate, so it is not the vehicle.)
- Velocity vs honesty. Mitigation: the OMM rule (no OMM-2 without ≥2 independent implementations) is the brake, real only once P0 reconciles the three conflicting definitions and re-grades the four overclaimed R1 types. A falling trigger keeps the gate visible: any feature holding OMM-1 across a full release cycle (or fixed calendar window) with no recorded independent implementation MUST be flagged “OMM-1 (single-party; no independent implementation as of <date>)” and becomes a candidate for §4.2 TSC review, so that a standard stuck single-party forever is distinguishable in the grade table from a healthy one.
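A sketch of the field-level maturity signal. The validator compares each populated field's x-omir-maturity annotation with its resource type's grade and emits an INFO finding, never an error, when the field is below grade. The grade tables, the finding message, and the assumption that Bundle.entry items are the resources themselves are hypothetical.

```python
# Hypothetical grades: per-type OMM and per-field x-omir-maturity values as a
# validator might read them from the schemas.
TYPE_GRADE = {"MemoryRecord": 2, "Episode": 1}
FIELD_MATURITY = {
    "MemoryRecord": {"informationContent": 0, "replayPriority": 0},
    "Episode": {"boundaryStrength": 0},
}


def maturity_findings(bundle):
    """INFO findings for below-type-grade fields. They never affect validity."""
    findings = []
    for res in bundle.get("entry", []):          # assumes entries are resources
        rtype = res.get("resourceType")
        type_grade = TYPE_GRADE.get(rtype)
        if type_grade is None:
            continue
        for name, grade in FIELD_MATURITY.get(rtype, {}).items():
            if name in res and grade < type_grade:
                findings.append(("INFO", f"{rtype}/{res.get('id')}.{name}",
                                 f"OMM-{grade} field on an OMM-{type_grade} type"))
    return findings


bundle = {"entry": [
    {"resourceType": "MemoryRecord", "id": "m1", "content": "x",
     "informationContent": {"novelty": 0.7}},
    {"resourceType": "Episode", "id": "e1"},
]}
for finding in maturity_findings(bundle):
    print(*finding)
```

Only populated fields are reported, so a bundle that never touches a volatile field sees no noise.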
Adopter conformance floor
So a second implementer can scope the work, the Core-R2 floor is explicit, split into a mandatory-to-understand tier and an optional-capability tier.
Mandatory to understand (the whole of Core-R2 reading): the five core resources (MemoryRecord,
Entity, Relationship, Episode, Agent), the CodeableConcept union + text fallback, and the
ignore-unknown rules. A Core-R2 consumer:
- MUST parse the CodeableConcept union and read text; MAY ignore system/code it does not know.
- MUST ignore the semantics of unknown Agents, chain hops, governance blocks, embeddings, typed attributes, and i18n variants without rejecting (extending “ignore unknown extensions” to new optional core blocks).
- MUST satisfy reference integrity as a document property for Bundles it AUTHORS: a Bundle a Core-R2 implementation authors MUST have a closed chain (every chain[].agent/chain[].derivedFrom resolves in-Bundle), exactly as CR-5 is unconditional. When RELAYING a Bundle it did not author, it MUST pass the chain through unmodified (lossless pass-through) and MUST NOT introduce a new dangling ref. A trust-agnostic relay is never forced to mint placeholder Agents into a chain it cannot interpret, and the Phase-3 “producer MUST emit a closed chain” wording means author, not relay.
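The author/relay split can be checked mechanically. This minimal sketch assumes that references are "Type/id" strings, that each hop carries one agent and a list of derivedFrom targets, and that chain sits at the top level of a resource; all three placements are hypothetical. An author must produce no dangling chain ref. A relay must leave chains unchanged and must not grow the set of dangling refs it received.

```python
# Hypothetical shapes: references are "Type/id" strings, each hop has one
# "agent" and a list of "derivedFrom" targets, and "chain" sits at the top
# level of a resource. None of these placements is fixed by the text.

def _keyed(bundle):
    return {f"{r['resourceType']}/{r['id']}": r for r in bundle.get("entry", [])}


def dangling_chain_refs(bundle):
    """Every chain[].agent / chain[].derivedFrom target that is not in the Bundle."""
    present = _keyed(bundle)
    missing = set()
    for res in present.values():
        for hop in res.get("chain", []):
            targets = ([hop["agent"]] if "agent" in hop else []) + hop.get("derivedFrom", [])
            missing.update(t for t in targets if t not in present)
    return missing


def authored_ok(bundle):
    """Author rule: a Bundle this implementation writes MUST have a closed chain."""
    return not dangling_chain_refs(bundle)


def relay_ok(received, forwarded):
    """Relay rule: chains pass through unmodified and no new dangling ref appears.
    Gaps that arrived with the Bundle are tolerated; nothing is minted to fill them."""
    before, after = _keyed(received), _keyed(forwarded)
    unmodified = all(after[k].get("chain") == before[k].get("chain")
                     for k in before.keys() & after.keys())
    return unmodified and dangling_chain_refs(forwarded) <= dangling_chain_refs(received)


record = {"resourceType": "MemoryRecord", "id": "m2",
          "chain": [{"agent": "Agent/a1", "derivedFrom": ["MemoryRecord/m1"]}]}
authored = {"entry": [{"resourceType": "Agent", "id": "a1"},
                      {"resourceType": "MemoryRecord", "id": "m1"}, record]}
received = {"entry": [record]}           # arrived from elsewhere with open refs
print(authored_ok(authored), authored_ok(received), relay_ok(received, received))
```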
Optional capability (may be ignored wholesale, specs unread): attestation verification, federation references, Agent/B semantics beyond ignore-and-preserve, D modality, F governance enforcement, G/H/J/K. A Core-R2 implementation is NOT required to verify attestations, resolve federation references, or implement any profile it does not claim.
A Core-R2 producer MUST emit the backward-compatible (bare-string) form where it has no coded
value, and MUST place implementation-specific data in extension[] under a non-omir.io URL.
Definition of done (per phase)
A phase reaches its stated target maturity when, for each theme it lands: the schema and the
version-aware reference validator support it — including that every new ResourceType/id
reference field (and every closed-world bare-id field, new and pre-existing) is in the
registry-driven reference walk and exercised by a dangling-reference invalid example in
examples/invalid/. The non-regression clause is split by the phase that introduces the field,
because a field cannot have a fixture before it exists:
- Phase 2 DoD (the walker-refactor guard): a dangling-parentId fixture AND a dangling Relationship.sourceEpisode fixture MUST be added and MUST still be rejected, asserted in validator/tests/conformance.rs before the registry-walker refactor merges (parentId is the lib.rs:219-230 special case the refactor risks dropping; sourceEpisode was the silently-omitted pre-existing field).
- Phase 3 DoD: dangling chain[].derivedFrom and chain[].agent fixtures, plus a wrong-shape chain[].agent fixture (e.g. one resolving to Episode/x) proving the broadened set-membership E201 still rejects a non-Agent in the Agent slot.
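A registry-driven reference walk in miniature. Each registry row names a host type, a path, the accepted target types, and whether the field holds a Type/id reference or a closed-world bare id. The rows, the host type assumed for parentId, the path syntax and the "dangling" label are hypothetical; only E201 is taken from the text. The sample bundle trips the three fixture cases the Phase 2 and Phase 3 items name.

```python
# Hypothetical registry rows: (host type, path, accepted target types, form).
# "ref" values are "Type/id"; "bare" values are ids of an implied type. The
# host type shown for parentId is an assumption.
REGISTRY = [
    ("MemoryRecord", "parentId", {"MemoryRecord"}, "bare"),
    ("Relationship", "sourceEpisode", {"Episode"}, "ref"),
    ("MemoryRecord", "chain[].agent", {"Agent"}, "ref"),
    ("MemoryRecord", "chain[].derivedFrom[]", {"MemoryRecord", "Episode", "Entity"}, "ref"),
]


def _values(node, path):
    """Yield values at a dotted path; a '[]' suffix means 'each list item'."""
    if not path:
        yield node
        return
    head, _, rest = path.partition(".")
    many = head.endswith("[]")
    key = head[:-2] if many else head
    if not isinstance(node, dict) or key not in node:
        return
    for item in (node[key] if many else [node[key]]):
        yield from _values(item, rest)


def reference_errors(bundle):
    entries = bundle.get("entry", [])
    index = {(r["resourceType"], r["id"]) for r in entries}
    errors = []
    for res in entries:
        for host, path, accepted, form in REGISTRY:
            if res["resourceType"] != host:
                continue
            where = f"{host}/{res['id']}.{path}"
            for value in _values(res, path):
                if form == "bare":
                    if not any((t, value) in index for t in accepted):
                        errors.append((where, "dangling"))
                    continue
                rtype, _, rid = value.partition("/")
                if rtype not in accepted:
                    errors.append((where, "E201 wrong target type"))
                elif (rtype, rid) not in index:
                    errors.append((where, "dangling"))
    return errors


bundle = {"entry": [
    {"resourceType": "Episode", "id": "x"},
    {"resourceType": "MemoryRecord", "id": "m1", "parentId": "m0",
     "chain": [{"agent": "Episode/x", "derivedFrom": ["Episode/x"]}]},
    {"resourceType": "Relationship", "id": "r1", "sourceEpisode": "Episode/gone"},
]}
for error in reference_errors(bundle):
    print(*error)
```

Adding a field to the walk is one registry row, which is why the Phase 2 fixtures guard against a row being silently dropped during the refactor.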
For A: a forward-compat fixture PAIR, each with a single unambiguous expected code —
(i) an R2-declared bundle with object-form CodeableConcept that the R1 validator rejects
with E300 at the envelope (proves “R1 readers reject R2 by design”); and (ii) an
R1-declared bundle that nonetheless contains an object-form CodeableConcept, which the R1
schema set rejects with E120 at per-entry dispatch (proves the legacy enum stays hard-validated).
Plus the @context Turtle golden-file (the source-split audit) and the intention-filtering
object-form fixture. For 1b: profile-constraint checking enforces code-system membership with
passing/failing examples. For E: the cross-verifier known-answer attestation vector, the
bincode-under-attestation failing example, the default-vs-omitted and 9-vs-9.0 digest-equality
KATs, and the RDF golden-file — the interop DoD items gated on P2. At least one profile
exercises each theme; docs + examples exist; an OMM grade is assigned honestly under the
reconciled, re-graded P0 ladder.
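The forward-compat fixture pair yields one unambiguous code each because the envelope check runs before per-entry dispatch. A sketch of an R1-only validator, assuming a hypothetical envelope field named release and a hypothetical legacy relationType enum:

```python
# Hypothetical envelope field ("release") and legacy relationType enum.
R1_RELATION_TYPES = {"related_to", "causes", "part_of"}


def validate_r1(bundle):
    """First error code an R1-only validator reports, or None when valid."""
    # Envelope first: R1 readers reject any non-R1 bundle by design, so an
    # R2 fixture can never fall through to a per-entry code.
    if bundle.get("release") != "R1":
        return "E300"
    # Per-entry dispatch: the legacy enum stays hard-validated in R1.
    for res in bundle.get("entry", []):
        if res.get("resourceType") == "Relationship" and "relationType" in res:
            value = res["relationType"]
            if not isinstance(value, str) or value not in R1_RELATION_TYPES:
                return "E120"
    return None


coded = {"system": "https://example.org/cs/legal", "code": "cites", "text": "cites"}
fixture_i = {"release": "R2", "entry": [
    {"resourceType": "Relationship", "id": "r1", "relationType": coded}]}
fixture_ii = {"release": "R1", "entry": [
    {"resourceType": "Relationship", "id": "r1", "relationType": coded}]}
print(validate_r1(fixture_i), validate_r1(fixture_ii))
```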
The independent-implementation requirement applies to claiming OMM-2+, not to completing a phase — but P2 binds at least one phase to a real second party. OMM-0/1 = the reference implementation alone; OMM-2 = ≥2 independent (per P0a). Phases 1–5 may ship at OMM-0/1 on the reference implementation alone, so the standard ships before a second implementer materializes — directly serving “cheap early adoption matters more than feature count.” But Phase 1a MUST NOT be declared done until a candidate independent implementer (named on the RFC/ballot record) has either (a) consumed an R2 CodeableConcept-union bundle and round-tripped it, or (b) recorded a public statement of intent — so the one existential invariant is a gate, not an owner-less “later.” Process preconditions (P-1, P0, P0a, P1) must land before the phases they gate — no phase may claim OMM-2 before the OMM reconciliation, no R2 traffic is testable before the validator is version-aware, and no ballot is callable before the founding TSC is seated.
Process
None of this is unilateral. Every change here is a candidate, to be proposed as an RFC,
debated by the Working Group, and balloted (see CONTRIBUTING.md
and GOVERNANCE.md).
New resources and fields enter at OMM-0/1 and earn maturity through independent
implementation — the same honesty rule the rest of the spec lives by. The fastest way to
move any of these forward is the project’s stated existential need: a second implementer
whose requirements turn one of these themes from a hypothesis into a ballot.
Efficiency & Information-Bearing Codes
Non-normative. This page is a design discussion, not part of the R1 conformance surface. Nothing here changes what makes a Bundle valid (Conformance). Every change below would enter through the RFC + ballot process, start low on the OMM (OMM-0/1), and earn its level through independent implementation. It is the efficiency-first companion to Toward a Global Standard (the interchange-first schema roadmap) and the forward-looking counterpart to memory_theory.md (the backward-looking divergence map).
Status: draft-schema landing (this iteration). The four R1.x-additive proposals are now applied to the draft schemas/ as optional fields, each annotated x-omir-maturity: 0. They are EP-1 (InformationContent → MemoryRecord.informationContent); EP-4 (Episode.boundaryStrength/boundaryReason, MemoryRecord.replayPriority, Interference → MemoryRecord.interference); EP-5b (FamiliaritySketch → Bundle.familiaritySketch); and EP-6c, the D5 edge-normalization fix (Relationship.reverseStrength/normalizedStrength/normalization + EdgeNormalization, with dependentRequired: normalizedStrength → normalization). These are non-breaking (no existing bundle becomes invalid; additionalProperties:false still holds because the fields are now declared), remain RFC-gated for ratification, and are reversible: until a TSC ballots them they are draft and OMM-0. The R2 / breaking proposals (EP-2/EP-3/EP-5a/EP-F on the Theme-D Embedding; EP-6a Chunk; EP-6b schemaType) are not applied: they either ride R2 work (the Theme-D Embedding, Theme A's CodeableConcept) or widen the shared Reference pattern and Bundle.entry (Chunk), and they stay candidates here.
Where Toward a Global Standard asks “what blocks interchange across domains?”, this page asks a different question: “what state would let an engine store less, search cheaper, and infer more — at the same fidelity?” The seven proposals below come from the efficient-/predictive-coding, fuzzy-trace, hippocampal-indexing, event-segmentation, temporal-context, and chunking lineages. Each is scored on three axes the interchange roadmap does not track: ⚡ watts (store/search less), 🎯 inference (signal-to-noise), and capability (what becomes expressible).
The one-paragraph thesis
These are not seven scattered fields. Four of the seven (EP-2, EP-3, EP-5a, EP-F) are
variants of a single object — the algorithm-neutral Embedding representation that
global-standard §D already
proposes. The highest-leverage move is therefore a reframing the divergence map already named:
extend Theme D from “neutral embeddings” to efficiency-bearing codes — an Embedding
that can carry a Matryoshka prefix (rank coarse-to-fine), a sparse code (CPU inverted-index
search), a reserved drift space (cheap temporal cues), and a VSA structural code (compositional
ops). That fold is realized: those four fields now live in the single canonical
§D Embedding definition; this page
supplies their watts/inference rationale and does not re-specify the schema. The remaining three
are: two information-theoretic scalars OMIR has no field for
(EP-1 surprisal, EP-4 replay/interference), and one graph fix that makes spreading
activation portable (EP-6c, the D5 edge-normalization remediation). Two proposals add genuinely
new structure: an Episode event-boundary (EP-4) and a Chunk consolidation-product resource
(EP-6a). One adds a producer-level metamemory primitive (EP-5b, the familiarity sketch).
Proposal index
| EP | Prescription (from the principle) | Concrete delta | Additivity | Vehicle · OMM | Theme | Divergence |
|---|---|---|---|---|---|---|
| EP-1 | surprisal/novelty scalar + model-redundancy flag | new InformationContent def + MemoryRecord.informationContent | §5.1-additive | RFC-gated · R1.x · OMM-0 | new | — (new lever) |
| EP-2 | Matryoshka gist + offloadable verbatim | §D Embedding fields matryoshka/nestedDims/role + verbatim MediaReference (defined in §D) | additive to the D def | rides D · R2 · OMM-0 | D | D2 (partial) |
| EP-3 | sparse index layer (indices+values) | Embedding.sparse + Theme-I external-content pointer | additive to the D def | rides D · R2 (+ I · R2+) · OMM-0 | D + I | — |
| EP-4 | boundary metadata + replay priority + interference | Episode.boundaryStrength/boundaryReason; MemoryRecord.replayPriority; new Interference def + field | §5.1-additive | RFC-gated · R1.x · OMM-0 | new | D10 (closes), D8/D4 |
| EP-5 | temporalContext drift vector + familiarity sketch | reserved Embedding space omir:temporal-context (D); new FamiliaritySketch def + Bundle.familiaritySketch | sketch §5.1-additive; tc rides D | sketch R1.x · OMM-0; tc rides D · R2 | D + new | — |
| EP-6 | chunk(composedOf) + schema typing + edge-norm fix | new Chunk resource (Templates = reusable Chunks); MemoryRecord.schemaType (CodeableConcept); Relationship.reverseStrength/normalizedStrength/normalization + Check::GraphNormalization (E250/E251) | Chunk = §3.1/§5.2 breaking; edge-norm §5.1-additive; schemaType rides A | Chunk R2 · OMM-0 (§4 RFC); edge-norm R1.x · OMM-0; schemaType rides A · R2 | A + graph + new | D5 (closes), D8 |
| EP-F | VSA / HRR structural code (frontier) | Embedding space:"vsa" + structure tags; extension-first | extension (no RFC) → promotable | extension now; D · R2+ · OMM-0 later | D + .omirb profile | — |
Routing convention (from GOVERNANCE §3.1/§5.1/§5.2): a new optional
field is §5.1-additive (no existing bundle becomes invalid → ships as an R1.x increment)
but still touches the normative surface, so it is RFC-gated (the RFC authorizes the schema
edit; R1.x is the release lane). A change that widens the shared Reference pattern or the
Bundle.entry oneOf (a new resource type) is §5.2-breaking → R2, full RFC per
CONTRIBUTING §4. Adding a new, unreferenced $defs member to
common.schema.json is itself purely additive; it is the field that references it that
carries the additivity class. New fields ride at OMM-0 under the per-field x-omir-maturity
signal (global-standard, Pass-3),
never inheriting their host type’s grade.
EP-1 — Information content (don’t store what the model can regenerate)
Principle 1 (efficient coding · predictive coding). “A surprisal/novelty scalar at encoding (information content vs. the producer’s model) and a model-redundancy flag (reconstructable from the base model?). This is not importance (value) or confidence (belief) — it’s information content, which OMIR has no field for.”
The orthogonality is the whole point: importance is value, confidence is belief,
this is information (−log p against a generative prior). A record the base model already
knows is near-zero information regardless of how important or believed it is — and is the single
biggest watt lever, because most “memories” are model-knowable and need not be persisted or
searched at all (cf. Titans’ ‖∇loss/∇input‖ write-gate, EM-LLM’s Bayesian surprise).
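A minimal sketch of how a producer might fill InformationContent from a language model's per-token log-probabilities. All of the following are hypothetical assumptions. The model reports natural-log probabilities. Novelty is the record's rank among the producer's own past surprisal values, which is one way to obtain a producer-relative value in [0,1]. A per-token threshold decides reconstructable.

```python
import math
from bisect import bisect_left


def surprisal_bits(token_logprobs):
    """-log2 p(x) for a record, from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return -sum(token_logprobs) / math.log(2)


def novelty(bits, history_bits):
    """Producer-relative novelty in [0,1]: share of past records that were less surprising."""
    if not history_bits:
        return 0.5  # hypothetical neutral value when there is no history yet
    ordered = sorted(history_bits)
    return bisect_left(ordered, bits) / len(ordered)


def information_content(token_logprobs, history_bits, model, max_bits_per_token=0.1):
    bits = surprisal_bits(token_logprobs)
    ic = {"novelty": novelty(bits, history_bits), "surprisalBits": bits, "model": model}
    if bits / len(token_logprobs) < max_bits_per_token:
        # Hypothetical policy: near-zero information per token marks the record
        # as a drop-and-regenerate candidate.
        ic["reconstructable"] = True
        ic["reconstructableBy"] = model
    return ic


print(information_content([math.log(0.5)] * 4, [1.0, 2.0, 8.0, 16.0], "base-llm"))
```

Dividing by ln 2 is what pins the unit to bits, as the open question below asks.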
New common.schema.json#/$defs member (purely additive):
"InformationContent": {
"type": "object",
"description": "Information content of a record against a generative prior — what is NOVEL, distinct from what is valued (importance) or believed (confidence). Operationalizes efficient/predictive coding: store the surprising, regenerate the predictable.",
"properties": {
"novelty": {
"$ref": "#/$defs/UnitInterval",
"description": "Producer-relative normalized novelty in [0,1]: how surprising this record was against the producer's model at encoding. Comparable WITHIN one producer only (Theory & Scope: stored scalars are producer-relative)."
},
"surprisalBits": {
"type": "number",
"minimum": 0,
"description": "Raw Shannon surprisal -log2 p(x) in bits against the model named in 'model'. Unbounded and model-relative; the reproducible quantity when 'model' is shared."
},
"model": {
"type": "string",
"description": "Identifier of the generative model whose p(x) defines 'novelty'/'surprisalBits', e.g. 'minilm-l6-v2' or a base-LLM id."
},
"reconstructable": {
"type": "boolean",
"description": "True if this content is regenerable from the model named in 'reconstructableBy' and is therefore a candidate to DROP and regenerate rather than store/search."
},
"reconstructableBy": {
"type": "string",
"description": "Identifier of the model that can regenerate this content when 'reconstructable' is true."
}
},
"additionalProperties": false
}
MemoryRecord additive field:
"informationContent": { "$ref": "common.schema.json#/$defs/InformationContent" }.
Classification. New optional field on MemoryRecord; no enum/required/Reference change →
§5.1-additive, ships as R1.x, RFC-gated for the normative meaning. OMM-0. novelty
is a UnitInterval (CR-7 conformant) and a producer-relative snapshot → on landing it joins the
Theory & Scope producer-relative list;
surprisalBits is the model-relative reproducible form.
Hits. ⚡⚡⚡ watts (persist/search only the genuinely novel) · 🎯 inference (signal-to-noise) ·
capability (compression). Open questions: does reconstructable:true license a consumer to
omit content entirely (ties EP-2 verbatim-eviction)? bits vs nats — pin one (surprisalBits,
log2, here).
EP-2 — Matryoshka gist + offloadable verbatim (store the gist, fetch the words)
Principle 2 (fuzzy-trace · rate-distortion). “A dual-trace model: a compact, durable, anchorable gist code (cheap to rank, slow decay) + an optional, fast-decaying, offloadable verbatim payload (via a MediaReference). Crucially, mandate the gist be prefix-truncatable (Matryoshka-style) so coarse-to-fine works across producers.”
This is the efficiency-bearing-codes extension of Theme D, stated literally. Matryoshka
representation learning (2205.13147) is the exact engineering
analog: rank millions on a truncated prefix, re-rank survivors on more dims. The dual trace maps
the gist to a durable anchorable Embedding and the verbatim surface to an offloaded
MediaReference (Theme D) with its own faster decay.
Schema: the matryoshka, nestedDims, and role fields of the canonical
§D Embedding — defined there, not
duplicated here. matryoshka + nestedDims give the prefix-truncatable gist; role splits the
durable, anchorable gist from the offloadable verbatim (carried out-of-line via ref →
MediaReference).
Decay split. The gist reuses the existing Decay block (anchored:true, long
halfLifeHours); the verbatim trace carries a short halfLifeHours and is the first thing
dropped under memory pressure — graceful degradation that keeps the rankable gist.
Classification. Additive to the (not-yet-landed) Theme-D Embedding def → rides Theme D ·
R2 · OMM-0. No Reference-pattern impact (MediaReference is {uri,contentType,hash}, not a
typed Reference). Closes part of D2 (storage/retrieval-strength duality): the durable gist
is the storage-strength-bearing trace, the evictable verbatim is retrieval-strength-bearing.
Hits. ⚡⚡ watts (coarse-to-fine + verbatim eviction) · 🎯 inference (gist generalizes) ·
capability (summary recall). Open question: make nestedDims ascending + power-of-two by
convention so two producers’ prefixes align for cross-store shortlisting.
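Coarse-to-fine ranking over Matryoshka-style gists fits in a few lines, together with the ascending power-of-two nestedDims check the open question proposes. The gist vectors, the two-stage schedule and the shortlist size are hypothetical. The sketch also assumes the producer trained its embeddings so that prefixes are meaningful on their own, which is what Matryoshka training is meant to provide.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nested_dims_ok(nested_dims, full_dim):
    """Proposed convention: ascending powers of two, none wider than the vector."""
    return (bool(nested_dims)
            and all(a < b for a, b in zip(nested_dims, nested_dims[1:]))
            and all(d > 0 and (d & (d - 1)) == 0 for d in nested_dims)
            and nested_dims[-1] <= full_dim)


def coarse_to_fine(query, gists, nested_dims, shortlist):
    """Rank all gists on the shortest prefix, then re-rank the survivors at full width."""
    k = nested_dims[0]
    coarse = sorted(gists, key=lambda rid: cosine(query[:k], gists[rid][:k]), reverse=True)
    return sorted(coarse[:shortlist], key=lambda rid: cosine(query, gists[rid]), reverse=True)


gists = {"a": [1, 0, 1, 0], "b": [1, 0, -1, 0], "c": [0, 1, 0, 1]}
print(coarse_to_fine([1, 0, 1, 0], gists, nested_dims=[2, 4], shortlist=2))
```

Only the shortlist pays for full-width comparisons. With aligned prefixes, two producers' gists can share the same coarse stage.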
EP-3 — Sparse index layer (index, don’t scan)
Principle 3 (hippocampal indexing · SDM · sparse codes). “Separate a cheap index layer (sparse keys/pointers) from content (offloadable); standardize a sparse-code representation (indices+values). The index may point at content living elsewhere (ties to federation, §I).”
A sparse code turns ANN-on-GPU into an exact inverted-index lookup on CPU (≈ an order of
magnitude cheaper), and modern-Hopfield / SDM gives one-step associative completion. The
“content lives elsewhere” half is exactly global-standard §I’s
federation: the index entry resolves to remote content via the ExternalReference $def (never
the closed-world bare ResourceType/id).
Schema: the sparse field ({indices, values}) of the canonical
§D Embedding, mutually exclusive
with the dense vector — defined there, not duplicated here.
Classification. Additive to the Theme-D Embedding def → rides D · R2 · OMM-0; the
pointer-to-remote-content rides Theme I · R2+ (its true gate is I’s CR-5 carve-out +
ExternalReference $def, not this field). The index/content split composes with EP-2’s
gist/verbatim and EP-1’s reconstructable drop.
Hits. ⚡⚡⚡ watts (CPU sparse search + lazy content load) · 🎯 inference (DG-style
pattern-separation, anti-interference) · capability (recall from fragments). Open question:
indices length cap / a density hint so a consumer can pick inverted-index vs dense path.
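The {indices, values} shape is enough to drive an exact inverted-index search. OMIR itself says nothing about indexing, so this is only a sketch of what a consumer could build from the sparse field. The record ids and weights are made up, and scoring is a plain sparse dot product.

```python
from collections import defaultdict


def build_index(sparse_codes):
    """sparse_codes: {record id: {"indices": [...], "values": [...]}} -> posting lists."""
    postings = defaultdict(list)
    for rid, code in sparse_codes.items():
        if len(code["indices"]) != len(code["values"]):
            raise ValueError(f"{rid}: indices and values differ in length")
        for i, v in zip(code["indices"], code["values"]):
            postings[i].append((rid, v))
    return postings


def search(postings, query, top_k=3):
    """Exact sparse dot product that touches only the query's nonzero dimensions."""
    scores = defaultdict(float)
    for i, qv in zip(query["indices"], query["values"]):
        for rid, v in postings.get(i, ()):
            scores[rid] += qv * v
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


codes = {
    "m1": {"indices": [3, 17], "values": [0.5, 1.0]},
    "m2": {"indices": [17, 42], "values": [2.0, 1.0]},
    "m3": {"indices": [99], "values": [1.0]},   # never touched by the query below
}
index = build_index(codes)
print(search(index, {"indices": [17], "values": [1.0]}))
```

Work scales with the postings of the query's nonzero indices rather than with the corpus, which is the CPU-side saving the section describes.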
EP-4 — Boundary, replay priority, interference (encode by surprise, prune by interference)
Principle 4 (event segmentation · prioritized replay · rational forgetting). “Event-boundary metadata on episodes (location + boundary strength + why); a replay/consolidation priority (surprise × value × recency); an eviction/interference signal (need-probability or local embedding density).”
Three deltas, the third of which closes divergence D10 — the doc’s explicit reframe of D10 from “missing fidelity” (a bounded non-goal) to “missing the rational-eviction efficiency mechanism” (a load-bearing watt lever: a smaller hot index makes every query cheaper).
(a) Episode additive fields — the EM-LLM / Event-Segmentation boundary:
"boundaryStrength": {
"$ref": "common.schema.json#/$defs/UnitInterval",
"description": "Prediction-error / Bayesian-surprise magnitude at this episode's boundary. Where memory is structured; a segmentation cue for downstream consolidation."
},
"boundaryReason": {
"type": "string",
"description": "Open vocabulary: why the cut was made (e.g. 'topic_shift', 'temporal_gap', 'actor_change'). Promotable to a CodeableConcept under Theme A."
}
(b) MemoryRecord.replayPriority — the prioritized-replay (Schaul 2015) weight:
"replayPriority": {
"$ref": "common.schema.json#/$defs/UnitInterval",
"description": "Producer-relative consolidation/replay priority (surprise x value x recency snapshot). Prioritizes amortized OFFLINE re-embedding/consolidation/decay to idle time. Snapshot as of meta.lastUpdated; producer-relative (Theory & Scope)."
}
(c) New Interference def + MemoryRecord.interference field — the D10 remediation:
"Interference": {
"type": "object",
"description": "Rational-eviction / interference signal: why a record competes for retrieval and how prunable it is. Closes divergence D10 — forgetting as interference (retroactive/proactive competition among similar traces), not pure time-decay.",
"properties": {
"needProbability": {
"$ref": "#/$defs/UnitInterval",
"description": "Anderson need-probability: estimated P(this record is needed soon). The rational eviction key — evict lowest need, NOT oldest (LRU). Distinct from decay (time) and importance (value)."
},
"localDensity": {
"type": "number",
"minimum": 0,
"description": "Nearest-neighbour crowding in embedding space. High density = high interference from similar traces (the dominant real forgetting mechanism in similarity-based stores)."
},
"competesWith": {
"type": "array",
"items": { "$ref": "#/$defs/Reference" },
"description": "MemoryRecord references this record competes with — similar traces that degrade each other's retrievability. Reuses the existing Reference pattern (MemoryRecord already in it); no widening."
}
},
"additionalProperties": false
}
Classification. All §5.1-additive (new optional fields; competesWith reuses the existing
Reference pattern → no widening) → R1.x, RFC-gated, OMM-0. On landing, replayPriority
joins the producer-relative + snapshot enumerations in Theory & Scope. Closes
D10; advances D8/D4 (the boundary is the consolidation-as-process / continuous-segmentation
half).
Hits. ⚡⚡⚡ watts (encode less, small hot set, amortized offline replay) · 🎯 inference (clean
event retrieval, fewer distractors) · capability (continual learning without catastrophic
forgetting). Open question: is competesWith producer-authored or derivable from
localDensity at import — carry both and let the consumer choose.
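A sketch contrasting rational eviction with LRU, using the Interference fields from (c). The tie-break on localDensity, the default for records without a needProbability, the lastAccess field, and the product form of replay_priority are hypothetical choices, not part of the proposal.

```python
def replay_priority(surprise, value, recency):
    """One hypothetical reading of 'surprise x value x recency'; inputs in [0,1]."""
    for x in (surprise, value, recency):
        if not 0.0 <= x <= 1.0:
            raise ValueError("inputs must be unit-interval scalars")
    return surprise * value * recency


def eviction_order(records):
    """Evict lowest needProbability first (not oldest first, as LRU would).
    Ties go to the more crowded record: higher localDensity, more interference."""
    def key(rec):
        inter = rec.get("interference", {})
        # Hypothetical default: an unscored record is treated as fully needed.
        return (inter.get("needProbability", 1.0), -inter.get("localDensity", 0.0))
    return [rec["id"] for rec in sorted(records, key=key)]


records = [
    {"id": "old", "lastAccess": 1, "interference": {"needProbability": 0.9}},
    {"id": "new", "lastAccess": 9, "interference": {"needProbability": 0.1, "localDensity": 3.0}},
    {"id": "twin", "lastAccess": 8, "interference": {"needProbability": 0.1, "localDensity": 5.0}},
]
lru = [rec["id"] for rec in sorted(records, key=lambda rec: rec["lastAccess"])]
print("LRU evicts:", lru)                     # oldest access first
print("rational evicts:", eviction_order(records))
```

LRU would discard the record most likely to be needed next. The need-probability order instead discards the crowded, rarely needed twin first.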
EP-5 — Temporal-context drift + familiarity sketch (cheap cues, skip retrieval)
Principle 5 (Temporal Context Model · feeling-of-knowing). “A per-record/episode temporalContext drift vector (timestamps give recency but not cheap similarity-based contiguity); a producer-level familiarity sketch over entities/cues.”
(a) temporalContext is a vector → it rides the canonical §D Embedding under the reserved space "omir:temporal-context" (i.e. "space": "omir:temporal-context", registered there). Timestamps already give recency; the
drift vector gives cheap contiguity (recall the neighbours-in-time of a hit) as a dot product, no
scan (EM-LLM adds exactly this temporal-contiguity stage on top of similarity). No new field — a
reserved space (documented in §D) + the contiguity-retrieval semantics. Rides Theme D · R2 ·
OMM-0.
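The drift vector can be sketched with a simplified Temporal Context Model update. Each new context is a renormalized mix of the previous context and the current item, and contiguity retrieval is a dot product over stored contexts. The rho/beta values, the one-hot items, and the use of renormalization instead of TCM's exact norm-preserving rho are assumptions for illustration.

```python
import math


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def drift(context, item, rho=0.8, beta=0.6):
    """Simplified TCM step: keep a decayed copy of the old context, add the new item."""
    return _unit([rho * c + beta * x for c, x in zip(context, item)])


def temporal_neighbours(contexts, hit, k=2):
    """Records whose stored context vector has the largest dot product with the hit's."""
    q = contexts[hit]
    scored = [(sum(a * b for a, b in zip(q, v)), rid)
              for rid, v in contexts.items() if rid != hit]
    return [rid for _, rid in sorted(scored, reverse=True)[:k]]


# Four records encoded in order; one-hot item vectors keep the arithmetic visible.
items = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
contexts, c = {}, [0.0] * 4
for n, item in enumerate(items, start=1):
    c = drift(c, item)
    contexts[f"r{n}"] = c

print(temporal_neighbours(contexts, "r2"))
```

The items share no content, yet the hit's neighbours in time (r1 and r3) score highest, because each stored context still carries a decayed trace of its predecessors.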
(b) familiarity sketch is genuinely new: producer-level aggregate state OMIR has no home
for. It answers “do I plausibly hold this?” before paying for retrieval — and in production the
skipped retrievals are often the dominant cost. A negative is authoritative (skip RAG
entirely); a positive is probabilistic. It lives at the Bundle level (Bundle has no OMM of
its own; it tracks its resources):
"FamiliaritySketch": {
"type": "object",
"description": "Producer-level approximate-membership sketch over entities/cues, for metamemory gating: answer 'do I plausibly hold this?' before paying for retrieval (feeling-of-knowing). A negative is authoritative (skip retrieval); a positive is probabilistic. Complements confidence's in-weights-vs-retrieve gate.",
"properties": {
"kind": { "enum": ["bloom", "count_min"], "description": "Sketch family." },
"domain": { "enum": ["entity", "cue", "content"], "description": "What the sketch is built over." },
"hashes": { "type": "integer", "minimum": 1, "description": "Number of hash functions k." },
"bits": { "type": "integer", "minimum": 1, "description": "Filter width m in bits (bloom) / table width (count-min)." },
"data": { "type": "string", "contentEncoding": "base64", "description": "Serialized filter bytes." }
},
"required": ["kind", "domain", "hashes", "bits", "data"],
"additionalProperties": false
}
Bundle additive field: "familiaritySketch": { "$ref": "common.schema.json#/$defs/FamiliaritySketch" }.
Classification. temporalContext rides D · R2. The sketch is a new optional Bundle field
→ §5.1-additive · R1.x, RFC-gated, OMM-0. (A Bundle-level field, unlike a resource field,
does not interact with any resource’s OMM grade.)
Hits. ⚡⚡⚡ watts (short-circuit/skip retrieval; cheap temporal cue) · 🎯 inference (contiguity
recall) · capability (temporal-neighbourhood recall). Open question: sketch hash-function
identity must be pinned (a named, versioned hash) or a consumer cannot test membership — carry a
hashAlg field if EP-5b is balloted.
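A minimal Bloom-filter sketch that fills the FamiliaritySketch fields (kind, domain, hashes, bits, data). The hashing scheme is SHA-256 over a 4-byte index prefix plus the UTF-8 item, and the bit order within each byte is least-significant first. Both are hypothetical pinned choices. They stand in for the hashAlg the open question says must be named before a consumer can test membership.

```python
import base64
import hashlib


def _positions(item, hashes, bits):
    # Hypothetical pinned scheme standing in for a named, versioned hashAlg:
    # position_i = SHA-256(i as 4 big-endian bytes || UTF-8 item) mod bits.
    for i in range(hashes):
        h = hashlib.sha256(i.to_bytes(4, "big") + item.encode("utf-8")).digest()
        yield int.from_bytes(h[:8], "big") % bits


def build_sketch(items, hashes=3, bits=256, domain="entity"):
    buf = bytearray((bits + 7) // 8)
    for item in items:
        for p in _positions(item, hashes, bits):
            buf[p // 8] |= 1 << (p % 8)           # bit p, least-significant bit first
    return {"kind": "bloom", "domain": domain, "hashes": hashes, "bits": bits,
            "data": base64.b64encode(bytes(buf)).decode("ascii")}


def might_hold(sketch, item):
    """False is authoritative (skip retrieval); True only means 'plausibly held'."""
    buf = base64.b64decode(sketch["data"])
    return all((buf[p // 8] >> (p % 8)) & 1
               for p in _positions(item, sketch["hashes"], sketch["bits"]))


sketch = build_sketch(["Entity/alice", "Entity/bob"])
print(might_hold(sketch, "Entity/alice"), might_hold(sketch, "Entity/carol"))
```

A consumer using a different hash or bit order would read the same bytes and get wrong answers, which is why the hash identity has to travel with the sketch.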
EP-6 — Chunks, schema typing, and portable spreading activation
Principle 6 (chunking/expertise · schema · spreading activation). “Chunk/template resources (composedOf) produced by consolidation; schema typing new memories attach to; and — load-bearing — make the graph spreading-activation-ready by fixing the edge-strength normalization (divergence D5) so PPR gives the same answer across importers, with salience as seed weights.”
Three deltas; (c) is the one the principle flags load-bearing.
(a) The Chunk resource — Templates are reusable Chunks. A Chunk is a consolidation
product: a compressed/abstracted unit (composedOf the memories/episodes/entities it
consolidates), the episodic→semantic derivation made first-class (divergence D8). A reusable
Chunk (reusable: true) is a Template — a schema/pattern that new memories instantiate (via
MemoryRecord.schemaType, EP-6b) rather than a one-off abstraction. The two are one resource
distinguished by the flag, not two resource types — Template is not minted separately. This is
the proposal’s only new core resource type: it widens the Reference pattern and
Bundle.entry oneOf → §3.1/§5.2 breaking → R2, full
CONTRIBUTING §4 RFC
(spec/rfcs/RFC-<nnnn>-chunk.md, number TSC-assigned).
The consolidation event is not a resource. HANDOFF §5’s
floated ConsolidationEvent/Reflection re-imports the consolidation process that
semantics.md (“what OMIR deliberately does not specify”) puts out of scope, and
duplicates what Theme E already models. The derivation is carried as a Theme-E provenance hop on
the Chunk — wasGeneratedBy { activity: "consolidation" } + wasDerivedFrom over its
composedOf set — never a second resource. Veld concurs structurally: its event-sourced journal
IntentPayload (src/intent_log/payload.rs) is Remember/Forget/Update/Anchor — pure memory CRUD
with no consolidation-event variant; consolidation lands its output as Remember/Update of
records, so even Veld’s journal models the product, not the event. Draft model:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://omir.io/spec/R2/schemas/Chunk.schema.json",
  "title": "OMIR Chunk (R2 candidate)",
  "type": "object",
  "properties": {
    "resourceType": { "const": "Chunk" },
    "id": { "$ref": "common.schema.json#/$defs/Id" },
    "meta": { "$ref": "common.schema.json#/$defs/Meta" },
    "content": { "type": "string", "description": "The compressed / abstracted unit — a named template, schema, or expert chunk (MDL: search fewer, denser units)." },
    "composedOf": {
      "type": "array",
      "items": { "$ref": "common.schema.json#/$defs/Reference" },
      "description": "MemoryRecord / Episode / Entity references this chunk consolidates. The episodic->semantic derivation, first-class (divergence D8)."
    },
    "reusable": {
      "type": "boolean",
      "default": false,
      "description": "True => this Chunk is a TEMPLATE: a reusable schema/pattern that NEW memories instantiate (via MemoryRecord.schemaType), not a one-off abstraction. Templates are reusable Chunks, never a separate resource."
    },
    "schemaType": {
      "$comment": "CodeableConcept once Theme A lands; bare string until then.",
      "type": "string",
      "description": "The schema/pattern class this Chunk represents (open vocabulary, Theme A). A MemoryRecord.schemaType equal to this value attaches that memory to this Template (meaningful when reusable=true)."
    },
    "confidence": { "$ref": "common.schema.json#/$defs/Confidence" },
    "provenance": { "$ref": "common.schema.json#/$defs/Provenance" },
    "createdAt": { "$ref": "common.schema.json#/$defs/Instant" },
    "extension": { "type": "array", "items": { "$ref": "common.schema.json#/$defs/Extension" } }
  },
  "required": ["resourceType", "id", "content", "createdAt"],
  "additionalProperties": false
}
```
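The rule that the consolidation event is not a resource can be sketched as a consolidator that emits only the product. This is a hypothetical Rust mirror of the draft schema, not OMIR or Veld code. In particular, the Provenance shape (an activity label plus a wasDerivedFrom list) is an assumption about how Theme E will spell the hop.

```rust
/// Hypothetical in-memory mirror of a Theme-E provenance hop (shape assumed).
#[derive(Debug, Clone, PartialEq)]
struct Provenance {
    was_generated_by_activity: String, // e.g. "consolidation"
    was_derived_from: Vec<String>,     // "ResourceType/id" references
}

/// Hypothetical mirror of the draft Chunk schema (required fields plus a few optionals).
#[derive(Debug, Clone)]
struct Chunk {
    id: String,
    content: String,
    composed_of: Vec<String>,
    reusable: bool,              // true => this Chunk is a Template
    schema_type: Option<String>, // bare string until Theme A lands
    provenance: Provenance,
    created_at: String,
}

/// Emit the consolidation product only. The derivation rides on the Chunk's
/// own provenance (wasGeneratedBy + wasDerivedFrom over composedOf); no
/// separate ConsolidationEvent resource is created.
fn consolidate(id: &str, content: &str, sources: &[&str], created_at: &str) -> Chunk {
    let composed_of: Vec<String> = sources.iter().map(|s| s.to_string()).collect();
    Chunk {
        id: id.to_string(),
        content: content.to_string(),
        // Fields are evaluated in written order: clone before composed_of is moved.
        provenance: Provenance {
            was_generated_by_activity: "consolidation".to_string(),
            was_derived_from: composed_of.clone(),
        },
        composed_of,
        reusable: false,
        schema_type: None,
        created_at: created_at.to_string(),
    }
}
```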
Required atomic work items mirror global-standard Phase 2's new-resource checklist:
- author the schema;
- add it to Bundle.entry.items.oneOf;
- widen common.schema.json#/$defs/Reference.pattern to include Chunk;
- extend the validator SchemaFiles/RESOURCE_TYPES/registry-walker;
- register it in the JSON-LD @context;
- ship a migration note (no R1 bundle becomes invalid; older consumers ignore unknown entry items).
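Two of these work items can be sketched together: widening the Reference pattern, and the registry-driven reference walk that catches dangling refs. The sketch makes these assumptions:
- References are plain "ResourceType/id" strings.
- RESOURCE_TYPES is a hypothetical list holding only the types named in this section.
- The real common.schema.json pattern is not reproduced here.

```rust
use std::collections::HashSet;

/// Hypothetical registry of types a Reference may name. Adding "Chunk" here
/// plays the role of widening Reference.pattern.
const RESOURCE_TYPES: &[&str] = &["MemoryRecord", "Episode", "Entity", "Relationship", "Chunk"];

/// Accepts "ResourceType/id" where the type is registered and the id is a
/// single non-empty path segment.
fn is_valid_reference(r: &str) -> bool {
    match r.split_once('/') {
        Some((ty, id)) => RESOURCE_TYPES.contains(&ty) && !id.is_empty() && !id.contains('/'),
        None => false,
    }
}

/// Closed-world reference walk: returns every reference that is malformed or
/// names no entry present in the bundle (a dangling ref).
fn dangling_refs<'a>(entries: &[(&str, &str)], refs: &[&'a str]) -> Vec<&'a str> {
    let present: HashSet<String> = entries
        .iter()
        .map(|(ty, id)| format!("{ty}/{id}"))
        .collect();
    refs.iter()
        .copied()
        .filter(|r| !is_valid_reference(r) || !present.contains(*r))
        .collect()
}
```

Each reference this walk flags is the kind of case a dangling-ref negative fixture in examples/invalid/ would pin down.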
(b) MemoryRecord.schemaType — schema-consistent fast integration (Tse et al. 2007): a new
memory attaches to a schema it instantiates. Best as a CodeableConcept (Theme A) since
schema-types are an open, per-domain vocabulary → rides Theme A · R2, caps at A’s grade.
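The draft Chunk schema states the attachment rule: a memory attaches to a Template when its schemaType equals that Chunk's schemaType and the Chunk is reusable. A sketch of that rule follows. It assumes the pre-Theme-A bare-string form (per the $comment) and exact string equality. ChunkView and MemoryRecordView are hypothetical views. Once schemaType becomes a CodeableConcept, the comparison would have to match codes instead.

```rust
/// Hypothetical minimal views of the two resources involved.
struct ChunkView<'a> {
    id: &'a str,
    reusable: bool,
    schema_type: Option<&'a str>,
}

struct MemoryRecordView<'a> {
    schema_type: Option<&'a str>,
}

/// Ids of the Templates (reusable Chunks) a memory instantiates.
/// Bare-string equality stands in for CodeableConcept matching.
fn attached_templates<'a>(memory: &MemoryRecordView<'_>, chunks: &[ChunkView<'a>]) -> Vec<&'a str> {
    let Some(wanted) = memory.schema_type else {
        return Vec::new(); // no schemaType: attaches to nothing
    };
    chunks
        .iter()
        // One-off Chunks (reusable: false) never act as Templates.
        .filter(|c| c.reusable && c.schema_type == Some(wanted))
        .map(|c| c.id)
        .collect()
}
```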
(c) D5 edge-normalization fix, the load-bearing one. Today Relationship.strength is a
single [0,1] scalar and fan normalization is not representable. Two importers running
spreading activation / Personalized PageRank over the same exported strengths therefore get
different answers, because the normalization that makes a strength mean something is
producer-private. HippoRAG reports that single-pass PPR multi-hop retrieval is 10–20× cheaper and
6–13× faster than iterative retrieval; that saving carries across stores only if the weights are
portable. Relationship additive fields:
```json
"reverseStrength": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Backward association weight P(from|to); asymmetric counterpart to 'strength' = P(to|from). Resolves the symmetric-scalar half of D5 without two independent edges."
},
"normalizedStrength": {
  "$ref": "common.schema.json#/$defs/UnitInterval",
  "description": "Fan-normalized strength: within a source Entity's outgoing edge set sharing one 'normalization', these values sum to <= 1 + epsilon (ACT-R fan S - ln(fan): source activation conserved and divided). THIS is the portable spreading-activation weight; raw 'strength' is producer-private. MUST co-occur with 'normalization' (dependentRequired)."
},
"normalization": { "$ref": "common.schema.json#/$defs/EdgeNormalization" }
```
The normalization field references a new common-schema definition, common.schema.json#/$defs/EdgeNormalization:
```json
"EdgeNormalization": {
  "type": "object",
  "description": "Declares the regime that makes 'normalizedStrength' portable. Without it, PPR/spreading-activation diverges across importers (divergence D5).",
  "properties": {
    "scheme": { "enum": ["fan", "softmax", "none"], "description": "fan = source activation divided among associates; softmax = exp-normalized; none = raw (not portable)." },
    "over": { "enum": ["source", "target"], "description": "Normalized over a node's outgoing (source) or incoming (target) edge set." }
  },
  "additionalProperties": false
}
```
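The fields above declare a regime but do not pin how a producer computes normalizedStrength. Below is a sketch of one reading of each non-trivial scheme:
- fan: divide one unit of source activation in proportion to raw strength. This proportional split is an assumption, not ACT-R's S − ln(fan) formula itself.
- softmax: exp-normalize the raw strengths.

Either way a source's outgoing set sums to 1, which keeps it inside the ≤ 1 + ε bound.

```rust
/// Assumed reading of scheme "fan": a source's unit of activation is divided
/// among its outgoing edges in proportion to their raw strengths.
fn fan_normalize(raw: &[f64]) -> Vec<f64> {
    let total: f64 = raw.iter().sum();
    if total <= 0.0 {
        return vec![0.0; raw.len()]; // nothing to divide
    }
    raw.iter().map(|w| w / total).collect()
}

/// Scheme "softmax": exp-normalize raw strengths. Subtracting the max first
/// keeps exp() from overflowing without changing the result.
fn softmax_normalize(raw: &[f64]) -> Vec<f64> {
    if raw.is_empty() {
        return Vec::new();
    }
    let max = raw.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = raw.iter().map(|w| (w - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}
```

The two schemes rank a source's edges the same way but spread weight differently. That difference is why scheme must travel with the value rather than being inferred by the importer.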
Seed weights. With normalizedStrength + normalization present, PPR is fully specified
across importers when seeded by Entity.salience (the documented seed-weight convention) —
salience as seed, fan-normalized edges as the transition matrix, one pass.
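The seed-weight convention can be sketched as ordinary Personalized PageRank by power iteration. Salience, normalized to sum to 1, is the restart vector, and normalizedStrength edges form the transition matrix. Several choices here are assumptions of this sketch, not fixed by the draft:
- the damping factor;
- the iteration count;
- returning mass that a node's edges do not carry (fan sums may be below 1) to the seeds.

A golden file would have to pin all three. "One pass" here reads as one PPR run replacing iterative retrieval, not one iteration.

```rust
/// Edge carrying the portable weight (scheme "fan", over "source").
struct Edge {
    from: usize,
    to: usize,
    normalized_strength: f64,
}

/// Personalized PageRank by power iteration. Assumes salience.len() == n and a
/// positive salience total. `damping` is the probability of following an edge
/// rather than restarting at the seeds. Mass a node's edges do not carry
/// (outgoing sum < 1) goes back to the seeds, so scores keep summing to 1.
fn personalized_pagerank(n: usize, edges: &[Edge], salience: &[f64], damping: f64, iters: usize) -> Vec<f64> {
    let total: f64 = salience.iter().sum();
    let seed: Vec<f64> = salience.iter().map(|s| s / total).collect();
    let mut out_sum = vec![0.0_f64; n];
    for e in edges {
        out_sum[e.from] += e.normalized_strength;
    }
    let mut rank = seed.clone();
    for _ in 0..iters {
        let mut next = vec![0.0_f64; n];
        for e in edges {
            next[e.to] += damping * rank[e.from] * e.normalized_strength;
        }
        // Activation not carried by any edge (clipped at 0 for E250-invalid input).
        let mut leaked = 0.0_f64;
        for v in 0..n {
            leaked += damping * rank[v] * (1.0 - out_sum[v]).max(0.0);
        }
        for v in 0..n {
            next[v] += ((1.0 - damping) + leaked) * seed[v];
        }
        rank = next;
    }
    rank
}
```

Because every input is declared in the bundle, two importers running this with the same damping and iteration count get the same vector.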
Validator rule — fan-normalization is enforced, not producer-asserted (new
Check::GraphNormalization, codes E250/E251; verified and sapper-hardened):
- E250: fan-normalization overflow (error). For each Entity E, partition its edges by (normalization.scheme, normalization.over). Within each partition where scheme ∈ {fan, softmax}, the sum of normalizedStrength over edges with from = E (over: source), or to = E (over: target), MUST be ≤ 1 + ε (ε = 1e-6, rounding-tolerant, per global-standard's aggregateCredibility precedent). Edges with normalization absent or scheme: none are excluded. reverseStrength is also excluded, since it is an asymmetry hint and not part of the fan sum.
- E251: normalizedStrength without normalization (error). Also enforced in-schema via dependentRequired: { normalizedStrength: ["normalization"] }. Without the regime the value is unportable, which is exactly the D5 defect the field cures.
Why it survives the sapper pass:
- Upper-bound only. Fan weights are non-negative, so any subset of them in a partial export sums to ≤ the full sum ≤ 1. The ≤ 1+ε check therefore holds on partial graphs. A lower bound ≈ 1 is deliberately NOT checked.
- Partitioned. Mixed regimes never cross-contaminate the sum.
- No retroactive invalidation. The fields are new and the rule ships with them (§5.2-clean, not a tightening of an existing field).
- Decidable and cheap. The check is a pure function of the closed-world bundle: an O(edges) group-by on from/to with no new index. It is a fifth Check variant beside {Structural, ReferenceIntegrity, VersionPresence, Profile}, mirroring global-standard's Check::Attestation.
- Not a silent hole. Dodging via scheme: none self-labels the value non-portable, which is honest degradation, not a bypass.

Final code numbers are TSC-assigned.
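E250 and E251 reduce to one group-by over the bundle's Relationships. The sketch below is not the Check::GraphNormalization implementation. RelView and Normalization are hypothetical flattened views. The sketch assumes scheme and over are both present whenever normalization is, since EdgeNormalization as drafted declares no required keys. It checks only the upper bound and skips scheme: none, as the rule specifies.

```rust
use std::collections::BTreeMap;

const EPSILON: f64 = 1e-6; // rounding tolerance from the rule

/// Hypothetical flattened EdgeNormalization (both keys assumed present).
struct Normalization<'a> {
    scheme: &'a str, // "fan" | "softmax" | "none"
    over: &'a str,   // "source" | "target"
}

/// Hypothetical flattened view of one Relationship's edge fields.
/// reverseStrength is deliberately absent: it never enters the sum.
struct RelView<'a> {
    from: &'a str,
    to: &'a str,
    normalized_strength: Option<f64>,
    normalization: Option<Normalization<'a>>,
}

/// Findings as (code, detail). E251: normalizedStrength without normalization.
/// E250: an (entity, scheme, over) partition summing above 1 + EPSILON.
fn check_graph_normalization<'a>(rels: &[RelView<'a>]) -> Vec<(&'static str, String)> {
    let mut findings = Vec::new();
    // BTreeMap keeps finding order deterministic across runs.
    let mut sums: BTreeMap<(&'a str, &'a str, &'a str), f64> = BTreeMap::new();
    for r in rels {
        let Some(w) = r.normalized_strength else { continue };
        let Some(norm) = &r.normalization else {
            findings.push(("E251", format!("{}->{}: normalizedStrength without normalization", r.from, r.to)));
            continue;
        };
        if !matches!(norm.scheme, "fan" | "softmax") {
            continue; // scheme "none" self-labels as non-portable: excluded
        }
        let entity = if norm.over == "target" { r.to } else { r.from };
        *sums.entry((entity, norm.scheme, norm.over)).or_insert(0.0) += w;
    }
    for ((entity, scheme, over), sum) in sums {
        if sum > 1.0 + EPSILON {
            findings.push(("E250", format!("{entity} [{scheme}, over {over}]: sum {sum:.6} exceeds 1 + eps")));
        }
    }
    findings
}
```

Deterministic order matters if findings are compared against the negative fixtures in examples/invalid/.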
Classification. Chunk (+reusable Templates) = R2 · OMM-0 (§4 RFC, Reference-widening).
schemaType rides A · R2. Edge-norm fields = §5.1-additive (no widening) → R1.x, RFC-gated,
OMM-0, on Relationship (OMM-3), plus the Check::GraphNormalization validator variant
(E250/E251). Closes D5; advances D8.
Hits. ⚡⚡ watts (fewer, denser units; single-pass vs iterative multi-hop) · 🎯 inference (portable multi-hop/transitive) · capability (abstraction, expertise).
EP-F — VSA / HRR structural code (frontier; lowest priority, highest ceiling)
Frontier (Vector Symbolic Architectures · Holographic Reduced Representations · HDC). An
optional VSA structural code — bind (circular convolution) + bundle (superpose) a whole relational
structure into one low-precision hypervector, cleanup via an item memory — gives OMIR
analogical/compositional retrieval in cheap vector ops. Low-precision and edge-friendly: a
natural fit for the .omirb robotics profile. Speculative — propose extension-first (no
RFC), under a WG/vendor URL, promotable to the reserved omir:vsa space of the canonical
§D Embedding once a second
implementer exercises it:
```json
{
  "url": "https://omir.io/spec/R2/ext/vsa-code",
  "valueJson": {
    "space": "vsa", "op": "hrr", "dims": 10000, "dtype": "int8",
    "vector": "<base64 or array>", "cleanupRef": "Entity/item-memory"
  }
}
```
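The bind / bundle / cleanup cycle named in the extension can be sketched in a few functions. The sketch makes these assumptions:
- bipolar random vectors (±1/√n) from a fixed xorshift seed;
- f64 arithmetic rather than the example's int8;
- 1024 dimensions rather than 10000;
- direct O(n²) circular convolution, where a real implementation would use an FFT.

Unbinding uses the involution as the approximate inverse, the usual HRR choice. The role and filler vectors are illustrative.

```rust
/// Deterministic xorshift64 generator (seed must be non-zero); no crates needed.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Random bipolar hypervector with entries +-1/sqrt(n) (unit norm).
fn random_hv(rng: &mut XorShift, n: usize) -> Vec<f64> {
    let a = 1.0 / (n as f64).sqrt();
    (0..n).map(|_| if rng.next() & 1 == 1 { a } else { -a }).collect()
}

/// Bind: circular convolution, c[k] = sum_j a[j] * b[(k - j) mod n].
fn bind(a: &[f64], b: &[f64]) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|k| (0..n).map(|j| a[j] * b[(k + n - j) % n]).sum::<f64>()).collect()
}

/// Approximate inverse (involution): a*[i] = a[(-i) mod n].
fn involution(a: &[f64]) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|i| a[(n - i) % n]).collect()
}

/// Bundle: superpose by element-wise addition.
fn bundle(xs: &[Vec<f64>]) -> Vec<f64> {
    let n = xs[0].len();
    (0..n).map(|i| xs.iter().map(|x| x[i]).sum::<f64>()).collect()
}

fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Cleanup: index of the item-memory vector most similar to a noisy query.
fn cleanup(item_memory: &[Vec<f64>], query: &[f64]) -> usize {
    let mut best = 0;
    for i in 1..item_memory.len() {
        if cosine(&item_memory[i], query) > cosine(&item_memory[best], query) {
            best = i;
        }
    }
    best
}
```

Usage: bundle(&[bind(&role_a, &filler), bind(&role_b, &other)]) encodes a two-slot structure in one vector. bind(&structure, &involution(&role_a)) yields a noisy filler that cleanup resolves against the item memory.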
Classification. Extension now (prefer-an-extension-over-an-RFC, the 80/20 rule); promotable
to a Theme-D Embedding space at R2+ · OMM-0 when field-exercised. Ties to the .omirb
binary profile (low-precision bytes). Hits. capability (compositional/analogical recall) ·
⚡ watts (low-precision edge ops); 🎯 neutral. No divergence — pure capability frontier.
Overlook — sequence
| Step | EPs | Release · OMM | Breaking? | Unlocks |
|---|---|---|---|---|
| 1 | EP-1 info-content · EP-4 replay/interference · EP-6c edge-norm | R1.x · OMM-0 | No (additive optional fields) | The watt levers that need no new object: store-less, prune-by-interference, portable PPR. Closes D5 & D10. |
| 2 | EP-5b familiarity sketch · EP-4a boundary | R1.x · OMM-0 | No | Skip-retrieval gating; segmentation cue. |
| 3 | EP-2 / EP-3 / EP-5a / EP-F (all on the Theme-D Embedding) | R2 · OMM-0 | Rides Theme D’s R2 line | Efficiency-bearing codes: Matryoshka, sparse, drift, VSA. |
| 4 | EP-6a Chunk · EP-6b schemaType | R2 · OMM-0 | Yes (new resource / CodeableConcept) | Consolidation product + schema attachment. Needs Themes A & E. |
Overlook — dependencies
```
Theme D (neutral Embedding) ──► EP-2 (matryoshka), EP-3 (sparse), EP-5a (drift space), EP-F (vsa)
Theme A (CodeableConcept) ──► EP-6b (schemaType), EP-4a boundaryReason (promotion)
Theme E (PROV wasGeneratedBy)──► EP-6a Chunk's consolidation *event* (vs a 2nd resource)
Theme I (ExternalReference) ──► EP-3 (index points at remote content)
EP-1 reconstructable ──► EP-2 verbatim-eviction / EP-3 lazy content load (compose)
P2 (a second implementer) ──► OMM-2 anywhere; required for the Embedding-code interop tests
```
The critical path is Step 1 — three pure-additive R1.x field sets that need no new object, no
Reference widening, and no other theme, yet close the two divergences this track inherits
(D5, D10) and land the biggest watt lever (EP-1). Steps 3–4 are gated on Theme D / A / E landing
first, and the cross-store value of the efficiency-bearing codes (like all interchange value)
needs the global-standard P2 second implementer.
Definition of done (per EP)
A proposal reaches its stated OMM when, for each field it lands: the schema and the
version-aware reference validator support it; every new ResourceType/id reference field is in
the registry-driven reference walk with a dangling-ref negative fixture in examples/invalid/
(EP-4 competesWith, EP-6a Chunk refs); and UnitInterval fields (EP-1 novelty, EP-4
replayPriority/needProbability, EP-6 reverseStrength/normalizedStrength) are in the CR-7
range check. EP-6c additionally ships a worked PPR golden-file proving two importers produce
the same ranking from the same normalizedStrength + salience seeds — the portability claim made
testable, the same discipline global-standard
applies to the RDF and attestation vectors.
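The CR-7 range check for the UnitInterval fields listed above can be sketched as a filter over (field, value) pairs. Flattening a resource into pairs is a hypothetical step. The field list is only the one named here, and NaN is treated as out of range.

```rust
/// UnitInterval fields this track adds (from the definition of done).
const UNIT_INTERVAL_FIELDS: &[&str] = &[
    "novelty", "replayPriority", "needProbability", "reverseStrength", "normalizedStrength",
];

/// Returns the names of listed fields whose value falls outside [0, 1].
/// NaN fails, because every comparison with NaN is false.
fn unit_interval_violations<'a>(fields: &[(&'a str, f64)]) -> Vec<&'a str> {
    fields
        .iter()
        .filter(|(name, _)| UNIT_INTERVAL_FIELDS.contains(name))
        .filter(|(_, v)| !(0.0..=1.0).contains(v))
        .map(|(name, _)| *name)
        .collect()
}
```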
Process
None of this is unilateral. Every change here is a candidate — an RFC, debated by the Working Group and balloted (CONTRIBUTING, GOVERNANCE) — entering at OMM-0/1 and earning maturity through independent implementation. The efficiency framing changes the motivation (watts, not just interchange), not the gate.