Canonical JSON

What “canonical JSON” is and why we require it

Canonical JSON is the single, deterministic representation of a dossier that PoV will accept. Two teams can collect the same evidence, serialize it the same way, and produce the same byte string and the same hash. That hash is what PoV signs, what One-Claim keys off, and what appears on receipts and proof pages. If JSON is not canonical, hashes drift; if hashes drift, trust drifts.

Short rule: one dossier → one byte string → one hash → one truth.

Design goals

  • Deterministic: identical facts always produce identical bytes.

  • Readable: humans can inspect it; no binary blobs inline.

  • Minimal: no optional/empty/null noise; only required content.

  • Stable across stacks: independent of language, locale, or whitespace choices.

  • Audit-safe: redaction and lineage are explicit; files referenced by digest, never embedded.

Canonicalization rules

We adopt a JSON canonicalization close to RFC 8785 (JCS) but tailored to PoV dossiers and evidence.

  1. Encoding: UTF-8 only. No BOM. Newlines \n. No trailing whitespace anywhere.

  2. Objects: Sort object member names lexicographically (code-point order). No duplicate keys, no null values, no empty objects. Omit fields that are optional and empty (schema defines optionality).

  3. Arrays: Arrays are order-significant by default (preserve order). If a field is a set (e.g., container_ids), the schema marks it as SET → sort ascending (string compare) before hashing. No trailing commas; no empty arrays unless the schema explicitly requires them.

  4. Strings: Escape using standard JSON escapes. Timestamps are strings in RFC 3339/ISO-8601 UTC: YYYY-MM-DDTHH:MM:SSZ (no milliseconds unless schema requires; if required, use 3 decimals, e.g., .123Z). IDs/serials are strings; never numbers.

  5. Numbers: To avoid cross-language float drift, quantities, amounts, and rates are strings with schema-defined format (e.g., "39.600" MWh, "0.40" tCO₂e/MWh). If a schema permits integers (e.g., counts), encode as JSON numbers without leading zeros, +, or exponent.

  6. Binary / large files: No binary inline. Every file appears as an entry with its digest and size, e.g. {"name":"seal.jpg","sha256":"<hex>","size":123456}. Multiple digests are permitted (e.g., sha256, blake3) if the schema asks.

  7. Redaction: PII or restricted fields are omitted and replaced by a redaction record (see below). Redaction changes the hash by design; we capture why/where.

  8. Versioning: Every dossier includes "schema_id" and "schema_version" (semantic X.Y). Backwards-compatible changes bump minor; breaking changes bump major.
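The object, array, and encoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not the PoV canonicalizer: the function names, the set_paths argument (a stand-in for the schema's SET markers), and the choice of SHA-256 for the hash are all assumptions made for the example. It does not detect duplicate keys, and it always drops empty arrays, whereas a schema may explicitly require some.

```python
import hashlib
import json


def canonicalize(value, set_paths=frozenset(), path=""):
    """Drop null/empty members and sort arrays that are marked as SET.

    `set_paths` is a hypothetical stand-in for schema SET markers,
    written as slash-separated paths such as "/evidence/container_ids".
    """
    if isinstance(value, dict):
        out = {}
        for key, member in value.items():
            child = canonicalize(member, set_paths, f"{path}/{key}")
            # Rule 2: no nulls, no empty objects; optional empties are omitted.
            if child is None or child == {} or child == []:
                continue
            out[key] = child
        return out
    if isinstance(value, list):
        items = [canonicalize(item, set_paths, path) for item in value]
        # Rule 3: SET arrays are sorted; every other array keeps its order.
        return sorted(items) if path in set_paths else items
    return value


def canonical_bytes(doc, set_paths=frozenset()):
    """Serialize with sorted keys, no whitespace, UTF-8 without BOM."""
    text = json.dumps(
        canonicalize(doc, set_paths),
        sort_keys=True,              # Python compares str by code point
        separators=(",", ":"),       # no spaces after commas or colons
        ensure_ascii=False,          # emit UTF-8 rather than \uXXXX escapes
    )
    return text.encode("utf-8")


def pov_hash(data: bytes) -> str:
    """Assumption: the hash is SHA-256 over the canonical bytes."""
    return hashlib.sha256(data).hexdigest()


dossier = {
    "order_id": "ord_2NF8J7",
    "meta": {"notes": None},
    "evidence": {"container_ids": ["CMAU000002", "CMAU000001"]},
}
data = canonical_bytes(dossier, {"/evidence/container_ids"})
print(data.decode("utf-8"))
print(pov_hash(data))
```

Keeping the pruning and sorting step separate from serialization makes it easier to compare your intermediate structure against the server's canonical form when hashes disagree.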

The evidence envelope

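Minimum top-level structure for Trade stages and Tokens units, shown pretty-printed (the // comments are annotations for this page; they are not valid JSON and must not appear in a submitted dossier):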

{
  "schema_id": "TRADE.ON_BOARD.v1",
  "schema_version": "1.0",
  "order_id": "ord_2NF8J7",
  "stage": "ON_BOARD",                       // Trade only
  "listing_id": "lst_A12B",                  // Tokens only (omit in Trade)
  "evidence": {
    "bl_number": "OOLU1234567890",
    "seal_number": "SEAL9981",
    "seal_photo": { "sha256": "ab…cd", "size": 284337 },
    "container_ids": ["CMAU000001", "CMAU000002"],  // SET → sorted
    "files": [
      { "name": "bl.pdf",   "sha256": "de…ad", "size": 90211 },
      { "name": "pack.csv", "sha256": "be…ef", "size": 4312  }
    ]
  },
  "attestors": [
    { "role": "TERMINAL", "entity_id": "prt_rotterdam",
      "key_id": "k_01F", "signature": "base64url(...)" },
    { "role": "CARRIER",  "entity_id": "carrier_oocl",
      "key_id": "k_09C", "signature": "base64url(...)" }
  ],
  "meta": {
    "created_at": "2025-03-01T12:00:00Z",
    "locale": "en",
    "notes": "optional, if schema allows"
  }
}

Tokens dossiers mirror this pattern with schema_id like TOKENS.CARBON.v1 and evidence containing method, region, vintage, project_id, unit_serial or device_id/start_ts/end_ts/quantity_Wh.

Redaction

Some evidence contains PII (e.g., warehouse contact). We do not hash secrets into the PoV dossier. Instead:

"redactions": [
  {
    "path": "/evidence/warehouse_contact",
    "reason": "PII",
    "salt_id": "s_01",                      // refers to an off-chain salt bundle
    "sha256": "f3…21"                       // hash(contact || salt)
  }
]

The field is omitted from evidence. A redaction record tells auditors what was omitted and why, and commits to a salted hash so a regulator can later verify the original off-chain, without changing the on-chain PoV hash. Salts are managed per-org and per-lane; never reuse salts across orders.
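A redaction step along these lines might look as follows. This is a sketch under stated assumptions: the redact and verify_redaction helpers are hypothetical, the redacted value is assumed to be a string encoded as UTF-8, and hash(contact || salt) is read as SHA-256 over the value bytes followed by the salt bytes. Salt storage and lookup by salt_id are out of scope here.

```python
import hashlib


def salted_digest(value: str, salt: bytes) -> str:
    """hash(value || salt), assuming UTF-8 for the value and SHA-256."""
    return hashlib.sha256(value.encode("utf-8") + salt).hexdigest()


def redact(dossier: dict, path: str, reason: str, salt_id: str, salt: bytes) -> dict:
    """Remove the field at `path` (in place) and record a salted commitment."""
    *parents, leaf = path.strip("/").split("/")
    node = dossier
    for key in parents:
        node = node[key]
    original = node.pop(leaf)  # the field no longer appears in evidence
    dossier.setdefault("redactions", []).append(
        {
            "path": path,
            "reason": reason,
            "salt_id": salt_id,
            "sha256": salted_digest(original, salt),
        }
    )
    return dossier


def verify_redaction(record: dict, claimed_value: str, salt: bytes) -> bool:
    """Off-chain check: does the disclosed value match the commitment?"""
    return record["sha256"] == salted_digest(claimed_value, salt)


# Illustrative values only; the contact string and salt are made up.
dossier = {"evidence": {"seal_number": "SEAL9981", "warehouse_contact": "contact@example.com"}}
salt = b"per-order-random-salt"
redact(dossier, "/evidence/warehouse_contact", "PII", "s_01", salt)
print(dossier["evidence"])
print(verify_redaction(dossier["redactions"][0], "contact@example.com", salt))
```

Because the record commits only to a salted hash, the dossier can be canonicalized and hashed as usual while the original value stays off-chain.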

Schema hints

Each schema declares:

  • Required vs optional fields, and when optional fields must be omitted.

  • Which arrays are SET (sort before hashing) and which are SEQ (preserve order).

  • Type formats for strings that represent numbers (e.g., "39.600"), timestamps, and IDs.

  • Digest algorithms required in files[] (e.g., sha256 mandatory; blake3 optional).

  • Redactable paths and allowed reasons.

Canonical JSON enforces the how; schemas enforce the what.
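To make the split concrete, here is one way a serializer might carry schema hints and check evidence against them before canonicalizing. The HINTS layout, the field lists in it, and the check_evidence helper are invented for this illustration; the real schema format may look quite different.

```python
import re

# Hypothetical hint layout for illustration; not the real PoV schema format.
HINTS = {
    "required": ["bl_number", "container_ids"],
    "optional": ["notes"],
    "arrays": {"container_ids": "SET", "temp_log": "SEQ"},
    "formats": {"quantity_MWh": r"\d+\.\d{3}"},  # decimal string, 3 places
    "file_digests": {"required": ["sha256"], "optional": ["blake3"]},
    "redactable": {"/evidence/warehouse_contact": ["PII"]},
}


def check_evidence(evidence: dict, hints: dict) -> list:
    """Return a list of human-readable problems; empty means no findings."""
    problems = []
    for field in hints["required"]:
        if field not in evidence:
            problems.append(f"missing required field: {field}")
    for field in hints["optional"]:
        if field in evidence and evidence[field] in (None, "", [], {}):
            problems.append(f"optional field must be omitted when empty: {field}")
    for field, pattern in hints["formats"].items():
        if field in evidence:
            value = evidence[field]
            # Numbers-as-strings: a float or a wrong precision both fail here.
            if not (isinstance(value, str) and re.fullmatch(pattern, value)):
                problems.append(f"bad format for {field}: {value!r}")
    for entry in evidence.get("files", []):
        for algo in hints["file_digests"]["required"]:
            if algo not in entry:
                problems.append(f"file {entry.get('name')} lacks {algo}")
    return problems


print(check_evidence({"container_ids": ["CMAU000001"], "quantity_MWh": "39.6"}, HINTS))
```

The canonicalizer stays schema-agnostic in this arrangement: it only needs to know which paths are SET, while everything else is validated up front.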

API requirements

  • Submit bytes, not objects: The pov_hash is computed over the canonical bytes you POST. The API also accepts the structured JSON to help humans, but PoV verifies your bytes → hash.

  • Idempotency: Include Idempotency-Key on POST; resubmitting the same canonical bytes yields the same pov_hash, so a replay is safe.

  • Reject drift: If the server’s canonicalizer produces a different hash from your submitted bytes, the Gate returns E_CANONICAL_DRIFT with the server’s canonical form so you can fix your serializer.
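A client following these requirements could prepare its request roughly like this. The endpoint URL, the Content-Type value, and the use of a random UUID as the idempotency key are assumptions for the sketch; only the Idempotency-Key header name comes from the text above. The request is built but not sent, and handling of E_CANONICAL_DRIFT is left out because the response format is not described here.

```python
import urllib.request
import uuid
from typing import Optional

GATE_URL = "https://gate.example.com/v1/dossiers"  # hypothetical endpoint


def build_submission(canonical: bytes, idempotency_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare a POST whose body is exactly the canonical bytes.

    The body is passed through untouched: re-serializing a parsed object
    here could reintroduce whitespace or key-order drift.
    """
    key = idempotency_key or str(uuid.uuid4())
    return urllib.request.Request(
        GATE_URL,
        data=canonical,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Idempotency-Key": key,
        },
    )


body = b'{"order_id":"ord_2NF8J7","schema_id":"TRADE.ON_BOARD.v1"}'
request = build_submission(body, idempotency_key="retry-safe-key-001")
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

On a retry, reuse the same key and the same bytes rather than generating new ones, so the replay resolves to the original submission.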

Do’s and Don’ts

Do

  • Use UTC Z timestamps everywhere.

  • Encode amounts as strings with schema precision (e.g., "396000.000").

  • Sort object keys; sort SET arrays in your serializer before hashing.

  • Reference every file by digest and size; store binaries off-chain.

  • Omit optional empty fields; don’t send null.

Don’t

  • Don’t embed PDFs/images inline.

  • Don’t send floats as JSON numbers for money/energy/tons.

  • Don’t rely on property insertion order; the canonicalizer sorts.

  • Don’t include locale-specific formats (commas as decimals, local time).
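Two of these points, UTC timestamps and decimal strings, are where serializers most often slip, so a small sketch may help. The helper names are hypothetical, and the choice to reject a value rather than round it when it does not fit the schema precision is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def utc_timestamp(dt: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SSZ (no milliseconds)."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before formatting")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def decimal_string(value: str, places: int) -> str:
    """Pad a decimal string to a fixed number of places without rounding."""
    number = Decimal(value)
    padded = number.quantize(Decimal(1).scaleb(-places))
    if padded != number:
        # Refuse to silently change the quantity being attested.
        raise ValueError(f"{value} does not fit in {places} decimal places")
    return format(padded, "f")  # plain notation, '.' as decimal separator


local = datetime(2025, 3, 1, 13, 0, 0, tzinfo=timezone(timedelta(hours=1)))
print(utc_timestamp(local))            # 2025-03-01T12:00:00Z
print(decimal_string("39.6", 3))       # 39.600
print(decimal_string("396000", 3))     # 396000.000
```

Starting from a string or Decimal rather than a float keeps binary floating-point error out of the value before it is ever formatted.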

Test vectors

  • Key-order drift: swap two keys → canonicalizer must re-order and produce the same bytes as the reference.

  • Set vs seq: change the order of container_ids (SET) → hash must not change; change the order of a declared SEQ (e.g., temp_log[]) → hash must change.

  • Whitespace: add spaces/newlines → canonicalizer strips them; hash unchanged.

  • Decimal strings: "39.6" vs "39.600" → if schema requires 3 decimals, the former is invalid.

(We provide ready-to-run vectors in the SDK so implementers can certify their serializer.)
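The four vectors can be expressed as plain assertions. The snippet below is not the SDK's vector suite; it uses a stand-in canonicalizer (canon), a made-up reference dossier, and a hypothetical temp_log field declared as SEQ, purely to show the shape such checks might take.

```python
import copy
import json
import re


def canon(doc, set_paths=frozenset()):
    """Stand-in canonicalizer: sorted keys, sorted SET arrays, no whitespace."""
    def walk(value, path):
        if isinstance(value, dict):
            return {k: walk(v, f"{path}/{k}") for k, v in value.items()}
        if isinstance(value, list):
            items = [walk(v, path) for v in value]
            return sorted(items) if path in set_paths else items
        return value
    text = json.dumps(walk(doc, ""), sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


SETS = frozenset({"/evidence/container_ids"})
reference = {
    "order_id": "ord_2NF8J7",
    "evidence": {
        "container_ids": ["CMAU000001", "CMAU000002"],  # SET
        "temp_log": ["4.1", "4.3"],                     # SEQ (hypothetical)
    },
}
expected = canon(reference, SETS)

# Key-order drift: same members, different insertion order.
swapped = {"evidence": reference["evidence"], "order_id": reference["order_id"]}
assert canon(swapped, SETS) == expected

# Set vs seq: reordering a SET is invisible, reordering a SEQ is not.
set_shuffled = copy.deepcopy(reference)
set_shuffled["evidence"]["container_ids"].reverse()
assert canon(set_shuffled, SETS) == expected
seq_shuffled = copy.deepcopy(reference)
seq_shuffled["evidence"]["temp_log"].reverse()
assert canon(seq_shuffled, SETS) != expected

# Whitespace: pretty-printed input canonicalizes to the same bytes.
pretty = json.dumps(reference, indent=4)
assert canon(json.loads(pretty), SETS) == expected

# Decimal strings: a 3-decimal schema format rejects "39.6".
three_places = re.compile(r"\d+\.\d{3}")
assert three_places.fullmatch("39.600") is not None
assert three_places.fullmatch("39.6") is None
```

Comparing bytes rather than hashes in these checks makes a failure easier to diagnose, since the differing position is visible directly.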

Example — canonical snippet

{"attestors":[{"entity_id":"carrier_oocl","key_id":"k_09C","role":"CARRIER","signature":"base64url(...)"},
{"entity_id":"prt_rotterdam","key_id":"k_01F","role":"TERMINAL","signature":"base64url(...)"}],
"evidence":{"bl_number":"OOLU1234567890","container_ids":["CMAU000001","CMAU000002"],
"files":[{"name":"bl.pdf","sha256":"deadbeef...","size":90211},{"name":"seal.jpg","sha256":"c0ffee...","size":284337}],
"seal_number":"SEAL9981","seal_photo":{"sha256":"ab12cd...","size":284337}},
"meta":{"created_at":"2025-03-01T12:00:00Z","locale":"en"},
"order_id":"ord_2NF8J7","schema_id":"TRADE.ON_BOARD.v1","schema_version":"1.0","stage":"ON_BOARD"}

In the actual byte string this is one line (it is wrapped here only for display): sorted keys, no whitespace; bytes → hash → PoV Gate input.

Plain recap

Canonical JSON is how we make facts hashable. Sort keys, fix formats, treat sets vs sequences correctly, reference files by digest, and omit noise. The pov_hash binds this byte string forever; PoV verifies it; One-Claim keys off it; receipts point to it. That’s why you can trust that the hash you see on a proof page truly commits to the evidence behind a payment: one dossier → one byte string → one hash → one truth.
