Hashing

What hashing does on EDMA and why it matters

Hashes are the anchors that make evidence verifiable without exposing the files. The PoV hash commits to a dossier’s canonical JSON bytes; file digests commit to every PDF/photo/CSV; claim ids derive from those hashes plus the identity fields so a fact can only be monetized once. If two teams serialize the same dossier identically, they get the same bytes → same hash → same truth.

Short rule: the chain trusts bytes and hashes, not filenames or promises.

Hashes we use and where

File digest (sha256): sha256(file_bytes) in lowercase hex. Every referenced file (e.g., bl.pdf, seal.jpg, temp-log CSV) appears as { name, sha256, size } in the dossier.
PoV hash (sha256): sha256(canonical_json_bytes) of the canonical JSON (Section 12.1). This binds the whole dossier (including its file digests and ordering).

Claim id (keccak256): Unique, route-agnostic identifier derived from facts and the PoV hash:

claim_id = keccak256(
    chain_id,
    lane,             // "TRADE" | "TOKENS"
    schema_id,        // e.g., "TRADE.ON_BOARD.v1"
    claim_key_fields, // minimal identity for the event/unit
    pov_hash          // 32 bytes, sha256(canonical json)
)

Examples of claim_key_fields:
Trade/On-Board → { BL_number, seal_number, hash(container_ids_sorted) }
Tokens/Carbon → { program, project_id, vintage, unit_serial }

Why two algorithms? SHA-256 is widely used for content commitment across tools; Keccak-256 (Solidity’s keccak256) is standard for deterministic ids on EVM. We store all as bytes32 on-chain.
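The definition above lists the inputs but not their byte encoding, so the following is only a sketch of one way to build an unambiguous preimage. The length-prefixed layout, the 32-byte big-endian chain_id, and the names claim_preimage and claim_id are assumptions for illustration, not the EDMA wire format. The Keccak-256 function is passed in as a parameter because Python's hashlib.sha3_256 is the NIST SHA3-256 variant and yields different digests from Solidity's keccak256.

```python
from typing import Callable, Mapping


def _field(data: bytes) -> bytes:
    """Prefix a field with its 4-byte big-endian length (assumed layout)."""
    return len(data).to_bytes(4, "big") + data


def claim_preimage(chain_id: int, lane: str, schema_id: str,
                   claim_key_fields: Mapping[str, str],
                   pov_hash_hex: str) -> bytes:
    """Concatenate the claim inputs in the order given in the definition."""
    if lane not in ("TRADE", "TOKENS"):
        raise ValueError("lane must be TRADE or TOKENS")
    pov_hash = bytes.fromhex(pov_hash_hex)
    if len(pov_hash) != 32:
        raise ValueError("pov_hash must be 32 bytes")
    parts = [
        chain_id.to_bytes(32, "big"),
        _field(lane.encode("utf-8")),
        _field(schema_id.encode("utf-8")),
    ]
    # Sort identity fields by name so dict ordering cannot change the id.
    for name in sorted(claim_key_fields):
        parts.append(_field(name.encode("utf-8")))
        parts.append(_field(claim_key_fields[name].encode("utf-8")))
    parts.append(pov_hash)
    return b"".join(parts)


def claim_id(keccak256: Callable[[bytes], bytes], chain_id: int, lane: str,
             schema_id: str, claim_key_fields: Mapping[str, str],
             pov_hash_hex: str) -> str:
    """Hash the preimage with a caller-supplied Keccak-256; lowercase hex."""
    preimage = claim_preimage(chain_id, lane, schema_id,
                              claim_key_fields, pov_hash_hex)
    return keccak256(preimage).hex()
```

With pycryptodome installed, a suitable callable would be `lambda b: keccak.new(digest_bits=256, data=b).digest()` using `from Crypto.Hash import keccak`. Length prefixes matter because plain concatenation would let ("ab", "c") and ("a", "bc") produce the same bytes.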

How to hash

Files: Read in binary; do not normalize newlines. Stream through SHA-256; output lowercase hex. Record { name, sha256, size } in dossier; never inline file bytes.
Dossier → PoV hash: Build canonical JSON (Section 12.1): sorted keys, RFC-3339 Z timestamps, stringified decimals, set/seq handling, no null, no optional empties. Encode as UTF-8 with no BOM, no trailing whitespace, \n newlines. Compute sha256(canonical_json_bytes) → PoV hash (bytes32 / lowercase hex in UIs).
Claim id: Assemble claim_key_fields per schema (minimal identity). Compute keccak256(chain_id || lane || schema_id || claim_key_fields || pov_hash) → claim_id.
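The first two steps above can be sketched with the Python standard library. The function names file_entry and pov_hash are illustrative, and the 1 MiB read size is an arbitrary choice; the canonical bytes are assumed to come from a Section 12.1 canonicalizer that is not shown here.

```python
import hashlib
from pathlib import Path


def file_entry(path: Path, read_size: int = 1 << 20) -> dict:
    """Build the {name, sha256, size} record for one evidence file."""
    digest = hashlib.sha256()
    size = 0
    # Binary mode: bytes are hashed exactly as stored, newlines untouched.
    with path.open("rb") as fh:
        while True:
            block = fh.read(read_size)
            if not block:
                break
            digest.update(block)
            size += len(block)
    # hexdigest() is already lowercase hex.
    return {"name": path.name, "sha256": digest.hexdigest(), "size": size}


def pov_hash(canonical_json_bytes: bytes) -> str:
    """SHA-256 over the canonical JSON bytes, as lowercase hex."""
    return hashlib.sha256(canonical_json_bytes).hexdigest()
```

Streaming keeps memory use flat for large PDFs or logs, and the size field comes from the bytes actually read rather than from filesystem metadata.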

Important: PoV Gate requires all counted attestations to reference the same PoV hash; One-Claim reserves the claim_id and finalizes it atomically with mint/settle.

Large evidence sets

Some evidence (e.g., temperature logs) is large or updated in chunks. Schemas can declare a Merkle field:

Chunk the file (e.g., 64 KiB); hash each chunk with SHA-256.
Build a SHA-256 Merkle tree; store the root in the dossier (under evidence.temp_log_root).
At PoV time, provide inclusion proofs for the required intervals; auditors can later verify chunks against the root.
The root is inside the PoV hash, so the commitment is stable; raw chunks are pulled via signed URLs only when needed.
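A minimal sketch of the chunk, root, and inclusion-proof steps follows. The text does not specify how odd levels are handled or whether leaves and inner nodes are domain-separated, so this version assumes the last node is duplicated on odd levels and that a parent is sha256(left || right); all function names are hypothetical.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk_leaves(data: bytes, chunk_size: int = 64 * 1024) -> List[bytes]:
    """Hash each fixed-size chunk; the last chunk may be shorter."""
    return [_h(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def _parents(level: List[bytes]) -> List[bytes]:
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("no leaves")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the last node
        level = _parents(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[bytes]:
    """Collect the sibling hash at every level, from leaf to root."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])
        level = _parents(level)
        index //= 2
    return proof


def verify_chunk(chunk: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    """Recompute the path from a raw chunk up to the committed root."""
    node = _h(chunk)
    for sibling in proof:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

An auditor holding only the root from the dossier, one raw chunk, its index, and the proof can call verify_chunk without fetching the rest of the log.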

Redaction commitments

When you must hide PII, omit the field and commit to a salted hash:

"redactions": [
  {
    "path": "/evidence/warehouse_contact",
    "reason": "PII",
    "salt_id": "s_01",
    "sha256": "sha256(utf8(contact) || salt_bytes)"
  }
]

Salts are scoped per org and per lane; never reuse a salt across orders.
Auditors/regulators can verify offline by revealing the salt; the on-chain PoV hash doesn’t change because redacted fields are omitted by schema.
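The commitment formula in the example, sha256(utf8(contact) || salt_bytes), can be sketched as below. The 32-byte salt length and the helper names are assumptions; the entry layout mirrors the JSON fragment above.

```python
import hashlib
import hmac
import secrets


def new_salt() -> bytes:
    """Generate a random salt (32 bytes is an assumed length)."""
    return secrets.token_bytes(32)


def redaction_commitment(value: str, salt: bytes) -> str:
    """sha256(utf8(value) || salt_bytes) as lowercase hex."""
    return hashlib.sha256(value.encode("utf-8") + salt).hexdigest()


def redaction_entry(path: str, reason: str, salt_id: str,
                    value: str, salt: bytes) -> dict:
    """Build one item for the dossier's "redactions" array."""
    return {
        "path": path,
        "reason": reason,
        "salt_id": salt_id,
        "sha256": redaction_commitment(value, salt),
    }


def verify_redaction(entry: dict, revealed_value: str, salt: bytes) -> bool:
    """Offline check once the value and salt have been revealed."""
    expected = redaction_commitment(revealed_value, salt)
    return hmac.compare_digest(entry["sha256"], expected)
```

Only salt_id appears in the dossier; the salt bytes stay offline until an auditor or regulator needs to run verify_redaction.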

Equality, drift, and server checks

Equality in PoV: Counted attestations must reference the same PoV hash; if any differ, the gate fails (E_HASH_MISMATCH).
Canonical drift: Server re-canonicalizes the submitted dossier; if sha256(local_bytes) ≠ sha256(server_bytes), you get E_CANONICAL_DRIFT with the server’s canonical form so you can fix your serializer.
Set vs Seq: Schema marks arrays as SET (sort before hashing) or SEQ (preserve order). Changing order in a SET must not change the PoV hash; changing order in a SEQ must.
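The SET/SEQ rule and the drift check can be illustrated with a simplified stand-in canonicalizer. This is not the Section 12.1 algorithm: it only sorts keys, sorts arrays whose path is listed in a hypothetical set_paths argument, and emits compact UTF-8. CanonicalDrift is likewise an illustrative exception, not the server's actual error type.

```python
import hashlib
import json
from typing import FrozenSet


def canonicalize(dossier: dict, set_paths: FrozenSet[str]) -> bytes:
    """Simplified canonical form: sorted keys, sorted SET arrays, compact UTF-8."""
    def walk(node, path):
        if isinstance(node, dict):
            return {k: walk(v, f"{path}/{k}") for k, v in node.items()}
        if isinstance(node, list):
            items = [walk(v, path) for v in node]
            if path in set_paths:  # SET: order carries no meaning, so sort it
                items.sort(key=lambda v: json.dumps(v, sort_keys=True))
            return items           # SEQ: order is preserved as submitted
        return node

    text = json.dumps(walk(dossier, ""), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


class CanonicalDrift(Exception):
    """Carries the server's canonical bytes so the client can diff them."""
    def __init__(self, server_bytes: bytes):
        super().__init__("E_CANONICAL_DRIFT")
        self.server_bytes = server_bytes


def check_drift(submitted: bytes, set_paths: FrozenSet[str]) -> None:
    """Re-canonicalize the submitted bytes and compare SHA-256 digests."""
    server_bytes = canonicalize(json.loads(submitted), set_paths)
    local = hashlib.sha256(submitted).digest()
    server = hashlib.sha256(server_bytes).digest()
    if local != server:
        raise CanonicalDrift(server_bytes)
```

With set_paths = frozenset({"/container_ids"}), reordering container_ids leaves the bytes unchanged, while reordering an array not in set_paths changes them.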

Encoding and formats

Hex: Lowercase, no 0x in JSON; UIs may show 0x….
Timestamps: YYYY-MM-DDTHH:MM:SSZ (or .mmmZ if schema demands).
Decimals: Amounts as strings with schema precision (e.g., "396000.000").
IDs/serials: Always strings.
Numbers: Integers allowed only for pure counts; no sign, no leading zeros, no exponent.
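These format rules lend themselves to small validators that a serializer can run before hashing. The sketch below is one reading of the rules: it assumes amounts are unsigned with no leading zeros and at least one decimal place, which the text does not state explicitly, and all names are illustrative.

```python
import re

_HEX_DIGEST = re.compile(r"[0-9a-f]{64}")
_TS_SECONDS = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z")
_TS_MILLIS = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}Z")
_COUNT = re.compile(r"0|[1-9][0-9]*")


def is_hex_digest(value: str) -> bool:
    """64 lowercase hex characters, no 0x prefix."""
    return _HEX_DIGEST.fullmatch(value) is not None


def is_timestamp(value: str, millis: bool = False) -> bool:
    """Shape check only; it does not validate calendar ranges."""
    pattern = _TS_MILLIS if millis else _TS_SECONDS
    return pattern.fullmatch(value) is not None


def is_decimal(value: str, places: int) -> bool:
    """Decimal string with exactly `places` fractional digits (places >= 1)."""
    if places < 1:
        raise ValueError("places must be at least 1")
    pattern = rf"(?:0|[1-9][0-9]*)\.[0-9]{{{places}}}"
    return re.fullmatch(pattern, value) is not None


def is_count_token(token: str) -> bool:
    """Serialized integer: no sign, no leading zeros, no exponent."""
    return _COUNT.fullmatch(token) is not None
```

For a schema with three decimal places, is_decimal("39.6", 3) is False and is_decimal("39.600", 3) is True.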

Test vectors

Key order drift: Swapping the order of object keys in the input produces an identical PoV hash.
SET vs SEQ: Re-order container_ids (SET) → same hash; re-order temp_log[] (SEQ) → different hash.
Whitespace: Add insignificant spaces/newlines to the input JSON → same hash after canonicalization.
Decimal strings: If schema says 3 decimals, "39.6" is invalid; "39.600" is valid.
Merkle inclusion: Verify chunk proof matches temp_log_root.
Combined claim id: Two dossiers with same BL/containers/seal but different PoV hash produce different claim_ids (the dossier hash is part of the id).

API & signatures

Submit bytes, not objects: PoV endpoints accept the canonical bytes; server recomputes the hash and rejects drift.
Attestor signatures: Bind to (schema_id, order_or_listing_id, stage_or_unit, pov_hash) with the role’s active key in the Attestor Registry.
Idempotency: Include Idempotency-Key on any POST that changes state (proof, release, settle); retries do not create duplicates.
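As a sketch of "submit bytes, not objects" combined with an idempotency key, the request below is built with the standard library only and is not sent. The URL passed in, the Content-Type value, and the function name are placeholders; only the Idempotency-Key header name comes from the text above.

```python
import uuid
import urllib.request
from typing import Optional


def build_proof_request(url: str, canonical_bytes: bytes,
                        idempotency_key: Optional[str] = None
                        ) -> urllib.request.Request:
    """Prepare a POST whose body is the canonical bytes, unmodified."""
    # Generate a key once per logical operation; pass the same key on retries.
    key = idempotency_key or str(uuid.uuid4())
    return urllib.request.Request(
        url,
        data=canonical_bytes,  # the exact bytes that were hashed locally
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumed media type
            "Idempotency-Key": key,
        },
    )
```

A retry after a timeout would call build_proof_request again with the same idempotency_key and the same canonical_bytes, so the server can recognize the duplicate.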

Security properties

Second-preimage resistance: SHA-256 commitments make it infeasible to change a file or field without changing the hash.
Determinism: Canonicalization ensures equal facts → equal hashes across languages and toolchains.
Global uniqueness: Because the PoV hash is part of the claim_id preimage, different dossiers (even with the same identity fields) produce different claim_ids, barring a Keccak-256 collision.
Auditability: Every receipt/proof page shows PoV hash, claim id, file digests, and (when money moved) the EDM burn hash, so third parties can replay the chain from blobs.

Operator checklist

Serialize once with the canonicalizer you test in CI; don’t let UI code re-serialize “helpfully.”
Hash files as they are (binary, streaming); never transcode or compress before digesting.
Mark arrays correctly (SET vs SEQ) in your serializer.
Use the exact decimal string precision the schema demands.
Keep salts offline and unique; never reuse a salt across orders.
Store the canonical bytes alongside the object you keep in your DB; audits are faster when you can reproduce the hash byte-for-byte.

Plain recap

Hashing is how we turn evidence into a commitment the chain can enforce. Files → sha256 digests, dossier → PoV hash (sha256), uniqueness → claim_id (keccak256). Canonical JSON guarantees determinism, Merkle roots let you commit to big logs, and redaction keeps PII out while still being provable. When the hash everyone signs is the same, PoV can pass, One-Claim can finalize, and, when it’s time to pay, the router can burn exactly half because the facts are locked to a single, verifiable digest.

