DA failures
What a DA failure is—and what it cannot do
“DA failure” means either that the rollup cannot get its batch data into L1 blobs or that readers cannot fetch those blobs in a timely way. That affects anchoring and reconstructability, not the law of settlement. Contracts still enforce: PoV PASS → EMT → Locked EDSD flips to Unlocked → fee → 50% burn. No batch or blob issue can mint an EMT, flip funds early, bypass One-Claim, or skip burns.
1. Risk surface
Posting failure: the sequencer cannot post a batch’s EIP-4844 blob (L1 congestion, pricing spikes, misconfiguration)
Retrieval failure: blobs are posted but temporarily unfetchable (client or provider outage, indexing lag)
Extended lag: batches post, but anchoring is delayed far beyond normal, from minutes to hours
Blob expiry risk: a catastrophic delay pushes blob publication close to the retention window; blobs are pruned after a limited time
Split-brain clients: readers see inconsistent DA endpoints (one mirrors a blob, another lags)
Operator error: a bad upgrade or bad credentials prevent blob posting, or wrong fee settings starve inclusion
What these do: delay economic finality and audit replay. What they don’t: create money, destroy lineage, or let unproven facts release funds.
2. Guardrails already in place
Optimistic rollup with blobs: batches include PoV hashes, EMT ids, Locked→Unlocked EDSD deltas, fee/burn lines, One-Claim updates
Forced inclusion path: if the sequencer stalls or censors, anyone can push critical txs via L1 inbox; next batch must include them
Permissionless exits: EDM/EDSD can be withdrawn to L1 after the challenge window even if the sequencer is offline
Application brakes: Gate PASS + EMT + One-Claim + must-fund block early releases regardless of DA
Status & SLOs: we publish block delay, batch lag, blob fetch SLO, and forced-inclusion events; receipts are marked “awaiting L1 anchor” until their batch lands
3. Operating modes
Normal (batch lag under 10 min): operate normally; receipts show the burn hash; Explorer links the L1 blob once posted
Degraded (batch lag 10–60 min): operate normally, but badge receipts as “awaiting L1 anchor”; alerts fire; forced inclusion is offered in the UI/API for critical releases
Anchor-guarded (batch lag above 60 min, governed threshold): until DA resumes, one of three governed responses applies: new EMT→release calls auto-route through forced inclusion; opening new gates pauses until at least one batch lands; or only capped releases per order are allowed (governed cap)
We announce the mode on status pages and webhooks. Why: keep business moving within minutes, but guard against extended DA outage risk with clear rails.
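The mode boundaries above can be sketched as a small classifier. This is a minimal illustration, not the production logic: the names `Mode` and `operating_mode` are hypothetical, and it assumes the mode is driven by batch lag alone, with the 10-minute and 60-minute boundaries taken from the list above.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    ANCHOR_GUARDED = "anchor-guarded"


# Boundary between Normal and Degraded, in minutes (from the mode list).
DEGRADED_AFTER_MIN = 10.0


def operating_mode(batch_lag_min: float, guarded_after_min: float = 60.0) -> Mode:
    """Map the current batch lag to an operating mode.

    guarded_after_min is the governed anchor-guarded threshold (default 60).
    """
    if batch_lag_min < 0:
        raise ValueError("batch lag cannot be negative")
    if batch_lag_min > guarded_after_min:
        return Mode.ANCHOR_GUARDED
    if batch_lag_min >= DEGRADED_AFTER_MIN:
        return Mode.DEGRADED
    return Mode.NORMAL
```

A lag of exactly 60 minutes is still Degraded in this sketch; only a lag strictly above the governed threshold switches to anchor-guarded.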
4. Detection → response → recovery
Detection: infra.batch_lag alerts above 10 min (the anchor-guarded threshold defaults to 60 min); infra.block_delay alerts when L2 p95 exceeds 5 s; infra.blob_fetch_slo alerts when fetch failures exceed the SLO; infra.inbox_forced_inclusion alerts when the count spikes
Response: Apps/ops retry with idempotency; if lag exceeds the threshold or a critical release is needed, they trigger forced inclusion, and receipts are flagged “awaiting L1 anchor”. The sequencer adjusts blob fee parameters, fails over to a hot standby, or routes the forced-inclusion queue, and publishes an incident ETA. Governance or the Sec Council enters anchor-guarded mode if needed; changes are public, and exit is timelocked
Recovery: When blobs land, receipts auto-link L1 blob indices; badges clear. A post-mortem shows cause and fix; no application-level replays were allowed to bypass proof
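As a rough illustration of the detection step, the sketch below evaluates one metrics snapshot against the four alerts named above. The metric names come from the text; the `Snapshot` fields, the function name, and the spike threshold of 5 forced inclusions per hour are assumptions made for the example. The 99.5% figure is the blob fetch SLO from the KPI section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    batch_lag_min: float             # minutes since the last batch was anchored
    block_delay_p95_s: float         # L2 block delay, p95, in seconds
    blob_fetch_success_ratio: float  # share of blobs fetched within 5 min
    forced_inclusions_last_hour: int


def firing_alerts(s: Snapshot, forced_inclusion_spike: int = 5) -> List[str]:
    """Return the names of the alerts that fire for this snapshot.

    forced_inclusion_spike is a hypothetical threshold; the text only says
    the alert fires when the count "spikes".
    """
    alerts = []
    if s.batch_lag_min > 10:
        alerts.append("infra.batch_lag")
    if s.block_delay_p95_s > 5:
        alerts.append("infra.block_delay")
    if s.blob_fetch_success_ratio < 0.995:
        alerts.append("infra.blob_fetch_slo")
    if s.forced_inclusions_last_hour >= forced_inclusion_spike:
        alerts.append("infra.inbox_forced_inclusion")
    return alerts
```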
5. Business impact matrix
| Failure mode | What you feel | What does not change | What to do |
|---|---|---|---|
| Posting failure (short) | Receipts show “awaiting L1 anchor” | PoV/EMT/One-Claim, Locked→Unlocked, 50% burn | Continue; consider forced inclusion for critical releases |
| Posting failure (extended) | Anchor-guarded mode | Same brakes; funds remain safe | Use forced inclusion; expect gate-open pauses or capped releases until blob posts |
| Retrieval outage | Explorer link delayed | Receipts/burns issued on L2 | Pull blob later; auditors can reconcile after provider recovers |
| Blob expiry risk | Ops banner with countdown | No early release | Governance raises blob fee or prioritizes batches; forced inclusion for critical queue |
| Operator misconfig | Temporary lag | Settlement law intact | Ops failover; config hot-fix; public ETA |
6. KPIs & SLOs
Batch lag median/p95: less than or equal to 5 min / 15 min
EMT→release latency p50/p95: less than or equal to 5s / 15s (unchanged)
Blob fetch SLO: greater than or equal to 99.5% within 5 min
Forced-inclusion resolution: included in less than or equal to 30 min
Anchor-guarded time/month: 0 (target); alert on any non-zero value
Blob expiry headroom: more than 72 h between oldest unanchored event and pruning horizon
KPIs are on the status page; misses ship with root cause and remediation.
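To make the batch-lag and blob-fetch targets concrete, here is a minimal sketch of how they could be checked from raw samples. It assumes a nearest-rank percentile, which may differ from the method the status page uses, and every function name here is hypothetical.

```python
from typing import Dict, Optional, Sequence


def nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct/100 * n) in integer math
    return ordered[rank - 1]


def batch_lag_report(lag_minutes: Sequence[float]) -> Dict[str, object]:
    """Compare batch lag samples with the 5 min (median) and 15 min (p95) targets."""
    p50 = nearest_rank(lag_minutes, 50)
    p95 = nearest_rank(lag_minutes, 95)
    return {"p50": p50, "p95": p95, "p50_ok": p50 <= 5, "p95_ok": p95 <= 15}


def blob_fetch_ratio(latencies_min: Sequence[Optional[float]],
                     within_min: float = 5.0) -> float:
    """Share of fetches that succeeded within the window; None marks a failed fetch."""
    if not latencies_min:
        raise ValueError("need at least one fetch attempt")
    ok = sum(1 for x in latencies_min if x is not None and x <= within_min)
    return ok / len(latencies_min)


def blob_fetch_slo_met(latencies_min: Sequence[Optional[float]]) -> bool:
    """True when at least 99.5% of fetches completed within 5 minutes."""
    return blob_fetch_ratio(latencies_min) >= 0.995
```

For example, ten lag samples of 1 through 9 minutes plus one 20-minute outlier meet the median target but miss the p95 target.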
7. Governance knobs
Anchor-guarded threshold: when to switch to forced-inclusion/pauses; default 60 min; band 30–120 min
Blob fee policy: max fee multipliers for DA posting under congestion
Forced-inclusion auto-routing: enable after N minutes for critical endpoints
Release caps in guarded mode: per-order/per-hour caps to spread risk
Status transparency: require real-time batch/lag charts and inbox depth
These knobs cannot allow money to release without EMT/PASS, permit reuse of evidence, or discount the 50% burn.
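One way to picture the knobs is as a validated configuration object in which the burn share is a constant rather than a field. The sketch below is illustrative only: the class name, field names, and every default except the 60-minute threshold and its 30–120 minute band are assumptions.

```python
from dataclasses import dataclass

# Fixed by the settlement law, so deliberately not a configurable field.
BURN_SHARE = 0.5


@dataclass(frozen=True)
class GovernanceKnobs:
    anchor_guarded_threshold_min: int = 60  # default and band from the list above
    max_blob_fee_multiplier: float = 2.0    # hypothetical default
    auto_route_after_min: int = 30          # hypothetical value for "N minutes"
    per_order_release_cap: int = 1          # hypothetical guarded-mode cap

    def __post_init__(self) -> None:
        if not 30 <= self.anchor_guarded_threshold_min <= 120:
            raise ValueError("anchor-guarded threshold must stay within 30-120 min")
        if self.max_blob_fee_multiplier < 1:
            raise ValueError("blob fee multiplier must be at least 1")
        if self.auto_route_after_min <= 0:
            raise ValueError("auto-routing delay must be positive")
        if self.per_order_release_cap < 0:
            raise ValueError("release cap cannot be negative")
```

Because the dataclass is frozen and has no burn or proof-related field, no combination of knob values in this sketch can express "release without EMT/PASS" or "burn less than 50%".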
8. Hardening techniques we use
Multi-provider DA reads: fetch blobs from multiple archival providers; verify root against L1
Pre-funded blob gas buffer: governed reserve for DA surges; alert when drawn
Batch sizing/timing: adaptive target sizes to avoid blob overflows; shorter bursts under load
Auto-fallback to inbox: queue critical calls for forced inclusion when lag exceeds threshold
Replica explorers: mirrors of proof pages so audits don’t depend on one front-end
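The multi-provider read can be sketched as follows. This is an assumption about how "verify root against L1" maps onto EIP-4844: the check here compares the versioned hash derived from a provider's KZG commitment (0x01 followed by the last 31 bytes of its SHA-256) with the versioned hash recorded on L1. The provider interface and function names are hypothetical, and a full check would also verify the blob against the commitment with a KZG library, which this sketch omits.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

# A provider takes a versioned hash and returns (blob, kzg_commitment), or None.
Provider = Callable[[bytes], Optional[Tuple[bytes, bytes]]]


def kzg_versioned_hash(commitment: bytes) -> bytes:
    """EIP-4844 versioned hash: version byte 0x01 + sha256(commitment)[1:]."""
    return b"\x01" + hashlib.sha256(commitment).digest()[1:]


def fetch_verified(versioned_hash: bytes,
                   providers: Iterable[Provider]) -> Optional[bytes]:
    """Return the first blob whose commitment matches the hash anchored on L1."""
    for provider in providers:
        try:
            sidecar = provider(versioned_hash)
        except Exception:
            continue  # provider outage: try the next mirror
        if sidecar is None:
            continue  # provider lags or has pruned the blob
        blob, commitment = sidecar
        if kzg_versioned_hash(commitment) == versioned_hash:
            return blob
        # Mismatch: ignore this provider's answer and keep looking.
    return None
```

Falling through to the next provider on any error or mismatch is what keeps a single lagging or faulty mirror from blocking an audit.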
9. Operator checklist
If you see “awaiting L1 anchor”: proceed; receipts remain valid—just note the badge for audit
For time-critical releases during lag: use forced inclusion in UI/API (idempotent)
Don’t resubmit without an Idempotency-Key: with the key, duplicates are no-ops; without it, they create new requests and clog queues
Plan cross-chain withdrawals around the challenge window: DA lag doesn’t affect on-rail cash safety
Reconcile later: when the blob lands, your receipt auto-links the L1 index
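The idempotency rule in the checklist can be illustrated with a toy, in-memory stand-in for the release queue. Nothing here reflects the real API: `ReleaseQueue`, `submit_with_retry`, and the receipt fields are hypothetical, and the sketch only shows why one key reused across retries keeps duplicates harmless.

```python
import uuid
from typing import Dict, Optional


class ReleaseQueue:
    """Toy server side: remembers the receipt issued for each Idempotency-Key."""

    def __init__(self) -> None:
        self._receipts: Dict[str, dict] = {}

    def submit(self, idempotency_key: str, payload: dict) -> dict:
        if idempotency_key in self._receipts:
            # Duplicate submission: return the original receipt, enqueue nothing.
            return self._receipts[idempotency_key]
        receipt = {
            "id": len(self._receipts) + 1,
            "payload": payload,
            "badge": "awaiting L1 anchor",
        }
        self._receipts[idempotency_key] = receipt
        return receipt

    def __len__(self) -> int:
        return len(self._receipts)


def submit_with_retry(queue: ReleaseQueue, payload: dict,
                      attempts: int = 3, key: Optional[str] = None) -> dict:
    """Retry a submission, reusing ONE key for every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = key or str(uuid.uuid4())
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return queue.submit(key, payload)
        except ConnectionError as err:
            last_error = err  # transient failure: retry with the same key
    assert last_error is not None
    raise last_error
```

Generating the key once, outside the retry loop, is the important design choice: a fresh key per attempt would turn every retry into a new queued request.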
Plain recap