DA failures

What a DA failure is—and what it cannot do

A “DA failure” (data availability failure) means that the rollup cannot get its batch data into L1 blobs, or that readers cannot fetch those blobs in a timely way. That affects anchoring and reconstructability, not the law of settlement. Contracts still enforce the same sequence: PoV PASS → EMT → Locked EDSD flips to Unlocked → fee → 50% burn. No batch or blob issue can mint an EMT, flip funds early, bypass One-Claim, or skip burns.

1. Risk surface

Posting failure: the sequencer cannot post a batch’s EIP-4844 blob, typically because of L1 congestion, pricing spikes, or misconfiguration
Retrieval failure: blobs are posted but temporarily unfetchable, for example during a client or provider outage or indexing lag
Extended lag: batches post, but anchoring is delayed far beyond normal, from minutes to hours
Blob expiry risk: a catastrophic delay pushes blob publication close to the retention window; L1 nodes prune blobs after a limited time (roughly 18 days under EIP-4844)
Split-brain clients: readers see inconsistent DA endpoints, for example one mirrors a blob while another lags
Operator error: a bad upgrade or bad credentials prevent blob posting, or wrong fee settings starve inclusion

What these do: delay economic finality and audit replay. What they don’t: create money, destroy lineage, or let unproven facts release funds.

2. Guardrails already in place

Optimistic rollup with blobs: batches include PoV hashes, EMT ids, Locked→Unlocked EDSD deltas, fee/burn lines, One-Claim updates
Forced inclusion path: if the sequencer stalls or censors, anyone can push critical transactions via the L1 inbox; the next batch must include them
Permissionless exits: EDM/EDSD can be withdrawn to L1 after the challenge window even if the sequencer is offline
Application brakes: Gate PASS, EMT, One-Claim, and must-fund together block early releases regardless of DA state
Status & SLOs: we publish block delay, batch lag, blob fetch SLO, and forced-inclusion events; receipts are marked “awaiting L1 anchor” until their batch lands
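
As an illustration of why the application brakes are independent of DA, here is a minimal sketch of a release check. The `ReleaseRequest` fields and the `can_release` function are hypothetical names invented for this example, not the contract interface. The point is that no batch-lag or blob input appears anywhere in the decision.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class ReleaseRequest:
    # All field names are illustrative assumptions, not the real schema.
    order_id: str
    evidence_id: str          # identifier of the evidence being claimed
    pov_passed: bool          # Gate / PoV verdict
    emt_id: Optional[str]     # EMT minted on PASS, None otherwise
    locked_edsd: int          # Locked EDSD funded for this order
    amount: int               # amount requested for release


def can_release(req: ReleaseRequest, claimed: Set[str]) -> Tuple[bool, str]:
    """Apply the four brakes in order; DA state is deliberately not an input."""
    if not req.pov_passed:
        return False, "no PoV PASS"
    if req.emt_id is None:
        return False, "no EMT"
    if req.evidence_id in claimed:
        return False, "One-Claim: evidence already used"
    if req.amount > req.locked_edsd:
        return False, "must-fund: not enough Locked EDSD"
    return True, "ok"
```

A request that fails any brake is rejected whether batches are landing on time or are hours late.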

3. Operating modes

Normal (batch lag under 10 min): operate normally; receipts show the burn hash; Explorer links the L1 blob once posted
Degraded (batch lag 10–60 min): operate normally, but badge receipts as “awaiting L1 anchor”; alerts fire; forced inclusion is offered in the UI/API for critical releases
Anchor-guarded (batch lag over 60 min, governed threshold): one of three rails applies until DA resumes: new EMT→release calls auto-route through forced inclusion, opening new gates pauses until at least one batch lands, or only capped releases per order are allowed (governed cap)

We announce the current mode on status pages and via webhooks. The rationale: short lags should not stop business, while an extended DA outage is handled on clear, pre-announced rails.
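
The mode boundaries can be sketched as a small classifier. This assumes the minute ranges above refer to batch lag; `operating_mode` and the `Mode` enum are hypothetical names for this example. The 60-minute default is the governed threshold, so it is a parameter rather than a constant.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    ANCHOR_GUARDED = "anchor-guarded"


DEGRADED_AFTER_MIN = 10  # lag at or above this is no longer "normal"


def operating_mode(batch_lag_min: float, guarded_after_min: float = 60) -> Mode:
    """Map current batch lag (minutes) to an operating mode."""
    if batch_lag_min > guarded_after_min:
        return Mode.ANCHOR_GUARDED
    if batch_lag_min >= DEGRADED_AFTER_MIN:
        return Mode.DEGRADED
    return Mode.NORMAL
```

For example, `operating_mode(45)` is degraded under the default threshold but anchor-guarded if governance lowers the threshold to 30 minutes.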

4. Detection → response → recovery

Detection: infra.batch_lag alerts when lag exceeds 10 min (the anchor-guarded threshold defaults to 60 min); infra.block_delay alerts when L2 p95 exceeds 5 s; infra.blob_fetch_slo alerts when fetch failures exceed the SLO; infra.inbox_forced_inclusion alerts when the count spikes
Response: Apps and ops retry with idempotency; if lag exceeds the threshold or a critical release is needed, they trigger forced inclusion, and receipts are flagged “awaiting L1 anchor”. The sequencer adjusts blob fee parameters, fails over to a hot standby, or routes the forced-inclusion queue, and publishes an incident ETA. Governance or the Sec Council enters anchor-guarded mode if needed; mode changes are public, and exiting the mode is timelocked
Recovery: When blobs land, receipts auto-link to their L1 blob indices and badges clear. A post-mortem shows cause and fix and confirms that no application-level replay was allowed to bypass proof
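
The detection step can be sketched as a pure function over a metrics snapshot. The alert names come from the list above; the `Metrics` fields, the 99.5% fetch target used as the default SLO, and the spike threshold of 5 forced inclusions per hour are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    # Hypothetical snapshot shape; real metric sources will differ.
    batch_lag_min: float
    block_delay_p95_s: float
    blob_fetch_success: float         # fraction of blobs fetched within the SLO window
    forced_inclusions_last_hour: int


def firing_alerts(m: Metrics, fetch_slo: float = 0.995,
                  forced_inclusion_spike: int = 5) -> List[str]:
    """Return the names of alerts that should fire for this snapshot."""
    alerts = []
    if m.batch_lag_min > 10:
        alerts.append("infra.batch_lag")
    if m.block_delay_p95_s > 5:
        alerts.append("infra.block_delay")
    if m.blob_fetch_success < fetch_slo:
        alerts.append("infra.blob_fetch_slo")
    if m.forced_inclusions_last_hour >= forced_inclusion_spike:
        alerts.append("infra.inbox_forced_inclusion")
    return alerts
```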

5. Business impact matrix

| Failure mode | What you feel | What does not change | What to do |
| --- | --- | --- | --- |
| Posting failure (short) | Receipts show “awaiting L1 anchor” | PoV/EMT/One-Claim, Locked→Unlocked, 50% burn | Continue; consider forced inclusion for critical releases |
| Posting failure (extended) | Anchor-guarded mode | Same brakes; funds remain safe | Use forced inclusion; expect gate-open pauses or capped releases until blob posts |
| Retrieval outage | Explorer link delayed | Receipts/burns issued on L2 | Pull blob later; auditors can reconcile after provider recovers |
| Blob expiry risk | Ops banner with countdown | No early release | Governance raises blob fee or prioritizes batches; forced inclusion for critical queue |
| Operator misconfig | Temporary lag | Settlement law intact | Ops failover; config hot-fix; public ETA |

6. KPIs & SLOs

Batch lag median/p95: less than or equal to 5 min / 15 min
EMT→release latency p50/p95: less than or equal to 5s / 15s (unchanged)
Blob fetch SLO: greater than or equal to 99.5% within 5 min
Forced-inclusion resolution: included in less than or equal to 30 min
Anchor-guarded time/month: 0 (target); alert on any non-zero value
Blob expiry headroom: more than 72 h between oldest unanchored event and pruning horizon

KPIs are on the status page; misses ship with root cause and remediation.
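
As a rough sketch of how the batch-lag KPI could be checked against its targets, the snippet below computes p50 and p95 with the nearest-rank method. The function names are hypothetical, and nearest-rank is an assumption: the status page may use a different percentile definition (for an even number of samples, nearest-rank p50 is the lower of the two middle values).

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def batch_lag_report(lags_min: Sequence[float],
                     median_target: float = 5.0,
                     p95_target: float = 15.0) -> Dict[str, object]:
    """Compare observed batch lag (minutes) with the 5 min / 15 min targets."""
    p50 = percentile(lags_min, 50)
    p95 = percentile(lags_min, 95)
    return {"p50": p50, "p95": p95,
            "p50_ok": p50 <= median_target, "p95_ok": p95 <= p95_target}
```

With lags of 1 through 9 minutes plus one 20-minute outlier, the median target is met while the p95 target is missed.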

7. Governance knobs

Anchor-guarded threshold: when to switch to forced-inclusion/pauses; default 60 min; band 30–120 min
Blob fee policy: max fee multipliers for DA posting under congestion
Forced-inclusion auto-routing: enable after N minutes for critical endpoints
Release caps in guarded mode: per-order/per-hour caps to spread risk
Status transparency: require real-time batch/lag charts and inbox depth

These knobs cannot allow money to release without EMT/PASS, permit reuse of evidence, or discount the 50% burn.
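
A minimal sketch of how such knobs might be validated, assuming a simple config object. `GuardedModeKnobs`, its field names, and every default except the 60-minute threshold and the 30–120 minute band are hypothetical. Note that the object has no field for the burn share or for skipping EMT/PASS, which mirrors the limit stated above.

```python
from dataclasses import dataclass

THRESHOLD_BAND_MIN = (30, 120)  # governed band for the anchor-guarded threshold


@dataclass(frozen=True)
class GuardedModeKnobs:
    anchor_guarded_threshold_min: int = 60   # default from the list above
    max_blob_fee_multiplier: float = 4.0     # hypothetical default
    auto_route_after_min: int = 30           # hypothetical value for "N minutes"
    release_cap_per_order: int = 1_000       # hypothetical cap


def validate(knobs: GuardedModeKnobs) -> GuardedModeKnobs:
    """Reject settings outside the governed band or otherwise nonsensical."""
    low, high = THRESHOLD_BAND_MIN
    if not low <= knobs.anchor_guarded_threshold_min <= high:
        raise ValueError(f"threshold must be within {low}-{high} min")
    if knobs.max_blob_fee_multiplier < 1:
        raise ValueError("fee multiplier must be at least 1")
    if knobs.auto_route_after_min <= 0 or knobs.release_cap_per_order <= 0:
        raise ValueError("auto-route delay and release cap must be positive")
    return knobs
```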

8. Hardening techniques we use

Multi-provider DA reads: fetch blobs from multiple archival providers; verify root against L1
Pre-funded blob gas buffer: governed reserve for DA surges; alert when drawn
Batch sizing/timing: adaptive target sizes to avoid blob overflows; shorter bursts under load
Auto-fallback to inbox: queue critical calls for forced inclusion when lag exceeds threshold
Replica explorers: mirrors of proof pages so audits don’t depend on one front-end
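
To illustrate the multi-provider read, the sketch below tries providers in order and accepts the first response whose KZG commitment hashes to the versioned hash recorded on L1. The versioned-hash formula (`0x01` followed by the last 31 bytes of the commitment's SHA-256) follows EIP-4844. The `Provider` callable shape is a hypothetical stand-in for real provider clients, and treating the versioned hash as the "root" is an assumption. Checking that the blob itself matches the commitment requires a KZG library and is left out here.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

BlobSidecar = Tuple[bytes, bytes]                       # (blob, kzg_commitment)
Provider = Callable[[bytes], Optional[BlobSidecar]]     # hypothetical client shape


def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """EIP-4844 versioned hash: version byte 0x01 + sha256(commitment)[1:]."""
    return b"\x01" + hashlib.sha256(commitment).digest()[1:]


def fetch_verified(versioned_hash: bytes,
                   providers: Iterable[Provider]) -> Optional[BlobSidecar]:
    """Return the first sidecar whose commitment matches the L1 versioned hash."""
    for provider in providers:
        try:
            sidecar = provider(versioned_hash)
        except Exception:
            continue                      # provider outage: try the next one
        if sidecar is None:
            continue                      # provider lagging: blob not indexed yet
        _blob, commitment = sidecar
        if kzg_to_versioned_hash(commitment) == versioned_hash:
            return sidecar                # commitment is the one anchored on L1
    return None
```

A provider that is down, lagging, or serving a mismatched commitment is skipped, so a single bad endpoint does not block an audit.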

9. Operator checklist

If you see “awaiting L1 anchor”: proceed; receipts remain valid—just note the badge for audit
For time-critical releases during lag: use forced inclusion in UI/API (idempotent)
Don’t resubmit without an Idempotency-Key: with a key, duplicates are no-ops; without one, they clog queues
Plan cross-chain withdrawals around the challenge window: DA lag doesn’t affect on-rail cash safety
Reconcile later: when the blob lands, your receipt auto-links the L1 index
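
The retry advice can be sketched against an in-memory stand-in for the release API. `FakeReleaseAPI` and `submit_with_retry` are hypothetical and exist only for this example. The key is generated once and reused on every attempt, so a retry after a lost response does not execute the release twice.

```python
import uuid
from typing import Dict


class FakeReleaseAPI:
    """In-memory stand-in that deduplicates requests by idempotency key."""

    def __init__(self, drop_first_response: bool = False) -> None:
        self.receipts: Dict[str, dict] = {}
        self.executions = 0
        self._drop = drop_first_response

    def submit(self, idempotency_key: str, payload: dict) -> dict:
        if idempotency_key not in self.receipts:
            self.executions += 1          # the release runs only once per key
            self.receipts[idempotency_key] = {
                "order_id": payload["order_id"],
                "status": "awaiting L1 anchor",
            }
        if self._drop:                    # simulate a response lost in transit
            self._drop = False
            raise TimeoutError("response lost")
        return self.receipts[idempotency_key]


def submit_with_retry(api: FakeReleaseAPI, payload: dict, attempts: int = 3) -> dict:
    """Retry with the SAME key; a real client would also back off between tries."""
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return api.submit(key, payload)
        except TimeoutError as exc:
            last_error = exc
    raise RuntimeError("release not confirmed") from last_error
```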

Plain recap

DA failures delay anchoring, not admissibility. When blobs lag, settlement law still holds—PoV PASS → EMT → Locked→Unlocked EDSD → fee → 50% burn—and forced inclusion keeps critical flows alive. If a lag becomes extended, we switch to an anchor-guarded mode (forced inclusion/pauses/caps) until blobs land. Worst case, you wait with receipts—you never pay early or lose the trail. No EMT, no funds.


