Phase 07 · v1.0 · Shipped 2026-05-20

Failure taxonomy & surfacing.

A failed result row carries a structured failureKind that distinguishes tracking-broken, site-broken, robot-broken, and timeout. The classification surfaces in Grafana and Slack so an analyst can route the issue to the right team. Phase 10 will extend this enum with param-mismatch, duplicate, under-counted, and unclassified.

Milestone: v1.0
Scope: BIF-03, BIF-04
Depends on: Phase 2 (outcome field)
Mode: Strictly additive

Phase boundary

Who owns the fix — orthogonal to "what happened".

This phase is strictly additive: it builds on Phase 2's outcome field and does NOT change matching, capture, or any Phase 1–6 contract. outcome answers what happened (validated, section-aborted, or flow-skipped); failureKind answers which team owns the fix. Both are needed.

Decisions
Q1 · Field name & values

Additive optional failureKind? on both row types.

export type FailureKind
  = "tracking-broken"
  | "site-broken"
  | "robot-broken"
  | "timeout";

Absent on passing rows (passed: true). Set whenever passed: false, regardless of outcome, so a failed row whose outcome is still "validated" carries a failureKind too.
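A minimal sketch of the absent-versus-set rule, assuming a simplified row shape. ResultRow and stampFailureKind are hypothetical names used only for illustration; the real row types in src/lib/Types.ts are not shown in this document.

```typescript
type FailureKind =
  | "tracking-broken"
  | "site-broken"
  | "robot-broken"
  | "timeout";

// Outcome values as they appear in the classification table of this phase.
type Outcome = "validated" | "section-aborted" | "flow-skipped";

// Hypothetical, simplified row shape (assumption, not the real type).
interface ResultRow {
  passed: boolean;
  outcome: Outcome;
  failureKind?: FailureKind; // additive and optional
}

// Stamp failureKind only on failed rows. Passing rows are returned
// untouched, so the key stays absent rather than present-but-undefined.
function stampFailureKind(row: ResultRow, kind: FailureKind): ResultRow {
  if (row.passed) return row;
  return { ...row, failureKind: kind };
}

const miss = stampFailureKind(
  { passed: false, outcome: "validated" },
  "tracking-broken",
);
const pass = stampFailureKind(
  { passed: true, outcome: "validated" },
  "tracking-broken",
);
```

Here miss keeps outcome "validated" while gaining failureKind, and pass has no failureKind key at all, which is what keeps the change additive for existing consumers.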

Q2 · Classification rules

Origin → failureKind.

Origin | outcome | failureKind
validateEvents finds source missing | validated | tracking-broken
Per-section catch in bi-measurement.ts | section-aborted | classifyError(err)
reconcileMissingFlows synthesizes placeholder | flow-skipped | robot-broken
Passing row | validated | (absent)

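The mapping above could be expressed as a single lookup, sketched here under assumptions. The Origin union, its site labels, and resolveRowFields are hypothetical; in the shipped code the stamping happens at each call site rather than through one function. The classifier is passed in as a parameter so the sketch stays self-contained.

```typescript
type FailureKind =
  | "tracking-broken"
  | "site-broken"
  | "robot-broken"
  | "timeout";

type Outcome = "validated" | "section-aborted" | "flow-skipped";

// Hypothetical labels for the four origins in the table.
type Origin =
  | { site: "validate-miss" }
  | { site: "section-catch"; err: unknown }
  | { site: "reconcile-placeholder" }
  | { site: "pass" };

interface RowFields {
  outcome: Outcome;
  failureKind?: FailureKind;
}

function resolveRowFields(
  origin: Origin,
  classify: (err: unknown) => FailureKind,
): RowFields {
  switch (origin.site) {
    case "validate-miss":
      return { outcome: "validated", failureKind: "tracking-broken" };
    case "section-catch":
      // Only this origin needs the error-based classifier.
      return { outcome: "section-aborted", failureKind: classify(origin.err) };
    case "reconcile-placeholder":
      return { outcome: "flow-skipped", failureKind: "robot-broken" };
    case "pass":
      // Passing rows omit the key entirely.
      return { outcome: "validated" };
  }
}
```

Only the per-section catch depends on the thrown error; the other three origins map to a fixed value.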
Q3 · classifyError(err: unknown): FailureKind

Pure helper · first match wins.

  1. /timeout/i OR error name TimeoutError (Playwright) → "timeout".
  2. Known modal/consent selectors OR "intercepts pointer events" / "element is not visible" / "waiting for selector" (without timeout) → "robot-broken".
  3. net::ERR_, ERR_HTTP, ECONN, ENOTFOUND, getaddrinfo, ERR_CONNECTION_, Cannot navigate, page.goto: with network → "site-broken".
  4. Default → "robot-broken" (the runner observed the error; attribute to robot/tooling absent stronger signal).

Pure, deterministic, no side effects. Lives at src/utils/failureClassification.ts.
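The four rules can be sketched as follows. This is an illustration under assumptions, not the contents of src/utils/failureClassification.ts: the MODAL_HINTS entries are hypothetical placeholders because the known modal/consent selectors are not listed here, and the "page.goto: with network" case is approximated by the network patterns alone.

```typescript
export type FailureKind =
  | "tracking-broken"
  | "site-broken"
  | "robot-broken"
  | "timeout";

// Hypothetical placeholders; the real selector list is not given here.
const MODAL_HINTS: readonly string[] = ["#consent-banner", ".modal-overlay"];

const ROBOT_PATTERNS: readonly RegExp[] = [
  /intercepts pointer events/i,
  /element is not visible/i,
  /waiting for selector/i,
];

const SITE_PATTERNS: readonly RegExp[] = [
  /net::ERR_/,
  /ERR_HTTP/,
  /ECONN/,
  /ENOTFOUND/,
  /getaddrinfo/,
  /ERR_CONNECTION_/,
  /Cannot navigate/,
];

export function classifyError(err: unknown): FailureKind {
  const name = err instanceof Error ? err.name : "";
  const message = err instanceof Error ? err.message : String(err);

  // Rule 1: timeouts win, even if the message also mentions a selector.
  if (name === "TimeoutError" || /timeout/i.test(message)) return "timeout";

  // Rule 2: modal/consent or interaction problems belong to the robot.
  if (
    MODAL_HINTS.some((hint) => message.includes(hint)) ||
    ROBOT_PATTERNS.some((re) => re.test(message))
  ) {
    return "robot-broken";
  }

  // Rule 3: network and navigation errors belong to the site.
  if (SITE_PATTERNS.some((re) => re.test(message))) return "site-broken";

  // Rule 4: default when no stronger signal exists.
  return "robot-broken";
}
```

Because the rules run in order and return early, a message such as "Timeout 5000ms exceeded while waiting for selector" resolves to "timeout" rather than "robot-broken", which matches the "(without timeout)" qualifier in rule 2.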

Success criteria
  1. A modal-blocked failure produces a row whose failureKind is "robot-broken", distinct from a tracking miss.
  2. A page.goto timeout produces a row whose failureKind is "timeout", layered on Phase 2's outcome="section-aborted".
  3. A genuine tracking miss produces a row whose failureKind is "tracking-broken" while outcome stays "validated".
  4. A flow-skipped row carries failureKind="robot-broken" because the robot's guard return is the proximate cause.
  5. A passing row leaves failureKind undefined — never falsy-but-present.
  6. Grafana exposes failureKind as a template variable and filterable facet; dashboard version 63 → 64.
  7. Slack annotates each row in the existing event-miss and Aborted/Skipped blocks with an inline failureKind: <kind> suffix; no new blocks, no reorder.
  8. Phase 1–6 behavior byte-unchanged; every pre-existing assertion still passes; zero new npm deps.
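Criteria 5 and 7 interact: the Slack suffix should appear only when failureKind is actually set. A minimal sketch, assuming each row is rendered as a single text line; annotateLine is a hypothetical helper and the single-space separator is an assumption, since the exact Slack formatting is not specified here.

```typescript
type FailureKind =
  | "tracking-broken"
  | "site-broken"
  | "robot-broken"
  | "timeout";

// Append the inline suffix only when a failureKind is present.
// Rows without one are returned unchanged, so existing output is untouched.
function annotateLine(line: string, failureKind?: FailureKind): string {
  if (failureKind === undefined) return line;
  return `${line} failureKind: ${failureKind}`;
}

const annotated = annotateLine("checkout_started missing", "tracking-broken");
const untouched = annotateLine("checkout_started ok");
```

Keeping the annotation as a pure string suffix means no Slack blocks are added or reordered.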
Files
mod | src/lib/Types.ts | FailureKind union + optional failureKind?
new | src/utils/failureClassification.ts | classifyError (pure)
mod | src/utils/eventUtils.ts | stamp failureKind at validate / emit / reconcile sites
mod | src/bi-measurement.ts | classifyError(err) into emitOutcomeRow
mod | src/lib/slack.ts | inline failureKind annotation
mod | grafana-dashboard-with-viewport.json | failureKind template var; v63→64
new | tests/failure-classification.test.js
mod | tests/event-utils.test.js | stamping assertions at each call site