Every night, a robot walks through the 12 CNC sites, performs the same actions a reader would (open homepage, browse, click into a gallery, play a video), and checks whether the analytics events we expect actually fire. This page explains the moving parts — and lets you play with a simulator to see how the robot reasons.
The robot opens a page, performs an action, listens for what each tracker says — then compares against what the BI team said should happen on that page.
If all three agree that the event happened, we trust it. If only some agree, we have a bug somewhere — and the routing tells us where to look.
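The three-way comparison described above can be sketched as follows. This is a minimal illustration rather than the robot's actual classifier: the `Source`, `Expected`, `Captured`, and `summarizeAgreement` names are hypothetical, and the sketch only reports which sources saw the expected event without assigning a failureKind.

```typescript
// Hypothetical sketch: which of the three sources reported the expected event?
type Source = "dataLayer" | "ga4" | "gemius";

// Event name the BI team expects from each source for one flow.
type Expected = Record<Source, string>;

// Event names actually captured per source while the robot performed the action.
type Captured = Record<Source, string[]>;

interface Agreement {
  seenBy: Source[];
  missingFrom: Source[];
  allAgree: boolean;
}

const SOURCES: Source[] = ["dataLayer", "ga4", "gemius"];

function summarizeAgreement(expected: Expected, captured: Captured): Agreement {
  // A source "agrees" when it captured the name expected for that source.
  const seenBy = SOURCES.filter((s) => captured[s].some((name) => name === expected[s]));
  const missingFrom = SOURCES.filter((s) => seenBy.indexOf(s) === -1);
  return { seenBy, missingFrom, allAgree: missingFrom.length === 0 };
}

// Example: the site pushed gallery_open and Gemius saw a view, but GA4 stayed silent.
const agreement = summarizeAgreement(
  { dataLayer: "gallery_open", ga4: "gallery_open", gemius: "view" },
  { dataLayer: ["gallery_open"], ga4: [], gemius: ["view"] },
);
```

In this example `agreement.missingFrom` contains only `ga4`, which is the kind of hint that narrows down where to look.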
DataLayer: a growing list on window.dataLayer. Each entry has an event name and parameters. This is what the site itself says happened.
GA4: a network beacon to google-analytics.com/collect with an en parameter (the event name). It tells us whether the GA4 install actually reported the event.
Gemius: a network beacon to a Gemius host with an et parameter (the event type). It is coarser-grained, but it is the source of truth for industry traffic reporting.
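As a rough sketch of how an event name could be read from a captured beacon, the helper below extracts the `en` or `et` query parameter from a request URL. The `eventFromBeacon` name and the example URLs (including the Gemius host) are placeholders for illustration. It only reads the query string; anything sent in a request body is out of scope here.

```typescript
// Hypothetical helper: pull the event name out of a captured beacon URL.
type Tracker = "ga4" | "gemius";

// Query parameter that carries the event name (GA4) or event type (Gemius).
const EVENT_PARAM: Record<Tracker, string> = { ga4: "en", gemius: "et" };

function eventFromBeacon(tracker: Tracker, beaconUrl: string): string | null {
  // URLSearchParams.get returns null when the parameter is absent.
  return new URL(beaconUrl).searchParams.get(EVENT_PARAM[tracker]);
}

// Placeholder URLs; real beacons carry many more parameters.
const ga4Event = eventFromBeacon(
  "ga4",
  "https://www.google-analytics.com/collect?v=2&en=gallery_open",
);
const gemiusEvent = eventFromBeacon("gemius", "https://gemius.example/hit?et=view");
const noEvent = eventFromBeacon("ga4", "https://www.google-analytics.com/collect?v=2");
```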
The three sources do not always use the same name for one flow: a single "gallery opened" action lands in DataLayer as gallery_open, in GA4 as gallery_open, and in Gemius as view, while "page ready" is page_ready in DataLayer but page_view in GA4. We track the canonical mapping in data/event-rules.json: one row per flow, three names per row.
The BI team owns this table. Edit it to change what the robot looks for; no code change required.
| Logical flow | Applies to page type | DataLayer | GA4 | Gemius | Count |
|---|---|---|---|---|---|
| Page ready | homepage | page_ready | page_view | view | ≥1 |
| Pagination — next | category_paginated | page_next | page_next | view | ≥1 |
| Pagination — previous | category_paginated | page_previous (also accepts page_prev) | page_previous | view | ≥1 |
| Gallery open | article_with_gallery | gallery_open | gallery_open | view | ≥1 |
| Gallery — next | article_with_gallery | gallery_next | gallery_next | view | ≥1 |
| Gallery — previous | article_with_gallery | gallery_previous | gallery_previous | view | ≥1 |
| Player start | article_with_vplayer | player_start | player_start | stream | ≥1 |
| Article part — next | article_multipart | articlePart_next | articlePart_next | view | ≥1 |
| Article part — previous | article_multipart | articlePart_previous | articlePart_previous | view | ≥1 |
≥1 ("at least once") is the default count rule: any captured matching name satisfies the contract. The BI team can tighten any rule at any time to exactlyOnce (duplicates fail), maxOnce (the event is optional, but more than one occurrence fails), or zero (the event must NOT appear).
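The count rules can be pictured as small predicates over the number of matching captured names. The sketch below is illustrative only: the `SourceRule` shape and the `atLeastOnce` identifier for the ≥1 default are assumptions, the real schema of data/event-rules.json may differ, and `maxOnce` is read here as "at most once".

```typescript
// Hypothetical shapes; the real schema of data/event-rules.json may differ.
type CountRule = "atLeastOnce" | "exactlyOnce" | "maxOnce" | "zero";

interface SourceRule {
  accepts: string[]; // canonical name first, then any accepted aliases
  count: CountRule;
}

// One predicate per count rule, applied to the number of matching events.
const COUNT_CHECKS: Record<CountRule, (n: number) => boolean> = {
  atLeastOnce: (n) => n >= 1,
  exactlyOnce: (n) => n === 1,
  maxOnce: (n) => n <= 1, // assumption: "at most once"
  zero: (n) => n === 0,
};

function countMatches(rule: SourceRule, capturedNames: string[]): number {
  return capturedNames.filter((name) => rule.accepts.indexOf(name) !== -1).length;
}

function satisfies(rule: SourceRule, capturedNames: string[]): boolean {
  return COUNT_CHECKS[rule.count](countMatches(rule, capturedNames));
}

// "Pagination, previous" in DataLayer also accepts the page_prev alias.
const paginationPrev: SourceRule = {
  accepts: ["page_previous", "page_prev"],
  count: "atLeastOnce",
};
const aliasPasses = satisfies(paginationPrev, ["page_ready", "page_prev"]);

// A tightened rule fails on duplicates.
const strictGallery: SourceRule = { accepts: ["gallery_open"], count: "exactlyOnce" };
const duplicateFails = satisfies(strictGallery, ["gallery_open", "gallery_open"]);
```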
Toggle the page state on the left, pick what action the robot should take, then click Run robot. The right side walks through every decision the classifier made and lands on a verdict.
The classifier never silently passes a row. Every failed row carries a failureKind that routes it to the right channel and tells the right team to look.
This is the bug the v1.1 milestone was built to fix. Before v1.1, the robot would call galleryOpen on every article — even ones that didn't have a gallery — and report a missing gallery event as a tracking failure. The BI team got noise instead of signal.
The robot's job is to test the gallery flow. It calls galleryOpen regardless of what's on the page — that's what it does.
Before validating events, the robot probes the DOM: are gallery selectors visible? In this case, no — hasGallery: false lands in the evidence snapshot.
Because the action is galleryOpen but the page has no gallery, the page type is resolved as article_standard, not article_with_gallery.
The gallery-open rule says "I apply to article_with_gallery". This page is article_standard. The rule simply doesn't apply here.
ui-mismatch-suppressed, routed to #bi-triage. Not a tracking failure. The BI team isn't paged. Instead the row is suppressed (silent in #bi-alerts) and surfaced quietly to triage so an analyst can confirm the suppression heuristic was right.
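The walkthrough above amounts to resolving the page type from DOM evidence before checking whether a rule applies. A minimal sketch, assuming hypothetical `Evidence`, `Rule`, `Outcome`, `resolvePageType`, and `decide` names (the real classifier resolves more page types and carries more evidence than a single flag):

```typescript
// Hypothetical sketch of "probe first, then decide whether the rule applies".
type PageType = "article_standard" | "article_with_gallery";

interface Evidence {
  hasGallery: boolean; // result of probing the DOM for gallery selectors
}

interface Rule {
  flow: string;
  appliesTo: PageType;
}

type Outcome =
  | { kind: "validate"; rule: Rule }
  | { kind: "suppressed"; failureKind: "ui-mismatch-suppressed"; channel: "#bi-triage" };

function resolvePageType(evidence: Evidence): PageType {
  return evidence.hasGallery ? "article_with_gallery" : "article_standard";
}

function decide(rule: Rule, evidence: Evidence): Outcome {
  // A rule scoped to another page type does not apply, so a missing event
  // is a UI mismatch rather than a tracking failure.
  if (rule.appliesTo !== resolvePageType(evidence)) {
    return { kind: "suppressed", failureKind: "ui-mismatch-suppressed", channel: "#bi-triage" };
  }
  return { kind: "validate", rule };
}

const galleryOpenRule: Rule = { flow: "Gallery open", appliesTo: "article_with_gallery" };
const onPlainArticle = decide(galleryOpenRule, { hasGallery: false });
const onGalleryArticle = decide(galleryOpenRule, { hasGallery: true });
```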
Same incident, different lenses. Detailed for the first hit. Compact for mobile triage. Minimal for high-volume channels. Daily / weekly / monthly summaries roll up the trend.
For each failing row, src/lib/slackDispatch.ts walks five gates in order. The first one to suppress short-circuits the rest; a row that survives all five becomes a single Slack POST.
Suppress non-severe kinds during the configured window. Severe kinds (e.g. site-broken) always pass.
Look up (channel, template, dedupWindow) for the verdict in slack-routing.json.
Resolve owner pings from owners.json: default + by-property + by-failureKind, deduplicated.
Render the Block Kit payload — new-functional, technical-failure, or a compact / minimal variant.
Record the incident first (atomic _create); only POST when status is created. Repeats are swallowed.
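The five gates can be sketched as one function that returns early on the first suppression. This is not the code in src/lib/slackDispatch.ts: `FailingRow`, `Route`, `Deps`, and `dispatch` are hypothetical, the route values in the example are placeholders, and treating a missing route as a suppression is an assumption.

```typescript
// Hypothetical sketch of the five gates; names and shapes are illustrative only.
interface FailingRow { property: string; flow: string; failureKind: string; }
interface Route { channel: string; template: string; dedupWindowHours: number; }

interface Deps {
  inQuietWindow: boolean;
  severeKinds: string[];
  routes: { [failureKind: string]: Route | undefined };
  resolveOwners: (row: FailingRow) => string[];
  render: (row: FailingRow, route: Route, owners: string[]) => string;
  createIncident: (key: string, windowHours: number) => "created" | "exists";
}

type Dispatch =
  | { sent: false; stoppedAt: "quiet-window" | "no-route" | "dedup" }
  | { sent: true; channel: string; payload: string };

function dispatch(row: FailingRow, deps: Deps): Dispatch {
  // Gate 1: the window suppresses non-severe kinds; severe kinds always pass.
  const severe = deps.severeKinds.indexOf(row.failureKind) !== -1;
  if (deps.inQuietWindow && !severe) return { sent: false, stoppedAt: "quiet-window" };

  // Gate 2: routing lookup (a missing route suppressing the row is an assumption).
  const route = deps.routes[row.failureKind];
  if (!route) return { sent: false, stoppedAt: "no-route" };

  // Gate 3: owner pings. Gate 4: render the payload.
  const owners = deps.resolveOwners(row);
  const payload = deps.render(row, route, owners);

  // Gate 5: record the incident first; only send when it was newly created.
  const key = [row.property, row.flow, row.failureKind].join("/");
  if (deps.createIncident(key, route.dedupWindowHours) !== "created") {
    return { sent: false, stoppedAt: "dedup" };
  }
  return { sent: true, channel: route.channel, payload };
}

// In-memory stand-ins for the routing table, owners, and incident store.
const seenKeys: string[] = [];
const deps: Deps = {
  inQuietWindow: false,
  severeKinds: ["site-broken"],
  routes: {
    "tracking-broken": { channel: "#bi-alerts", template: "new-functional", dedupWindowHours: 24 },
  },
  resolveOwners: () => ["@placeholder-owner"],
  render: (r, route, owners) => `[${route.template}] ${r.property} ${r.flow} ${owners.join(" ")}`,
  createIncident: (key) => {
    if (seenKeys.indexOf(key) !== -1) return "exists";
    seenKeys.push(key);
    return "created";
  },
};

const failing = { property: "blesk", flow: "Gallery open", failureKind: "tracking-broken" };
const firstHit = dispatch(failing, deps);
const repeatHit = dispatch(failing, deps);
```

The first call produces a single message, and the repeat is swallowed at the last gate because the incident already exists.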
One line, no body. Useful when #bi-alerts is paired with a louder downstream channel (e.g. PagerDuty) — the Slack message becomes a low-friction confirmation rather than the primary signal.
blesk/article_with_gallery · Gallery open (tracking-broken) — 12× since 2026-05-22
auto/article_multipart · Article part — next (under-counted) — 8× since 2026-05-24
reflex/article_with_vplayer · Player start (tracking-broken) — 6× since 2026-05-23
e15/homepage · Page ready (timeout) — 5× since 2026-05-25
dama/article_with_gallery · Gallery — next (duplicate) — 4× since 2026-05-25

tracking-broken 6 :arrow_up: +1
duplicate 3 :left_right_arrow: 0
robot-broken 3 :arrow_up: +2
under-counted 2 :arrow_down: -1

tracking-broken 18 :arrow_up: +3
duplicate 12 :arrow_down: -2
robot-broken 9 :arrow_up: +5
unclassified 8 :left_right_arrow: 0

blesk/article_with_gallery · Gallery open (tracking-broken) — 12× since 2026-04-03
Which user flow runs against which property. A check means the robot has a configured URL + selectors for that flow on that site; a dash means the flow isn't applicable (no multipart articles, no video, …). Derived from data/config-per-domain/*.json.
| Site | Homepage (page_ready) | Pagination (page_next · page_prev) | Gallery (galleryOpen · gallery_next · gallery_previous) | Video (player) | Article parts (articlePart_next · articlePart_previous) |
|---|---|---|---|---|---|
| abc | ✓ | ✓ | ✓ | ✓ | ✓ |
| ahaonline | ✓ | ✓ | ✓ | ✓ | – |
| auto | ✓ | ✓ | ✓ | ✓ | ✓ |
| autorevue | ✓ | ✓ | ✓ | ✓ | ✓ |
| blesk | ✓ | ✓ | ✓ | ✓ | ✓ |
| dama | ✓ | ✓ | ✓ | ✓ | ✓ |
| e15 | ✓ | ✓ | ✓ | ✓ | ✓ |
| maminka | ✓ | ✓ | ✓ | ✓ | ✓ |
| mojezdravi | ✓ | ✓ | ✓ | ✓ | – |
| reflex | ✓ | ✓ | ✓ | ✓ | ✓ |
| zeny | ✓ | ✓ | ✓ | ✓ | ✓ |
| zive | ✓ | ✓ | ✓ | ✓ | ✓ |
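One way a row of this matrix could be derived from a per-domain config is sketched below. The `DomainConfig` shape, the `FlowId` names, and `coverageRow` are all assumptions; the actual structure of the files under data/config-per-domain/ is not shown in this page.

```typescript
// Hypothetical config shape; the real per-domain JSON may look different.
type FlowId = "homepage" | "pagination" | "gallery" | "video" | "articleParts";

interface FlowConfig {
  url?: string;
  selectors?: string[];
}

interface DomainConfig {
  site: string;
  flows: Partial<Record<FlowId, FlowConfig>>;
}

const FLOW_ORDER: FlowId[] = ["homepage", "pagination", "gallery", "video", "articleParts"];

function coverageRow(config: DomainConfig): string {
  const cells = FLOW_ORDER.map((flow) => {
    const f = config.flows[flow];
    // A check requires both a configured URL and at least one selector.
    const configured = !!f && !!f.url && !!f.selectors && f.selectors.length > 0;
    return configured ? "✓" : "–";
  });
  return `| ${[config.site].concat(cells).join(" | ")} |`;
}

// Placeholder flow config reused for every configured flow in the example.
const placeholderFlow: FlowConfig = { url: "https://example.invalid/", selectors: [".placeholder"] };

const ahaonlineRow = coverageRow({
  site: "ahaonline",
  flows: {
    homepage: placeholderFlow,
    pagination: placeholderFlow,
    gallery: placeholderFlow,
    video: placeholderFlow,
    // no articleParts entry: the flow is not applicable on this site
  },
});
```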
Honest list — each item is a real BI ask, with a one-line take on whether to build it now, soon, or never.
Wired into scripts/incident-close.ts. Closing an incident posts a small "resolved" message via buildAllClear() to the same channel as the original alert (webhook-friendly; thread reply still needs a bot token).
Resolved at dispatch time via src/lib/owners.ts against data/owners.json — default + by-property + by-failureKind, de-duplicated. Mentions appear inside the alert body for webhook clients.
If the same incident has occurred N times in M hours, escalate — page on-call, ping a Slack group. Threshold lives in slack-routing.json.
Repeat occurrences reply to the original alert's thread instead of a new top-level message. Needs Slack bot token + thread_ts (today we're webhook-only). Cleanly collapses noise.
:eyes: = investigating, :white_check_mark: = resolved (auto-closes the incident). Needs the same Slack bot token as threading.
Hourly cron in scripts/slack-threshold-check.ts (scheduled in pipelines/pipeline-threshold-check.yml) inspects last-hour failure rates and posts buildThresholdBreach() when a kind exceeds its threshold.
"5 different sites all started failing at 14:32" → meta-alert with the time pattern highlighted. Probably catches a third-party tracker outage minutes earlier than today.
Suppress alerts 22:00-08:00 except for high-severity. Real value depends on whether the BI team is genuinely on-call overnight — open question.
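Two of the asks in this list, the quiet-hours window and the "N times in M hours" escalation, come down to small time checks. The sketch below is illustrative only: the function names are hypothetical, hours are assumed to be in whatever local time zone the team agrees on, and the real thresholds would come from slack-routing.json.

```typescript
// Quiet window that may wrap past midnight, e.g. 22:00-08:00 (end exclusive).
function inQuietWindow(hour: number, startHour: number, endHour: number): boolean {
  return startHour <= endHour
    ? hour >= startHour && hour < endHour
    : hour >= startHour || hour < endHour; // window wraps past midnight
}

// High-severity alerts always go out; others only outside 22:00-08:00.
function shouldAlert(hour: number, highSeverity: boolean): boolean {
  return highSeverity || !inQuietWindow(hour, 22, 8);
}

// Escalate when the same incident occurred at least n times in the last mHours.
function shouldEscalate(
  occurrencesMs: number[],
  nowMs: number,
  n: number,
  mHours: number,
): boolean {
  const cutoff = nowMs - mHours * 60 * 60 * 1000;
  const recent = occurrencesMs.filter((t) => t >= cutoff && t <= nowMs);
  return recent.length >= n;
}

const HOUR_MS = 60 * 60 * 1000;
// Occurrences at hour 0, 1, 2, and 5 on an arbitrary timeline; "now" is hour 5.
const occurrences = [0, 1 * HOUR_MS, 2 * HOUR_MS, 5 * HOUR_MS];
const escalateWide = shouldEscalate(occurrences, 5 * HOUR_MS, 3, 4);
const escalateNarrow = shouldEscalate(occurrences, 5 * HOUR_MS, 3, 2);
```

With a four-hour lookback three occurrences fall inside the window and the incident escalates; with a two-hour lookback only one does, so it does not.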