For the BI team · no code, just concepts

What we measure, and how the robot decides.

Every night, a robot walks through the 12 CNC sites, performs the same actions a reader would (open homepage, browse, click into a gallery, play a video), and checks whether the analytics events we expect actually fire. This page explains the moving parts — and lets you play with a simulator to see how the robot reasons.

v1.1 · Dynamic validation
Sites: 12
Trackers: DataLayer · GA4 · Gemius
Verdicts: 8 kinds, routed to 2 Slack channels
Big picture

One robot, one page, three trackers, one verdict.

The robot opens a page, performs an action, listens for what each tracker says — then compares against what the BI team said should happen on that page.

Robot (simulates a reader) → The page (blesk.cz / auto.cz / …) → DataLayer (JS array on the page), GA4 (Google Analytics), Gemius (audience measurement) → Verdict (pass / fail kind / suppressed; routes to Slack channel)
The three trackers

Each one tells a different version of the same story.

If all three agree that the event happened, we trust it. If only some agree, we have a bug somewhere — and the routing tells us where to look.

DataLayer

The JavaScript truth

A growing list on window.dataLayer. Each entry has an event name and parameters. This is what the site itself says happened.

{ event: "gallery_open",
  gallery_id: 12345,
  article_id: 798801 }
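
As a rough illustration of how a check could look for an expected event in a DataLayer snapshot, here is a minimal sketch. The names `DataLayerEntry` and `findEvents` are hypothetical and not taken from the project; the sketch assumes the snapshot has already been copied out of window.dataLayer as a plain array.

```typescript
// Hypothetical shape of one dataLayer entry: an optional event name plus arbitrary parameters.
type DataLayerEntry = { event?: string; [param: string]: unknown };

// Return every entry whose event name is one of the accepted names.
function findEvents(dataLayer: DataLayerEntry[], accepted: string[]): DataLayerEntry[] {
  return dataLayer.filter(
    (entry) => entry.event !== undefined && accepted.includes(entry.event),
  );
}

// Example snapshot, as it might look after the robot opens a gallery.
const snapshot: DataLayerEntry[] = [
  { event: "page_ready" },
  { event: "gallery_open", gallery_id: 12345, article_id: 798801 },
];

const galleryHits = findEvents(snapshot, ["gallery_open"]);
const paginationHits = findEvents(snapshot, ["page_previous", "page_prev"]);
```

Passing a list of accepted names rather than a single name leaves room for aliases such as page_prev.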
GA4

What Google saw

A network beacon to google-analytics.com/collect with an en parameter — the event name. Tells us whether the GA4 install actually reported it.

collect?en=gallery_open
  &ep.article_id=798801
  &ep.premium=true
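
A captured beacon URL can be split into its event name and parameters with the standard URL API. The sketch below is only illustrative: `parseGa4Beacon` and `Ga4Hit` are hypothetical names, and the scheme and host in the sample URL are assumptions made so the string parses.

```typescript
// Parsed view of one GA4 beacon: the event name plus its "ep."-prefixed event parameters.
interface Ga4Hit {
  eventName: string | null;
  params: Record<string, string>;
}

function parseGa4Beacon(requestUrl: string): Ga4Hit {
  const query = new URL(requestUrl).searchParams;
  const params: Record<string, string> = {};
  query.forEach((value, key) => {
    // Keep only event parameters and strip the "ep." prefix.
    if (key.startsWith("ep.")) params[key.slice(3)] = value;
  });
  return { eventName: query.get("en"), params };
}

// Sample URL built from the beacon shown above (scheme and host are illustrative).
const ga4Hit = parseGa4Beacon(
  "https://google-analytics.com/collect?en=gallery_open&ep.article_id=798801&ep.premium=true",
);
```

Note that every parameter arrives as a string, so a rule comparing article_id would compare against "798801" rather than a number.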
Gemius

What audience measurement saw

A network beacon to a Gemius host with an et parameter — the event type. Coarser-grained but the source of truth for industry traffic reporting.

.gemius.com/?et=view
  &extra=articleId%3D798801
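
The same approach works for the Gemius beacon, with one extra step because the extra value is itself an encoded key=value pair. This is a sketch under assumptions: `parseGemiusBeacon` is a hypothetical helper, the host in the sample URL is a placeholder, and the code assumes extra carries a single pair as in the example above.

```typescript
interface GemiusHit {
  eventType: string | null;
  extra: Record<string, string>;
}

function parseGemiusBeacon(requestUrl: string): GemiusHit {
  const query = new URL(requestUrl).searchParams;
  const extra: Record<string, string> = {};
  // searchParams.get() percent-decodes, so "articleId%3D798801" arrives as "articleId=798801".
  const rawExtra = query.get("extra");
  if (rawExtra !== null) {
    const separator = rawExtra.indexOf("=");
    if (separator > 0) {
      // Assumption: one key=value pair per "extra" parameter.
      extra[rawExtra.slice(0, separator)] = rawExtra.slice(separator + 1);
    }
  }
  return { eventType: query.get("et"), extra };
}

// Placeholder host; only the query string mirrors the beacon shown above.
const gemiusHit = parseGemiusBeacon(
  "https://gemius-host.example/?et=view&extra=articleId%3D798801",
);
```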

The three trackers do not always share an event name: a single "gallery opened" action lands in DataLayer as gallery_open, in GA4 as gallery_open, and in Gemius as view. We track the canonical mapping in data/event-rules.json, with one row per flow and three names per row.

The contract

What we expect for the 9 baseline flows.

The BI team owns this table. Edit it to change what the robot looks for; no code change required.

| Logical flow | Applies to page type | DataLayer | GA4 | Gemius | Count |
|---|---|---|---|---|---|
| Page ready | homepage | page_ready | page_view | view | ≥1 |
| Pagination: next | category_paginated | page_next | page_next | view | ≥1 |
| Pagination: previous | category_paginated | page_previous (also accepts page_prev) | page_previous | view | ≥1 |
| Gallery open | article_with_gallery | gallery_open | gallery_open | view | ≥1 |
| Gallery: next | article_with_gallery | gallery_next | gallery_next | view | ≥1 |
| Gallery: previous | article_with_gallery | gallery_previous | gallery_previous | view | ≥1 |
| Player start | article_with_vplayer | player_start | player_start | stream | ≥1 |
| Article part: next | article_multipart | articlePart_next | articlePart_next | view | ≥1 |
| Article part: previous | article_multipart | articlePart_previous | articlePart_previous | view | ≥1 |

≥1 ("at least once") is the default count rule: any captured matching name satisfies the contract. The BI team can tighten any rule at any time to exactlyOnce (duplicates fail), maxOnce (the event is optional, but more than one occurrence fails), or zero (the event must NOT appear).
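
To make the count rules concrete, here is a minimal sketch of how one contract row and its count rule could be evaluated. The row shape, the flow identifier, and the name `atLeastOnce` for the ≥1 default are assumptions, not the actual schema of data/event-rules.json. The sketch also reads maxOnce as "zero or one occurrence", which is an interpretation.

```typescript
// "atLeastOnce" is a hypothetical name for the >=1 default; the other three appear on this page.
type CountRule = "atLeastOnce" | "exactlyOnce" | "maxOnce" | "zero";

// Hypothetical row shape: accepted names per tracker plus one count rule.
interface EventRule {
  flow: string;
  pageType: string;
  dataLayer: string[];
  ga4: string[];
  gemius: string[];
  count: CountRule;
}

const paginationPrevious: EventRule = {
  flow: "pagination_previous",
  pageType: "category_paginated",
  dataLayer: ["page_previous", "page_prev"], // the alias is accepted too
  ga4: ["page_previous"],
  gemius: ["view"],
  count: "atLeastOnce",
};

// Count how many captured event names match one of the accepted names.
function countMatches(accepted: string[], captured: string[]): number {
  return captured.filter((name) => accepted.includes(name)).length;
}

function satisfiesCount(rule: CountRule, observed: number): boolean {
  switch (rule) {
    case "atLeastOnce":
      return observed >= 1;
    case "exactlyOnce":
      return observed === 1;
    case "maxOnce":
      return observed <= 1; // assumed reading: optional, never more than one
    case "zero":
      return observed === 0;
  }
}

const capturedDataLayer = ["page_ready", "page_prev"];
const dataLayerOk = satisfiesCount(
  paginationPrevious.count,
  countMatches(paginationPrevious.dataLayer, capturedDataLayer),
);
```

Here the captured page_prev satisfies the row through the alias, so `dataLayerOk` is true.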

Try it

Set up a page, run the robot, see the verdict.

Toggle the page state on the left, pick what action the robot should take, then click Run robot. The right side walks through every decision the classifier made and lands on a verdict.

1. The page state

What's actually visible on the page when the robot arrives.

Has a gallery: embedded image gallery widget
Has a video player (VPlayer): embedded CNC video
Has a paywall: premium content gate visible
User is premium: paid subscriber session
Consent accepted: cookie banner was dismissed
Modal still blocking: an overlay intercepts clicks

2. What the robot does

The robot's reasoning

Failure kinds

Eight verdicts, and what each one means.

The classifier never silently passes a row. Every failed row carries a failureKind that routes it to the right channel and tells the right team to look.

tracking-broken: expected event missing
duplicate: event fired too many times
under-counted: fewer events than required
param-mismatch: required param missing/wrong
robot-broken: modal blocked, selector gone
site-broken: page didn't load / network
timeout: action took too long
unclassified: no rule matched, needs triage
The headline case

Why "article has no gallery" doesn't trigger a false alarm anymore.

This is the bug the v1.1 milestone was built to fix. Before v1.1, the robot would call galleryOpen on every article — even ones that didn't have a gallery — and report a missing gallery event as a tracking failure. The BI team got noise instead of signal.

1. Robot arrives at an article with no gallery widget

The robot's job is to test the gallery flow. It calls galleryOpen regardless of what's on the page, because that is all the action does.

2. Evidence collector inspects the page

Before validating events, the robot probes the DOM: are gallery selectors visible? In this case they are not, so hasGallery: false lands in the evidence snapshot.

3. Classifier derives the actual page type

Because the action is galleryOpen but the page has no gallery, the page type is resolved as article_standard, not article_with_gallery.

4. The gallery rule looks for its match

The gallery-open rule says "I apply to article_with_gallery". This page is article_standard. The rule simply doesn't apply here.

Verdict: ui-mismatch-suppressed, routed to #bi-triage

Not a tracking failure. The BI team isn't paged. Instead the row is suppressed (silent in #bi-alerts) and surfaced quietly to triage so an analyst can confirm the suppression heuristic was right.
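
The four steps above can be sketched as a small decision function. This is a simplified illustration, not the project's classifier: the `playerStart` action, the `hasVideoPlayer` field, and the function names are assumptions, and only three verdicts are modelled.

```typescript
type Action = "galleryOpen" | "playerStart"; // "playerStart" is a hypothetical action name
type PageType = "article_standard" | "article_with_gallery" | "article_with_vplayer";
type Verdict = "pass" | "tracking-broken" | "ui-mismatch-suppressed";

// Evidence snapshot collected from the DOM before events are validated.
interface Evidence {
  hasGallery: boolean;
  hasVideoPlayer: boolean; // hypothetical field, by analogy with hasGallery
}

// Step 3: derive the page type from what is actually on the page for this action.
function resolvePageType(action: Action, evidence: Evidence): PageType {
  if (action === "galleryOpen" && evidence.hasGallery) return "article_with_gallery";
  if (action === "playerStart" && evidence.hasVideoPlayer) return "article_with_vplayer";
  return "article_standard";
}

// Step 4: a rule only judges pages of the type it applies to.
function classify(
  action: Action,
  ruleAppliesTo: PageType,
  evidence: Evidence,
  expectedEventSeen: boolean,
): Verdict {
  if (resolvePageType(action, evidence) !== ruleAppliesTo) return "ui-mismatch-suppressed";
  return expectedEventSeen ? "pass" : "tracking-broken";
}

const noGallery: Evidence = { hasGallery: false, hasVideoPlayer: false };
const withGallery: Evidence = { hasGallery: true, hasVideoPlayer: false };

const headlineCase = classify("galleryOpen", "article_with_gallery", noGallery, false);
const realFailure = classify("galleryOpen", "article_with_gallery", withGallery, false);
```

The same missing event yields `ui-mismatch-suppressed` when the gallery is absent and `tracking-broken` when it is present, which is the distinction v1.1 introduced.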

Slack output

Five formats, one underlying signal.

Same incident, different lenses. Detailed for the first hit. Compact for mobile triage. Minimal for high-volume channels. Daily / weekly / monthly summaries roll up the trend.

Dispatch flow

For each failing row, src/lib/slackDispatch.ts walks five gates in order. The first one to suppress short-circuits the rest; a row that survives all five becomes a single Slack POST.

1. quietHours

Suppress non-severe kinds during configured window. Severe kinds (e.g. site-broken) always pass.

2. router

Look up (channel, template, dedupWindow) for the verdict in slack-routing.json.

3. mentions

Resolve owner pings from owners.json: default + by-property + by-failureKind, deduplicated.

4. template

Render the Block Kit payload: new-functional, technical-failure, or a compact / minimal variant.

5. POST

Record the incident first (atomic _create); only POST when status is created. Repeats are swallowed.
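
The short-circuit behaviour of the five gates can be sketched as a loop over an ordered list. This is not the code in src/lib/slackDispatch.ts: the `Gate` type is hypothetical, the router, mentions, and template gates are reduced to pass-throughs, and an in-memory set stands in for the atomic incident record.

```typescript
interface Row {
  site: string;
  failureKind: string;
}

type GateResult = { suppress: true; reason: string } | { suppress: false };
type Gate = { name: string; run: (row: Row) => GateResult };

// Walk the gates in order; the first one that suppresses stops the walk.
function dispatch(row: Row, gates: Gate[]): { posted: boolean; stoppedAt?: string } {
  for (const gate of gates) {
    if (gate.run(row).suppress) return { posted: false, stoppedAt: gate.name };
  }
  return { posted: true };
}

const quietWindowActive = true; // pretend the run falls inside the quiet window
const severeKinds = ["site-broken"];
const seenIncidents = new Set<string>(); // stand-in for the incident store

const gates: Gate[] = [
  {
    name: "quietHours",
    run: (row) =>
      quietWindowActive && !severeKinds.includes(row.failureKind)
        ? { suppress: true, reason: "quiet hours" }
        : { suppress: false },
  },
  { name: "router", run: () => ({ suppress: false }) },
  { name: "mentions", run: () => ({ suppress: false }) },
  { name: "template", run: () => ({ suppress: false }) },
  {
    name: "POST",
    run: (row) => {
      const key = `${row.site}:${row.failureKind}`;
      if (seenIncidents.has(key)) return { suppress: true, reason: "repeat" };
      seenIncidents.add(key); // record first, then allow the POST
      return { suppress: false };
    },
  },
];

const duringQuiet = dispatch({ site: "blesk.cz", failureKind: "tracking-broken" }, gates);
const severeFirst = dispatch({ site: "blesk.cz", failureKind: "site-broken" }, gates);
const severeRepeat = dispatch({ site: "blesk.cz", failureKind: "site-broken" }, gates);
```

In this run the non-severe row stops at quietHours, the first site-broken row is posted, and its repeat is swallowed at the POST gate.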

When this fires: First detection of a functional BI issue (tracking-broken, param-mismatch, duplicate, under-counted).

:rotating_light: BI measurement issue detected (#bi-alerts)
Site: blesk.cz
Page type: article_with_gallery
Device: Desktop
URL: https://www.blesk.cz/clanek/…

Issue type: Missing event
Logical event: Gallery open
DataLayer: Pass
GA4: Pass
Gemius: Fail

Expected: Gemius beacon with et=view
Actual: No matching Gemius request captured

UI evidence: hasGallery, hasArticleBody, consent:didomi
Likely meaning: Gallery rendered and DataLayer/GA4 fired, but the Gemius install missed it, most likely because of a Gemius script-load or rate-limit issue.
Classification: Functional BI issue
[View in Grafana]
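
For readers curious what such a message looks like on the wire, here is a hedged sketch of a Slack Block Kit payload carrying a few of the fields above. It uses the standard header and section block types, but `buildAlertBlocks` and the field selection are hypothetical; the project's real templates may be structured differently.

```typescript
interface AlertFields {
  site: string;
  pageType: string;
  device: string;
  issueType: string;
}

// Build a two-block payload: a header line plus a grid of label/value fields.
function buildAlertBlocks(alert: AlertFields) {
  return {
    blocks: [
      {
        type: "header",
        // Header blocks accept plain_text only; emoji: true renders the shortcode.
        text: { type: "plain_text", text: ":rotating_light: BI measurement issue detected", emoji: true },
      },
      {
        type: "section",
        fields: [
          { type: "mrkdwn", text: `*Site*\n${alert.site}` },
          { type: "mrkdwn", text: `*Page type*\n${alert.pageType}` },
          { type: "mrkdwn", text: `*Device*\n${alert.device}` },
          { type: "mrkdwn", text: `*Issue type*\n${alert.issueType}` },
        ],
      },
    ],
  };
}

const payload = buildAlertBlocks({
  site: "blesk.cz",
  pageType: "article_with_gallery",
  device: "Desktop",
  issueType: "Missing event",
});
const payloadJson = JSON.stringify(payload);
```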
Coverage matrix

Twelve sites, five flows, one glance.

Which user flow runs against which property. A check means the robot has a configured URL + selectors for that flow on that site; a dash means the flow isn't applicable (no multipart articles, no video, …). Derived from data/config-per-domain/*.json.

Columns: Site | Homepage (page_ready) | Pagination (page_next · page_prev) | Gallery (galleryOpen · gallery_next · gallery_previous) | Video (player) | Article parts (articlePart_next · articlePart_previous)
abc
ahaonline
auto
autorevue
blesk
dama
e15
maminka
mojezdravi
reflex
zeny
zive
Legend: check = configured flow, robot runs it nightly; dash = not applicable for this property.
What's still missing

Eight candidate features, and where each one stands.

Honest list: each item is a real BI ask, tagged shipped, later, or maybe, with a one-line take on what it involves.

shipped

All-clear notifications

Wired into scripts/incident-close.ts. Closing an incident posts a small "resolved" message via buildAllClear() to the same channel as the original alert (webhook-friendly; thread reply still needs a bot token).

shipped

Owner pings per property

Resolved at dispatch time via src/lib/owners.ts against data/owners.json — default + by-property + by-failureKind, de-duplicated. Mentions appear inside the alert body for webhook clients.
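
The merge order described above (default, then by-property, then by-failureKind, with duplicates removed) might look roughly like this. The config shape and the handles are invented for illustration and do not reflect the real contents of data/owners.json.

```typescript
// Hypothetical shape for the owners config.
interface OwnersConfig {
  default: string[];
  byProperty: Record<string, string[]>;
  byFailureKind: Record<string, string[]>;
}

function resolveMentions(owners: OwnersConfig, property: string, failureKind: string): string[] {
  const all = [
    ...owners.default,
    ...(owners.byProperty[property] ?? []),
    ...(owners.byFailureKind[failureKind] ?? []),
  ];
  // A Set keeps first-seen order, so the default owners stay first after de-duplication.
  return Array.from(new Set(all));
}

// Made-up handles, for illustration only.
const owners: OwnersConfig = {
  default: ["@bi-oncall"],
  byProperty: { blesk: ["@anna", "@bi-oncall"] },
  byFailureKind: { "tracking-broken": ["@anna", "@tracking-team"] },
};

const mentions = resolveMentions(owners, "blesk", "tracking-broken");
const fallback = resolveMentions(owners, "zive", "timeout");
```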

later

Recurrence escalation

If the same incident has occurred N times in M hours, escalate — page on-call, ping a Slack group. Threshold lives in slack-routing.json.
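
Since this feature is not built, the following is only one way the "N times in M hours" test could be expressed. `shouldEscalate` and its parameters are hypothetical, and timestamps are assumed to be epoch milliseconds.

```typescript
// True when at least minCount occurrences fall inside the last windowHours.
function shouldEscalate(
  occurrencesMs: number[],
  nowMs: number,
  minCount: number,
  windowHours: number,
): boolean {
  const windowStartMs = nowMs - windowHours * 60 * 60 * 1000;
  const recent = occurrencesMs.filter((t) => t > windowStartMs && t <= nowMs);
  return recent.length >= minCount;
}

const HOUR_MS = 60 * 60 * 1000;
const now = 100 * HOUR_MS;

// Three occurrences in the last 6 hours, one older.
const occurrences = [now - 30 * HOUR_MS, now - 5 * HOUR_MS, now - 2 * HOUR_MS, now - HOUR_MS];

const escalateAtThree = shouldEscalate(occurrences, now, 3, 6);
const escalateAtFour = shouldEscalate(occurrences, now, 4, 6);
```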

later

Threading repeats

Repeat occurrences reply to the original alert's thread instead of a new top-level message. Needs Slack bot token + thread_ts (today we're webhook-only). Cleanly collapses noise.

later

Slack reactions for triage

:eyes: = investigating, :white_check_mark: = resolved (auto-closes the incident). Needs the same Slack bot token as threading.

shipped

Threshold-breach meta-alerts

Hourly cron in scripts/slack-threshold-check.ts (scheduled by pipelines/pipeline-threshold-check.yml) inspects last-hour failure rates and posts buildThresholdBreach() when a kind exceeds its threshold.
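
A minimal sketch of the threshold comparison, assuming the failure rate is "failures of one kind divided by all rows checked in the hour". That definition, the `HourStats` shape, and the threshold values are assumptions; the real script may compute rates differently.

```typescript
interface HourStats {
  totalRows: number;
  failuresByKind: Record<string, number>;
}

// Return the failure kinds whose last-hour rate exceeds their configured threshold.
function findBreaches(stats: HourStats, thresholds: Record<string, number>): string[] {
  if (stats.totalRows === 0) return []; // nothing ran, so nothing to compare
  return Object.entries(thresholds)
    .filter(([kind, limit]) => (stats.failuresByKind[kind] ?? 0) / stats.totalRows > limit)
    .map(([kind]) => kind);
}

const lastHour: HourStats = {
  totalRows: 200,
  failuresByKind: { "tracking-broken": 30, timeout: 4 },
};

// Illustrative thresholds: 10% for tracking-broken, 5% for timeout.
const breaches = findBreaches(lastHour, { "tracking-broken": 0.1, timeout: 0.05 });
```

With these numbers tracking-broken sits at 15% and breaches its limit, while timeout at 2% does not.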

later

Cross-incident correlation

"5 different sites all started failing at 14:32" → meta-alert with the time pattern highlighted. Probably catches a third-party tracker outage minutes earlier than today.

maybe

Quiet hours

Suppress alerts 22:00-08:00 except for high-severity. Real value depends on whether the BI team is genuinely on-call overnight — open question.
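
One detail worth spelling out is that a 22:00-08:00 window wraps past midnight, so a plain "start ≤ hour < end" test does not work. The sketch below shows one way to handle that; the function names and the choice of site-broken as the only high-severity kind are assumptions.

```typescript
// True when `hour` (0-23) falls inside the window, including windows that wrap past midnight.
function inQuietHours(hour: number, startHour: number, endHour: number): boolean {
  if (startHour === endHour) return false; // treat an empty window as "never quiet"
  return startHour < endHour
    ? hour >= startHour && hour < endHour // same-day window, e.g. 01:00-05:00
    : hour >= startHour || hour < endHour; // wrapping window, e.g. 22:00-08:00
}

const highSeverityKinds = ["site-broken"]; // assumed severity list

function suppressForQuietHours(failureKind: string, hour: number): boolean {
  return inQuietHours(hour, 22, 8) && !highSeverityKinds.includes(failureKind);
}

const lateNightTracking = suppressForQuietHours("tracking-broken", 23);
const lateNightOutage = suppressForQuietHours("site-broken", 23);
const middayTracking = suppressForQuietHours("tracking-broken", 12);
```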