Crawl-assisted catalog of every CNC property the BI tool should monitor — the 12 existing per-domain configs plus subdomains currently outside coverage (tv.blesk.cz, isport.blesk.cz, prozeny.blesk.cz, video properties). For each (property × pageType) cell: example URL, viewport coverage, and an advisory tracking-presence probe.
Two artifacts: a machine-readable data/site-inventory.json and a BI-facing docs/inventory/2026-05-26.html. The crawler is read-only against production — no clicks, no consent, no login. Tracking-presence probes are advisory only; Phase 10's runtime evidence is the authoritative signal.
Out of scope: writing event rules (Phase 9), adding per-domain configs for newly-discovered subdomains (separate PRs after Phase 9), any login or premium probe (deferred to v1.2), any runtime change to the orchestrator or robots.
Seed list = existing 12 hosts from data/config-per-domain/*.json + a hand-curated subdomain seed (tv.blesk.cz, isport.blesk.cz, prozeny.blesk.cz, video.auto.cz, …).
Anchor harvest: Playwright opens each seed root, collects the hostnames of all a[href] links, keeps those on same-org domains, and emits them as candidate subdomains.
Manual classification — the BI lead marks each candidate in-scope / out-of-scope / unsure before it lands in the inventory. Discovered subdomains never auto-add.
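The filtering step of the anchor harvest could look roughly like the sketch below. It assumes the hrefs have already been read from the page (for example by Playwright) and that "same-org" means the hostname equals or ends with one of a configured list of organisation domains. The function name, parameters, and that matching rule are illustrative assumptions, not the crawler's actual code.

```typescript
// Sketch only: harvestCandidateHosts and its parameters are hypothetical names.

/**
 * Returns hosts found in `hrefs` that belong to one of `orgDomains` and are
 * not already in `knownHosts`. Output is de-duplicated and sorted.
 */
function harvestCandidateHosts(
  hrefs: string[],
  pageUrl: string,
  orgDomains: string[],
  knownHosts: string[],
): string[] {
  const known = new Set(knownHosts.map((h) => h.toLowerCase()));
  const candidates = new Set<string>();

  for (const href of hrefs) {
    let url: URL;
    try {
      // Resolve relative links against the page they were found on.
      url = new URL(href, pageUrl);
    } catch {
      continue; // skip malformed hrefs
    }
    // Ignore mailto:, javascript:, tel: and similar schemes.
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;

    const host = url.hostname.toLowerCase();
    // Assumed rule: exact match or a subdomain of an organisation domain.
    const sameOrg = orgDomains.some((d) => host === d || host.endsWith("." + d));
    if (sameOrg && !known.has(host)) candidates.add(host);
  }
  return Array.from(candidates).sort();
}
```

The result is only a candidate list. Per the manual classification step, nothing returned here would enter the inventory until the BI lead has marked it.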
tv.blesk.cz and blesk.cz share zero selectors and have different tracking implementations (TV likely has additional video lifecycle events). The variant-mode approach would create N×M conditionals inside every robot.
The cost is one extra config per subdomain — the same cost the existing 12 already pay.
Phase 8 produces inventory property keys only. Per-domain config PRs are separate work, post-Phase 9.
homepage · category · category_paginated · article_standard · article_multipart · article_premium_locked · article_premium_unlocked · article_with_gallery · article_with_vplayer · gallery_standalone · video_standalone · paywall · login
category_paginated and article_multipart are explicit because Phase 9 must derive rules for the existing page_next/page_prev and articlePart_* flows (see data/event-mapping.json:3–4,9–10 and per-domain nextpage/part URLs).
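One way to keep the 13-column taxonomy checkable is to hold it as a constant and compare each property's pageTypes object against it. The sketch below assumes pageTypes is keyed by the page-type names listed above; the helper name missingPageTypes is hypothetical.

```typescript
// The 13 page types from the taxonomy, as a readonly tuple.
const PAGE_TYPES = [
  "homepage",
  "category",
  "category_paginated",
  "article_standard",
  "article_multipart",
  "article_premium_locked",
  "article_premium_unlocked",
  "article_with_gallery",
  "article_with_vplayer",
  "gallery_standalone",
  "video_standalone",
  "paywall",
  "login",
] as const;

type PageType = (typeof PAGE_TYPES)[number];

// Hypothetical helper: lists taxonomy keys that are absent from a property's
// pageTypes object. A key whose value is null still counts as present.
function missingPageTypes(pageTypes: Record<string, unknown>): PageType[] {
  return PAGE_TYPES.filter((t) => !(t in pageTypes));
}
```

A check like this would flag a property that silently omits category_paginated or article_multipart, the two columns Phase 9 depends on.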
Each cell is one of: null (doesn't exist), {urls,viewports,probe} (populated), {status:"unknown",reason} (couldn't probe), {status:"deferred",reason} (intentionally not probed).
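The four cell shapes map naturally onto a TypeScript union. The sketch below is illustrative: the element types of urls, viewports, and probe are assumptions, since only the field names are given here, and cellKind is a hypothetical helper.

```typescript
type Cell =
  | null // page type doesn't exist on this property
  | {
      urls: string[]; // assumed: list of example URLs
      viewports: string[]; // assumed: viewport labels
      probe: Record<string, boolean>; // assumed: advisory flags such as hasDataLayer
    }
  | { status: "unknown"; reason: string } // couldn't probe
  | { status: "deferred"; reason: string }; // intentionally not probed

type CellKind = "absent" | "populated" | "unknown" | "deferred";

// Hypothetical helper: collapses a cell to one of four labels.
function cellKind(cell: Cell): CellKind {
  if (cell === null) return "absent";
  if ("status" in cell) return cell.status;
  return "populated";
}
```

Keeping "absent" distinct from "unknown" and "deferred" is what lets a reader tell "doesn't exist" from "not probed".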
window.dataLayer exists and has ≥1 entry → hasDataLayer: true
*google-analytics.com* with collect in path → hasGA4: true
… → hasGemius: true
These values are advisory, not validation; Phase 10's UI evidence is authoritative at runtime.
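The first two probe rules can be expressed as a pure function over data the crawler has already captured. This is a sketch under assumptions: the dataLayer length and the list of observed request URLs are passed in by the caller, and the names deriveProbeFlags and ProbeFlags are hypothetical. hasGemius is left out because its condition is not stated here.

```typescript
interface ProbeFlags {
  hasDataLayer: boolean;
  hasGA4: boolean;
}

// dataLayerLength: length of window.dataLayer, or null if it does not exist.
// requestUrls: URLs of network requests observed while the page loaded.
function deriveProbeFlags(dataLayerLength: number | null, requestUrls: string[]): ProbeFlags {
  const hasDataLayer = dataLayerLength !== null && dataLayerLength >= 1;

  const hasGA4 = requestUrls.some((raw) => {
    try {
      const u = new URL(raw);
      // Host contains google-analytics.com and the path contains "collect".
      return u.hostname.includes("google-analytics.com") && u.pathname.includes("collect");
    } catch {
      return false; // ignore strings that are not valid absolute URLs
    }
  });

  return { hasDataLayer, hasGA4 };
}
```

Because the crawler runs without consent, a false value here says little about the real tracking setup, which is why the flags stay advisory.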
docs/inventory/<date>.html.
{
  "generatedAt": "2026-05-26T12:00:00Z",
  "properties": [
    {
      "key": "blesk",
      "name": "Blesk",
      "host": "www.blesk.cz",
      "scope": "in-scope",
      "parentKey": null,
      "pageTypes": { /* … */ }
    }
  ]
}
Premium-unlocked articles need a logged-in premium session to render. Mixing login into discovery would couple two failure modes, so v1.1 instead marks article_premium_unlocked as {status:"unknown",reason:"requires-premium-session"}.
JSON lands in repo. HTML at cnc-bi-events.pages.dev/inventory/<date>. README's staged-deploy snippet now creates the inventory/ subdirectory and copies the HTML in.
Codex flagged that the original "≥18 properties / ≥4 page types" criteria were quotas without grounding. Replaced with classification completeness.
… out-of-scope/deferred with a reason. No silent drops.
… null (doesn't exist) or {status:"deferred",reason} (not probed).
docs/inventory/<date>.html deploys to cnc-bi-events.pages.dev/inventory/<date> and matches the existing visual system.
… inventory/ subdir before copying (mkdir -p "$STAGE/inventory").
tsconfig.json includes scripts/ so npm run check typechecks the new crawler. Today tsconfig.json:14 excludes it.
npm run check + npm run test:logic pass with no regression.
hasGA4:false from the probe is documented as "unauthenticated, no consent"; Phase 9/10 rules don't trust probe values.
… robots.txt? (Recommend yes; override flag for exceptions.)
Original "≥18 properties" and "≥4 page types per property" were quotas without grounding. Replaced with classification completeness; narrow properties allowed; cell shape extended with unknown/deferred to distinguish "doesn't exist" from "not probed".
The 11-column taxonomy missed category_paginated and article_multipart — both are current first-class flows. Without them Phase 9 can't derive rules for page_next/page_prev/articlePart_*. Taxonomy widened to 13 columns.
tsconfig.json:14 excludes scripts/ — npm run check would silently skip the new crawler. README staged-deploy is flat-file only — inventory/<date>.html needs mkdir -p first. Both surfaced as explicit success criteria.