9 guards in production (check-blog-registration / check-blog-related-links --strict-blog-only / check-orphan-routes / check-seo-page-entries / check-tools-title-list / check-page-tsx-files / check-prerender-skip-completeness / check-seasonal-anchor-staleness --advisory / check-blog-date-invariant) × 0-defect baseline at introduction × synthetic-fixture failure-path verification across all 9 — past the 5-instance reusable-template threshold and into automatic-template-clone territory whenever any new corpus-scale cross-file invariant surfaces
A corpus-canonical engineering discipline for catching editorial-registration drift before deploy. The template: a Node script `scripts/check-<thing>.cjs` with strict and advisory modes; regex extraction from one or more authoritative files; a diff against a truth-source; a 0-defect baseline at first run (including any companion fixes the guard surfaces in the same commit); synthetic-fixture failure-path verification before commit; and finally wiring into `.github/workflows/deploy.yml` as a strict-mode pre-deploy step. The 9 production guards span four structural archetypes: (1) cross-file consistency code↔code, where two registries must list the same set (check-blog-registration: every blog file must appear in `blogPostMeta.ts` AND `SitemapController.java`; check-blog-related-links: every internal `/blog/*` ref must resolve to a blog file; check-orphan-routes: every static `<Route path>` in `App.tsx` must have an `addUrl` in `SitemapController.java`; check-seo-page-entries: every audit-set route must have a `path:` literal in `generate-seo-pages.cjs`; check-tools-title-list: every `/tools/*` audit-set entry must have a `['<path>', 'Label']` tuple in the prerender homepage list); (2) cross-domain consistency code↔filesystem, where a registry's references must resolve to actual on-disk files (check-page-tsx-files: every `lazy(() => import('./pages/X'))` in `App.tsx` must resolve to a `frontend/src/pages/X.tsx` file); (3) cross-file consistency with a code-shape-derived prefix, where the truth-source is computed from one file's code shape and asserted against another file's array literal (check-prerender-skip-completeness: every parameterized `<Route path="/foo/:slug">` in `App.tsx` whose component imports from `services/api` must have its prefix in `apiDependentPrefixes` in `prerender-pages.cjs`); (4) cross-file invariants on data shape (check-seasonal-anchor-staleness, advisory: each seasonal-anchor page's `dateModified` must be within the configured staleness threshold of today; check-blog-date-invariant: every blog's `dateModified` must be ≥ its `datePublished`, per Google's schema-validator constraint).
Sub-rule (form-agnostic): the template generalises across page-object form (`{ path: '/foo' }`), tuple-array form (`['/foo', 'Label']`), lazy-import form (`lazy(() => import('./pages/X'))`), and route-prefix-derivation form (parameterized routes + service-import detection). Sub-rule (domain-agnostic): the template generalises across code↔code (the archetype-1 guards), code↔filesystem (check-page-tsx-files), and code↔code-with-derived-prefix (check-prerender-skip-completeness).
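The diff-against-truth-source core shared by all archetypes can be sketched in a few lines. This is a minimal illustration, not any of the production scripts: `findMissing` and `runGuard` are hypothetical names, the inputs are in-memory arrays rather than sets extracted from `blogPostMeta.ts` or `SitemapController.java`, `example-post` is an invented slug, and the warning text is made up. A guard like check-blog-registration would run the same diff once per registry.

```javascript
// Minimal sketch of the guard template's core (hypothetical names):
// diff the entries a truth-source requires against the entries a registry
// actually lists, then map the findings to a strict/advisory exit code.

/**
 * Entries present in `truth` but absent from `registry`, sorted.
 * @param {Iterable<string>} truth
 * @param {Iterable<string>} registry
 * @returns {string[]}
 */
function findMissing(truth, registry) {
  const registered = new Set(registry);
  return [...new Set(truth)].filter((entry) => !registered.has(entry)).sort();
}

/**
 * Strict mode exits 1 on any finding; advisory mode reports the same
 * warnings but always exits 0.
 */
function runGuard({ name, truth, registry, mode }) {
  const missing = findMissing(truth, registry);
  const warnings = missing.map((entry) => `[${name}] not registered: ${entry}`);
  const exitCode = mode === 'strict' && missing.length > 0 ? 1 : 0;
  return { exitCode, warnings };
}

// Example: a blog slug that exists on disk but is missing from the registry.
// In a real script: result.warnings.forEach((w) => console.error(w));
//                   process.exit(result.exitCode);
const result = runGuard({
  name: 'check-blog-registration',
  truth: ['honey-tasting-guide', 'example-post'],
  registry: ['example-post'],
  mode: 'strict',
});
console.log(result);
```

Keeping the diff and exit-code mapping separate from file I/O is what makes the skeleton transferable: only the code that builds `truth` and `registry` changes from guard to guard.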
- The 9-instance application set is the empirical evidence chain. check-blog-registration was first, motivated by the April 2026 honey-tasting-guide incident, in which a blog file shipped without `blogPostMeta.ts` registration and was invisible to Google for weeks. check-blog-related-links --strict-blog-only was added on 2026-04-29 (102 broken refs across 65 files at introduction, auto-fixed via 42 noun↔adjective and topic↔topic-benefits mappings). check-orphan-routes shipped 2026-04-29 with a baseline of 0 orphans after the morning fix of `/tools/quiz-embed` (a silent gap from 2026-02-08 to 2026-04-29). check-seo-page-entries shipped 2026-04-29 with a baseline of 0 missing on a 222-route audit set checked against 790 SEO path literals. check-tools-title-list shipped 2026-04-30 with a baseline of 0 missing (62/62 tools audit-set entries present). check-page-tsx-files shipped 2026-04-30 with a baseline of 0/239 missing; it closes the speculative-registration runtime-blank failure mode, in which every cross-file consistency check passes but the underlying lazy chunk has no file to fetch (precedent: `/learn/honey-conflict-zones`). check-prerender-skip-completeness shipped 2026-04-30 with a baseline of 0/4 missing after a companion fix that added `/local/` to `apiDependentPrefixes`. That guard surfaced a latent gap the prior day's curl-driven manual audit had missed: `/local/<slug>` generates no static HTML today, so it does not appear in any prerender output to inspect. Code-shape detection caught what an output-shape audit could not. check-seasonal-anchor-staleness was added in advisory mode; check-blog-date-invariant catches `dateModified < datePublished` per the schema-validator constraint.
- 0-defect baseline at introduction is the editorial payoff. Every guard ships with the baseline at zero — including any companion fixes the guard surfaces in the same commit. The check-prerender-skip-completeness ship is canonical: the guard surfaced `/local/:slug` as a latent gap on first audit, and the same commit added `/local/` to `apiDependentPrefixes`, preserving the 0-defect baseline rule. Sub-rule: when a NEW guard surfaces gaps on first audit, ship the gap-fix in the same commit rather than as a follow-up — the 0-defect baseline is invariant across the introduction commit.
- Synthetic-fixture failure-path verification is canonical and non-negotiable. Clean-baseline runs alone don't verify the sigil and exit-code logic: a guard that exits 0 on a clean corpus has not demonstrated that it would exit 1 on a violation. Each guard's introduction commit includes a synthetic fixture (typically a temporary file injecting a fake violation) and runs the guard against it, expecting exit 1 in strict mode and exit 0 with the same warning in advisory mode; the fixture is then restored and the guard must exit 0 again. Without this verification, a regex bug or path-comparison bug could silently produce false 0-defect baselines indefinitely.
- Regex extraction must handle BOTH the canonical multi-line form AND the inline form. Production data may follow one canonical form, but the synthetic fixture stresses the regex with a minimum-line-count input. Precedent: check-seo-page-entries initially used the line-anchored `^\s*path:`, which matched production but missed the synthetic inline `{ path: '/foo' }`; the durable fix `(?:^\s*|[{,]\s*)path:` covers both. Sub-rule: test regex extraction against the minimum-line-count synthetic fixture, not only the production-typical multi-line form, because the synthetic fixture is the failure-path verification surface.
- When an existing guard is scoped to a single route class, expanding it across all sibling route classes is usually higher-leverage than authoring a new dedicated guard. The audit-logic skeleton transfers; only the truth-source parser changes per class. Precedent: check-blog-related-links was originally scoped to /blog/<slug> internal refs; expanded 2026-04-30 from 1 to 5 route classes (/blog + /learn + /tools + /compare + /best-honey-for) by parameterising the truth-source parser. Sub-rule: graduate each new class from advisory to strict via the `--strict-<class>-only` flag pattern as each baseline reaches zero, rather than a single all-or-nothing flip — the strict-blog-only mode preserves the original CI gate while keeping new classes advisory until baseline is paid down.
- Code-shape-driven CI guards have a structural advantage over output-shape-driven manual audits: they catch latent bugs that have not yet manifested in any output. check-prerender-skip-completeness operates on code shape (parameterized route + `services/api` import), not on rendered-page shape, which is why it caught `/local/:slug` even though the route generates no static HTML to inspect. Sub-rule: when the bug class is "X is registered in App.tsx but missing from Y," the guard's detection logic should match X's code shape (e.g. the parameterized-route pattern) rather than X's rendered output, which may not exist at audit time. The bug class closed by check-prerender-skip-completeness: a NEW `<Route path="/foo/:slug">` is added in App.tsx whose component fetches from `/api/<thing>/<slug>`, and generate-seo-pages.cjs builds static HTML for it, but the prefix is NOT in `apiDependentPrefixes`. Puppeteer's mocked API then returns 404, the component takes its "not found" branch (no `<SEO>`), the snapshot inherits the homepage `<head>`, and writeFileSync clobbers the static page-specific HTML. Net effect: a silent prod regression in which Nginx returns 200 and the page hydrates correctly, but the meta description in `<head>` is wrong. Precedent: 50 city pages and 1650 event pages all served the homepage's 140-char placeholder description in SERPs for months until 2026-04-30 (commit 640ee93a).
- Future corpus-scale drift surfaces are the candidate set for template clones. Strongest candidates: (a) check-honeys-json-image-urls.cjs: every `imageUrl` and `thumbnailUrl` in `backend/src/main/resources/seed-data/honeys.json` returns HTTP 200 from the R2 CDN (catches broken-R2-bucket cases like the 2026-04-29 wave of 404s on events imagery); (b) check-blog-source-urls.cjs: every external `<a href="https://...">` cited as a source in blog data files returns 200 (catches cite-rot in long-form posts); (c) check-country-guide-hero-prompt-recipe-completeness.cjs: every `scripts/_gen-<slug>-hero.cjs` head comment block contains all 5 §42 prompt-recipe elements (would prevent re-roll regressions as later instances ship); (d) honeys.json variety array vs. the generate-seo-pages variety list (page-object form); (e) learn-page hub link grid vs. SitemapController (tuple-array form); (f) footer link list vs. registered routes. Each cross-domain class is a fresh template-extension surface; the candidate set is bounded but not exhausted.
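The code↔filesystem archetype behind check-page-tsx-files reduces to extracting each lazy-import target and checking that a file exists for it. A minimal sketch, assuming the exact `lazy(() => import('./pages/X'))` shape quoted earlier and a `.tsx` extension; `findMissingPageFiles` and `LAZY_PAGE_RE` are hypothetical names, and in production `pagesDir` would point at `frontend/src/pages`.

```javascript
const fs = require('fs');
const path = require('path');

// Captures X from lazy(() => import('./pages/X')); assumes this exact shape.
const LAZY_PAGE_RE = /lazy\(\s*\(\)\s*=>\s*import\(\s*['"]\.\/pages\/([^'"]+)['"]\s*\)\s*\)/g;

// Returns the lazy-import targets that have no matching .tsx file on disk.
// Every cross-file registry check can pass while one of these is missing,
// which is the runtime-blank failure mode the guard closes.
function findMissingPageFiles(appSource, pagesDir) {
  const missing = [];
  for (const [, modulePath] of appSource.matchAll(LAZY_PAGE_RE)) {
    const candidate = path.join(pagesDir, `${modulePath}.tsx`);
    if (!fs.existsSync(candidate)) missing.push(modulePath);
  }
  return missing;
}

// Usage (hypothetical paths):
// const app = fs.readFileSync('frontend/src/App.tsx', 'utf8');
// const missing = findMissingPageFiles(app, 'frontend/src/pages');
```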
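The synthetic-fixture failure-path rule can be expressed as a small harness. This sketch assumes a guard core that takes an in-memory input plus a mode and returns `{ exitCode, warnings }`; `auditToolsTitleList`, `verifyFailurePath`, the `/tools/fake-fixture` path, and the `'Quiz Embed'` label are all hypothetical, and production guards inject the violation via a temporary file rather than an object.

```javascript
// Sketch of synthetic-fixture failure-path verification (hypothetical names).
// A guard core takes an in-memory input plus a mode and returns
// { exitCode, warnings }, so the same input can be checked in both modes.

function auditToolsTitleList({ auditSet, tuples }, mode) {
  const listed = new Set(tuples.map(([toolPath]) => toolPath));
  const warnings = auditSet
    .filter((toolPath) => !listed.has(toolPath))
    .map((toolPath) => `missing ['${toolPath}', 'Label'] tuple`);
  return { exitCode: mode === 'strict' && warnings.length > 0 ? 1 : 0, warnings };
}

// Passes only if the injected violation makes strict mode exit 1, advisory
// mode exit 0 with identical warnings, and the restored input exit 0 cleanly.
function verifyFailurePath(guard, cleanInput, violatingInput) {
  const strict = guard(violatingInput, 'strict');
  const advisory = guard(violatingInput, 'advisory');
  const restored = guard(cleanInput, 'strict');
  const checks = {
    strictFails: strict.exitCode === 1,
    advisoryPasses: advisory.exitCode === 0,
    warnsOnViolation: strict.warnings.length > 0,
    sameWarnings: JSON.stringify(strict.warnings) === JSON.stringify(advisory.warnings),
    restoredPasses: restored.exitCode === 0 && restored.warnings.length === 0,
  };
  return { ok: Object.values(checks).every(Boolean), checks };
}

const clean = {
  auditSet: ['/tools/quiz-embed'],
  tuples: [['/tools/quiz-embed', 'Quiz Embed']],
};
// Synthetic violation: one audit-set path with no matching tuple.
const violating = { ...clean, auditSet: [...clean.auditSet, '/tools/fake-fixture'] };

console.log(verifyFailurePath(auditToolsTitleList, clean, violating));

// A broken guard that always exits 0 passes a clean-baseline run but fails here.
const alwaysClean = () => ({ exitCode: 0, warnings: [] });
console.log(verifyFailurePath(alwaysClean, clean, violating).ok);
```

The `alwaysClean` case is the point of the harness: it is exactly the false-0-defect guard that a clean-baseline run alone cannot distinguish from a working one.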
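The check-seo-page-entries regex precedent can be reproduced directly. The two patterns below extend the `path:` prefixes quoted in the regex-extraction bullet with a string-literal capture group (`['"]([^'"]+)['"]`), which is an assumption about how the real script reads the value; the `m` flag makes `^` match at each line start, and `extractPaths` is a hypothetical helper.

```javascript
// Reproduces the check-seo-page-entries regex precedent. The capture group
// for the quoted value is an assumption added for illustration.

// Original line-anchored form: `path:` must start a line (after whitespace).
const LINE_ANCHORED = /^\s*path:\s*['"]([^'"]+)['"]/gm;
// Durable form: `path:` may also follow `{` or `,`, which covers inline objects.
const DURABLE = /(?:^\s*|[{,]\s*)path:\s*['"]([^'"]+)['"]/gm;

function extractPaths(source, pattern) {
  // matchAll requires the /g flag and iterates over a copy of the regex,
  // so repeated calls do not leak lastIndex state.
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Production-typical multi-line page object.
const multiLine = "{\n  path: '/foo',\n  title: 'Foo',\n}";
// Minimum-line-count synthetic fixture: two inline page objects on one line.
const inline = "{ path: '/foo' }, { path: '/bar' }";

console.log(extractPaths(multiLine, LINE_ANCHORED)); // [ '/foo' ]
console.log(extractPaths(inline, LINE_ANCHORED));    // []
console.log(extractPaths(multiLine, DURABLE));       // [ '/foo' ]
console.log(extractPaths(inline, DURABLE));          // [ '/foo', '/bar' ]
```

The second line is the failure the synthetic fixture exposed: the line-anchored pattern silently extracts nothing from inline objects, which would read as "0 path literals" rather than as an error.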
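One possible implementation of the `--strict-<class>-only` graduation pattern collects the strict classes from argv and fails only when a strict class has findings. The route-class list mirrors the five classes named in the scope-expansion bullet; the parsing logic, the ability to combine several flags, the log format, and the function names are assumptions, not the actual check-blog-related-links CLI.

```javascript
// Hypothetical sketch of per-class graduation via --strict-<class>-only flags.
// Classes named in a strict flag block the deploy on findings; every other
// class stays advisory (warnings only) until its baseline reaches zero.

const ROUTE_CLASSES = ['blog', 'learn', 'tools', 'compare', 'best-honey-for'];

function strictClassesFrom(argv) {
  const strict = new Set();
  for (const arg of argv) {
    // Anchoring on "-only$" lets the greedy (.+) capture hyphenated
    // class names such as "best-honey-for".
    const match = /^--strict-(.+)-only$/.exec(arg);
    if (match && ROUTE_CLASSES.includes(match[1])) strict.add(match[1]);
  }
  return strict;
}

// brokenByClass maps a route class to its broken internal refs.
function exitCodeFor(brokenByClass, strictClasses) {
  for (const [routeClass, broken] of Object.entries(brokenByClass)) {
    const level = strictClasses.has(routeClass) ? 'ERROR' : 'WARN';
    for (const ref of broken) console.log(`${level} [${routeClass}] ${ref}`);
  }
  const blocking = Object.entries(brokenByClass).some(
    ([routeClass, broken]) => broken.length > 0 && strictClasses.has(routeClass),
  );
  return blocking ? 1 : 0;
}

// Only /blog is strict; /learn findings are reported but do not fail the run.
const strict = strictClassesFrom(['--strict-blog-only']);
console.log(exitCodeFor({ blog: [], learn: ['/learn/example-ref'] }, strict));
```

Graduating a class is then a one-flag change in the deploy step once its baseline is paid down, with no edit to the audit logic itself.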
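The code-shape detection behind check-prerender-skip-completeness can be sketched as follows. It assumes routes are written as `<Route path="..." element={<Component ... />} />` and that `apiDependentPrefixes` is an array literal of quoted strings; the `componentSources` map, the component names, and the `/events/` prefix are hypothetical inputs standing in for reading each component's file. No rendered output is consulted, which is why a route with no static HTML is still caught.

```javascript
// Sketch of code-shape-driven prefix detection (hypothetical inputs).
// Assumes routes are written as <Route path="..." element={<Component ... />} />
// and that `apiDependentPrefixes` is an array literal of quoted strings.

const ROUTE_RE = /<Route\s+path="([^"]+)"\s+element=\{<(\w+)/g;
const API_IMPORT_RE = /from\s+['"][^'"]*services\/api['"]/;

// "/local/:slug" -> "/local/"; non-parameterized paths return null.
function prefixOf(routePath) {
  const idx = routePath.indexOf('/:');
  return idx === -1 ? null : routePath.slice(0, idx + 1);
}

function declaredPrefixes(prerenderSource) {
  const block = /apiDependentPrefixes\s*=\s*\[([^\]]*)\]/.exec(prerenderSource);
  if (!block) throw new Error('apiDependentPrefixes array literal not found');
  return new Set([...block[1].matchAll(/['"]([^'"]+)['"]/g)].map((m) => m[1]));
}

// componentSources maps a component name to its source text; a real guard
// would resolve the component's import in App.tsx and read that file.
function findMissingPrefixes(appSource, componentSources, prerenderSource) {
  const declared = declaredPrefixes(prerenderSource);
  const missing = new Set();
  for (const [, routePath, component] of appSource.matchAll(ROUTE_RE)) {
    const prefix = prefixOf(routePath);
    const fetchesApi = API_IMPORT_RE.test(componentSources[component] || '');
    if (prefix && fetchesApi && !declared.has(prefix)) missing.add(prefix);
  }
  return [...missing].sort();
}

const appSource = [
  '<Route path="/about" element={<About />} />',
  '<Route path="/blog/:slug" element={<BlogPost />} />',
  '<Route path="/events/:slug" element={<EventPage />} />',
  '<Route path="/local/:slug" element={<LocalPage />} />',
].join('\n');
const componentSources = {
  About: '',
  BlogPost: "import meta from '../data/blogPostMeta';",
  EventPage: "import { getEvent } from '../services/api';",
  LocalPage: "import { getLocal } from '../services/api';",
};
const prerender = "const apiDependentPrefixes = ['/events/'];";

console.log(findMissingPrefixes(appSource, componentSources, prerender));
```

With these inputs only `/local/` is flagged: `/blog/:slug` is parameterized but its (hypothetical) component does not import `services/api`, and `/events/` is already declared. Adding `/local/` to the array literal, as the companion fix did, brings the result back to empty.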
Method. Editorial spine: corpus-scale cross-file consistency invariants need code-shape-driven CI guards that catch editorial-registration drift before deploy, rather than relying on per-ship author discipline. The template is the page-shape discipline that prevents the silent-failure regression class: a multi-file registration in which any one file has the wrong entry passes per-file lint but ships a broken corpus surface. The 9-instance count is the editorial payoff: every guard shipped under this template has reached a 0-defect baseline on first run (including any companion fixes), has been verified against synthetic failure-path fixtures, and has been wired into the deploy pipeline (in strict mode for all but the advisory staleness guard). The compounding leverage is that future drift surfaces inherit the named template and recipe rather than re-deriving the script shape and verification discipline from scratch. Primary sources: this is a corpus-internal engineering discipline derived from the 9-instance application set of 2026-04-29 through 2026-04-30, plus the original 2026-Q1 check-blog-registration that motivated the template after the honey-tasting-guide incident. External validation candidates: software-engineering literature on shift-left testing (moving compliance checks from late-stage audits to early-stage authoring); type-system theory on cross-module consistency invariants and how dependent types encode registration discipline at compile time; build-system literature on "lockfile" patterns, where a derived file is consistency-checked against its source files (e.g. package-lock.json, Cargo.lock, Gemfile.lock); and database-integrity literature on referential integrity constraints as the canonical cross-table consistency mechanism (the corpus-scale CI guard is the filesystem analogue of a foreign-key constraint).
The template is not contingent on any specific language or framework: it applies equally to Node + regex extraction (corpus-current), Python + AST parsing, Go + go/parser, or future LSP-driven extraction based on semantic analysis. What this entry does NOT claim:
(a) That the 9-instance count is the only valid threshold for "corpus-canonical template" status. The §44 3-instance lifecycle threshold applies to editorial-discipline rules at the per-instance application level, whereas the corpus-scale CI guard template operates at the meta-template level, where each instance is itself a guard protecting N (often thousands of) per-corpus invariant cases. The 5-instance reusable-template threshold cited in the n: field marks when the template shape became reproducible across distinct guard-authoring sessions; the 9-instance count is its current durability state.
(b) That every cross-file invariant is worth a guard. Guards carry a maintenance cost (regex updates as authoritative-file shapes evolve, false-positive triage when a legitimate edit fires the guard, added deploy-pipeline latency), and a candidate is worth a guard only when the bug class has shipped at least once before OR the structural failure mode is silent in prod (the Nginx-200-with-wrong-head class).
(c) That all 9 guards run in strict mode. check-seasonal-anchor-staleness runs in advisory mode because the staleness threshold is a soft editorial signal rather than a hard deploy-blocking invariant; advisory/strict mode is configurable per guard.
(d) That regex-based extraction is the only valid implementation. AST-based extraction would be more robust against authoritative-file shape changes (e.g. JSX attributes vs. prop spreads, multi-line template strings), but regex is currently sufficient because the authoritative files follow a consistent canonical shape and synthetic-fixture verification catches regex failure modes; future guards may adopt AST-based extraction when an authoritative file's shape variance exceeds what regex can robustly handle.
(e) That the 0-defect baseline is invariant across the introduction commit AND all subsequent commits. The baseline is invariant ONLY at introduction; afterwards it can fluctuate as new corpus entries are added, and the guard's job is to catch each newly introduced violation, not to re-establish the 0-defect baseline.
(f) That strict-mode wiring in deploy.yml is the only valid integration. Pre-commit hooks (Husky / lint-staged) and pre-push hooks would shift detection earlier in the cycle and further reduce broken-deploy risk, but they add per-developer setup friction; the strict-mode deploy step is the canonical integration point because it runs in CI on every PR. Future iterations may add pre-commit / pre-push integration if the corpus-scale guard suite's wall-clock cost grows beyond what is acceptable in the per-commit hot path.
(g) That the candidate set for future template clones is exhaustive. The 6 candidates listed (R2-image-URL existence, blog-source-URL existence, hero-prompt-recipe completeness, honeys.json variety-array consistency, learn-page link-grid consistency, footer link consistency) are the most visible drift surfaces in the current corpus, but new candidates may surface as the corpus grows or as new authoritative-file shapes are introduced. Because the template is form-agnostic and domain-agnostic, any future drift surface inherits the script shape, verification discipline, and deploy-wiring recipe regardless of its specific shape.