Neuron

Neuron reads a veterinary practice's website down to the byte, extracts every fact it can prove, and produces a living record of that practice's truth — the Site Genome — that every other DE system draws from.

Status: partial

What it is

Neuron is the engine that converts a veterinary practice's existing website into a structured, editable, machine-readable record of that practice's truth. Think of it like a highly precise scanner: it visits every page of a practice's site, photographs it, pulls out every piece of meaningful content, and packages it into a single file called the Site Genome. That Genome is then the source from which the practice's new website is rendered, updated, and kept consistent — not a copy of the old site, but the living truth that the old site was imperfectly expressing. Once Neuron reads a practice in, every surface DE manages for that practice is a projection of that genome rather than a hand-maintained copy. The genome is versioned in git, meaning every change is traceable back to its source with a full audit trail.

Why it exists

Before Neuron, there was no single place that knew what a practice's phone number, hours, team, and services actually were. The website said one thing, Google Maps said another, the Facebook page said a third. Every time something changed, someone had to remember to update every surface by hand — and they didn't, so the information drifted. Neuron kills that problem at the root: one read, one canonical record, one source that everything else is derived from. The bigger problem it solves is that DE can't scale to 500 practices if every onboarding is a manual effort. Neuron makes the first read — the hardest, most information-dense step — something a machine can do reliably on its own.

How it works

Neuron's capture process runs in four stages that chain together automatically.

The first stage is the Crawler. Using a real browser (the same kind you'd use yourself, just headless and automated), it visits every page of the practice's current WordPress site. It follows all internal links, reads the sitemap, captures full-page screenshots at desktop, tablet, and mobile sizes, downloads every image and font, and saves the exact HTML each page returns, including dynamic content that only appears after JavaScript runs. For Fairfax Vet, for example, that means 161 pages captured, each saved as a raw HTML file with a sha256 fingerprint so we can prove later that the bytes haven't changed.

The second stage is the Extractor. It reads those HTML files and looks for recognizable patterns: the Avada/WordPress page-builder markup that most vet sites are built on. It identifies what kind of thing each section of a page is: a hero banner, a team grid, a service card, a contact block, a testimonial slider. It pulls out the text, the images, the SEO metadata, the embedded third-party scripts (appointment booking systems, pharmacy links, Google Analytics), and the forms.

The third stage is the Fact-Finder, and this is where Neuron goes deeper than most tools. Rather than just copying what the site says, it reads every page byte-by-byte and finds every occurrence of each meaningful fact: every format a phone number appears in, every place an email address shows up, every carrier pattern. It coalesces these across all pages into canonical facts (one phone number, one email, stored once) and records exactly where in the raw bytes each fact lives. This is what makes editing possible later: when you change the phone number, the system knows the precise byte location of every occurrence on every page and can update them all at once, provably, without touching anything else.

The fourth stage is the Composer (listed as the Genome Assembler in the parts below), which assembles all of this into the Site Genome: a single JSON file that is the complete, structured, schema-validated truth about that practice. It records the practice's design tokens (their specific brand colors, fonts, and spacing), their navigation structure, every page and its sections, every team member, service, testimonial, and form. The genome also receives an RFC-3161 genesis timestamp from an external authority, meaning there is a cryptographically verifiable record that these exact bytes existed at this exact time before any edits. That record is a provenance anchor intended to stand up as evidence if it is ever needed.
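To make the byte-anchor idea from the third stage concrete, here is a minimal sketch of how an edit could be applied once every occurrence's byte location is known. The `ByteEdit` shape and the `applyEdits` function are hypothetical names chosen for illustration; the source does not show how Neuron actually applies edits.

```typescript
// Hypothetical shape: one recorded occurrence of a fact plus its new bytes.
interface ByteEdit {
  byteOffset: number;  // where the occurrence starts in the raw page bytes
  byteLength: number;  // how many original bytes it occupies
  replacement: string; // new text for this occurrence (UTF-8)
}

// Return a new buffer with every edit applied; the input is not mutated.
function applyEdits(original: Buffer, edits: ByteEdit[]): Buffer {
  // Work from the last offset to the first so that a replacement of a
  // different length never shifts the offsets still waiting to be applied.
  const sorted = [...edits].sort((a, b) => b.byteOffset - a.byteOffset);
  let out = original;
  for (const e of sorted) {
    out = Buffer.concat([
      out.subarray(0, e.byteOffset),
      Buffer.from(e.replacement, "utf8"),
      out.subarray(e.byteOffset + e.byteLength),
    ]);
  }
  return out;
}
```

Everything outside the listed ranges is copied through unchanged, which is the property that lets an edit be checked afterwards.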

The parts

The Crawler (packages/crawler/src/crawl.ts) — Drives a headless Chromium browser to visit every page of the practice's current site. Captures raw HTML, full-page screenshots at three viewport sizes, a HAR file (a complete log of every network request the page made), and downloads images and fonts. Saves a sha256 fingerprint of each page's bytes so nothing can change without detection.
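The fingerprinting step can be sketched with Node's built-in `crypto` module. The `PageCapture` record and both function names are assumptions made for this example; the crawler's real output format is not shown here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record kept alongside each saved page.
interface PageCapture {
  url: string;
  sha256: string;     // hex digest of the raw HTML bytes
  byteLength: number;
}

// Fingerprint the exact bytes a page returned.
function fingerprintPage(url: string, rawHtml: Buffer): PageCapture {
  const sha256 = createHash("sha256").update(rawHtml).digest("hex");
  return { url, sha256, byteLength: rawHtml.length };
}

// Re-hash the stored bytes and compare against the recorded digest.
function verifyCapture(capture: PageCapture, rawHtml: Buffer): boolean {
  return fingerprintPage(capture.url, rawHtml).sha256 === capture.sha256;
}
```

Hashing the bytes rather than a decoded string matters: a single changed byte, even in whitespace, produces a different digest.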

The Extractor (packages/extractor/src/extract.ts + patterns.ts) — Parses the crawled HTML and maps each section of each page to a named component type — Hero, TeamGrid, ServiceCard, CTABanner, FAQAccordion, ContactSection, and so on. Pulls out all text content, image references, SEO metadata, embedded integrations, and form structures. Produces confidence scores; anything below 0.8 is flagged for human review.
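The review rule described above (anything below 0.8 is flagged) can be illustrated as a simple partition. The `ExtractedSection` shape and the function name are hypothetical; only the component names and the 0.8 threshold come from the description.

```typescript
type ComponentType =
  | "Hero" | "TeamGrid" | "ServiceCard"
  | "CTABanner" | "FAQAccordion" | "ContactSection";

// Hypothetical shape of one classified page section.
interface ExtractedSection {
  pageUrl: string;
  component: ComponentType;
  confidence: number; // 0..1, as scored by the extractor
}

const REVIEW_THRESHOLD = 0.8;

// Split sections into those accepted automatically and those needing a human.
function partitionForReview(sections: ExtractedSection[]): {
  accepted: ExtractedSection[];
  needsReview: ExtractedSection[];
} {
  const accepted: ExtractedSection[] = [];
  const needsReview: ExtractedSection[] = [];
  for (const s of sections) {
    // Strictly below the threshold is flagged; exactly 0.8 passes.
    (s.confidence < REVIEW_THRESHOLD ? needsReview : accepted).push(s);
  }
  return { accepted, needsReview };
}
```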

The Fact-Finder (scripts/find-facts.ts) — Goes a level deeper than the extractor: reads every page byte-by-byte and finds every occurrence of every machine-provable fact — phone numbers in every format they appear, emails, and (in the next increment) other key facts. Coalesces duplicates across pages into one canonical value per fact. The key property: it only includes a fact type if the completeness of its search can be proven by a script alone, with no human needed to verify it caught everything.
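A rough sketch of the find-and-coalesce idea for phone numbers follows. The regular expression is a deliberately simple US-style pattern, and the `PageBytes`, `Anchor`, and `CanonicalFact` types are hypothetical; the real fact-finder's patterns and its completeness proof are not shown in this document.

```typescript
interface PageBytes { page: string; bytes: Buffer; }
interface Anchor { page: string; byteOffset: number; byteLength: number; raw: string; }
interface CanonicalFact { kind: "phone"; value: string; anchors: Anchor[]; }

// Illustrative pattern only: matches forms like (703) 555-0100 or 703.555.0100.
const PHONE_SOURCE = String.raw`\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b`;

function findPhones(pages: PageBytes[]): CanonicalFact[] {
  const byValue = new Map<string, CanonicalFact>();
  for (const { page, bytes } of pages) {
    // Decoding as latin1 maps each byte to exactly one character, so a
    // match index in this string is also a byte offset into the raw page.
    const text = bytes.toString("latin1");
    const pattern = new RegExp(PHONE_SOURCE, "g");
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      const raw = m[0];
      const value = raw.replace(/\D/g, ""); // canonical form: digits only
      const fact: CanonicalFact =
        byValue.get(value) ?? { kind: "phone", value, anchors: [] };
      fact.anchors.push({ page, byteOffset: m.index, byteLength: raw.length, raw });
      byValue.set(value, fact);
    }
  }
  return Array.from(byValue.values());
}
```

Two differently formatted occurrences of the same number end up as one canonical fact with two anchors, each pointing at its own byte range.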

The Genome Assembler (scripts/genome-extract.ts / assembleGenome) — Combines the extractor output and the fact-finder output into the final Site Genome: a schema-validated JSON file containing the practice's design tokens, navigation, all pages and their sections, and a facts directory with one YAML file per fact, each referencing the exact byte anchors where it lives across all pages.

The Site Genome (sites/{siteId}/genome.json + truth-book/) — The output artifact: the single source of truth for the practice. Contains meta (practice name, domain, genome version), design tokens (brand colors, fonts, spacing), navigation structure, all pages, all facts, and the asset manifest. The truth-book subdirectory holds facts as separate YAML files with byte-precise anchors — the structure that makes safe, provable editing possible.
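As a reading aid, the top-level contents listed above might be typed roughly as follows. Every field name here is an assumption made for illustration; the actual genome schema is not reproduced in this document, and the small check below is far weaker than real schema validation.

```typescript
// Hypothetical field names throughout; only the six top-level areas
// (meta, tokens, navigation, pages, facts, assets) come from the description.
interface FactAnchor { page: string; byteOffset: number; byteLength: number; }
interface FactRecord { id: string; kind: "phone" | "email"; value: string; anchors: FactAnchor[]; }

interface SiteGenome {
  meta: { practiceName: string; domain: string; genomeVersion: string };
  designTokens: { colors: Record<string, string>; fonts: string[]; spacing: Record<string, string> };
  navigation: { label: string; href: string }[];
  pages: { path: string; sections: { component: string }[] }[];
  facts: FactRecord[];
  assets: { path: string; sha256: string }[];
}

const REQUIRED_KEYS: readonly (keyof SiteGenome)[] = [
  "meta", "designTokens", "navigation", "pages", "facts", "assets",
];

// Report which top-level areas a candidate object is missing.
function missingTopLevelKeys(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) return [...REQUIRED_KEYS];
  const obj = candidate as Record<string, unknown>;
  return REQUIRED_KEYS.filter((k) => !(k in obj));
}
```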

The Genesis Timestamp (scripts/genome/timestamp-genesis.ts) — Issues an RFC-3161 timestamp from an external authority (freeTSA) over the git commit SHA of the initial genome. Turns the capture record from 'we say it looked like this' into 'an independent authority confirms these exact bytes existed at this time, before any edits.' Stored in _genesis/ and verifiable offline.

The Five-Layer Validator (packages/validator/) — After the genome is composed and the new site rendered from it, the validator runs five independent checks: visual pixel-diff (does it look the same?), content (is every word present?), SEO (are all meta tags, canonicals, and structured data preserved?), functional (do all links and forms work?), and performance (Lighthouse scores). A site can only advance past this gate with no FAILs.
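The gate rule can be sketched as a small predicate. The five layer names come from the description above; the `WARN` verdict and the requirement that all five layers must have reported are assumptions added for this example.

```typescript
type Layer = "visual" | "content" | "seo" | "functional" | "performance";
type Verdict = "PASS" | "WARN" | "FAIL"; // WARN is an assumed non-blocking state

interface LayerResult { layer: Layer; verdict: Verdict; }

const LAYERS: readonly Layer[] = ["visual", "content", "seo", "functional", "performance"];

// A site may advance only if every layer reported and none of them failed.
function canAdvance(results: LayerResult[]): boolean {
  const reported = new Set<Layer>(results.map((r) => r.layer));
  const allRan = LAYERS.every((l) => reported.has(l));
  const noFails = results.every((r) => r.verdict !== "FAIL");
  return allRan && noFails;
}
```

Treating a missing layer as blocking is a conservative choice: a check that never ran should not count as a check that passed.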

Where it fits

Fides (the client-relationship agent): Fides reads from the genome to answer questions about a practice and writes back to it when edits are requested. The genome is the shared truth layer between Neuron and Fides: Fides is the agent that tends the genome over time after Neuron's initial capture. Status: designed.

The Change-Safety Guard (scripts/guard/change-safety-guard.ts): Before any edit is applied to the genome, the guard re-reads the raw bytes and verifies that only the intended bytes changed and that the change is byte-faithful and reversible. The genome's fact anchors are the input the guard checks against. This connection is live on the DE site. Status: live.
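One way to picture the "only the intended bytes changed" check is a single pass that walks both byte sequences in step. This is a sketch under assumptions, not the guard's actual logic: the `IntendedEdit` shape and the function name are hypothetical.

```typescript
// Hypothetical description of one intended change, in original-file coordinates.
interface IntendedEdit {
  byteOffset: number;  // where the old bytes start in the original
  expectedOld: string; // what must be there before the edit
  replacement: string; // what must be there after the edit
}

function onlyIntendedBytesChanged(
  before: Buffer,
  after: Buffer,
  edits: IntendedEdit[],
): boolean {
  const sorted = [...edits].sort((a, b) => a.byteOffset - b.byteOffset);
  let b = 0; // cursor into `before`
  let a = 0; // cursor into `after`
  for (const e of sorted) {
    if (e.byteOffset < b) return false; // overlapping edits are rejected
    // The untouched stretch before this edit must be byte-identical.
    const gap = e.byteOffset - b;
    if (!before.subarray(b, b + gap).equals(after.subarray(a, a + gap))) return false;
    b += gap;
    a += gap;
    // The anchor must hold the expected old bytes, and the new file the new bytes.
    const oldBytes = Buffer.from(e.expectedOld, "utf8");
    const newBytes = Buffer.from(e.replacement, "utf8");
    if (!before.subarray(b, b + oldBytes.length).equals(oldBytes)) return false;
    if (!after.subarray(a, a + newBytes.length).equals(newBytes)) return false;
    b += oldBytes.length;
    a += newBytes.length;
  }
  // Whatever follows the last edit must also be identical.
  return before.subarray(b).equals(after.subarray(a));
}
```

Because each edit records the old bytes as well as the new ones, the same information is enough to undo the change, which is one plausible reading of "reversible".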

The Projector / Renderer (scripts/project-genome.ts + packages/renderer/): Takes the genome and renders it into a real website (an Astro-based static site deployed to Cloudflare Pages). The genome is the sole input; the same genome always produces the same site. Live for digitalempathyvet.com. Status: live.

Cloudflare Pages (deployment layer): The rendered static site is deployed to Cloudflare Pages. The edit-to-live pipeline is proven end-to-end at ~44 seconds: genome fact edit → guard verify → reproject → deploy → live. Status: live.

GA4 / Google Search Console (measurement layer): Every rendered site is designed to include standardized GA4 event tracking and Search Console verification, enabling automated performance pulls and correlation of genome changes with traffic outcomes. Status: designed.

Agent API / answer-layer (future Fides surface): The genome is designed to serve as the authoritative source for an agent-facing answer endpoint, so that when a patient's AI assistant asks 'what are this clinic's hours?', the answer comes from the provenanced genome, not from a web scrape. Status: designed.

Its job in the loop: Neuron is DE's primary SENSE organ. Before any agent can answer a question about a practice, edit its website, or push an update to a directory, Neuron has to have read that practice into the system. The Genome it produces is the company's ground truth about what that practice actually is, says, and looks like — captured from the source itself, timestamped, and provable.