Three growth systems we run in production. Architecture, not demo footage.

May 7, 2026

Most write-ups about AI in marketing are about the demo. This is about the architecture. Three growth systems we run in production at Arthea, what each one is for, how it is built, and where the operator stays in the loop. We publish a version of this log every week. The schema is fixed. The systems change. This piece is for senior operators evaluating which AI growth systems are worth running in production and which are demo footage in expensive clothing.


Architecture, not demo footage, is the load-bearing distinction. A demo shows the system on its best day. An architecture defines what happens on the median day, the failure day, and the day the operator is on holiday. The three systems below are architectures, not demos. Each one has a named purpose, an explicit human-in-the-loop point, and a failure mode the team has already paid for.



The three growth systems we run in production at Arthea


Each system answers a different question. The Klaviyo retention architecture answers "where does compounded ecommerce revenue come from after the first sale?" The Webflow CRO experimentation layer answers "is this page earning its place in the funnel?" The AI Lab content drafter answers "how does the team produce platform-shaped drafts at quality without a content desk?" Different questions, different stacks, same operating discipline.



The system selection criteria


Three filters before we put a growth system in production. First, it has to be repeatable across brands without bespoke rebuilds. Second, it has to have a load-bearing metric that is observable in the world, not just in the dashboard. Third, it has to have an explicit operator-in-the-loop point. Anything that fails one filter ships as a demo and stays in /ai-lab. Production is a higher bar.



 

System 1. The Klaviyo retention architecture


The 90-day Retention Architecture is our standard build for ecommerce brands past the 30K to 50K EUR per month band on /websites-cro. It has six lifecycle flows, each with a named purpose: Welcome, Abandoned Cart and Checkout, Post-Purchase, Replenishment and Win-Back, VIP and Loyalty, and a Seasonal/BFCM lane that flips on for peak windows. Each flow is built once with senior Klaviyo strategists, no junior handoffs, and the segmentation lives behind a "Lifecycle System Architecture" layer rather than ad-hoc Klaviyo conditions.


What is interesting is not any one flow. It is that the entire build runs on a deliverability contract before it runs on a creative contract. Inbox placement is the only metric that compounds; everything else is a vanity decoration on top. Across the brands we run this for, the typical outcome we publish is 25 to 40 percent of total revenue driven by Klaviyo within three months and a first measurable uplift inside ten to fourteen days. We do not publish per-client weekly numbers, and we never will. Those belong on the brand's side of the wall, not ours.


The operator-in-the-loop point is the deliverability review, not the creative review. Senior strategists scan inbox-placement signals weekly and intervene when placement drifts. Creative is downstream of placement; great copy in the spam folder is worth nothing.
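
A weekly placement review of this kind can be backed by a simple drift check. The sketch below is a minimal illustration, not the review Arthea runs: the function name `placement_status`, the four-week baseline, and the 2 and 5 percentage-point thresholds are all assumptions, and the weekly rates are assumed to come from whatever inbox-placement signal the team already collects.

```python
from statistics import mean


def placement_status(weekly_rates, amber_drop=0.02, red_drop=0.05, baseline_weeks=4):
    """Classify the latest week's inbox placement against a trailing baseline.

    weekly_rates: inbox placement rates (0.0 to 1.0), oldest first.
    The thresholds and the baseline window are illustrative assumptions.
    """
    if len(weekly_rates) < baseline_weeks + 1:
        raise ValueError("need a full baseline window plus the latest week")

    *history, latest = weekly_rates
    baseline = mean(history[-baseline_weeks:])
    drop = baseline - latest  # positive when placement got worse

    if drop >= red_drop:
        return "red"    # intervene now
    if drop >= amber_drop:
        return "amber"  # raise it in the weekly review
    return "green"


if __name__ == "__main__":
    print(placement_status([0.95, 0.95, 0.95, 0.95, 0.92]))
```

The point of the design is that the check compares placement with its own recent history rather than with a fixed target, so a slow slide shows up before the downstream metrics move.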



 

System 2. The Webflow CRO experimentation layer


A Webflow site that scores 100/100 on Lighthouse is the table-stakes deliverable. The interesting work starts after that. The Architecture Build phase ships the page; the Experimentation Layer is what decides whether it earns its place in the funnel. We design the test, instrument it, run heatmaps and analytics, and only roll the variant forward when the evidence is unambiguous. The 5-phase structure is published on /websites-cro: Audit and Frame, Design System Setup, Build and Tracking, Experimentation Layer, Manage and Improve.


The reason the work compounds is that every test goes through the same shape. Same hypothesis form. Same instrumentation. Same readout cadence. Architectures, not templates. A "won" test that you cannot reproduce at the next page is not a win, it is a lottery ticket.
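
One way to hold every test to the same shape is to make the shape a record and the readout a function. The sketch below is an assumption-heavy illustration: the `Experiment` fields, the one-sided two-proportion z-test, and the 0.01 threshold are stand-ins chosen for the example, not the statistical method or decision rule used on /websites-cro.

```python
from dataclasses import dataclass
from math import erf, sqrt


@dataclass(frozen=True)
class Experiment:
    """Hypothetical fixed shape for one test; field names are illustrative."""
    page: str
    hypothesis: str  # "If we change X, then Y improves, because Z"
    metric: str
    control_visitors: int
    control_conversions: int
    variant_visitors: int
    variant_conversions: int


def one_sided_p_value(exp: Experiment) -> float:
    """P-value for 'variant converts better than control' (pooled z-test)."""
    p_control = exp.control_conversions / exp.control_visitors
    p_variant = exp.variant_conversions / exp.variant_visitors
    pooled = (exp.control_conversions + exp.variant_conversions) / (
        exp.control_visitors + exp.variant_visitors
    )
    se = sqrt(pooled * (1 - pooled) * (1 / exp.control_visitors + 1 / exp.variant_visitors))
    if se == 0:
        return 1.0  # no variation in the data, so no evidence either way
    z = (p_variant - p_control) / se
    return 0.5 * (1 - erf(z / sqrt(2)))  # upper tail of the standard normal


def readout(exp: Experiment, alpha: float = 0.01) -> str:
    """Same decision rule for every test; alpha is an assumed threshold."""
    return "roll forward" if one_sided_p_value(exp) < alpha else "keep control"
```

Because the record and the rule are identical for every page, a result on one page can be rerun on the next without renegotiating what counts as a win.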


The published outcome band on the architecture build is a 20 to 40 percent CRO uplift, plus an aggregate +5M EUR across the brands we run it for. The numbers hold because the architecture is reproducible; the failure mode for most CRO programs is that wins are local to the team or the page or the season, and they do not survive the next change. The 5-phase structure is what makes them survive.



 

System 3. The AI Lab content drafter


A specialist agent built on n8n + Claude that reads a fixed 6-field brief and produces a draft for one platform at a time. Nothing exotic. The leverage is the brief schema, which is the same shape every week, and the voice contract that runs as a static scanner against every output before any human reviews it. Briefs without sharp inputs produce shallow drafts regardless of model size; that is an input problem dressed as a model problem.
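
The two contracts in that paragraph, a fixed brief and a static voice scan, are small enough to sketch. Everything named below is hypothetical: the source does not list the six brief fields or the voice rules, so the `Brief` field names and the `BANNED_PATTERNS` entries are placeholders for illustration, and the code is plain Python rather than an n8n workflow.

```python
import re
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Brief:
    """Six-field brief. The field names are assumptions, not the real schema."""
    platform: str
    audience: str
    angle: str
    proof_point: str
    call_to_action: str
    constraints: str


def missing_fields(brief: Brief) -> list[str]:
    """Names of fields left empty; a brief with gaps should not reach the model."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]


# Hypothetical voice rules; a real voice contract would be brand-specific.
BANNED_PATTERNS = {
    "hype word": re.compile(r"\b(game-changing|revolutionary|unlock)\b", re.IGNORECASE),
    "exclamation mark": re.compile(r"!"),
}


def voice_violations(draft: str) -> list[str]:
    """Static scan that runs on every draft before a human sees it."""
    return [label for label, pattern in BANNED_PATTERNS.items() if pattern.search(draft)]
```

Both checks are deterministic and run without the model, which is what lets them act as a contract rather than a suggestion.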


When the agent hits an ambiguous edit, it pauses and asks the operator a one-line question with a link to the draft. The pause is a feature. Confident guessing on edge cases is what erodes brand voice over months, not bad sentences in any single post.
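
The pause can be modelled as a return type: the agent either applies an edit or hands back a question. This is a minimal sketch under stated assumptions. `ApplyEdit`, `AskOperator`, and `decide` are hypothetical names, and "ambiguous" is assumed to mean the agent found anything other than exactly one plausible reading of the request.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ApplyEdit:
    interpretation: str


@dataclass(frozen=True)
class AskOperator:
    question: str   # one line, answerable without opening another tool
    draft_url: str  # link back to the draft in question


def decide(edit_request: str, interpretations: list[str], draft_url: str) -> Union[ApplyEdit, AskOperator]:
    """Apply the edit only when there is exactly one reading; otherwise pause."""
    if len(interpretations) == 1:
        return ApplyEdit(interpretations[0])
    if interpretations:
        options = " or ".join(interpretations)
        question = f"For '{edit_request}', did you mean: {options}?"
    else:
        question = f"What should change for '{edit_request}'?"
    return AskOperator(question=question, draft_url=draft_url)
```

Making the question a first-class result means the calling workflow has to handle it, so a guess cannot slip through as a silent default.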


The published band on AI Lab work is a 20 percent retention lift and 60 percent operator time saved. The retention lift comes from the consistency of voice (drafts that read coherently across weeks build audience). The time saved comes from the brief schema removing the daily creative-direction tax. Both numbers are downstream of the brief, not the model.



 

Runbook: how an operator should evaluate a production AI growth system


1. Identify the load-bearing metric. The metric in the world that says this system is doing its job. For Klaviyo, inbox-placed sends. For CRO, conversion lift on the variant page. For the content drafter, drafts approved without rewrite per week. If the metric is not observable, the system is a demo.
2. Find the explicit operator-in-the-loop point. Where does a senior human intervene? If there is no named intervention, the system is auto-piloting brand risk. If there are too many, the system is a glorified macro.
3. Test the failure mode the team has already paid for. Every production system carries the scar of a previous incident. If the team cannot tell you the last failure and the fix, the system is too young to call production.
4. Probe the brief or input contract. Sharp inputs produce sharp outputs. If the brief is "here is some context," the output ceiling is shallow regardless of model size. The brief is the architecture.
5. Confirm the architecture compounds across brands or pages or weeks. A win that does not survive the next deploy or the next brand is a lottery ticket. The whole point of architecture is reproducibility.
6. Audit the heartbeat. Does the system have a panel that shows green-amber-red against the load-bearing metric? If not, silent failure is the dominant risk.
7. Read one weekly log. The discipline of writing it surfaces what the system actually did versus what it claimed. The log is the receipt.
8. Make the call. If the system passes all seven, it is production. If it fails any, it is /ai-lab work and belongs on a different track until the gap closes.
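
The runbook reduces to seven checks and one decision, which makes it easy to record. The sketch below is illustrative: `AUDIT_STEPS` paraphrases steps 1 to 7, and `make_the_call` is a hypothetical helper, not a tool the team is known to use. Any step left unanswered is treated as a gap.

```python
AUDIT_STEPS = (
    "load-bearing metric is observable",
    "operator-in-the-loop point is named",
    "failure mode has been paid for",
    "brief or input contract is sharp",
    "architecture compounds",
    "heartbeat panel exists",
    "weekly log shows real work",
)


def make_the_call(results: dict[str, bool]) -> tuple[str, list[str]]:
    """Step 8: return the track and the gaps that still need closing."""
    unknown = set(results) - set(AUDIT_STEPS)
    if unknown:
        raise ValueError(f"not an audit step: {sorted(unknown)}")

    # A missing answer counts as a failure: unaudited is not the same as passed.
    gaps = [step for step in AUDIT_STEPS if not results.get(step, False)]
    track = "production" if not gaps else "/ai-lab"
    return track, gaps


if __name__ == "__main__":
    answers = {step: True for step in AUDIT_STEPS}
    answers["heartbeat panel exists"] = False
    print(make_the_call(answers))
```

Returning the gaps alongside the verdict keeps the result actionable: a system sent back to /ai-lab comes with the list of what has to change before it is promoted.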



 

When this is wrong: trade-offs and limits of the architecture frame


The architecture frame is overkill for one-off campaigns. A single-month brand activation does not need a 90-day Retention Architecture. It needs a campaign brief and a tight feedback loop. Forcing architecture-grade rigor on a campaign-grade engagement burns time the team did not have. The frame applies where the work is meant to compound, not where it is meant to ship and fade.


It is also wrong to apply the architecture frame to demos. /ai-lab work is supposed to be exploratory. If every prototype has to clear the production bar before it gets built, no new systems ever ship. The hedge is to keep /ai-lab and production on different tracks with different bars. Promote when the system passes the seven-step audit. Not before.


The deeper trade-off is that architecture takes longer to ship than templates. A team racing a quarter on velocity will out-ship an architecture team for one or two cycles. By cycle four the velocity team is rebuilding from scratch and the architecture team is compounding. Pick the time horizon that matches the engagement.



 

What success looks like


A description of what the systems did this week, written at the level of detail you would publish in front of a client without trimming first. The shape repeats. The systems compound. The number that matters at the end of any week is not how much shipped. It is whether the architecture is more legible than it was on Monday.


Across the three systems, the published outcome bands hold because the architectures are reproducible. The Klaviyo retention architecture lands the 25 to 40 percent of revenue band within three months for ecommerce brands past the 30K to 50K EUR per month threshold. The CRO architecture lands the 20 to 40 percent uplift on the variant pages, with the aggregate +5M EUR across the brands we run it for. The AI Lab content drafter lands the 20 percent retention lift and 60 percent operator time saved on weekly content. The numbers are bands, not promises. The architecture is what makes them durable.



 

FAQ


Why call them architectures and not playbooks? A playbook is a sequence of steps. An architecture is a set of named components with explicit contracts between them. The difference shows up the third time you run the system. Playbooks decay. Architectures compound.


How do you know a system is ready for production? It clears the seven-step audit in the runbook. Load-bearing metric is observable. Operator-in-the-loop point is named. Failure mode has been paid for. Brief is sharp. Architecture compounds. Heartbeat panel exists. Weekly log shows real work. Anything less is /ai-lab.


Does the operator-in-the-loop point slow the system down? Yes, deliberately. The pause is a feature. Confident guessing on edge cases is what erodes brand voice and breaks deliverability over months. The operator point is the cheapest insurance in the architecture.


Why is inbox placement the load-bearing metric for the Klaviyo build? Because it is the only metric that compounds. Sends-attempted, opens, clicks, and even revenue can all look fine while placement quietly shifts to the spam folder; the rest then degrades over weeks. Treating placement as the heartbeat keeps the retention build durable.


Can you share per-client numbers? No. We publish bands across the engagements (25 to 40 percent of revenue from Klaviyo, 20 to 40 percent CRO uplift, +5M EUR aggregate) and we never publish per-client weekly numbers. Those belong on the brand's side of the wall, not ours.



 

Read more


- https://www.arthea.ai/article/agents-going-quiet-isnt-resting
- https://www.arthea.ai/article/this-week-in-arthea
- https://www.arthea.ai/email-and-sms


If you want a 30-minute review of which of your growth systems are production-grade and which are still demos, the calendar is here: arthea.ai/book.

 
