The 5-Agent Editorial Pipeline: How Autonomous AI Produces Fact-Checked, Publication-Quality Technical Articles
Publishing technical content at scale without sacrificing accuracy is a hard problem. I built a 5-agent newsroom inside ARKONA that now produces, edits, fact-checks, and publishes articles autonomously — running on a battle rhythm with no human in the loop unless a confidence threshold trips. Here's how it works, why it's architected the way it is, and what I learned building it.
The Problem with Single-Agent Content Generation
The naive approach is one LLM call: give the model a topic, get an article, publish it. This works until it doesn't — hallucinated citations, stale data, inconsistent tone, and no audit trail. For a portfolio blog tied to a research identity and an Anthropic job application, "probably correct" isn't good enough.
The real problem is that content generation, fact verification, editorial judgment, and publication are genuinely different cognitive tasks. Collapsing them into a single prompt produces lowest-common-denominator output. Separating them into specialized agents — each with its own system prompt, tool access, and confidence model — produces something qualitatively better.
Pipeline Architecture: Five Agents, One Workflow
The editorial pipeline runs as a coordinated workflow brokered through ARKONA's inter-agent communication layer on port 8500 (the MCP pub/sub broker). Each agent is a discrete Claude API call with a specialized role, and the pipeline is triggered nightly as part of the DevOps domain's battle rhythm scheduler.
The five agents are:
1. SCOUT — Research and source aggregation. SCOUT receives a topic brief and performs structured web research, pulling from RSS feeds, arXiv preprints, CVE feeds, and NIST NVD depending on the article domain. It outputs a structured JSON research bundle with source URLs, key claims, and confidence scores per claim.
2. WRITER — First draft generation. WRITER consumes SCOUT's research bundle and produces a first-draft article. Its system prompt enforces house style: first-person technical voice, no fluffy introductions, specific numbers over vague claims. It embeds claim IDs from SCOUT's bundle as HTML comments so downstream agents can trace every assertion back to its source.
3. VERITAS — Fact-checking and source validation. VERITAS is the most expensive agent in the pipeline and deliberately runs on Claude rather than a local Ollama model. It receives the draft plus SCOUT's source bundle and cross-checks every embedded claim ID against its cited source. Claims that fail verification are flagged with a severity tag — DISPUTED, UNVERIFIABLE, or STALE.
4. EDITOR — Revision and coherence. EDITOR receives the VERITAS-annotated draft and does two things: removes or rewrites flagged claims, then runs a coherence pass to ensure the article still flows after surgical removals. It also enforces length targets and checks that code snippets compile or at minimum parse correctly.
5. HERALD — Publication and provenance signing. HERALD finalizes the article, generates the HTML output, computes a SHA-256 hash of the canonical content, and writes a signed provenance record to .provenance.json before committing to the blog repository. This gives every published article a verifiable content fingerprint.
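The five-stage hand-off above can be sketched as a simple gated loop. This is a minimal illustration, not ARKONA's actual implementation: the `call_agent` callable and the per-agent gate values are assumptions made for the example.

```python
# Minimal sketch of the five-stage pipeline hand-off. The call_agent()
# helper and the per-agent confidence gates are hypothetical.
STAGES = ["SCOUT", "WRITER", "VERITAS", "EDITOR", "HERALD"]
CONFIDENCE_GATES = {"SCOUT": 0.7, "WRITER": 0.6, "VERITAS": 0.8,
                    "EDITOR": 0.6, "HERALD": 0.9}

def run_pipeline(topic_brief, call_agent):
    """Pass each stage's output downstream; escalate if any stage
    falls below its confidence gate instead of publishing."""
    payload = topic_brief
    for stage in STAGES:
        output, confidence = call_agent(stage, payload)
        if confidence < CONFIDENCE_GATES[stage]:
            return {"status": "escalated", "stage": stage,
                    "confidence": confidence}
        payload = output  # downstream agent consumes upstream output
    return {"status": "published", "artifact": payload}
```

The point of the sketch is the shape: each agent is a discrete call with a typed input and output, and the gate check sits between stages rather than inside any one prompt.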
The MuXD Routing Decision
One of the more interesting engineering decisions was which agents run locally versus cloud. ARKONA's MuXD router (port 8080) handles this transparently, but the routing policy matters for both cost and quality.
```json
{
  "pipeline": "editorial",
  "routing_policy": {
    "SCOUT":   { "model": "ollama/mistral-nemo",  "reason": "structured extraction, local data" },
    "WRITER":  { "model": "claude-sonnet-4-6",    "reason": "voice consistency, long-form generation" },
    "VERITAS": { "model": "claude-sonnet-4-6",    "reason": "reasoning depth, citation cross-check" },
    "EDITOR":  { "model": "ollama/llama3.1:70b",  "reason": "deterministic revision, cost control" },
    "HERALD":  { "model": "ollama/mistral-nemo",  "reason": "templating, no generation needed" }
  },
  "fallback": "claude-haiku-4-5-20251001",
  "token_budget_per_run": 18000
}
```
SCOUT and HERALD are structurally simple tasks — extraction and templating — that run cleanly on local Mistral Nemo. WRITER and VERITAS need Claude's reasoning depth and stylistic consistency. EDITOR sits in the middle: I originally ran it on Claude but found that the revision task (remove flagged claims, restore flow) was mechanical enough that llama3.1:70b on my P40s handled it well, saving roughly 4,000 tokens per pipeline run. MuXD tracks token consumption per agent and surfaces this in the dashboard — a concrete feedback loop for tuning the routing policy.
Confidence Thresholds and Human Escalation
The pipeline is autonomous but not blind. Each agent emits a confidence score alongside its output, and the broker evaluates a gate condition before passing work downstream. If VERITAS flags more than 30% of claims as DISPUTED or UNVERIFIABLE, the pipeline halts and publishes a pending notification to the ARKONA dashboard rather than a broken article.
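The VERITAS gate condition is easy to state precisely. A minimal sketch, assuming each claim carries a `status` field set by VERITAS (the field name is an assumption for illustration):

```python
# Sketch of the VERITAS halt condition: stop the pipeline when more
# than 30% of claims are DISPUTED or UNVERIFIABLE. Field names are
# assumptions, not ARKONA's actual schema.
HALT_THRESHOLD = 0.30

def veritas_gate(claims):
    """Return True if the pipeline should halt for human review."""
    if not claims:
        return False
    failed = sum(1 for c in claims
                 if c["status"] in ("DISPUTED", "UNVERIFIABLE"))
    return failed / len(claims) > HALT_THRESHOLD
```

Note the strict inequality: a draft sitting exactly at the threshold still passes, which keeps the gate predictable at the boundary.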
This design pattern — autonomous by default, human-escalated by exception — is directly informed by COMET, ARKONA's AI governance framework. COMET's delegation model maps tasks to a seven-step scale from full human control to full agent autonomy. Content publication sits at step 5: agents execute, humans are notified, humans can intervene within a window before the commit is pushed. This satisfies IEEE 7000-2021's transparency requirements for automated systems acting on behalf of a person's public identity.
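The step-5 pattern — execute, notify, allow intervention within a window before the commit is pushed — can be sketched as follows. Everything here (the window length, the `notify_fn` and `veto_check_fn` callables) is a hypothetical illustration of the shape, not COMET's actual API:

```python
# Sketch of a step-5 delegation gate: the agent executes, the human is
# notified, and a veto window elapses before the commit is pushed.
# All names and the window length are hypothetical.
import time

INTERVENTION_WINDOW_S = 3600  # e.g. one hour before the push

def publish_with_window(commit_fn, notify_fn, veto_check_fn,
                        window_s=INTERVENTION_WINDOW_S, poll_s=60,
                        sleep=time.sleep):
    notify_fn("Article staged for publication; veto within window.")
    waited = 0
    while waited < window_s:
        if veto_check_fn():          # human intervened
            return "vetoed"
        sleep(poll_s)
        waited += poll_s
    commit_fn()                      # window elapsed without veto
    return "published"
```

Injecting `sleep` as a parameter keeps the window testable without waiting out real time.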
Provenance as a First-Class Requirement
Every article published by HERALD includes a provenance record that I can reproduce or audit later. The record is written to .provenance.json at the repo root and contains the pipeline run ID, agent versions, model identifiers, source URLs with retrieval timestamps, and the SHA-256 hash of the final HTML before any rendering transforms.
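A minimal sketch of what HERALD's provenance write could look like. Only the idea of a SHA-256 over the canonical content comes from the description above; the field names and function signature are assumptions:

```python
# Sketch of a provenance record writer. The SHA-256-over-canonical-HTML
# idea is from the article; field names and signature are assumptions.
import hashlib
import json
from datetime import datetime, timezone

def write_provenance(html, run_id, models, sources,
                     path=".provenance.json"):
    record = {
        "run_id": run_id,
        "models": models,            # agent name -> model identifier
        "sources": sources,          # source URL -> retrieval timestamp
        "content_sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "signed_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

Hashing before any rendering transforms matters: the fingerprint must be reproducible from the stored canonical content, not from whatever the site generator emits later.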
This isn't paranoia — it's accountability infrastructure. If a source article is retracted six months after publication, I can trace exactly which claims in my articles derived from that source. If MuXD's routing policy changes and a different model handles WRITER in a future run, the provenance record makes that change auditable. For a research-adjacent technical blog, this is the minimum bar I'd hold any published system to.
What Actually Breaks in Practice
The failure mode I didn't anticipate was temporal drift in SCOUT's sources. RSS feeds for CVE and NIST NVD are reliable, but general technical feeds sometimes serve stale cached content with fresh timestamps. VERITAS catches this when a source URL returns a 404 or when the page hash no longer matches the cached version — but it took three pipeline runs producing articles with outdated statistics before I added the retrieval-timestamp validation step. Now SCOUT flags any source older than 14 days for human review before WRITER sees it.
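The 14-day staleness check is a one-liner in spirit. A sketch, assuming each source record carries a timezone-aware `published_at` (the field name is hypothetical):

```python
# Sketch of SCOUT's retrieval-timestamp validation: flag any source
# older than 14 days for human review. Field names are assumptions.
from datetime import datetime, timedelta, timezone

MAX_SOURCE_AGE = timedelta(days=14)

def flag_stale_sources(sources, now=None):
    """Return the subset of sources needing review before WRITER runs."""
    now = now or datetime.now(timezone.utc)
    return [s for s in sources
            if now - s["published_at"] > MAX_SOURCE_AGE]
```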
The other real failure is style inconsistency when EDITOR makes large structural removals. If VERITAS flags a significant chunk of WRITER's output and EDITOR removes it wholesale, the resulting article can feel disjointed in ways that are hard to detect programmatically. I addressed this by giving EDITOR explicit instructions to request a WRITER revision pass when removal exceeds 25% of the original content — effectively a pipeline loop that runs WRITER again with a revised research bundle before EDITOR finalizes.
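The 25% revision trigger reduces to a fraction check over the flagged material. A sketch using a simple word-count heuristic — the heuristic and the two callables are assumptions for illustration:

```python
# Sketch of the EDITOR decision: if flagged removals exceed 25% of the
# draft, loop back to WRITER for a revision pass instead of cutting
# wholesale. The word-count heuristic and callables are assumptions.
REVISION_THRESHOLD = 0.25

def edit_with_revision_loop(draft, flagged_spans, rewrite_fn, cut_fn):
    removed = sum(len(span.split()) for span in flagged_spans)
    total = max(len(draft.split()), 1)
    if removed / total > REVISION_THRESHOLD:
        return rewrite_fn(draft, flagged_spans)  # WRITER revision pass
    return cut_fn(draft, flagged_spans)          # surgical removal
```

Measuring by word count rather than claim count matters here: one flagged claim can anchor a whole section, and it's the share of prose at risk that predicts a disjointed result.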
Key Takeaway
The lesson I'd carry into any agent system design: specialization beats generalization at the cost of orchestration complexity, and that tradeoff is almost always worth it for quality-sensitive tasks. A five-agent pipeline with well-defined interfaces and clear confidence gates produces consistently better output than any single-agent approach I tried — and it produces an audit trail that lets me understand exactly why any given article looks the way it does. The hard part isn't the LLM calls. It's the state management, the failure handling, and the discipline to not collapse roles back together when the complexity gets uncomfortable. Resist that collapse. The interfaces are doing real work.