Inter-Agent Task Delegation: How Our Broker Enables Agents to Assign Work to Specialists

When you're running 26 autonomous agents across six operational domains, the question of who does what — and how agents discover each other — becomes a real engineering problem, not a theoretical one. Early in ARKONA's development, each agent was essentially an island: it received tasks from humans, executed them, and reported back. That worked fine at three agents. It breaks completely at twenty-six.

The fix wasn't adding more humans to route work. It was building an inter-agent communication broker that lets agents delegate to specialists the same way a senior engineer routes tickets to the right team. This post covers how that broker works, what the message contract looks like, and the failure modes that shaped its current design.

The Problem: Specialists Without a Switchboard

ARKONA's 26 agents are organized around specialization. The Research Agent runs at 0200 daily, scanning arXiv and NIST publications for new developments across the FS-RE meta-model's eight layers. The Editorial Pipeline is a five-agent newsroom — intake, fact-check, line edit, layout, publish — each owning exactly one step. CIPHER handles hardware RE with Ghidra integration. COMET evaluates AI delegation decisions against its seven-step framework with IEEE 7010 and NIST SP 800-30 grounding.

Without a broker, two bad patterns emerge. First, you get hardcoded agent-to-agent calls: the Research Agent directly invokes the Editorial Agent's API endpoint, coupling two services that should be independent. Second, you get task loss at boundaries — the Research Agent finishes a report, but no mechanism exists to hand it to the editorial pipeline, so a human has to intervene. Neither is acceptable in an autonomous system running on a battle rhythm.

Architecture: Pub/Sub Over a Typed Message Bus

The broker runs on port 5199 behind Tailscale HTTPS, exposed as both a REST API and an MCP (Model Context Protocol) server. The MCP surface is critical: it means Claude Code instances, local Ollama models, and any agent with an MCP client can interact with the broker using the same tool-call interface, regardless of whether they're running on the dual Tesla P40 hardware or in the cloud.

The core model is publish/subscribe with typed task envelopes. Every agent registers a capability manifest at startup — a JSON document declaring what task types it can consume, its current load, and its expected SLA in seconds. The broker maintains this registry in-memory with Redis-backed persistence for crash recovery.

{
  "agent_id": "editorial-factcheck-v2",
  "domain": "DevOps",
  "capabilities": ["fact_check", "source_verification", "claim_extraction"],
  "max_concurrent": 3,
  "sla_seconds": 120,
  "health_endpoint": "https://arkona-core:5199/agents/editorial-factcheck-v2/health",
  "mcp_tools": ["verify_claim", "fetch_source", "flag_unsupported"]
}
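The registry behind that manifest can be sketched as a small in-memory store. This is a simplified stand-in for the broker's actual Redis-backed registry — the class name, validation, and `current_load` bookkeeping are illustrative assumptions, not ARKONA's real implementation:

```python
# Illustrative in-memory capability registry -- a stand-in for the broker's
# real Redis-backed store. Field names mirror the manifest shown above.

class CapabilityRegistry:
    REQUIRED = {"agent_id", "capabilities", "max_concurrent", "sla_seconds"}

    def __init__(self):
        self._agents = {}  # agent_id -> manifest plus runtime load

    def register(self, manifest: dict) -> None:
        """Validate and store an agent's capability manifest at startup."""
        missing = self.REQUIRED - manifest.keys()
        if missing:
            raise ValueError(f"manifest missing fields: {sorted(missing)}")
        self._agents[manifest["agent_id"]] = {**manifest, "current_load": 0}

    def candidates(self, task_type: str) -> list:
        """All registered agents that declare the given task type."""
        return [a for a in self._agents.values() if task_type in a["capabilities"]]
```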

When the Research Agent completes a scan and wants its output reviewed, it doesn't call the Editorial Agent directly. It publishes a TASK_AVAILABLE message to the broker with a task type of fact_check and a payload reference. The broker resolves the best available agent from the registry, considering load and SLA, and routes the task. The Research Agent gets back a task_id and a delegate_to field telling it which agent accepted — useful for logging and audit trails.

POST /broker/delegate
{
  "origin_agent": "research-agent-v3",
  "task_type": "fact_check",
  "priority": "normal",
  "payload": {
    "document_ref": "vault://L6-data/vault/reports/2026-04-05-fsre-scan.md",
    "claim_count": 14,
    "source_urls": ["https://nvd.nist.gov/...", "https://arxiv.org/..."]
  },
  "deadline_seconds": 300,
  "require_provenance": true
}

// Response
{
  "task_id": "task_8f3a2c1d",
  "delegate_to": "editorial-factcheck-v2",
  "estimated_completion": "2026-04-05T02:17:43Z",
  "provenance_hash": "sha256:a3f9..."
}
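The "best available agent" resolution reduces to a filter-then-rank pass over the registry. The exact scoring ARKONA uses isn't documented here, so treat this heuristic — spare capacity and an SLA within the deadline, preferring least load, then tightest SLA — as an assumption:

```python
def resolve_delegate(candidates: list, deadline_seconds: int):
    """Pick a delegate from registry candidates (illustrative heuristic).

    Eligible agents must have spare capacity and a registered SLA that fits
    the task deadline; ties break toward the least-loaded, then tightest SLA.
    Returns None when no agent qualifies.
    """
    eligible = [
        a for a in candidates
        if a["current_load"] < a["max_concurrent"]
        and a["sla_seconds"] <= deadline_seconds
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda a: (a["current_load"], a["sla_seconds"]))
```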

Provenance Signing at Every Hop

Every task envelope that passes through the broker gets a SHA-256 provenance signature. This isn't bureaucratic overhead — it's what lets COMET evaluate delegation decisions with confidence. When the Fact-Check Agent completes its work and the result propagates back through the pipeline, COMET can reconstruct the full delegation chain: who generated the original claim, who verified it, which model version was used at each step, and what the intermediate outputs were.

This satisfies a core requirement from NIST SP 800-30: risk assessments require documented provenance for information inputs. When ARKONA's COMET engine renders a risk evaluation, every supporting data point carries a chain of custody back to a primary source. The broker is the mechanism that makes that chain possible across agent boundaries.

The provenance hash also acts as a deduplication key. If the Research Agent's network hiccup causes it to re-publish the same task, the broker detects the matching hash and returns the existing task_id rather than spawning a duplicate workflow. This matters at 0200 when the nightly rhythm is running unattended.
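Hashing-as-dedup can be sketched in a few lines. The canonical-JSON serialization (sorted keys, fixed separators) is an assumption about how the broker makes the hash stable across re-publishes; the `DedupIndex` shape is illustrative:

```python
import hashlib
import json

def provenance_hash(envelope: dict) -> str:
    """SHA-256 over a canonical serialization of the task envelope.
    Sorted keys make byte-identical hashes for semantically equal envelopes."""
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

class DedupIndex:
    """Illustrative dedup table: provenance hash -> existing task_id."""

    def __init__(self):
        self._seen = {}

    def submit(self, envelope: dict, new_task_id: str):
        h = provenance_hash(envelope)
        if h in self._seen:
            return self._seen[h], True   # duplicate: reuse existing task_id
        self._seen[h] = new_task_id
        return new_task_id, False
```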

Failure Modes and What They Taught Us

Three failures shaped the current design more than any whiteboard session.

The silent drop. An early version of the broker used a simple HTTP callback — the delegating agent fired and forgot. When the target agent was restarting (common during the iterative development phase with 240 commits in seven days), tasks vanished. The fix was mandatory acknowledgment: the broker holds a task in PENDING state until the target agent returns an ACK. If no ACK arrives within the agent's registered SLA, the broker re-routes to the next eligible agent and fires a DELEGATION_FAILED event that gets surfaced in COMET's dashboard.
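The PENDING-until-ACK rule reduces to a small state check the broker can run on a timer. The dict shape and explicit clock parameter below are illustrative, chosen to keep the sketch deterministic:

```python
def ack_state(task: dict, now: float) -> str:
    """Decide what to do with a dispatched task (illustrative shape:
    {"acked": bool, "dispatched_at": float, "sla_seconds": int}).

    Once the registered SLA elapses without an ACK, the broker re-routes
    to the next eligible agent and fires DELEGATION_FAILED.
    """
    if task["acked"]:
        return "ACKED"
    if now - task["dispatched_at"] > task["sla_seconds"]:
        return "REROUTE"  # emit DELEGATION_FAILED, pick next eligible agent
    return "PENDING"
```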

The cascade amplification. The Editorial Pipeline's five agents form a linear chain. When the Layout Agent was slow, upstream agents — unaware — kept delegating new tasks. By the time Layout recovered, it faced a queue of forty pending items and degraded further. The broker now enforces back-pressure: if an agent's queue depth exceeds its max_concurrent threshold by a configurable multiplier (default 2x), it's temporarily removed from routing until queue depth drops. Delegating agents receive a 503 CAPACITY_CONSTRAINED and can either wait or escalate to a human via COMET's alert channel on port 5180.
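The back-pressure gate itself is a one-line predicate over the registry entry — a sketch assuming `queue_depth` is tracked per agent, with the default 2x multiplier from the post:

```python
def routing_eligible(agent: dict, multiplier: float = 2.0) -> bool:
    """Back-pressure gate: an agent whose queue depth exceeds
    max_concurrent * multiplier is pulled from routing until it drains;
    delegating agents then receive 503 CAPACITY_CONSTRAINED."""
    return agent["queue_depth"] <= agent["max_concurrent"] * multiplier
```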

The authority ambiguity. COMET's seven-step delegation framework, grounded in IEEE 7010, distinguishes between decisions an agent can make autonomously and those requiring human authorization. Early broker design had no notion of decision authority — any agent could delegate any task to any other agent. We added an authority_level field to the capability manifest (values: autonomous, supervised, human_required) and a matching field in the task envelope. The broker rejects any delegation where the task's required authority level exceeds the target agent's registered authority. This surfaces as a hard error, not a silent failure, forcing the originating agent to escalate properly.
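The authority check can be sketched as a rank comparison. The ordering below — autonomous < supervised < human_required, with the agent needing at least the task's level — is my reading of "exceeds the target agent's registered authority," so treat it as an assumption:

```python
# Assumed ordering: autonomous < supervised < human_required; an agent must
# be registered at or above the task's required authority level.
AUTHORITY_RANK = {"autonomous": 0, "supervised": 1, "human_required": 2}

def check_authority(task_required: str, agent_level: str) -> None:
    """Hard-fail any delegation whose required authority exceeds the target
    agent's registered authority_level -- never a silent failure."""
    if AUTHORITY_RANK[task_required] > AUTHORITY_RANK[agent_level]:
        raise PermissionError(
            f"task requires {task_required!r} but agent is {agent_level!r}"
        )
```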

The MCP Surface: Why It Matters for Claude

Exposing the broker as an MCP server was a later addition, but it's become the most useful interface. When I'm working in Claude Code and need to understand what a specific agent is doing, I can call broker.get_agent_status("research-agent-v3") directly as an MCP tool rather than switching context to a dashboard. More importantly, Claude Code itself can delegate tasks through the broker during agentic sessions — spawning a background summarization job to a local Ollama model via MuXD while continuing to work on the primary task. MuXD (port 5195) handles the routing decision between local Ollama inference and Claude cloud calls, optimizing for token cost while respecting latency requirements. The broker doesn't need to know about that split — it routes to an abstract agent ID, and MuXD handles the model selection internally.
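On the wire, that tool call is an ordinary MCP `tools/call` request (JSON-RPC 2.0 per the MCP specification). A client-side sketch of building the request — the tool name comes from the post, but the broker's argument schema and the transport layer are assumptions:

```python
import json

def mcp_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build an MCP tools/call request body (JSON-RPC 2.0). Transport --
    stdio or HTTP to the broker's MCP surface -- is omitted here."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical argument shape for the broker's get_agent_status tool:
req = mcp_tool_call("get_agent_status", {"agent_id": "research-agent-v3"})
```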

Current State: 21 of 22 Services Online

As of today the broker is routing across 21 active services. One service — the KiCad Agent for PCB reverse engineering from VAULT photos — is offline pending a vision model upgrade. The broker's health dashboard surfaces this cleanly: the service shows CAPABILITY_DEGRADED rather than just disappearing, and any task routed to its capabilities is held until the service comes back online rather than silently dropped.

Across the week, the broker has handled delegation chains spanning four agents on average, with the longest chain being the full Editorial Pipeline at seven hops when COMET evaluation is included. Latency overhead from broker routing adds roughly 40ms per hop — negligible given that the agents doing the actual work take seconds to minutes.

Key Takeaway

The insight that changed my mental model: inter-agent delegation is a governance problem wearing an engineering costume. The technical implementation — pub/sub, typed envelopes, SLA tracking — is straightforward. What's hard is defining authority boundaries clearly enough that agents can make routing decisions without human intervention in the normal case, while failing loudly and visibly in the abnormal case. NIST and IEEE give you the vocabulary for thinking about those boundaries. The broker just enforces them at runtime.

If you're building multi-agent systems and your agents are still calling each other directly, you're accumulating coordination debt. The broker pattern pays that debt back with compounding returns every time you add a new specialist to the system.
