```html

Agent Battle Rhythm: Scheduling 26 Agents Across Research, Monitoring, Publishing, and Maintenance Windows

Running 26 autonomous agents on a single ecosystem without them stepping on each other is a scheduling problem — and like most scheduling problems, the naive solution (just run everything at midnight) fails almost immediately. After iterating through three generations of ARKONA's agent orchestration layer, I've landed on what I call a battle rhythm: a deliberate, time-boxed cadence borrowed from military operations planning that assigns each agent class a designated window, priority tier, and resource budget.

This isn't theoretical. As of today, 21 of 22 services are online across 23 Tailscale HTTPS ports, and 239 commits have landed in the last seven days — most of them from agent-driven pipelines running while I slept.

Why "Battle Rhythm"?

The term comes from U.S. military command-and-control doctrine: a structured cycle of meetings, reports, and decision points that keeps a headquarters synchronized without consuming all of its bandwidth. The analogy holds for agent systems. Without structure, agents compete for GPU time, generate conflicting writes to shared state, and produce outputs that nobody — human or machine — is positioned to act on.

ARKONA's 26 agents fall into four functional classes (research, monitoring, editorial, and maintenance), each with distinct resource profiles and latency tolerances.

The Four Windows

I divide the 24-hour clock into four non-overlapping windows. Each window owns a resource allocation from the dual Tesla P40 GPUs (combined 48GB VRAM), and agents outside their window are not scheduled — they queue.

# /etc/arkona/battle-rhythm.yaml
windows:
  research:
    start: "01:00"
    end:   "04:00"
    gpu_budget_pct: 80
    agents:
      - fs_re_research_agent    # scans state-of-the-art across 8 FS-RE layers
      - idea_generator          # mines ecosystem improvement candidates → FORGE backlog
      - cipher_intel_harvester  # OSINT feeds for hardware RE intelligence
      - vault_indexer           # re-indexes PCB/firmware evidence in L6-data/vault

  monitoring:
    start: "00:00"
    end:   "23:59"   # always-on, preemptible
    gpu_budget_pct: 5
    agents:
      - service_health_monitor  # polls all 23 ports, pages on degradation
      - thermal_guard           # Ollama GPU temp: warn 70C, throttle 75C, kill 80C
      - muxd_token_tracker      # tracks Claude token savings vs. local inference
      - comet_drift_detector    # detects model/policy drift against NIST 800-30 baseline

  editorial:
    start: "04:30"
    end:   "07:00"
    gpu_budget_pct: 60
    agents:
      - research_summarizer     # digests overnight research into structured briefs
      - fact_checker            # cross-references claims against MITRE/NIST/IEEE corpus
      - editorial_writer        # drafts blog posts and executive summaries
      - editorial_editor        # line-edits for tone, accuracy, readability
      - editorial_publisher     # pushes approved content to Wiki.js (port 3000)

  maintenance:
    start: "07:00"
    end:   "09:00"
    gpu_budget_pct: 20
    agents:
      - git_sync_agent          # commits, rebases, and pushes ecosystem-wide
      - provenance_signer       # SHA-256 signs all new artifacts in vault
      - backup_agent            # rsync snapshots to cold storage
      - dependency_auditor      # npm/pip audit across all 47 services

The monitoring window is the exception — it runs continuously but at a hard 5% GPU cap. Those agents use quantized local models (gemma3:4b via Ollama) for their inference and escalate to Claude only when an anomaly requires nuanced interpretation. MuXD, my hybrid LLM router on port 4040, handles that escalation logic automatically.
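The escalation decision can be sketched as a simple routing function. This is a hypothetical reconstruction, not MuXD's actual logic; the threshold, field names, and model identifiers are illustrative:

```python
# Hypothetical sketch of MuXD-style escalation: route routine checks to the
# local quantized model and escalate only high-ambiguity anomalies to Claude.
# The 0.7 threshold and the Anomaly fields are assumptions, not MuXD's config.
from dataclasses import dataclass

@dataclass
class Anomaly:
    source: str        # e.g. "thermal_guard"
    severity: float    # 0.0-1.0, reported by the monitoring agent
    ambiguity: float   # 0.0-1.0, how unsure the local classifier is

def route_inference(anomaly: Anomaly, escalate_threshold: float = 0.7) -> str:
    """Pick an inference backend for an anomaly classification job."""
    if anomaly.ambiguity >= escalate_threshold:
        return "claude"          # nuanced interpretation needed
    return "ollama/gemma3:4b"    # cheap local pass stays under the 5% GPU cap
```

The point of routing on ambiguity rather than severity is that a severe but obvious anomaly (GPU at 80C) needs action, not interpretation.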

Resource Contention and the GPU Thermal Guard

The dual P40 setup is powerful but not infinite. When the research window opens at 01:00, the fs_re_research_agent spins up a full context-window sweep across 8 layers of the FS-RE meta-model — that's a 128K-token pass through the qwen2.5:72b model, pulling roughly 35GB VRAM. If the thermal_guard simultaneously detects a temperature spike and tries to run its own inference to classify the anomaly, you have a memory pressure problem.

The solution is a lightweight semaphore service I call the resource broker, running on port 4041. Every agent that requires more than 2GB VRAM acquires a lease before loading its model. The broker enforces window budgets and provides backpressure:

# Resource lease acquisition (Python, simplified)
import httpx

def acquire_gpu_lease(agent_id: str, vram_gb: float, timeout: int = 300) -> str:
    """Block until the broker grants a lease or the timeout expires."""
    resp = httpx.post("http://localhost:4041/lease", json={
        "agent":    agent_id,
        "vram_gb":  vram_gb,
        "window":   current_window(),   # derived from system clock
        "priority": AGENT_PRIORITIES[agent_id],
    }, timeout=timeout)                 # broker holds the request open as backpressure
    resp.raise_for_status()
    return resp.json()["lease_id"]

def release_gpu_lease(lease_id: str) -> None:
    httpx.delete(f"http://localhost:4041/lease/{lease_id}")

The broker tracks cumulative VRAM usage per window and rejects low-priority requests when the budget ceiling is hit. Monitoring agents carry priority 1 (never rejected); research agents carry priority 3 (deferrable). This prevents a runaway research job from starving the thermal guard during a crisis.
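The broker-side admission check amounts to a per-window VRAM ledger. A minimal sketch of that behavior (my reconstruction, not the actual port-4041 service) looks like this:

```python
# Minimal sketch of the broker's admission logic (assumed behavior, not the
# real implementation): track VRAM leased per window and reject deferrable
# requests once the window's budget ceiling is hit. Priority 1 is never
# rejected, matching the monitoring agents' guarantee.
TOTAL_VRAM_GB = 48.0  # dual Tesla P40s

class ResourceBroker:
    def __init__(self, window_budget_pct: dict[str, int]):
        self.budget_gb = {w: TOTAL_VRAM_GB * pct / 100
                          for w, pct in window_budget_pct.items()}
        self.leased_gb = {w: 0.0 for w in window_budget_pct}

    def try_lease(self, window: str, vram_gb: float, priority: int) -> bool:
        over_budget = self.leased_gb[window] + vram_gb > self.budget_gb[window]
        if over_budget and priority > 1:  # only priority 1 may exceed the budget
            return False
        self.leased_gb[window] += vram_gb
        return True
```

With the research window at 80%, the budget works out to 38.4GB, so the 35GB qwen2.5:72b pass fits, but a second large research lease is rejected while a priority-1 monitoring lease still gets through.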

Sequential Pipeline Dependencies: The Editorial Chain

The 5-agent editorial pipeline is the hardest scheduling problem because it's a directed acyclic graph masquerading as a cron job. The research_summarizer must finish before the fact_checker can start; the fact_checker must clear before the editorial_writer drafts; and the publisher won't touch content that hasn't been editor-reviewed.

I model this with a simple event bus on the inter-agent communication broker (port 4042, pub/sub). Each agent publishes a completed event with a content hash when it finishes, and the downstream agent subscribes and triggers on that event rather than on a fixed clock time. This means the entire editorial chain adapts to how long overnight research actually took — if the research_summarizer finishes 40 minutes late because the FS-RE sweep ran long, the publisher just shifts right accordingly. The window boundary (07:00) acts as a hard deadline: any agent still running at window close is checkpointed and resumed during the next editorial window.
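The handoff pattern is easiest to see with a toy in-memory bus standing in for the port-4042 broker (the topic names and payload shape here are illustrative):

```python
# Toy sketch of the event-driven editorial handoff: each agent publishes a
# "<agent>.completed" event carrying a content hash, and the downstream stage
# triggers on that event rather than on a clock time. An in-memory stand-in
# for the real pub/sub broker on port 4042.
import hashlib
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = EventBus()
ran = []  # records which stages fired, in order

def fact_checker(evt):
    ran.append(("fact_checker", evt["content_hash"]))
    bus.publish("fact_checker.completed", evt)  # chain continues downstream

def editorial_writer(evt):
    ran.append(("editorial_writer", evt["content_hash"]))

bus.subscribe("research_summarizer.completed", fact_checker)
bus.subscribe("fact_checker.completed", editorial_writer)

brief = b"overnight research brief"
bus.publish("research_summarizer.completed",
            {"content_hash": hashlib.sha256(brief).hexdigest()})
```

Because each stage fires on its predecessor's event, a 40-minute slip in the summarizer shifts the whole chain right with no reconfiguration.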

IEEE 7001-2021 (Transparency of Autonomous Systems) informs how I log these handoffs. Every agent-to-agent delegation in COMET — my AI governance dashboard on port 3005 — is recorded with a timestamp, the delegating agent's confidence score, and a human-readable rationale. This audit trail matters when a published article later turns out to contain an error: I can trace exactly which agent made which decision and with what confidence.
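The shape of such a delegation record might look like the following. The field names are my assumption of what IEEE 7001-style transparency logging needs, not COMET's actual schema:

```python
# Illustrative delegation record for agent-to-agent handoffs: timestamp,
# both agents, the delegating agent's confidence, and a human-readable
# rationale. Field names are assumed, not COMET's real schema.
import json
from datetime import datetime, timezone

def delegation_record(frm: str, to: str, confidence: float, rationale: str) -> str:
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "delegating_agent": frm,
        "receiving_agent": to,
        "confidence": confidence,
        "rationale": rationale,
    })
```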

NIST 800-30 and Scheduling Risk

NIST 800-30 treats scheduling as an operational risk surface. An agent that runs at a predictable time with predictable resource consumption is a predictable attack vector — and since ARKONA processes sensitive OT/ICS firmware in the CIPHER pipeline, timing predictability matters. I introduce ±15-minute jitter on research and maintenance agent start times, derived from a hardware entropy source on the host. Monitoring agents don't get jitter (latency sensitivity trumps unpredictability), but their check intervals randomize between 45 and 90 seconds.
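A sketch of the jitter scheme, using the OS CSPRNG as a stand-in for the host's hardware entropy source (the helper names are mine):

```python
# Sketch of start-time jitter and randomized monitoring intervals. The real
# source is a hardware entropy device on the host; secrets draws from the OS
# CSPRNG, which mixes in hardware entropy, so it serves as a stand-in here.
import secrets
from datetime import datetime, timedelta

def jittered_start(nominal: datetime, max_jitter_min: int = 15) -> datetime:
    """Shift a scheduled start by a uniform offset in [-15 min, +15 min]."""
    span_s = max_jitter_min * 60
    offset = secrets.randbelow(2 * span_s + 1) - span_s
    return nominal + timedelta(seconds=offset)

def monitoring_interval(low_s: int = 45, high_s: int = 90) -> int:
    """Randomize a monitoring check interval within [45 s, 90 s]."""
    return low_s + secrets.randbelow(high_s - low_s + 1)
```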

What Breaks and How I've Fixed It

The most common failure mode isn't resource exhaustion — it's an agent that hangs silently. A research agent waiting on a rate-limited external API will sit indefinitely unless you build timeouts into the lease protocol. Every lease now carries a max_wall_seconds field; the broker forcibly revokes expired leases and logs the kill to COMET. The agent is marked degraded in the service health dashboard and re-queued for the next window.
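The revocation pass can be sketched as a reaper over active leases. This is assumed broker internals, not the actual service; the COMET logging is reduced to an in-memory list:

```python
# Sketch of lease expiry enforcement: every lease carries max_wall_seconds,
# and a periodic reaper pass revokes expired leases, recording the kill so
# the agent can be marked degraded and re-queued for the next window.
import time
from dataclasses import dataclass, field

@dataclass
class Lease:
    lease_id: str
    agent: str
    max_wall_seconds: int
    started_at: float = field(default_factory=time.monotonic)

def reap_expired(leases: dict[str, Lease], now: float, kill_log: list) -> None:
    for lease_id, lease in list(leases.items()):
        if now - lease.started_at > lease.max_wall_seconds:
            kill_log.append({"lease": lease_id, "agent": lease.agent,
                             "reason": "wall_clock_exceeded"})
            del leases[lease_id]  # degraded-marking and re-queue happen elsewhere
```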

The second failure mode is cascading delays: one long research run pushes the editorial window past the 06:35 executive summary deadline. I handle this with a priority override — if the editorial chain hasn't started by 05:15, the research window is hard-stopped and GPUs are handed to the editorial agents regardless of in-progress work. Research is checkpointed; the summary ships on time.
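The override itself reduces to a single deadline check that the scheduler runs each tick (the timings come from above; the function is my reconstruction):

```python
# Sketch of the 05:15 priority override: if the editorial chain has not
# started by the cutoff, research is checkpointed and the GPUs are handed
# over. The scheduler would run this check on every tick.
from datetime import time

EDITORIAL_CUTOFF = time(5, 15)

def should_preempt_research(now: time, editorial_started: bool) -> bool:
    return not editorial_started and now >= EDITORIAL_CUTOFF
```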

Key Takeaway

Autonomous agent systems don't fail because the individual agents are poorly built. They fail because the agents were never designed to coexist. Battle rhythm — explicit windows, resource budgets, priority tiers, and event-driven handoffs — is what transforms a collection of autonomous scripts into a coordinated system. The military analogy isn't just flavor; it's a reminder that in any complex operation, synchronization is a first-class engineering requirement, not an afterthought. Build the rhythm before you build the agents, and you'll spend far less time firefighting at 03:00.

```