The GPU Thermal Guardian: Monitoring Dual Tesla P40s with Circuit Breakers and Auto-Throttle

Maintaining stable operation of the ARKONA ecosystem – particularly the compute-intensive workloads on our dual Tesla P40 GPUs – has been a significant engi

April 06, 2026

The case for local inference: when Ollama on a Tesla P40 beats cloud API calls

For the past year, building ARKONA – an autonomous multi-agent AI ecosystem – has forced some incredibly practical decisions about where and how we

April 06, 2026

Building COMET: A 7-Step Framework for Human-AI Task Ownership

At ARKONA, we’ve spent the last year building an autonomous multi-agent AI ecosystem. It’s currently managing 47 services across 23 Tailscale HTTPS ports, f

April 06, 2026

Why Every AI System Needs a Risk Engine: Implementing NIST 800-30 Semi-Quantitative Scoring for Autonomous Agents

Building ARKONA – my autonomous multi-agent ecosystem – has fundamentally shifted my thinking on AI system design. We’ve moved past simply *making* things

April 06, 2026

Multi-agent communication patterns: pub/sub, task delegation, and the broker architecture behind 26 coordinated agents

At ARKONA, we've built an autonomous multi-agent AI ecosystem currently operating with 26 agents, covering domains from cyber-physical reverse engineering (Core

April 06, 2026

Optimizing LLM Costs with MuXD: A Hybrid Router for ARKONA

At ARKONA, we're building an autonomous multi-agent AI ecosystem, currently comprising 47 services running across 23 ports on Tailscale HTTPS. This generates a

April 06, 2026

Why I Built My Own MCP Servers Instead of Using Off-the-Shelf Tools

The ARKONA ecosystem, a multi-agent AI system for cyber-physical reverse engineering and beyond, currently comprises 47 services spanning 23 Tailscale HTTPS por

April 05, 2026

Real-time Ecosystem Metrics: How 47 Services Report Health Through a Unified Dashboard

Maintaining situational awareness across ARKONA – my autonomous multi-agent AI ecosystem – demands more than just knowing its components are *running*. It r

April 05, 2026

Building an AI Governance Taxonomy: Mapping 816 Task Definitions to Human-AI Delegation Levels

The core challenge in scaling an autonomous multi-agent system like ARKONA isn’t just building the agents themselves, but defining the boundaries of their aut

April 05, 2026

Agent Memory and Context Management: Preventing Degradation in Long-Running Autonomous Systems

Maintaining consistent performance in autonomous agent systems over extended periods is a significant challenge. It's not simply about building agents that *can

April 05, 2026

From NIST Frameworks to Running Code: How Standards Compliance Becomes Executable Policy

For the past quarter century, I’ve been building and operating cyber-physical systems for defense. Over that time, I’ve seen a *lot* of compliance documenta

April 05, 2026

Multi-domain Dashboard Architecture: One React Template, Six Domains, Consistent UX

Building ARKONA, my autonomous multi-agent AI ecosystem, presented a unique challenge: managing six distinct operational domains – CoreOps, BizOps, COMET, Dev

April 05, 2026

Autonomous Code Review: How Agents Scan for TODO Drift, Test Failures, and Security Issues Every Hour

Maintaining code quality at scale, especially within a rapidly evolving ecosystem like ARKONA, demands automation beyond traditional CI/CD. We’ve moved past s

April 05, 2026

Tailscale as the Networking Layer: Zero-Trust HTTPS Across 23 Ports Without a Reverse Proxy

ARKONA, my autonomous multi-agent AI ecosystem, currently operates 21 of 23 services across 23 distinct ports. Initially, I anticipated a significant networkin

April 05, 2026

Why Agentic AI Needs Systems Engineering Discipline, Not Just Prompt Engineering

I’ve spent the last 25 years building and securing complex cyber-physical systems for the government. Now, with ARKONA, my focus has shifted to building an au

April 05, 2026

Building a Security Center: Exposure Control, Data Silos, and Invite-Code Auth for a Personal Ecosystem

When your personal project crosses 47 services on 23 ports, "security" stops being a feature you bolt on and becomes a structural problem you solve at the archi

April 05, 2026

Inter-Agent Task Delegation: How Our Broker Enables Agents to Assign Work to Specialists

When you're running 26 autonomous agents across six operational domains, the question of who does what — and how agents discover each other — becomes a real

April 05, 2026

Performance Engineering at Scale: Replacing 47 Sequential lsof Calls with a Single ss Snapshot

My ecosystem health check script was taking 14 seconds to run. That's 14 seconds of wall-clock time every time I wanted to know whether all 47 services were up

April 05, 2026

Agent Battle Rhythm: Scheduling 26 Agents Across Research, Monitoring, Publishing, and Maintenance Windows

Running 26 autonomous agents on a single ecosystem without them stepping on each other is a scheduling problem — and like most scheduling problems, the naive

April 05, 2026

WebAuthn in Production: Adding Face ID Biometric Auth to a Multi-Service Ecosystem Without a Single Password

I run 47 services across 23 ports. Every one of them sits behind Tailscale, every one of them speaks HTTPS, and until recently, every one of them that required

April 05, 2026

The 5-Agent Editorial Pipeline: How Autonomous AI Produces Fact-Checked, Publication-Quality Technical Articles

Publishing technical content at scale without sacrificing accuracy is a hard problem. I built a 5-agent newsroom inside ARKONA that now produces, edits, fact-ch

April 05, 2026

From Monolith to 6 Domains: Decomposing a Cyber-Physical RE Platform into Independently Deployable Services

Eighteen months ago, ARKONA was a single Python script that scraped firmware headers and dumped them into a SQLite file. Today it is 47 services across 23 ports

April 05, 2026

Why Agentic AI Needs Systems Engineering Discipline

April 2026

The gap between an AI demo and a production agent system is enormous. When you chain LLM calls across long-horizon tasks, failure modes compound in ways that prompt engineering alone can't solve. What's missing is the discipline that systems engineers have applied to safety-critical infrastructure for decades.


Designing Agent Harnesses for Long-Horizon Tasks

April 2026

Most agent demos show short-horizon tasks that complete in seconds. Real production agents need to maintain coherence across hours, days, or longer. This is a fundamentally different design problem that demands careful attention to memory architecture, context compression, and communication protocols.


The Case for Hybrid LLM Routing in Production

April 2026

Not every task needs the most capable model. In production multi-agent systems, you're making hundreds of LLM calls per workflow. Cost and latency add up fast. A hybrid routing approach that dynamically matches model capability to task requirements is essential for sustainable production AI.
