The GPU Thermal Guardian: Monitoring Dual Tesla P40s with Circuit Breakers and Auto-Throttle
Maintaining stable operation of the ARKONA ecosystem – particularly the compute-intensive workloads on our dual Tesla P40 GPUs – has been a significant engi
April 06, 2026
The case for local inference: when Ollama on a Tesla P40 beats cloud API calls
For the past year, building ARKONA – an autonomous multi-agent AI ecosystem – has forced some incredibly practical decisions about where and how we
April 06, 2026
Building COMET: A 7-Step Framework for Human-AI Task Ownership
At ARKONA, we’ve spent the last year building an autonomous multi-agent AI ecosystem. It’s currently managing 47 services across 23 Tailscale HTTPS ports, f
April 06, 2026
Why Every AI System Needs a Risk Engine: Implementing NIST 800-30 Semi-Quantitative Scoring for Autonomous Agents
Building ARKONA – my autonomous multi-agent ecosystem – has fundamentally shifted my thinking on AI system design. We’ve moved past simply *making* things
April 06, 2026
Multi-agent communication patterns: pub/sub, task delegation, and the broker architecture behind 26 coordinated agents
At ARKONA, we've built an autonomous multi-agent AI ecosystem currently operating with 26 agents, covering domains from cyber-physical reverse engineering (Core
April 06, 2026
Optimizing LLM Costs with MuXD: A Hybrid Router for ARKONA
At ARKONA, we're building an autonomous multi-agent AI ecosystem, currently comprising 47 services running across 23 ports on Tailscale HTTPS. This generates a
April 06, 2026
Why I Built My Own MCP Servers Instead of Using Off-the-Shelf Tools
The ARKONA ecosystem, a multi-agent AI system for cyber-physical reverse engineering and beyond, currently comprises 47 services spanning 23 Tailscale HTTPS por
April 05, 2026
Real-time Ecosystem Metrics: How 47 Services Report Health Through a Unified Dashboard
Maintaining situational awareness across ARKONA – my autonomous multi-agent AI ecosystem – demands more than just knowing its components are *running*. It r
April 05, 2026
Building an AI Governance Taxonomy: Mapping 816 Task Definitions to Human-AI Delegation Levels
The core challenge in scaling an autonomous multi-agent system like ARKONA isn’t just building the agents themselves, but defining the boundaries of their aut
April 05, 2026
Agent Memory and Context Management: Preventing Degradation in Long-Running Autonomous Systems
Maintaining consistent performance in autonomous agent systems over extended periods is a significant challenge. It's not simply about building agents that *can
April 05, 2026
From NIST Frameworks to Running Code: How Standards Compliance Becomes Executable Policy
For the past quarter century, I’ve been building and operating cyber-physical systems for defense. Over that time, I’ve seen a *lot* of compliance documenta
April 05, 2026
Multi-domain Dashboard Architecture: One React Template, Six Domains, Consistent UX
Building ARKONA, my autonomous multi-agent AI ecosystem, presented a unique challenge: managing six distinct operational domains – CoreOps, BizOps, COMET, Dev
April 05, 2026
Autonomous Code Review: How Agents Scan for TODO Drift, Test Failures, and Security Issues Every Hour
Maintaining code quality at scale, especially within a rapidly evolving ecosystem like ARKONA, demands automation beyond traditional CI/CD. We’ve moved past s
April 05, 2026
Tailscale as the Networking Layer: Zero-Trust HTTPS Across 23 Ports Without a Reverse Proxy
ARKONA, my autonomous multi-agent AI ecosystem, currently operates 21 of 23 services across 23 distinct ports. Initially, I anticipated a significant networkin
April 05, 2026
Why Agentic AI Needs Systems Engineering Discipline, Not Just Prompt Engineering
I’ve spent the last 25 years building and securing complex cyber-physical systems for the government. Now, with ARKONA, my focus has shifted to building an au
April 05, 2026
Building a Security Center: Exposure Control, Data Silos, and Invite-Code Auth for a Personal Ecosystem
When your personal project crosses 47 services on 23 ports, "security" stops being a feature you bolt on and becomes a structural problem you solve at the archi
April 05, 2026
Inter-Agent Task Delegation: How Our Broker Enables Agents to Assign Work to Specialists
When you're running 26 autonomous agents across six operational domains, the question of who does what — and how agents discover each other — becomes a real
April 05, 2026
Performance Engineering at Scale: Replacing 47 Sequential lsof Calls with a Single ss Snapshot
My ecosystem health check script was taking 14 seconds to run. That's 14 seconds of wall-clock time every time I wanted to know whether all 47 services were up
April 05, 2026
Agent Battle Rhythm: Scheduling 26 Agents Across Research, Monitoring, Publishing, and Maintenance Windows
Running 26 autonomous agents on a single ecosystem without them stepping on each other is a scheduling problem — and like most scheduling problems, the naive
April 05, 2026
WebAuthn in Production: Adding Face ID Biometric Auth to a Multi-Service Ecosystem Without a Single Password
I run 47 services across 23 ports. Every one of them sits behind Tailscale, every one of them speaks HTTPS, and until recently, every one of them that required
April 05, 2026
The 5-Agent Editorial Pipeline: How Autonomous AI Produces Fact-Checked, Publication-Quality Technical Articles
Publishing technical content at scale without sacrificing accuracy is a hard problem. I built a 5-agent newsroom inside ARKONA that now produces, edits, fact-ch
April 05, 2026
From Monolith to 6 Domains: Decomposing a Cyber-Physical RE Platform into Independently Deployable Services
Eighteen months ago, ARKONA was a single Python script that scraped firmware headers and dumped them into a SQLite file. Today it is 47 services across 23 ports
April 05, 2026
April 2026
The gap between an AI demo and a production agent system is enormous. When you chain LLM calls across long-horizon tasks, failure modes compound in ways that prompt engineering alone can't solve. What's missing is the discipline that systems engineers have applied to safety-critical infrastructure for decades.
April 2026
Most agent demos show short-horizon tasks that complete in seconds. Real production agents need to maintain coherence across hours, days, or longer. This is a fundamentally different design problem that demands careful attention to memory architecture, context compression, and communication protocols.
April 2026
Not every task needs the most capable model. In production multi-agent systems, you're making hundreds of LLM calls per workflow. Cost and latency add up fast. A hybrid routing approach that dynamically matches model capability to task requirements is essential for sustainable production AI.