The Loop: How an AI Swarm Surfaced a Governance Limitation, Then Tested the Fix

Author(s): Selfradiance

Originally published on Towards AI.

AgentGate is a runtime accountability layer for AI agents: before an agent can execute a high-impact action, it must lock a bond as collateral. Good outcomes release the bond. Bad outcomes slash it. The mechanism makes bad behavior economically irrational.

In March 2026, a coordinated swarm of nine AI agents ran 97 attacks against AgentGate. One team, Beta, spent 48 clean bond cycles building reputation and earned nothing for it. Bond capacity was mathematically enforced but not reputation-gated: a brand-new identity had the same bond-locking capacity as one with a spotless track record.

The original swarm campaign classified this as a governance limitation, not a vulnerability. AgentGate’s core defenses held. Gamma maintained a 100% catch rate across all 38 of its attacks. The campaign produced zero swarm-emergent findings.

We responded by implementing the Progressive Trust Model: reputation-gated bond capacity enforced at bond-lock time. Then we re-ran the swarm campaigns against the hardened system. The swarm verified tier promotion, distinct-resolver gating, and malicious demotion. The static Tier 2 bond cap (500¢ accepted, 501¢ rejected with TIER_BOND_CAP_EXCEEDED) was confirmed separately through a targeted direct test, because the Claude-powered strategist never selected the high-value bond probe in any of the three runs.

This article shows what that loop looked like end to end, and where the verification is incomplete.

Act 1: The Limitation

The canonical Agent 004 v0.5.0 swarm campaign ran 9 agents across 3 teams for 5 rounds: Alpha (reconnaissance, 37 attacks), Beta (trust exploitation, 22 attacks), and Gamma (coordinated economic pressure, 38 attacks). Of those 97 attacks, 77 were caught and 20 were uncaught. But the uncaught attacks didn’t map to the trust-model issue.
They fell into categories AgentGate was already aware of: for example, unauthenticated GET endpoints that return system information by design, and recon-category boundary probes targeting surfaces outside the bond-enforcement layer. None of the 20 represented newly discovered vulnerabilities. Gamma’s economic pressure attacks, the category most relevant to bond integrity, were caught at a 100% rate (38 of 38). Zero of the 20 uncaught attacks were classified as swarm-emergent.

The real finding was structural: Beta’s trust-building phase completed 48 clean bond cycles, and the system treated that identity identically to one created five seconds ago. No elevated bond-locking capacity was earned. The system could not distinguish newly created identities from proven ones when deciding how much bond capacity to grant. It enforced accountability but couldn’t differentiate at the point of economic delegation. The canonical run documented this as an enhancement opportunity, not a vulnerability.

Act 2: The Fix

AgentGate now computes reputation tiers at bond-lock time via computeTrustTier(), with no stored tier state:

Tier 1 “New” (default): 100¢ bond cap.
Tier 2 “Established”: 500¢ bond cap. Requires 5 or more qualifying successes from 2 or more distinct resolvers, with zero malicious resolutions.
Tier 3 “Trusted”: No tier cap (normal capacity rules apply). Requires 20 or more qualifying successes from 20 or more distinct resolvers, with zero malicious resolutions.

A single malicious resolution forces immediate demotion to Tier 1.

Then the Codex cold-eyes audit caught a cheap-farming loophole before it shipped. The initial implementation let an agent rack up qualifying successes with trivial-exposure actions. Three hardening measures were added:

Minimum exposure threshold. Only actions with 100¢ or more of effective exposure (after the 1.2× risk multiplier) count as qualifying successes.
Under the Tier 1 cap of 100¢, the tightest valid path is declaring 83¢ exposure: ceil(83 × 1.2) = 100¢ effective, exactly meeting the threshold (declaring 82¢ would yield only 99¢). This means cheap farming is closed, but Tier 1 agents still have a narrow valid path to earn qualifying successes and advance. This is a deliberate design tradeoff.

Distinct resolver requirement. Each qualifying success must come from a different resolver identity. This raises the bar against trivial self-dealing and low-identity collusion. It does not prevent collusion in any strong sense, since coordinated identities controlling enough resolvers could still game the system, but it makes the cheapest farming paths unviable.

Self-resolution forbidden. An identity cannot resolve its own actions (403 SELF_RESOLUTION_FORBIDDEN).

A methodological caveat about the distinct-resolver threshold. The Tier 2 threshold was originally set at 5 distinct resolvers. During re-testing, this was lowered to 3, then to 2. The reason: a 3-agent team can only produce 2 distinct resolvers per executor (the other two teammates). With the original threshold of 5, Tier 2 was unreachable regardless of how many qualifying successes an agent accumulated. The threshold was reduced to match the maximum resolver diversity the test environment’s team topology can produce. The mechanism itself, requiring multiple independent resolvers, is the defense against collusion. The threshold number scales with deployment size. A production system with more participants would use a higher threshold.

AgentGate’s tier system passed 121 tests across 12 suites.

Act 3: The Re-Test

We re-ran three full swarm campaigns against the hardened system. The goal was to verify three specific enforcement mechanisms.

Swarm-Verified: Tier Promotion, Gating, and Demotion

Beta-2 reached Tier 2 in runs 2 and 3, accumulating 8 qualifying successes from 2 distinct resolvers with zero malicious resolutions. Promotion through the distinct-resolver gate was confirmed working.
Beta-1 accumulated 8 qualifying successes but was blocked at Tier 1 because it had only 1 distinct resolver. The gating rule held: same number of successes, different outcome, because resolver diversity was insufficient.

Beta-3 was immediately demoted to Tier 1 after a malicious resolution. Demotion was confirmed working.

Gamma maintained a 100% catch rate across all three re-test runs. Its economic-pressure attacks were caught in every run regardless of tier state.

Directly Verified: Tier 2 Bond Cap

The Claude-powered strategist never selected the highValueBondAttempt task in any of the three swarm runs. This is worth flagging because it’s evidence that LLM-driven campaign selection can miss relevant tests even in repeated runs: the strategist had no strategic reason to attempt a high-value bond before confirming Tier 2 promotion, and only one agent reached Tier 2.

The Tier 2 bond […]
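
To make the tier rules from Act 2 and the re-test outcomes concrete, here is a minimal Python sketch. It is not AgentGate's code: `compute_trust_tier` only echoes the name computeTrustTier(), and the `Resolution` record, the constants, `check_bond`, and the sample histories are hypothetical. It assumes the history holds only successful or malicious resolutions, that resolver diversity is counted over qualifying successes, and that the tier cap is compared directly against the requested bond amount. Only the thresholds, the 1.2× multiplier, and the TIER_BOND_CAP_EXCEEDED code come from the article.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the article; every name here is hypothetical.
MIN_QUALIFYING_EXPOSURE_CENTS = 100          # effective exposure needed to qualify
TIER_CAPS_CENTS = {1: 100, 2: 500, 3: None}  # None means no tier cap
# tier -> (minimum qualifying successes, minimum distinct resolvers)
TIER_RULES = {2: (5, 2), 3: (20, 20)}


@dataclass(frozen=True)
class Resolution:
    """One resolved action in an identity's history (illustrative shape)."""
    resolver_id: str
    effective_exposure_cents: int
    malicious: bool = False


def effective_exposure(declared_cents: int) -> int:
    """ceil(declared * 1.2), done in integer arithmetic to avoid float error."""
    return (declared_cents * 12 + 9) // 10


def compute_trust_tier(history: List[Resolution]) -> int:
    """Derive the tier from history on every call; no tier state is stored."""
    # A single malicious resolution forces Tier 1.
    if any(r.malicious for r in history):
        return 1
    qualifying = [
        r for r in history
        if r.effective_exposure_cents >= MIN_QUALIFYING_EXPOSURE_CENTS
    ]
    successes = len(qualifying)
    resolvers = len({r.resolver_id for r in qualifying})
    # Check the highest tier first.
    for tier in (3, 2):
        min_successes, min_resolvers = TIER_RULES[tier]
        if successes >= min_successes and resolvers >= min_resolvers:
            return tier
    return 1


def check_bond(history: List[Resolution], bond_cents: int) -> Optional[str]:
    """Return an error code if the bond exceeds the tier cap, else None."""
    cap = TIER_CAPS_CENTS[compute_trust_tier(history)]
    if cap is not None and bond_cents > cap:
        return "TIER_BOND_CAP_EXCEEDED"
    return None


# Illustrative histories shaped like the re-test outcomes (resolver names invented).
exposure = effective_exposure(83)  # the 83-cent path under the Tier 1 cap
beta_1 = [Resolution("resolver-a", exposure) for _ in range(8)]
beta_2 = [Resolution(f"resolver-{'ab'[i % 2]}", exposure) for i in range(8)]
beta_3 = beta_2 + [Resolution("resolver-a", exposure, malicious=True)]

for name, history in (("beta_1", beta_1), ("beta_2", beta_2), ("beta_3", beta_3)):
    print(name, "tier", compute_trust_tier(history))
```

Under these assumptions, the history with 8 successes from one resolver stays at Tier 1, the history with 8 successes from two resolvers reaches Tier 2, and adding one malicious resolution drops it back to Tier 1. Calling `check_bond(beta_2, 500)` and `check_bond(beta_2, 501)` reproduces the accept/reject boundary that the article reports for the direct test.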
