v2.4.0 — Framework Self-Audit & Integrity Fixes

Agent Triforge

Three AI models forging production-grade code together. Claude Code orchestrates Gemini CLI, Codex CLI, and 19 specialized subagents through file-based protocols, portable skills, and parallel review swarms.

Phase 0: Gemini analyzing codebase
  gemini-3.1-pro, 1M context window, 47 files mapped
Phase 1: Planning with shadow paths
  shadow-path-tracing, 23 paths enumerated, 12 error states mapped
Phase 1.1: Ambiguity check, 0 corrections
  plan-checker agent, 0 blockers, approved on first pass
Phase 2: Building via wave orchestration
  wave-orchestrator, 4 waves, 7 builders in parallel
Phase 3: 7 reviewers dispatched in parallel
  gemini, codex, security-sentinel, performance-oracle, simplicity-reviewer, convention-enforcer, architecture-strategist
Phase 5: Codex TDD, 47 tests, all green
  codex gpt-5.4, RED → GREEN → REFACTOR, 47 / 47 passing
All gates passed. Shipping.

A Complete Multi-Agent Development System

Every model does what it does best. Claude builds. Gemini analyzes. Codex tests. Knowledge compounds across sessions.

3 AI Models: Claude + Gemini + Codex
19 Agents: Opus max effort
12 Skills: Portable workflows
7 Reviewers: Parallel review swarm

Three-Model Orchestration

Claude Code acts as lead agent, Gemini CLI handles 1M-token codebase analysis, and Codex CLI handles sandboxed test execution. Work is routed to each model by a heuristic matrix.

Specialized Agents

Security sentinel, performance oracle, architecture strategist, test-gap analyzer, findings synthesizer — each with restricted tools and focused expertise.

Portable Skills

Model-agnostic workflow modules injected into any agent — TDD, systematic debugging, wave orchestration, shadow-path tracing, knowledge compounding.

Parallel Review Swarm

Up to 7 reviewers (Gemini, Codex, and 5 Claude specialist agents) analyze code simultaneously; their findings are then merged with confidence tiering.

Knowledge Compounding

Every non-trivial solution is documented in ops/solutions/. A learnings-researcher automatically searches these before planning new work.

File-Based Coordination

All agents coordinate through shared markdown in ops/ — AGENTS.md, GOALS.md, TASKS.md, MEMORY.md, CHANGELOG.md, STATE.md. Auditable, git-friendly, no databases.

From Goal to Ship in 7 Phases

Run /ship for fully autonomous execution, or invoke each phase individually with dedicated commands.

Phase | What Happens                | Agent(s)                       | Command
0     | Codebase Analysis           | Gemini CLI                     | /plan
1     | Planning                    | Claude + writing-plans skill   | /plan
1.1   | Ambiguity Resolution        | Claude (user confirmation)     | /plan, /ship
1.5   | Plan Validation             | plan-checker agent             | /plan
2     | Build (Wave Orchestration)  | Claude subagents / team-lead   | /build
3-4   | Parallel Review + Synthesis | Gemini + Codex + 5 agents      | /review
5     | TDD Testing                 | Codex CLI + test-gap-analyzer  | /test
6     | Ship (Knowledge Compound)   | Claude + knowledge-compounding | /wrap

19 Specialized Agents

Each runs in its own context window with restricted tools and focused expertise.

Core Workflow

plan-checker

Validates task plans for completeness, assignments, dependencies

findings-synthesizer

Merges review outputs with deduplication and confidence tiering

integration-verifier

Runs build, tests, lint between waves

learnings-researcher

Searches solutions and decisions for relevant patterns

team-lead

Orchestrates agent team workers with file ownership

research-synthesizer

Merges parallel research outputs into unified analysis

continuous-reviewer

Per-task quality gate during team builds

Review Specialists

security-sentinel

SQL injection, XSS, auth bypass, OWASP Top 10

performance-oracle

O(n²) loops, N+1 queries, memory leaks

code-simplicity-reviewer

Over-engineering, YAGNI violations, unnecessary abstraction

convention-enforcer

Naming, file organization, code style consistency

architecture-strategist

SOLID principles, coupling/cohesion, module boundaries

test-gap-analyzer

Untested code paths, missing edge cases, weak assertions

Research & Verification

best-practices-researcher

Industry patterns, anti-patterns, tradeoff analysis

framework-docs-researcher

Current docs for specific frameworks and libraries

git-history-analyzer

Code evolution and architectural decisions via git

bug-reproduction-validator

Validates bugs are reproducible before fixes begin

deployment-verifier

Post-deployment health checks and smoke tests

pr-comment-resolver

Reads GitHub PR comments and implements changes

12 Model-Agnostic Workflow Modules

Skills are markdown files that any agent can consume. Inject into Gemini or Codex via $(cat skills/SKILL.md).

Skill                          | Consumer         | What It Teaches
codebase-mapping               | Gemini (Phase 0) | Full-repo analysis: structure, data flow, patterns, debt
writing-plans                  | Claude (Phase 1) | Task decomposition with shadow paths, error maps, interface context
shadow-path-tracing            | Claude (Phase 1) | Enumerate every failure path alongside the happy path
wave-orchestration             | Claude (Phase 2) | Dependency-grouped parallel execution with integration checks
test-driven-development        | Codex (Phase 5)  | RED-GREEN-REFACTOR: no production code without failing test
systematic-debugging           | Codex, Claude    | Error taxonomy, assumption tracking, bisection, circuit breaker
iterative-refinement           | Claude (Phase 4) | Review-fix-review loops with convergence modes (max 3 cycles)
review-synthesis               | Claude (Phase 4) | Merge multi-reviewer findings with confidence tiering
verification-before-completion | All agents       | Evidence-based completion checklist: no "done" without proof
knowledge-compounding          | Claude (Phase 6) | Document solutions to ops/solutions/ for future sprints
session-continuity             | Claude           | Save and resume via STATE.md across sessions
scope-cutting                  | Claude           | Systematically cut scope by unblocking value and risk
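
The `$(cat skills/SKILL.md)` injection mentioned above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the `compose_prompt` helper, the throwaway skill file, and the prompt layout are all hypothetical. Only the `$(cat ...)` pattern and the `gemini -p` invocation appear on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend a skill's markdown to a task description.
compose_prompt() {
  local skill_file="$1" task="$2"
  if [ ! -r "$skill_file" ]; then
    echo "skill file not readable: $skill_file" >&2
    return 1
  fi
  # $(cat ...) inlines the skill text ahead of the task.
  printf '%s\n\n---\nTask: %s\n' "$(cat "$skill_file")" "$task"
}

# Demo with a throwaway skill file (contents are illustrative only).
demo_skill="$(mktemp)"
printf '# Skill: shadow-path-tracing\nEnumerate every failure path.\n' > "$demo_skill"
prompt="$(compose_prompt "$demo_skill" "map the auth module")"
echo "$prompt"
rm -f "$demo_skill"

# With the Gemini CLI installed, the composed prompt could then be passed on:
#   gemini -p "$prompt"
```

Because the skill is plain markdown, the same composed prompt could in principle be handed to any CLI that accepts a text prompt.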

7 Reviewers in Parallel

Two external CLIs and five Claude specialist agents analyze the same code simultaneously through different lenses, then a findings-synthesizer merges everything.

Confidence Tiering

HIGH — verified in codebase. MEDIUM — pattern match. LOW — heuristic only, can never be P1. Prevents wasting time on phantom issues.
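
One way to picture the "LOW can never be P1" rule is a small cap applied when findings are merged. The sketch below is an assumption-laden illustration: the `cap_priority` name is hypothetical, and demoting a LOW-confidence P1 to P2 is a guess, since the page only states that LOW findings cannot be P1.

```bash
#!/usr/bin/env bash
# Hypothetical rule: cap the priority a finding may carry, given its confidence.
# Assumption: a LOW-confidence P1 is demoted to P2 (the page does not say where it goes).
cap_priority() {
  local confidence="$1" priority="$2"
  case "$confidence" in
    HIGH|MEDIUM)
      echo "$priority" ;;
    LOW)
      if [ "$priority" = "P1" ]; then echo "P2"; else echo "$priority"; fi ;;
    *)
      echo "unknown confidence: $confidence" >&2
      return 1 ;;
  esac
}

cap_priority HIGH P1   # prints P1
cap_priority LOW P1    # prints P2
cap_priority LOW P3    # prints P3
```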

Suppressions

Each reviewer has a "Do Not Flag" list — readability-aiding redundancy, documented thresholds, sufficient assertions, consistency-only style changes.

Iterative Convergence

Max 3 review cycles. P1 fixed immediately, P2 this cycle, P3 logged for later. Escalate to user if not converged after 3 rounds.
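
The 3-cycle ceiling can be sketched as a bounded loop. Everything here is illustrative: `run_review_cycle` is a hypothetical stand-in that reads canned counts of open findings instead of dispatching real reviewers, and "converged" is assumed to mean zero open findings.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for one review pass: prints the number of open findings.
# It reads the value for this cycle from a canned list so the sketch is runnable.
run_review_cycle() {
  local cycle="$1"
  shift
  local counts=("$@")
  echo "${counts[$((cycle - 1))]:-0}"
}

# Loop until no open findings remain or the 3-cycle ceiling is reached.
converge() {
  local max_cycles=3 cycle open
  for ((cycle = 1; cycle <= max_cycles; cycle++)); do
    open="$(run_review_cycle "$cycle" "$@")"
    if [ "$open" -eq 0 ]; then
      echo "converged after $cycle cycle(s)"
      return 0
    fi
  done
  echo "not converged after $max_cycles cycles: escalate to user"
  return 1
}

converge 4 1 0           # prints: converged after 3 cycle(s)
converge 5 3 2 || true   # prints: not converged after 3 cycles: escalate to user
```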

Six Non-Negotiable Checkpoints

Enforced at every stage of the pipeline. No shortcuts.

1. Plan validated before build
   No build without a validated plan (plan-checker, max 3 iterations)

2. Failing test before implementation
   No production code without a failing test (test-driven-development)

3. Root cause before fixes
   No fix without diagnosis (systematic-debugging)

4. Evidence before completion
   No "done" without proof (verification-before-completion)

5. Code review before shipping
   No merge without review (review-synthesis, max 3 cycles)

6. Circuit breaker on debugging
   3-attempt ceiling per issue, then escalation report (systematic-debugging)

Seven Defense Layers

Long sprints don't die to context limits or silent failures.

Inner Loop

ship-loop.sh is a Stop hook that blocks exit and re-feeds the goal as JSON. It is session-isolated and uses transcript-based promise detection (max 5 iterations).

Outer Loop

coordinate.sh spawns fresh sessions with clean context, notifies on completion via webhook or OS notification

PreCompact Hook

Auto-checkpoints STATE.md before context compaction — prevents state loss during mid-sprint compaction

Analysis Paralysis Detection

context-monitor.sh warns at 8+ consecutive reads without writes — breaks the reading loop
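
A read-streak counter like the one described could look roughly like this. It is a sketch under stated assumptions, not the contents of context-monitor.sh: the `detect_read_loop` name is hypothetical, and tool events are assumed to arrive as plain `read` or `write` lines on stdin, which real hook payloads almost certainly do not match.

```bash
#!/usr/bin/env bash
# Hypothetical detector: reads tool names (one per line) from stdin and warns
# once the threshold of consecutive reads is reached with no write in between.
# Assumption: events are labeled "read" or "write"; other lines are ignored.
detect_read_loop() {
  local threshold="${1:-8}" streak=0 tool
  while IFS= read -r tool; do
    case "$tool" in
      read)  streak=$((streak + 1)) ;;
      write) streak=0 ;;
    esac
    if [ "$streak" -ge "$threshold" ]; then
      echo "warning: $streak consecutive reads without a write"
      return 0
    fi
  done
  echo "ok"
}

printf 'read\n%.0s' {1..8} | detect_read_loop           # prints the warning
printf 'read\nread\nwrite\nread\n' | detect_read_loop   # prints ok
```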

Tool Failure Monitor

Tracks accumulated tool failures and warns when they pile up: 5 consecutive or 10 total failures trigger an alert.

Subprocess Timeouts

Watchdog pattern on all Gemini/Codex calls — SIGTERM after timeout, SIGKILL after 5s grace
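
The SIGTERM-then-SIGKILL watchdog can be sketched in plain bash as below. The `run_with_watchdog` helper is hypothetical and is not taken from the framework's scripts; only the 5-second grace period mirrors the description above, and the timeout values in the demo are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command with a deadline.
# Sends SIGTERM at the timeout, then SIGKILL after a grace period.
run_with_watchdog() {
  local timeout_s="$1" grace_s="$2"
  shift 2
  "$@" &
  local cmd_pid=$!
  (
    sleep "$timeout_s"
    kill -TERM "$cmd_pid" 2>/dev/null || exit 0   # command already finished
    sleep "$grace_s"
    kill -KILL "$cmd_pid" 2>/dev/null
  ) >/dev/null 2>&1 &
  local dog_pid=$!
  local status=0
  wait "$cmd_pid" || status=$?
  # The command is done, so the watchdog is no longer needed.
  kill "$dog_pid" 2>/dev/null || true
  wait "$dog_pid" 2>/dev/null || true
  return "$status"
}

# A 10 s sleep stands in for a slow Gemini/Codex call; the deadline is 1 s.
run_with_watchdog 1 5 sleep 10 || echo "command ended with status $?"
```

A non-zero status from the helper tells the caller that the subprocess failed or was cut off, so the calling script can decide whether to retry or escalate.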

Risk Scoring

Per-subagent risk accumulation — halt at >20% risk or 50+ file changes
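
The halt condition can be expressed as a simple threshold check. This sketch assumes risk is tracked as an integer percentage and file changes as a count per subagent; the `should_halt` name and those representations are hypothetical, and how the framework actually accumulates risk is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical halt check for one subagent.
# Assumptions: risk is an integer percentage, file changes a plain count.
should_halt() {
  local risk_pct="$1" files_changed="$2"
  # Halt above 20% risk, or at 50 or more changed files.
  if [ "$risk_pct" -gt 20 ] || [ "$files_changed" -ge 50 ]; then
    echo "halt"
  else
    echo "continue"
  fi
}

should_halt 20 49   # prints continue
should_halt 21 3    # prints halt
should_halt 5 50    # prints halt
```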

Install in One Command

Hooks, agents, skills, and commands register automatically. No manual configuration needed.

Step 1 — Install the plugin
bash
claude plugin add https://github.com/Ninety2UA/agent-triforge
Step 2 — Start a sprint
claude
/ship add user authentication with JWT
Prerequisites — Three CLIs + Python 3
bash
# Claude Code (you're probably already here)
claude --version

# Gemini CLI
gemini -p "Respond with only: READY"

# Codex CLI
codex exec "Respond with only: READY"

# Python 3 (used by hook handlers for JSON parsing)
python3 --version
Autonomous mode — with context recovery
bash
./scripts/coordinate.sh "add user auth" --max 5 --team

16 Slash Commands

Command                | Type  | What It Does
/ship <goal>           | full  | Fully autonomous end-to-end sprint with inner loop guard
/coordinate <goal>     | full  | Full lifecycle with context-exhaustion recovery
/plan <goal>           | phase | Analyze codebase, plan with shadow paths, validate
/build                 | phase | Wave orchestration build. --team for agent team mode
/review                | phase | Parallel review + synthesis. --full for all 7 reviewers
/test                  | phase | Gap analysis + Codex TDD. --gaps-only to just identify gaps
/wrap                  | phase | Compound knowledge, archive reviews, write STATE.md, git trailers
/quick <change>        | util  | Changes touching < 3 files. Skips heavy machinery
/debug <bug>           | util  | Structured debugging: reproduce, diagnose, fix
/deep-research <topic> | util  | 5 parallel research agents + research-synthesizer
/status                | util  | Sprint overview: phase, tasks, blockers
/pause                 | util  | Quick checkpoint to STATE.md
/resume                | util  | Continue from STATE.md checkpoint
/compound              | util  | Document a solved problem or decision
/analyze <url>         | util  | Deep compatibility analysis of an external repo
/resolve-pr <#>        | util  | Read GitHub PR comments and implement changes

Frequently Asked Questions

Yes. The framework installs as a Claude Code plugin — it's additive and doesn't modify your existing code. On first session, it bootstraps an ops/ directory with AGENTS.md, GOALS.md, MEMORY.md, and CHANGELOG.md templates.
No. The framework degrades gracefully. Without Gemini, Phase 0 is skipped. Without Codex, testing is handled by Claude. You lose the multi-model benefits but everything still works.
Skills are instructions that guide behavior — methodology documents. Agents are separate subprocesses dispatched via the Agent tool, each with their own context window. Skills can be injected into any agent (including external ones like Gemini and Codex).
Agent Teams spawn multiple Claude Code instances that collaborate through a shared task list and messaging. Unlike review swarms (read-only analysis), teams are peers that divide file ownership and coordinate builds. Best for 5+ interdependent tasks.
No. Use /quick for changes touching fewer than 3 files. It skips Phase 0, plan validation, and the full review swarm.
Two layers. Inside a session, ship-loop.sh blocks premature exit and re-injects the original goal (max 5 iterations). Outside a session, coordinate.sh spawns fresh Claude processes with clean context windows. A PreCompact hook auto-checkpoints STATE.md before context compaction.
After solving a non-trivial problem, /compound saves it to ops/solutions/. Future /plan commands automatically search this directory before starting new work — so every sprint gets smarter. Each sprint should make the next sprint easier.