v2.4.0 — Framework Self-Audit & Integrity Fixes

Agent Triforge

Three AI models forging production-grade code together. Claude Code orchestrates Gemini CLI, Codex CLI, and 19 specialized subagents through file-based protocols, portable skills, and parallel review swarms.

0 Agents

0 Skills

0 Commands

0 Hooks

-- Stars

Get Started GitHub

claude — agent-triforge

agent-triforge main claude v2.4.0

Phase 0 Gemini analyzing codebase ···

gemini-3.1-pro 1M context window 47 files mapped

Phase 1 Planning with shadow paths ···

shadow-path-tracing 23 paths enumerated 12 error states mapped

Phase 1.1 Ambiguity check — 0 corrections ···

plan-checker agent 0 blockers approved on first pass

Phase 2 Building via wave orchestration ···

wave-orchestrator 4 waves 7 builders in parallel

Phase 3 7 reviewers dispatched in parallel ···

gemini codex security-sentinel performance-oracle simplicity-reviewer convention-enforcer architecture-strategist

Phase 5 Codex TDD — 47 tests, all green ···

codex gpt-5.4 RED → GREEN → REFACTOR 47 / 47 passing

All gates passed. Shipping.

Ready 0.0s Press R to replay

AI Models

Claude + Gemini + Codex

Agents

Opus max effort

Skills

Portable workflows

Reviewers

Parallel review swarm

Three-Model Orchestration

Claude Code as lead agent, Gemini CLI for 1M-token codebase analysis, Codex CLI for sandboxed test execution. Each assigned by a heuristic matrix.

Specialized Agents

Security sentinel, performance oracle, architecture strategist, test-gap analyzer, findings synthesizer — each with restricted tools and focused expertise.

Portable Skills

Model-agnostic workflow modules injected into any agent — TDD, systematic debugging, wave orchestration, shadow-path tracing, knowledge compounding.

Parallel Review Swarm

Up to 7 reviewers analyze code simultaneously — Gemini + Codex + 5 Claude specialist agents — then merged with confidence tiering.

Knowledge Compounding

Every non-trivial solution is documented in ops/solutions/. A learnings-researcher automatically searches these before planning new work.

File-Based Coordination

All agents coordinate through shared markdown in ops/ — AGENTS.md, GOALS.md, TASKS.md, MEMORY.md, CHANGELOG.md, STATE.md. Auditable, git-friendly, no databases.

Phase

What Happens

Agent(s)

Command

Codebase Analysis

Gemini CLI

/plan

Planning

Claude + writing-plans skill

/plan

1.1

Ambiguity Resolution

Claude (user confirmation)

/plan, /ship

1.5

Plan Validation

plan-checker agent

/plan

Build (Wave Orchestration)

Claude subagents / team-lead

/build

3-4

Parallel Review + Synthesis

Gemini + Codex + 5 agents

/review

TDD Testing

Codex CLI + test-gap-analyzer

/test

Ship (Knowledge Compound)

Claude + knowledge-compounding

/wrap

Core Workflow

plan-checker

Validates task plans for completeness, assignments, dependencies

findings-synthesizer

Merges review outputs with deduplication and confidence tiering

integration-verifier

Runs build, tests, lint between waves

learnings-researcher

Searches solutions and decisions for relevant patterns

team-lead

Orchestrates agent team workers with file ownership

research-synthesizer

Merges parallel research outputs into unified analysis

continuous-reviewer

Per-task quality gate during team builds

Review Specialists

security-sentinel

SQL injection, XSS, auth bypass, OWASP Top 10

performance-oracle

O(n²) loops, N+1 queries, memory leaks

code-simplicity-reviewer

Over-engineering, YAGNI violations, unnecessary abstraction

convention-enforcer

Naming, file organization, code style consistency

architecture-strategist

SOLID principles, coupling/cohesion, module boundaries

test-gap-analyzer

Untested code paths, missing edge cases, weak assertions

Research & Verification

best-practices-researcher

Industry patterns, anti-patterns, tradeoff analysis

framework-docs-researcher

Current docs for specific frameworks and libraries

git-history-analyzer

Code evolution and architectural decisions via git

bug-reproduction-validator

Validates bugs are reproducible before fixes begin

deployment-verifier

Post-deployment health checks and smoke tests

pr-comment-resolver

Reads GitHub PR comments and implements changes

Skill	Consumer	What It Teaches
`codebase-mapping`	Gemini (Phase 0)	Full-repo analysis: structure, data flow, patterns, debt
`writing-plans`	Claude (Phase 1)	Task decomposition with shadow paths, error maps, interface context
`shadow-path-tracing`	Claude (Phase 1)	Enumerate every failure path alongside the happy path
`wave-orchestration`	Claude (Phase 2)	Dependency-grouped parallel execution with integration checks
`test-driven-development`	Codex (Phase 5)	RED-GREEN-REFACTOR: no production code without failing test
`systematic-debugging`	Codex, Claude	Error taxonomy, assumption tracking, bisection, circuit breaker
`iterative-refinement`	Claude (Phase 4)	Review-fix-review loops with convergence modes (max 3 cycles)
`review-synthesis`	Claude (Phase 4)	Merge multi-reviewer findings with confidence tiering
`verification-before-completion`	All agents	Evidence-based completion checklist — no "done" without proof
`knowledge-compounding`	Claude (Phase 6)	Document solutions to ops/solutions/ for future sprints
`session-continuity`	Claude	Save and resume via STATE.md across sessions
`scope-cutting`	Claude	Systematically cut scope by unblocking value and risk

Confidence Tiering

HIGH — verified in codebase. MEDIUM — pattern match. LOW — heuristic only, can never be P1. Prevents wasting time on phantom issues.

Suppressions

Each reviewer has a "Do Not Flag" list — readability-aiding redundancy, documented thresholds, sufficient assertions, consistency-only style changes.

Iterative Convergence

Max 3 review cycles. P1 fixed immediately, P2 this cycle, P3 logged for later. Escalate to user if not converged after 3 rounds.

Plan validated before build

No build without a validated plan plan-checker — max 3 iterations

Failing test before implementation

No production code without a failing test test-driven-development

Root cause before fixes

No fix without diagnosis systematic-debugging

Evidence before completion

No "done" without proof verification-before-completion

Code review before shipping

No merge without review review-synthesis — max 3 cycles

Circuit breaker on debugging

3-attempt ceiling per issue, then escalation report systematic-debugging

Inner Loop

ship-loop.sh Stop hook blocks exit with JSON re-feed, session-isolated, transcript-based promise detection (max 5x)

Outer Loop

coordinate.sh spawns fresh sessions with clean context, notifies on completion via webhook or OS notification

PreCompact Hook

Auto-checkpoints STATE.md before context compaction — prevents state loss during mid-sprint compaction

Analysis Paralysis Detection

context-monitor.sh warns at 8+ consecutive reads without writes — breaks the reading loop

Tool Failure Monitor

Tracks and warns on accumulated tool failures — 5 consecutive or 10 total triggers alert

Subprocess Timeouts

Watchdog pattern on all Gemini/Codex calls — SIGTERM after timeout, SIGKILL after 5s grace

Risk Scoring

Per-subagent risk accumulation — halt at >20% risk or 50+ file changes

Step 1 — Install the plugin

claude plugin add https://github.com/Ninety2UA/agent-triforge

Step 2 — Start a sprint

/ship add user authentication with JWT

Prerequisites — Three CLIs + Python 3

bash
# Claude Code (you're probably already here)
claude --version

# Gemini CLI
gemini -p "Respond with only: READY"

# Codex CLI
codex exec "Respond with only: READY"

# Python 3 (used by hook handlers for JSON parsing)
python3 --version

Autonomous mode — with context recovery

./scripts/coordinate.sh "add user auth" --max 5 --team

Command	What It Does
`/ship <goal>` full	Fully autonomous end-to-end sprint with inner loop guard
`/coordinate <goal>` full	Full lifecycle with context-exhaustion recovery
`/plan <goal>` phase	Analyze codebase, plan with shadow paths, validate
`/build` phase	Wave orchestration build. `--team` for agent team mode
`/review` phase	Parallel review + synthesis. `--full` for all 7 reviewers
`/test` phase	Gap analysis + Codex TDD. `--gaps-only` to just identify gaps
`/wrap` phase	Compound knowledge, archive reviews, write STATE.md, git trailers
`/quick <change>` util	Changes touching < 3 files. Skips heavy machinery
`/debug <bug>` util	Structured debugging: reproduce, diagnose, fix
`/deep-research <topic>` util	5 parallel research agents + research-synthesizer
`/status` util	Sprint overview: phase, tasks, blockers
`/pause` util	Quick checkpoint to STATE.md
`/resume` util	Continue from STATE.md checkpoint
`/compound` util	Document a solved problem or decision
`/analyze <url>` util	Deep compatibility analysis of an external repo
`/resolve-pr <#>` util	Read GitHub PR comments and implement changes

Yes. The framework installs as a Claude Code plugin — it's additive and doesn't modify your existing code. On first session, it bootstraps an ops/ directory with AGENTS.md, GOALS.md, MEMORY.md, and CHANGELOG.md templates.

No. The framework degrades gracefully. Without Gemini, Phase 0 is skipped. Without Codex, testing is handled by Claude. You lose the multi-model benefits but everything still works.

Skills are instructions that guide behavior — methodology documents. Agents are separate subprocesses dispatched via the Agent tool, each with their own context window. Skills can be injected into any agent (including external ones like Gemini and Codex).

Agent Teams spawn multiple Claude Code instances that collaborate through a shared task list and messaging. Unlike review swarms (read-only analysis), teams are peers that divide file ownership and coordinate builds. Best for 5+ interdependent tasks.

No. Use /quick for changes touching fewer than 3 files. It skips Phase 0, plan validation, and the full review swarm.

Two layers. Inside a session, ship-loop.sh blocks premature exit and re-injects the original goal (max 5 iterations). Outside a session, coordinate.sh spawns fresh Claude processes with clean context windows. A PreCompact hook auto-checkpoints STATE.md before context compaction.

After solving a non-trivial problem, /compound saves it to ops/solutions/. Future /plan commands automatically search this directory before starting new work — so every sprint gets smarter. Each sprint should make the next sprint easier.

Agent Triforge

A Complete Multi-Agent Development System

AI Models

Agents

Skills

Reviewers

Three-Model Orchestration

Specialized Agents

Portable Skills

Parallel Review Swarm

Knowledge Compounding

File-Based Coordination

From Goal to Ship in 7 Phases

19 Specialized Agents

plan-checker

findings-synthesizer

integration-verifier

learnings-researcher

team-lead

research-synthesizer

continuous-reviewer

security-sentinel

performance-oracle

code-simplicity-reviewer

convention-enforcer

architecture-strategist

test-gap-analyzer

best-practices-researcher

framework-docs-researcher

git-history-analyzer

bug-reproduction-validator

deployment-verifier

pr-comment-resolver

12 Model-Agnostic Workflow Modules

7 Reviewers in Parallel

Confidence Tiering

Suppressions

Iterative Convergence

Six Non-Negotiable Checkpoints

Plan validated before build

Failing test before implementation

Root cause before fixes

Evidence before completion

Code review before shipping

Circuit breaker on debugging

Seven Defense Layers

Inner Loop

Outer Loop

PreCompact Hook

Analysis Paralysis Detection

Tool Failure Monitor

Subprocess Timeouts

Risk Scoring

Install in One Command

16 Slash Commands

Frequently Asked Questions