Three AI models forging production-grade code together. Claude Code orchestrates Gemini CLI, Codex CLI, and 19 specialized subagents through file-based protocols, portable skills, and parallel review swarms.
Every model does what it does best. Claude builds. Gemini analyzes. Codex tests. Knowledge compounds across sessions.
Claude + Gemini + Codex
Opus max effort
Portable workflows
Parallel review swarm
Claude Code as lead agent, Gemini CLI for 1M-token codebase analysis, Codex CLI for sandboxed test execution. Each assigned by a heuristic matrix.
Security sentinel, performance oracle, architecture strategist, test-gap analyzer, findings synthesizer — each with restricted tools and focused expertise.
Model-agnostic workflow modules injected into any agent — TDD, systematic debugging, wave orchestration, shadow-path tracing, knowledge compounding.
Up to 7 reviewers (Gemini, Codex, and 5 Claude specialist agents) analyze code simultaneously; their findings are then merged with confidence tiering.
Every non-trivial solution is documented in ops/solutions/. A learnings-researcher automatically searches these before planning new work.
All agents coordinate through shared markdown in ops/ — AGENTS.md, GOALS.md, TASKS.md, MEMORY.md, CHANGELOG.md, STATE.md. Auditable, git-friendly, no databases.
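As a rough illustration of the lookup a learnings-researcher might perform, the sketch below ranks markdown files under a solutions directory by keyword overlap. The function name `search_solutions` and the scoring rule are assumptions made for this example; the project's actual search logic is not described here.

```python
from pathlib import Path


def search_solutions(root: Path, keywords: list[str]) -> list[tuple[Path, int]]:
    """Rank markdown files under `root` by how many keywords they mention.

    Hypothetical helper: the scoring (one point per keyword found) is an
    assumption, not the project's documented behaviour.
    """
    wanted = [k.lower() for k in keywords]
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        score = sum(1 for k in wanted if k in text)
        if score:
            hits.append((path, score))
    # Highest score first; the sort is stable, so ties stay in path order.
    hits.sort(key=lambda item: -item[1])
    return hits
```

Calling `search_solutions(Path("ops/solutions"), ["webhook", "timeout"])` would return matching files with the best match first. Because everything is plain markdown on disk, the same lookup also works with ordinary text-search tools.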
Run /ship for fully autonomous execution, or invoke each phase individually with dedicated commands.
Each subagent runs in its own context window with restricted tools and focused expertise.
Validates task plans for completeness, assignments, dependencies
Merges review outputs with deduplication and confidence tiering
Runs build, tests, lint between waves
Searches solutions and decisions for relevant patterns
Orchestrates agent team workers with file ownership
Merges parallel research outputs into unified analysis
Per-task quality gate during team builds
SQL injection, XSS, auth bypass, OWASP Top 10
O(n²) loops, N+1 queries, memory leaks
Over-engineering, YAGNI violations, unnecessary abstraction
Naming, file organization, code style consistency
SOLID principles, coupling/cohesion, module boundaries
Untested code paths, missing edge cases, weak assertions
Industry patterns, anti-patterns, tradeoff analysis
Current docs for specific frameworks and libraries
Code evolution and architectural decisions via git
Validates bugs are reproducible before fixes begin
Post-deployment health checks and smoke tests
Reads GitHub PR comments and implements changes
Skills are markdown files that any agent can consume. Inject into Gemini or Codex via $(cat skills/SKILL.md).
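The `$(cat skills/SKILL.md)` injection amounts to prepending the skill's markdown to whatever prompt the agent receives. A minimal Python sketch of the same idea follows. The `build_prompt` name, the `---` separator, and the `Task:` label are all assumptions for illustration, and the sketch deliberately stops at building the string rather than guessing at any CLI's flags.

```python
from pathlib import Path


def build_prompt(skill_path: Path, task: str) -> str:
    """Prepend a skill's markdown to a task description.

    Hypothetical helper: the separator and "Task:" label are assumptions.
    """
    skill_text = skill_path.read_text(encoding="utf-8").strip()
    return f"{skill_text}\n\n---\n\nTask: {task}\n"
```

The resulting string can be passed to any agent that accepts a text prompt, which is what keeps a skill model-agnostic.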
| Skill | Consumer | What It Teaches |
|---|---|---|
| codebase-mapping | Gemini (Phase 0) | Full-repo analysis: structure, data flow, patterns, debt |
| writing-plans | Claude (Phase 1) | Task decomposition with shadow paths, error maps, interface context |
| shadow-path-tracing | Claude (Phase 1) | Enumerate every failure path alongside the happy path |
| wave-orchestration | Claude (Phase 2) | Dependency-grouped parallel execution with integration checks |
| test-driven-development | Codex (Phase 5) | RED-GREEN-REFACTOR: no production code without failing test |
| systematic-debugging | Codex, Claude | Error taxonomy, assumption tracking, bisection, circuit breaker |
| iterative-refinement | Claude (Phase 4) | Review-fix-review loops with convergence modes (max 3 cycles) |
| review-synthesis | Claude (Phase 4) | Merge multi-reviewer findings with confidence tiering |
| verification-before-completion | All agents | Evidence-based completion checklist: no "done" without proof |
| knowledge-compounding | Claude (Phase 6) | Document solutions to ops/solutions/ for future sprints |
| session-continuity | Claude | Save and resume via STATE.md across sessions |
| scope-cutting | Claude | Systematically cut scope by unblocking value and risk |
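
The "dependency-grouped parallel execution" of wave-orchestration can be pictured as layering a task graph: every task whose dependencies are already finished joins the next wave. The sketch below is one way to compute such waves. `plan_waves` and the example task names are hypothetical, and the real skill may group tasks differently.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves.

    `deps` maps a task name to the set of tasks it depends on.
    Hypothetical helper illustrating dependency layering.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A task is ready once all of its dependencies have completed.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError(
                f"dependency cycle or unknown dependency among: {sorted(remaining)}"
            )
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves
```

For `{"schema": set(), "api": {"schema"}, "docs": {"schema"}, "ui": {"api"}}` this yields three waves: `schema` alone, then `api` and `docs` together, then `ui`. Tasks within a wave can run in parallel, and the gap between waves is a natural place for an integration check.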
Two external CLIs and five Claude specialist agents analyze the same code simultaneously through different lenses, then a findings-synthesizer merges everything.
HIGH — verified in codebase. MEDIUM — pattern match. LOW — heuristic only, can never be P1. Prevents wasting time on phantom issues.
Each reviewer has a "Do Not Flag" list — readability-aiding redundancy, documented thresholds, sufficient assertions, consistency-only style changes.
Max 3 review cycles. P1 fixed immediately, P2 this cycle, P3 logged for later. Escalate to user if not converged after 3 rounds.
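To make the tiering rules concrete, here is a small sketch of how a synthesizer could merge duplicate findings and enforce the "LOW can never be P1" rule. The `Finding` fields, the deduplication key (file, line, issue), and the choice to demote a LOW-confidence P1 to P2 are assumptions for this example; only the three tiers and the P1 restriction come from the description above.

```python
from dataclasses import dataclass, replace

# Lower rank means stronger evidence.
CONFIDENCE_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    issue: str
    priority: int    # 1, 2, or 3 (P1 is most urgent)
    confidence: str  # "HIGH", "MEDIUM", or "LOW"
    reviewer: str


def synthesize(findings: list[Finding]) -> list[Finding]:
    """Deduplicate findings, then cap LOW-confidence items below P1."""
    merged: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        key = (f.file, f.line, f.issue)  # assumed dedup key
        prev = merged.get(key)
        if prev is None:
            merged[key] = f
            continue
        # Keep the strongest confidence and the most urgent priority seen.
        best_conf = min(prev.confidence, f.confidence, key=CONFIDENCE_RANK.__getitem__)
        merged[key] = replace(
            prev,
            priority=min(prev.priority, f.priority),
            confidence=best_conf,
            reviewer=f"{prev.reviewer}+{f.reviewer}",
        )
    result = []
    for f in merged.values():
        if f.confidence == "LOW" and f.priority == 1:
            f = replace(f, priority=2)  # heuristic-only findings cannot be P1
        result.append(f)
    return sorted(result, key=lambda f: (f.priority, CONFIDENCE_RANK[f.confidence]))
```

With this shape, two reviewers reporting the same issue collapse into one entry that carries both reviewer names, and a heuristic-only P1 lands in the P2 bucket instead of blocking the sprint.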
Enforced at every stage of the pipeline. No shortcuts.
No build without a validated plan (plan-checker, max 3 iterations)
No production code without a failing test (test-driven-development)
No fix without diagnosis (systematic-debugging)
No "done" without proof (verification-before-completion)
No merge without review (review-synthesis, max 3 cycles)
3-attempt ceiling per issue, then escalation report (systematic-debugging)
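The iteration caps above all follow the same pattern: try a bounded number of times, then stop and report rather than loop forever. A minimal sketch of that pattern is shown below. `fix_with_ceiling`, its callback signature, and the shape of the returned report are hypothetical and only illustrate the 3-attempt ceiling.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # ceiling taken from the rule above


def fix_with_ceiling(
    attempt_fix: Callable[[int], bool],
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Call `attempt_fix(n)` up to `max_attempts` times, then escalate.

    `attempt_fix` is a hypothetical callback that returns True once the
    issue is resolved. The returned dict stands in for an escalation report.
    """
    history = []
    for n in range(1, max_attempts + 1):
        fixed = attempt_fix(n)
        history.append({"attempt": n, "fixed": fixed})
        if fixed:
            return {"status": "fixed", "attempts": history}
    # Ceiling reached: hand the recorded history to the user instead of retrying.
    return {"status": "escalated", "attempts": history}
```

Keeping the attempt history in the result means an escalation arrives with evidence of what was already tried.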
Long sprints don't die to context limits or silent failures.
ship-loop.sh: a Stop hook that blocks premature exit and re-feeds the goal as JSON; session-isolated, with transcript-based promise detection (max 5 iterations)
coordinate.sh: spawns fresh sessions with clean context and notifies on completion via webhook or OS notification
Auto-checkpoints STATE.md before context compaction, which prevents state loss when compaction happens mid-sprint
context-monitor.sh: warns at 8+ consecutive reads without writes to break the reading loop
Tracks and warns on accumulated tool failures: 5 consecutive or 10 total failures trigger an alert
Watchdog pattern on all Gemini/Codex calls — SIGTERM after timeout, SIGKILL after 5s grace
Per-subagent risk accumulation — halt at >20% risk or 50+ file changes
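The watchdog pattern mentioned above (SIGTERM after a timeout, SIGKILL after a 5-second grace period) can be sketched with Python's standard `subprocess` module. This is an illustration only; the project's own wrapper scripts are not shown here, and `run_with_watchdog` is a hypothetical name.

```python
import subprocess


def run_with_watchdog(
    cmd: list[str], timeout: float, grace: float = 5.0
) -> tuple[int, str]:
    """Run `cmd`; terminate it after `timeout` seconds, kill after `grace` more.

    Returns (exit code, combined stdout/stderr). Hypothetical helper.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX: ask the process to exit
        try:
            out, _ = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX: force exit after the grace period
            out, _ = proc.communicate()
    return proc.returncode, out
```

On POSIX systems a process stopped by the watchdog reports a negative return code (the signal number), which lets the caller tell a timeout apart from an ordinary failure.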
Hooks, agents, skills, and commands register automatically. No manual configuration needed.
| Command | Type | What It Does |
|---|---|---|
| `/ship <goal>` | full | Fully autonomous end-to-end sprint with inner loop guard |
| `/coordinate <goal>` | full | Full lifecycle with context-exhaustion recovery |
| `/plan <goal>` | phase | Analyze codebase, plan with shadow paths, validate |
| `/build` | phase | Wave orchestration build. `--team` for agent team mode |
| `/review` | phase | Parallel review + synthesis. `--full` for all 7 reviewers |
| `/test` | phase | Gap analysis + Codex TDD. `--gaps-only` to just identify gaps |
| `/wrap` | phase | Compound knowledge, archive reviews, write STATE.md, git trailers |
| `/quick <change>` | util | Changes touching fewer than 3 files. Skips heavy machinery |
| `/debug <bug>` | util | Structured debugging: reproduce, diagnose, fix |
| `/deep-research <topic>` | util | 5 parallel research agents + research-synthesizer |
| `/status` | util | Sprint overview: phase, tasks, blockers |
| `/pause` | util | Quick checkpoint to STATE.md |
| `/resume` | util | Continue from STATE.md checkpoint |
| `/compound` | util | Document a solved problem or decision |
| `/analyze <url>` | util | Deep compatibility analysis of an external repo |
| `/resolve-pr <#>` | util | Read GitHub PR comments and implement changes |
ops/ directory with AGENTS.md, GOALS.md, MEMORY.md, and CHANGELOG.md templates.

/quick for changes touching fewer than 3 files. It skips Phase 0, plan validation, and the full review swarm.

ship-loop.sh blocks premature exit and re-injects the original goal (max 5 iterations). Outside a session, coordinate.sh spawns fresh Claude processes with clean context windows. A PreCompact hook auto-checkpoints STATE.md before context compaction.

/compound saves it to ops/solutions/. Future /plan commands automatically search this directory before starting new work, so every sprint gets smarter. Each sprint should make the next sprint easier.