APOPHIS CLI Execution Guide
1. Purpose
This file defines the CLI redesign contract. It is written for parallel implementers. Each stream owns an end-to-end command. The orchestrator owns specs, fixtures, and golden outputs. Merge gates are strict and minimal.
2. Philosophy
- Vertical slices, not horizontal layers. Each stream goes straight to a complete command endpoint.
- Acceptance tests first. Every stream starts with failing top-level tests, then implements until green.
- No premature extraction. Shared helpers are extracted only after two or more streams prove the same seam.
- Fast local feedback. Every stream should be runnable and testable in isolation.
- Authoritative merge gates only: spec compliance, golden snapshots, fixture end-to-end runs, and latency budgets.
3. Frozen Contracts (Orchestrator-Owned)
These must not change without orchestrator approval. All streams code against them.
3.1 Command Vocabulary
| Command | Purpose |
|---|---|
| apophis init | Scaffold config, scripts, and example usage |
| apophis verify | Run deterministic contract verification |
| apophis observe | Validate runtime observe configuration and reporting setup |
| apophis qualify | Run scenario, stateful, protocol, or chaos-driven qualification |
| apophis replay | Replay a failure using seed and stored trace |
| apophis doctor | Validate config, environment safety, docs/example correctness |
| apophis migrate | Check and rewrite deprecated config or API usage |
3.2 Global Flags
Every command must accept:
- --config <path>
- --profile <name>
- --cwd <path>
- --format human|json|ndjson
- --color auto|always|never
- --quiet
- --verbose
- --artifact-dir <path>
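The flag list above can be normalized into one typed object before any command runs. The following is a minimal sketch, not the frozen contract: `GlobalFlags`, `UsageError`, `normalizeGlobalFlags`, and the defaults (`human` format, `auto` color) are illustrative assumptions.

```typescript
// Hypothetical normalized shape for the global flags; the real types live in
// src/cli/core/types.ts and are owned by the orchestrator.
type OutputFormat = 'human' | 'json' | 'ndjson';
type ColorMode = 'auto' | 'always' | 'never';

interface GlobalFlags {
  config: string | undefined;
  profile: string | undefined;
  cwd: string | undefined;
  format: OutputFormat;
  color: ColorMode;
  quiet: boolean;
  verbose: boolean;
  artifactDir: string | undefined;
}

// Usage problems map to exit code 2 in the exit code table.
class UsageError extends Error {
  readonly exitCode = 2;
}

const FORMATS: readonly OutputFormat[] = ['human', 'json', 'ndjson'];
const COLORS: readonly ColorMode[] = ['auto', 'always', 'never'];

// Accept only a listed value; fall back to the (assumed) default when the flag is absent.
function pickEnum<T extends string>(flag: string, value: unknown, allowed: readonly T[], fallback: T): T {
  if (value === undefined) return fallback;
  const match = allowed.find((candidate) => candidate === value);
  if (match === undefined) {
    throw new UsageError(`Invalid value for ${flag}: ${String(value)} (expected ${allowed.join('|')})`);
  }
  return match;
}

function optionalString(value: unknown): string | undefined {
  return typeof value === 'string' ? value : undefined;
}

// `raw` stands in for whatever the command parser produces after camel-casing flag names.
function normalizeGlobalFlags(raw: Record<string, unknown>): GlobalFlags {
  return {
    config: optionalString(raw['config']),
    profile: optionalString(raw['profile']),
    cwd: optionalString(raw['cwd']),
    format: pickEnum('--format', raw['format'], FORMATS, 'human'),
    color: pickEnum('--color', raw['color'], COLORS, 'auto'),
    quiet: raw['quiet'] === true,
    verbose: raw['verbose'] === true,
    artifactDir: optionalString(raw['artifactDir']),
  };
}
```

Keeping normalization in one function means every command sees the same flag semantics, which is what the "every command must accept" rule asks for.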
3.3 Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Behavioral / qualification failure |
| 2 | Usage, config, or environment safety violation |
| 3 | Internal APOPHIS error |
| 130 | Interrupted (SIGINT) |
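As a sketch of how the table above might look in src/cli/core/exit-codes.ts: the numeric values come from the table, while the constant names, the `RunOutcome` tags (other than `behavioral_failure`, which appears as an `exitReason` in the artifact example), and `exitCodeFor` are assumptions.

```typescript
// Values are taken from the exit code table; names are illustrative.
const EXIT_CODES = {
  SUCCESS: 0,
  BEHAVIORAL_FAILURE: 1,
  USAGE_ERROR: 2,
  INTERNAL_ERROR: 3,
  INTERRUPTED: 130,
} as const;

type ExitCode = (typeof EXIT_CODES)[keyof typeof EXIT_CODES];

// Hypothetical outcome tags a command could return to the kernel.
type RunOutcome =
  | 'success'
  | 'behavioral_failure'
  | 'usage_error'
  | 'internal_error'
  | 'interrupted';

// Exhaustive mapping: adding an outcome without a case is a compile error.
function exitCodeFor(outcome: RunOutcome): ExitCode {
  switch (outcome) {
    case 'success':
      return EXIT_CODES.SUCCESS;
    case 'behavioral_failure':
      return EXIT_CODES.BEHAVIORAL_FAILURE;
    case 'usage_error':
      return EXIT_CODES.USAGE_ERROR;
    case 'internal_error':
      return EXIT_CODES.INTERNAL_ERROR;
    case 'interrupted':
      return EXIT_CODES.INTERRUPTED;
  }
}
```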
3.4 Config Schema (TypeBox + Ajv)
Config must be validated with strict unknown-key rejection. Use TypeBox to define the schema so JSON Schema output is available for docs and IDE support.
Key schema requirements:
- mode?: 'verify' | 'observe' | 'qualify'
- profile?: string
- preset?: string
- routes?: string[]
- seed?: number
- artifactDir?: string
- environments?: Record<string, EnvironmentPolicy>
- profiles?: Record<string, ProfileDefinition>
- presets?: Record<string, PresetDefinition>
Unknown keys at any depth must produce a hard failure with exact key path.
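To illustrate what "exact key path" means, here is a dependency-free sketch of the expected behavior. It is not a substitute for the TypeBox + Ajv validation this section mandates. The `Shape` type and `findUnknownKeys` are hypothetical, the nested `ProfileDefinition` fields (`preset`, `routes`) are placeholders because this guide does not define them, and `environments` and `presets` values are treated as opaque for the same reason.

```typescript
// A shape is a leaf (any value accepted), an object with a fixed key set,
// or a record whose values all share one shape.
type Shape =
  | { kind: 'leaf' }
  | { kind: 'object'; keys: Record<string, Shape> }
  | { kind: 'record'; value: Shape };

const leaf: Shape = { kind: 'leaf' };

// Top-level keys come from the schema requirements above.
// The fields inside a profile are placeholders, not the real ProfileDefinition.
const configShape: Shape = {
  kind: 'object',
  keys: {
    mode: leaf,
    profile: leaf,
    preset: leaf,
    routes: leaf,
    seed: leaf,
    artifactDir: leaf,
    environments: { kind: 'record', value: leaf },
    profiles: { kind: 'record', value: { kind: 'object', keys: { preset: leaf, routes: leaf } } },
    presets: { kind: 'record', value: leaf },
  },
};

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the dotted path of every key that the shape does not declare.
function findUnknownKeys(value: unknown, shape: Shape, path: string = ''): string[] {
  if (shape.kind === 'leaf' || !isPlainObject(value)) return [];
  const unknown: string[] = [];
  for (const key of Object.keys(value)) {
    const childPath = path === '' ? key : `${path}.${key}`;
    if (shape.kind === 'record') {
      unknown.push(...findUnknownKeys(value[key], shape.value, childPath));
    } else {
      const declared = Object.prototype.hasOwnProperty.call(shape.keys, key);
      const childShape = declared ? shape.keys[key] : undefined;
      if (childShape === undefined) {
        unknown.push(childPath);
      } else {
        unknown.push(...findUnknownKeys(value[key], childShape, childPath));
      }
    }
  }
  return unknown;
}
```

For a config with a misspelled `rutes` inside the `quick` profile, the function reports `profiles.quick.rutes`, which is the level of precision the hard failure should carry.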
3.5 Artifact Schema
Every verify, observe, and qualify run must produce an artifact document:
{
"version": "apophis-artifact/1",
"command": "verify",
"mode": "verify",
"cwd": "/path/to/project",
"configPath": "apophis.config.js",
"profile": "quick",
"preset": "safe-ci",
"env": "local",
"seed": 42,
"startedAt": "2026-04-28T12:30:00Z",
"durationMs": 1234,
"summary": {
"total": 10,
"passed": 9,
"failed": 1
},
"failures": [
{
"route": "POST /users",
"contract": "response_code(GET /users/{response_body(this).id}) == 200",
"expected": "200",
"observed": "404",
"seed": 42,
"replayCommand": "apophis replay --artifact reports/apophis/failure-2026-04-28T12-30-22Z.json"
}
],
"artifacts": [
"reports/apophis/failure-2026-04-28T12-30-22Z.json"
],
"warnings": [],
"exitReason": "behavioral_failure"
}
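The example document can be read as the following TypeScript shape. This is a sketch inferred from the single example above: which fields are optional, the element type of `warnings`, and the set of allowed `exitReason` values are not stated in this guide, so everything is typed as required and as plain strings. The `summaryIsConsistent` check is likewise an assumed invariant, not a rule from the spec.

```typescript
// Shape inferred from the example artifact; field optionality is an assumption.
interface ArtifactFailure {
  route: string;
  contract: string;
  expected: string;
  observed: string;
  seed: number;
  replayCommand: string;
}

interface ApophisArtifact {
  version: 'apophis-artifact/1';
  command: string;
  mode: string;
  cwd: string;
  configPath: string;
  profile: string;
  preset: string;
  env: string;
  seed: number;
  startedAt: string; // ISO 8601 timestamp, as in the example
  durationMs: number;
  summary: { total: number; passed: number; failed: number };
  failures: ArtifactFailure[];
  artifacts: string[];
  warnings: string[]; // element type assumed; the example array is empty
  exitReason: string;
}

// Assumed invariant: counts add up and each failed route has a failure entry.
function summaryIsConsistent(artifact: ApophisArtifact): boolean {
  const { total, passed, failed } = artifact.summary;
  return total === passed + failed && failed === artifact.failures.length;
}
```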
3.6 Human Output Grammar
For --format human, every failure must follow this exact shape:
Contract violation
POST /users
Profile: quick
Seed: 42
Expected
response_code(GET /users/{response_body(this).id}) == 200
Observed
GET /users/usr-123 returned 404
Why this matters
The resource created by POST /users is not retrievable.
Replay
apophis replay --artifact reports/apophis/failure-2026-04-28T12-30-22Z.json
Next
Check the create/read consistency for POST /users and GET /users/{id}.
This is the canonical human failure format. Do not deviate without orchestrator approval.
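A renderer for this grammar can be a pure function from a failure record to text. The sketch below reproduces the section labels and their order from the listing above; `FailureReport` and `renderHumanFailure` are hypothetical names, and the flat line layout (no indentation or blank lines) is an assumption. The golden snapshot remains the authority on exact whitespace.

```typescript
// Hypothetical input record; field names are illustrative.
interface FailureReport {
  route: string;
  profile: string;
  seed: number;
  expected: string;
  observed: string;
  why: string;
  replayCommand: string;
  next: string;
}

// Emits the sections in the canonical order: header, route, profile, seed,
// Expected, Observed, Why this matters, Replay, Next.
function renderHumanFailure(failure: FailureReport): string {
  return [
    'Contract violation',
    failure.route,
    `Profile: ${failure.profile}`,
    `Seed: ${failure.seed}`,
    'Expected',
    failure.expected,
    'Observed',
    failure.observed,
    'Why this matters',
    failure.why,
    'Replay',
    failure.replayCommand,
    'Next',
    failure.next,
  ].join('\n');
}
```

Because the function has no I/O and no color handling, it can be compared byte-for-byte against a golden file in a unit test.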
3.7 Machine Output Schema
--format json must emit a single stable document matching the artifact schema.
--format ndjson must emit step events:
{"type":"run.started","command":"verify","seed":42,"timestamp":"2026-04-28T12:30:00Z"}
{"type":"route.started","route":"POST /users","timestamp":"2026-04-28T12:30:01Z"}
{"type":"route.passed","route":"POST /users","durationMs":123,"timestamp":"2026-04-28T12:30:01Z"}
{"type":"route.failed","route":"POST /users","failure":{...},"timestamp":"2026-04-28T12:30:02Z"}
{"type":"run.completed","summary":{...},"timestamp":"2026-04-28T12:30:03Z"}
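The event stream above maps naturally onto a discriminated union. In this sketch the event names and top-level fields come from the sample lines; the `failure` and `summary` payloads are elided in the sample, so they are left as open records here rather than guessed. `StepEvent` and `toNdjsonLine` are illustrative names.

```typescript
// One variant per event type shown in the sample stream.
type StepEvent =
  | { type: 'run.started'; command: string; seed: number; timestamp: string }
  | { type: 'route.started'; route: string; timestamp: string }
  | { type: 'route.passed'; route: string; durationMs: number; timestamp: string }
  | { type: 'route.failed'; route: string; failure: Record<string, unknown>; timestamp: string }
  | { type: 'run.completed'; summary: Record<string, unknown>; timestamp: string };

// NDJSON is one JSON document per line, each terminated by a newline.
function toNdjsonLine(event: StepEvent): string {
  return JSON.stringify(event) + '\n';
}

// Parses a stream back into objects, skipping blank lines.
function parseNdjson(text: string): unknown[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as unknown);
}
```

Emitting each event as it happens, rather than buffering, lets consumers show progress while a run is still in flight.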
4. Recommended Tooling Stack
| Concern | Tool | Why |
|---|---|---|
| Command parser | cac | Fast, small, zero ceremony |
| Config/artifact validation | TypeBox + Ajv | Fast, strict, JSON Schema output |
| Interactive setup | @clack/prompts (lazy-loaded) | Polished init, zero startup tax elsewhere |
| Color/styling | picocolors | Tiny, sufficient |
| Output layout | Custom renderer | Better than heavy task/spinner frameworks |
| CLI bundling | tsup | Fast cold start, single bin |
| Tests | node:test + golden fixtures | Already aligned with repo |
| Filesystem/glob | Node built-ins + minimal helper | Lean startup |
Avoid: yargs, commander, heavy spinner UIs, ad hoc config validation.
5. Directory Ownership
Each stream owns its directory. No stream touches another stream's directory without orchestrator-mediated extraction.
src/
  cli/
    core/
      index.ts            # S1: entrypoint, command registration
      context.ts          # S1: cwd, env, TTY detection
      config-loader.ts    # S2: config resolution, profile/preset resolution
      policy-engine.ts    # S2: env gating, safety checks
      exit-codes.ts       # S0: exit code constants
      types.ts            # S0: shared CLI types
    commands/
      init/
        index.ts          # S3
        scaffolds/        # S3: preset templates
      verify/
        index.ts          # S4
        runner.ts         # S4: deterministic run logic
      observe/
        index.ts          # S5
        validator.ts      # S5: observe config validation
      qualify/
        index.ts          # S6
        runner.ts         # S6: scenario/stateful/chaos runner
      replay/
        index.ts          # S7
        loader.ts         # S7: artifact loading, version checks
      doctor/
        index.ts          # S8
        checks/           # S8: individual diagnostic checks
      migrate/
        index.ts          # S9
        rewriters/        # S9: config rewriters
    renderers/
      human.ts            # S10
      json.ts             # S10
      ndjson.ts           # S10
      shared.ts           # S10
    __fixtures__/         # S12: fixture apps
    __goldens__/          # S12: golden output snapshots
  test/
    cli/                  # S12: CLI acceptance tests
6. Workstreams
S0: Spec Authority (Orchestrator)
Owner: Orchestrator thread only.
Responsibilities:
- Own all files in src/cli/core/types.ts, src/cli/core/exit-codes.ts
- Own src/cli/__goldens__/*
- Own fixture app definitions in src/cli/__fixtures__/*
- Approve or reject contract changes requested by implementation streams
- Merge arbitration: resolve conflicts, enforce golden compliance
Done when:
- All other streams can import from src/cli/core/types.ts and src/cli/core/exit-codes.ts
- Golden snapshots exist for every command's --help and canonical failure output
- Fixture apps cover: tiny Fastify, broken-behavior, monorepo, protocol-flow, observe-config, legacy-config
S1: CLI Kernel
Owner: One LLM thread.
Directory: src/cli/core/ (except types.ts and exit-codes.ts)
Responsibilities:
- Entrypoint: src/cli/core/index.ts
- Command registration with cac
- Global flag parsing and normalization
- Context loading: cwd, env vars, TTY/CI detection
- Error boundary: catch unexpected errors, print internal error banner, write debug artifact
- Help text generation
Acceptance tests (start here, all failing):
- apophis --help matches golden snapshot
- apophis verify --help matches golden snapshot
- apophis --version prints version
- apophis unknown-cmd exits 2 with clear message
- apophis verify --unknown-flag exits 2 with exact flag name
- Non-TTY shell disables prompts and spinners
- CI env disables spinners and fancy rendering
Done when: All acceptance tests pass and other commands can register cleanly.
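The TTY and CI acceptance tests above suggest a small, pure context function. This is a sketch under stated assumptions: `TerminalContext` and `detectContext` are hypothetical names, only `CI=true` is named elsewhere in this guide (treating any other non-empty value except `0` and `false` as CI is an assumption), and a single TTY flag stands in for separate stdin/stdout checks.

```typescript
interface TerminalContext {
  isTTY: boolean;
  isCI: boolean;
  allowPrompts: boolean;
  allowSpinners: boolean;
}

// Takes env and TTY state as arguments so tests can drive it without touching process globals.
function detectContext(env: Record<string, string | undefined>, stdoutIsTTY: boolean): TerminalContext {
  const ci = env['CI'];
  const isCI = ci !== undefined && ci !== '' && ci !== '0' && ci !== 'false';
  return {
    isTTY: stdoutIsTTY,
    isCI,
    // Non-TTY shells disable prompts and spinners.
    allowPrompts: stdoutIsTTY,
    // CI additionally disables spinners and fancy rendering.
    allowSpinners: stdoutIsTTY && !isCI,
  };
}

// In the real entrypoint this would be called roughly as:
//   detectContext(process.env, Boolean(process.stdout.isTTY))
```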
S2: Config + Policy
Owner: One LLM thread.
Directory: src/cli/core/config-loader.ts, src/cli/core/policy-engine.ts
Responsibilities:
- Config file discovery (.js, .ts, .json, package.json field)
- Config loading with tsx for .ts files
- Profile resolution from config
- Preset resolution and application
- Environment policy enforcement
- Unknown-key hard failure with exact path
- Monorepo boundary detection
Acceptance tests (start here, all failing):
- Loads apophis.config.js from cwd
- Loads config from --config override
- Rejects unknown key with exact path
- Resolves profile from config
- Applies preset correctly
- Blocks qualify in production env by default
- Detects monorepo package boundary
- Suggests apophis init when no config found
Done when: Every command resolves config identically and policy gates are authoritative.
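The "blocks qualify in production env by default" gate could be expressed as a pure decision function. In this sketch, `EnvironmentPolicy` is not defined by this guide, so the `allowQualify` field is a hypothetical break-glass switch; `PolicyDecision` and `checkModeAllowed` are also illustrative names. Exit code 2 follows the exit code table (environment safety violation).

```typescript
type Mode = 'verify' | 'observe' | 'qualify';

// Hypothetical policy shape; the real EnvironmentPolicy is owned by the orchestrator.
interface EnvironmentPolicy {
  allowQualify?: boolean;
}

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; exitCode: 2; reason: string };

function checkModeAllowed(
  mode: Mode,
  envName: string,
  environments: Record<string, EnvironmentPolicy>,
): PolicyDecision {
  const hasPolicy = Object.prototype.hasOwnProperty.call(environments, envName);
  const policy = hasPolicy ? environments[envName] : undefined;
  // Default-deny: qualify in production needs an explicit opt-in.
  if (mode === 'qualify' && envName === 'production' && policy?.allowQualify !== true) {
    return {
      allowed: false,
      exitCode: 2,
      reason: 'qualify is blocked in the "production" environment by default',
    };
  }
  return { allowed: true };
}
```

Returning a decision object instead of throwing keeps the gate easy to test and lets each command render the reason in its own output format.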
S3: Init
Owner: One LLM thread.
Directory: src/cli/commands/init/
Responsibilities:
- apophis init --preset <name>
- Detect Fastify app structure
- Write scaffold files (config, example route guidance, package script)
- Support --force for overwrite
- Noninteractive mode with explicit flags
- Idempotent rerun behavior
- Print exact next command after init
Acceptance tests (start here, all failing):
- apophis init --preset safe-ci writes correct files in empty repo
- Detects existing Fastify entrypoint
- Refuses overwrite without --force
- Merges package scripts without clobbering
- Noninteractive mode works with all required flags
- Missing @fastify/swagger produces clear guidance
- Idempotent rerun updates only changed scaffold parts
- Prints exact next command: apophis verify --profile quick --routes "POST /users"
Done when: Fresh repo gets to first verify in one pass.
S4: Verify
Owner: One LLM thread.
Directory: src/cli/commands/verify/
Responsibilities:
- apophis verify --profile <name> --routes <filter>
- Route selection and filtering
- Deterministic contract verification
- Seed generation and emission
- Failure reporting with canonical human output
- Artifact emission
- Replay command generation
- --changed support for git-based route filtering
Acceptance tests (start here, all failing):
- apophis verify --profile quick runs all routes with behavioral contracts
- --routes "POST /users" filters correctly
- Finds the canonical behavioral failure: POST /users creates an unretrievable resource
- Failure output matches golden snapshot exactly
- Emits artifact with correct schema
- Prints replay command
- Seed is generated and printed when omitted
- --changed filters to modified routes
- No routes matched produces clear failure with available matches
- No behavioral contracts found: output explains that schema-only routes are not enough
Done when: The first behavioral failure is reliable and replay works.
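Seed handling is what makes "generated and printed when omitted" compatible with exact replay. The sketch below is illustrative only: `resolveSeed` is a hypothetical helper, and this guide does not specify APOPHIS's random generator, so the small, widely used mulberry32 routine is swapped in purely to show that one seed yields one sequence.

```typescript
// Use the explicit seed when given; otherwise generate a 32-bit one.
// `generated` tells the caller to print the seed so the run can be replayed.
function resolveSeed(
  explicit: number | undefined,
  random: () => number = Math.random,
): { seed: number; generated: boolean } {
  if (explicit !== undefined) {
    if (!Number.isInteger(explicit)) {
      throw new Error(`--seed must be an integer, got ${explicit}`);
    }
    return { seed: explicit, generated: false };
  }
  return { seed: Math.floor(random() * 0x100000000), generated: true };
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
// Stand-in for whatever generator the verify runner actually uses.
function mulberry32(seed: number): () => number {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```

Two generators created with the same seed produce identical sequences, so a replay that feeds the recorded seed back in sees the same generated inputs as the original run.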
S5: Observe
Owner: One LLM thread.
Directory: src/cli/commands/observe/
Responsibilities:
- apophis observe --profile <name> --check-config
- Validate observe configuration
- Check reporting sink setup
- Validate non-blocking semantics
- Environment safety checks
- Explain what would be checked and why it is safe
Acceptance tests (start here, all failing):
- apophis observe --profile staging-observe validates config
- Blocking behavior in prod is blocked by default
- Invalid sampling rate fails with exact bounds
- Missing sink config tells user what is required
- Observe profile referencing qualify-only feature is blocked
- --check-config only validates, does not activate
- Output explains safety boundaries clearly
Done when: Staging/prod safety checks are crisp and trustworthy.
S6: Qualify
Owner: One LLM thread.
Directory: src/cli/commands/qualify/
Responsibilities:
- apophis qualify --profile <name> --seed <n>
- Scenario execution
- Stateful execution
- Chaos execution
- Profile gating
- Rich artifact emission
- Non-prod boundary enforcement
Acceptance tests (start here, all failing):
- apophis qualify --profile oauth-nightly --seed 42 runs OAuth scenario
- Prod run is blocked by default
- Chaos on protected routes is blocked without allowlist
- Scenario with outbound mocks not allowed in env is blocked
- Cleanup failure is reported separately without hiding primary failure
- Emits rich artifact with step traces
- Seed is generated and printed when omitted
Done when: Deeper realism works without contaminating normal CI.
S7: Replay
Owner: One LLM thread.
Directory: src/cli/commands/replay/
Responsibilities:
- apophis replay --artifact <path>
- Artifact loading and validation
- Version compatibility checks
- Seed replay
- Degraded replay guidance when source changed
- Fast startup (p95 under 500 ms on the CLI fixture environment)
Acceptance tests (start here, all failing):
- apophis replay --artifact <path> reproduces exact failure
- Missing artifact fails with exact path
- Corrupted artifact explains parse/validation failure
- Source code changed since artifact warns but attempts replay
- Referenced route no longer exists explains drift
- CLI version mismatch shows compatibility message
- Startup p95 is under 500 ms on the CLI fixture environment
Done when: Every verify/qualify failure is reproducible with one command.
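The missing, corrupted, and version-mismatch cases above can be separated by a loader that returns a tagged result instead of throwing. This is a sketch: `loadArtifact` and `LoadResult` are hypothetical, file reading is left to the caller (pass `undefined` when the file does not exist), and only `version` and `seed` are checked here, where full validation would go through the artifact schema.

```typescript
const SUPPORTED_VERSION = 'apophis-artifact/1';

type LoadedArtifact = { version: string; seed: number; [key: string]: unknown };

type LoadResult =
  | { ok: true; artifact: LoadedArtifact }
  | { ok: false; kind: 'missing' | 'corrupted' | 'incompatible'; message: string };

// `raw` is the file content, or undefined when the file was not found.
function loadArtifact(path: string, raw: string | undefined): LoadResult {
  if (raw === undefined) {
    return { ok: false, kind: 'missing', message: `Artifact not found: ${path}` };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (error) {
    const detail = error instanceof Error ? error.message : String(error);
    return { ok: false, kind: 'corrupted', message: `Artifact ${path} is not valid JSON: ${detail}` };
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return { ok: false, kind: 'corrupted', message: `Artifact ${path} must be a JSON object` };
  }
  const record = parsed as Record<string, unknown>;
  const version = record['version'];
  const seed = record['seed'];
  if (typeof version !== 'string' || typeof seed !== 'number') {
    return { ok: false, kind: 'corrupted', message: `Artifact ${path} is missing "version" or "seed"` };
  }
  if (version !== SUPPORTED_VERSION) {
    return {
      ok: false,
      kind: 'incompatible',
      message: `Artifact ${path} has version ${version}; this CLI reads ${SUPPORTED_VERSION}`,
    };
  }
  return { ok: true, artifact: { ...record, version, seed } };
}
```

Each failure kind carries the exact path, so the command can print a compatibility or parse message rather than a stack trace.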
S8: Doctor
Owner: One LLM thread.
Directory: src/cli/commands/doctor/
Responsibilities:
- apophis doctor
- Dependency checks (Fastify, swagger, Node version)
- Config validation
- Route discovery checks
- Docs/example smoke checks
- Legacy config detection
- Mixed config style detection
Acceptance tests (start here, all failing):
- apophis doctor passes on healthy project
- Unknown config key is caught
- Missing @fastify/swagger is reported with install command
- Mixed legacy and new config shows both and recommends migrate
- Qualify enabled in unsafe env is caught
- Docs examples drift from reality fails in CI mode
- Monorepo with different config styles reports per package
Done when: Malformed setups fail fast and clearly.
S9: Migrate
Owner: One LLM thread.
Directory: src/cli/commands/migrate/
Responsibilities:
- apophis migrate --check
- apophis migrate --dry-run
- apophis migrate --write
- Legacy config detection
- Exact replacement guidance
- Comment/formatting preservation where feasible
- Partial migration reporting
Acceptance tests (start here, all failing):
- apophis migrate --check detects legacy config
- --dry-run shows exact rewrites without writing
- --write performs rewrites correctly
- Ambiguous rewrite stops and requires manual choice
- Legacy field with no direct equivalent emits human guidance
- Partial migration reports completed and remaining items
- Preserves comments/formatting where feasible
Done when: Old outward contract upgrades cleanly.
S10: Renderers
Owner: One LLM thread.
Directory: src/cli/renderers/
Responsibilities:
- Human renderer: canonical failure output, progress, summaries
- JSON renderer: stable artifact schema
- NDJSON renderer: step events
- Truncation rules for large payloads
- Color/styling with picocolors
- No spinners in CI
- No ANSI in --format json
Acceptance tests (start here, all failing):
- Human failure output matches golden snapshot exactly
- JSON output validates against artifact schema
- NDJSON emits correct event sequence
- Large payloads are truncated in terminal, full in artifact
- No ANSI in --format json
- No spinners when CI=true
- Color respects --color flag
Done when: Every command looks consistent and machine-readable.
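Two of the renderer rules above, color resolution and terminal truncation, reduce to small pure functions. In this sketch `shouldUseColor` and `truncateForTerminal` are hypothetical names; the guide only states that `--format json` carries no ANSI, so suppressing color for `ndjson` as well is an assumption, as is the wording of the truncation suffix.

```typescript
type OutputFormat = 'human' | 'json' | 'ndjson';
type ColorMode = 'auto' | 'always' | 'never';

// Machine formats never get ANSI; for human output the --color flag decides,
// with `auto` following TTY detection.
function shouldUseColor(mode: ColorMode, format: OutputFormat, isTTY: boolean): boolean {
  if (format !== 'human') return false;
  if (mode === 'always') return true;
  if (mode === 'never') return false;
  return isTTY;
}

// Shortens a payload for terminal display; the artifact keeps the full text.
function truncateForTerminal(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return `${text.slice(0, maxChars)}... (${omitted} more characters in artifact)`;
}
```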
S11: Docs + Site
Owner: One LLM thread.
Directory: docs/
Responsibilities:
- docs/cli.md: command reference
- docs/verify.md, docs/observe.md, docs/qualify.md: mode guides
- docs/getting-started.md: first-signal quickstart
- docs/llm-safe-adoption.md: scaffold and CI policy
- Homepage behavior examples and first-signal funnel copy
- All examples must be smoke-tested against real CLI
Acceptance tests (start here, all failing):
- Every code block in docs/getting-started.md runs successfully
- Homepage behavior example produces exact golden output
- All apophis commands in docs exist and have correct flags
- All examples use current config schema
- No stale legacy syntax in docs
Done when: Docs match shipped CLI exactly.
S12: Acceptance Matrix
Owner: One LLM thread.
Directory: src/test/cli/, src/cli/__fixtures__/, src/cli/__goldens__/
Responsibilities:
- Top-level fixture apps
- End-to-end command smoke suite
- Latency budget checks
- Regression harness
- Golden snapshot management
Fixture apps required:
- tiny-fastify: minimal app with one route, one behavioral contract
- broken-behavior: app with known behavioral bug
- monorepo: multiple packages with different configs
- protocol-lab: OAuth-like multi-step flow
- observe-config: observe-ready app with sink config
- legacy-config: old-style config for migration tests
Acceptance tests (start here, all failing):
- All commands run against all fixture apps
- Golden snapshots match
- Latency budgets met:
  - apophis --help: < 100ms
  - apophis doctor config-only: < 3s
  - apophis init after prompts: < 500ms
  - apophis verify first progress: < 2s
  - apophis replay startup: < 500ms
- Regression: no command breaks another command's fixtures
- Exit codes are correct for every scenario
Done when: Merge gate is authoritative.
7. Red-Green-Refactor Per Stream
For every stream, follow this exact loop:
- Red: Write all acceptance tests. They must fail.
- Green: Implement the vertical slice until all tests pass.
- Refactor: Only after green, extract shared code if another stream needs it. Request orchestrator mediation for cross-stream extraction.
Example for S4 (Verify):
// Step 1: Red - write failing test
import { test } from 'node:test';
import assert from 'node:assert';
import { runCli } from '../helpers/run-cli.js';

test('verify finds the canonical behavioral failure', async () => {
  const result = await runCli({
    cwd: 'src/cli/__fixtures__/broken-behavior',
    args: ['verify', '--profile', 'quick', '--routes', 'POST /users']
  });
  assert.strictEqual(result.exitCode, 1);
  assert.match(result.stdout, /Contract violation/);
  assert.match(result.stdout, /POST \/users/);
  assert.match(result.stdout, /Replay/);
  assert.match(result.stdout, /apophis replay --artifact/);
});

// Step 2: Green - implement until it passes
// src/cli/commands/verify/index.ts
import { cac } from 'cac';
// ... implementation

// Step 3: Refactor - only if S6 also needs route filtering
// Request orchestrator to extract route-filter to src/cli/core/
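The test above imports `runCli` from a helpers module that this guide does not define. One possible shape for it, as a sketch: it spawns the CLI as a child process and collects its exit code and output. The `bin` option and the default entrypoint path `dist/cli.js` are assumptions introduced here; the real build output location is not specified in this guide.

```typescript
import { spawn } from 'node:child_process';

interface RunCliOptions {
  cwd: string;
  args: string[];
  // Command prefix to execute; hypothetical override for tests.
  bin?: string[];
  env?: Record<string, string | undefined>;
}

interface RunCliResult {
  exitCode: number | null; // null when the process was killed by a signal
  stdout: string;
  stderr: string;
}

function runCli(options: RunCliOptions): Promise<RunCliResult> {
  // Assumed default: a bundled entrypoint at dist/cli.js run with the current Node binary.
  const bin = options.bin ?? [process.execPath, 'dist/cli.js'];
  const command = bin[0] ?? process.execPath;
  const prefix = bin.slice(1);
  return new Promise((resolve, reject) => {
    const child = spawn(command, [...prefix, ...options.args], {
      cwd: options.cwd,
      env: { ...process.env, ...options.env },
    });
    let stdout = '';
    let stderr = '';
    child.stdout.on('data', (chunk: Buffer) => {
      stdout += chunk.toString('utf8');
    });
    child.stderr.on('data', (chunk: Buffer) => {
      stderr += chunk.toString('utf8');
    });
    // Spawn failures (for example a missing binary) reject the promise.
    child.on('error', reject);
    // 'close' fires after the stdio streams have ended, so output is complete.
    child.on('close', (code) => resolve({ exitCode: code, stdout, stderr }));
  });
}
```

Running the CLI out of process exercises the real entrypoint, flag parsing, and exit codes, which is what the acceptance tests are meant to pin down.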
8. Merge Policy
8.1 What streams can merge independently
- Any stream can merge if:
- All its acceptance tests pass
- It does not modify orchestrator-owned files
- It does not modify another stream's directory
- It passes npm run build and npm run test:src
8.2 What requires orchestrator approval
- Changes to src/cli/core/types.ts
- Changes to src/cli/core/exit-codes.ts
- Changes to src/cli/__goldens__/
- Changes to src/cli/__fixtures__/
- New shared extraction requests
- Golden snapshot updates
8.3 Merge gate commands
Every PR must pass:
npm run build
npm run test:src
npm run test:cli # S12 acceptance matrix
npm run test:cli:goldens # golden snapshot comparison
npm run test:cli:latency # latency budget checks
npm run test:docs # docs smoke tests
9. Edge Cases Reference
Global
| Edge case | Expected behavior |
|---|---|
| No config found | Suggest apophis init, do not crash |
| Multiple config candidates | Print choices and exact override flag |
| Monorepo root vs package root | Detect package boundary and say which one was chosen |
| Unknown config keys | Hard fail with exact key path |
| Invalid profile name | List available profiles |
| Preset/profile mismatch | Explain mismatch, do not silently coerce |
| Unsupported Node/runtime | Fail immediately with exact version requirement |
| Missing peer dependencies | Report package names and install command |
| Non-TTY shell | Disable prompts and fancy rendering automatically |
| CI environment | No spinners, stable deterministic output |
| --format json with warnings | Warnings go into structured fields, never stderr noise |
| Unwritable artifact dir | Fail before run if artifacts are required |
| SIGINT | Write partial artifact if safe, print interruption summary |
| Internal exception | Show internal error banner plus artifact/debug path |
| Very large failure payload | Concise terminal summary, full detail in artifact |
| Route path contains spaces or weird chars | Always quote safely in printed commands |
| Dirty git tree | Never block, unless command explicitly needs git diff semantics |
| --changed outside git repo | Degrade cleanly and tell user how |
| Stale artifact version | Explain incompatibility and fallback options |
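For the "always quote safely in printed commands" row, a minimal sketch of POSIX-style quoting follows. `shellQuote` and `formatCommand` are hypothetical helpers, and targeting POSIX shells only is an assumption; Windows shells would need different rules. Note that this sketch emits single quotes, whereas the examples in this guide show double quotes, so golden snapshots would dictate the final style.

```typescript
// Leaves simple arguments bare; wraps everything else in single quotes,
// escaping embedded single quotes as '\''.
function shellQuote(arg: string): string {
  if (arg === '') return "''";
  if (/^[A-Za-z0-9_\/:.=@%+,-]+$/.test(arg)) return arg;
  return `'${arg.replace(/'/g, "'\\''")}'`;
}

// Builds a copy-pasteable command line from an argv array.
function formatCommand(argv: string[]): string {
  return argv.map(shellQuote).join(' ');
}
```

Building printed commands from an argv array, rather than concatenating strings, keeps replay commands copy-paste safe even when a route contains spaces or quotes.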
Init
| Edge case | Expected behavior |
|---|---|
| Existing config file | Refuse overwrite unless --force, show diff or dry-run |
| Existing package scripts | Merge carefully, do not clobber |
| Multiple Fastify entrypoints detected | Ask or require explicit selection |
| Noninteractive shell with ambiguity | Fail with explicit flags needed |
| Missing @fastify/swagger | Tell user why it matters and how to add it |
| Package manager unknown | Avoid assumptions, print generic install commands |
| Rerun init | Idempotent or clearly update-only |
Verify
| Edge case | Expected behavior |
|---|---|
| No routes matched | Fail with route filter echo and available matches summary |
| No behavioral contracts found | Explain that schema-only routes do not provide behavioral contracts for verify |
| Contract parse failure | Show route, clause index, expression, migration guidance |
| Seed omitted | Generate one and print it always |
| Multiple failures | Stable order, compact summary, artifact for full detail |
| Changed-files selection empty | Say no relevant routes changed |
| Flaky endpoint behavior | Call out nondeterminism if replay diverges |
| Timeout | Route-specific timeout in summary |
| Artifact write fails after run | Still print failure summary and note artifact problem |
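The "No routes matched" row implies that route selection returns either a match list or a diagnostic that echoes the filter and summarizes what is available. The sketch below assumes exact matching on "METHOD /path" strings, since this guide does not define the filter syntax; `selectRoutes`, `RouteSelection`, and the five-route preview limit are illustrative.

```typescript
type RouteSelection =
  | { ok: true; routes: string[] }
  | { ok: false; message: string };

// Normalizes "post   /users" to "POST /users" so filters are forgiving about case and spacing.
function normalizeRoute(route: string): string {
  const parts = route.trim().split(/\s+/);
  const method = (parts[0] ?? '').toUpperCase();
  return [method, ...parts.slice(1)].join(' ');
}

function selectRoutes(available: string[], filters: string[]): RouteSelection {
  // No filter means every route is selected.
  if (filters.length === 0) return { ok: true, routes: [...available] };
  const wanted = new Set(filters.map(normalizeRoute));
  const routes = available.filter((route) => wanted.has(normalizeRoute(route)));
  if (routes.length > 0) return { ok: true, routes };
  // Echo the filter and show a bounded preview of what could have matched.
  const preview = available.slice(0, 5).join(', ');
  const more = available.length > 5 ? ` (and ${available.length - 5} more)` : '';
  return {
    ok: false,
    message: `No routes matched filter: ${filters.join(', ')}. Available: ${preview}${more}`,
  };
}
```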
Observe
| Edge case | Expected behavior |
|---|---|
| Blocking behavior requested in prod | Hard fail unless explicit break-glass policy allows it |
| Invalid sampling rate | Fail with exact bounds |
| Missing sink config | Tell user what sink is required |
| Config would generate outage risk | Fail before activation |
| Observe profile references qualify-only feature | Hard fail |
Qualify
| Edge case | Expected behavior |
|---|---|
| Run in prod | Hard block by default |
| Scenario uses outbound mocks not allowed in env | Hard block |
| Scenario form flow requires missing app support | Clear diagnostic |
| Chaos requested on protected routes | Hard block unless allowlisted |
| Cleanup fails after stateful run | Report separately without hiding primary failure |
| Seed omitted | Generate and print it |
| Too many artifacts | Summarize and index them cleanly |
Replay
| Edge case | Expected behavior |
|---|---|
| Artifact missing | Fail with exact path |
| Artifact corrupted | Explain parse/validation failure |
| Source code changed since artifact | Warn but still attempt replay |
| Referenced route no longer exists | Explain drift clearly |
| CLI version newer/older than artifact schema | Compatibility message, not stack trace |
Doctor
| Edge case | Expected behavior |
|---|---|
| Mixed legacy and new config | Show both and recommend migrate |
| Docs examples drift from reality | Fail in CI mode |
| Missing swagger registration | Tell user whether APOPHIS can still proceed and what is degraded |
| Qualify enabled in unsafe env | Hard fail |
| Multiple packages in monorepo using different config styles | Report per package |
Migrate
| Edge case | Expected behavior |
|---|---|
| Ambiguous rewrite | Stop and require manual choice |
| Comments/formatting preservation | Preserve where feasible, otherwise warn |
| Dry-run mode | Default for safety |
| Legacy field removed with no direct equivalent | Emit exact human guidance |
| Partial migration | Report completed and remaining items separately |
10. Latency Budgets
| Command | Target |
|---|---|
| apophis --help | < 100ms |
| apophis doctor config-only | < 3s |
| apophis init after prompts | < 500ms |
| apophis verify first progress | < 2s |
| apophis replay startup | < 500ms |
These are enforced by S12. A command that exceeds its budget fails CI.
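A budget check needs a statistic over repeated runs. The replay acceptance tests name p95 explicitly; applying p95 to the other commands is an assumption in this sketch, as are the function names and the nearest-rank percentile method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile requires at least one sample');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// Budgets are strict upper bounds ("< 500ms"), so equality fails.
function withinBudget(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 95) < budgetMs;
}
```

With 20 samples, p95 is the 19th smallest value, so a single slow outlier does not fail the gate but two do.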
11. First Signal Checklist
For the CLI to deliver the first useful signal, every stream must satisfy:
- Install to first signal: under 10 minutes for normal Fastify service
- --help clarity: user can infer product model from help text alone
- First init: writes correct scaffold without blocking on unnecessary prompts
- First verify: checks cross-operation behavior, not only shape
- First failure: route, formula, observed reality, seed, replay command, artifact path
- First replay: one copy-paste command reproduces same result
- Trust signal: CLI explicitly shows environment gating and deterministic seed
- Expansion path: output tells user whether to add more verify, turn on observe, or create qualify profile
12. Final Notes for Implementers
- Do not over-engineer shared code. Each stream should be self-contained until proven otherwise.
- Do not add features not in the spec. The spec is intentionally minimal.
- Do not optimize for polish over correctness. The useful signal is in the failure message, not the spinner.
- Do not skip acceptance tests. They are the contract.
- Do not modify orchestrator files. Request changes through the orchestrator.
- Do not assume another stream's timeline. Code against the spec, not against another stream's partial implementation.
- Do ask for clarification. The orchestrator exists to resolve ambiguity.
This document is versioned. The orchestrator will update it if contracts change. Implementation streams should pin to a version and request updates explicitly.