Parallelization and Incremental Testing Analysis
1. Parallelization with Worker Threads
Feasibility: PARTIAL
APOPHIS has three phases, each with different parallelization potential:
Phase 1: Route Discovery
- Fastify stores routes in a single array
- Reading routes is already O(n) and fast (~0.5µs/route)
- Parallelizing would require sharing the Fastify instance across threads
- Fastify instances are NOT thread-safe
- Verdict: NOT worth parallelizing. Bottleneck is negligible.
Phase 2: Test Generation (Schema → Arbitrary)
- CPU-bound: fast-check arbitrary construction
- Independent per route
- Could shard routes across worker threads
- Each worker needs only the schema subset
- Verdict: HIGH POTENTIAL. Could get near-linear speedup with core count.
Phase 3: Test Execution (fastify.inject)
- Fastify is single-threaded
- Cannot share instance across workers
- Creating multiple Fastify instances wastes memory and breaks integration tests
- Verdict: NOT feasible for integration testing.
Implementation Strategy (if needed):
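// Phase 2 parallelization
const os = require('node:os')
const { Worker } = require('node:worker_threads')

// Split `items` into consecutive slices of at most `size` elements.
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
  if (routes.length === 0) return [] // avoids a zero chunk size
  const chunks = chunk(routes, Math.ceil(routes.length / numWorkers))
  // One worker per chunk; each worker posts back an array of results.
  const results = await Promise.all(
    chunks.map(routeChunk => new Promise((res, rej) => {
      const w = new Worker('./test-generator-worker.js', {
        workerData: { routes: routeChunk }
      })
      w.once('message', res)
      w.once('error', rej)
    }))
  )
  return results.flat()
}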
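Expected Speedup: 2-4x on an 8-core machine, for the generation phase only.
Complexity: Medium. Schemas must be serialized to each worker and results must come back as plain data; fast-check arbitraries contain functions, so they cannot be posted across threads as-is.
When to use: Only if the generation phase exceeds 5 seconds.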
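The strategy above references ./test-generator-worker.js without showing it. A minimal sketch of what the worker side might look like, inlined with `eval: true` so it runs as one file. `buildCommandTemplate` and the shape of its return value are hypothetical stand-ins for the real generation step, and the sketch assumes workers send back plain data only.

```javascript
const { Worker } = require('node:worker_threads')

// Worker body, inlined as a string so the sketch fits in one file.
// In a real project this would live in ./test-generator-worker.js.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads')

  // Hypothetical stand-in for the real schema -> test generation step.
  // It returns plain data only: functions (and therefore live fast-check
  // arbitraries) cannot be sent through postMessage.
  function buildCommandTemplate(route) {
    return {
      method: route.method,
      url: route.url,
      fields: Object.keys(route.schema.properties || {})
    }
  }

  parentPort.postMessage(workerData.routes.map(buildCommandTemplate))
`

// Run one worker over a slice of routes and resolve with its results.
function runWorker(routes) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { routes } })
    worker.once('message', resolve)
    worker.once('error', reject)
    worker.once('exit', code => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`))
    })
  })
}
```

The `exit` handler is there so a worker that dies without posting a message rejects instead of leaving the promise pending.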
2. Incremental Testing with Schema Hashing
Feasibility: HIGH
Instead of regenerating all tests every run, hash each route's schema and only regenerate changed ones.
Algorithm:
- Compute deterministic hash of each route's schema
- Compare with cached hashes from previous run
- For unchanged routes: reuse previous test commands
- For changed routes: regenerate from scratch
- Save new hashes to cache file
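The steps above can be sketched as a single pass over the routes. This is an illustration under assumptions, not the project's implementation: the route shape `{ url, schema }`, the `generate` callback, and the `reused`/`regenerated` bookkeeping are all hypothetical, and the cache is keyed by schema hash alone, as in the cache structure in this section.

```javascript
const { createHash } = require('node:crypto')

// Same truncated SHA-256 as hashSchema in this section.
function hashSchema(schema) {
  return createHash('sha256').update(JSON.stringify(schema)).digest('hex').slice(0, 16)
}

// `cache` has the shape { version, schemas: { [hash]: entry } }.
// `generate` is a hypothetical callback that builds command templates for one route.
function planIncrementalRun(routes, cache, generate) {
  const nextSchemas = {}
  const reused = []
  const regenerated = []
  for (const route of routes) {
    const hash = hashSchema(route.schema)
    const hit = cache.schemas[hash]
    if (hit) {
      // Unchanged schema: carry the previous entry forward.
      nextSchemas[hash] = hit
      reused.push(route.url)
    } else {
      // New or changed schema: regenerate from scratch.
      nextSchemas[hash] = { commandTemplates: generate(route), lastRun: Date.now() }
      regenerated.push(route.url)
    }
  }
  // Only hashes seen in this run survive, so stale entries are dropped.
  return { cache: { version: cache.version, schemas: nextSchemas }, reused, regenerated }
}
```

Rebuilding `schemas` from the hashes seen in the current run keeps the cache file from growing as schemas change over time.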
Simple Implementation:
import { createHash } from 'node:crypto'

function hashSchema(schema) {
  // Note: JSON.stringify follows key insertion order, so the same schema
  // written with its keys in a different order produces a different hash.
  return createHash('sha256')
    .update(JSON.stringify(schema))
    .digest('hex')
    .slice(0, 16) // 16 hex chars = 64 bits, enough here
}

// Cache structure
const cache = {
  version: 1,
  schemas: {
    'hash123': { commandTemplates: [...], lastRun: timestamp },
    'hash456': { commandTemplates: [...], lastRun: timestamp }
  }
}
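The algorithm calls for a deterministic hash, but JSON.stringify serializes keys in insertion order, so two schemas with identical content and different key order hash differently. One possible fix, sketched here with the hypothetical helpers `canonicalize` and `hashSchemaStable`, is to sort object keys recursively before hashing.

```javascript
const { createHash } = require('node:crypto')

// Recursively rebuild the value with object keys in sorted order so that
// JSON.stringify produces the same text regardless of insertion order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value !== null && typeof value === 'object') {
    const sorted = {}
    for (const key of Object.keys(value).sort()) sorted[key] = canonicalize(value[key])
    return sorted
  }
  return value
}

// Same truncated SHA-256 as hashSchema, applied to the canonical form.
function hashSchemaStable(schema) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(schema)))
    .digest('hex')
    .slice(0, 16)
}
```

Array order is left untouched because it is usually significant in JSON Schema (for example in `required` or `enum`, where reordering is at least a visible change).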
Expected Impact:
- First run: 100% generation (baseline)
- Typical commit (50 routes changed of 11,389): 0.4% regeneration
- Schema-only changes (types, constraints): near-instant
Cache Invalidation Strategy:
- Cache key: sha256(JSON.stringify(schema))
- Cache file: .apophis-cache.json (gitignored)
- TTL: Infinite (schemas are immutable once defined)
- Manual invalidation: rm .apophis-cache.json
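A sketch of how the cache file could be read and written, assuming the `version` field in the cache structure is meant to invalidate the file when its format changes. `loadCache`, `saveCache`, and `CACHE_VERSION` are hypothetical names, and the temp-file rename is a design choice of this sketch, not something the notes specify.

```javascript
const fs = require('node:fs')

const CACHE_VERSION = 1

// Returns an empty cache when the file is missing, unreadable, or was
// written with a different cache format version.
function loadCache(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, 'utf8'))
    if (parsed && parsed.version === CACHE_VERSION && parsed.schemas) return parsed
  } catch {
    // Fall through: treat any read or parse failure as a cold cache.
  }
  return { version: CACHE_VERSION, schemas: {} }
}

// Write to a temp file and rename it into place so an interrupted run
// cannot leave a half-written cache behind.
function saveCache(file, cache) {
  const tmp = `${file}.${process.pid}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(cache))
  fs.renameSync(tmp, file)
}
```

Treating every failure as a cold cache means a corrupt file costs one full regeneration rather than a failed run.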
JSONHash Integration:
The JSONHash library from ~/Business/workspace/lsh_libs provides structural similarity detection, which could enable:
- Fuzzy cache hits: If schema changed slightly but structure is similar, reuse and mutate test data
- Schema migration detection: Identify which routes changed structurally vs cosmetically
- Test suite deduplication: Detect routes with similar schemas that can share test patterns
However, for the primary use case (skip unchanged routes), a simple SHA-256 hash is sufficient and faster.
Recommendation:
- Immediate: Implement simple SHA-256 schema cache (1-2 hours work, huge CI/CD win)
- Future: Integrate JSONHash for fuzzy similarity and smart test data reuse
- Parallelization: Defer until generation phase proves to be the bottleneck in practice
3. Current Bottleneck Analysis
From profiling:
- convertSchema: 823ms (37% of total), CPU-bound, parallelizable
- discoverRoutes: 1,649ms (74% of total), memory/allocation-bound
- evaluate: 156ms (7% of total), fast enough
- parse: 85ms (4% of total), cached, fast enough
The real bottleneck is discoverRoutes, which is memory-bound (object creation). Parallelization won't help here because:
- Objects live on a single V8 isolate's heap; anything a worker allocates would have to be copied back to the main thread
- Fastify routes array must be read sequentially
- WeakMap cache is already optimizing the repeated case
Incremental testing would eliminate the discoverRoutes cost entirely for unchanged routes.
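The WeakMap cache mentioned above is not shown in these notes. One common shape for such a cache, given here as an assumption about how it might work, memoizes a per-schema computation by object identity; `memoizeBySchema` is a hypothetical name.

```javascript
// Memoize a per-schema computation by object identity. An entry becomes
// collectable as soon as the schema object itself is no longer referenced.
function memoizeBySchema(compute) {
  const cache = new WeakMap()
  return schema => {
    if (!cache.has(schema)) cache.set(schema, compute(schema))
    return cache.get(schema)
  }
}
```

A WeakMap is keyed by identity within one process, so it only helps when the same schema object is seen again in the same run. Reuse across runs needs a content hash persisted to disk, which is the role of the schema hash cache.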
4. Implementation Priority
- Schema hash cache (HIGH): Eliminates 74% of work for unchanged routes
- Parallel generation (MEDIUM): Could speed up remaining 26% by 2-4x
- JSONHash similarity (LOW): Nice-to-have for advanced use cases
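To put the parallel-generation estimate in context: if the 26% share is sped up by 2-4x and the rest is unchanged, Amdahl's law bounds the end-to-end gain. A small worked calculation, taking the 26% and 2-4x figures from this section at face value:

```javascript
// Amdahl's law: overall speedup when a fraction `p` of the runtime
// is accelerated by a factor `s` and the remainder is unchanged.
function overallSpeedup(p, s) {
  return 1 / ((1 - p) + p / s)
}

for (const s of [2, 4]) {
  console.log(`${s}x on 26% of runtime -> ${overallSpeedup(0.26, s).toFixed(2)}x overall`)
}
// 2x on 26% of runtime -> 1.15x overall
// 4x on 26% of runtime -> 1.24x overall
```

Under those figures, parallel generation alone yields roughly a 15-24% end-to-end improvement on a cold run, which is consistent with ranking it below the schema hash cache.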