# Parallelization and Incremental Testing Analysis

## 1. Parallelization with Worker Threads

### Feasibility: PARTIAL

APOPHIS has three phases, each with different parallelization potential:

**Phase 1: Route Discovery**
- Fastify stores routes in a single array
- Reading the array is O(n) and already fast (~0.5µs/route)
- Parallelizing would require sharing the Fastify instance across threads
- Fastify instances are NOT thread-safe and cannot be passed to a worker
- **Verdict**: NOT worth parallelizing. The array read itself is negligible.

**Phase 2: Test Generation (Schema → Arbitrary)**
- CPU-bound: fast-check arbitrary construction
- Independent per route
- Could shard routes across worker threads
- Each worker needs only its subset of schemas
- **Verdict**: HIGH POTENTIAL. Could approach linear speedup with core count, minus serialization overhead.

**Phase 3: Test Execution (fastify.inject)**
- Fastify is single-threaded
- Cannot share instance across workers
- Creating multiple Fastify instances wastes memory and breaks integration tests
- **Verdict**: NOT feasible for integration testing.

### Implementation Strategy (if needed):
```javascript
// Phase 2 parallelization
const os = require('node:os')
const { Worker } = require('node:worker_threads')

// Split an array into consecutive slices of at most `size` items
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
  if (routes.length === 0) return []
  const chunks = chunk(routes, Math.ceil(routes.length / numWorkers))

  // One worker per chunk. workerData is structured-cloned, so `routes`
  // must be plain data (schemas only, no handler functions).
  const workers = chunks.map(routeChunk =>
    new Worker('./test-generator-worker.js', {
      workerData: { routes: routeChunk }
    })
  )

  const results = await Promise.all(
    workers.map(w => new Promise((res, rej) => {
      w.once('message', res)
      w.once('error', rej)
      // Reject if the worker exits before posting a result (no-op once resolved)
      w.once('exit', code => rej(new Error(`Worker exited with code ${code}`)))
    }))
  )

  return results.flat()
}
```

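The snippet above references `./test-generator-worker.js` without showing it. What follows is a minimal sketch of the worker side, not the actual APOPHIS worker. The worker body is inlined as a string and started with `eval: true` only so the example fits in one file. `describeRoute` is a hypothetical stand-in for the real schema conversion, and the route shape `{ url, schema }` is an assumption. The point it illustrates is that both the input (`workerData`) and the posted result must be plain, cloneable data.

```javascript
const { Worker } = require('node:worker_threads')

// Worker body, inlined as a string so the sketch is self-contained.
// In a real setup this code would live in test-generator-worker.js.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads')

  // Hypothetical stand-in for the real schema -> test conversion.
  // It returns plain objects so the result can be posted back.
  const describeRoute = route => ({
    url: route.url,
    fields: Object.keys((route.schema && route.schema.properties) || {})
  })

  parentPort.postMessage(workerData.routes.map(describeRoute))
`

// Run one worker over a list of routes and resolve with its posted result
function runGenerationWorker(routes) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { routes } })
    worker.once('message', resolve)
    worker.once('error', reject)
    // No-op if the promise already resolved via 'message'
    worker.once('exit', code => reject(new Error(`worker exited with code ${code}`)))
  })
}
```
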
**Expected Speedup**: 2-4x on 8-core machine for generation phase only.
**Complexity**: Medium. Schemas are plain JSON and cross the thread boundary cleanly. fast-check arbitraries are class instances, often holding closures, and do not survive `postMessage` (structured clone drops prototypes and rejects functions), so workers would likely need to return serializable output instead.
**When to use**: Only if generation phase exceeds 5 seconds.

---

## 2. Incremental Testing with Schema Hashing

### Feasibility: HIGH

Instead of regenerating all tests every run, hash each route's schema and regenerate only the routes whose hash changed.

### Algorithm:
1. Compute deterministic hash of each route's schema
2. Compare with cached hashes from previous run
3. For unchanged routes: reuse previous test commands
4. For changed routes: regenerate from scratch
5. Save new hashes to cache file

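The five steps can be sketched as a single pass over the routes. This is an illustration under stated assumptions rather than the APOPHIS implementation: `planIncrementalRun` is a hypothetical name, `generate` is a hypothetical callback standing in for the real generator, routes are assumed to look like `{ url, schema }`, and the cache is assumed to be keyed by schema hash with `commandTemplates` and `lastRun` fields.

```javascript
const { createHash } = require('node:crypto')

// Step 1 helper: 64-bit prefix of the SHA-256 of the serialized schema
const hashSchema = schema =>
  createHash('sha256').update(JSON.stringify(schema ?? null)).digest('hex').slice(0, 16)

// Returns the cache to persist plus the URLs that had to be regenerated.
function planIncrementalRun(routes, previousCache, generate) {
  const nextCache = { version: 1, schemas: {} }
  const regenerated = []

  for (const route of routes) {
    const key = hashSchema(route.schema)       // step 1: hash the schema
    const hit = previousCache.schemas[key]     // step 2: compare with previous run

    if (hit) {
      nextCache.schemas[key] = hit             // step 3: reuse previous commands
    } else {
      nextCache.schemas[key] = {               // step 4: regenerate from scratch
        commandTemplates: generate(route),
        lastRun: Date.now()
      }
      regenerated.push(route.url)
    }
  }

  return { nextCache, regenerated }            // step 5: caller saves nextCache
}
```

Because entries are keyed by schema hash, two routes with identical schemas share one entry in this sketch. Entries for hashes that no longer appear are dropped, which keeps the cache file from growing without bound.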
### Simple Implementation:
```javascript
import { createHash } from 'node:crypto'

function hashSchema(schema) {
  // JSON.stringify depends on key order, so the same keys in a
  // different order produce a different hash
  return createHash('sha256')
    .update(JSON.stringify(schema))
    .digest('hex')
    .slice(0, 16) // 16 hex chars = 64 bits, which is enough
}

// Cache structure (keys and values below are placeholders)
const timestamp = Date.now() // assumption: epoch milliseconds
const cache = {
  version: 1,
  schemas: {
    'hash123': { commandTemplates: [/* ... */], lastRun: timestamp },
    'hash456': { commandTemplates: [/* ... */], lastRun: timestamp }
  }
}
```

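`JSON.stringify` serializes object keys in insertion order, so two schemas with the same content but a different key order get different hashes and cause a needless cache miss. If schemas can be assembled in varying order, one option is to sort keys before hashing. The sketch below shows that approach; `canonicalize` and `hashSchemaStable` are hypothetical names, not part of APOPHIS.

```javascript
const { createHash } = require('node:crypto')

// Recursively rebuild objects with sorted keys so that logically equal
// schemas serialize to the same string. Arrays keep their order.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value !== null && typeof value === 'object') {
    const sorted = {}
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key])
    }
    return sorted
  }
  return value
}

// Same 64-bit truncated SHA-256 as hashSchema, but independent of key order
function hashSchemaStable(schema) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(schema ?? null)))
    .digest('hex')
    .slice(0, 16)
}
```

The extra traversal costs one pass over each schema, which is small next to regenerating tests for a route that did not actually change.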
### Expected Impact:
- First run: 100% generation (baseline)
- Typical commit (50 routes changed of 11,389): **0.4% regeneration**
- Schema-only changes (types, constraints): **near-instant**, since only the affected routes regenerate

### Cache Invalidation Strategy:
- Cache key: first 16 hex characters of `sha256(JSON.stringify(schema))`
- Cache file: `.apophis-cache.json` (gitignored)
- TTL: Infinite (entries are keyed by content, so a changed schema produces a new key rather than a stale entry)
- Manual invalidation: `rm .apophis-cache.json`

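Reading and writing the cache file could look like the sketch below. It assumes the `{ version, schemas }` shape shown earlier and treats a version mismatch, a missing file, or a corrupt file as an empty cache, which is an assumption about the intended behavior. `loadCache`, `saveCache`, and `CACHE_VERSION` are hypothetical names.

```javascript
const fs = require('node:fs')

const CACHE_VERSION = 1

// Load the cache, falling back to an empty one if the file is missing,
// unreadable, or written by a different cache version.
function loadCache(file = '.apophis-cache.json') {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, 'utf8'))
    const valid =
      parsed !== null &&
      typeof parsed === 'object' &&
      parsed.version === CACHE_VERSION &&
      parsed.schemas !== null &&
      typeof parsed.schemas === 'object'
    if (valid) return parsed
  } catch {
    // Missing or corrupt file: fall through to an empty cache
  }
  return { version: CACHE_VERSION, schemas: {} }
}

// Write to a temporary file and rename it into place, so an interrupted
// run cannot leave a half-written cache behind.
function saveCache(cache, file = '.apophis-cache.json') {
  const tmp = `${file}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(cache))
  fs.renameSync(tmp, file)
}
```

Bumping `CACHE_VERSION` whenever the generator's output format changes would invalidate every entry at once, without relying on the manual `rm`.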
### JSONHash Integration:
The JSONHash library from `~/Business/workspace/lsh_libs` provides **structural similarity** detection, which could enable:
- **Fuzzy cache hits**: If schema changed slightly but structure is similar, reuse and mutate test data
- **Schema migration detection**: Identify which routes changed structurally vs cosmetically
- **Test suite deduplication**: Detect routes with similar schemas that can share test patterns

However, for the primary use case (skip unchanged routes), a simple SHA-256 hash is sufficient and faster.

### Recommendation:
1. **Immediate**: Implement simple SHA-256 schema cache (1-2 hours work, huge CI/CD win)
2. **Future**: Integrate JSONHash for fuzzy similarity and smart test data reuse
3. **Parallelization**: Defer until generation phase proves to be the bottleneck in practice

---

## 3. Current Bottleneck Analysis

From profiling:
- `convertSchema`: 823ms (37% of total); CPU bound, parallelizable
- `discoverRoutes`: 1,649ms (74% of total); memory/allocation bound
- `evaluate`: 156ms (7% of total); fast enough
- `parse`: 85ms (4% of total); cached, fast enough

These shares add up to more than 100%, so the timings presumably overlap (for example, one function's time being counted inside another's) and should be read as inclusive figures.

The real bottleneck is `discoverRoutes`, which is memory-bound (it spends its time creating objects). Parallelization won't help here because:
1. Object allocation within a single V8 isolate happens on one thread
2. Fastify routes array must be read sequentially
3. WeakMap cache is already optimizing the repeated case

**Incremental testing would eliminate the discoverRoutes cost entirely for unchanged routes.**

---

## 4. Implementation Priority

1. **Schema hash cache** (HIGH): Eliminates 74% of work for unchanged routes
2. **Parallel generation** (MEDIUM): Could speed up remaining 26% by 2-4x
3. **JSONHash similarity** (LOW): Nice-to-have for advanced use cases

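As a rough sanity check on item 2, Amdahl's law gives the overall gain when only part of a run gets faster. The sketch below takes the 26% share and the 2-4x range from the list above as given. `overallSpeedup` is a hypothetical helper name, and the calculation ignores worker startup and serialization costs.

```javascript
// Amdahl's law: overall speedup when a fraction `p` of the run
// becomes `s` times faster and the rest is unchanged.
function overallSpeedup(p, s) {
  if (p < 0 || p > 1 || s <= 0) {
    throw new RangeError('expected 0 <= p <= 1 and s > 0')
  }
  return 1 / ((1 - p) + p / s)
}

// Figures from the priority list: 26% of the run, sped up 2x to 4x
const low = overallSpeedup(0.26, 2)
const high = overallSpeedup(0.26, 4)
console.log(low.toFixed(2), high.toFixed(2)) // prints "1.15 1.24"
```

Under these figures, parallel generation alone would make the whole run about 1.15x to 1.24x faster, which is consistent with ranking it below the schema hash cache.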