Agent Eval Harness

The Agent Eval Harness adds deterministic contract checks on top of structural validation.

It verifies behavior-critical metadata and instruction markers for:

Planner read-only posture
Orchestrator plan-first and loop protocol markers
Review verification-gate markers
Command routing/frontmatter/argument consistency
Permission skill/task invariants
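
As an illustration of what one of these contract checks could look like, here is a sketch that tests the planner's read-only posture against an agent definition file. It is a minimal sketch, not the harness's actual code. It assumes agent files are Markdown with a flat `key: value` frontmatter block between `---` lines. The `tools` key, the tool names in `WRITE_TOOLS`, the `READ-ONLY` marker, and the check ids are all hypothetical. Like the harness, it is static and uses no external dependencies.

```typescript
// Minimal sketch of a deterministic contract check. All names here
// (keys, tool names, marker, check ids) are hypothetical.

interface CheckResult {
  id: string;
  passed: boolean;
  message: string;
}

// Parse a flat `key: value` frontmatter block delimited by `---` lines.
// No YAML library is used, keeping the check dependency-free.
function parseFrontmatter(source: string): { meta: Record<string, string>; body: string } {
  const lines = source.split(/\r?\n/);
  if (lines[0].trim() !== "---") return { meta: {}, body: source };
  let end = -1;
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "---") {
      end = i;
      break;
    }
  }
  if (end === -1) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}

// Hypothetical: tools that would break the planner's read-only posture.
const WRITE_TOOLS = ["write", "edit", "bash"];
// Hypothetical instruction marker the planner prompt must contain.
const READ_ONLY_MARKER = "READ-ONLY";

function checkPlannerReadOnly(source: string): CheckResult[] {
  const { meta, body } = parseFrontmatter(source);
  // `tools` is assumed to be a comma-separated list, e.g. "read, grep".
  const tools = (meta["tools"] ?? "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
  const forbidden = tools.filter((t) => WRITE_TOOLS.includes(t));
  const hasMarker = body.includes(READ_ONLY_MARKER);
  return [
    {
      id: "planner.tools.read-only",
      passed: forbidden.length === 0,
      message: forbidden.length === 0
        ? "no write-capable tools declared"
        : `write-capable tools declared: ${forbidden.join(", ")}`,
    },
    {
      id: "planner.marker.read-only",
      passed: hasMarker,
      message: hasMarker ? "read-only marker present" : `missing marker "${READ_ONLY_MARKER}"`,
    },
  ];
}
```

For example, a planner file whose frontmatter declares `tools: read, edit` fails `planner.tools.read-only`. A file that declares `tools: read, grep` and contains the marker passes both checks. Because the check only inspects file text, it gives the same result on every run.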

Commands

npm run eval:agents
npm run eval:agents:json
npm run eval:agents:trend

Output

Human-readable summary in terminal
Optional JSON report at evals/reports/latest.json
Trend snapshot markdown at evals/reports/trend-summary.md
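
The two report files could be produced along the lines of the sketch below. It assumes the harness collects a flat list of check results. The `EvalReport` fields, the Markdown table layout, and the function names are hypothetical. Only the two output paths come from this document.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";

interface CheckResult {
  id: string;
  passed: boolean;
  message: string;
}

// Hypothetical report shape. No timestamp is included, so identical
// inputs always serialize to identical JSON.
interface EvalReport {
  total: number;
  passed: number;
  failed: string[]; // ids of failing checks, in input order
}

function buildReport(results: CheckResult[]): EvalReport {
  const failed = results.filter((r) => !r.passed).map((r) => r.id);
  return { total: results.length, passed: results.length - failed.length, failed };
}

// Render a small Markdown table. `previous` is the last saved report, if any.
function renderTrend(previous: EvalReport | null, current: EvalReport): string {
  let change = "n/a";
  if (previous !== null) {
    const d = current.passed - previous.passed;
    change = d > 0 ? `+${d}` : String(d);
  }
  return [
    "| Metric | Value |",
    "| --- | --- |",
    `| Checks | ${current.total} |`,
    `| Passed | ${current.passed} |`,
    `| Failed | ${current.failed.length} |`,
    `| Passed vs. previous | ${change} |`,
  ].join("\n") + "\n";
}

// Write both artifacts to the paths named in this README.
function writeReports(report: EvalReport, trend: string): void {
  mkdirSync("evals/reports", { recursive: true });
  writeFileSync("evals/reports/latest.json", JSON.stringify(report, null, 2) + "\n");
  writeFileSync("evals/reports/trend-summary.md", trend);
}
```

In this sketch, `previous` would typically be the prior `latest.json` when one exists. Leaving timestamps out of the report keeps re-runs on unchanged inputs byte-identical.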

In CI (validate-agent-evals), the JSON report and trend snapshot are uploaded as workflow artifacts.

Design Notes

Static and deterministic (no model calls)
No external dependencies
Designed as a lightweight contract gate, not a benchmark framework
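
Acting as a contract gate means a run should fail when any check fails. Here is a minimal sketch of that behavior. It assumes failure is signalled through a non-zero exit code, which this document does not confirm. `CheckResult` and `summarize` are hypothetical names.

```typescript
interface CheckResult {
  id: string;
  passed: boolean;
  message: string;
}

// Build the human-readable terminal lines and an exit code for a gate run.
// Assumption: the gate reports failure via a non-zero exit code.
function summarize(results: CheckResult[]): { lines: string[]; exitCode: number } {
  const lines = results.map((r) => `${r.passed ? "PASS" : "FAIL"} ${r.id}: ${r.message}`);
  const failed = results.filter((r) => !r.passed).length;
  lines.push(`${results.length - failed}/${results.length} checks passed`);
  return { lines, exitCode: failed === 0 ? 0 : 1 };
}
```

A caller would print each entry of `lines` and then set `process.exitCode = exitCode`. This lets Node exit normally while still marking the CI step as failed.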

Fixtures

Regression fixtures for harness and validator tests live under scripts/fixtures/. This keeps golden inputs explicit and reusable across tests.
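
A fixture-driven regression test might pair each golden input with its expected outcome. The sketch below assumes a hypothetical naming convention: each `<name>.md` input under scripts/fixtures/ has a sibling `<name>.expected.json` that lists the check ids expected to fail. `FixtureCase`, `loadFixtures`, and `runFixtures` are illustrative names, not the harness's own.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical fixture pairing: `<name>.md` plus `<name>.expected.json`.
interface FixtureCase {
  name: string;
  input: string;
  expectedFailures: string[];
}

function loadFixtures(dir: string): FixtureCase[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".md"))
    .sort() // stable order keeps runs deterministic
    .map((f) => {
      const name = f.slice(0, -".md".length);
      const expectedPath = join(dir, `${name}.expected.json`);
      const expectedFailures = JSON.parse(readFileSync(expectedPath, "utf8")) as string[];
      return { name, input: readFileSync(join(dir, f), "utf8"), expectedFailures };
    });
}

// Compare each fixture's failing check ids against its golden expectation.
// Returns one message per mismatching fixture; an empty array means all match.
function runFixtures(
  cases: FixtureCase[],
  validate: (input: string) => { id: string; passed: boolean }[],
): string[] {
  const mismatches: string[] = [];
  for (const c of cases) {
    const actual = validate(c.input).filter((r) => !r.passed).map((r) => r.id).sort();
    const expected = [...c.expectedFailures].sort();
    if (JSON.stringify(actual) !== JSON.stringify(expected)) {
      mismatches.push(`${c.name}: expected [${expected.join(", ")}], got [${actual.join(", ")}]`);
    }
  }
  return mismatches;
}
```

Calling `runFixtures(loadFixtures("scripts/fixtures"), validate)` works with any validator that returns `{ id, passed }` results. It yields an empty array when every fixture matches its golden expectation, so a test only needs to assert that the array is empty.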