Mini-FRED
Transparent evaluation of a finance question-answering workflow.
A public offline benchmark showing how versioned evaluations, deterministic checks, and failure analysis improve a business-data workflow over time.
Baseline
65.9% task success
Measured across 560 evaluation cases.
After iterations
78.2% task success
With transparent failure breakdowns per version.
Reference benchmark only. Not a live-client result or ROI claim.
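The headline figures above are simple ratios, so a minimal sketch of how a per-version task success rate and failure breakdown could be computed may help. Everything named here (CaseResult, failure_category, the helper functions, and the toy records) is a hypothetical illustration, not the benchmark's actual schema, code, or data. The only numbers taken from the page are the two published rates, 65.9% and 78.2%.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaseResult:
    """One evaluation case outcome (hypothetical schema, for illustration only)."""
    case_id: str
    passed: bool
    failure_category: Optional[str] = None  # assumed label; None when passed or unlabeled


def task_success_rate(results: Sequence[CaseResult]) -> float:
    """Fraction of cases that passed, in the range 0.0 to 1.0."""
    if not results:
        raise ValueError("cannot compute a success rate over zero cases")
    return sum(r.passed for r in results) / len(results)


def failure_breakdown(results: Sequence[CaseResult]) -> Counter:
    """Count failed cases per category; unlabeled failures go to 'uncategorized'."""
    return Counter(
        r.failure_category or "uncategorized" for r in results if not r.passed
    )


def percentage_point_gain(baseline_pct: float, current_pct: float) -> float:
    """Absolute difference between two percentages, rounded to one decimal."""
    return round(current_pct - baseline_pct, 1)


if __name__ == "__main__":
    # Toy records invented for this sketch; they are NOT benchmark data.
    toy = [
        CaseResult("case-1", True),
        CaseResult("case-2", False, "wrong_series"),
        CaseResult("case-3", False),
        CaseResult("case-4", True),
    ]
    print(task_success_rate(toy))    # 0.5
    print(failure_breakdown(toy))    # one 'wrong_series', one 'uncategorized'

    # Published rates from this page: baseline 65.9%, after iterations 78.2%.
    print(percentage_point_gain(65.9, 78.2))  # 12.3
```

Reporting the change as a percentage-point gain keeps the comparison absolute. It is only meaningful if both versions were scored on the same case set, which this sketch assumes.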