evaluation dataset

A set of representative examples used to measure model, agent, retrieval, or workflow performance before and after changes.
The evaluation dataset came from real tickets, stripped of sensitive fields.
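
As a rough illustration of the definition, the sketch below builds a small evaluation dataset from ticket-like records and scores two versions of a classifier against it. Everything here is assumed for the example: the field names (`customer_name`, `email`, `body`, `label`), the sample tickets, the `baseline` and `candidate` functions, and the choice of accuracy as the metric. None of it comes from a real system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical field names; which fields count as sensitive is an assumption.
SENSITIVE_FIELDS = {"customer_name", "email"}


@dataclass(frozen=True)
class Example:
    text: str      # input shown to the system under test
    expected: str  # reference answer used for scoring


def build_dataset(tickets: list[dict]) -> list[Example]:
    """Drop sensitive fields, then keep only what evaluation needs."""
    dataset = []
    for ticket in tickets:
        clean = {k: v for k, v in ticket.items() if k not in SENSITIVE_FIELDS}
        dataset.append(Example(text=clean["body"], expected=clean["label"]))
    return dataset


def accuracy(predict: Callable[[str], str], dataset: list[Example]) -> float:
    """Fraction of examples where the prediction matches the reference."""
    if not dataset:
        return 0.0
    correct = sum(predict(ex.text) == ex.expected for ex in dataset)
    return correct / len(dataset)


# Made-up tickets standing in for real ones.
tickets = [
    {"customer_name": "A", "email": "a@example.com",
     "body": "I was charged twice", "label": "billing"},
    {"customer_name": "B", "email": "b@example.com",
     "body": "App crashes on login", "label": "bug"},
    {"customer_name": "C", "email": "c@example.com",
     "body": "Refund my last charge", "label": "billing"},
]


def baseline(text: str) -> str:
    # "Before" version: always predicts the same category.
    return "billing"


def candidate(text: str) -> str:
    # "After" version: a simple keyword rule.
    return "billing" if "charge" in text.lower() else "bug"


dataset = build_dataset(tickets)
before = accuracy(baseline, dataset)
after = accuracy(candidate, dataset)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Because the same fixed examples are scored both times, any difference between the two numbers can be attributed to the change rather than to different inputs.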