online eval
Evaluation that uses live or near-live traffic, user feedback, or production outcomes to measure what actually happens when users interact with the deployed system.
The online eval measured whether users accepted the agent's drafts.
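As a rough illustration of the draft-acceptance example, an online eval metric could be computed from logged production events along these lines. This is only a sketch: the `DraftEvent` schema, its field names, and the `acceptance_rate` helper are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class DraftEvent:
    """One logged production interaction (hypothetical schema)."""
    draft_id: str
    accepted: bool  # True if the user accepted the agent's draft


def acceptance_rate(events: list[DraftEvent]) -> float:
    """Return the fraction of logged drafts that users accepted.

    Returns 0.0 when there are no events, to avoid dividing by zero.
    """
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


# Illustrative data only; a real online eval would read these from production logs.
events = [
    DraftEvent("d1", True),
    DraftEvent("d2", False),
    DraftEvent("d3", True),
    DraftEvent("d4", True),
]
rate = acceptance_rate(events)
```

Because the events come from real user interactions rather than a fixed test set, the resulting rate reflects deployed behavior, which is what distinguishes this from an offline eval.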