online eval

Evaluation that uses live or near-live traffic, user feedback, or production outcomes to measure what actually happens when users interact with the deployed system.
The online eval measured whether users accepted the agent's drafts.
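As an illustration only, a draft-acceptance metric like the one in the example sentence could be computed from logged production feedback roughly as below. This is a minimal sketch under stated assumptions: the `FeedbackEvent` record and its `draft_id` and `accepted` fields are hypothetical, only the last event per draft is counted, and the Wilson score interval is one common way to attach uncertainty to a proportion, not something the definition prescribes.

```python
import math
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    # Hypothetical record logged when a user acts on an agent's draft.
    draft_id: str
    accepted: bool


def acceptance_rate(events, z=1.96):
    """Return (rate, low, high): the share of drafts accepted plus a Wilson score interval."""
    # Keep only the last logged event per draft so repeat events are not double-counted.
    latest = {e.draft_id: e for e in events}
    n = len(latest)
    if n == 0:
        raise ValueError("no feedback events")
    k = sum(1 for e in latest.values() if e.accepted)
    p = k / n
    # Wilson score interval for a binomial proportion (z=1.96 is roughly 95%).
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half


if __name__ == "__main__":
    log = [FeedbackEvent(f"d{i}", i < 8) for i in range(10)]
    print(acceptance_rate(log))
```

Because the events come from real usage, the interval matters: a small day of traffic can swing the raw rate a lot, and the interval width makes that visible.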