Author: seed1

203 approved definitions.

online eval

Evaluation using live or near-live traffic, feedback, or production outcomes. It measures what actually happens when users interact with the deployed system.
The online eval measured whether users accepted the agent's drafts.
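
A minimal sketch of how a metric like draft acceptance might be computed from production logs. The `DraftEvent` record, its `action` values, and the one-event-per-draft assumption are hypothetical, not a standard logging schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftEvent:
    """One logged user reaction to an agent draft (hypothetical schema)."""

    draft_id: str
    action: str  # assumed values: "accepted", "edited", "rejected"


def acceptance_rate(events: list[DraftEvent]) -> float:
    """Share of drafts accepted as-is, assuming one logged event per draft."""
    if not events:
        return 0.0
    counts = Counter(event.action for event in events)
    return counts["accepted"] / len(events)


events = [
    DraftEvent("d1", "accepted"),
    DraftEvent("d2", "edited"),
    DraftEvent("d3", "accepted"),
    DraftEvent("d4", "rejected"),
]
print(f"acceptance rate: {acceptance_rate(events):.2f}")  # acceptance rate: 0.50
```

Whether an edited draft counts as accepted is a product decision; this sketch counts only unedited acceptances.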

offline eval

Evaluation run on a saved dataset, outside live user traffic, before changes reach production. It is used to catch regressions before users see them.
The offline eval caught a regression in citation quality.
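
One common shape for an offline eval is a regression gate: score the current and candidate versions on the same saved dataset and flag any metric whose mean drops. A minimal sketch; the metric names, per-example scores, and the 0.02 tolerance are illustrative assumptions.

```python
from statistics import mean


def find_regressions(
    baseline: dict[str, list[float]],
    candidate: dict[str, list[float]],
    tolerance: float = 0.02,
) -> dict[str, tuple[float, float]]:
    """Return {metric: (baseline_mean, candidate_mean)} for metrics that dropped.

    Both inputs map a metric name to per-example scores on the same saved dataset.
    """
    regressions = {}
    for metric, base_scores in baseline.items():
        base_mean = mean(base_scores)
        cand_mean = mean(candidate[metric])
        if base_mean - cand_mean > tolerance:
            regressions[metric] = (base_mean, cand_mean)
    return regressions


baseline = {"citation_quality": [1.0, 1.0, 0.5, 1.0], "answer_accuracy": [1.0, 0.5, 1.0, 1.0]}
candidate = {"citation_quality": [0.5, 1.0, 0.5, 0.5], "answer_accuracy": [1.0, 1.0, 1.0, 0.5]}
print(find_regressions(baseline, candidate))  # {'citation_quality': (0.875, 0.625)}
```

A release pipeline would typically block the change whenever the returned dict is non-empty.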

citation requirement

A product or workflow requirement that model answers cite the underlying sources they used, usually driven by legal, compliance, or user-trust needs.
The citation requirement came from legal, not engineering.
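
A requirement like this is often enforced as an automated check on each answer. A minimal sketch, assuming answers mark sources with bracketed IDs such as `[S1]` and that the retrieval step reports which source IDs it returned; `check_citations` and the ID format are hypothetical.

```python
import re

# Assumed citation format: bracketed source IDs such as "[S1]" or "[POL-12]".
CITATION = re.compile(r"\[([A-Z]+-?\d+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return problems found; an empty list means the citation check passed."""
    cited = CITATION.findall(answer)
    if not cited:
        return ["answer cites no sources"]
    return [
        f"cited source {source_id} was not retrieved"
        for source_id in cited
        if source_id not in retrieved_ids
    ]


retrieved = {"S1", "S2"}
print(check_citations("Refunds need manager approval [S1].", retrieved))  # []
print(check_citations("Refunds need manager approval [S9].", retrieved))  # ['cited source S9 was not retrieved']
print(check_citations("Refunds need manager approval.", retrieved))  # ['answer cites no sources']
```

This only verifies that cited IDs point at sources the system actually retrieved; whether a source supports the claim still needs an eval or a human reviewer.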

grounded answer

An answer tied to source data, documents, or tool results that users can inspect or verify, as opposed to a confident-sounding guess.
The FDE required a grounded answer for every compliance recommendation.

prompt pack

A set of prompts, instructions, examples, and tool guidance used by an AI application or agent, versioned alongside the release it belongs to.
The prompt pack changed with the workflow, so the FDE versioned it with the release.
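
A sketch of one way to tie a prompt pack to a release: fingerprint the pack's content so a release check can catch prompt edits that shipped without a version bump. The `PromptPack` layout and `needs_version_bump` helper are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptPack:
    """Prompts, examples, and tool guidance shipped together (hypothetical layout)."""

    version: str
    system_prompt: str
    examples: tuple[str, ...] = ()
    tool_guidance: dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Short hash of the pack's content, ignoring the version label."""
        payload = json.dumps(
            {
                "system_prompt": self.system_prompt,
                "examples": list(self.examples),
                "tool_guidance": self.tool_guidance,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def needs_version_bump(released: PromptPack, candidate: PromptPack) -> bool:
    """True when the content changed but the version label did not."""
    return (
        released.fingerprint() != candidate.fingerprint()
        and released.version == candidate.version
    )


released = PromptPack("1.4.0", "Draft renewal summaries.")
edited = PromptPack("1.4.0", "Draft renewal summaries. Cite the contract clause.")
print(needs_version_bump(released, edited))  # True
```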

eval rubric

The scoring criteria used to judge whether model or agent output is good enough for the use case. The criteria reflect what the customer actually cares about, not fluency.
The eval rubric penalized answers without cited policy sections.
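
A rubric can be expressed as weighted pass/fail criteria. A minimal sketch, assuming policy citations look like "Section 4.2"; the criteria and integer weights are hypothetical stand-ins for what a customer would actually specify.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumed citation style for policy sections, e.g. "Section 4.2".
SECTION_REF = re.compile(r"\bsection\s+\d+(?:\.\d+)*", re.IGNORECASE)


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int
    passes: Callable[[str], bool]  # True when the output meets this criterion


# Hypothetical rubric; missing a policy citation costs half the score.
RUBRIC = (
    Criterion("cites_policy_section", 2, lambda out: SECTION_REF.search(out) is not None),
    Criterion("states_decision", 1, lambda out: any(w in out.lower() for w in ("approve", "deny"))),
    Criterion("under_150_words", 1, lambda out: len(out.split()) <= 150),
)


def score(output: str, rubric: tuple[Criterion, ...] = RUBRIC) -> float:
    """Weighted share of criteria the output passes, from 0.0 to 1.0."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.passes(output))
    return earned / total


print(score("Deny the refund: Section 4.2 excludes opened items."))  # 1.0
print(score("Deny the refund because opened items are excluded."))  # 0.5
```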

evaluation dataset

A set of representative examples used to measure model, agent, retrieval, or workflow performance before and after changes.
The evaluation dataset came from real tickets, stripped of sensitive fields.

golden dataset

A curated set of examples with trusted expected outcomes, used to test model or workflow behavior and catch regressions. It is built from real cases, not invented ones.
The golden dataset included the weird edge cases operators cared about.

eval harness

The test infrastructure for running model or agent evaluations repeatedly; it combines examples, expected behavior, tools, and scoring logic. FDEs extend it with real failure cases from production.
The FDE added the customer's top failure cases to the eval harness.
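
A stripped-down harness, to show the moving parts: cases, a system under test, and a scoring rule, with production failures added as ordinary cases. This is a sketch under simplifying assumptions; `EvalCase`, `run_harness`, `fake_agent`, and the substring check are hypothetical, and real harnesses usually score with an eval rubric or model-graded checks instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    must_contain: str  # simplest possible expected behavior: a required phrase
    source: str = "curated"  # or "production_failure" for cases taken from live incidents


def run_harness(system: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case through `system` and return the IDs of the cases that fail."""
    failures = []
    for case in cases:
        output = system(case.prompt)
        if case.must_contain.lower() not in output.lower():
            failures.append(case.case_id)
    return failures


def fake_agent(prompt: str) -> str:
    """Stand-in for the real agent call."""
    return "Escalate to billing." if "refund" in prompt.lower() else "Resolved."


cases = [
    EvalCase("c1", "Customer asks for a refund", "billing"),
    EvalCase("c2", "Password reset", "resolved"),
    # Added from production: refunds on closed accounts must go to compliance.
    EvalCase("p1", "Refund on a closed account", "compliance", source="production_failure"),
]
print(run_harness(fake_agent, cases))  # ['p1']
```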

bespoke agentic solution

A customer-specific agentic solution built for a particular workflow. It is useful as a first deployment, but it should be reviewed for what belongs in product before a second account asks for the same thing.
The FDE shipped a bespoke agentic solution, then identified which parts belonged in product.

agent operating model

The technical and organizational model for building, approving, monitoring, supporting, escalating, and improving agents after launch.
The agent operating model assigned support ownership before rollout.

agent guardrail

A control, enforced through policy, permissions, validation, or runtime checks, that limits what an agent can see, say, decide, or do.
The agent guardrail blocked write-back unless the user approved the change.
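
A runtime guardrail like the one in the example can be a plain check that runs before any tool executes. A minimal sketch; `ProposedAction`, its `kind` values, and `enforce_write_approval` are hypothetical names.

```python
from dataclasses import dataclass


class GuardrailViolation(Exception):
    """Raised when a guardrail blocks an agent action."""


@dataclass(frozen=True)
class ProposedAction:
    kind: str  # assumed kinds: "read" or "write"
    target: str
    user_approved: bool = False


def enforce_write_approval(action: ProposedAction) -> ProposedAction:
    """Let reads through; block writes the user has not approved."""
    if action.kind == "write" and not action.user_approved:
        raise GuardrailViolation(f"write to {action.target} needs user approval")
    return action


try:
    enforce_write_approval(ProposedAction("write", "crm/account/42"))
except GuardrailViolation as err:
    print(f"blocked: {err}")  # blocked: write to crm/account/42 needs user approval
```

Because the check lives in ordinary code around the tool call rather than in the prompt, it holds even if the model ignores its instructions.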

agent handoff

The designed transfer of work between an agent and a human, another agent, or a system, triggered when confidence, ownership, or permissions change.
The agent handoff sent low-confidence cases to a supervisor with the evidence attached.
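
A confidence-triggered handoff can be sketched as a routing function that either passes the agent's result through or packages it, with its evidence, for a person. This assumes the agent reports a confidence score in [0, 1]; the 0.7 threshold and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union

CONFIDENCE_THRESHOLD = 0.7  # hypothetical cut-off; tune it against eval data


@dataclass
class AgentResult:
    case_id: str
    draft: str
    confidence: float  # assumed to be a score in [0, 1]
    evidence: list[str] = field(default_factory=list)


@dataclass
class Handoff:
    case_id: str
    to: str
    reason: str
    evidence: list[str]


def route_result(result: AgentResult) -> Union[AgentResult, Handoff]:
    """Pass confident results through; hand low-confidence cases to a supervisor."""
    if result.confidence < CONFIDENCE_THRESHOLD:
        return Handoff(
            case_id=result.case_id,
            to="supervisor",
            reason=f"confidence {result.confidence:.2f} is below {CONFIDENCE_THRESHOLD}",
            evidence=list(result.evidence),  # the supervisor sees what the agent saw
        )
    return result


handoff = route_result(AgentResult("t-17", "Approve the credit.", 0.42, ["ticket t-17", "policy 3.1"]))
print(handoff)
```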

agent orchestration

The logic that coordinates agents, tools, prompts, retrieval, state, and human checkpoints across a workflow. FDEs pare it down until it is as simple as the use case allows.
The FDE reduced agent orchestration to one router and three tools.
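
A deliberately small orchestration layer in the spirit of the example: one router dispatching to three tools. A sketch only; the tool names are hypothetical, and keyword matching stands in for whatever routing logic (often a model call) a real system would use.

```python
from typing import Callable


def lookup_policy(query: str) -> str:
    return f"policy results for {query!r}"


def search_tickets(query: str) -> str:
    return f"ticket results for {query!r}"


def draft_reply(query: str) -> str:
    return f"draft reply for {query!r}"


# Checked in insertion order; the first matching keyword wins.
TOOLS: dict[str, Callable[[str], str]] = {
    "policy": lookup_policy,
    "ticket": search_tickets,
    "reply": draft_reply,
}


def route_request(request: str) -> str:
    """Pick one tool by keyword; fall back to drafting a reply."""
    lowered = request.lower()
    for keyword, tool in TOOLS.items():
        if keyword in lowered:
            return tool(request)
    return draft_reply(request)


print(route_request("Which policy covers refunds?"))  # policy results for 'Which policy covers refunds?'
```

Keeping a single dispatch point and plain-function tools makes each path easy to test in an eval harness.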

sub-agent

A smaller specialized agent or model role used inside a larger agent workflow to handle a specific task.
The sub-agent handled policy lookup while the main agent drafted the answer.

agent skill

A bounded capability an agent can perform, defined by its instructions, tools, inputs, outputs, permissions, and evals.
The FDE added an agent skill for drafting renewal summaries.
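
The parts listed in the definition map naturally onto a declarative record. A minimal sketch; `AgentSkill` is a hypothetical schema rather than any particular framework's, and the renewal-summary values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSkill:
    """A bounded agent capability (hypothetical schema, not a specific framework's)."""

    name: str
    instructions: str
    tools: frozenset[str]
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    permissions: frozenset[str]
    eval_suite: str  # ID of the evals that gate changes to this skill

    def missing_inputs(self, payload: dict[str, object]) -> list[str]:
        """Required input fields absent from the caller's payload."""
        return [name for name in self.inputs if name not in payload]

    def allows(self, permission: str) -> bool:
        """Whether this skill was granted the given permission."""
        return permission in self.permissions


renewal_summary = AgentSkill(
    name="draft_renewal_summary",
    instructions="Summarize the account's renewal terms and open risks.",
    tools=frozenset({"crm_lookup", "contract_search"}),
    inputs=("account_id", "renewal_date"),
    outputs=("summary", "risks"),
    permissions=frozenset({"crm:read", "contracts:read"}),
    eval_suite="renewal-summaries-v1",
)
print(renewal_summary.missing_inputs({"account_id": "A-1"}))  # ['renewal_date']
print(renewal_summary.allows("crm:write"))  # False
```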

agent rollout

The staged introduction of an agent to users, workflows, permissions, or action types. A common first stage is recommendations only, before any write-back.
The agent rollout started with recommendations before write-back.
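
Staging by action type can be sketched as an ordered enum that unlocks more actions at each step. Only "recommendations before write-back" comes from the definition; the approval stage and the autonomous third stage are hypothetical additions for illustration.

```python
from enum import IntEnum


class RolloutStage(IntEnum):
    """Ordered rollout stages; later stages unlock more action types (hypothetical)."""

    RECOMMEND_ONLY = 1  # the agent suggests, people make every change
    WRITE_WITH_APPROVAL = 2  # the agent writes back after a user approves
    AUTONOMOUS_WRITE = 3  # the agent writes back on its own for approved action types


def allowed_actions(stage: RolloutStage) -> set[str]:
    """Action types the agent may take at a given stage."""
    actions = {"recommend"}
    if stage >= RolloutStage.WRITE_WITH_APPROVAL:
        actions.add("write_with_approval")
    if stage >= RolloutStage.AUTONOMOUS_WRITE:
        actions.add("write")
    return actions


print(sorted(allowed_actions(RolloutStage.RECOMMEND_ONLY)))  # ['recommend']
print(sorted(allowed_actions(RolloutStage.WRITE_WITH_APPROVAL)))  # ['recommend', 'write_with_approval']
```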

agent workflow

The ordered steps an agent follows: context gathering, reasoning, tool calls, decisions, human checkpoints, and outputs. FDEs simplify these steps after watching them run in practice.
The FDE simplified the agent workflow after observing three unnecessary tool calls.

agentic enterprise

An enterprise operating model in which agents participate in many workflows under shared governance, monitoring, and integration patterns. It needs standard tool permissions in place before teams build dozens of agents independently.
The agentic enterprise needed standard tool permissions before teams built dozens of agents.

agentic application

An application where AI agents perform meaningful workflow steps through tools, context, and controls (not just generating text for a user to copy somewhere else).
The agentic application opened the right claim view and prepared the adjustment recommendation.