Author

seed1

203 approved definitions. Showing 81–100 of 203.

online eval

Evaluation using live or near-live traffic, feedback, or production outcomes — measures what actually happens when users interact with the deployed system.

The online eval measured whether users accepted the agent's drafts.

Permalink · online eval by seed1
May 23, 2026

offline eval

Evaluation run outside live user traffic — on a saved dataset — before changes reach production. Used to catch regressions before users see them.

The offline eval caught a regression in citation quality.
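
As a rough illustration, an offline regression check can compare mean scores on a saved dataset before and after a change. This is a minimal sketch: the function name `find_regressions`, the metric names, the scores, and the `tolerance` value are all hypothetical, not part of any specific tool.

```python
from statistics import mean

def find_regressions(baseline, candidate, tolerance=0.02):
    """Return metrics whose mean score dropped by more than `tolerance`."""
    regressions = {}
    for metric, base_scores in baseline.items():
        drop = mean(base_scores) - mean(candidate[metric])
        if drop > tolerance:
            regressions[metric] = round(drop, 3)
    return regressions

# Hypothetical per-example scores from a saved dataset (no live traffic involved).
baseline = {"citation_quality": [0.9, 0.8, 1.0], "helpfulness": [0.7, 0.8, 0.9]}
candidate = {"citation_quality": [0.6, 0.7, 0.8], "helpfulness": [0.7, 0.8, 0.9]}

regressions = find_regressions(baseline, candidate)
```

Here only `citation_quality` is flagged, because `helpfulness` did not change between the two runs.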

Permalink · offline eval by seed1
May 23, 2026

citation requirement

A product or workflow requirement that model answers cite the underlying sources used — usually driven by legal, compliance, or user trust needs.

The citation requirement came from legal, not engineering.

Permalink · citation requirement by seed1
May 23, 2026

grounded answer

An answer tied to source data, documents, or tool results that users can inspect or verify — not a confident-sounding guess.

The FDE required a grounded answer for every compliance recommendation.

Permalink · grounded answer by seed1
May 23, 2026

prompt pack

A versioned set of prompts, instructions, examples, and tool guidance used by an AI application or agent. Versioned alongside the release it belongs to.

The prompt pack changed with the workflow, so the FDE versioned it with the release.
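
One way to picture a prompt pack is as an immutable record keyed by release version. The sketch below assumes a simple in-memory registry; the class `PromptPack`, the `PACKS` mapping, the version strings, and the prompt text are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptPack:
    version: str                 # matches the release the pack ships with
    system_prompt: str
    examples: tuple = ()         # few-shot examples, if any
    tool_guidance: tuple = ()    # notes on when to call which tool

# Hypothetical registry: one pack per release.
PACKS = {
    "2.3.0": PromptPack("2.3.0", "Draft renewal summaries. Cite the contract."),
    "2.4.0": PromptPack("2.4.0", "Draft renewal summaries. Cite the contract and the usage report."),
}

def pack_for_release(release: str) -> PromptPack:
    """Look up the pack for a release; raises KeyError if none was versioned."""
    return PACKS[release]

current = pack_for_release("2.4.0")
```

Freezing the dataclass means a pack cannot be edited in place, so a prompt change requires a new version.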

Permalink · prompt pack by seed1
May 23, 2026

eval rubric

The scoring criteria used to judge whether model or agent output is good enough for the use case — defined by what the customer actually cares about, not by fluency.

The eval rubric penalized answers without cited policy sections.
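
A rubric like the one in the example could be sketched as a scoring function. The citation format `[Policy 4.2]`, the equal weighting, and the function name `score_answer` are assumptions made for illustration; a real rubric would reflect what the customer cares about.

```python
import re

# Hypothetical citation format, e.g. "[Policy 4.2]".
CITATION = re.compile(r"\[Policy \d+(?:\.\d+)*\]")

def score_answer(answer: str, required_terms: list) -> float:
    """Score from 0.0 to 1.0: half for covering required terms, half for citing a policy section."""
    text = answer.lower()
    covered = sum(term.lower() in text for term in required_terms)
    coverage = covered / len(required_terms) if required_terms else 1.0
    cited = 1.0 if CITATION.search(answer) else 0.0
    return 0.5 * coverage + 0.5 * cited

terms = ["refund", "30 days"]
with_citation = score_answer("Refunds are allowed within 30 days [Policy 4.2].", terms)
without_citation = score_answer("Refunds are allowed within 30 days.", terms)
```

The uncited answer loses half its score even though its content is identical, which is the penalty the example sentence describes.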

Permalink · eval rubric by seed1
May 23, 2026

evaluation dataset

A set of representative examples used to measure model, agent, retrieval, or workflow performance before and after changes.

The evaluation dataset came from real tickets, stripped of sensitive fields.
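
Stripping sensitive fields from real tickets might look like the sketch below. The field names in `SENSITIVE_FIELDS` and the ticket shape are hypothetical, and the sketch only drops named fields; free-text bodies can still contain sensitive data and would need separate review.

```python
# Hypothetical list of fields to drop before a ticket enters the dataset.
SENSITIVE_FIELDS = {"email", "phone", "account_number"}

def to_eval_example(ticket: dict) -> dict:
    """Copy a ticket without its sensitive fields; the original is not modified."""
    return {key: value for key, value in ticket.items() if key not in SENSITIVE_FIELDS}

ticket = {
    "id": 101,
    "email": "a@example.com",
    "body": "Card declined at checkout",
    "resolution": "reissued card",
}
example = to_eval_example(ticket)
```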

Permalink · evaluation dataset by seed1
May 23, 2026

golden dataset

A curated set of examples with trusted expected outcomes — used to test model or workflow behavior and catch regressions. Built from real cases, not invented ones.

The golden dataset included the weird edge cases operators cared about.

Permalink · golden dataset by seed1
May 23, 2026

eval harness

The test infrastructure used to run model or agent evaluations repeatedly against examples, expected behavior, tools, and scoring logic. FDEs add real failure cases from production.

The FDE added the customer's top failure cases to the eval harness.
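
At its smallest, a harness is a loop that runs each case through the system and applies scoring logic. In this sketch `toy_system` is a placeholder standing in for a real model or agent call, and the case names and pass criterion are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    name: str
    prompt: str
    expected: str

def run_harness(cases, system: Callable[[str], str], passes: Callable[[str, str], bool]):
    """Run every case through the system and return the names of the failing cases."""
    failures = []
    for case in cases:
        output = system(case.prompt)
        if not passes(output, case.expected):
            failures.append(case.name)
    return failures

def toy_system(prompt: str) -> str:
    # Placeholder for a model or agent call.
    return "escalate" if "chargeback" in prompt else "approve"

cases = [
    Case("simple refund", "refund under $20", "approve"),
    Case("chargeback dispute", "chargeback on closed account", "escalate"),
    # A failure case taken from production would be added like this one.
    Case("duplicate invoice", "invoice submitted twice", "reject"),
]
failures = run_harness(cases, toy_system, lambda out, exp: out == exp)
```

Because the harness takes the system and the scoring function as arguments, the same cases can be rerun unchanged after every prompt or model change.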

Permalink · eval harness by seed1
May 23, 2026

bespoke agentic solution

A customer-specific agentic solution built for a particular workflow. Useful as a first deployment, but it should be reviewed for what belongs in product before a second account asks for the same thing.

The FDE shipped a bespoke agentic solution, then identified which parts belonged in product.

Permalink · bespoke agentic solution by seed1
May 23, 2026

agent operating model

The technical and organizational model for building, approving, monitoring, supporting, escalating, and improving agents after launch.

The agent operating model assigned support ownership before rollout.

Permalink · agent operating model by seed1
May 23, 2026

agent guardrail

A control that limits what an agent can see, say, decide, or do, enforced through policy, permissions, validation, or runtime checks.

The agent guardrail blocked write-back unless the user approved the change.
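
A runtime check of this kind can be sketched as a function that refuses to write without an approver. The names `write_back` and `ApprovalRequired`, and the record shape, are hypothetical.

```python
from typing import Optional

class ApprovalRequired(Exception):
    """Raised when a write-back is attempted without user approval."""

def write_back(record: dict, change: dict, approved_by: Optional[str] = None) -> dict:
    """Apply `change` to a copy of `record`, but only if a user approved it."""
    if not approved_by:
        raise ApprovalRequired("write-back blocked: no user approval")
    return {**record, **change, "approved_by": approved_by}

record = {"id": 7, "status": "open"}

try:
    write_back(record, {"status": "closed"})
    blocked = False
except ApprovalRequired:
    blocked = True

updated = write_back(record, {"status": "closed"}, approved_by="j.doe")
```

The check lives in the write path itself, so it holds even if the prompt or the model's behavior changes.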

Permalink · agent guardrail by seed1
May 23, 2026

agent handoff

The designed transfer of work between an agent and a human, another agent, or a system, triggered when confidence, ownership, or permissions change.

The agent handoff sent low-confidence cases to a supervisor with the evidence attached.
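
A confidence-based handoff could be sketched as a routing function. The threshold value, the `Result` shape, and the destination names are assumptions for illustration only.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.75  # hypothetical cutoff; tune per use case

@dataclass
class Result:
    answer: str
    confidence: float
    evidence: list = field(default_factory=list)

def route(result: Result) -> dict:
    """Send low-confidence results to a supervisor with the evidence attached."""
    if result.confidence < CONFIDENCE_THRESHOLD:
        return {"to": "supervisor", "draft": result.answer, "evidence": result.evidence}
    return {"to": "auto", "answer": result.answer}

low = route(Result("Deny claim", 0.4, ["policy 3.1", "claim photo"]))
high = route(Result("Approve claim", 0.9))
```

Passing the evidence along with the draft lets the supervisor review the case without redoing the agent's lookup.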

Permalink · agent handoff by seed1
May 23, 2026

agent orchestration

The logic that coordinates agents, tools, prompts, retrieval, state, and human checkpoints across a workflow. FDEs reduce this until it's as simple as the use case allows.

The FDE reduced agent orchestration to one router and three tools.
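
"One router and three tools" can be read as something like the sketch below. The tool names and the keyword-based routing are hypothetical; a real router might use a classifier or a model call instead.

```python
def lookup_policy(request: str) -> str:
    return f"policy result for: {request}"

def fetch_account(request: str) -> str:
    return f"account data for: {request}"

def draft_reply(request: str) -> str:
    return f"draft reply to: {request}"

TOOLS = {
    "lookup_policy": lookup_policy,
    "fetch_account": fetch_account,
    "draft_reply": draft_reply,
}

def router(request: str) -> str:
    """Pick a tool name by keyword; falls back to drafting a reply."""
    text = request.lower()
    if "policy" in text:
        return "lookup_policy"
    if "account" in text:
        return "fetch_account"
    return "draft_reply"

def handle(request: str):
    tool = router(request)
    return tool, TOOLS[tool](request)

policy_call = handle("What is the refund policy?")
fallback_call = handle("Please say thanks to the customer")
```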

Permalink · agent orchestration by seed1
May 23, 2026

sub-agent

A smaller specialized agent or model role used inside a larger agent workflow to handle a specific task.

The sub-agent handled policy lookup while the main agent drafted the answer.

Permalink · sub-agent by seed1
May 23, 2026

agent skill

A bounded capability an agent can perform, defined by instructions, tools, inputs, outputs, permissions, and evals.

The FDE added an agent skill for drafting renewal summaries.

Permalink · agent skill by seed1
May 23, 2026

agent rollout

The staged introduction of an agent to users, workflows, permissions, or action types. A common first stage is recommendations only, with write-back enabled later.

The agent rollout started with recommendations before write-back.
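
Staging by action type might be expressed as an ordered set of stages and a minimum stage per action. The stage names and the `REQUIRED_STAGE` mapping are hypothetical.

```python
from enum import IntEnum

class Stage(IntEnum):
    RECOMMEND_ONLY = 1
    WRITE_WITH_APPROVAL = 2
    AUTONOMOUS_WRITE = 3

# Hypothetical mapping from action type to the earliest stage that permits it.
REQUIRED_STAGE = {
    "recommend": Stage.RECOMMEND_ONLY,
    "write_back": Stage.WRITE_WITH_APPROVAL,
}

def is_allowed(action: str, current: Stage) -> bool:
    """Allow an action once the rollout has reached its stage; unknown actions are denied."""
    required = REQUIRED_STAGE.get(action)
    return required is not None and current >= required

first_stage_write = is_allowed("write_back", Stage.RECOMMEND_ONLY)
first_stage_recommend = is_allowed("recommend", Stage.RECOMMEND_ONLY)
```

Denying unknown actions by default keeps new action types out of production until someone assigns them a stage.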

Permalink · agent rollout by seed1
May 23, 2026

agent workflow

The ordered steps an agent follows: context gathering, reasoning, tool calls, decisions, human checkpoints, and outputs. FDEs simplify these after watching them in practice.

The FDE simplified the agent workflow after observing three unnecessary tool calls.

Permalink · agent workflow by seed1
May 23, 2026

agentic enterprise

An enterprise operating model where agents participate in many workflows under shared governance, monitoring, and integration patterns. Needs standard tool permissions before teams build dozens of agents independently.

The agentic enterprise needed standard tool permissions before teams built dozens of agents.

Permalink · agentic enterprise by seed1
May 23, 2026

agentic application

An application where AI agents perform meaningful workflow steps through tools, context, and controls, rather than only generating text for a user to copy somewhere else.

The agentic application opened the right claim view and prepared the adjustment recommendation.

Permalink · agentic application by seed1
May 23, 2026
