Single-turn evals
- Give an AI an input, then apply grading logic to its output to measure success.
- For earlier LLMs, single-turn, non-agentic evals were the main evaluation method.
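
A minimal sketch of the single-turn pattern above: one input goes in, one output comes out, and a grader scores it. Everything named here (`toy_model`, `exact_match`, `run_single_turn_eval`, the sample cases) is hypothetical and chosen for illustration; a real eval would call an actual model and would likely use more forgiving grading logic.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model call; a real eval would query an LLM here.
def toy_model(prompt: str) -> str:
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "I don't know")

def exact_match(output: str, expected: str) -> bool:
    """Grading logic: case-insensitive exact match after trimming whitespace."""
    return output.strip().lower() == expected.strip().lower()

def run_single_turn_eval(
    model: Callable[[str], str],
    cases: List[Tuple[str, str]],
    grader: Callable[[str, str], bool] = exact_match,
) -> float:
    """Return the fraction of cases whose single output passes the grader."""
    passed = sum(grader(model(prompt), expected) for prompt, expected in cases)
    return passed / len(cases)

# Assumed sample cases: (input, expected output).
cases = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "paris"),
    ("Capital of Peru?", "Lima"),
]
score = run_single_turn_eval(toy_model, cases)
print(f"pass rate: {score:.2f}")
```

Each case is independent and stateless, which is what keeps this style of eval simple to write and grade.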
Agentic evals
- An agent uses tools across many turns, modifying state as it goes; mistakes can propagate and compound.
- Models can find creative solutions that a static eval never anticipated, so we grade the outcome, not just the steps.
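
The points above can be sketched with a toy environment, assuming a small in-memory "file system" as the state and a fixed list of tool calls standing in for the agent. All names (`apply_tool`, `run_episode`, `grade_outcome`, the three policies) are hypothetical; a real agent would pick each action from the model's output at every turn. The grader inspects only the final state, so two different step sequences can both pass.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]          # toy environment: filename -> contents
Action = Tuple[str, str, str]   # (tool name, filename, argument)

def apply_tool(state: State, action: Action) -> None:
    """Mutate the environment in place, as a tool call would."""
    tool, name, arg = action
    if tool == "write":
        state[name] = arg
    elif tool == "append":
        state[name] = state.get(name, "") + arg
    elif tool == "delete":
        state.pop(name, None)
    else:
        raise ValueError(f"unknown tool: {tool}")

def run_episode(policy: List[Action], initial: State) -> State:
    """Run one multi-turn episode; each turn's effect persists into the next."""
    state = dict(initial)  # copy so every episode starts from the same state
    for action in policy:
        apply_tool(state, action)
    return state

def grade_outcome(state: State) -> bool:
    """Outcome grading: check the final state, not the path taken."""
    return state.get("config.txt") == "debug=false\n" and "tmp.txt" not in state

initial = {"config.txt": "debug=true\n", "tmp.txt": "scratch"}

# Two different routes to the same end state, plus one with an early mistake.
direct = [("write", "config.txt", "debug=false\n"), ("delete", "tmp.txt", "")]
roundabout = [
    ("delete", "config.txt", ""),
    ("append", "config.txt", "debug="),
    ("append", "config.txt", "false\n"),
    ("delete", "tmp.txt", ""),
]
compounding = [("delete", "config.txt", ""), ("append", "tmp.txt", "debug=false\n")]

results = {
    name: grade_outcome(run_episode(policy, initial))
    for name, policy in [("direct", direct), ("roundabout", roundabout),
                         ("compounding", compounding)]
}
print(results)
```

A step-matching grader that expected the `direct` sequence would reject `roundabout` even though it reaches the same end state. In `compounding`, the first action removes the config file and the second writes to the wrong file, so the early error carries through to a failing final state.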