Connect observability first to trace production requests and understand quality, cost, and latency. Then use Tables to monitor results and run evaluations, and use Prompt Registry to show engineers and reviewers which versions are approved.
Compare changes against real examples before they reach users.
Core surfaces
Move from what happened to what should ship.
Capture requests, responses, metadata, cost, latency, and feedback in one timeline.
Organize datasets, score experiments, and compare versions against real behavior.
Manage versions, labels, and release state so engineers and reviewers stay aligned.
Trace multi-step systems and bring evaluation back into the release process.
Reference shortcuts
Focused docs for implementation details, release controls, integrations, and updates.