Use model comparison when you want to test the same prompt across GPT, Claude, Gemini, or another provider before choosing a production model.
Before you start
You need:
- A saved prompt template
- A dataset with the input variables your prompt expects
- Provider API keys configured for the models you want to compare
Create a comparison evaluation
Create a new evaluation and select your dataset. Add multiple Prompt Template columns. Configure each column with the same prompt template, then set a different provider or model override for each column.
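If you want to prototype the side-by-side run outside the evaluation UI first, the sketch below sends the same rendered prompt to two providers and prints both outputs. This is a minimal, illustrative script using the official openai and anthropic Python SDKs directly, not the PromptLayer evaluation API; the prompt text, variable values, and model names are placeholders.

```python
# Illustrative sketch: run one prompt against two providers for a quick side-by-side.
# Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment.
# The prompt text and model names below are placeholders, not PromptLayer defaults.
from openai import OpenAI
from anthropic import Anthropic

prompt = "Summarize the following support ticket in one sentence:\n{ticket}"
variables = {"ticket": "My invoice from March was charged twice."}
rendered = prompt.format(**variables)

openai_client = OpenAI()
anthropic_client = Anthropic()

# "GPT" column equivalent
gpt_response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": rendered}],
)
print("GPT:", gpt_response.choices[0].message.content)

# "Claude" column equivalent
claude_response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=512,
    messages=[{"role": "user", "content": rendered}],
)
print("Claude:", claude_response.content[0].text)
```

In the evaluation itself, each Prompt Template column plays the role of one of these calls, with the provider or model override taking the place of the hard-coded model name.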
Score the outputs
Add an LLM-as-judge, human grading, equality comparison, or code evaluator column to score the model outputs against your criteria (a code-evaluator sketch follows the list below). For example, you can score whether each output:
- Follows the requested format
- Answers the user correctly
- Avoids hallucinated details
- Meets latency or cost expectations for the use case
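
A code evaluator column can express a format check like the first item above as a small function. The sketch below scores whether an output is valid JSON with the expected keys; the key names and the 1/0 scoring convention are assumptions for this example, not PromptLayer requirements.

```python
# Illustrative code-evaluator sketch: check whether a model output follows a
# requested JSON format. Field names and the 1 = pass / 0 = fail convention
# are assumptions for this example.
import json

def score_format(output: str) -> int:
    """Return 1 if the output is valid JSON with the expected keys, else 0."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0
    if not isinstance(parsed, dict):
        return 0
    required_keys = {"answer", "sources"}  # hypothetical schema for this prompt
    return 1 if required_keys.issubset(parsed) else 0

# Example: the same criterion applied to two model outputs.
print(score_format('{"answer": "Yes", "sources": ["doc-12"]}'))  # 1
print(score_format("Sure! The answer is yes."))                  # 0
```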

