- How much does PromptA cost versus PromptB?
- How often is PromptA used?
- Is PromptA working better than PromptB?
- Which prompts are receiving the most negative user feedback?
- How do I synthetically evaluate my prompts using LLMs?
A/B Testing

Scoring
Every PromptLayer request can have multiple “Scores”. A score is an integer between 0 and 100.
- User feedback: Present a 👍 and a 👎 to your users after the completion. Pressing 👍 records a score of 100; pressing 👎 records a score of 0 (see the sketch after this list).
- RLHF: Use our visual dashboard to fill in scores by hand. You can then use this data to decide between prompt templates or to fine-tune.
- Synthetic evaluation: Use LLMs to score LLMs. After getting a completion, run an evaluation prompt on it and translate the result into a score between 0 and 100.
For example, your evaluation prompt could look something like this:
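```
You are grading the quality of an AI assistant's answer. Given the
question and the answer below, reply with a single integer from 0 to
100, where 100 is a perfect answer.

Question: {question}
Answer: {answer}

Grade:
```

Here is a minimal sketch of the full loop in Python. It assumes the promptlayer SDK's patched OpenAI module and its track.score endpoint; the model name, grading template, and parsing logic are illustrative:

```python
import re

import promptlayer

promptlayer.api_key = "pl_your_api_key"  # placeholder, not a real key
openai = promptlayer.openai  # PromptLayer-patched OpenAI module

EVAL_TEMPLATE = (
    "You are grading the quality of an AI assistant's answer. Given the "
    "question and the answer below, reply with a single integer from 0 to "
    "100, where 100 is a perfect answer.\n\n"
    "Question: {question}\nAnswer: {answer}\n\nGrade:"
)

def score_completion(pl_request_id: int, question: str, answer: str) -> int:
    """Grade a logged completion with an LLM, then attach the score to it.

    pl_request_id comes from the original completion call, made with
    return_pl_id=True.
    """
    eval_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": EVAL_TEMPLATE.format(question=question, answer=answer),
        }],
    )
    text = eval_response.choices[0].message.content
    match = re.search(r"\d+", text)  # pull the first integer out of the reply
    score = max(0, min(100, int(match.group()))) if match else 0
    # Attach the score to the *original* request, not the evaluation request.
    promptlayer.track.score(request_id=pl_request_id, score=score)
    return score
```

The same track.score call is what a 👍/👎 widget would use, passing a hard-coded 100 or 0 instead of a parsed grade.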
Analytics
After populating Scores as described above, navigate to the Prompt Template page to see how each template stacks up.
Pricing
We live in the real world, so money matters. Building a production LLM system means managing cost. Some LLMs are cheaper than others, and some prompts are cheaper than others. Each request's history page will show you its individual cost.
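To make the comparison concrete, here is a rough back-of-the-envelope cost calculation in Python; the per-token prices below are assumptions for illustration, not current rates:

```python
# Rough per-request cost comparison; the prices below are illustrative
# assumptions, not current rates.
PRICE_PER_1K_TOKENS = {
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request, from the token counts in the API's usage field."""
    price = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens / 1000) * price["prompt"] + \
           (completion_tokens / 1000) * price["completion"]

# A long few-shot PromptA on GPT-4 vs. a short PromptB on GPT-3.5-turbo:
print(f"PromptA: ${request_cost('gpt-4', 1200, 300):.4f}")         # $0.0540
print(f"PromptB: ${request_cost('gpt-3.5-turbo', 200, 300):.4f}")  # $0.0009
```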
