Score Card
The score card feature in PromptLayer allows you to assign a score to each evaluation you run. This score provides a quick and easy way to assess the performance of your prompts and compare different versions.
Configuring the Score Card
Default Configuration
By default, the score is calculated based on the last column in your evaluation results:
- If the last column contains Booleans, the score will be the percentage of true values.
- If the last column contains numbers, the score will be the average of those numbers.
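The default rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not PromptLayer's implementation: the function name default_score is hypothetical, and expressing the Boolean result on a 0-100 scale is an assumption.

```python
def default_score(last_column):
    """Illustrative only: score one column per the default rule described above."""
    if not last_column:
        return None
    # Check Booleans first, because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in last_column):
        # Percentage of True values (the 0-100 scale is an assumption).
        return 100.0 * sum(last_column) / len(last_column)
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in last_column):
        # Plain arithmetic mean of the numeric values.
        return sum(last_column) / len(last_column)
    # Neither Boolean nor numeric: no score in this sketch.
    return None


print(default_score([True, False, True, True]))  # 75.0
print(default_score([4, 5, 3]))                  # 4.0
```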
Custom Column Selection
You can customize which columns are included in the score card calculation. When setting up your evaluation pipeline, click the “Score card” button to configure the score card.
Here, you can add specific columns to be included in the score calculation:
- If you add multiple numeric columns, the total score will be the average of the averages for each selected column.
- If you add multiple Boolean columns, the total score will be the average of the true percentages for each selected column.
- Columns that do not contain numbers or Booleans will not be included in the score calculation.
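As a rough sketch of the "average of averages" rule, the snippet below scores each selected column separately and then averages the per-column results, skipping columns that are neither numeric nor Boolean. The function names, column names, and sample rows are all hypothetical. How a mix of numeric and Boolean columns is combined is not described here, so the sketch simply averages whatever per-column scores it gets.

```python
def column_score(values):
    """Per-column score: true percentage for Booleans, mean for numbers."""
    if not values:
        return None
    if all(isinstance(v, bool) for v in values):
        return 100.0 * sum(values) / len(values)  # 0-100 scale is an assumption
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return sum(values) / len(values)
    return None  # e.g. text columns are excluded


def score_card(rows, selected_columns):
    """Average the per-column scores of the selected columns."""
    per_column = []
    for name in selected_columns:
        score = column_score([row[name] for row in rows])
        if score is not None:
            per_column.append(score)
    return sum(per_column) / len(per_column) if per_column else None


# Hypothetical evaluation rows with two numeric columns and one text column.
rows = [
    {"relevance": 4, "fluency": 5, "notes": "ok"},
    {"relevance": 2, "fluency": 5, "notes": "too short"},
]
print(score_card(rows, ["relevance", "fluency", "notes"]))  # 4.0
```

Here relevance averages 3.0 and fluency averages 5.0, so the total is 4.0; the notes column is ignored.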
These selected columns will also be formatted for easier viewing in the evaluation report: numbers appear larger, and Booleans appear as check/x icons.
Custom Scoring Logic
For more advanced scoring needs, you can provide your own custom scoring logic using Python or JavaScript code. The code execution environment is the same as the one used for the code execution evaluation column type (learn more).
This custom scoring logic can be used to generate a single score number or a drill-down matrix.
You can optionally return multiple drill-down matrices. This is useful for generating confusion matrices.
Your custom scoring code must return an object with the following keys:
- score (required): A number representing the overall score.
- score_matrix (optional): A list of lists of lists, representing one or more matrices of drilled-down scores. Each cell in these matrices can be a raw value or an object with metadata.
Score Matrix Cell Format
Each cell in the score_matrix can be either:
- A raw value (string or number), or
- An object with the following properties:
  - value: The actual value of the cell, which can be a string or number.
  - positive_metric: (Optional) A Boolean indicating whether an increase in this value is considered positive (true). If absent, it defaults to true.
Examples
- Simple value:
42
- Object with metadata:
{"value": 42, "positive_metric": true}
The optional positive_metric property indicates how changes in the value should be interpreted when comparing evaluations. This is particularly useful for automated reporting and analysis tools.
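To make the role of positive_metric concrete, here is a small sketch of how a reporting tool might classify a change between two runs. It is not PromptLayer's comparison logic: the helper names and the "improved" / "regressed" labels are hypothetical, and the only behavior taken from the description above is that a missing positive_metric defaults to true.

```python
def normalize_cell(cell):
    """Return (value, positive_metric) for a raw value or a metadata object."""
    if isinstance(cell, dict):
        return cell["value"], cell.get("positive_metric", True)
    return cell, True  # raw values default to positive_metric = true


def describe_change(old_cell, new_cell):
    """Classify the change from old_cell to new_cell (hypothetical labels)."""
    old_value, _ = normalize_cell(old_cell)
    new_value, positive = normalize_cell(new_cell)
    if isinstance(old_value, str) or isinstance(new_value, str):
        return "not comparable"
    if new_value == old_value:
        return "unchanged"
    increased = new_value > old_value
    # An increase is good when positive_metric is true, bad otherwise.
    return "improved" if increased == positive else "regressed"


print(describe_change(10, 12))  # improved
print(describe_change({"value": 5, "positive_metric": False},
                      {"value": 8, "positive_metric": False}))  # regressed
```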
Code example
The data variable is available in your scoring code. It is a list containing one dictionary per row of the evaluation results; in each dictionary, the keys are the column names and the values are the corresponding cell values.
For example:
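A minimal sketch of such scoring logic, written in Python, might compute an accuracy score and a confusion matrix. Several things here are assumptions rather than documented behavior: the column names expected and predicted are hypothetical, the logic is wrapped in a hypothetical compute_score function so it can run on its own (how the result is handed back inside the code execution environment is not shown here), and whether the first row and column of a matrix are treated as headers is not specified.

```python
def compute_score(data):
    """Build the {score, score_matrix} object from evaluation rows."""
    labels = sorted({row["expected"] for row in data} |
                    {row["predicted"] for row in data})

    # Overall score: percentage of rows where the prediction matches.
    correct = sum(1 for row in data if row["expected"] == row["predicted"])
    accuracy = 100.0 * correct / len(data) if data else 0.0

    # Confusion matrix: one row per expected label, one column per prediction.
    matrix = [["expected / predicted"] + labels]
    for expected in labels:
        cells = []
        for predicted in labels:
            count = sum(1 for row in data
                        if row["expected"] == expected
                        and row["predicted"] == predicted)
            # More hits on the diagonal are good; more off-diagonal are bad.
            cells.append({"value": count,
                          "positive_metric": expected == predicted})
        matrix.append([expected] + cells)

    # score_matrix is a list of matrices, so wrap the single matrix in a list.
    return {"score": accuracy, "score_matrix": [matrix]}


# Hypothetical stand-in for the data variable described above.
data = [
    {"expected": "cat", "predicted": "cat"},
    {"expected": "cat", "predicted": "dog"},
    {"expected": "dog", "predicted": "dog"},
    {"expected": "dog", "predicted": "dog"},
]
result = compute_score(data)
print(result["score"])  # 75.0
```

Marking off-diagonal cells with positive_metric set to false means that a rise in misclassifications would read as a regression when two reports are compared.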
Comparing Evaluation Reports
You can compare two evaluation reports to see how scores and other metrics have changed between runs. Simply click the “Compare” button and select the evaluation reports you want to compare.
The score card and any score matrices will be displayed side-by-side for easy comparison of your prompt’s performance over time.