Programmatic Evals
PromptLayer offers several options for creating and configuring evaluation pipelines programmatically. This is ideal for users who need the flexibility to run evaluations from code, enabling seamless integration with existing CI/CD pipelines or custom automation scripts.
Building a Dataset
To run evaluations, you’ll need a dataset against which to test your prompts. Luckily, you can create datasets from your request history programmatically via the API.
- Endpoint: `/dataset-from-filter-params`
- Description: Create a dataset in PromptLayer programmatically. Datasets are built from request history.
- Payload Filters: When specifying search query filters, include the required `name` parameter and the `workspace_id`. Optionally, you can define a `start_time` and an `end_time` to filter requests within a specific timeframe, both given as datetime objects. The `metadata` parameter accepts a list of objects, each with a `key` and a `value`. For more granular control, use `prompt_template` to filter for requests that use a specific template, a query string `q` for additional filtering, and `scores`. Tags can be added through the `tags` parameter as a list of strings, and the number of requests returned can be limited with the `limit` parameter.
Example Payload
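Below is a minimal sketch of such a request in Python with `requests`. The base URL, the `X-API-KEY` header, and the exact serialization of each filter (e.g., datetimes as ISO 8601 strings) are assumptions; consult the API reference for the authoritative schema.

```python
# A sketch only: the base URL, the X-API-KEY header, and the exact
# serialization of each filter field are assumptions, not confirmed values.
import datetime
import requests

API_KEY = "pl_your_api_key"
BASE_URL = "https://api.promptlayer.com"  # assumed base URL

payload = {
    "name": "prod-requests-january",          # required
    "workspace_id": 1234,                     # required
    # Optional time window, sent here as ISO 8601 strings (assumption).
    "start_time": datetime.datetime(2024, 1, 1).isoformat(),
    "end_time": datetime.datetime(2024, 1, 31).isoformat(),
    # Metadata filters: a list of key/value objects.
    "metadata": [{"key": "environment", "value": "production"}],
    "prompt_template": "checkout_assistant",  # only requests made with this template
    "q": "refund",                            # free-text query filter
    # "scores": ...,                          # score filters can also be supplied
    "tags": ["prod", "v2"],                   # list of strings
    "limit": 100,                             # cap the number of requests included
}

response = requests.post(
    f"{BASE_URL}/dataset-from-filter-params",
    headers={"X-API-KEY": API_KEY},
    json=payload,
)
response.raise_for_status()
dataset = response.json()  # the returned dataset ID is used as test_dataset_id below
```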
Creating a Pipeline
You can create and configure a pipeline programmatically.
To create an evaluation pipeline, also known as a report, make a POST request to `/reports` with a name and dataset ID (`test_dataset_id`).
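A minimal sketch of this request, under the same assumptions about the base URL, authentication header, and response shape as above:

```python
# A sketch only: base URL, auth header, and response shape are assumptions.
import requests

response = requests.post(
    "https://api.promptlayer.com/reports",
    headers={"X-API-KEY": "pl_your_api_key"},
    json={
        "name": "nightly-regression-eval",
        "test_dataset_id": 5678,   # ID of the dataset created above
    },
)
response.raise_for_status()
report_id = response.json()["id"]  # assumed response shape; used when adding steps below
```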
Configuring Steps
An evaluation pipeline consists of steps, each referred to as a “report column”. To configure the pipeline, make a POST request for each step you want to add.
Example Payload 1
For example, to add a step that runs the newest version of your prompt template, make a POST request to `/report-columns` with the following configuration:
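The sketch below follows that description, but the column type identifier and the keys inside `configuration` are assumptions, not the confirmed schema:

```python
# A sketch only: the column type identifier and configuration keys below
# are assumptions about the /report-columns schema.
import requests

payload = {
    "report_id": report_id,             # the pipeline created above
    "column_type": "PROMPT_TEMPLATE",   # assumed identifier for a prompt-template step
    "name": "Run latest prompt",
    "configuration": {
        "template_name": "checkout_assistant",
        # No version is pinned, so the newest version of the template runs (assumption).
    },
}

response = requests.post(
    "https://api.promptlayer.com/report-columns",
    headers={"X-API-KEY": "pl_your_api_key"},
    json=payload,
)
response.raise_for_status()
```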
Example Payload 2
As a second example, here is how to add an API endpoint column after the previous step:
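Again a sketch only: the `ENDPOINT` column type and its configuration keys are assumed names for illustration, not the confirmed schema.

```python
# A sketch only: "ENDPOINT" and the configuration keys are assumed names
# for an API endpoint column, not the confirmed schema.
import requests

payload = {
    "report_id": report_id,
    "column_type": "ENDPOINT",           # assumed identifier for an API endpoint step
    "name": "Score via webhook",
    "configuration": {
        # Your endpoint receives each row's data and returns a result (assumption).
        "url": "https://example.com/score",
    },
}

response = requests.post(
    "https://api.promptlayer.com/report-columns",
    headers={"X-API-KEY": "pl_your_api_key"},
    json=payload,
)
response.raise_for_status()
```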