Creates a new evaluation pipeline (report) with optional evaluation columns and custom scoring. An evaluation pipeline processes datasets through a series of transformations and evaluations to measure prompt performance.
Workflow: create the pipeline with this endpoint, then run it with `POST /reports/{report_id}/run`.

The request body accepts:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `dataset_group_id` | integer | Yes | ID of the dataset group to use |
| `name` | string | No | Name for the pipeline (auto-generated if not provided) |
| `folder_id` | integer | No | Folder ID for organization |
| `dataset_version_number` | integer | No | Specific dataset version (uses latest if not provided) |
| `columns` | array | No | List of evaluation columns to add |
| `score_configuration` | object | No | Custom scoring logic configuration |
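For instance, a minimal body that relies on defaults for everything else (the ID and name below are illustrative, not taken from this reference):

```json
{
  "dataset_group_id": 42,
  "name": "My first eval pipeline"
}
```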
Each item in the `columns` array accepts:
| Field | Type | Required | Description |
|---|---|---|---|
| `column_type` | string | Yes | Type of column (see supported types below) |
| `name` | string | Yes | Display name for the column |
| `configuration` | object | Yes | Column-type-specific configuration |
| `position` | integer | No | Position in the pipeline (auto-assigned if not provided) |
| `is_part_of_score` | boolean | No | Whether this column contributes to the score (defaults to false) |
Supported column types:

| Type | Description |
|---|---|
| `LLM_ASSERTION` | AI-powered assertion that evaluates content against a prompt |
| `PROMPT_TEMPLATE` | Runs a prompt template from the registry |
| `CODE_EXECUTION` | Executes custom Python or JavaScript code |
| `VARIABLE` | Static value or reference to another column |
| `COMPARE` | Compares two values for equality or similarity |
| `CONTAINS` | Checks if a value contains a substring |
| `REGEX` | Matches content against a regular expression |
| `REGEX_EXTRACTION` | Extracts content using a regular expression |
| `JSON_PATH` | Extracts data using JSONPath expressions |
| `XML_PATH` | Extracts data using XPath expressions |
| `COSINE_SIMILARITY` | Calculates semantic similarity between texts |
| `AI_DATA_EXTRACTION` | AI-powered data extraction from content |
| `ASSERT_VALID` | Validates data format (JSON, XML, etc.) |
| `COALESCE` | Returns the first non-null value from multiple sources |
| `COMBINE_COLUMNS` | Combines multiple column values |
| `COUNT` | Counts occurrences in content |
| `ENDPOINT` | Calls an external HTTP endpoint |
| `HUMAN` | Manual human evaluation |
| `MATH_OPERATOR` | Mathematical operations on numeric values |
| `MIN_MAX` | Finds minimum or maximum values |
| `PARSE_VALUE` | Parses and transforms values |
| `ABSOLUTE_NUMERIC_DISTANCE` | Calculates absolute difference between numbers |
| `WORKFLOW` | Runs a PromptLayer workflow |
| `MCP` | Executes an MCP action |
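For illustration, a single column object might look like the following. The keys inside `configuration` depend on the column type and are not documented on this page, so the body is left empty here:

```json
{
  "column_type": "CONTAINS",
  "name": "Mentions refund",
  "configuration": {},
  "position": 1,
  "is_part_of_score": true
}
```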
Automatic scoring (`is_part_of_score`): set `is_part_of_score: true` on columns to have PromptLayer automatically average their values into a score.
Custom scoring (`score_configuration`): the `score_configuration` object accepts:

| Field | Type | Required | Description |
|---|---|---|---|
| `code` | string | Yes | Python or JavaScript code for score calculation |
| `code_language` | string | No | `"PYTHON"` (default) or `"JAVASCRIPT"` |
The code receives a `data` variable (a list of row dictionaries containing all column values) and must return a dictionary with at least a `score` key (0-100).
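A sketch of code satisfying that contract. The exact execution convention (top-level code versus a defined function) is not specified on this page, so this sketch uses a function for readability, and the `accuracy` column name is made up:

```python
def score(data):
    # `data` is a list of row dicts, one per dataset row,
    # keyed by column name with that row's evaluated values.
    passed = sum(1 for row in data if row.get("accuracy") is True)
    total = len(data)

    # Must return a dict with at least a `score` key in the 0-100 range.
    return {"score": (100 * passed / total) if total else 0}
```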
Scoring option 1 (automatic): set `is_part_of_score: true` to have PromptLayer automatically average the column values:
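A sketch of a request body using this approach (the IDs, names, and column choices are illustrative; `configuration` bodies are elided because their keys depend on the column type):

```json
{
  "dataset_group_id": 42,
  "name": "Support answer eval",
  "columns": [
    {
      "column_type": "LLM_ASSERTION",
      "name": "Is polite",
      "configuration": {},
      "is_part_of_score": true
    },
    {
      "column_type": "CONTAINS",
      "name": "Mentions refund",
      "configuration": {},
      "is_part_of_score": true
    }
  ]
}
```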
Scoring option 2 (custom): use `score_configuration` to write custom scoring logic:
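A sketch under the same illustrative assumptions, with a short Python body in the `code` field (the `passed` column name is hypothetical):

```json
{
  "dataset_group_id": 42,
  "name": "Weighted scoring eval",
  "score_configuration": {
    "code_language": "PYTHON",
    "code": "def score(data):\n    passed = sum(1 for row in data if row.get('passed'))\n    return {'score': 100 * passed / len(data) if data else 0}"
  }
}
```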
Authorization: provide your API key in the `X-API-KEY` header to authorize the operation. JWT authentication can also be used.
Body parameter details:

- `dataset_group_id` (integer, required): The ID of the dataset group containing the dataset versions to evaluate. The dataset group must be within a workspace accessible to the authenticated user. Required range: `x >= 1`.
- `name` (string, optional): Name for the evaluation pipeline. If not provided, a unique name will be auto-generated. Must be between 1 and 255 characters if specified.
- `folder_id` (integer, optional): Folder ID to organize the pipeline within your workspace. If not specified, the pipeline will be created at the root level. Required range: `x >= 1`.
- `dataset_version_number` (integer, optional): Specific dataset version number to use. If not specified, the latest non-draft version will be used. Cannot be `-1` (the draft version).
- `columns` (array, optional): List of evaluation columns to create with the pipeline.
- `score_configuration` (object, optional): Custom scoring logic configuration.
Success response: evaluation pipeline created successfully.
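Putting it together, a minimal sketch in Python. The endpoint URL `https://api.promptlayer.com/reports` and the environment variable name are assumptions inferred from the `/reports/{report_id}/run` path mentioned above, not confirmed by this page:

```python
import os

import requests

# Assumed create endpoint; this page only names the /reports/{report_id}/run path.
url = "https://api.promptlayer.com/reports"

payload = {
    "dataset_group_id": 42,          # illustrative ID
    "name": "Support answer eval",
    "columns": [
        {
            "column_type": "CONTAINS",
            "name": "Mentions refund",
            "configuration": {},     # keys depend on the column type
            "is_part_of_score": True,
        }
    ],
}

response = requests.post(
    url,
    json=payload,
    headers={"X-API-KEY": os.environ["PROMPTLAYER_API_KEY"]},
)
response.raise_for_status()
print(response.json())
```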