Create Evaluation Pipeline

POST /reports

Example request:
curl --request POST \
  --url https://api.promptlayer.com/reports \
  --header 'Content-Type: application/json' \
  --header 'X-API-KEY: <x-api-key>' \
  --data '
{
  "dataset_group_id": 123,
  "name": "QA Evaluation Pipeline",
  "columns": [
    {
      "column_type": "LLM_ASSERTION",
      "name": "Accuracy Check",
      "configuration": {
        "source": "response",
        "prompt": "Is this response accurate?"
      },
      "is_part_of_score": true
    }
  ]
}
'
Example response:

{
  "success": true,
  "report_id": 456,
  "report_columns": [
    {
      "id": 789,
      "report_id": 456,
      "column_type": "LLM_ASSERTION",
      "name": "Accuracy Check",
      "position": 2,
      "is_part_of_score": true,
      "configuration": {
        "source": "response",
        "prompt": "Is this response accurate?"
      }
    }
  ]
}
This endpoint creates an Evaluation Pipeline associated with a dataset group.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_group_id | integer | Yes | ID of the dataset group to use |
| name | string | No | Name for the pipeline (auto-generated if not provided) |
| folder_id | integer | No | Folder ID for organization |
| dataset_version_number | integer | No | Specific dataset version (uses latest if not provided) |
| columns | array | No | List of evaluation columns to add |
| score_configuration | object | No | Custom scoring logic configuration |
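
For example, a request body that pins a specific dataset version and files the pipeline in a folder could look like this (the folder and version IDs are illustrative):

{
  "dataset_group_id": 123,
  "name": "QA Evaluation Pipeline",
  "folder_id": 42,
  "dataset_version_number": 3
}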

Column Definition

Each column in the columns array accepts:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| column_type | string | Yes | Type of column (see supported types below) |
| name | string | Yes | Display name for the column |
| configuration | object | Yes | Column-type-specific configuration |
| position | integer | No | Position in the pipeline (auto-assigned if not provided) |
| is_part_of_score | boolean | No | Whether this column contributes to the score (defaults to false) |
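
Putting those fields together, a fully specified column object (using the LLM_ASSERTION configuration from the request example above; the position value is illustrative and can be omitted to have it auto-assigned) looks like this:

{
  "column_type": "LLM_ASSERTION",
  "name": "Accuracy Check",
  "configuration": {
    "source": "response",
    "prompt": "Is this response accurate?"
  },
  "position": 1,
  "is_part_of_score": true
}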

Supported Column Types

| Type | Description |
| --- | --- |
| LLM_ASSERTION | AI-powered assertion that evaluates content against a prompt |
| PROMPT_TEMPLATE | Runs a prompt template from the registry |
| CODE_EXECUTION | Executes custom Python or JavaScript code |
| VARIABLE | Static value or reference to another column |
| COMPARE | Compares two values for equality or similarity |
| CONTAINS | Checks if a value contains a substring |
| REGEX | Matches content against a regular expression |
| REGEX_EXTRACTION | Extracts content using a regular expression |
| JSON_PATH | Extracts data using JSONPath expressions |
| XML_PATH | Extracts data using XPath expressions |
| COSINE_SIMILARITY | Calculates semantic similarity between texts |
| AI_DATA_EXTRACTION | AI-powered data extraction from content |
| ASSERT_VALID | Validates data format (JSON, XML, etc.) |
| COALESCE | Returns first non-null value from multiple sources |
| COMBINE_COLUMNS | Combines multiple column values |
| COUNT | Counts occurrences in content |
| ENDPOINT | Calls an external HTTP endpoint |
| HUMAN | Manual human evaluation |
| MATH_OPERATOR | Mathematical operations on numeric values |
| MIN_MAX | Finds minimum or maximum values |
| PARSE_VALUE | Parses and transforms values |
| ABSOLUTE_NUMERIC_DISTANCE | Calculates absolute difference between numbers |
| WORKFLOW | Runs a PromptLayer workflow |
| MCP | Executes an MCP action |
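
The column_type string must match one of the identifiers above exactly. If you build requests programmatically, a small client-side check can catch typos before the API call; the helper below is a local convenience sketch maintained by hand, not part of any PromptLayer SDK:

# Hand-maintained mirror of the supported column types listed above.
SUPPORTED_COLUMN_TYPES = {
    "LLM_ASSERTION", "PROMPT_TEMPLATE", "CODE_EXECUTION", "VARIABLE",
    "COMPARE", "CONTAINS", "REGEX", "REGEX_EXTRACTION", "JSON_PATH",
    "XML_PATH", "COSINE_SIMILARITY", "AI_DATA_EXTRACTION", "ASSERT_VALID",
    "COALESCE", "COMBINE_COLUMNS", "COUNT", "ENDPOINT", "HUMAN",
    "MATH_OPERATOR", "MIN_MAX", "PARSE_VALUE", "ABSOLUTE_NUMERIC_DISTANCE",
    "WORKFLOW", "MCP",
}

def validate_column_types(columns):
    # Fail fast locally instead of waiting for the API to reject the request.
    for column in columns:
        column_type = column.get("column_type")
        if column_type not in SUPPORTED_COLUMN_TYPES:
            raise ValueError(f"Unsupported column_type: {column_type!r}")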

Scoring

There are two independent ways to calculate scores:

Built-in Scoring (is_part_of_score)

Set is_part_of_score: true on columns to have PromptLayer automatically average their values into a score.

Custom Scoring (score_configuration)

Write custom code that receives all report data and calculates the score however you want.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| code | string | Yes | Python or JavaScript code for score calculation |
| code_language | string | No | "PYTHON" (default) or "JAVASCRIPT" |
Your custom code receives a data variable (list of row dictionaries with all column values) and must return a dictionary with at least a score key (0-100).
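
A minimal scorer that satisfies this contract might report the share of rows where a single assertion passed (the "Accuracy Check" column name is illustrative; as in the full example below, the snippet runs as the body of the scoring code, so the top-level return is intentional):

# `data` is provided by PromptLayer: one dictionary per dataset row.
passed = sum(1 for row in data if row.get("Accuracy Check") is True)
score = (passed / len(data) * 100) if data else 0
return {"score": round(score, 2)}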

Examples

Built-in Scoring

Use is_part_of_score: true to have PromptLayer automatically average the column values:
import requests

response = requests.post(
    "https://api.promptlayer.com/reports",
    headers={"X-API-Key": "your_api_key"},
    json={
        "dataset_group_id": 123,
        "name": "Pipeline with Built-in Scoring",
        "columns": [
            {
                "column_type": "LLM_ASSERTION",
                "name": "Accuracy Check",
                "configuration": {
                    "source": "response",
                    "prompt": "Is this response accurate?"
                },
                "is_part_of_score": True
            },
            {
                "column_type": "LLM_ASSERTION",
                "name": "Safety Check",
                "configuration": {
                    "source": "response",
                    "prompt": "Is this response safe?"
                },
                "is_part_of_score": True
            }
        ]
    }
)

Custom Scoring

Use score_configuration to write custom scoring logic:
import requests

response = requests.post(
    "https://api.promptlayer.com/reports",
    headers={"X-API-Key": "your_api_key"},
    json={
        "dataset_group_id": 123,
        "name": "Pipeline with Custom Scoring",
        "columns": [
            {
                "column_type": "LLM_ASSERTION",
                "name": "Accuracy Check",
                "configuration": {
                    "source": "response",
                    "prompt": "Is this response accurate?"
                }
            },
            {
                "column_type": "LLM_ASSERTION",
                "name": "Safety Check",
                "configuration": {
                    "source": "response",
                    "prompt": "Is this response safe?"
                }
            }
        ],
        "score_configuration": {
            "code": """
# Weighted scoring: accuracy matters more
weights = {"Accuracy Check": 0.7, "Safety Check": 0.3}
total_weight = weighted_sum = 0

for row in data:
    for col, weight in weights.items():
        if col in row:
            total_weight += weight
            if row[col] == True:
                weighted_sum += weight

score = (weighted_sum / total_weight * 100) if total_weight > 0 else 0
return {"score": round(score, 2)}
""",
            "code_language": "PYTHON"
        }
    }
)
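
In either case, the response contains the ID of the new pipeline, which you will need when running evaluations:

result = response.json()
if result.get("success"):
    report_id = result["report_id"]  # e.g. 456
    print(f"Created evaluation pipeline {report_id}")
    # report_columns is present only when columns were included in the request
    for column in result.get("report_columns", []):
        print(column["name"], column["column_type"])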

Headers

X-API-KEY (string, required)

API key to authorize the operation. Can also use JWT authentication.

Body

application/json
dataset_group_id (integer, required)

The ID of the dataset group containing the dataset versions to evaluate. The dataset group must be within a workspace accessible to the authenticated user.

Required range: x >= 1

name (string)

Optional name for the evaluation pipeline. If not provided, a unique name will be auto-generated. Must be between 1 and 255 characters if specified.

Required string length: 1 - 255

folder_id (integer | null)

Optional folder ID to organize the pipeline within your workspace. If not specified, the pipeline will be created at the root level.

Required range: x >= 1

dataset_version_number (integer | null)

Optional specific dataset version number to use. If not specified, the latest non-draft version will be used. Cannot be -1 (draft version).

columns (object[])

Optional list of evaluation columns to create with the pipeline.

score_configuration (object)

Optional custom scoring logic configuration.

Response

Evaluation pipeline created successfully

success (boolean, required)

Indicates if the operation was successful.

Example: true

report_id (integer, required)

The unique ID of the created evaluation pipeline. Use this ID to run evaluations.

Example: 456

report_columns (object[])

List of created columns (only present if columns were provided in the request).