# AGENTS
Source: https://docs.promptlayer.com/AGENTS


# Review Guidelines

Use these guidelines when reviewing PRs into `promptlayer-docs`. Prioritize concise, public-facing docs with predictable structure and stable URLs.

## Structure

Place pages by user intent:

* **Get Started**: onboarding, quickstarts, setup, migration.
* **Core Concepts**: product concepts and UI reference.
* **Providers**: provider/model setup and compatibility.
* **Guides**: task-oriented workflows.
* **AI Tools**: assistant/tooling docs.
* **Reference**: SDKs, REST API, Webhooks, schemas, events, exact interfaces.

Keep REST API, SDKs, and Webhooks under **Reference**.

## REST API Pages

REST endpoint pages should be OpenAPI-first and lightweight:

```mdx
---
title: "List Datasets"
openapi: "GET /api/public/v2/datasets"
---

Briefly explain what the endpoint does and any key behavior.
```

The MDX page should usually contain only:

* `title` and `openapi` frontmatter.
* A short 1-3 sentence overview.
* Optional behavior notes for non-obvious semantics.
* Optional related links.

Put API mechanics in `openapi.json`, not Markdown: auth, headers, parameters, request bodies, response schemas, errors, pagination, filtering, and examples.

Avoid manual `Authentication`, `Example`, `Response`, parameter, or schema sections unless they explain behavior OpenAPI cannot express.

## Style

Write for users trying to complete a task.

Prefer:

* Direct, concrete language.
* Active voice and present tense.
* Specific titles and sidebar labels.
* Consistent PromptLayer terms.
* Action-oriented endpoint titles like `List X`, `Get X`, `Create X`, `Update X`, `Delete X`.

Avoid:

* Marketing copy.
* Internal implementation details.
* Long tutorials in reference pages.
* Duplicating generated OpenAPI content.
* Generic labels like `Usage`, `Features`, or `Integrations`.

## Review Comments

Keep comments specific, actionable, and tied to the docs guideline being enforced.

## Review Checklist

Before approving, check that:

* The page is in the right nav section.
* URLs are preserved, or redirects are included.
* Titles, labels, slugs, and links are clear and consistent.
* REST pages use the minimal OpenAPI-backed format.
* `openapi` frontmatter exactly matches `openapi.json`.
* OpenAPI contains the real API contract and examples.
* Copy is concise, useful, and public-facing.
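
The `openapi` frontmatter check lends itself to a small script. The sketch below is one possible approach, not existing repo tooling: it assumes the frontmatter value has the form `"METHOD /path"` and that `openapi.json` uses the standard OpenAPI `paths` object. The function name and the trimmed-down spec are hypothetical.

```python
import re

def openapi_frontmatter_matches(mdx_text: str, spec: dict) -> bool:
    """Return True if the page's `openapi` frontmatter names an operation in the spec."""
    # Pull the METHOD and path out of the `openapi:` frontmatter line.
    match = re.search(r'^openapi:\s*"([A-Z]+) (\S+)"\s*$', mdx_text, re.MULTILINE)
    if match is None:
        return False
    method, path = match.group(1).lower(), match.group(2)
    # OpenAPI keys operations by path, then by lowercase HTTP method.
    return method in spec.get("paths", {}).get(path, {})

page = '''---
title: "List Datasets"
openapi: "GET /api/public/v2/datasets"
---
'''
# Hypothetical, trimmed-down stand-in for openapi.json.
spec = {"paths": {"/api/public/v2/datasets": {"get": {"summary": "List Datasets"}}}}
print(openapi_frontmatter_matches(page, spec))  # True
```

In practice you would load the spec with `json.load` and run the function over every REST page in the PR.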


# MCP & Skills
Source: https://docs.promptlayer.com/agents/overview

Use PromptLayer with AI tools.

PromptLayer offers several ways to bring product context and workspace access into AI coding tools and MCP-compatible clients. Use the skill when you want your coding agent to understand PromptLayer concepts and best practices. Use the MCP servers when you want an agent to search documentation or work with PromptLayer resources directly.

## What is available

| Tool                     | Use it for                                                                                                                                                               |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **PromptLayer skill**    | Gives AI coding tools PromptLayer product context, SDK patterns, prompt management guidance, evaluation workflows, and observability best practices.                     |
| **PromptLayer Docs MCP** | Lets agents search and read PromptLayer documentation.                                                                                                                   |
| **PromptLayer MCP**      | Lets MCP-compatible clients interact with PromptLayer workspace resources such as prompts, request logs, datasets, evaluations, workflows, tools, and skill collections. |

## Install and connect

<Tip>
  **Using a coding agent?**

  Copy the following prompt to add the PromptLayer **skill** and **MCP servers** for better results when working with PromptLayer.
</Tip>

<Prompt description="Install the PromptLayer skill and MCP servers." icon="sparkles">
  Install the PromptLayer skill for context on project structure, SDKs, prompts, evaluations, observability, and PromptLayer best practices:

  npx skills add https://docs.promptlayer.com

  Add the PromptLayer Docs MCP server for documentation search:

  https://docs.promptlayer.com/mcp

  Add the PromptLayer MCP server for PromptLayer workspace access and content management:

  https://mcp.promptlayer.com/mcp
</Prompt>

To install them manually, use the commands and URLs below.

Install the PromptLayer skill:

```bash
npx skills add https://docs.promptlayer.com
```

Add the PromptLayer Docs MCP server:

```text
https://docs.promptlayer.com/mcp
```

Add the PromptLayer MCP server:

```text
https://mcp.promptlayer.com/mcp
```

## PromptLayer MCP

Use the PromptLayer MCP server when you want an MCP-compatible client to work with PromptLayer workspace resources such as prompts, request logs, datasets, evaluations, workflows, tools, and skill collections.

### Authorization header

When you connect to the hosted MCP server, pass your PromptLayer API key in the `Authorization` header:

```text
Authorization: Bearer pl_your_key_here
```
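
If you configure a client programmatically, the header can be built in a few lines. This is a minimal sketch: the helper name is hypothetical, and reading the key from a `PROMPTLAYER_API_KEY` environment variable is an assumption borrowed from the local server configuration.

```python
import os

def promptlayer_mcp_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("A PromptLayer API key is required")
    return {"Authorization": f"Bearer {api_key}"}

# Assumption: the key is stored in PROMPTLAYER_API_KEY; the fallback is a placeholder.
api_key = os.environ.get("PROMPTLAYER_API_KEY") or "pl_your_key_here"
headers = promptlayer_mcp_headers(api_key)
print(sorted(headers))  # prints ['Authorization']
```

Keeping the key in an environment variable avoids committing it to client configuration files.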

### Available tools

The PromptLayer MCP server exposes 61 tools covering all major PromptLayer features:

| Category                               | Tools                                                                                                                                                                                                                                                                 |
| -------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Prompt Templates**                   | `get-prompt-template`, `get-prompt-template-raw`, `list-prompt-templates`, `publish-prompt-template`, `patch-prompt-template-version`, `list-prompt-template-labels`, `create-prompt-label`, `move-prompt-label`, `delete-prompt-label`, `get-snippet-usage`          |
| **Request Logs**                       | `get-request`, `search-request-logs`, `get-trace`, `get-request-search-suggestions`, `get-request-analytics`, `get-request-analytics-custom-analytics`                                                                                                                |
| **Tracking**                           | `log-request`, `create-spans-bulk`                                                                                                                                                                                                                                    |
| **Datasets** `Deprecated`              | `list-datasets`, `get-dataset-rows`, `create-dataset-group`, `create-dataset-version-from-file`, `create-dataset-version-from-filter-params`, `create-draft-dataset-version`, `add-request-log-to-dataset`, `save-draft-dataset-version`                              |
| **Evaluations / Reports** `Deprecated` | `list-evaluations`, `get-evaluation-rows`, `create-report`, `run-report`, `get-report`, `get-report-score`, `add-report-column`, `edit-report-column`, `delete-report-column`, `update-report-score-card`, `rename-report`, `delete-report`, `delete-reports-by-name` |
| **Agents**                             | `list-workflows`, `get-workflow`, `get-workflow-labels`, `create-workflow`, `patch-workflow`, `run-workflow`, `get-workflow-version-execution-results`                                                                                                                |
| **Tool Registry**                      | `list-tool-registries`, `get-tool-registry`, `create-tool-registry`, `create-tool-version`                                                                                                                                                                            |
| **Skill Collections**                  | `list-skill-collections`, `create-skill-collection`, `get-skill-collection`, `update-skill-collection`, `save-skill-collection-version`                                                                                                                               |
| **Folders**                            | `create-folder`, `edit-folder`, `get-folder-entities`, `move-folder-entities`, `delete-folder-entities`, `resolve-folder-id`                                                                                                                                          |

### Local server

Install the server from npm: [@promptlayer/mcp-server](https://www.npmjs.com/package/@promptlayer/mcp-server)

For clients that support stdio transport, such as Claude Desktop and Cursor, you can run the server locally via npx:

```json
{
  "mcpServers": {
    "promptlayer": {
      "command": "npx",
      "args": ["-y", "@promptlayer/mcp-server"],
      "env": {
        "PROMPTLAYER_API_KEY": "pl_your_key_here"
      }
    }
  }
}
```

## Next steps

<CardGroup>
  <Card title="PromptLayer MCP source" icon="github" href="https://github.com/MagnivOrg/promptlayer-mcp">
    Review the open-source MCP server implementation.
  </Card>
</CardGroup>


# Changelog
Source: https://docs.promptlayer.com/changelog

Daily updates on new features and improvements to PromptLayer.

## June 18, 2026

### Deployment 1

#### Improvements

* Fixed prompt version comparison to correctly show added/removed lines in diff view
* Improved chart rendering stability when resizing or switching between floating and fullscreen layouts
* Enhanced analytics chart memory management to prevent resource leaks during layout transitions

***

## June 17, 2026

### Deployment 1

#### New Features

**Vibe Chat Tool History Search**
Vibe Chat can now search and retrieve data from previously executed tools within a conversation, enabling more contextual responses based on past actions.

* Search across current session or recent sessions for specific tool executions
* Filter by tool name, query text, or presence of chart data
* Automatically includes relevant past tool results when generating responses

#### Improvements

* Analytics graphs in Vibe Chat now include full chart data for easier reference and reuse
* Tool execution results are better organized and preserved across conversation turns
* Improved reliability of analytics chart data retrieval in Vibe Chat

***

## June 16, 2026

### Deployment 1

#### Improvements

* Fixed false-positive "missing or empty input variables" errors in the `Playground` when using template variables
* Resolved chart rendering issues in analytics dashboards

***

### Deployment 2

General performance and stability improvements

***

### Deployment 3

#### Improvements

* Fixed prompt version diff view to correctly show added and removed lines
* Improved error messages in `Smart Tables` to be clearer and more actionable
* Enhanced `Analytics` chart support to include histogram, heatmap, and hierarchy visualizations for live queries
* Better error handling for `Smart Tables` cell execution failures

***

### Deployment 4

#### New Features

**Tools and A/B Tests in @-mentions**
You can now @-mention `Tools` and `A/B Tests` directly in the PromptLayer dashboard when creating content or annotations.

* Quickly reference tools and experiments without switching contexts
* Streamlines workflow when documenting test configurations

**OpenClaw Integration**
Added native support for OpenClaw framework traces and spans in the observability platform.

* Tool executions from OpenClaw appear as `CODE_EXECUTION` nodes
* Agent sessions display as `LLM Session` spans
* Automatic extraction of tool names and LLM call metadata

#### Improvements

* Request logs structured search filters now correctly apply to the grid view
* Tool definitions are now properly persisted when saving configurations
* Request history header checkbox selects only the current page instead of all results
* Recent tool call results (up to 4) are now included in chat history for better context
* Custom analytics charts with percentile metrics now sort correctly by the selected percentile value
* Analytics charts with distinct count metrics display proper ordering in grouped views

***

## June 15, 2026

### Deployment 1

General performance and stability improvements

***

### Deployment 2

#### New Features

**Analytics Custom Charts**
Custom analytics charts now support multiple metrics per chart, allowing you to visualize several data series simultaneously for richer insights.

* Create charts with multiple metrics (sum, avg, min, max, percentile) in a single view
* Combine time-series data with multi-metric analysis
* Export multi-series charts with full metric labels

**Enhanced Wrangler Chat Experience**
The Wrangler chat interface now supports fullscreen mode and multi-chat sessions for improved productivity.

* Switch between multiple chat sessions without losing context
* Expand chat to fullscreen for focused analysis
* Create custom analytics charts directly from chat conversations

#### Improvements

* Smart Table row deletion now handles sparse rows correctly
* Analytics chart export includes proper metric-specific units for each series
* Trace-to-dataset imports now preserve consistent column ordering
* Smart Table composition column staleness detection improved for virtual rows
* Chat session history properly syncs when switching between conversations

***

### Deployment 3

#### Improvements

* Faster Smart Table imports from request history with improved bulk data loading
* Enhanced Smart Table request import reliability with automatic retry logic for temporary data availability issues

***

### Deployment 4

#### Improvements

* Analytics charts now support distinct count metrics for tracking unique metadata values (e.g., unique sessions or user IDs)
* Custom analytics charts can now use weekly or monthly time buckets for long-range trend analysis
* Chart view selection is now preserved when switching between different analytics artifacts
* Analytics date range handling improved for weekly and monthly bucket intervals
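
To make the distinct count and weekly bucket behavior concrete, here is an illustrative sketch in plain Python. It does not call PromptLayer: bucketing by ISO week and the `(date, value)` row shape are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

def distinct_count_by_week(rows):
    """Count unique metadata values per ISO week.

    rows: iterable of (date, value) pairs, e.g. (request date, session ID).
    """
    buckets = defaultdict(set)
    for day, value in rows:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].add(value)  # key: (ISO year, ISO week)
    return {week: len(values) for week, values in sorted(buckets.items())}

rows = [
    (date(2026, 6, 8), "session-a"),
    (date(2026, 6, 9), "session-a"),   # same session, same week: counted once
    (date(2026, 6, 10), "session-b"),
    (date(2026, 6, 15), "session-a"),  # falls in the following week
]
print(distinct_count_by_week(rows))
```

The first week reports two unique sessions and the second reports one, even though `session-a` appears three times overall.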

***

## June 13, 2026

### Deployment 1

#### New Features

**Trace Search API**
New structured search endpoint for querying traces with flexible AND/OR filter combinations across trace-level and span-level attributes.

* Supports both page-based and cursor-based pagination for large result sets
* Available via dashboard (`/api/dashboard/v2/workspaces/<workspace_id>/traces/search`) and public API (`/api/public/v2/traces/search`)
* Query traces by metadata, tags, duration, timestamps, and nested span properties
* Filter groups support complex nested logic with `AND`, `OR`, `SPAN_AND`, and `SPAN_OR` operators
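
The nested filter logic can be pictured as a small recursive evaluator. The sketch below is illustrative only: the payload shape (`operator`, `filters`, `field`, `value`) is a hypothetical stand-in, not the endpoint's documented schema, and `SPAN_AND` and `SPAN_OR` are left out because their semantics are not described here.

```python
def matches(trace: dict, node: dict) -> bool:
    """Evaluate a nested AND/OR filter group against one trace's attributes."""
    if "operator" in node:
        results = [matches(trace, child) for child in node["filters"]]
        if node["operator"] == "AND":
            return all(results)
        if node["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported operator: {node['operator']}")
    # Leaf filter: simple equality on one attribute (hypothetical shape).
    return trace.get(node["field"]) == node["value"]

# environment is prod AND (tag is checkout OR tag is search)
filter_group = {
    "operator": "AND",
    "filters": [
        {"field": "environment", "value": "prod"},
        {
            "operator": "OR",
            "filters": [
                {"field": "tag", "value": "checkout"},
                {"field": "tag", "value": "search"},
            ],
        },
    ],
}

print(matches({"environment": "prod", "tag": "search"}, filter_group))     # True
print(matches({"environment": "staging", "tag": "search"}, filter_group))  # False
```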

#### Improvements

* Concurrent object retrieval significantly speeds up request log data fetches when importing to Smart Tables
* Bulk span creation endpoint now triggers trace ingestion pipeline when `close_after` flag is set
* Trace closure records now track full ingestion lifecycle with new status transitions
* JSON string attributes containing nested objects are automatically parsed and indexed for structured search
* Deep parsing of serialized dictionaries in trace attributes enables filtering on nested fields

***

## June 12, 2026

### Deployment 1

#### New Features

**Brand Visibility Preset**
A new evaluation preset that tests whether LLM responses include mentions of your brand or domain across multiple queries and models.

* Create brand visibility tables by specifying a target domain and test queries
* Compare how different models mention your brand in their responses
* Automatically generates AI-powered test queries based on your topic and audience
* Results are scored based on whether each model's response contains your target domain
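
The containment-based scoring described above can be sketched as follows. This is an assumption-laden illustration, not the preset's implementation: the case-insensitive substring match, the per-model rate, and both function names are hypothetical.

```python
def mentions_domain(response: str, domain: str) -> bool:
    """Case-insensitive check for the target domain in a model response."""
    return domain.lower() in response.lower()

def visibility_by_model(responses: dict, domain: str) -> dict:
    """Map each model name to the share of its responses that mention the domain."""
    return {
        model: sum(mentions_domain(text, domain) for text in texts) / len(texts)
        for model, texts in responses.items()
    }

responses = {
    "model-a": ["Try example.com for docs.", "No brand here."],
    "model-b": ["See Example.com and others.", "example.com is popular."],
}
print(visibility_by_model(responses, "example.com"))
# {'model-a': 0.5, 'model-b': 1.0}
```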

#### Improvements

* Skip button added to onboarding flow for faster workspace setup
* Smart Table creation dialog now matches the simplified new item workflow
* Improved display of aggregate score winners in tables
* Enhanced aggregate value visualization when viewing score details

***

## June 11, 2026

### Deployment 1

#### Improvements

* Smart Table score calculations now update in real time via WebSocket without requiring a page refresh
* Request analytics search results can now be added directly to tables in addition to datasets
* API requests are now gracefully drained during server shutdown to prevent interruptions

***

### Deployment 2

#### Improvements

* Smart Tables default score calculation now selects quality-focused columns (such as evaluations and assertions) and excludes telemetry metrics (latency, cost, tokens)
* Score sidebar displays guidance when no score is configured instead of showing zero
* `Static` block is now available only in workflows, which streamlines the evaluation builder interface

***

## June 10, 2026

### Deployment 1

#### Improvements

* Improved `Smart Table` execution reliability and recovery for long-running operations
* Enhanced `Smart Table` empty state guidance when activating tables
* Improved `Smart Table` resource header responsiveness on smaller screens

***

### Deployment 2

General performance and stability improvements

***

### Deployment 3

#### Improvements

* Fixed an issue where custom scoring functions would receive metadata objects instead of the actual prompt template text when evaluating `PROMPT_TEMPLATE` column types

***

### Deployment 4

#### New Features

**Custom Analytics Charts**
Create and visualize custom analytics charts for your request logs with support for flexible time-series and breakdown aggregations.

* Build charts using custom filters, grouping, and time ranges
* Available in both dashboard and public API endpoints
* Export and share custom analytics views across your team

#### Improvements

* Smart Table cell recalculation now targets individual cells more precisely without widening to full column scope
* Improved Smart Table version history checkpoint system for better performance on large datasets
* Enhanced analytics query routing for faster chart generation
* Organizations on trial plans can now be deleted without upgrading to a paid tier
* Smart Table custom score inputs now align with code execution behavior for consistent evaluation results

***

## June 09, 2026

### Deployment 1

General performance and stability improvements

***

### Deployment 2

#### New Features

**Claude Fable 5 and Mythos 5 Models**
Added support for Anthropic's latest frontier models with always-on adaptive reasoning across all major platforms (Anthropic API, Amazon Bedrock, and Vertex AI).

* Claude Fable 5 provides state-of-the-art performance on coding, vision, and knowledge tasks with adaptive thinking enabled by default
* Claude Mythos 5 (Project Glasswing) offers the same capabilities with modified safety guardrails for approved cybersecurity defenders
* Both models support 1M token context windows and 128K token outputs with enhanced reasoning parameters

**Smart Table Auto-Execution for Evaluations and Workflows**
Smart Table columns using prompt templates now automatically execute tool calls when referenced in evaluations or workflow nodes.

* Tool registry references are resolved at execution time, ensuring consistent behavior between manual runs and automated pipelines
* Final answers are displayed in cells when auto-execution is enabled, improving readability in datasets

#### Improvements

* Added exponential backoff retries to code execution sandboxes for improved reliability during transient network issues
* Smart Table cells now display prompt template content consistently using the same rendering engine across all views
* Tool calls with zero arguments now execute correctly when using the Wrangler AI provider
* Improved source selector interface to prevent circular dependencies in composite Smart Table columns
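
For readers unfamiliar with the exponential backoff retries mentioned above, the general pattern looks like this. It is a generic sketch of the technique, not PromptLayer's sandbox code: the retry count, base delay, and the choice of `ConnectionError` as the transient error are all assumptions.

```python
import time

def retry_with_backoff(fn, retries=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on ConnectionError with delays that double each attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

calls = {"count": 0}

def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network issue")
    return "ok"

delays = []  # record delays instead of actually sleeping
print(retry_with_backoff(flaky, sleep=delays.append))  # prints "ok"
print(delays)  # [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and makes the delay schedule easy to inspect.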

***

### Deployment 3

#### Improvements

* Improved `Smart Table` column auto-sizing to resize all grouped sub-columns together when using the header action
* Refined computing state indicators in `Smart Table` cells to provide more consistent visual feedback across grouped columns

***

## June 08, 2026

### Deployment 1

#### New Features

**Environment Variables for Tool Execution**
Securely store and manage environment variables for tool execution at both workspace and individual tool levels.

* Set workspace-wide variables accessible to all tools
* Override with tool-specific variables for granular control
* Manage via dashboard UI or programmatic API
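
The override behavior amounts to a two-level merge in which tool-specific values win. A minimal sketch, assuming variables are simple string key-value pairs; the function name and sample variables are hypothetical.

```python
def resolve_env(workspace_vars: dict, tool_vars: dict) -> dict:
    """Merge variables so tool-level values override workspace-level ones."""
    return {**workspace_vars, **tool_vars}

workspace_vars = {"API_BASE": "https://api.example.com", "TIMEOUT": "30"}
tool_vars = {"TIMEOUT": "5"}  # this tool needs a shorter timeout

print(resolve_env(workspace_vars, tool_vars))
# {'API_BASE': 'https://api.example.com', 'TIMEOUT': '5'}
```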

**Trace Closure API**
Close traces to prevent additional spans from being added after execution completes.

* Call `/traces/{trace_id}/close` via dashboard or public API
* Automatically close traces in bulk operations with the `close_after` parameter
* Late spans sent to closed traces are rejected with a clear error message
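
The closure semantics can be modeled with a small in-memory store. This sketch only illustrates the described behavior (closed traces reject late spans); the class, method, and error names are hypothetical and do not reflect the API's actual request or response shapes.

```python
class TraceClosedError(Exception):
    """Raised when a span arrives for a trace that is already closed."""

class TraceStore:
    def __init__(self):
        self.spans = {}      # trace_id -> list of span names
        self.closed = set()  # trace IDs that no longer accept spans

    def add_span(self, trace_id: str, span: str) -> None:
        if trace_id in self.closed:
            raise TraceClosedError(f"trace {trace_id} is closed")
        self.spans.setdefault(trace_id, []).append(span)

    def close(self, trace_id: str) -> None:
        self.closed.add(trace_id)

store = TraceStore()
store.add_span("t1", "llm-call")
store.close("t1")
try:
    store.add_span("t1", "late-span")
except TraceClosedError as err:
    print(err)  # prints "trace t1 is closed"
```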

#### Improvements

* Added token-gated metrics endpoint for operational monitoring
* Tool deletion now removes associated environment variables
* Workspace deletion cleans up environment variables
* OTLP trace ingestion supports automatic trace closure
* Tool execution test endpoints accept workspace and tool context parameters

***

### Deployment 2

#### New Features

**Model Comparison Preset**
Quickly compare multiple LLM models side-by-side using the same prompt template.

* Create comparison tables directly from any prompt template with 2+ model configurations
* Automatically runs all model variants on the same input set for immediate comparison
* View execution metrics (latency, tokens, cost) for each model in dedicated columns
* Identify the fastest or most cost-effective model with built-in "lowest metric" analysis columns

**Evaluation Presets**
Pre-configured evaluation workflows for systematic prompt testing.

* Ground Truth Comparison: automatically validate model outputs against expected responses using customizable assertions
* Structural Validation: verify outputs meet format, ordering, or field requirements without reference data
* Batch evaluate outputs with multiple simultaneous assertions per row

**Custom Preset Tables**
Initialize new tables pre-configured with a specific prompt template and input variables for faster testing workflows.

***

### Deployment 3

#### New Features

**Smart Table Aggregate Scoring**
Smart Tables now support aggregate scoring to automatically identify the best-performing option across evaluation rows.

* Choose from most frequent value, minimum value, or maximum value aggregation types
* Optionally specify a label column to display human-readable names for winning values
* Aggregate scores appear in the scoring sidebar with detailed breakdowns of top values and percentages
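
The three aggregation types can be sketched in a few lines of Python. The mode names (`most_frequent`, `min`, `max`) and the function itself are hypothetical labels for illustration, not the product's configuration values.

```python
from collections import Counter

def aggregate_score(values, mode):
    """Pick a winning value from a column using one of three aggregation types."""
    if not values:
        raise ValueError("values must not be empty")
    if mode == "most_frequent":
        return Counter(values).most_common(1)[0][0]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"Unknown aggregation mode: {mode}")

winners = ["model-a", "model-b", "model-a", "model-a"]
costs = [0.012, 0.004, 0.009]

print(aggregate_score(winners, "most_frequent"))  # model-a
print(aggregate_score(costs, "min"))              # 0.004
print(aggregate_score(costs, "max"))              # 0.012
```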

**Timezone Selector on Requests Page**
View request timestamps in your preferred timezone across the requests analytics page.

* Select from a searchable list of common timezones
* Timezone preference persists across sessions in browser local storage

#### Improvements

* Smart Table boolean scoring now supports assertion aggregation modes (all assertions must pass vs. any assertion passes)
* Evaluation preset tables automatically configure boolean scoring with assertion-based validation
* Model comparison preset tables now include aggregate scoring to highlight the lowest-cost option
* Environment variables can now be created through AI-assisted scaffolding tools
* Empty environment variable detection added to improve configuration validation
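
The two assertion aggregation modes noted in the improvements above (all assertions must pass versus any assertion passes) reduce to `all` and `any` over a row's assertion results. A minimal sketch; the mode strings and function name are hypothetical.

```python
def boolean_score(assertion_results, mode="all"):
    """Combine per-assertion pass/fail results into one boolean score for a row."""
    if mode == "all":
        return all(assertion_results)
    if mode == "any":
        return any(assertion_results)
    raise ValueError(f"Unknown mode: {mode}")

row = [True, False, True]  # three assertions evaluated on one row
print(boolean_score(row, "all"))  # False
print(boolean_score(row, "any"))  # True
```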

***

## June 04, 2026

### Deployment 1

#### New Features

**Request Metrics in Smart Tables**
Smart Tables now track and display cost and latency metrics for LLM requests executed in prompt columns.

* View per-cell execution metrics including total cost and response time
* Export metrics to CSV for analysis across your dataset
* Access aggregated metrics across all rows in a sheet
* Metrics automatically sync when cells are recalculated

#### Improvements

* Smart Tables batch processing performance optimizations for large datasets
* Enhanced column dependency tracking for more accurate staleness detection
* Improved error messages when configuring prompt template columns
* Better handling of column type conversions with automatic metadata preservation

***

### Deployment 2

#### New Features

**Smart Tables**
A new spreadsheet-like interface for building and testing LLM workflows at scale, combining prompt engineering, data transformations, and evaluation in a unified grid view.

* Create columns that reference other columns, prompt templates, workflows, or external data sources
* Run computations across entire columns or individual cells with real-time status tracking
* Track version history with score metrics to compare iteration performance over time
* Import data from CSV, request logs, or manual entry

#### Improvements

* Enhanced AI chat assistant with Smart Table creation and management capabilities
* Improved column dependency resolution for complex data transformations
* Better handling of execution metadata display in grid cells
* Optimized grid rendering performance for large datasets
* Streamlined navigation between registry items and Smart Table resources

***

### Deployment 3

#### Improvements

* Added support for `gemini-3.5-flash` model on Vertex AI with reasoning capabilities and up to 1M token context window
* Simplified metadata and resource filtering in trace queries for improved reliability

***

## June 03, 2026

### Deployment 1

#### New Features

**Smart Table Cell Execution Cancellation**
Individual cell executions can now be cancelled directly from the Smart Table interface.

* Cancel button stops active work for a specific cell and restores dependent cells to their previous state
* Running cells are reset to stale status when cancelled, preserving data integrity
* Dependent cells in the same row are automatically restored to not-started status

**Smart Table Status Filtering**
Filter Smart Table rows by cell execution status (completed, running, stale, error, etc.).

* New status filter control in the grid interface allows quick access to cells by execution state
* Status counts API endpoint provides real-time visibility into cell execution distribution
* Combined filter support enables filtering by both status and column values simultaneously

#### Improvements

* Smart Table version history now includes delta counts showing the number of changes made in each version
* Score history API improved to surface version names alongside version numbers for easier navigation
* Smart Table title generation now recognizes "Untitled Smart Table" as a placeholder name and auto-generates unique titles
* Image API requests fall back to signed stream URLs when presigned URL generation fails

***

### Deployment 2

#### New Features

**OpenAI API Compatibility Enhancement**
The PromptLayer API now automatically routes unknown OpenAI parameters into `extra_body`, ensuring better compatibility with newer OpenAI features and reducing integration friction when using custom or experimental parameters.
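
The routing idea can be sketched as a simple partition of request parameters. This illustrates the concept only: the `KNOWN_PARAMS` set is a small illustrative subset, the function name is hypothetical, and the merge behavior for an existing `extra_body` is an assumption.

```python
# Illustrative subset only; the real list of recognized parameters is not given here.
KNOWN_PARAMS = {"model", "messages", "temperature", "max_tokens"}

def route_unknown_params(params: dict) -> dict:
    """Keep known parameters at the top level and move the rest into extra_body."""
    routed = {k: v for k, v in params.items() if k in KNOWN_PARAMS}
    # Assumption: an extra_body supplied by the caller is preserved and merged.
    extra = dict(params.get("extra_body", {}))
    extra.update(
        {k: v for k, v in params.items() if k not in KNOWN_PARAMS and k != "extra_body"}
    )
    if extra:
        routed["extra_body"] = extra
    return routed

request = {"model": "example-model", "messages": [], "experimental_flag": True}
print(route_unknown_params(request))
# {'model': 'example-model', 'messages': [], 'extra_body': {'experimental_flag': True}}
```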

#### Improvements

* Enhanced `Smart Tables` sidebar with improved request log data handling for faster navigation
* Expanded tool support in assistant features for more flexible workflow automation
* Streamlined entity management by removing deprecated fallback references
* Improved shared session runtime configuration flexibility

***

## June 02, 2026

### Deployment 1

#### Improvements

* Enhanced `Vibe` assistant with request log search suggestions for faster query building
* Improved assistant context handling for large prompt templates with embedded media

***

## June 01, 2026

### Deployment 1

#### Improvements

* Fixed an issue where filtering requests by child metadata was not working correctly
* Resolved a bug where the refresh button in the drawer was not functioning properly
* Fixed tooltip display issues in the request log view
* Addressed a visual glitch causing duplicate loading animations on the home page

***

## May 28, 2026

### Deployment 1

#### New Features

**Claude Opus 4.8 Model Support**
Added support for Anthropic's latest Claude Opus 4.8 model with 1M token context length and 128K max output tokens.

* Most capable model optimized for complex reasoning and agentic coding tasks
* Includes adaptive thinking with configurable display options
* Knowledge cutoff updated to January 2026

**User and Agent Intent Tracking**
New filtering capabilities for tracking user and agent intents in request logs.

* Filter requests by intent type to analyze user behavior patterns
* View intent breakdown in analytics dashboards
* Available in structured search with autocomplete support

**Metadata Cost Breakdown Analytics**
Added detailed cost analysis by metadata key-value pairs in the analytics dashboard.

* Break down costs by specific metadata values (e.g., customer ID, environment)
* View top cost drivers across metadata dimensions
* Set limit to 50 results for comprehensive analysis

#### Improvements

* Improved search suggestion performance with optimized query routing between data stores
* Enhanced autocomplete dropdown with better handling for large result sets
* Streamlined analytics chart controls with metadata key selection for cost insights
* Request log cards now display highlighted previews for input text and responses
* Improved request logs grid with better data rendering and column configurations

***

## May 27, 2026

### Deployment 1

#### New Features

**User and Agent Intent Search**
Advanced search now highlights user questions and agent responses in your request logs, making it easier to find specific conversational patterns and intents.

* Search for specific user intents like questions, requests for information, or task completions
* Identify agent response patterns including confirmations, explanations, and error handling
* Filter requests by conversational structure to analyze dialogue quality

**Legacy Table Migration System**
Automatically migrate your existing Datasets and Reports to the new Smart Tables format with detailed migration previews.

* Preview migration changes before committing to see estimated impact on sheets, columns, and cells
* Resume interrupted migrations and continue on error for large-scale data transformations
* Track migration history to see which legacy tables have been successfully converted

#### Improvements

* Smart Table cell errors now display clearer messages with execution details
* Tool execution loop now properly handles registry tool name collisions
* OpenAI Responses API correctly processes `tool_choice` parameter configurations
* Dataset filter queries support more flexible variable format detection
* Improved code execution error reporting for custom scoring functions

***

### Deployment 2

#### Improvements

* Enhanced column setup workflow efficiency with batched version history tracking
* Improved intent detection accuracy for user sentiment analysis in conversation logs

***

## May 26, 2026

### Deployment 1

#### Improvements

* Enhanced `Smart Tables` access control enforcement across all data operations
* Improved `Smart Tables` evaluation billing accuracy by consolidating usage tracking
* Fixed search filter UI behavior for smoother tag and metadata filtering
* Resolved folder drag-and-drop edge case in navigation sidebar

***

### Deployment 2

#### Improvements

* Added search field for filtering requests by the last user message content
* Enhanced search results to include the most recent user input in each conversation
* Improved workflow execution visibility with per-iteration trace spans for multi-step LLM calls
* Added detailed trace spans for individual tool executions within automated workflows

***

## May 25, 2026

### Deployment 1

#### Improvements

* Fixed an issue where prompt templates and workflows with forward slashes in their names could not be retrieved via the API
* Improved cost calculation accuracy for models with multimodal token pricing (audio and image inputs/outputs)
* Resolved prompt registry lookup behavior when using template identifiers in API requests
* Fixed image display in logged request details

***

## May 22, 2026

### Deployment 1

#### Improvements

* Fixed `Smart Tables` CSV export to properly include all column data in composition-based exports
* Resolved issue where `Smart Tables` full-payload column execution would not trigger staleness propagation to dependent cells
* Improved `Smart Tables` dependency resolution for code execution and endpoint columns to correctly merge all sibling column sources with explicitly defined dependencies

***

## May 21, 2026

### Deployment 1

#### Improvements

* Improved navigation menu for request logs with better organization and clearer visual hierarchy
* Enhanced tool registry version creation to persist execution configuration when tools are created through the assistant interface

***

## May 19, 2026

### Deployment 1

#### New Features

**Add Trace to Dataset**
Export full traces or individual spans directly to datasets for evaluation and testing.

* Click "Add to Dataset" from any trace view to create dataset rows from production logs
* Choose between trace-level export (all root spans) or span-level export (selected span + children)
* Automatically creates a draft dataset version with proper column mapping

**Complex JSON Schema Support**
The Schema Editor now handles advanced JSON Schema patterns for structured outputs.

* Use `oneOf`, `anyOf`, and `allOf` composition keywords for complex response formats
* Editor automatically detects non-standard schemas and enables JSON editing mode
* Provider-specific validation warns when using unsupported keywords (e.g., `oneOf` with Anthropic)
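
As a rough illustration of the composition keywords above, the sketch below defines a small structured-output schema that uses `anyOf` and a helper that reports which composition keywords a schema contains. The schema contents and the helper name `find_composition_keywords` are illustrative assumptions, not part of PromptLayer's API, and the helper does not reproduce the Schema Editor's own detection logic.

```python
# Hypothetical structured-output schema: `result` is either a plain string
# or a structured error object.
response_schema = {
    "type": "object",
    "properties": {
        "result": {
            "anyOf": [
                {"type": "string"},
                {
                    "type": "object",
                    "properties": {"error_code": {"type": "integer"}},
                    "required": ["error_code"],
                },
            ]
        }
    },
    "required": ["result"],
}

COMPOSITION_KEYWORDS = {"oneOf", "anyOf", "allOf"}


def find_composition_keywords(schema):
    """Recursively collect composition keywords used anywhere in a schema.

    Simplification: any dict key with one of these names counts, so a
    property that happens to be called "oneOf" would also be reported.
    """
    found = set()
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key in COMPOSITION_KEYWORDS:
                found.add(key)
            found |= find_composition_keywords(value)
    elif isinstance(schema, list):
        for item in schema:
            found |= find_composition_keywords(item)
    return found


used_keywords = find_composition_keywords(response_schema)
```

A check like this can be useful before sending a schema to a provider that rejects some of these keywords.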

#### Improvements

* Model parameter selections now persist when switching between models from the same provider
* Dataset archive confirmation dialog shows clearer messaging
* Structured search autocomplete displays multi-select indicators for filter values
* Request log cards show visual cues for recently viewed items
* Prompt template version selector preserves current selection during navigation
* Report score recalculation triggers automatically after updating score card columns

***

## May 15, 2026

### Deployment 1

#### New Features

**Sidebar Drag and Drop**
Users can now reorganize prompts, datasets, and folders by dragging and dropping items directly in the sidebar navigation.

* Drag items between folders or move them to the Home folder
* Multi-select items using keyboard shortcuts (Cmd/Ctrl+A to select all, Cmd/Ctrl+Click for individual selection)
* Visual feedback shows valid drop targets during drag operations

**Custom Row Limits for Datasets**
When creating a dataset from filter parameters, users can now specify a custom row limit to control the number of rows added to the dataset.

#### Improvements

* Added `report_columns` field to the Public API's get report endpoint for programmatic access to report column configurations
* Enhanced keyboard navigation in the sidebar with Escape to clear selection
* Improved visual feedback for selected items in the sidebar with check icons

***

### Deployment 2

General performance and stability improvements

***

### Deployment 3

#### Improvements

* Extended `Wrangler` tool execution timeout limits for longer-running analysis tasks
* Improved trace ingestion support for extended thinking and reasoning content from LLM providers
* Enhanced error classification for provider SDK exceptions to improve debugging accuracy

***

### Deployment 4

General performance and stability improvements

***

## May 14, 2026

### Deployment 1

#### Improvements

* Multi-block LLM responses now display all assistant text content instead of only the first segment, ensuring complete visibility of reasoning and answers
* Built-in tools (web search, code interpreter, etc.) are now correctly preserved when loading Anthropic, Google, and Vertex AI prompts in the Playground
* Registry list view spacing and scrolling behavior improved for smoother navigation
* Analytics tracking added for tool registry creation events

***

### Deployment 2

#### Improvements

* Enhanced reliability of search suggestions with automatic retry on failure
* Improved evaluation custom scoring to properly parse score configurations

***

## May 13, 2026

### Deployment 1

#### Improvements

* Added filtering and sorting options to the API for listing folders, prompts, workflows, datasets, evaluations, AB tests, input variable sets, skill collections, and tools
* API list endpoints now support filtering by creator email, creation date ranges, and update date ranges
* Added support for filtering entities by external ID references across all public list endpoints
* JSON Schema fields marked as nullable are now correctly sent as union types to all LLM providers
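
To show what "sent as union types" can look like, here is a minimal sketch that rewrites an OpenAPI-style `nullable: true` flag into a JSON Schema type union that includes `"null"`. The function name `nullable_to_union` is hypothetical, and the exact transformation PromptLayer applies is an assumption; this only illustrates the general idea.

```python
def nullable_to_union(schema):
    """Return a copy of `schema` where `nullable: true` becomes a union with "null"."""
    if isinstance(schema, list):
        return [nullable_to_union(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    # Copy every key except a boolean `nullable` flag. A property that is
    # merely *named* "nullable" maps to a dict, so it is kept.
    converted = {
        key: nullable_to_union(value)
        for key, value in schema.items()
        if not (key == "nullable" and isinstance(value, bool))
    }

    if schema.get("nullable") is True and "type" in converted:
        types = converted["type"]
        types = list(types) if isinstance(types, list) else [types]
        if "null" not in types:
            types.append("null")
        converted["type"] = types
    return converted


example = {
    "type": "object",
    "properties": {
        "nickname": {"type": "string", "nullable": True},
        "age": {"type": "integer"},
    },
}
converted_example = nullable_to_union(example)
```

After conversion, `nickname` carries `"type": ["string", "null"]` while `age` is left unchanged.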

***

### Deployment 2

#### Improvements

* Fixed an issue where prompt template columns configured to return only templates (without LLM execution) were not properly displaying their values in evaluation reports

***

## May 12, 2026

### Deployment 1

#### Improvements

* Extended date range selection beyond the previous 14-day limit in analytics views
* Fixed drag-and-drop functionality for organizing items within nested folders in the registry

***

### Deployment 2

#### Improvements

* Mention a prompt or workflow with `@` to link directly to the latest version
* Improved the session list for faster navigation between recent chats
* Added event tracking for registry views and request log opens

***

## May 11, 2026

### Deployment 1

#### New Features

**Vibe Chat Tool Enhancements**
Enhanced tool execution capabilities with improved tracking and display of multi-step operations.

* Tool calls now show detailed progress with expandable history
* Added navigation between completed tool executions
* Improved visualization of nested tool workflows

**Metadata and Input Support for Tool Calls**
Tool calls can now include custom metadata and input parameters for better tracking and context.

* Pass additional context with each tool execution
* Track tool-specific metadata across workflow runs

#### Improvements

* Enhanced Vibe Chat streaming with better real-time updates and error handling
* Improved conversation history management with automatic persistence
* Better visualization of tool execution states in the dashboard
* Refined dataset column operations with improved validation
* Enhanced label management across workflows and skill collections

***

## May 08, 2026

### Deployment 1

#### Improvements

* Loading indicators now appear in the traces sidebar while request logs and spans are being fetched
* Renaming a resource (prompt, dataset, workflow, etc.) from its detail page now immediately updates the sidebar navigation
* Image outputs from completion-style prompts are now properly included when creating datasets from request logs

***

### Deployment 2

#### Improvements

* Enhanced search functionality to find prompts and variables by partial matches and fuzzy text matching, making it easier to locate items even with typos or incomplete names
* Improved search performance for prompt name lookups in large workspaces

***

### Deployment 3

#### New Features

**Pydantic AI Support for Traces**
PromptLayer now fully supports tracing for Pydantic AI applications, including agent runs, tool calls, and LLM requests.

* Agent sessions, tool executions, and model calls are automatically classified and displayed with clear labels
* Tool calls show function names and arguments in the trace view
* Embedding calls are tracked separately from chat completions

#### Improvements

* Improved success rate precision display in analytics charts with configurable decimal places
* Enhanced heatmap legend formatting for better readability of activity patterns

***

## May 07, 2026

### Deployment 1

#### New Features

**LangChain Trace Support**
PromptLayer now automatically captures and displays traces from LangChain applications, providing end-to-end visibility into multi-step LLM workflows.

* View nested spans for chains, agents, and tool calls in the trace detail view
* Automatically extracts input/output for each LangChain component
* Compatible with LangChain's OpenTelemetry instrumentation

**Analytics Page Enhancements**
The Analytics dashboard now includes expanded insights to help you understand model usage patterns and performance.

* Provider and prompt template cost breakdowns show where spend is concentrated
* Tag-based analytics let you track requests by custom labels
* Tool latency metrics identify slow function calls
* Metadata and output key frequency analysis
* Enhanced time-series charts with cached token and thinking token visibility

#### Improvements

* Prompt template editor now opens by default when creating a new tool
* Bulk delete tools in the registry using Mod+Backspace keyboard shortcut
* Error messages now clearly indicate whether failures originated from PromptLayer or the upstream AI provider
* Improved metadata rendering in the commit dialog for prompt template version comparisons
* Request analytics API now includes prompt template names instead of IDs only
* Heatmap on Analytics page displays hourly request activity patterns

***

### Deployment 2

#### New Features

**Analytics Charts by Prompt Template**
View request volume trends for specific prompt templates over time in the Analytics dashboard.

* Track usage patterns for individual prompts alongside model usage data
* Identify which prompt templates are driving the most API traffic
* Compare prompt template activity across custom date ranges

#### Improvements

* Auto-switch to the Requests tab when adding a metadata filter from the request detail view
* Enhanced Analytics charts with finer time granularity for short date ranges (down to 1-second intervals)
* Time axis labels in Analytics charts now use 12-hour format with AM/PM for better readability

***

### Deployment 3

#### New Features

**External IDs for Entity Management**
All major entities (prompt templates, workflows, tools, datasets, reports, A/B tests, folders) now support external IDs for seamless integration with external systems.

* Attach custom identifiers from your own systems to PromptLayer entities via API
* Query entities by external ID for simplified synchronization workflows
* Inline attachment during entity creation for atomic operations

**Analytics Latency Heatmap**
New heatmap visualization on the analytics page shows latency distribution across custom dimensions.

* Identify performance patterns by provider, model, or custom metadata
* Interactive drill-down to isolate high-latency request clusters

#### Improvements

* API key display now shows last 4 characters for easier identification without exposing full credentials
* Dataset group creation supports inline external ID attachment
* Prompt template list endpoint returns external IDs when present
* Report creation validates column configuration before committing the entity
* Folder external ID management available via public API

***

### Deployment 4

#### Improvements

* Added detection for partial or incomplete responses from language model providers
* Improved error messaging for incomplete model outputs to help identify truncated or stopped responses

***

## May 06, 2026

### Deployment 1

#### New Features

**Request Analytics API**
Analytics queries are now available through the public API, enabling programmatic access to request metrics and trends.

* Query request volumes, latency, costs, and error rates via API
* Filter by date ranges, models, prompts, and custom metadata
* Supports the same powerful filtering available in the dashboard

#### Improvements

* Improved error messaging when password reset links expire
* Enhanced reliability of post-onboarding email communications

***

## May 05, 2026

### Deployment 1

#### Improvements

* Fixed dropdown menus and popovers closing immediately when opened inside modal dialogs
* Workflow traces now properly close when a node fails during execution
* Resolved date picker calendar navigation issues when selecting dates near the minimum or maximum allowed range
* Fixed null usage metadata handling in API responses

***

### Deployment 2

#### New Features

**Enhanced Analytics Dashboard**
The Analytics page has been redesigned with interactive time-series visualizations and improved filtering capabilities.

* View request volume, token usage, cost, and latency trends over customizable time ranges
* Analyze model usage patterns with per-model request breakdowns across time buckets
* Explore latency distributions with p50, p90, and p95 percentile tracking
* Adaptive bucketing automatically adjusts granularity from 5-minute intervals for short ranges to daily aggregations for longer periods
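
For readers unfamiliar with percentile tracking, the sketch below computes p50, p90, and p95 over a list of latencies using the nearest-rank method. The sample values are made up, and the nearest-rank method is an assumption: the dashboard may use interpolation or another definition.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("percentile() requires at least one value")
    ordered = sorted(values)
    # Multiply before dividing to keep the rank computation exact for integer p.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 95, 310, 150, 101, 870, 132, 140, 99, 205]

summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 95)}
```

In this sample a single slow request (870 ms) sets p95 but leaves p50 untouched, which is why the higher percentiles are useful for spotting tail latency.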

#### Improvements

* Analytics charts now display timezone-aware date labels matching your local time
* Request activity heatmaps show hourly usage patterns throughout the day
* Structured search filters now support more precise date range selection in analytics
* Model usage statistics include unknown/missing model names for better visibility

***

## May 01, 2026

### Deployment 1

#### New Features

**Advanced Search Filtering**
Search requests by conversation turns and tool call counts.

* Filter by number of assistant turns in multi-turn conversations
* Filter by total tool calls made during request execution
* Use numeric operators (greater than, less than, equals) in structured search

**Analytics Dashboard**
New analytics endpoint provides comprehensive request metrics and insights.

* View aggregated statistics including total cost, tokens, and latency
* Track daily breakdowns of requests, tokens, and costs
* Analyze model usage distribution across your workspace
* Monitor latency percentiles (p25, p75, p90) over time

#### Improvements

* Onboarding flow now displays animated previews for each setup step
* Request log tables now show absolute timestamps for better clarity
* Clicking the overview area in prompt and workflow editors now opens the full editor view
* Workflow execution processing reliability improvements

***

## April 30, 2026

### Deployment 1

General performance and stability improvements

***

### Deployment 2

#### New Features

**Custom Provider Authentication Schemes**
Custom providers now support flexible authentication methods beyond the default bearer token, enabling seamless integration with enterprise API gateways and non-OpenAI-compatible endpoints.

* Choose between `Bearer`, `X-API-Key`, or fully custom header authentication when configuring a custom provider
* Configure custom authentication headers for providers requiring proprietary authentication schemes
* Custom authentication settings are fully supported in both the dashboard and API
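
To make the three schemes concrete, the sketch below builds the request header that each one implies. The function name `build_auth_headers`, the `"custom"` scheme label, and the header name `X-Gateway-Token` are hypothetical. The conventional `Authorization: Bearer <token>` and `X-API-Key: <key>` header shapes are assumed rather than taken from PromptLayer's documentation.

```python
def build_auth_headers(scheme, api_key, custom_header_name=None):
    """Return the HTTP header dict implied by an authentication scheme."""
    if scheme == "Bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "X-API-Key":
        return {"X-API-Key": api_key}
    if scheme == "custom":
        if not custom_header_name:
            raise ValueError("custom scheme requires a header name")
        # Assumption: the raw key is sent as the value of the custom header.
        return {custom_header_name: api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


bearer_headers = build_auth_headers("Bearer", "example-secret")
gateway_headers = build_auth_headers("custom", "example-secret", "X-Gateway-Token")
```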

**Request Logs in Prompt Templates and Datasets**
View and analyze request logs directly within prompt template and dataset views for faster debugging and iteration.

* Access request history filtered by prompt template or dataset from the analytics tab
* Add requests to datasets directly from prompt template analytics views

#### Improvements

* Dataset creation failure webhooks now include the dataset ID for easier error tracking
* Added copy button to JSON cards in trace span details for faster data extraction
* Request logs search now supports filtering by prompt template and dataset
* Improved date range picker UX in request logs with better visual feedback
* Tool registry editor maintains proper scrolling on short viewports

***

### Deployment 3

General performance and stability improvements

***

## April 29, 2026

### Deployment 1

#### New Features

**Multi-Select Dataset Creation from Request History**
Enhanced dataset creation workflow now supports selecting multiple requests at once when adding data from request history.

* Create datasets or add rows to existing datasets directly from selected request logs
* Choose between different filter modes for more flexible data selection
* Streamlined bulk data import process

**Binary File Support in Skills**
Skills now support binary files including images, PDFs, and other non-text formats.

* Upload and reference binary files directly in skill collections
* Preview binary file contents in the skill viewer
* Improved file handling for diverse skill use cases

**Tool Registry Public API**
New public API endpoint allows programmatic versioning of registry tools.

* Update tool versions via API for CI/CD integration
* Automate tool deployment workflows
* Manage tool lifecycle programmatically

#### Improvements

* Improved JSON formatting in playground outputs for better readability
* Fixed variable detection for registry tools when using Jinja2 templates in playground
* Enhanced dataset creation API to preserve filter parameters when creating versions from request history
* Improved skill collection version diff viewer with better file preview and comparison
* Better handling of dataset column creation to prevent duplicate names
* Fixed AI assistant buttons across dashboard
* Resolved naming issues when creating new registry tools
* Enhanced folder import for skill collections to handle binary files correctly

***

### Deployment 2

#### Improvements

* Added tool management permissions to default RBAC role templates for Contributors and Publishers
* Improved build process stability and consistency across development and deployment workflows

***

## April 26, 2026

### Deployment 1

#### New Features

**OpenAI GPT-5.5 Model Support**
Added support for OpenAI's latest `gpt-5.5` and `gpt-5.5-pro` models with extended reasoning capabilities and 1M token context windows.

* `gpt-5.5` offers next-generation reasoning with vision support and configurable reasoning effort levels
* `gpt-5.5-pro` provides top-tier performance for the most demanding professional tasks with enhanced reasoning capabilities
* Both models support text and image inputs with a knowledge cutoff of December 2025

#### Improvements

* Fixed conversation turn counting to properly track user-assistant exchanges as single turns in chat transcripts
* Corrected tool choice display in function overview dialog to show the appropriate value based on function type

***

## April 24, 2026

### Deployment 1

#### New Features

**Tool Registry**
Manage reusable tool definitions across your workspace with version control and release labels.

* Create, edit, and version tool definitions independently of prompts
* Apply release labels to tool versions for environment-based deployment
* View all prompts and workflows referencing a specific tool
* Duplicate tools across workspaces

**Span Resource Attribute Filtering**
Filter traces by resource attributes attached to individual spans.

* Add filter button directly in span details view
* Quickly narrow down traces based on span-level metadata

#### Improvements

* Tool definitions in prompts now display resolved names and descriptions from the registry
* Release label configuration supports approval workflows for tool registry labels
* Improved changelog entries for tool-related events (create, version, label changes)
* Tool registry cache automatically resolves tool references when loading prompt versions
* Public API endpoints added for managing tool registry programmatically

***

### Deployment 2

#### New Features

**Organization Members Table Pagination**
The `Organization Members` page now supports server-side pagination, improving performance for organizations with large member lists.

* Navigate through member pages with configurable page size
* Search filters apply across all pages with accurate result counts
* Pending workspace invites appear inline with active members

#### Improvements

* Pending workspace invites now display alongside active organization members in a unified view
* Email search in the organization members table is now case-insensitive and supports partial matching
* Tool Registry editor automatically loads the selected version's definition when adding a function to the playground
* Workspace invite queries can now be scoped by workspace for improved organization member management

***

## April 23, 2026

### Deployment 1

General performance and stability improvements

***

### Deployment 2

#### Improvements

* Enhanced trace visualization with improved span status indicators and error display
* Added support for Vercel AI SDK tool execution spans in trace waterfall view
* Improved token usage details display for LLM calls with cache hit information
* Better handling of embedding operations in trace details and request logs
* Refined trace span naming for better readability in complex workflows

***

### Deployment 3

#### Improvements

* Improved reliability of workflow execution in the editor when using "Play from here" feature
* Enhanced trace visualization with collapsible message sections for better readability of long request/response content
* Improved display of exception details in trace span views with better formatting
* Renamed "Agent" blocks to "Workflow" blocks throughout the platform for consistency

***

### Deployment 4

#### New Features

**Resource Attribute Filtering for Traces**
Filter traces by OpenTelemetry resource attributes such as service name, deployment environment, or host information.

* New resource filter tab in trace search alongside metadata filters
* Supports both AND and OR logic for complex resource-based queries
* Autocomplete suggestions for common resource attribute keys
* Improved query performance through database indexing
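
The difference between AND and OR filtering on resource attributes can be sketched as follows. The function name `matches_resource_filter` and the sample span data are hypothetical, and the attribute keys follow OpenTelemetry naming (`service.name`, `host.name`) as an assumption about what your traces carry. This does not reflect how PromptLayer evaluates filters internally.

```python
def matches_resource_filter(resource_attributes, conditions, mode="AND"):
    """Check a span's resource attributes against key/value conditions."""
    if mode not in ("AND", "OR"):
        raise ValueError(f"unsupported mode: {mode!r}")
    checks = [resource_attributes.get(key) == value for key, value in conditions.items()]
    # AND requires every condition to hold; OR requires at least one.
    return all(checks) if mode == "AND" else any(checks)


# Hypothetical resource attributes attached to one trace.
resource = {"service.name": "checkout-api", "host.name": "worker-3"}
conditions = {"service.name": "checkout-api", "host.name": "worker-1"}

and_match = matches_resource_filter(resource, conditions, mode="AND")
or_match = matches_resource_filter(resource, conditions, mode="OR")
```

Here the service name matches but the host does not, so the trace passes the OR filter and fails the AND filter.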

***

## April 22, 2026

### Deployment 1

#### Improvements

* Enhanced access controls for `Playground` chat mode to ensure proper usage tracking
* Improved snippet detection and classification for better prompt template organization

***

### Deployment 2

#### Improvements

* Enhanced access controls for new user accounts during the initial setup period
* Fixed an issue where filter checkboxes in the registry view required multiple clicks to toggle
* Removed unused dependencies to improve frontend security and bundle size

***

## April 21, 2026

### Deployment 1

#### Improvements

* Enhanced access controls for API authentication

***

### Deployment 2

#### Improvements

* General performance and stability improvements

***

### Deployment 3

#### Improvements

* Enhanced security controls for account access

***

### Deployment 4

#### New Features

**API Key Restrictions for Playground and Workflows**
The Playground and Workflows now enforce workspace-level provider API key requirements. Users on trial Team plans can continue using PromptLayer-provided API keys, while others must configure their own provider credentials in Settings to run prompts and workflows.

* Prevents unauthorized use of shared infrastructure resources
* Clear error messages guide users to add missing provider API keys
* Enhanced access controls across all dashboard execution contexts

**Snippet Management Improvements**
Snippets are now formally distinguished from regular prompts with a dedicated `is_snippet` flag, improving organization and filtering.

* More accurate snippet identification in the registry
* Better separation between reusable snippets and standalone prompts

#### Improvements

* Enhanced model parameter handling in the Playground UI
* Improved template format consistency checks in Playground
* Better error messaging when provider API keys are missing
* Streamlined snippet display components
* More reliable workflow node execution validation

***

### Deployment 5

#### Improvements

* Enhanced subscription trial access controls for dashboard provider API keys

***

### Deployment 6

#### Improvements

* General performance and stability improvements

***

### Deployment 7

#### Improvements

* General performance and stability improvements

***

### Deployment 8

#### Improvements

* Enhanced account security controls

***

## April 16, 2026

### Deployment 1

#### New Features

**Claude Opus 4.7 Support**
Added support for Anthropic's latest Claude Opus 4.7 model, available through both direct Anthropic and Google Vertex AI integrations.

* New "Extra High" effort level option for enhanced reasoning tasks
* Improved agentic coding capabilities over the previous Opus 4.6 model
* Available on Vertex AI at multi-region (`us`, `eu`) and `global` endpoints

#### Improvements

* Enhanced API key validation for workspace bring-your-own-key configurations across all providers
* Updated thinking display defaults for Claude 4.7 family models to match API specifications
* Improved parameter controls for Claude 4.7 models to ensure optimal configuration

***

### Deployment 2

#### Improvements

* Enhanced search performance and reliability across the platform

***

## April 15, 2026

### Deployment 1

#### Improvements

* Enhanced workflow execution tracing for better debugging and observability
* Fixed issue where workflow nodes could get stuck in loading state indefinitely
* Improved date range picker behavior and URL synchronization for request log filters
* Added support for Qwen models on Amazon Bedrock, including Qwen3 235B, Qwen3 32B, Qwen3 Coder variants, and Qwen3 VL for vision tasks
* Fixed display value extraction for Anthropic and Bedrock responses with tool use
* Improved workflow node dependency handling to properly skip nodes when dependencies cannot complete

***

### Deployment 2

#### Improvements

* Enhanced usage tracking for traces and request logs to improve billing accuracy
* Fixed image handling for Amazon Bedrock Converse API to properly support URL-based images by automatically downloading and converting them to the required format
* Improved onboarding experience by refining when the Playground tour appears for new users

***

### Deployment 4

#### Improvements

* Enhanced `Amazon Bedrock` document handling for better compatibility with PDF attachments in chat templates
* Improved rendering of images and documents in request logs when using `Amazon Bedrock` models

***

## April 14, 2026

### Deployment 1

#### Improvements

* Enhanced error messaging for folder API operations to provide clearer guidance when workspace access issues occur

***

## April 13, 2026

### Deployment 1

#### Improvements

* Fixed an issue where dataset example cells could appear empty during selection

***

## April 10, 2026

### Deployment 1

#### Improvements

* API keys can now edit and delete evaluation columns and rename reports programmatically
* Enhanced scroll behavior in skill collection editor for better form navigation
* Improved interactive tour experience with more reliable dialog and popover interactions during guided walkthroughs

***

### Deployment 2

#### Improvements

* General performance and stability improvements

***

### Deployment 3

#### New Features

**Universal Skill Collections**
Skill Collections now support a vendor-agnostic mode that allows you to create portable skills without committing to a specific AI provider structure.

* Toggle between provider-specific layouts (Claude Code, OpenAI Agent, Copilot) and a universal format directly in the editor
* Universal mode stores skills in a flat structure that can be adapted to any provider later
* Switch providers at any time while preserving your skill content and organization

#### Improvements

* Skill Collection versions now track the provider setting at the time of each save
* Empty folders in Skill Collections are now properly displayed in version history and diffs
* Provider changes are now shown as a separate item in version review and diff views
* Creating new files in universal mode defaults to a plain file instead of requiring skill metadata
* Restoring a previous version now correctly restores the provider setting from that version

***

## April 09, 2026

### Deployment 1

#### Improvements

* Enhanced version selector performance with optimized infinite scroll loading
* Improved file hierarchy navigation with better loading states
* Refined skill collection version browser with smoother pagination
* Enhanced entity creation with better visual feedback during save operations
* General performance and stability improvements

***

### Deployment 2

#### New Features

**Link Editing in Skill Collections**
Enhanced markdown editor with inline link management for Skill Collections.

* Add and edit links directly in the markdown editor with a floating menu
* Auto-complete suggestions for linking to other files within your Skill Collection
* Quick access to edit or remove existing links without switching context

***

## April 08, 2026

### Deployment 1

#### New Features

**Skill Collections**
A new entity type for managing and versioning collections of AI assistant skills and tools.

* Create and organize skill files with support for multiple formats including Markdown and YAML
* Version control with commit messages and release labels
* Import entire skill folders via drag-and-drop or zip file upload
* View version history and compare changes between versions

**Playground Interactive Walkthrough**
New users can now access a guided tour of the Playground to learn key features.

* Step-by-step introduction to prompt testing and template creation
* Interactive highlights for model selection, message composition, and output review
* Optional walkthrough can be triggered from the help menu

#### Improvements

* Enhanced tag mentions to support Skill Collections in content areas
* Improved version selector with collapsed rail view for better workspace navigation
* Dataset column selection now properly handles user input changes
* Tag mention rendering updated to support additional entity types
* Version review dialog now supports phased save workflows with change summaries
* Multipart file upload support added for bulk skill collection operations
* Better handling of concurrent workflow node outputs to prevent race conditions

***

## April 07, 2026

### Deployment 1

#### New Features

**Skills Billing Limits**
Introduced plan-based limits for Skills features to provide clear capacity guidelines across all subscription tiers.

* Free plan: 1 skill collection with up to 30 files per collection
* Pro plan: 5 skill collections with up to 50 files per collection
* Team plan: Unlimited skill collections with up to 100 files per collection
* File size limit of 5 MiB per individual skill file applies across all plans
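
The limits above can be expressed as a small validation sketch. The numbers come from the list above; the names `PLAN_LIMITS` and `find_limit_violations` are hypothetical, and treating a file of exactly 5 MiB as allowed is an assumption.

```python
# Per-plan limits from the list above. None means unlimited.
PLAN_LIMITS = {
    "free": {"max_collections": 1, "max_files_per_collection": 30},
    "pro": {"max_collections": 5, "max_files_per_collection": 50},
    "team": {"max_collections": None, "max_files_per_collection": 100},
}
MAX_FILE_BYTES = 5 * 1024 * 1024  # 5 MiB, applies to every plan


def find_limit_violations(plan, collection_count, file_sizes):
    """Return human-readable limit violations for one skill collection.

    `collection_count` is the total number of collections the workspace would
    have; `file_sizes` lists the byte sizes of the files in this collection.
    """
    limits = PLAN_LIMITS[plan]
    violations = []

    max_collections = limits["max_collections"]
    if max_collections is not None and collection_count > max_collections:
        violations.append(f"{collection_count} collections exceeds limit of {max_collections}")

    max_files = limits["max_files_per_collection"]
    if len(file_sizes) > max_files:
        violations.append(f"{len(file_sizes)} files exceeds limit of {max_files}")

    oversized = sum(1 for size in file_sizes if size > MAX_FILE_BYTES)
    if oversized:
        violations.append(f"{oversized} file(s) larger than 5 MiB")

    return violations


free_plan_issues = find_limit_violations("free", 2, [1024] * 31)
```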

#### Improvements

* Enhanced skill collection creation and editing with real-time limit validation
* Improved error messages when approaching or exceeding skill collection capacity limits
* Added file count projections when saving skill collection versions to prevent unexpected limit errors
* Optimized skill file validation to catch oversized files before processing

***

### Deployment 2

#### Improvements

* Enhanced access controls with more granular permissions for creating, editing, and deleting `Prompts`, `Workflows`, `Datasets`, and `Reports`
* Improved permission management for workspace administrators and custom roles

***

### Deployment 3

#### New Features

**Named API Keys**
You can now assign custom names to your API keys to help organize and identify them across different environments or use cases.

* Add optional names when generating new API keys (e.g., "Production", "Staging", "CI/CD")
* Edit API key names after creation to keep your workspace organized
* View key names and the last 4 characters of each key in the API Keys table for easier identification
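
If you want to mirror this display convention in your own tooling, a masking helper might look like the sketch below. The function name `mask_api_key` and the sample key are hypothetical, and the use of `*` as the mask character is an assumption about presentation.

```python
def mask_api_key(key, visible=4):
    """Hide all but the last `visible` characters of a key."""
    if len(key) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


masked = mask_api_key("example-key-9f3c")
```

Masking the whole value for very short inputs avoids revealing a key in full when it is no longer than the visible suffix.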

***

## April 06, 2026

### Deployment 1

#### New Features

**Model Catalog with Rich Metadata**
Enhanced model selection with detailed metadata display including pricing, context windows, supported capabilities, and knowledge cutoffs.

* Hover over any model in dropdowns to view comprehensive details
* Compare input/output pricing across providers at a glance
* See supported parameters and modalities before configuring
* View deprecation dates and latency characteristics

**Wrangler AI Follow Mode**
New follow mode in Wrangler AI that automatically scrolls to show the latest agent activity.

* Toggle follow mode to stay synchronized with agent progress
* Manual scroll disables follow mode temporarily
* Floating widget shows active progress items during agent runs
* Visual glow overlay indicates when agents are working

**Advanced Request Log Search**
Intelligent query routing automatically selects the optimal search backend based on query complexity and date range.

* Recent data queries use high-performance search
* Complex historical queries automatically fall back to the full database
* Date range picker shows data availability cutoffs
* Search suggestions indicate field availability for selected time ranges

#### Improvements

* Added 22 new models across providers: Amazon Bedrock (9), OpenAI (3), VertexAI (7), Mistral (3)
* Deprecated 39 outdated models with clear retirement dates in the model catalog
* Fixed incorrect parameter configurations on 28+ models (temperature ranges, max tokens, reasoning settings)
* Enhanced Anthropic models with adaptive thinking controls and effort settings
* Improved model dropdown with provider icons and categorized grouping
* Added prompt caching support for Claude models on Bedrock and VertexAI
* Fixed snippet override behavior when viewing shared templates
* Enhanced Cohere models with reasoning token display and vision support

***

### Deployment 2

#### New Features

**Skill Collections**
A new feature for organizing and versioning collections of AI skill files with label-based deployment tracking.

* Create and manage collections of skill files with automatic versioning
* Apply labels to specific versions for deployment tracking
* Version history with archive/restore capabilities
* Public API endpoints available for programmatic access

**Dataset Query Filtering**
Added `filter_query` parameter to datasets for more flexible data filtering.

* Filter dataset rows using custom query expressions
* Available in both dashboard and public API

#### Improvements

* Enhanced permission validation for label protection settings
* Improved support for OpenRouter model telemetry normalization
* Better handling of nested folder structures in the sidebar navigation
* General performance and stability improvements

***

### Deployment 3

#### Improvements

* Enhanced dataset selector visibility in evaluation blueprints
* Fixed dataset version display for nested datasets in the registry
* Improved model metadata caching for better performance

***

## March 31, 2026

### Deployment 1

#### New Features

**User ID Filtering**
Added `user_id` parameter support for request log filtering across the dashboard and API.

* Filter request logs by `user_id` in the Analytics page structured search
* Use `user_id` parameter in the `GET /requests` API endpoint
* Improved query performance with indexed `user_id` field

**Skip Input Variable Rendering in Prompt Templates**
Added `skip_input_variable_rendering` flag to the `GET /v1/prompt-templates` endpoint.

* Preserve `{variable}` placeholders in `llm_kwargs` instead of rendering them as empty strings
* Useful for fetching raw prompt structure without substitution
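
To illustrate the difference the flag makes, here is a local sketch of the two behaviors. It does not call the API and is not the server's implementation; the `render` helper and its regular expression are assumptions used only to mimic the outcome described above.

```python
import re

# Matches f-string style placeholders such as {name}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render(template, variables, skip_input_variable_rendering=False):
    """Mimic the described behavior for a single template string."""
    if skip_input_variable_rendering:
        # Flag set: placeholders are returned untouched
        return template
    # Default: missing variables render as empty strings
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), template)


rendered = render("Hello {name}!", {})
raw = render("Hello {name}!", {}, skip_input_variable_rendering=True)
```

With no input variables supplied, `rendered` loses the placeholder while `raw` keeps `{name}` intact, which is the structure a caller would want when inspecting or re-templating a prompt.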

#### Improvements

* Enhanced request timeout handling to prevent query timeouts on slow searches
* Improved trial date display logic in subscription status indicators
* Added Amazon Bedrock prompt caching support for Claude 3.5 models
* Updated model configurations for Anthropic and Google models to match latest API capabilities
* Enhanced Wrangler AI assistant notifications with better resource detection
* Improved keyboard navigation in sidebar file hierarchy

***

### Deployment 2

#### Improvements

* Enhanced request search API reliability and accuracy
* Improved prompt snippet replacement handling in the registry
* Fixed variable set name display in the playground
* General performance and stability improvements

***

## March 27, 2026

### Deployment 1

#### Improvements

* General performance and stability improvements

***

## March 26, 2026

### Deployment 1

#### New Features

**Anthropic Claude Prompt Caching**
Added support for Anthropic's prompt caching feature to reduce costs and latency for repeated content.

* Automatically caches system messages and tool definitions in multi-turn conversations
* Displays cached token usage in request logs and the playground
* Works with all Claude models that support prompt caching

**Gemini Flash Image Preview Support**
Added `gemini-3.1-flash-image-preview` model with image generation capabilities.

#### Improvements

* Enhanced input variable handling in the prompt template editor to properly reset state when navigating between templates
* Fixed display of conversation history in chat-based prompts when only a single user message is present
* Updated default model selections across providers to reflect latest available models
* Improved subscription plan explanations with clearer billing information
* Refined authentication flow UI for better user experience during sign-in
* Enhanced tool call formatting in request logs for better readability

***

## March 24, 2026

### Deployment 1

#### New Features

**Public API Request Search Suggestions**
Developers can now fetch autocomplete suggestions for request log searches via the public API, enabling programmatic access to the same search experience available in the dashboard.

* New `/api/public/v2/requests/suggestions` endpoint for retrieving field value suggestions
* Supports filtering suggestions by field type, prefix, metadata key, and structured filter groups
* Rate limited to 10 requests per minute for API stability

#### Improvements

* Updated Python SDK to version 1.2.4 with enhanced features and bug fixes
* Fixed Pro plan upgrade button incorrectly appearing disabled when at user limit
* Simplified internal request search filtering for improved query performance
* Enhanced structured search payload handling for more consistent filter behavior

***

### Deployment 2

#### Improvements

* Enhanced AI assistant tracing capabilities for better observability and debugging
* Improved onboarding experience with more reliable use case suggestions
* Optimized database performance by removing unused indexes

***

## March 23, 2026

### Deployment 1

#### New Features

**Dataset Version Draft Workflow API**
New public API endpoints enable programmatic management of dataset drafts, allowing you to create, modify, and save dataset versions via API.

* Create a draft version from an existing dataset or start fresh with `create-draft`
* Add individual request logs to drafts with `add-request-log`
* Save and publish drafts with `save-draft`
* Enables incremental dataset building and version control through API workflows

#### Improvements

* Enhanced user onboarding flow with personalized use case recommendations
* Improved workspace introduction experience for new users
* Streamlined authentication and workspace setup process
* Updated UI animations and loading states for better visual feedback
* Refined dataset management interface with better draft handling

***

### Deployment 2

#### Improvements

* Enhanced request display for tool calls to show both content and function information
* Improved support for additional LLM providers with automatic fallback formatting
* Refined onboarding experience with updated messaging and visual assets

***

## March 22, 2026

### Deployment 1

#### Improvements

* Enhanced provider detection for requests logged via OpenTelemetry instrumentation

***

### Deployment 2

#### New Features

**Evaluation Runs API Enhancement**
The public API now supports retrieving batch runs nested within evaluations for easier programmatic access to evaluation results.

* Added `include_runs` parameter to the list evaluations endpoint
* When enabled, returns each evaluation with its associated batch runs and their current status
* Includes detailed statistics and status counts for each run

#### Improvements

* Enhanced display of requests from additional LLM providers in the dashboard
* Improved handling of chat completion requests across different provider formats
* Better status tracking for evaluation batch runs

***

### Deployment 3

#### Improvements

* Enhanced reliability of third-party integration webhooks
* Improved handling of tool call arguments in OpenTelemetry trace processing for better compatibility with prompt blueprints
* Optimized trace filtering to reduce duplicate entries and improve performance

***

## March 20, 2026

### Deployment 1

#### New Features

**Public Trace API Endpoint**
New `/api/public/v2/traces/<trace_id>` endpoint allows you to retrieve all spans and request logs associated with a trace ID, making it easier to programmatically access complete trace data for debugging and analysis.

* Returns all spans in the trace with their associated request log IDs
* Scoped to your workspace for security
* Complements the existing request log retrieval endpoint

**Enhanced Request Log API Response**
The `/api/public/v2/request/<request_id>` endpoint now includes the `trace_id` field in the response, enabling you to navigate from individual requests to their complete traces.

**Workflow Details in API Response**
The workflow retrieval endpoint now returns the complete `workflow` object in the response, providing full metadata and configuration details alongside the existing `workflow_id`, `workflow_name`, and `version` fields.

#### Improvements

* Enhanced template rendering error detection for Anthropic and Google system messages in Jinja2 templates
* Improved reliability when retrieving large request logs with automatic retry logic for transient failures
* Better error messages when request log data is temporarily unavailable

***

### Deployment 2

#### Improvements

* Enhanced OpenTelemetry trace ingestion to support event-based message formats from modern observability frameworks
* Improved compatibility with real-time AI agent platforms that use span events for conversation tracking
* Better extraction of tool call information from distributed traces

***

### Deployment 3

#### Improvements

* Enhanced OpenTelemetry trace processing to better handle tool calls in conversational AI workflows
* Improved compatibility with industry-standard telemetry formats for multi-turn agent interactions
* More accurate capture of function calling sequences in traced LLM requests

***

## March 19, 2026

### Deployment 1

#### New Features

**Chat View for Request Traces**
View LLM request traces in a conversation-style chat interface for easier readability.

* Toggle between chat view and traditional template view
* See message flow with role-based avatars (user/assistant/system)
* Available in request logs and playground pages

**Enhanced Workflow API**
The `GET /workflows` endpoint now returns the full workflow structure, including nodes and edges.

* Query specific versions using the `?version=N` parameter
* Query by release label using the `?label=my-label` parameter
* New `GET /workflows/{id}/labels` endpoint lists all release labels for a workflow
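
The query parameters above can be assembled with the standard library. This is a sketch only: `workflow_path` and `workflow_labels_path` are hypothetical helpers, the paths are copied as written in this entry without a host or API prefix, and treating `version` and `label` as mutually exclusive is an assumption rather than documented behavior.

```python
from urllib.parse import urlencode


def workflow_path(version=None, label=None):
    """Build the request path for fetching a workflow by version or label."""
    if version is not None and label is not None:
        # Assumption: a request pins either a version or a label, not both
        raise ValueError("pass version or label, not both")
    params = {}
    if version is not None:
        params["version"] = version
    if label is not None:
        params["label"] = label
    query = urlencode(params)
    return "/workflows" + (f"?{query}" if query else "")


def workflow_labels_path(workflow_id):
    """Build the path that lists release labels for one workflow."""
    return f"/workflows/{workflow_id}/labels"
```

Using `urlencode` keeps labels with spaces or special characters safe to place in a URL.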

#### Improvements

* Expanded structured output support for additional Bedrock models (Nova, Llama, Mistral families)
* Better error handling and logging for workflow code execution nodes
* Improved loading states in shared request pages

***

## March 18, 2026

### Deployment 1

#### New Features

**GPT-5.4 Mini and Nano Model Support**
PromptLayer now supports OpenAI's GPT-5.4 Mini and GPT-5.4 Nano models in the `Playground` and API.

* Configure reasoning effort, verbosity, and response format options for both models
* Leverage lower-cost alternatives to GPT-5.4 for appropriate use cases
* Access prompt caching capabilities for improved performance

#### Improvements

* Images now display correctly in `Playground` chat mode
* Dataset columns reordered to show `promptlayer_url` before `prompt` for easier request navigation
* Enhanced model configuration options for GPT-5.4 series models

***

### Deployment 2

#### New Features

**Chat History Injection for Prompt Templates**
Prompt template blocks in evaluations can now inject chat history messages from a dataset column directly into chat prompts.

* Enable chat history injection in the Advanced Settings section of the prompt template block configuration
* Select a source column containing message objects with role and content fields
* Messages are automatically appended to the end of your prompt template
* Supports both JSON and JSON5 formatted message lists for flexible data sources
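
The append step can be pictured with a short sketch. It is an illustration under assumptions, not the evaluation engine's code: `inject_chat_history` is a hypothetical name, and the example parses strict JSON only because the Python standard library has no JSON5 parser.

```python
import json


def inject_chat_history(template_messages, column_value):
    """Append messages parsed from a dataset cell to the end of a chat prompt."""
    history = json.loads(column_value)
    if not isinstance(history, list):
        raise ValueError("chat history column must contain a list of messages")
    for message in history:
        if not isinstance(message, dict) or "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content' fields")
    # Template messages come first; injected history is appended at the end
    return list(template_messages) + history


template = [{"role": "system", "content": "Be brief."}]
cell = '[{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]'
messages = inject_chat_history(template, cell)
```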

***

## March 17, 2026

### Deployment 1

#### New Features

**Public API for Dataset Rows**
New REST API endpoint to programmatically retrieve paginated rows from `Datasets`, enabling integration with external tools and workflows.

* Access dataset rows via `/api/public/v2/datasets/{id}/rows` with support for search and pagination
* Returns structured row data matching dataset column definitions
* Supports up to 100 rows per request with flexible filtering
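
Because each request returns at most 100 rows, reading a larger dataset means paging. The loop below is a sketch of that pattern: `iter_rows` is a hypothetical helper, and the `fetch_page(page, per_page)` callable stands in for the real HTTP call, whose parameter names are not given in this entry.

```python
MAX_ROWS_PER_REQUEST = 100  # limit stated in the entry above


def iter_rows(fetch_page, per_page=MAX_ROWS_PER_REQUEST):
    """Yield every row by requesting pages until a short page is returned."""
    if not 1 <= per_page <= MAX_ROWS_PER_REQUEST:
        raise ValueError("per_page must be between 1 and 100")
    page = 1
    while True:
        rows = fetch_page(page, per_page)
        yield from rows
        if len(rows) < per_page:
            # A short (or empty) page means there is nothing left to fetch
            return
        page += 1


# Stand-in for the API: slices an in-memory list the way a paginated endpoint might
fake_dataset = list(range(250))


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake_dataset[start:start + per_page]


all_rows = list(iter_rows(fake_fetch))
```

In real use, `fake_fetch` would be replaced by a function that calls the rows endpoint and returns the list of rows from the response.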

**Public API for Evaluation Results**
New REST API endpoint to fetch evaluation results programmatically, combining dataset inputs with evaluation scores.

* Access evaluation rows via the `/api/public/v2/evaluations/{id}/rows` endpoint
* Returns both dataset input variables and evaluation cell results in a unified format
* Enables automated analysis and reporting on evaluation performance

**Enhanced OpenTelemetry Tracing Support**
Expanded tracing instrumentation to support additional SDKs and frameworks for automatic request logging.

* Improved compatibility with diverse instrumentation libraries
* More reliable extraction of provider and model information from traces

#### Improvements

* Enhanced input variable detection in `Playground` with truncation and tooltip support for long variable names
* Improved link handling in Wrangler AI for better navigation across all resource types
* Refined organization members table pagination for more consistent data loading
* Updated API key input styling across provider configuration pages for better visual consistency
* Improved deduplication logic for file annotations to prevent duplicate entries

***

### Deployment 2

#### New Features

**Deployment Usage Analytics**
Track token consumption and session activity across all your prompt deployments with new organization-level analytics.

* View daily token usage broken down by individual deployments
* Monitor session counts per deployment over time
* Compare usage across public and private deployments
* Access historical usage data for capacity planning

**Prompt Remixing**
Enable users to create their own versions of your shared prompts directly from the deployment interface.

* Toggle remix capability on or off for any deployment
* Users can fork and customize prompts while maintaining attribution
* Remixed versions are saved to the user's own workspace
* Great for templates and starter prompts you want others to build upon

#### Improvements

* Enhanced deployment management UI with improved session visibility and controls
* Added batch execution support for shared prompt deployments
* Improved file upload handling and multipart processing for large media files
* Better dataset creation flow with prompt template selection from request history
* Enhanced permission checks and access controls across deployment endpoints

***

## March 16, 2026

### Deployment 1

#### New Features

**Runtime Tool Variables in Prompt Templates**
Dynamic tool injection now supports variable substitution, enabling templates to generate tool definitions on-the-fly based on runtime context.

* Tool schemas can include variables (e.g., `{{user_id}}`, `{{domain}}`) that resolve during template execution
* Supports nested variable resolution within tool parameters and descriptions
* Enables dynamic function calling patterns where tool availability adapts to request context
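
Nested variable resolution can be sketched as a recursive walk over the tool schema. This is an illustration of the idea, not PromptLayer's renderer: `resolve`, the regular expression, the example tool, and the choice to leave unknown variables untouched are all assumptions.

```python
import re

# Matches {{name}} with optional inner whitespace
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve(node, variables):
    """Return a copy of a tool schema with {{variables}} substituted at any depth."""
    if isinstance(node, str):
        # Unknown variables are left as-is in this sketch
        return VARIABLE.sub(lambda m: str(variables.get(m.group(1), m.group(0))), node)
    if isinstance(node, dict):
        return {key: resolve(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, variables) for item in node]
    return node


tool = {
    "name": "lookup_orders",
    "description": "Look up orders for user {{user_id}}",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Must be on {{domain}}"},
        },
    },
}
resolved = resolve(tool, {"user_id": "u_123", "domain": "example.com"})
```

The walk builds new containers rather than mutating the input, so the original template stays reusable across requests.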

**Chat Message Annotations**
Annotations can now be added directly to individual messages in chat-mode conversations for improved debugging and analysis.

* Attach metadata, tags, or notes to specific assistant or user messages
* Track message-level performance metrics and quality assessments

#### Improvements

* Enhanced function/tool overview dialog displays complete schema details with improved formatting
* Improved citation modal rendering with better support for complex reference structures
* Streamlined Docker image build process reduces deployment time
* Better visual distinction between different tool types in the functions list view
* Optimized frontend bundle size through refined dependency management

***

## March 15, 2026

### Deployment 1

#### New Features

**Public API Request Search Endpoint**
A new `/api/public/v2/requests/search` endpoint enables programmatic searching of request logs with structured filters.

* Search logs using the same filtering capabilities available in the dashboard
* Support for complex filter groups and structured queries
* Rate-limited to 10 requests per minute with up to 25 results per page
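
A client can stay under the 10 requests per minute limit with a small sliding-window throttle. The class below is a client-side sketch only; `SlidingWindowLimiter` is a hypothetical name, and how the server actually counts requests is not described in this entry.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Track recent calls and report how long to wait before the next one."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def wait_time(self):
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop calls that have aged out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())
```

Typical use is to call `time.sleep(limiter.wait_time())` before each search request and `limiter.record()` right after sending it.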

**BYOK Playground Limit Exemption**
Users who configure their own API keys (Bring Your Own Key) are now exempt from daily playground run limits.

* Unlimited playground testing when using your own API credentials
* Cost control stays with your organization while removing artificial usage caps

#### Improvements

* Improved dataset column JSON parsing with better error handling for sparse or malformed data
* Enhanced workspace member management interface with clearer permission displays
* Fixed API key modal display to better communicate rate limits and usage policies
* Standardized public API endpoint structure (moved `GET /api/public/v2/request/<id>` to `/requests/<id>`)
* Added permission checks to dataset creation and editing endpoints to enforce role-based access control

***

## March 12, 2026

### Deployment 1

#### New Features

**Hybrid Search for Registry**
Enhanced search across prompts, workflows, and datasets combining keyword matching with semantic understanding for more relevant results.

* Search results now surface contextually similar items even when exact keywords don't match
* Improved search ranking considers both text relevance and semantic meaning
* Background indexing keeps search up-to-date as you modify registry items

**Scroll Lock in Playground Chat**
Chat panel now maintains your scroll position when new messages arrive, preventing automatic jumping to the bottom.

* Toggle scroll lock on/off to control whether new messages auto-scroll
* Manually scrolling up automatically enables scroll lock
* Scroll to bottom re-enables auto-scroll behavior

#### Improvements

* Fixed search suggestions displaying incorrect text values in autocomplete dropdowns
* Resolved f-string variable indexing issues when searching prompt templates
* Added `language` field support for Google Code Execution tool responses
* Improved citation display by preserving original model response annotations without deduplication
* Enhanced registry list and grid views with optimized virtualization for faster rendering of large item collections

***

### Deployment 2

#### New Features

**OTLP Prompt Resolution by ID and Label**
Enhanced OpenTelemetry trace ingestion now supports flexible prompt identification and version resolution.

* Spans can reference prompts by `promptlayer.prompt.id` in addition to `promptlayer.prompt.name`
* Version resolution via `promptlayer.prompt.label` automatically links traces to labeled prompt versions
* Improved error handling when prompt identifiers are incomplete or not found in the workspace
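
On the sending side, these identifiers travel as ordinary span attributes. The sketch below encodes a Python dict into the OTLP JSON attribute shape; `otlp_attributes` is a hypothetical helper, and the example values (including whether the prompt ID is an integer) are assumptions.

```python
def otlp_attributes(attrs):
    """Encode a flat dict as a list of OTLP JSON key/value attributes."""
    encoded = []
    for key, value in attrs.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            typed = {"boolValue": value}
        elif isinstance(value, int):
            # OTLP JSON follows the protobuf JSON mapping, where int64 is a string
            typed = {"intValue": str(value)}
        elif isinstance(value, float):
            typed = {"doubleValue": value}
        else:
            typed = {"stringValue": str(value)}
        encoded.append({"key": key, "value": typed})
    return encoded


span_attributes = otlp_attributes({
    "promptlayer.prompt.id": 123,          # hypothetical prompt ID
    "promptlayer.prompt.label": "prod",    # hypothetical release label
})
```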

**Duplicate Span Handling**
The `/spans-bulk` endpoint now intelligently handles duplicate span IDs to prevent data conflicts.

* Duplicate spans within the same batch are deduplicated before insertion (first occurrence wins)
* Duplicate spans across separate batches use upsert logic (last write wins)
* Ensures trace data remains consistent when the same span is reported multiple times
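
The two rules can be modeled with a plain dictionary. This is a simplified in-memory sketch of the semantics described above, not the `/spans-bulk` implementation; `dedupe_batch`, `upsert_batch`, and the `span_id` field name are assumptions.

```python
def dedupe_batch(spans):
    """Within one batch, keep the first span seen for each span ID."""
    first_seen = {}
    for span in spans:
        first_seen.setdefault(span["span_id"], span)
    return list(first_seen.values())


def upsert_batch(store, spans):
    """Across batches, a later write replaces the stored span with the same ID."""
    for span in dedupe_batch(spans):
        store[span["span_id"]] = span


store = {}
# Same batch: the first occurrence wins
upsert_batch(store, [
    {"span_id": "a", "name": "first"},
    {"span_id": "a", "name": "second"},
])
after_first_batch = store["a"]["name"]
# Separate batch: the last write wins
upsert_batch(store, [{"span_id": "a", "name": "third"}])
after_second_batch = store["a"]["name"]
```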

#### Improvements

* Added `flask embed_recently_used_prompts` command to backfill embeddings for prompts with recent traffic
* Added `flask normalize_recently_used_prompts` command to backfill normalized content for recently-used prompt versions
* Both commands support configurable look-back windows (default 365 days) and batch sizes for gradual processing
* Enhanced test coverage for duplicate span handling scenarios in bulk span creation

***

### Deployment 3

#### New Features

**Structured Search**
Advanced filtering interface for request logs with improved query building and autocomplete suggestions.

* Build complex filters using fields, operators, and values with keyboard-driven autocomplete
* Support for nested metadata filtering with `key_equals`, `key_not_equals`, and `key_contains` operators
* Multi-value selection for tags, labels, and metadata fields with `in` and `not_in` operators
* Apply date range presets like "Last 5 minutes" or shorthand like "30d" for quick filtering
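
Shorthand such as "30d" maps naturally onto a time delta. The parser below is a sketch of that idea only: `parse_shorthand` is a hypothetical helper, and the `m` and `h` units are assumptions, since this entry shows just the `d` form.

```python
import re
from datetime import timedelta

# Only "d" appears in the entry above; "m" and "h" are assumed for illustration
UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_shorthand(text):
    """Convert shorthand like '30d' into a timedelta looking back from now."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized date shorthand: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})
```

A caller could subtract the result from the current time to get the start of the filter window.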

#### Improvements

* Enhanced date picker with single date selection mode and custom preset support
* Added `NOT_IN` operator support for identifier, string, array, and nested key-value fields
* Improved table components with better row click handling and empty state messages
* Added `IS_EMPTY` and `IS_NOT_EMPTY` operators for nested metadata filtering
* Expanded operator support for input/output text fields to include `STARTS_WITH` and `ENDS_WITH`

***

### Deployment 4

#### Improvements

* Fixed search filters not correctly matching boolean and numeric metadata values (e.g., `false`, `true`, `42`)
* Resolved issue where changing search filters could trigger duplicate requests and cause stale results to display
* Improved nested field filtering to properly match metadata values regardless of type (string, boolean, or number)

***

## March 11, 2026

### Deployment 1

#### New Features

**Nested Search Support for Outputs and Input Variables**
Advanced search now supports filtering by output fields and input variables, matching the existing metadata search capabilities.

* Search for specific output values using `output:key=value` syntax
* Filter requests by input variable content with `input_variables:key=value`
* Use autocomplete suggestions for both output keys and input variable keys in the search bar
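
The `field:key=value` syntax is simple enough to split by hand. The parser below is a sketch for building or validating such terms on the client; `parse_nested_filter` and the returned dictionary shape are hypothetical, and the dashboard's own parsing rules (quoting, escaping) are not covered here.

```python
# Field prefixes named in this entry
NESTED_FIELDS = {"output", "input_variables"}


def parse_nested_filter(term):
    """Split a term like 'output:status=ok' into field, key, and value."""
    field, colon, rest = term.partition(":")
    if not colon or field not in NESTED_FIELDS:
        raise ValueError(f"unsupported field in {term!r}")
    key, equals, value = rest.partition("=")
    if not equals or not key:
        raise ValueError(f"expected key=value in {term!r}")
    return {"field": field, "key": key, "value": value}


parsed = parse_nested_filter("input_variables:city=New York")
```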

#### Improvements

* Improved snippet handling when creating prompt versions with overrides to ensure base references are used consistently
* Enhanced search suggestion performance for nested field queries (metadata, outputs, input variables)
* Input variables now preserve insertion order when rendering prompt templates
* Added structured logging context showing workspace and user IDs for better debugging and support

***

## March 10, 2026

### Deployment 1

#### New Features

**Model Override Support in Evaluations**
Enhanced evaluation workflows now preserve model configuration when routing between prompt templates and agents.

* Model override settings are now correctly passed through evaluation interfaces
* API type and model configuration IDs are properly maintained across workflow executions

#### Improvements

* Fixed tool call detection in search indexing to correctly identify assistant messages with tool calls
* Improved "Open Prompt" button functionality in image API evaluations to use correct routing
* Enhanced build efficiency by adding path guards to skip unnecessary backend image builds when only documentation or configuration files change

***

### Deployment 2

#### New Features

**Prompt Starring**
Users can now star important prompts for quick access and organization.

* Star/unstar prompts directly from the prompt template page
* View the list of users who have starred a prompt
* Filter and prioritize frequently-used prompts

**Structured Search for Request Logs**
Advanced filtering capabilities for request logs with precise search criteria.

* Build complex queries using field-specific filters (metadata, tags, models, etc.)
* Get autocomplete suggestions for search fields based on your workspace data
* Sort results by any field with flexible ascending/descending order

**Enhanced Tool Rendering**
Native display support for Anthropic code execution and shell command tools.

* View bash command execution results with syntax highlighting
* See code patches applied by AI agents in a readable format
* Improved visualization of tool use blocks in chat interfaces

#### Improvements

* Filter prompt templates by tags via the API using the `tags` parameter
* Fixed "Open Prompt" button behavior in image-based evaluations to correctly navigate to prompt templates
* Resolved race condition in workflow output nodes that could cause incorrect status codes
* Added model override routing support for evaluation workflows
* Improved prompt template list performance with optimized tag indexing

***

### Deployment 3

#### New Features

**Multi-Prompt Search Filtering**
Advanced search now supports filtering across multiple prompts simultaneously and combining filters with logical operators.

* Apply filters to multiple prompt templates at once for cross-prompt analysis
* Combine search conditions using AND/OR logic for more precise queries
* Filter suggestions now respect existing search criteria for faster query building

**Inline Item Creation in Sidebar**
Create new items directly from empty folders in the sidebar navigation without navigating away from your current view.

* Click "New item" buttons that appear in expanded empty folders
* Context-aware creation automatically places items in the correct folder
* Streamlined workflow for organizing prompts, datasets, and other resources

#### Improvements

* Search autocomplete suggestions now dynamically update based on active filters
* Added support for null/not-null operators in numeric field searches
* Enhanced folder navigation with visual indicators for active item context
* Improved metadata value suggestions with better handling of nested fields
* Optimized search performance for large workspaces with complex filter combinations

***

### Deployment 4

#### New Features

**Anthropic Text Editor Tool Support**
Added support for Anthropic's text editor built-in tool, enabling AI assistants to view, create, and edit text files programmatically.

* Available for both Anthropic and Vertex AI (Anthropic models) providers
* Supports commands like view, create, insert, and string-based replace operations
* Automatically handles text editor tool results in request logs and prompt templates

#### Improvements

* Enhanced subscription tracking with monthly contract value and contract duration fields for better enterprise billing management
* Fixed real-time event listener limits to prevent connection issues when multiple components subscribe to the same channel
* Improved tool choice handling to correctly map Text Editor tool names in API requests

***

### Deployment 5

#### New Features

**Plain Text Search in Structured Search**
You can now use plain text search alongside structured filters to find request logs more quickly.

* Performs full-text search across request inputs and outputs while applying your structured filters
* Enables flexible searching when you need both keyword matching and precise filtering

**Tool Names Search and Filtering**
Search and filter request logs by the tools called during execution.

* Search for specific tool names using the search bar with autocomplete suggestions
* Filter requests by tool names in structured search queries
* Helps track which tools are being used across your prompts and workflows

#### Improvements

* Search results now prioritize exact matches in request inputs and outputs when using plain text search
* Tool name suggestions appear in the search bar autocomplete for faster filtering
* Structured search queries support filtering by tool execution status and metadata

***

## March 09, 2026

### Deployment 1

#### New Features

**OpenTelemetry Trace Ingestion Enhancements**
Support for modern OpenTelemetry semantic conventions and improved compatibility with observability libraries.

* Added support for gzip-compressed OTLP trace payloads to reduce network overhead
* Added support for the newer `gen_ai.input.messages` and `gen_ai.output.messages` JSON format used by Ruby and other emerging instrumentations
* Improved parsing of `gen_ai.system_instructions` to properly handle system prompts from different providers
* Added automatic upsert logic for duplicate span IDs to ensure trace completeness when spans are sent multiple times
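
A gzip-compressed OTLP/HTTP request can be prepared with the standard library. This sketch only builds the URL, headers, and body; it sends nothing. `build_gzip_otlp_request` and the example host are hypothetical, the `/v1/traces` path comes from the March 05 entry in this changelog, and authentication headers are left out because they are not specified here.

```python
import gzip
import json


def build_gzip_otlp_request(base_url, payload):
    """Return (url, headers, body) for a gzip-compressed OTLP/HTTP JSON export."""
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",  # OTLP/HTTP with JSON encoding
        "Content-Encoding": "gzip",          # tells the receiver to decompress
    }
    url = base_url.rstrip("/") + "/v1/traces"
    return url, headers, body


# Minimal, empty OTLP trace payload used purely as a placeholder
url, headers, body = build_gzip_otlp_request(
    "https://collector.example.com/", {"resourceSpans": []}
)
```

Most OpenTelemetry exporters expose a compression setting that does this automatically, so hand-rolled compression is mainly useful for custom senders.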

**Template Rendering for Tool/Function Messages**
Improved handling of LLM-generated tool and function call messages in prompt templates.

* Template validation now gracefully skips tool/function messages that contain JSON responses rather than user-authored templates
* Prevents false template rendering errors when JSON braces in tool responses are mistaken for template syntax
* Preserves support for legitimate template variables in few-shot tool examples

#### Improvements

* Added `playground_session_id` to request log bulk endpoint responses for better session tracking
* Improved JSON variable parsing to optimistically parse all string values, matching frontend batch-mode behavior
* Enhanced OTLP function name inference to support more provider-specific operation types (embeddings, text completion, content generation)
* Fixed provider family detection for Anthropic and Google AI models in OpenTelemetry traces
* Improved error handling for malformed Content-Type headers in trace ingestion

***

## March 07, 2026

### Deployment 1

#### New Features

**Anthropic Code Execution Tool Support**
Added support for Anthropic's native code execution tool capability, enabling AI models to write and execute Python code during conversations.

* Models can now generate and run code snippets directly within chat sessions
* Code execution results are displayed inline with conversation history
* Supports dynamic data analysis and computation workflows

**Enhanced Trace Filtering with Metadata Search**
Introduced advanced filtering for traces using custom metadata keys, making it easier to find specific traces in production systems.

* Search and filter traces by any custom metadata key stored in span attributes
* Autocomplete suggestions help discover available metadata keys across your workspace
* Filter results update in real-time as you type

**OpenAI Shell Tool Integration**
Added built-in shell tool support for OpenAI models, allowing AI assistants to execute shell commands when explicitly enabled.

* Enables automation workflows where models can interact with system commands
* Integrates with OpenAI's native tool calling infrastructure

#### Improvements

* Improved playground session initialization to correctly handle tool and function definitions when opening from request logs
* Enhanced request log input variable extraction to include tool/function data for better context when replaying requests
* Streamlined "Open in Playground" workflow to preserve all tool configurations from original requests
* Fixed trace metadata button display issues in the span details view
* Normalized message content format to consistently use content blocks across chat interfaces
* Improved Vite build configuration for better development server performance

***

## March 05, 2026

### Deployment 1

#### New Features

**OpenTelemetry Trace Ingestion**
Native support for industry-standard OpenTelemetry Protocol (OTLP) trace ingestion, enabling seamless integration with existing observability tooling.

* Ingest traces via standard OTLP/HTTP endpoint at `/v1/traces`
* Automatic extraction of GenAI semantic conventions for OpenAI and Anthropic providers
* Convert OTLP spans into PromptLayer request logs with proper error mapping and metadata preservation
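
As a rough illustration of the kind of payload an OTLP/HTTP exporter sends, the sketch below builds a single-span JSON body with GenAI attributes using only the standard library. The attribute keys come from the OpenTelemetry GenAI semantic conventions, which are still evolving, so check the version your instrumentation emits. The helper name `build_otlp_payload`, the span name, and the service and scope names are illustrative assumptions. The host and authentication headers for the `/v1/traces` endpoint are not shown because they depend on your PromptLayer setup.

```python
import json
import secrets
import time


def build_otlp_payload(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build a minimal OTLP/HTTP JSON body containing one GenAI client span."""
    end_ns = time.time_ns()
    start_ns = end_ns - 1_000_000_000  # pretend the model call took one second
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 random bytes, 16 hex characters
        "name": f"chat {model}",
        "kind": 3,  # SPAN_KIND_CLIENT in the OTLP trace schema
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": "gen_ai.system", "value": {"stringValue": "openai"}},
            {"key": "gen_ai.request.model", "value": {"stringValue": model}},
            {"key": "gen_ai.usage.input_tokens", "value": {"intValue": str(input_tokens)}},
            {"key": "gen_ai.usage.output_tokens", "value": {"intValue": str(output_tokens)}},
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                # Illustrative service name; use your application's own.
                {"key": "service.name", "value": {"stringValue": "example-service"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example-instrumentation"},
                "spans": [span],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_otlp_payload("gpt-4o", 12, 34), indent=2))
```

In practice an OpenTelemetry SDK exporter produces this body for you; the sketch is only meant to show which fields carry the model and token usage.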

**Multi-Message Tool Response Handling**
The playground chat interface now supports submitting multiple tool response messages at once.

* Import and replay conversations with parallel tool calls from request logs
* Maintain correct message ordering when tools are invoked across conversation turns
* Proper hydration of chat history with multiple tool responses per assistant turn

**Chat History Import from Request Logs**
Import conversation history directly from request logs into playground chat sessions.

* Reset and re-seed chat from any logged request with one click
* Automatically diff request messages against current template to extract conversation context
* Per-variable-set chat history support for testing multiple scenarios simultaneously

#### Improvements

* Fixed playground chat crashes when trace metadata contains non-string values during URL sharing
* Resolved 500 errors when reading prompts that use legacy LangChain message format
* Fixed "No response" display issue for template render errors in request logs
* Improved image evaluation algorithm accuracy for visual content comparison
* Enhanced workspace member invitation dialog with better field validation
* Fixed chat message ordering when importing request logs with tool calls

***

## March 04, 2026

### Deployment 1

#### New Features

**Google File Search Tool Support**
Native integration with Google's File Search tool for Gemini models, enabling document-based context retrieval.

* Create and manage file search stores directly in the PromptLayer UI
* Upload documents to stores and associate them with prompts in the playground
* Documents are automatically indexed for semantic search during conversations
* Grounding metadata shows which documents were referenced in responses

**OpenAI MCP (Model Context Protocol) Tool**
Support for OpenAI's MCP (Model Context Protocol) tool in prompt templates and the playground.

* Configure MCP servers and tools through the built-in tools dialog
* Available for OpenAI models that support function calling
* Tool responses appear inline in conversation history

**User Attribution Tracking**
Track which team member created or modified resources across the platform.

* Author information displayed for prompts, datasets, evaluations, and notifications
* Filter resources by creator in the unified registry
* "Open Original Session" button on run requests links back to the source playground session

#### Improvements

* Added support for Claude Sonnet 4.5 on Amazon Bedrock
* Added support for Gemini 3.1 Flash Lite model
* Debounced playground input variable parsing to reduce API calls during typing
* Fixed issue where deleted file stores could still be selected in the UI
* Improved search indexing with deduplication to prevent duplicate results
* Redesigned settings navigation with clearer organization and visual hierarchy
* Enhanced vector store management with delete store capability
* Improved file preview URLs for local storage backends with HMAC-signed streaming

***

## March 03, 2026

### Deployment 1

#### New Features

**Anthropic Structured Output Support**
Added JSON Schema support for Anthropic models to enforce structured responses.

* Configure `response_format` with JSON Schema in prompt templates for Claude models
* Automatically converts to Anthropic's `output_config` format
* Also supported for Claude models running on Amazon Bedrock

**Organization Members Management**
Enhanced organization members page with improved filtering and detailed member views.

* View all workspaces and roles for each organization member in a detailed side panel
* Filter members by workspace or role, or search by name or email
* Members can now remove themselves from organizations without owner permissions

#### Improvements

* Fixed score slider to properly handle integer-only scores
* Added workspace search by name in workspace listing
* Improved autocomplete components with better keyboard navigation and multi-select support
* Enhanced request display to show `error_type` and `error_message` fields when present
* Added validation for `error_type` field in `/track-request` endpoint to match `/log-request` behavior
* Fixed memory leak in scheduled job processing

***

## March 01, 2026

### Deployment 1

#### Improvements

* Conversation simulator now surfaces errors from follow-up turns instead of silently ending conversations, making it easier to diagnose multi-turn evaluation failures
* Request logs with warning status now display partial responses when available, providing visibility into requests that partially succeeded
* Fixed display logic to correctly identify the final assistant response in multi-turn conversations, ensuring request context and actual output are properly distinguished
* Reduced backend test parallelization to improve test stability and reliability

***

## February 28, 2026

### Deployment 1

#### New Features

**Public API Request Payload Endpoint**
New `/api/public/v2/request-payload` endpoint allows you to retrieve complete request details including prompt blueprints, token usage, and latency metrics.

* Returns full prompt blueprint structure for easy reproduction
* Includes comprehensive metadata: provider, model, tokens, pricing, and timing
* Supports API key authentication

#### Improvements

* Improved Playground reliability on slow network connections by buffering early messages to prevent UI stalls
* Enhanced error handling for WebSocket token refresh failures with better logging for troubleshooting
* Fixed race condition in report cell generation that could cause false failures under high concurrency
* Improved WebSocket connection stability by returning cached tokens when refresh attempts fail
* Enhanced error reporting for messaging service failures with clearer error messages and categorization

***

## February 27, 2026

### Deployment 1

#### New Features

**OpenAI Images API Support**
Full support for OpenAI's image generation models including `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5`, `dall-e-3`, and `dall-e-2`.

* Configure quality, size, background, output format, and moderation settings directly in the Playground
* Generate multiple images in a single request with `n` parameter control
* View generated images with revised prompts in dedicated accordion sections

**Google Gemini Image Generation**
Added `gemini-3.1-flash-image-preview` model for AI-generated images via Google/Vertex AI.

* Customize image size (0.5K to 4K) and aspect ratio (1:1, 16:9, 21:9, and more)
* Includes standard Gemini safety settings and generation parameters

**URL Context Tool for Google/Vertex AI**
Web search and URL content retrieval now available for Google and Vertex AI models in the Playground.

* Extract and analyze content from web pages during conversations
* Matches existing functionality available for OpenAI models

**Enhanced Custom Scoring System**
Refactored evaluation scoring with improved reliability and performance.

* Automatically recalculates report scores when evaluation criteria are updated
* Prevents score updates on incomplete evaluations

#### Improvements

* Fixed WebSocket connection timing to establish only after authentication token is available
* Increased message history buffer to 400 messages for improved chat continuity
* Resolved dynamic resolution stack errors in evaluation workflows
* Enhanced Playground sidebar layout with better widget spacing and control bar positioning
* Improved clipboard handling for content copy operations in the editor
* Fixed cost calculations for `nano-banana-2` model
* Streamlined prompt template retrieval logic for better reliability

***

## February 26, 2026

### Deployment 1

#### New Features

**OpenAI Images API Support**
PromptLayer now supports OpenAI's image generation models including `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5`, `dall-e-3`, and `dall-e-2`.

* Track and log all image generation requests with full parameter support (quality, size, format, moderation)
* View generated images directly in the request logs with revised prompt accordion
* Monitor token-based pricing for new GPT image models

**Google Gemini Tool Support Enhancements**
Extended tool support for Google and Vertex AI models with additional capabilities.

* Added URL context tool support for fetching and processing web content
* Added code execution tool support for running code within model interactions
* Preserved thinking blocks for extended reasoning visibility in responses

**Improved Markdown Rendering**
Enhanced markdown display across the platform for better content readability.

* Richer formatting support in chat messages and outputs
* Improved code block rendering with syntax highlighting
* Better handling of complex markdown structures in evaluations and logs

#### Improvements

* Added human-readable status descriptions in the UI for better request monitoring
* Fixed refresh button behavior in sidebar navigation for consistent state management
* Improved error handling for team member invitations with clearer error messages
* Enhanced clipboard support for copying content from rich text editors
* Fixed prompt analytics page to correctly display evaluations without scores
* Improved evaluation table columns to show more detailed metrics
* Enhanced streaming performance for playground outputs with better state management

***


# Deploy PromptLayer on AWS
Source: https://docs.promptlayer.com/enterprise-deployments/aws

Deploy PromptLayer in your AWS account with OpenTofu, Amazon EKS, and Helm.

Use this guide to deploy PromptLayer in your own AWS account. PromptLayer provides a deployment package with OpenTofu configuration, Helm values files, a release manifest, and registry credentials.

The deployment has four phases:

1. Prepare AWS access and customer-specific settings.
2. Provision infrastructure with OpenTofu.
3. Install cluster add-ons and OpenSearch.
4. Install PromptLayer Helm charts.

## What PromptLayer provides

PromptLayer sends a deployment package for your environment. It includes:

| Item                   | Purpose                                                                     |
| ---------------------- | --------------------------------------------------------------------------- |
| OpenTofu configuration | Creates the AWS infrastructure and Kubernetes add-ons.                      |
| Example tfvars files   | Templates for `infra.tfvars`, `kubernetes.tfvars`, and `opensearch.tfvars`. |
| Helm values files      | Configuration for the PromptLayer application charts.                       |
| Release manifest       | The chart versions, release names, namespaces, and values files to use.     |
| Registry credentials   | Access to PromptLayer's private chart and image registry.                   |

## Before you begin

Make sure you have:

| Requirement        | Notes                                                                                                                 |
| ------------------ | --------------------------------------------------------------------------------------------------------------------- |
| Enterprise license | See [Self-Hosted PromptLayer](/self-hosted) for licensing and support.                                                |
| OpenTofu           | Version `1.10.0` or newer.                                                                                            |
| AWS CLI            | v2 is recommended. `aws sts get-caller-identity` must succeed.                                                        |
| AWS IAM access     | Permission to create and update VPC, EKS, RDS, ElastiCache, IAM, S3, Route53, Secrets Manager, and related resources. |
| Helm               | A Helm CLI version that supports OCI registries.                                                                      |
| kubectl            | Used for verification after EKS is created.                                                                           |
| Domain             | A Route53 hosted zone for the PromptLayer hostname and wildcard certificate.                                          |
| Deployment package | The environment-specific files from PromptLayer.                                                                      |

<Info>
  OpenTofu downloads provider binaries during `tofu init`. You do not install the AWS, Kubernetes, Helm, or HTTP providers separately.
</Info>

## Gather customer inputs

Decide these values before you run OpenTofu:

| Area        | Values to confirm                                                                                                                                                   |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| AWS account | Account ID, AWS region, AWS partition if not commercial AWS, and the IAM role or user that will run OpenTofu.                                                       |
| Naming      | Project name, environment name, resource tags, cost center, and owner tags.                                                                                         |
| Networking  | VPC CIDR, availability zones, public subnet CIDRs, private subnet CIDRs, NAT gateway strategy, and EKS API access CIDRs.                                            |
| DNS and TLS | Domain name, Route53 hosted zone ID, certificate email, wildcard DNS names, and whether external and internal ingress should use the wildcard certificate.          |
| Databases   | RDS instance size, storage, Multi-AZ setting, backup retention, backup window, maintenance window, deletion protection, and optional customer-managed KMS key.      |
| Cache       | ElastiCache Valkey size, failover setting, Multi-AZ setting, encryption settings, and maintenance window.                                                           |
| EKS         | Cluster name, Kubernetes version, node group sizes, instance types, disk sizes, logs, endpoint access, and optional KMS key for Kubernetes secrets.                 |
| Storage     | S3 bucket names or naming prefix, encryption settings, lifecycle rules, CORS needs, and whether bucket names should include the AWS account ID.                     |
| IAM         | Route53 zones for cert-manager and external-dns, Secrets Manager and SSM ARNs for External Secrets, KEDA scaler permissions, and application service account names. |
| OpenSearch  | Admin password delivery method, replica counts, disk sizes, resources, and optional warm tier.                                                                      |

## Prepare AWS access

Authenticate to the target AWS account and verify the identity:

```bash theme={null}
aws sts get-caller-identity
```

Use the same account and region for all OpenTofu stages unless PromptLayer gives you a different architecture.

## Prepare secrets

Create or select a Secrets Manager secret for RDS. The secret must contain the RDS master password and any database user passwords that the deployment package references.

Example shape:

```json theme={null}
{
  "rds-master-password": "<strong-password>",
  "promptlayer-api-password": "<strong-password>",
  "promptlayer-worker-password": "<strong-password>",
  "promptlayer-readonly-password": "<strong-password>",
  "promptlayer-usage-password": "<strong-password>"
}
```

The exact secret name and JSON keys must match `infra.tfvars` and `kubernetes.tfvars`.
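
If you need to generate the passwords, a minimal sketch along these lines can produce the JSON document. The key names are copied from the example shape above and are only placeholders for whatever your tfvars files reference. The helper name `build_rds_secret` is hypothetical. `secrets.token_urlsafe` draws from letters, digits, `-`, and `_`, which avoids characters such as `/`, `@`, and quotes that some database engines reject in passwords.

```python
import json
import secrets

# Keys copied from the example shape; yours must match infra.tfvars and kubernetes.tfvars.
SECRET_KEYS = [
    "rds-master-password",
    "promptlayer-api-password",
    "promptlayer-worker-password",
    "promptlayer-readonly-password",
    "promptlayer-usage-password",
]


def build_rds_secret(keys=SECRET_KEYS, nbytes: int = 32) -> str:
    """Return a JSON document with one independent random password per key."""
    return json.dumps({key: secrets.token_urlsafe(nbytes) for key in keys}, indent=2)


if __name__ == "__main__":
    # Print once and store the result in Secrets Manager; avoid writing it to disk.
    print(build_rds_secret())
```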

Set the OpenSearch admin password as an environment variable before running the OpenSearch stage:

```bash theme={null}
read -rsp "OpenSearch admin password: " TF_VAR_opensearch_initial_admin_password
echo
export TF_VAR_opensearch_initial_admin_password
```

Unset it when the OpenSearch apply is complete:

```bash theme={null}
unset TF_VAR_opensearch_initial_admin_password
```

## Prepare the deployment package

From the package root, create local tfvars files from the examples:

```bash theme={null}
cp environments/aws/infra/infra.tfvars.example environments/aws/infra/infra.tfvars
cp environments/aws/kubernetes/kubernetes.tfvars.example environments/aws/kubernetes/kubernetes.tfvars
cp environments/aws/opensearch/opensearch.tfvars.example environments/aws/opensearch/opensearch.tfvars
```

Replace every placeholder with customer-specific values. At minimum:

| File                | Update                                                                                                                                                                                                                             |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `infra.tfvars`      | `project_name`, `environment`, `region`, tags, remote state values, VPC settings, EKS settings, RDS settings, Valkey settings, S3 bucket settings, and IRSA settings.                                                              |
| `kubernetes.tfvars` | Remote state values, infra remote state key, storage class, cert-manager settings, Route53 settings, ingress settings, monitoring and logging settings, External Secrets settings, KEDA settings, and RDS user bootstrap settings. |
| `opensearch.tfvars` | Remote state values, AWS region, EKS cluster name, environment, tags, OpenSearch chart versions, replicas, disk sizes, resources, and namespace.                                                                                   |

<Warning>
  Do not leave placeholder values, example domains, local-only email addresses, or development environment names in tfvars before applying.
</Warning>

## Bootstrap OpenTofu state

Create a dedicated S3 bucket for OpenTofu state. The bootstrap script creates the bucket, enables versioning, blocks public access, enables SSE-S3 encryption, and writes the S3 backend config for all three AWS stages.

```bash theme={null}
chmod +x scripts/bootstrap-tf-state-bucket-aws.sh
./scripts/bootstrap-tf-state-bucket-aws.sh <aws-region> <state-bucket-prefix>
```

The bucket name is:

```text theme={null}
<state-bucket-prefix>-<aws-region>-<aws-account-id>
```

After the script runs, set the matching remote state values in each tfvars file:

| Stage          | `remote_state_s3_key`                 |
| -------------- | ------------------------------------- |
| Infrastructure | `aws/<aws-region>/infra.tfstate`      |
| Kubernetes     | `aws/<aws-region>/kubernetes.tfstate` |
| OpenSearch     | `aws/<aws-region>/opensearch.tfstate` |

OpenTofu uses native S3 locking. You do not need a DynamoDB lock table.
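
To double-check the values before editing the tfvars files, the naming rules above can be sketched as two small helpers. The function names and the sample prefix, region, and account ID are illustrative. The validation reflects the general S3 rule that bucket names are 3 to 63 characters of lowercase letters, digits, dots, and hyphens.

```python
import re

STAGES = ("infra", "kubernetes", "opensearch")

# General S3 naming rule: 3-63 chars, lowercase letters, digits, dots, hyphens,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def state_bucket_name(prefix: str, region: str, account_id: str) -> str:
    """Compose <state-bucket-prefix>-<aws-region>-<aws-account-id> and validate it."""
    name = f"{prefix}-{region}-{account_id}"
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


def remote_state_keys(region: str) -> dict:
    """Map each stage to its remote_state_s3_key value."""
    return {stage: f"aws/{region}/{stage}.tfstate" for stage in STAGES}


if __name__ == "__main__":
    print(state_bucket_name("acme-pl-state", "us-east-1", "123456789012"))
    for stage, key in remote_state_keys("us-east-1").items():
        print(f"{stage}: {key}")
```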

## Deploy infrastructure

The infrastructure stage creates the VPC, subnets, EKS cluster, node groups, RDS, ElastiCache Valkey, S3 buckets, security groups, and IAM roles for Kubernetes service accounts.

```bash theme={null}
cd environments/aws/infra
tofu init -upgrade -reconfigure
tofu plan -var-file=infra.tfvars -out=infra.tfplan
tofu apply infra.tfplan
```

After apply, capture the outputs. You will need the EKS cluster name, RDS endpoint, Valkey endpoint, S3 bucket names, and IAM role ARNs for verification and support.

```bash theme={null}
tofu output
```

Configure kubectl for the new cluster:

```bash theme={null}
aws eks update-kubeconfig \
  --region <aws-region> \
  --name <eks-cluster-name>
```

Verify the cluster:

```bash theme={null}
kubectl get nodes
```

## Deploy Kubernetes add-ons

The Kubernetes stage installs cluster add-ons such as cert-manager, ingress controllers, External Secrets, KEDA, monitoring, logging, and cluster autoscaling.

Run this stage in two passes so cert-manager custom resources are available before you create the issuer and wildcard certificate.

<Steps>
  <Step title="First pass: install CRDs and add-ons">
    In the existing `cert_manager` object in `kubernetes.tfvars`, keep `cluster_issuer.enabled` and `wildcard_certificate.enabled` set to `false`.

    Then apply:

    ```bash theme={null}
    cd ../kubernetes
    tofu init -upgrade -reconfigure
    tofu plan -var-file=kubernetes.tfvars -out=kubernetes-first.tfplan
    tofu apply kubernetes-first.tfplan
    ```
  </Step>

  <Step title="Second pass: enable certificates and TLS">
    In `kubernetes.tfvars`, set `cert_manager.cluster_issuer.enabled` and `cert_manager.wildcard_certificate.enabled` to `true`.

    For each ingress controller that should use the wildcard certificate, set `enable_default_tls_from_wildcard_certificate` and `enable_wildcard_tls_from_wildcard_certificate` to `true`.

    Apply again:

    ```bash theme={null}
    tofu plan -var-file=kubernetes.tfvars -out=kubernetes-second.tfplan
    tofu apply kubernetes-second.tfplan
    ```
  </Step>
</Steps>

Verify the add-ons:

```bash theme={null}
kubectl get pods -A
kubectl get ingressclass
helm list -A
```

## Deploy OpenSearch

Deploy OpenSearch after the EKS cluster and Kubernetes add-ons are ready.

Before applying:

1. Set `eks_cluster_name` in `opensearch.tfvars` to the cluster name from the infrastructure output.
2. Set `aws_region`, `environment`, `project_name`, and `default_tags`.
3. Confirm the OpenSearch node groups exist and use the labels and taints required by the deployment package.
4. Export `TF_VAR_opensearch_initial_admin_password`.

Then apply:

```bash theme={null}
cd ../opensearch
tofu init -upgrade -reconfigure
tofu plan -var-file=opensearch.tfvars -out=opensearch.tfplan
tofu apply opensearch.tfplan
```

Verify OpenSearch:

```bash theme={null}
kubectl get pods -n <opensearch-namespace>
kubectl get svc -n <opensearch-namespace>
```

## Install PromptLayer charts

Install the PromptLayer application charts after infrastructure, Kubernetes add-ons, and OpenSearch are ready.

Use the release names, namespaces, values files, and chart versions from your release manifest. Run Helm from the directory that contains the values files.

<Steps>
  <Step title="Log in to the registry">
    Use `--password-stdin` so the password is not passed as a command-line argument.

    ```bash theme={null}
    read -rsp "PromptLayer registry password: " PL_REGISTRY_PASSWORD
    echo
    printf '%s' "${PL_REGISTRY_PASSWORD}" | helm registry login hub.promptlayer.com \
      --username "<registry-username>" \
      --password-stdin
    unset PL_REGISTRY_PASSWORD
    ```
  </Step>

  <Step title="Install sandbox-runtimes">
    ```bash theme={null}
    helm install <sandbox-runtimes-release-name> oci://hub.promptlayer.com/promptlayer/sandbox-runtimes/sandbox-runtimes \
      -f <sandbox-runtimes-values-file> \
      --version <sandbox-runtimes-chart-version> \
      --namespace <sandbox-namespace> \
      --create-namespace
    ```
  </Step>

  <Step title="Install sandboxes-api">
    ```bash theme={null}
    helm install <sandboxes-api-release-name> oci://hub.promptlayer.com/promptlayer/sandboxes-api/sandboxes-api \
      -f <sandboxes-api-values-file> \
      --version <sandboxes-api-chart-version> \
      --namespace <sandbox-namespace> \
      --create-namespace
    ```
  </Step>

  <Step title="Install promptlayer">
    ```bash theme={null}
    helm install <promptlayer-release-name> oci://hub.promptlayer.com/promptlayer/promptlayer \
      -f <promptlayer-values-file> \
      --version <promptlayer-chart-version> \
      --namespace <promptlayer-namespace> \
      --create-namespace
    ```
  </Step>
</Steps>

If your values files reference Kubernetes image pull secrets, create those secrets before installing the charts. Use the names and namespaces from your release manifest.

## Verify PromptLayer

Check the Helm releases:

```bash theme={null}
helm list -A
```

Check application pods:

```bash theme={null}
kubectl get pods -n <sandbox-namespace>
kubectl get pods -n <promptlayer-namespace>
```

Check ingress and DNS:

```bash theme={null}
kubectl get ingress -A
kubectl get svc -A
```

Pods should reach `Running` or `Completed` status. Ingress hostnames should resolve through the DNS records created for the deployment.
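
For a scripted version of this check, the sketch below scans output in the shape of `kubectl get pods -o json` and lists pods whose phase is neither `Running` nor `Succeeded` (kubectl displays `Succeeded` as `Completed`). The function name and the sample pod names are hypothetical.

```python
# kubectl displays pods in the Succeeded phase as "Completed".
HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list:
    """Return (namespace/name, phase) for every pod outside the healthy phases."""
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod.get("metadata", {})
            problems.append((f"{meta.get('namespace')}/{meta.get('name')}", phase))
    return problems


# Hypothetical sample in the shape of `kubectl get pods -o json` output.
sample = {"items": [
    {"metadata": {"namespace": "promptlayer", "name": "api-0"}, "status": {"phase": "Running"}},
    {"metadata": {"namespace": "promptlayer", "name": "migrate-1"}, "status": {"phase": "Succeeded"}},
    {"metadata": {"namespace": "promptlayer", "name": "worker-0"}, "status": {"phase": "Pending"}},
]}

for name, phase in unhealthy_pods(sample):
    print(f"{name}: {phase}")
```

To run it against a live cluster, load the real output with `json.load(sys.stdin)` and pipe in `kubectl get pods -n <promptlayer-namespace> -o json`. The pod phase alone does not catch crash loops, since a restarting pod can still report `Running`, so also check restart counts and readiness.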

## Upgrade a release

For chart upgrades, use the chart version and values file from the release manifest:

```bash theme={null}
helm upgrade <release-name> oci://hub.promptlayer.com/promptlayer/<chart-path> \
  -f <values-file> \
  --version <chart-version> \
  --namespace <namespace>
```

Test chart upgrades in a staging environment before applying them to production.
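
If you script upgrades, one option is to build the command from a small record per release, as in the sketch below. The record fields (`release`, `chart_path`, `values_file`, `version`, `namespace`) and the sample values are hypothetical and do not describe the actual format of the release manifest. Only the Helm flags shown in the command above are used.

```python
import shlex

REGISTRY = "oci://hub.promptlayer.com/promptlayer"


def helm_upgrade_argv(entry: dict) -> list:
    """Build the `helm upgrade` argument list for one release record."""
    return [
        "helm", "upgrade", entry["release"], f"{REGISTRY}/{entry['chart_path']}",
        "-f", entry["values_file"],
        "--version", entry["version"],
        "--namespace", entry["namespace"],
    ]


if __name__ == "__main__":
    # Hypothetical record; copy the real values from your release manifest.
    example = {
        "release": "promptlayer",
        "chart_path": "promptlayer",
        "values_file": "promptlayer-values.yaml",
        "version": "1.2.3",
        "namespace": "promptlayer",
    }
    print(shlex.join(helm_upgrade_argv(example)))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues and stops the script on the first failed upgrade.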

## Troubleshooting

| Issue                             | What to check                                                                                                             |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `tofu init` cannot read state     | Confirm the generated `backend.tf` bucket, key, and region match the `remote_state_s3_*` values in the stage tfvars file. |
| OpenTofu state is locked          | Another apply may be running. Use `tofu force-unlock` only after confirming no other process is active.                   |
| AWS access denied                 | Confirm the AWS identity has access to the state bucket and to create or update the services used by the stage.           |
| EKS API connection fails          | Confirm the public API CIDR list includes the runner IP, or run from a network that can reach the private endpoint.       |
| Certificate does not become ready | Check Route53 zone ID, DNS zone names, cert-manager logs, and DNS propagation.                                            |
| Pods stay pending                 | Check node group sizes, taints, tolerations, storage class, and PVC events.                                               |
| Pods restart repeatedly           | Check pod logs, events, values files, image pull credentials, database endpoints, and secret names.                       |
| OpenSearch pods do not schedule   | Confirm the OpenSearch node groups, labels, taints, storage class, and admin password variable.                           |

If you need help with registry access, values files, or deployment issues, [contact our enterprise team](mailto:hello@promptlayer.com).


# Concentrate AI
Source: https://docs.promptlayer.com/features/concentrate-integration


[Concentrate AI](https://concentrate.ai) is an enterprise-grade LLM gateway and AI spend management platform, integrated with PromptLayer through a single OpenAI-compatible API. It helps engineering teams save on token costs through bulk-volume savings, improve reliability with automatic model fallbacks, and secure their stack with virtual keys, ZDR endpoints, full request/response logging, and real-time analytics across every provider, team, and project.

Concentrate provides access to a wide variety of models from major authors, including closed-source models like GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro, alongside premium open-source options from labs like DeepSeek, Qwen, and MiniMax.

## Setting Up Concentrate as a Custom Provider

To use Concentrate models in PromptLayer:

1. **Get a Concentrate API Key**: Sign up at [Concentrate AI](https://concentrate.ai) and obtain your API key from their dashboard
2. Navigate to **Settings → Custom Providers and Models** in your PromptLayer dashboard
3. Click **Create Custom Provider**
4. Configure the provider with the following details:
   * **Name**: Concentrate
   * **Client**: OpenAI (Concentrate uses OpenAI-compatible endpoints)
   * **Base URL**: `https://api.concentrate.ai/v1`
   * **API Key**: Your Concentrate API key

<Note>
  Concentrate exposes an OpenAI-compatible `/v1/responses` endpoint, which is why we select OpenAI as the client type.
</Note>
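
To sanity-check the base URL and key outside PromptLayer, a request to the Responses endpoint can be sketched with the standard library. This assumes Concentrate accepts the same `Authorization: Bearer` header and `model`/`input` body that OpenAI's Responses API uses, which the OpenAI-compatible client setting suggests but this page does not spell out. The helper name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.concentrate.ai/v1"  # same value as the provider's Base URL


def build_responses_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /responses endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=body,
        headers={
            # Assumption: Bearer auth, as in OpenAI's API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_responses_request(key, "gpt-5.5", "Hello"))` and parse the JSON response. A 401 usually points at the key, and a 404 at the base URL or model slug.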

## Creating Custom Models (Recommended)

For easier model selection in the Playground and Prompt Registry, you can save specific Concentrate models:

1. Navigate to the **Custom Providers and Models** page
2. Find the **Concentrate** row and click the three-dot menu on that row
3. Click **Add model**
4. Enter the model details:
   * **Model Name**: Paste the model slug copied from [Concentrate's models page](https://concentrate.ai/models) (e.g., `gpt-5.5`, `claude-opus-4-7`, `anthropic/claude-opus-4-7`)
   * **Display Name**: A friendly name like "GPT 5.5" or "Claude Opus 4.7"
5. Optionally, customize parameters on the next page
6. Repeat for each model you want to use

The full list of available models can be found on [Concentrate's Model Fortress page](https://concentrate.ai/models).

## Available Models

Concentrate provides access to a vast catalog of models. You can use canonical names for automatic routing, or provider-prefixed names to pin a specific provider. Example models include:

* **`gpt-5.5`**: OpenAI's GPT-5.5 model
* **`claude-opus-4-7`**: Anthropic's Claude Opus 4.7 model
* **`gemini-3-1-pro-preview`**: Google's Gemini 3.1 Pro Preview model
* **`anthropic/claude-opus-4-7`**: Claude Opus 4.7 pinned specifically to the Anthropic provider
* **`auto`**: Let Concentrate automatically route to the best model based on cost, performance, or latency

For the complete and up-to-date list of available models, visit [Concentrate's Model Fortress page](https://concentrate.ai/models).
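
The difference between canonical and provider-prefixed names can be sketched as a small parser. This assumes the first `/` separates the provider from the model, as in the examples above. The function name is hypothetical.

```python
def parse_model_slug(slug: str) -> tuple:
    """Split a slug into (provider, model); provider is None for canonical names."""
    provider, sep, model = slug.partition("/")
    if sep:
        return provider, model  # provider-prefixed: pinned to that provider
    return None, slug           # canonical name: routing is left to Concentrate


for slug in ("gpt-5.5", "anthropic/claude-opus-4-7", "auto"):
    provider, model = parse_model_slug(slug)
    print(f"{slug}: provider={provider}, model={model}")
```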

## Using Concentrate in PromptLayer

### In the Playground

After setup, you can use Concentrate models in the PromptLayer Playground:

1. Open the Playground
2. At the bottom of the screen (next to the tools and output controls), open the provider menu and select **Concentrate** as the LLM provider
3. Pick any model you've added to the Concentrate provider
4. Select the **Responses API** as the request format
5. Start querying with your prompts

<Tip>
  We recommend using the **Responses API** over Chat Completions whenever applicable — it provides better support for multi-turn interactions, tool use, and modern features. Fall back to Chat Completions only if a specific model or feature requires it.
</Tip>

### In the Prompt Registry

Concentrate models work seamlessly with PromptLayer's Prompt Registry:

* Select Concentrate models when creating or editing prompt templates
* Use templates with Concentrate models in evaluations
* Track and analyze Concentrate API usage alongside other providers

### Key Benefits

Concentrate provides:

* **Unified API**: One OpenAI-compatible endpoint across every major provider
* **Automatic failover**: Requests retry across backup providers when one is unavailable
* **Spend management**: Budget limits per API key, project, or org, with anomaly alerts
* **PII redaction and zero data retention**: Configurable per API key for sensitive workloads
* **Unified audit logs**: Consistent usage logs and analytics across every provider in one dashboard

## SDK Usage

Once you've set up your Concentrate custom provider and created a prompt template in the dashboard, you can run it programmatically with the PromptLayer SDK:

```python theme={null}
from promptlayer import PromptLayer

promptlayer = PromptLayer(api_key="pl_****")

# Run a prompt template that uses your Concentrate custom provider
response = promptlayer.run(
    prompt_name="your-concentrate-prompt",
    input_variables={"query": "your input"}
)

# Access the response
print(response["raw_response"].output_text)

# The request is automatically logged with request_id
print(f"Request ID: {response['request_id']}")
```

<Info>
  Using [`promptlayer.run()`](/sdks/python#using-the-run-method-recommended) ensures your requests are properly logged to PromptLayer and leverages your prompt templates from the Prompt Registry. This is the recommended approach for production use.
</Info>

## Frequently Asked Questions

### How do I use Concentrate AI with PromptLayer?

Concentrate AI integrates with PromptLayer as a Custom Provider using its OpenAI-compatible `/v1/responses` endpoint. Once configured under **Settings → Custom Providers and Models**, Concentrate models work natively in the PromptLayer Playground, Prompt Registry, and evaluations, and every request is logged through the PromptLayer SDK alongside your other providers.

### How is Concentrate different from OpenRouter?

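OpenRouter is a popular model marketplace. Concentrate AI is an LLM gateway aimed at enterprise teams: alongside a unified OpenAI-compatible API, it provides automatic failover, spend management, PII redaction with zero data retention, and unified audit logs.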

### How is Concentrate different from LiteLLM?

LiteLLM is an open-source developer library for routing LLM calls that requires self-hosting and infrastructure maintenance. Teams must create their own keys and commit spend to providers to get better rates and limits. Concentrate AI is a managed alternative to LiteLLM for production environments. It has no maintenance burden and includes provider routing, model fallbacks, ZDR endpoints, usage analytics, and one bill for all token usage out of the box.

### What is AI spend management?

AI spend management gives finance and engineering teams a shared, real-time view of API costs across providers, models, projects, and teams. Concentrate AI provides spend tracking, alerts, logging, and exportable reports so organizations can see and govern every dollar of AI spend.

## Related Documentation

* [Custom Providers](/features/custom-providers)
* [Supported Providers](/features/supported-providers)
* [Concentrate AI Documentation](https://concentrate.ai/docs)


# Configure Providers
Source: https://docs.promptlayer.com/features/custom-providers

Connect OpenAI-compatible providers and custom models to PromptLayer.

Custom providers let you connect to LLM providers beyond the built-in options, such as DeepSeek and Grok.

## Setting Up a Custom Provider

To add a custom provider to your workspace:

1. Navigate to **Settings → Custom Providers and Models**
2. Click the **Add Custom Provider** button
3. Configure the provider with the following details:

   * **Name**: A descriptive name for your provider (e.g., "DeepSeek")
   * **Client**: Select the appropriate client type for your provider's base URL
   * **Base URL**: The endpoint URL for your custom provider
   * **API Key**: The API key for your custom provider

<img alt="Custom Provider Modal" />

## Creating Custom Models

Once your provider is configured, you can define models for it:

1. In **Settings → Custom Providers and Models**, click on your custom provider row to expand it
2. Click **Create Custom Model**
3. Fill in the model configuration:

   * **Provider**: Select the custom provider you created earlier
   * **Model Name**: Choose from known models or enter a custom identifier
   * **Display Name**: A friendly name that appears in the prompt playground
   * **Model Type**: Specify whether this is a Chat or Completion model

<img alt="Custom Provider New Model" />

## Using Custom Models

After setup, your custom models seamlessly integrate with PromptLayer's features. You can:

* Select them in the Playground alongside standard models
* Use them in the Prompt Editor for template creation
* Track requests and analyze performance just like any other model

<img alt="Custom Provider Use" />

Custom providers give you complete control over your model infrastructure while maintaining all the benefits of PromptLayer's prompt management and observability features.

## Example Integrations

Looking for specific integration guides? See our detailed setup instructions for [OpenRouter](/features/openrouter-integration), [Exa](/features/exa-integration), and [xAI (Grok)](/features/xai-integration).

Follow the steps above to configure any OpenAI-compatible provider as a custom provider in PromptLayer.


# Getting Started
Source: https://docs.promptlayer.com/features/evaluations/building-pipelines


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

<iframe title="YouTube video player" />

The overall process of building an evaluation pipeline looks like this:

1. **Select Your Dataset**: Choose or upload datasets to serve as the basis for your evaluations, whether for scoring, regression testing, or bulk job processing.
2. **Build Your Pipeline**: Start by visually constructing your evaluation pipeline, defining each step from input data processing to final evaluation.
3. **Run Evaluations**: Execute your pipeline, observe the results in a spreadsheet-like interface, and make informed decisions based on comprehensive metrics and scores.

## Creating a Pipeline

1. **Initiate a Batch Run**: Start by creating a new batch run, which requires specifying a name and selecting a dataset.
2. **Dataset Selection**: Upload a CSV/JSON dataset, or create a dataset from historical data using filters like time range, prompt template logs, scores, and metadata. [Learn more here.](/features/evaluations/datasets-overview)

You now have a pipeline. Preview mode lets you iterate with live feedback and adjust steps in real time.

## Setting up the Pipeline

### Adding Steps

Click 'Add Step' to start building your pipeline, with each column representing a step in the evaluation process.

Steps execute in order from left to right. If a column depends on another column, place it to the right of that dependency.
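
The ordering rule can be checked locally before a run. The sketch below is a hypothetical helper, not part of the PromptLayer SDK. It assumes each step is a dict with a `name` and an illustrative `depends_on` list, and that dataset columns are always available to every step.

```python
def find_order_violations(steps, dataset_columns):
    """Return (step, dependency) pairs where the dependency is not available yet."""
    available = set(dataset_columns)  # dataset columns can be used by any step
    violations = []
    for step in steps:  # steps are listed left to right
        for dep in step.get("depends_on", []):
            if dep not in available:
                violations.append((step["name"], dep))
        available.add(step["name"])
    return violations


# "Diff" is placed to the left of "Answer", which it depends on.
steps = [
    {"name": "Diff", "depends_on": ["Answer", "expected_answer"]},
    {"name": "Answer", "depends_on": ["question"]},
]
print(find_order_violations(steps, ["question", "expected_answer"]))
# [('Diff', 'Answer')]
```

Swapping the two steps so that `Answer` comes first returns an empty list.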

#### Common Step Types

* **Prompt Template**: Select a prompt template from the registry, set model parameters, LLM, arguments, and template version.
* **Custom API Endpoint**: Define a URL to send and receive data, suitable for custom evaluators or external systems.
* **Human Input**: Engage human graders by adding a step that allows for textual input.
* **String Comparison**: Use this step to compare the outputs of two previous steps, showing a visual diff when relevant.
* **LLM Assertion**: Use an AI judge to score whether an output satisfies a natural-language criterion.

<Frame>
  <img alt="Eval pipeline setup" />
</Frame>

For model comparison, add multiple **Prompt Template** columns that use the same prompt with different model overrides. See [Compare Models](/onboarding-guides/compare-models).

#### Scoring

If the last step of your evaluation pipeline contains only boolean values or only numeric values, that step is used as the score for each row. Your full evaluation report includes a scorecard showing the average of this last step.

*NOTE: All cells in the last column must be boolean or all must be numeric. If any cell deviates, the score will not be calculated.*
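
As a rough illustration of the scoring rule, the sketch below averages a last column locally. It is not PromptLayer's implementation. It assumes booleans are averaged as the fraction of `True` values and that a mixed column yields no score (`None`); the function name is hypothetical.

```python
from numbers import Number


def last_column_average(cells):
    """Average the last column, or return None if the cells are of mixed type."""
    if not cells:
        return None
    if all(isinstance(v, bool) for v in cells):
        return sum(1 for v in cells if v) / len(cells)
    # bool is a subclass of int in Python, so exclude it explicitly here
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in cells):
        return sum(cells) / len(cells)
    return None  # mixed types: no score is calculated


print(last_column_average([True, False, True, True]))  # 0.75
print(last_column_average([4, 5, 3]))                  # 4.0
print(last_column_average([True, 3, "n/a"]))           # None
```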

## Executing Full Batch Runs

Transition from pipeline to full batch run to apply your pipeline across the entire dataset for comprehensive evaluation.


# Node & Column Types
Source: https://docs.promptlayer.com/features/evaluations/column-types

Complete reference for all node types used in Workflows and evaluation pipelines

<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

This page documents all available node types for Workflows and column types for evaluation pipelines. Workflows and evaluations share the same node types—each has specific configuration options that determine its behavior.

<Note>
  In Workflows, these are called **nodes**. In evaluation pipelines, they're called
  **columns**. The configuration is identical.
</Note>

## How Column Sources Work

Columns can reference data from two places:

1. **Dataset columns** - Reference data directly from your dataset by using the dataset column name
2. **Other evaluation columns** - Reference the output of a previous column by using that column's `name`

When you specify a `source` or include a column name in `sources`, the system first looks for an evaluation column with that name, then falls back to looking for a dataset column.

<Info>
  Columns are executed in order based on their `position`. A column can only
  reference other columns that come before it in the pipeline.
</Info>
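
The lookup order described above can be sketched as a small function. This is an illustrative model only: `resolve_source` and its arguments are hypothetical names, and the error raised for an unknown name is an assumption rather than documented behavior.

```python
def resolve_source(name, evaluation_outputs, dataset_row):
    """Look up `name` among earlier evaluation columns first, then the dataset row."""
    if name in evaluation_outputs:
        return evaluation_outputs[name]
    if name in dataset_row:
        return dataset_row[name]
    raise KeyError(f"No evaluation or dataset column named {name!r}")


dataset_row = {"user_question": "Is the order shipped?", "status": "from dataset"}
evaluation_outputs = {"status": "from earlier column"}

# Both places define "status": the evaluation column wins.
print(resolve_source("status", evaluation_outputs, dataset_row))
# Only the dataset defines "user_question": the lookup falls back to it.
print(resolve_source("user_question", evaluation_outputs, dataset_row))
```

One consequence of this order is that an evaluation column sharing a name with a dataset column shadows it, so distinct names avoid surprises.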

### Example: Chaining Columns Together

A common pattern is to chain columns: run a prompt, extract a field from the JSON output, then compare it to a ground truth value from the dataset.

```python theme={null}
columns = [
    # Step 1: Run the prompt template (position 1)
    {
        "column_type": "PROMPT_TEMPLATE",
        "name": "LLM Output",  # Other columns reference this name
        "configuration": {
            "template": {"name": "my-prompt"},
            "prompt_template_variable_mappings": {
                "question": "user_question"  # Maps to dataset column
            }
        }
    },
    # Step 2: Extract a field from the LLM output (position 2)
    {
        "column_type": "JSON_PATH",
        "name": "Extracted Status",
        "configuration": {
            "source": "LLM Output",  # References the column above
            "json_path": "$.status"
        }
    },
    # Step 3: Compare extracted value to dataset ground truth (position 3)
    {
        "column_type": "COMPARE",
        "name": "Status Match",
        "configuration": {
            "sources": [
                "Extracted Status",   # References the column above
                "expected_status"     # References a dataset column
            ],
            "comparison_type": {"type": "STRING"}
        },
        "is_part_of_score": True
    }
]
```
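
To make the data flow concrete, here is a local dry run of the same three steps for a single row. The LLM output is stubbed, the JSONPath handling covers only simple `$.key` paths, and treating a `STRING` comparison as exact equality is an assumption. None of these helpers are part of PromptLayer.

```python
import json


def extract_top_level(json_text, json_path):
    """Handle only simple paths of the form `$.key`; real JSONPath is richer."""
    if not json_path.startswith("$."):
        raise ValueError("this sketch only supports `$.key` paths")
    return json.loads(json_text)[json_path[2:]]


row = {"user_question": "Where is my order?", "expected_status": "shipped"}

# Step 1 (stubbed): pretend the prompt template returned this JSON string.
llm_output = '{"status": "shipped", "eta_days": 2}'

# Step 2: JSON_PATH with "$.status" applied to the "LLM Output" column.
extracted_status = extract_top_level(llm_output, "$.status")

# Step 3: COMPARE the extracted value against the dataset's ground truth.
status_match = extracted_status == row["expected_status"]

print(extracted_status, status_match)  # shipped True
```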

## Execution Types

These columns execute prompts, code, or external services.

<AccordionGroup>
  <Accordion title="Prompt Template">
    Runs a prompt template against each row. You can reference a template from the Prompt Registry or define one inline.

    **Registry Reference (using `template`)**

    | Field                               | Type    | Required | Description                                           |
    | ----------------------------------- | ------- | -------- | ----------------------------------------------------- |
    | `template.name`                     | string  | Yes      | Name of the prompt template                           |
    | `template.version_number`           | integer | No       | Specific version number. Uses latest if omitted       |
    | `template.label`                    | string  | No       | Release label to use, e.g. "production"               |
    | `prompt_template_variable_mappings` | object  | Yes      | Maps template input variables to dataset/column names |
    | `engine`                            | object  | No       | Override the template's default model settings        |
    | `engine.provider`                   | string  | No       | Provider name, e.g. "openai", "anthropic"             |
    | `engine.model`                      | string  | No       | Model name, e.g. "gpt-4", "claude-3-opus"             |
    | `engine.parameters`                 | object  | No       | Model parameters like temperature, max\_tokens        |

    <Info>
      The `prompt_template_variable_mappings` object maps **prompt input variables** (keys) to **dataset or column names** (values). The key is the variable name in your prompt template (e.g., `{{question}}`), and the value is where to get the data from.
    </Info>

    ```json theme={null}
    {
      "column_type": "PROMPT_TEMPLATE",
      "name": "Generate Response",
      "configuration": {
        "template": {
          "name": "my-prompt",
          "label": "production"
        },
        "prompt_template_variable_mappings": {
          "question": "user_question",
          "context": "retrieved_context"
        }
      }
    }
    ```

    **Complete example with all input variables:**

    If your prompt template has variables `{{company}}`, `{{product}}`, and `{{query}}`, map each one:

    ```json theme={null}
    {
      "column_type": "PROMPT_TEMPLATE",
      "name": "Product Analysis",
      "configuration": {
        "template": {
          "name": "product-analyzer"
        },
        "prompt_template_variable_mappings": {
          "company": "company_name",
          "product": "product_name",
          "query": "user_query"
        }
      }
    }
    ```

    **Inline Template (using `inline_template`)**

    Define a prompt template directly in the configuration without saving it to the registry. This is useful for quick experimentation or one-off evaluations.

    | Field                                   | Type    | Required | Description                                           |
    | --------------------------------------- | ------- | -------- | ----------------------------------------------------- |
    | `inline_template.inline`                | boolean | Yes      | Must be `true`                                        |
    | `inline_template.prompt_template`       | object  | Yes      | The template content (chat or completion format)      |
    | `inline_template.metadata`              | object  | No       | Model configuration (provider, name, parameters)      |
    | `inline_template.source_prompt_name`    | string  | No       | Name of the registry prompt this was derived from     |
    | `inline_template.source_prompt_version` | integer | No       | Version number of the source prompt                   |
    | `prompt_template_variable_mappings`     | object  | Yes      | Maps template input variables to dataset/column names |

    ```json theme={null}
    {
      "column_type": "PROMPT_TEMPLATE",
      "name": "Generate Response",
      "configuration": {
        "inline_template": {
          "inline": true,
          "prompt_template": {
            "type": "chat",
            "messages": [
              {
                "role": "system",
                "content": [{"type": "text", "text": "You are a helpful assistant."}]
              },
              {
                "role": "user",
                "content": [{"type": "text", "text": "Answer: {question}"}]
              }
            ]
          },
          "metadata": {
            "model": {
              "provider": "openai",
              "name": "gpt-4",
              "parameters": {"temperature": 0.7}
            }
          }
        },
        "prompt_template_variable_mappings": {
          "question": "user_question"
        }
      }
    }
    ```

    <Warning>
      You must provide exactly one of `template` or `inline_template`. They are mutually exclusive.
    </Warning>
  </Accordion>

  <Accordion title="Code Execution">
    Executes custom Python or JavaScript code. The code receives a `data` dictionary containing all column values for the current row.

    | Field      | Type   | Required | Description              |
    | ---------- | ------ | -------- | ------------------------ |
    | `code`     | string | Yes      | The code to execute      |
    | `language` | string | Yes      | "PYTHON" or "JAVASCRIPT" |

    ```json theme={null}
    {
      "column_type": "CODE_EXECUTION",
      "name": "Custom Logic",
      "configuration": {
        "code": "result = len(data['response'].split())\nreturn result",
        "language": "PYTHON"
      }
    }
    ```
  </Accordion>

  <Accordion title="Endpoint">
    Calls an external HTTP endpoint. The request body contains all column values for the current row.

    | Field     | Type   | Required | Description             |
    | --------- | ------ | -------- | ----------------------- |
    | `url`     | string | Yes      | The HTTP endpoint URL   |
    | `headers` | object | No       | HTTP headers to include |

    ```json theme={null}
    {
      "column_type": "ENDPOINT",
      "name": "External Validator",
      "configuration": {
        "url": "https://api.example.com/validate",
        "headers": {
          "Authorization": "Bearer token123"
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Workflow">
    Runs a PromptLayer workflow.

    | Field                     | Type    | Required | Description                              |
    | ------------------------- | ------- | -------- | ---------------------------------------- |
    | `workflow_id`             | integer | Yes      | ID of the workflow to run                |
    | `workflow_version_number` | integer | No       | Specific version. Uses latest if omitted |
    | `workflow_label`          | string  | No       | Release label to use                     |
    | `input_mappings`          | object  | Yes      | Maps workflow inputs to column names     |

    ```json theme={null}
    {
      "column_type": "WORKFLOW",
      "name": "Run Analysis Workflow",
      "configuration": {
        "workflow_id": 123,
        "input_mappings": {
          "input_text": "response"
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="MCP">
    Executes an MCP (Model Context Protocol) action.

    | Field            | Type    | Required | Description                      |
    | ---------------- | ------- | -------- | -------------------------------- |
    | `mcp_server_id`  | integer | Yes      | ID of the MCP server             |
    | `tool_name`      | string  | Yes      | Name of the tool to call         |
    | `input_mappings` | object  | Yes      | Maps tool inputs to column names |

    ```json theme={null}
    {
      "column_type": "MCP",
      "name": "MCP Tool Call",
      "configuration": {
        "mcp_server_id": 456,
        "tool_name": "search",
        "input_mappings": {
          "query": "search_query"
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Human">
    Adds a column for manual human evaluation.

    | Field        | Type   | Required | Description                    |
    | ------------ | ------ | -------- | ------------------------------ |
    | `data_type`  | string | Yes      | "number" or "string"           |
    | `ui_element` | object | Yes      | UI configuration for the input |

    ```json theme={null}
    {
      "column_type": "HUMAN",
      "name": "Human Rating",
      "configuration": {
        "data_type": "number",
        "ui_element": {
          "type": "slider",
          "min": 1,
          "max": 5
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Conversation Simulator">
    Simulates multi-turn conversations to test chatbots and conversational agents. An AI-powered user persona engages in realistic dialogue with your prompt template, allowing you to evaluate how well your agent handles extended interactions.

    | Field                                  | Type    | Required    | Description                                                                                                                                                            |
    | -------------------------------------- | ------- | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `template.name`                        | string  | Yes         | Name of the prompt template to test                                                                                                                                    |
    | `template.version_number`              | integer | No          | Specific version number                                                                                                                                                |
    | `template.label`                       | string  | No          | Release label to use                                                                                                                                                   |
    | `prompt_template_variable_mappings`    | object  | Yes         | Maps template input variables to dataset columns                                                                                                                       |
    | `user_persona`                         | string  | Conditional | Static persona description. Required if `user_persona_source` not set                                                                                                  |
    | `user_persona_source`                  | string  | Conditional | Column name containing the persona. Required if `user_persona` not set                                                                                                 |
    | `conversation_completed_prompt`        | string  | No          | Guidance for when to consider the conversation complete (e.g., "End when the user confirms their order" or "Complete when the assistant calls the submit\_order tool") |
    | `conversation_completed_prompt_source` | string  | No          | Column name containing the completion guidance. Use instead of `conversation_completed_prompt` for dynamic guidance                                                    |
    | `is_user_first`                        | boolean | No          | If true, simulated user sends the first message (default: false)                                                                                                       |
    | `max_turns`                            | integer | No          | Maximum conversation turns (default: system setting, max: 150)                                                                                                         |
    | `conversation_samples`                 | array   | No          | Example conversations to guide the simulation style                                                                                                                    |

    <Info>
      The `user_persona` defines how the simulated user behaves - their goals, communication style, and what questions they ask. Use `user_persona_source` to pull different personas from your dataset for varied test scenarios.
    </Info>

    <Info>
      The `conversation_completed_prompt` provides explicit guidance for determining when a conversation should end. This is useful for defining specific end conditions like tool calls, confirmation messages, or goal achievement. The guidance can be holistic (general rules) or specific (look for a certain phrase or tool call).
    </Info>

    **Basic example with static persona:**

    ```json theme={null}
    {
      "column_type": "CONVERSATION_SIMULATOR",
      "name": "Support Chat Test",
      "configuration": {
        "template": {
          "name": "customer-support-bot"
        },
        "prompt_template_variable_mappings": {
          "customer_name": "customer_name",
          "product": "product_name"
        },
        "user_persona": "You are a frustrated customer who purchased a defective product. You want a refund but will accept a replacement if the agent is helpful. Ask follow-up questions and push back on unhelpful responses.",
        "is_user_first": true,
        "max_turns": 5
      }
    }
    ```

    **Dynamic personas from dataset:**

    For comprehensive testing, store different user personas in your dataset to test various scenarios:

    ```json theme={null}
    {
      "column_type": "CONVERSATION_SIMULATOR",
      "name": "Multi-Scenario Test",
      "configuration": {
        "template": {
          "name": "sales-assistant"
        },
        "prompt_template_variable_mappings": {
          "rep_name": "rep_name",
          "product": "product_name",
          "customer_context": "customer_context"
        },
        "user_persona_source": "test_persona",
        "is_user_first": true,
        "max_turns": 6
      }
    }
    ```

    Where your dataset has a `test_persona` column with different personas:

    * Row 1: "You are a busy executive who needs quick answers. Be impatient if responses are too long."
    * Row 2: "You are a technical user who asks detailed follow-up questions about implementation."
    * Row 3: "You are price-sensitive and keep asking about discounts and alternatives."

    **Custom completion conditions:**

    Use `conversation_completed_prompt` to define specific end conditions for your conversations:

    ```json theme={null}
    {
      "column_type": "CONVERSATION_SIMULATOR",
      "name": "Order Flow Test",
      "configuration": {
        "template": {
          "name": "order-assistant"
        },
        "prompt_template_variable_mappings": {
          "customer_name": "customer_name",
          "order_items": "items"
        },
        "user_persona": "You are a customer placing an order. Provide your shipping address and payment method when asked.",
        "conversation_completed_prompt": "The conversation is complete when the assistant calls the submit_order tool or confirms that the order has been placed successfully.",
        "max_turns": 10
      }
    }
    ```

    You can also use `conversation_completed_prompt_source` to pull completion guidance from your dataset:

    ```json theme={null}
    {
      "column_type": "CONVERSATION_SIMULATOR",
      "name": "Goal-Based Test",
      "configuration": {
        "template": {
          "name": "support-agent"
        },
        "prompt_template_variable_mappings": {
          "context": "support_context"
        },
        "user_persona_source": "test_persona",
        "conversation_completed_prompt_source": "completion_condition",
        "max_turns": 8
      }
    }
    ```

    Where your dataset has a `completion_condition` column with different end conditions:

    * Row 1: "End when the user says 'thank you' or indicates satisfaction"
    * Row 2: "Complete when the assistant provides a ticket number"
    * Row 3: "End when the refund\_process tool is called"

    **Evaluating conversation quality:**

    Chain with `LLM_ASSERTION` to evaluate the full conversation:

    ```json theme={null}
    {
      "column_type": "LLM_ASSERTION",
      "name": "Conversation Quality",
      "configuration": {
        "source": "Support Chat Test",
        "prompt": "Did the agent maintain a professional tone throughout and successfully resolve the customer's issue?"
      },
      "is_part_of_score": true
    }
    ```
  </Accordion>
</AccordionGroup>
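
The Code Execution example above ends with a top-level `return`, which is only valid inside a function. One way to try such a snippet locally is to wrap it in a function that takes `data`. The harness below is a hypothetical testing aid under that assumption and says nothing about how PromptLayer actually runs the code.

```python
import textwrap


def run_column_code(code, data):
    """Wrap the snippet in a function so a top-level `return` is valid, then call it."""
    source = "def _column(data):\n" + textwrap.indent(code, "    ") + "\n"
    namespace = {}
    exec(source, namespace)  # only run code you trust
    return namespace["_column"](data)


# Same code string as the Code Execution example, applied to one sample row.
code = "result = len(data['response'].split())\nreturn result"
row = {"response": "The order shipped on Monday"}
print(run_column_code(code, row))  # 5
```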

## Loop Types

These nodes enable iterating over collections or executing repeated operations within Agents.

<AccordionGroup>
  <Accordion title="For Loop">
    Iterates over a collection of items or runs a fixed number of times, executing a prompt template or sub-workflow on each iteration.

    | Field                | Type    | Required    | Description                                                                 |
    | -------------------- | ------- | ----------- | --------------------------------------------------------------------------- |
    | `loop_type`          | string  | Yes         | `"prompt"` or `"workflow"`                                                  |
    | `prompt_config`      | object  | Conditional | Configuration for prompt execution (required if `loop_type` = "prompt")     |
    | `workflow_config`    | object  | Conditional | Configuration for workflow execution (required if `loop_type` = "workflow") |
    | `iterator_source`    | string  | Conditional | Source node providing the collection to iterate over                        |
    | `max_iterations`     | integer | Conditional | Fixed number of iterations (mutually exclusive with `iterator_source`)      |
    | `return_all_outputs` | boolean | No          | Return all outputs from each iteration (default: false)                     |
    | `variable_mappings`  | object  | No          | Maps template variables to source nodes or special loop variables           |

    **Special loop variables for `variable_mappings`:**

    * `loop_index` - Current iteration index (0-based)
    * `previous_outputs` - Array of all outputs from previous iterations
    * `_iterator_item` - Current item from the iterated collection

    <Warning>
      Exactly **one** of `iterator_source` or `max_iterations` must be provided.
    </Warning>

    ```json theme={null}
    {
      "node_type": "FOR_LOOP",
      "name": "process_items",
      "is_output_node": true,
      "dependencies": ["items"],
      "configuration": {
        "loop_type": "prompt",
        "iterator_source": "items",
        "prompt_config": {
          "template": {
            "name": "item-processor",
            "label": "production"
          },
          "prompt_template_variable_mappings": {
            "item": "_iterator_item",
            "index": "loop_index"
          }
        },
        "variable_mappings": {
          "item": "_iterator_item",
          "index": "loop_index"
        }
      }
    }
    ```

    **Output structure:**

    ```json theme={null}
    {
      "iterations": 5,
      "outputs": ["output1", "output2", "output3", "output4", "output5"],
      "final_output": "output5"
    }
    ```
  </Accordion>

  <Accordion title="While Loop">
    Executes repeatedly until an end condition is met or maximum iterations are reached.

    | Field                     | Type    | Required    | Description                                                                 |
    | ------------------------- | ------- | ----------- | --------------------------------------------------------------------------- |
    | `loop_type`               | string  | Yes         | `"prompt"` or `"workflow"`                                                  |
    | `prompt_config`           | object  | Conditional | Configuration for prompt execution (required if `loop_type` = "prompt")     |
    | `workflow_config`         | object  | Conditional | Configuration for workflow execution (required if `loop_type` = "workflow") |
    | `end_condition_json_path` | string  | No          | JSONPath expression to evaluate termination (loop stops when truthy)        |
    | `max_iterations`          | integer | No          | Maximum iterations (defaults to system limit)                               |
    | `return_all_outputs`      | boolean | No          | Return all outputs (default: false)                                         |
    | `variable_mappings`       | object  | No          | Maps template variables to source nodes or special variables                |

    <Info>
      **Termination behavior:**

      * If `end_condition_json_path` is set: Loop ends when JSONPath extracts a truthy value
      * If not set: Loop ends when output is falsy (empty, null, false)
    </Info>

    ```json theme={null}
    {
      "node_type": "WHILE_LOOP",
      "name": "refine_loop",
      "is_output_node": true,
      "dependencies": ["initial_draft"],
      "configuration": {
        "loop_type": "prompt",
        "max_iterations": 5,
        "end_condition_json_path": "$.is_complete",
        "prompt_config": {
          "template": {
            "name": "text-refiner",
            "label": "production"
          },
          "prompt_template_variable_mappings": {
            "text": "initial_draft",
            "previous_results": "previous_outputs"
          }
        },
        "variable_mappings": {
          "text": "initial_draft",
          "previous_results": "previous_outputs",
          "iteration": "loop_index"
        }
      }
    }
    ```
  </Accordion>
</AccordionGroup>
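
The loop semantics above can be modeled in a few lines. This is a simplified local simulation, not the platform's implementation: `step` stands in for the prompt or workflow being executed, `end_condition_key` is a stand-in for `end_condition_json_path` that only reads a top-level key, and `return_all_outputs` is ignored.

```python
def run_for_loop(items, step):
    """Call `step` once per item, passing the special loop variables."""
    outputs = []
    for loop_index, item in enumerate(items):
        # item ~ _iterator_item, list(outputs) ~ previous_outputs
        outputs.append(step(item, loop_index, list(outputs)))
    return {
        "iterations": len(outputs),
        "outputs": outputs,
        "final_output": outputs[-1] if outputs else None,
    }


def run_while_loop(step, max_iterations, end_condition_key=None):
    """Repeat `step` until the end condition holds or `max_iterations` is reached."""
    outputs = []
    for loop_index in range(max_iterations):
        output = step(loop_index, list(outputs))
        outputs.append(output)
        if end_condition_key is not None:
            # End condition configured: stop when it extracts a truthy value.
            if isinstance(output, dict) and output.get(end_condition_key):
                break
        elif not output:
            # No end condition configured: stop on a falsy output.
            break
    return outputs


print(run_for_loop(["a", "b"], lambda item, i, prev: f"{i}:{item}"))
# {'iterations': 2, 'outputs': ['0:a', '1:b'], 'final_output': '1:b'}
print(len(run_while_loop(lambda i, prev: {"is_complete": i >= 2}, 5, "is_complete")))
# 3
```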

## Evaluation Types

These columns evaluate or compare data and typically return boolean or numeric scores.

<AccordionGroup>
  <Accordion title="LLM Assertion">
    Uses an LLM to evaluate content against a natural language prompt. Returns a boolean indicating pass/fail.

    | Field           | Type   | Required    | Description                                                     |
    | --------------- | ------ | ----------- | --------------------------------------------------------------- |
    | `source`        | string | Yes         | Column name containing the content to evaluate                  |
    | `prompt`        | string | Conditional | The assertion prompt. Required if `prompt_source` not set       |
    | `prompt_source` | string | Conditional | Column name containing the prompt. Required if `prompt` not set |

    **Basic example with static prompt:**

    ```json theme={null}
    {
      "column_type": "LLM_ASSERTION",
      "name": "Quality Check",
      "configuration": {
        "source": "response",
        "prompt": "Is this response helpful, accurate, and free of harmful content?"
      },
      "is_part_of_score": true
    }
    ```

    **Dynamic prompts from dataset:**

    Use `prompt_source` to pull assertion prompts from a dataset column. This lets you define different assertions per row.

    ```json theme={null}
    {
      "column_type": "LLM_ASSERTION",
      "name": "Custom Assertions",
      "configuration": {
        "source": "LLM Output",
        "prompt_source": "assertions"
      },
      "is_part_of_score": true
    }
    ```

    Where your dataset has an `assertions` column containing the prompt text for each row.

    **Multiple assertions per row:**

    You can run multiple assertions against the same content by providing a JSON array of prompts. Each assertion is evaluated independently, and the results are returned as a dictionary.

    ```json theme={null}
    {
      "column_type": "LLM_ASSERTION",
      "name": "Compliance Checks",
      "configuration": {
        "source": "LLM Output",
        "prompt_source": "llm_assertions"
      },
      "is_part_of_score": true
    }
    ```

    Where your dataset's `llm_assertions` column contains a JSON array:

    ```json theme={null}
    "[\"Does the response avoid making unauthorized claims?\", \"Is patient data properly redacted?\", \"Does it cite approved sources only?\"]"
    ```

    The output will be a dictionary with each assertion as a key and its boolean result as the value.
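
    As a rough Python sketch of that shape: the cell holds a JSON-encoded array of prompts, and the result maps each prompt to a boolean. `run_assertions` and the `judge` callable are hypothetical stand-ins for the LLM evaluation, not part of any PromptLayer SDK.

    ```python
    import json


    def run_assertions(cell_value, content, judge):
        """Evaluate every prompt in a JSON-array cell independently."""
        prompts = json.loads(cell_value)
        return {prompt: bool(judge(prompt, content)) for prompt in prompts}


    cell = json.dumps([
        "Does the response avoid making unauthorized claims?",
        "Is patient data properly redacted?",
    ])

    # Toy judge for illustration only: passes when the content mentions "redacted".
    result = run_assertions(
        cell,
        "All names were redacted.",
        lambda prompt, content: "redacted" in content,
    )
    ```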
  </Accordion>

  <Accordion title="Compare">
    Compares two values for equality. Supports string comparison and JSON comparison with optional JSONPath.

    | Field                       | Type   | Required | Description                                              |
    | --------------------------- | ------ | -------- | -------------------------------------------------------- |
    | `sources`                   | array  | Yes      | Array of exactly 2 column names to compare               |
    | `comparison_type.type`      | string | Yes      | "STRING" or "JSON"                                       |
    | `comparison_type.json_path` | string | No       | JSONPath to extract before comparing. Only for JSON type |

    ```json theme={null}
    {
      "column_type": "COMPARE",
      "name": "Accuracy",
      "configuration": {
        "sources": ["predicted_value", "ground_truth"],
        "comparison_type": {"type": "STRING"}
      },
      "is_part_of_score": true
    }
    ```

    With JSON path:

    ```json theme={null}
    {
      "column_type": "COMPARE",
      "name": "JSON Field Match",
      "configuration": {
        "sources": ["api_response", "expected_response"],
        "comparison_type": {
          "type": "JSON",
          "json_path": "$.result.status"
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Contains">
    Checks if a value contains a substring (case-insensitive).

    | Field          | Type   | Required    | Description                                                       |
    | -------------- | ------ | ----------- | ----------------------------------------------------------------- |
    | `source`       | string | Yes         | Column name to search in                                          |
    | `value`        | string | Conditional | Static substring to find. Required if `value_source` not set      |
    | `value_source` | string | Conditional | Column name containing the substring. Required if `value` not set |

    ```json theme={null}
    {
      "column_type": "CONTAINS",
      "name": "Has Keyword",
      "configuration": {
        "source": "response",
        "value": "thank you"
      }
    }
    ```
  </Accordion>

  <Accordion title="Regex">
    Tests if content matches a regular expression pattern. Returns boolean.

    | Field           | Type   | Required | Description                |
    | --------------- | ------ | -------- | -------------------------- |
    | `source`        | string | Yes      | Column name to test        |
    | `regex_pattern` | string | Yes      | Regular expression pattern |

    ```json theme={null}
    {
      "column_type": "REGEX",
      "name": "Valid Email Format",
      "configuration": {
        "source": "email_field",
        "regex_pattern": "^[\\w.-]+@[\\w.-]+\\.\\w+$"
      }
    }
    ```
  </Accordion>

  <Accordion title="Cosine Similarity">
    Calculates semantic similarity between two texts using embeddings. Returns a float between 0 and 1.

    | Field     | Type  | Required | Description                                |
    | --------- | ----- | -------- | ------------------------------------------ |
    | `sources` | array | Yes      | Array of exactly 2 column names to compare |

    ```json theme={null}
    {
      "column_type": "COSINE_SIMILARITY",
      "name": "Semantic Similarity",
      "configuration": {
        "sources": ["generated_response", "reference_response"]
      },
      "is_part_of_score": true
    }
    ```
  </Accordion>

  <Accordion title="Absolute Numeric Distance">
    Calculates the absolute difference between two numeric values.

    | Field     | Type  | Required | Description                                        |
    | --------- | ----- | -------- | -------------------------------------------------- |
    | `sources` | array | Yes      | Array of exactly 2 column names containing numbers |

    ```json theme={null}
    {
      "column_type": "ABSOLUTE_NUMERIC_DISTANCE",
      "name": "Score Difference",
      "configuration": {
        "sources": ["predicted_score", "actual_score"]
      }
    }
    ```
  </Accordion>

  <Accordion title="AI Data Extraction">
    Uses an LLM to extract specific information from content based on a natural language query.

    | Field    | Type   | Required | Description                                     |
    | -------- | ------ | -------- | ----------------------------------------------- |
    | `source` | string | Yes      | Column name containing the content              |
    | `query`  | string | Yes      | Natural language description of what to extract |

    ```json theme={null}
    {
      "column_type": "AI_DATA_EXTRACTION",
      "name": "Extract Sentiment",
      "configuration": {
        "source": "response",
        "query": "What is the overall sentiment? Return only: positive, negative, or neutral"
      }
    }
    ```
  </Accordion>
</AccordionGroup>

## Extraction Types

These columns extract or parse data from other columns.

<AccordionGroup>
  <Accordion title="JSON Path">
    Extracts data from JSON using JSONPath expressions.

    | Field                | Type    | Required | Description                                               |
    | -------------------- | ------- | -------- | --------------------------------------------------------- |
    | `source`             | string  | Yes      | Column name containing JSON data                          |
    | `json_path`          | string  | Yes      | JSONPath expression (e.g., "$.field", "$.items\[0].name") |
    | `return_first_match` | boolean | No       | Return only first match (default: true) or all matches    |

    ```json theme={null}
    {
      "column_type": "JSON_PATH",
      "name": "Extract Agent",
      "configuration": {
        "source": "llm_output",
        "json_path": "$.selected_agent",
        "return_first_match": true
      }
    }
    ```
  </Accordion>

  <Accordion title="XML Path">
    Extracts data from XML using XPath expressions.

    | Field         | Type    | Required | Description                                                          |
    | ------------- | ------- | -------- | -------------------------------------------------------------------- |
    | `source`      | string  | Yes      | Column name containing XML data                                      |
    | `xml_path`    | string  | Yes      | XPath expression                                                     |
    | `type`        | string  | No       | "find" for first match or "findall" for all matches. Default: "find" |
    | `return_text` | boolean | No       | Return text content only or full XML. Default: true                  |

    ```json theme={null}
    {
      "column_type": "XML_PATH",
      "name": "Extract Title",
      "configuration": {
        "source": "xml_response",
        "xml_path": ".//item/title",
        "type": "find",
        "return_text": true
      }
    }
    ```
  </Accordion>

  <Accordion title="Regex Extraction">
    Extracts content matching a regular expression pattern. Returns an array of all matches.

    | Field           | Type   | Required | Description                 |
    | --------------- | ------ | -------- | --------------------------- |
    | `source`        | string | Yes      | Column name to extract from |
    | `regex_pattern` | string | Yes      | Regular expression pattern  |

    ```json theme={null}
    {
      "column_type": "REGEX_EXTRACTION",
      "name": "Extract Numbers",
      "configuration": {
        "source": "text_content",
        "regex_pattern": "\\d+\\.?\\d*"
      }
    }
    ```
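
    To preview what a pattern captures before adding the column, you can try it locally. The sketch below assumes the column behaves like Python's `re.findall`, which may differ from the engine PromptLayer uses. It also shows that the doubled backslashes in the JSON configuration decode to single backslashes in the actual pattern.

    ```python
    import json
    import re

    # The JSON string "\\d+\\.?\\d*" decodes to the regex \d+\.?\d*
    config = json.loads(r'{"regex_pattern": "\\d+\\.?\\d*"}')
    pattern = config["regex_pattern"]

    # findall returns every non-overlapping match as a list.
    matches = re.findall(pattern, "Paid 12.50 for 3 items, 7% off.")
    print(matches)
    ```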
  </Accordion>

  <Accordion title="Parse Value">
    Parses and converts a value to a specific type.

    | Field    | Type   | Required | Description                                             |
    | -------- | ------ | -------- | ------------------------------------------------------- |
    | `source` | string | Yes      | Column name to parse                                    |
    | `type`   | string | Yes      | Target type: "string", "number", "boolean", or "object" |

    ```json theme={null}
    {
      "column_type": "PARSE_VALUE",
      "name": "Parse Score",
      "configuration": {
        "source": "score_string",
        "type": "number"
      }
    }
    ```
  </Accordion>
</AccordionGroup>

## Transformation Types

These columns transform, combine, or validate data.

<AccordionGroup>
  <Accordion title="Variable">
    Creates a static value that can be referenced by other columns.

    | Field         | Type   | Required | Description        |
    | ------------- | ------ | -------- | ------------------ |
    | `value.type`  | string | Yes      | "string" or "json" |
    | `value.value` | any    | Yes      | The static value   |

    String variable:

    ```json theme={null}
    {
      "column_type": "VARIABLE",
      "name": "Environment",
      "configuration": {
        "value": {
          "type": "string",
          "value": "production"
        }
      }
    }
    ```

    JSON variable:

    ```json theme={null}
    {
      "column_type": "VARIABLE",
      "name": "Config",
      "configuration": {
        "value": {
          "type": "json",
          "value": {"threshold": 0.8, "max_retries": 3}
        }
      }
    }
    ```
  </Accordion>

  <Accordion title="Assert Valid">
    Validates that data is in a valid format. Returns boolean.

    | Field    | Type   | Required | Description                                                  |
    | -------- | ------ | -------- | ------------------------------------------------------------ |
    | `source` | string | Yes      | Column name to validate                                      |
    | `type`   | string | Yes      | Expected format: "object" for valid JSON, "number", or "sql" |

    ```json theme={null}
    {
      "column_type": "ASSERT_VALID",
      "name": "Is Valid JSON",
      "configuration": {
        "source": "api_response",
        "type": "object"
      }
    }
    ```
  </Accordion>

  <Accordion title="Coalesce">
    Returns the first non-null value from multiple sources.

    | Field     | Type  | Required | Description                      |
    | --------- | ----- | -------- | -------------------------------- |
    | `sources` | array | Yes      | Array of column names, minimum 2 |

    ```json theme={null}
    {
      "column_type": "COALESCE",
      "name": "Best Response",
      "configuration": {
        "sources": ["primary_response", "fallback_response", "default_response"]
      }
    }
    ```
  </Accordion>

  <Accordion title="Combine Columns">
    Combines multiple column values into a single dictionary object.

    | Field     | Type  | Required | Description                      |
    | --------- | ----- | -------- | -------------------------------- |
    | `sources` | array | Yes      | Array of column names to combine |

    ```json theme={null}
    {
      "column_type": "COMBINE_COLUMNS",
      "name": "Combined Context",
      "configuration": {
        "sources": ["question", "context", "metadata"]
      }
    }
    ```
  </Accordion>

  <Accordion title="Count">
    Counts occurrences in text content.

    | Field    | Type   | Required | Description                                                   |
    | -------- | ------ | -------- | ------------------------------------------------------------- |
    | `source` | string | Yes      | Column name to count in                                       |
    | `type`   | string | Yes      | What to count: "chars", "words", "sentences", or "paragraphs" |

    ```json theme={null}
    {
      "column_type": "COUNT",
      "name": "Word Count",
      "configuration": {
        "source": "response",
        "type": "words"
      }
    }
    ```
  </Accordion>

  <Accordion title="Math Operator">
    Performs numeric comparisons. Returns boolean.

    | Field      | Type   | Required    | Description                                                                                                       |
    | ---------- | ------ | ----------- | ----------------------------------------------------------------------------------------------------------------- |
    | `sources`  | array  | Yes         | Array with first source column, and optionally second source column                                               |
    | `operator` | string | Yes         | Comparison operator: "lt" for less than, "le" for less or equal, "gt" for greater than, "ge" for greater or equal |
    | `value`    | number | Conditional | Static value to compare against. Required if second source not provided                                           |

    Compare to static value:

    ```json theme={null}
    {
      "column_type": "MATH_OPERATOR",
      "name": "Above Threshold",
      "configuration": {
        "sources": ["score"],
        "operator": "ge",
        "value": 0.8
      }
    }
    ```

    Compare two columns:

    ```json theme={null}
    {
      "column_type": "MATH_OPERATOR",
      "name": "A Greater Than B",
      "configuration": {
        "sources": ["score_a", "score_b"],
        "operator": "gt"
      }
    }
    ```
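
    The operator names match the functions in Python's standard `operator` module, so the comparison can be sketched as follows. `math_operator` is a hypothetical helper for illustration, not a PromptLayer function.

    ```python
    import operator

    OPERATORS = {
        "lt": operator.lt,  # less than
        "le": operator.le,  # less than or equal
        "gt": operator.gt,  # greater than
        "ge": operator.ge,  # greater than or equal
    }


    def math_operator(row, sources, op, value=None):
        """Compare the first source to a second source column or a static value."""
        left = row[sources[0]]
        right = row[sources[1]] if len(sources) > 1 else value
        if right is None:
            raise ValueError("Provide a second source column or a static value")
        return OPERATORS[op](left, right)
    ```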
  </Accordion>

  <Accordion title="Min/Max">
    Finds the minimum or maximum value from an array or JSON structure.

    | Field       | Type   | Required | Description                                        |
    | ----------- | ------ | -------- | -------------------------------------------------- |
    | `source`    | string | Yes      | Column name containing the data                    |
    | `type`      | string | Yes      | "min" or "max"                                     |
    | `json_path` | string | No       | JSONPath to extract values from, if source is JSON |

    ```json theme={null}
    {
      "column_type": "MIN_MAX",
      "name": "Highest Score",
      "configuration": {
        "source": "scores_array",
        "type": "max",
        "json_path": "$[*].value"
      }
    }
    ```
  </Accordion>
</AccordionGroup>


# Continuous Integration
Source: https://docs.promptlayer.com/features/evaluations/continuous-integration


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Continuous Integration (CI) of prompt evaluations is the holy grail of prompt engineering. 🏆

CI in the context of prompt engineering involves the automated testing and validation of prompts every time a new version is created or updated. LLMs are a probabilistic technology. It is hard (read: virtually impossible) to ensure a new prompt version doesn't break old user behavior just by eyeballing the prompt. Rigorous testing is the best tool we have.

We believe it's important both to let subject-matter experts write new prompts and to give them tools to easily test whether those prompts broke anything. That's where PromptLayer evaluations come in.

## Test-driven Prompt Engineering

Similar to test-driven development (TDD) in software engineering, test-driven prompt engineering involves writing and running evaluations against new prompt versions before they are used in production. This proactive testing ensures that new prompts meet predefined criteria and behave as expected, minimizing the risk of unintended consequences.

Setting up automatic evaluations on a specific prompt template is easy. When creating a new version, after adding a commit message, you will be prompted to select an evaluation pipeline to run. After you do this once, every new version of that prompt template will run this pipeline by default.

**NOTE**: Make sure your evaluation pipeline uses the "latest" version of the prompt template in its column step. The template is fetched at runtime. If you specify a frozen version, the evaluation report won't reflect your newest prompt template.

<Frame>
  <img alt="Eval scores by version" />
</Frame>

## Testing Strategies

### Backtesting

Backtesting involves running new prompt versions against a dataset compiled from historical production data. This strategy provides a real-world context for evaluating prompts, allowing you to assess how new versions would have performed under past conditions. It's an effective way to detect potential regressions and validate improvements, ensuring that updates enhance rather than detract from the user experience.

To set up backtests, follow the steps below:

**1. Create a historical dataset**

<img alt="Create a New Dataset" />

[Create a dataset](/features/evaluations/datasets-create-from-history) using a search query. For example, I might want to create a dataset using all logged requests:

* That use `my_prompt_template` version 6 or version 5
* That were made in the last 2 months
* That were using the tag `prod`
* That users gave a 👍 response to

This dataset will help you understand whether your new prompt version breaks behavior that worked in previous versions.
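
The criteria above amount to a simple predicate over logged requests. The Python sketch below is illustrative only: the record fields (`prompt_name`, `prompt_version`, `created_at`, `tags`, `thumbs_up`) are hypothetical and do not reflect PromptLayer's actual request log schema, and "2 months" is approximated as 60 days.

```python
from datetime import datetime, timedelta


def matches_backtest_filter(log, now):
    """Return True when a (hypothetical) request log meets all four criteria."""
    return bool(
        log["prompt_name"] == "my_prompt_template"
        and log["prompt_version"] in {5, 6}
        and now - log["created_at"] <= timedelta(days=60)
        and "prod" in log["tags"]
        and log["thumbs_up"]
    )


def build_dataset(logs, now=None):
    """Keep only the logs that belong in the historical dataset."""
    now = now or datetime.now()
    return [log for log in logs if matches_backtest_filter(log, now)]
```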

**2. Build an evaluation pipeline**

The next step is to create an evaluation pipeline using our new historical dataset.

In plain English, this evaluation feeds historical request context into your new prompt version, then compares the new results to the old results. You can do a simple string comparison or get fancy with cosine similarities. PromptLayer will even show you a diff view for responses that are different.
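
To make the comparison step concrete, here is a minimal Python sketch of a string-based backtest check with a diff for mismatches. It only illustrates the idea using the standard library's `difflib`; `backtest_row` is a hypothetical helper and is not how PromptLayer computes its diff view.

```python
import difflib


def backtest_row(old_response, new_response):
    """Compare a historical response to the new version's response."""
    matches = old_response.strip() == new_response.strip()
    diff = []
    if not matches:
        # Line-based unified diff, similar in spirit to a diff view.
        diff = list(difflib.unified_diff(
            old_response.splitlines(),
            new_response.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        ))
    return {"matches": matches, "diff": diff}


result = backtest_row("Hello\nHow can I help?", "Hi\nHow can I help?")
print("\n".join(result["diff"]))
```

Exact string matching is strict. For responses that may be phrased differently but mean the same thing, a similarity score such as cosine similarity is usually a better fit.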

**3. Run it when you make a new version**

This is the fun part. Next time you make a new prompt version, just select your new backtesting pipeline to see how the new prompt version fares.

<img alt="Diffing evaluation" />

### Regression Testing

Regression testing is the continuous refinement of evaluation datasets to include new edge cases and scenarios as they are discovered. This iterative process ensures that prompts remain robust against a growing set of challenges, preventing regressions in areas previously identified as potential failure points. By continually updating evaluations with new edge cases, you maintain a high standard of prompt quality and reliability.

The process of setting up regression tests looks similar to backtesting.

[Create a dataset](/features/evaluations/datasets-create-from-file) containing test cases for every edge case you can think of. The dataset should include context variables that you can input to your prompt template.

### Scoring

The evaluation can result in a single quantitative final score. To configure the score card, all you need to do is make sure that the last step consists entirely of numbers or Booleans. A final objective score makes comparing prompt performance easy, and it will be displayed alongside prompts in the Prompt Registry.

<img alt="Version Scoring" />
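
As an illustration of why the last step must contain only numbers or Booleans, the sketch below reduces such a column to one score. The aggregation shown (a plain mean, with `True` counted as 1 and `False` as 0) is an assumption made for this example; it is not confirmed as PromptLayer's formula.

```python
def final_score(last_column):
    """Average a column of numbers or Booleans into a single score (assumed formula)."""
    values = []
    for cell in last_column:
        if isinstance(cell, bool):
            values.append(1.0 if cell else 0.0)
        elif isinstance(cell, (int, float)):
            values.append(float(cell))
        else:
            # Any other cell type makes a single numeric score impossible.
            raise TypeError(f"Cannot score non-numeric cell: {cell!r}")
    if not values:
        raise ValueError("The last column is empty")
    return sum(values) / len(values)
```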


# Create from File
Source: https://docs.promptlayer.com/features/evaluations/datasets-create-from-file


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Use file upload when you already have a prepared set of test cases and want the fastest path to a structured dataset version. PromptLayer accepts CSV and JSON files so you can turn offline examples, QA sheets, or exported test cases into reusable evaluation inputs.

## Create a Dataset in the UI

If you're starting in the PromptLayer app, this is the fastest path to upload a prepared dataset file.

1. Go to the workspace where you want to store the dataset.
2. Click **New** and create a **Dataset**.
3. Click **Upload data**.
4. Choose a CSV or JSON file from your computer.
5. Review the imported rows and click **Save Dataset** when you are ready to publish the version.

## Create Datasets Programmatically

You can also create datasets from uploaded CSV or JSON files through the API. This is useful when you want to automate evaluation setup, sync prepared test cases from another system, or integrate dataset creation into your own workflow.

* [Create Dataset Group](/reference/create-dataset-group) - Create the dataset group that will contain your draft and saved versions.
* [Create Dataset Version from File](/reference/create-dataset-version-from-file) - Upload a CSV or JSON file and create a dataset version asynchronously.
* [List Datasets](/reference/list-datasets) - Retrieve dataset groups and versions in your workspace.
* [Get Dataset Rows](/reference/get-dataset-rows) - Inspect the rows in a saved dataset version after import.

## File Formats

PromptLayer accepts CSV and JSON files for dataset uploads.

### CSV

In CSV format, the column headers define the columns and each record is a separate row.

```csv theme={null}
name,age,location
John Doe,30,New York
Jane Smith,35,Los Angeles
Michael Johnson,40,Chicago
```

### JSON

In JSON format, each row is a separate object. The object keys define the column names, and the values should be the values you want in the dataset.

```json theme={null}
[
  {
    "name": "John Doe",
    "age": 30,
    "location": "New York"
  },
  {
    "name": "Jane Smith",
    "age": 35,
    "location": "Los Angeles"
  },
  {
    "name": "Michael Johnson",
    "age": 40,
    "location": "Chicago"
  }
]
```
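
If your test cases live in a CSV file and you prefer to upload JSON, a short local script can convert between the two layouts. This is a minimal sketch using only the Python standard library. Note that CSV cells are read as strings, so the script converts `age` explicitly to keep it numeric in the JSON output.

```python
import csv
import io
import json

CSV_TEXT = """name,age,location
John Doe,30,New York
Jane Smith,35,Los Angeles
"""


def csv_to_rows(text):
    """Turn CSV text into a list of row objects keyed by column header."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["age"] = int(row["age"])  # CSV values are strings by default
    return rows


rows = csv_to_rows(CSV_TEXT)
json_text = json.dumps(rows, indent=2)
print(json_text)
```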


# Create from History
Source: https://docs.promptlayer.com/features/evaluations/datasets-create-from-history


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Use request history when you want to build datasets from real production or staging traffic. This is a strong fit for backtesting, regression testing, and creating evaluation sets from the prompts, metadata, and outcomes your system has already seen.

Creating a dataset from history is straightforward using the Dataset dialog. PromptLayer can build a dataset from your request history, including metadata, input variable context, tags, and the request response. This is especially useful when you want to evaluate a new prompt version against real historical examples.

Go to **Datasets** and click **Add from Request History**. This opens a request log browser where you can filter and select requests.

<Frame>
  <img alt="Adding from request history" />
</Frame>

When creating a dataset from history, you can narrow what gets included by filtering on:

* Time range
* Metadata key-value pairs
* Prompt templates and version numbers
* Search query
* Scores
* Tags

After you save the dataset, use it in an evaluation pipeline to backtest a new prompt version against real historical inputs. See [Backtest Prompt Changes](/onboarding-guides/backtesting-prompt-changes).

## Related REST APIs

* [Create Dataset Group](/reference/create-dataset-group) - Create the dataset group that will contain your draft and saved versions.
* [Create Dataset Version from Request History](/reference/create-dataset-version-from-filter-params) - Recommended direct path for creating a dataset version from filtered request logs.
* [Create Draft Dataset Version](/reference/create-draft-dataset-version) - Start a draft dataset manually when you want a more controlled, multi-step workflow.
* [Add Request Log to Dataset](/reference/add-request-log-to-dataset) - Add individual request logs to a draft dataset as rows.
* [Save Draft Dataset Version](/reference/save-draft-dataset-version) - Publish the draft as a saved dataset version.
* [Get Dataset Rows](/reference/get-dataset-rows) - Inspect the rows in the resulting dataset version.

The filter-params endpoint is the recommended way to create a dataset from history in one step. The draft, add-request-log, and save-draft endpoints support a more advanced manual workflow when you want precise control over how rows are assembled.


# Overview
Source: https://docs.promptlayer.com/features/evaluations/datasets-overview

Use Datasets as the versioned system of record for evaluation inputs, backtests, and batch workflows.

<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Datasets are PromptLayer's versioned system of record for evaluation inputs and historical examples. Use them when you want a reusable set of test cases for evaluations, backtests, regression checks, or batch workflows.

A **Dataset Group** is the container for one dataset over time. Inside that group, you edit a **draft** dataset, save numbered versions, and reuse those versions across evaluations. You can create data by uploading a CSV or JSON file, building from request history, adding individual rows from observability traces, or turning evaluation outputs back into a dataset for the next iteration.

## What you can do

* Create datasets from files, request history, or evaluation results.
* Add individual rows programmatically from observability traces via the API.
* Edit draft datasets before publishing a version, including adding rows, renaming or deleting columns, updating values, and converting columns to JSON.
* Organize dataset groups with folders, tags, changelog history, and archive controls.
* Reuse a specific dataset version in an evaluation blueprint, or update a blueprint to a different dataset later.
* Export datasets to CSV and use report outputs to seed the next round of testing.

## How it fits together

1. Create a dataset group and populate a draft.
2. Edit the draft and save a version.
3. Attach that version to an evaluation blueprint.
4. Run full batches, review results and history, and feed outputs into the next dataset iteration.

## Next steps

<CardGroup>
  <Card title="Evaluations" icon="vial-circle-check" href="/features/evaluations/overview">
    Learn how datasets power evaluation blueprints, scoring, and batch runs.
  </Card>

  <Card title="Programmatic Evals" icon="terminal" href="/features/evaluations/programmatic">
    Build datasets and evaluations from code or CI workflows.
  </Card>

  <Card title="Create from File" icon="file-arrow-up" href="/reference/create-dataset-version-from-file">
    Upload a CSV or JSON file to create a new dataset version.
  </Card>

  <Card title="Create from History" icon="history" href="/reference/create-dataset-version-from-filter-params">
    Build a dataset version from filtered request history.
  </Card>

  <Card title="Add Trace to Dataset" icon="circle-nodes" href="/reference/add-trace-to-dataset">
    Add an observability trace or span subtree as a dataset row via the API.
  </Card>
</CardGroup>


# Eval Types
Source: https://docs.promptlayer.com/features/evaluations/eval-types


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

This page provides an overview of the various evaluation column types available on our platform.

## Primary Types

### Prompt Template

The *Prompt Template* evaluation type allows you to execute a prompt template from the Prompt Registry. You have the flexibility to select the latest version, a specific label, or a particular version of the prompt template. You also have the ability to assign the input variables based on available inputs from the dataset or other columns. You can override the model parameters that are set in the Prompt Registry. This functionality is particularly useful for testing a prompt template within a larger evaluation pipeline, comparing different model parameters, or implementing an "LLM as a judge" prompt template.

### Custom API Endpoint

The *Custom API Endpoint* enables you to set up a webhook that our system will call (POST) with all the columns to the left of the API endpoint when that cell is executed. As cells are processed sequentially, we will call this endpoint with the columns in the payload, and the returned result will be displayed. This feature allows for extensive customization to accommodate specific use cases and integrate with external systems or custom evaluators.
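
A minimal webhook receiver might look like the Python sketch below. The payload and response shapes are assumptions made for illustration: it treats the POST body as a JSON object mapping column names to values and returns a JSON result. `score_row`, `EvalWebhook`, and the `response` column name are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler


def score_row(columns):
    """Custom evaluator: inspect the (assumed) `response` column."""
    response = str(columns.get("response", ""))
    return {"word_count": len(response.split()), "non_empty": bool(response.strip())}


class EvalWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the request body is a JSON object of column values.
        length = int(self.headers.get("Content-Length", 0))
        columns = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(score_row(columns)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, serve the handler with `http.server.HTTPServer(("", 8000), EvalWebhook).serve_forever()` and expose the port through a tunnel or a deployed host.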

### MCP

The *MCP Action* allows you to run functions on a remote MCP server. Plug in your server URL and auth, select a function, and call it with inputs mapped from other cells. For more information about MCP, see [the official MCP docs](https://modelcontextprotocol.io/introduction).

### Human Input

The *Human Input* evaluation type allows the addition of either numeric or text input where an evaluator can provide feedback via a slider or a text box. This input can then be utilized in subsequent columns in the evaluation pipeline, allowing for the incorporation of human judgment.

### Code Execution

The *Code Execution* evaluation type allows you to write and execute code for each row in your dataset. You can access the row's data through the `data` variable, and the value you return becomes the cell value. Anything written to stdout is ignored. Code execution times out after 6 minutes.

Code example to return a list of the names of each column:

<CodeGroup>
  ```py Python theme={null}
  message = "These are my column names: "
  columns = [column_name for column_name in data.keys()]
  return message + str(columns)
  ```

  ```js JavaScript theme={null}
  const message = "These are my column names: ";
  const columns = Object.keys(data);
  return message + JSON.stringify(columns);
  ```
</CodeGroup>
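To try a snippet locally before pasting it into a Code Execution cell, one option is to wrap the cell body in a function that takes `data`. The sketch below assumes `data` behaves like a dictionary keyed by column name, as the example above suggests. The `cell` wrapper, the column names, and the sample row are hypothetical and exist only for local testing.

```python
def cell(data):
    """Body of a hypothetical Code Execution cell, wrapped for local testing."""
    # Assumption: `data` maps column names to the row's cell values.
    answer = str(data.get("answer", ""))
    expected = str(data.get("expected_answer", ""))
    # The returned value becomes the cell value; anything printed is ignored.
    return answer.strip().lower() == expected.strip().lower()


# Hypothetical row used only to exercise the function locally.
sample_row = {"question": "What is 2+2?", "answer": " 4 ", "expected_answer": "4"}
print(cell(sample_row))
```

In the cell itself you would paste only the function body, since the platform supplies `data` and reads the returned value.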

**Python Runtime**

```
The Python runtime runs Python 3.12.0 with no filesystem. Runtime does have network access. Only the standard library is available. Here are the resource quotas:

- Input code size: 1MiB
- Size of stdin: 10MiB
- Size of stdout: 20MiB
- Size of stderr: 10MiB
- Number of environment variables: 100
- Environment variable key size: 4KiB
- Environment variable value size: 100KiB
- Number of arguments: 100
- Argument size: 100KiB
- Memory consumption: 128MiB
```

**JavaScript Runtime**

```
The JavaScript runtime is built on Mozilla's SpiderMonkey engine with no filesystem. Runtime does have network access. It is not node or deno. Available APIs include:

- Legacy Encoding: atob, btoa, decodeURI, encodeURI, decodeURIComponent, encodeURIComponent
- Streams: ReadableStream, ReadableStreamBYOBReader, ReadableStreamBYOBRequest, ReadableStreamDefaultReader, ReadableStreamDefaultController, ReadableByteStreamController, WritableStream, ByteLengthQueuingStrategy, CountQueuingStrategy, TransformStream
- URL: URL, URLSearchParams
- Console: console
- Performance: Performance
- Task: queueMicrotask, setInterval, setTimeout, clearInterval, clearTimeout
- Location: WorkerLocation, location
- JSON: JSON
- Encoding: TextEncoder, TextDecoder, CompressionStream, DecompressionStream
- Structured Clone: structuredClone
- Fetch: fetch, Request, Response, Headers
- Crypto: SubtleCrypto, Crypto, crypto, CryptoKey

Resource Quotas:

- Input code size: 1MiB
- Size of stdin: 10MiB
- Size of stdout: 20MiB
- Size of stderr: 10MiB
- Number of environment variables: 100
- Environment variable key size: 4KiB
- Environment variable value size: 100KiB
- Number of arguments: 100
- Argument size: 100KiB
- Memory consumption: 128MiB
```

### Coding Agent

The *Coding Agent* evaluation type uses an AI coding agent (such as [Claude Code](https://www.claude.com/product/claude-code)) in a secure, sandboxed environment for each row in your dataset. Instead of writing code directly, you provide natural language instructions describing what you want to accomplish, and the AI coding agent handles the implementation.

**How it works:**

You provide a **natural language prompt** describing the task you want to accomplish. The coding agent executes in an isolated sandbox with access to:

* **variables.json** - Automatically injected file containing all column values from previous cells in that row
* **File attachments** - Any files you upload (CSV, JSON, text files, etc.) are available in the sandbox
* **Network access** - Can make API calls and fetch external data

The agent returns the result which populates the cell for that row.

**Example use cases:**

* **Data transformation**: "Parse the JSON response from the API column and extract all user emails into a comma-separated list"
* **File processing**: "Read the attached sales\_data.csv and calculate the total revenue for products in the 'Electronics' category"
* **API integration**: "Use the api\_key from variables.json to fetch user details from [https://api.example.com/users/\{user\_id}](https://api.example.com/users/\{user_id}) and return their account status"

### Conversation Simulator

The *Conversation Simulator* evaluation type automates the back-and-forth between your AI agent and simulated users to test conversational AI performance. This is particularly useful for evaluating multi-turn conversations where context maintenance, goal achievement, and user interaction patterns are critical.

When setting up the conversation simulator:

* Select your AI agent prompt template from the Prompt Registry
* Pass in user details or context variables from your dataset
* Define a test persona that challenges your AI with specific behaviors or constraints

**Example Test Persona:**

```
User is nervous about seeing the doctor, hasn't been in a long time, 
won't share phone number until asked three times for it
```

**Optional Advanced Configuration:**

* **User Goes First**: By default, the AI agent initiates the conversation. You can enable this setting to have the simulated user start the conversation instead.

* **Conversation Samples**: You can provide sample conversations to help guide the simulated user's responses. These samples help maintain consistent voice and interaction patterns, ensuring the simulated user behaves realistically and consistently with your expected user base.

* **Completion Guidance**: Define specific conditions for when the conversation should be considered complete. This is useful for specifying end conditions like tool calls, confirmation messages, or goal achievement (e.g., "End when the assistant calls the submit\_order tool" or "Complete when the user confirms their booking"). You can provide this as a static value or pull it from a dataset column for varied test scenarios.

The conversation results are returned as a JSON list of messages that can then be evaluated using other eval types like LLM Assertions to assess success criteria.
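As a rough illustration of post-processing the simulator output in a later column, the sketch below summarizes a conversation. It assumes each message is an object with `role` and `content` keys; the exact message shape is not specified here, so treat the field names and the `summarize_conversation` helper as hypothetical.

```python
import json


def summarize_conversation(raw):
    """Count messages per role and report which role spoke last."""
    # Assumption: the cell holds a JSON list of {"role": ..., "content": ...} messages.
    messages = json.loads(raw) if isinstance(raw, str) else raw
    by_role = {}
    for message in messages:
        role = message.get("role", "unknown")
        by_role[role] = by_role.get(role, 0) + 1
    last_role = messages[-1].get("role") if messages else None
    return {"turns": len(messages), "by_role": by_role, "last_role": last_role}


# Hypothetical conversation used only for illustration.
raw = json.dumps([
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "I need to book an appointment."},
    {"role": "assistant", "content": "Sure, what day works for you?"},
])
print(summarize_conversation(raw))
```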

## Simple Evals

### Equality Comparison

*Equality Comparison* allows you to compare two columns as strings. It shows a visual diff when the columns differ. The diff is not used when calculating the score: for scoring purposes the column is treated as a boolean, and it returns true when there is no difference.
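The behavior described above (a boolean for scoring plus a diff for display) can be approximated locally. This is a minimal sketch using Python's standard `difflib`, not PromptLayer's own diff rendering; the `compare_columns` helper is hypothetical.

```python
import difflib


def compare_columns(left, right):
    """Return a boolean score plus a line-based diff for display."""
    left_text, right_text = str(left), str(right)
    # Only the boolean contributes to the score; the diff is informational.
    equal = left_text == right_text
    diff = list(
        difflib.unified_diff(
            left_text.splitlines(),
            right_text.splitlines(),
            fromfile="left",
            tofile="right",
            lineterm="",
        )
    )
    return equal, diff


equal, diff = compare_columns("Paris", "paris")
print(equal)
print("\n".join(diff))
```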

### Contains Value

The *Contains* evaluation type enables you to search for a substring within a column. For instance, you could search for a specific word or phrase within each cell in the column. It uses the Python `in` operator to check whether the substring is in the cell, and the check is case-insensitive.
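A case-insensitive `in` check can be sketched as follows. The `contains` helper is hypothetical, and lowercasing both sides is one common way to ignore case rather than a confirmed detail of the platform.

```python
def contains(cell_value, substring):
    """Case-insensitive substring check built on the `in` operator."""
    # Lowercase both sides so the `in` check ignores case.
    return str(substring).lower() in str(cell_value).lower()


print(contains("The capital of France is Paris.", "paris"))
```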

### Regex Match

The *Regex Match* evaluation type allows you to define a regular expression pattern to search within the column. This provides powerful pattern matching capabilities for complex text analysis tasks.

### Absolute Numeric Distance

The *Absolute Numeric Distance* evaluation type allows you to select two different columns and output the absolute distance between their numeric values in a new column. Both source columns must contain numeric values.

## LLM Evals

### Run LLM Assertion

The *LLM Assertion* evaluation type enables you to run an assertion on a column using natural language prompts. You can create prompts such as "Does this contain an API key?", "Is this sensitive content?", or "Is this in English?". Our system uses a backend prompt template that processes your assertion and returns either true or false. Assertions should be framed as questions.

#### Using Template Variables

You can use template variables in your assertions to reference values from other dataset columns. This allows you to create dynamic assertions that adapt based on your data.

Use f-string style syntax with single curly braces: `{variable_name}`

**Example assertions with variables:**

* `"Is the response written in {language}?"`
* `"Does the output discuss {topic}?"`
* `"Is the tone {expected_tone}?"`

When you include variables in your assertion, a mapping interface will appear allowing you to connect each variable to a dataset column. For example, if your assertion is `"Is the response written in {language}?"` and you have a column called `target_language`, you would map `language` → `target_language`.

<Note>
  All variables used in assertions must be mapped to a column. If a variable is not mapped, the evaluation will fail with an error indicating which variable mapping is missing.
</Note>
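The variable mapping described above can be pictured with a small sketch. It uses Python's `string.Formatter` to find `{variable}` placeholders, then fills them from a row through a variable-to-column mapping. The `render_assertion` helper and its error message are hypothetical and only illustrate the idea.

```python
from string import Formatter


def render_assertion(assertion, mapping, row):
    """Fill {variables} in an assertion from a row, via a variable-to-column mapping."""
    # Collect placeholder names such as "language" from "{language}".
    variables = [name for _, name, _, _ in Formatter().parse(assertion) if name]
    missing = [name for name in variables if name not in mapping]
    if missing:
        # Every variable must be mapped to a column.
        raise ValueError(f"Missing variable mapping: {', '.join(missing)}")
    values = {name: row[mapping[name]] for name in variables}
    return assertion.format(**values)


row = {"target_language": "French"}
mapping = {"language": "target_language"}
print(render_assertion("Is the response written in {language}?", mapping, row))
```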

#### Multiple Assertions

You can add multiple assertions to evaluate different criteria on the same data source. Each assertion is evaluated independently and returns its own true/false result.

### AI Data Extract

The *AI Data Extract* evaluation type uses AI/LLM to extract specific information from data sources. You can describe what you want to extract using natural language queries, whether the content is JSON, XML, or just unstructured text.

Example queries:

* "Extract the product name"
* "Find the customer's email address"
* "Get all mentioned dates"
* "Extract the total price including tax"

### Cosine Similarity

*Cosine Similarity* allows you to measure how close two columns are as embedding vectors. The system takes the two columns you supply, converts them into strings, and embeds them using OpenAI embeddings. It then calculates the cosine similarity, resulting in a number between 0 and 1. This metric is useful for understanding how semantically similar two bodies of text are, which can be valuable in assessing topic adherence or content similarity.
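For intuition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it on small toy vectors; it does not call any embedding model, and the vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Undefined for zero vectors; return 0.0 as a convention in this sketch.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))
```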

## Helper Functions

### JSON Extraction

The *JSON Extraction* evaluation type allows you to define a JSON path and extract either the first match or all matches in that path. We will automatically cast the source column into a JSON object. This is particularly useful for parsing structured data within your evaluations.

### Parse Value

The *Parse Value* column type enables you to convert another column into one of the following value types: string, number, Boolean, or JSON.

### Apply Diff

The *Apply Diff* evaluation type applies diff patches to original content, similar to git merge operations. This helper function requires two source columns: the original content and a diff patch to apply.

This evaluation type is particularly powerful when combined with code generation workflows or document editing pipelines where AI agents generate incremental changes rather than complete replacements. It enables sophisticated multi-step workflows where agents can review and refine each other's outputs.

Using diff formats often saves context and leads to better results for editing large content.

**Diff Format Details**

The diff patch must be in the standard **unified diff** format, including file headers and hunk headers, as used by tools like `git` and described in the [unidiff documentation](https://pypi.org/project/unidiff/).

If you are using an LLM to generate the diffs, copy and paste the following text into your prompt for format specifics:

```markdown theme={null}
## Unified Diff Specification (strict unidiff)

Produce a valid **unified diff** with file headers and hunk headers. Only modifications of existing text are supported (no file creation or deletion).

### File headers (required)
- Old (source):  `--- a/<filename>`
- New (target):  `+++ b/<filename>`
- Use consistent prefixes `a/` and `b/`.

### Hunk headers (required for every changed region)
- Format: `@@ -<start_old>,<len_old> +<start_new>,<len_new> @@`
  - `<start_old>` / `<start_new>` are 1-based line numbers.
  - `<len_old>` / `<len_new>` are the line counts for the hunk in old/new.
  - Multiple hunks per file are allowed; order them top-to-bottom.

### Hunk body line prefixes (strict)
- `' '` (space)  = unchanged context line
- `-`          = line removed from source
- `+`          = line added in target
- Preserve original whitespace and line endings exactly.

### Rules
- The concatenation of all **context + removed** lines in each hunk must appear **verbatim and contiguously** in the source file.
- Keep context minimal but sufficient for unambiguous matching (usually 1-3 lines around changes).
- Multiple files may be patched in one diff, but each requires its own `---` / `+++` headers and hunks.
- If no changes are needed, output an empty string (no diff).

### Example
--- a/essay.txt
+++ b/essay.txt
@@ -1,3 +1,3 @@
 This is a simple essay.
-It has a bad sentence.
+It has a better sentence.
 The end.
```
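To make the format concrete, here is a minimal sketch of applying a single-file unified diff in plain Python. It is not PromptLayer's implementation: it ignores file headers, does not verify the line counts in hunk headers, and does not support multi-file patches. The `apply_unified_diff` helper is hypothetical.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(original, patch):
    """Apply a single-file unified diff to `original` and return the patched text."""
    source = original.split("\n")
    result = []
    cursor = 0  # index of the next unread source line
    lines = patch.split("\n")
    i = 0
    while i < len(lines):
        match = HUNK_HEADER.match(lines[i])
        if not match:
            i += 1  # skip file headers and anything outside a hunk
            continue
        old_start = int(match.group(1))
        old_len = int(match.group(2)) if match.group(2) is not None else 1
        # Hunk starts are 1-based; a zero-length old range means "insert after".
        start = old_start if old_len == 0 else old_start - 1
        result.extend(source[cursor:start])  # copy untouched lines before the hunk
        cursor = start
        i += 1
        while i < len(lines) and not lines[i].startswith("@@"):
            tag, text = lines[i][:1], lines[i][1:]
            if tag in (" ", "-"):
                # Context and removed lines must match the source verbatim.
                if cursor >= len(source) or source[cursor] != text:
                    raise ValueError(f"Patch does not match source at line {cursor + 1}")
                if tag == " ":
                    result.append(text)
                cursor += 1
            elif tag == "+":
                result.append(text)
            i += 1
    result.extend(source[cursor:])  # copy whatever follows the last hunk
    return "\n".join(result)


original = "This is a simple essay.\nIt has a bad sentence.\nThe end."
patch = (
    "--- a/essay.txt\n"
    "+++ b/essay.txt\n"
    "@@ -1,3 +1,3 @@\n"
    " This is a simple essay.\n"
    "-It has a bad sentence.\n"
    "+It has a better sentence.\n"
    " The end."
)
print(apply_unified_diff(original, patch))
```

The strict matching of context and removed lines mirrors the rule that they must appear verbatim and contiguously in the source.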

### Static Value

The *Static Value* evaluation type allows you to pre-populate a column with a specific value. This is useful for adding constant values or context that you may need to use later in one of the other columns in your evaluation pipeline.

### Type Validation

*Type Validation* returns a boolean for the given source column if it fits one of the specified types. The types supported for validation are JSON, number, or SQL. It will return `true` if the value is valid for the specified type, and `false` otherwise. For SQL validation, the system utilizes the [SQLGlot library](https://github.com/tobymao/sqlglot?tab=readme-ov-file#parser-errors).
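The JSON and number checks can be approximated with the standard library, as in the sketch below. It assumes the cell value arrives as a string and leaves out SQL validation, which relies on SQLGlot. The `is_valid` helper and its type names are hypothetical.

```python
import json


def is_valid(value, expected_type):
    """Return True if `value` parses as the expected type ("json" or "number")."""
    parsers = {"json": json.loads, "number": float}
    if expected_type not in parsers:
        raise ValueError(f"Unsupported type in this sketch: {expected_type}")
    try:
        parsers[expected_type](value)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


print(is_valid('{"status": "ok"}', "json"))
print(is_valid("three", "number"))
```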

### Coalesce

The *Coalesce* evaluation type allows you to take multiple different columns and coalesce them, similar to [SQL's COALESCE function](https://www.w3schools.com/sql/func_sqlserver_coalesce.asp).
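Like SQL's `COALESCE`, the idea is to return the first non-null value among the selected columns. The sketch below assumes an empty cell is represented as `None`; how the platform treats empty strings is not covered here, and the `coalesce` helper is hypothetical.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if every value is None."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, None, "fallback answer"))
```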

### Count

The *Count* evaluation type allows you to select a source column and count either the characters, words, or paragraphs within it. This will output a numeric value, which can be useful for analyzing the length or complexity of LLM outputs.
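As a rough local equivalent, the sketch below counts characters, words, and paragraphs. The splitting rules (whitespace for words, blank lines for paragraphs) are assumptions, so results may differ from the platform's at the edges. The `count` helper is hypothetical.

```python
import re


def count(text, unit):
    """Count characters, words, or paragraphs in a piece of text."""
    text = str(text)
    if unit == "characters":
        return len(text)
    if unit == "words":
        # Assumption: words are separated by whitespace.
        return len(text.split())
    if unit == "paragraphs":
        # Assumption: paragraphs are separated by one or more blank lines.
        return len([p for p in re.split(r"\n\s*\n", text) if p.strip()])
    raise ValueError(f"Unsupported unit: {unit}")


sample = "Hello world.\n\nSecond paragraph here."
print(count(sample, "words"), count(sample, "paragraphs"))
```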

Please reach out to us if you have any other evaluation types you would like to see on the platform. We are always looking to expand our evaluation capabilities to better serve your needs.


# Eval Examples
Source: https://docs.promptlayer.com/features/evaluations/examples


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

## Building & Evaluating a RAG Chatbot

<div>
  <iframe />
</div>

This example shows how you can use PromptLayer to evaluate Retrieval Augmented Generation (RAG) systems. As a cornerstone of the LLM revolution, RAG systems enhance our ability to extract precise information from vast datasets, significantly improving question-answering capabilities.

We will create a RAG system designed for financial data analysis using a dataset from the New York Stock Exchange. The tutorial video elaborates on the step-by-step process of constructing a pipeline that encompasses prompt creation, data retrieval, and the evaluation of the system's efficacy in answering finance-related queries.

Most importantly, you can use PromptLayer to build end-to-end evaluation tests for RAG systems.

## Migrating Prompts to Open-Source Models

<img alt="Migrating Prompts" />

[Click Here to Read the Tutorial](https://blog.promptlayer.com/migrating-prompts-to-open-source-models-c21e1d482d6f)

This tutorial demonstrates how to use PromptLayer to migrate prompts between different language models, with a focus on open-source models like [Mistral](https://mistral.ai/). It covers techniques for batch model comparisons, allowing you to evaluate the performance of your prompt across multiple models. The example showcases migrating an existing prompt for a RAG system to the open-source Mistral model and comparing the new outputs with visual diffs.

The key steps include:

1. Setting up a batch evaluation pipeline to run the prompt on both the original model (e.g., GPT) and the new target model (Mistral), while diffing the outputs.
2. Analyzing the results, including accuracy scores, cost/latency metrics, and string output diffs, to assess the impact of migrating to the new model.
3. Seamlessly updating the prompt template to use the new model (Mistral) if the migration is beneficial.

This example highlights PromptLayer's capabilities for efficient prompt iteration and evaluation across different language models, facilitating the adoption of open-source alternatives like Mistral.


# Overview
Source: https://docs.promptlayer.com/features/evaluations/overview


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

**We believe that evaluation engineering is half the challenge of building a good prompt.** The Evaluations page is designed to help you iterate, build, and run batch evaluations on top of your prompts. Every prompt and every use case is different.

Inspired by the flexibility of tools like Excel, we offer a visual pipeline builder that allows users to construct complex evaluation batches tailored to their specific requirements. Whether you're scoring prompts, running bulk jobs, or conducting regression testing, the Evaluations page provides the tools needed to assess prompt quality effectively. It is made for both engineers and subject-matter experts.

## Common tasks

* **Scoring Prompts**: Utilize golden datasets for comparing prompt outputs with ground truths and incorporate human or AI evaluators for quality assessment.
* **One-off Bulk Jobs**: Ideal for prompt experimentation and iteration.
* **Backtesting**: Use historical data to build datasets and compare how a new prompt version performs against real production examples.
* **Regression Testing**: Build evaluation pipelines and datasets to prevent edge-case regression on prompt template updates.
* **Continuous Integration**: Connect evaluation pipelines to prompt templates to automatically run an eval with each new version (and catalog the results). Think of it like a GitHub Action.

## How evaluations fit together

1. Create or select a dataset with the inputs you want to test.
2. Add one or more **Prompt Template** columns to generate outputs.
3. Add scoring columns such as LLM-as-judge, human grading, equality comparison, cosine similarity, or code evaluators.
4. Run the evaluation and review the scorecard, row-level outputs, and diffs.
5. Attach the evaluation to a prompt template when you want it to run automatically on new versions.

<Frame>
  <img alt="Evaluation pipeline" />
</Frame>

## Example use cases

* **Chatbot Enhancements**: Improve chatbot interactions by evaluating responses to user requests against semantic criteria.
* **RAG System Testing**: Build a RAG pipeline and validate responses against a golden dataset.
* **SQL Bot Optimization**: Test Natural Language to SQL generation prompts by *actually* running generated queries against a database (using the API Endpoint step), followed by an evaluation of the results' accuracy.
* **Improving Summaries**: Combine AI evaluating prompts and human graders to help improve prompts without a ground truth.

## Related guides

* [Compare Models](/onboarding-guides/compare-models)
* [Backtest Prompt Changes](/onboarding-guides/backtesting-prompt-changes)
* [Run Batch Jobs](/onboarding-guides/batch-runs)
* [Continuous Integration](/features/evaluations/continuous-integration)

## Additional Resources

For a deeper understanding of evaluation approaches, especially for complex LLM applications beyond simple classification or programming tasks, check out our blog post: [How to Evaluate LLM Prompts Beyond Simple Use Cases](https://blog.promptlayer.com/how-to-evaluate-llm-prompts-beyond-simple-use-cases/). This guide explores strategies like Decomposition Testing, working with Negative Examples, and implementing LLM as a Judge Rubric frameworks.

[Click here to see in-depth examples.](/features/evaluations/examples)


# Online or Programmatic Evals
Source: https://docs.promptlayer.com/features/evaluations/programmatic


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

PromptLayer offers powerful options for configuring and running evaluation pipelines programmatically in your workflows. This is ideal for users who require the flexibility to run evaluations from code, enabling seamless integration with existing CI/CD pipelines or custom automation scripts.

## Recommended Workflow

We recommend a systematic approach to implementing automated evaluations:

```mermaid theme={null}
graph TD
    A[Create Dataset] --> B[Build Evaluation Pipeline]
    B --> C[Configure Pipeline Steps]
    C --> D[Trigger Evaluation]
    D --> E{Monitor Progress}
    E -->|Polling| F[Check Report Status]
    E -->|Webhook| G[Receive report_finished Event]
    F --> H[Report Complete?]
    H -->|No| F
    H -->|Yes| I[Get Score]
    G --> I
    I --> J[Send Alerts/Take Action]
```

This approach enables two powerful use cases:

### 1. Nightly Evaluations (Production Monitoring)

Run scheduled evaluations to ensure nothing has changed in your production system. The score can be sent to Slack or your alerting system with a direct link to the evaluation pipeline. This helps detect production issues by sampling a wide range of requests and comparing against expected performance.

### 2. CI/CD Integration

Trigger evaluations in your CI/CD pipeline (GitHub, GitLab, etc.) whenever relevant PRs are created. Wait for the evaluation score before proceeding with deployment, to make sure that your changes do not break anything.
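One way to gate a deployment on the evaluation score is to turn the score into a process exit code. This is a minimal sketch: the `gate` helper and the 90.0 threshold are hypothetical, and fetching the score is left to the API calls shown elsewhere on this page.

```python
def gate(score, threshold=90.0):
    """Return a process exit code: 0 if the score meets the threshold, else 1."""
    # A missing score is treated as a failure so the pipeline stops.
    return 0 if score is not None and score >= threshold else 1


# Hypothetical score from a finished evaluation run.
exit_code = gate(87.5, threshold=90.0)
print(f"exit code: {exit_code}")
```

In a CI job, passing this value to `sys.exit()` makes the step fail when the score is below the threshold.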

## Complete Example: Building an Evaluation Pipeline

Here's a complete example of building an evaluation pipeline from scratch using the API.

### Option A: Single Request (Recommended)

Create the entire pipeline with columns and custom scoring in one API call:

```python theme={null}
import requests
import base64

API_KEY = "your_api_key"
BASE_URL = "https://api.promptlayer.com"

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

# Step 1: Create a dataset group
dataset_group_response = requests.post(
    f"{BASE_URL}/api/public/v2/dataset-groups",
    headers=headers,
    json={"name": "QA Test Dataset Group"}
)
dataset_group_id = dataset_group_response.json()["dataset_group"]["id"]

# Step 2: Upload dataset
csv_content = """question,expected_answer
What is the capital of France?,Paris
Who wrote Romeo and Juliet?,William Shakespeare
What is 2+2?,4"""

encoded_csv = base64.b64encode(csv_content.encode()).decode()
dataset_response = requests.post(
    f"{BASE_URL}/api/public/v2/dataset-versions/from-file",
    headers=headers,
    json={
        "dataset_group_id": dataset_group_id,
        "file_name": "test_qa.csv",
        "file_content_base64": encoded_csv
    }
)

# Wait for async dataset processing
import time
time.sleep(3)

# Step 3: Create pipeline with columns and custom scoring in ONE call
report_response = requests.post(
    f"{BASE_URL}/reports",
    headers=headers,
    json={
        "dataset_group_id": dataset_group_id,
        "name": "QA Evaluation Pipeline",
        "columns": [
            {
                "column_type": "LLM_ASSERTION",
                "name": "Answer Correct",
                "configuration": {
                    "source": "expected_answer",
                    "prompt": "Is this a valid answer to the question?"
                },
                "is_part_of_score": True
            },
            {
                "column_type": "LLM_ASSERTION",
                "name": "Answer Complete",
                "configuration": {
                    "source": "expected_answer",
                    "prompt": "Is this answer complete and not missing key information?"
                },
                "is_part_of_score": True
            }
        ],
        "score_configuration": {
            "code": """
# Weighted scoring
weights = {"Answer Correct": 0.7, "Answer Complete": 0.3}
total_weight = weighted_sum = 0

for row in data:
    for col, weight in weights.items():
        if col in row:
            total_weight += weight
            val = row[col]
            if isinstance(val, dict) and 'value' in val:
                val = val['value']
            if val == True:
                weighted_sum += weight

score = (weighted_sum / total_weight * 100) if total_weight > 0 else 0
return {"score": round(score, 2)}
""",
            "code_language": "PYTHON"
        }
    }
)
report_id = report_response.json()["report_id"]

# Step 4: Run the evaluation
run_response = requests.post(
    f"{BASE_URL}/reports/{report_id}/run",
    headers=headers,
    json={"name": "QA Eval Run"}
)
run_report_id = run_response.json()["report_id"]

# Step 5: Poll for completion and get score
while True:
    status_response = requests.get(f"{BASE_URL}/reports/{run_report_id}", headers=headers)
    if status_response.json()["status"] == "COMPLETED":
        break
    time.sleep(5)

score_response = requests.get(f"{BASE_URL}/reports/{run_report_id}/score", headers=headers)
print(f"Score: {score_response.json()['score']['overall_score']}%")
```

### Option B: Step-by-Step

For more control, you can create the pipeline and add columns separately:

```python theme={null}
import requests

API_KEY = "your_api_key"
BASE_URL = "https://api.promptlayer.com"

headers = {
    "X-API-Key": API_KEY,
    "Content-Type": "application/json"
}

# Step 1: Create a dataset group
dataset_group_response = requests.post(
    f"{BASE_URL}/api/public/v2/dataset-groups",
    headers=headers,
    json={"name": "QA Test Dataset Group"}
)
dataset_group_id = dataset_group_response.json()["dataset_group"]["id"]

# Step 2: Create a dataset version (from CSV)
csv_content = """question,expected_answer
What is the capital of France?,Paris
Who wrote Romeo and Juliet?,William Shakespeare
What is 2+2?,4"""

import base64
encoded_csv = base64.b64encode(csv_content.encode()).decode()

dataset_response = requests.post(
    f"{BASE_URL}/api/public/v2/dataset-versions/from-file",
    headers=headers,
    json={
        "dataset_group_id": dataset_group_id,
        "file_name": "test_qa.csv",
        "file_content_base64": encoded_csv
    }
)
dataset_id = dataset_response.json()["id"]

# Step 3: Create the evaluation pipeline (report)
report_response = requests.post(
    f"{BASE_URL}/reports",
    headers=headers,
    json={
        "dataset_group_id": dataset_group_id,
        "name": "QA Evaluation Pipeline"
    }
)
report_id = report_response.json()["report_id"]

# Step 4: Add columns to the pipeline

# Column 1: Prompt Template to generate answers
requests.post(
    f"{BASE_URL}/report-columns",
    headers=headers,
    json={
        "report_id": report_id,
        "column_type": "PROMPT_TEMPLATE",
        "name": "AI Answer",
        "configuration": {
            "template": {
                "name": "qa_answerer",
                "version_number": None  # None selects the latest version
            },
            "prompt_template_variable_mappings": {
                "question": "question"  # Maps to dataset column
            },
            "engine": {
                "provider": "openai",
                "model": "gpt-4",
                "parameters": {"temperature": 0.3}
            }
        }
    }
)

# Column 2: Compare AI answer with expected answer
requests.post(
    f"{BASE_URL}/report-columns",
    headers=headers,
    json={
        "report_id": report_id,
        "column_type": "COMPARE",
        "name": "Exact Match",
        "configuration": {
            "source1": "AI Answer",
            "source2": "expected_answer"
        }
    }
)

# Column 3: LLM assertion for semantic correctness
requests.post(
    f"{BASE_URL}/report-columns",
    headers=headers,
    json={
        "report_id": report_id,
        "column_type": "LLM_ASSERTION",
        "name": "Semantically Correct",
        "configuration": {
            "source": "AI Answer",
            "prompt": "Is this answer semantically equivalent to the expected answer?"
        },
        "is_part_of_score": True  # Include in final score calculation
    }
)

# Step 5: Run the evaluation
run_response = requests.post(
    f"{BASE_URL}/reports/{report_id}/run",
    headers=headers,
    json={
        "name": "QA Eval Run #1",
        "dataset_id": dataset_id
    }
)
# As in Option A, track the report ID returned by the run
run_report_id = run_response.json()["report_id"]

# Step 6: Poll for completion
import time
while True:
    status_response = requests.get(
        f"{BASE_URL}/reports/{run_report_id}",
        headers=headers
    )
    status = status_response.json()["status"]
    if status == "COMPLETED":
        break
    print(f"Status: {status}, waiting...")
    time.sleep(5)

# Step 7: Get the final score
score_response = requests.get(
    f"{BASE_URL}/reports/{run_report_id}/score",
    headers=headers
)
score = score_response.json()["score"]["overall_score"]
print(f"Evaluation complete! Score: {score}%")
```
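The polling loops above run until the status is `COMPLETED`, so they never exit if a run stalls. A bounded variant might look like the sketch below. The `wait_for_status` helper is hypothetical, and `"RUNNING"` is a placeholder status used only for the demonstration; only `COMPLETED` appears in the examples above.

```python
import time


def wait_for_status(fetch_status, done="COMPLETED", timeout=600.0, interval=5.0, sleep=time.sleep):
    """Call `fetch_status` until it returns `done`, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up waiting; last status was {status!r}")
        sleep(interval)


# Demonstration with a fake status source instead of a real API call.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

With the code above, `fetch_status` could be a small function that requests the report and returns its `status` field.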

## Step-by-Step Implementation

### Step 1: Create a Dataset

To run evaluations, you'll need a dataset against which to test your prompts. PromptLayer now provides a comprehensive set of APIs for dataset management:

#### 1.1 Create a Dataset Group

First, create a dataset group to organize your datasets:

* **Endpoint**: `POST /api/public/v2/dataset-groups`
* **Description**: Create a new dataset group within a workspace
* **Authentication**: API key
* **Docs Link**: [Create Dataset Group](../../reference/create-dataset-group)

```json theme={null}
{
  "name": "Production Evaluation Datasets"
}
```

#### 1.2 Create a Dataset Version

Once you have a dataset group, you can create dataset versions using two methods:

##### Option A: From Request History

Create a dataset from your existing request logs:

* **Endpoint**: `POST /api/public/v2/dataset-versions/from-filter-params`
* **Description**: Create a dataset version by filtering request logs
* **Authentication**: API key only
* **Docs Link**: [Create Dataset Version from Filter Params](../../reference/create-dataset-version-from-filter-params)

```json theme={null}
{
  "dataset_group_id": 123,
  "tags": ["prod"],
  "metadata": {
    "environment": "production"
  },
  "prompt_id": 456,
  "start_time": "2024-01-01T00:00:00Z",
  "end_time": "2024-01-31T23:59:59Z"
}
```

##### Option B: From File Upload

Upload a CSV or JSON file to create a dataset:

* **Endpoint**: `POST /api/public/v2/dataset-versions/from-file`
* **Description**: Create a dataset version by uploading a file
* **Authentication**: API key only
* **Docs Link**: [Create Dataset Version from File](../../reference/create-dataset-version-from-file)

```json theme={null}
{
  "dataset_group_id": 123,
  "file_name": "test_cases.csv",
  "file_content_base64": "aW5wdXQsZXhwZWN0ZWRfb3V0cHV0LHNjb3JlCiJIZWxsbyIsIldvcmxkIiwxLjAK..."
}
```
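The `file_content_base64` field holds the file bytes encoded as Base64. As a sketch, the payload can be built from in-memory rows with the standard `csv` and `base64` modules. The `encode_rows_as_csv` helper, the rows, and the ID `123` are placeholders for illustration.

```python
import base64
import csv
import io


def encode_rows_as_csv(rows, fieldnames):
    """Serialize rows to CSV text and return it Base64-encoded."""
    buffer = io.StringIO()
    # The csv module quotes values that contain commas or quotes.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")


payload = {
    "dataset_group_id": 123,  # placeholder ID
    "file_name": "test_cases.csv",
    "file_content_base64": encode_rows_as_csv(
        [{"input": "Hello", "expected_output": "World"}],
        ["input", "expected_output"],
    ),
}
print(payload["file_content_base64"])
```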

### Step 2: Create an Evaluation Pipeline

Create your evaluation pipeline (called a "report" in the API) by making a POST request to `/reports`:

* **Endpoint**: `POST /reports`
* **Description**: Creates a new evaluation pipeline
* **Authentication**: API key
* **Docs Link**: [Create Reports](../../reference/create-reports)

#### Request Payload

```json theme={null}
{
  "dataset_group_id": 123,           // Required: ID of the dataset group
  "name": "My Evaluation Pipeline",  // Optional: Pipeline name (auto-generated if not provided)
  "folder_id": null,                 // Optional: Folder ID for organization
  "dataset_version_number": null     // Optional: Specific version (uses latest if not specified)
}
```

#### Response

```json theme={null}
{
  "success": true,
  "report_id": 456  // Use this ID for adding columns and running evaluations
}
```

### Step 3: Configure Pipeline Steps

The evaluation pipeline consists of steps, each referred to as a "report column". Columns execute sequentially from left to right, where each column can reference the outputs of previous columns.

* **Endpoint**: `POST /report-columns`
* **Description**: Add a step to your evaluation pipeline
* **Authentication**: API key

#### Basic Request Structure

```json theme={null}
{
  "report_id": 456,              // Required: The report ID from Step 2
  "column_type": "COLUMN_TYPE",  // Required: Type of evaluation (see Column Types Reference)
  "name": "Column Name",         // Required: Display name for this step
  "configuration": {},           // Required: Type-specific configuration
  "position": null,              // Optional: Column position (auto-assigned if not provided)
  "is_part_of_score": false      // Optional: Include this column in score calculation
}
```

#### Scoring Columns

By default, only the last column in a pipeline is used for score calculation. To include multiple columns in the final score, set `is_part_of_score: true` on each column you want to include. The final score will be the average of all included columns.
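As a rough sketch, the request body above can be assembled with a small helper. `build_column_payload` is a hypothetical name used for illustration only, and the column names and configurations are sample values taken from the column types in this guide:

```python
from typing import Optional


def build_column_payload(
    report_id: int,
    column_type: str,
    name: str,
    configuration: dict,
    position: Optional[int] = None,
    is_part_of_score: bool = False,
) -> dict:
    """Assemble a request body for POST /report-columns (hypothetical helper)."""
    if not column_type or not name:
        raise ValueError("column_type and name are required")
    return {
        "report_id": report_id,
        "column_type": column_type,
        "name": name,
        "configuration": configuration,
        "position": position,  # None leaves the position to be auto-assigned
        "is_part_of_score": is_part_of_score,
    }


# Two columns flagged for scoring, so both contribute to the final score
accuracy = build_column_payload(
    456, "COMPARE", "Accuracy Check",
    {"source1": "expected_output", "source2": "actual_output"},
    is_part_of_score=True,
)
tone = build_column_payload(
    456, "LLM_ASSERTION", "Tone Check",
    {"source": "actual_output", "prompt": "Is this response professional and helpful?"},
    is_part_of_score=True,
)
```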

## Column Types Reference

Below is a complete reference of all available column types and their configurations. Each column type serves a specific purpose in your evaluation pipeline.

### Primary Column Types

#### PROMPT\_TEMPLATE

Executes a prompt template from your Prompt Registry or an inline template defined directly in the configuration.

**Option A: Registry Reference**

Reference a prompt template stored in the Prompt Registry:

```json theme={null}
{
  "column_type": "PROMPT_TEMPLATE",
  "name": "Generate Response",
  "configuration": {
    "template": {
      "name": "my_prompt_template",     // Required: Template name
      "version_number": null,            // Optional: Specific version (null for latest)
      "label": null                      // Optional: Use specific label
    },
    "prompt_template_variable_mappings": {
      "input_var": "column_name"        // Map template variables to columns
    },
    "engine": {                         // Optional: Override template's default engine
      "provider": "openai",
      "model": "gpt-4",
      "parameters": {
        "temperature": 0.7,
        "max_tokens": 500
      }
    },
    "chat_history_source": "chat_messages_column",  // Optional: Dataset column containing chat history (list of {role, content} messages) to append to the prompt
    "verbose": false,                   // Optional: Include detailed response info
    "return_template_only": false       // Optional: Return template without executing
  },
  "report_id": 456
}
```

**Option B: Inline Template**

Define a prompt template directly in the configuration without saving it to the registry. This is useful for quick experimentation, one-off evaluations, or iterating on prompts before committing them to the registry.

```json theme={null}
{
  "column_type": "PROMPT_TEMPLATE",
  "name": "Generate Response",
  "configuration": {
    "inline_template": {
      "inline": true,
      "prompt_template": {              // Required: The template content
        "type": "chat",
        "messages": [
          {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}]
          },
          {
            "role": "user",
            "content": [{"type": "text", "text": "Answer this question: {question}"}]
          }
        ]
      },
      "metadata": {                     // Optional: Model configuration
        "model": {
          "provider": "openai",
          "name": "gpt-4",
          "parameters": {"temperature": 0.7}
        }
      },
      "source_prompt_name": null,       // Optional: Track which registry prompt this was derived from
      "source_prompt_version": null     // Optional: Track the source version number
    },
    "prompt_template_variable_mappings": {
      "question": "question"            // Map template variables to columns
    }
  },
  "report_id": 456
}
```

<Info>
  You must provide exactly one of `template` (registry reference) or `inline_template` (inline content) in the configuration. They are mutually exclusive.
</Info>
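A client-side check can catch this mistake before the request is sent. The following is a minimal sketch; `check_prompt_template_config` is a hypothetical helper and does not reflect how the API itself validates the configuration:

```python
def check_prompt_template_config(configuration: dict) -> str:
    """Return which template source is set; raise if zero or both are present."""
    has_registry = "template" in configuration
    has_inline = "inline_template" in configuration
    # Exactly one must be present, so the two flags must differ
    if has_registry == has_inline:
        raise ValueError("Provide exactly one of 'template' or 'inline_template'")
    return "template" if has_registry else "inline_template"
```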

<Info>
  **Chat History Source**: For chat-type prompts, you can use `chat_history_source` to specify a dataset column containing a list of chat messages (each with `role` and `content` fields). These messages are appended to the end of the prompt template before execution, allowing you to test prompts with different conversation histories. The column value should be a JSON array of message objects, e.g. `[{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]`.
</Info>
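When preparing a dataset, the chat history value can be produced by serializing a list of messages. This sketch assumes only the `role` and `content` fields described above; `to_chat_history_cell` is a hypothetical helper:

```python
import json


def to_chat_history_cell(messages: list) -> str:
    """Validate messages and serialize them as a JSON array string (hypothetical helper)."""
    for index, message in enumerate(messages):
        missing = {"role", "content"} - message.keys()
        if missing:
            raise ValueError(f"message {index} is missing {sorted(missing)}")
    return json.dumps(messages)


cell = to_chat_history_cell([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
])
```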

#### ENDPOINT

Calls a custom API endpoint with data from previous columns.

```json theme={null}
{
  "column_type": "ENDPOINT",
  "name": "Custom Evaluator",
  "configuration": {
    "url": "https://api.example.com/evaluate",  // Required: Endpoint URL
    "headers": {                                 // Optional: Custom headers
      "Authorization": "Bearer token"
    },
    "timeout": 30                                // Optional: Timeout in seconds
  },
  "report_id": 456
}
```

#### MCP

Executes functions on a Model Context Protocol (MCP) server.

```json theme={null}
{
  "column_type": "MCP",
  "name": "MCP Function",
  "configuration": {
    "server_url": "https://mcp.example.com",    // Required: MCP server URL
    "function_name": "analyze_text",            // Required: Function to execute
    "auth": {                                    // Optional: Authentication
      "type": "bearer",
      "token": "your_token"
    },
    "input_mappings": {                         // Map function inputs to columns
      "text": "response_column"
    }
  },
  "report_id": 456
}
```

#### HUMAN

Allows manual human input for evaluation.

```json theme={null}
{
  "column_type": "HUMAN",
  "name": "Human Review",
  "configuration": {
    "input_type": "text",              // Required: "text" or "numeric"
    "prompt": "Rate the response quality",  // Optional: Instructions for reviewer
    "min": 0,                          // For numeric: minimum value
    "max": 10                          // For numeric: maximum value
  },
  "report_id": 456
}
```

#### CODE\_EXECUTION

Executes Python or JavaScript code for custom logic.

```json theme={null}
{
  "column_type": "CODE_EXECUTION",
  "name": "Custom Processing",
  "configuration": {
    "language": "python",              // Required: "python" or "javascript"
    "code": "# Access data dict\nresult = len(data['response'])\nreturn result"
  },
  "report_id": 456
}
```

#### CODING\_AGENT

Uses an AI coding agent to process data.

```json theme={null}
{
  "column_type": "CODING_AGENT",
  "name": "AI Processing",
  "configuration": {
    "prompt": "Extract all email addresses from the response column",
    "files": []                       // Optional: File attachments (base64 encoded)
  },
  "report_id": 456
}
```

#### CONVERSATION\_SIMULATOR

Simulates multi-turn conversations for testing chatbots and conversational agents.

```json theme={null}
{
  "column_type": "CONVERSATION_SIMULATOR",
  "name": "Conversation Test",
  "configuration": {
    "template": {
      "name": "support_agent",
      "version_number": null
    },
    "prompt_template_variable_mappings": {
      "customer_name": "customer_name"
    },
    "user_persona": "You are a frustrated customer who needs quick help. Ask follow-up questions if responses are unclear.",
    "conversation_completed_prompt": "The conversation is complete when the assistant resolves the issue or the user indicates they are satisfied.",
    "is_user_first": true,            // Optional: If true, simulated user starts (default: false)
    "max_turns": 10,                  // Optional: Maximum conversation turns (max: 150)
    "conversation_samples": []         // Optional: Example conversations to guide style
  },
  "report_id": 456
}
```

You can also use `user_persona_source` instead of `user_persona` to pull the persona from a dataset column for varied test scenarios. Similarly, use `conversation_completed_prompt_source` to pull completion guidance from a dataset column.

#### WORKFLOW

Executes a PromptLayer workflow.

```json theme={null}
{
  "column_type": "WORKFLOW",
  "name": "Run Workflow",
  "configuration": {
    "workflow_id": 123,               // Required: Workflow ID
    "input_mappings": {               // Map workflow inputs to columns
      "input_param": "source_column"
    }
  },
  "report_id": 456
}
```

### Node & Column Types

#### LLM\_ASSERTION

Uses an LLM to evaluate assertions about the data.

```json theme={null}
{
  "column_type": "LLM_ASSERTION",
  "name": "Quality Check",
  "configuration": {
    "source": "response_column",      // Required: Column to evaluate
    "prompt": "Is this response professional and helpful?"  // Required (unless using prompt_source): Question to evaluate
  },
  "report_id": 456,
  "is_part_of_score": true  // Optional: Include in score calculation
}
```

**Using template variables:**

You can use f-string style template variables `{variable_name}` in your assertions and map them to dataset columns using `variable_mappings`:

```json theme={null}
{
  "column_type": "LLM_ASSERTION",
  "name": "Language Check",
  "configuration": {
    "source": "response_column",
    "prompt": "Is the response written in {language}?",
    "variable_mappings": {
      "language": "target_language_column"  // Maps {language} to the target_language_column
    }
  },
  "report_id": 456
}
```
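Conceptually, each `{variable_name}` placeholder is filled with the value of the mapped column for the current row. The sketch below illustrates that idea with Python's `str.format`; it is not PromptLayer's implementation, and `render_assertion` is a hypothetical name:

```python
def render_assertion(prompt: str, variable_mappings: dict, row: dict) -> str:
    """Fill {variable} placeholders using the mapped column values of one row."""
    values = {variable: row[column] for variable, column in variable_mappings.items()}
    return prompt.format(**values)


rendered = render_assertion(
    "Is the response written in {language}?",
    {"language": "target_language_column"},
    {"target_language_column": "French", "response_column": "Bonjour"},
)
```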

**Multiple assertions:**

Pass a JSON array string to evaluate multiple assertions:

```json theme={null}
{
  "column_type": "LLM_ASSERTION",
  "name": "Multi Check",
  "configuration": {
    "source": "response_column",
    "prompt": "[\"Is this professional?\", \"Is this helpful?\"]"
  },
  "report_id": 456
}
```
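Because `prompt` is a string that itself contains JSON, hand-escaping the quotes is error-prone. One way to avoid that, sketched here with Python's standard `json` module, is to serialize the list of assertions:

```python
import json

assertions = ["Is this professional?", "Is this helpful?"]

configuration = {
    "source": "response_column",
    # json.dumps produces the JSON array string expected in "prompt"
    "prompt": json.dumps(assertions),
}
```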

**Importing assertions from a column:**

Use `prompt_source` instead of `prompt` to read assertions from a dataset column:

```json theme={null}
{
  "column_type": "LLM_ASSERTION",
  "name": "Dynamic Assertion",
  "configuration": {
    "source": "response_column",
    "prompt_source": "assertions_column"  // Column containing assertion(s)
  },
  "report_id": 456
}
```

#### AI\_DATA\_EXTRACTION

Extracts specific data using AI.

```json theme={null}
{
  "column_type": "AI_DATA_EXTRACTION",
  "name": "Extract Info",
  "configuration": {
    "source": "response_column",      // Required: Column to extract from
    "extraction_prompt": "Extract all product names mentioned"  // Required: What to extract
  },
  "report_id": 456
}
```

#### COMPARE

Compares two columns for equality.

```json theme={null}
{
  "column_type": "COMPARE",
  "name": "Response Match",
  "configuration": {
    "source1": "expected_output",     // Required: First column
    "source2": "actual_output"        // Required: Second column
  },
  "report_id": 456
}
```

#### CONTAINS

Checks if a column contains specific text.

```json theme={null}
{
  "column_type": "CONTAINS",
  "name": "Contains Check",
  "configuration": {
    "source": "response_column",      // Required: Column to search in
    "value": "error",                 // Option 1: Static value to search for
    "value_source": "expected_column" // Option 2: Column containing search value
  },
  "report_id": 456
}
```

#### REGEX

Matches a regular expression pattern.

```json theme={null}
{
  "column_type": "REGEX",
  "name": "Pattern Match",
  "configuration": {
    "source": "response_column",      // Required: Column to search
    "pattern": "\\d{3}-\\d{3}-\\d{4}" // Required: Regex pattern
  },
  "report_id": 456
}
```

#### REGEX\_EXTRACTION

Extracts text using a regex pattern.

```json theme={null}
{
  "column_type": "REGEX_EXTRACTION",
  "name": "Extract Pattern",
  "configuration": {
    "source": "response_column",      // Required: Column to extract from
    "pattern": "(\\w+@\\w+\\.\\w+)",  // Required: Extraction pattern
    "group": 1                        // Optional: Capture group (default: 0)
  },
  "report_id": 456
}
```
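The `group` option selects which part of the match is returned. The sketch below shows the difference between group 0 and group 1 using Python's `re` module; the pattern and text are hypothetical, and the source does not state which regex engine PromptLayer uses, so treat the semantics as an assumption:

```python
import re

# Hypothetical pattern and text, chosen so that groups 0 and 1 differ
pattern = r"Order #(\d+)"
text = "Your Order #1234 has shipped"

match = re.search(pattern, text)
whole_match = match.group(0)  # the entire matched text
first_group = match.group(1)  # the text captured by the first parentheses
```

Note that inside a JSON body each backslash must be doubled, which is why the configuration above writes `\\w` rather than `\w`.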

#### COSINE\_SIMILARITY

Calculates semantic similarity between two texts.

```json theme={null}
{
  "column_type": "COSINE_SIMILARITY",
  "name": "Similarity Score",
  "configuration": {
    "source1": "expected_response",   // Required: First text column
    "source2": "actual_response"      // Required: Second text column
  },
  "report_id": 456
}
```

#### ABSOLUTE\_NUMERIC\_DISTANCE

Calculates absolute distance between numeric values.

```json theme={null}
{
  "column_type": "ABSOLUTE_NUMERIC_DISTANCE",
  "name": "Score Difference",
  "configuration": {
    "source1": "expected_score",      // Required: First numeric column
    "source2": "actual_score"         // Required: Second numeric column
  },
  "report_id": 456
}
```

### Helper Column Types

#### JSON\_PATH

Extracts data from JSON using JSONPath.

```json theme={null}
{
  "column_type": "JSON_PATH",
  "name": "Extract JSON",
  "configuration": {
    "source": "json_response",        // Required: Column with JSON data
    "path": "$.data.items[0].name",   // Required: JSONPath expression
    "return_all": false                // Optional: Return all matches (default: false)
  },
  "report_id": 456
}
```

#### XML\_PATH

Extracts data from XML using XPath.

```json theme={null}
{
  "column_type": "XML_PATH",
  "name": "Extract XML",
  "configuration": {
    "source": "xml_response",         // Required: Column with XML data
    "xpath": "//item[@id='1']/name",  // Required: XPath expression
    "return_all": false                // Optional: Return all matches
  },
  "report_id": 456
}
```

#### PARSE\_VALUE

Converts column values to different types.

```json theme={null}
{
  "column_type": "PARSE_VALUE",
  "name": "Parse to JSON",
  "configuration": {
    "source": "response_column",      // Required: Column to parse
    "target_type": "json"             // Required: "string", "number", "boolean", or "json"
  },
  "report_id": 456
}
```

#### APPLY\_DIFF

Applies diff patches to content.

```json theme={null}
{
  "column_type": "APPLY_DIFF",
  "name": "Apply Changes",
  "configuration": {
    "original_source": "original_code",  // Required: Original content column
    "diff_source": "diff_patch"          // Required: Diff patch column
  },
  "report_id": 456
}
```

#### VARIABLE

Creates a static value column.

```json theme={null}
{
  "column_type": "VARIABLE",
  "name": "Static Context",
  "configuration": {
    "value": "production",            // Required: Static value
    "value_type": "string"            // Optional: Value type hint
  },
  "report_id": 456
}
```

#### ASSERT\_VALID

Validates data types (JSON, number, SQL).

```json theme={null}
{
  "column_type": "ASSERT_VALID",
  "name": "Validate JSON",
  "configuration": {
    "source": "response_column",      // Required: Column to validate
    "validation_type": "json"         // Required: "json", "number", or "sql"
  },
  "report_id": 456
}
```

#### COALESCE

Returns the first non-null value from multiple columns.

```json theme={null}
{
  "column_type": "COALESCE",
  "name": "First Valid",
  "configuration": {
    "sources": ["col1", "col2", "col3"]  // Required: List of columns to coalesce
  },
  "report_id": 456
}
```

#### COMBINE\_COLUMNS

Combines multiple columns into one.

```json theme={null}
{
  "column_type": "COMBINE_COLUMNS",
  "name": "Combine Data",
  "configuration": {
    "sources": ["col1", "col2"],      // Required: Columns to combine
    "separator": ", ",                // Optional: Separator (default: ", ")
    "format": "{col1}: {col2}"        // Optional: Custom format string
  },
  "report_id": 456
}
```

#### COUNT

Counts characters, words, or paragraphs.

```json theme={null}
{
  "column_type": "COUNT",
  "name": "Word Count",
  "configuration": {
    "source": "response_column",      // Required: Column to count
    "count_type": "words"             // Required: "characters", "words", or "paragraphs"
  },
  "report_id": 456
}
```

#### MATH\_OPERATOR

Performs mathematical operations.

```json theme={null}
{
  "column_type": "MATH_OPERATOR",
  "name": "Calculate Score",
  "configuration": {
    "source1": "score1",              // Required: First operand
    "source2": "score2",              // Required: Second operand
    "operator": "+"                   // Required: "+", "-", "*", "/", "%", "**"
  },
  "report_id": 456
}
```

#### MIN\_MAX

Finds minimum or maximum values.

```json theme={null}
{
  "column_type": "MIN_MAX",
  "name": "Max Score",
  "configuration": {
    "sources": ["score1", "score2", "score3"],  // Required: Columns to compare
    "operation": "max"                          // Required: "min" or "max"
  },
  "report_id": 456
}
```

## Column Reference Syntax

When configuring columns that reference other columns, use these formats:

* **Dataset columns**: Use the exact column name from your dataset (e.g., `"question"`, `"expected_output"`)
* **Previous step columns**: Use the exact name you gave to the column (e.g., `"AI Answer"`, `"Validation Result"`)
* **Variable columns**: For columns of type VARIABLE, reference them by their name

### Important Notes

1. **Column Order Matters**: Columns execute left to right. A column can only reference columns to its left.
2. **Column Names**: Must be unique within a pipeline. Use descriptive names.
3. **Dataset Columns**: Are automatically available as the first columns in your pipeline.
4. **Error Handling**: If a column fails, subsequent columns that depend on it will also fail.
5. **Scoring**: If your last column contains all boolean or numeric values, it becomes the evaluation score.
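The ordering and uniqueness rules above can be checked locally before any columns are created. This is a minimal sketch: `validate_pipeline` is a hypothetical helper, and it assumes a step name must also not collide with a dataset column name.

```python
def validate_pipeline(dataset_columns: list, steps: list) -> list:
    """Check steps given as (name, references) tuples in left-to-right order.

    Returns a list of error messages; an empty list means no problems were found.
    """
    available = set(dataset_columns)  # dataset columns come first in the pipeline
    errors = []
    for name, references in steps:
        if name in available:
            errors.append(f"duplicate column name: {name}")
        for reference in references:
            # A column can only reference columns to its left
            if reference not in available:
                errors.append(f"{name} references unknown or later column: {reference}")
        available.add(name)
    return errors


ok = validate_pipeline(
    ["question", "expected_output"],
    [("AI Answer", ["question"]), ("Match", ["expected_output", "AI Answer"])],
)
bad = validate_pipeline(
    ["question", "expected_output"],
    [("Match", ["expected_output", "AI Answer"]), ("AI Answer", ["question"])],
)
```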

### Step 4: Trigger the Evaluation

Once your pipeline is configured, trigger it programmatically using the run endpoint:

* **Endpoint**: `POST /reports/{report_id}/run`
* **Description**: Execute the evaluation pipeline with optional dataset refresh
* **Docs Link**: [Run Evaluation Pipeline](../../reference/run-report)

#### Example Payload

```json theme={null}
{
  "name": "Nightly Eval - 2024-12-15",
  "dataset_id": 123
}
```

### Step 5: Monitor and Retrieve Results

You have two options for monitoring evaluation progress:

#### Option A: Polling

Continuously check the report status until completion:

* **Endpoint**: `GET /reports/{report_id}`
* **Description**: Retrieve the status and results of a specific report by its ID.
* **Docs Link**: [Get Report Status](../../reference/get-report)

```json theme={null}
// Response includes status: "RUNNING" or "COMPLETED"
{
    "success": true,
    "report": {...},
    "status": "COMPLETED",
    "stats": {
        "status_counts": {
            "COMPLETED": 95,
            "FAILED": 3,
            "QUEUED": 0,
            "RUNNING": 2
        }
    }
}
```
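A polling loop might look like the following sketch. To keep it independent of any HTTP client, it takes a `fetch_status` callable that returns the parsed response body; the function name, the interval, and the attempt limit are all assumptions for illustration:

```python
import time
from typing import Callable


def wait_for_report(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
) -> dict:
    """Call fetch_status until the top-level status is COMPLETED or attempts run out."""
    for _ in range(max_attempts):
        body = fetch_status()
        if body.get("status") == "COMPLETED":
            return body
        time.sleep(interval_seconds)  # wait before asking again
    raise TimeoutError("report did not complete within the allowed attempts")


# Simulated responses standing in for real GET /reports/{report_id} calls
responses = iter([{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
final = wait_for_report(lambda: next(responses), interval_seconds=0)
```

In real use, `fetch_status` would wrap the `GET /reports/{report_id}` request and return its JSON body. Individual rows can still fail in a completed report, so it may be worth inspecting `stats.status_counts` as well.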

#### Option B: Webhooks

Listen for the `report_finished` webhook event for real-time notifications when evaluations complete.

### Step 6: Get the Score

Once the evaluation is complete, retrieve the final score:

* **Endpoint**: `GET /reports/{report_id}/score`
* **Description**: Fetch the score of a specific report by its ID.
* **Docs Link**: [Get Evaluation Score](../../reference/get-report-score)

#### Example Response

```json theme={null}
{
  "success": true,
  "message": "success",
  "score": {
    "overall_score": 87.5,
    "score_type": "multi_column",
    "has_custom_scoring": false,
    "details": {
      "columns": [
        {
          "column_name": "Accuracy Check",
          "score": 90.0,
          "score_type": "boolean"
        },
        {
          "column_name": "Safety Check",
          "score": 85.0,
          "score_type": "boolean"
        }
      ]
    }
  }
}
```
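To work with this response in code, the per-column scores can be flattened into a dictionary. This sketch assumes the response shape shown above; `summarize_score` is a hypothetical helper:

```python
def summarize_score(body: dict) -> dict:
    """Extract the overall score and a column-name -> score mapping."""
    score = body["score"]
    columns = score.get("details", {}).get("columns", [])
    return {
        "overall": score["overall_score"],
        "by_column": {column["column_name"]: column["score"] for column in columns},
    }


example_body = {
    "success": True,
    "message": "success",
    "score": {
        "overall_score": 87.5,
        "score_type": "multi_column",
        "has_custom_scoring": False,
        "details": {
            "columns": [
                {"column_name": "Accuracy Check", "score": 90.0, "score_type": "boolean"},
                {"column_name": "Safety Check", "score": 85.0, "score_type": "boolean"},
            ]
        },
    },
}
summary = summarize_score(example_body)
```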

### Step 7: Configure Custom Scoring (Optional)

By default, PromptLayer calculates scores by averaging boolean columns. For more control, you can configure custom scoring logic using Python or JavaScript code.

* **Endpoint**: `PATCH /reports/{report_id}/score-card`
* **Description**: Configure which columns to include and optionally provide custom scoring code
* **Docs Link**: [Configure Custom Scoring](../../reference/update-report-score-card)

#### Example: Weighted Scoring

```python theme={null}
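import requests

# Placeholder values: substitute your own API key and the report ID from Step 2
BASE_URL = "https://api.promptlayer.com"
headers = {"X-API-KEY": "pl_****"}
report_id = 456

# Configure custom scoring with weights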
requests.patch(
    f"{BASE_URL}/reports/{report_id}/score-card",
    headers=headers,
    json={
        "column_names": ["Accuracy Check", "Style Check", "Safety Check"],
        "code": """
# Weight accuracy more heavily than other checks
weights = {
    "Accuracy Check": 0.5,
    "Style Check": 0.2,
    "Safety Check": 0.3
}

total_weight = 0
weighted_sum = 0

for row in data:
    for col_name, weight in weights.items():
        if col_name in row and isinstance(row[col_name], bool):
            total_weight += weight
            if row[col_name]:
                weighted_sum += weight

score = (weighted_sum / total_weight * 100) if total_weight > 0 else 0
return {"score": score}
""",
        "code_language": "PYTHON"
    }
)
```

#### Example: All Checks Must Pass

```python theme={null}
# Assumes `requests` is imported and `BASE_URL`, `headers`, and `report_id` are already defined
requests.patch(
    f"{BASE_URL}/reports/{report_id}/score-card",
    headers=headers,
    json={
        "column_names": ["Accuracy Check", "Safety Check", "Format Check"],
        "code": """
# A row only counts as passed if ALL checks pass
check_columns = ["Accuracy Check", "Safety Check", "Format Check"]
passed_rows = 0
total_rows = len(data)

for row in data:
    all_passed = all(
        row.get(col) == True
        for col in check_columns
        if col in row
    )
    if all_passed:
        passed_rows += 1

score = (passed_rows / total_rows * 100) if total_rows > 0 else 0
return {"score": score}
""",
        "code_language": "PYTHON"
    }
)
```

#### Custom Code Interface

Your custom code receives a `data` variable containing all evaluation results:

```python theme={null}
data = [
    {
        "input": "What is 2+2?",
        "expected": "4",
        "AI Response": "The answer is 4",
        "Accuracy Check": True,
        "Safety Check": True
    },
    # ... more rows
]
```

Your code must return a dictionary with at least a `score` key (0-100):

```python theme={null}
return {"score": 85.5}
```
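Because scoring code uses a top-level `return`, it cannot be run directly as a Python script. One way to try it locally before uploading is to wrap it in a function body, as in this sketch. `run_scoring_code` is a hypothetical helper and only approximates the hosted execution environment:

```python
import textwrap


def run_scoring_code(code: str, data: list) -> dict:
    """Wrap scoring code in a function so its top-level `return` is valid, then run it."""
    source = "def _score(data):\n" + textwrap.indent(code, "    ")
    namespace: dict = {}
    exec(source, namespace)  # only run code you trust
    result = namespace["_score"](data)
    if not isinstance(result, dict) or "score" not in result:
        raise ValueError("scoring code must return a dictionary with a 'score' key")
    return result


sample_code = """
passed = sum(1 for row in data if row.get("Accuracy Check") is True)
score = (passed / len(data) * 100) if data else 0
return {"score": score}
"""
rows = [{"Accuracy Check": True}, {"Accuracy Check": False}]
result = run_scoring_code(sample_code, rows)
```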


# Score Card
Source: https://docs.promptlayer.com/features/evaluations/score-card


<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

The score card feature in PromptLayer allows you to assign a score to each evaluation you run. This score provides a quick and easy way to assess the performance of your prompts and compare different versions.

## Configuring the Score Card

<Frame>
  <img alt="Score Card Example" />
</Frame>

### Default Configuration

By default, the score is calculated based on the last column in your evaluation results:

* If the last column contains Booleans, the score will be the percentage of `true` values.
* If the last column contains numbers, the score will be the average of those numbers.

### Custom Column Selection

You can customize which columns are included in the score card calculation. When setting up your evaluation pipeline, click the "Score card" button to configure the score card.

Here, you can add specific columns to be included in the score calculation:

* If you add multiple numeric columns, the total score will be the average of the averages for each selected column.
* If you add multiple Boolean columns, the total score will be the average of the `true` percentages for each selected column.
* Columns that do not contain numbers or Booleans will not be included in the score calculation.
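The rules above can be sketched in a few lines of Python. This is an illustration of the described arithmetic, not PromptLayer's implementation; it assumes Boolean columns are scored as a 0-100 percentage, and how mixed Boolean and numeric selections are combined is an assumption here:

```python
def column_score(values: list):
    """Booleans -> percentage of True; numbers -> mean; anything else -> None."""
    if not values:
        return None
    if all(isinstance(value, bool) for value in values):
        return 100.0 * sum(values) / len(values)
    # bool is a subclass of int, so exclude it explicitly from the numeric case
    if all(isinstance(value, (int, float)) and not isinstance(value, bool) for value in values):
        return sum(values) / len(values)
    return None


def score_card(rows: list, column_names: list):
    """Average the per-column scores, skipping columns that cannot be scored."""
    scores = []
    for name in column_names:
        score = column_score([row[name] for row in rows])
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


rows = [
    {"Accuracy Check": True, "Safety Check": True, "notes": "ok"},
    {"Accuracy Check": False, "Safety Check": True, "notes": "retry"},
]
total = score_card(rows, ["Accuracy Check", "Safety Check", "notes"])
```

Here `Accuracy Check` scores 50, `Safety Check` scores 100, and the text column `notes` is skipped, so the total is their average.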

<Frame>
  <img alt="Score Card Columns" />
</Frame>

The selected columns are also formatted for easier viewing in the evaluation report: numbers appear larger, and Booleans appear as check/x icons.

### Custom Scoring Logic

For more advanced scoring needs, you can provide your own custom scoring logic using Python or JavaScript code. The code execution environment is the same as the one used for the code execution evaluation column type [(learn more)](/features/evaluations/eval-types#code-execution).

This custom scoring logic can be used to generate a single score number or a drill-down matrix.

<Frame>
  <img alt="Score Card Matrix" />
</Frame>

You can optionally return multiple drill-down matrices. This is useful for generating confusion matrices.

<Frame>
  <img alt="Score Card Matrices" />
</Frame>

Your custom scoring code must return an object with the following keys:

* `score` (required): A number representing the overall score.
* `score_matrix` (optional): A list of lists of lists, representing one or more matrices of drilled-down scores. Each cell in these matrices can be a raw value or an object with metadata.
* `sub_scores` (optional): A dictionary mapping label names to numeric values. Sub-scores appear in the score card's drill-down breakdown panel, letting you break the overall score into named components (e.g., `{"correctness": 0.9, "relevance": 0.8}`).

#### Score Matrix Cell Format

Each cell in the `score_matrix` can be either:

* A raw value (string or number), or
* An object with the following properties:
  * `value`: The actual value of the cell, which can be a string or number.
  * `positive_metric`: (Optional) A boolean indicating whether an increase in this value is considered positive (`true`). Defaults to `true` if absent.

**Examples**

* Simple value: `42`
* Object with metadata: `{"value": 42, "positive_metric": true}`

The optional `positive_metric` property can be used to indicate how changes in the value should be interpreted when comparing evaluations. This is particularly useful for automated reporting and analysis tools.
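As an illustration of the cell format, the following sketch builds a confusion matrix whose cells carry `positive_metric`. The column names `expected` and `predicted` and the function `confusion_score` are hypothetical; in actual scoring code the body would run at the top level and end with `return`:

```python
from collections import Counter


def confusion_score(data, expected_key="expected", predicted_key="predicted"):
    """Return a score plus one confusion matrix in the score_matrix format."""
    labels = sorted(
        {row[expected_key] for row in data} | {row[predicted_key] for row in data}
    )
    counts = Counter((row[expected_key], row[predicted_key]) for row in data)

    matrix = [["expected / predicted", *labels]]
    for expected in labels:
        cells = [
            {
                "value": counts[(expected, predicted)],
                # Growth on the diagonal (correct predictions) is good;
                # growth off the diagonal (errors) is bad.
                "positive_metric": expected == predicted,
            }
            for predicted in labels
        ]
        matrix.append([expected, *cells])

    correct = sum(counts[(label, label)] for label in labels)
    score = (correct / len(data) * 100) if data else 0
    return {"score": score, "score_matrix": [matrix]}


sample = [
    {"expected": "cat", "predicted": "cat"},
    {"expected": "cat", "predicted": "dog"},
    {"expected": "dog", "predicted": "dog"},
]
result = confusion_score(sample)
```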

#### Adding Titles to Score Matrices

To add titles to your score matrices, simply add an extra field to the first row of the matrix and it will automatically be interpreted as the primary title. For example, if you have a matrix like:

```python theme={null}
[[1,2],[1,2]]
```

You can add a title by modifying it to:

```python theme={null}
[["Title",1,2],[1,2]]
```
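The same transformation can be done programmatically. A small sketch, with `add_title` as a hypothetical helper name:

```python
def add_title(matrix: list, title: str) -> list:
    """Return a copy of the matrix with the title prepended to its first row."""
    if not matrix:
        raise ValueError("matrix must have at least one row")
    return [[title, *matrix[0]], *[list(row) for row in matrix[1:]]]


titled = add_title([[1, 2], [1, 2]], "Title")
```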

### Code example

Your scoring code has access to a `data` variable: a list containing one dictionary per row in the evaluation results. The keys in each dictionary are the column names, and the values are the corresponding cell values.

For example:

```py Python theme={null}
# The variable `data` is a list of rows.
# Each row is a dictionary of column name -> value
# For example: [
#       {'columnA': 1, 'columnB': 2},
#       {'columnA': 4, 'columnB': 1}
#  ]
#
# Must return a dictionary with the following structure:
# {
#   'score': int,           # Required
#   'sub_scores': {...},    # Optional - named component scores (label -> number)
#   'score_matrix': [[[int, int, ...], ...]...],  # Optional - list of lists of lists
# }

return {
    'score': len(data),
    'sub_scores': {
        'correctness': 0.85,
        'completeness': 0.70,
    },
    'score_matrix': [[
        ["Criteria", "Weight", "Value"],
        ["Correctness", 4, 7],
        ["Completeness", 3, 6],
        ["Accuracy", 5, 8],
        ["Relevance", 4, 9]
    ]],
}
```

## Comparing Evaluation Reports

You can compare two evaluation reports to see how scores and other metrics have changed between runs. Simply click the "Compare" button and select the evaluation reports you want to compare.

The score card and any score matrices will be displayed side-by-side for easy comparison of your prompt's performance over time.

<Frame>
  <img alt="Compare Score Cards" />
</Frame>


# Exa
Source: https://docs.promptlayer.com/features/exa-integration

Set up Exa as a custom provider in PromptLayer.

[Exa](https://exa.ai/) provides AI-powered web search and research models that can be integrated with PromptLayer through custom providers. Exa's models excel at finding relevant information, generating research reports, and providing cited answers from web sources.

## Setting Up Exa as a Custom Provider

To use Exa models in PromptLayer:

1. Navigate to **Settings → Custom Providers and Models** in your PromptLayer dashboard
2. Click **Create Custom Provider**
3. Configure the provider with the following details:
   * **Name**: Exa (or your preferred name)
   * **Client**: OpenAI
   * **Base URL**: `https://api.exa.ai`
   * **API Key**: Your Exa API key (get one at [exa.ai](https://exa.ai))

<Note>
  Exa uses OpenAI-compatible endpoints, which is why we select OpenAI as the client type.
</Note>

## Creating Custom Models (Recommended)

For easier model selection in the Playground and Prompt Registry, you can create custom models:

1. In **Settings → Custom Providers and Models**, find your Exa provider in the list
2. Click on the Exa row to expand it
3. Click **Create Custom Model**
4. Configure each model:
   * **Provider**: Select the Exa provider you created
   * **Model Name**: Enter the Exa model identifier (e.g., `exa`, `exa-research`)
   * **Display Name**: A friendly name like "Exa Answer" or "Exa Research"
   * **Model Type**: Chat
5. Repeat for each model you want to use

This allows you to select Exa models directly from the dropdown instead of typing them manually.

## Available Models

Exa regularly updates its model offerings. Example models include:

* **`exa`**: Fast answer generation with web search
* **`exa-research`**: In-depth research with comprehensive citations
* **`exa-research-pro`**: Advanced research capabilities

For the complete and up-to-date list of available models, visit [Exa's official documentation](https://docs.exa.ai/reference/openai-sdk).

## Using Exa in PromptLayer

### In the Playground

After setup, you can use Exa models in the PromptLayer Playground:

1. Open the Playground
2. Select your Exa provider from the provider dropdown
3. Choose your desired Exa model
4. Start querying with your prompts

### In the Prompt Registry

Exa models work seamlessly with PromptLayer's Prompt Registry:

* Select Exa models when creating or editing prompt templates
* Use templates with Exa models in evaluations
* Track and analyze Exa API usage alongside other providers

### Key Features

Exa models in PromptLayer support:

* **Citations**: Exa responses include source citations for research and fact-checking
* **Research Capabilities**: Deep web search for comprehensive answers
* **Web Search Integration**: Real-time access to current web information

## SDK Usage

Once you've set up your Exa custom provider and created a prompt template in the dashboard, you can run it programmatically with the PromptLayer SDK:

```python theme={null}
from promptlayer import PromptLayer

promptlayer = PromptLayer(api_key="pl_****")

# Run a prompt template that uses your Exa custom provider
response = promptlayer.run(
    prompt_name="your-exa-prompt",
    input_variables={"query": "latest developments in AI safety"}
)

# Access the response
print(response["raw_response"].choices[0].message.content)

# The request is automatically logged with request_id
print(f"Request ID: {response['request_id']}")
```

<Info>
  Using [`promptlayer.run()`](/sdks/python#using-the-run-method-recommended) ensures your requests are properly logged to PromptLayer and leverages your prompt templates from the Prompt Registry. This is the recommended approach for production use.
</Info>

## Related Documentation

* [Custom Providers](/features/custom-providers)
* [Supported Providers](/features/supported-providers)
* [Exa Official Documentation](https://docs.exa.ai/reference/openai-sdk)


# FAQ
Source: https://docs.promptlayer.com/features/faq


Frequently Asked Questions

Don't see your question here? Send a message in [Discord](https://discord.gg/DBAhQbW39S) or email us at [hello@promptlayer.com](mailto:hello@promptlayer.com)

## Does PromptLayer support multi-modal image models like `gpt-4-vision`?

Yes, PromptLayer supports multi-modal image models, including `gpt-4-vision-preview`. They are used in a similar way to normal LLMs.

To use `gpt-4-vision-preview` with PromptLayer, follow these steps:

1. Ensure you have the PromptLayer and OpenAI Python libraries installed.
2. Use the [`run()` method](/sdks/python#using-the-run-method-recommended) to execute prompts, or use [`log_request`](/features/prompt-history/custom-logging) to log requests made with your own client.
3. Make your request to `gpt-4-vision-preview` with the necessary image inputs, either through image URLs or base64 encoded images.
4. Check the PromptLayer dashboard to see your request logged!

<img alt="gpt-4-vision request" />

Multi-modal models are also supported in the Prompt Registry, Playground, and Evaluations pages.

## Do you support OpenAI function calling?

Yes. PromptLayer supports [function calling](https://platform.openai.com/docs/guides/function-calling) through the `run()` method and via [custom logging](/features/prompt-history/custom-logging). You can also configure tool calling directly in the [Prompt Registry](/features/prompt-registry/tool-calling).

## Does PromptLayer support streaming?

Yes, the PromptLayer Python and JavaScript SDKs support streaming. Streaming responses include `prompt_blueprint` support, providing both raw streaming data and progressively built structured responses.

When streaming is enabled, each chunk includes:

* `raw_response`: The raw streaming response from the LLM provider
* `prompt_blueprint`: The progressively built prompt blueprint showing the current state of the response
* `request_id`: Only included in the final chunk to indicate completion

<Note>
  The `raw_response` structure is provider-specific and may change as LLM providers update their APIs. For stable, provider-agnostic access, use `prompt_blueprint` instead.
</Note>

Example usage for OpenAI:

```python theme={null}
from promptlayer import PromptLayer

pl = PromptLayer(api_key="pl_****")

for chunk in pl.run(prompt_name="your-prompt", stream=True):
    # Access raw streaming response
    print(chunk["raw_response"])
    
    # Access progressively built prompt blueprint
    if chunk["prompt_blueprint"]:
        current_response = chunk["prompt_blueprint"]["prompt_template"]["messages"][-1]
        if current_response.get("content"):
            print(f"Current response: {current_response['content']}")
```

Example usage for Anthropic:

```python theme={null}
from promptlayer import PromptLayer

pl = PromptLayer(api_key="pl_****")

for chunk in pl.run(prompt_name="your-prompt", stream=True):
    raw_chunk = chunk["raw_response"]
    
    # Handle Anthropic streaming event types
    if raw_chunk.get("type") == "content_block_delta":
        delta = raw_chunk.get("delta", {})
        if delta.get("type") == "text_delta":
            print(f"Streaming content: {delta.get('text', '')}")
    
    # Access progressively built prompt blueprint
    if chunk["prompt_blueprint"]:
        current_response = chunk["prompt_blueprint"]["prompt_template"]["messages"][-1]
        if current_response.get("content") and len(current_response["content"]) > 0:
            text_content = current_response["content"][0].get("text", "")
            if text_content:
                print(f"Current response: {text_content}")
```
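If you only need the final text and the `request_id`, a provider-agnostic reader built on `prompt_blueprint` can be sketched as follows. The helper names `latest_text` and `consume_stream` are hypothetical, and the sample chunks are hand-written stand-ins for what `pl.run(..., stream=True)` yields. The sketch assumes the last message's `content` is either a plain string or a list of blocks with a `text` field, as in the two examples above.

```python
from typing import Any, Iterable, Optional, Tuple


def latest_text(blueprint: Optional[dict]) -> str:
    """Return the text of the last message in a prompt blueprint, or ""."""
    if not blueprint:
        return ""
    messages = blueprint.get("prompt_template", {}).get("messages", [])
    if not messages:
        return ""
    content = messages[-1].get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Join the text of every block that carries text; skip the rest.
        return "".join(
            block.get("text", "") for block in content if isinstance(block, dict)
        )
    return ""


def consume_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[Any]]:
    """Read every chunk and return (final_text, request_id)."""
    final_text, request_id = "", None
    for chunk in chunks:
        text = latest_text(chunk.get("prompt_blueprint"))
        if text:
            final_text = text  # the blueprint is cumulative, so keep the latest
        if chunk.get("request_id") is not None:
            request_id = chunk["request_id"]  # only set on the final chunk
    return final_text, request_id


# Hand-written sample chunks (assumption: real chunks share these three keys).
sample_chunks = [
    {"raw_response": {}, "prompt_blueprint": None, "request_id": None},
    {
        "raw_response": {},
        "prompt_blueprint": {"prompt_template": {"messages": [
            {"role": "assistant", "content": [{"type": "text", "text": "Hel"}]}
        ]}},
        "request_id": None,
    },
    {
        "raw_response": {},
        "prompt_blueprint": {"prompt_template": {"messages": [
            {"role": "assistant", "content": [{"type": "text", "text": "Hello"}]}
        ]}},
        "request_id": 123,
    },
]

text, request_id = consume_stream(sample_chunks)
print(text, request_id)
```

In real use, pass the generator returned by `pl.run(prompt_name="your-prompt", stream=True)` to `consume_stream` instead of `sample_chunks`.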

Finally, if you interact with PromptLayer through the [REST API](/reference/introduction), you need to store the whole output and log it to PromptLayer (`log-request`) only after the stream has finished.

## Can I export my data from PromptLayer?

Yes. You can export your usage data with the button shown below.

Filter your training data export by tags, a search query, or metadata.

<video>
  <source type="video/mp4" />
</video>

## Do you support on-premises deployment?

Yes, we support on-premises deployment for select enterprise customers, and we are rolling this option out gradually.

If you are interested in on-premises deployment, please [contact us](mailto:hello@promptlayer.com) for more information.

## Does async work with PromptLayer?

Yes, PromptLayer supports asynchronous operations through `AsyncPromptLayer`. You can use the async `run()` method or `log_request` to log requests made with async LLM clients.

```python theme={null}
import asyncio
from promptlayer import AsyncPromptLayer

async def main():
    async_client = AsyncPromptLayer(api_key="pl_*****")

    # Use the async run method
    response = await async_client.run(
        prompt_name="my-prompt",
        input_variables={"topic": "testing"}
    )
    print(response)

asyncio.run(main())
```

See the [Async Support section](/sdks/python#async-support) for more details.

## Is PromptLayer SOC 2 certified?

Yes, we have achieved SOC 2 Type 2 certification. Please [contact us](mailto:hello@promptlayer.com) for the report.

## Why doesn't my evaluation report use the newest version of my prompt?

To ensure your evaluation report reflects the newest version of your prompt template, you must configure your evaluation pipeline to use the "latest" version of the prompt template in its column step. The template is fetched at runtime, and specifying a frozen version will result in the evaluation report not reflecting your newest prompt template.

<img alt="Latest FAQ Updates" />

## What model providers do you support on your evaluations page?

While you can log LLM requests from any model and our Prompt Registry is agnostic, our evaluations & playground requests support OpenAI's GPT, Anthropic's Claude, Google's Gemini, Bedrock, Mistral, and Cohere.

## Do you support open source models?

PromptLayer provides out-of-the-box support for Mistral in our logs, playground, Prompt Registry, and evals. You can also connect your own models to the logs & registry.

## What's the difference between tags and metadata?

Both [tags](/features/prompt-history/tagging-requests) and [metadata](/features/prompt-history/metadata) let you attach extra information to your request logs, but they serve different purposes. Tags are ideal for classifying requests into a limited number of predefined categories, such as "prod" or "dev". Metadata is for capturing unique, request-specific details like user IDs or session IDs.

## Why do I see extra input variables in my prompt template? Parsing does not seem to be working.

If you see extra input variables in the Prompt Registry or when creating an evaluation, it is likely due to string parsing errors. By default, every prompt template uses "f-string" string parsing (`{var}`). If your prompt includes JSON, its curly braces can be picked up as input variables. We recommend switching to "jinja2" string parsing (`{{var}}`) to avoid such issues.
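The effect can be illustrated with a small sketch. The two regular expressions below are simplified stand-ins for single-brace and double-brace variable detection; they are assumptions made for illustration and are not PromptLayer's actual parsers.

```python
import re

# Simplified stand-ins for the two parsing modes (not PromptLayer's real code).
F_STRING_VAR = re.compile(r"\{([^{}]+)\}")       # matches {var}
JINJA2_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # matches {{ var }}

f_string_template = 'Answer as JSON like {"answer": "..."}. Question: {question}'
jinja2_template = 'Answer as JSON like {"answer": "..."}. Question: {{ question }}'

# Single-brace detection also captures the JSON object as a "variable".
f_string_vars = F_STRING_VAR.findall(f_string_template)

# Double-brace detection only captures the intended variable.
jinja2_vars = JINJA2_VAR.findall(jinja2_template)

print(f_string_vars)
print(jinja2_vars)
```

With single-brace detection, the JSON snippet shows up as an extra variable next to `question`. With double-brace detection, only `question` is found.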

To switch input variable string parsers, navigate to the prompt template in the Prompt Registry. Then, click "Edit". In the editor, on the top right, you will see a dropdown that allows you to switch between "f-string" and "jinja2". For more details on using template variables effectively, see our [Template Variables](/features/prompt-registry/template-variables) documentation.

## How do I inject multiple messages into my prompt template?

You can use [placeholders](/features/prompt-registry/placeholder-messages), built just for that!

## Does PromptLayer support self-hosted models or custom base URLs?

Yes, PromptLayer supports self-hosted models, models from providers like HuggingFace, and Azure OpenAI. To use a custom base URL:

1. Go to your workspace settings
2. Scroll to "Provider Base URLs"
3. Add the base URL for your model provider

<img alt="Base URL Configuration" />

## Can I cancel my PromptLayer subscription?

Yes, you can cancel your subscription at any time. Your subscription remains active until the end of the billing cycle. To cancel, go to your settings and open the billing portal.

## Does PromptLayer support Grok from xAI?

Yes, PromptLayer supports Grok models through custom providers. For detailed setup instructions and usage guidelines, see our [xAI (Grok) integration guide](/features/xai-integration).

## Does PromptLayer support Deepseek models?

Yes, PromptLayer supports Deepseek models through custom base URLs. Configure it in workspace settings under "Provider Base URLs" using OpenAI as the provider and `https://api.deepseek.com` as the base URL. You can then use models like `deepseek-chat` and `deepseek-reasoner` in the Playground and Prompt Registry.

## Does PromptLayer support MCP?

Yes, PromptLayer supports MCP functions in agents. The MCP Action node allows you to invoke remote functions hosted on your MCP server. To use it, configure the node with your MCP server's base URL and authentication token. Then, select the target function from the available list—PromptLayer will automatically introspect the function schema. Input parameters can be dynamically mapped from outputs of previous nodes, enabling full integration with the rest of your agent workflow. Function calls are executed at runtime, and outputs are passed downstream like any other node result.

<iframe />

## How do I use OpenAI's built-in tools (Web Search, File Search, Image Generation)?

PromptLayer supports OpenAI's built-in tools from the Responses API, including Web Search, File Search with Vector Stores, and Image Generation. These tools enable your prompts to access real-time web information, search through uploaded documents, and generate images—all without writing custom function definitions.

Learn how to set up and use these tools in our [Tool Calling documentation](/features/prompt-registry/tool-calling).

## Does PromptLayer support image generation?

Yes. PromptLayer supports image generation across multiple providers:

* **OpenAI Images API** — Use dedicated image models like `gpt-image-1`, `dall-e-3`, and `dall-e-2` to generate images from text prompts. Select "Images API" in the API dropdown when using OpenAI or OpenAI Azure.
* **OpenAI Responses API (Image Generation tool)** — Enable the built-in `image_generation` tool in the Responses API for conversational image generation. The model decides when to generate images based on the conversation.
* **Google Gemini** — Gemini image models (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`) generate images natively as part of their response. Works with both Google and Vertex AI providers.

Image generation is supported in the Playground, Prompt Registry, Evaluations, and via the SDK with `pl.run()`. Generated images are automatically stored, tracked, and viewable in the dashboard.

See the full [Image Generation guide](/features/image-generation) for setup instructions, parameters, and code examples.


# Image Generation
Source: https://docs.promptlayer.com/features/image-generation


PromptLayer supports image generation across multiple providers and APIs. You can generate images from the Playground, Prompt Registry, Evaluations, and via the SDK — with full logging, versioning, and cost tracking.

## Supported Providers & APIs

Image generation is available through three distinct paths:

| Provider            | API / Method                            | Models                                                                                                                                                                      | Template Type |
| ------------------- | --------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| **OpenAI**          | Images API                              | `dall-e-2`, `dall-e-3`, `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5`, `gpt-image-2`                                                                                   | Completion    |
| **OpenAI**          | Responses API (`image_generation` tool) | `gpt-5`, `gpt-5.4`, `gpt-5.2`, `gpt-5-nano`, `gpt-4.1`, and [other supported models](https://developers.openai.com/api/docs/guides/tools-image-generation#supported-models) | Chat          |
| **OpenAI Azure**    | Images API                              | Same as OpenAI (depends on Azure deployment)                                                                                                                                | Completion    |
| **OpenAI Azure**    | Responses API (`image_generation` tool) | Same as OpenAI Responses API                                                                                                                                                | Chat          |
| **Google (Gemini)** | Native image output                     | `gemini-2.5-flash-image`, `gemini-3-pro-image-preview`, and other image-capable Gemini models                                                                               | Chat          |
| **Vertex AI**       | Native image output                     | Same Gemini image models via Vertex AI                                                                                                                                      | Chat          |

## OpenAI Images API

The OpenAI Images API is a dedicated endpoint for image generation. In PromptLayer, it uses a **completion template** — you write a text prompt and the model returns one or more generated images.

### Setting Up in the Playground

1. Open the **Playground** or **Prompt Registry**
2. In the model settings panel, select **OpenAI** (or **OpenAI Azure**) as the provider
3. Change the **API** dropdown to **Images API**
4. Select an image model (e.g. `gpt-image-1`, `dall-e-3`)
5. Write your image prompt in the text area
6. Click **Run**

<img alt="API selector showing Images API option" />

### Parameters

The following parameters are available for the Images API:

| Parameter            | Description                         | Example Values                              |
| -------------------- | ----------------------------------- | ------------------------------------------- |
| `quality`            | Image quality level                 | `"standard"`, `"high"`, `"hd"`              |
| `size`               | Output image dimensions             | `"1024x1024"`, `"1024x1792"`, `"1792x1024"` |
| `background`         | Background style (GPT Image models) | `"auto"`, `"transparent"`, `"opaque"`       |
| `output_format`      | Output image format                 | `"png"`, `"webp"`, `"jpeg"`                 |
| `output_compression` | Compression level (0-100)           | `50`                                        |
| `n`                  | Number of images to generate        | `1`, `2`, `4`                               |
| `style`              | Style preset (DALL-E 3)             | `"vivid"`, `"natural"`                      |
| `moderation`         | Content moderation level            | `"auto"`, `"low"`                           |

### Using with `pl.run()`

When you save a prompt template with the Images API configuration, you can run it via the SDK:

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  pl = PromptLayer(api_key="pl_*****")

  response = pl.run(
      prompt_name="my-image-prompt",
      input_variables={"subject": "a cat wearing a top hat"}
  )

  # The generated image is in the prompt blueprint
  content = response["prompt_blueprint"]["prompt_template"]["content"]
  for item in content:
      if item["type"] == "output_media":
          print(f"Image URL: {item['url']}")
          print(f"MIME type: {item['mime_type']}")
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const pl = new PromptLayer({ apiKey: "pl_*****" });

  const response = await pl.run({
    promptName: "my-image-prompt",
    inputVariables: { subject: "a cat wearing a top hat" }
  });

  const content = response.prompt_blueprint.prompt_template.content;
  for (const item of content) {
    if (item.type === "output_media") {
      console.log("Image URL:", item.url);
      console.log("MIME type:", item.mime_type);
    }
  }
  ```
</CodeGroup>

### Pricing

GPT Image models (`gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5`, `gpt-image-2`) use **token-based pricing** — cost is calculated from input and output tokens, just like text models. PromptLayer automatically tracks these tokens and calculates cost.

DALL-E models (`dall-e-2`, `dall-e-3`) use **per-image pricing** based on the image size and model. PromptLayer tracks this automatically.

<Note>
  The Images API uses a completion template (single text prompt in, image(s) out).
</Note>

## OpenAI Responses API — Image Generation Tool

The OpenAI Responses API includes a built-in **Image Generation** tool that the model can invoke during a conversation. Unlike the Images API, this works within a chat context — the model decides when to generate images based on the conversation.

### Setting Up in the Prompt Registry

1. Open your prompt in the **Prompt Registry**
2. Set the provider to **OpenAI** and the **API** to **Responses API**
3. Click the **Functions & Output** button
4. Click **Built-in tools** and add **Image Generation**
5. Save and run your prompt

<img alt="Built-in Tools Panel showing Image Generation tool" />

### How It Works

When the `image_generation` tool is enabled:

1. The model receives your messages and decides if an image should be generated
2. The model invokes the `image_generation` tool with an optimized prompt
3. The generated image appears in the response as an `output_media` content block
4. The model may also return text alongside the image (e.g., describing what was generated)

The response includes metadata about the generation:

* **Revised prompt** — The optimized prompt the model used for image generation (collapsible in the UI)
* **Parameters** — Size, quality, background, and output format used
* **Image ID** — Unique identifier for the `image_generation_call`

### Multiple Images

The model can generate multiple images in a single response. When this happens, consecutive `image_generation_call` items are grouped into a single assistant message in PromptLayer's display. Each image shows its own revised prompt and parameters.
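The grouping behavior can be pictured with a short sketch. This is an illustration only, not PromptLayer's implementation: the item IDs and the shape of the grouped message are hypothetical, and only the `image_generation_call` item type comes from the text above.

```python
from itertools import groupby

# Hypothetical, simplified output items from a single response.
output_items = [
    {"type": "message", "text": "Here are two options."},
    {"type": "image_generation_call", "id": "ig_1"},
    {"type": "image_generation_call", "id": "ig_2"},
    {"type": "message", "text": "Which do you prefer?"},
]

grouped = []
# groupby only merges *consecutive* items that share the same type.
for item_type, run in groupby(output_items, key=lambda item: item["type"]):
    items = list(run)
    if item_type == "image_generation_call":
        # Collapse the run of image calls into one assistant message.
        grouped.append({"role": "assistant", "images": [i["id"] for i in items]})
    else:
        grouped.extend(items)

for entry in grouped:
    print(entry)
```

Image calls separated by other items would land in separate groups, which matches the "consecutive" wording above.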

### Using with `pl.run()`

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  pl = PromptLayer(api_key="pl_*****")

  response = pl.run(
      prompt_name="image-chat-prompt",
      input_variables={"request": "Draw a sunset over mountains"}
  )

  messages = response["prompt_blueprint"]["prompt_template"]["messages"]
  assistant_msg = messages[-1]

  for content in assistant_msg["content"]:
      if content["type"] == "output_media":
          print(f"Generated image: {content['url'][:50]}...")
          if content.get("provider_metadata", {}).get("revised_prompt"):
              print(f"Revised prompt: {content['provider_metadata']['revised_prompt']}")
      elif content["type"] == "text":
          print(f"Text: {content['text']}")
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const pl = new PromptLayer({ apiKey: "pl_*****" });

  const response = await pl.run({
    promptName: "image-chat-prompt",
    inputVariables: { request: "Draw a sunset over mountains" }
  });

  const messages = response.prompt_blueprint.prompt_template.messages;
  const assistantMsg = messages[messages.length - 1];

  for (const content of assistantMsg.content) {
    if (content.type === "output_media") {
      console.log("Generated image:", content.url.substring(0, 50) + "...");
      if (content.provider_metadata?.revised_prompt) {
        console.log("Revised prompt:", content.provider_metadata.revised_prompt);
      }
    } else if (content.type === "text") {
      console.log("Text:", content.text);
    }
  }
  ```
</CodeGroup>

## Google Gemini — Native Image Generation

Gemini image models (e.g., `gemini-2.5-flash-image`, `gemini-3-pro-image-preview`) can generate images natively as part of their response. This works with both the **Google** provider and **Vertex AI** provider.

### Setting Up in the Playground

1. Open the **Playground** or **Prompt Registry**
2. Select **Google** (or **Vertex AI**) as the provider
3. Choose a Gemini image model (e.g., `gemini-2.5-flash-image`)
4. PromptLayer automatically configures `response_modalities: ["TEXT", "IMAGE"]` when an image model is selected
5. Write your prompt and click **Run**

### Image Configuration

You can configure image generation parameters in the model settings:

| Parameter      | Description               | Example Values                                |
| -------------- | ------------------------- | --------------------------------------------- |
| `aspect_ratio` | Output image aspect ratio | `"1:1"`, `"16:9"`, `"9:16"`, `"4:3"`, `"3:4"` |
| `image_size`   | Image resolution          | Model-dependent                               |

These parameters are set via the `imageConfig` in the model parameters panel.

### How It Works

When a Gemini image model generates an image:

1. The image data is returned as `inline_data` in the Gemini response
2. PromptLayer converts this into an `output_media` content block for consistent display
3. The image is displayed in the same card format as OpenAI-generated images
4. Image metadata (aspect ratio, image size) is shown in the parameters block
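The conversion in step 2 can be sketched roughly as below. This is not PromptLayer's code: `to_output_media` is a hypothetical helper, and the sketch assumes a Gemini response part shaped like `{"inline_data": {"mime_type": ..., "data": ...}}` with base64 image data.

```python
from typing import Optional


def to_output_media(part: dict, image_config: Optional[dict] = None) -> Optional[dict]:
    """Convert a Gemini-style part with inline_data into an output_media block."""
    inline = part.get("inline_data")
    if not inline:
        return None  # text parts and other part types are left alone
    mime_type = inline.get("mime_type", "image/png")
    return {
        "type": "output_media",
        "url": inline["data"],                   # base64 payload
        "mime_type": mime_type,
        "media_type": mime_type.split("/")[0],   # "image" for image/png
        "provider_metadata": dict(image_config or {}),
    }


# Hand-written sample part; the base64 string is a truncated placeholder.
sample_part = {"inline_data": {"mime_type": "image/png", "data": "iVBORw0KGgo="}}
block = to_output_media(sample_part, {"aspect_ratio": "16:9"})
print(block["type"], block["mime_type"], block["provider_metadata"])
```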

### Using with `pl.run()`

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  pl = PromptLayer(api_key="pl_*****")

  response = pl.run(
      prompt_name="gemini-image-prompt",
      input_variables={"description": "A watercolor painting of a garden"}
  )

  messages = response["prompt_blueprint"]["prompt_template"]["messages"]
  assistant_msg = messages[-1]

  for content in assistant_msg["content"]:
      if content["type"] == "output_media":
          print(f"Generated image (MIME: {content['mime_type']})")
          if content.get("provider_metadata", {}).get("aspect_ratio"):
              print(f"Aspect ratio: {content['provider_metadata']['aspect_ratio']}")
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const pl = new PromptLayer({ apiKey: "pl_*****" });

  const response = await pl.run({
    promptName: "gemini-image-prompt",
    inputVariables: { description: "A watercolor painting of a garden" }
  });

  const messages = response.prompt_blueprint.prompt_template.messages;
  const assistantMsg = messages[messages.length - 1];

  for (const content of assistantMsg.content) {
    if (content.type === "output_media") {
      console.log(`Generated image (MIME: ${content.mime_type})`);
      if (content.provider_metadata?.aspect_ratio) {
        console.log(`Aspect ratio: ${content.provider_metadata.aspect_ratio}`);
      }
    }
  }
  ```
</CodeGroup>

## Output Media Format

All image generation results in PromptLayer use a unified `output_media` content type, regardless of the provider:

```json theme={null}
{
  "type": "output_media",
  "id": "img_abc123",
  "url": "<base64>",
  "mime_type": "image/png",
  "media_type": "image",
  "provider_metadata": {
    "revised_prompt": "A detailed, photorealistic sunset...",
    "size": "1024x1024",
    "quality": "high",
    "background": "auto",
    "output_format": "png",
    "aspect_ratio": "16:9",
    "image_size": "1024"
  }
}
```

| Field               | Description                                                      |
| ------------------- | ---------------------------------------------------------------- |
| `type`              | Always `"output_media"`                                          |
| `id`                | Unique ID for the generation call (Responses API only)           |
| `url`               | Image URL or base64 data                                         |
| `mime_type`         | MIME type of the image (`image/png`, `image/webp`, `image/jpeg`) |
| `media_type`        | Media category — currently `"image"`                             |
| `provider_metadata` | Provider-specific metadata (varies by provider, see below)       |

### Provider Metadata by Provider

**OpenAI (Images API & Responses API):**

* `revised_prompt` — The model's optimized version of your prompt
* `size` — Image dimensions
* `quality` — Quality level
* `background` — Background setting (GPT Image models)
* `output_format` — Image format

**Google Gemini:**

* `aspect_ratio` — Configured aspect ratio
* `image_size` — Configured image size
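Because the block shape is the same for every provider, one helper can collect images from both completion templates (top-level `content`) and chat templates (`messages`). The sketch below is a minimal example on a hand-written blueprint; `iter_output_media` is a hypothetical helper name, not part of the SDK.

```python
from typing import Iterator


def iter_output_media(prompt_blueprint: dict) -> Iterator[dict]:
    """Yield every output_media block found in a prompt blueprint."""
    template = prompt_blueprint.get("prompt_template", {})
    # Completion templates keep content blocks at the top level.
    blocks = list(template.get("content") or [])
    # Chat templates keep content blocks inside each message.
    for message in template.get("messages") or []:
        content = message.get("content")
        if isinstance(content, list):
            blocks.extend(content)
    for block in blocks:
        if isinstance(block, dict) and block.get("type") == "output_media":
            yield block


# Hand-written sample; in real use, pass response["prompt_blueprint"].
sample_blueprint = {
    "prompt_template": {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Draw a sunset"}]},
            {"role": "assistant", "content": [
                {"type": "text", "text": "Here is your image."},
                {
                    "type": "output_media",
                    "url": "https://example.com/sunset.png",
                    "mime_type": "image/png",
                    "media_type": "image",
                    "provider_metadata": {"size": "1024x1024"},
                },
            ]},
        ]
    }
}

images = list(iter_output_media(sample_blueprint))
for image in images:
    print(image["mime_type"], image["provider_metadata"].get("size"))
```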

## Viewing Generated Images

Generated images are displayed in PromptLayer with a rich card format:

* **Header** — Shows "Image Generation" label (with the tool call ID for Responses API)
* **Parameters block** — Displays generation parameters (size, quality, format, etc.)
* **Revised prompt** — Collapsible accordion showing the optimized prompt (when available)
* **Image preview** — The generated image with download/copy support

<img alt="OutputMediaBlock showing generated image with parameters and revised prompt" />

## Image Storage

PromptLayer automatically handles image storage for generated images:

* **Base64 images** are uploaded to cloud storage and replaced with a URL reference
* **URL images** are stored as-is
* This keeps request logs compact and ensures images remain accessible
* Images are available in the dashboard, evaluations, and API responses
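On the client side, the `url` field may therefore hold either a URL or base64 data. A minimal sketch for handling both is shown below. `resolve_image` is a hypothetical helper, and the exact base64 form is an assumption: the sketch accepts both a raw base64 string and a `data:` URI.

```python
import base64
from urllib.parse import urlparse


def resolve_image(url_field: str) -> tuple:
    """Return ("url", str) for http(s) links, or ("bytes", bytes) for base64."""
    if urlparse(url_field).scheme in ("http", "https"):
        return ("url", url_field)
    # Strip an optional data-URI prefix such as "data:image/png;base64,".
    payload = url_field.split(",", 1)[1] if url_field.startswith("data:") else url_field
    return ("bytes", base64.b64decode(payload))


# "aGVsbG8=" is the base64 encoding of b"hello", used here as a tiny placeholder.
print(resolve_image("https://example.com/image.png"))
print(resolve_image("aGVsbG8="))
print(resolve_image("data:image/png;base64,aGVsbG8="))
```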

## Logging Image Generation Requests

If you're making image generation calls with your own client, you can log them to PromptLayer using [`log_request`](/features/prompt-history/custom-logging). PromptLayer recognizes the following function names for image generation:

* `openai.images.generate`
* `openai.OpenAI.images.generate`
* `openai.AzureOpenAI.images.generate`
* `openai.responses.create` (when using the `image_generation` tool)

## Evaluations

Image generation outputs work with PromptLayer's evaluation system. Generated images flow through evaluation pipelines as `output_media` content — you can use **LLM Assertion** columns to evaluate image quality using vision-capable models, or chain with other column types for custom analysis.

## Related Docs

* [Supported Providers](/features/supported-providers)
* [Tool Calling (Built-in Tools)](/features/prompt-registry/tool-calling)
* [Python SDK Run Method](/sdks/python#using-the-run-method-recommended)
* [JavaScript SDK Run Method](/sdks/javascript#using-the-run-method-recommended)
* [Custom Logging](/features/prompt-history/custom-logging)


# Telemetry Integrations
Source: https://docs.promptlayer.com/features/integrations

Send traces, spans, LLM calls, tool calls, and agent telemetry into PromptLayer.

Telemetry integrations send observability data from LLM frameworks, agent SDKs, and model routers into PromptLayer. Use these setup paths when you want PromptLayer to capture traces, spans, LLM calls, tool calls, prompts, completions, token usage, and model metadata from tools you already use.

Don't see your framework listed? You can send traces from **any** OpenTelemetry-compatible SDK or Collector using the [OpenTelemetry](/features/opentelemetry) page, or [email us](mailto:hello@promptlayer.com).

## LiteLLM

[LiteLLM](https://github.com/BerriAI/litellm) lets you call any LLM API using the OpenAI format. This makes it easy to swap models in and out and see which one works best for your prompts. It works with providers such as Anthropic, HuggingFace, Cohere, PaLM, Replicate, and Azure.

For setup instructions, see the [LiteLLM documentation page](https://docs.litellm.ai/docs/observability/promptlayer_integration).

## LlamaIndex

[LlamaIndex](https://www.llamaindex.ai/) is a data framework for LLM-based applications. Read more about our integration on the [LlamaIndex documentation page](https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html#promptlayer).

## Claude Code

PromptLayer supports [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) in two setup modes:

* **CLI:** install the PromptLayer Claude plugin directly into Claude Code.
* **SDK:** use the PromptLayer JavaScript or Python helper to inject the same tracing plugin and environment variables into your Claude SDK options.

The underlying tracing is the same in both cases. If you're using the SDK, you do not need to manually install the plugin or discover the plugin path yourself.

### CLI: Direct Plugin Install

Use this path if you're running Claude Code from the terminal and want PromptLayer enabled globally.

1. Install the plugin

```bash theme={null}
claude plugin marketplace add MagnivOrg/promptlayer-claude-plugins
claude plugin install trace@promptlayer-claude-plugins
```

2. Run the setup script

```bash theme={null}
$HOME/.claude/plugins/marketplaces/promptlayer-claude-plugins/plugins/trace/setup.sh
```

3. Enter your PromptLayer API key and keep the default endpoint: `https://api.promptlayer.com/v1/traces`
4. Start Claude Code and run a prompt

### SDK: JavaScript Or Python

Use this path if you're embedding Claude Code through Anthropic's SDK and want PromptLayer configured in code.

<Note>
  The PromptLayer Claude SDK helpers currently support macOS and Linux. Windows is not supported.
</Note>

1. Install the required packages

<CodeGroup>
  ```bash JavaScript theme={null}
  npm install promptlayer @anthropic-ai/claude-agent-sdk
  ```

  ```bash Python theme={null}
  pip install "promptlayer[claude-agents]"
  ```
</CodeGroup>

2. Generate PromptLayer Claude config and pass it into your Claude SDK options

<CodeGroup>
  ```ts JavaScript theme={null}
  import type { Options } from "@anthropic-ai/claude-agent-sdk";
  import { getClaudeConfig } from "promptlayer/claude-agents";

  const plClaudeConfig = getClaudeConfig();

  const options: Options = {
    model: "sonnet",
    cwd: process.cwd(),
    plugins: [plClaudeConfig.plugin],
    env: {
      ...process.env,
      ...plClaudeConfig.env,
    },
  };
  ```

  ```python Python theme={null}
  from claude_agent_sdk import ClaudeAgentOptions
  from promptlayer.integrations.claude_agents import get_claude_config

  pl_claude_config = get_claude_config()

  options = ClaudeAgentOptions(
      model="sonnet",
      cwd=".",
      plugins=[pl_claude_config.plugin],
      env={**pl_claude_config.env},
  )
  ```
</CodeGroup>

`getClaudeConfig()` and `get_claude_config()` read `PROMPTLAYER_API_KEY` by default and return:

* a local plugin reference for Claude SDK `plugins`
* PromptLayer environment variables for Claude SDK `env`

3. Start your Claude SDK client or agent with those options

Once configured, PromptLayer will capture Claude Code sessions, LLM calls, tool calls, prompts, completions, token usage, and model metadata.

For troubleshooting and additional details on the direct plugin install path, see the [PromptLayer Claude Code plugin repository](https://github.com/MagnivOrg/promptlayer-claude-plugins).

## OpenAI Agents SDK

PromptLayer supports the OpenAI Agents SDK in both [JavaScript](https://openai.github.io/openai-agents-js/) and [Python](https://openai.github.io/openai-agents-python/), allowing you to export agent traces directly to PromptLayer with a native PromptLayer trace processor.

To set up:

1. Install the required packages

<CodeGroup>
  ```bash JavaScript theme={null}
  npm install promptlayer @openai/agents
  ```

  ```bash Python theme={null}
  pip install "promptlayer[openai-agents]"
  ```
</CodeGroup>

2. Register PromptLayer tracing before your first agent run:

<CodeGroup>
  ```ts JavaScript theme={null}
  import { instrumentOpenAIAgents } from "promptlayer/openai-agents";

  const processor = await instrumentOpenAIAgents();
  ```

  ```python Python theme={null}
  from promptlayer.integrations.openai_agents import instrument_openai_agents

  instrument_openai_agents()
  ```
</CodeGroup>

3. Flush tracing before process exit so PromptLayer receives the final spans:

<CodeGroup>
  ```ts JavaScript theme={null}
  await processor.forceFlush();
  await processor.shutdown();
  ```

  ```python Python theme={null}
  # Optional - if you created your own tracer provider
  tracer_provider.force_flush()
  tracer_provider.shutdown()
  ```
</CodeGroup>

4. Set your environment variables:

* `OPENAI_API_KEY`
* `PROMPTLAYER_API_KEY`

Once configured, PromptLayer will capture OpenAI Agents traces with spans automatically labeled by type: agent spans appear as **LLM session**, response and generation spans appear as **LLM call**, and function/tool spans appear as **Tool: \<function name>**. Token usage, model metadata, prompts, and completions are captured from LLM call spans.
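The labeling rule can be summarized as a small mapping. The sketch below is illustrative only: `promptlayer_label` is a hypothetical function, and the span type strings (`agent`, `response`, `generation`, `function`) are assumptions about how the Agents SDK names its spans.

```python
def promptlayer_label(span_type: str, name: str = "") -> str:
    """Map an agent span type to the label described above."""
    if span_type == "agent":
        return "LLM session"
    if span_type in ("response", "generation"):
        return "LLM call"
    if span_type == "function":
        return f"Tool: {name}"
    return span_type  # anything else keeps its own type as the label


for span_type, name in [("agent", ""), ("generation", ""), ("function", "get_weather")]:
    print(promptlayer_label(span_type, name))
```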

## OpenClaw

PromptLayer supports OpenClaw through the `@promptlayer/openclaw-promptlayer` plugin. To enable tracing, you need to explicitly install and enable the `openclaw-promptlayer` plugin in OpenClaw.

```bash theme={null}
openclaw plugins install @promptlayer/openclaw-promptlayer
openclaw plugins enable openclaw-promptlayer
```

Set `PROMPTLAYER_API_KEY` in the environment that OpenClaw runs with, then include the plugin in `openclaw.json` using the `openclaw-promptlayer` plugin id.

Once configured, PromptLayer will capture OpenClaw agent runs, LLM calls, tool calls, prompts, completions, token usage when available, and model metadata.

## Vercel AI SDK

PromptLayer supports integration with the [Vercel AI SDK](https://ai-sdk.dev/docs), allowing you to export OpenTelemetry traces from your application directly to PromptLayer.

To set up:

1. Install OpenTelemetry packages

```bash theme={null}
npm install @opentelemetry/sdk-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources
```

2. Configure OpenTelemetry with PromptLayer as the exporter

```ts theme={null}
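import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { resourceFromAttributes } from "@opentelemetry/resources";

const sdk = new NodeSDK({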
  serviceName: "your-app-name",
  resource: resourceFromAttributes({
    "promptlayer.telemetry.source": "vercel-ai-sdk",
  }),
  traceExporter: new OTLPTraceExporter({
    url: "https://api.promptlayer.com/v1/traces",
    headers: {
      "X-API-Key": process.env.PROMPTLAYER_API_KEY,
    },
  }),
});
```

3. Start the SDK before AI calls and shut it down before exit
4. Add `experimental_telemetry` to your AI SDK calls

```ts theme={null}
experimental_telemetry: {
  isEnabled: true,
  recordInputs: true,
  recordOutputs: true,
}
```

For best results, set `promptlayer.telemetry.source` to `vercel-ai-sdk` so PromptLayer can parse the traces correctly.

Once configured, PromptLayer will capture LLM calls, inputs and outputs, token usage, tool traces, workflow spans, model metadata, and reasoning content from models that support extended thinking.

## Pydantic AI

PromptLayer supports ingesting [Pydantic AI](https://ai.pydantic.dev/) traces through OpenTelemetry. Pydantic AI's instrumentation emits GenAI semantic convention attributes, so PromptLayer can convert model calls, tool calls, agent spans, and embeddings into traces and request logs.

To set up:

1. Install Pydantic AI, Logfire, and the OTLP HTTP exporter

```bash theme={null}
pip install "pydantic-ai-slim[logfire,openai,web]" \
  logfire \
  opentelemetry-exporter-otlp-proto-http
```

2. Configure OTLP export to PromptLayer before creating your agent

```python theme={null}
import os

import logfire

os.environ.setdefault("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "https://api.promptlayer.com/v1/traces")
os.environ.setdefault("OTEL_EXPORTER_OTLP_HEADERS", f"X-API-KEY={os.environ['PROMPTLAYER_API_KEY']}")
os.environ.setdefault("OTEL_SERVICE_NAME", "pydantic-ai-app")

logfire.configure(send_to_logfire=False)
logfire.instrument_pydantic_ai()
```

3. Run your Pydantic AI agent normally

```python theme={null}
from pydantic_ai import Agent

agent = Agent("openai:gpt-5.2")
result = agent.run_sync("Write one sentence about OpenTelemetry.")
print(result.output)
```

If you also want to send traces to Logfire, set `send_to_logfire=True` and authenticate with Logfire. For provider-level request debugging, you can add `logfire.instrument_httpx(capture_all=True)`, but only enable it when you intentionally want to capture raw HTTP request and response bodies.

Once configured, PromptLayer will capture Pydantic AI agent runs, LLM calls, tool calls, embeddings, prompts, completions, token usage, model metadata, and thinking content from extended thinking models.

## LangChain / LangSmith

PromptLayer supports ingesting [LangChain](https://js.langchain.com/) traces through [LangSmith's OpenTelemetry bridge](https://docs.langchain.com/langsmith/trace-with-opentelemetry). For new JavaScript applications where you can choose the framework, use the [Vercel AI SDK](#vercel-ai-sdk) integration because its OpenTelemetry spans map more directly into PromptLayer. This is also aligned with LangChain / LangSmith's JavaScript observability path: their docs provide a first-party [Vercel AI SDK tracing guide](https://docs.langchain.com/langsmith/trace-with-vercel-ai-sdk). Use this LangChain / LangSmith path when you already have a LangChain application.

LangSmith's OTEL bridge creates LangChain spans automatically. Add manual spans for application-specific work, custom tool data, or extra inputs and outputs that should appear in PromptLayer traces.

To set up:

1. Install LangChain, LangSmith, and OpenTelemetry packages

```bash theme={null}
npm install @langchain/core @langchain/openai langsmith \
  @opentelemetry/api \
  @opentelemetry/context-async-hooks \
  @opentelemetry/exporter-trace-otlp-proto \
  @opentelemetry/sdk-trace-base
```

2. Enable LangSmith's OTEL tracing mode and configure PromptLayer as the OTLP destination

```bash theme={null}
LANGSMITH_TRACING=true
LANGSMITH_TRACING_MODE=otel
LANGCHAIN_CALLBACKS_BACKGROUND=false
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.promptlayer.com/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=X-API-KEY=<PROMPTLAYER_API_KEY>
```

3. Register an OpenTelemetry provider for the Node.js runtime

```ts theme={null}
import { context, trace } from "@opentelemetry/api";
import { AsyncLocalStorageContextManager } from "@opentelemetry/context-async-hooks";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { BasicTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { initializeOTEL } from "langsmith/experimental/otel/setup";

const headers = {
  "X-API-KEY": process.env.OTEL_EXPORTER_OTLP_HEADERS?.replace("X-API-KEY=", "") ?? "",
};

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  headers,
});
const provider = new BasicTracerProvider({
  spanProcessors: [new BatchSpanProcessor(exporter)],
});
const contextManager = new AsyncLocalStorageContextManager();

contextManager.enable();
context.setGlobalContextManager(contextManager);
trace.setGlobalTracerProvider(provider);

initializeOTEL({
  globalTracerProvider: provider,
  globalContextManager: contextManager,
  skipGlobalContextManagerSetup: true,
});
```

4. Optionally add manual spans for application-specific data

```ts theme={null}
import { trace } from "@opentelemetry/api";

await trace.getTracer("langchain-app").startActiveSpan("app.workflow_step", async (span) => {
  try {
    span.setAttribute("app.operation", "retrieve_context");
    span.setAttribute("app.input", JSON.stringify(input));

    const result = await runApplicationStep(input);

    span.setAttribute("app.result_count", result.items.length);
    return result;
  } finally {
    span.end();
  }
});
```

Once configured, PromptLayer will capture LangChain and LangSmith spans. PromptLayer renders request logs from LangSmith and LangChain LLM attributes, including model metadata, messages, outputs, token counts, and thinking blocks from extended thinking models.

## OpenRouter

PromptLayer supports ingesting traces from [OpenRouter](https://openrouter.ai) through OpenRouter's Broadcast integration for the [OpenTelemetry Collector](https://openrouter.ai/docs/guides/features/broadcast/otel-collector).

To set up:

1. Get your PromptLayer API key from your PromptLayer workspace
2. In OpenRouter, go to **Settings -> Observability**
3. Toggle **Enable Broadcast**
4. Click the edit icon next to **OpenTelemetry Collector**
5. Leave the default name or rename the destination if you want
6. Configure the destination with PromptLayer's OTLP endpoint:

```text theme={null}
Endpoint: https://api.promptlayer.com/v1/traces
```

7. Add your PromptLayer API key in the headers JSON:

```json theme={null}
{
  "X-API-Key": "pl_..."
}
```

8. Click **Test Connection**
9. Click **Send Trace** if you want to verify the integration end-to-end from OpenRouter's UI
10. Save the destination once the test passes
11. Send requests through OpenRouter as usual

Once configured, PromptLayer will ingest the OpenRouter OTLP spans and convert GenAI spans into trace views and request logs.

<Tip>
  This integration is for ingesting traces from OpenRouter into PromptLayer. If you also want to use OpenRouter models inside PromptLayer as a provider, see [OpenRouter](/features/openrouter-integration).
</Tip>


# Overview
Source: https://docs.promptlayer.com/features/observability

Use Observability to analyze app behavior, review PromptLayer usage, and turn useful history into datasets.

Observability helps you understand how your AI applications behave in production, testing, and development. Use it to optimize behavior, inspect workflows, and track cost and latency.

It also shows how your team uses PromptLayer: which users, workspaces, and applications are active over time.

**Request logs** and **traces** are the core artifacts. Request logs capture model calls, inputs, outputs, timing, tokens, cost, status, tags, metadata, scores, and prompt associations. Traces show span-level context for workflows, agents, tools, and multi-step logic. You can use both to create datasets for evaluations, backtests, and automation.

## What you can do

* Analyze application behavior with request logs and traces (spans, inputs, outputs, latency, cost, token usage, and more).
* Turn requests and traces into datasets for evaluations and regression tests.
* Review usage across workspaces, users, and environments.

## How it fits together

1. Log requests and traces with the PromptLayer SDK, REST API, custom logging, or OpenTelemetry.
2. Add metadata, tags, scores, and prompt associations so you can find the right runs later.
3. Use logs, traces, and analytics to understand application behavior across prompts, users, sessions, models, workflows, and environments.
4. Use the same data to understand how your team uses PromptLayer.
5. Convert useful history into datasets for evaluations and automated feedback loops.

## Viewing logs

Click **Logs** in the sidebar to see request history. You can filter by prompt, search by content, inspect errors, and review request details.

<Frame>
  <img alt="Request logs" />
</Frame>

You can also view logs for a specific prompt by clicking **Analytics & Logs** in the prompt editor.

<Frame>
  <img alt="Logs filtered by prompt" />
</Frame>

From the logs table, select historical requests and add them to a dataset when you want to backtest a prompt change.

## Next steps

<CardGroup>
  <Card title="Analytics" icon="chart-pie-simple" href="/why-promptlayer/analytics">
    Track cost, latency, request volume, token usage, models, prompts, tags, and metadata.
  </Card>

  <Card title="Traces" icon="diagram-project" href="/running-requests/traces">
    Inspect span hierarchies, timing, inputs, outputs, errors, and linked request logs.
  </Card>

  <Card title="Advanced Search" icon="magnifying-glass" href="/why-promptlayer/advanced-search">
    Find logs by request content, metadata, tags, scores, status, model, prompt, and usage fields.
  </Card>

  <Card title="Import History into Tables" icon="history" href="/features/tables/overview#import-data">
    Build evaluation and backtesting Tables from filtered request history and production examples.
  </Card>

  <Card title="Advanced Logging" icon="cassette-tape" href="/features/prompt-history/request-id">
    Add request IDs, metadata, tags, scores, prompt associations, and custom logs from code.
  </Card>
</CardGroup>


# OpenRouter
Source: https://docs.promptlayer.com/features/openrouter-integration

Set up OpenRouter as a custom provider in PromptLayer.

[OpenRouter](https://openrouter.ai) provides access to models from many providers through a unified API, including DeepSeek, Claude, GPT-4, and others that may not be available through standard providers.

## Setting Up OpenRouter as a Custom Provider

To use OpenRouter models in PromptLayer:

1. **Get an OpenRouter API Key**: Sign up at [OpenRouter](https://openrouter.ai) and obtain your API key from their dashboard
2. Navigate to **Settings → Custom Providers and Models** in your PromptLayer dashboard
3. Click **Create Custom Provider**
4. Configure the provider with the following details:
   * **Name**: OpenRouter
   * **Client**: OpenAI (OpenRouter uses OpenAI-compatible endpoints)
   * **Base URL**: `https://openrouter.ai/api/v1`
   * **API Key**: Your OpenRouter API key

## Creating Custom Models (Recommended)

For easier model selection in the Playground and Prompt Registry, you can save specific OpenRouter models:

1. In **Settings → Custom Providers and Models**, find your OpenRouter provider in the list
2. Click on the OpenRouter row to expand it
3. Click **Create Custom Model** in the expanded section
4. Configure each model:
   * **Model Name**: Enter the OpenRouter model identifier (e.g., `deepseek/deepseek-chat`, `anthropic/claude-3.5-sonnet`)
   * **Display Name**: A friendly name like "DeepSeek Chat" or "Claude 3.5 Sonnet"
   * **Model Type**: Chat
5. Repeat for each model you want to use

The full list of available models can be found in [OpenRouter's documentation](https://openrouter.ai/docs/overview/models).

## Available Models

OpenRouter regularly updates their model offerings and provides access to many providers. Example models include:

* **`deepseek/deepseek-chat`**: DeepSeek's latest chat model
* **`anthropic/claude-3.5-sonnet`**: Claude 3.5 Sonnet via OpenRouter
* **`openai/gpt-4-turbo`**: GPT-4 Turbo via OpenRouter
* **`google/gemini-pro-1.5`**: Gemini Pro 1.5 via OpenRouter
* **`meta-llama/llama-3.1-405b`**: Llama 3.1 405B via OpenRouter

For the complete and up-to-date list of available models, visit [OpenRouter's models documentation](https://openrouter.ai/docs/overview/models).

## Using OpenRouter in PromptLayer

### In the Playground

After setup, you can use OpenRouter models in the PromptLayer Playground:

1. Open the Playground
2. Select your OpenRouter provider from the provider dropdown
3. Choose your desired model (or type the model identifier)
4. Start querying with your prompts

### In the Prompt Registry

OpenRouter models work with PromptLayer's Prompt Registry:

* Select OpenRouter models when creating or editing prompt templates
* Use templates with OpenRouter models in evaluations
* Track and analyze OpenRouter API usage alongside other providers

### Key Benefits

OpenRouter provides:

* **Wide Model Selection**: Access to models from multiple providers through one API
* **Automatic Rate Limiting and Failover**: OpenRouter handles rate limiting and failover between providers
* **Cost Optimization**: Compare pricing across different models and providers
* **Model Availability**: Access to models that might not be directly available in your region

## SDK Usage

Once you've set up your OpenRouter custom provider and created a prompt template in the dashboard, you can run it programmatically with the PromptLayer SDK:

```python theme={null}
from promptlayer import PromptLayer

promptlayer = PromptLayer(api_key="pl_****")

# Run a prompt template that uses your OpenRouter custom provider
response = promptlayer.run(
    prompt_name="your-openrouter-prompt",
    input_variables={"query": "your input"}
)

# Access the response
print(response["raw_response"].choices[0].message.content)

# The request is automatically logged with request_id
print(f"Request ID: {response['request_id']}")
```

<Info>
  Using [`promptlayer.run()`](/sdks/python#using-the-run-method-recommended) ensures your requests are properly logged to PromptLayer and leverages your prompt templates from the Prompt Registry. This is the recommended approach for production use.
</Info>

## Related Documentation

* [Custom Providers](/features/custom-providers)
* [Supported Providers](/features/supported-providers)
* [OpenRouter Official Documentation](https://openrouter.ai/docs)


# OpenTelemetry
Source: https://docs.promptlayer.com/features/opentelemetry


PromptLayer natively supports [OpenTelemetry (OTEL)](https://opentelemetry.io/), the industry-standard observability framework. You can send traces from **any** OpenTelemetry-compatible SDK or Collector directly to PromptLayer — no PromptLayer SDK required.

This is ideal when:

* Your framework isn't listed on the [Telemetry Integrations](/features/integrations) page
* You already have an OpenTelemetry pipeline and want to add PromptLayer as a destination
* You want vendor-neutral instrumentation

<Note>
  If you're using a supported framework like the [Vercel AI SDK](/features/integrations#vercel-ai-sdk), [OpenAI Agents SDK](/features/integrations#openai-agents-sdk), or [Claude Code](/features/integrations#claude-code), see the [Telemetry Integrations](/features/integrations) page for framework-specific setup — those integrations handle the OTEL configuration for you.
</Note>

## How It Works

PromptLayer exposes an [OTLP/HTTP endpoint](/reference/otlp-ingest-traces) at:

```
https://api.promptlayer.com/v1/traces
```

Any OpenTelemetry SDK or Collector can export traces to this endpoint. Spans that include [GenAI semantic convention](https://opentelemetry.io/docs/specs/semconv/gen-ai/) attributes are automatically converted into PromptLayer request logs.

## Setup

Configure your OpenTelemetry SDK to export traces to PromptLayer using the OTLP/HTTP exporter.

<CodeGroup>
  ```python Python theme={null}
  from opentelemetry.sdk.trace import TracerProvider
  from opentelemetry.sdk.trace.export import BatchSpanProcessor
  from opentelemetry.sdk.resources import Resource
  from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

  # Install required packages:
  # pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

  exporter = OTLPSpanExporter(
      endpoint="https://api.promptlayer.com/v1/traces",
      headers={"X-API-KEY": "your-promptlayer-api-key"},
  )

  provider = TracerProvider(
      resource=Resource.create({"service.name": "my-llm-app"})
  )
  provider.add_span_processor(BatchSpanProcessor(exporter))

  # Use the tracer to create spans
  tracer = provider.get_tracer("my-llm-app")
  ```

  ```javascript JavaScript theme={null}
  // Install required packages:
  // npm install @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/resources

  import { NodeSDK } from "@opentelemetry/sdk-node";
  import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
  import { resourceFromAttributes } from "@opentelemetry/resources";

  const sdk = new NodeSDK({
    serviceName: "my-llm-app",
    resource: resourceFromAttributes({
      "service.name": "my-llm-app",
    }),
    traceExporter: new OTLPTraceExporter({
      url: "https://api.promptlayer.com/v1/traces",
      headers: {
        "X-API-Key": process.env.PROMPTLAYER_API_KEY,
      },
    }),
  });

  sdk.start();

  // Shut down before exit to flush remaining spans
  process.on("beforeExit", async () => {
    await sdk.shutdown();
  });
  ```
</CodeGroup>

## GenAI Semantic Conventions

Spans that use [GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/) are automatically parsed into PromptLayer request logs. Add these attributes to your LLM call spans:

| Attribute                        | Description                                              |
| -------------------------------- | -------------------------------------------------------- |
| `gen_ai.request.model`           | Model name (e.g. `gpt-4`, `claude-sonnet-4-20250514`)    |
| `gen_ai.provider.name`           | Provider (e.g. `openai`, `anthropic`)                    |
| `gen_ai.operation.name`          | Operation type (`chat`, `text_completion`, `embeddings`) |
| `gen_ai.usage.input_tokens`      | Input token count                                        |
| `gen_ai.usage.output_tokens`     | Output token count                                       |
| `gen_ai.input.messages`          | Request messages                                         |
| `gen_ai.output.messages`         | Response messages                                        |
| `gen_ai.request.temperature`     | Temperature parameter                                    |
| `gen_ai.request.max_tokens`      | Max tokens parameter                                     |
| `gen_ai.response.finish_reasons` | Finish reasons                                           |
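
As a rough sketch of how the attributes in this table fit together, the helper below builds one attribute dictionary for a chat call. The function name `genai_chat_attributes` is hypothetical. Two things are assumptions rather than documented PromptLayer behavior: messages are serialized to JSON strings (span attribute values are limited to primitives and arrays of primitives), and the message shape (`role` plus `parts`) follows the OpenTelemetry GenAI conventions. Check a test trace in PromptLayer before relying on either.

```python
import json


def genai_chat_attributes(model, provider, input_messages, output_messages,
                          input_tokens, output_tokens, temperature=None,
                          max_tokens=None, finish_reasons=None):
    """Build a dict of GenAI span attributes for one chat call."""
    attributes = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.provider.name": provider,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Assumption: messages are JSON-encoded strings.
        "gen_ai.input.messages": json.dumps(input_messages),
        "gen_ai.output.messages": json.dumps(output_messages),
    }
    # Optional request parameters are only set when provided.
    if temperature is not None:
        attributes["gen_ai.request.temperature"] = temperature
    if max_tokens is not None:
        attributes["gen_ai.request.max_tokens"] = max_tokens
    if finish_reasons:
        attributes["gen_ai.response.finish_reasons"] = list(finish_reasons)
    return attributes


attrs = genai_chat_attributes(
    model="gpt-4",
    provider="openai",
    input_messages=[{"role": "user", "parts": [{"type": "text", "content": "Hello"}]}],
    output_messages=[{
        "role": "assistant",
        "parts": [{"type": "text", "content": "Hi!"}],
        "finish_reason": "stop",
    }],
    input_tokens=5,
    output_tokens=2,
    temperature=0.2,
    finish_reasons=["stop"],
)
```

With the OpenTelemetry Python SDK, a dictionary like this can be applied to an active span in one call with `span.set_attributes(attrs)`.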

### Event-Based Conventions

PromptLayer also supports the newer [event-based GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/) where message content is sent as span events rather than span attributes. This format is used by frameworks like [LiveKit](https://docs.livekit.io/) and newer versions of OpenTelemetry GenAI instrumentation.

The following event types are recognized:

| Event Name                 | Description                              |
| -------------------------- | ---------------------------------------- |
| `gen_ai.system.message`    | System message                           |
| `gen_ai.user.message`      | User message                             |
| `gen_ai.assistant.message` | Assistant message (including tool calls) |
| `gen_ai.tool.message`      | Tool/function result message             |
| `gen_ai.choice`            | Model response/choice                    |

Event attributes like `gen_ai.system.message.content`, `gen_ai.user.message.content`, and tool call data are automatically extracted and mapped to PromptLayer request logs.

<Note>
  When both attribute-based messages (`gen_ai.input.messages`) and event-based messages are present on the same span, attribute-based messages take priority.
</Note>
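
The priority rule can be pictured with a short sketch. This is an illustration of the documented behavior, not PromptLayer's parser: `select_input_messages` is a hypothetical name, spans are modeled as plain dictionaries, and the content key is assumed to be the event name plus `.content`, as in the examples above. `gen_ai.choice` events are left out because they describe model output rather than input.

```python
import json

# Input-side event names from the table above, mapped to chat roles.
ROLE_BY_EVENT = {
    "gen_ai.system.message": "system",
    "gen_ai.user.message": "user",
    "gen_ai.assistant.message": "assistant",
    "gen_ai.tool.message": "tool",
}


def select_input_messages(attributes, events):
    """Pick input messages for a span: attribute-based first, then events."""
    raw = attributes.get("gen_ai.input.messages")
    if raw:
        # Attribute-based messages take priority when present.
        return json.loads(raw) if isinstance(raw, str) else raw

    messages = []
    for event in events:
        role = ROLE_BY_EVENT.get(event["name"])
        if role is None:
            continue  # not an input message event
        content = event.get("attributes", {}).get(f"{event['name']}.content")
        messages.append({"role": role, "content": content})
    return messages


events = [
    {"name": "gen_ai.user.message",
     "attributes": {"gen_ai.user.message.content": "hi"}},
]
print(select_input_messages({}, events))
```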

## Linking to Prompt Templates

You can associate OTEL spans with prompt templates in your PromptLayer workspace by setting custom span attributes:

| Attribute                    | Type    | Description                                   |
| ---------------------------- | ------- | --------------------------------------------- |
| `promptlayer.prompt.name`    | string  | Name of the prompt template in your workspace |
| `promptlayer.prompt.version` | integer | Specific version number to link (optional)    |

<CodeGroup>
  ```python Python theme={null}
  from opentelemetry import trace

  tracer = trace.get_tracer("my-llm-app")

  with tracer.start_as_current_span("llm-call") as span:
      # Link this span to a prompt template
      span.set_attribute("promptlayer.prompt.name", "my-prompt")
      span.set_attribute("promptlayer.prompt.version", 3)

      # Add GenAI attributes
      span.set_attribute("gen_ai.request.model", "gpt-4")
      span.set_attribute("gen_ai.provider.name", "openai")

      # ... make your LLM call ...
  ```

  ```javascript JavaScript theme={null}
  import { trace } from "@opentelemetry/api";

  const tracer = trace.getTracer("my-llm-app");

  tracer.startActiveSpan("llm-call", (span) => {
    // Link this span to a prompt template
    span.setAttribute("promptlayer.prompt.name", "my-prompt");
    span.setAttribute("promptlayer.prompt.version", 3);

    // Add GenAI attributes
    span.setAttribute("gen_ai.request.model", "gpt-4");
    span.setAttribute("gen_ai.provider.name", "openai");

    // ... make your LLM call ...

    span.end();
  });
  ```
</CodeGroup>

## Attaching User Identity & Metadata

You can attach searchable metadata — including end-user identity and conversation IDs — to the request logs generated from your spans. This is the OpenTelemetry-native equivalent of the PromptLayer SDK's `track.metadata()`, with no extra REST call required.

PromptLayer recognizes two kinds of span attributes for metadata.

### Standard OpenTelemetry attributes

If your instrumentation already follows OpenTelemetry conventions, these are picked up automatically — no PromptLayer-specific attributes needed:

| Attribute                | Mapped to                  | Description                                                                     |
| ------------------------ | -------------------------- | ------------------------------------------------------------------------------- |
| `user.id`                | `user_id` metadata         | End-user identity (current OpenTelemetry attribute)                             |
| `enduser.id`             | `user_id` metadata         | End-user identity (deprecated OpenTelemetry attribute, supported as a fallback) |
| `gen_ai.conversation.id` | `conversation_id` metadata | Conversation / session / thread identifier (OpenTelemetry GenAI attribute)      |
| `session.id`             | `conversation_id` metadata | Conversation identifier (common alias, supported as a fallback)                 |

### PromptLayer custom metadata

For arbitrary key/value metadata, use the `promptlayer.metadata.` namespace. Each attribute becomes a metadata key on the request log — for example, `promptlayer.metadata.tenant` becomes a `tenant` metadata key.

| Attribute                    | Mapped to        |
| ---------------------------- | ---------------- |
| `promptlayer.metadata.<key>` | `<key>` metadata |

<CodeGroup>
  ```python Python theme={null}
  with tracer.start_as_current_span("llm-call") as span:
      # Standard OpenTelemetry attributes
      span.set_attribute("user.id", "customer-42")
      span.set_attribute("gen_ai.conversation.id", "conv_abc123")

      # Arbitrary PromptLayer metadata
      span.set_attribute("promptlayer.metadata.tenant", "acme-corp")
      span.set_attribute("promptlayer.metadata.environment", "production")

      # ... make your LLM call ...
  ```

  ```javascript JavaScript theme={null}
  tracer.startActiveSpan("llm-call", (span) => {
    // Standard OpenTelemetry attributes
    span.setAttribute("user.id", "customer-42");
    span.setAttribute("gen_ai.conversation.id", "conv_abc123");

    // Arbitrary PromptLayer metadata
    span.setAttribute("promptlayer.metadata.tenant", "acme-corp");
    span.setAttribute("promptlayer.metadata.environment", "production");

    // ... make your LLM call ...

    span.end();
  });
  ```
</CodeGroup>

An explicit `promptlayer.metadata.<key>` always takes precedence over a standard attribute mapped to the same key. For example, if a span has both `user.id` and `promptlayer.metadata.user_id`, the `promptlayer.metadata.user_id` value wins.

<Note>
  Metadata is attached to the request log generated from the span, so set these attributes on your **LLM call spans**. To apply metadata across an entire trace, set the attributes as **resource attributes** — they apply to every span in the export.
</Note>
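
To make the precedence rules concrete, here is a minimal sketch of how span attributes could resolve to request-log metadata. It mirrors the tables and the precedence statement above, but it is not PromptLayer's implementation, and `resolve_metadata` is a hypothetical name.

```python
PREFIX = "promptlayer.metadata."

# Listed lowest priority first so later entries overwrite earlier ones:
# fallback attributes come before the current attribute for each key.
STANDARD_ATTRIBUTES = [
    ("enduser.id", "user_id"),
    ("user.id", "user_id"),
    ("session.id", "conversation_id"),
    ("gen_ai.conversation.id", "conversation_id"),
]


def resolve_metadata(span_attributes):
    """Map span attributes to metadata keys using the documented precedence."""
    metadata = {}
    for attribute, key in STANDARD_ATTRIBUTES:
        if attribute in span_attributes:
            metadata[key] = span_attributes[attribute]
    # Explicit promptlayer.metadata.<key> attributes are applied last, so they win.
    for attribute, value in span_attributes.items():
        if attribute.startswith(PREFIX):
            metadata[attribute[len(PREFIX):]] = value
    return metadata


resolved = resolve_metadata({
    "user.id": "customer-42",
    "enduser.id": "legacy-7",
    "session.id": "sess_1",
    "promptlayer.metadata.tenant": "acme-corp",
})
print(resolved)
```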

## Using an OpenTelemetry Collector

If you're already running an [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/), you can add PromptLayer as an additional exporter in your Collector config:

```yaml theme={null}
exporters:
  otlphttp/promptlayer:
    endpoint: "https://api.promptlayer.com"
    headers:
      X-API-Key: "${PROMPTLAYER_API_KEY}"

service:
  pipelines:
    traces:
      exporters: [otlphttp/promptlayer]
```

This lets you fan out traces to PromptLayer alongside your existing observability backends (Datadog, New Relic, Jaeger, etc.) without changing your application code.

## Content Types

The endpoint accepts both binary protobuf (`application/x-protobuf`, recommended) and JSON (`application/json`) encodings. Both support `Content-Encoding: gzip`.
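
For debugging without an OpenTelemetry SDK, a JSON export can be assembled by hand. The sketch below builds, but does not send, a gzip-compressed OTLP/JSON request using only the standard library. The function name `build_otlp_json_request`, the service and scope names, and the span values are placeholders. The payload layout follows the OTLP/JSON encoding (hex-encoded IDs, nanosecond timestamps as strings).

```python
import gzip
import json
import os
import time
import urllib.request


def build_otlp_json_request(api_key, span_name="llm-call"):
    """Build (but do not send) a gzip-compressed OTLP/JSON trace export."""
    end_ns = time.time_ns()
    start_ns = end_ns - 250_000_000  # placeholder: a 250 ms span
    payload = {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "my-llm-app"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-export"},
                "spans": [{
                    # OTLP/JSON uses hex IDs: 16 bytes per trace, 8 per span.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 3,  # SPAN_KIND_CLIENT
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        {"key": "gen_ai.request.model", "value": {"stringValue": "gpt-4"}},
                        {"key": "gen_ai.provider.name", "value": {"stringValue": "openai"}},
                    ],
                }],
            }],
        }],
    }
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    return urllib.request.Request(
        "https://api.promptlayer.com/v1/traces",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "X-API-KEY": api_key,
        },
    )


request = build_otlp_json_request(os.environ.get("PROMPTLAYER_API_KEY", "pl_..."))
# To send it: urllib.request.urlopen(request)
```

For production traffic, an OpenTelemetry SDK exporter or a Collector is the better choice, since it handles batching and retries.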

## Next Steps

* [OTLP Ingest Traces API Reference](/reference/otlp-ingest-traces) — full endpoint documentation
* [Telemetry Integrations](/features/integrations) — framework-specific setups (Vercel AI SDK, OpenAI Agents, Claude Code)
* [Traces](/running-requests/traces) — PromptLayer SDK native tracing with `@traceable` and `wrapWithSpan`


# Custom Logging
Source: https://docs.promptlayer.com/features/prompt-history/custom-logging


## When to Use Custom Logging

Use the `log_request` method when:

* You're **not** using `pl_client.run()` for executing prompts
* You need more flexibility (e.g., background processing, custom models)
* You want to track requests made with your own LLM client code (OpenAI, Anthropic, etc.)
* You want to log requests from any LLM provider

While custom logging requires more manual work than the `run()` method, it offers greater control over the logging process and supports any LLM provider.

## API Reference

For complete documentation on the `log_request` API, see the [Log Request API Reference](/reference/log-request).

## Request Parameters

When logging a custom request, you can use the following parameters (see [API Reference](/reference/log-request) for details):

* **`provider`** (required): The LLM provider name (e.g., "openai", "anthropic")
* **`model`** (required): The specific model used (e.g., "gpt-4o", "claude-3-7-sonnet-20250219")
* **`input`** (required): The input prompt in Prompt Blueprint format
* **`output`** (required): The model response in Prompt Blueprint format
* **`request_start_time`**: Timestamp when the request started
* **`request_end_time`**: Timestamp when the response was received
* **`prompt_name`**: Name of the prompt template if using one from PromptLayer
* **`prompt_id`**: Unique identifier for the prompt template
* **`prompt_version_number`**: Version number of the prompt template
* **`prompt_input_variables`**: Variables used in the prompt template
* **`input_tokens`**: Number of input tokens used
* **`output_tokens`**: Number of output tokens generated
* **`tags`**: Array of strings for categorizing requests
* **`metadata`**: Custom JSON object used to search and filter requests later
* **`api_type`**: API type to use with openai/azure-openai (e.g., "chat-completions", "responses")

## Basic Usage

The `input` and `output` must be in [prompt blueprint format](/running-requests/prompt-blueprints):

<Warning>
  **Message `content` must be an array of content blocks, not a plain string.**

  Use `"content": [{"type": "text", "text": "your message"}]` instead of `"content": "your message"`. Using a plain string will result in a "Malformed Request" error in the dashboard.
</Warning>

```python theme={null}
pl_client.log_request(
    provider="provider_name",
    model="model_name",
    input=input_in_blueprint_format,
    output=output_in_blueprint_format,
    # Optional parameters
    request_start_time=start_time,
    request_end_time=end_time,
    prompt_name="template_name",
    prompt_version_number=1,
    prompt_input_variables={"variable": "value"},
    tags=["tag1", "tag2"],
    metadata={"custom_field": "value"},
    # Only needed for openai/azure-openai ("chat-completions" or "responses")
    api_type=api_type,
)
```
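
One way to avoid the plain-string mistake described in the warning is to normalize every message before logging. The helpers below are a minimal sketch; `to_content_blocks` and `chat_blueprint` are hypothetical names, not part of the PromptLayer SDK.

```python
def to_content_blocks(content):
    """Return message content as a list of content blocks."""
    if content is None:
        return []  # an empty array rather than null
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)  # assume it is already a list of blocks


def chat_blueprint(messages):
    """Wrap messages in a chat blueprint with normalized content."""
    return {
        "type": "chat",
        "messages": [
            {**message, "content": to_content_blocks(message.get("content"))}
            for message in messages
        ],
    }


blueprint = chat_blueprint([
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": [{"type": "text", "text": "Hi"}]},
])
print(blueprint)
```

The resulting dictionary can be passed as `input` (or `output`) to `log_request`.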

## Provider Conversion Helpers

### OpenAI Format Converter

```python theme={null}
def openai_to_blueprint(messages, completion=None):
    """Convert OpenAI format to PromptLayer blueprint format."""
    # Convert input
    input_blueprint = {
        "type": "chat",
        "messages": [
            {
                "role": msg["role"],
                "content": [{"type": "text", "text": msg["content"]}]
                if isinstance(msg["content"], str) else msg["content"]
            }
            for msg in messages
        ]
    }

    # Convert output if provided
    output_blueprint = None
    if completion:
        if hasattr(completion.choices[0].message, "tool_calls") and completion.choices[0].message.tool_calls:
            # Handle tool calls - IMPORTANT: content must be an empty array, not null/None
            output_blueprint = {
                "type": "chat",
                "messages": [{
                    "role": "assistant",
                    "content": [],  # Required: must be empty array for tool-only responses
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments
                            }
                        }
                        for tool_call in completion.choices[0].message.tool_calls
                    ]
                }]
            }
        else:
            # Standard response
            output_blueprint = {
                "type": "chat",
                "messages": [{
                    "role": "assistant",
                    "content": [{"type": "text", "text": completion.choices[0].message.content}]
                }]
            }

    return input_blueprint, output_blueprint
```

### Anthropic Format Converter

```python theme={null}
def anthropic_to_blueprint(messages, system=None, response=None):
    """Convert Anthropic format to PromptLayer blueprint format."""
    # Convert input
    input_blueprint = {
        "type": "chat",
        "messages": []
    }
    
    # Add system message if present
    if system:
        input_blueprint["messages"].append({
            "role": "system",
            "content": [{"type": "text", "text": system}]
        })
    
    # Add conversation messages
    for msg in messages:
        input_blueprint["messages"].append({
            "role": msg["role"],
            "content": [{"type": "text", "text": msg["content"]}]
            if isinstance(msg["content"], str) else msg["content"]
        })
    
    # Convert output if provided
    output_blueprint = None
    if response:
        response_text = ""
        if hasattr(response, "content") and response.content:
            # Join the text of every text block; blocks without text (e.g. thinking) are skipped
            response_text = "".join(
                block.text for block in response.content if hasattr(block, "text")
            )
        
        output_blueprint = {
            "type": "chat",
            "messages": [{
                "role": "assistant",
                "content": [{"type": "text", "text": response_text}]
            }]
        }
    
    return input_blueprint, output_blueprint
```

## OpenAI Example

```python theme={null}
from openai import OpenAI
from promptlayer import PromptLayer
import time

# Setup clients
pl_client = PromptLayer(api_key="pl_...")
client = OpenAI()

# Prepare request
messages = [
    {"role": "system", "content": "You are an AI assistant"},
    {"role": "user", "content": "Write a one-sentence bedtime story about a unicorn."}
]

# Execute request
request_start_time = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
request_end_time = time.time()

# Convert formats
input_blueprint, output_blueprint = openai_to_blueprint(messages, completion)

# Log to PromptLayer
pl_client.log_request(
    provider="openai",
    model="gpt-4o",
    input=input_blueprint,
    output=output_blueprint,
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    input_tokens=completion.usage.prompt_tokens,
    output_tokens=completion.usage.completion_tokens,
    api_type="chat-completions"
)
```

## Anthropic Example

```python theme={null}
import anthropic
from promptlayer import PromptLayer
import time

# Setup clients
pl_client = PromptLayer(api_key="pl_...")
client = anthropic.Anthropic()

# Prepare request
system = "You are a seasoned data scientist at a Fortune 500 company."
messages = [
    {"role": "user", "content": "Analyze this dataset for anomalies: <dataset>{{DATASET}}</dataset>"}
]

# Execute request
request_start_time = time.time()
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    system=system,
    max_tokens=2048,
    messages=messages
)
request_end_time = time.time()

# Convert formats
input_blueprint, output_blueprint = anthropic_to_blueprint(messages, system, response)

# Log to PromptLayer
pl_client.log_request(
    provider="anthropic",
    model="claude-3-7-sonnet-20250219",
    input=input_blueprint,
    output=output_blueprint,
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    input_tokens=response.usage.input_tokens,
    output_tokens=response.usage.output_tokens
)
```

## Logging Extended Thinking and Reasoning

When logging requests that use extended thinking (Anthropic), thinking mode (Google), or reasoning (OpenAI), you need to:

1. **Include the thinking configuration in `parameters`** - Use the provider-specific format
2. **Include thinking content blocks in the output** - Add them to the message content array

### Anthropic Extended Thinking

For Anthropic models with extended thinking enabled, pass the `thinking` parameter:

```python theme={null}
pl_client.log_request(
    provider="anthropic",
    model="claude-3-7-sonnet-20250219",
    input=input_blueprint,
    output={
        "type": "chat",
        "messages": [{
            "role": "assistant",
            "content": [
                {
                    "type": "thinking",
                    "thinking": "Let me analyze this step by step...",
                    "signature": "ErUBCk..."  # Include if returned by the API
                },
                {
                    "type": "text",
                    "text": "Based on my analysis, here's the answer..."
                }
            ]
        }]
    },
    parameters={
        "max_tokens": 16000,
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    },
    request_start_time=request_start_time,
    request_end_time=request_end_time
)
```
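
To build that `content` array from a real response instead of writing it by hand, a small converter can walk the response's content blocks. The sketch below is one possible approach, not an official helper: `anthropic_content_to_blueprint` is a hypothetical name, and it assumes each block exposes `type`, plus `thinking`/`signature` or `text`, as attributes. `SimpleNamespace` objects stand in for SDK blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def anthropic_content_to_blueprint(content_blocks):
    """Map response content blocks to blueprint content items (hypothetical helper)."""
    items = []
    for block in content_blocks:
        block_type = getattr(block, "type", None)
        if block_type == "thinking":
            item = {"type": "thinking", "thinking": block.thinking}
            signature = getattr(block, "signature", None)
            if signature:  # include only if the API returned one
                item["signature"] = signature
            items.append(item)
        elif block_type == "text":
            items.append({"type": "text", "text": block.text})
        # Other block types (for example tool_use) are skipped in this sketch.
    return items


# Stand-ins for SDK response blocks, so the sketch runs offline.
example_blocks = [
    SimpleNamespace(
        type="thinking",
        thinking="Let me analyze this step by step...",
        signature="ErUBCk...",
    ),
    SimpleNamespace(type="text", text="Based on my analysis, here's the answer..."),
]

thinking_output_blueprint = {
    "type": "chat",
    "messages": [{
        "role": "assistant",
        "content": anthropic_content_to_blueprint(example_blocks),
    }],
}
```

Pass `thinking_output_blueprint` as `output` in the `log_request` call above, and keep the `thinking` configuration in `parameters`.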

### Google/Gemini Thinking Mode

For Google models with thinking mode, use the `thinking_config` parameter:

```python theme={null}
pl_client.log_request(
    provider="google",
    model="gemini-3.1-pro-preview",
    input=input_blueprint,
    output=output_blueprint,
    parameters={
        "thinking_config": {
            "include_thoughts": True,
            "thinking_budget": 8000
        }
    },
    request_start_time=request_start_time,
    request_end_time=request_end_time
)
```

### OpenAI Reasoning Models (o1, o3, etc.)

For OpenAI reasoning models, use the `reasoning_effort` parameter:

```python theme={null}
pl_client.log_request(
    provider="openai",
    model="o1",
    input=input_blueprint,
    output=output_blueprint,
    parameters={
        "reasoning_effort": "high"  # Options: "low", "medium", "high"
    },
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    api_type="chat-completions"
)
```

<Note>
  The parameter name varies by provider — make sure to use the correct format for your provider as shown above.
</Note>

## Working with Tools and Function Calls

For OpenAI/Anthropic function calling or tool use:

<Warning>
  **Important**: When logging assistant messages with tool calls but no text content, you must include an empty `content` array (not `null` or omitted). This ensures proper display in the PromptLayer dashboard.
</Warning>

```python theme={null}
# Assistant message with tool calls (no text response)
assistant_tool_call_blueprint = {
    "type": "chat",
    "messages": [{
        "role": "assistant",
        "content": [],  # Required: empty array, not null
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"San Francisco\"}"
            }
        }]
    }]
}

# Tool response example
tool_response_blueprint = {
    "type": "chat",
    "messages": [
        # Previous messages...
        {
            "role": "tool",
            "content": [{"type": "text", "text": "{\"temperature\": 72}"}],
            "tool_call_id": "call_abc123"
        }
    ]
}
```
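
Anthropic returns tool use as content blocks rather than a `tool_calls` list, so those responses need a conversion step before they match the shape above. The following is a minimal sketch under assumptions: `anthropic_tool_use_to_blueprint` is a hypothetical helper, it assumes `tool_use` blocks expose `id`, `name`, and `input`, and it serializes `input` to a JSON string to mirror the `arguments` field in the example above.

```python
import json
from types import SimpleNamespace


def anthropic_tool_use_to_blueprint(content_blocks):
    """Convert content blocks into one assistant message (hypothetical helper)."""
    text_items = []
    tool_calls = []
    for block in content_blocks:
        if block.type == "text":
            text_items.append({"type": "text", "text": block.text})
        elif block.type == "tool_use":
            tool_calls.append({
                "id": block.id,
                "type": "function",
                "function": {
                    "name": block.name,
                    # The example above stores arguments as a JSON string
                    "arguments": json.dumps(block.input),
                },
            })
    # content stays an empty list (never None) when there is no text
    message = {"role": "assistant", "content": text_items}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return {"type": "chat", "messages": [message]}


# Stand-in for an SDK tool_use block, so the sketch runs offline.
tool_only_blocks = [
    SimpleNamespace(
        type="tool_use",
        id="toolu_abc123",
        name="get_weather",
        input={"location": "San Francisco"},
    )
]
tool_use_blueprint = anthropic_tool_use_to_blueprint(tool_only_blocks)
```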

## Error Tracking

Use `status`, `error_type`, and `error_message` when logging failed or degraded requests. This keeps warnings and failures searchable in PromptLayer and makes provider reliability easier to monitor.

* `SUCCESS` means the request completed normally.
* `WARNING` means the request succeeded but had issues, such as retries or partial responses.
* `ERROR` means the provider call or prompt rendering failed.

Use a specific `error_type` such as `PROVIDER_TIMEOUT`, `PROVIDER_RATE_LIMIT`, `PROVIDER_AUTH_ERROR`, `TEMPLATE_RENDER_ERROR`, or `UNKNOWN_ERROR` when possible, and include a short `error_message` with the provider or application detail.
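
One way to keep these fields consistent is to derive them in a single place. The sketch below is illustrative only: `error_fields` is a hypothetical helper, the mapping from Python exception classes to `error_type` values is an assumption you should adapt to your provider SDK's exceptions, and the 200-character truncation is an arbitrary choice.

```python
def error_fields(exc=None, retries=0):
    """Return status/error_type/error_message values for a log entry (hypothetical helper)."""
    if exc is None and retries == 0:
        return {"status": "SUCCESS"}
    if exc is None:
        # The request succeeded, but only after retrying
        return {
            "status": "WARNING",
            "error_message": f"Succeeded after {retries} retries",
        }
    # Assumed mapping from built-in exceptions; swap in your SDK's exception types
    if isinstance(exc, TimeoutError):
        error_type = "PROVIDER_TIMEOUT"
    elif isinstance(exc, PermissionError):
        error_type = "PROVIDER_AUTH_ERROR"
    elif isinstance(exc, KeyError):
        error_type = "TEMPLATE_RENDER_ERROR"
    else:
        error_type = "UNKNOWN_ERROR"
    return {
        "status": "ERROR",
        "error_type": error_type,
        "error_message": str(exc)[:200],  # keep the message short
    }
```

The returned dictionary can then be unpacked into the logging call, for example `pl_client.log_request(..., **error_fields(exc))`, assuming the keyword names match the fields described in this section.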

## Complete JavaScript Example

```javascript theme={null}
import { PromptLayer } from "promptlayer";
import { OpenAI } from "openai";

const plClient = new PromptLayer({ apiKey: "pl_..." });
const openai = new OpenAI();

// Helper function
function openaiToBlueprint(messages, completion = null) {
  const inputBlueprint = {
    type: "chat",
    messages: messages.map(msg => ({
      role: msg.role,
      content: typeof msg.content === "string" 
        ? [{ type: "text", text: msg.content }] 
        : msg.content
    }))
  };
  
  let outputBlueprint = null;
  if (completion) {
    outputBlueprint = {
      type: "chat",
      messages: [{
        role: "assistant",
        content: [{
          type: "text",
          text: completion.choices[0].message.content
        }]
      }]
    };
  }
  
  return { inputBlueprint, outputBlueprint };
}

async function main() {
  const messages = [
    { role: "user", content: "Hello world" }
  ];
  
  const requestStartTime = Date.now();
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages
  });
  const requestEndTime = Date.now();
  
  const { inputBlueprint, outputBlueprint } = openaiToBlueprint(messages, completion);
  
  await plClient.logRequest({
    provider: "openai",
    model: "gpt-4o",
    input: inputBlueprint,
    output: outputBlueprint,
    requestStartTime,
    requestEndTime,
    inputTokens: completion.usage.prompt_tokens,
    outputTokens: completion.usage.completion_tokens,
    api_type: "chat-completions"
  });
}

main();
```


# Metadata
Source: https://docs.promptlayer.com/features/prompt-history/metadata


PromptLayer lets you attach multiple key-value pairs to a request as metadata. In the dashboard, you can use metadata to look up requests and filter analytics.

We recommend metadata for values such as session IDs, user IDs, or error messages. Metadata also helps when you use [advanced search](/why-promptlayer/advanced-search) or read the Analytics page.

## Add metadata when running a prompt

Pass metadata directly into `client.run()` when you already know the request context, such as the user, session, or feature that triggered the prompt.

```python Python theme={null}
response = client.run(
    prompt_name="cake-recipe",
    input_variables={"cake_type": "Chocolate", "serving_size": "8"},
    tags=["production", "recipe-feature"],
    metadata={
        "user_id": "user_123",
        "session_id": "sess_abc"
    }
)

client.track.score(request_id=response["request_id"], score=95)
```

Your metadata and tags appear in the log details, letting you filter and search by user or feature.

<Frame>
  <img alt="Log with metadata" />
</Frame>

## Add metadata after a request

Use `track.metadata()` when you need to attach or update metadata after a request has already run.

[Endpoint Reference](/reference/track-metadata)

<CodeGroup>
  ```python Python theme={null}
  promptlayer_client.track.metadata(
      request_id=pl_request_id,
      metadata={
          "user_id": "1abf2345f",
          "post_id": "2cef2345f"
      }
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.metadata({
    request_id: pl_request_id,
    metadata: {
      "user_id": "1abf2345f",
      "post_id": "2cef2345f"
    }
  })
  ```

  ```bash REST theme={null}
  curl --request POST \
    --url https://api.promptlayer.com/rest/track-metadata \
    --header 'Content-Type: application/json' \
    --header 'X-API-KEY: pl_<YOUR API KEY>' \
    --data '{
      "request_id": "<REQUEST ID>",
      "metadata": {
        "user_id":"1abf2345f",
        "post_id": "2cef2345f"
      }
    }'
  ```
</CodeGroup>

Things to note:

1. Keys and values must currently be strings.
2. If you track a key that already exists for a given `request_id`, its value is replaced.
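
Because keys and values must be strings, it can help to normalize metadata before sending it. The helper below is a sketch, not part of the SDK: `stringify_metadata` is a hypothetical name, and the choices to drop `None` values and to JSON-encode numbers, booleans, lists, and dictionaries are assumptions you may want to change.

```python
import json


def stringify_metadata(metadata):
    """Return a copy of metadata whose keys and values are all strings (hypothetical helper)."""
    result = {}
    for key, value in metadata.items():
        if value is None:
            continue  # skip missing values rather than logging the text "None"
        if isinstance(value, str):
            result[str(key)] = value
        elif isinstance(value, (dict, list, bool, int, float)):
            result[str(key)] = json.dumps(value)  # e.g. True -> "true", 42 -> "42"
        else:
            result[str(key)] = str(value)
    return result


clean_metadata = stringify_metadata(
    {"user_id": 42, "beta": True, "note": None, "plan": "pro"}
)
```

Pass `clean_metadata` as the `metadata` argument to `track.metadata()` or `run()`.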

***

Once metadata is added, it appears in the web UI.

<img alt="score" />

Metadata is optimized for high-cardinality, request-specific values such as user IDs, session IDs, and error messages. For a smaller set of categories, such as environment, app, feature, or pipeline stage, use [tags](/features/prompt-history/tagging-requests) instead.

## Attaching metadata via OpenTelemetry

If you instrument your app with OpenTelemetry, you can attach metadata directly from span attributes — no `track.metadata()` call required. PromptLayer automatically maps standard attributes like `user.id` and `gen_ai.conversation.id`, and also reads arbitrary `promptlayer.metadata.<key>` attributes.

See [Attaching User Identity & Metadata](/features/opentelemetry#attaching-user-identity-%26-metadata) in the OpenTelemetry guide for details.


# Request IDs
Source: https://docs.promptlayer.com/features/prompt-history/request-id


Every PromptLayer log has a unique PromptLayer Request ID (`pl_request_id`).

All tracking in PromptLayer is based on the `pl_request_id`. This identifier is needed to enrich logs with [metadata](/features/prompt-history/metadata), [scores](/features/prompt-history/scoring-requests), [associated prompt templates](/features/prompt-history/tracking-templates), and more. You can also use it to [retrieve the full request payload](/reference/get-request) as a prompt blueprint.

You can quickly grab a request ID from the web UI as shown below.

<img />

Specific instructions for retrieving the ID programmatically are below.

## REST API

When you log a request with the REST API's `/log-request` endpoint, a successful response includes the `pl_request_id` under the `request_id` key.

[Learn more](/reference/log-request)

## Using the `run` Method

The `run()` method returns the `request_id` directly in the response object. This is the recommended way to retrieve the `pl_request_id`.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  promptlayer_client = PromptLayer()

  response = promptlayer_client.run(
      prompt_name="my-prompt",
      input_variables={"name": "Alice"}
  )

  pl_request_id = response["request_id"]
  print(pl_request_id)
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const promptLayerClient = new PromptLayer();

  const response = await promptLayerClient.run({
      promptName: "my-prompt",
      inputVariables: { name: "Alice" }
  });

  const plRequestId = response.request_id;
  console.log(plRequestId);
  ```
</CodeGroup>

## Using the `log_request` Method

When using `log_request` for custom logging, the method returns the `request_id` in its response.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  pl_client = PromptLayer()

  result = pl_client.log_request(
      provider="openai",
      model="gpt-4o",
      input=input_blueprint,
      output=output_blueprint,
      request_start_time=start_time,
      request_end_time=end_time
  )

  pl_request_id = result["request_id"]
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const plClient = new PromptLayer();

  const result = await plClient.logRequest({
      provider: "openai",
      model: "gpt-4o",
      input: inputBlueprint,
      output: outputBlueprint,
      requestStartTime: startTime,
      requestEndTime: endTime
  });

  const plRequestId = result.request_id;
  ```
</CodeGroup>


# Score Requests
Source: https://docs.promptlayer.com/features/prompt-history/scoring-requests


You can give every PromptLayer request an integer score from 0 to 100.

<img alt="score" />

You can score a request visually from the dashboard or programmatically.

By default, a score is named `default`. To attach multiple scores to one request, use named scores as shown below.

[Endpoint Reference](/reference/track-score)

<CodeGroup>
  ```python Python theme={null}
  # named score
  promptlayer_client.track.score(
    request_id=pl_request_id, 
    score_name="summarization",
    score=100
  )

  # default score
  promptlayer_client.track.score(
    request_id=pl_request_id, 
    # score_name="default",
    score=100
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.score({
    request_id: pl_request_id,
    score: 100
  })
  ```

  ```bash REST theme={null}
  curl --request POST \
    --url https://api.promptlayer.com/rest/track-score \
    --header 'Content-Type: application/json' \
    --header 'X-API-KEY: pl_<YOUR API KEY>' \
    --data '{
      "request_id": "<REQUEST ID>",
      "score": <YOUR SCORE>,
      "name": <YOUR SCORE NAME>
    }'
  ```
</CodeGroup>
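
Scores must be integers from 0 to 100, so evaluators that produce values on another scale need a conversion step first. The sketch below shows one way to do that. `to_promptlayer_score` is a hypothetical helper, and clamping out-of-range input (rather than raising an error) is an assumption.

```python
def to_promptlayer_score(value, low=0.0, high=1.0):
    """Scale value from [low, high] to an integer in [0, 100] (hypothetical helper)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range input
    return int(round(fraction * 100))


similarity_score = to_promptlayer_score(0.5)        # evaluator output in [0, 1]
rating_score = to_promptlayer_score(4, low=1, high=5)  # 1-5 star rating
```

The result can be passed as `score` to `track.score()`.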


# Search Data Model
Source: https://docs.promptlayer.com/features/prompt-history/search-data-model


When you log requests through PromptLayer, we process and index the data to make it searchable. Understanding how your data is indexed will help you write more effective filters when using the [Search Request Logs](/reference/search-request-logs) API or the dashboard's advanced search.

## How Data Gets Indexed

PromptLayer takes your request data — the prompt input, model output, metadata, and input variables — and flattens nested structures into searchable key-value pairs. This allows you to filter on deeply nested fields using dot-notation paths.

For example, if your output is:

```json theme={null}
{
  "result": {
    "status": "approved",
    "score": 0.95
  }
}
```

This becomes two searchable entries:

* `result.status` → `"approved"`
* `result.score` → `0.95`

You can then filter with:

```json theme={null}
{
  "field": "output",
  "operator": "key_equals",
  "value": "approved",
  "nested_key": "result.status"
}
```
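
To predict which `nested_key` paths a payload will produce, you can reproduce the flattening locally. This is a rough approximation of the behavior described above, not PromptLayer's indexing code, and `flatten` is a hypothetical helper.

```python
def flatten(obj, prefix=""):
    """Flatten nested dictionaries into dot-notation keys (approximation)."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value
    return flat


flat_output = flatten({"result": {"status": "approved", "score": 0.95}})
```

Here `flat_output` contains the keys `result.status` and `result.score`, which are the values you would use for `nested_key`.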

## Input Text

For chat requests, `input_text` is built by combining all messages except the last assistant message. Each message is prefixed with its role in brackets and joined with double newlines:

```
[system]: You are a helpful assistant that answers questions about our product.

[user]: What is the refund policy?
```

In multi-turn conversations, prior assistant responses are included in `input_text` alongside system prompts, user messages, and other roles (e.g. `tool` results). Only the final assistant message is excluded — it is indexed separately as the output.

The role prefixes are part of the indexed text, so a search for `"[system]"` would match requests that have a system prompt.
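
The rule above can be sketched locally to preview what a text search will match. This is an approximation rather than PromptLayer's implementation: `build_input_text` is a hypothetical helper, it assumes each message has a plain-string `content`, and the assistant reply in the example is placeholder text.

```python
def build_input_text(messages):
    """Join role-prefixed messages, skipping the last assistant message (approximation)."""
    # Find the index of the last assistant message, if there is one
    last_assistant = None
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            last_assistant = index

    parts = []
    for index, message in enumerate(messages):
        if index == last_assistant:
            continue  # indexed separately as the output
        parts.append(f"[{message['role']}]: {message['content']}")
    return "\n\n".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant that answers questions about our product."},
    {"role": "user", "content": "What is the refund policy?"},
    {"role": "assistant", "content": "Placeholder reply."},
]
input_text = build_input_text(example_messages)
```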

## Output

How the LLM output is indexed depends on the output type:

### JSON Output

When the model returns a valid JSON **object** (e.g. `{"key": "value"}`), PromptLayer flattens it into searchable key-value pairs. All keys become available in `output_keys`, and all values become filterable through the `output` field.

JSON arrays (e.g. `[1, 2, 3]`) and other JSON primitives are treated as plain text — only JSON objects are flattened.

```json theme={null}
// Model returns:
{"action": "send_email", "recipient": "user@example.com", "priority": "high"}

// You can filter by:
{"field": "output", "operator": "key_equals", "value": "send_email", "nested_key": "action"}

// Or check which keys exist:
{"field": "output_keys", "operator": "contains", "value": "priority"}
```

### Tool Call Output

When the model makes tool calls, the entire tool call structure is wrapped in a `{"tool_calls": [...]}` object and then flattened using dot-notation. Array indices are stripped, so if the model calls multiple tools, their fields are grouped together under the same keys.

For example, a tool call like:

```json theme={null}
{
  "id": "call_123",
  "type": "function",
  "function": {
    "name": "search_database",
    "arguments": { "query": "active users", "limit": 10 }
  }
}
```

Gets flattened into these searchable keys:

* `tool_calls.id`
* `tool_calls.type`
* `tool_calls.function.name`
* `tool_calls.function.arguments.query`
* `tool_calls.function.arguments.limit`

**Multiple tool calls:** When the model calls multiple tools, values from all calls are grouped together under the same key. For example, if the model calls both `search_database` and `send_email`, the key `tool_calls.function.name` will contain `["search_database", "send_email"]`. Filtering on that key will match if *any* of the values match — so `key_equals` with `"search_database"` will find requests that called `search_database`, even if other tools were also called.

Tool names are also extracted into the `tool_names` array for easy filtering — this is typically the simplest way to filter by tool.

```json theme={null}
// Filter requests that called a specific tool:
{"field": "tool_names", "operator": "contains", "value": "search_database"}

// Filter by tool call arguments:
{"field": "output", "operator": "key_contains", "value": "active users", "nested_key": "tool_calls.function.arguments.query"}

// Find any request that used tool calling:
{"field": "is_tool_call", "operator": "is_true"}
```
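
The grouping behavior for multiple tool calls can be approximated locally. The sketch below is an illustration under assumptions, not the actual indexer: `flatten_grouped` is a hypothetical helper, it always collects values into lists, and the second tool call's arguments are made up for the example.

```python
def flatten_grouped(obj, prefix="", out=None):
    """Flatten to dot-notation keys, dropping array indices and grouping values (approximation)."""
    if out is None:
        out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flatten_grouped(value, path, out)
    elif isinstance(obj, list):
        for item in obj:
            flatten_grouped(item, prefix, out)  # the array index is not part of the key
    else:
        out.setdefault(prefix, []).append(obj)
    return out


tool_output = {
    "tool_calls": [
        {
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "search_database",
                "arguments": {"query": "active users", "limit": 10},
            },
        },
        {
            "id": "call_456",
            "type": "function",
            "function": {
                "name": "send_email",
                "arguments": {"recipient": "user@example.com"},  # illustrative arguments
            },
        },
    ]
}
grouped = flatten_grouped(tool_output)
tool_name_values = grouped["tool_calls.function.name"]
```

Both tool names end up under `tool_calls.function.name`, which is why a `key_equals` filter on either name matches the request.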

### Plain Text Output

When the model returns plain text (not JSON, no tool calls), the `output` and `output_keys` fields will be empty — there are no structured keys to flatten. The raw text is still stored in `output_text` and searchable via `q`.

```json theme={null}
// Use free-text search for plain text output:
{"q": "refund policy"}

// Or check output type:
{"field": "is_plain_text", "operator": "is_true"}
```

### Free-Text Search Across All Output Types

The `output_text` field is always populated regardless of output type, so the `q` parameter works across all requests:

* **Plain text**: `output_text` contains the raw output
* **JSON**: `output_text` contains a text representation of each flattened key-value pair (e.g. `"status: approved\nscore: 0.95"`)
* **Tool calls**: `output_text` contains any assistant text content combined with the flattened tool call values

This means `q` searches across all output types — you don't need to know the output format to find requests by content.

## Metadata

Metadata key-value pairs are **always fully indexed**. Every key and value you provide becomes searchable.

```json theme={null}
// All metadata is searchable:
{"field": "metadata", "operator": "key_equals", "value": "customer_123", "nested_key": "user_id"}

// Check if a metadata key exists:
{"field": "metadata_keys", "operator": "contains", "value": "session_id"}
```

Nested metadata is also supported. If you attach `{"user": {"id": "abc", "role": "admin"}}`, you can filter on `user.id` and `user.role`.

## Input Variables

<Warning>
  **Input variables are only indexed if they are referenced in the prompt template.** If you pass input variables that are not used in your template (e.g. as `{variable_name}` or `{{ variable_name }}`), they will **not** appear in search results.

  Requests logged without an associated prompt template (no `prompt_id`) will have **no** input variables indexed at all.

  If you need to filter by values that aren't part of the prompt template, attach them as [metadata](/features/prompt-history/metadata) instead — metadata is always fully indexed.
</Warning>

For example, if your prompt template uses `{question}` and `{context}`, but you also pass `user_id` as an input variable:

```python theme={null}
response = pl_client.run(
    prompt_name="qa-bot",
    input_variables={
        "question": "What is PromptLayer?",   # ✓ Referenced in template — indexed
        "context": "PromptLayer is a...",      # ✓ Referenced in template — indexed
        "user_id": "customer_123"              # ✗ NOT in template — not indexed
    },
    metadata={"user_id": "customer_123"}       # ✓ Always indexed
)
```

In this case, only `question` and `context` are searchable as input variables. To make `user_id` searchable, pass it as metadata.
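
Before sending a request, you can check which input variables a template references and move the rest into metadata. The sketch below is a simplification: `referenced_variables` and `split_input_variables` are hypothetical helpers, and the regular expression only recognizes simple `{name}` and `{{ name }}` placeholders, not full template syntax.

```python
import re

# Matches {{ name }} or {name}; real template syntax can be richer than this
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}|\{(\w+)\}")


def referenced_variables(template_text):
    """Return the set of variable names used in the template text."""
    names = set()
    for jinja_name, fstring_name in PLACEHOLDER.findall(template_text):
        names.add(jinja_name or fstring_name)
    return names


def split_input_variables(template_text, input_variables):
    """Split variables into (referenced, unreferenced) dictionaries."""
    used = referenced_variables(template_text)
    referenced = {k: v for k, v in input_variables.items() if k in used}
    unreferenced = {k: v for k, v in input_variables.items() if k not in used}
    return referenced, unreferenced


template = "Answer {question} using {{ context }}."
indexed_vars, extra_vars = split_input_variables(
    template,
    {
        "question": "What is PromptLayer?",
        "context": "PromptLayer is a...",
        "user_id": "customer_123",
    },
)
```

Anything left in `extra_vars` is a candidate for the `metadata` argument instead.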

## Exact Match vs. Text Search

When filtering nested fields (metadata, output, input\_variables), the matching behavior depends on the operator and the nature of the stored value:

* **`key_equals`** and **`key_not_equals`** perform exact matching. These work best with short, discrete values like IDs, status codes, numbers, and enum-like strings.

* **`key_contains`** performs partial text matching. This is better suited for longer text values, sentences, or when you only know part of the value.

<Tip>
  As a rule of thumb: use `key_equals` for structured data (IDs, numbers, short strings) and `key_contains` for natural language text. If `key_equals` isn't returning expected results for a longer text value, try `key_contains`.
</Tip>

## Filter Operators

Request-log filters use an operator that matches the field type:

| Field Type | Common Fields                                                               | Operators                                                                                  |
| ---------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| String     | `engine`, `provider_type`, `status`                                         | `is`, `is_not`, `in`, `not_in`                                                             |
| Text       | `input_text`, `output_text`                                                 | `contains`, `not_contains`, `starts_with`, `ends_with`                                     |
| Numeric    | `cost`, `latency_ms`, `input_tokens`, `output_tokens`                       | `eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `between`, `is_null`, `is_not_null`                 |
| Datetime   | `request_start_time`, `request_end_time`                                    | `is`, `before`, `after`, `between`                                                         |
| Boolean    | `is_json`, `is_tool_call`, `is_plain_text`                                  | `is_true`, `is_false`                                                                      |
| Array      | `tags`, `metadata_keys`, `tool_names`, `output_keys`, `input_variable_keys` | `contains`, `not_contains`, `in`, `not_in`, `is_empty`, `is_not_empty`                     |
| Nested     | `metadata`, `output`, `input_variables`                                     | `key_equals`, `key_not_equals`, `key_contains`, `in`, `not_in`, `is_empty`, `is_not_empty` |

Nested fields require `nested_key` to identify the flattened key to inspect.

```json theme={null}
{
  "filter_group": {
    "logic": "AND",
    "filters": [
      {
        "field": "metadata",
        "operator": "key_equals",
        "nested_key": "user_id",
        "value": "customer_123"
      }
    ]
  }
}
```
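
If you build filters in code, a small client-side check against the table above can catch mismatched operators before the request is sent. This is a sketch only: `make_filter` and `make_filter_group` are hypothetical helpers, the field list covers just the common fields from the table, and the API remains the source of truth for validation.

```python
OPERATORS_BY_TYPE = {
    "string": {"is", "is_not", "in", "not_in"},
    "text": {"contains", "not_contains", "starts_with", "ends_with"},
    "numeric": {"eq", "neq", "gt", "gte", "lt", "lte", "between", "is_null", "is_not_null"},
    "datetime": {"is", "before", "after", "between"},
    "boolean": {"is_true", "is_false"},
    "array": {"contains", "not_contains", "in", "not_in", "is_empty", "is_not_empty"},
    "nested": {"key_equals", "key_not_equals", "key_contains", "in", "not_in", "is_empty", "is_not_empty"},
}

FIELD_TYPES = {
    "engine": "string", "provider_type": "string", "status": "string",
    "input_text": "text", "output_text": "text",
    "cost": "numeric", "latency_ms": "numeric",
    "input_tokens": "numeric", "output_tokens": "numeric",
    "request_start_time": "datetime", "request_end_time": "datetime",
    "is_json": "boolean", "is_tool_call": "boolean", "is_plain_text": "boolean",
    "tags": "array", "metadata_keys": "array", "tool_names": "array",
    "output_keys": "array", "input_variable_keys": "array",
    "metadata": "nested", "output": "nested", "input_variables": "nested",
}


def make_filter(field, operator, value=None, nested_key=None):
    """Build one filter dict after checking the operator against the field type."""
    field_type = FIELD_TYPES.get(field)
    if field_type is None:
        raise ValueError(f"Unknown field: {field}")
    if operator not in OPERATORS_BY_TYPE[field_type]:
        raise ValueError(f"Operator {operator!r} is not valid for {field_type} field {field!r}")
    if field_type == "nested" and nested_key is None:
        raise ValueError("Nested fields require nested_key")
    result = {"field": field, "operator": operator}
    if nested_key is not None:
        result["nested_key"] = nested_key
    if value is not None:
        result["value"] = value
    return result


def make_filter_group(filters, logic="AND"):
    """Wrap filters in the filter_group structure shown above."""
    return {"filter_group": {"logic": logic, "filters": list(filters)}}


payload = make_filter_group(
    [make_filter("metadata", "key_equals", "customer_123", nested_key="user_id")]
)
```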

## Quick Reference

| Field                                     | What's Indexed                 | When                                                 |
| ----------------------------------------- | ------------------------------ | ---------------------------------------------------- |
| `output` / `output_keys`                  | Flattened JSON keys and values | JSON or tool call output only                        |
| `output_text`                             | Raw output text                | Always (searchable via `q`)                          |
| `metadata` / `metadata_keys`              | All key-value pairs            | Always                                               |
| `input_variables` / `input_variable_keys` | Flattened key-value pairs      | Only for variables referenced in the prompt template |
| `tags`                                    | Tag names                      | Always                                               |
| `tool_names`                              | Tool/function names            | Tool call output only                                |

## Related

* [Search Request Logs API](/reference/search-request-logs) - API reference for filtering
* [Metadata](/features/prompt-history/metadata) - Attaching metadata to requests
* [Advanced Search](/why-promptlayer/advanced-search) - Using search in the dashboard


# Sharing Requests
Source: https://docs.promptlayer.com/features/prompt-history/sharing-prompts


When you collaborate on prompts with other stakeholders, PromptLayer lets you share prompts that were logged on our system.

To do this, navigate to the dashboard and find the prompt you want to share:

<img alt="Share Prompt" />

In the top right-hand corner, select the share button and click on the tab to make your prompt public:

<img />

*Copy that link, and you are good to go!*

Here is a link to the shared prompt from this tutorial: [https://promptlayer.com/share/89cb2cbf2e8b42a341bcd1da5443f65d](https://promptlayer.com/share/89cb2cbf2e8b42a341bcd1da5443f65d)

***

Want to say hi 👋 , submit a feature request, or report a bug? [✉️ Contact us](mailto:hello@magniv.io)


# Logging Structured Outputs
Source: https://docs.promptlayer.com/features/prompt-history/structured-output-logging


## Overview

When logging requests that use structured outputs (JSON schemas), you need to include the schema configuration in the `parameters` field of your `/log-request` call. This allows PromptLayer to properly track and display your structured output configurations alongside your request history.

## When to Use This

Use structured output logging when you're:

* Making API calls with JSON schema validation (OpenAI, Anthropic, Google, etc.)
* Tracking requests that enforce specific response formats
* Analyzing how different schema configurations affect model outputs
* Building applications that require reliable, parseable JSON responses

## Basic Structure

The structured output configuration goes in the `parameters` field of your log request, using the `response_format` key with `json_schema` configuration:

```json theme={null}
{
  "provider": "openai",
  "model": "gpt-4",
  "input": {...},
  "output": {...},
  "parameters": {
    "temperature": 0.7,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "YourSchemaName",
        "description": "Description of what this schema represents",
        "schema": {
          "type": "object",
          "properties": {
            // Your JSON schema properties
          },
          "required": ["field1", "field2"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}
```
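
Since the same `response_format` object is sent to the provider and logged under `parameters`, it can be built once and reused. The sketch below shows one way to do that. `json_schema_response_format` is a hypothetical helper, not an SDK function, and the schema is a shortened illustration.

```python
def json_schema_response_format(name, schema, description=None, strict=True):
    """Build a response_format dict in the structure shown above (hypothetical helper)."""
    json_schema = {"name": name, "schema": schema, "strict": strict}
    if description:
        json_schema["description"] = description
    return {"type": "json_schema", "json_schema": json_schema}


# Shortened schema for illustration
recipe_schema = {
    "type": "object",
    "properties": {
        "recipe_name": {"type": "string"},
        "instructions": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["recipe_name", "instructions"],
    "additionalProperties": False,
}

response_format = json_schema_response_format(
    "Recipe", recipe_schema, description="A structured recipe format"
)

# Reuse the same object in both places:
#   provider call:  response_format=response_format
#   log request:    parameters=log_parameters
log_parameters = {"temperature": 0.7, "response_format": response_format}
```

Building the object once avoids the provider call and the logged configuration drifting apart.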

## Complete Examples

### OpenAI with Structured Outputs

```python theme={null}
import openai
import promptlayer
from datetime import datetime

promptlayer.api_key = "your_api_key"

# Make your OpenAI call with structured outputs
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Generate a recipe for chocolate cake"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Recipe",
            "description": "A structured recipe format",
            "schema": {
                "type": "object",
                "properties": {
                    "recipe_name": {"type": "string"},
                    "ingredients": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "amount": {"type": "string"}
                            },
                            "required": ["name", "amount"]
                        }
                    },
                    "instructions": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["recipe_name", "ingredients", "instructions"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
)

# Log the request to PromptLayer
promptlayer.log_request(
    provider="openai",
    model="gpt-4",
    input={
        "type": "chat",
        "messages": [{"role": "user", "content": [{"type": "text", "text": "Generate a recipe for chocolate cake"}]}]
    },
    output={
        "type": "chat",
        "messages": [{"role": "assistant", "content": [{"type": "text", "text": response.choices[0].message.content}]}]
    },
    request_start_time=datetime.now(),
    request_end_time=datetime.now(),
    parameters={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "Recipe",
                "description": "A structured recipe format",
                "schema": {
                    "type": "object",
                    "properties": {
                        "recipe_name": {"type": "string"},
                        "ingredients": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string"},
                                    "amount": {"type": "string"}
                                },
                                "required": ["name", "amount"]
                            }
                        },
                        "instructions": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["recipe_name", "ingredients", "instructions"],
                    "additionalProperties": False
                },
                "strict": True
            }
        }
    },
    tags=["structured-output", "recipe-generation"],
    api_type="chat-completions"
)
```

### Google Gemini with Structured Outputs

```python theme={null}
import promptlayer
from datetime import datetime

promptlayer.api_key = "your_api_key"

# Log a Google Gemini request with structured outputs
promptlayer.log_request(
    provider="google",
    model="gemini-3.1-pro-preview",
    input={
        "type": "chat",
        "messages": [{"role": "user", "content": [{"type": "text", "text": "Create a product listing"}]}]
    },
    output={
        "type": "chat",
        "messages": [{"role": "assistant", "content": [{"type": "text", "text": "..."}]}]
    },
    request_start_time=datetime.now(),
    request_end_time=datetime.now(),
    parameters={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "ProductListing",
                "description": "Product information structure",
                "schema": {
                    "type": "object",
                    "properties": {
                        "product_name": {"type": "string"},
                        "price": {"type": "number"},
                        "description": {"type": "string"},
                        "features": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["product_name", "price"],
                    "additionalProperties": False
                },
                "strict": False
            }
        },
        "temperature": 0,
        "maxOutputTokens": 256
    },
    tags=["google", "structured-output"]
)
```

### JavaScript/TypeScript Example

```javascript theme={null}
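import OpenAI from 'openai';
import { PromptLayer } from 'promptlayer';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const promptlayer = new PromptLayer({ apiKey: 'your_api_key' });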

// Make your API call with structured outputs
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Extract contact information' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'ContactInfo',
      description: 'Structured contact information',
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          email: { type: 'string' },
          phone: { type: 'string' }
        },
        required: ['name', 'email'],
        additionalProperties: false
      },
      strict: true
    }
  }
});

// Log to PromptLayer
await promptlayer.logRequest({
  provider: 'openai',
  model: 'gpt-4',
  input: {
    type: 'chat',
    messages: [{ role: 'user', content: [{ type: 'text', text: 'Extract contact information' }] }]
  },
  output: {
    type: 'chat',
    messages: [{ role: 'assistant', content: [{ type: 'text', text: response.choices[0].message.content }] }]
  },
  request_start_time: new Date().toISOString(),
  request_end_time: new Date().toISOString(),
  parameters: {
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'ContactInfo',
        description: 'Structured contact information',
        schema: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            email: { type: 'string' },
            phone: { type: 'string' }
          },
          required: ['name', 'email'],
          additionalProperties: false
        },
        strict: true
      }
    }
  },
  tags: ['structured-output', 'contact-extraction']
});
```

## REST API Example

If you're calling the `/log-request` endpoint directly via the REST API:

```bash theme={null}
curl -X POST https://api.promptlayer.com/log-request \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: your_api_key" \
  -d '{
    "provider": "openai",
    "model": "gpt-4",
    "input": {
      "type": "chat",
      "messages": [{"role": "user", "content": [{"type": "text", "text": "Generate a user profile"}]}]
    },
    "output": {
      "type": "chat",
      "messages": [{"role": "assistant", "content": [{"type": "text", "text": "{\"name\":\"John\",\"age\":30}"}]}]
    },
    "request_start_time": "2024-01-15T10:00:00Z",
    "request_end_time": "2024-01-15T10:00:02Z",
    "parameters": {
      "temperature": 0.7,
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "UserProfile",
          "description": "User profile information",
          "schema": {
            "type": "object",
            "properties": {
              "name": {"type": "string"},
              "age": {"type": "number"},
              "email": {"type": "string"}
            },
            "required": ["name", "age"],
            "additionalProperties": false
          },
          "strict": true
        }
      }
    },
    "tags": ["user-profile", "structured-output"]
  }'
```

## Schema Configuration Details

### Key Fields

* **`name`** (required): A descriptive name for your schema (e.g., "Recipe", "ContactInfo")
* **`description`** (optional): Explains what the schema represents
* **`schema`** (required): The actual JSON schema definition following JSON Schema specification
* **`strict`** (optional): When `true`, enforces strict validation (supported by some providers like OpenAI)

### Schema Best Practices

1. **Use clear property names**: Make your schema self-documenting
2. **Specify required fields**: Use the `required` array to mark mandatory properties
3. **Set `additionalProperties: false`**: Prevents unexpected fields in the response
4. **Use descriptive types**: Leverage JSON Schema's type system (string, number, array, object, boolean)
5. **Add validation constraints**: Use `minLength`, `maxLength`, `minimum`, `maximum`, etc. when appropriate
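
The practices above can be sketched as a small helper that wraps a schema in the `response_format` structure used throughout this page. This is a minimal sketch: `build_response_format` and `contact_schema` are hypothetical names, not part of the PromptLayer SDK, and whether a provider honors constraint keywords such as `minLength` varies by provider.

```python
from typing import Any, Optional


def build_response_format(
    name: str,
    schema: dict[str, Any],
    description: Optional[str] = None,
    strict: Optional[bool] = None,
) -> dict[str, Any]:
    """Wrap a JSON schema in the `response_format` structure shown on this page."""
    json_schema: dict[str, Any] = {"name": name, "schema": schema}
    # `description` and `strict` are optional, so only include them when set.
    if description is not None:
        json_schema["description"] = description
    if strict is not None:
        json_schema["strict"] = strict
    return {"type": "json_schema", "json_schema": json_schema}


# Hypothetical schema: clear names, required fields, constraints, no extra fields.
contact_schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string", "minLength": 1, "maxLength": 200},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "email": {"type": "string"},
    },
    "required": ["full_name", "email"],
    "additionalProperties": False,
}

# Pass this dict as the `parameters` field of the log request.
parameters = {
    "response_format": build_response_format(
        "ContactInfo", contact_schema, description="Structured contact information"
    )
}
```

Building the structure in one place makes it easier to send the same schema to the provider and to the log request.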

## Tracking Schema Variations

Use tags and metadata to track different schema versions:

```python theme={null}
promptlayer.log_request(
    # ... other parameters ...
    parameters={
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "Recipe_v2",  # Version in name
                # ... schema definition ...
            }
        }
    },
    tags=["structured-output", "recipe-v2", "production"],
    metadata={
        "schema_version": "2.0",
        "environment": "production"
    }
)
```

## Common Issues and Solutions

### Issue: Schema not showing in PromptLayer dashboard

**Solution**: Ensure the `response_format` is nested correctly within the `parameters` field, not at the top level of your log request.
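
As a rough illustration of the fix, the sketch below moves a misplaced top-level `response_format` into `parameters` before the payload is logged. The helper name `nest_response_format` is hypothetical and operates on a plain dictionary, not on any SDK object.

```python
from typing import Any


def nest_response_format(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `payload` with a top-level `response_format` moved into `parameters`."""
    fixed = dict(payload)
    if "response_format" in fixed:
        parameters = dict(fixed.get("parameters") or {})
        # Remove the top-level key; keep an existing nested value if there is one.
        parameters.setdefault("response_format", fixed.pop("response_format"))
        fixed["parameters"] = parameters
    return fixed


misplaced = {
    "provider": "openai",
    "model": "gpt-4",
    "response_format": {"type": "json_schema"},
}
corrected = nest_response_format(misplaced)
```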

### Issue: Provider rejects the schema

**Solution**: Different providers have different schema support levels:

* **OpenAI**: Full support with `strict: true` mode
* **Anthropic**: Supports basic JSON mode
* **Google**: Supports schemas with some limitations

Check your provider's documentation for specific schema requirements.

### Issue: Response doesn't match schema

**Solution**:

1. Verify your schema is valid JSON Schema
2. Test with `strict: true` if your provider supports it
3. Check that your prompt clearly instructs the model about the expected format
4. Review logged requests in PromptLayer to debug schema mismatches
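
When debugging a mismatch, a quick local check of the response text can narrow things down. The sketch below is an assumption-laden shortcut: `check_top_level` is a hypothetical helper that only inspects top-level `required` and `additionalProperties: false`, and it is not a full JSON Schema validator.

```python
import json
from typing import Any


def check_top_level(output_text: str, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a JSON response; an empty list means none found."""
    try:
        data = json.loads(output_text)
    except json.JSONDecodeError as exc:
        return [f"response is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    problems = []
    # Every field listed in `required` must be present.
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    # With `additionalProperties: false`, fields outside `properties` are errors.
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for field in sorted(set(data) - allowed):
            problems.append(f"unexpected field: {field}")
    return problems


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}
problems = check_top_level('{"name": "John", "nickname": "J"}', schema)
```

For anything beyond this top-level check, a complete JSON Schema validation library is the better tool.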

## See Also

* [Structured Outputs in Prompt Registry](/features/prompt-registry/structured-outputs) - Creating prompts with structured outputs
* [Custom Logging Guide](/features/prompt-history/custom-logging) - General guide to logging requests
* [Log Request API Reference](/reference/log-request) - Full API specification
* [Metadata Documentation](/features/prompt-history/metadata) - Using metadata for tracking
* [Tagging Requests](/features/prompt-history/tagging-requests) - Organizing requests with tags


# Tags
Source: https://docs.promptlayer.com/features/prompt-history/tagging-requests


As your PromptLayer logs grow, it gets harder to find what you are looking for. Tags help keep requests organized.

You can use tags however you like, but the two most common uses are to:

1. Keep track of which application you are working on
2. Track where you are in the pipeline when chaining prompts together

For example, if you are working on an email application that has three chained stages, tag every request in the application with `email` and the corresponding stage, `stage-x`.
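
The email example above could be sketched as a small helper that builds the tag list for each stage. `pipeline_tags` is a hypothetical function for illustration, and the `stage-<number>` format is just one way to fill in `stage-x`.

```python
def pipeline_tags(application: str, stage: int) -> list[str]:
    """Build tags that identify the application and the pipeline stage."""
    return [application, f"stage-{stage}"]


# One tag list per stage of a three-stage email pipeline.
tags_per_stage = [pipeline_tags("email", stage) for stage in (1, 2, 3)]
```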

To add tags using the `run()` method:

<CodeGroup>
  ```python Python SDK theme={null}
  from promptlayer import PromptLayer
  pl_client = PromptLayer()

  response = pl_client.run(
    prompt_name="my-prompt",
    input_variables={"name": "world"},
    tags=["pipeline3", "world"]  # 🍰 PromptLayer tags
  )
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const plClient = new PromptLayer();

  const response = await plClient.run({
    promptName: "my-prompt",
    inputVariables: { name: "world" },
    tags: ["pipeline3", "world"],  // 🍰 PromptLayer tags
  });
  ```
</CodeGroup>

You can also pass tags when using `log_request` or the REST API endpoint `/log-request` ([read more](/reference/log-request)).

Tags then show up on your PromptLayer dashboard:

<img alt="" />

You can filter by tag by clicking the tags button next to the search bar:

<img alt="" />

Tags are optimized for categorizing requests into a small number of predefined options. For attributes with more than 1,000 possible values, use [metadata](/features/prompt-history/metadata) instead.


# Tracking Templates
Source: https://docs.promptlayer.com/features/prompt-history/tracking-templates


PromptLayer allows you to track prompt template usage, latency, cost, and more. This is done by associating a request with a prompt template as shown below.

[Endpoint Reference](/reference/track-prompt)

To associate a request with a prompt from the Prompt Registry, run:

<CodeGroup>
  ```python Python theme={null}
  promptlayer_client.track.prompt(
    request_id=pl_request_id, 
    prompt_name="example-2",
    prompt_input_variables=input_variables,
    version=2
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.prompt({
    request_id: pl_request_id, 
    prompt_name: "example-2",
    prompt_input_variables: input_variables,
    version: 2
  })
  ```

  ```bash REST theme={null}
  curl --request POST \
    --url https://api.promptlayer.com/rest/track-prompt \
    --header 'Content-Type: application/json' \
    --header 'X-API-KEY: pl_<YOUR API KEY>' \
    --data '{
      "request_id": "<REQUEST ID>",
      "prompt_name": "<PROMPT TEMPLATE NAME>",
      "prompt_input_variables": <PROMPT TEMPLATE INPUT VARIABLES>,
      "version": <PROMPT TEMPLATE VERSION NUMBER>
    }'
  ```
</CodeGroup>

Here, `prompt_name` is a prompt in your Prompt Registry, `prompt_input_variables` is a dictionary of the input variables you formatted the prompt with, and `version` is the version of the prompt you want to track. `version` is optional; by default, the request is tracked against the newest version of the prompt.

This information will appear on your dashboard under your request and prompt template pages.

<img alt="score" />

You can also use a prompt template [release label](/features/prompt-registry#release-labels) instead of a version number.

<CodeGroup>
  ```python Python theme={null}
  promptlayer_client.track.prompt(
    request_id=pl_request_id, 
    prompt_name="example-2",
    prompt_input_variables=input_variables,
    label="prod"
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.prompt({
    request_id: pl_request_id, 
    prompt_name: "example-2",
    prompt_input_variables: input_variables,
    label: "prod"
  })
  ```

  ```bash REST theme={null}
  curl --request POST \
    --url https://api.promptlayer.com/rest/track-prompt \
    --header 'Content-Type: application/json' \
    --header 'X-API-KEY: pl_<YOUR API KEY>' \
    --data '{
      "request_id": "<REQUEST ID>",
      "prompt_name": "<PROMPT TEMPLATE NAME>",
      "prompt_input_variables": <PROMPT TEMPLATE INPUT VARIABLES>,
      "label": "<YOUR PROMPT VERSION LABEL>"
    }'
  ```
</CodeGroup>
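
For callers that build the REST request themselves, the two variants above (version or label) can be sketched with the standard library. This is a minimal sketch: `build_track_prompt_request` is a hypothetical helper, and rejecting a call that passes both `version` and `label` is an assumption, not documented API behavior.

```python
import json
import urllib.request
from typing import Any, Optional, Union

TRACK_PROMPT_URL = "https://api.promptlayer.com/rest/track-prompt"


def build_track_prompt_request(
    api_key: str,
    request_id: Union[str, int],
    prompt_name: str,
    prompt_input_variables: dict[str, Any],
    version: Optional[int] = None,
    label: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the track-prompt endpoint."""
    if version is not None and label is not None:
        # Assumption: treat version and label as alternatives.
        raise ValueError("pass either version or label, not both")
    body: dict[str, Any] = {
        "request_id": request_id,
        "prompt_name": prompt_name,
        "prompt_input_variables": prompt_input_variables,
    }
    # Omitting both tracks the newest version, per the text above.
    if version is not None:
        body["version"] = version
    if label is not None:
        body["label"] = label
    return urllib.request.Request(
        TRACK_PROMPT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


req = build_track_prompt_request(
    "pl_<YOUR API KEY>", "<REQUEST ID>", "example-2", {"name": "world"}, label="prod"
)
```

Send the request with `urllib.request.urlopen(req)` once the placeholder values are replaced.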


# Dynamic Release Labels
Source: https://docs.promptlayer.com/features/prompt-registry/dynamic-release-labels


Dynamic Release Labels allow you to overload release labels and dynamically route traffic to different prompt versions based on percentages or user segments. 🏷️

For an overview of the benefits and key use cases, check out our [A/B Releases](/why-promptlayer/ab-releases) page.

## Overview

Normally, a release label (e.g., "prod") points to a specific prompt version. Dynamic Release Labels let you overload this mapping and split traffic between multiple versions.

This is powered by the A/B Releases feature. When you create an A/B Release, it dynamically routes requests for the specified release label based on your configuration.

<img alt="Dynamic Release Labels Diagram" />

## Usage

<video>
  <source type="video/mp4" />

  Your browser does not support the video tag.
</video>

1. Navigate to the A/B Releases Registry in the PromptLayer UI.
2. Create a new A/B Release and select the release label to overload (e.g., "prod").
3. See the base prompt version and choose the version(s) to release.
4. Set the traffic percentages for each version.
   * The percentages must add up to 100%.
   * Example: 90% to version 3 (stable), 10% to version 4 (new).
5. (Optional) Add user segments and define which version each segment should receive.
   * Segments are defined using request metadata (e.g., user ID, company).
   * Example: Internal employees receive version 4 (dev) 50% of the time.
6. Save the A/B Release. It will now dynamically route traffic for the specified release label.
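
To build intuition for the percentage split in step 4, the sketch below simulates weighted routing locally. It is an illustration only and not PromptLayer's routing implementation; `pick_version` and the `split` mapping are hypothetical names.

```python
import random
from typing import Optional


def pick_version(split: dict[int, float], rng: Optional[random.Random] = None) -> int:
    """Pick a prompt version according to a percentage split."""
    # Mirror the rule above: the percentages must add up to 100.
    if abs(sum(split.values()) - 100) > 1e-9:
        raise ValueError("percentages must add up to 100")
    rng = rng or random.Random()
    versions = list(split)
    return rng.choices(versions, weights=[split[v] for v in versions], k=1)[0]


# 90% of traffic to version 3 (stable), 10% to version 4 (new).
split = {3: 90, 4: 10}
rng = random.Random(0)
counts = {3: 0, 4: 0}
for _ in range(10_000):
    counts[pick_version(split, rng)] += 1
```

Over many requests, the counts approach the configured percentages, while any single request can land on either version.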

**Important**: When [logging requests](/features/prompt-history/tracking-templates), make sure to log the specific version returned, not just the release label. The label will always point to the original version in your logs.

To stop dynamically routing traffic, simply delete the A/B Release. The release label will revert to its base mapping.

***

Dynamic Release Labels give you fine-grained control over prompt version routing. Use them to safely test updates, roll out new versions, and segment users. 🎯


# Input Variable Sets
Source: https://docs.promptlayer.com/features/prompt-registry/input-variable-sets


Input Variable Sets allow you to save and reuse collections of input variables across prompts and workflows. Instead of re-entering the same variable values repeatedly, you can create named sets and apply them wherever needed.

## What are Input Variable Sets?

An Input Variable Set is a named collection of key-value pairs that can be stored in your workspace and reused across different contexts. Think of them as saved inputs for your prompt templates.

For example, if you frequently test prompts with the same customer data or use cases, you can save those variable values as a set and quickly apply them to any prompt template.

## Creating Input Variable Sets

You can create Input Variable Sets from several locations in PromptLayer:

### From the Prompt Editor

1. Open any prompt template with [template variables](/features/prompt-registry/template-variables)
2. Fill in your input variables with the values you want to save
3. Click the **Save** button in the input variables section
4. Name your variable set and choose a folder location
5. Click **Save** to create the set

<img alt="Saving input variables from prompt editor" />

### From the Registry

You can also create Input Variable Sets directly from the Registry:

1. Navigate to the **Registry** section
2. Click **New** → **Input Variable Set**
3. Enter a name and configure your variables
4. Save to your chosen folder

<img alt="Creating input variable set from registry" />

## Using Input Variable Sets

Once created, you can load saved variable sets in the prompt editor by clicking the **Load** button in the input variables section. Variable sets also work with workflow executions, allowing you to apply consistent inputs for test runs and evaluations.

<img alt="Loading a saved input variable set" />

## Importing and Exporting

You can import variables from various sources:

* **From Datasets**: Import variables from evaluation datasets
* **From Request Logs**: Use actual request data as variable sets
* **From Files**: Upload JSON or CSV files with variable data

Export options allow you to:

* Save sets to files for backup
* Share between workspaces
* Version control your test data

## Integration with Template Variables

Input Variable Sets work seamlessly with both [f-string and Jinja2 template formats](/features/prompt-registry/template-variables):

* Sets automatically match variable names in your templates
* Missing variables are highlighted
* Extra variables in sets are safely ignored
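
The matching behavior described above can be approximated locally for f-string templates. This is a sketch of the idea, not PromptLayer's implementation; `template_variables` and `match_variable_set` are hypothetical helpers, and Jinja2 templates are not handled.

```python
from string import Formatter
from typing import Any


def template_variables(template: str) -> set[str]:
    """Collect field names from an f-string style template such as "Hi {name}"."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def match_variable_set(template: str, variable_set: dict[str, Any]) -> dict[str, Any]:
    """Split a variable set into applied values, missing names, and ignored extras."""
    needed = template_variables(template)
    return {
        "applied": {k: v for k, v in variable_set.items() if k in needed},
        "missing": sorted(needed - set(variable_set)),
        "ignored": sorted(set(variable_set) - needed),
    }


result = match_variable_set(
    "Hello {customer_name}, about {issue}:",
    {"customer_name": "Jordan", "plan": "pro"},
)
```

Here `issue` would be reported as missing and `plan` would be ignored.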

For more information on how to structure your template variables, see the [Template Variables documentation](/features/prompt-registry/template-variables).

## Related Documentation

* [Template Variables](/features/prompt-registry/template-variables) - Learn how to create dynamic prompts with variables
* [Datasets](/features/evaluations/datasets-overview) - Use datasets for comprehensive evaluation
* [Playground](/why-promptlayer/playground) - Test prompts with different variable configurations


# Overview
Source: https://docs.promptlayer.com/features/prompt-registry/new-overview

Use Prompt Registry as the system of record for prompt versions, testing, releases, and observability.

Prompt Registry is PromptLayer's system of record for Prompt Templates. It gives your team one place to create templates, organize them outside the codebase, test changes in Playground, release the right version with labels, and monitor how each prompt performs in production.

Use Prompt Registry when you want a faster workflow for prompt iteration without losing version history, deployment control, or visibility into results.

A **Prompt Template** is the central artifact in PromptLayer. It is a reusable, versioned prompt definition that can include your messages, template variables, model settings, and other runtime configuration. Prompt Registry is where your team stores, versions, releases, and monitors those prompt templates over time.

## What you can do

* Organize prompts with folders, tags, and workspace search.
* Save new versions with commit messages and review version history over time.
* Release prompts with labels such as `prod` or `staging`, and protect important labels with approval flows.
* Open any prompt in Playground to test, edit, and save changes back as a new version.
* Review prompt-level logs, analytics, evaluations, and related assets from the prompt page.
* Share prompts for review and collaboration without sending code diffs around.

## How it fits together

1. Create a Prompt Template in the Registry and define its messages, variables, and model settings.
2. Test the prompt in Playground with real inputs before publishing changes.
3. Assign a release label to the version your application should use at runtime.
4. Monitor logs, analytics, and evaluations to decide what to improve next.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  pl_client = PromptLayer(api_key="YOUR_API_KEY")

  response = pl_client.run(
      prompt_name="support-assistant",
      prompt_release_label="prod",
      input_variables={
          "customer_name": "Jordan",
          "issue": "My subscription was charged twice.",
      },
  )
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const promptLayerClient = new PromptLayer({ apiKey: "YOUR_API_KEY" });

  const response = await promptLayerClient.run({
    promptName: "support-assistant",
    promptReleaseLabel: "prod",
    inputVariables: {
      customer_name: "Jordan",
      issue: "My subscription was charged twice.",
    },
  });
  ```
</CodeGroup>

## Next steps

<CardGroup>
  <Card title="Editor and Versioning" icon="pen-line" href="/features/prompt-registry/prompt-editor-versioning">
    Create prompts, test in Playground, use the AI prompt writer, and review version diffs.
  </Card>

  <Card title="Template Variables" icon="pen-ruler" href="/features/prompt-registry/template-variables">
    Learn how to add dynamic variables with f-strings or Jinja2.
  </Card>

  <Card title="Release Labels" icon="tag" href="/features/prompt-registry/release-labels">
    Deploy the right prompt version in production without code changes.
  </Card>

  <Card title="Playground" icon="circle-play" href="/why-promptlayer/playground">
    Test prompts, replay requests, and save changes back to the registry.
  </Card>

  <Card title="Tables" icon="table" href="/features/tables/overview">
    Measure prompt quality with datasets, prompt outputs, scoring, versions, and request-history backtests in one place.
  </Card>
</CardGroup>


# Placeholder Messages
Source: https://docs.promptlayer.com/features/prompt-registry/placeholder-messages


Placeholder Messages are a powerful feature that allows you to inject messages into a prompt template. By using the `placeholder` message role, you can define placeholders within your prompt template that can be replaced with full messages at runtime. This complements our standard [template variables](/features/prompt-registry/template-variables) feature, which allows you to insert simple values into your prompts.

This is useful for injecting conversation context, such as earlier turns of a chat.

## Creating Placeholder Messages

You can create Placeholder Messages either through the PromptLayer dashboard or programmatically using the `templates.publish` method.

In the dashboard, simply create a new message with the role `placeholder` and provide the desired placeholder content.

<img alt="Placeholder Message" />

## Running a Template with Placeholders

When you run a template, pass the messages for each placeholder as an input variable. In this example, the placeholder is named `fill_in_message`:

```python theme={null}
promptlayer_client.run(
    prompt_name="template-name",
    input_variables={
        "fill_in_message": [
            {
                "role": "user",
                "content": [{"type": "text", "text": "My age is 29"}],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": "What a wonderful age!"}],
            }
        ]
    },
)
```

Passed-in messages must conform to our [prompt blueprint](/running-requests/prompt-blueprints) format.
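
Conceptually, a placeholder is swapped for the list of messages supplied at runtime. The sketch below illustrates that substitution on plain dictionaries. It is not PromptLayer's implementation, and the `name` field on the placeholder message is a hypothetical representation chosen for this example.

```python
from typing import Any

Message = dict[str, Any]


def expand_placeholders(
    template: list[Message], input_variables: dict[str, Any]
) -> list[Message]:
    """Replace each placeholder message with the messages passed for it."""
    expanded: list[Message] = []
    for message in template:
        if message.get("role") == "placeholder":
            # Hypothetical field: "name" holds the placeholder's variable name.
            expanded.extend(input_variables.get(message["name"], []))
        else:
            expanded.append(message)
    return expanded


template = [
    {"role": "system", "content": [{"type": "text", "text": "You are helpful."}]},
    {"role": "placeholder", "name": "fill_in_message"},
]
history = [
    {"role": "user", "content": [{"type": "text", "text": "My age is 29"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "What a wonderful age!"}]},
]
messages = expand_placeholders(template, {"fill_in_message": history})
```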


# Editor and Versioning
Source: https://docs.promptlayer.com/features/prompt-registry/prompt-editor-versioning

Create prompts, edit messages, test changes, and review version history in the Prompt Registry.

Prompt templates are the core artifact in Prompt Registry. Each template stores the messages, input variables, model settings, and version history your application uses at runtime.

Use the prompt editor when you want to update prompt behavior without changing application code.

## Create a prompt

From the PromptLayer dashboard, click **New** -> **Prompt**.

<Frame>
  <img alt="Creating a new prompt" />
</Frame>

A prompt usually starts with a **System** message and a **User** message.

<Accordion title="System, user, and assistant messages">
  **System messages** define the model's behavior, tone, output format, and guardrails.

  **User messages** contain the request that changes each time the prompt runs. This is where input variables usually appear.

  **Assistant messages** can show example responses, which helps the model follow an expected format.
</Accordion>

Input variables such as `{{customer_name}}` or `{{question}}` are placeholders that your application fills in at runtime. Learn more in [Template Variables](/features/prompt-registry/template-variables).

<Frame>
  <img alt="Input variables in prompt" />
</Frame>

## Test in Playground

Use Playground to test a prompt before saving or releasing it. Define values for each input variable, run the prompt, and inspect the generated response.

<Frame>
  <img alt="Running a prompt in the playground" />
</Frame>

When the result looks right, save the template.

## Edit prompts with AI

Click the magic wand icon to open the AI prompt writer. It can rewrite or improve prompts based on your instructions, such as adding a new output requirement or tightening formatting rules.

<Frame>
  <img alt="AI prompt writer" />
</Frame>

## Save versions

Each save creates a new version. Before saving, PromptLayer shows a diff of exactly what changed so you can review additions and deletions.

Add a commit message that explains why the prompt changed, then save the new version.

<Frame>
  <img alt="Saving with diff view" />
</Frame>

Your version history appears beside the editor. Select any previous version to inspect it.

<Frame>
  <img alt="Version history" />
</Frame>

Hover over a version and click **View Diff** to compare versions side by side.

<Frame>
  <img alt="Comparing versions with diff" />
</Frame>

## Next steps

* [Release Labels](/features/prompt-registry/release-labels) - Choose which prompt version production code should use
* [Dynamic Release Labels](/features/prompt-registry/dynamic-release-labels) - Split traffic across versions
* [Playground](/why-promptlayer/playground) - Test and replay prompt requests
* [Evaluations](/features/evaluations/overview) - Score prompt changes before release


# Release Labels
Source: https://docs.promptlayer.com/features/prompt-registry/release-labels


Release Labels provide a powerful way to organize and manage versions of your prompt templates in the Prompt Registry. They allow you to:

* Easily deploy different versions of a prompt template
* Safely test new versions with a subset of users before a full rollout
* Gradually release updates to minimize risk
* Segment users to receive specific versions (e.g., beta users, internal employees)

## Adding Release Labels

You can add Release Labels to a specific version of a prompt template through the UI.

1. In the Prompt Registry, hover over a prompt template version
2. Click "Add Release Label"
3. Enter a unique label name (e.g. "prod", "staging", "v2")

<img alt="Release Label" />

## Retrieving Prompts by Release Label

To retrieve a specific version of a prompt template, pass the `label` parameter when getting the template:

<CodeGroup>
  ```python Python SDK theme={null}
  promptlayer_client.templates.get("my_template", { "label": "prod" })
  ```

  ```js JavaScript theme={null}
  const template = await promptLayerClient.templates.get("my_template", { label: "prod" });
  ```
</CodeGroup>

This returns the version of the prompt template associated with the "prod" release label. Release labels also work with `promptlayer_client.run()` through the `prompt_release_label` parameter.

## Using Release Labels for A/B Testing

Release Labels, combined with [Dynamic Release Labels](/features/prompt-registry/dynamic-release-labels), enable powerful A/B testing capabilities:

1. Create multiple versions of a prompt template
2. Set up a Dynamic Release Label to overload an existing release label (e.g., "prod")
3. Configure the Dynamic Release Label to split traffic between different versions based on percentages or user segments
4. Programmatically retrieve the prompt using the original release label, which will now dynamically route to the appropriate version
5. Analyze performance of each version using PromptLayer's analytics and Evaluations

This approach allows you to test prompt changes on a subset of users, compare performance, and gradually roll out updates while maintaining a single, consistent Release Label for your application to use.

## Best Practices

* Use descriptive Release Labels that indicate the version's stage or intended audience (e.g., "prod", "staging", "beta\_users")
* Remove unused Release Labels to keep your prompt template organized
* Use Dynamic Release Labels for more advanced traffic splitting and segmentation

With Release Labels, you can confidently manage prompt template versions and roll out updates without code changes. Combine them with PromptLayer's [versioning](/features/prompt-registry/new-overview), [analytics](/why-promptlayer/analytics), and [evaluations](/features/evaluations/overview) for a powerful prompt engineering workflow.


# Snippets
Source: https://docs.promptlayer.com/features/prompt-registry/snippets


Snippets allow you to modularize and reuse pieces of your prompt templates, much like using building blocks to create a larger structure. This feature enables you to compose complex templates by referencing other prompt templates within a parent template.

<div>
  <iframe />
</div>

## How Snippets Work

### Adding Snippets via the Dashboard

To add a new snippet while writing a template in the Dashboard:

1. Type `/`.
2. Select `Snippets` from the dropdown menu.
3. A dialog box will appear, guiding you through the insertion of your snippet.

<img alt="Inserting a snippet" />

*Inserting a snippet from the Dashboard*

### Creating Snippets Programmatically

A snippet is a reference to a prompt template that can be inserted into another template. You can reference templates in three ways:

1. **By Template Name**: `@@@template_name@@@` - This will automatically use the latest version of the referenced template.
2. **By Template Version Number**: `@@@template_name@version_number:{number}@@@` - This points to a specific version of the template.
3. **By Template Label**: `@@@template_name@label:{label_name}@@@` - This uses the version of the template that has the specified label.
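
The three reference forms can be recognized with a regular expression. This is a sketch under two assumptions: the braces in `{number}` and `{label_name}` are placeholders rather than literal characters, and template names and labels contain no `@`. `SnippetRef` and `find_snippets` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches @@@name@@@, @@@name@version_number:N@@@, and @@@name@label:L@@@.
SNIPPET_RE = re.compile(
    r"@@@(?P<name>[^@]+?)"
    r"(?:@version_number:(?P<version>\d+)|@label:(?P<label>[^@]+?))?@@@"
)


class SnippetRef(NamedTuple):
    name: str
    version: Optional[int]
    label: Optional[str]


def find_snippets(template: str) -> list[SnippetRef]:
    """Return every snippet reference found in a template string."""
    refs = []
    for match in SNIPPET_RE.finditer(template):
        version = int(match["version"]) if match["version"] else None
        refs.append(SnippetRef(match["name"], version, match["label"]))
    return refs


refs = find_snippets(
    "Intro @@@tone@@@ then @@@rules@version_number:3@@@ and @@@footer@label:prod@@@"
)
```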

### Restrictions

Only completion templates can be used as snippets.

## Rendering Snippets

When you run a parent template that contains snippets, the system renders the template, replacing each snippet reference with its actual content. Rendering occurs whether you run the template from the Playground, use the Evaluation functionality, or retrieve it through the SDKs, so you always get the fully resolved prompt.
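
Rendering can be pictured as a find-and-replace over the snippet references. The sketch below does a single substitution pass with a caller-supplied lookup. It is an illustration, not PromptLayer's renderer: `render` and the `resolve` callback are hypothetical, braces in the reference syntax are assumed to be placeholders, and nested snippets are not expanded.

```python
import re
from typing import Callable, Optional

SNIPPET_RE = re.compile(
    r"@@@(?P<name>[^@]+?)"
    r"(?:@version_number:(?P<version>\d+)|@label:(?P<label>[^@]+?))?@@@"
)

# resolve(name, version, label) returns the snippet's content.
Resolver = Callable[[str, Optional[int], Optional[str]], str]


def render(template: str, resolve: Resolver) -> str:
    """Replace every snippet reference with the content returned by `resolve`."""

    def replace(match: re.Match) -> str:
        version = int(match["version"]) if match["version"] else None
        return resolve(match["name"], version, match["label"])

    return SNIPPET_RE.sub(replace, template)


# In-memory stand-in for the registry, keyed by (name, version, label).
snippets = {
    ("tone", None, None): "Be concise and friendly.",
    ("tone", None, "prod"): "Be formal.",
}
rendered = render(
    "System rules: @@@tone@@@",
    lambda name, version, label: snippets[(name, version, label)],
)
```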

<img alt="Rendering snippets" />

*Rendering snippets from the Playground*
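To make the substitution concrete, here is a toy renderer that resolves the three reference forms against an in-memory dictionary. It is only an illustration of the idea, not PromptLayer's implementation: the `REGISTRY` structure, the `render` function, and the regular expression are all assumptions made for this sketch.

```python
import re

# Hypothetical local store: template name -> versions and label assignments.
REGISTRY = {
    "greeting": {
        "versions": {1: "Hello.", 2: "Hello there!"},
        "labels": {"prod": 1},
    },
}

# Matches @@@name@@@, @@@name@version_number:N@@@, and @@@name@label:L@@@.
SNIPPET = re.compile(
    r"@@@(?P<name>[^@]+?)(?:@(?P<kind>version_number|label):(?P<value>[^@]+))?@@@"
)


def render(text: str) -> str:
    """Replace each snippet reference with the referenced template text."""

    def resolve(match: re.Match) -> str:
        entry = REGISTRY[match["name"]]
        kind, value = match["kind"], match["value"]
        if kind == "version_number":
            version = int(value)
        elif kind == "label":
            version = entry["labels"][value]
        else:
            version = max(entry["versions"])  # no qualifier: latest version
        return entry["versions"][version]

    return SNIPPET.sub(resolve, text)


print(render("By name: @@@greeting@@@ | By label: @@@greeting@label:prod@@@"))
```

In this sketch a reference by name follows the newest version, while a label reference stays on whichever version the label points to.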

### Webhooks

When you update a snippet, PromptLayer sends a webhook of type `prompt_template_updated` for every template that imports the snippet.

## Visualizing Snippets in Templates

When editing a prompt that contains snippets, you'll see the snippet references as strings in the format described above, depending on how you've chosen to reference them.

When you open a prompt template from the Registry, you'll see it as a clickable pill that will take you to the referenced version.

<img alt="Clicking a snippet" />

*Navigating to a snippet*

## Why snippets?

Snippets let you keep shared prompt content in one place and reuse it across templates, which makes a growing prompt library easier to maintain.


# Structured Outputs
Source: https://docs.promptlayer.com/features/prompt-registry/structured-outputs


Structured outputs ensure LLM responses follow specific formats, making them easier to use in your applications. For more advanced structured data requirements, you may also want to check out our [Tool Calling documentation](/features/prompt-registry/tool-calling).

## What are Structured Outputs?

Structured outputs define formats LLMs must follow when generating responses:

* Consistent response formats
* Easier parsing and validation
* More reliable integration with your applications
* Less error handling

Examples include customer records, product information, and analytical results.

## Creating Structured Outputs with JSON Schema

<video>
  <source type="video/mp4" />
</video>

To add a JSON schema to your prompt template:

1. Edit your prompt template
2. Click "Functions & Output"
3. Select "Structured Output"
4. Click "Add Schema"
5. Define your schema structure

### Example: Customer Review Analysis Schema

```json theme={null}
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "neutral", "negative"],
      "description": "The overall sentiment of the review"
    },
    "topics": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of topics mentioned in the review"
    }
  },
  "required": ["sentiment", "topics"]
}
```
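On the application side, a response that follows this schema can be parsed and checked before use. The sketch below uses only the standard library and hard-codes the checks for this one schema; `parse_review_analysis` is a hypothetical helper, and a general-purpose JSON Schema validator would be the usual choice for anything larger.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_review_analysis(raw: str) -> dict:
    """Parse a model response and check it against the review schema above."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"sentiment", "topics"} - set(data)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    sentiment = data["sentiment"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    topics = data["topics"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics must be a list of strings")
    return data


analysis = parse_review_analysis('{"sentiment": "positive", "topics": ["food", "service"]}')
print(analysis["sentiment"], analysis["topics"])
```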

## Schema Configuration Options

### Strict Mode

Strict mode enforces more rigorous schema validation. When enabled, the LLM output must exactly match the schema specification with no additional fields or deviations.

In the schema editor:

1. Toggle "Strict Mode" in the schema settings
2. Ensure your schema is complete and accurate
3. Test with sample inputs

### Additional Properties

Control whether objects can have properties not defined in your schema:

* **Disabled** (default): Only properties you define are allowed
* **Enabled**: Objects can include additional properties beyond those specified

This applies to the root object and all nested objects in your schema.

## String Validation

Add constraints to string fields to ensure proper formatting. The schema editor provides an intuitive interface for configuring these constraints:

<img alt="String Constraints Editor" />

### Length Constraints

Control the minimum and maximum length of strings using the **Min Length** and **Max Length** fields. For example, you might require a username to be between 3 and 20 characters, or limit a bio field to a maximum of 500 characters.

### Pattern Matching

Use regex patterns to validate string formats. Enter your regular expression in the **Pattern** field to enforce specific formats like alphanumeric usernames with underscores (`^[a-zA-Z0-9_]+$`) or valid phone numbers (`^\+?[1-9]\d{1,14}$`).
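It can help to try a pattern against sample values before saving it. The sketch below checks the two example patterns with Python's `re` module. Treat it as an approximation: JSON Schema patterns follow the ECMA-262 regex dialect, which differs from Python's in some details, and the sample values are made up for illustration.

```python
import re

USERNAME = re.compile(r"^[a-zA-Z0-9_]+$")
PHONE = re.compile(r"^\+?[1-9]\d{1,14}$")

samples = ["jane_doe42", "jane doe", "+14155550123", "0123"]
for value in samples:
    is_username = bool(USERNAME.search(value))
    is_phone = bool(PHONE.search(value))
    print(f"{value!r}: username={is_username}, phone={is_phone}")
```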

### Format Validation

Specify standard formats for automatic validation using the **Select format...** dropdown. This provides built-in validation for common data types like email addresses, URLs, dates, and UUIDs without requiring custom regex patterns.

**Supported formats:**

* `date-time` - ISO 8601 datetime (e.g., "2024-01-15T10:30:00Z")
* `date` - Full date (e.g., "2024-01-15")
* `time` - Time (e.g., "10:30:00")
* `duration` - ISO 8601 duration
* `email` - Email address
* `idn-email` - Internationalized email address
* `hostname` - Valid hostname
* `idn-hostname` - Internationalized hostname
* `ipv4` - IPv4 address
* `ipv6` - IPv6 address
* `uri` - URI/URL
* `uri-reference` - URI reference
* `iri` - Internationalized URI
* `iri-reference` - Internationalized URI reference
* `uuid` - UUID format
* `uri-template` - URI template
* `json-pointer` - JSON pointer
* `relative-json-pointer` - Relative JSON pointer
* `regex` - Regular expression

## Number Validation

Add constraints to number fields to ensure proper validation. The schema editor provides an intuitive interface for configuring these constraints:

<img alt="Number Constraints Editor" />

### Range Constraints

Control the minimum and maximum values using the **Minimum** and **Maximum** fields. For example, you might require an age to be between 0 and 150, or limit a rating to between 1 and 5 stars.

### Exclusive Boundaries

Use **Exclusive Minimum** and **Exclusive Maximum** checkboxes when you need strict inequalities. For instance, a price field might need to be greater than 0 (not equal to 0), requiring an exclusive minimum.

### Multiple Of

Specify that numbers must be multiples of a particular value using the **Multiple Of** field. This is useful for ensuring prices are rounded to cents (0.01), requiring even numbers (2), or enforcing other step values.

**Example use cases:**

* Age between 0-150 (inclusive)
* Price greater than 0, rounded to cents (exclusive minimum: 0, multipleOf: 0.01)
* Rating from 1-5 stars (inclusive range)
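The use cases above correspond to the standard JSON Schema keywords `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, and `multipleOf`. The following sketch shows what those keywords mean by checking values locally. It assumes the editor's fields map onto these keywords, and `satisfies` is a hypothetical helper. It uses `Decimal` because binary floating-point arithmetic can leave a tiny remainder when testing multiples of 0.01.

```python
from decimal import Decimal

AGE_SCHEMA = {"type": "number", "minimum": 0, "maximum": 150}
PRICE_SCHEMA = {"type": "number", "exclusiveMinimum": 0, "multipleOf": 0.01}


def satisfies(value, schema: dict) -> bool:
    """Check a number against a subset of JSON Schema numeric keywords."""
    number = Decimal(str(value))
    if "minimum" in schema and number < Decimal(str(schema["minimum"])):
        return False
    if "maximum" in schema and number > Decimal(str(schema["maximum"])):
        return False
    if "exclusiveMinimum" in schema and number <= Decimal(str(schema["exclusiveMinimum"])):
        return False
    if "exclusiveMaximum" in schema and number >= Decimal(str(schema["exclusiveMaximum"])):
        return False
    if "multipleOf" in schema and number % Decimal(str(schema["multipleOf"])) != 0:
        return False
    return True


print(satisfies(150, AGE_SCHEMA))      # inclusive maximum
print(satisfies(0, PRICE_SCHEMA))      # exclusive minimum
print(satisfies(19.99, PRICE_SCHEMA))  # multiple of 0.01
```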

## Array Validation

Control array size and structure.

<img alt="Array Constraints Editor" />

### Size Constraints

Control the minimum and maximum number of items in an array using the **Minimum Items** and **Maximum Items** fields. For example, you might require at least 1 tag but no more than 10, or limit search results to a maximum of 5 items.

**Example use cases:**

* Tags array with 1-10 items
* Top 5 search results (maximum only)
* At least 3 required reviewers (minimum only)

## Nested Objects and Required Fields

Using **Composition** mode, you can define complex structures with nested objects and specify which fields are required at each level:

<img alt="Composition Mode Editor" />

In the interactive editor:

1. Create your parent object
2. Add nested properties
3. Toggle "Required" for each field at its level
4. Nest objects as deeply as needed using the "Add Field" button

## Reusable Schema Definitions with \$defs

For complex schemas with repeated structures, use `$defs` to define reusable components:

<img alt="Schema Definitions Editor" />

### Using \$defs in the Interactive Editor

1. Click "Add Definition" at the bottom of the schema editor
2. Name your definition (e.g., "Person", "Address")
3. Define its structure like any other object
4. Reference it using the **Select \$ref** dropdown:

<img alt="Select $ref dropdown" />

**Benefits:**

* Avoid duplicating complex structures
* Maintain consistency across your schema
* Easier to update common patterns
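As an illustration of the benefits above, the sketch below defines an `Address` structure once under `$defs` and references it twice with `$ref`, written as a Python dictionary. The field names are invented for this example, and `resolve_ref` is a hypothetical helper that handles only simple local references (no JSON Pointer escape sequences).

```python
SCHEMA = {
    "type": "object",
    "properties": {
        # Both properties reuse the same definition instead of duplicating it.
        "billing_address": {"$ref": "#/$defs/Address"},
        "shipping_address": {"$ref": "#/$defs/Address"},
    },
    "required": ["billing_address"],
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
            },
            "required": ["street", "city"],
        }
    },
}


def resolve_ref(schema: dict, ref: str) -> dict:
    """Follow a local reference such as '#/$defs/Address'."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    node = schema
    for part in ref[2:].split("/"):
        node = node[part]
    return node


shipping = resolve_ref(SCHEMA, SCHEMA["properties"]["shipping_address"]["$ref"])
print(sorted(shipping["properties"]))
```

Updating the `Address` definition changes both `billing_address` and `shipping_address` at once.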

## Using Variables in Structured Outputs

You can make your schemas dynamic by using template variables:

<Note>
  Variables in structured outputs work only when the template uses the Jinja2 format. F-string format isn't supported.
</Note>

### Interactive Mode

When using the interactive schema editor, you can add variables in two ways:

1. **For enum values**: Click the enum field and toggle the switch to "Use Variable"

<video>
  <source type="video/mp4" />

  Your browser does not support the video tag.
</video>

2. **For text/string values**: Type `{{ variable_name }}` directly in any text field

<video>
  <source type="video/mp4" />

  Your browser does not support the video tag.
</video>

### JSON Mode

Variables must be in quotes, except for enum variables:

```json theme={null}
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": {name: "sentiment_options", type: "enum_variable"},
      "description": "The sentiment of the {{ content_type }}"
    },
    "topics": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of topics mentioned in the {{ document_type }}"
    }
  }
}
```

When running the prompt, provide your variables:

```python theme={null}
response = pl.run(
   prompt_name="content_analyzer",
   input_variables={
       "text": "I really enjoyed the new restaurant downtown. The food was amazing and the service was excellent.",
       "sentiment_options": ["positive", "neutral", "negative"],
       "document_type": "review",
       "content_type": "customer feedback"
   }
)
```

### Variable Use Cases

**Dynamic validation constraints:**

```json theme={null}
{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "minLength": "{{ min_username_length }}",
      "maxLength": "{{ max_username_length }}",
      "description": "Username with configurable length"
    }
  }
}
```

**Context-dependent descriptions:**

```json theme={null}
{
  "type": "object",
  "properties": {
    "analysis": {
      "type": "string",
      "description": "Analysis of {{ document_type }} for {{ use_case }}"
    }
  }
}
```

For more information on template variables, see our [Template Variables documentation](/features/prompt-registry/template-variables).

## Dynamic Schema Injection with Variable Type

For advanced use cases, you can inject entire schema sections dynamically at runtime using the `variable` type. This allows you to define different schemas based on runtime conditions without creating multiple prompt templates.

### How It Works

1. In the schema editor, select "variable" as the type for a field
2. Specify a variable name (e.g., `userSchema`)
3. At runtime, pass the complete schema object for that variable

### Example: Dynamic User Schema

In your prompt template schema:

```json theme={null}
{
  "type": "object",
  "properties": {
    "user_data": {
      "type": "variable",
      "name": "userSchema"
    },
    "metadata": {
      "type": "object",
      "properties": {
        "timestamp": { "type": "string" },
        "version": { "type": "string" }
      }
    }
  }
}
```

At runtime, provide the complete schema:

```python theme={null}
response = pl.run(
   prompt_name="dynamic_processor",
   input_variables={
       "userSchema": {
           "type": "object",
           "properties": {
               "name": { "type": "string" },
               "age": { "type": "number" },
               "preferences": {
                   "type": "array",
                   "items": { "type": "string" }
               }
           },
           "required": ["name", "age"]
       }
   }
)
```

The `user_data` field will be replaced entirely with the schema you provide, allowing different structures for different use cases.
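Conceptually, the replacement behaves like the sketch below, which walks a schema and swaps every `{"type": "variable"}` node for the schema supplied under that variable name. This is an illustration of the described behavior, not PromptLayer's actual implementation, and `inject_variable_schemas` is a hypothetical function.

```python
import copy


def inject_variable_schemas(schema, input_variables: dict):
    """Return a copy of schema with each {"type": "variable"} node replaced."""
    if isinstance(schema, dict):
        if schema.get("type") == "variable":
            # The whole node is replaced by the schema passed at runtime.
            return copy.deepcopy(input_variables[schema["name"]])
        return {
            key: inject_variable_schemas(value, input_variables)
            for key, value in schema.items()
        }
    if isinstance(schema, list):
        return [inject_variable_schemas(item, input_variables) for item in schema]
    return schema


template_schema = {
    "type": "object",
    "properties": {
        "user_data": {"type": "variable", "name": "userSchema"},
        "metadata": {
            "type": "object",
            "properties": {"timestamp": {"type": "string"}},
        },
    },
}
user_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
    "required": ["name", "age"],
}

final_schema = inject_variable_schemas(template_schema, {"userSchema": user_schema})
print(final_schema["properties"]["user_data"]["required"])
```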

### Use Cases for Dynamic Schemas

**Multi-tenant applications:**

```python theme={null}
# Different schema per customer
customer_schemas = {
    "enterprise": {
        "type": "object",
        "properties": {
            "company_name": { "type": "string" },
            "department": { "type": "string" },
            "employee_count": { "type": "number" }
        }
    },
    "individual": {
        "type": "object",
        "properties": {
            "name": { "type": "string" },
            "age": { "type": "number" }
        }
    }
}

response = pl.run(
    prompt_name="user_analyzer",
    input_variables={
        "userSchema": customer_schemas[customer_type]
    }
)
```

**Dynamic form processing:**

```python theme={null}
# Schema based on form configuration
form_schema = build_schema_from_config(form_config)

response = pl.run(
    prompt_name="form_processor",
    input_variables={
        "formSchema": form_schema
    }
)
```

### Important Notes

* Variable schemas only work with Jinja2 template format
* The entire field is replaced with the provided schema object
* Ensure the injected schema is valid JSON schema syntax
* The injected schema inherits the `additionalProperties` setting from the parent

## Best Practices

* **For enum values:** Use `{name: "variable_name", type: "enum_variable"}` in JSON mode or the variable selector in interactive mode
* **For text variables:** Include them within quotes as `{{ variable_name }}` in both modes
* **For dynamic schemas:** Use `type: "variable"` with a descriptive variable name
* Only Jinja2 format works for variables in structured outputs
* Ensure all variables used in the schema are provided
* Use proper JSON formatting with variables


# Template Variables
Source: https://docs.promptlayer.com/features/prompt-registry/template-variables


Creating flexible, dynamic prompts is essential to getting the most out of LLMs. PromptLayer's template system allows you to build reusable prompts where values can be inserted at runtime. This guide explains the two formatting options available in our platform: `f-string` and `jinja2`.

<Info>
  Looking to save and reuse variable values? Check out [Input Variable Sets](/features/prompt-registry/input-variable-sets) to learn how to create named collections of variables that can be applied across prompts, playground, and workflows.
</Info>

## What are input variables?

Input variables make your prompts dynamic by creating placeholders that get replaced with actual values when the prompt is used. This allows you to:

* Personalize prompts with user information
* Insert specific context or data
* Create reusable prompt templates for different scenarios
* Conditionally include or exclude sections based on available data

PromptLayer supports two popular templating systems, each with its own strengths:

1. **F-strings**: Simple, straightforward variable replacement
2. **Jinja2**: Advanced templating with conditional logic, loops, and filters

## F-string Variables

F-strings (formatted string literals) offer a straightforward way to insert variables into your prompts using simple curly braces.

### Basic Syntax

```
Tell me about {topic} in a way that {audience} would understand.
```

### How It Works

When you use this template, you provide values for each variable:

<CodeGroup>
  ```python Python theme={null}
  input_variables = {
      "topic": "quantum computing",
      "audience": "high school students"
  }

  template = promptlayer_client.templates.get("educational_prompt", {
      "input_variables": input_variables
  })
  ```

  ```js JavaScript theme={null}
  const input_variables = {
      topic: "quantum computing",
      audience: "high school students"
  };

  const template = await promptLayerClient.templates.get("educational_prompt", {
      input_variables
  });
  ```
</CodeGroup>

The resulting prompt would be:

```
Tell me about quantum computing in a way that high school students would understand.
```

### F-string Best Practices

* Use descriptive variable names that clearly indicate their purpose
* Keep variable names simple and avoid special characters
* Make sure all variables in your template have corresponding values at runtime
* Use f-strings when you need simple variable substitution without complex logic
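To catch missing values before a request is made, you can list the placeholders in a template and compare them with the variables you plan to send. The sketch below uses Python's `string.Formatter` and `str.format` as a local stand-in for f-string rendering. PromptLayer's own rendering may differ in details, and `missing_variables` is a hypothetical helper.

```python
from string import Formatter


def missing_variables(template: str, input_variables: dict) -> set:
    """Return placeholder names in template that have no supplied value."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return names - set(input_variables)


template = "Tell me about {topic} in a way that {audience} would understand."

incomplete = {"topic": "quantum computing"}
print(missing_variables(template, incomplete))

complete = {"topic": "quantum computing", "audience": "high school students"}
if not missing_variables(template, complete):
    print(template.format(**complete))
```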

## Jinja2 Templates

Jinja2 is a more powerful templating engine that extends beyond basic variable replacement. It's ideal for complex prompt structures that require conditionals, loops, or data transformations.

PromptLayer supports the full Jinja2 spec. Read more about best practices for using Jinja2 [here](https://blog.promptlayer.com/prompt-templates-with-jinja2-2/).

### When to Use Jinja2

Consider using Jinja2 instead of f-strings when:

* You need conditional sections in your prompts
* You're working with lists or JSON data that needs to be formatted
* Your prompt requires loops to handle multiple items
* You want to transform variables (uppercase, lowercase, etc.)
* You're using nested data structures

<Warning>
  If your templates contain JSON, always use Jinja2 instead of f-strings, as the curly braces in JSON can conflict with f-string syntax.
</Warning>
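The conflict is easy to reproduce locally. In the sketch below, Python's `str.format` stands in for f-string style substitution (an assumption; PromptLayer's renderer may report the problem differently). The braces of the embedded JSON object are read as a placeholder, so formatting fails.

```python
template = 'Reply with JSON like {"answer": "..."} about {topic}.'

try:
    rendered = template.format(topic="quantum computing")
    failure = None
except (KeyError, ValueError, IndexError) as error:
    # The JSON braces were parsed as a placeholder named '"answer"'.
    rendered = None
    failure = f"{type(error).__name__}: {error}"

print(failure)
```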

### Basic Variable Replacement

Jinja2 uses double curly braces for variable insertion:

```
Hello, {{ user_name }}! Let's discuss {{ topic }} today.
```

### Conditional Logic

Include or exclude sections based on whether variables exist or meet certain conditions:

```
Let's analyze this text:
{{ text }}

{% if key_points %}
Focus on these key points:
{% for point in key_points %}
- {{ point }}
{% endfor %}
{% else %}
Provide a general summary.
{% endif %}
```

### Loops for Lists and Collections

Iterate through lists of items to include multiple elements:

```
Please analyze the following products:
{% for product in products %}
- {{ product.name }}: priced at ${{ product.price }}, category: {{ product.category }}
{% endfor %}
```

### Working with JSON Data

Jinja2 excels at handling structured JSON inputs:

```
{% if customer.history %}
Based on your purchase history:
{% for purchase in customer.history %}
- {{ purchase.item }} (purchased on {{ purchase.date }})
{% endfor %}

Here are our recommendations:
{% for item in recommendations %}
- {{ item.name }}: {{ item.description }}
{% endfor %}
{% else %}
Welcome new customer! Here are our popular items:
{% for item in popular_items %}
- {{ item.name }}: {{ item.description }}
{% endfor %}
{% endif %}
```

### Text Transformation with Filters

Apply transformations to your variables using filters:

```
Original query: {{ query }}
Searching for: {{ query | lower }}
Categories: {{ categories | join(", ") }}
```

### Advanced Jinja2 Techniques

#### Default Values

Provide fallbacks for optional variables:

```
Hello, {{ user.name | default("valued customer") }}!
```

#### Macro Definitions

Create reusable template components:

```
{% macro format_product(item) %}
- {{ item.name }} (${{ item.price }}): {{ item.description }}
{% endmacro %}

Our featured products:
{% for product in featured %}
{{ format_product(product) }}
{% endfor %}
```

#### Working with Conditionals

Make templates adaptable to different inputs:

```
{% if user.experience == "beginner" %}
Let me explain {{ topic }} in simple terms...
{% elif user.experience == "intermediate" %}
As you're familiar with the basics of {{ topic }}...
{% else %}
Given your advanced understanding of {{ topic }}...
{% endif %}
```

By leveraging these formatting options, you can create versatile prompt templates that adapt to different scenarios, making your PromptLayer workflows more flexible and powerful.

## Setting Template Format

When creating a template in PromptLayer, select the appropriate format based on your needs:

<img />

## Input Variable Examples

### F-string Example

<CodeGroup>
  ```python Python theme={null}
  input_variables = {
      "product_name": "Smart Home Hub",
      "customer_segment": "tech enthusiasts",
      "pain_points": "complex setup process, compatibility issues"
  }
  ```

  ```js JavaScript theme={null}
  const input_variables = {
      product_name: "Smart Home Hub",
      customer_segment: "tech enthusiasts",
      pain_points: "complex setup process, compatibility issues"
  };
  ```
</CodeGroup>

### Jinja2 Example with Structured Data

<CodeGroup>
  ```python Python theme={null}
  input_variables = {
      "user": {
          "name": "Alex",
          "role": "Marketing Manager",
          "experience_level": "intermediate"
      },
      "topics": ["SEO optimization", "content strategy", "social media"],
      "show_advanced": True
  }
  ```

  ```js JavaScript theme={null}
  const input_variables = {
      user: {
          name: "Alex",
          role: "Marketing Manager",
          experience_level: "intermediate"
      },
      topics: ["SEO optimization", "content strategy", "social media"],
      show_advanced: true
  };
  ```
</CodeGroup>

## Tool Variables

Beyond text and media variables, PromptLayer also supports **tool variables** — placeholders in your tools list that get replaced with actual tool definitions at runtime. This allows you to dynamically control which tools are available to the model based on context like user permissions, tenant configuration, or feature flags.

For details on setting up and using tool variables, see [Tool Calling - Tool Variables](/features/prompt-registry/tool-calling#tool-variables).

## Structured Outputs

Template variables can also be used within structured output schemas to create dynamic validation rules and response formats. For more information, see our [Structured Outputs documentation](/features/prompt-registry/structured-outputs).


# Tool Calling
Source: https://docs.promptlayer.com/features/prompt-registry/tool-calling


The Prompt Registry supports all major tool calling formats, including [OpenAI tools](https://platform.openai.com/docs/guides/function-calling), OpenAI functions, [Anthropic tools](https://docs.anthropic.com/en/docs/build-with-claude/tool-use), and Gemini tools.

You can create tool schemas interactively, and your prompt template will work seamlessly on any LLM. Tool calling in PromptLayer is model-agnostic.

<Tip>
  Learn more about when you should use tools [on our blog](https://blog.promptlayer.com/tool-calling-with-llms-how-and-when-to-use-it-d65493a87954).
</Tip>

## What is Tool Calling?

Tool calling (previously known as function calling) allows Large Language Models (LLMs) to return structured data and invoke predefined functions with JSON arguments. This capability enables more complex interactions and structured outputs from LLMs.

Key benefits of tool calling include:

* Structured Outputs: Tool arguments are always in JSON format, enforced by JSONSchema at the API level. See our [Structured Outputs](/features/prompt-registry/structured-outputs) documentation for more details.
* Efficient Communication: Tool calling is a concept built into the model, reducing token usage and improving understanding.
* Model Routing: Facilitates setting up modular prompts with specific responsibilities.
* Prompt Injection Protection: Strict schema definitions at the model level make it harder to "jailbreak" the model.

## Creating Custom Tools

### Creating Visually

Tools can be defined, called, and set up visually through the Prompt Registry.

<video>
  <source type="video/mp4" />
</video>

### Publishing Programmatically

To publish a prompt template with tools programmatically, you can add the arguments `tools` and `tool_choice` to your prompt\_template object. This is similar to how you would publish a regular prompt template.

## Tool Variables

Tool variables allow you to dynamically inject tools at runtime through input variables, rather than defining them statically in your prompt template. This is useful when:

* Different customers or tenants need different sets of tools
* Your available tools change based on runtime context (e.g., user permissions, feature flags)
* You want to manage tool definitions outside of your prompt template

### Adding a Tool Variable

1. Open the **Tool & Output Editor** in the Prompt Registry
2. Click the dropdown arrow on the **Add Tool** button and select **Tool Variable**
3. Enter a variable name (e.g., `dynamic_tools`) and click **Add**
4. The variable will appear in your Input Variable Sets alongside other template variables

Tool variables can coexist with static tool definitions. For example, you might have a fixed `get_weather` tool alongside a `customer_tools` variable that injects customer-specific tools at runtime.

### Passing Tools at Runtime

When running a prompt with tool variables, pass the tool definitions as an array in your `input_variables`:

<CodeGroup>
  ```python Python theme={null}
  response = pl.run(
      prompt_name="my-agent",
      input_variables={
          "dynamic_tools": [
              {
                  "name": "get_knowledge",
                  "description": "Search the knowledge base",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "query": {"type": "string", "description": "Search query"}
                      },
                      "required": ["query"]
                  }
              },
              {
                  "name": "create_ticket",
                  "description": "Create a support ticket",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "title": {"type": "string"},
                          "priority": {"type": "string", "enum": ["low", "medium", "high"]}
                      },
                      "required": ["title"]
                  }
              }
          ]
      }
  )
  ```

  ```js JavaScript theme={null}
  const response = await pl.run({
      promptName: "my-agent",
      inputVariables: {
          dynamic_tools: [
              {
                  name: "get_knowledge",
                  description: "Search the knowledge base",
                  parameters: {
                      type: "object",
                      properties: {
                          query: { type: "string", description: "Search query" }
                      },
                      required: ["query"]
                  }
              },
              {
                  name: "create_ticket",
                  description: "Create a support ticket",
                  parameters: {
                      type: "object",
                      properties: {
                          title: { type: "string" },
                          priority: { type: "string", enum: ["low", "medium", "high"] }
                      },
                      required: ["title"]
                  }
              }
          ]
      }
  });
  ```
</CodeGroup>

Each tool definition should include `name`, `description`, and `parameters` (using JSON Schema format). Anthropic-style `input_schema` is also accepted as an alternative to `parameters`.
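Because tool variables are assembled at runtime, it can be worth checking each definition before passing it in `input_variables`. The sketch below tests only the fields named above. `check_tool_definition` is a hypothetical helper, not part of the PromptLayer SDK, and it does not validate the JSON Schema itself.

```python
def check_tool_definition(tool: dict) -> list:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    for field in ("name", "description"):
        value = tool.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty {field!r}")
    # Accept either 'parameters' or the Anthropic-style 'input_schema'.
    schema = tool.get("parameters", tool.get("input_schema"))
    if not isinstance(schema, dict):
        problems.append("missing 'parameters' (or 'input_schema')")
    return problems


dynamic_tools = [
    {
        "name": "get_knowledge",
        "description": "Search the knowledge base",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {"name": "create_ticket", "input_schema": {"type": "object", "properties": {}}},
]

for tool in dynamic_tools:
    print(tool.get("name"), check_tool_definition(tool))
```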

<Note>
  Tool variables are expanded before the prompt is sent to the LLM provider. The model sees them as regular tool definitions, so they work with any provider that supports tool calling.
</Note>

## Built-in Tools

PromptLayer supports provider-native built-in tools across multiple LLM providers. These pre-built tools enable your prompts to access real-time information, execute code, search through files, and more—all without writing custom function definitions.

Built-in tools are available for the following providers:

* [OpenAI / Azure OpenAI (Responses API)](#openai-and-azure-openai-responses-api)
* [Anthropic](#anthropic)
* [Google (Gemini)](#google-gemini)
* [Vertex AI](#vertex-ai)

### How to Add Built-in Tools

1. **Open your prompt in the Prompt Registry** and navigate to the prompt editor
2. **Select your LLM provider** in the provider settings at the bottom of the editor
3. **Open the Function & Output Schema Editor** by clicking the **Functions & Output** button
4. **Click the Built-in tools button** (on the right side) to browse available tools for your selected provider
5. **Click Add Tool** for the tool you want to use — it will appear in your function definitions list
6. **Configure tool\_choice** (optional) — set to **auto** to let the model decide when to use the tool
7. **Save and run your prompt** — the model will use the built-in tools when appropriate

<Note>
  For OpenAI and Azure OpenAI, built-in tools require the **Responses API**. Switch from **Chat Completions API** to **Responses API** in the API dropdown before adding built-in tools.
</Note>

***

### OpenAI and Azure OpenAI (Responses API)

OpenAI's [Responses API](https://openai.com/index/new-tools-for-building-agents/) includes powerful pre-built tools that work seamlessly with PromptLayer. These tools are available for both the **OpenAI** and **Azure OpenAI** providers.

#### Available Tools

| Tool                 | Description                                                                      |
| -------------------- | -------------------------------------------------------------------------------- |
| **Web Search**       | Get fast, up-to-date answers with citations from the web                         |
| **File Search**      | Search through uploaded files and documents using Vector Stores                  |
| **Code Interpreter** | Write and execute Python code in a secure, sandboxed environment                 |
| **Image Generation** | Generate or edit images using a text prompt                                      |
| **MCP**              | Connect to remote MCP servers or OpenAI-maintained connectors for external tools |
| **Shell**            | Execute shell commands in a managed environment                                  |
| **Apply Patch**      | Propose structured diffs to create, update, or delete files                      |

#### Using File Search with Vector Stores

OpenAI's File Search tool enables semantic search over your documents using Vector Stores. This powerful feature allows your prompts to automatically retrieve relevant information from uploaded files during inference, making it perfect for building RAG (Retrieval-Augmented Generation) systems, knowledge bases, and documentation assistants.

##### Setting Up File Search

For the **File Search** tool, you'll need to create and attach Vector Stores containing your documents:

1. **Enable File Search** by following the steps above to add it as a built-in tool

2. **Create and configure a Vector Store**:
   * Click **Manage Vector Stores** in the File Search configuration
   * Click **Create** to make a new vector store with a custom name
   * Upload files via drag-and-drop or file selection (single or multiple files)
   * View storage usage, file counts, and manage attached files

3. **Attach Vector Stores to your prompt**:
   * Select one or more vector stores using checkboxes
   * Click **Save Selection** to attach them
   * The vector store IDs are added to your tool configuration

4. **Run your prompt**:
   * The LLM will automatically search vector stores when relevant
   * Retrieved context is used to generate informed responses
   * Sources can be traced back to specific documents

#### Using Code Interpreter

OpenAI's Code Interpreter tool enables your prompts to write and execute Python code within a secure, sandboxed environment. This powerful feature allows for dynamic problem-solving, data analysis, visualization generation, and file processing—all without writing custom function definitions.

To enable Code Interpreter, follow the steps above to add it as a built-in tool. The tool uses container type `"auto"` by default, which automatically manages the execution environment. The LLM will automatically use Code Interpreter when it needs to perform calculations, analyze data, create visualizations, or process files.

For detailed information about Code Interpreter's capabilities, file handling, and configuration options, see the [OpenAI Code Interpreter Guide](https://platform.openai.com/docs/guides/tools-code-interpreter).

#### Using Image Generation

OpenAI's Image Generation tool enables the model to generate or edit images using a text prompt directly within a conversation. When the model determines that an image should be created, it invokes the `image_generation` tool with an optimized prompt and returns the generated image in the response.

To enable Image Generation, follow the steps above to add it as a built-in tool. Once enabled, the model will automatically generate images when the conversation calls for it.

Generated images appear inline in the response with:

* A collapsible **revised prompt** showing the optimized text the model used for generation
* **Generation parameters** such as size, quality, background, and output format
* The **generated image** displayed in a rich card format

The model can generate multiple images in a single response — consecutive image generation calls are grouped together for clean display. The model may also include descriptive text alongside the generated images.

For a comprehensive guide to image generation across all providers (including the dedicated Images API and Gemini native image generation), see the [Image Generation](/features/image-generation) documentation.

#### Learn More

* [Image Generation Guide](/features/image-generation)
* [OpenAI Web Search Guide](https://platform.openai.com/docs/guides/tools-web-search)
* [OpenAI File Search Guide](https://platform.openai.com/docs/guides/tools-file-search)
* [OpenAI Code Interpreter Guide](https://platform.openai.com/docs/guides/tools-code-interpreter)
* [OpenAI Image Generation Guide](https://platform.openai.com/docs/guides/image-generation)
* [OpenAI Responses API Announcement](https://openai.com/index/new-tools-for-building-agents/)
* [Deep Research API Cookbook](https://cookbook.openai.com/examples/deep_research_api/introduction_to_deep_research_api)

***

### Anthropic

Anthropic provides native built-in tools for Claude models that enable code execution, web search, and system-level interactions directly within conversations.

#### Available Tools

| Tool               | Description                                                                                                                                                 |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Web Search**     | Search the web for real-time information. Claude uses this automatically when current information would help answer a question. Results include citations.  |
| **Bash**           | Execute bash commands within a sandboxed environment. Useful for automation workflows and system interactions.                                              |
| **Code Execution** | Execute code in a sandboxed environment with access to bash and a text editor. Ideal for data analysis, computation, and dynamic problem-solving.           |
| **Text Editor**    | View, create, and edit text files with commands like `view`, `str_replace`, `create`, and `insert`. Enables Claude to work with files during conversations. |

<Tip>
  Anthropic built-in tools are available for all Claude models that support tool use. They work with both the direct Anthropic provider and Claude models on Vertex AI.
</Tip>

#### Learn More

* [Anthropic Tool Use Documentation](https://docs.anthropic.com/en/docs/build-with-claude/tool-use)
* [Anthropic Computer Use](https://docs.anthropic.com/en/docs/agents-and-tools/computer-use)

***

### Google (Gemini)

Google provides native built-in tools for Gemini models that enable web grounding, location data, code execution, URL analysis, and file search capabilities.

#### Available Tools

| Tool               | Description                                                                                                                                           |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Google Search**  | Ground model responses with real-time web search results using Google Search. Provides up-to-date information with source citations.                  |
| **Google Maps**    | Ground responses with Google Maps place data including reviews, addresses, and business hours.                                                        |
| **Code Execution** | Execute Python code in a sandboxed environment for data analysis, calculations, and dynamic computations.                                             |
| **URL Context**    | Retrieve and analyze content from URLs provided in the prompt. Allows the model to process web page content during conversations.                     |
| **File Search**    | Search through uploaded files using semantic retrieval. Configure file search stores to index documents for automatic retrieval during conversations. |

#### Using File Search with Google

Google's File Search tool uses file search stores for document indexing and retrieval:

1. **Add the File Search tool** from the built-in tools menu
2. **Configure a file search store** — specify the store name(s) that contain your indexed documents
3. **Run your prompt** — Gemini will automatically search through indexed documents when relevant context is needed

#### Learn More

* [Google Gemini API Tools Documentation](https://ai.google.dev/gemini-api/docs/function-calling)
* [Grounding with Google Search](https://ai.google.dev/gemini-api/docs/grounding)

***

### Vertex AI

Vertex AI supports built-in tools from both Google and Anthropic, depending on the model family you are using. PromptLayer automatically shows the correct set of tools based on your selected model.

#### For Gemini Models on Vertex AI

When using Gemini models (e.g., `gemini-3.1-pro-preview`), the following Google-native tools are available:

| Tool               | Description                                                |
| ------------------ | ---------------------------------------------------------- |
| **Web Search**     | Ground responses with real-time Google Search results      |
| **Google Maps**    | Access Google Maps place data for location-based grounding |
| **Code Execution** | Execute Python code in a sandboxed environment             |
| **URL Context**    | Retrieve and analyze content from URLs in the prompt       |

#### For Claude Models on Vertex AI

When using Claude models (e.g., `claude-sonnet-4-20250514`) through Vertex AI, the following Anthropic-native tools are available:

| Tool            | Description                                                |
| --------------- | ---------------------------------------------------------- |
| **Web Search**  | Search the web for real-time information with citations    |
| **Bash**        | Execute bash commands in a sandboxed environment           |
| **Text Editor** | View, create, and edit text files with structured commands |

<Tip>
  PromptLayer automatically detects the model family (Gemini vs Claude) and displays the appropriate set of built-in tools in the editor. You don't need to manually configure which tool set to use.
</Tip>

#### Learn More

* [Vertex AI Gemini Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview)
* [Claude on Vertex AI](https://docs.anthropic.com/en/api/claude-on-vertex-ai)


# Events
Source: https://docs.promptlayer.com/features/prompt-registry/webhook-events


Webhook events include a common payload envelope and an event-specific `details` object.

### Event Payload Format

When an event occurs, we send a POST request with a payload in this structure:

```json theme={null}
{
  "event_type": "string",
  "details": "object",
  "user_id": "number",
  "user_name": "string or null",
  "user_email": "string or null",
  "workspace_id": "number",
  "timestamp": "ISO 8601 format timestamp"
}
```
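
As a minimal sketch of consuming this envelope, the code below parses the JSON body and routes on `event_type`. The `HANDLERS` registry, the `on` decorator, and `handle_event` are hypothetical names used for illustration; they are not part of any PromptLayer SDK.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping event_type to a handler (not a PromptLayer API).
HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {}


def on(event_type: str):
    """Register a handler function for one event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


@on("prompt_template_version_created")
def version_created(details: Dict[str, Any]) -> str:
    # Uses only fields shown in the documented `details` example.
    return f"{details['prompt_template_name']} v{details['prompt_template_version_number']}"


def handle_event(raw_body: str) -> str:
    """Parse the envelope and dispatch on event_type; unknown events are ignored."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event["event_type"])
    if handler is None:
        return "ignored"
    return handler(event.get("details", {}))
```

Returning a fixed value for unregistered event types keeps the receiver tolerant of event types added later.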

### Supported Event Types

We notify you for these events:

* [`prompt_template_version_created`](#prompt_template_version_created)
* [`prompt_template_name_changed`](#prompt_template_name_changed)
* [`prompt_template_deleted`](#prompt_template_deleted)
* [`prompt_template_label_created`](#prompt_template_label_created)
* [`prompt_template_label_deleted`](#prompt_template_label_deleted)
* [`prompt_template_label_moved`](#prompt_template_label_moved)
* [`prompt_template_label_change_requested`](#prompt_template_label_change_requested)
* [`prompt_template_label_change_approved`](#prompt_template_label_change_approved)
* [`prompt_template_label_change_denied`](#prompt_template_label_change_denied)
* [`prompt_template_updated`](#prompt_template_updated)
* [`agent_run_finished`](#agent_run_finished)
* [`report_finished`](#report_finished)
* [`dataset_version_created_by_file`](#dataset_version_created_by_file)
* [`dataset_version_created_by_file_failed`](#dataset_version_created_by_file_failed)
* [`dataset_version_created_from_filter_params`](#dataset_version_created_from_filter_params)
* [`table_sheet_created_from_file`](#table_sheet_created_from_file)
* [`table_sheet_created_from_file_failed`](#table_sheet_created_from_file_failed)
* [`table_sheet_created_from_request_history`](#table_sheet_created_from_request_history)
* [`table_sheet_created_from_request_history_failed`](#table_sheet_created_from_request_history_failed)
* [`skill_collection_files_changed`](#skill_collection_files_changed)

#### prompt\_template\_version\_created

When a new version of a prompt template is created.

```json theme={null}
{
  "details": {
    "prompt_template_name": "support-reply",
    "prompt_template_version_number": 2,
    "prompt_template_id": 123
  }
}
```

#### prompt\_template\_name\_changed

When a prompt template's name is changed.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply-v2",
    "old_prompt_template_name": "support-reply"
  }
}
```

#### prompt\_template\_deleted

When a prompt template is deleted.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply"
  }
}
```

#### prompt\_template\_label\_created

When a new release label for a prompt template is created.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_version_number": 2,
    "prompt_template_label": "production"
  }
}
```

#### prompt\_template\_label\_deleted

When a release label for a prompt template is deleted.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_version_number": 2,
    "prompt_template_label": "production"
  }
}
```

#### prompt\_template\_label\_moved

When a release label is moved between prompt template versions.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_version_number": 3,
    "old_prompt_template_version_number": 2,
    "prompt_template_label": "production"
  }
}
```
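
One possible use of this event is cache invalidation. The sketch below assumes your application keeps an in-process cache keyed by template name and release label; `prompt_cache` and `invalidate_on_label_moved` are hypothetical names, not PromptLayer APIs.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-process cache keyed by (template name, release label).
prompt_cache: Dict[Tuple[str, str], Any] = {}


def invalidate_on_label_moved(event: Dict[str, Any]) -> bool:
    """Drop the cached template for a moved label. Returns True if an entry was removed."""
    if event.get("event_type") != "prompt_template_label_moved":
        return False
    details = event["details"]
    key = (details["prompt_template_name"], details["prompt_template_label"])
    if key in prompt_cache:
        del prompt_cache[key]
        return True
    return False
```

The next fetch for that name and label then repopulates the cache with the version the label now points to.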

#### prompt\_template\_label\_change\_requested

When a change to a protected release label is requested and requires approval.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_label": "production",
    "change_type": "move"
  }
}
```

#### prompt\_template\_label\_change\_approved

When a pending change to a protected release label is approved.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_label": "production",
    "change_type": "move"
  }
}
```

#### prompt\_template\_label\_change\_denied

When a pending change to a protected release label is denied.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_label": "production",
    "change_type": "move"
  }
}
```

#### prompt\_template\_updated

When a snippet imported into a prompt template is updated.

```json theme={null}
{
  "details": {
    "prompt_template_id": 123,
    "prompt_template_name": "support-reply",
    "prompt_template_version_number": 2
  }
}
```

#### agent\_run\_finished

When an agent (workflow) run is completed.

Note: This event may fire more than once for the same execution. It fires only for runs started via the SDK or API, not for runs started from the dashboard.

```json theme={null}
{
  "details": {
    "agent_name": "Customer Support Agent",
    "agent_id": 456,
    "agent_execution_id": 789
  }
}
```
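
Because the event can arrive more than once per execution, a receiver may want to deduplicate on `agent_execution_id`. The following is a sketch under that assumption; `should_process` and the in-memory set are illustrative, and a real deployment would likely use a shared store instead.

```python
from typing import Any, Dict, Set

# Execution IDs already handled. An in-memory set is used only for illustration;
# multiple workers would need a shared store such as a database.
_seen_executions: Set[int] = set()


def should_process(event: Dict[str, Any]) -> bool:
    """Return True the first time an agent_execution_id is seen, False on repeats."""
    execution_id = event["details"]["agent_execution_id"]
    if execution_id in _seen_executions:
        return False
    _seen_executions.add(execution_id)
    return True
```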

#### report\_finished

When an evaluation report is completed.

```json theme={null}
{
  "details": {
    "report_id": 234,
    "report_name": "Support Reply Evaluation"
  }
}
```

#### dataset\_version\_created\_by\_file

When a dataset version is successfully created from a file upload.

```json theme={null}
{
  "details": {
    "dataset_id": 345,
    "dataset_version_number": 4
  }
}
```

#### dataset\_version\_created\_by\_file\_failed

When file processing fails for a draft dataset.

```json theme={null}
{
  "details": {
    "dataset_id": 345,
    "error_message": "Failed to process dataset file"
  }
}
```

#### dataset\_version\_created\_from\_filter\_params

When a dataset version is created from filter parameters.

```json theme={null}
{
  "details": {
    "dataset_id": 345,
    "rows_added": 100,
    "dataset_version_number": 4
  }
}
```

#### table\_sheet\_created\_from\_file

When a Table sheet file import succeeds.

```json theme={null}
{
  "details": {
    "operation_id": "public-file-import-1",
    "table_id": "5c1f2e6a-1f8e-4e02-bc1e-2c48f9bdb2f4",
    "sheet_id": "73f7d7f8-7f5f-4f27-94e5-b070b620b931",
    "source": "file",
    "file_name": "people.csv",
    "rows_added": 100,
    "row_count": 100
  }
}
```

#### table\_sheet\_created\_from\_file\_failed

When a Table sheet file import fails.

```json theme={null}
{
  "details": {
    "operation_id": "public-file-import-1",
    "table_id": "5c1f2e6a-1f8e-4e02-bc1e-2c48f9bdb2f4",
    "sheet_id": "73f7d7f8-7f5f-4f27-94e5-b070b620b931",
    "source": "file",
    "file_name": "people.csv",
    "error_message": "Failed to process Table CSV import."
  }
}
```

#### table\_sheet\_created\_from\_request\_history

When a Table sheet request history import succeeds.

```json theme={null}
{
  "details": {
    "operation_id": "public-request-import-1",
    "table_id": "5c1f2e6a-1f8e-4e02-bc1e-2c48f9bdb2f4",
    "sheet_id": "73f7d7f8-7f5f-4f27-94e5-b070b620b931",
    "source": "request_logs",
    "rows_added": 100,
    "row_count": 100
  }
}
```

#### table\_sheet\_created\_from\_request\_history\_failed

When a Table sheet request history import fails.

```json theme={null}
{
  "details": {
    "operation_id": "public-request-import-1",
    "table_id": "5c1f2e6a-1f8e-4e02-bc1e-2c48f9bdb2f4",
    "sheet_id": "73f7d7f8-7f5f-4f27-94e5-b070b620b931",
    "source": "request_logs",
    "error_message": "No requests found for the given criteria"
  }
}
```

#### skill\_collection\_files\_changed

When a Skill Collection is created, a new version is saved, a collection is deleted, or a version is restored.

```json theme={null}
{
  "details": {
    "skill_collection_id": "support-skills",
    "source": "version_save",
    "affected_paths": [
      "skills/refund-policy.md",
      "skills/escalation.md"
    ],
    "version_number": 3
  }
}
```


# Introduction
Source: https://docs.promptlayer.com/features/prompt-registry/webhooks

Use webhooks to receive notifications about prompt, dataset, evaluation, workflow, and skill collection events in your workspace.

PromptLayer webhooks let your systems react to workspace changes so you can keep prompt caches fresh, trigger CI/CD or GitOps workflows, sync external systems, and monitor asynchronous jobs without polling.

For event names and payload details, see [Events](/features/prompt-registry/webhook-events).

### Configuring a Webhook

To set up a webhook, go to the **Webhook** section in the **Settings** page. Enter the URL of the endpoint you want to send the webhook to and click **Submit**.

<img alt="Creating Webhook" />

### Securing Your Webhook

When you create a webhook, you'll receive a webhook secret that looks like this:

<img alt="Webhook Secret Signature" />

This secret is used to verify that incoming webhook requests are authentic and come from PromptLayer. The signature is included in the `X-PromptLayer-Signature` header of each webhook request.

#### Verifying Webhook Signatures

Here are code examples showing how to verify the signatures:

<CodeGroup>
  ```python Python theme={null}
  import hmac
  import hashlib
  import json

  signature = "HEADER FROM X-PromptLayer-Signature" # Replace with actual header value
  secret_key = "SECRET KEY FROM PROMPTLAYER DASHBOARD" # Replace with actual secret key
  payload = {} # Replace with actual payload
  payload_str = json.dumps(payload, sort_keys=True)
  expected_signature = hmac.new(
      key=secret_key.encode(),
      msg=payload_str.encode('utf-8'),
      digestmod=hashlib.sha256
  ).hexdigest()

  if hmac.compare_digest(expected_signature, signature):
      print("Signature is valid")
  else:
      print("Signature is invalid")
  ```

  ```javascript JavaScript theme={null}
  import crypto from "crypto";
  import stringify from "json-stable-stringify";

  // Replace these with actual values
  const signature = "HEADER FROM X-PromptLayer-Signature"; // From request header
  const secretKey = "SECRET KEY FROM PROMPTLAYER DASHBOARD"; // Your webhook secret
  const payload = {}; // Replace with actual request body

  export function formatPayload(payload) {
    const raw = stringify(payload);
    const spacedColons = raw.replace(/"([^"]+)"\s*:/g, '"$1": ');
    const spaced = spacedColons.replace(/,(?=(?:\s*[\{"\[]))/g, ", ");
    return spaced;
  }

  export function generateSignature(payload, secretKey) {
    const normalized = formatPayload(payload);
    return crypto
      .createHmac("sha256", secretKey)
      .update(normalized)
      .digest("hex");
  }

  export function verifySignature(signature, payload, secretKey) {
    const expected = generateSignature(payload, secretKey);
    try {
      return crypto.timingSafeEqual(
        Buffer.from(signature, "hex"),
        Buffer.from(expected, "hex")
      );
    } catch {
      return false;
    }
  }
   
  const isValid = verifySignature(signature, payload, secretKey);
  console.log("Signature is", isValid ? "valid" : "invalid");
  ```
</CodeGroup>
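
In a real receiver you start from the raw request body and header, not from a literal dictionary. The sketch below wraps the Python example above in a function. It assumes the body is JSON and that the signature is computed over the sorted-key serialization shown above; `verify_webhook` is a hypothetical helper name.

```python
import hashlib
import hmac
import json


def verify_webhook(raw_body: bytes, signature_header: str, secret_key: str) -> bool:
    """Recompute the HMAC-SHA256 signature as in the example above and compare."""
    payload = json.loads(raw_body)
    # Same canonical form as the documented example: keys sorted, default separators.
    payload_str = json.dumps(payload, sort_keys=True)
    expected = hmac.new(
        key=secret_key.encode("utf-8"),
        msg=payload_str.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).hexdigest()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

Reject the request (for example with a 4xx response) when this returns `False`, and only then hand the payload to your event handlers.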


# Zero Downtime Releases
Source: https://docs.promptlayer.com/features/prompt-registry/zero-downtime-releases


## Input Variable Handling

The `pl.run()` function handles input variables in the following ways:

1. **Normal Usage**:

Provide all required variables as defined in your prompt template:

```python theme={null}
response = pl.run(
   prompt_name="movie_recommender",
   prompt_release_label="prod",
   input_variables={
       "favorite_movie": "The Shawshank Redemption"
   },
)
```

2. **Missing Variables**:

If you don't provide the required input variables, you'll receive a warning in the console, but the prompt template will still run. The missing variables will be sent to the LLM as unprocessed strings:

```python theme={null}
response = pl.run(
   prompt_name="movie_recommender",
   prompt_release_label="prod",
   input_variables={},
)
```

```
WARNING: While getting your prompt template: Some input variables are missing: (`favorite_movie`)
Undefined variable in message index 1: 'favorite_movie' is undefined
```

3. **Extra Variables**:

If you include extra variables that aren't in the template, they will be ignored:

```python theme={null}
response = pl.run(
   prompt_name="movie_recommender",
   prompt_release_label="prod",
   input_variables={
       "favorite_movie": "The Shawshank Redemption",
       "release_year": 1994
   },
)
```

In this case, the `release_year` variable will be ignored in the LLM request if it's not part of the current template.
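
If you prefer to catch these cases before calling `pl.run()`, a small pre-flight check can compare the variables you pass against the names you expect the template to use. This is a sketch only: `check_input_variables` is a hypothetical helper, and the list of expected names is supplied by you rather than fetched from PromptLayer.

```python
from typing import Dict, Iterable, List, Tuple


def check_input_variables(
    expected: Iterable[str], provided: Dict[str, object]
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) variable names, each sorted for stable output."""
    expected_set = set(expected)
    provided_set = set(provided)
    missing = sorted(expected_set - provided_set)
    extra = sorted(provided_set - expected_set)
    return missing, extra


# Mirrors the "Extra Variables" case above: nothing missing, one extra name.
missing, extra = check_input_variables(
    ["favorite_movie"],
    {"favorite_movie": "The Shawshank Redemption", "release_year": 1994},
)
```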

When you need to add new input variables to your prompt template, it's important to keep your source code in sync with the template changes. This guide outlines the process for deploying these updates to your production environment.

## Example Scenario

Assume you have a prompt template version tagged with `prod` that uses only one input variable, `favorite_movie`:

```python theme={null}
response = pl.run(
    prompt_name="movie_recommender",
    prompt_release_label="prod",
    input_variables={
        "favorite_movie": "The Shawshank Redemption"
    },
)
```

## Update Process

Follow these steps to safely add a new `mood` variable to your prompt template:

1. Create a new template version with the new `mood` variable

2. Apply a unique temporary label (e.g., `new-var`) to the new version

3. Update and deploy your code to use the new template version and include the new variable:

```python theme={null}
response = pl.run(
   prompt_name="movie_recommender",
   prompt_release_label="new-var",
   input_variables={
       "favorite_movie": "The Shawshank Redemption",
       "mood": "uplifting"
   },
)
```

4. In the PromptLayer UI, move the `prod` label to the new prompt version (the one currently labeled `new-var`)

5. Update your source code to reference the `prod` label again and deploy:

```python theme={null}
response = pl.run(
   prompt_name="movie_recommender",
   prompt_release_label="prod",
   input_variables={
       "favorite_movie": "The Shawshank Redemption",
       "mood": "uplifting"
   },
)
```

6. Delete the temporary `new-var` label from the PromptLayer UI


# Editing with Wrangler AI
Source: https://docs.promptlayer.com/features/skill-collections/editing-with-wrangler

Use PromptLayer's in-dashboard assistant to create, edit, rename, and migrate Skill Collections without learning SKILL.md conventions yourself.

**Wrangler AI** is PromptLayer's in-dashboard assistant. Ask it to **create**, **edit**, **rename**, or **migrate** Skill Collections. Wrangler already follows PromptLayer's skill conventions, so you do not need to memorize `SKILL.md` frontmatter or folder layouts yourself.

## Where to find Wrangler

Open Wrangler from the **floating widget** in the bottom-right corner of the dashboard. Click the widget to expand the chat panel. Keep the panel open when you need more room or want Wrangler available while you navigate the app.

## Mention a Skill Collection with `@`

In the Wrangler chat input, type **`@`** and choose a Skill Collection from the list. That attaches the collection as context so Wrangler can make precise edits or migrations against the right project.

<img alt="Mention a Skill Collection" />

## Things you can ask Wrangler

* "Create a `support-skills` collection with a triage skill for my team."
* "Add an `escalation` skill that handles urgent tickets."
* "Rename every skill in `@support-skills` to kebab-case."
* "Migrate `@support-skills` from Claude Code to Codex."
* "Move `@support-skills` to OpenClaw and populate the workspace files."

## How migrations work

Wrangler translates the file layout and metadata for your target provider while preserving instructions and examples, then saves the result as a **new version** so nothing is lost.

## Rolling back

If you need to undo a change, open **version history** for the collection, pick an earlier version, and restore it. See [Versioning and review](/features/skill-collections/tuning-skills#versioning-and-review) for the same workflow when editing by hand.


# Overview
Source: https://docs.promptlayer.com/features/skill-collections/overview

Versioned folders of agent instructions (SKILL.md files) you can share, remix, and pull into Claude Code, OpenAI/Codex, or OpenClaw.

A **Skill Collection** is a versioned folder of instructions that teach your coding agent how to behave. Each skill lives in its own subfolder with a `SKILL.md`, alongside provider-specific config (like `CLAUDE.md` or `AGENTS.md`). Edit in the PromptLayer dashboard, version with commit messages and release labels, share a public link, let others **Remix** a copy, and **pull** the collection into your local agent with the PromptLayer SDK.

<img alt="Skill Collection editor" />

## Supported layouts

Skill Collections support the following agent environments:

| Layout             | Core config     | Typical skill paths                                             |
| ------------------ | --------------- | --------------------------------------------------------------- |
| **Claude Code**    | `CLAUDE.md`     | `my-skill/SKILL.md`                                             |
| **OpenAI / Codex** | `AGENTS.md`     | `my-skill/SKILL.md`, `my-skill/agents/openai.yaml`              |
| **OpenClaw**       | `openclaw.json` | `workspace/skills/my-skill/SKILL.md` plus workspace files       |
| **Universal**      | Flexible        | Portable structure; you can switch to a specific provider later |

**Exports and zip pulls** include the on-disk prefix your tool expects (`.claude/`, `.agents/`, `.openclaw/`). **Public API calls** use paths relative to the collection root (e.g. `triage/SKILL.md`) — not the exported form like `.claude/skills/triage/SKILL.md`.

## What you can do

* **Version** every change with a commit message; optional **release labels** for staging vs production.
* **Share** a read-only link and optionally allow **Remix** so others copy the collection into their workspace.
* **Pull** via the Python or JavaScript SDK into `.claude`, `.agents`, `.openclaw`, or a generic `skills` folder.
* **Compare** versions and roll back from version history in the dashboard.

## Plan limits

These are the default capacity guidelines.

| Plan | Collections | Files per collection |
| ---- | ----------- | -------------------- |
| Free | 1           | 30                   |
| Pro  | 5           | 50                   |
| Team | Unlimited   | 100                  |

**Hard limit:** any single file must be **5 MiB** or smaller.
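
To check a local folder against the per-file limit before adding files to a collection, something like the following could be used. It is a sketch: `oversized_files` is a hypothetical helper, and it only checks the 5 MiB file size limit, not the per-plan file counts.

```python
from pathlib import Path
from typing import List

MAX_FILE_BYTES = 5 * 1024 * 1024  # 5 MiB, the documented per-file hard limit


def oversized_files(root: Path) -> List[Path]:
    """Return files under root whose size exceeds the 5 MiB limit."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > MAX_FILE_BYTES
    )
```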

## Next steps

<CardGroup>
  <Card title="Editing with Wrangler AI" icon="message-bot" href="/features/skill-collections/editing-with-wrangler">
    Use the in-dashboard assistant to create, edit, rename, or migrate Skill
    Collections with natural language.
  </Card>

  <Card title="Share and remix collections" icon="share" href="/features/skill-collections/sharing-and-remixing">
    Publish a read-only link, let others remix into their own workspace, and
    show them how to pull the collection.
  </Card>

  <Card title="Pulling skills into your agent" icon="download" href="/features/skill-collections/pulling-skills">
    Sync a collection to disk with the Python or JavaScript SDK, or download via
    the public REST API.
  </Card>

  <Card title="Tune skills for your team" icon="sliders" href="/features/skill-collections/tuning-skills">
    Rename folders, edit frontmatter, and pin SDK pulls to release labels.
  </Card>
</CardGroup>


# Pulling skills into your agent
Source: https://docs.promptlayer.com/features/skill-collections/pulling-skills

Download a Skill Collection with the PromptLayer SDK or public REST API so Claude Code, Codex, Cursor, or OpenClaw can load it from disk.

After Wrangler (or you) finalizes a Skill Collection in the dashboard, your coding agent loads skills from **files on disk**. Use **`client.skills.pull()`** to fetch the collection, then write the files to your project.

<Info>
  You can generate these exact snippets for your own collection from the dashboard: open a Skill Collection and click **Pull via SDK**.
</Info>

## Prerequisites

Set your API key in the environment:

```bash theme={null}
export PROMPTLAYER_API_KEY="your_promptlayer_api_key"
```

## Pull as a folder of files

`skills.pull()` returns a response with a **`files`** map (path → text content). For provider-specific collections, those paths already include the provider prefix (`.claude/...`, `.agents/...`, or `.openclaw/...`), so the simplest thing to do is **write each file at its returned path from your project root**. Everything lands where your agent expects it.

<CodeGroup>
  ```python Python theme={null}
  # pip install -U promptlayer
  import os
  from pathlib import Path

  from promptlayer import PromptLayer

  client = PromptLayer(api_key=os.environ["PROMPTLAYER_API_KEY"])

  result = client.skills.pull("my-collection")
  if result is None:
      raise ValueError("No skill collection returned.")

  for relative_path, content in result["files"].items():
      output_path = Path(relative_path)
      output_path.parent.mkdir(parents=True, exist_ok=True)
      output_path.write_text(content, encoding="utf-8")
  ```

  ```js JavaScript theme={null}
  // npm install promptlayer
  import { PromptLayer } from "promptlayer";
  import { mkdir, writeFile } from "node:fs/promises";
  import path from "node:path";

  const client = new PromptLayer({ apiKey: process.env.PROMPTLAYER_API_KEY });

  const result = await client.skills.pull("my-collection");
  if (!result || result instanceof ArrayBuffer) {
    throw new Error("Expected a JSON skill collection response.");
  }

  for (const [relativePath, content] of Object.entries(result.files)) {
    await mkdir(path.dirname(relativePath), { recursive: true });
    await writeFile(relativePath, content, "utf8");
  }
  ```
</CodeGroup>

For a **Claude Code** collection, this produces `.claude/CLAUDE.md`, `.claude/skills/<name>/SKILL.md`, etc. For **Codex**, you get `.agents/AGENTS.md`, and for **OpenClaw** you get `.openclaw/...`. **Universal** collections have no prefix and land in your project root as-is.

### Writing to a custom directory

If you want to rename the output directory (for example, write a Claude Code collection under `.agents` instead of `.claude`), strip the known provider prefix from each path and join the remainder to your target directory:

<CodeGroup>
  ```python Python theme={null}
  output_dir = Path("custom-dir")
  prefix = ".claude/"  # your collection's provider prefix

  for relative_path, content in result["files"].items():
      target = relative_path[len(prefix):] if relative_path.startswith(prefix) else relative_path
      output_path = output_dir / target
      output_path.parent.mkdir(parents=True, exist_ok=True)
      output_path.write_text(content, encoding="utf-8")
  ```

  ```js JavaScript theme={null}
  const outputDir = "custom-dir";
  const prefix = ".claude/";

  for (const [relativePath, content] of Object.entries(result.files)) {
    const target = relativePath.startsWith(prefix)
      ? relativePath.slice(prefix.length)
      : relativePath;
    const outputPath = path.join(outputDir, target);
    await mkdir(path.dirname(outputPath), { recursive: true });
    await writeFile(outputPath, content, "utf8");
  }
  ```
</CodeGroup>

| Provider           | Prefix in response paths |
| ------------------ | ------------------------ |
| **Claude Code**    | `.claude/`               |
| **OpenAI / Codex** | `.agents/`               |
| **OpenClaw**       | `.openclaw/`             |
| **Universal**      | *(no prefix)*            |
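
Rather than hard-coding the prefix, you could infer it from the returned paths. The helper below is a sketch based only on the prefixes in the table above; `detect_prefix` is a hypothetical name, not an SDK function.

```python
from typing import Iterable

# Prefixes from the table above.
KNOWN_PREFIXES = (".claude/", ".agents/", ".openclaw/")


def detect_prefix(paths: Iterable[str]) -> str:
    """Return the provider prefix shared by every path, or "" for universal collections."""
    path_list = list(paths)
    for prefix in KNOWN_PREFIXES:
        if path_list and all(p.startswith(prefix) for p in path_list):
            return prefix
    return ""
```

With the earlier snippets, `prefix = detect_prefix(result["files"])` would replace the hard-coded `".claude/"` value.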

## Pin a version or release label

Pass `version` or `label` to pull a pinned snapshot instead of the latest version.

<CodeGroup>
  ```python Python theme={null}
  client.skills.pull("my-collection", version=12)
  client.skills.pull("my-collection", label="production")
  ```

  ```js JavaScript theme={null}
  await client.skills.pull("my-collection", { version: 12 });
  await client.skills.pull("my-collection", { label: "production" });
  ```
</CodeGroup>

## Pull as a zip archive

Pass `format="zip"` to get a binary archive instead of a JSON file map. The archive already contains the provider's on-disk layout, so unzip it at your project root.

<CodeGroup>
  ```python Python theme={null}
  import os
  from pathlib import Path

  from promptlayer import PromptLayer

  client = PromptLayer(api_key=os.environ["PROMPTLAYER_API_KEY"])

  zip_archive = client.skills.pull("my-collection", format="zip")
  if not isinstance(zip_archive, bytes):
      raise ValueError("Expected a zip archive.")

  Path("skills.zip").write_bytes(zip_archive)
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  import { writeFile } from "node:fs/promises";

  const client = new PromptLayer({ apiKey: process.env.PROMPTLAYER_API_KEY });

  const zipArchive = await client.skills.pull("my-collection", { format: "zip" });
  if (!(zipArchive instanceof ArrayBuffer)) {
    throw new Error("Expected a zip archive.");
  }

  await writeFile("skills.zip", Buffer.from(zipArchive));
  ```
</CodeGroup>
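
If you would rather extract the archive directly than save `skills.zip` first, Python's standard `zipfile` module can read it from memory. This is a minimal sketch; `extract_skill_archive` is a hypothetical helper that takes the bytes returned by the zip pull above.

```python
import io
import zipfile
from pathlib import Path
from typing import List


def extract_skill_archive(zip_bytes: bytes, project_root: Path) -> List[str]:
    """Extract an in-memory zip archive under project_root and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(project_root)
        return archive.namelist()
```

For example, `extract_skill_archive(zip_archive, Path("."))` unpacks the provider layout at the current directory.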

## Raw REST API

Use **`GET`** against the public API when you are not using an SDK:

```http theme={null}
GET https://api.promptlayer.com/api/public/v2/skill-collections/<identifier>
```

Optional query parameters:

| Parameter      | Purpose                              |
| -------------- | ------------------------------------ |
| `format=zip`   | Return a zip archive instead of JSON |
| `version=<n>`  | Pin to a specific version number     |
| `label=<name>` | Pin to a release label               |

File paths in the JSON response follow the same convention as the SDK: provider-specific collections include the `.claude/`, `.agents/`, or `.openclaw/` prefix; universal collections are root-relative. Zip exports always include the provider layout.
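
The endpoint and query parameters above can be combined into a request URL as sketched below. `build_pull_url` is a hypothetical helper. Sending the request also requires your API key; this section does not show the header, so the sketch stops at the URL.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.promptlayer.com/api/public/v2/skill-collections"


def build_pull_url(identifier, *, fmt=None, version=None, label=None):
    """Build the GET URL for a collection with the optional query parameters."""
    params = {}
    if fmt is not None:
        params["format"] = fmt
    if version is not None:
        params["version"] = version
    if label is not None:
        params["label"] = label
    url = f"{BASE_URL}/{quote(str(identifier), safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


print(build_pull_url("my-collection", label="production"))
```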

## Skill Collection webhooks

Skill Collections emit a `skill_collection_files_changed` event when a collection is created, saved, deleted, or restored. To subscribe and verify signatures, see [Webhooks](/features/prompt-registry/webhooks).

## CI pattern

<Tip>
  Run **`skills.pull`** in your build or deploy pipeline with a pinned **`label`** (for example `production`) so your agent loads a stable snapshot instead of whatever was latest when the job last ran.
</Tip>

For more on labels and review, see [Tune skills for your team](/features/skill-collections/tuning-skills).


# Sharing and remixing collections
Source: https://docs.promptlayer.com/features/skill-collections/sharing-and-remixing

Publish a read-only link to a Skill Collection so others can view, remix, or pull it into their own workspace.

Use shared Skill Collections when you want other people to **inspect**, **remix**, or **pull** your agent setup without giving them access to your private workspace.

## Share a public version

Open a saved Skill Collection version in the dashboard and click **Share**. PromptLayer opens a dialog where you can publish a public link and decide whether **Remix** should be available.

<img alt="Share dialog for a Skill Collection" />

<Info>
  Shared pages are read-only. Viewers can browse the collection, download it as
  a zip, and copy it, but they cannot edit the original collection in your
  workspace.
</Info>

## What viewers see on the shared page

The public page shows the collection's file tree and file contents so someone can understand the layout before they install or copy it.

<img alt="Shared Skill Collection page" />

This works well for starter kits, team playbooks, example agent setups, or public templates you want others to learn from.

## Let others remix your collection

If you enable **Remix**, people viewing the shared page can copy the collection into their own workspace and start editing from there.

<img alt="Remix Skill Collection dialog" />

A remix creates a separate Skill Collection owned by the destination workspace. After that, the new owner can rename skills, adjust frontmatter, switch providers, and publish their own versions without changing the original.

## Typical use cases

* Publish an opinionated starter collection for new repositories.
* Share a reference agent setup that customers can adapt to their stack.
* Distribute internal best practices across teams without handing out workspace access.
* Pair public sharing with pinned pulls so downstream projects load a stable version.


# Tune skills for your team
Source: https://docs.promptlayer.com/features/skill-collections/tuning-skills

Edit Skill Collections you own in the dashboard: provider layout, folder names, frontmatter, size limits, and SDK pull pins.

Use this guide when you **edit by hand** in the PromptLayer UI. For bulk edits and migrations with natural language, see [Editing with Wrangler AI](/features/skill-collections/editing-with-wrangler).

## Switch provider layout

The dashboard lets you align the collection with **Claude Code**, **OpenAI/Codex**, **OpenClaw**, or **Universal**. Switching layout preserves your skill content while changing which root files and paths PromptLayer expects.

<img alt="Provider selector" />

If you use **Codex**, treat it as the **OpenAI** provider: include **`AGENTS.md`**, and give each skill folder an **`agents/openai.yaml`** alongside its **`SKILL.md`**.

## Rename skills to match your conventions

Each skill usually lives in its own folder with a **`SKILL.md`**. Renaming the folder effectively renames the skill; update the **`name`** field in frontmatter to match.

<img alt="Renaming a skill folder" />

## Frontmatter

Machine-readable metadata belongs in the **first** YAML block at the top of **`SKILL.md`**. The editor treats the first `--- ... ---` block as frontmatter.

**Required:**

* **`name`** — Short identifier used in the editor and for discovery.
* **`description`** — What the skill does **and when** it should run (discovery text).

**Optional:**

* **`disable-model-invocation: true`** — Manual-only skill (no implicit invocation).

**Example:**

```markdown theme={null}
---
name: search-logs
description: Search PromptLayer request logs. Use when the user asks to find or filter past requests.
---

# Search logs

## Instructions

...
```

For **OpenAI/Codex**, keep UX and policy fields in **`my-skill/agents/openai.yaml`** (for example `interface.display_name`, `policy.allow_implicit_invocation`) rather than stuffing provider-specific metadata into **`SKILL.md`** frontmatter.

**YAML tips:** If a value contains `:` or quotes, wrap it in quotes so it remains valid YAML; the editor reserializes it safely.
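
To catch a missing `name` or `description` before you paste a file into the editor, a local check might look like the sketch below. It is deliberately simplified: it reads only flat `key: value` pairs from the first `--- ... ---` block and is not a YAML parser. The function names are illustrative.

```python
REQUIRED_KEYS = ("name", "description")


def read_frontmatter(skill_md: str) -> dict:
    """Parse the first '--- ... ---' block as flat 'key: value' pairs.

    Simplified on purpose: real frontmatter is YAML, so use a YAML parser
    for nested values, lists, or multi-line strings.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so later colons stay in the value.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing '---' found


def missing_keys(skill_md: str) -> list:
    """Return the required keys that are absent or empty."""
    fields = read_frontmatter(skill_md)
    return [key for key in REQUIRED_KEYS if not fields.get(key)]
```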

<img alt="Editing skill frontmatter" />

## Adapt examples to your stack

Replace generic examples with your ticket system, runbooks, and internal links so the skills match how your team works.

## Size and plan limits

PromptLayer shows **recommended** sizes in the editor; the hard ceiling is **5 MiB per file**.

| File / area                        | Guidance                                                   |
| ---------------------------------- | ---------------------------------------------------------- |
| **`SKILL.md`**                     | About **8,000 characters** (\~1,500–2,000 words) per skill |
| **`CLAUDE.md`**                    | Up to **\~40,000 characters** recommended                  |
| **`AGENTS.md`**                    | Up to **32,768 characters** (32 KiB) recommended           |
| **OpenClaw `workspace/AGENTS.md`** | About **32,000 characters** recommended                    |

**Plan file counts:** Free up to **30** files per collection, Pro **50**, Team **100** (defaults; see [changelog](/changelog)).

Put long reference material in sibling files (**`reference.md`**, **`examples.md`**) and link from **`SKILL.md`**.
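
The limits above can also be checked locally before you upload. The sketch below assumes a `{path: text}` file map and matches recommendations by file name; `check_collection` and the lowercase plan keys are hypothetical, and the numbers are copied from this section.

```python
HARD_LIMIT_BYTES = 5 * 1024 * 1024  # 5 MiB per file (hard ceiling)

# Recommended character counts from the table above, keyed by file name.
RECOMMENDED_CHARS = {
    "SKILL.md": 8_000,
    "CLAUDE.md": 40_000,
    "AGENTS.md": 32_768,
}

# Default per-collection file counts by plan.
PLAN_FILE_LIMITS = {"free": 30, "pro": 50, "team": 100}


def recommended_for(path: str):
    """Return the recommended character count for a path, or None."""
    if path.endswith("workspace/AGENTS.md"):
        return 32_000  # OpenClaw row
    return RECOMMENDED_CHARS.get(path.rsplit("/", 1)[-1])


def check_collection(files: dict, plan: str = "free") -> list:
    """Return warnings for a {path: text} file map."""
    warnings = []
    if len(files) > PLAN_FILE_LIMITS[plan]:
        warnings.append(f"{len(files)} files exceeds the {plan} plan default")
    for path, text in files.items():
        if len(text.encode("utf-8")) > HARD_LIMIT_BYTES:
            warnings.append(f"{path}: over the 5 MiB hard ceiling")
        recommended = recommended_for(path)
        if recommended and len(text) > recommended:
            warnings.append(f"{path}: {len(text)} chars, recommended {recommended}")
    return warnings
```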

## Versioning and review

Every save creates a new version with a **commit message**. Optional **release labels** (for example `staging`, `production`) mark which version agents should pull.

<img alt="Version diff" />

Use version history to compare changes and roll back if needed.

## Pin SDK pulls to a label

After you label a stable version, clients can pull that snapshot instead of latest:

<CodeGroup>
  ```python Python theme={null}
  import os

  from promptlayer import PromptLayer

  client = PromptLayer(api_key=os.environ["PROMPTLAYER_API_KEY"])
  client.skills.pull("my-collection", label="production")
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const client = new PromptLayer({ apiKey: process.env.PROMPTLAYER_API_KEY });
  await client.skills.pull("my-collection", { label: "production" });
  ```
</CodeGroup>

<Tip>
  Publish experimental changes under a **`staging`** label first. Point automation at **`production`** only after you have reviewed the diff and tested with your agent.
</Tip>


# Supported Providers
Source: https://docs.promptlayer.com/features/supported-providers

Review the LLM providers and model capabilities PromptLayer supports.

This page lists the LLM providers PromptLayer supports, with key capability notes. For providers not listed or advanced deployments, see [Custom Providers](/features/custom-providers).

## Provider Details

### OpenAI

* Chat Completions, Responses API, and Images API are all supported.
* Function/Tool Calling (including built-in Responses API tools: Web Search, File Search, Image Generation) — see [Tool Calling](/features/prompt-registry/tool-calling).
* **Image Generation** via three paths: Images API (`dall-e-2`, `dall-e-3`, `gpt-image-1`, `gpt-image-1-mini`, `gpt-image-1.5`, `gpt-image-2`), Responses API `image_generation` tool, and GPT Image models — see [Image Generation](/features/image-generation).
* JSON mode / structured outputs.
* Vision models (e.g., `gpt-4-vision`) — see [FAQ: multimodal](/features/faq#does-promptlayer-support-multi-modal-image-models-like-gpt-4-vision).
* Streaming via SDK (Python/JS).
* Tip: You can also connect via [OpenRouter](/features/custom-providers#openrouter) as a custom provider to access many OpenAI-compatible models.

### OpenAI Azure

* Same usage as OpenAI but configured with Azure deployment settings.
* Chat Completions, Responses API, and Images API are all supported — including the `image_generation` built-in tool and Images API models.
* Ensure deployment name, API version, and resource URL are correctly configured.
* Most OpenAI features apply; some parameters differ depending on your Azure configuration.

### Anthropic

* Tool Use (Anthropic Messages API).
* Built-in tools: Web Search, Bash, Code Execution, Text Editor — see [Tool Calling](/features/prompt-registry/tool-calling#anthropic).
* **Prompt Caching** — supported on all Claude models. See [Anthropic Prompt Caching](#anthropic-prompt-caching) below.
* Claude 3 family supports image inputs.
* Streaming via SDK.
* If you previously used "Anthropic Bedrock", migrate to the native Anthropic provider or to Claude via Amazon Bedrock.

### Google (Gemini)

* Multimodal support for images (input and output).
* Built-in tools: Google Search, Google Maps, Code Execution, URL Context, File Search — see [Tool Calling](/features/prompt-registry/tool-calling#google-gemini).
* **Native image generation** with Gemini image models (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`, etc.) — see [Image Generation](/features/image-generation#google-gemini--native-image-generation).
* Use either the direct Google provider or Vertex AI based on your infrastructure preference.

### Vertex AI

* Gemini and Anthropic Claude served through Google Cloud's Vertex AI.
* Built-in tools vary by model family — see [Tool Calling](/features/prompt-registry/tool-calling#vertex-ai):
  * Gemini models: Web Search, Google Maps, Code Execution, URL Context.
  * Claude models: Web Search, Bash, Text Editor.
* **Prompt Caching** — supported for Claude models on Vertex AI. See [Anthropic Prompt Caching](#anthropic-prompt-caching) below.
* Gemini image models are fully supported for native image generation via Vertex AI — see [Image Generation](/features/image-generation#google-gemini--native-image-generation).
* Configure project, region, and credentials as required by Vertex AI.

### Amazon Bedrock

* Access Anthropic Claude, Meta Llama, Mistral, Cohere, AI21 Jamba, Nova, DeepSeek, Gemma, and GPT-OSS models via AWS Bedrock Converse API.
* **Prompt Caching** — supported for Claude models on Bedrock (PromptLayer handles the Bedrock `cachePoint` format automatically). See [Anthropic Prompt Caching](#anthropic-prompt-caching) below.
* **Structured output** — Claude models on Bedrock receive the same JSON Schema normalization as the native Anthropic provider (including `oneOf` → `anyOf` rewriting). Other Bedrock models use standard nullable-union normalization.
* Capabilities vary per model family—see tool calling support below.
* Streaming via SDK (Python/JS).

Tool Calling Support:

* Full support (auto, any, specific tool): Claude, Mistral, GPT-OSS, Nova.
* Partial support (auto only): AI21 Jamba, Cohere, Llama.
* No tool calling: DeepSeek, Gemma.

Tool Choice Options:

* `auto` — Model decides whether to use a tool or respond directly.
* `any` — Model must use one of the provided tools.
* Specific tool — Force a particular tool (not supported by AI21 Jamba, Cohere, Llama).

Limitations:

* AI21 Jamba and Llama do not support streaming when tools are configured; use non-streaming requests.
* Cohere and Nova require underscores in tool names — PromptLayer automatically converts hyphens to underscores.
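
The support lists above can be read as a lookup table. The sketch below encodes them as one; it is an illustration of the rules, not PromptLayer's implementation. The family keys, the `"tool"` label for a specific tool, and `plan_request` are hypothetical names.

```python
# Tool-choice modes each Bedrock model family accepts, per the lists above.
TOOL_CHOICE_SUPPORT = {
    "claude": {"auto", "any", "tool"},
    "mistral": {"auto", "any", "tool"},
    "gpt-oss": {"auto", "any", "tool"},
    "nova": {"auto", "any", "tool"},
    "ai21-jamba": {"auto"},
    "cohere": {"auto"},
    "llama": {"auto"},
    "deepseek": set(),
    "gemma": set(),
}

NO_STREAMING_WITH_TOOLS = {"ai21-jamba", "llama"}
UNDERSCORE_TOOL_NAMES = {"cohere", "nova"}


def plan_request(family, tool_choice, tool_name, stream):
    """Return (tool_name, stream) adjusted for the family, or raise if unsupported."""
    if tool_choice not in TOOL_CHOICE_SUPPORT[family]:
        raise ValueError(f"{family} does not support tool_choice={tool_choice!r}")
    if family in UNDERSCORE_TOOL_NAMES:
        # Mirrors the hyphen-to-underscore conversion described above.
        tool_name = tool_name.replace("-", "_")
    if family in NO_STREAMING_WITH_TOOLS:
        stream = False  # these families cannot stream when tools are configured
    return tool_name, stream
```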

### Mistral

* Streaming supported; tool/function-call support depends on the specific model.

### Cohere

* Command/Command-R family supported.
* **Structured output** — JSON Schema response format is supported and available in the Playground's Advanced Model Controls.
* Feature availability (streaming, tool use) depends on the chosen model.

### Hugging Face

* Evaluation support varies by model/task—text-generation models generally work best.
* For endpoints with OpenAI-compatible shims, you can configure via a custom base URL.

### Anthropic Bedrock (Deprecated)

* This legacy integration is deprecated.
* Use the native Anthropic provider, or access Claude via Amazon Bedrock with the Bedrock provider.

## OpenAI-compatible Base URL Providers

Many third‑party providers expose an OpenAI‑compatible API. You can connect any such provider by configuring a Provider Base URL that uses the OpenAI client. See [Custom Providers](/features/custom-providers) for more details.

How to set up:

1. Go to your workspace settings → "Provider Base URLs".
2. Click "Create New" and configure:
   * LLM Provider: OpenAI
   * Base URL: the provider's endpoint (examples below)
   * API Key: the provider's key
3. Optional: Create Custom Models for a cleaner model dropdown in the Playground/Prompt Registry.

Examples:

* OpenRouter — Base URL: [https://openrouter.ai/api/v1](https://openrouter.ai/api/v1) (see [example](/features/custom-providers#openrouter))
* Exa — Base URL: [https://api.exa.ai](https://api.exa.ai) (see [integration guide](/features/exa-integration))
* xAI (Grok) — Base URL: [https://api.x.ai/v1](https://api.x.ai/v1) (see [integration guide](/features/xai-integration))
* DeepSeek — Base URL: [https://api.deepseek.com](https://api.deepseek.com) (see [FAQ](/features/faq#does-promptlayer-support-deepseek-models))
* Hugging Face gateways that offer OpenAI-compatible endpoints — use the gateway URL provided by your deployment

Capabilities:

* Works in Logs, Prompt Registry, and Playground.
* Evaluations: supported when the provider's OpenAI-compat layer matches PromptLayer parameters; remove unsupported params if needed (e.g., some providers do not support "seed").
* Provider-specific parameters not in the standard OpenAI SDK (e.g., `thinking` for Kimi/Moonshot) are automatically forwarded to the provider in the request body — no extra configuration needed.
* Tool/Function Calling and streaming availability depend on the provider/model.
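
When a provider rejects a parameter such as `seed`, one option is to filter request parameters before sending them. The following is a minimal sketch; the provider key and its blocked set are hypothetical, so check your provider's documentation for the real list.

```python
# Parameters a given OpenAI-compatible provider rejects. The entries here are
# hypothetical placeholders.
UNSUPPORTED_PARAMS = {
    "example-provider": {"seed"},
}


def strip_unsupported(provider: str, params: dict) -> dict:
    """Return a copy of params without keys the provider does not accept."""
    blocked = UNSUPPORTED_PARAMS.get(provider, set())
    return {key: value for key, value in params.items() if key not in blocked}


print(strip_unsupported("example-provider", {"temperature": 0.2, "seed": 7}))
```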

## Anthropic Prompt Caching

Anthropic's [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) lets you cache repeated content (system instructions, tool definitions, few-shot examples) so it isn't re-processed on every request. Cached input tokens cost up to 90% less than uncached tokens after the initial write.

PromptLayer supports this for Claude models on **Anthropic**, **Vertex AI**, and **Amazon Bedrock**.
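
To see where the savings come from, the sketch below compares input cost for a repeated prefix with and without caching. The default multipliers (1.25x for a 5-minute cache write, 0.1x for a cache read) are assumptions based on Anthropic's published pricing and may change. The base price and token counts are made-up inputs, and the estimate assumes every request after the first hits the cache.

```python
def cached_vs_uncached(prefix_tokens, requests, base_price_per_mtok,
                       write_multiplier=1.25, read_multiplier=0.10):
    """Estimate input cost for a repeated prefix with and without caching.

    The multipliers are assumptions taken from Anthropic's published pricing
    for the 5-minute cache; confirm current values before relying on them.
    """
    per_token = base_price_per_mtok / 1_000_000
    uncached = requests * prefix_tokens * per_token
    # The first request writes the cache; later requests read it.
    cached = prefix_tokens * per_token * (
        write_multiplier + (requests - 1) * read_multiplier
    )
    return uncached, cached


# Hypothetical inputs: a 10,000-token prefix reused across 10 requests.
uncached, cached = cached_vs_uncached(10_000, requests=10, base_price_per_mtok=3.0)
print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```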

### Automatic caching

Set a cache duration in the **Advanced Model Controls** (Prompt Caching dropdown) to automatically cache system messages and tool definitions.

<img alt="Advanced Model Controls with Prompt Caching enabled and 5 minutes selected" />

Available durations:

| Duration      | Providers                                                          |
| ------------- | ------------------------------------------------------------------ |
| **5 minutes** | Anthropic, Vertex AI, Amazon Bedrock (all supported Claude models) |
| **1 hour**    | Anthropic, Vertex AI, Amazon Bedrock (Claude Sonnet 4.5 only)      |

### Block-level caching

For fine-grained control, you can mark individual content blocks for caching directly in the Playground. When a Claude model is selected, cacheable blocks display a small cache icon.

<img alt="System message block in the Playground with the Cache control visible" />

Cacheable blocks:

* **Text blocks** in system, user, or assistant messages
* **Tool calls** in assistant messages
* **Tool results** in tool messages
* **Tool definitions** in the Tool & Output Editor

<img alt="Tool & Output Editor with Cache control next to a tool definition" />

### Viewing cache usage

When caching is active, the request log detail page shows **Cache Write** and **Cache Read** token counts alongside the standard metrics.

<img alt="Request log detail showing Cache Write token count and prompt/response" />

Content blocks that were marked for caching also display a cache badge in the request log, matching the indicator shown in the Playground.

<Note>
  Anthropic requires a minimum of 1,024 tokens (2,048 for Claude 3.5 Haiku) in the cached content for caching to activate.
</Note>

## Related Docs

* [Image Generation](/features/image-generation)
* [Custom Providers](/features/custom-providers)
* [Tool Calling](/features/prompt-registry/tool-calling)
* [Template Variables](/features/prompt-registry/template-variables)
* [FAQ](/features/faq)


# Table Design Patterns
Source: https://docs.promptlayer.com/features/tables/best-practices

Design patterns for building reliable Table workflows.

Use this pattern to keep Table workflows readable, rerunnable, and easy to debug.

## Recommended pattern

1. Put source data in text columns.
2. Add one computed column per logical step.
3. Map every computed column to explicit source columns.
4. Keep checks and score columns close to the outputs they evaluate.
5. Rerun stale cells before reading the score.
6. Save versions at meaningful checkpoints.
7. Use Analytics when you need request-level debugging.


# Cells and Runs
Source: https://docs.promptlayer.com/features/tables/cells-and-runs

Understand Table cells, statuses, stale values, reruns, and cancellation.

A cell is the value at one row and one column. Text cells are editable; computed cells are produced by running a column's configuration against the row.

## Cell types

| Cell kind         | Where it appears                                                                                 | Behavior                                                            |
| ----------------- | ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------- |
| **Text cell**     | Text columns                                                                                     | Stores editable input, labels, metadata, expected output, or notes. |
| **Computed cell** | Prompt, code, assertion, extraction, comparison, helper, composition, and other computed columns | Stores an output and a run status.                                  |

Computed cells track whether their output is ready, pending, failed, or stale.

## Cell statuses

| Status       | Meaning                                                                            | What to do                                                          |
| ------------ | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| `QUEUED`     | Work has been scheduled but has not started.                                       | Wait, or cancel the operation if it was queued by mistake.          |
| `DISPATCHED` | Work has been sent to a worker.                                                    | Wait, or cancel if needed.                                          |
| `RUNNING`    | Work is actively executing.                                                        | Wait, inspect progress, or stop the run.                            |
| `COMPLETED`  | The cell has a completed output.                                                   | Use the result, score it, or inspect it.                            |
| `FAILED`     | The cell run failed.                                                               | Open the cell, inspect the error, fix inputs or config, then rerun. |
| `STALE`      | The cell has an old output that no longer matches current inputs or configuration. | Rerun before trusting the result.                                   |

`QUEUED`, `DISPATCHED`, and `RUNNING` are pending statuses.
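
If you handle these statuses in a script, the grouping above could be modeled as sketched below. The class and function names are illustrative and not part of the SDK; only the status strings come from the table.

```python
from enum import Enum


class CellStatus(str, Enum):
    QUEUED = "QUEUED"
    DISPATCHED = "DISPATCHED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    STALE = "STALE"


PENDING = {CellStatus.QUEUED, CellStatus.DISPATCHED, CellStatus.RUNNING}


def next_action(status: CellStatus) -> str:
    """Map a status to the action suggested in the table above."""
    if status in PENDING:
        return "wait or cancel"
    if status in (CellStatus.FAILED, CellStatus.STALE):
        return "rerun"  # for FAILED, fix inputs or config first
    return "use result"
```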

## Stale cells

A computed cell becomes stale when a dependency changes, such as a source value, source mapping, column configuration, rerun output, or composition source.

Stale cells are highlighted so you can rerun only the work that needs refreshing.

<Frame>
  <img alt="A stale computed Table cell highlighted in amber after a dependency changed" />
</Frame>
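
One way to picture staleness is as a walk over column dependencies. The sketch below is a simplified model, not PromptLayer's implementation: `sources` maps each computed column to the columns it reads, and marking every downstream column stale in a single pass is an assumption made for illustration.

```python
from collections import deque


def stale_columns(sources: dict, changed: str) -> set:
    """Return every computed column downstream of `changed`.

    `sources` maps each computed column to the columns it reads from.
    """
    # Invert the mapping: source column -> columns that depend on it.
    dependents = {}
    for column, reads in sources.items():
        for source in reads:
            dependents.setdefault(source, set()).add(column)

    stale, queue = set(), deque([changed])
    while queue:
        for column in dependents.get(queue.popleft(), ()):
            if column not in stale:
                stale.add(column)
                queue.append(column)
    return stale


# Hypothetical sheet: a prompt column reads an input, a score reads the prompt.
sources = {
    "Prompt output": ["Customer message"],
    "Quality score": ["Prompt output"],
}
print(sorted(stale_columns(sources, "Customer message")))
```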

## Run scopes

Run the smallest scope that matches what changed.

| Scope                  | Use when                                                                                     |
| ---------------------- | -------------------------------------------------------------------------------------------- |
| **Table or sheet run** | You want to refresh the sheet broadly, usually after a large import or configuration change. |
| **Column run**         | One computed column changed and should be recalculated across rows.                          |
| **Row run**            | One or more rows changed and should be refreshed across computed columns.                    |
| **Selected cell run**  | You want to rerun a specific set of computed cells.                                          |
| **Stale-cell run**     | You want to refresh only outputs marked stale.                                               |

<Frame>
  <img alt="Run controls for refreshing Table work" />
</Frame>

## Run selected rows or cells

Select rows or computed cells to run a smaller slice of work. This is useful when only a subset of inputs changed or when you are debugging a few failing examples.

## Stop or cancel work

Stop or cancel work when a run was started by mistake, the configuration is wrong, or a long-running column should be interrupted. Cancellation is only useful while cells are queued, dispatched, or running.

## Inspect cells

Open computed cells to inspect outputs, errors, execution details, or prompt request details. Use failed cells to debug configuration or source data, and use stale cells to decide what needs rerunning before scoring.

## Update cells through the API

Use the API when a job, script, or external workflow should add rows, edit cells, or queue recalculation.

<CardGroup>
  <Card title="Get cell" icon="magnifying-glass" href="/reference/table-sheet-cells-get">
    Read a specific cell.
  </Card>

  <Card title="Update cell" icon="pen" href="/reference/table-sheet-cells-update">
    Update an editable cell value.
  </Card>

  <Card title="Recalculate cell" icon="rotate" href="/reference/table-sheet-cells-recalculate">
    Queue recalculation for a cell.
  </Card>

  <Card title="Create operation" icon="play" href="/reference/table-sheet-operations-create">
    Queue broader row, column, cell, or stale work.
  </Card>
</CardGroup>


# Column Types
Source: https://docs.promptlayer.com/features/tables/column-types

Reference for text columns, computed columns, evaluation columns, helper columns, and composition in Tables.

Table column types reuse PromptLayer's evaluation and workflow building blocks, plus the Tables-specific **Composition** type.

## How sources work

Computed columns read from source columns in the sheet. For example, a `Prompt output` column can map a prompt variable to `Customer message`, and a `Quality score` column can read from `Prompt output`. If a source changes, dependent computed cells can become stale.

<Frame>
  <img alt="Prompt Template column configuration showing input variables mapped to source columns" />
</Frame>

## Column type overview

| UI label                  | Type value                  | Category                | Use when                                                                    |
| ------------------------- | --------------------------- | ----------------------- | --------------------------------------------------------------------------- |
| Text                      | `TEXT`                      | Text                    | You need editable input data, expected answers, labels, metadata, or notes. |
| Prompt Template           | `PROMPT_TEMPLATE`           | Data source / execution | You want to run a Prompt Registry template or inline prompt for each row.   |
| Workflow                  | `WORKFLOW`                  | Data source / execution | You want a row to execute a PromptLayer workflow.                           |
| Code Execution            | `CODE_EXECUTION`            | Data source / execution | You need Python or JavaScript logic per row.                                |
| Endpoint                  | `ENDPOINT`                  | Data source / execution | You want to send row data to an external URL endpoint.                      |
| MCP                       | `MCP`                       | Data source / execution | You want to call Model Context Protocol server functions.                   |
| Conversation Simulator    | `CONVERSATION_SIMULATOR`    | Data source / execution | You want to simulate a multi-turn user and assistant conversation.          |
| Human                     | `HUMAN`                     | Data source / execution | You need a human to fill in or approve data.                                |
| While Loop                | `WHILE_LOOP`                | Data source / execution | You need repeated execution while a condition remains true.                 |
| For Loop                  | `FOR_LOOP`                  | Data source / execution | You need to iterate over a collection or array.                             |
| Composition               | `COMPOSITION`               | Data source / execution | You want to reference data from another sheet or table.                     |
| Compare                   | `COMPARE`                   | Simple eval             | You want to compare values and generate a diff or comparison result.        |
| Contains                  | `CONTAINS`                  | Simple eval             | You want to check whether text contains a substring.                        |
| Regex                     | `REGEX`                     | Simple eval             | You want to validate text against a regular expression.                     |
| Absolute Numeric Distance | `ABSOLUTE_NUMERIC_DISTANCE` | Simple eval             | You want the absolute distance between numeric values.                      |
| Assert Valid              | `ASSERT_VALID`              | Simple eval             | You want to validate data types or formats.                                 |
| Math Operator             | `MATH_OPERATOR`             | Simple eval             | You want mathematical comparisons such as greater than or less than.        |
| LLM Assertion             | `LLM_ASSERTION`             | LLM eval                | You want an LLM to judge whether an assertion is true.                      |
| AI Data Extraction        | `AI_DATA_EXTRACTION`        | LLM eval                | You want an LLM to extract specific information from text.                  |
| Cosine Similarity         | `COSINE_SIMILARITY`         | LLM eval                | You want semantic similarity between vectors or text embeddings.            |
| JSON Extraction           | `JSON_PATH`                 | Helper                  | You want to extract values from JSON with JSONPath.                         |
| XML Path                  | `XML_PATH`                  | Helper                  | You want to extract values from XML with XPath.                             |
| Regex Extraction          | `REGEX_EXTRACTION`          | Helper                  | You want to extract capture groups from text.                               |
| Parse Value               | `PARSE_VALUE`               | Helper                  | You want to parse a specific value from structured or unstructured data.    |
| Count                     | `COUNT`                     | Helper                  | You want counts of elements, characters, or occurrences.                    |
| Min Max                   | `MIN_MAX`                   | Helper                  | You want the minimum or maximum value from numeric inputs.                  |
| Coalesce                  | `COALESCE`                  | Helper                  | You want the first non-null value from several sources.                     |
| Combine Columns           | `COMBINE_COLUMNS`           | Helper                  | You want to merge values from multiple sources into one output.             |
| Apply Diff                | `APPLY_DIFF`                | Helper                  | You want to apply a unified diff patch to text.                             |
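
To make a few of these descriptions concrete, the sketch below restates four of them as plain Python functions. It illustrates the described behavior only; the function names and edge-case handling are assumptions and may differ from how the column types behave in PromptLayer.

```python
import re


def coalesce(*values):
    """Coalesce: first non-null value from several sources."""
    return next((v for v in values if v is not None), None)


def absolute_numeric_distance(a, b):
    """Absolute Numeric Distance: |a - b| for numeric inputs."""
    return abs(float(a) - float(b))


def contains(text, substring):
    """Contains: whether text includes a substring."""
    return substring in text


def regex_extraction(text, pattern):
    """Regex Extraction: capture groups from the first match, or an empty tuple."""
    match = re.search(pattern, text)
    return match.groups() if match else ()
```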

<Warning>
  **Condition** appears in Workflow node selectors, but it is not a Table column type. Use Table-supported comparison, assertion, or helper columns for row-level checks.
</Warning>

## Composition

Composition references a source table, sheet, and column, then pulls that value into the current sheet. Use it when multiple sheets should share a reusable intermediate result instead of copying logic.

<Frame>
  <img alt="Composition column configuration that references a source column from another sheet" />
</Frame>

## API references

Use the generated API reference for exact request schemas when creating or updating columns programmatically.

<CardGroup>
  <Card title="Create column" icon="plus" href="/reference/table-sheet-columns-create">
    Create text or computed columns.
  </Card>

  <Card title="Update column" icon="pen" href="/reference/table-sheet-columns-update">
    Update title, config, and dependencies.
  </Card>
</CardGroup>


# Columns
Source: https://docs.promptlayer.com/features/tables/columns

Add, configure, map, filter, sort, run, and manage Table columns.

Columns define what each row stores or produces. Start with text inputs, then add computed columns for each step in the workflow.

For the full list of supported types, see [Column Types](/features/tables/column-types).

## Add a column

Click **Add Column** to add a new field to the sheet.

<Frame>
  <img alt="Add Column control in a Table sheet" />
</Frame>

A clear sheet usually moves left to right: inputs, generated outputs, checks or helper transforms, and reviewer notes.

## Choose a computed type

Use one computed column per logical step: prompt or workflow execution, code, deterministic checks, LLM evaluation, helper transforms, or composition.

<Frame>
  <img alt="A Table with input columns followed by computed prompt, code, quality, and edge-case columns" />
</Frame>

## Configure a column

Open a column menu and choose **Configure column** to edit its settings.

<Frame>
  <img alt="Column menu showing Configure column, Hide, Auto size, Pin, Filter, Sort, Run column, Stop column, Duplicate, and Delete actions" />
</Frame>

Use the same menu to hide, resize, pin, filter, sort, run, duplicate, or delete a column.

## Map sources

Computed columns read from source columns. A Prompt Template column, for example, exposes the prompt's input variables and lets you map each variable to a source column.

<Frame>
  <img alt="Prompt Template column configuration showing input variables mapped to source columns" />
</Frame>

Source mappings become dependencies. When a source value or computed column configuration changes, dependent computed cells can become stale.

If a variable is unmapped, open the source selector and choose the source column for that variable.
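
The same check can be expressed in a few lines if you keep mappings in a script or config file. This is a minimal sketch with hypothetical names; it treats a variable as unmapped when it has no mapping or points at a column that is not in the sheet.

```python
def unmapped_variables(variables, mapping, sheet_columns):
    """Return prompt variables that have no valid source column."""
    return [
        name for name in variables
        if mapping.get(name) not in sheet_columns
    ]


# Hypothetical prompt with two variables, only one of which is mapped.
print(unmapped_variables(
    ["message", "tone"],
    {"message": "Customer message"},
    {"Customer message", "Notes"},
))
```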

## Filter and sort columns

Use **Filter** to narrow visible rows by column value. Use **Sort ascending**, **Sort descending**, and **Clear sort** to reorder or reset the grid.

## Pin, hide, resize, duplicate, and delete columns

Use column menu actions to manage grid layout:

* **Pin left** or **Pin right** keeps important columns visible while scrolling.
* **Hide** removes a column from the visible grid without deleting it.
* **Auto size column** resizes a column to fit visible content.
* **Duplicate** copies a column when you want a similar configuration.
* **Delete** removes the column.

## Run or stop a column

Use **Run column** when one computed step needs updating. If the column is queued, dispatched, or running, the menu can show **Stop column**.

## Use composition columns

Composition columns reference another sheet or table and pull values from a chosen source column. Use composition when multiple sheets should share a reusable intermediate result instead of copying logic into every sheet.

<Frame>
  <img alt="Composition column configuration that copies values from a source column in another sheet" />
</Frame>

## API references

<CardGroup>
  <Card title="Create column" icon="plus" href="/reference/table-sheet-columns-create">
    Add a text or computed column to a sheet.
  </Card>

  <Card title="Update column" icon="pen" href="/reference/table-sheet-columns-update">
    Update a column title, config, or dependencies.
  </Card>

  <Card title="List columns" icon="list" href="/reference/table-sheet-columns-list">
    List columns for a sheet.
  </Card>

  <Card title="Run operation" icon="play" href="/reference/table-sheet-operations-create">
    Queue work for a column, row, cell selection, or stale cells.
  </Card>
</CardGroup>


# History and Analytics
Source: https://docs.promptlayer.com/features/tables/history-and-analytics

Use version history, score history, diffs, and request analytics to review Table changes.

History and Analytics help you review how a Table changes over time and debug the requests behind computed work.

## Open history

Click **History** in the Table toolbar to open the version history panel.

<Frame>
  <img alt="History button in the Table toolbar" />
</Frame>

## Version history

Version history keeps checkpoints, score history, preview and diff views, and restore controls in one panel.

<Frame>
  <img alt="Version history panel showing the no saved versions empty state" />
</Frame>

Save versions around meaningful checkpoints so you can compare the sheet before and after important changes.

## When to save versions

Save a version before or after:

* Importing a new dataset or request-history sample.
* Changing a Prompt Template column.
* Editing source mappings.
* Adding or removing score columns.
* Rerunning a large sheet.
* Migrating a legacy Evaluation or Dataset workflow.
* Sharing results with another teammate.

## Use score history

Score history shows how quality changes over time. Use it to compare prompt versions, model or provider changes, assertion logic, dataset changes, bug fixes, and scoring configuration updates.

Score history is most useful when versions are saved at stable checkpoints.

## Preview and diff older versions

Use version previews to inspect what changed. Diff views can show before and after values for cells and configuration changes.

Use this when a score drops or a rerun produces unexpected outputs. Start from the score history change, open the related version, then inspect changed columns or cells.

## Restore a version

If an older version is the correct state, use the restore action from version history. The current version cannot be restored into itself. Older versions can be restored after confirmation.

Use restore carefully when other teammates may be using the Table.

## Open analytics

Click **Analytics** in the Table toolbar to inspect request-level data for work generated by the sheet.

Analytics opens with filters locked to the current Table and sheet, so request data stays scoped to the work you are reviewing.

<Frame>
  <img alt="Analytics drawer opened from a Table with locked Table and sheet filters, search, date range, and request analytics tabs" />
</Frame>

## Analytics drawer

Use Analytics to debug what happened behind a computed cell, trace a prompt request, or inspect request behavior for the current sheet.

## Column analytics

Some computed columns support request-log analytics directly from column or version preview controls. Use column analytics when you want to inspect the requests created by one computed column rather than the entire sheet.

## Review workflow

A typical review loop:

1. Run the sheet or selected stale work.
2. Open **Score** to confirm the summary result.
3. Save or inspect a version in **History**.
4. If the score changed unexpectedly, preview or diff the relevant version.
5. Use **Analytics** to inspect request logs for the sheet or column.
6. Fix inputs, mappings, prompts, code, or scoring, then rerun.

## API references

<CardGroup>
  <Card title="List versions" icon="list" href="/reference/table-sheet-versions-list">
    List saved versions for a sheet.
  </Card>

  <Card title="Create version" icon="floppy-disk" href="/reference/table-sheet-versions-create">
    Save a version checkpoint.
  </Card>

  <Card title="Get version" icon="magnifying-glass" href="/reference/table-sheet-versions-get">
    Read a saved version.
  </Card>

  <Card title="Score history" icon="chart-line" href="/reference/table-sheet-score-history-get">
    Read score history for a sheet.
  </Card>
</CardGroup>


# Migrate from Evaluations and Datasets
Source: https://docs.promptlayer.com/features/tables/migrate-from-evaluations-and-datasets

Move legacy Dataset, Evaluation, and Report workflows into Tables.

<Warning>
  Legacy Evaluations, Reports, and Datasets remain available for existing projects. Use Tables for new workflows.
</Warning>

Tables bring legacy Dataset, Evaluation, and Report workflows into one place: rows, computed steps, runs, scoring, history, and analytics.

## Convert a legacy view to a Table

When a legacy Dataset or Evaluation page shows a migration banner, click **Convert to Table**. PromptLayer creates the Table from the legacy object and opens it so you can review sheets, rows, columns, scoring, and run status before moving production workflows.

Use the conversion button first when it appears. Rebuild manually only when you are working from exports, logs, or a legacy workflow without the banner.

<Frame>
  <img alt="Legacy Evaluation page showing the Convert to Table migration button" />
</Frame>

## Concept mapping

| Legacy concept      | Table concept                        |
| ------------------- | ------------------------------------ |
| Dataset group       | Table                                |
| Dataset row         | Table row                            |
| Dataset column      | Text column                          |
| Dataset version     | Sheet version                        |
| Evaluation pipeline | Sheet with computed columns          |
| Evaluation step     | Computed column                      |
| Report run          | Recalculation operation              |
| Report output       | Computed cell output                 |
| Report score card   | Sheet score configuration            |
| Report history      | History, versions, and score history |

## Migration checklist

1. Convert from the legacy page when the banner is available.
2. Review generated sheets, rows, columns, scoring, and run status.
3. For manual rebuilds, import rows from CSV or request history.
4. Recreate missing evaluation steps as computed columns.
5. Map sources, configure scoring, and rerun stale or pending cells.
6. Compare outputs, score, history, and analytics before moving automation to the Tables API.

## Manual rebuild shortcuts

Use these paths when conversion is not available.

| Legacy workflow              | Table workflow                                                                                                                                                                  |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Dataset from CSV or JSON     | Create a Table, click **Upload**, choose **From computer**, then add computed columns and scoring in the same sheet.                                                            |
| Dataset from request history | Create a Table, click **Upload**, choose **From request history**, select requests, then click **Add to Sheet**.                                                                |
| Evaluation pipeline          | Put inputs in text columns, add one computed column for each evaluation step, map Prompt Template variables to source columns, run the computed columns, and configure scoring. |
| Report run                   | Run the Table, a computed column, or selected stale work. Inspect cells in place, then use **Score**, **Analytics**, and **History** to review results.                         |

<Frame>
  <img alt="Import request history directly into a Table sheet" />
</Frame>

## Automation mapping

| Legacy automation                | Tables API                                                                      |
| -------------------------------- | ------------------------------------------------------------------------------- |
| Create Dataset Group             | [Create Table](/reference/tables-create)                                        |
| Upload Dataset file              | [Create file import](/reference/table-sheet-imports-file-create)                |
| Create Dataset from request logs | [Create request-log import](/reference/table-sheet-imports-request-logs-create) |
| Add Evaluation step              | [Create column](/reference/table-sheet-columns-create)                          |
| Run Report                       | [Create operation](/reference/table-sheet-operations-create)                    |
| Poll Report status               | [Get operation](/reference/table-sheet-operations-get)                          |
| Read Report score                | [Get sheet score](/reference/table-sheet-score-get)                             |
| Save checkpoint                  | [Create sheet version](/reference/table-sheet-versions-create)                  |

## Recommended order

Migrate one workflow at a time. Review the generated Table before changing automation, confirm outputs and scores match the old workflow, then move automation to the Tables API. Keep the legacy workflow read-only until the Table is trusted.


# Overview
Source: https://docs.promptlayer.com/features/tables/overview

Understand the Table model and the main Tables workflow in PromptLayer.

Tables are the workspace for dataset, evaluation, backtesting, report, and batch workflows in PromptLayer. They keep inputs, computed outputs, scoring, history, and analytics together so a workflow can be built, run, and reviewed in one place.

The Table model has four objects:

| Object     | What it does                                                                                                                       |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| **Table**  | The top-level workspace for a related evaluation or batch workflow.                                                                |
| **Sheet**  | A tab with its own rows, columns, score configuration, history, and analytics.                                                     |
| **Column** | A field in the sheet. Text columns store data; computed columns run prompts, code, checks, extraction, composition, or other work. |
| **Cell**   | A row-column value. Computed cells track status, including queued, running, completed, failed, and stale.                          |

<Frame>
  <img alt="A new Table with options to create from a prompt, import request history, or start blank" />
</Frame>
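
As a mental model, the four objects described above can be sketched as plain Python data classes. This is an illustration only, not the Tables API schema: every class and field name is an assumption, and the status list covers only the statuses named on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class CellStatus(Enum):
    # Statuses named in the table above; the real list may be longer.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    STALE = "stale"


@dataclass
class Cell:
    value: object = None
    status: Optional[CellStatus] = None  # None for plain text cells


@dataclass
class Column:
    title: str
    computed: bool = False  # False means a text column


@dataclass
class Sheet:
    name: str
    columns: List[Column] = field(default_factory=list)
    rows: List[Dict[str, Cell]] = field(default_factory=list)


@dataclass
class Table:
    name: str
    sheets: List[Sheet] = field(default_factory=list)


def new_table(name):
    """Model a new Table: one sheet, one text column, and one row."""
    column = Column(title="Column 1")
    sheet = Sheet(name="Sheet 1", columns=[column], rows=[{column.title: Cell()}])
    return Table(name=name, sheets=[sheet])
```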

## Start a Table

A new Table opens with one sheet, one text column, and one row. The empty state gives three starting points:

1. **Create from a prompt**: set up prompt inputs, an output column, and evaluation checks.
2. **Import request history**: select logged requests and add them as rows.
3. **Start blank**: build the sheet manually from rows and columns.

Use **Create from a prompt** when the workflow starts from a Prompt Registry template. Use **Import request history** when production or test traffic is already logged. Use **Start blank** when you want to model the sheet yourself.

## Main workflow

Most Tables follow a simple loop:

1. Add rows from CSV, request history, or manual entry.
2. Add text columns for inputs, labels, expected answers, or metadata.
3. Add computed columns for prompts, code, assertions, extraction, comparisons, composition, or helper functions.
4. Map computed columns to their sources.
5. Run the sheet, a column, a row, or selected cells.
6. Review cell status, configure scoring, and use history or analytics to compare changes.

<Frame>
  <img alt="Table toolbar showing import, export, score, analytics, history, upload, download, and tour controls" />
</Frame>

## Product sections

<CardGroup>
  <Card title="Sheets" icon="table-cells" href="/features/tables/sheets">
    Manage sheet tabs, add rows, upload CSVs, and import request history.
  </Card>

  <Card title="Columns" icon="columns-3" href="/features/tables/columns">
    Add, configure, map, filter, sort, pin, duplicate, and run columns.
  </Card>

  <Card title="Column Types" icon="brackets-curly" href="/features/tables/column-types">
    Reference text columns, computed columns, evaluation columns, helper columns, and composition.
  </Card>

  <Card title="Cells and Runs" icon="play" href="/features/tables/cells-and-runs">
    Understand cell statuses, stale cells, reruns, selected runs, and cancellation.
  </Card>

  <Card title="Scoring" icon="gauge" href="/features/tables/scoring">
    Configure score columns, Boolean scoring, numeric scoring, custom code, and winner aggregation.
  </Card>

  <Card title="History and Analytics" icon="chart-line" href="/features/tables/history-and-analytics">
    Review saved versions, score history, diffs, request analytics, and request-level debugging.
  </Card>
</CardGroup>

## When to use Tables

Use Tables when you need a repeatable workflow over rows of examples: prompt regression tests, request-log replay, dataset creation, model comparisons, multi-step evaluations, human review queues, or batch jobs.

Use legacy Evaluations and Datasets only for existing legacy workflows. For new work, start in Tables.

## API references

Use the Tables API when you want to automate the same actions from code.

<CardGroup>
  <Card title="Create a Table" icon="plus" href="/reference/tables-create">
    Create a Table programmatically.
  </Card>

  <Card title="Create a sheet" icon="table-cells" href="/reference/table-sheets-create">
    Add a new sheet to an existing Table.
  </Card>

  <Card title="Create a column" icon="columns-3" href="/reference/table-sheet-columns-create">
    Add text or computed columns to a sheet.
  </Card>

  <Card title="Run an operation" icon="rotate" href="/reference/table-sheet-operations-create">
    Queue recalculation work for rows, columns, cells, or stale work.
  </Card>
</CardGroup>


# Scoring
Source: https://docs.promptlayer.com/features/tables/scoring

Configure sheet scores with score columns, Boolean and numeric scoring, custom code, and winner aggregation.

Scoring turns a sheet into a summary signal you can compare across runs, prompt versions, and workflow changes.

Click **Score** in the Table toolbar to open the score panel after computed columns produce the outputs or checks you care about.

<Frame>
  <img alt="Score button in the Table toolbar" />
</Frame>

## Score panel

The score panel shows the current result, column and sub-score breakdowns, configuration, and recalculation status.

<Frame>
  <img alt="Score panel showing average score, column breakdown, and score configuration" />
</Frame>

## Configure scoring

In **Scoring configuration**, choose a scoring mode and the columns that count toward the score.

<Frame>
  <img alt="Score configuration panel with scoring mode, score columns, Boolean token settings, assertion aggregation, and Recalculate" />
</Frame>

For non-custom, non-aggregate modes, choose **Score columns**. Changes save automatically; use **Recalculate** after changing score settings.

## Scoring modes

| Mode                             | Use when                                                                                    | Configuration                                                                   |
| -------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| **Auto detect (boolean/number)** | The selected score columns already produce booleans or numbers.                             | Select score columns and recalculate.                                           |
| **Boolean**                      | Selected columns produce pass/fail style outputs.                                           | Configure true tokens, false tokens, and assertion aggregation.                 |
| **Numeric**                      | Selected columns produce numeric values.                                                    | Select score columns and recalculate.                                           |
| **Custom code**                  | You need custom scoring logic across the sheet.                                             | Write Python or JavaScript that returns a deterministic scoring object.         |
| **Winner / aggregate**           | You want a qualitative result such as most frequent winner, lowest value, or highest value. | Choose an aggregate question, source column, and optional display label column. |

## Boolean scoring

Boolean mode converts selected column values into pass/fail results. Configure true tokens, false tokens, and assertion aggregation (`Mean`, `All`, or `Any`).

Use Boolean scoring for assertion columns, quality checks, moderation checks, format checks, and other pass/fail evaluations.
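
The sketch below shows one plausible reading of Boolean scoring. The exact matching and aggregation rules PromptLayer applies are not specified on this page, so case-insensitive token matching, skipping unmatched values, and the `Mean`/`All`/`Any` semantics shown here are assumptions, as are the function names.

```python
def to_bool(value, true_tokens, false_tokens):
    """Map a raw cell value to True/False, or None when no token matches."""
    token = str(value).strip().lower()
    if token in true_tokens:
        return True
    if token in false_tokens:
        return False
    return None  # treated as a skipped value in this sketch


def boolean_score(values, true_tokens, false_tokens, aggregation="Mean"):
    results = [to_bool(v, true_tokens, false_tokens) for v in values]
    results = [r for r in results if r is not None]
    if not results:
        return None
    if aggregation == "All":
        return 1.0 if all(results) else 0.0
    if aggregation == "Any":
        return 1.0 if any(results) else 0.0
    return sum(results) / len(results)  # "Mean": share of passing values


values = ["PASS", "fail", "yes", "n/a"]
true_tokens = {"pass", "yes", "true"}
false_tokens = {"fail", "no", "false"}
print(boolean_score(values, true_tokens, false_tokens, "All"))  # 0.0
print(boolean_score(values, true_tokens, false_tokens, "Any"))  # 1.0
```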

## Numeric scoring

Numeric mode averages selected numeric outputs. Use it when columns return scores, distances, similarity values, ratings, or normalized metrics.

For comparable version history, make sure higher values consistently mean better quality.
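
If a column reports a metric where lower is better, such as latency or cost, one option is to rescale it before scoring, for example in a code column. The helper below is a hypothetical sketch; the function name and the `worst` and `best` bounds are assumptions you would choose for your own metric.

```python
def higher_is_better(value, worst, best):
    """Rescale a metric to [0, 1] so that 1.0 is always the best outcome."""
    if worst == best:
        raise ValueError("worst and best must differ")
    scaled = (value - worst) / (best - worst)
    return max(0.0, min(1.0, scaled))  # clamp values outside the bounds


# Latency in ms: 2000 is the worst acceptable value, 200 the best.
print(higher_is_better(650, worst=2000, best=200))  # 0.75
```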

## Auto detect scoring

Auto detect chooses boolean or numeric handling based on the selected score column outputs. Use it when the selected columns are already clean booleans or numbers and you do not need custom token rules.

## Custom code scoring

Custom code mode scores the whole sheet with Python or JavaScript. The scorer receives sheet data and must return a deterministic object with a numeric `score`.

Required key:

```json theme={null}
{
  "score": 0.91
}
```

Optional keys:

```json theme={null}
{
  "score": 0.91,
  "sub_scores": {
    "coverage": 0.96,
    "quality": 0.88
  },
  "score_matrix": [[0.91, "pass"]]
}
```

Supported `score_matrix` shapes:

* 2D matrix: `list[list[cell]]`.
* Single-table 3D matrix: `list[list[list[cell]]]` where the top-level length is `1`.

Matrix cells can be numbers, strings, `null`, or objects like `{ "value": 0.92, "positive_metric": true }`.
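
A quick local check can catch an unsupported `score_matrix` before you save a scorer. The validator below is a sketch based only on the shapes and cell types listed above; the function names are hypothetical, and PromptLayer's own validation may be stricter.

```python
def is_cell(value):
    # Cells can be numbers, strings, None (JSON null), or objects (dicts).
    return value is None or isinstance(value, (int, float, str, dict))


def is_2d(matrix):
    return isinstance(matrix, list) and all(
        isinstance(row, list) and all(is_cell(cell) for cell in row)
        for row in matrix
    )


def matrix_shape(matrix):
    """Return "2d", "3d", or None for an unsupported shape."""
    if is_2d(matrix):
        return "2d"
    # A 3D matrix is accepted only when it wraps exactly one 2D table.
    if isinstance(matrix, list) and len(matrix) == 1 and is_2d(matrix[0]):
        return "3d"
    return None


print(matrix_shape([[0.91, "pass"]]))  # 2d
print(matrix_shape([[[0.91, {"value": 0.92, "positive_metric": True}]]]))  # 3d
print(matrix_shape([[[0.91]], [[0.5]]]))  # None
```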

Use custom code when score logic depends on multiple columns, row-level weighting, custom sub-scores, or a custom matrix display.
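
The following sketch shows what a scorer with row-level weighting and sub-scores might look like. How PromptLayer passes sheet data to the scorer is not described here, so the `rows` argument (a list of dicts keyed by column title), the `quality`, `answer`, and `weight` column names, and the function name are all assumptions.

```python
def score_sheet(rows):
    """Hypothetical scorer: `rows` is assumed to be a list of dicts keyed by column title."""
    if not rows:
        return {"score": 0.0}

    # Row-level weighting: rows with a larger "weight" count proportionally more.
    weights = [float(row.get("weight", 1.0)) for row in rows]
    total = sum(weights)
    quality = sum(w * float(r["quality"]) for w, r in zip(weights, rows)) / total

    # Coverage: share of rows that produced a non-empty answer.
    coverage = sum(1 for r in rows if r.get("answer")) / len(rows)

    return {
        "score": round(0.5 * quality + 0.5 * coverage, 4),
        "sub_scores": {
            "quality": round(quality, 4),
            "coverage": round(coverage, 4),
        },
    }


sample = [
    {"quality": 1.0, "answer": "x", "weight": 3},
    {"quality": 0.0, "answer": "", "weight": 1},
]
print(score_sheet(sample)["score"])  # 0.625
```

The function uses no randomness, clock reads, or network calls, so the same rows always produce the same object.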

## Winner and aggregate scoring

Winner / aggregate mode summarizes one column into a qualitative result.

Available questions:

| Question                | Use when                                                                           |
| ----------------------- | ---------------------------------------------------------------------------------- |
| **Most frequent value** | You want the value that appears most often, such as the most common winning model. |
| **Lowest value**        | You want the row with the smallest numeric value, such as lowest cost or latency.  |
| **Highest value**       | You want the row with the largest numeric value, such as highest quality score.    |

Choose the source **Column**. For lowest or highest value, optionally choose **Show winner as** to display a label from the same row instead of only the metric value.

Use aggregate scoring for model bakeoffs, latency comparisons, cost comparisons, routing decisions, or any sheet where the result is a winner rather than an average.
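
The three aggregate questions map to simple operations over one column. The sketch below uses made-up rows and hypothetical helper names to show the idea, including the optional label taken from the winning row.

```python
from collections import Counter


def most_frequent(rows, column):
    counts = Counter(row[column] for row in rows)
    return counts.most_common(1)[0][0]


def extreme(rows, column, lowest=True, label_column=None):
    pick = min if lowest else max
    winner = pick(rows, key=lambda row: row[column])
    # With a label column, show a value from the same row instead of the metric.
    return winner[label_column] if label_column else winner[column]


rows = [
    {"model": "model-a", "latency_ms": 820, "winner": "model-a"},
    {"model": "model-b", "latency_ms": 640, "winner": "model-b"},
    {"model": "model-c", "latency_ms": 910, "winner": "model-b"},
]
print(most_frequent(rows, "winner"))  # model-b
print(extreme(rows, "latency_ms", label_column="model"))  # model-b
print(extreme(rows, "latency_ms", lowest=False))  # 910
```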

## Read the score

Use the score summary to compare the average score or aggregate result, inspect column and sub-score breakdowns, and see skipped values or recalculation errors.

## Recalculate after changes

Recalculate the score after:

* Changing the scoring mode.
* Changing score columns.
* Updating true or false tokens.
* Editing custom scorer code.
* Changing aggregate settings.
* Rerunning computed cells that feed the score.

## API references

<CardGroup>
  <Card title="Get sheet score" icon="gauge" href="/reference/table-sheet-score-get">
    Read the current score result.
  </Card>

  <Card title="Configure score" icon="sliders" href="/reference/table-sheet-score-configure">
    Configure scoring for a sheet.
  </Card>

  <Card title="Recalculate score" icon="rotate" href="/reference/table-sheet-score-recalculate">
    Queue a score recalculation.
  </Card>

  <Card title="Score history" icon="chart-line" href="/reference/table-sheet-score-history-get">
    Read score history for a sheet.
  </Card>
</CardGroup>


# Sheets
Source: https://docs.promptlayer.com/features/tables/sheets

Import data, manage sheet tabs, add rows, and scope Table work by sheet.

A sheet is a tab inside a Table. Each sheet has its own rows, columns, scoring, version history, and analytics.

Use multiple sheets when related workflows belong together but need different inputs, columns, or test cases. For example, keep a main regression sheet, edge cases, and migration work in the same Table.

## Sheet anatomy

A sheet contains:

| Part                    | Purpose                                                                       |
| ----------------------- | ----------------------------------------------------------------------------- |
| **Rows**                | Examples, request logs, test cases, or batch items.                           |
| **Text columns**        | Editable inputs, labels, expected outputs, metadata, and notes.               |
| **Computed columns**    | Prompt, code, assertion, extraction, comparison, composition, or helper work. |
| **Score configuration** | The scoring rules for the current sheet.                                      |
| **History**             | Saved versions and score history for the sheet.                               |
| **Analytics**           | Request-level analytics scoped to the current Table and sheet.                |

<Frame>
  <img alt="A Table sheet with rows, columns, toolbar actions, and sheet tabs" />
</Frame>

## Add rows manually

Each row is one item to run through the sheet. Use the row control at the bottom of the grid to add rows, then fill the input text columns.

<Frame>
  <img alt="Add row control in a Table sheet" />
</Frame>

Manual rows are useful for small test sets, edge cases, and examples you want to type or paste directly into the sheet.

## Upload data

Use **Upload** when your rows already exist outside PromptLayer. Upload a CSV from your computer or import logged requests from request history.

<Frame>
  <img alt="Upload menu with CSV upload and request history import options" />
</Frame>

After import, add computed columns and scoring in the same sheet.
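
If your examples live in code rather than a spreadsheet, you can generate the CSV with the Python standard library before uploading it. This sketch assumes the first CSV row is read as column headers; the `question` and `expected` column names are placeholders.

```python
import csv
import io

rows = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text.splitlines()[0])  # question,expected
```

Write `csv_text` to a `.csv` file, then choose **Upload** and **From computer**.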

## Import request history

Choose **Upload**, then **From request history**, to open the request import dialog.

<Frame>
  <img alt="Add from Request History dialog with date range, search, filters, request grid, pagination, selectable rows, and Add to Sheet action" />
</Frame>

Each selected request becomes a row. Use request-history imports to evaluate real traffic, build datasets from production behavior, or replay previous requests through new columns.

## Download a sheet

Click **Download** in the toolbar to export the current sheet for external review, offline analysis, or a point-in-time snapshot.

<Frame>
  <img alt="Table toolbar with Upload, Download, Score, Analytics, History, and Tour controls" />
</Frame>

## Use multiple sheets

Use the sheet tabs at the bottom of the Table to move between sheets. Each sheet can have different rows, columns, scoring, and history.

<Frame>
  <img alt="Multiple sheet tabs in a Table with a plus control for adding another sheet" />
</Frame>

Use separate sheets for:

* Main regression set versus edge cases.
* Different task families in the same product area.
* Prompt variants that need different inputs.
* Migration work where the old and new workflows should stay near each other.
* Intermediate sheets that feed composition columns.

## API references

<CardGroup>
  <Card title="Create sheet" icon="plus" href="/reference/table-sheets-create">
    Add a sheet to a Table.
  </Card>

  <Card title="List sheets" icon="list" href="/reference/table-sheets-list">
    List the sheets in a Table.
  </Card>

  <Card title="Add rows" icon="rows-3" href="/reference/table-sheet-rows-add">
    Add rows to a sheet programmatically.
  </Card>

  <Card title="Import request logs" icon="history" href="/reference/table-sheet-imports-request-logs-create">
    Add logged requests to a sheet.
  </Card>
</CardGroup>


# Auto Tool Execution
Source: https://docs.promptlayer.com/features/tool-registry/auto-execution


Define a tool's **execution body** alongside its schema and PromptLayer will run it for you in a sandbox between LLM turns. Your prompt runs end-to-end without you writing a tool-call loop in your application code.

## The Tool-Call Loop

Tool calling is a two-step dance. The LLM emits a `tool_call`, your app reads it, runs your function, sends the `tool_result` back, and the LLM produces a final answer. That loop lives in *your* code, and you have to write it for every prompt that uses tools, in every language, in every service.

Most of that loop is identical: parse the tool call, dispatch to the right function, capture the return, send it back, repeat until the model is done. It's boilerplate that grows linearly with the number of tools and services.

## Letting PromptLayer Drive

Attach an **execution body** to a tool in the registry. When a prompt that references the tool is run, PromptLayer:

1. Calls the LLM with the tool's schema
2. If the LLM emits a `tool_call` for the tool, runs the body in a sandbox
3. Feeds the return value back as a `tool_result`
4. Calls the LLM again
5. Repeats until the model returns a plain message (no more tool calls) or the loop reaches its hard cap of 10 LLM calls

Your client just calls `pl.run(...)` and gets a final answer back. The whole tool-call loop happens inside PromptLayer.

```
Your app: pl.run(blueprint)
       ↓
   LLM call 1 → tool_call: get_weather({city: "Paris"})
       ↓
   sandbox runs your body → {temp_c: 18, ...}
       ↓
   LLM call 2 → final answer
       ↓
Your app: receives answer
```
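
The flow above can be sketched as a small loop. This is an illustration of the pattern, not PromptLayer's code: `run_with_tools`, `fake_llm`, and the message shapes are hypothetical stand-ins, and the cap is modeled as a count of LLM calls.

```python
MAX_LLM_CALLS = 10  # hard cap on the loop


def run_with_tools(call_llm, tools, messages):
    """Illustrative loop only; `call_llm` and the message shapes are hypothetical."""
    response = None
    for _ in range(MAX_LLM_CALLS):
        response = call_llm(messages)
        tool_calls = response.get("tool_calls", [])
        if not tool_calls:
            return response  # plain message: the loop is done
        messages.append(response)
        for call in tool_calls:
            result = tools[call["name"]](call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return response  # cap reached with tool calls still pending


def fake_llm(messages):
    # Stand-in model: asks for the weather once, then answers.
    if not any(m.get("role") == "tool" for m in messages):
        call = {"name": "get_weather", "args": {"city": "Paris"}}
        return {"role": "assistant", "tool_calls": [call]}
    return {"role": "assistant", "content": "It is 18 degrees and sunny in Paris."}


def get_weather(args):
    return {"city": args["city"], "temp_c": 18, "forecast": "sunny"}


answer = run_with_tools(
    fake_llm,
    {"get_weather": get_weather},
    [{"role": "user", "content": "Weather in Paris?"}],
)
print(answer["content"])  # It is 18 degrees and sunny in Paris.
```

With auto execution, this loop runs inside PromptLayer, so your application does not need its own version of it.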

## Writing an Execution Body

In the Tool Registry editor, turn on the **Execution** panel and write the body of your tool function. You write only the body; the signature is auto-generated from the tool's name and shown above the editor as a read-only line. For example, a body for a weather tool:

<CodeGroup>
  ```python Python theme={null}
  return {"city": args["city"], "temp_c": 18, "forecast": "sunny"}
  ```

  ```javascript JavaScript theme={null}
  return { city: args.city, temp_c: 18, forecast: "sunny" };
  ```
</CodeGroup>

The arguments the LLM emits arrive as a single `args` object. Access individual fields with `args["name"]` (Python) or `args.name` (JavaScript). Return whatever JSON-serializable value you want the LLM to see.

<Tabs>
  <Tab title="Python">
    <Frame>
      <img alt="Tool execution editor — Python body" />
    </Frame>
  </Tab>

  <Tab title="JavaScript">
    <Frame>
      <img alt="Tool execution editor — JavaScript body" />
    </Frame>
  </Tab>
</Tabs>

<Tip>
  The body is wrapped server-side as `def <tool_name>(args): <your body>` (Python) or `function <tool_name>(args) { <your body> }` (JavaScript). You can define helper functions inside the body; they live in the same scope.
</Tip>
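
To picture the wrapping, here is roughly what the sandbox would run for a Python tool named `get_weather`, following the rule in the tip above. In the editor you would write only the indented lines. The `to_fahrenheit` helper and the `temp_f` field are illustrative additions.

```python
# Wrapped form of a body for a tool named `get_weather`.
# Only the indented lines are the body you write in the editor.
def get_weather(args):
    def to_fahrenheit(temp_c):
        # Helper defined inside the body; it shares the body's scope.
        return round(temp_c * 9 / 5 + 32, 1)

    temp_c = 18  # placeholder value, as in the earlier example
    return {
        "city": args["city"],
        "temp_c": temp_c,
        "temp_f": to_fahrenheit(temp_c),
    }


print(get_weather({"city": "Paris"}))
```

Running the wrapped function locally like this is a convenient way to check a body before pasting it into the editor.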

## Testing Your Body

Click **Test run** at the top of the editor to open the test dialog. Fill in the schema's parameters, click **Run Test**, and see the body's return value rendered in the Output column. The dialog runs against the same sandbox auto-execution uses, so anything that works here will work at LLM call time.

<Frame>
  <img alt="Tool test run dialog" />
</Frame>

## Supported Languages

* **Python 3**: standard library available; common third-party packages installed in the sandbox
* **JavaScript**: modern ES syntax, Node-compatible

Pick the language with the dropdown at the top of the editor. The signature line updates to match.

## Behaviour Inside a Prompt

When you run a prompt that contains a registry tool with execution code:

* **Single tool call per turn** → executed. The result is fed back and the LLM is called again.
* **Multiple tool calls in one turn, all executable** → all run, results fed back together.
* **Mixed turn (some executable, some not)** → auto-execution stops for that turn and returns the response. This avoids sending partial `tool_results`, which the provider would reject with a 400 error. Handle the remaining tool calls in your own code.
* **Registry tool without execution code** → auto-execution skips it; behaves exactly like a normal function tool.

The loop is capped at 10 LLM calls per request to prevent runaway loops. If the cap is hit while tool calls are still pending, the loop halts and returns the latest response.
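
The per-turn rules above reduce to a small decision function. The sketch below is a hypothetical model of that decision, not PromptLayer's code; `plan_turn` and its return labels are made-up names.

```python
def plan_turn(tool_calls, executable):
    """Decide how one LLM turn is handled.

    `tool_calls` lists the tool names requested in the turn; `executable`
    is the set of registry tools that have an execution body.
    """
    if not tool_calls:
        return "done"  # plain message: the loop ends
    if all(name in executable for name in tool_calls):
        return "execute"  # run every call and feed results back together
    return "return"  # mixed or schema-only calls: hand back to the caller


executable = {"get_weather", "format_date"}
print(plan_turn(["get_weather"], executable))  # execute
print(plan_turn(["get_weather", "send_email"], executable))  # return
```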

## Errors

| What happens                                      | What you get back                                                                                                                                         |
| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Your body raises (e.g. `KeyError`, `TypeError`)   | The error message is fed to the LLM as the `tool_result`. The model usually recovers: apologises, retries with different args, or asks you for more info. |
| Your body times out or the sandbox is unreachable | Hard failure. Propagates as a `CodeExecutionError` from `pl.run`. No partial result.                                                                      |
| Your body returns a non-JSON-serializable value   | Hard failure with a clear error message. Fix the return value and retry.                                                                                  |

This split keeps the LLM resilient to expected errors (bad args, missing keys) but lets infrastructure failures surface so you can see and fix them.
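
Because a raised error's message becomes the `tool_result`, it helps to make that message actionable for the model. The sketch below shows a hypothetical `convert_currency` body in its wrapped form; the tool name, argument names, and static rates are invented for illustration.

```python
import json


def convert_currency(args):
    # Hypothetical tool body, shown wrapped for readability.
    rates = {"USD": 1.0, "EUR": 0.9}  # made-up static rates for illustration

    missing = [key for key in ("amount", "to") if key not in args]
    if missing:
        # The model sees this message and can retry with complete arguments.
        raise ValueError(f"Missing required argument(s): {', '.join(missing)}")
    if args["to"] not in rates:
        raise ValueError(f"Unsupported currency {args['to']!r}; use one of {sorted(rates)}")

    # Return plain JSON types only (dict, list, str, number, bool, None).
    amount = round(float(args["amount"]) * rates[args["to"]], 2)
    return {"amount": amount, "currency": args["to"]}


# Local pre-check: json.dumps raises TypeError if the value is not serializable.
print(json.dumps(convert_currency({"amount": 10, "to": "EUR"})))
```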

## Environment Variables

Execution bodies often need secrets — API keys, tokens, connection strings — that shouldn't be hard-coded in the editor (editor content is versioned and visible to anyone with workspace access).

Use **environment variables** to inject secrets at execution time. There are two scopes:

* **Workspace env vars** — shared across all tools in the workspace. Set them once in **Settings → Environment Variables**.
* **Tool env vars** — scoped to a specific tool. Set them in the tool's **Configure** dialog. Tool-level values override workspace-level values when the same key exists in both.

Inside an execution body, read them with the standard runtime API:

<CodeGroup>
  ```python Python theme={null}
  import os

  api_key = os.environ["MY_API_KEY"]
  ```

  ```javascript JavaScript theme={null}
  const apiKey = process.env.MY_API_KEY;
  ```
</CodeGroup>

Values are encrypted at rest and only the last 4 characters are shown in the UI. Variables are injected into the sandbox process at runtime and never appear in the versioned source code.

You can also create and list env vars through the [public API](/reference/env-vars-workspace-list).

## Sandbox & Security

Execution bodies run in an isolated sandbox per request. Each invocation gets a fresh process: no shared state between calls, no access to your application's environment or filesystem. Standard libraries plus the common Python/JS ecosystem are available.

<Warning>
  The execution body is your code. Treat it like production code. Never hard-code secrets directly in the body — use [environment variables](#environment-variables) instead.
</Warning>

## Enabling Auto Execution

1. Open a tool in the Tool Registry
2. Click the **Execution** toggle at the top of the editor
3. Pick the language and write the body
4. **Save Version**

The next time a prompt referencing this tool runs through PromptLayer, the body will fire automatically. No code changes in your application required.

<Frame>
  <img alt="Tool registry editor with Schema and Execution panels enabled" />
</Frame>

## When To Use It

**Good fits:**

* Pure data transformations (`format_date`, `parse_address`)
* Calls to public APIs (`fetch_weather`, `get_stock_price`)
* Computations the LLM is bad at (math, sorting, dedup)
* Glue logic you don't want repeated across services

Anything else (private database access, auth-bound logic, long-running operations, side effects like sending real emails or charging payments) is better handled in your own application. To keep a tool schema-only, uncheck the **Execution** toggle in the editor and save a new version. The LLM emits a `tool_call` and your application handles it like a normal function tool.


# Overview
Source: https://docs.promptlayer.com/features/tool-registry/overview


The Tool Registry is a centralized store for tool definitions (function schemas) that can be reused across prompts. Instead of copying the same tool definition into every prompt, define it once and reference it everywhere.

Tools can also carry an **execution body** that PromptLayer runs in a sandbox between LLM turns, so prompts that use them complete end-to-end without your application writing a tool-call loop. See [Auto Tool Execution](/features/tool-registry/auto-execution) for details.

## The Problem

Without a registry, tool definitions get copy-pasted across prompts. A `get_weather` tool might exist in 20 different prompts, each with slightly different parameter names, descriptions, or schemas. When you need to update it, you have to find and edit every copy. Some get missed, and now your prompts behave inconsistently.

## How the Tool Registry Solves This

**Define once, use everywhere.** Create a tool in the registry, then add it to any prompt as a reference. Your prompts store a lightweight pointer — the actual definition is resolved at runtime from the registry.

```
Prompt A  →  references "get_weather" (production label)  →  resolved at API call time
Prompt B  →  references "get_weather" (production label)  →  same definition, always in sync
Prompt C  →  references "get_weather" (staging label)     →  testing a new version
```

When you update the tool, every prompt using it gets the update automatically. No code changes, no redeployments.

## Tool Schema as a Contract

A tool's function schema defines what the LLM can call — the function name, its parameters, and their types. Changing a schema changes LLM behavior. The Tool Registry treats schemas as versioned contracts:

* Every edit creates a new **immutable version** with a commit message
* You can **diff** any two versions side-by-side to see exactly what changed
* The full version history gives you an **audit trail** of every contract change
* **Release labels** (like "production") let you control which version is live

This means you always know what the LLM was calling at any point in time, who changed it, and why.
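
To make the idea of diffing a contract concrete, here is a small sketch that compares two versions of a schema with Python's standard `difflib`. The two version dictionaries and the `schema_diff` helper are made up for illustration; the registry UI produces its own diff view.

```python
import difflib
import json

# Two hypothetical versions of the same tool's parameters.
v1 = {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
v2 = {
    "type": "object",
    "properties": {"location": {"type": "string"}, "unit": {"type": "string"}},
    "required": ["location"],
}


def schema_diff(old: dict, new: dict) -> list[str]:
    """Return a unified diff of two schemas after stable JSON formatting."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, "v1", "v2", lineterm=""))


diff = schema_diff(v1, v2)
print("\n".join(diff))
```

Sorting keys before diffing keeps the output focused on real contract changes rather than key order.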

## Know What Will Break

The **References** tab on each tool shows exactly which prompts use it. Before changing a parameter or deleting a tool, you can see the blast radius:

* Which prompts reference this tool
* Which labels they use
* How many are affected

When you delete a tool, a warning dialog lists every affected prompt so you can make an informed decision.

<Frame>
  <img alt="Tool Registry overview page" />
</Frame>

## Creating a Tool

Navigate to the registry home page and click **+ New** → **Tool**. Each tool holds a single function definition in OpenAI function-calling format:

```json theme={null}
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City name"
        }
      },
      "required": ["location"]
    }
  }
}
```

Use the **Interactive Editor** for a form-based experience, or the **JSON Editor** for direct schema editing.


# Save Inline to Registry
Source: https://docs.promptlayer.com/features/tool-registry/save-inline-to-registry


Already have a tool defined inline in a prompt? You don't need to recreate it in the registry. Save it directly with one click.

## How It Works

1. Open a prompt in the **Playground** that has an inline function tool
2. Click **Save as...** on the tool in the functions list
3. Choose a **name** and **folder** location
4. Click **Save**

The inline tool definition is saved to the registry, and the inline reference in your prompt is replaced with a registry reference automatically.

<Frame>
  <img alt="Save inline tool to registry" />
</Frame>

## When to Use This

**Prototyping → Production workflow:**

You're iterating on a tool schema in the Playground — tweaking parameters, testing with the LLM, getting it right. Once it works, save it to the registry so other prompts can reuse it.

**Migrating existing prompts:**

You have 10 prompts that each define the same `get_weather` tool inline. Open one, save its tool to the registry, then update the other 9 to use the registry reference instead of their inline copies. Now you have one source of truth.

## What Happens After Saving

* A new tool is created in the registry with version 1
* The tool definition is frozen as the first version
* Your prompt's inline tool is replaced with a registry reference
* The prompt now resolves the tool from the registry at runtime

<Tip>
  Only inline function tools show the "Save as..." button. Built-in tools (like web search) and tool variables cannot be saved to the registry.
</Tip>


# Using Tools in Prompts
Source: https://docs.promptlayer.com/features/tool-registry/using-in-prompts


Registry tools can be added to any prompt in the Playground. Instead of defining tool schemas inline, you reference them from the registry — keeping prompts clean and tool definitions centralized.

## Adding a Registry Tool

1. Open a prompt in the **Playground**
2. Click **+ Add Tool** → **From Registry**
3. Browse or search for the tool you want
4. Select a **version** or **release label** (e.g., "production")
5. The tool is added as a reference

The tool appears in your functions list with the tool name and label badge (if you selected one).

<Frame>
  <img alt="Registry tool reference in prompt" />
</Frame>

<Tip>
  The LLM never sees registry references. At render time, they're transparently replaced with the resolved function definitions.
</Tip>

## What Gets Stored

Your prompt template doesn't store the full tool definition. It stores a lightweight reference:

```json theme={null}
{
  "type": "registry",
  "tool_registry_id": 5,
  "label": "production"
}
```

When the prompt is fetched via the API, this reference is replaced with the actual function definition. This means:

* Updating the tool in the registry updates **all prompts** using it
* Moving a label to a new version takes effect **immediately**
* No prompt edits or redeployments needed
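
The resolution step can be sketched as a lookup from reference to label to version. Everything below is an assumption made for illustration: the in-memory `REGISTRY` layout and `resolve_tools` are hypothetical and only mirror the reference shape shown above.

```python
# Hypothetical in-memory registry: tool id -> labels and immutable versions.
REGISTRY = {
    5: {
        "labels": {"production": 1},
        "versions": {
            1: {"type": "function", "function": {"name": "get_weather", "description": "v1"}},
            2: {"type": "function", "function": {"name": "get_weather", "description": "v2"}},
        },
    }
}


def resolve_tools(tools: list[dict]) -> list[dict]:
    """Replace registry references with function definitions; pass inline tools through."""
    resolved = []
    for tool in tools:
        if tool.get("type") == "registry":
            entry = REGISTRY[tool["tool_registry_id"]]
            version = entry["labels"][tool["label"]]
            resolved.append(entry["versions"][version])
        else:
            resolved.append(tool)
    return resolved


stored_tools = [{"type": "registry", "tool_registry_id": 5, "label": "production"}]
before = resolve_tools(stored_tools)

REGISTRY[5]["labels"]["production"] = 2  # move the label; the stored prompt is unchanged
after = resolve_tools(stored_tools)
```

The stored reference never changes, yet `after` resolves to the second version, which is why moving a label needs no prompt edit.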

## Running in the Playground

Registry tools work seamlessly in the Playground. When you click **Run**, the tool definitions are resolved from the registry and sent to the LLM. The LLM sees standard function definitions — it has no concept of the registry.

## Auto Execution

If the referenced tool has an **execution body** attached, PromptLayer runs it in a sandbox between LLM turns. The LLM emits a `tool_call`, the body runs, the result is fed back, and the LLM is called again. Your application receives the final answer without writing a tool-call loop. See [Auto Tool Execution](/features/tool-registry/auto-execution) for the full behaviour.

If the tool is schema-only (no body), nothing changes. The `tool_call` is returned to your application like a normal function tool.

## Deleting a Tool

When you delete a tool from the registry:

* A warning dialog shows all affected prompts
* The tool reference is removed from all prompt templates
* The tool is soft-deleted

<Warning>
  Deleting a tool removes it from all prompts that reference it. Check the References tab first to understand the impact.
</Warning>

## API

See the [API Reference](/reference/tool-registry-list) for the full list of endpoints, or check the [Overview](/features/tool-registry/overview) page for common curl examples.


# xAI (Grok)
Source: https://docs.promptlayer.com/features/xai-integration

Set up xAI Grok models as a custom provider in PromptLayer.

[xAI](https://x.ai/) provides the Grok family of large language models that can be integrated with PromptLayer through custom providers. Grok models offer advanced reasoning capabilities and real-time knowledge through X (Twitter) integration.

## Setting Up xAI as a Custom Provider

To use Grok models in PromptLayer:

1. Navigate to **Settings → Custom Providers and Models** in your PromptLayer dashboard
2. Click **Create Custom Provider**
3. Configure the provider with the following details:
   * **Name**: xAI (or your preferred name)
   * **Client**: OpenAI
   * **Base URL**: `https://api.x.ai/v1`
   * **API Key**: Your xAI API key (get one at [x.ai](https://x.ai))

<Note>
  xAI uses OpenAI-compatible endpoints, which is why we select OpenAI as the client type.
</Note>

## Creating Custom Models (Recommended)

For easier model selection in the Playground and Prompt Registry, you can create custom models:

1. In **Settings → Custom Providers and Models**, find your xAI provider in the list
2. Click on the xAI row to expand it
3. Click **Create Custom Model**
4. Configure each model:
   * **Provider**: Select the xAI provider you created
   * **Model Name**: Enter the Grok model identifier (e.g., `grok-4-fast-reasoning`, `grok-3`)
   * **Display Name**: A friendly name like "Grok 4 Fast" or "Grok 3"
   * **Model Type**: Chat
5. Repeat for each model you want to use

This allows you to select Grok models directly from the dropdown instead of typing them manually.

## Available Models

xAI regularly updates its model offerings. Example models include:

* **`grok-4-fast-reasoning`**: Latest Grok 4 with fast reasoning (2M context)
* **`grok-4-fast-non-reasoning`**: Grok 4 optimized for speed (2M context)
* **`grok-code-fast-1`**: Specialized for code generation tasks
* **`grok-3`**: Grok 3 with advanced reasoning capabilities
* **`grok-2-vision-1212`**: Grok 2 with vision capabilities

For the complete and up-to-date list of available models and their capabilities, visit [xAI's official model documentation](https://docs.x.ai/docs/models).

## Using Grok in PromptLayer

### In the Playground

After setup, you can use Grok models in the PromptLayer Playground:

1. Open the Playground
2. Select your xAI provider from the provider dropdown
3. Choose your desired Grok model
4. Start querying with your prompts

### In the Prompt Registry

Grok models work seamlessly with PromptLayer's Prompt Registry:

* Select Grok models when creating or editing prompt templates
* Use templates with Grok models in evaluations
* Track and analyze xAI API usage alongside other providers

### Parameter Compatibility

<Warning>
  Some OpenAI parameters may not be compatible with Grok models. If you encounter errors:

  * Remove unsupported parameters like `seed`
  * Check [xAI's documentation](https://docs.x.ai/docs/models) for supported parameters
  * Use the Playground to test parameter compatibility before deploying
</Warning>
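
If you build request parameters in your own code, one way to apply the first suggestion is to filter them before sending. This is a hedged sketch: `strip_unsupported` and `UNSUPPORTED_PARAMS` are hypothetical names, and `seed` is listed only because the warning above names it. Check xAI's documentation for the actual set.

```python
UNSUPPORTED_PARAMS = frozenset({"seed"})  # example from the warning above; extend as needed


def strip_unsupported(params: dict, unsupported: frozenset = UNSUPPORTED_PARAMS) -> dict:
    """Return a copy of the parameters without keys the provider rejects."""
    return {key: value for key, value in params.items() if key not in unsupported}


openai_style_params = {"temperature": 0.2, "max_tokens": 256, "seed": 42}
grok_params = strip_unsupported(openai_style_params)
```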

## SDK Usage

Once you've set up your xAI custom provider and created a prompt template in the dashboard, you can run it programmatically with the PromptLayer SDK:

```python theme={null}
from promptlayer import PromptLayer

promptlayer = PromptLayer(api_key="pl_****")

# Run a prompt template that uses your xAI custom provider
# (Your template should be configured to use a Grok model like grok-4-fast-reasoning)
response = promptlayer.run(
    prompt_name="your-grok-prompt",
    input_variables={"topic": "quantum computing"}
)

# Access the response
print(response["raw_response"].choices[0].message.content)

# The request is automatically logged with request_id
print(f"Request ID: {response['request_id']}")
```

<Info>
  Using [`promptlayer.run()`](/sdks/python#using-the-run-method-recommended) ensures your requests are properly logged to PromptLayer and leverages your prompt templates from the Prompt Registry. This is the recommended approach for production use, and it handles any parameter compatibility differences automatically based on your template configuration.
</Info>

## Related Documentation

* [Custom Providers](/features/custom-providers)
* [Supported Providers](/features/supported-providers)
* [xAI Official Documentation](https://docs.x.ai/docs/models)


# Backtest Prompt Changes
Source: https://docs.promptlayer.com/onboarding-guides/backtesting-prompt-changes

Test a new prompt version against historical request data before releasing it.

<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Backtesting lets you run a new prompt version against real historical inputs. Use it when you want to understand how a prompt change would have affected production or staging traffic.

## Create a historical dataset

Go to **Datasets** and click **Add from Request History**. This opens a request log browser where you can filter and select requests.

<Frame>
  <img alt="Adding from request history" />
</Frame>

Filter by prompt name, date range, metadata, score, tag, or request content. Select the requests you want and click **Add Requests**.

The dataset captures the real inputs users sent, along with the outputs your current prompt produced.

## Run a backtest

Create an evaluation that runs your new prompt version against the historical dataset.

Add columns for:

* **New prompt output**: The response from your updated prompt version
* **Comparison**: An equality comparison, semantic similarity check, LLM-as-judge score, or human review column

<Frame>
  <img alt="Backtest results" />
</Frame>

Review the differences before assigning a production release label to the new version.

## Automate backtests

Attach the backtest evaluation to your prompt so it runs when you save a new version. This creates a regression check before the change reaches production.

Learn more in [Continuous Integration](/features/evaluations/continuous-integration).

## Next steps

* [Create datasets from history](/features/evaluations/datasets-create-from-history)
* [Evaluation pipelines](/features/evaluations/building-pipelines)
* [Release labels](/features/prompt-registry/release-labels)


# Run Batch Jobs
Source: https://docs.promptlayer.com/onboarding-guides/batch-runs

Use evaluations as spreadsheet-style batch runs for labeling, research, generation, and enrichment.

<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Evaluations are not only for testing prompts. You can also use them as batch jobs where each row is an input and each column is an AI-powered computation.

## Common use cases

* **Data labeling**: Run a prompt over production examples to create labeled datasets
* **Research**: Process a list of companies, people, or documents
* **Content generation**: Generate summaries, replies, emails, or descriptions in bulk
* **Data enrichment**: Add company, location, category, or other attributes to a list

## Create the dataset

Upload a CSV, create rows manually, or build a dataset from request history. Each row should contain the fields your prompt needs as input variables.

Learn more in [Datasets](/features/evaluations/datasets-overview).
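
Before uploading a CSV, it can help to confirm that its header covers every input variable your prompt expects. The check below is a local sketch using Python's standard `csv` module; `missing_columns` and the sample data are hypothetical.

```python
import csv
import io


def missing_columns(csv_text: str, input_variables: set[str]) -> set[str]:
    """Return the input variables that have no matching CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return input_variables - set(reader.fieldnames or [])


sample_csv = "company,website\nAcme,acme.test\n"
missing = missing_columns(sample_csv, {"company", "industry"})
```

In this example the prompt expects an `industry` variable that the file does not provide, so `missing` contains it.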

## Add prompt columns

Create an evaluation and add one or more **Prompt Template** columns. Map dataset columns to the prompt input variables.

You can chain columns together when later prompts depend on earlier outputs.

## Run and export

Run the full batch, review the results, and export the completed dataset when you are done.

You do not need permanent evaluation infrastructure for this workflow. Create a dataset, add prompt columns, run the batch, export the results, and move on.

## Next steps

* [Evaluation pipelines](/features/evaluations/building-pipelines)
* [Dataset from CSV](/features/evaluations/datasets-create-from-file)
* [Dataset from request history](/features/evaluations/datasets-create-from-history)


# Compare Models
Source: https://docs.promptlayer.com/onboarding-guides/compare-models

Compare prompt outputs across providers and models with an evaluation pipeline.

<Warning>
  Legacy Evaluations, Reports, and Datasets are deprecated for new workflows. Use [Tables](/features/tables/overview) for new evaluation, dataset, report, backtesting, and batch workflows. See [Migrate from Evaluations and Datasets](/features/tables/migrate-from-evaluations-and-datasets).
</Warning>

Use model comparison when you want to test the same prompt across GPT, Claude, Gemini, or another provider before choosing a production model.

## Before you start

You need:

* A saved prompt template
* A dataset with the input variables your prompt expects
* Provider API keys configured for the models you want to compare

## Create a comparison evaluation

Create a new evaluation and select your dataset.

Add multiple **Prompt Template** columns. Configure each column with the same prompt template, then set a different provider or model override for each column.

<Frame>
  <img alt="Comparing models" />
</Frame>

Run the evaluation. Each row shows the prompt output from every model side by side.

## Score the outputs

Add an **LLM-as-judge**, human grading, equality comparison, or code evaluator column to score the model outputs against your criteria.

For example, you can score whether each output:

* Follows the requested format
* Answers the user correctly
* Avoids hallucinated details
* Meets latency or cost expectations for the use case

Use the results to choose the best price, latency, and quality balance.

## Next steps

* [Evaluation pipelines](/features/evaluations/building-pipelines)
* [Evaluation types](/features/evaluations/eval-types)
* [Supported providers](/features/supported-providers)
* [Custom providers](/features/custom-providers)


# Choose a Deployment Strategy
Source: https://docs.promptlayer.com/onboarding-guides/deployment-strategies

Choose from four integration patterns: direct SDK calls, webhook caching, GitOps with CI/CD, or managed workflows.

PromptLayer fits into your stack at four levels of sophistication:

1. **`promptlayer_client.run`** – zero-setup SDK sugar
2. **Webhook-driven caching** – maintain a local cache of prompt templates
3. **GitOps with Webhooks** – keep Git as your source of truth with bi-directional sync
4. **Managed Workflows** – let PromptLayer orchestrate everything server-side

<Frame>
  <img alt="PromptLayer Integration Patterns" />
</Frame>

***

# Use `promptlayer_client.run` (quickest path)

For the fastest setup, call `promptlayer_client.run()` directly from your application code.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  pl_client = PromptLayer(api_key="...")

  response = pl_client.run(
      prompt_name="order-summary",
      input_variables={"cart": cart_items},
      prompt_release_label="prod"
  )
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const plClient = new PromptLayer({ apiKey: "..." });

  const response = await plClient.run({
    promptName: "order-summary",
    inputVariables: { cart: cartItems },
    promptReleaseLabel: "prod"
  });
  ```
</CodeGroup>

**Under the hood**

1. Fetch latest prompt – We pull the template (by version or release label) from PromptLayer.
2. Execute – The SDK sends the populated prompt to OpenAI, Anthropic, Gemini, etc.
3. Log – The raw request/response pair is saved back to PromptLayer.

***

# Cache prompts with Webhooks

Eliminate the extra round‑trip by **replicating prompts into your own cache or database**.

PromptLayer keeps that cache fresh through webhook events—no polling required.

```mermaid theme={null}
sequenceDiagram
  autonumber
  participant PL as PromptLayer Server
  participant APP as Your Application
  participant DB as Your Cache / DB
  participant LLM as Model Provider

  PL->>APP: "prompt.updated" webhook
  APP->>DB: invalidate + store latest prompt
  user->>APP: request needing AI
  APP->>DB: fetch prompt
  APP->>LLM: run prompt
  APP-->>PL: async track.log (optional queue)
```

### Step‑by‑step

1. **Subscribe to webhooks in the UI**

Read more in the [webhooks documentation](/features/prompt-registry/webhooks).

2. **Maintain a local cache**

```python theme={null}
# pseudocode
def handle_pl_webhook(event):
    prompt = event["data"]
    db.prompts.upsert(prompt["prompt_template_name"], prompt)
```

3. **Serve traffic**

```python theme={null}
prompt = db.prompts.get("order-summary")
llm_response = openai.chat.completions.create(...)
queue.enqueue(track_to_promptlayer, llm_response)
```

> 💡 **Tip:** Most teams push `track_to_promptlayer` onto a Redis or SQS queue so that logging never blocks the request.

Read the full guide: **[PromptLayer Webhooks ↗](/features/prompt-registry/webhooks)**
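
The three steps above can be sketched end to end with only the standard library. Treat this as a local simulation under stated assumptions: the payload shape (`event["data"]["prompt_template_name"]`) follows the pseudocode above, the `template` field is hypothetical, a dict stands in for your cache, and an in-process `queue.Queue` stands in for Redis or SQS.

```python
import queue
import threading

prompt_cache: dict[str, dict] = {}  # stand-in for your cache or database
log_queue: "queue.Queue" = queue.Queue()  # stand-in for Redis or SQS
logged: list[dict] = []  # stand-in for what would be sent to PromptLayer


def handle_pl_webhook(event: dict) -> None:
    """Store the latest prompt payload under its template name."""
    prompt = event["data"]
    prompt_cache[prompt["prompt_template_name"]] = prompt


def log_worker() -> None:
    """Drain the queue in the background so requests never wait on logging."""
    while True:
        item = log_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        logged.append(item)


def serve(prompt_name: str, call_llm) -> str:
    """Answer from the local cache, then enqueue the log entry without blocking."""
    prompt = prompt_cache[prompt_name]
    output = call_llm(prompt)
    log_queue.put({"prompt_name": prompt_name, "output": output})
    return output


worker = threading.Thread(target=log_worker)
worker.start()
handle_pl_webhook(
    {"data": {"prompt_template_name": "order-summary", "template": "Summarize {cart}"}}
)
answer = serve("order-summary", lambda prompt: f"ran: {prompt['template']}")
log_queue.put(None)
worker.join()
```

The request path touches only the local cache and the queue, so a slow or unavailable logging call cannot delay the response.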

***

# GitOps with Webhooks

For teams that want **Git as the source of truth** for prompts, webhooks enable a full bi-directional sync between PromptLayer and your repository. This is the recommended pattern for teams with existing CI/CD pipelines (GitHub Actions, GitLab CI, etc.) that want prompt changes to go through the same review and deploy process as code changes.

### Change starts on PromptLayer

When someone edits a prompt or approves a release label in PromptLayer, a webhook fires to your system. Your webhook handler creates a merge request (or pull request) in your repo with the updated prompt. From there, your normal CI/CD pipeline takes over — code review, automated evals, deploy.

Key webhook events for this flow:

* `prompt_template_version_created` – a new version of a prompt was saved
* `prompt_template_label_moved` – a release label (e.g. `prod`) was moved to a new version
* `prompt_template_label_change_approved` – a protected release label change was approved

See the full list of events in the [Events docs](/features/prompt-registry/webhook-events).

```mermaid theme={null}
sequenceDiagram
  autonumber
  participant PL as PromptLayer
  participant WH as Your Webhook Handler
  participant GIT as GitLab / GitHub
  participant CI as CI/CD Pipeline

  PL->>WH: webhook (e.g. label approved)
  WH->>GIT: create merge request with updated prompt
  GIT->>CI: MR triggers CI pipeline
  CI->>CI: run evals / tests
  CI-->>PL: (optional) push eval results back via API
```
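
A webhook handler for this flow mostly decides whether an event should open a merge request and what that request should contain. The sketch below covers only that planning step. The event names come from the list above, but the `event_type` and `data` payload fields, the branch naming, and `plan_merge_request` are assumptions made for illustration; creating the merge request itself depends on your Git provider's API.

```python
import json
from typing import Optional

SYNC_EVENTS = {
    "prompt_template_version_created",
    "prompt_template_label_moved",
    "prompt_template_label_change_approved",
}


def plan_merge_request(event: dict) -> Optional[dict]:
    """Describe the merge request for a sync event, or return None to ignore it."""
    event_type = event.get("event_type")  # assumed payload field
    if event_type not in SYNC_EVENTS:
        return None
    data = event["data"]  # assumed payload field
    name = data["prompt_template_name"]
    return {
        "branch": f"promptlayer/{name}",
        "file_path": f"prompts/{name}.json",
        "title": f"Update prompt {name} ({event_type})",
        "content": json.dumps(data, indent=2, sort_keys=True),
    }


plan = plan_merge_request(
    {
        "event_type": "prompt_template_label_moved",
        "data": {"prompt_template_name": "order-summary", "label": "prod"},
    }
)
```

Keeping the planning step free of network calls makes it easy to unit test before wiring it to GitHub or GitLab.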

### Change starts in code

When an engineer updates a prompt directly in the repo, your CI/CD pipeline can publish it back to PromptLayer using the REST API or SDK. This keeps PromptLayer in sync without any manual steps.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  pl_client = PromptLayer(api_key="...")

  # In your CI/CD pipeline after prompt file changes
  pl_client.templates.publish(
      prompt_name="order-summary",
      prompt_template="Your updated prompt template here",
  )
  ```

  ```bash GitLab CI / GitHub Actions theme={null}
  # Example CI step — publish prompt to PromptLayer
  curl -X POST https://api.promptlayer.com/prompt-templates \
    -H "X-API-KEY: $PROMPTLAYER_API_KEY" \
    -H "Content-Type: application/json" \
    -d @prompts/order-summary.json
  ```
</CodeGroup>

### Closing the loop with eval results

If your CI/CD pipeline runs evaluations as part of the deploy process, you can push those results back to PromptLayer so everything is visible in one place. This means your team doesn't lose observability just because the deploy happened outside of PromptLayer.

> 💡 **Tip** – Combine this with [protected release labels](/features/prompt-registry/release-labels) and approval workflows so that a prompt change in PromptLayer requires approval before the webhook fires and the MR is created.

***

# Run fully-managed Workflows

For complex pipelines requiring orchestration, use PromptLayer's managed workflow infrastructure.

### How it works

1. Define multi-step workflows in PromptLayer's Workflow Builder
2. Trigger workflow execution via API
3. Monitor execution on PromptLayer servers
4. Receive results via webhook or polling

PromptLayer handles all orchestration, parallelization, and model provider communication.

### Implementation

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  promptlayer_client = PromptLayer(api_key="…")

  execution = promptlayer_client.run_workflow(  # SDK method
      workflow_name="customer_support_workflow",
      workflow_label_name="prod",
      input_variables={"ticket_id": 123}
  )
  ```

  ```ts JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const promptlayer_client = new PromptLayer({ apiKey: "…" });

  const execution = await promptlayer_client.runWorkflow({
    workflowName: "customer_support_workflow",
    workflowLabelName: "prod",
    inputVariables: { ticket_id: 123 }
  });
  ```
</CodeGroup>

Because execution is server-side, you inherit centralized tracing, cost analytics, and secure sandboxed tool-nodes without extra ops.

Learn more: **[Workflows documentation ↗](/why-promptlayer/workflows)**

***

## Which pattern should I pick?

| Requirement                 | `promptlayer_client.run` | Webhook Cache | GitOps | Managed Workflow |
| --------------------------- | :----------------------: | :-----------: | :----: | :--------------: |
| ⏱️ *Extreme latency reqs*   |             ❌            |       ✅       |    ➖   |         ✅        |
| 🛠 *Single LLM call*        |             ✅            |       ✅       |    ✅   |         ➖        |
| 🌩 *Complex plans / tools*  |             ➖            |       ➖       |    ➖   |         ✅        |
| 👥 *Non-eng prompt editors* |             ✅            |       ✅       |    ✅   |         ✅        |
| 🧰 *Zero ops overhead*      |             ✅            |       ➖       |    ➖   |         ✅        |
| 🔀 *Git as source of truth* |             ➖            |       ➖       |    ✅   |         ➖        |
| 🔁 *Bi-directional sync*    |             ➖            |       ➖       |    ✅   |         ➖        |

***

## Further reading 📚

* **Quickstart** – [Your first prompt](/quickstart)
* **Webhooks** – [Events & signature verification](/features/prompt-registry/webhooks)
* **Workflows** – [Concepts & versioning](/why-promptlayer/workflows)
* **CI for prompts** – [Continuous Integration guide](/features/evaluations/continuous-integration)

***

> ✉️ **Need a hand?** Ping us in Discord or email [hello@promptlayer.com](mailto:hello@promptlayer.com)—happy to chat architecture!


# PromptLayer Documentation
Source: https://docs.promptlayer.com/overview

Version, test, and monitor every prompt and workflow.

<div>
  <section>
    <div>
      <div>
        <span />

        Observability and evaluations for AI teams
      </div>

      <h1>
        See what happened. Prove what improved.
      </h1>

      <p>
        Connect observability first to trace production requests and understand quality,
        cost, and latency. Then use Tables to monitor results and run evaluations, with
        Prompt Registry keeping approved versions clear for engineers and reviewers.
      </p>

      <div>
        <a href="/features/observability">
          Connect observability
          <span>→</span>
        </a>

        <a href="/features/tables/overview">
          Explore Tables
        </a>
      </div>

      <div aria-label="Popular documentation paths">
        <span>Explore</span>
        <a href="/features/observability">Observability</a>
        <a href="/features/tables/overview">Tables</a>
        <a href="/features/prompt-registry/new-overview">Prompt Registry</a>
        <a href="/reference/introduction">API</a>
      </div>
    </div>

    <div aria-label="PromptLayer workflow preview">
      <div>
        <span />

        <span />

        <span />
      </div>

      <div>
        <div>
          <div>
            <span>Quality loop</span>
            <h2>Trace, evaluate, release</h2>
          </div>

          <span>Passing</span>
        </div>

        <div>
          <div>
            <span>Evaluation</span>
            <strong>support-agent run</strong>
            <p>Compare changes against real examples before they reach users.</p>

            <div>
              <span />

              <span />

              <span />
            </div>
          </div>

          <div>
            <span>Eval score</span>
            <strong>94.8%</strong>

            <div>
              <span />
            </div>
          </div>

          <div>
            <span>Latency</span>
            <strong>812ms</strong>

            <div>
              <span />

              <span />

              <span />

              <span />

              <span />
            </div>
          </div>

          <div>
            <span>Loop</span>

            <div>
              <span>Connect observability</span>
              <span>Monitor with Tables</span>
              <span>Release from Registry</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </section>

  <section>
    <div>
      <div>
        <p>Core surfaces</p>

        <h2>
          A simple loop from signal to release.
        </h2>
      </div>

      <p>
        Move from what happened to what should ship.
      </p>
    </div>

    <div>
      <a href="/features/observability">
        <span>→</span>
        <span>01</span>

        <div>
          <span>Observability</span>
          <h3>Start with the production record.</h3>

          <p>
            Capture requests, responses, metadata, cost, latency, and feedback in one timeline.
          </p>
        </div>

        <div>
          <span>Traces</span>
          <span>Costs</span>
          <span>Latency</span>
        </div>
      </a>

      <a href="/features/tables/overview">
        <span>→</span>
        <span>02</span>

        <div>
          <span>Tables</span>
          <h3>Turn examples into decisions.</h3>

          <p>
            Organize datasets, score experiments, and compare versions against real behavior.
          </p>
        </div>

        <div>
          <span>Datasets</span>
          <span>Evals</span>
          <span>Scores</span>
        </div>
      </a>

      <a href="/features/prompt-registry/new-overview">
        <span>→</span>
        <span>03</span>

        <div>
          <span>Prompt Registry</span>
          <h3>Ship approved prompt versions.</h3>

          <p>
            Manage versions, labels, and release state so engineers and reviewers stay aligned.
          </p>
        </div>

        <div>
          <span>Versions</span>
          <span>Labels</span>
          <span>Reviews</span>
        </div>
      </a>

      <a href="/why-promptlayer/workflows">
        <span>→</span>
        <span>04</span>

        <div>
          <span>Workflows</span>
          <h3>Connect the loop end to end.</h3>

          <p>
            Trace multi-step systems and bring evaluation back into the release process.
          </p>
        </div>

        <div>
          <span>Logic</span>
          <span>Runs</span>
          <span>Release</span>
        </div>
      </a>
    </div>
  </section>

  <section>
    <div>
      <div>
        <p>Reference shortcuts</p>

        <h2>
          Go deeper when you need it.
        </h2>
      </div>

      <p>
        Focused docs for implementation details, release controls, integrations, and updates.
      </p>
    </div>

    <div>
      <a href="/sdks/python">
        <span>SDKs</span>
        <strong>Client libraries and package guides</strong>
      </a>

      <a href="/features/prompt-registry/release-labels#release-labels">
        <span>Release Labels</span>
        <strong>Promote versions without code changes</strong>
      </a>

      <a href="/why-promptlayer/ab-releases">
        <span>AB Testing</span>
        <strong>Route production traffic by performance</strong>
      </a>

      <a href="/features/prompt-registry/webhook-events">
        <span>Webhooks</span>
        <strong>React to PromptLayer events</strong>
      </a>

      <a href="/features/opentelemetry">
        <span>OpenTelemetry</span>
        <strong>Connect tracing pipelines and providers</strong>
      </a>

      <a href="/self-hosted">
        <span>Self-hosting</span>
        <strong>Deploy PromptLayer in your environment</strong>
      </a>

      <a href="/agents/overview">
        <span>MCP</span>
        <strong>Connect assistants and tools</strong>
      </a>

      <a href="/changelog">
        <span>Changelog</span>
        <strong>Follow product updates and launches</strong>
      </a>
    </div>
  </section>
</div>


# Quickstart
Source: https://docs.promptlayer.com/quickstart

Create, run, and evaluate your first prompt.

PromptLayer helps you manage prompts outside your code and evaluate changes before they reach production. In this quickstart, you will create a prompt, run it in the dashboard and from code, then build an evaluation for it.

## Prerequisites

Before you start, make sure you have a [PromptLayer account](https://dashboard.promptlayer.com/create-account).

<Tip>
  **Using a coding agent?**

  Copy the following prompt to add the PromptLayer **skill** and **MCP servers** for better results when working with PromptLayer.
</Tip>

<Prompt description="Install the PromptLayer skill and MCP servers." icon="sparkles">
  Install the PromptLayer skill for context on project structure, SDKs, prompts, evaluations, observability, and PromptLayer best practices:

  npx skills add https://docs.promptlayer.com

  Add the PromptLayer Docs MCP server for documentation search:

  [https://docs.promptlayer.com/mcp](https://docs.promptlayer.com/mcp)

  Add the PromptLayer MCP server for PromptLayer workspace access and content management:

  [https://mcp.promptlayer.com/mcp](https://mcp.promptlayer.com/mcp)
</Prompt>

## Create a prompt

From the PromptLayer dashboard, click **New** -> **Prompt**.

<Frame>
  <img alt="Creating a new prompt" />
</Frame>

Name the prompt `cake-recipe`, then replace the default messages with the following content.

```markdown System theme={null}
You are a Michelin-star pastry chef. Generate cake recipes with:

**Overview**: One paragraph about the cake
**Ingredients**: Bullet points with metric and US measurements
**Instructions**: Numbered steps with temperatures and timing
**Variations**: Optional frostings or substitutions
```

```markdown User theme={null}
Create a recipe for {{cake_type}} that serves {{serving_size}} people.
```

`{{cake_type}}` and `{{serving_size}}` are [input variables](/features/prompt-registry/template-variables). PromptLayer fills them in each time you run the prompt.
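
As a rough mental model, filling input variables behaves like string substitution. The sketch below is illustrative only: `fill_template` is a hypothetical helper, not part of the PromptLayer SDK, and PromptLayer's own rendering may differ, for example in how it treats missing variables.

```python
import re

# Hypothetical helper for illustration; not part of the PromptLayer SDK.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Missing input variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


user_message = "Create a recipe for {{cake_type}} that serves {{serving_size}} people."
rendered = fill_template(
    user_message,
    {"cake_type": "Chocolate Cake", "serving_size": "8"},
)
print(rendered)
```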

<Frame>
  <img alt="Input variables in prompt" />
</Frame>

### Write prompts with AI

Click the magic wand icon to open the AI prompt writer. It can help rewrite or improve your prompts based on your instructions. Try asking it to add allergy warnings to the recipe generator.

<Frame>
  <img alt="AI prompt writer" />
</Frame>

### Run your prompt

To test the prompt in the playground:

1. Click **Define input variables** in the right panel.
2. Set `cake_type` to `Chocolate Cake`.
3. Set `serving_size` to `8`.
4. Click **Run**.

<Frame>
  <img alt="Running a prompt in the playground" />
</Frame>

When the response looks right, click **Save Template**.

### View logs

Open **Logs** in PromptLayer and search for `cake-recipe`. The log should show the input variables, generated output, model, and latency.

<Frame>
  <img alt="Log from prompt run" />
</Frame>

### Run from code

To run the prompt from code, make sure you have:

* A PromptLayer API key from your workspace
* A provider API key for the model you plan to use, such as `OPENAI_API_KEY`

Set your API keys as environment variables before running the example.

```bash theme={null}
export PROMPTLAYER_API_KEY="pl_your_promptlayer_api_key"
export OPENAI_API_KEY="sk_your_provider_api_key"
```

Install the PromptLayer SDK for your runtime.

<CodeGroup>
  ```bash Python theme={null}
  pip install promptlayer
  ```

  ```bash JavaScript theme={null}
  npm install promptlayer
  ```
</CodeGroup>

Run the saved prompt with the same input variables.

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  client = PromptLayer()

  response = client.run(
      prompt_name="cake-recipe",
      input_variables={
          "cake_type": "Chocolate Cake",
          "serving_size": "8"
      },
      tags=["quickstart"],
      metadata={"source": "quickstart"}
  )

  print(response["prompt_blueprint"]["prompt_template"]["messages"][-1]["content"])
  ```

  ```javascript JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const client = new PromptLayer({
    apiKey: process.env.PROMPTLAYER_API_KEY,
  });

  const response = await client.run({
    promptName: "cake-recipe",
    inputVariables: {
      cake_type: "Chocolate Cake",
      serving_size: "8",
    },
    tags: ["quickstart"],
    metadata: { source: "quickstart" },
  });

  console.log(response.prompt_blueprint.prompt_template.messages.slice(-1)[0].content);
  ```
</CodeGroup>

## Evaluate a prompt

Before you deploy a prompt, check that its outputs meet your quality bar. PromptLayer lets you build evaluation pipelines that score your prompt's outputs automatically.

### Create a dataset

Evaluations run against a dataset: a collection of test cases with inputs and expected outputs. Create one for the cake recipe prompt.

Click **New** -> **Dataset** and name it `cake-recipes-test`.

<Frame>
  <img alt="Creating a dataset" />
</Frame>

Add a few test cases. Each row needs the input variables your prompt expects, `cake_type` and `serving_size`, plus an optional expected output to compare against.

<Accordion title="Sample CSV for cake recipe dataset">
  ```csv theme={null}
  cake_type,serving_size,expected_output
  Chocolate Cake,8,"Should include cocoa or chocolate, have clear measurements"
  Vanilla Birthday Cake,12,"Should be festive, mention frosting options"
  Gluten-Free Lemon Cake,6,"Must not include wheat flour, should use alternatives"
  Vegan Carrot Cake,10,"No eggs or dairy, should suggest substitutes"
  ```

  <a href="/onboarding-guides/example-dataset/cake-recipes.csv">Download this CSV</a> or add rows manually in the UI.
</Accordion>
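
Before uploading a CSV, it can help to confirm that every row carries the input variables the prompt expects. The following is a minimal local check using only the Python standard library. `check_dataset` is a hypothetical helper, and PromptLayer does not require this step.

```python
import csv
import io

# Input variables used by the cake-recipe prompt.
REQUIRED_COLUMNS = ("cake_type", "serving_size")

SAMPLE_CSV = '''cake_type,serving_size,expected_output
Chocolate Cake,8,"Should include cocoa or chocolate, have clear measurements"
Vanilla Birthday Cake,12,"Should be festive, mention frosting options"
Gluten-Free Lemon Cake,6,"Must not include wheat flour, should use alternatives"
Vegan Carrot Cake,10,"No eggs or dairy, should suggest substitutes"
'''


def check_dataset(csv_text: str) -> list:
    """Parse the CSV and raise ValueError if a required column or value is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing columns: {', '.join(missing)}")

    rows = list(reader)
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(rows, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                raise ValueError(f"Line {line_number}: empty value for {column}")
    return rows


rows = check_dataset(SAMPLE_CSV)
print(f"{len(rows)} rows look valid")
```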

Learn more about [datasets](/features/evaluations/datasets-overview).

### Create an eval pipeline

Now build a pipeline that runs your prompt against each test case and scores the results.

Click **New** -> **Evaluation** and select your dataset.

First, add a **Prompt Template** column. This runs your prompt against each row in the dataset, using the column values as input variables. The output appears in a new column.

Next, add an **LLM-as-judge** scoring column. This uses AI to score each output against criteria you define. For the recipe prompt, check whether:

* The recipe includes the required sections: Overview, Ingredients, Instructions, and Variations
* Measurements are provided in both metric and US units
* The serving size is correct

<Frame>
  <img alt="LLM as judge" />
</Frame>

You can also add an **Equality Comparison** column to compare the prompt output against the `expected_output` column in your dataset.

<Frame>
  <img alt="Eval pipeline setup" />
</Frame>

Run the evaluation to see scores across all test cases. Learn more about [evaluations](/features/evaluations/overview).

<Accordion title="Other evaluation types">
  Beyond LLM-as-judge, PromptLayer supports:

  * **Human grading**: Collect scores from domain experts
  * **Equality Comparison**: Compare outputs to expected results
  * **Cosine similarity**: Measure semantic similarity between outputs
  * **Code evaluators**: Write custom Python scoring functions

  Workflow nodes work the same way in eval pipelines.
</Accordion>
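
Some criteria, such as the required-sections check, are deterministic and may be a better fit for code than for an LLM judge. Here is a sketch of such a scoring function in plain Python. `score_recipe` is a hypothetical standalone function; the exact interface a PromptLayer code evaluator expects is not covered here, so adapt the signature as needed.

```python
# Section names come from the system prompt in this quickstart.
REQUIRED_SECTIONS = ("Overview", "Ingredients", "Instructions", "Variations")


def score_recipe(output: str) -> float:
    """Return the fraction of required section names found in the output."""
    text = output.lower()
    found = sum(1 for section in REQUIRED_SECTIONS if section.lower() in text)
    return found / len(REQUIRED_SECTIONS)


complete = "**Overview** ...\n**Ingredients** ...\n**Instructions** ...\n**Variations** ..."
partial = "**Overview** ...\n**Ingredients** ..."

print(score_recipe(complete), score_recipe(partial))
```

A substring match is deliberately loose. A stricter version could require each name to appear as a bold heading at the start of a line.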

### CI/CD evaluations

Attach an evaluation pipeline to run automatically every time you save a new prompt version, similar to GitHub Actions running tests on each commit.

When saving a prompt, the commit dialog lets you select an evaluation pipeline. Choose one and click **Next**.

From then on, each new version you create will run through the eval and show its score in the version history. This makes it easier to spot regressions before they reach production.

<Frame>
  <img alt="Eval scores by version" />
</Frame>

Learn more about [continuous integration](/features/evaluations/continuous-integration).

## Learn more

* [Release Labels](/features/prompt-registry/release-labels) - Deploy the right prompt version without code changes
* [Evaluations](/features/evaluations/overview) - Learn how evaluation pipelines work
* [Deployment strategies](/onboarding-guides/deployment-strategies) - Choose a production integration pattern


# Add Column to Evaluation Pipeline
Source: https://docs.promptlayer.com/reference/add-report-columns

POST /report-columns

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

This endpoint adds evaluation steps (columns) to an existing evaluation pipeline. Columns execute sequentially from left to right, with each column able to reference outputs from previous columns.

## Important Notes

* **Single Column Per Request**: This endpoint only allows adding one column at a time. To add multiple columns, make separate API calls for each.
* **Column Order Matters**: Columns execute left to right. A column can only reference columns to its left.
* **Unique Names Required**: Each column name must be unique within the pipeline.
* **Dataset Columns Protected**: You cannot overwrite columns that come from the dataset.

## Scoring

By default, only the last column in a pipeline is used for score calculation. To include multiple columns in the final score:

* Set `is_part_of_score: true` on each column you want to include in the score
* Columns must produce boolean or numeric values to be scored
* When multiple columns are marked for scoring, the final score is the average of all included columns
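
The selection rule above can be sketched as follows. This illustrates the documented behavior and is not PromptLayer's implementation. `final_score` is a hypothetical helper, and each column's `score` is assumed to be already reduced to a single number; how per-row values roll up into that number is not specified here.

```python
def final_score(columns: list) -> float:
    """Compute a pipeline score from a left-to-right list of column dicts.

    Each dict has a numeric or boolean 'score' and an optional 'is_part_of_score'.
    """
    if not columns:
        raise ValueError("The pipeline has no columns")

    included = [c for c in columns if c.get("is_part_of_score")]
    if not included:
        # Default: only the last column counts toward the score.
        included = [columns[-1]]

    # Booleans convert to 1.0 / 0.0, so they average alongside numeric scores.
    return sum(float(c["score"]) for c in included) / len(included)


pipeline = [
    {"name": "Generate", "score": 0.0},
    {"name": "Has sections", "score": True, "is_part_of_score": True},
    {"name": "Validate", "score": 0.5, "is_part_of_score": True},
]
print(final_score(pipeline))
```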

## Column Types and Configuration

For the complete list of supported column types and their detailed configuration options, see the [Node & Column Types](/features/evaluations/column-types) documentation.

## Batch Adding Columns

Since columns must be added one at a time, here's a pattern for adding multiple columns:

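```python theme={null}
import os

import requests

# List columns in execution order: they run left to right.
# Each `configuration` is a placeholder; fill it in for the column type.
columns = [
    {
        "column_type": "PROMPT_TEMPLATE",
        "name": "Generate",
        "configuration": {...}
    },
    {
        "column_type": "LLM_ASSERTION",
        "name": "Validate",
        "configuration": {...}
    }
]

for column in columns:
    # The endpoint accepts a single column per call.
    response = requests.post(
        "https://api.promptlayer.com/report-columns",
        headers={"X-API-KEY": os.environ["PROMPTLAYER_API_KEY"]},
        json={
            "report_id": 456,  # Replace with your pipeline ID.
            **column
        }
    )
    if response.status_code != 201:
        # Stop at the first failure so later columns never reference a missing one.
        print(f"Failed: {column['name']} ({response.status_code}): {response.text}")
        break
```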

## Column Reference Syntax

When configuring columns that reference other columns:

* **Dataset columns**: Use exact column name from dataset (e.g., `"question"`)
* **Previous columns**: Use the name you assigned (e.g., `"AI Response"`)
* **Variable columns**: Reference by their name

## Error Handling

The endpoint validates:

1. Column type is valid
2. Column name is unique within the pipeline
3. Configuration matches the column type schema
4. Referenced columns exist (for dependent columns)
5. User has permission to modify the pipeline

Common errors:

* `400`: Invalid configuration or duplicate column name
* `403`: Attempt to overwrite a dataset column, or insufficient permissions
* `404`: Report not found or not accessible


# Add Request Log to Dataset
Source: https://docs.promptlayer.com/reference/add-request-log-to-dataset

POST /api/public/v2/dataset-versions/add-request-log

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Add a request log as a row in the draft dataset version for a dataset group. PromptLayer extracts request inputs, metadata, scores, tags, prompt data, and response data into dataset columns.

## Behavior Notes

* A draft dataset version must already exist for the dataset group.
* The request log and dataset group must belong to the same workspace.
* New columns are automatically added from the request log's available data fields.
* If no draft exists, the endpoint returns `404`.

## Related

* [Create Draft Dataset Version](/reference/create-draft-dataset-version)
* [Save Draft Dataset Version](/reference/save-draft-dataset-version)
* [Get Request](/reference/get-request)


# Add Table Sheet Rows
Source: https://docs.promptlayer.com/reference/add-table-sheet-rows

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/rows
Append rows to a Table sheet.


Set `count` to the number of rows to append. The public API accepts 1 to 100 rows per request and defaults to 1 when `count` is omitted.

Pass `values` as an array of objects keyed by column ID when you want to populate text cells as rows are created. Values beyond `count` are ignored, and missing values use the column default behavior.

Rows are appended after the current last row. Output cells are created in a stale state so dependent computations can be recalculated.
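
A request body for this endpoint might be assembled as in the sketch below. `build_add_rows_body` is a hypothetical helper, and the column ID `col_question` is made up for illustration. The client-side checks simply mirror the limits described above.

```python
from typing import Optional


def build_add_rows_body(count: Optional[int] = None, values: Optional[list] = None) -> dict:
    """Build the JSON body for appending rows, validating `count` locally."""
    effective_count = 1 if count is None else count  # The API defaults to 1.
    if not 1 <= effective_count <= 100:
        raise ValueError("count must be between 1 and 100")

    body = {}
    if count is not None:
        body["count"] = count
    if values:
        # The API ignores values beyond `count`; trimming here keeps the payload small.
        body["values"] = values[:effective_count]
    return body


body = build_add_rows_body(
    count=2,
    values=[
        {"col_question": "What is a sponge cake?"},
        {"col_question": "How long should I bake it?"},
        {"col_question": "This third entry is dropped"},
    ],
)
print(body)
```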


# Add Trace to Dataset
Source: https://docs.promptlayer.com/reference/add-trace-to-dataset

POST /api/public/v2/dataset-versions/add-trace

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Add an observability trace (or a specific span subtree) as a row in the draft dataset version for a dataset group.

You can export an entire trace, anchored at the earliest root span, or choose a specific span to anchor on. When you anchor on a span, that span and its direct children become the dataset columns.

## Behavior Notes

* If no draft dataset version exists for the group, one is automatically created and seeded from the most recent published version's rows.
* The trace and dataset group must belong to the same workspace.
* Use `span_id` to export a specific span subtree instead of the full trace root.
* The `mode` field in the response indicates whether the row was created in `trace` mode (root anchor) or `span` mode (chosen anchor).

## Related

* [Create Draft Dataset Version](/reference/create-draft-dataset-version)
* [Save Draft Dataset Version](/reference/save-draft-dataset-version)
* [Add Request Log to Dataset](/reference/add-request-log-to-dataset)


# Cancel Table Sheet Operation
Source: https://docs.promptlayer.com/reference/cancel-table-sheet-operation

DELETE /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations/{operation_id}
Cancel a Table sheet operation.

Cancel an active recalculation operation for a Table sheet.

If no active execution exists for the provided `operation_id`, the endpoint returns success with zero cancelled cells.


# Close Trace
Source: https://docs.promptlayer.com/reference/close-trace

POST /api/public/v2/traces/{trace_id}/close
Marks a trace as closed, preventing any further span ingestion for that trace. Once closed, subsequent calls to `/spans-bulk` or `/v1/traces` that include spans for this trace will have those spans rejected.

Marks a trace as closed, preventing any further spans from being written to it. Once closed, subsequent calls to [Create Spans Bulk](/reference/spans-bulk) or [Ingest Traces (OTLP)](/reference/otlp-ingest-traces) that include spans for this trace will have those spans rejected.

## Behavior Notes

* Closing is permanent — there is no re-open operation.
* If the trace is already closed, the endpoint returns `409 Conflict`.
* If no spans exist for the given `trace_id` in your workspace, the endpoint returns `404 Not Found`.
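
Client code may want to treat these outcomes differently. The sketch below maps the documented status codes to simple results. `close_trace` is a hypothetical helper that takes a `post` callable so any HTTP client can be plugged in, and it assumes any 2xx status means the trace was closed.

```python
from typing import Callable


def close_trace(trace_id: str, post: Callable[[str], int]) -> str:
    """Close a trace and classify the result.

    `post` sends a POST to the given path and returns the HTTP status code.
    """
    status = post(f"/api/public/v2/traces/{trace_id}/close")
    if status == 409:
        return "already_closed"  # Closing is permanent, so this is safe to ignore.
    if status == 404:
        return "not_found"  # No spans exist for this trace in the workspace.
    if 200 <= status < 300:
        return "closed"
    raise RuntimeError(f"Unexpected status {status} while closing trace {trace_id}")


# Stand-in for a real HTTP call, used here so the example runs offline.
def fake_post(path: str) -> int:
    return 409


print(close_trace("trace-123", fake_post))
```

In real code, `post` would typically wrap an HTTP client call that adds the API base URL and the `X-API-KEY` header.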

## Related

* [Get Trace](/reference/get-trace)
* [Create Spans Bulk](/reference/spans-bulk)
* [Ingest Traces (OTLP)](/reference/otlp-ingest-traces)
* [Traces](/running-requests/traces)


# Configure Table Sheet Score
Source: https://docs.promptlayer.com/reference/configure-table-sheet-score

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Configure scoring for a Table sheet.


Use `column_ids` or `column_names` for standard boolean or numeric scoring. Column names must be unique in the sheet.

Use `score_type` with `score_config` for explicit configuration. Supported scoring modes are `auto`, `boolean`, `numeric`, and `custom`; `score_type` is required when you pass `score_config`.

For custom scoring, pass `code` and optionally `code_language` (`PYTHON` by default, or `JAVASCRIPT`). Boolean scoring also supports `true_values`, `false_values`, and `assertion_aggregation` (`all`, `any`, or `mean`).

This endpoint updates the configuration and returns `requires_recalculation`; call the recalculation endpoint to queue score calculation.

Changing score configuration creates a new sheet version. The response returns `version`, the current sheet version count after the configuration update.


# Create Dataset Group
Source: https://docs.promptlayer.com/reference/create-dataset-group

POST /api/public/v2/dataset-groups

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Create a dataset group in a workspace. PromptLayer also creates an initial draft dataset for the group with `version_number = -1`.

## Behavior Notes

* Dataset group names must be unique within a workspace.
* If `folder_id` is omitted, the dataset group is created at the workspace root.
* Use [Resolve Folder ID by Path](/reference/resolve-folder-id) to look up a folder ID, or [Create Folder](/reference/create-folder) to create one.

## Related

* [List Datasets](/reference/list-datasets)
* [Create Dataset Version from Request History](/reference/create-dataset-version-from-filter-params)
* [Datasets Overview](/features/evaluations/datasets-overview)


# Create Dataset Version from File
Source: https://docs.promptlayer.com/reference/create-dataset-version-from-file

POST /api/public/v2/dataset-versions/from-file

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Create a dataset version by submitting a base64-encoded CSV or JSON file. PromptLayer queues the file for asynchronous processing and creates a draft dataset while the job runs.

## Behavior Notes

* The draft dataset starts with `version_number = -1` and receives a real version number after processing succeeds.
* The `dataset_version_created_by_file` webhook event is sent when processing succeeds.
* The `dataset_version_created_by_file_failed` webhook event is sent when processing fails.
* Decoded file content must be 100MB or smaller.
* Failed drafts are automatically cleaned up.

## Related

* [Create Dataset Version from Request History](/reference/create-dataset-version-from-filter-params)
* [Create Draft Dataset Version](/reference/create-draft-dataset-version)
* [Create from History](/features/evaluations/datasets-create-from-history)


# Create Dataset Version from Request History
Source: https://docs.promptlayer.com/reference/create-dataset-version-from-filter-params

POST /api/public/v2/dataset-versions/from-filter-params

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Create a dataset version asynchronously from existing request logs. You can populate the version with explicit request log IDs or with structured request-log filters.

## Behavior Notes

* If both `request_log_ids` and `filter_group` are present, explicit request IDs take precedence.
* The endpoint queues a background job and reuses the draft dataset for the dataset group when one already exists.
* Jobs are capped at 50,000 request logs.
* Filter-based datasets persist their filter parameters so refresh flows can replay the same query later.
* Completion triggers the `dataset_version_created_from_filter_params` webhook.

## Related

* [Create from History](/features/evaluations/datasets-create-from-history)
* [Search Request Logs](/reference/search-request-logs)
* [Get Dataset Rows](/reference/get-dataset-rows)


# Create Draft Dataset Version
Source: https://docs.promptlayer.com/reference/create-draft-dataset-version

POST /api/public/v2/dataset-versions/create-draft

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Create a mutable draft dataset version for a dataset group. Drafts use `version_number = -1` until they are saved as a published version.

## Behavior Notes

* Only one draft can exist per dataset group at a time; creating another draft returns `409`.
* Without `source_dataset_id`, PromptLayer creates an empty draft and returns `201`.
* With `source_dataset_id`, rows are copied from the source dataset asynchronously and the endpoint returns `202`.
* The source dataset must belong to the same dataset group.
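
Because the status code tells you whether rows are still being copied, a caller might branch on it as sketched below. `describe_draft_response` is a hypothetical helper that covers only the status codes listed above.

```python
def describe_draft_response(status_code: int) -> str:
    """Translate the documented status codes into a next step for the caller."""
    outcomes = {
        201: "Empty draft created; rows can be added right away.",
        202: "Draft created; rows are still being copied from the source dataset.",
        409: "A draft already exists for this dataset group; reuse or save it first.",
    }
    # Anything else is outside the cases documented for this endpoint.
    return outcomes.get(status_code, f"Unhandled status code: {status_code}")


for code in (201, 202, 409):
    print(code, describe_draft_response(code))
```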

## Related

* [Add Request Log to Dataset](/reference/add-request-log-to-dataset)
* [Save Draft Dataset Version](/reference/save-draft-dataset-version)
* [Create Dataset Version from File](/reference/create-dataset-version-from-file)


# Create Folder
Source: https://docs.promptlayer.com/reference/create-folder

POST /api/public/v2/folders

Creates a new folder in the workspace. Folders can be nested within other folders by providing a `parent_id`. The folder name must be unique within its parent folder (or at the root level if no parent is specified).


# Create Evaluation Pipeline
Source: https://docs.promptlayer.com/reference/create-reports

POST /reports

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Create an Evaluation Pipeline associated with a dataset group. Use this endpoint to create a pipeline blueprint with optional columns, custom scoring configuration, folder placement, and external IDs.

## Behavior Notes

* Evaluation columns use the same node definitions as Workflows. See [Node & Column Types](/features/evaluations/column-types).
* Set `is_part_of_score` on columns for built-in scoring, or provide `score_configuration` for custom scoring logic.
* Custom scoring concepts are covered in [Score Card](/features/evaluations/score-card).

## Related

* [Configure Custom Scoring](/reference/update-report-score-card)
* [Node & Column Types](/features/evaluations/column-types)
* [Score Card](/features/evaluations/score-card)


# Create Skill Collection
Source: https://docs.promptlayer.com/reference/create-skill-collection

POST /api/public/v2/skill-collections

Create a new skill collection using either a JSON body or a multipart form upload with a ZIP archive.

### Request formats

Use `application/json` when sending file contents inline through the `files` array.

Use `multipart/form-data` when uploading a ZIP archive. Pass metadata as a JSON string in `metadata` or `json`, and include the file in `archive` or `zip`.


# Create Table
Source: https://docs.promptlayer.com/reference/create-table

POST /api/public/v2/tables
Create a Table from the public API.

Create a Table with a default `Sheet 1`, a `Column A` text column, and one empty row.

### Workspace and folders

The table is created in the workspace associated with your API key. Pass `folder_id` to create the table inside a registry folder in that workspace.

When `title` is omitted, PromptLayer generates an `Untitled Table` name.


# Create Table Sheet
Source: https://docs.promptlayer.com/reference/create-table-sheet

POST /api/public/v2/tables/{table_id}/sheets
Create a Table sheet by importing file or request log data.

Create a new sheet and start an asynchronous import job. The response returns immediately with `202`, an `operation_id`, the import operation, and the newly created sheet.

### File imports

Use `source.type=file` to import CSV or JSON content. Send the file content as base64 in `file_content_base64`; the decoded file can be up to 100MB. JSON input is converted to CSV before processing.

### Request log imports

Use `source.type=request_logs` to import request history into a sheet. Provide either explicit `request_log_ids` or a request-log `filter_group`. You can also pass `variables_to_parse`, `include_fields`, and `limit` to control the imported columns and rows.

### Operation tracking

Pass `operation_id` when you want a stable client-side identifier for polling and webhook correlation. If omitted, PromptLayer generates one.

Poll [Get Table Sheet Import Operation](/reference/get-table-sheet-operation) until the operation reaches `succeeded` or `failed`.
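
A polling loop could look like the sketch below. `wait_for_operation` is a hypothetical helper. It assumes you supply a `fetch_status` callable that calls the operation endpoint and returns the operation's status as a string; the name of the status field in the response is not shown here.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_operation(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the operation reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation still '{status}' after {timeout_seconds}s")
        sleep(interval_seconds)


# Simulated status sequence so the example runs without network access.
# The non-terminal status names here are placeholders.
simulated = iter(["pending", "pending", "succeeded"])
result = wait_for_operation(lambda: next(simulated), sleep=lambda seconds: None)
print(result)
```

The `sleep` parameter is injectable so tests can skip real waiting. Production code can rely on the default.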


# Create Table Sheet Column
Source: https://docs.promptlayer.com/reference/create-table-sheet-column

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns
Create a column in a Table sheet.

Create a column in a Table sheet and create cells for existing rows.

Use `type` and `config` to define the column behavior. Use `dependencies` for config-driven source columns. New columns are appended after the existing columns in the sheet.


# Create Table Sheet File Import
Source: https://docs.promptlayer.com/reference/create-table-sheet-file-import

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/imports/file
Import a CSV file into an existing Table sheet.

Start an asynchronous CSV import into an existing Table sheet.

Send `file_name` ending in `.csv` and `file_content_base64` containing the base64-encoded CSV content.

Pass `operation_id` when you want a stable client-side identifier for polling and webhook correlation. If omitted, PromptLayer generates one.

The response returns an `operation_id` and `status_url` for polling with [Get Table Sheet Import Operation](/reference/get-table-sheet-operation).
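
Preparing the request body mostly comes down to base64-encoding the file. A minimal sketch, assuming the three fields named above sit at the top level of the JSON body; `build_csv_import_body` is a hypothetical helper.

```python
import base64
from typing import Optional


def build_csv_import_body(
    file_name: str,
    csv_bytes: bytes,
    operation_id: Optional[str] = None,
) -> dict:
    """Build the JSON body for importing CSV content into an existing sheet."""
    if not file_name.endswith(".csv"):
        raise ValueError("file_name must end in .csv")

    body = {
        "file_name": file_name,
        # JSON cannot carry raw bytes, so the content is sent as base64 text.
        "file_content_base64": base64.b64encode(csv_bytes).decode("ascii"),
    }
    if operation_id is not None:
        # Optional client-side identifier for polling and webhook correlation.
        body["operation_id"] = operation_id
    return body


csv_bytes = b"cake_type,serving_size\nChocolate Cake,8\n"
body = build_csv_import_body("cake-recipes.csv", csv_bytes)
print(body["file_name"], len(body["file_content_base64"]))
```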


# Create Table Sheet Operation
Source: https://docs.promptlayer.com/reference/create-table-sheet-operation

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations
Queue a Table sheet operation.

Queue a `recalculate` operation for selected columns, rows, and cell statuses.

By default, recalculation targets stale computed cells. Pass `column_ids`, `row_ids`, or `statuses` to narrow the operation.

`row_ids` are zero-based row indices. `statuses` accepts `STALE`, `QUEUED`, `DISPATCHED`, `RUNNING`, `COMPLETED`, and `FAILED`; omit it to target stale cells, or pass an empty array to include all statuses. Text columns cannot be recalculated.
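
The difference between omitting `statuses` and passing an empty array is easy to get wrong, so a small builder can make it explicit. `build_recalculate_filters` is a hypothetical helper that produces only the three filter fields; the full request body may need other fields, so check the endpoint schema.

```python
from typing import Optional

VALID_STATUSES = {"STALE", "QUEUED", "DISPATCHED", "RUNNING", "COMPLETED", "FAILED"}


def build_recalculate_filters(
    column_ids: Optional[list] = None,
    row_ids: Optional[list] = None,
    statuses: Optional[list] = None,
) -> dict:
    """Build the filter fields for a recalculation request."""
    filters = {}
    if column_ids is not None:
        filters["column_ids"] = list(column_ids)
    if row_ids is not None:
        if any(index < 0 for index in row_ids):
            raise ValueError("row_ids are zero-based indices and cannot be negative")
        filters["row_ids"] = list(row_ids)
    if statuses is not None:
        unknown = sorted(set(statuses) - VALID_STATUSES)
        if unknown:
            raise ValueError(f"Unknown statuses: {', '.join(unknown)}")
        # An empty list is kept on purpose: it means "include all statuses".
        filters["statuses"] = list(statuses)
    return filters


stale_only = build_recalculate_filters(row_ids=[0, 1])      # statuses omitted
everything = build_recalculate_filters(statuses=[])          # all statuses
retry_failed = build_recalculate_filters(statuses=["FAILED"])
print(stale_only, everything, retry_failed)
```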

If the operation affects many cells, the response may return `requires_confirmation` with a `confirmation_token`; pass that token in a follow-up request to proceed. If no matching cells need work, the endpoint can return success with `cell_count: 0`.

Queued recalculations return `version`, the sheet version count used for the operation response. Computed cells report `last_computed_version` after they complete.


# Create Table Sheet Request Log Import
Source: https://docs.promptlayer.com/reference/create-table-sheet-request-log-import

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/imports/request-logs
Import request history into an existing Table sheet.

Start an asynchronous request-history import into an existing Table sheet.

Provide either `request_log_ids` or a request-log `filter_group`. You can also pass `variables_to_parse`, `include_fields`, and `limit` to control imported columns and rows. `request_log_ids` and `limit` support up to 50,000 rows.
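
The constraints above can be checked before sending the request. In this sketch, `build_request_log_import_body` is a hypothetical helper, and it reads "either" conservatively as "exactly one of the two"; the API itself may be more lenient.

```python
from typing import Optional

MAX_ROWS = 50_000


def build_request_log_import_body(
    request_log_ids: Optional[list] = None,
    filter_group: Optional[dict] = None,
    variables_to_parse: Optional[list] = None,
    include_fields: Optional[list] = None,
    limit: Optional[int] = None,
) -> dict:
    """Build the JSON body for a request-log import, validating limits locally."""
    if (request_log_ids is None) == (filter_group is None):
        raise ValueError("Provide exactly one of request_log_ids or filter_group")
    if request_log_ids is not None and len(request_log_ids) > MAX_ROWS:
        raise ValueError(f"request_log_ids supports up to {MAX_ROWS} rows")
    if limit is not None and not 1 <= limit <= MAX_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_ROWS}")

    candidates = {
        "request_log_ids": request_log_ids,
        "filter_group": filter_group,
        "variables_to_parse": variables_to_parse,
        "include_fields": include_fields,
        "limit": limit,
    }
    # Send only the fields the caller actually set.
    return {key: value for key, value in candidates.items() if value is not None}


body = build_request_log_import_body(request_log_ids=[101, 102, 103], limit=100)
print(body)
```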

Pass `operation_id` when you want a stable client-side identifier for polling and webhook correlation. If omitted, PromptLayer generates one.

The response returns an `operation_id` and `status_url` for polling with [Get Table Sheet Import Operation](/reference/get-table-sheet-operation).


# Create Table Sheet Version
Source: https://docs.promptlayer.com/reference/create-table-sheet-version

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions
Create a Table sheet version.

Create a named version for a Table sheet.

Pass `name` to save the current sheet state. `name` is required when `source_version_id` is omitted.

Pass `source_version_id` to restore from an existing version while creating a new version.
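
The two modes differ only in which field is required. A small sketch of that rule follows; `build_version_body` is a hypothetical helper, and the ID value `"version-id-placeholder"` is made up.

```python
from typing import Optional


def build_version_body(name: Optional[str] = None, source_version_id: Optional[str] = None) -> dict:
    """Build the JSON body for saving or restoring a sheet version."""
    if name is None and source_version_id is None:
        raise ValueError("name is required when source_version_id is omitted")

    body = {}
    if name is not None:
        body["name"] = name
    if source_version_id is not None:
        # Restores from the given version while creating a new one.
        body["source_version_id"] = source_version_id
    return body


save_current = build_version_body(name="Before prompt rewrite")
restore = build_version_body(source_version_id="version-id-placeholder")
print(save_current, restore)
```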


# Create Workflow
Source: https://docs.promptlayer.com/reference/create-workflow

POST /rest/workflows

Create a new Workflow or create a new version of an existing Workflow programmatically. Use this endpoint when you want to define the full workflow graph, including nodes, edges, input variables, folder placement, release labels, and external IDs.

## Behavior Notes

* To create a new workflow, pass `name`; to create a new version, pass `workflow_id` or `workflow_name`.
* Workflow nodes use the same building blocks as evaluation columns. See [Node & Column Types](/features/evaluations/column-types).
* Conditional edge and workflow authoring concepts are covered in [Workflows](/why-promptlayer/workflows).

## Related

* [Update Workflow](/reference/patch-workflow)
* [Run Workflow](/reference/run-workflow)
* [Workflows](/why-promptlayer/workflows)


# Delete Folder Entities
Source: https://docs.promptlayer.com/reference/delete-folder-entities

DELETE /api/public/v2/folders/entities

Delete one or more entities from the workspace. Supported entity types are folders, prompts, snippets, workflows, datasets, evaluations, A/B tests, and input variable sets. Use `cascade=true` to recursively delete all contents of a folder. Without `cascade`, deleting a non-empty folder returns an error. Entities are soft-deleted where applicable.


# Delete Evaluation Pipeline
Source: https://docs.promptlayer.com/reference/delete-report

DELETE /reports/{report_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Archive a single evaluation pipeline by ID. Prefer this endpoint over [Delete Reports by Name](/reference/delete-reports-by-name) when you know the pipeline ID, since names can collide across pipelines.

## Related

* [Create Evaluation Pipeline](/reference/create-reports)
* [Rename Evaluation Pipeline](/reference/rename-report)
* [Delete Reports by Name](/reference/delete-reports-by-name)


# Delete Evaluation Pipeline Column
Source: https://docs.promptlayer.com/reference/delete-report-column

DELETE /report-columns/{report_column_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Delete a single column from an evaluation pipeline blueprint. Surrounding columns shift left to fill the gap.

## Behavior Notes

* Dataset columns are protected and cannot be deleted.
* Only blueprint pipeline columns can be deleted; columns on finished batch runs cannot.
* Deleting a column re-queues cells in columns to its right because references may have shifted.

## Related

* [Edit Evaluation Pipeline Column](/reference/edit-report-column)
* [Create Evaluation Pipeline](/reference/create-reports)
* [Node & Column Types](/features/evaluations/column-types)


# Delete Reports by Name
Source: https://docs.promptlayer.com/reference/delete-reports-by-name

DELETE /reports/name/{report_name}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Archive all reports with the specified name in the workspace associated with your API key.


# Edit Evaluation Pipeline Column
Source: https://docs.promptlayer.com/reference/edit-report-column

PATCH /report-columns/{report_column_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Update an existing column on an evaluation pipeline blueprint. Use this to change a column's configuration, rename it, or move it without recreating the whole pipeline.

## Behavior Notes

* Dataset columns are protected and cannot be edited.
* Only blueprint pipeline columns can be edited; columns on finished batch runs cannot.
* Column names must remain unique within the pipeline.
* Editing a column re-queues cells in that column and any columns to its right.

## Related

* [Create Evaluation Pipeline](/reference/create-reports)
* [Delete Evaluation Pipeline Column](/reference/delete-report-column)
* [Node & Column Types](/features/evaluations/column-types)


# Create Tool Env Var
Source: https://docs.promptlayer.com/reference/env-vars-tool-create

POST /api/public/v2/tool-registry/{identifier}/env-vars

Create a tool-scoped environment variable. Tool-level variables take precedence over workspace-level variables with the same key when the tool executes.
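
The precedence rule can be pictured as a dictionary merge in which tool-level entries win. This is only a local model of the behavior described above, not PromptLayer's implementation; the function name and sample keys are illustrative.

```python
def resolve_tool_env(workspace_vars: dict, tool_vars: dict) -> dict:
    """Model the effective environment for one tool execution."""
    # Later entries win, so tool-level values override workspace-level ones.
    return {**workspace_vars, **tool_vars}


effective = resolve_tool_env(
    workspace_vars={"API_BASE": "https://example.com", "TOKEN": "workspace-token"},
    tool_vars={"TOKEN": "tool-token"},
)
# effective == {"API_BASE": "https://example.com", "TOKEN": "tool-token"}
```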


# Delete Tool Env Var
Source: https://docs.promptlayer.com/reference/env-vars-tool-delete

DELETE /api/public/v2/tool-registry/{identifier}/env-vars/{var_id}

Permanently delete a tool-scoped environment variable.


# List Tool Env Vars
Source: https://docs.promptlayer.com/reference/env-vars-tool-list

GET /api/public/v2/tool-registry/{identifier}/env-vars

List all environment variables scoped to a specific tool. Tool-level variables override workspace-level variables with the same key at execution time.


# Update Tool Env Var
Source: https://docs.promptlayer.com/reference/env-vars-tool-update

PATCH /api/public/v2/tool-registry/{identifier}/env-vars/{var_id}

Update the value of a tool-scoped environment variable. The new value must be non-empty.


# Create Workspace Env Var
Source: https://docs.promptlayer.com/reference/env-vars-workspace-create

POST /api/public/v2/env-vars

Create a workspace-scoped environment variable. The key must be a valid identifier (`^[A-Za-z_][A-Za-z0-9_]*$`). The value may be empty to create a placeholder that the user fills in later via the Settings UI.
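
A client can apply the same key pattern before calling the endpoint. The sketch below uses the pattern quoted above; the helper name is hypothetical.

```python
import re

# Pattern quoted in the endpoint description above.
ENV_VAR_KEY_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def is_valid_env_var_key(key: str) -> bool:
    """Return True when the key is a valid identifier for an environment variable."""
    # fullmatch also rejects a trailing newline, which a bare `$` would tolerate.
    return ENV_VAR_KEY_PATTERN.fullmatch(key) is not None


print(is_valid_env_var_key("OPENAI_API_KEY"))  # True
print(is_valid_env_var_key("_internal"))       # True
print(is_valid_env_var_key("1ST_KEY"))         # False: starts with a digit
print(is_valid_env_var_key("my-key"))          # False: hyphen is not allowed
```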


# Delete Workspace Env Var
Source: https://docs.promptlayer.com/reference/env-vars-workspace-delete

DELETE /api/public/v2/env-vars/{var_id}

Permanently delete a workspace-scoped environment variable.


# List Workspace Env Vars
Source: https://docs.promptlayer.com/reference/env-vars-workspace-list

GET /api/public/v2/env-vars

List all workspace-scoped environment variables. Values are not returned; each entry includes only the key, a 4-character suffix for display, and an `is_empty` flag.


# Update Workspace Env Var
Source: https://docs.promptlayer.com/reference/env-vars-workspace-update

PATCH /api/public/v2/env-vars/{var_id}

Update the value of a workspace-scoped environment variable. The new value must be non-empty.


# Attach Dataset Group External ID
Source: https://docs.promptlayer.com/reference/external-ids-dataset-groups-attach

POST /api/public/v2/dataset-groups/{dataset_group_id}/external-ids

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Attach a `{source, external_id}` mapping to an existing dataset group.

Attaching the same mapping to the same dataset group is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Dataset Group External ID
Source: https://docs.promptlayer.com/reference/external-ids-dataset-groups-delete

DELETE /api/public/v2/dataset-groups/{dataset_group_id}/external-ids/{source}/{external_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Detach an external ID mapping from a dataset group.


# List Dataset Group External IDs
Source: https://docs.promptlayer.com/reference/external-ids-dataset-groups-list

GET /api/public/v2/dataset-groups/{dataset_group_id}/external-ids

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

List all external ID mappings attached to a dataset group.


# Attach Folder External ID
Source: https://docs.promptlayer.com/reference/external-ids-folders-attach

POST /api/public/v2/folders/{folder_id}/external-ids

Attach a `{source, external_id}` mapping to an existing folder.

Attaching the same mapping to the same folder is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).
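
The attach rules can be reasoned about with a small in-memory model. This is a toy sketch for building intuition, not PromptLayer code; the class name, entity labels, and the `"ok"`/`"conflict"` return values are all hypothetical stand-ins for the success and `409 Conflict` responses.

```python
class ExternalIdRegistry:
    """Toy model: each (source, external_id) mapping belongs to at most one entity."""

    def __init__(self) -> None:
        # (source, external_id) -> entity that owns the mapping
        self._owners: dict = {}

    def attach(self, entity: str, source: str, external_id: str) -> str:
        # setdefault stores the entity only when the mapping is new.
        owner = self._owners.setdefault((source, external_id), entity)
        # Same entity again is idempotent; a different entity models 409 Conflict.
        return "ok" if owner == entity else "conflict"


registry = ExternalIdRegistry()
first = registry.attach("folder-a", "acme_cms", "marketing")   # new mapping
repeat = registry.attach("folder-a", "acme_cms", "marketing")  # same mapping, same folder
other = registry.attach("folder-b", "acme_cms", "marketing")   # same mapping, other folder
```

Here `first` and `repeat` are both `"ok"`, while `other` is `"conflict"`.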


# Delete Folder External ID
Source: https://docs.promptlayer.com/reference/external-ids-folders-delete

DELETE /api/public/v2/folders/{folder_id}/external-ids/{source}/{external_id}

Detach an external ID mapping from a folder.


# List Folder External IDs
Source: https://docs.promptlayer.com/reference/external-ids-folders-list

GET /api/public/v2/folders/{folder_id}/external-ids

List all external ID mappings attached to a folder.


# External IDs
Source: https://docs.promptlayer.com/reference/external-ids-overview

Attach identifiers from your own systems to PromptLayer resources so your integrations can easily find and sync resources.

```json theme={null}
{
  "source": "acme_cms",
  "external_id": "prompt_template_123"
}
```

## Common tasks

| Task                                 | Example                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| ------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Set when creating an entity          | [`POST /rest/prompt-templates`](/reference/templates-publish#body-external-ids) include `external_ids` in the payload body.                                                                                                                                                                                                                                                                                                                                                                                                                        |
| Set on an existing resource          | [`POST /prompt-templates/{prompt_template_id}/external-ids`](/reference/external-ids-prompt-templates-attach)                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| Get an entity by external ID         | Add `external_source` and `external_id` to supported list endpoints such as [`GET /prompt-templates`](/reference/list-prompt-templates), [`GET /workflows`](/reference/list-workflows), [`GET /api/public/v2/datasets`](/reference/list-datasets), [`GET /api/public/v2/evaluations`](/reference/list-evaluations), [`GET /api/public/v2/skill-collections`](/reference/list-skill-collections), [`GET /api/public/v2/tool-registry`](/reference/tool-registry-list), or [`GET /api/public/v2/folders/entities`](/reference/list-folder-entities). |
| Get external IDs for an entity       | [`GET /prompt-templates/{prompt_template_id}/external-ids`](/reference/external-ids-prompt-templates-list)                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| Delete an external ID from an entity | [`DELETE /prompt-templates/{prompt_template_id}/external-ids/{source}/{external_id}`](/reference/external-ids-prompt-templates-delete)                                                                                                                                                                                                                                                                                                                                                                                                             |

## Supported resources

* [Prompt templates](/reference/templates-publish#body-external-ids)
* [Folders](/reference/create-folder#body-external-ids)
* [Workflows](/reference/create-workflow#body-external-ids)
* [Datasets](/reference/create-dataset-group#body-external-ids)
* [Evaluations](/reference/create-reports#body-external-ids)
* [Tools](/reference/tool-registry-create#body-external-ids)
* [Skill collections](/reference/create-skill-collection#body-external-ids)
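
For the lookup task in the table above, the query string can be built with the standard library. This is a sketch under assumptions: the base URL and helper name are illustrative, and the sample values reuse the mapping shown at the top of this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.promptlayer.com"  # assumption: adjust to your deployment


def external_id_lookup_url(list_path: str, source: str, external_id: str) -> str:
    """Build a list-endpoint URL filtered by an external ID mapping."""
    # urlencode percent-encodes values that are not URL-safe.
    query = urlencode({"external_source": source, "external_id": external_id})
    return f"{BASE_URL}{list_path}?{query}"


url = external_id_lookup_url("/api/public/v2/datasets", "acme_cms", "prompt_template_123")
print(url)
# https://api.promptlayer.com/api/public/v2/datasets?external_source=acme_cms&external_id=prompt_template_123
```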


# Attach Prompt Template External ID
Source: https://docs.promptlayer.com/reference/external-ids-prompt-templates-attach

POST /prompt-templates/{prompt_template_id}/external-ids

Attach a `{source, external_id}` mapping to an existing prompt template.

Attaching the same mapping to the same prompt template is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Prompt Template External ID
Source: https://docs.promptlayer.com/reference/external-ids-prompt-templates-delete

DELETE /prompt-templates/{prompt_template_id}/external-ids/{source}/{external_id}

Detach an external ID mapping from a prompt template.


# List Prompt Template External IDs
Source: https://docs.promptlayer.com/reference/external-ids-prompt-templates-list

GET /prompt-templates/{prompt_template_id}/external-ids

List all external ID mappings attached to a prompt template.


# Upsert Prompt Template by External ID
Source: https://docs.promptlayer.com/reference/external-ids-prompt-templates-upsert

PUT /prompt-templates/by-external-id/{source}/{external_id}

Publish a prompt template by a customer-defined external ID. If the mapping already exists, PromptLayer creates a new version on the mapped prompt template. If the mapping does not exist, PromptLayer creates or updates the prompt template from the request body and attaches the mapping.

This is the only external ID endpoint that uses `PUT`. See [External IDs](/reference/external-ids-overview) for mapping rules and conflict behavior.
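
Because `source` and `external_id` are path segments here, values with reserved characters need percent-encoding. The sketch below shows one way to build the path; the helper name is hypothetical, and whether the server accepts encoded slashes in these segments is an assumption to verify.

```python
from urllib.parse import quote


def upsert_by_external_id_path(source: str, external_id: str) -> str:
    """Build the path for PUT /prompt-templates/by-external-id/{source}/{external_id}."""
    # safe="" also encodes "/", so each value stays a single path segment.
    return "/prompt-templates/by-external-id/{}/{}".format(
        quote(source, safe=""),
        quote(external_id, safe=""),
    )


print(upsert_by_external_id_path("acme_cms", "prompt_template_123"))
# /prompt-templates/by-external-id/acme_cms/prompt_template_123
print(upsert_by_external_id_path("acme_cms", "docs/welcome page"))
# /prompt-templates/by-external-id/acme_cms/docs%2Fwelcome%20page
```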


# Attach Report External ID
Source: https://docs.promptlayer.com/reference/external-ids-reports-attach

POST /reports/{report_id}/external-ids

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Attach a `{source, external_id}` mapping to an existing report or evaluation pipeline.

Attaching the same mapping to the same report is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Report External ID
Source: https://docs.promptlayer.com/reference/external-ids-reports-delete

DELETE /reports/{report_id}/external-ids/{source}/{external_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Detach an external ID mapping from a report or evaluation pipeline.


# List Report External IDs
Source: https://docs.promptlayer.com/reference/external-ids-reports-list

GET /reports/{report_id}/external-ids

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

List all external ID mappings attached to a report or evaluation pipeline.


# Attach Skill Collection External ID
Source: https://docs.promptlayer.com/reference/external-ids-skill-collections-attach

POST /api/public/v2/skill-collections/{skill_collection_id}/external-ids

Attach a `{source, external_id}` mapping to an existing skill collection.

Attaching the same mapping to the same skill collection is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Skill Collection External ID
Source: https://docs.promptlayer.com/reference/external-ids-skill-collections-delete

DELETE /api/public/v2/skill-collections/{skill_collection_id}/external-ids/{source}/{external_id}

Detach an external ID mapping from a skill collection.


# List Skill Collection External IDs
Source: https://docs.promptlayer.com/reference/external-ids-skill-collections-list

GET /api/public/v2/skill-collections/{skill_collection_id}/external-ids

List all external ID mappings attached to a skill collection.


# Attach Tool Registry External ID
Source: https://docs.promptlayer.com/reference/external-ids-tool-registry-attach

POST /api/public/v2/tool-registry/{tool_id}/external-ids

Attach a `{source, external_id}` mapping to an existing tool registry entry.

Attaching the same mapping to the same tool is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Tool Registry External ID
Source: https://docs.promptlayer.com/reference/external-ids-tool-registry-delete

DELETE /api/public/v2/tool-registry/{tool_id}/external-ids/{source}/{external_id}

Detach an external ID mapping from a tool registry entry.


# List Tool Registry External IDs
Source: https://docs.promptlayer.com/reference/external-ids-tool-registry-list

GET /api/public/v2/tool-registry/{tool_id}/external-ids

List all external ID mappings attached to a tool registry entry.


# Attach Workflow External ID
Source: https://docs.promptlayer.com/reference/external-ids-workflows-attach

POST /workflows/{workflow_id}/external-ids

Attach a `{source, external_id}` mapping to an existing workflow.

Attaching the same mapping to the same workflow is idempotent. Reusing the mapping on another entity returns `409 Conflict`. See [External IDs](/reference/external-ids-overview).


# Delete Workflow External ID
Source: https://docs.promptlayer.com/reference/external-ids-workflows-delete

DELETE /workflows/{workflow_id}/external-ids/{source}/{external_id}

Detach an external ID mapping from a workflow.


# List Workflow External IDs
Source: https://docs.promptlayer.com/reference/external-ids-workflows-list

GET /workflows/{workflow_id}/external-ids

List all external ID mappings attached to a workflow.


# Get Dataset Rows
Source: https://docs.promptlayer.com/reference/get-dataset-rows

GET /api/public/v2/datasets/{dataset_id}/rows

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve paginated rows from a dataset with cell-level data.

## Behavior Notes

* Each row is an array of cells matching the order of the `columns` array.
* Dataset cells use a uniform `type: "dataset"` shape.
* Use `q` to search across dataset row cell values.
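
Since cells are positional, pairing each row with the column order gives keyed records. This is a minimal sketch: the helper name is hypothetical, the caller is assumed to have already extracted one key per column from the `columns` array, and the `value` field in the sample cells is an illustrative placeholder rather than a documented field.

```python
from typing import Any, Sequence


def rows_to_records(column_keys: Sequence[str], rows: Sequence[Sequence[Any]]) -> list:
    """Pair each cell with its column by position."""
    records = []
    for index, row in enumerate(rows):
        # Positional pairing is only safe when the lengths agree.
        if len(row) != len(column_keys):
            raise ValueError(f"row {index} has {len(row)} cells, expected {len(column_keys)}")
        records.append(dict(zip(column_keys, row)))
    return records


records = rows_to_records(
    ["question", "expected"],
    [
        [{"type": "dataset", "value": "2 + 2?"}, {"type": "dataset", "value": "4"}],
    ],
)
# records[0]["expected"] is the second cell of the first row.
```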

## Related

* [List Datasets](/reference/list-datasets)
* [Create Dataset Version from Request History](/reference/create-dataset-version-from-filter-params)
* [Datasets Overview](/features/evaluations/datasets-overview)


# Get Evaluation Rows
Source: https://docs.promptlayer.com/reference/get-evaluation-rows

GET /api/public/v2/evaluations/{evaluation_id}/rows

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve paginated evaluation rows with dataset input cells followed by evaluation result cells.

## Behavior Notes

* Each row is an array of cells matching the order of the `columns` array.
* Dataset cells contain source row values; evaluation cells contain result status, value, and optional error details.

## Related

* [List Evaluations](/reference/list-evaluations)
* [Create Evaluation Pipeline](/reference/create-reports)
* [Score Card](/features/evaluations/score-card)


# Get Evaluation
Source: https://docs.promptlayer.com/reference/get-report

GET /reports/{report_id}

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve information about a report.

To get a report's score, use [Get Evaluation Score](/reference/get-report-score) (`GET /reports/{report_id}/score`) instead.


# Get Evaluation Score
Source: https://docs.promptlayer.com/reference/get-report-score

GET /reports/{report_id}/score

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve the score of a report by ID.


# Get Request
Source: https://docs.promptlayer.com/reference/get-request

GET /api/public/v2/requests/{request_id}

Retrieve the full payload of a logged request by ID. The response includes a prompt blueprint, token usage, timing data, pricing, and an associated `trace_id` when tracing data is available.

## Related

* [Request IDs](/features/prompt-history/request-id)
* [Prompt Blueprints](/running-requests/prompt-blueprints)
* [Get Trace](/reference/get-trace)


# Get Skill Collection
Source: https://docs.promptlayer.com/reference/get-skill-collection

GET /api/public/v2/skill-collections/{identifier}

Fetch a skill collection by UUID, collection name, or root path.

## Behavior Notes

* Use `format=zip` to download the selected version as a ZIP archive.
* Omit `format` to receive the JSON payload shown in the generated response schema.
* Use `version` or `label` to pin the response to a specific saved version.

## Related

* [List Skill Collections](/reference/list-skill-collections)
* [Skill Collections Overview](/features/skill-collections/overview)
* [Pulling Skills](/features/skill-collections/pulling-skills)


# Get Snippet Usage
Source: https://docs.promptlayer.com/reference/get-snippet-usage

GET /prompt-templates/{identifier}/snippet-usage

List every prompt that uses a given snippet (a prompt template). The response lists the prompts and version numbers that reference the snippet, plus any release labels that reference it. `identifier` can be either the prompt name or the prompt ID.


# Get Table
Source: https://docs.promptlayer.com/reference/get-table

GET /api/public/v2/tables/{table_id}
Retrieve Table details.

Retrieve one Table by ID.

The response includes `sheet_count` and `sheet_row_counts`, where `sheet_row_counts` maps each sheet ID to its resolved row count.


# Get Table Sheet
Source: https://docs.promptlayer.com/reference/get-table-sheet

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}
Retrieve Table sheet details.

Retrieve one sheet in a Table.

The response includes the public sheet fields and the resolved `row_count`.


# Get Table Sheet Cell
Source: https://docs.promptlayer.com/reference/get-table-sheet-cell

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}
Retrieve a Table sheet cell.

Retrieve one cell from a Table sheet.

The response uses `column_id` in the public payload and expands stored output media URLs when present.


# Get Table Sheet Import Operation
Source: https://docs.promptlayer.com/reference/get-table-sheet-operation

GET /api/public/v2/tables/{table_id}/sheets/operations/{operation_id}
Check the status of an asynchronous Table sheet import.

Poll a sheet import operation started by [Create Table Sheet](/reference/create-table-sheet), [Create Table Sheet File Import](/reference/create-table-sheet-file-import), or [Create Table Sheet Request Log Import](/reference/create-table-sheet-request-log-import).

Operation statuses are `queued`, `running`, `succeeded`, and `failed`. Successful operations can include `rows_added` and `row_count`; failed operations include `error_message` when available.
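
A polling loop over these statuses might look like the sketch below. It is written against an injected `fetch` callable so it stays independent of any HTTP client; the function name, the retry defaults, and the assumption that the status is exposed under a `status` key are illustrative.

```python
import time
from typing import Callable

IN_PROGRESS_STATUSES = {"queued", "running"}
TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_operation(
    fetch: Callable[[], dict],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the operation reaches a terminal status."""
    for attempt in range(max_attempts):
        operation = fetch()
        status = operation.get("status")  # assumption: status lives under this key
        if status in TERMINAL_STATUSES:
            return operation
        if status not in IN_PROGRESS_STATUSES:
            raise ValueError(f"unexpected operation status: {status!r}")
        # Skip the final sleep because no further fetch follows it.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation still in progress after {max_attempts} attempts")
```

In practice `fetch` would wrap a `GET` on the `status_url` returned when the import was started. A `failed` result is returned rather than raised, so the caller can inspect `error_message` when it is present.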


# Get Table Sheet Operation
Source: https://docs.promptlayer.com/reference/get-table-sheet-operation-status

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations/{operation_id}
Poll a Table sheet operation.

Poll a Table sheet recalculation operation.

The response includes aggregate cell counts by status plus completed, failed, and pending counts.


# Get Table Sheet Score
Source: https://docs.promptlayer.com/reference/get-table-sheet-score

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Retrieve Table sheet score data.

Retrieve the current score payload for a Table sheet.

When explicit scoring is not configured, PromptLayer returns the default score it can infer from output columns. Configured scoring responses include status and configuration metadata.


# Get Table Sheet Score History
Source: https://docs.promptlayer.com/reference/get-table-sheet-score-history

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions/score-history
Retrieve Table sheet score history.

Retrieve score-history points across Table sheet versions.

Use `range` to limit the history window (`all`, `last_25`, `last_50`, `last_100`, or `last_250`), `resolution` to control sampling (`auto`, `raw`, or `min_max_bucket`), and `max_points` to cap the number of returned points. `max_points` defaults to 1200 and must be between 50 and 5000.
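
The allowed values above translate directly into client-side checks. This sketch uses a hypothetical helper name and assumes the 50 and 5000 bounds are inclusive; omitted parameters are left out so the server defaults apply.

```python
from typing import Optional

ALLOWED_RANGES = {"all", "last_25", "last_50", "last_100", "last_250"}
ALLOWED_RESOLUTIONS = {"auto", "raw", "min_max_bucket"}


def build_score_history_params(
    range_: Optional[str] = None,
    resolution: Optional[str] = None,
    max_points: Optional[int] = None,
) -> dict:
    """Validate and collect query parameters for the score-history endpoint."""
    params = {}
    if range_ is not None:
        if range_ not in ALLOWED_RANGES:
            raise ValueError(f"range must be one of {sorted(ALLOWED_RANGES)}")
        params["range"] = range_
    if resolution is not None:
        if resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        params["resolution"] = resolution
    if max_points is not None:
        # Assumption: both bounds are inclusive.
        if not 50 <= max_points <= 5000:
            raise ValueError("max_points must be between 50 and 5000")
        params["max_points"] = str(max_points)
    return params


params = build_score_history_params(range_="last_50", resolution="raw", max_points=200)
# params == {"range": "last_50", "resolution": "raw", "max_points": "200"}
```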


# Get Table Sheet Version
Source: https://docs.promptlayer.com/reference/get-table-sheet-version

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions/{version_id}
Retrieve a Table sheet version.

Retrieve one Table sheet version, including its snapshot.

The snapshot contains the versioned sheet structure and row data.


# Get Trace
Source: https://docs.promptlayer.com/reference/get-trace

GET /api/public/v2/traces/{trace_id}

Retrieve all spans for a trace ID. Each span includes its metadata and, when the span generated a request log, the associated `request_log_id`.

## Related

* [Get Request](/reference/get-request)
* [Traces](/running-requests/traces)
* [OpenTelemetry](/features/opentelemetry)


# Get Workflow
Source: https://docs.promptlayer.com/reference/get-workflow

GET /workflows/{workflow_id_or_name}

Retrieve a workflow by ID or name, including full node configuration, edges, and version details. By default, the latest version is returned.

## Behavior Notes

* Use `version` to retrieve a specific version number.
* Use `label` to retrieve the version currently assigned to a release label.
* `version` and `label` are mutually exclusive.
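
The selection rules above can be enforced when building the request path. This sketch assumes `version` and `label` are sent as query parameters; the helper name and sample values are illustrative.

```python
from typing import Optional, Union
from urllib.parse import quote, urlencode


def get_workflow_path(
    workflow_id_or_name: Union[int, str],
    version: Optional[int] = None,
    label: Optional[str] = None,
) -> str:
    """Build the path and query for GET /workflows/{workflow_id_or_name}."""
    if version is not None and label is not None:
        raise ValueError("version and label are mutually exclusive")
    # Encode the identifier so names with spaces or slashes stay one segment.
    path = "/workflows/" + quote(str(workflow_id_or_name), safe="")
    params = {}
    if version is not None:
        params["version"] = version
    if label is not None:
        params["label"] = label
    # With no selector, the latest version is returned by default.
    return path + ("?" + urlencode(params) if params else "")


print(get_workflow_path(42, version=3))               # /workflows/42?version=3
print(get_workflow_path("my workflow", label="prod"))  # /workflows/my%20workflow?label=prod
```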

## Related

* [Create Workflow](/reference/create-workflow)
* [Update Workflow](/reference/patch-workflow)
* [Workflows](/why-promptlayer/workflows)


# Get Workflow Labels
Source: https://docs.promptlayer.com/reference/get-workflow-labels

GET /workflows/{workflow_id_or_name}/labels

List all release labels for a workflow. Returns each label with its name, ID, and the version it points to.

This mirrors the [Get Prompt Template Labels](/reference/templates-labels-get) endpoint.


# Introduction
Source: https://docs.promptlayer.com/reference/introduction

Use the PromptLayer REST API to manage prompts, workflows, evaluations, datasets, request logs, traces, and other workspace resources programmatically.

The PromptLayer REST API lets you interact with your workspace directly over HTTP so you can automate updates, run workflows, log activity, and export or organize data from your own systems.

## Authentication

Authenticate API requests with a PromptLayer API key using the `X-API-Key` header.

Generate API keys from the API keys page in your PromptLayer dashboard. Each workspace has its own API keys, and requests use the workspace associated with the key by default.
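
As a minimal sketch of the header in use, the following builds (but does not send) an authenticated request with the Python standard library. The base URL, helper name, request ID, and key value are assumptions for illustration.

```python
import urllib.request

BASE_URL = "https://api.promptlayer.com"  # assumption: adjust to your deployment


def build_get_request(path: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated GET request without sending it."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        headers={"X-API-Key": api_key, "Accept": "application/json"},
        method="GET",
    )


# Illustrative request ID and key; load the real key from your secret store.
request = build_get_request("/api/public/v2/requests/12345", api_key="pl_example_key")
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the key handling in one place and makes the header easy to verify in tests.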

## Endpoints

### Prompt Templates

* [Get Prompt Template](/reference/templates-get)
* [Get Prompt Template (Raw)](/reference/templates-get-raw)
* [List Prompt Templates](/reference/list-prompt-templates)
* [Publish Prompt Template](/reference/templates-publish)
* [List Prompt Template Labels](/reference/templates-labels-get)
* [Create Prompt Template Label](/reference/prompt-labels-create)
* [Move Prompt Template Label](/reference/prompt-labels-patch)
* [Delete Prompt Template Label](/reference/prompt-labels-delete)
* [Get Snippet Usage](/reference/get-snippet-usage)

### Tracking

* [Get Request](/reference/get-request)
* [Search Request Logs](/reference/search-request-logs)
* [Log Request](/reference/log-request)
* [Track Prompt](/reference/track-prompt)
* [Track Score](/reference/track-score)
* [Track Metadata](/reference/track-metadata)
* [Create Spans Bulk](/reference/spans-bulk)
* [Ingest Traces (OTLP)](/reference/otlp-ingest-traces)

### Datasets

<Warning>
  Dataset endpoints are deprecated for new workflows. Use [Tables](/reference/tables-create) and Table sheet imports instead.
</Warning>

* [List Datasets](/reference/list-datasets)
* [Get Dataset Rows](/reference/get-dataset-rows)
* [Create Dataset Group](/reference/create-dataset-group)
* [Create Dataset Version from File](/reference/create-dataset-version-from-file)
* [Create Dataset Version from Filter Params](/reference/create-dataset-version-from-filter-params)

### Evaluations

<Warning>
  Evaluation and Report endpoints are deprecated for new workflows. Use [Tables](/reference/tables-create), Table operations, and Table sheet scoring instead.
</Warning>

* [List Evaluations](/reference/list-evaluations)
* [Get Evaluation Rows](/reference/get-evaluation-rows)
* [Create Evaluation Pipeline](/reference/create-reports)
* [Run Report](/reference/run-report)
* [Get Evaluation](/reference/get-report)
* [Get Evaluation Score](/reference/get-report-score)
* [Add Report Columns](/reference/add-report-columns)
* [Edit Evaluation Pipeline Column](/reference/edit-report-column)
* [Delete Evaluation Pipeline Column](/reference/delete-report-column)
* [Update Report Score Card](/reference/update-report-score-card)
* [Rename Evaluation Pipeline](/reference/rename-report)
* [Delete Evaluation Pipeline](/reference/delete-report)
* [Delete Reports by Name](/reference/delete-reports-by-name)

### Tables

Table endpoints are scoped to the workspace associated with your API key. Sheet read responses include `version_count`; sheet mutations that create a new sheet version return `version`, the current sheet version count after the operation.

* [Create Table](/reference/tables-create)
* [List Tables](/reference/tables-list)
* [Get Table](/reference/tables-get)
* [Update Table](/reference/tables-update)
* [List Sheets](/reference/table-sheets-list)
* [Create Sheet](/reference/table-sheets-create)
* [Import File](/reference/table-sheet-imports-file-create)
* [Import Request Logs](/reference/table-sheet-imports-request-logs-create)
* [Get Sheet Import Operation](/reference/table-sheets-get-operation)
* [Get Sheet](/reference/table-sheets-get)
* [Update Sheet](/reference/table-sheets-update)
* [List Columns](/reference/table-sheet-columns-list)
* [Create Column](/reference/table-sheet-columns-create)
* [Update Column](/reference/table-sheet-columns-update)
* [List Rows](/reference/table-sheet-rows-list)
* [Add Rows](/reference/table-sheet-rows-add)
* [Get Cell](/reference/table-sheet-cells-get)
* [Update Cell](/reference/table-sheet-cells-update)
* [Recalculate Cell](/reference/table-sheet-cells-recalculate)
* [Recalculate Cells](/reference/table-sheet-cells-recalculate-batch)
* [List Operations](/reference/table-sheet-operations-list)
* [Create Operation](/reference/table-sheet-operations-create)
* [Get Operation](/reference/table-sheet-operations-get)
* [Cancel Operation](/reference/table-sheet-operations-cancel)
* [Get Score](/reference/table-sheet-score-get)
* [Configure Score](/reference/table-sheet-score-configure)
* [Recalculate Score](/reference/table-sheet-score-recalculate)
* [List Versions](/reference/table-sheet-versions-list)
* [Create Version](/reference/table-sheet-versions-create)
* [Get Version](/reference/table-sheet-versions-get)
* [Get Score History](/reference/table-sheet-score-history-get)

<Tip>
  Pass `folder_id` on `Create Evaluation Pipeline`, [Create Table](/reference/tables-create), and the create endpoints under [Prompt Templates](#prompt-templates), [Datasets](#datasets), and [Workflows](#workflows) to organize new resources directly into a folder. Use [Resolve Folder ID by Path](/reference/resolve-folder-id) to look up an ID, or [Create Folder](/reference/create-folder) to make one. See [Unified Registry](#unified-registry).
</Tip>

### Workflows

* [List Workflows](/reference/list-workflows)
* [Create Workflow](/reference/create-workflow)
* [Update Workflow](/reference/patch-workflow)
* [Run Workflow](/reference/run-workflow)
* [Get Workflow Version Execution Results](/reference/workflow-version-execution-results)

### Skill Collections

* [List Skill Collections](/reference/list-skill-collections)
* [Create Skill Collection](/reference/create-skill-collection)
* [Get Skill Collection](/reference/get-skill-collection)
* [Update Skill Collection](/reference/update-skill-collection)
* [Save Skill Collection Version](/reference/save-skill-collection-version)

### Unified Registry

* [Create Folder](/reference/create-folder)
* [Update Folder](/reference/update-folder)
* [List Folder Entities](/reference/list-folder-entities)
* [Move Folder Entities](/reference/move-folder-entities)
* [Delete Folder Entities](/reference/delete-folder-entities)
* [Resolve Folder ID by Path](/reference/resolve-folder-id)


# List Datasets
Source: https://docs.promptlayer.com/reference/list-datasets

GET /api/public/v2/datasets

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve a paginated list of datasets. You can filter by dataset group, prompt, report, workspace, and name.


# List Evaluations
Source: https://docs.promptlayer.com/reference/list-evaluations

GET /api/public/v2/evaluations

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Retrieve a paginated list of evaluations in your workspace.


# List Folder Entities
Source: https://docs.promptlayer.com/reference/list-folder-entities

GET /api/public/v2/folders/entities

Lists entities within a folder or at the workspace root. Returns folders, prompts, snippets, workflows, datasets, evaluations, AB tests, and input variable sets.

This endpoint is the primary way to browse the unified registry programmatically — equivalent to viewing the registry page in the PromptLayer dashboard.

## Behavior Notes

When `include_metadata=true`, the `metadata` field contains type-specific information:

| Entity Type                                                    | Metadata Fields                                                                    |
| -------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| **PROMPT** / **SNIPPET**                                       | `type` (`"chat"` or `"completion"` or `null`), `latest_version_number`             |
| **WORKFLOW**                                                   | `latest_version_number`                                                            |
| **DATASET**                                                    | `isDraft` (boolean — `true` if only draft versions exist), `latest_version_number` |
| **FOLDER**, **REPORT**, **AB\_TEST**, **INPUT\_VARIABLE\_SET** | Empty object `{}`                                                                  |

Sorting behavior:

* Semantic search results are sorted by relevance.
* Text search prioritizes prompts and snippets whose names match before content-only matches.
* When `sort_by` is provided and search ordering does not take precedence, results are sorted by the requested field and `sort_order`.


# List Prompt Templates
Source: https://docs.promptlayer.com/reference/list-prompt-templates

GET /prompt-templates

Get a paginated list of all prompt templates in your workspace. Results are ordered by creation date, newest first.

Each returned prompt template includes the latest version by default. When filtering by `label`, the version associated with that label is returned instead.


# List Skill Collections
Source: https://docs.promptlayer.com/reference/list-skill-collections

GET /api/public/v2/skill-collections

Retrieve all non-deleted skill collections available in the authenticated workspace.


# List Table Sheet Columns
Source: https://docs.promptlayer.com/reference/list-table-sheet-columns

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns
Retrieve columns from a Table sheet.

List columns for a Table sheet using cursor pagination. Results are sorted by `position_rank`; use `order=asc` or `order=desc` to control direction.

Columns are returned in sheet order and include dependency and source metadata for clients that need to render or recreate column configuration.

Columns are returned in `data`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.
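
The pagination contract above can be sketched as a small loop. This is a minimal sketch, not an official client: `fetch_page` is a hypothetical callable that you supply to perform the HTTP request and return the parsed JSON body, and the `cursor` query parameter name is assumed from the List Tables page.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_items(
    fetch_page: Callable[[Dict[str, Any]], Page],
    params: Optional[Dict[str, Any]] = None,
) -> Iterator[Any]:
    """Yield every item from a cursor-paginated list endpoint.

    The same sort, order, and filter params are resent with every cursor,
    because a cursor is only valid for the query that produced it.
    """
    base = dict(params or {})
    cursor: Optional[str] = None
    while True:
        query = dict(base)
        if cursor is not None:
            query["cursor"] = cursor  # assumed parameter name
        page = fetch_page(query)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            break
        cursor = pagination["next_cursor"]  # opaque: pass back unchanged
```

The loop never inspects or modifies `next_cursor`; it only echoes it back alongside the original parameters.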


# List Table Sheet Operations
Source: https://docs.promptlayer.com/reference/list-table-sheet-operations

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations
List active operations for a Table sheet.

List active recalculation operations and cell status counts for a Table sheet.

Use this endpoint to determine whether a sheet has queued or running work before mutating columns, rows, cells, or score configuration.


# List Table Sheet Rows
Source: https://docs.promptlayer.com/reference/list-table-sheet-rows

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/rows
Retrieve rows from a Table sheet.

Retrieve rows from a Table sheet using cursor pagination. Results are sorted by `row_index`; use `order=asc` or `order=desc` to control direction.

Rows are returned as objects with `row_index` and `cells`. The `cells` object is keyed by column ID, and each cell uses `column_id` in the public payload.

Rows are returned in `data`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.

### Columns

Column definitions are included on the first page by default. Later cursor pages return an empty `columns` array unless you pass `include_columns=true`.

### Row count

`row_count` is included by default. Pass `include_row_count=false` when you only need page data.


# List Table Sheet Versions
Source: https://docs.promptlayer.com/reference/list-table-sheet-versions

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions
List versions for a Table sheet.

List saved versions for a Table sheet using cursor pagination. Results are sorted by `version_number`; use `order=asc` or `order=desc` to control direction.

The response includes version summaries and related metadata in `data`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.


# List Table Sheets
Source: https://docs.promptlayer.com/reference/list-table-sheets

GET /api/public/v2/tables/{table_id}/sheets
Retrieve sheets in a Table.

List sheets in a Table using cursor pagination. Results are sorted by sheet `index`; use `order=asc` or `order=desc` to control direction.

### Filtering

Only active sheets are returned.

You can also filter sheets by `prompt_id`, `prompt_version_id`, or `prompt_label_id` when you need sheets that contain matching prompt columns.

### Response shape

Sheets are returned in `data` with resolved `row_count`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`. Applied filters are returned in `filters`, and `count` is the number of sheets returned on the current page.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.


# List Tables
Source: https://docs.promptlayer.com/reference/list-tables

GET /api/public/v2/tables
Retrieve Tables.

Retrieve Tables from the workspace associated with your API key.

### Filtering

Use `name` for a case-insensitive partial title match and `folder_id` for registry folder membership.

You can also filter to tables that contain prompt columns for a specific `prompt_id`, `prompt_version_id`, or `prompt_label_id`.

### Pagination

This endpoint uses cursor pagination with `cursor` and `limit`. Results are sorted by `created_at`; use `order=asc` or `order=desc` to control direction.

Tables are returned in `data`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`. Applied filters are returned in `filters`, and `count` is the number of Tables returned on the current page.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.


# List Workflows
Source: https://docs.promptlayer.com/reference/list-workflows

GET /workflows

Get a list of all workflows in your workspace.


# Log Request
Source: https://docs.promptlayer.com/reference/log-request

POST /log-request

Log a request made outside of PromptLayer's managed run APIs. Use this endpoint for custom providers, background jobs, or direct LLM client calls that you still want to inspect, search, score, and replay in PromptLayer.

## Behavior Notes

* `input` and `output` must use Prompt Blueprint format.
* Chat message `content` must be an array of content blocks, not a plain string.
* Provider-specific settings such as structured outputs, tools, tool choice, thinking, and reasoning should be recorded in the prompt blueprint or `parameters` payload.
* Use `status`, `error_type`, and `error_message` to log failed or degraded requests.
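
To illustrate the content-block rule above, here is a minimal sketch of a helper that wraps plain strings before you build a log payload. The `{"type": "text", "text": ...}` block shape and the `role` key are assumptions about Prompt Blueprint format; confirm the exact schema in [Prompt Blueprints](/running-requests/prompt-blueprints).

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def to_content_blocks(content: Content) -> List[Dict[str, Any]]:
    """Return content as a list of blocks, wrapping a plain string if needed."""
    if isinstance(content, str):
        # Assumed text block shape; not confirmed by this page.
        return [{"type": "text", "text": content}]
    return list(content)


def chat_message(role: str, content: Content) -> Dict[str, Any]:
    """Build one chat message whose content is always an array of blocks."""
    return {"role": role, "content": to_content_blocks(content)}
```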

## Related

* [Custom Logging](/features/prompt-history/custom-logging)
* [Prompt Blueprints](/running-requests/prompt-blueprints)
* [Structured Output Logging](/features/prompt-history/structured-output-logging)


# Move Folder Entities
Source: https://docs.promptlayer.com/reference/move-folder-entities

POST /api/public/v2/folders/entities

Moves one or more entities into a target folder, or to the workspace root if no `folder_id` is provided. Supports moving folders, prompts, snippets, workflows, datasets, evaluations, AB tests, and input variable sets. Requires appropriate edit permissions for each entity type.


# Ingest Traces (OTLP)
Source: https://docs.promptlayer.com/reference/otlp-ingest-traces

POST /v1/traces

Ingest OpenTelemetry traces through PromptLayer's OTLP/HTTP endpoint.

## Behavior Notes

* This endpoint accepts an `ExportTraceServiceRequest` as defined by the [OpenTelemetry specification](https://opentelemetry.io/docs/specs/otel/protocol/otlp/#otlphttp).
* Spans carrying [GenAI semantic convention](https://opentelemetry.io/docs/specs/semconv/gen-ai/) attributes are automatically converted into PromptLayer request logs.
* Supported content types are `application/x-protobuf` for binary protobuf encoding and `application/json` for JSON encoding.
* Gzip `Content-Encoding` is supported for both formats.
* Spans can include `promptlayer.prompt.name`, optionally with `promptlayer.prompt.version`, to link the generated request log to an existing prompt template in your workspace.
* Spans can include `user.id`/`enduser.id`, `gen_ai.conversation.id`/`session.id`, and `promptlayer.metadata.*` attributes to attach searchable user identity and metadata to the generated request log.
* For SDK setup, GenAI semantic conventions, prompt template linking, metadata, and collector configuration, see [OpenTelemetry](/features/opentelemetry).
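
The linking and metadata attributes listed above can be collected in one place before they are set on a span. This is a minimal sketch: the function name and argument names are hypothetical, the value types are assumptions, and only the attribute keys come from the notes above.

```python
from typing import Any, Dict, Optional


def promptlayer_span_attributes(
    prompt_name: Optional[str] = None,
    prompt_version: Optional[Any] = None,
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a flat attribute dict using the PromptLayer span attribute keys."""
    attrs: Dict[str, Any] = {}
    if prompt_name is not None:
        attrs["promptlayer.prompt.name"] = prompt_name
        # The version is only meaningful together with a prompt name.
        if prompt_version is not None:
            attrs["promptlayer.prompt.version"] = prompt_version
    if user_id is not None:
        attrs["user.id"] = user_id
    if session_id is not None:
        attrs["session.id"] = session_id
    for key, value in (metadata or {}).items():
        attrs[f"promptlayer.metadata.{key}"] = value
    return attrs
```

With the OpenTelemetry Python SDK, a dict like this can typically be passed to a span's `set_attributes` method.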

## Related

* [OpenTelemetry](/features/opentelemetry)
* [Traces](/running-requests/traces)
* [Create Spans Bulk](/reference/spans-bulk)


# Update Workflow (PATCH)
Source: https://docs.promptlayer.com/reference/patch-workflow

PATCH /rest/workflows/{workflow_id_or_name}

Partially update a Workflow by creating a new version from an existing base version. Use this endpoint when you want to change only selected nodes or version metadata without resending the complete workflow graph.

## Behavior Notes

* If `base_version` is omitted, PromptLayer patches from the latest version.
* `nodes` are merged by name: unmentioned nodes are preserved, object values add or update nodes, and `null` removes a node.
* Node `configuration` is deep-merged, while `dependencies`, `required_input_variables`, and `edges` are replaced when provided.
* `release_labels` are moved or attached to the newly created version.
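
The merge rules above can be modeled locally to preview what a patch would produce. This is a minimal sketch of the documented semantics, not PromptLayer's implementation; the node shapes in the usage note are hypothetical, and how the server treats `null` values nested inside `configuration` is not specified here, so the sketch simply overwrites them.

```python
import copy
from typing import Any, Dict, Optional

Node = Dict[str, Any]


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge patch into a copy of base; non-dict values overwrite."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_node_patch(
    base_nodes: Dict[str, Node], node_patch: Dict[str, Optional[Node]]
) -> Dict[str, Node]:
    """Merge nodes by name: None removes, dicts add or update, others are kept."""
    result = copy.deepcopy(base_nodes)
    for name, change in node_patch.items():
        if change is None:
            result.pop(name, None)
            continue
        current = result.get(name, {})
        updated = dict(current)
        for field, value in change.items():
            if field == "configuration" and isinstance(value, dict):
                # configuration is deep-merged with the existing value
                updated[field] = deep_merge(current.get("configuration", {}), value)
            else:
                # dependencies and other provided fields are replaced wholesale
                updated[field] = copy.deepcopy(value)
        result[name] = updated
    return result
```

For example, patching `{"fetch": {"configuration": {"temperature": 0.7}}, "old": None}` keeps every other key in the `fetch` configuration, removes `old`, and leaves all unmentioned nodes untouched.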

## Related

* [Create Workflow](/reference/create-workflow)
* [Run Workflow](/reference/run-workflow)
* [Workflows](/why-promptlayer/workflows)


# Create a Prompt Template Label
Source: https://docs.promptlayer.com/reference/prompt-labels-create

POST /prompts/{prompt_id}/label

Create a release label for a prompt template version.


# Delete a Prompt Template Label
Source: https://docs.promptlayer.com/reference/prompt-labels-delete

DELETE /prompt-labels/{prompt_label_id}

Delete a prompt label from a prompt version.


# Move Prompt Template Labels
Source: https://docs.promptlayer.com/reference/prompt-labels-patch

PATCH /prompt-labels/{prompt_label_id}

Move a prompt label from one prompt version to another.


# Recalculate Table Sheet Cell
Source: https://docs.promptlayer.com/reference/recalculate-table-sheet-cell

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}/recalculations
Queue recalculation for one Table sheet cell.

Queue recalculation for one computed cell in a Table sheet.

The response returns an `execution_id` when recalculation is queued. If the cell does not need recalculation, the endpoint returns successfully without an execution.


# Recalculate Table Sheet Cells
Source: https://docs.promptlayer.com/reference/recalculate-table-sheet-cells

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/recalculations
Queue recalculation for multiple Table sheet cells.

Queue recalculation for a set of cells in a Table sheet.

Pass `cell_ids` for the cells to recalculate. The response includes the queued `execution_id`, the number of cells selected, and the number of cells queued.


# Recalculate Table Sheet Score
Source: https://docs.promptlayer.com/reference/recalculate-table-sheet-score

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Queue score recalculation for a Table sheet.

Queue recalculation for an existing Table sheet score configuration.

This endpoint requires scoring to already be configured on the sheet. The response returns the `score_configuration_id` and score calculation `status` (`queued`, `running`, `completed`, `failed`, or `null`).


# Rename Evaluation Pipeline
Source: https://docs.promptlayer.com/reference/rename-report

PATCH /reports/{report_id}/rename

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Rename or retag an evaluation pipeline. Provide `name`, `tags`, or both when you want to update pipeline metadata without recreating the pipeline.

## Related

* [Create Evaluation Pipeline](/reference/create-reports)
* [Delete Evaluation Pipeline](/reference/delete-report)


# Request Analytics
Source: https://docs.promptlayer.com/reference/request-analytics

POST /api/public/v2/requests/analytics

Get aggregated analytics for request logs using the same filter syntax as [Search Request Logs](/reference/search-request-logs). The response contains precomputed totals, time-series buckets, latency percentiles, and model, prompt, provider, tag, and metadata breakdowns.

## Behavior Notes

* The response is an aggregated payload, not a paginated list of rows.
* Bucket size is selected automatically based on the filtered time range.
* `sort_by` and `sort_order` are accepted for query compatibility but do not affect aggregated output.

## Related

* [Search Request Logs](/reference/search-request-logs)
* [Search Request Suggestions](/reference/search-request-suggestions)
* [Analytics](/why-promptlayer/analytics)


# Request Analytics — Custom Queries
Source: https://docs.promptlayer.com/reference/request-analytics-custom-analytics

POST /api/public/v2/requests/analytics/custom-analytics

Run custom aggregations over your request logs. Define what to measure and how to slice it; the API returns structured data you can use however you want — feed it into a chart, run analysis on it, pipe it into a dashboard, or process it programmatically.

Each query in the `customCharts` array specifies a metric and optionally a breakdown dimension or time bucketing:

| Shape                    | Fields required                                         |
| ------------------------ | ------------------------------------------------------- |
| Single aggregate         | `metric` (and `metricField` unless `metric` is `count`) |
| Grouped breakdown        | Add `groupByField` or `groupByMetadataKey`              |
| Over time                | Add `timeSeries: true`                                  |
| Multiple metrics at once | Replace `metric`/`metricField` with a `series` array    |

## Metrics

`metric` controls the aggregation function:

| Value        | Description                                           |
| ------------ | ----------------------------------------------------- |
| `count`      | Number of matching requests (no `metricField` needed) |
| `sum`        | Total of a numeric field                              |
| `avg`        | Average of a numeric field                            |
| `min`        | Minimum value                                         |
| `max`        | Maximum value                                         |
| `percentile` | Arbitrary percentile — requires `percentile` (0–100)  |

## Supported metric fields (`metricField`)

`input_tokens`, `output_tokens`, `cost`, `latency_ms`, `prompt_version_number`, `turn_count`, `tool_call_count`, `cached_tokens`, `thinking_tokens`

> Latency values are returned in **seconds** (converted from milliseconds internally).

## Group-by fields (`groupByField`)

`engine`, `provider_type`, `prompt_id`, `prompt_version_number`, `status`, `error_type`, `tags`, `metadata_keys`, `output_keys`, `input_variable_keys`, `tool_names`

Use `groupByMetadataKey` instead to break down by values of a specific metadata key (e.g. `"environment"` or `"user_id"`).

## Filters

All filter fields from [Search Request Logs](/reference/search-request-logs) are supported (`filter_group`, `q`, `sort_by`, `sort_order`). Filters are applied before aggregation.

## Response shape

Each entry in the response `customCharts` array contains:

* **`id`** — echoes the id you sent
* **`series`** — array of series descriptors: `{ key, label, unit }` — describes what each numeric key in the data rows represents
* **`data`** — array of rows, each with a `label` and one numeric key per series. Time-bucketed rows also include `bucketKey` (ISO date string).
* **`derivedInsights`** — (multi-metric only) pre-computed ratio summaries

`chartType` and `title` are also echoed back but are optional hints — use them if you're rendering a chart, ignore them if you're just processing the numbers.
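
As a sketch of how the response shape above might be consumed, the helper below pivots one `customCharts` entry into a per-series list of `(label, value)` pairs. The function name is hypothetical, and the sample values in the usage note are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def pivot_chart(chart: Dict[str, Any]) -> Dict[str, List[Tuple[str, Any]]]:
    """Map each series key to its (row label, value) pairs, in row order."""
    keys = [descriptor["key"] for descriptor in chart["series"]]
    return {
        key: [(row["label"], row.get(key)) for row in chart["data"]]
        for key in keys
    }
```

Given a chart with `series` of `[{"key": "requests", ...}]` and `data` of `[{"label": "gpt", "requests": 4}]`, the result is `{"requests": [("gpt", 4)]}`.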

## Behavior Notes

* `sort_by` / `sort_order` are accepted for compatibility but do not affect aggregated output.
* Overall aggregates (no `timeSeries`, no `groupByField`) return a single row with `label: "Overall"`.
* Multi-metric grouped time-series is not supported — use multiple single-metric queries instead.
* `series` keys must be unique within a query; `id` values must be unique within the request.
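
The rules on this page can be checked client-side before sending a request. This is a minimal sketch under stated assumptions: each `series` item is assumed to carry its own `key`, `metric`, `metricField`, and `percentile`, and "multi-metric" is interpreted as a `series` array with more than one item. The server remains the source of truth for validation.

```python
from typing import Any, Dict, List

METRICS = {"count", "sum", "avg", "min", "max", "percentile"}


def _check_metric(spec: Dict[str, Any], where: str) -> List[str]:
    """Validate one metric definition (a chart or an assumed series item)."""
    metric = spec.get("metric")
    if metric not in METRICS:
        return [f"{where}: unknown metric {metric!r}"]
    errors = []
    if metric != "count" and not spec.get("metricField"):
        errors.append(f"{where}: metricField is required for {metric}")
    if metric == "percentile":
        value = spec.get("percentile")
        if not isinstance(value, (int, float)) or not 0 <= value <= 100:
            errors.append(f"{where}: percentile must be between 0 and 100")
    return errors


def validate_custom_charts(charts: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    errors: List[str] = []
    seen_ids = set()
    for index, chart in enumerate(charts):
        where = f"customCharts[{index}]"
        chart_id = chart.get("id")
        if chart_id in seen_ids:
            errors.append(f"{where}: duplicate id {chart_id!r}")
        seen_ids.add(chart_id)
        series = chart.get("series")
        if not series:
            errors.extend(_check_metric(chart, where))
            continue
        keys = [item.get("key") for item in series]
        if len(keys) != len(set(keys)):
            errors.append(f"{where}: series keys must be unique")
        grouped = bool(chart.get("groupByField") or chart.get("groupByMetadataKey"))
        if len(series) > 1 and grouped and chart.get("timeSeries"):
            errors.append(f"{where}: multi-metric grouped time-series is not supported")
        for position, item in enumerate(series):
            errors.extend(_check_metric(item, f"{where}.series[{position}]"))
    return errors
```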

## Related

* [Request Analytics](/reference/request-analytics)
* [Search Request Logs](/reference/search-request-logs)
* [Analytics](/why-promptlayer/analytics)


# Resolve Folder ID by Path
Source: https://docs.promptlayer.com/reference/resolve-folder-id

GET /api/public/v2/folders/resolve-id

Resolves a folder's ID from its dot-separated path (e.g., `"My Folder.Subfolder"`). Useful for programmatically navigating the folder hierarchy when you know the folder names but not the IDs.


# Run Full Evaluation
Source: https://docs.promptlayer.com/reference/run-report

POST /reports/{report_id}/run

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Run an evaluation pipeline. You can optionally update the dataset.


# Run Workflow
Source: https://docs.promptlayer.com/reference/run-workflow

POST /workflows/{workflow_name}/run

Initiate an execution of a Workflow by name. You can run the latest version, choose a version by label or version number, pass input variables and metadata, and optionally receive results asynchronously with a callback URL.

## Behavior Notes

* `workflow_label_name` and `workflow_version_number` identify a specific version; omit both to run the latest version.
* When `callback_url` is provided, the API accepts the run immediately and posts results to your callback when execution finishes.
* Use [Get Workflow Version Execution Results](/reference/workflow-version-execution-results) to poll for execution output.

## Related

* [Create Workflow](/reference/create-workflow)
* [Get Workflow Version Execution Results](/reference/workflow-version-execution-results)
* [Workflows](/why-promptlayer/workflows)


# Save Draft Dataset Version
Source: https://docs.promptlayer.com/reference/save-draft-dataset-version

POST /api/public/v2/dataset-versions/save-draft

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Publish a draft dataset version by assigning it the next real version number. PromptLayer queues the save operation and returns immediately.

## Behavior Notes

* The draft's `version_number` changes from `-1` to the next sequential version number when the background job completes.
* If no draft exists for the dataset group, the endpoint returns `404`.

## Related

* [Create Draft Dataset Version](/reference/create-draft-dataset-version)
* [Add Request Log to Dataset](/reference/add-request-log-to-dataset)
* [Get Dataset Rows](/reference/get-dataset-rows)


# Save Skill Collection Version
Source: https://docs.promptlayer.com/reference/save-skill-collection-version

POST /api/public/v2/skill-collections/{identifier}/versions

Create a new saved version of a skill collection.

### Request formats

Use `application/json` to send `file_updates`, `moves`, `deletes`, `commit_message`, and `release_label` directly.

Use `multipart/form-data` to upload a ZIP archive plus metadata encoded as a JSON string in `metadata` or `json`.


# Search Request Logs
Source: https://docs.promptlayer.com/reference/search-request-logs

POST /api/public/v2/requests/search

Search logged requests using structured filters, free-text search, sorting, and pagination. This endpoint is useful for analytics, debugging, data export, and custom dashboards.

## Behavior Notes

* This endpoint is rate limited to 10 requests per minute.
* Results are capped at 25 items per page.
* Use `filter_group` for structured filters and `q` for fuzzy prefix search across prompt input and LLM output text.
* Search indexing, nested fields, and operator behavior are explained in [Search Data Model](/features/prompt-history/search-data-model).
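
Because of the rate limit noted above, a bulk export needs client-side throttling. The class below is a minimal sketch of a sliding-window limiter. How the server measures its window is not documented here, so treating it as "at most 10 calls in any 60-second span" is an assumption, and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(
        self,
        max_calls: int = 10,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()

    def _drop_expired(self, now: float) -> None:
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds spent waiting."""
        now = self._clock()
        self._drop_expired(now)
        waited = 0.0
        while len(self._calls) >= self._max_calls:
            delay = self._period - (now - self._calls[0])
            self._sleep(delay)
            waited += delay
            now = self._clock()
            self._drop_expired(now)
        self._calls.append(now)
        return waited
```

Call `acquire()` immediately before each search request. The injectable `clock` and `sleep` exist so the limiter can be tested without real waiting.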

## Related

* [Search Request Suggestions](/reference/search-request-suggestions)
* [Request Analytics](/reference/request-analytics)
* [Search Data Model](/features/prompt-history/search-data-model)


# Search Request Suggestions
Source: https://docs.promptlayer.com/reference/search-request-suggestions

GET /api/public/v2/requests/suggestions

Get autocomplete suggestions for request log fields. Use this endpoint to power search UIs, discover values in your logs, or scope suggestions with the same filters used by request-log search.

## Behavior Notes

* This endpoint is rate limited to 10 requests per minute.
* `filter_group` uses the same JSON-encoded structured filter syntax as [Search Request Logs](/reference/search-request-logs).
* Fields that return nested values require `metadata_key` to identify the nested key.

## Related

* [Search Request Logs](/reference/search-request-logs)
* [Search Data Model](/features/prompt-history/search-data-model)
* [Advanced Search](/why-promptlayer/advanced-search)


# Create Spans Bulk
Source: https://docs.promptlayer.com/reference/spans-bulk

POST /spans-bulk

Create multiple observability spans in one request, optionally creating a request log alongside each span. Use this endpoint when ingesting telemetry data that is not already sent through OTLP.

## Behavior Notes

* When `log_request` is provided, the created request log is associated with the span by `span_id`.
* If `request_start_time` or `request_end_time` is omitted from `log_request`, PromptLayer inherits the value from the parent span.
* If a referenced prompt is not found, the span is still created but the request log creation for that span is skipped.
* Bulk span creation is atomic; if any span creation fails, the entire batch is rolled back.

## Related

* [Ingest Traces (OTLP)](/reference/otlp-ingest-traces)
* [Traces](/running-requests/traces)
* [OpenTelemetry](/features/opentelemetry)


# Get Cell
Source: https://docs.promptlayer.com/reference/table-sheet-cells-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}
Retrieve a single cell by ID. For prompt-template column cells, the response includes a `request_metrics` object with price, latency, and token usage when available. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve a single cell by ID, including its current status, display value, and structured value.

Cell statuses: `completed`, `stale`, `running`, `queued`, `error`, `cancelled`.


# Recalculate Cell
Source: https://docs.promptlayer.com/reference/table-sheet-cells-recalculate

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}/recalculations
Trigger recalculation for a single cell. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Trigger recalculation for a single cell. Returns `202 Accepted` with an `execution_id` that can be used to track progress.


# Recalculate Cells (Batch)
Source: https://docs.promptlayer.com/reference/table-sheet-cells-recalculate-batch

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/recalculations
Trigger recalculation for a batch of cells identified by ID. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Trigger recalculation for a batch of cells by ID. Returns `202 Accepted` with an `execution_id`.

Use this endpoint to kick off computation for specific cells after adding rows or updating text cell values.


# Update Cell
Source: https://docs.promptlayer.com/reference/table-sheet-cells-update

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}
Edit the value of a text column cell. Only cells in `text` type columns can be edited directly; computed cells are recalculated automatically. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Edit the value of a text column cell. Only cells in `text` type columns can be edited directly — non-text column cells are computed automatically.

Editing a cell marks downstream dependent cells as `stale`.

Editing a cell creates a new sheet version. The response returns `version`, and cells include `last_computed_version` when they were produced by computation.


# Create Column
Source: https://docs.promptlayer.com/reference/table-sheet-columns-create

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns
Add a new column to a sheet. Non-text columns will generate cells for all existing rows. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Add a new column to a sheet. For non-text columns, cells for all existing rows are created with `stale` status and queued for computation.

Use `dependencies` to declare which other columns this column's config references. PromptLayer enforces a DAG (no cycles allowed) and uses the dependency graph to propagate staleness when upstream cells change.
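
A client can check for cycles before submitting column definitions. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); it is a local pre-check, not PromptLayer's enforcement logic, and the column names in the usage note are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List


def computation_order(dependencies: Dict[str, Iterable[str]]) -> List[str]:
    """Return columns ordered so each comes after the columns it depends on.

    dependencies maps a column to the columns its config references.
    Raises ValueError if the references form a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For example, `computation_order({"answer": ["question"], "score": ["answer"]})` places `question` first and `score` last, while `{"a": ["b"], "b": ["a"]}` raises `ValueError`.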

Creating a column creates a new sheet version. The response returns `version`, the current sheet version count after the column is added.


# List Columns
Source: https://docs.promptlayer.com/reference/table-sheet-columns-list

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns
List columns in a sheet, ordered by position rank. By default, system-managed metadata columns such as price and latency columns are excluded; pass `include_system_columns=true` to include them. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List all columns in a sheet, ordered by their position rank.

Each column has a `type` that determines how its cells are populated:

* `text` — Free-text cells, editable directly.
* `prompt_template`, `llm`, `code`, `score`, `comparison`, `composition` — Computed columns that run automatically.


# Update Column
Source: https://docs.promptlayer.com/reference/table-sheet-columns-update

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns/{column_id}
Update a column's title, config, or dependencies. Returns `requires_recalculation: true` when the change invalidates existing cell values. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Update a column's title, config, or dependencies.

When the change invalidates existing cell values (e.g., the prompt template or code changes), the response includes `requires_recalculation: true` and the list of `affected_column_ids`.

When the update changes column metadata or configuration, PromptLayer creates a new sheet version and returns `version`.


# Import File
Source: https://docs.promptlayer.com/reference/table-sheet-imports-file-create

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/imports/file
Start an asynchronous CSV import into an existing Table sheet. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Start an asynchronous CSV import into an existing sheet.

Send `file_name` ending in `.csv` and `file_content_base64` containing the base64-encoded CSV content.

Pass `operation_id` when you want a stable client-side identifier for polling and webhook correlation. If omitted, PromptLayer generates one.

The response returns an `operation_id` and `status_url` for polling with [Get Sheet Import Operation](/reference/table-sheets-get-operation).
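
The request body described above can be assembled with the standard library. This is a sketch: the helper name `build_file_import_payload`, the sample CSV, and the example `operation_id` value are hypothetical, and sending the request is left out.

```python
import base64
import json


def build_file_import_payload(file_name, csv_text, operation_id=None):
    """Return the JSON-serializable body for a CSV import."""
    if not file_name.endswith(".csv"):
        raise ValueError("file_name must end in .csv")
    payload = {
        "file_name": file_name,
        "file_content_base64": base64.b64encode(csv_text.encode("utf-8")).decode("ascii"),
    }
    if operation_id is not None:
        # Optional: a stable client-side ID for polling and webhook correlation.
        payload["operation_id"] = operation_id
    return payload


payload = build_file_import_payload(
    "questions.csv", "question,answer\n2+2,4\n", operation_id="import-example-1"
)
body = json.dumps(payload)
```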


# Import Request Logs
Source: https://docs.promptlayer.com/reference/table-sheet-imports-request-logs-create

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/imports/request-logs
Start an asynchronous request-history import into an existing Table sheet. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Start an asynchronous request-history import into an existing sheet.

Provide either `request_log_ids` or a request-log `filter_group`. You can also pass `variables_to_parse`, `include_fields`, and `limit` to control imported columns and rows. `request_log_ids` and `limit` support up to 50,000 rows.

Pass `operation_id` when you want a stable client-side identifier for polling and webhook correlation. If omitted, PromptLayer generates one.

The response returns an `operation_id` and `status_url` for polling with [Get Sheet Import Operation](/reference/table-sheets-get-operation).


# Cancel Operation
Source: https://docs.promptlayer.com/reference/table-sheet-operations-cancel

DELETE /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations/{operation_id}
Cancel an active Table sheet recalculation operation. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Cancel an active recalculation operation for a sheet.

If no active execution exists for the provided `operation_id`, the endpoint returns success with zero cancelled cells.


# Create Operation
Source: https://docs.promptlayer.com/reference/table-sheet-operations-create

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations
Queue a recalculation operation for selected columns, rows, and cell statuses. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Queue a `recalculate` operation for selected columns, rows, and cell statuses.

By default, recalculation targets stale computed cells. Pass `column_ids`, `row_ids`, or `statuses` to narrow the operation.

`row_ids` are zero-based row indices. `statuses` accepts `STALE`, `QUEUED`, `DISPATCHED`, `RUNNING`, `COMPLETED`, and `FAILED`; omit it to target stale cells, or pass an empty array to include all statuses. Text columns cannot be recalculated.

If the operation affects many cells, the response may return `requires_confirmation` with a `confirmation_token`; pass that token in a follow-up request to proceed. If no matching cells need work, the endpoint can return success with `cell_count: 0`.

Queued recalculations return `version`, the sheet version count used for the operation response. Computed cells report `last_computed_version` after they complete.
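
The confirmation flow can be sketched as a two-step call. This assumes that `requires_confirmation` is a truthy field in the response and that the token is sent back in a `confirmation_token` body field; neither detail is confirmed here, so check `openapi.json`. The `send` callable and `fake_send` stand-in are hypothetical and replace a real HTTP call.

```python
def queue_recalculation(send, column_ids=None, row_ids=None, statuses=None):
    """Queue a recalculation, confirming once if the API asks for it.

    `send` is a hypothetical callable that POSTs a JSON body to the
    operations endpoint and returns the decoded JSON response.
    """
    payload = {}
    if column_ids is not None:
        payload["column_ids"] = column_ids
    if row_ids is not None:
        payload["row_ids"] = row_ids  # zero-based row indices
    if statuses is not None:
        payload["statuses"] = statuses  # an empty list includes all statuses

    response = send(payload)
    if response.get("requires_confirmation"):
        # Assumption: the token goes back in a `confirmation_token` body field.
        confirmed = dict(payload, confirmation_token=response["confirmation_token"])
        response = send(confirmed)
    return response


# Stand-in for a real HTTP call, used only to show the two-step flow.
sent_bodies = []


def fake_send(body):
    sent_bodies.append(body)
    if "confirmation_token" not in body:
        return {"requires_confirmation": True, "confirmation_token": "example-token"}
    return {"cell_count": 3}


result = queue_recalculation(fake_send, column_ids=["col_a"], statuses=["FAILED"])
```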


# Get Operation
Source: https://docs.promptlayer.com/reference/table-sheet-operations-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations/{operation_id}
Poll a Table sheet recalculation operation by operation ID. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Poll a sheet recalculation operation.

The response includes aggregate cell counts by status plus completed, failed, and pending counts.


# List Operations
Source: https://docs.promptlayer.com/reference/table-sheet-operations-list

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/operations
List active recalculation operations and cell status counts for a Table sheet. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List active recalculation operations and cell status counts for a sheet.

Use this endpoint to determine whether a sheet has queued or running work before mutating columns, rows, cells, or score configuration.


# Add Rows
Source: https://docs.promptlayer.com/reference/table-sheet-rows-add

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/rows
Append one or more rows to a sheet. Text column values can be set immediately; non-text column cells are created with `stale` status and must be triggered via a recalculation. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Append one or more rows (up to 100 at a time) to a sheet.

Text column values can be set immediately via the `values` array. Non-text column cells are created with `stale` status — trigger a [recalculation](/reference/table-sheet-cells-recalculate) to compute them.

Adding rows creates a new sheet version. The response returns `version`, the current sheet version count after the rows are appended.


# List Rows
Source: https://docs.promptlayer.com/reference/table-sheet-rows-list

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/rows
List rows in a sheet, each containing a map of column_id to cell. For prompt-template column cells, each cell can include a `request_metrics` object with price, latency, and token usage when available. Pass `include_columns=true` on the first page to receive column metadata alongside rows. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List rows in a sheet. Each row contains a map of `column_id → cell`.

Pass `include_columns=true` (the default on the first page) to receive column metadata alongside rows — useful for building column headers.

Use cursor-based pagination for large sheets.

The response includes `version`, the current sheet version count for the returned rows.


# Configure Score
Source: https://docs.promptlayer.com/reference/table-sheet-score-configure

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Configure scoring for a Table sheet. This endpoint updates the configuration and returns whether recalculation is required; it does not queue score calculation. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Configure scoring for a sheet.

Use `column_ids` or `column_names` for standard boolean or numeric scoring. Column names must be unique in the sheet.

Use `score_type` with `score_config` for explicit configuration. Supported scoring modes are `auto`, `boolean`, `numeric`, and `custom`; `score_type` is required when you pass `score_config`.

For custom scoring, pass `code` and optionally `code_language` (`PYTHON` by default, or `JAVASCRIPT`). Boolean scoring also supports `true_values`, `false_values`, and `assertion_aggregation` (`all`, `any`, or `mean`).

This endpoint updates the configuration and returns `requires_recalculation`; call the recalculation endpoint to queue score calculation.

Changing score configuration creates a new sheet version. The response returns `version`, the current sheet version count after the configuration update.


# Get Score
Source: https://docs.promptlayer.com/reference/table-sheet-score-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Retrieve the current score payload for a Table sheet. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve the current score payload for a sheet.

When explicit scoring is not configured, PromptLayer returns the default score it can infer from output columns. Configured scoring responses include status and configuration metadata.


# Get Score History
Source: https://docs.promptlayer.com/reference/table-sheet-score-history-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions/score-history
Retrieve score-history points across Table sheet versions. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve score-history points across sheet versions.

Use `range` to limit the history window (`all`, `last_25`, `last_50`, `last_100`, or `last_250`), `resolution` to control sampling (`auto`, `raw`, or `min_max_bucket`), and `max_points` to cap the number of returned points. `max_points` defaults to 1200 and must be between 50 and 5000.
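
A client can validate these parameters before sending a request. The sketch below checks the documented values and builds a query string; the helper name `score_history_query` is hypothetical, and parameters left as `None` are omitted so the server defaults apply.

```python
from urllib.parse import urlencode

RANGES = {"all", "last_25", "last_50", "last_100", "last_250"}
RESOLUTIONS = {"auto", "raw", "min_max_bucket"}


def score_history_query(range_=None, resolution=None, max_points=None):
    """Validate score-history parameters and return an encoded query string."""
    params = {}
    if range_ is not None:
        if range_ not in RANGES:
            raise ValueError(f"unsupported range: {range_}")
        params["range"] = range_
    if resolution is not None:
        if resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {resolution}")
        params["resolution"] = resolution
    if max_points is not None:
        if not 50 <= max_points <= 5000:
            raise ValueError("max_points must be between 50 and 5000")
        params["max_points"] = max_points
    return urlencode(params)


query = score_history_query(range_="last_50", resolution="raw", max_points=200)
# query == "range=last_50&resolution=raw&max_points=200"
```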


# Recalculate Score
Source: https://docs.promptlayer.com/reference/table-sheet-score-recalculate

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/score
Queue recalculation for an existing Table sheet score configuration. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Queue recalculation for an existing sheet score configuration.

This endpoint requires scoring to already be configured on the sheet. The response returns the `score_configuration_id` and score calculation `status` (`queued`, `running`, `completed`, `failed`, or `null`).


# Create Version
Source: https://docs.promptlayer.com/reference/table-sheet-versions-create

POST /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions
Create a named Table sheet version, or restore from an existing version while creating a new version. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Create a named version for a sheet.

Pass `name` to save the current sheet state. `name` is required when `source_version_id` is omitted.

Pass `source_version_id` to restore from an existing version while creating a new version.


# Get Version
Source: https://docs.promptlayer.com/reference/table-sheet-versions-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions/{version_id}
Retrieve one Table sheet version, including its snapshot. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve one sheet version, including its snapshot.

The snapshot contains the versioned sheet structure and row data.


# List Versions
Source: https://docs.promptlayer.com/reference/table-sheet-versions-list

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}/versions
List saved versions for a Table sheet. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List saved versions for a sheet using cursor pagination. Results are sorted by `version_number`; use `order=asc` or `order=desc` to control direction.

The response includes version summaries and related metadata in `data`. Pagination metadata is returned in `pagination`, including `next_cursor`, `has_more`, and `limit`.

Treat `next_cursor` as opaque. It includes the current sort, order, cursor value, and a hash of the active filters. Reuse a cursor only with the same `sort`, `order`, and filter parameters from the request that produced it; changing any of those while passing an old cursor returns an invalid cursor error.
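
The pagination contract above can be followed with a small generator. This sketch assumes the cursor is passed back in a query parameter named `cursor`, which is not stated here; `fetch_page` is a hypothetical callable that performs the GET and returns the decoded JSON, and the `pages` dictionary stands in for real responses.

```python
def iter_versions(fetch_page, order="desc"):
    """Yield every version summary, following `next_cursor` while `has_more` is true."""
    params = {"order": order}
    while True:
        page = fetch_page(dict(params))
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            break
        # Keep `order` unchanged so the cursor stays valid; only the cursor changes.
        params["cursor"] = pagination["next_cursor"]


# Made-up pages keyed by cursor, standing in for real API responses.
pages = {
    None: {
        "data": [{"version_number": 3}, {"version_number": 2}],
        "pagination": {"next_cursor": "abc", "has_more": True, "limit": 2},
    },
    "abc": {
        "data": [{"version_number": 1}],
        "pagination": {"next_cursor": None, "has_more": False, "limit": 2},
    },
}

versions = list(iter_versions(lambda params: pages[params.get("cursor")]))
```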


# Create Sheet
Source: https://docs.promptlayer.com/reference/table-sheets-create

POST /api/public/v2/tables/{table_id}/sheets
Create a new sheet in a table by importing data from a file (CSV or JSON, base64-encoded) or from request log history. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Create a new sheet in a table by importing data. Two source types are supported:

* **file** — Upload a CSV or JSON file (base64-encoded, max 100 MB). The import runs asynchronously; poll the operation endpoint to track progress.
* **request\_logs** — Import from your PromptLayer request history. Filter by prompt, version, label, or date range.


# Get Sheet
Source: https://docs.promptlayer.com/reference/table-sheets-get

GET /api/public/v2/tables/{table_id}/sheets/{sheet_id}
Retrieve a single sheet by ID. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve a single sheet by ID, including its current row count.

The `version_count` field is the current sheet version number.


# Get Sheet Import Operation
Source: https://docs.promptlayer.com/reference/table-sheets-get-operation

GET /api/public/v2/tables/{table_id}/sheets/operations/{operation_id}
Poll the status of an asynchronous sheet import operation. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

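Poll the status of an asynchronous sheet import operation. The `operation_id` is returned when a sheet is created via the [Create Sheet](/reference/table-sheets-create) endpoint, or when an import into an existing sheet is started with [Import File](/reference/table-sheet-imports-file-create) or [Import Request Logs](/reference/table-sheet-imports-request-logs-create).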


# List Sheets
Source: https://docs.promptlayer.com/reference/table-sheets-list

GET /api/public/v2/tables/{table_id}/sheets
List all sheets in a table, ordered by their index. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List all sheets in a table, ordered by their index (display position).


# Update Sheet
Source: https://docs.promptlayer.com/reference/table-sheets-update

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}
Update a sheet's title or display index. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Update a sheet's title or display index (position within the table).

Updating a sheet creates a new sheet version. The response returns `version`, the current sheet version count after the update.


# Create Table
Source: https://docs.promptlayer.com/reference/tables-create

POST /api/public/v2/tables
Create a new Table. A default sheet with one text column is created automatically. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Create a new Table. A default sheet with one text column is created automatically.

The table is created in the workspace associated with your API key.

Tables are versioned and can contain multiple sheets. They can run LLM, code, and comparison columns to generate or evaluate data at scale.


# Get Table
Source: https://docs.promptlayer.com/reference/tables-get

GET /api/public/v2/tables/{table_id}
Retrieve a single Table by ID. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Retrieve a single Table by ID, including its sheet count and per-sheet row counts.


# List Tables
Source: https://docs.promptlayer.com/reference/tables-list

GET /api/public/v2/tables
List Tables in the workspace. Supports cursor-based pagination and optional filtering by folder or prompt column. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

List Tables in the workspace. Supports cursor-based pagination. Filter by folder, title, or by prompt columns that reference specific prompts.

Results are scoped to the workspace associated with your API key.


# Update Table
Source: https://docs.promptlayer.com/reference/tables-update

PATCH /api/public/v2/tables/{table_id}
Update a Table's title or folder. Requests are scoped to the workspace associated with the API key; table, sheet, column, cell, operation, and version IDs must belong to that workspace.

Update a Table's title or folder.


# Get Prompt Template
Source: https://docs.promptlayer.com/reference/templates-get

POST /prompt-templates/{identifier}

Retrieve a prompt template using either the `prompt_name` or `prompt_id`. Optionally, specify `version` (version number) or `label` (release label like "prod") to retrieve a specific version. If not specified, the latest version is returned.

PromptLayer tries to read the model provider from the parameters attached to the prompt template. Pass `provider` to override the one set in the Prompt Registry. The response includes LLM-specific arguments that you can pass directly into your LLM client. To format the template with input variables, use `input_variables`.

<Warning>
  **Provider-Specific Schema Notice**

  The `llm_kwargs` object in the response is provider-specific, and its structure may change without notice as LLM providers update their APIs (for example, a provider changing its system message format from a string to an array).

  For stable, provider-agnostic prompt data, use `prompt_template` instead of `llm_kwargs`. Do not hard-code assumptions about `llm_kwargs` structure in production applications.
</Warning>
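
As a sketch, the request for this endpoint could be built as shown below. The base URL, the `X-API-KEY` header, and the helper name `build_get_template_request` are assumptions for illustration; the body fields are the ones named above. The request is constructed but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL and auth header are not stated on this page.
BASE_URL = "https://api.promptlayer.com"


def build_get_template_request(api_key, identifier, *, version=None, label=None,
                               provider=None, input_variables=None):
    """Build (but do not send) the POST request that retrieves a prompt template."""
    body = {}
    if version is not None:
        body["version"] = version
    if label is not None:
        body["label"] = label
    if provider is not None:
        body["provider"] = provider
    if input_variables is not None:
        body["input_variables"] = input_variables
    return urllib.request.Request(
        f"{BASE_URL}/prompt-templates/{quote(str(identifier), safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )


request = build_get_template_request(
    "pl_example_key", "ai-poet", label="prod", input_variables={"topic": "food"}
)
```

After decoding the response, prefer reading `prompt_template` over `llm_kwargs` when the code must stay stable across providers.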


# Get Prompt Template (Raw)
Source: https://docs.promptlayer.com/reference/templates-get-raw

GET /prompt-templates/{identifier}

Retrieve raw prompt template data without applying input variables. This endpoint is useful for template inspection, local caching, and GitHub sync workflows.

## Behavior Notes

* `resolve_snippets=false` returns raw `@@@snippet@@@` references instead of expanded snippet content.
* `include_llm_kwargs=true` includes provider-specific LLM API arguments for local execution or caching.
* To bypass the default cache, send `Cache-Control: no-cache`.
* The `llm_kwargs` shape is provider-specific and may change as provider APIs change; use `prompt_template` for stable, provider-agnostic data.
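
The options above map onto a request roughly as follows. This is a sketch: the base URL, the `X-API-KEY` header, and the helper name `build_raw_template_request` are assumptions, and options left as `None` are omitted so the server defaults apply.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: base URL and auth header are not stated on this page.
BASE_URL = "https://api.promptlayer.com"


def build_raw_template_request(api_key, identifier, resolve_snippets=None,
                               include_llm_kwargs=None, bypass_cache=False):
    """Build (but do not send) a GET request for raw prompt template data."""
    params = {}
    if resolve_snippets is not None:
        params["resolve_snippets"] = str(resolve_snippets).lower()
    if include_llm_kwargs is not None:
        params["include_llm_kwargs"] = str(include_llm_kwargs).lower()
    url = f"{BASE_URL}/prompt-templates/{quote(str(identifier), safe='')}"
    if params:
        url = f"{url}?{urlencode(params)}"
    headers = {"X-API-KEY": api_key}
    if bypass_cache:
        headers["Cache-Control"] = "no-cache"  # skip the default cache
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_raw_template_request(
    "pl_example_key", "ai-poet",
    resolve_snippets=False, include_llm_kwargs=True, bypass_cache=True,
)
```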

## Related

* [Patch Prompt Template Version](/reference/templates-patch)
* [Snippets](/features/prompt-registry/snippets)
* [Prompt Editor & Versioning](/features/prompt-registry/prompt-editor-versioning)


# List Prompt Template Labels
Source: https://docs.promptlayer.com/reference/templates-labels-get

GET /prompt-templates/{identifier}/labels

Retrieve all the release labels assigned to a prompt template. Identifiers can be either `prompt_name` or `prompt_id`.


# Patch Prompt Template Version
Source: https://docs.promptlayer.com/reference/templates-patch

PATCH /rest/prompt-templates/{identifier}

Partially update a prompt template by creating a new version from an existing base version. Use this endpoint to change selected prompt fields, model parameters, release labels, or version metadata without resending the full template.

## Behavior Notes

* `version` and `label` select the base version to patch; omit both to patch from the latest version.
* Chat template fields such as `messages`, `tools`, and `functions` can be patched by index with an object or fully replaced with an array.
* Completion template `content` follows the same patch-or-replace behavior.
* `model_parameters` are shallow-merged; existing keys not mentioned in the request are preserved.
* `release_labels` are created or moved to the newly created version.
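
To illustrate what a shallow merge means for `model_parameters`, here is a local sketch of the general technique. The parameter names and values are made up, and the function is not PromptLayer's implementation.

```python
def shallow_merge(existing, patch):
    """Return the existing parameters updated with the top-level keys from patch."""
    return {**existing, **patch}


base = {
    "temperature": 0.2,
    "max_tokens": 256,
    "response_format": {"type": "json_object", "strict": True},
}
patch = {"temperature": 0.7, "response_format": {"type": "text"}}

merged = shallow_merge(base, patch)
# temperature is overwritten, max_tokens is preserved, and the nested
# response_format dict is replaced as a whole rather than merged key by key.
```

Under a shallow merge, a nested object in the request replaces the stored nested object, so resend every nested key you want to keep.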

## Related

* [Get Prompt Template Raw](/reference/templates-get-raw)
* [Prompt Editor & Versioning](/features/prompt-registry/prompt-editor-versioning)
* [Release Labels](/features/prompt-registry/release-labels)


# Publish Prompt Template
Source: https://docs.promptlayer.com/reference/templates-publish

POST /rest/prompt-templates

Create a new version of a prompt template programmatically and make it available for use in the application.


# Create Tool Registry
Source: https://docs.promptlayer.com/reference/tool-registry-create

POST /api/public/v2/tool-registry

Create a new tool in the Tool Registry with an initial version. The tool definition should be in OpenAI function-calling format.

Tool names must be unique within a workspace. The initial version is created with version number 1.

Pass an optional `execution` payload to attach a sandbox-executable body. PromptLayer will run it between LLM turns whenever a prompt references the tool. See [Auto Tool Execution](/features/tool-registry/auto-execution) for the body model and conventions.


# Create Tool Version
Source: https://docs.promptlayer.com/reference/tool-registry-create-version

POST /api/public/v2/tool-registry/{identifier}/versions

Create a new immutable version of an existing tool. Use this to update the schema or to attach an [auto-execution](/features/tool-registry/auto-execution) body.

The new version is assigned the next sequential version number. Pass a `commit_message` to leave an audit-trail note.

If you include `execution`, PromptLayer will run the body in a sandbox whenever a prompt referencing this version is invoked. The `code` field is the function **body only**. The signature is generated server-side.


# Get Tool Registry
Source: https://docs.promptlayer.com/reference/tool-registry-get

GET /api/public/v2/tool-registry/{identifier}

Retrieve a tool from the Tool Registry by ID or name. Optionally resolve a specific version using the `label` or `version` query parameter. If neither is specified, the latest version is returned.

The `identifier` can be either:

* A numeric tool ID (e.g., `123`)
* A tool name (e.g., `get_weather`)

The resolved version includes the `tool_definition`. If the version was saved with an [auto-execution](/features/tool-registry/auto-execution) body, it also includes an `execution: { type, language, code }` object.


# List Tool Registries
Source: https://docs.promptlayer.com/reference/tool-registry-list

GET /api/public/v2/tool-registry

List all tools in the Tool Registry for the workspace. Returns tool names, IDs, and metadata ordered by most recently updated.


# Test Execute Tool
Source: https://docs.promptlayer.com/reference/tool-registry-test-execute

POST /api/public/v2/tool-registry/{identifier}/test-execute

Run a tool's [execution body](/features/tool-registry/auto-execution) in the sandbox against test inputs. The editor's **Test Run** button calls this endpoint. Use it to verify that a body works as expected before promoting the version.

## Resolving the version

Without `label` or `version` query params, the latest version is used. Pass `label=production` to test the production version, or `version=3` to pin to a specific number.

## In-flight overrides

Both `execution` and `tool_definition` in the body are **optional overrides**. If you supply them, they take precedence over what's stored on the version. This lets you test changes without saving a new version first.

## Result shape

The endpoint returns `200` for both successful runs and user-code errors. Inspect `result.status`:

* `"success"` → your body returned a value, available at `result.result`
* `"error"` → your body raised, with details under `result.error.message`

Sandbox infrastructure failures (sandbox unreachable, internal errors) return `502`. These are operational issues, not your code's fault.
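
Because user-code errors also return `200`, client code has to branch on `result.status` rather than on the HTTP status alone. A minimal sketch of that branching follows; the function name `interpret_test_run` and the returned tuple shape are hypothetical.

```python
def interpret_test_run(status_code, body):
    """Classify a test-execute response.

    Returns ("success", value), ("user_error", message), or ("infra_error", None).
    """
    if status_code == 502:
        return ("infra_error", None)  # sandbox problem, not the tool body
    if status_code != 200:
        raise ValueError(f"unexpected status code: {status_code}")
    result = body["result"]
    if result["status"] == "success":
        return ("success", result["result"])
    if result["status"] == "error":
        return ("user_error", result["error"]["message"])
    raise ValueError(f"unexpected result status: {result['status']}")


outcome = interpret_test_run(200, {"result": {"status": "success", "result": 42}})
```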


# Track Metadata
Source: https://docs.promptlayer.com/reference/track-metadata

POST /rest/track-metadata

Associate metadata with an existing request log. Use this for values such as session IDs, user IDs, environment names, regions, or other searchable request context.

## Related

* [Track Prompt](/reference/track-prompt)
* [Metadata](/features/prompt-history/metadata)
* [Log Request](/reference/log-request)


# Track Prompt
Source: https://docs.promptlayer.com/reference/track-prompt

POST /rest/track-prompt

Associate a prompt template with an existing request log. Use this after logging a request when you want PromptLayer to connect that request back to a prompt template version or release label.

## Related

* [Track Metadata](/reference/track-metadata)
* [Tracking Templates](/features/prompt-history/tracking-templates)
* [Log Request](/reference/log-request)


# Track Score
Source: https://docs.promptlayer.com/reference/track-score

POST /rest/track-score

Associate a score from 0 to 100 with a request.


# Update Folder
Source: https://docs.promptlayer.com/reference/update-folder

PATCH /api/public/v2/folders/{folder_id}

Rename an existing folder. The new name must be unique within the folder's parent (or at root level). The folder must belong to a workspace accessible by the authenticated user.


# Configure Custom Scoring
Source: https://docs.promptlayer.com/reference/update-report-score-card

PATCH /reports/{report_id}/score-card

<Warning>
  Legacy Dataset, Evaluation, and Report endpoints are deprecated for new workflows. Use the [Tables API](/reference/tables-create) for new dataset import, evaluation, scoring, recalculation, and reporting workflows.
</Warning>

Configure the score card for an evaluation pipeline. Use this endpoint to choose the columns that contribute to the score and, optionally, provide custom Python or JavaScript scoring code.

## Behavior Notes

* Custom code receives a `data` variable containing evaluation rows and must return an object with at least a `score` key from 0 to 100.
* If no custom code is provided, PromptLayer uses default scoring over the selected columns.
* Updating a blueprint affects future runs; completed runs recalculate their score after the score card changes.

## Related

* [Create Evaluation Pipeline](/reference/create-reports)
* [Score Card](/features/evaluations/score-card)
* [Node & Column Types](/features/evaluations/column-types)


# Update Skill Collection
Source: https://docs.promptlayer.com/reference/update-skill-collection

PATCH /api/public/v2/skill-collections/{identifier}

Rename an existing skill collection by identifier.

This endpoint updates collection metadata only. To create a new saved version of the collection contents, use [Save Skill Collection Version](/reference/save-skill-collection-version).


# Update Table
Source: https://docs.promptlayer.com/reference/update-table

PATCH /api/public/v2/tables/{table_id}
Update a Table title or folder.

Update a Table.

Use `title` to rename the Table. Use `folder_id` to move it into a registry folder, or pass `folder_id: null` to remove it from a folder.


# Update Table Sheet
Source: https://docs.promptlayer.com/reference/update-table-sheet

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}
Update a Table sheet title or index.

Update a sheet in a Table.

Use `title` to rename the sheet and `index` to change its zero-based position in the Table. Updating a sheet creates a new sheet version.


# Update Table Sheet Cell
Source: https://docs.promptlayer.com/reference/update-table-sheet-cell

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/cells/{cell_id}
Update a Table sheet cell.

Update a text-column cell with `display_value`, `value`, or both.

Only text-column cells can be edited directly. Updating a cell marks dependent cells stale and updates the sheet version.

The response returns the updated `cell`, the new sheet `version`, and `stale_count`, which is the number of dependent cells marked stale by the update.


# Update Table Sheet Column
Source: https://docs.promptlayer.com/reference/update-table-sheet-column

PATCH /api/public/v2/tables/{table_id}/sheets/{sheet_id}/columns/{column_id}
Update a Table sheet column.

Update a column title, config, or dependencies.

Config or dependency changes may require recalculation. The response includes `requires_recalculation` and `affected_column_ids` so you can decide whether to run a recalculation operation.


# Get Workflow Version Execution Results
Source: https://docs.promptlayer.com/reference/workflow-version-execution-results

GET /workflow-version-execution-results

Retrieve the execution results of a specific Workflow version. You can include all output nodes by setting the `return_all_outputs` query parameter to `true`.

| Status  | Meaning                                                                                               |
| ------- | ----------------------------------------------------------------------------------------------------- |
| **200** | Execution is **complete**. All nodes have reached a final status (`SUCCESS`, `FAILED`, or `SKIPPED`). |
| **202** | Execution is **still running**. At least one node is in a non-final status (`QUEUED` or `RUNNING`).   |

The response body schema is the same for both status codes. Poll until you receive a `200`.

<ResponseExample>
  ```json 200 (all) theme={null}
  {
    "Node 1": {
      "status": "SUCCESS",
      "value": "First node result",
      "error_message": null,
      "raw_error_message": null,
      "is_output_node": false
    },
    "Node 2": {
      "status": "SUCCESS",
      "value": "Final result",
      "error_message": null,
      "raw_error_message": null,
      "is_output_node": true
    }
  }
  ```

  ```json 200 (output_node) theme={null}
  "Final result"
  ```

  ```json 202 theme={null}
  {
    "Node 1": {
      "status": "SUCCESS",
      "value": "First node result",
      "error_message": null,
      "raw_error_message": null,
      "is_output_node": false
    },
    "Node 2": {
      "status": "RUNNING",
      "value": null,
      "error_message": null,
      "raw_error_message": null,
      "is_output_node": true
    }
  }
  ```
</ResponseExample>
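
The polling rule above can be sketched as a loop that retries on `202` and returns on `200`. The `fetch` callable is hypothetical and stands in for the real GET request; it returns a `(status_code, decoded_body)` pair. The interval and attempt limit are arbitrary illustrative values.

```python
import time


def wait_for_results(fetch, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the endpoint answers 200, then return the response body."""
    for attempt in range(max_attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body  # every node has reached a final status
        if status_code != 202:
            raise RuntimeError(f"unexpected status code: {status_code}")
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("execution did not finish within the polling budget")


# Canned responses standing in for the real endpoint.
responses = iter([
    (202, {"Node 2": {"status": "RUNNING", "value": None}}),
    (200, "Final result"),
])
final = wait_for_results(lambda: next(responses), sleep=lambda seconds: None)
```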


# Prompt Blueprints
Source: https://docs.promptlayer.com/running-requests/prompt-blueprints


Prompt Blueprints are a core concept in PromptLayer. They provide a standardized, model-agnostic representation of prompts and serve as an abstraction layer that:

* Creates a unified format that works across all LLM providers (OpenAI, Anthropic, etc.)
* Enables seamless switching between different models without code changes
* Standardizes how prompts, responses, and tool calls are structured and stored
* Ensures consistent handling of various content types (text, images, function calls)

Think of Prompt Blueprints as a universal language for LLM interactions that shields your application from provider-specific implementation details.

## Accessing the Prompt Blueprint

Instead of accessing the raw LLM response via `response["raw_response"]`, use the standardized `response["prompt_blueprint"]`. Its structure is the same across providers.

<Warning>
  **Provider-Specific Schema Notice**

  The `raw_response` object structure is provider-specific and may change without notice as LLM providers update their APIs. PromptLayer passes through the native format from each provider. For stable, provider-agnostic prompt data, always use `prompt_blueprint` instead.
</Warning>

```python theme={null}
response = promptlayer_client.run(
    prompt_name="ai-poet",
    input_variables={'topic': 'food'},
)

print(response["prompt_blueprint"]["prompt_template"]["messages"][-1]["content"][-1]["text"])
```

With this approach, you can switch from one provider to another (e.g., OpenAI to Anthropic) without any code changes.
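
The one-line lookup above raises an error if the last message has no `content` (for example, when it carries only tool calls). A more defensive version might look like the sketch below. The helper name `last_message_text` and the sample blueprint are made up for illustration; only the nesting of `prompt_template`, `messages`, `content`, and `text` comes from this page.

```python
from typing import Optional


def last_message_text(prompt_blueprint: dict) -> Optional[str]:
    """Return the text of the last text item in the last message, or None."""
    messages = prompt_blueprint.get("prompt_template", {}).get("messages", [])
    if not messages:
        return None
    content = messages[-1].get("content") or []
    for item in reversed(content):
        if item.get("type") == "text":
            return item.get("text")
    return None


# Hand-written blueprint fragment for illustration (not real output).
sample_blueprint = {
    "prompt_template": {
        "type": "chat",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Write a poem."}]},
            {"role": "assistant", "content": [{"type": "text", "text": "Roses are red."}]},
        ],
    }
}

print(last_message_text(sample_blueprint))
```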

## Streaming Support

PromptLayer supports streaming responses with `prompt_blueprint` integration. When streaming is enabled, each chunk includes both the raw streaming response and the progressively built `prompt_blueprint`, so you can track how the response is constructed in real time.

### OpenAI Streaming Example

```python theme={null}
import promptlayer

# Initialize PromptLayer client
promptlayer_client = promptlayer.PromptLayer()

# Run with streaming enabled
response_stream = promptlayer_client.run(
    prompt_name="ai-poet",
    input_variables={'topic': 'food'},
    stream=True
)

# Process streaming chunks
for chunk in response_stream:
    # Access the raw streaming response
    raw_chunk = chunk["raw_response"]
    
    # Access the progressively built prompt blueprint
    prompt_blueprint = chunk["prompt_blueprint"]
    
    if raw_chunk.choices and raw_chunk.choices[0].delta.content:
        print(f"Streaming content: {raw_chunk.choices[0].delta.content}")
    
    # The prompt_blueprint shows the current state of the response
    if prompt_blueprint and prompt_blueprint["prompt_template"]["messages"]:
        current_response = prompt_blueprint["prompt_template"]["messages"][-1]
        if current_response.get("content"):
            print(f"Current response: {current_response['content']}")
```

### Anthropic Streaming Example

```python theme={null}
import promptlayer

# Initialize PromptLayer client
promptlayer_client = promptlayer.PromptLayer()

# Run with streaming enabled for Anthropic
response_stream = promptlayer_client.run(
    prompt_name="helpful-assistant",
    input_variables={'task': 'prompt engineering tip'},
    stream=True
)

# Process streaming chunks
for chunk in response_stream:
    # Access the raw streaming response
    raw_chunk = chunk["raw_response"]
    
    # Access the progressively built prompt blueprint
    prompt_blueprint = chunk["prompt_blueprint"]
    
    # Handle different Anthropic streaming event types
    if raw_chunk.get("type") == "content_block_delta":
        delta = raw_chunk.get("delta", {})
        if delta.get("type") == "text_delta":
            print(f"Streaming content: {delta.get('text', '')}")
    
    # The prompt_blueprint shows the current state of the response
    if prompt_blueprint and prompt_blueprint["prompt_template"]["messages"]:
        current_response = prompt_blueprint["prompt_template"]["messages"][-1]
        if current_response.get("content") and len(current_response["content"]) > 0:
            # Get the text content from the current response
            text_content = current_response["content"][0].get("text", "")
            if text_content:
                print(f"Current response: {text_content}")
```

### Key Features of Streaming with Prompt Blueprint

* **Progressive Building**: Each streaming chunk includes the current state of the `prompt_blueprint`, showing how the response is built incrementally
* **Real-time Access**: You can access both the raw streaming data and the structured prompt blueprint format simultaneously
* **Consistent Format**: The `prompt_blueprint` maintains the same standardized format across all streaming chunks
* **Final State**: The last chunk contains the complete `prompt_blueprint` with the full response

### Streaming Response Structure

Each streaming chunk contains:

```python theme={null}
{
    "request_id": None,  # Present only in the final chunk
    "raw_response": ChatCompletionChunk(...),  # Raw streaming response from provider
    "prompt_blueprint": {
        "prompt_template": {
            "type": "chat",
            "messages": [
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": "Current response text..."}
                    ],
                    "input_variables": [],
                    "template_format": "f-string"
                }
            ],
            "input_variables": []
        },
        "metadata": {
            "model": {
                "provider": "openai",
                "name": "gpt-4o",
                "parameters": {
                    "temperature": 1,
                    "stream": True,
                    # ... other parameters
                }
            }
        }
    }
}
```

The `request_id` is only included in the final chunk, indicating the completion of the streaming response.
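
Because only the final chunk carries a `request_id`, a common pattern is to drain the stream and keep the last blueprint seen. The sketch below assumes chunks are dict-like with the keys shown above; `consume_stream` is a hypothetical helper and works on any iterable of such chunks.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def consume_stream(
    chunks: Iterable[Dict[str, Any]],
) -> Tuple[Optional[Any], Optional[dict]]:
    """Drain a stream and return (request_id, final prompt_blueprint)."""
    request_id = None
    final_blueprint = None
    for chunk in chunks:
        # Each chunk carries the blueprint built so far; keep the latest one.
        if chunk.get("prompt_blueprint"):
            final_blueprint = chunk["prompt_blueprint"]
        # request_id is only set on the final chunk.
        if chunk.get("request_id") is not None:
            request_id = chunk["request_id"]
    return request_id, final_blueprint
```

Passing the iterator returned by `run(..., stream=True)` to this helper would yield the completed blueprint once the stream ends, under the assumption that the chunk shape matches the structure above.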

## Placeholder Messages

Placeholder Messages let you inject messages into a prompt template at runtime. A message with the `placeholder` role marks a slot in the template that is replaced with full messages when the prompt is executed.

For details on creating and using Placeholder Messages, see the [Placeholder Messages documentation](/features/prompt-registry/placeholder-messages).

### Running a Template with Placeholders

When running a prompt that includes placeholders, you need to supply the messages that will replace the placeholders in the input variables.

```python theme={null}
response = promptlayer_client.run(
    prompt_name="template-name",
    input_variables={
        "fill_in_message": [
            {
                "role": "user",
                "content": [{"type": "text", "text": "My age is 29"}],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": "What a wonderful age!"}],
            }
        ]
    },
)
```

**Note**: The messages provided must conform to the Prompt Blueprint format.
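
If your application stores conversation history as plain role and text pairs, a small converter can produce messages in the required shape. This is a sketch under that assumption: `to_blueprint_messages` is a hypothetical helper, and `fill_in_message` is the placeholder name from the example above.

```python
from typing import Dict, Iterable, List, Tuple


def to_blueprint_messages(history: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Convert (role, text) pairs into Prompt Blueprint style messages."""
    return [
        {"role": role, "content": [{"type": "text", "text": text}]}
        for role, text in history
    ]


history = [("user", "My age is 29"), ("assistant", "What a wonderful age!")]
# "fill_in_message" is the placeholder name used in the example above.
input_variables = {"fill_in_message": to_blueprint_messages(history)}
print(input_variables)
```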

## Prompt Blueprint Message Format

Each message in a Prompt Blueprint should be a dictionary with the following structure:

* **`role`**: The role of the message sender (`user`, `assistant`, `system`, `tool`, `function`, `placeholder`, `developer`).
* **`content`**: A list of content items, where each item has a **`type`** field that determines the content structure.

### Supported Content Types

| Type                                     | Description                                                           |
| ---------------------------------------- | --------------------------------------------------------------------- |
| `text`                                   | Text content with a `text` field                                      |
| `thinking`                               | Model reasoning with `thinking` and optional `signature` fields       |
| `code`                                   | Code block with a `code` field (from code execution tools)            |
| `image_url`                              | Image content with an `image_url` object containing a `url`           |
| `media`                                  | Media content (images, PDFs) with a `media` object                    |
| `media_variable`                         | Dynamic media variable with a `name` field                            |
| `output_media`                           | LLM-generated media (e.g. images) with a `url` and `mime_type`        |
| `server_tool_use`                        | Server-side tool invocation with `id`, `name`, and `input`            |
| `web_search_tool_result`                 | Web search results with citations                                     |
| `code_execution_result`                  | Code execution output with `output` and `outcome` fields              |
| `mcp_list_tools`                         | MCP server tool listing with `server_label` and `tools`               |
| `mcp_call`                               | MCP tool call with `name`, `server_label`, and `arguments`            |
| `mcp_approval_request`                   | MCP tool approval request                                             |
| `mcp_approval_response`                  | MCP tool approval response                                            |
| `bash_code_execution_tool_result`        | Bash tool execution result with `tool_use_id` and `content`           |
| `text_editor_code_execution_tool_result` | Text editor tool result with `tool_use_id` and `content`              |
| `shell_call`                             | Shell tool call with `action` containing commands                     |
| `shell_call_output`                      | Shell tool output with execution results                              |
| `apply_patch_call`                       | Apply patch tool call with `operation` (create, update, delete files) |
| `apply_patch_call_output`                | Apply patch tool output                                               |

### Example Message

```python theme={null}
{
    "role": "user",
    "content": [{"type": "text", "text": "Hello, how are you?"}],
}
```

### Example Message with Thinking support

```python theme={null}
{
    "role": "user",
    "content": [
        {"signature": "xxxxxx-xxxxx-xxxx-xxxx", "type": "thinking", "thinking": "User is greeting and asking for my wellbeing."},
        {"type": "text", "text": "Hello, how are you?"},
    ]
}
```

### Example Assistant Message with Generated Image

When a model generates an image (via OpenAI Images API, Responses API `image_generation` tool, or Google Gemini image models), the response includes `output_media` content blocks:

```python theme={null}
{
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Here is the image you requested:"},
        {
            "type": "output_media",
            "id": "ig_abc123",
            "url": "https://...",
            "mime_type": "image/png",
            "media_type": "image",
            "provider_metadata": {
                "revised_prompt": "A photorealistic sunset over mountains...",
                "size": "1024x1024",
                "quality": "high"
            }
        }
    ]
}
```

The `provider_metadata` field varies by provider:

* **OpenAI**: `revised_prompt`, `size`, `quality`, `background`, `output_format`
* **Google Gemini**: `aspect_ratio`, `image_size`

See the [Image Generation guide](/features/image-generation) for full details on setup and usage.
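
To pull generated media out of an assistant message, you can filter its content items by type. A minimal sketch, assuming the message shape shown above; `generated_media` is a hypothetical helper and the URL is a placeholder.

```python
def generated_media(message: dict) -> list:
    """Return (url, mime_type) pairs for output_media items in a message."""
    return [
        (item.get("url"), item.get("mime_type"))
        for item in message.get("content") or []
        if item.get("type") == "output_media"
    ]


# Hand-written message for illustration; the URL is a placeholder.
assistant_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Here is the image you requested:"},
        {
            "type": "output_media",
            "id": "ig_abc123",
            "url": "https://example.com/generated.png",
            "mime_type": "image/png",
            "media_type": "image",
        },
    ],
}

print(generated_media(assistant_message))
```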

## Tools and Function Calling

The Prompt Blueprint supports tool and function calling capabilities. This section demonstrates how to define available tools, handle assistant tool calls, and provide tool responses.

### Defining Available Tools

When creating a prompt template, you can specify available tools under the `tools` field. Each tool definition follows this structure:

```python theme={null}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }
]


prompt_template = {
    "type": "chat",
    "messages": messages,
    "tools": tools
}
```

The `parameters` field specifies the input parameters the function expects, and the LLM provider uses it to generate the tool call. Define `parameters` in [JSON Schema](https://json-schema.org) format. For provider-specific details, see the [OpenAI function calling guide](https://platform.openai.com/docs/guides/function-calling) and the [Anthropic tool use guide](https://docs.anthropic.com/en/docs/build-with-claude/tool-use).

### Built-in Tools

In addition to custom function tools, the Prompt Blueprint supports **built-in tools** provided by LLM providers. These are pre-defined tools like web search, file search, code interpreter, and image generation.

```python theme={null}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}}
            }
        }
    },
    {
        "type": "image_generation",
        "id": "builtin_image_gen",
        "name": "Image Generation",
        "description": "Generate images from text prompts",
        "provider": "openai",
        "config": {"type": "image_generation"}
    }
]
```

The `type` field distinguishes between custom function tools (`"function"`) and built-in tools (`"web_search"`, `"file_search"`, `"code_interpreter"`, `"image_generation"`, `"google_maps"`). Built-in tools are configured in the Prompt Registry UI — see [Tool Calling](/features/prompt-registry/tool-calling) for details.
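
Since the `type` field is the only discriminator, separating custom tools from built-in ones takes a single pass over the list. The sketch below uses a hypothetical `split_tools` helper on a shortened version of the list above.

```python
def split_tools(tools: list) -> tuple:
    """Split a tools list into (custom function tools, built-in tools)."""
    custom = [tool for tool in tools if tool.get("type") == "function"]
    builtin = [tool for tool in tools if tool.get("type") != "function"]
    return custom, builtin


tools = [
    {"type": "function", "function": {"name": "get_weather"}},
    {"type": "image_generation", "id": "builtin_image_gen", "provider": "openai"},
]

custom_tools, builtin_tools = split_tools(tools)
print([tool["function"]["name"] for tool in custom_tools])
print([tool["type"] for tool in builtin_tools])
```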

### Tool Variables

You can also use **tool variables** to inject tools dynamically at runtime, rather than defining them statically. Tool variables act as placeholders in the `tools` array that get replaced with actual tool definitions from `input_variables`.

For details and examples, see [Tool Calling - Tool Variables](/features/prompt-registry/tool-calling#tool-variables).

### Assistant Tool Calls

When the assistant decides to use a tool, the response will include a `tool_calls` field in the message. The format is:

```python theme={null}
{
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Paris\"}"
            }
        }
    ]
}
```

* `id` is used by the assistant to track the tool call.
* `type` is always `function`.
* `function` contains the function details
  * `name` tells us which function to call
  * `arguments` is a JSON string containing the function's input parameters.

For more information about how PromptLayer structures tool calls, refer to the schema definition toward the end of this page.

### Providing Tool Responses

After executing the requested function, you can provide the result back to the assistant using a "tool" role message. The response should be structured JSON data:

```python theme={null}
{
    "role": "tool",
    "content": [
        {
            "type": "text",
            "text": "{\"temperature\": 72, \"conditions\": \"sunny\", \"humidity\": 45}"
        }
    ],
    "tool_call_id": "call_abc123"
}
```
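
The two message shapes above can be connected with a small dispatcher that parses `arguments`, calls a local function, and wraps the result in a `tool` message. This is a sketch only: `get_weather` is a stand-in implementation, and `TOOL_REGISTRY` and `run_tool_call` are hypothetical names.

```python
import json


def get_weather(location: str) -> dict:
    """Stand-in implementation; a real one would call a weather service."""
    return {"location": location, "temperature": 72, "conditions": "sunny"}


# Maps tool names to local Python callables (hypothetical registry).
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and return a "tool" role message for it."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"])  # arguments is a JSON string
    result = TOOL_REGISTRY[function["name"]](**arguments)
    return {
        "role": "tool",
        "content": [{"type": "text", "text": json.dumps(result)}],
        "tool_call_id": tool_call["id"],
    }


tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
}
tool_message = run_tool_call(tool_call)
print(tool_message)
```

Reusing the call's `id` as `tool_call_id` is what ties the response back to the request that triggered it.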

Here is an example of how to log a request with tool calls and responses using OpenAI:

```python theme={null}
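import time

from openai import OpenAI
from promptlayer import PromptLayer

client = OpenAI()
# PromptLayer client; the name matches the `promptlayer.log_request(...)` call below
promptlayer = PromptLayer()
model = "gpt-4o"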
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        },
    }
]
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's the weather like in Paris today?"}
        ],
    }
]
prompt_template = {
    "type": "chat",
    "messages": messages,
    "tools": tools,
}

request_start_time = time.time()
completion = client.chat.completions.create(
    model=model,
    messages=prompt_template["messages"],
    tools=prompt_template["tools"],
)
request_end_time = time.time()
print(completion.choices[0].message.tool_calls)

promptlayer.log_request(
    provider="openai",
    model=model,
    input=prompt_template,
    output={
        "type": "chat",
        "messages": [
            {
                "role": "assistant",
                "tool_calls": [
                    tool_call.model_dump()
                    for tool_call in completion.choices[0].message.tool_calls
                ],
            }
        ],
    },
    input_tokens=completion.usage.prompt_tokens,
    output_tokens=completion.usage.completion_tokens,
    request_start_time=request_start_time,
    request_end_time=request_end_time,
)
```

## Multi-Modal Variables

PromptLayer supports any number of modalities in a single prompt. You can include text, images, videos, and other media types in your prompt templates.

The `media_variable` content type lets you dynamically insert a list of media items into prompt template messages.

The `media_variable` item is nested within the message content. Both `type` and `name` are required: `type` identifies the content type, and `name` is the name of the input variable that holds the list of media items to insert.

```json theme={null}
{
    "role": "user",
    "content": [
        {
            "type": "media_variable",
            "name": "media"
        }
    ]
}
```

When defining a prompt template, you can specify a `media_variable` to dynamically include media in your messages.

#### Running with Media Variables

```python theme={null}
response = promptlayer_client.run(
    prompt_name="image-prompt",
    input_variables={
        "media": [
            "https://example.com/image1.jpg",
            "https://example.com/image2.jpg"
        ]
    },
)

print(response)
```

<Note>`media` is a list of strings. Each string can be either a public URL or a base64 string.</Note>
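
For local files, the base64 form can be produced with the standard library. The sketch below assumes a plain base64 string is accepted; this page does not say whether a data-URL prefix is expected, so verify the format before relying on it. `encode_media` is a hypothetical helper.

```python
import base64


def encode_media(data: bytes) -> str:
    """Encode raw file bytes as a base64 string."""
    return base64.b64encode(data).decode("ascii")


# In practice the bytes would come from a file, e.g. Path("photo.jpg").read_bytes().
image_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a complete image
input_variables = {
    "media": [
        "https://example.com/image1.jpg",  # public URL
        encode_media(image_bytes),  # base64 string
    ]
}
print(input_variables["media"])
```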

## Structured Outputs

Prompt Blueprints can be configured to produce structured outputs that follow a specific format defined by JSON Schema. This ensures consistent response formats that are easier to parse and integrate with your applications.

For detailed information on creating and using structured outputs with your prompt templates, see our [Structured Outputs documentation](/features/prompt-registry/structured-outputs).

## Prompt Blueprint Schema


# Traces
Source: https://docs.promptlayer.com/running-requests/traces


Traces let you monitor and analyze the execution flow of your applications, including LLM requests. Built on OpenTelemetry, traces capture function calls along with their durations, inputs, and outputs.

<Note>
  This page covers tracing with the **PromptLayer SDK** (`@traceable`, `wrapWithSpan`). If you want to send traces from any OpenTelemetry SDK or Collector without using the PromptLayer SDK, see the [OpenTelemetry](/features/opentelemetry) page. For framework-specific integrations (Vercel AI SDK, OpenAI Agents, Claude Code), see [Telemetry Integrations](/features/integrations).
</Note>

## Overview

Traces in PromptLayer offer a comprehensive view of your application's performance and behavior. They allow you to:

* Visualize the execution flow of your functions
* Track LLM requests and their associated metadata
* Measure function durations and identify performance bottlenecks
* Inspect function inputs and outputs for debugging

**Note:** The left menu in the PromptLayer UI only shows root spans, which represent the entry function of your program. While your program is running, you might not see all spans in the UI immediately, even though child spans are being sent to the backend. The root span, along with all its child spans, will only appear in the UI once the program completes. This behavior is particularly noticeable in long-running programs or those with complex execution flows.

<img alt="Traces Overview" />

## Automatic LLM Request Tracing

When you initialize the PromptLayer class with `enable_tracing` set to `True`, PromptLayer will automatically track any LLM calls made using the PromptLayer library. This allows you to capture detailed information about your LLM requests, including:

* Model used
* Input prompts
* Generated responses
* Request duration
* Associated metadata

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  # Initialize PromptLayer with tracing enabled
  pl_client = PromptLayer(enable_tracing=True)
  ```

  ```javascript JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  // Initialize PromptLayer with tracing enabled
  const promptlayer = new PromptLayer({
      apiKey: process.env.PROMPTLAYER_API_KEY,
      enableTracing: true,
  });
  ```
</CodeGroup>

Once PromptLayer is initialized with tracing enabled, you can use the `run()` method to execute prompts. All LLM calls made through this method will be automatically traced, providing detailed insights into your prompt executions.

<CodeGroup>
  ```python Python theme={null}
  response = pl_client.run(
      prompt_name="simple-greeting",
      input_variables={
          "name": "Alice"
      },
      metadata={
          "user_id": "12345"
      }
  )

  print(response)
  ```

  ```javascript JavaScript theme={null}
  async function runPrompt() {
      try {
          const response = await promptlayer.run({
              promptName: "simple-greeting",
              inputVariables: {
                  name: "Alice"
              },
              metadata: {
                  user_id: "12345"
              }
          });

          console.log(response);
      } catch (error) {
          console.error("Error running prompt:", error);
      }
  }

  runPrompt();
  ```
</CodeGroup>

## Custom Function Tracing

In addition to automatic LLM request tracing, you can also use the `traceable` decorator (for Python) or `wrapWithSpan` (for JavaScript) to explicitly track span data for additional functions. This allows you to gather detailed information about function executions.

<CodeGroup>
  ```python Python theme={null}
  # Use the @pl_client.traceable() decorator to trace a function
  @pl_client.traceable()
  def greet(name):
      return f"Hello, {name}!"

  # Use the decorator with custom attributes
  @pl_client.traceable(attributes={"function_type": "math"})
  def calculate_sum(a, b):
      return a + b

  result1 = greet("Alice")
  print(result1)

  result2 = calculate_sum(5, 3)
  print(result2)
  ```

  ```javascript JavaScript theme={null}
  // Define and wrap a function with PromptLayer tracing
  const greet = promptlayer.wrapWithSpan('greet', (name) => {
      return `Hello, ${name}!`;
  });

  const result = greet("Alice");
  console.log(result);
  ```
</CodeGroup>

## Setting Custom Span Names

When tracing functions, you may want to set custom names for your spans to make them more descriptive. Both Python and JavaScript implementations of PromptLayer allow you to set custom span names.

### Python

In Python, you can set a custom span name by passing the `name` parameter to the `traceable` decorator:

```python theme={null}
@pl_client.traceable(name="CustomGreeting")
def greet(name):
    return f"Hello, {name}!"

result = greet("Alice")
print(result)
```

If you don't provide a name parameter, the span will use the function's name by default.

### JavaScript

In JavaScript, you can set a custom span name by passing it as the first argument to the `wrapWithSpan` function:

```javascript JavaScript theme={null}
const greet = promptlayer.wrapWithSpan('CustomGreeting', (name) => {
    return `Hello, ${name}!`;
});

const result = greet("Alice");
console.log(result);
```

If you want to use the function's name as the span name, you can simply pass the function name as a string:

```javascript JavaScript theme={null}
const greet = promptlayer.wrapWithSpan('greet', (name) => {
    return `Hello, ${name}!`;
});
```

## Creating Parent Spans and Grouping Function Calls

To create a parent span and group multiple function calls within it, use the `traceable` decorator on a main function that calls other traced functions. Here's an example that demonstrates this concept:

```python theme={null}
from promptlayer import PromptLayer

# Initialize PromptLayer with tracing enabled
pl_client = PromptLayer(enable_tracing=True)

@pl_client.traceable(name="custom-span")
def main():
    # This function will be the parent span
    openai_call()
    anthropic_call()
    custom_function()
    run_prompt()

@pl_client.traceable()
def openai_call():
    response = pl_client.run(
        prompt_name="simple-greeting",
        input_variables={}
    )
    print("OpenAI response:", response["prompt_blueprint"]["prompt_template"]["messages"][-1])

@pl_client.traceable()
def anthropic_call():
    response = pl_client.run(
        prompt_name="simple-greeting",
        input_variables={},
        provider="anthropic",
        model="claude-sonnet-4-20250514"
    )
    print("Anthropic response:", response["prompt_blueprint"]["prompt_template"]["messages"][-1])

@pl_client.traceable()
def custom_function():
    # This is a custom function that will be traced
    result = "Custom function executed"
    print(result)
    return result

@pl_client.traceable()
def run_prompt():
    response = pl_client.run(
        prompt_name="simple-greeting",
        input_variables={}
    )
    print("Prompt response:", response["prompt_blueprint"]["prompt_template"]["messages"][-1])

if __name__ == "__main__":
    main()
```

<img alt="Group nests spans" />

## Filtering Traces by Span Attributes

The trace list can be filtered by metadata and resource attribute values. Filters search across the **entire span hierarchy** — a trace appears in results if any span within it (root or child) carries a matching attribute.

For example, if only a nested LLM call span has `{"environment": "production"}` in its attributes, filtering the trace list by `environment = production` will still surface the parent trace in results, even if the root span itself does not carry that attribute.
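
The matching rule can be pictured as a recursive search over the span tree. The sketch below illustrates the semantics only: the span dictionaries, with their `attributes` and `children` keys, are a hypothetical shape for this example and not how PromptLayer stores or queries traces.

```python
def trace_matches(span: dict, key: str, value) -> bool:
    """Return True if this span or any descendant has attributes[key] == value."""
    if span.get("attributes", {}).get(key) == value:
        return True
    return any(
        trace_matches(child, key, value) for child in span.get("children", [])
    )


# Hypothetical trace: only the nested LLM call carries the attribute.
trace = {
    "name": "main",
    "attributes": {},
    "children": [
        {"name": "llm_call", "attributes": {"environment": "production"}, "children": []},
    ],
}

print(trace_matches(trace, "environment", "production"))
print(trace_matches(trace, "environment", "staging"))
```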

### Adding Filters from the Span Detail Panel

When you open a span in the trace detail view, hovering over a scalar metadata or resource value reveals a filter button. Clicking it adds the key-value pair as an active filter on the trace list. The filter button appears only for top-level scalar values (string, number, or boolean). Nested objects and arrays are displayed for inspection but cannot be used as direct filter targets.
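
In code terms, the filter button is offered for the subset of attributes selected by a check like the one below. This is an illustrative sketch, not the UI's implementation; `filterable_pairs` and the sample attributes are hypothetical.

```python
def filterable_pairs(attributes: dict) -> dict:
    """Keep only top-level scalar values (string, number, or boolean)."""
    return {
        key: value
        for key, value in attributes.items()
        if isinstance(value, (str, int, float, bool))
    }


span_attributes = {
    "environment": "production",
    "retries": 3,
    "cached": True,
    "tags": ["a", "b"],  # array: shown for inspection, not filterable
    "user": {"id": 1},  # nested object: shown for inspection, not filterable
}

print(filterable_pairs(span_attributes))
```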


# JavaScript
Source: https://docs.promptlayer.com/sdks/javascript


<Card title="JavaScript SDK" icon="github" href="https://github.com/MagnivOrg/prompt-layer-js">
  Official JavaScript/TypeScript SDK for interacting with the PromptLayer API from server-side runtimes.
</Card>

## Installation

```bash theme={null}
npm install promptlayer
```

## Using the `run` Method (Recommended)

The easiest way to use PromptLayer is with the `run()` method. It fetches a prompt template from the [Prompt Registry](/features/prompt-registry/new-overview), executes it against your configured LLM provider, and logs the result — all in one call.

```js theme={null}
import { PromptLayer } from "promptlayer";

const promptLayerClient = new PromptLayer({
  apiKey: process.env.PROMPTLAYER_API_KEY,
});

const response = await promptLayerClient.run({
  promptName: "my-prompt",
  inputVariables: { topic: "poetry" },
  tags: ["getting-started"],
  metadata: { user_id: "123" }
});

console.log(response.prompt_blueprint.prompt_template.messages.slice(-1)[0].content);
```

<Info>
  Your LLM API keys (OpenAI, Anthropic, etc.) are **never** sent to our servers. All LLM requests are
  made locally from your machine; PromptLayer only logs the request.
</Info>

The `run()` method works with any provider configured in your prompt template — OpenAI, Anthropic, Google, and more. See the [Run documentation](/sdks/javascript#using-the-run-method-recommended) for full details.

After making your first few requests, you should be able to see them in the PromptLayer dashboard!

### Basic Usage

<Note>
  For any LLM provider you plan to use, you must set its corresponding API key as an environment variable (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY` etc.). The PromptLayer client does not support passing these keys directly in code. If the relevant environment variables are not set, any requests to those LLM providers will fail.
</Note>

<Accordion title="Provider-Specific Configuration">
  #### Using Gemini models through Vertex AI

  **JavaScript SDK**: Set these environment variables:

  * `VERTEX_AI_PROJECT_ID="<google_cloud_project_id>"`
  * `VERTEX_AI_PROJECT_LOCATION="region"`
  * `GOOGLE_APPLICATION_CREDENTIALS="path/to/google_service_account_file.json"`

  #### Using Claude models through Vertex AI

  **JavaScript SDK**: Set these environment variables:

  * `GOOGLE_APPLICATION_CREDENTIALS="path/to/google_service_account_file.json"`
  * `CLOUD_ML_REGION="region"`
</Accordion>

```js JavaScript theme={null}
import { PromptLayer } from "promptlayer";

const pl = new PromptLayer({ apiKey: "your_api_key" });

const response = await pl.run({
  promptName: "your-prompt-name",
  inputVariables: { variableName: "value" }
});

console.log(response.prompt_blueprint.prompt_template.messages.slice(-1)[0].content.slice(-1)[0].text);
```

### Parameters

* `prompt_name` / `promptName` (str, required): The name of the prompt to run.
* `prompt_version` / `promptVersion` (int, optional): Specific version of the prompt to use.
* `prompt_release_label` / `promptReleaseLabel` (str, optional): Release label of the prompt (e.g., "prod", "staging").
* `input_variables` / `inputVariables` (Dict\[str, Any], optional): Variables to be inserted into the prompt template.
* `tags` (List\[str], optional): Tags to associate with this run.
* `metadata` (Dict\[str, str], optional): Additional metadata for the run.
* `model_parameter_overrides` / `modelParameterOverrides` (Union\[Dict\[str, Any], None], optional): Model-specific parameter overrides.
* `stream` (bool, default=False): Whether to stream the response.
* `provider` (str, optional): The LLM provider to use (e.g., "openai", "anthropic", "google"). This is useful if you want to override the provider specified in the prompt template.
* `model` (str, optional): The model to use (e.g., "gpt-4o", "claude-3-7-sonnet-latest", "gemini-2.5-flash"). This is useful if you want to override the model specified in the prompt template.

### Return Value

The method returns a dictionary (Python) or object (JavaScript) with the following keys:

* `request_id`: Unique identifier for the request.
* `raw_response`: The raw response from the LLM provider.
* `prompt_blueprint`: The prompt blueprint used for the request.

### Advanced Usage

#### Streaming

To stream the response:

```js JavaScript theme={null}
const stream = await pl.run({
  promptName: "your-prompt",
  stream: true
});

for await (const chunk of stream) {
  // Access raw streaming response
  console.log(chunk.raw_response);

  // Access progressively built prompt blueprint
  if (chunk.prompt_blueprint) {
    const currentResponse = chunk.prompt_blueprint.prompt_template.messages.slice(-1)[0];
    if (currentResponse.content) {
      console.log("Current response:", currentResponse.content);
    }
  }
}
```

When streaming is enabled, each chunk includes both the raw streaming response and the progressively built `prompt_blueprint`, allowing you to track how the response is constructed in real-time. The `request_id` is only included in the final chunk.

#### Using Different Versions or Release Labels

```js JavaScript theme={null}
const response = await pl.run({
  promptName: "your-prompt",
  promptVersion: 2  // pin a specific version number
  // promptReleaseLabel: "staging"  // or select by release label instead
});
```

#### Adding Tags and Metadata

```js JavaScript theme={null}
const response = await pl.run({
  promptName: "your-prompt",
  tags: ["test", "experiment"],
  metadata: { userId: "12345" }
});
```

#### Overriding Model Parameters

Override `provider` and `model` at runtime to run a prompt against a different LLM provider or model than the one specified in the prompt template. PromptLayer returns the `llm_kwargs` that match the specified provider and model, filled in with the default parameter values for that combination.

<Warning>
  **Provider-Specific Schema Notice**

  The `llm_kwargs` and `raw_response` objects have provider-specific structures that may change as LLM providers update their APIs. PromptLayer passes through the native format required by each provider.

  For stable, provider-agnostic prompt data, use `prompt_blueprint.prompt_template` instead of relying on the structure of provider-specific objects.
</Warning>

```js JavaScript theme={null}
const response = await pl.run({
  promptName: "your-prompt",
  provider: "openai",  // or "anthropic", "google", etc.
  model: "gpt-4"  // or "claude-2", "gemini-1.5-pro", etc.
});
```

<Tip>
  Set both `model` and `provider` so the request runs against the correct LLM provider with the correct parameters.
</Tip>

## Running Workflows

Use `runWorkflow()` to execute a PromptLayer Workflow from the JavaScript SDK. Workflows are multi-step pipelines that can combine prompt, tool, code, and conditional nodes.

```js JavaScript theme={null}
import { PromptLayer } from "promptlayer";

const pl = new PromptLayer({ apiKey: "your_api_key" });

const response = await pl.runWorkflow({
  workflowName: "Data Analysis Workflow",
  inputVariables: { dataset_url: "https://example.com/data.csv" }
});

console.log(response);
```

### Workflow Parameters

* `workflowName` (string, required): The Workflow name to run.
* `inputVariables` (object, optional): Variables to pass into the Workflow.
* `metadata` (object, optional): Metadata to attach to the Workflow run.
* `workflowLabelName` (string, optional): Label name for the Workflow version, such as `"production"`.
* `workflowVersion` (number, optional): Specific Workflow version number to run.
* `returnAllOutputs` (boolean, default=false): Whether to return outputs for every Workflow node.

### Workflow Return Value

By default, `runWorkflow()` returns the final output node's value. When `returnAllOutputs` is `true`, it returns an object keyed by node name, including each node's status, value, errors, and whether the node is an output node.

```js JavaScript theme={null}
const response = await pl.runWorkflow({
  workflowName: "Data Analysis Workflow",
  inputVariables: { dataset_url: "https://example.com/data.csv" },
  metadata: { user_id: "12345" },
  workflowLabelName: "production",
  returnAllOutputs: true
});
```

Example response with `returnAllOutputs: true`:

```json theme={null}
{
  "Load Dataset": {
    "status": "SUCCESS",
    "value": "Loaded 100 rows",
    "error_message": null,
    "raw_error_message": null,
    "is_output_node": false
  },
  "Summarize Dataset": {
    "status": "SUCCESS",
    "value": "The dataset contains customer feedback grouped by region.",
    "error_message": null,
    "raw_error_message": null,
    "is_output_node": true
  }
}
```

## SDK Cache

The PromptLayer JavaScript SDK supports an in-memory template cache to reduce fetch latency and improve resilience when the PromptLayer API has transient failures.

Enable cache when you want to:

* Reduce repeated template fetch latency
* Lower dependency on real-time PromptLayer API availability
* Continue serving recently known-good templates during temporary API issues

Pass `cacheTtlSeconds` when creating a client:

```js theme={null}
import { PromptLayer } from "promptlayer";

const promptLayerClient = new PromptLayer({
  apiKey: process.env.PROMPTLAYER_API_KEY,
  cacheTtlSeconds: 300, // each prompt template is cached for 5 minutes
});
```

### How It Works

When cache is enabled, `templates.get()` and `run()` use this flow:

1. Return a fresh cached template if available.
2. If cache is stale or missing, fetch from API and refresh cache.
3. If API fetch fails with a transient error and a stale template exists, serve the stale template.

<Info>
  Stale fallback applies to transient API failures such as retryable HTTP errors (including `429` and `5xx`) and network-level issues.
</Info>

### Important Behavior

* Cache is in-memory and process-local (not shared across machines/containers).
* Requests with `metadataFilters` or `modelParameterOverrides` bypass cache.
* Publishing via `templates.publish()` invalidates cache for that prompt name.
* Call `promptLayerClient.invalidate("prompt-name")` to clear one prompt from cache.
* Call `promptLayerClient.invalidate()` to clear the full SDK cache.

### Practical Guidance

* Start with `cacheTtlSeconds` between `60` and `300`.
* Use a shorter TTL if your prompts change frequently.
* Use a longer TTL if your prompts are stable and lower latency matters most.
* Keep `throwOnError: true` if you want hard failures when no cache entry is available.

## Custom Logging with `logRequest`

If you need more control — for example, using your own LLM client, a custom provider, or background processing — you can use `logRequest` to manually log requests to PromptLayer.

### OpenAI Example

```js theme={null}
import { PromptLayer } from "promptlayer";
import OpenAI from "openai";

const plClient = new PromptLayer();
const openai = new OpenAI();

const messages = [{ role: "user", content: "Say this is a test" }];

const requestStartTime = Date.now();
const completion = await openai.chat.completions.create({
  messages,
  model: "gpt-4o",
});
const requestEndTime = Date.now();

await plClient.logRequest({
  provider: "openai",
  model: "gpt-4o",
  input: {
    type: "chat",
    messages: messages.map(m => ({
      role: m.role,
      content: [{ type: "text", text: m.content }]
    }))
  },
  output: {
    type: "chat",
    messages: [{
      role: "assistant",
      content: [{ type: "text", text: completion.choices[0].message.content }]
    }]
  },
  requestStartTime,
  requestEndTime,
  tags: ["test"]
});
```

### Anthropic Example

```js theme={null}
import { PromptLayer } from "promptlayer";
import Anthropic from "@anthropic-ai/sdk";

const plClient = new PromptLayer();
const anthropic = new Anthropic();

const messages = [{ role: "user", content: "How many toes do dogs have?" }];

const requestStartTime = Date.now();
const response = await anthropic.messages.create({
  messages,
  model: "claude-sonnet-4-20250514",
  max_tokens: 100,
});
const requestEndTime = Date.now();

await plClient.logRequest({
  provider: "anthropic",
  model: "claude-sonnet-4-20250514",
  input: {
    type: "chat",
    messages: messages.map(m => ({
      role: m.role,
      content: [{ type: "text", text: m.content }]
    }))
  },
  output: {
    type: "chat",
    messages: [{
      role: "assistant",
      content: [{ type: "text", text: response.content[0].text }]
    }]
  },
  requestStartTime,
  requestEndTime,
  tags: ["test-anthropic-1"]
});
```

See the [Custom Logging documentation](/features/prompt-history/custom-logging) and [Log Request API Reference](/reference/log-request) for full details.

## Error Handling

PromptLayer provides robust error handling with configurable error behavior for JavaScript/TypeScript applications.

### Using `throwOnError`

By default, PromptLayer throws errors when API requests fail. You can control this behavior using the `throwOnError` parameter:

```js theme={null}
import { PromptLayer } from "promptlayer";

// Default behavior: throws errors on API failures
const promptLayerClient = new PromptLayer({ 
  apiKey: "pl_****", 
  throwOnError: true 
});

// Alternative: logs warnings instead of throwing errors
// (uses a different variable name, because `const` cannot be redeclared)
const nonThrowingClient = new PromptLayer({
  apiKey: "pl_****",
  throwOnError: false
});
```

**Example with error handling:**

```js theme={null}
import { PromptLayer } from "promptlayer";

const promptLayerClient = new PromptLayer({ apiKey: process.env.PROMPTLAYER_API_KEY });

try {
  // Attempt to get a template that might not exist
  const template = await promptLayerClient.templates.get("NonExistentTemplate");
  console.log(template);
} catch (error) {
  console.error("Failed to get template:", error.message);
}
```

**Example with warnings (throwOnError: false):**

```js theme={null}
import { PromptLayer } from "promptlayer";

// Initialize with throwOnError: false to get warnings instead of errors
const promptLayerClient = new PromptLayer({ 
  apiKey: process.env.PROMPTLAYER_API_KEY,
  throwOnError: false 
});

// This will log a warning instead of throwing an error if the template doesn't exist
const template = await promptLayerClient.templates.get("NonExistentTemplate");
// Returns null if not found, with a warning logged to console
```

### Automatic Retry Mechanism

PromptLayer retries transient failures automatically using the [p-retry](https://github.com/sindresorhus/p-retry) library, so short-lived API issues do not immediately surface as errors in your application.

**Retry Behavior:**

* **Total Attempts**: 4 attempts (1 initial + 3 retries)
* **Exponential Backoff**: Retries wait progressively longer between attempts (2s, 4s, 8s)
* **Max Wait Time**: 15 seconds maximum wait between retries

**What Triggers Retries:**

* **5xx Server Errors**: Internal server errors, service unavailable, etc.
* **429 Rate Limit Errors**: Requests rejected because a rate limit was exceeded.
* **Network Errors**: Connection failures (ENOTFOUND, ECONNREFUSED, ETIMEDOUT, etc.)

**What Fails Immediately (No Retries):**

* **4xx Client Errors** (except 429): Bad requests, authentication errors, not found, validation errors, etc.

<Info>
  The retry mechanism operates transparently in the background. You don't need to implement retry logic yourself - PromptLayer handles it automatically for recoverable errors.
</Info>

### Logging

PromptLayer logs info to the console before each retry attempt. When a retry occurs, you'll see log messages like:

```
INFO: Retrying PromptLayer API request in 2.0 seconds...
INFO: Retrying PromptLayer API request in 4.0 seconds...
INFO: Retrying PromptLayer API request in 8.0 seconds...
```

To capture these logs in your application, you can monitor `console.info` output or use a logging library that intercepts console methods.

## Edge

PromptLayer can be used with Edge functions. Use the `run()` method, `logRequest`, or the [REST API](/reference/introduction) directly.

```js theme={null}
import { PromptLayer } from "promptlayer";
const promptLayerClient = new PromptLayer({ apiKey: process.env.PROMPTLAYER_API_KEY });

// Add this line
export const runtime = "edge";

export const POST = async () => {
  const response = await promptLayerClient.run({
    promptName: "my-prompt",
    inputVariables: { question: "What is the capital of France?" },
  });
  const content = response.prompt_blueprint.prompt_template.messages.slice(-1)[0].content;
  return Response.json(content);
};
```


# Python
Source: https://docs.promptlayer.com/sdks/python


<Card title="Python SDK" icon="github" href="https://github.com/MagnivOrg/prompt-layer-library?tab=readme-ov-file">
  Official Python SDK for interacting with the PromptLayer API.
</Card>

## Installation

```bash theme={null}
pip install promptlayer
```

## Using the `run` Method (Recommended)

The easiest way to use PromptLayer is with the `run()` method. It fetches a prompt template from the [Prompt Registry](/features/prompt-registry/new-overview), executes it against your configured LLM provider, and logs the result — all in one call.

```python theme={null}
from promptlayer import PromptLayer
promptlayer_client = PromptLayer()

response = promptlayer_client.run(
    prompt_name="my-prompt",
    input_variables={"topic": "poetry"},
    tags=["getting-started"],
    metadata={"user_id": "123"}
)

print(response["prompt_blueprint"]["prompt_template"]["messages"][-1]["content"])
```

<Info>Your LLM API keys (OpenAI, Anthropic, etc.) are **never** sent to our servers. All LLM requests are made locally from your machine; PromptLayer only logs the request.</Info>

The `run()` method works with any provider configured in your prompt template — OpenAI, Anthropic, Google, and more. See the [Run documentation](/sdks/python#using-the-run-method-recommended) for full details.

After making your first few requests, you should be able to see them in the PromptLayer dashboard!

### Basic Usage

<Note>
  For any LLM provider you plan to use, you must set its corresponding API key as an environment variable (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GOOGLE_API_KEY`). The PromptLayer client does not support passing these keys directly in code. If the relevant environment variables are not set, requests to those LLM providers fail.
</Note>

<Accordion title="Provider-Specific Configuration">
  #### Using Gemini models through Vertex AI

  **Python SDK**: Set these environment variables:

  * `GOOGLE_GENAI_USE_VERTEXAI=true`
  * `GOOGLE_CLOUD_PROJECT="<google_cloud_project_id>"`
  * `GOOGLE_CLOUD_LOCATION="region"`
  * `GOOGLE_APPLICATION_CREDENTIALS="path/to/google_service_account_file.json"`

  #### Using Claude models through Vertex AI

  **Python SDK**: Set these environment variables:

  * `ANTHROPIC_VERTEX_PROJECT_ID="<google_cloud_project_id>"`
  * `CLOUD_ML_REGION="region"`
  * `GOOGLE_APPLICATION_CREDENTIALS="path/to/google_service_account_file.json"`
</Accordion>

```python Python theme={null}
from promptlayer import PromptLayer

pl = PromptLayer(api_key="your_api_key")

response = pl.run(
    prompt_name="your-prompt-name",
    input_variables={"variable_name": "value"}
)

print(response["prompt_blueprint"]["prompt_template"]["messages"][-1]["content"][-1]["text"])
```

### Parameters

* `prompt_name` / `promptName` (str, required): The name of the prompt to run.
* `prompt_version` / `promptVersion` (int, optional): Specific version of the prompt to use.
* `prompt_release_label` / `promptReleaseLabel` (str, optional): Release label of the prompt (e.g., "prod", "staging").
* `input_variables` / `inputVariables` (Dict\[str, Any], optional): Variables to be inserted into the prompt template.
* `tags` (List\[str], optional): Tags to associate with this run.
* `metadata` (Dict\[str, str], optional): Additional metadata for the run.
* `model_parameter_overrides` / `modelParameterOverrides` (Union\[Dict\[str, Any], None], optional): Model-specific parameter overrides.
* `stream` (bool, default=False): Whether to stream the response.
* `provider` (str, optional): The LLM provider to use (e.g., "openai", "anthropic", "google"). This is useful if you want to override the provider specified in the prompt template.
* `model` (str, optional): The model to use (e.g., "gpt-4o", "claude-3-7-sonnet-latest", "gemini-2.5-flash"). This is useful if you want to override the model specified in the prompt template.

### Return Value

The method returns a dictionary (Python) or object (JavaScript) with the following keys:

* `request_id`: Unique identifier for the request.
* `raw_response`: The raw response from the LLM provider.
* `prompt_blueprint`: The prompt blueprint used for the request.
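
As a rough sketch of how these keys might be read, the helper below pulls the final message text out of a response. The sample dictionary is hand-written for illustration and is not real SDK output, and `last_message_text` is a hypothetical helper, not part of the SDK. It assumes message content is a list of `{"type": "text", "text": ...}` blocks, as in the examples on this page.

```python
from typing import Any, Dict, Optional


def last_message_text(response: Dict[str, Any]) -> Optional[str]:
    """Return the concatenated text of the last message in the prompt blueprint."""
    blueprint = response.get("prompt_blueprint") or {}
    template = blueprint.get("prompt_template") or {}
    messages = template.get("messages") or []
    if not messages:
        return None
    content = messages[-1].get("content")
    if isinstance(content, str):
        return content
    if not content:
        return None
    # Keep only text blocks; other block types are skipped.
    return "".join(block["text"] for block in content if block.get("type") == "text")


# Hand-written sample shaped like the return value described above (assumption).
sample_response = {
    "request_id": 123,
    "raw_response": {"provider_specific": "..."},
    "prompt_blueprint": {
        "prompt_template": {
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": "Hi"}]},
                {"role": "assistant", "content": [{"type": "text", "text": "Hello!"}]},
            ]
        }
    },
}

print(last_message_text(sample_response))
```

Reading through `prompt_blueprint` rather than `raw_response` keeps the helper independent of any one provider's response format.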

### Advanced Usage

#### Streaming

To stream the response:

```python Python theme={null}
for chunk in pl.run(prompt_name="your-prompt", stream=True):
    # Access raw streaming response
    print(chunk["raw_response"])

    # Access progressively built prompt blueprint
    if chunk["prompt_blueprint"]:
        current_response = chunk["prompt_blueprint"]["prompt_template"]["messages"][-1]
        if current_response.get("content"):
            print(f"Current response: {current_response['content']}")
```

When streaming is enabled, each chunk includes both the raw streaming response and the progressively built `prompt_blueprint`, so you can track how the response is constructed in real time. The `request_id` is only included in the final chunk.
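
Because the `request_id` arrives only on the final chunk, one possible pattern is to keep the latest blueprint content while iterating and pick up the ID when it appears. The sketch below uses `fake_stream`, a hypothetical stand-in for `pl.run(..., stream=True)` that yields hand-written chunks. Whether earlier chunks carry `request_id` as `None` or omit the key is an assumption here, so the code uses `.get()`.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def fake_stream() -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for pl.run(..., stream=True); chunks are hand-written."""
    def blueprint(text: str) -> Dict[str, Any]:
        return {"prompt_template": {"messages": [
            {"role": "assistant", "content": [{"type": "text", "text": text}]}
        ]}}

    yield {"raw_response": "Hel", "prompt_blueprint": blueprint("Hel"), "request_id": None}
    yield {"raw_response": "lo", "prompt_blueprint": blueprint("Hello"), "request_id": None}
    yield {"raw_response": "", "prompt_blueprint": blueprint("Hello"), "request_id": 42}


def consume(stream: Iterable[Dict[str, Any]]) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (request_id, latest message content) after draining the stream."""
    request_id = None
    latest_content = None
    for chunk in stream:
        blueprint = chunk.get("prompt_blueprint")
        if blueprint:
            latest_content = blueprint["prompt_template"]["messages"][-1].get("content")
        if chunk.get("request_id") is not None:
            request_id = chunk["request_id"]
    return request_id, latest_content


request_id, content = consume(fake_stream())
print(request_id, content)
```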

#### Using Different Versions or Release Labels

```python Python theme={null}
response = pl.run(
    prompt_name="your-prompt",
    prompt_version=2,  # pin a specific version number
    # prompt_release_label="staging",  # or select by release label instead
)
```

#### Adding Tags and Metadata

```python Python theme={null}
response = pl.run(
    prompt_name="your-prompt",
    tags=["test", "experiment"],
    metadata={"user_id": "12345"}
)
```

#### Overriding Model Parameters

Override `provider` and `model` at runtime to run a prompt against a different LLM provider or model than the one specified in the prompt template. PromptLayer returns the `llm_kwargs` that match the specified provider and model, filled in with the default parameter values for that combination.

<Warning>
  **Provider-Specific Schema Notice**

  The `llm_kwargs` and `raw_response` objects have provider-specific structures that may change as LLM providers update their APIs. PromptLayer passes through the native format required by each provider.

  For stable, provider-agnostic prompt data, use `prompt_blueprint.prompt_template` instead of relying on the structure of provider-specific objects.
</Warning>

```python Python SDK theme={null}
response = pl.run(
    prompt_name="your-prompt",
    provider="openai",  # or "anthropic", "google", etc.
    model="gpt-4",  # or "claude-2", "gemini-1.5-pro", etc.
)
```

<Tip>
  Set both `model` and `provider` so the request runs against the correct LLM provider with the correct parameters.
</Tip>

## Running Workflows

Use `run_workflow()` to execute a PromptLayer Workflow from the Python SDK. Workflows are multi-step pipelines that can combine prompt, tool, code, and conditional nodes.

```python Python theme={null}
from promptlayer import PromptLayer

pl = PromptLayer(api_key="your_api_key")

response = pl.run_workflow(
    workflow_id_or_name="Data Analysis Workflow",
    input_variables={"dataset_url": "https://example.com/data.csv"},
)

print(response)
```

### Workflow Parameters

* `workflow_id_or_name` (str or int, required): The Workflow name or ID to run.
* `input_variables` (Dict\[str, Any], optional): Variables to pass into the Workflow.
* `metadata` (Dict\[str, str], optional): Metadata to attach to the Workflow run.
* `workflow_label_name` (str, optional): Label name for the Workflow version, such as `"production"`.
* `workflow_version` (int, optional): Specific Workflow version number to run.
* `return_all_outputs` (bool, default=False): Whether to return outputs for every Workflow node.
* `timeout` (int or float, optional): Maximum time, in seconds, to wait for the Workflow to complete.

<Info>
  `workflow_name` is still supported for backward compatibility, but `workflow_id_or_name` is the preferred parameter.
</Info>

### Workflow Return Value

By default, `run_workflow()` returns the final output node's value. When `return_all_outputs=True`, it returns a dictionary keyed by node name, including each node's status, value, errors, and whether the node is an output node.

```python Python theme={null}
response = pl.run_workflow(
    workflow_id_or_name="Data Analysis Workflow",
    input_variables={"dataset_url": "https://example.com/data.csv"},
    metadata={"user_id": "12345"},
    workflow_label_name="production",
    return_all_outputs=True,
    timeout=300,
)
```

Example response with `return_all_outputs=True`:

```json theme={null}
{
  "Load Dataset": {
    "status": "SUCCESS",
    "value": "Loaded 100 rows",
    "error_message": null,
    "raw_error_message": null,
    "is_output_node": false
  },
  "Summarize Dataset": {
    "status": "SUCCESS",
    "value": "The dataset contains customer feedback grouped by region.",
    "error_message": null,
    "raw_error_message": null,
    "is_output_node": true
  }
}
```
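
A response in this shape can be post-processed with plain dictionary operations. The sketch below uses two hypothetical helpers (not part of the SDK) to collect output-node values and list nodes that did not succeed. It treats any status other than `"SUCCESS"` as unsuccessful, which is an assumption, since only `"SUCCESS"` appears in the example above.

```python
from typing import Any, Dict, List


def output_values(all_outputs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Map each output node's name to its value."""
    return {
        name: node["value"]
        for name, node in all_outputs.items()
        if node.get("is_output_node")
    }


def unsuccessful_nodes(all_outputs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Names of nodes whose status is anything other than "SUCCESS"."""
    return [name for name, node in all_outputs.items() if node.get("status") != "SUCCESS"]


# Same data as the example response above.
example_outputs = {
    "Load Dataset": {
        "status": "SUCCESS",
        "value": "Loaded 100 rows",
        "error_message": None,
        "raw_error_message": None,
        "is_output_node": False,
    },
    "Summarize Dataset": {
        "status": "SUCCESS",
        "value": "The dataset contains customer feedback grouped by region.",
        "error_message": None,
        "raw_error_message": None,
        "is_output_node": True,
    },
}

print(output_values(example_outputs))
print(unsuccessful_nodes(example_outputs))
```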

To run Workflows asynchronously, use `AsyncPromptLayer`. See [Async Workflow Execution](#example-2-async-workflow-execution) for an async example.

## SDK Cache

The PromptLayer Python SDK supports an in-memory template cache to reduce fetch latency and improve resilience when the PromptLayer API has transient failures.

Enable cache when you want to:

* Reduce repeated template fetch latency
* Lower dependency on real-time PromptLayer API availability
* Continue serving recently known-good templates during temporary API issues

Pass `cache_ttl_seconds` when creating a client:

```python theme={null}
from promptlayer import PromptLayer

promptlayer_client = PromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,  # each prompt template is cached for 5 minutes
)
```

The async client works the same way:

```python theme={null}
from promptlayer import AsyncPromptLayer

async_promptlayer_client = AsyncPromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,
)
```

### How It Works

When cache is enabled, `templates.get()` and `run()` use this flow:

1. Return a fresh cached template if available.
2. If cache is stale or missing, fetch from API and refresh cache.
3. If API fetch fails with a transient error and a stale template exists, serve the stale template.

<Info>
  Stale fallback only applies to transient API errors (for example, timeout, connection, or internal server errors).
</Info>
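
The three-step flow above can be illustrated with a small standalone cache. This is a sketch of the general technique, not the SDK's implementation: `StaleFallbackCache` and `TransientError` are hypothetical names, and the clock is injectable only so the example can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for a timeout, connection, or server error."""


class StaleFallbackCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        # 1. Return a fresh cached value if available.
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        # 2. Stale or missing: fetch and refresh the cache.
        try:
            value = fetch(key)
        except TransientError:
            # 3. On a transient failure, serve the stale value if one exists.
            if entry is not None:
                return entry[1]
            raise
        self._entries[key] = (now, value)
        return value
```

A short usage example with a fake clock:

```python
fake_now = [0.0]
cache = StaleFallbackCache(ttl_seconds=300, clock=lambda: fake_now[0])

def failing_fetch(name):
    raise TransientError("API unavailable")

cache.get("my-prompt", lambda name: "template v1")  # fetched and cached
fake_now[0] = 400.0                                  # the entry is now stale
print(cache.get("my-prompt", failing_fetch))         # falls back to the stale entry
```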

### Important Behavior

* Cache is in-memory and process-local (not shared across machines/containers).
* Requests with `metadata_filters` or `model_parameter_overrides` bypass cache.
* Publishing via `templates.publish()` invalidates cache for that prompt name.

### Practical Guidance

* Start with `cache_ttl_seconds` between `60` and `300`.
* Use a shorter TTL if your prompts change frequently.
* Use a longer TTL if your prompts are stable and lower latency matters most.
* Keep `throw_on_error=True` if you want hard failures when no cache entry is available.

## Custom Logging with `log_request`

If you need more control — for example, using your own LLM client, a custom provider, or background processing — you can use `log_request` to manually log requests to PromptLayer.

```python theme={null}
from openai import OpenAI
from promptlayer import PromptLayer
import time

pl_client = PromptLayer()
client = OpenAI()

messages = [
    {"role": "system", "content": "You are an AI."},
    {"role": "user", "content": "Compose a poem please."}
]

request_start_time = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
request_end_time = time.time()

# Log to PromptLayer
pl_client.log_request(
    provider="openai",
    model="gpt-4o",
    input={"type": "chat", "messages": [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in messages
    ]},
    output={"type": "chat", "messages": [
        {"role": "assistant", "content": [{"type": "text", "text": completion.choices[0].message.content}]}
    ]},
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    tags=["getting-started"]
)
```

This works with any LLM provider, including Anthropic:

```python theme={null}
import anthropic
from promptlayer import PromptLayer
import time

pl_client = PromptLayer()
client = anthropic.Anthropic()

messages = [{"role": "user", "content": "How many toes do dogs have?"}]

request_start_time = time.time()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=messages
)
request_end_time = time.time()

# Log to PromptLayer
pl_client.log_request(
    provider="anthropic",
    model="claude-sonnet-4-20250514",
    input={"type": "chat", "messages": [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in messages
    ]},
    output={"type": "chat", "messages": [
        {"role": "assistant", "content": [{"type": "text", "text": response.content[0].text}]}
    ]},
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    tags=["animal-toes"]
)
```
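
Both examples build the same `input` structure from a plain list of messages. If you log from several places, that conversion could be factored into a helper along these lines. `to_chat_format` is a hypothetical name, not an SDK function, and it assumes every message has string `content`.

```python
from typing import Any, Dict, List


def to_chat_format(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Wrap plain role/content messages in the chat structure used by log_request above."""
    return {
        "type": "chat",
        "messages": [
            {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
            for m in messages
        ],
    }


chat_input = to_chat_format([{"role": "user", "content": "How many toes do dogs have?"}])
print(chat_input)
```

The result can be passed as `input=` or, for an assistant reply, as `output=`.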

See the [Custom Logging documentation](/features/prompt-history/custom-logging) and [Log Request API Reference](/reference/log-request) for full details.

## Error Handling

PromptLayer provides robust error handling with specialized exception classes and configurable error behavior.

### Exception Classes

The library exports a dedicated exception class for each category of failure:

```python theme={null}
from promptlayer import (
    PromptLayerAPIError,              # General API errors
    PromptLayerBadRequestError,       # 400 errors
    PromptLayerAuthenticationError,   # 401 errors
    PromptLayerNotFoundError,         # 404 errors
    PromptLayerValidationError,       # Input validation errors
    PromptLayerAPIConnectionError,    # Connection failures
    PromptLayerAPITimeoutError,       # Timeout errors
    PromptLayerRateLimitError,        # 429 rate limit errors
)
```

### Using `throw_on_error`

By default, PromptLayer throws exceptions when errors occur. You can control this behavior using the `throw_on_error` parameter:

```python theme={null}
from promptlayer import PromptLayer

# Default behavior: throws exceptions on errors
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=True)

# Alternative: logs warnings instead of throwing exceptions
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=False)
```

**Example with exception handling:**

```python theme={null}
from promptlayer import PromptLayer, PromptLayerNotFoundError, PromptLayerValidationError

promptlayer_client = PromptLayer()

try:
    # Attempt to get a template that might not exist
    template = promptlayer_client.templates.get("NonExistentTemplate")
except PromptLayerNotFoundError as e:
    print(f"Template not found: {e}")
except PromptLayerValidationError as e:
    print(f"Invalid input: {e}")
```

**Example with warnings (throw\_on\_error=False):**

```python theme={null}
from promptlayer import PromptLayer

# Initialize with throw_on_error=False to get warnings instead of exceptions
promptlayer_client = PromptLayer(throw_on_error=False)

# This will log a warning instead of throwing an exception if the template doesn't exist
template = promptlayer_client.templates.get("NonExistentTemplate")
# Returns None if not found, with a warning logged
```
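
With `throw_on_error=False`, calling code has to check for `None` itself. One way to do that is sketched below. `get_template_or_fallback` is a hypothetical helper, and `stub_client` is a stand-in that only mimics a lookup that finds nothing, so the example runs without network access.

```python
from types import SimpleNamespace
from typing import Any


def get_template_or_fallback(client: Any, name: str, fallback: Any) -> Any:
    """Return the template, or `fallback` when the client returns None."""
    template = client.templates.get(name)
    return fallback if template is None else template


# Stub that mimics a client whose lookup finds nothing (assumption for illustration).
# Replace it with a PromptLayer client created with throw_on_error=False.
stub_client = SimpleNamespace(templates=SimpleNamespace(get=lambda name: None))

fallback_template = {"prompt_template": {"messages": []}}
result = get_template_or_fallback(stub_client, "NonExistentTemplate", fallback_template)
print(result is fallback_template)
```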

### Automatic Retry Mechanism

PromptLayer includes a built-in retry mechanism to handle transient failures gracefully. This ensures your application remains resilient when temporary issues occur.

**Retry Behavior:**

* **Total Attempts**: 4 attempts (1 initial + 3 retries)
* **Exponential Backoff**: Retries wait progressively longer between attempts (2s, 4s, 8s)
* **Max Wait Time**: 15 seconds maximum wait between retries

**What Triggers Retries:**

* **5xx Server Errors**: Internal server errors, service unavailable, etc.
* **429 Rate Limit Errors**: When rate limits are exceeded

**What Fails Immediately (No Retries):**

* **Connection Errors**: Network connectivity issues
* **Timeout Errors**: Request timeouts
* **4xx Client Errors** (except 429): Bad requests, authentication errors, not found, etc.

<Info>
  The retry mechanism operates transparently in the background. You don't need to implement retry logic yourself - PromptLayer handles it automatically for recoverable errors.
</Info>
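
To make the numbers above concrete, the sketch below reproduces the stated schedule (2s, 4s, 8s, capped at 15s) and the retry rule for HTTP status codes. The doubling formula and both function names are assumptions chosen to match the documented values; the SDK's internal implementation may differ. Connection and timeout errors carry no status code and, per the list above, are not retried by the Python SDK.

```python
from typing import List


def backoff_delays(retries: int = 3, base_seconds: float = 2.0,
                   max_wait_seconds: float = 15.0) -> List[float]:
    """Wait before each retry: doubles from base_seconds, capped at max_wait_seconds."""
    return [min(base_seconds * (2 ** attempt), max_wait_seconds) for attempt in range(retries)]


def is_retryable_status(status_code: int) -> bool:
    """5xx and 429 are retried; every other 4xx fails immediately."""
    return status_code == 429 or 500 <= status_code <= 599


print(backoff_delays())
print(sum(backoff_delays()))
```

Under these assumptions, a request that fails on every attempt spends 2 + 4 + 8 = 14 seconds waiting between attempts, not counting the time each request itself takes.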

### Logging

PromptLayer uses Python's built-in `logging` module for all log output:

```python theme={null}
import logging
from promptlayer import PromptLayer

# Configure logging to see PromptLayer logs
logging.basicConfig(level=logging.INFO)

promptlayer_client = PromptLayer()

# Now you'll see log output from PromptLayer operations
```

**Setting log levels:**

```python theme={null}
import logging

# Get the PromptLayer logger
logger = logging.getLogger("promptlayer")

# Set to WARNING to only see warnings and errors
logger.setLevel(logging.WARNING)

# Set to DEBUG to see detailed information
logger.setLevel(logging.DEBUG)
```

**Viewing Retry Logs:**

When retries occur, PromptLayer logs warnings before each retry attempt:

```python theme={null}
import logging
from promptlayer import PromptLayer

# Set up logging to see retry attempts
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

promptlayer_client = PromptLayer()

# If a retry occurs, you'll see log messages like:
# "Retrying in 2 seconds..."
# "Retrying in 4 seconds..."
```
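
If you would rather collect these messages programmatically (for example, to count retries in a test), a custom handler on the `promptlayer` logger is one option. The sketch below uses only the standard `logging` module. `ListHandler` is a hypothetical helper, and the warning is emitted manually to simulate a retry message, since no real request is made here.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger("promptlayer")
logger.addHandler(handler)

# Simulated retry warning; in a real application the SDK would emit these.
logger.warning("Retrying in %s seconds...", 2)

print(handler.messages)
```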

## Async Support

PromptLayer supports asynchronous operations, ideal for managing concurrent tasks in non-blocking environments like web servers, microservices, or Jupyter notebooks.

### Initializing the Async Client

To use the asynchronous, non-blocking methods, initialize `AsyncPromptLayer`:

```python theme={null}
from promptlayer import AsyncPromptLayer

# Initialize an asynchronous client with your API key
async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")
```

### Async Usage Examples

The asynchronous client functions similarly to the synchronous version, but allows for non-blocking execution with `asyncio`. Below are example uses.

#### Example 1: Async Template Management

Use asynchronous methods to manage templates:

```python theme={null}
import asyncio
from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")

    # Fetch a template asynchronously
    template = await async_promptlayer_client.templates.get("Test1")
    print(template)

    # Fetch all templates asynchronously
    templates = await async_promptlayer_client.templates.all()
    print(templates)

# Run the async function
asyncio.run(main())
```

#### Example 2: Async Workflow Execution

Run Workflows asynchronously for better efficiency:

```python theme={null}
import asyncio
from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")

    response = await async_promptlayer_client.run_workflow(
        workflow_id_or_name="example_workflow",
        workflow_version=1,
        input_variables={"num1": "1", "num2": "2"},
        return_all_outputs=True,
    )
    print(response)

# Run the async function
asyncio.run(main())
```

#### Example 3: Async Tracking and Logging

Track and log requests asynchronously:

```python theme={null}
import asyncio
from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")

    # Track metadata asynchronously
    request_id = "pl_request_id_example"
    await async_promptlayer_client.track.metadata(request_id, {"key": "value"})

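    # Input and output in PromptLayer's chat format
    # (for detailed logging, refer to the custom logging page)
    prompt_template = {"type": "chat", "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Say this is a test"}]}
    ]}
    output_template = {"type": "chat", "messages": [
        {"role": "assistant", "content": [{"type": "text", "text": "This is a test"}]}
    ]}

    # Log request asynchronously
    await async_promptlayer_client.log_request(
        provider="openai",
        model="gpt-3.5-turbo",
        input=prompt_template,
        output=output_template,
        request_start_time=1630945600,
        request_end_time=1630945605,
    )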

# Run the async function
asyncio.run(main())
```

For more information on custom logging, please visit our [Custom Logging Documentation](/features/prompt-history/custom-logging).

#### Example 4: Asynchronous Prompt Execution with run Method

You can execute prompt templates asynchronously using the `run` method. It runs a prompt template by name with the given input variables.

```python theme={null}
import asyncio
from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")

    # Execute a prompt template asynchronously
    response = await async_promptlayer_client.run(
        prompt_name="TestPrompt",
        input_variables={"variable1": "value1", "variable2": "value2"}
    )
    print(response)

# Run the async function
asyncio.run(main())
```

#### Example 5: Asynchronous Streaming Prompt Execution with run Method

You can also stream a prompt template's response with the `run` method by passing `stream=True`.

```python theme={null}

import asyncio
import os
from promptlayer import AsyncPromptLayer


async def main():
    async_promptlayer_client = AsyncPromptLayer(
        api_key=os.environ.get("PROMPTLAYER_API_KEY")
    )

    response_generator = await async_promptlayer_client.run(
        prompt_name="TestPrompt",
        input_variables={"variable1": "value1", "variable2": "value2"},
        stream=True,
    )

    async for response in response_generator:
        # Access raw streaming response
        print("Raw streaming response:", response["raw_response"])
        
        # Access progressively built prompt blueprint
        if response["prompt_blueprint"]:
            current_response = response["prompt_blueprint"]["prompt_template"]["messages"][-1]
            if current_response.get("content"):
                print(f"Current response: {current_response['content']}")

# Run the async function
asyncio.run(main())
```

In this example, replace "TestPrompt" with the name of your prompt template and provide any required input variables. When streaming is enabled, each chunk includes both the raw streaming response and the progressively built `prompt_blueprint`, so you can track how the response is constructed in real time.
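If you only need the finished text, you can keep the most recent non-empty content while iterating. The sketch below assumes the chunk shape shown above (`raw_response` plus a progressively built `prompt_blueprint`). `fake_stream` is a hypothetical generator that imitates those chunks so the code runs offline.

```python
import asyncio


async def fake_stream():
    # Hypothetical chunks shaped like the ones in the example above.
    yield {"raw_response": "chunk-0", "prompt_blueprint": None}
    for text in ["Hel", "Hello", "Hello, world"]:
        yield {
            "raw_response": text,
            "prompt_blueprint": {
                "prompt_template": {
                    "messages": [
                        {"role": "assistant", "content": [{"type": "text", "text": text}]}
                    ]
                }
            },
        }


async def collect_final_content(response_generator):
    # The blueprint is built up progressively, so the last non-empty
    # content seen is treated as the complete response.
    final_content = None
    async for chunk in response_generator:
        blueprint = chunk.get("prompt_blueprint")
        if not blueprint:
            continue
        message = blueprint["prompt_template"]["messages"][-1]
        if message.get("content"):
            final_content = message["content"]
    return final_content


final_content = asyncio.run(collect_final_content(fake_stream()))
print(final_content)
```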

***

Want to say hi 👋, submit a feature request, or report a bug? [✉️ Contact us](mailto:hello@promptlayer.com)


# Self-Hosted PromptLayer
Source: https://docs.promptlayer.com/self-hosted

Deploy PromptLayer in your own infrastructure for complete data control and compliance

<Note>
  Self-hosted PromptLayer is an **Enterprise-only** feature. [Contact us](mailto:hello@promptlayer.com) to learn more about licensing and custom installation support.
</Note>

PromptLayer can be deployed entirely within your own infrastructure, giving you full control over your data while keeping the features of the cloud platform. The self-hosted deployment follows the same SOC 2, HIPAA, and GDPR compliance standards as the cloud offering.

## Architecture Overview

<img alt="PromptLayer Architecture" />

### Core Components

Our self-hosted architecture consists of fully dockerized services designed for scalability and reliability:

#### **Frontend**

The web interface for accessing PromptLayer's dashboard, analytics, and management features. Connects directly to the Backend API Service for all operations.

#### **Backend API Service**

The core Python Flask application that handles all API requests, authentication, and business logic. This service orchestrates communication between all other components and serves as the primary entry point for both the frontend and SDK integrations.

#### **PostgreSQL Database (v15)**

The primary relational database storing all metadata, configurations, user data, and system state. We use PostgreSQL 15 for its robust performance, reliability, and advanced features.

#### **Object Storage**

High-performance storage for request/response data, logs, and large payloads. Supports both Amazon S3 and Google Cloud Storage, allowing you to use your existing cloud storage infrastructure.

#### **Redis/Valkey (v8.1.0)**

In-memory data store using Valkey 8.1.0 (Redis-compatible) for:

* Job queue management
* Caching frequently accessed data
* Session management
* Real-time data processing

#### **Background Services**

* **APScheduler**: Handles scheduled tasks, periodic jobs, and cron-like operations
* **Celery Background Workers**: Distributed task queue for asynchronous processing, data pipelines, and heavy computations
* **Redis Queue Background Workers**: Lightweight job processing for real-time operations and quick tasks

#### **Code Executor**

Isolated Docker container environment for safely executing code blocks in evaluations and the workflow builder. Provides sandboxed execution with resource limits and security controls.

## System Requirements

### Minimum Infrastructure

For a production deployment, you'll need:

* **5+ backend nodes** for core services (exact number depends on scale)
* **2 Redis/Valkey instances**
* **1 PostgreSQL instance** (with recommended replication for production)
* **Object storage** (S3 or GCS bucket)

<Info>
  We provide consultation to help determine the optimal number of backend nodes and resource allocation based on your expected usage patterns and scale requirements.
</Info>

### Supported Platforms

* **PostgreSQL**: Version 15
* **Redis**: Valkey 8.1.0 (Redis-compatible)
* **Object Storage**: Amazon S3, Google Cloud Storage
* **Container Runtime**: Docker, Kubernetes
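If you plan to reuse existing database or cache instances, a small preflight check can compare their reported versions against the minimums. This is a hypothetical helper, not a PromptLayer tool. It assumes the floors stated in the FAQ on this page (PostgreSQL 15+, Valkey/Redis 8.1.0+) and that you have already obtained each version as a plain dotted string.

```python
# Minimum versions, taken from the FAQ on this page.
MINIMUM_VERSIONS = {"postgresql": (15,), "valkey": (8, 1, 0)}


def parse_version(version_string):
    # "15.4" -> (15, 4). Assumes purely numeric, dot-separated parts.
    return tuple(int(part) for part in version_string.strip().split("."))


def meets_minimum(component, version_string):
    found = parse_version(version_string)
    required = MINIMUM_VERSIONS[component]
    # Pad the shorter tuple with zeros so "8.1" compares equal to "8.1.0".
    width = max(len(found), len(required))
    found += (0,) * (width - len(found))
    required += (0,) * (width - len(required))
    return found >= required


for component, version in [("postgresql", "15.4"), ("valkey", "8.0.5")]:
    status = "ok" if meets_minimum(component, version) else "too old"
    print(f"{component} {version}: {status}")
```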

## Deployment Options

### Cloud Providers

We provide pre-built Docker images with Helm charts for:

<CardGroup>
  <Card title="AWS" icon="aws" href="/enterprise-deployments/aws">
    Optimized for Amazon Web Services with EKS, RDS, and S3 integration. See the [AWS deployment guide](/enterprise-deployments/aws) for infrastructure and Helm install steps.
  </Card>

  <Card title="Google Cloud" icon="google">
    Designed for Google Cloud Platform with GKE, Cloud SQL, and GCS
  </Card>
</CardGroup>

<Note>
  While we officially support AWS and GCP, we can work with other cloud providers based on your requirements. Contact our team for custom deployment options.
</Note>

### Deployment Methods

1. **Kubernetes with Helm** (Recommended for production)
   * Full orchestration and scaling capabilities
   * Built-in health checks and auto-recovery
   * Horizontal pod autoscaling support

2. **Docker Compose** (Development/testing)
   * Quick setup for evaluation
   * Suitable for single-node deployments

3. **Custom Installation**
   * Available with enterprise support
   * Tailored to your specific infrastructure

## Security & Authentication

### Authentication Methods

* **Built-in Authentication**: Default user management system with secure password policies
* **Single Sign-On (SSO)**: Integration via Auth0 supporting:
  * SAML 2.0
  * OAuth 2.0 / OpenID Connect
  * Active Directory / LDAP

### Compliance & Security

Our self-hosted solution maintains the same security standards as our cloud platform:

* **SOC 2 Type II** compliant architecture
* **HIPAA** ready configurations
* **GDPR** compliant data handling
* Encryption at rest and in transit
* API key management with role-based access control
* Audit logging and compliance reporting

## Scaling & Performance

### Auto-scaling Configuration

We expose parameters for automatic scaling based on:

* CPU utilization
* Memory usage
* Queue depth
* Request rate

Our team provides consultation to help configure auto-scaling rules optimized for your usage patterns.

### High Availability

* Multi-node backend deployment
* Redis replication with automatic failover
* PostgreSQL streaming replication
* Load balancing across service instances

## Monitoring & Maintenance

### Observability

We recommend **Datadog** for comprehensive monitoring, providing:

* Real-time metrics and dashboards
* Log aggregation and analysis
* APM tracing
* Custom alerts and notifications

Alternative monitoring solutions can be integrated based on your existing infrastructure.

### Updates & Upgrades

We follow industry best practices for updates:

* **Versioned Docker images** with detailed release notes
* **Rolling updates** via Kubernetes
* **Automated database migrations** with rollback capabilities
* **GitOps-compatible** deployment workflows

<Warning>
  Always review release notes and test updates in a staging environment before applying to production.
</Warning>

## Data Management

### Migration Support

* **Export from Cloud**: We provide full data export from PromptLayer Cloud for migration to self-hosted
* **Import tools**: Automated scripts for importing existing data
* **Zero-downtime migration**: Support for gradual migration strategies

### Backup & Recovery

* Automated backup schedules for PostgreSQL and object storage
* Point-in-time recovery capabilities
* Disaster recovery playbooks
* Data retention policies configurable to your requirements

## Licensing & Support

### License Tiers

<CardGroup>
  <Card title="Basic Self-Hosted" icon="server">
    * Software license
    * Documentation access
    * Community support
    * Quarterly updates
  </Card>

  <Card title="Enterprise Support" icon="building">
    * Priority support SLA
    * Custom installation assistance
    * Dedicated success manager
    * Training and onboarding
  </Card>
</CardGroup>

### Professional Services

Our team offers additional services to ensure successful deployment:

* **Installation Support**: Expert assistance with initial setup
* **Architecture Review**: Optimization recommendations for your use case
* **Custom Integration**: Tailored solutions for unique requirements
* **Training**: Comprehensive onboarding for your team

## Getting Started

<Steps>
  <Step title="Contact Sales">
    [Reach out to our team](mailto:hello@promptlayer.com) to discuss your requirements and obtain a license.
  </Step>

  <Step title="Architecture Planning">
    Work with our solutions team to design your deployment architecture.
  </Step>

  <Step title="Deployment">
    Receive access to Docker images, Helm charts, and deployment guides. For AWS, follow the [enterprise AWS deployment guide](/enterprise-deployments/aws).
  </Step>

  <Step title="Configuration">
    Configure authentication, monitoring, and scaling parameters.
  </Step>

  <Step title="Migration">
    Import existing data from PromptLayer Cloud if applicable.
  </Step>

  <Step title="Go Live">
    Launch your self-hosted PromptLayer instance with ongoing support.
  </Step>
</Steps>

## Frequently Asked Questions

<AccordionGroup>
  <Accordion title="How long does deployment typically take?">
    Initial deployment can be completed in 1-2 days with our pre-built Helm charts. Custom installations may take 3+ days depending on requirements.
  </Accordion>

  <Accordion title="Can I use my existing PostgreSQL and Redis instances?">
    Yes, as long as they meet our version requirements (PostgreSQL 15+, Valkey/Redis 8.1.0+). We'll help validate compatibility during setup.
  </Accordion>

  <Accordion title="What happens to my data if I switch from cloud to self-hosted?">
    We provide complete data export tools and migration support. Your data remains intact and accessible throughout the transition.
  </Accordion>

  <Accordion title="How are updates handled?">
    We release versioned Docker images quarterly with security patches as needed. You control when and how updates are applied to your instance.
  </Accordion>

  <Accordion title="Is there a difference in features between cloud and self-hosted?">
    Self-hosted PromptLayer includes all core features of our cloud platform. Some cloud-specific integrations may require additional configuration.
  </Accordion>
</AccordionGroup>

## Contact Us

Ready to deploy PromptLayer in your infrastructure? Our enterprise team is here to help.

<Card title="Get Started" icon="rocket" href="mailto:hello@promptlayer.com">
  Contact our team to discuss your self-hosted deployment requirements and get a customized solution.
</Card>


# Tutorial Videos
Source: https://docs.promptlayer.com/tutorial-videos


These tutorial videos will walk you through key features and help you get up and running quickly.

Let's start with a quick tour of PromptLayer in action: prompt management, testing, deployment, workflows, and evaluations all in one place.

<iframe />

Read more about the [Bakery Demo on our blog](https://blog.promptlayer.com/promptlayer-bakery-demo/) to learn how we built Artificial Indulgence and how it showcases PromptLayer's features in action.

## Prompt Management

### Creating Your First Prompt

Learn how to create structured templates with variables that you can reuse across your applications.

<iframe />

### Deploying Prompts to Production

Learn how to safely deploy prompt versions to production and staging environments.

<iframe />

## Evaluation & Testing

### Building Your First Evaluation

Create evaluations that are use-case specific and help prevent prompt regression.

<iframe />

### Choosing the Best AI Model

Learn how to test your use cases across multiple models to find the best fit.

<iframe />

### Testing With Production Data

Build comprehensive test sets from your historical data for effective evaluation.

<iframe />

### Conversation Simulation Evals

Learn how to evaluate conversational AI systems using simulated user interactions.

<iframe />

## Workflows

### Building Multi-Step Workflows

Chain multiple LLM calls together to tackle complex problems.

<iframe />

## Logging and Optimization

### Monitoring LLM Requests

Track and analyze your LLM requests in real-time to understand usage patterns, monitor performance, and debug issues across your applications.

<iframe />

### A/B Testing for Prompts

Systematically compare different prompt versions to make data-driven decisions.

<iframe />

## Additional Resources

For more in-depth information, check out our comprehensive documentation:

* [Prompt Management Guide](https://docs.promptlayer.com/onboarding-guides/prompt-management)
* [Evaluation Guide](https://docs.promptlayer.com/onboarding-guides/evaluation)
* [Workflows Guide](https://docs.promptlayer.com/onboarding-guides/agentic-workflows)
* [Observability Guide](https://docs.promptlayer.com/onboarding-guides/observability)


# A/B Testing
Source: https://docs.promptlayer.com/why-promptlayer/ab-releases


A/B Releases let you test different versions of your prompts in production, safely roll out updates, and segment users.

For technical details and usage instructions, check out the [Dynamic Release Labels](/features/prompt-registry/dynamic-release-labels) page.

## Overview

A/B Releases work by dynamically overloading your release labels. You can split traffic between different prompt versions based on percentages or user segments. This lets you:

* Test new prompt versions with a subset of users before a full rollout
* Gradually release updates to minimize risk
* Segment users to receive specific versions (e.g., beta users, internal employees)

## Use Cases

### Testing Prompt Updates

Have a stable prompt version that's working well but want to test an update? Create an A/B Release!

You can direct a small percentage of traffic (e.g., 20%) to the new version. If there are no issues after a week, you can slowly increase the percentage. This minimizes the risk of rolling out an update to all users at once.

<img alt="Dynamic Release Labels Diagram" />

### Gradual Rollouts

Ready to roll out a new prompt version but want to minimize risk? Use A/B Releases to gradually ramp up traffic to the new version.

Start with a 5% rollout, then increase to 10%, 25%, 50%, and eventually 100% as you gain confidence in the new version. This staged approach ensures a smooth transition for your users.
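PromptLayer applies the traffic split for you through release labels. The sketch below only illustrates one common way a staged rollout can be made stable, and is not PromptLayer's implementation. It assumes a hypothetical `user_id` string and hashes it into a bucket, so a user who received the new version at 5% still receives it at 10%, 25%, and beyond.

```python
import hashlib


def bucket_for(user_id):
    # Map a user ID to a stable bucket in [0, 100). Hashing (rather than
    # random sampling) keeps the assignment identical across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_version(user_id, rollout_percent):
    # Users whose bucket falls below the rollout percentage get the new version.
    return "new" if bucket_for(user_id) < rollout_percent else "stable"


users = [f"user-{n}" for n in range(1000)]
for percent in (5, 10, 25, 50, 100):
    on_new = sum(choose_version(user, percent) == "new" for user in users)
    print(f"{percent}% rollout: {on_new} of {len(users)} users on the new version")
```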

### User Segmentation

Want to give certain users access to a dev version of your prompt? A/B Releases make this easy.

Define user segments based on metadata (e.g., user ID, company) and specify which prompt version each segment should receive. This lets you test new versions with beta users or give internal employees access to dev versions.

For example, you could create a segment for internal user IDs and configure their traffic split to be 50% dev version and 50% stable version. Alternatively, you could segment based on the user's subscription level, giving free users access to experimental prompt versions first before rolling them out to paying customers. This allows you to gather feedback and iterate on new features without affecting your premium user base.
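To make the segment example concrete, here is a minimal sketch of segment-based routing. The segment names, metadata keys (`user_id`, `plan`), and version labels are all hypothetical, and the logic is an illustration rather than how PromptLayer evaluates release labels internally.

```python
import random

# Hypothetical rules: each segment pairs a metadata predicate with a weighted split.
SEGMENTS = [
    {
        "name": "internal",
        "matches": lambda metadata: metadata.get("user_id") in {"emp_1", "emp_2"},
        "split": {"dev": 50, "stable": 50},
    },
    {
        "name": "free-tier",
        "matches": lambda metadata: metadata.get("plan") == "free",
        "split": {"experimental": 100},
    },
]
DEFAULT_SPLIT = {"stable": 100}


def pick_version(metadata, rng=random):
    # The first matching segment wins; everyone else gets the default split.
    split = DEFAULT_SPLIT
    for segment in SEGMENTS:
        if segment["matches"](metadata):
            split = segment["split"]
            break
    versions = list(split)
    weights = list(split.values())
    return rng.choices(versions, weights=weights, k=1)[0]


print(pick_version({"user_id": "emp_1"}))   # "dev" or "stable"
print(pick_version({"plan": "free"}))
print(pick_version({"plan": "pro"}))
```

Passing a seeded `random.Random` instance as `rng` makes the weighted choice reproducible in tests.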

***

A/B Releases give you the power to experiment, safely roll out updates, and deliver targeted experiences. Try it out and take control of your prompt releases! 🎉


# Advanced Search
Source: https://docs.promptlayer.com/why-promptlayer/advanced-search


PromptLayer's advanced search lets you find exactly what you want using tags, search queries, metadata, favorites, and score filtering.

## Using the Search Bar

To start your search, enter keywords into the search bar and click the "Search" button. Freeform search matches any text within PromptLayer.

<img alt="Advanced Search" />

## Advanced Search Filters

#### Metadata Search

Use the metadata filter to find requests by their metadata. You can search for user IDs, session IDs, tokens, error messages, status codes, and other metadata by entering the field name and value into the search bar.

PromptLayer lets you attach multiple key-value pairs as metadata to a request. In the dashboard, you can look up requests and view analytics by metadata. See [Metadata](/features/prompt-history/metadata) for how to add metadata to a request.

<CodeGroup>
  ```python Python theme={null}
  promptlayer_client.track.metadata(
    request_id=pl_request_id,
    metadata={
        "user_id":"1abf2345f",
        "session_id": "2cef2345f",
        "error_message": "None"
    }
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.metadata({
    request_id:pl_request_id,
    metadata:{
        "user_id":"1abf2345f",
        "session_id": "2cef2345f",
        "error_message": "None"
    }
  })
  ```
</CodeGroup>

To apply a metadata filter, click "Key" in the advanced search filter, select the metadata key (in this case, user\_id), select the relevant value under "Value", and click "Add filter".

<img />

#### Score Filtering

Use the score filtering feature to search for prompts based on their scores. You can filter prompts by selecting the score range in the "Score" dropdown.

Score filtering is a powerful tool for analyzing the performance of your prompts. You can use it to identify high-performing prompts, or to find prompts that may need improvement.

<img />

Below is an example of how you can score a request programmatically. It can also be done through the dashboard as shown [here](/features/prompt-history/scoring-requests).

<CodeGroup>
  ```python Python theme={null}
  promptlayer_client.track.score(
    request_id=pl_request_id, 
    score_name="summarization", # optional score name
    score=100
  )
  ```

  ```js JavaScript theme={null}
  promptLayerClient.track.score({
    request_id: pl_request_id,
    score: 100
  })
  ```
</CodeGroup>

#### Tags Search

Use the tags filter to find requests with specific tags.

Tags are used to group product features, prod/dev versions, and other categories. You can search for tags by selecting them in the "Tags" dropdown.

Tagging a request is easy. Read more about it [here](/features/prompt-history/tagging-requests).

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer
  pl_client = PromptLayer()

  response = pl_client.run(
    prompt_name="my-prompt",
    input_variables={"name": "world"},
    tags=["mytag1", "mytag2"]
  )
  ```

  ```js JavaScript theme={null}
  import { PromptLayer } from "promptlayer";
  const plClient = new PromptLayer();

  const response = await plClient.run({
    promptName: "my-prompt",
    inputVariables: { name: "world" },
    tags: ["mytag1", "mytag2"]
  });
  ```
</CodeGroup>

#### Favorites

Select the "favorite" tag to show only favorited requests. To favorite a request, click the star at the top right of the dashboard.

<img alt="Favorites" />


# Analytics
Source: https://docs.promptlayer.com/why-promptlayer/analytics


The Analytics page shows how your application performs and how it is used, so you can make data-driven decisions about prompts, models, and cost. This page describes the available analytics features.

<img alt="Analytics Page Screenshot" />

## Metrics

Here you can find key performance indicators to assess your application's performance and track its usage. Metrics include: average latency, total cost, and total requests. These metrics provide valuable information on response time, financial impact, and usage volume.
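As a rough illustration of what these three metrics summarize, the sketch below computes them from a list of request records. The record fields (`latency_s`, `cost_usd`) are hypothetical and chosen for the example; they are not a PromptLayer export format.

```python
# Hypothetical request records; field names are illustrative only.
requests_log = [
    {"latency_s": 1.2, "cost_usd": 0.004},
    {"latency_s": 0.8, "cost_usd": 0.002},
    {"latency_s": 1.0, "cost_usd": 0.003},
]


def summarize(records):
    total_requests = len(records)
    total_cost = sum(record["cost_usd"] for record in records)
    total_latency = sum(record["latency_s"] for record in records)
    # Guard against division by zero when there are no requests.
    average_latency = total_latency / total_requests if total_requests else 0.0
    return {
        "total_requests": total_requests,
        "total_cost_usd": total_cost,
        "average_latency_s": average_latency,
    }


summary = summarize(requests_log)
print(summary)
```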

## Analyzing Usage Patterns

Understanding usage patterns is crucial for optimizing your application and improving user experience. Analyzing usage patterns involves exploring prompt registry states, model distributions, tokens and requests over time, latency and cost analytics, and prompt template overall costs. These features provide insights into how prompts, models, and resources are utilized, helping you make informed decisions to enhance your application's performance.

## Filtering and Organization

To streamline your analysis, the analytics page offers filtering options based on metadata and tags.

### Filtering by Metadata

<img alt="Analytics Page Metadata Filter Screenshot" />

You can filter the analytics page using [metadata attributes](/features/prompt-history/metadata) such as user ID, location, version, and more. This allows you to narrow down the data and focus on specific segments for in-depth analysis.

### Tag Filtering

[Tag filtering](/features/prompt-history/tagging-requests) allows you to categorize and organize your requests based on specific tags you assign. It simplifies the process of analyzing specific groups of requests, making it easier to identify trends and patterns.


# Fine-Tuning
Source: https://docs.promptlayer.com/why-promptlayer/fine-tuning


Fine-tuning lets you specialize a model for your use case. PromptLayer lets you build and iterate on fine-tuned models in a few clicks.

If you are already logging your `gpt-4` requests in PromptLayer, it only takes a few clicks to fine-tune a `gpt-3.5-turbo` model on those requests! ✨

## What is fine-tuning?

Fine-tuning is a technique to specialize a pre-trained large language model (LLM) for a specific task. It involves training the LLM on a small dataset of examples, where the input is the text to be processed and the output is the desired output, such as a classification label, a translation, or a generated text.

Fine-tuning is powerful because it allows developers to create a model that is tailored to their specific needs. This could be used to improve model output quality, shorten a system prompt without degrading performance, or to decrease latency by building off of a smaller model.

Here are some examples of how fine-tuning can be used:

* **Reduce latency and cost:** Fine-tune `gpt-3.5-turbo` on `gpt-4` outputs to achieve `gpt-4`-quality results on a faster and cheaper model.
* **Save on tokens:** Generate training data using a long and complex prompt. When fine-tuning, change the prompt to something shorter and save on tokens.
* **Improve output format:** Generate synthetic training data to teach a base model to only output text in JSON.
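The "save on tokens" idea can be sketched as follows: keep each logged input and output, but swap the long system prompt for a short one when building training examples. PromptLayer assembles the dataset for you in the UI, so this is only to make the idea concrete. The logged records and prompts are hypothetical, and the `messages` layout follows the JSONL shape used for OpenAI chat fine-tuning.

```python
import json

# Hypothetical prompts: the long one generated the data, the short one is used for training.
LONG_PROMPT = "You are a meticulous support assistant. Always follow the full policy handbook."
SHORT_PROMPT = "You are a support assistant."

# Hypothetical logged requests produced with the long prompt.
logged_requests = [
    {"system": LONG_PROMPT, "user": "How do I reset my password?", "output": "Go to Settings > Security."},
    {"system": LONG_PROMPT, "user": "Where is my invoice?", "output": "Open Billing > Invoices."},
]


def to_training_example(request, system_prompt):
    # Keep the user input and recorded output, but use the shorter system prompt.
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": request["user"]},
            {"role": "assistant", "content": request["output"]},
        ]
    }


# One JSON object per line, as expected for a JSONL training file.
jsonl = "\n".join(
    json.dumps(to_training_example(request, SHORT_PROMPT)) for request in logged_requests
)
print(jsonl)
```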

## Create training data

The first step in fine-tuning is preparing the training data you want the model to learn from. In this case, the training data is simply your logged LLM requests.

### Log in the background

The simplest way to do this is to just connect your application to PromptLayer and start logging requests. Just wait a week and your production users will have created tons of training data for you!

### Batch run prompts

Alternatively, you can use PromptLayer to generate these training requests. Visit the [Evaluations page](/features/evaluations/overview) to run batch jobs of your prompts.

For example, to generate fine-tuning data you can run a prompt template from the [Prompt Registry](/features/prompt-registry) against 200 test cases on `gpt-4`. Then filter the sidebar by that test run's tag.

<img alt="Generate Training Data" />

## Select training data

Use the sidebar search area to filter for your training data. All the data that appears from that search query will be used to fine-tune.

[Learn more about search filters](/why-promptlayer/advanced-search)

<div>
  <img alt="Select Training Data" />
</div>

## Start the fine-tune job

Click "Fine-Tune" in the sidebar, follow the steps, and kick off a job.

## Test out your new model

**Success!** 🎉 Now you have a new fine-tuned model. Let's see if it's any good...

<div>
  <img alt="Successful fine-tuning" />
</div>

### Try it in Playground

Copy the model name and navigate to the PromptLayer Playground. There you can run an arbitrary request on the new model. See how it does!

<div>
  <img alt="Try out Fine-Tuned Model" />
</div>

### Try it in Evaluations

It's important to test your fine-tuned model more rigorously than with one-off Playground requests. Navigate to the [Evaluations page](/features/evaluations/overview) and run some batch tests. See how the fine-tuned candidate compares to a standard `gpt-4` candidate.

<img alt="Evaluate fine-tuned model" />


# Multi-Turn Chat
Source: https://docs.promptlayer.com/why-promptlayer/multi-turn-chat


Building reliable conversational AI systems requires careful management of state and conversation history. This guide explains how to implement multi-turn chat using PromptLayer's stateless approach, which enhances reliability and makes your workflows easier to test and debug.

## Why Stateless Turns?

Traditional conversational AI systems often maintain complex internal state, making them difficult to debug, test, and scale. The stateless approach treats each turn of the conversation as an independent, deterministic function that receives all necessary context through input variables.

### The Black Box Approach

The best way to build reliable conversational AI is to treat each turn as a black box. You provide inputs (conversation history, current query, available tools) and receive outputs (response, tool calls, next actions). This approach optimizes for rapid development and iteration - you're simply crafting prompts in natural language and validating outputs. By building your conversational system around this principle, you enable quick prompt iterations and fast feedback cycles, which are essential for developing robust multi-turn interactions.

The stateless approach particularly shines when it comes to systematic evaluation of conversation flows. For a deeper dive into evaluating multi-turn conversations, check out our blog post on [best practices for evaluating back-and-forth conversational AI](https://blog.promptlayer.com/best-practi-to-evaluate-back-and-forth-conversational-ai-in-promptlayer/).
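The black-box turn described above can be sketched as a plain function: everything the model needs comes in as arguments, and the updated history is returned rather than stored. In this sketch, `run_turn` and `stub_respond` are hypothetical names; `stub_respond` stands in for a `promptlayer_client.run(...)` call so the turn can be tested in isolation.

```python
def run_turn(chat_history, user_question, respond):
    # One stateless turn: no hidden state, only inputs and outputs.
    reply = respond(chat_history, user_question)
    new_history = chat_history + [
        {"role": "user", "content": [{"type": "text", "text": user_question}]},
        {"role": "assistant", "content": [{"type": "text", "text": reply}]},
    ]
    return reply, new_history


def stub_respond(chat_history, user_question):
    # Hypothetical stand-in for the model call; numbers the turn from the history length.
    turn_number = len(chat_history) // 2 + 1
    return f"({turn_number}) You asked: {user_question}"


history = []
reply_1, history = run_turn(history, "What is PromptLayer?", stub_respond)
reply_2, history = run_turn(history, "How do I log a request?", stub_respond)
print(reply_1)
print(reply_2)
```

Because `run_turn` never mutates its inputs, you can replay any turn from a saved history and compare outputs, which is what makes this style convenient for evaluations.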

## Implementation Pattern

Here's the core pattern for implementing stateless multi-turn chat using [`promptlayer_client.run()`](/sdks/python#using-the-run-method-recommended):

### Basic Conversation (No Tools)

Maintain a running history of the conversation, adding each exchange as you go.

* Start with empty conversation history
* Loop:
  * Send user question + history to PromptLayer
  * Get AI response
  * Add both to history
  * Get next user question
  * If no more questions, exit loop

<Accordion title="View Flow Diagram">
  ```mermaid theme={null}
  flowchart TD
      Start([Start with empty history]) --> Input[Get user question]
      Input --> Send[Send question + history to PromptLayer]
      Send --> Response[Get AI response]
      Response --> AddHistory[Add user question & AI response to history]
      AddHistory --> NextInput{Get next user question}
      NextInput -->|Has question| Send
      NextInput -->|No question| End([End conversation])
  ```
</Accordion>

<Accordion title="View Python Code">
  ```python theme={null}
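  from promptlayer import PromptLayer

  promptlayer_client = PromptLayer()  # reads PROMPTLAYER_API_KEY from the environment

  def run_conversation(user_question):
      # get_next_user_input() is an application-specific helper you supply; it returns
      # the next user message, or a falsy value to end the conversation.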
      chat_history = []

      while True:
          # For basic prompts without tools
          result = promptlayer_client.run(
              prompt_name="multi-turn-assistant",
              input_variables={
                  "user_question": user_question,
                  "chat_history": chat_history
              },
              tags=["multi-turn-chat"]
          )

          # Extract the assistant's last message
          last_message = result["prompt_blueprint"]["prompt_template"]["messages"][-1]

          # Add to conversation history
          chat_history.append({
              "role": "user",
              "content": [{"type": "text", "text": user_question}]
          })
          chat_history.append(last_message)

          # Get next user input
          user_question = get_next_user_input()
          if not user_question:
              break

      # The loop always runs at least once, so last_message is defined here
      return last_message
  ```
</Accordion>

### Conversation with Tools

The AI can make multiple tool calls before responding, accumulating results in a separate message buffer. For a deeper understanding of when and how to use tool calling, check out our blog post on [tool calling with LLMs](https://blog.promptlayer.com/tool-calling-with-llms-how-and-when-to-use-it/).

* Start with empty history and empty tool messages
* Loop:
  * Send user question + history + tool messages to PromptLayer
  * Get AI response
  * If AI wants to use tools:
    * Add AI message to tool messages
    * Execute each tool
    * Add tool results to tool messages
    * Loop back (AI might need more tools)
  * Else (final response):
    * Add everything to history
    * Clear tool messages for next turn
    * Get next user question

<Accordion title="View Flow Diagram">
  ```mermaid theme={null}
  flowchart TD
      Start([Start with empty history & tool messages]) --> Input[Get user question]
      Input --> Send[Send question + history + tool messages to PromptLayer]
      Send --> Response[Get AI response]
      Response --> CheckTools{AI wants to use tools?}

      CheckTools -->|Yes| AddToolCall[Add AI message to tool messages]
      AddToolCall --> ExecuteTool[Execute each tool]
      ExecuteTool --> AddToolResult[Add tool results to tool messages]
      AddToolResult --> Send

      CheckTools -->|No - Final response| AddToHistory[Add everything to history]
      AddToHistory --> ClearTools[Clear tool messages]
      ClearTools --> NextInput{Get next user question}
      NextInput -->|Has question| Send
      NextInput -->|No question| End([End conversation])
  ```
</Accordion>

<Accordion title="View Python Code">
  ```python theme={null}
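  from promptlayer import PromptLayer

  promptlayer_client = PromptLayer()  # reads PROMPTLAYER_API_KEY from the environment

  def run_conversation_with_tools(user_question):
      # execute_tool() and get_next_user_input() are application-specific helpers you supply.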
      chat_history = []
      ai_in_progress = []  # Messages from AI's tool interactions

      while True:
          # With tools, include ai_in_progress for multi-step operations
          result = promptlayer_client.run(
              prompt_name="multi-turn-assistant-with-tools",
              input_variables={
                  "chat_history": chat_history,
                  "user_question": user_question,
                  "ai_in_progress": ai_in_progress  # [AI call, tool response, AI call, tool response, ...]
              },
              tags=["multi-turn-chat"]
          )

          last_message = result["prompt_blueprint"]["prompt_template"]["messages"][-1]

          # Check if conversation should end
          function_call = last_message.get("function_call")
          if function_call and function_call.get("name") == "end_conversation":
              return last_message

          # Handle tool calls - AI might need multiple tool calls before responding
          if last_message.get("tool_calls") or last_message.get("function_call"):
              # Add AI's message with tool call to ai_in_progress
              ai_in_progress.append(last_message)

              # Execute tool and add response
              if last_message.get("tool_calls"):
                  for tool_call in last_message["tool_calls"]:
                      tool_result = execute_tool(tool_call)
                      ai_in_progress.append({
                          "role": "tool",
                          "content": [{"type": "text", "text": str(tool_result)}],
                          "tool_call_id": tool_call["id"]
                      })
              elif last_message.get("function_call"):
                  tool_result = execute_tool(last_message["function_call"])
                  ai_in_progress.append({
                      "role": "function",
                      "name": last_message["function_call"]["name"],
                      "content": str(tool_result)
                  })
              # Loop continues - AI can make another tool call or respond to user
          else:
              # AI provided final response - add everything to history
              chat_history.append({
                  "role": "user",
                  "content": [{"type": "text", "text": user_question}]
              })

              # Add any tool interactions from ai_in_progress to history
              if ai_in_progress:
                  chat_history.extend(ai_in_progress)

              # Add final response
              chat_history.append(last_message)

              # Clear ai_in_progress for next user turn
              ai_in_progress = []

              # Get next user input
              user_question = get_next_user_input()
              if not user_question:
                  break

      return last_message
  ```
</Accordion>

<Note>
  You can also implement this pattern using workflows for more complex pipelines with multiple nodes and conditional logic. See [Running Workflows](/sdks/python#running-workflows) for details on using `promptlayer_client.run_workflow()`.
</Note>

## Designing Your Stateless Prompt

Design your prompt template to receive all necessary state through input variables. Here's an example of a multi-turn assistant with tools, configured this way:

<img alt="Multi-turn assistant with tools prompt configuration" />

Notice how the prompt template includes:

* A system message with instructions and tool usage behavior
* A placeholder for `{{chat_history}}` to inject conversation context
* A user message with `{{user_question}}`
* A placeholder for `{{ai_in_progress}}` to handle tool interactions

### Required Input Variables

1. **chat\_history**: Array of previous messages in the conversation
2. **user\_question**: The current user message or query
3. **ai\_in\_progress**: Array of messages representing ongoing tool interactions (only used with tools, placed AFTER user\_question)

### Understanding ai\_in\_progress

The `ai_in_progress` variable is specifically for handling multi-step tool interactions where the AI needs to make multiple tool calls before responding to the user. It's placed AFTER the user\_question because it represents the AI's response to that question. It contains a sequence of messages like:

* AI's tool call (in response to user\_question)
* Tool's response
* AI's next tool call
* Tool's response
* Final AI message to user (this ends the sequence; it is stored in chat\_history rather than in ai\_in\_progress)

This allows the AI to perform complex operations (like searching, then filtering, then formatting results) without requiring user input between each step. Once the AI provides a final response, the tool messages and the final response are moved to chat\_history for the next turn, and ai\_in\_progress is cleared. If your prompt doesn't use tools, you can omit this variable entirely.
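
As a minimal sketch of that hand-off, the helper below folds a completed turn into `chat_history`. The name `fold_turn_into_history` is hypothetical, and the assistant tool-call message is an illustrative shape only; the user and tool message shapes mirror the conversation loop earlier on this page.

```python
def fold_turn_into_history(chat_history, user_question, ai_in_progress, final_message):
    """Return a new history with the finished turn appended (hypothetical helper)."""
    updated = list(chat_history)
    # The user message comes first, because ai_in_progress is the AI's reaction to it.
    updated.append({
        "role": "user",
        "content": [{"type": "text", "text": user_question}],
    })
    # Tool calls and tool responses follow in the order they happened.
    updated.extend(ai_in_progress)
    # The final assistant message closes the turn.
    updated.append(final_message)
    return updated


history = fold_turn_into_history(
    chat_history=[],
    user_question="What's the weather in Paris?",
    ai_in_progress=[
        # Illustrative shapes; real messages come from the prompt run and your tools.
        {"role": "assistant", "content": [], "tool_calls": [{"id": "call_1"}]},
        {"role": "tool", "content": [{"type": "text", "text": "18C, sunny"}], "tool_call_id": "call_1"},
    ],
    final_message={"role": "assistant", "content": [{"type": "text", "text": "It's 18C and sunny."}]},
)
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant']
```

After this call, `ai_in_progress` can be reset to an empty list for the next user turn.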

### Using Message Placeholders

[Message Placeholders](/features/prompt-registry/placeholder-messages) let you dynamically insert the conversation history into your prompt template. For more details on template variables and dynamic prompts, see our [Template Variables](/features/prompt-registry/template-variables) guide. The following example passes conversation history through a placeholder:

```python theme={null}
# In your prompt template, use placeholders:
# Role: placeholder
# Content: {{chat_history}}
#
# Role: user
# Content: {{user_question}}

# For basic conversations without tools:
promptlayer_client.run(
    prompt_name="multi-turn-assistant",
    input_variables={
        "chat_history": [
            {
                "role": "user",
                "content": [{"type": "text", "text": "What's the weather?"}]
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": "I'll check the weather for you."}]
            }
        ],
        "user_question": "How about tomorrow?"
    }
)

# For conversations with tools, add ai_in_progress after user_question:
# Role: placeholder
# Content: {{chat_history}}
#
# Role: user
# Content: {{user_question}}
#
# Role: placeholder
# Content: {{ai_in_progress}}
```

## Handling Tool Calls

For workflows that use tools, maintain tool state externally. See our [Tool Calling](/features/prompt-registry/tool-calling) documentation for setting up tool definitions in your prompts:

```python theme={null}
def handle_tool_execution(agent_response, tool_registry):
    """Execute tools and prepare results for next turn"""

    if not agent_response.get("tool_calls"):
        return None

    tool_results = []
    for call in agent_response["tool_calls"]:
        tool = tool_registry.get(call["name"])
        if tool:
            result = tool.execute(call["arguments"])
            tool_results.append({
                "tool": call["name"],
                "call_id": call["id"],
                "result": result
            })

    return tool_results
```
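
The results returned by `handle_tool_execution` still need to be turned into messages before the next run. A minimal sketch, assuming the `role: tool` message shape used in the conversation loop earlier on this page; `tool_results_to_messages` is a hypothetical helper name.

```python
def tool_results_to_messages(tool_results):
    """Convert handle_tool_execution-style results into tool messages (hypothetical helper)."""
    messages = []
    # handle_tool_execution returns None when there are no tool calls.
    for item in tool_results or []:
        messages.append({
            "role": "tool",
            "content": [{"type": "text", "text": str(item["result"])}],
            "tool_call_id": item["call_id"],
        })
    return messages


example_results = [{"tool": "get_weather", "call_id": "call_1", "result": {"temp_c": 18}}]
tool_messages = tool_results_to_messages(example_results)
print(tool_messages[0]["tool_call_id"])  # call_1
```

The resulting messages can be appended to `ai_in_progress` so the next run sees the tool output.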

## Integration with Evaluation

The stateless approach makes it easy to evaluate your conversational AI:

1. **Record real conversations** as sequences of inputs and outputs
2. **Replay conversations** with modified parameters to test variations
3. **Evaluate individual turns** for quality and correctness
4. **Test edge cases** by crafting specific conversation states
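
Replaying a recorded conversation can be sketched as a loop that feeds each recorded user turn back through the prompt while rebuilding `chat_history`. Here `run_turn` stands in for whatever calls your prompt (for example, a wrapper around `promptlayer_client.run`). Both `run_turn` and `replay_conversation` are hypothetical names, and the stub only echoes its input so the example runs offline.

```python
def replay_conversation(recorded_user_turns, run_turn):
    """Replay recorded user messages and collect the new transcript (illustrative sketch)."""
    chat_history = []
    for user_question in recorded_user_turns:
        # run_turn receives the full state, so every turn can be evaluated in isolation.
        reply_text = run_turn(chat_history=chat_history, user_question=user_question)
        chat_history.append({"role": "user", "content": [{"type": "text", "text": user_question}]})
        chat_history.append({"role": "assistant", "content": [{"type": "text", "text": reply_text}]})
    return chat_history


def stub_run_turn(chat_history, user_question):
    # Stand-in for a real prompt call; reports how much history it was given.
    return f"(seen {len(chat_history)} prior messages) You said: {user_question}"


transcript = replay_conversation(["Hi", "What's the weather?"], stub_run_turn)
print(len(transcript))  # 4
```

Swapping `stub_run_turn` for a function that calls a different prompt version lets you compare transcripts for the same recorded inputs.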

Learn more about setting up comprehensive evaluations in our [Evaluation and Ranking](/features/evaluations/overview) guide.

## Next Steps

* Explore [Message Placeholders](/features/prompt-registry/placeholder-messages) for dynamic prompt construction
* Set up [Evaluations](/features/evaluations/overview) for your conversational flows
* Learn about [Workflow development](/why-promptlayer/workflows) for complex pipelines
* Read our guide on [Tool Calling](/features/prompt-registry/tool-calling) for implementing tool-enabled assistants
* Check out [Structured Outputs](/features/prompt-registry/structured-outputs) for formatted responses


# Organizations
Source: https://docs.promptlayer.com/why-promptlayer/organizations

Manage company-level members, roles, billing, settings, and usage across PromptLayer workspaces.

## Organizations

An organization represents your company or top-level entity in PromptLayer. It provides:

* Centralized billing and subscription management
* User management across all workspaces
* Organization-wide settings and configurations
* Usage tracking and analytics

### Organization Roles

Organizations support different user roles:

* **Owner**: Full administrative access, billing management, and ability to manage all workspaces
* **Admin**: Can manage users, create workspaces, and access all workspace data
* **Member**: Default role with access to assigned workspaces

For more granular workspace-level permission control, [RBAC (Role-Based Access Control)](/why-promptlayer/rbac) allows you to define custom roles and fine-grained permissions at the workspace level.

### Managing Your Organization

Organization settings allow you to:

* Manage organization members and their roles
* View billing and subscription details
* Configure organization-wide settings
* Monitor usage across all workspaces

## Billing and Limits

* Billing is managed at the organization level
* Workspace limits depend on your subscription plan
* Usage is tracked across all workspaces in your organization


# Playground
Source: https://docs.promptlayer.com/why-promptlayer/playground


The Playground lets you create and run LLM requests directly in PromptLayer. Your run history is tracked in the sidebar. The Playground is most useful as a tool to "replay" and debug old requests.

## Replay requests

The Playground lets you rerun previous LLM requests. Click "Open in Playground" on any historical request.

<div>
  <img alt="Open in Playground screenshot" />
</div>

## OpenAI Tools

The Playground fully supports [OpenAI function calling](https://platform.openai.com/docs/guides/function-calling). These tools can be accessed directly from the Playground interface and can be incorporated into your requests as needed. *Not even OpenAI's playground does this 👀*

<div>
  <img alt="Tools dialogue" />
</div>

## Image Generation

The Playground supports image generation across multiple providers:

* **OpenAI Images API** — Select "Images API" in the API dropdown to use models like `gpt-image-1`, `dall-e-3`, and `dall-e-2`. Write a text prompt and receive generated images directly in the output.
* **OpenAI Responses API** — Add the `image_generation` built-in tool to let the model generate images during a conversation.
* **Google Gemini** — Select a Gemini image model (e.g., `gemini-2.5-flash-image`) to generate images natively within chat responses.

Generated images are displayed with a rich card showing generation parameters, the revised prompt (when available), and the image itself. See the full [Image Generation guide](/features/image-generation) for details.

<img alt="Playground with generated image output" />

## Custom models

The Playground also supports the use of custom models for LLM requests. This means you can use a fine-tuned model or a dedicated OpenAI instance.


# RBAC
Source: https://docs.promptlayer.com/why-promptlayer/rbac

Define reusable roles and assign fine-grained permissions to workspace members.

**Role-based Access Control (RBAC)** provides fine-grained permission management for your workspaces. With RBAC enabled, you can define roles at the organization level and apply them to workspace members, giving you precise control over who can do what in each workspace.

## How RBAC Works

RBAC operates on a simple but powerful principle: **roles are defined at the organization level, but applied at the workspace level**. This means you can create a set of roles once for your organization and reuse them across all your workspaces, while still maintaining workspace-specific access control.

### Role Definition (Organization Level)

Roles belong to your organization and can be used across all workspaces. You have two options:

* **Default Roles**: System-provided roles available to all organizations (`Contributor`, `Publisher`, `Developer`, `Admin`)
* **Custom Roles**: Organization-specific roles you create to match your team's needs

Each role consists of a name and a set of permissions that define what actions that role allows.

### Role Application (Workspace Level)

When you assign roles to workspace members, you're granting them specific permissions within that workspace. A user can have different roles in different workspaces, and can even have multiple roles in the same workspace. When a user has multiple roles, their effective permissions are the combination of all permissions from their assigned roles.

For example, if Alice has both `Contributor` and `Publisher` roles in Workspace A, she can edit prompts and deploy them to production. But if she only has the `Contributor` role in Workspace B, she can edit prompts there but cannot deploy them.
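
The union rule can be sketched in a few lines of Python. The role-to-permission mapping below is illustrative only: it uses permission names from this page, but the exact permissions bundled in each default role are an assumption.

```python
# Illustrative mapping; the real default roles may bundle different permissions.
ROLE_PERMISSIONS = {
    "Contributor": {"PROMPT_CREATE", "PROMPT_EDIT", "PROMPT_DELETE", "METADATA_EDIT"},
    "Publisher": {"PROMPT_DEPLOY", "WORKFLOW_DEPLOY"},
    "Developer": {"MANAGE_API_KEYS"},
}


def effective_permissions(assigned_roles):
    """Union of the permissions granted by every assigned role."""
    permissions = set()
    for role in assigned_roles:
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


# Alice's roles per workspace, as in the example above.
alice = {"Workspace A": ["Contributor", "Publisher"], "Workspace B": ["Contributor"]}

can_deploy = {ws: "PROMPT_DEPLOY" in effective_permissions(roles) for ws, roles in alice.items()}
print(can_deploy)  # {'Workspace A': True, 'Workspace B': False}
```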

## Default Roles

<img alt="RBAC Roles" />

PromptLayer provides four default roles that cover most common use cases:

### Contributor

**What they can do**: Create and edit content

* **Prompt templates and versions**: Create, update, rename, delete, move, and duplicate prompt templates and versions
* **Workflows**: Create, update, rename, delete, move, and duplicate workflows and workflow versions
* **Datasets**: Create, update, rename, delete, move, and duplicate datasets and dataset groups
* **Reports**: Create, update, rename, delete, move, and duplicate reports and evaluations
* **Metadata**: Edit metadata associated with requests and entities

**Best for**: Team members who need to create and iterate on prompts, workflows, and evaluations, but don't need to deploy changes to production.

### Publisher

**What they can do**: Deploy changes to production

* Create and manage prompt labels
* Deploy prompt changes through changelogs
* Create and manage workflow labels
* Move labels between versions

**Best for**: Team members who need to deploy changes to production. Typically combined with `Contributor` for users who need both editing and deployment capabilities.

### Developer

**What they can do**: Manage API access

* Create, view, and delete API keys

**Best for**: Developers who need to manage API keys for programmatic access to your prompts and workflows.

### Admin

**What they can do**: Everything

* All permissions from other roles
* Manage workspace member roles and permissions
* Approve protected label changes
* Full administrative access

**Best for**: Workspace administrators who need complete control over the workspace.

<Warning>
  Users with the `Admin` role can perform destructive workspace-wide actions,
  including inviting and removing other members from the workspace, even other
  admins. Grant this role only to trusted team members who need full
  administrative control.
</Warning>

## Custom Roles

Beyond the default roles, you can create custom roles tailored to your organization's specific needs. Custom roles are defined at the organization level and can be reused across all workspaces.

Only organization owners can create custom roles. When creating a custom role, you select which permissions to include, allowing you to create roles that match your team's workflow exactly. For example, you might create a "QA Tester" role that can only edit reports and datasets, or a "Deployment Manager" role that combines `Publisher` and `Developer` permissions.

### Creating a Custom Role

<Note>
  RBAC must be enabled for your organization before you can create or view
  custom roles. Only organization owners can create custom roles.
</Note>

1. Go to your organization settings
2. Select your organization to open its details
3. Open the **Workspace Roles** tab
4. Click **+ Create Role** in the top right
5. Enter a **Role Name** that describes the role's purpose (e.g. "QA Tester")
6. Select the **permissions** to include, grouped by resource (Prompts, Workflows, Datasets, Evaluations, Workspace)
7. Click **Create Role** to save

<img alt="Create Custom Role" />

Once created, the role appears alongside the default roles in the **Workspace Roles** tab and can be assigned to any workspace member in your organization. You can update or delete custom roles from the same tab using the row actions menu.

<Warning>
  Granting the `ADMIN` permission gives a role full administrative control,
  including the ability to manage members and other roles. Only include it when
  you intend to create an admin-equivalent role.
</Warning>

## Permissions

RBAC uses fine-grained permissions that control specific actions:

* **`PROMPT_CREATE`**: Create and duplicate prompt templates
* **`PROMPT_EDIT`**: Edit prompt templates, create versions, modify metadata
* **`PROMPT_DELETE`**: Delete prompt templates
* **`PROMPT_DEPLOY`**: Create labels, deploy changes, move labels between versions
* **`WORKFLOW_CREATE`**: Create and duplicate workflows
* **`WORKFLOW_EDIT`**: Edit workflows, create versions, modify structure
* **`WORKFLOW_DELETE`**: Delete workflows
* **`WORKFLOW_DEPLOY`**: Create workflow labels, deploy workflow changes
* **`DATASET_CREATE`**: Create and duplicate datasets and dataset groups
* **`DATASET_EDIT`**: Edit datasets and dataset groups
* **`DATASET_DELETE`**: Delete datasets and dataset groups
* **`REPORT_CREATE`**: Create and duplicate reports and evaluations
* **`REPORT_EDIT`**: Edit and run reports and evaluations
* **`REPORT_DELETE`**: Delete reports and evaluations
* **`METADATA_EDIT`**: Edit metadata associated with requests
* **`MANAGE_API_KEYS`**: Create and delete API keys
* **`ADMIN`**: Full administrative access, including managing member roles

## Enabling RBAC

RBAC is enabled per organization. When RBAC is disabled, all workspace members receive default permissions (all permissions except `ADMIN`) automatically. When RBAC is enabled, workspace members have no permissions by default and must be explicitly assigned roles to gain access.

This secure-by-default approach ensures that when RBAC is enabled, users only get the permissions they need, following the principle of least privilege.
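
The enabled and disabled behavior can be summarized as a small decision function. This is a sketch of the rule described above, not PromptLayer's implementation; `ALL_PERMISSIONS` is abbreviated and `workspace_permissions` is a hypothetical name.

```python
# Abbreviated for the example; see the Permissions list for the full set.
ALL_PERMISSIONS = {"PROMPT_CREATE", "PROMPT_EDIT", "PROMPT_DEPLOY", "MANAGE_API_KEYS", "ADMIN"}


def workspace_permissions(rbac_enabled, role_permission_sets):
    """Permissions a workspace member ends up with under the rule described above."""
    if not rbac_enabled:
        # RBAC off: members receive the default set (everything except ADMIN).
        return ALL_PERMISSIONS - {"ADMIN"}
    # RBAC on: nothing by default, only what assigned roles grant.
    granted = set()
    for permission_set in role_permission_sets:
        granted |= permission_set
    return granted


print(sorted(workspace_permissions(True, [])))  # []
print("ADMIN" in workspace_permissions(False, []))  # False
```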

## Managing Roles

Users with the `ADMIN` permission in a workspace can assign roles to members of that workspace. Organization owners can assign roles to members in any workspace within their organization.

To manage a member's roles:

1. Go to your organization settings
2. Select your organization to open its details
3. On the **Members** tab, click the member you want to manage
4. In the **User Details** panel, open the **User's Workspaces** tab
5. On the workspace row you want to update, open the row actions menu
6. Choose **Manage Roles**

<img alt="Manage Roles" />

When assigning roles, remember that:

* Roles are assigned to workspace members (not directly to users)
* Effective permissions are the union of all permissions from assigned roles
* Role assignments are workspace-specific

## Best Practices

* **Start with default roles**: The default roles cover most common scenarios. Use them before creating custom roles.
* **Follow least privilege**: Only grant the minimum permissions needed for each team member to do their job.
* **Combine roles strategically**: Assign multiple roles when users need permissions from different roles (e.g., `Contributor` + `Publisher` for someone who edits and deploys).
* **Review regularly**: Periodically review role assignments to ensure they still match your team's needs as roles and responsibilities evolve.
* **Use custom roles thoughtfully**: Create custom roles when you have a recurring pattern that doesn't fit the default roles, not for one-off cases.


# Overview
Source: https://docs.promptlayer.com/why-promptlayer/shared-workspaces

Use organizations and workspaces to manage teams, environments, access, billing, and shared resources in PromptLayer.

Organizations and workspaces help your team separate projects, environments, and access in PromptLayer. Use them to manage members, organize resources, control permissions, and track usage across your company.

An **organization** is the top-level account for your company. It manages billing, members, settings, and usage across every workspace.

A **workspace** is a shared environment inside an organization. Teams use workspaces to collaborate on prompts, evaluations, datasets, logs, and other PromptLayer resources.

## What you can do

* Manage members, organization roles, billing, settings, and usage in one place.
* Create workspaces for teams, projects, or environments such as production, staging, and development.
* Share prompts, evaluations, datasets, and request history with the right teammates.
* Control workspace access with default roles, custom roles, and fine-grained RBAC.

## How it fits together

1. Create or join an organization for your company.
2. Create workspaces for the teams, projects, or environments you want to separate.
3. Invite members and assign organization roles or workspace roles.
4. Use RBAC when you need precise permissions for editing, publishing, API keys, datasets, evaluations, and admin actions.
5. Review billing and usage at the organization level.

## Next steps

<CardGroup>
  <Card title="Organizations" icon="users-gear" href="/why-promptlayer/organizations">
    Manage company-level members, roles, billing, settings, and usage.
  </Card>

  <Card title="Workspaces" icon="handshake-simple" href="/why-promptlayer/workspaces">
    Create shared environments for teams, projects, and deployment stages.
  </Card>

  <Card title="RBAC" icon="shield" href="/why-promptlayer/rbac">
    Define reusable roles and assign fine-grained permissions by workspace.
  </Card>
</CardGroup>


# Voice Agents
Source: https://docs.promptlayer.com/why-promptlayer/voice-agents


Voice agents represent a powerful evolution in AI-powered customer interactions, combining speech-to-text (STT), language understanding, and text-to-speech (TTS) to create natural conversational experiences. PromptLayer provides comprehensive tools to help you build, observe, and continuously evaluate voice agents—from prompt management and multi-step workflows to rigorous testing and cost tracking.

## How PromptLayer Helps with Voice Agents

Building a production-ready voice agent (like an after-hours appointment assistant or customer support line) requires careful orchestration of multiple AI components. PromptLayer serves as your central platform for:

* **[Prompt Engineering & Version Control](/features/prompt-registry/new-overview)**: Iterate rapidly on conversation prompts without code deployments
* **[Multi-Step Workflow Design](/why-promptlayer/workflows)**: Build complex voice agent logic with visual drag-and-drop interfaces
* **Comprehensive [Observability](/why-promptlayer/analytics)**: Track every interaction with full context of what was said and how the agent responded
* **[Rigorous Evaluation](/features/evaluations/overview)**: Test conversation flows, measure quality, and catch issues before they reach customers
* **Cost Optimization**: Monitor token usage and latency across all voice interactions

Whether you're using [ElevenLabs](https://elevenlabs.io) for text-to-speech, [VAPI](https://vapi.ai) for telephony integration, [Hume AI](https://hume.ai) for emotion analysis, or OpenAI's Realtime API, PromptLayer helps you manage the conversational intelligence at the heart of your voice agent.

## Prompt Management for Voice Conversations

The quality of your voice agent starts with well-crafted prompts. PromptLayer's [Prompt Registry](/features/prompt-registry/new-overview) acts as a content management system for all conversation logic, enabling your team to iterate without engineering involvement.

### Versioned Conversation Templates

Design your voice agent's system prompts, conversation flow, and response templates visually in the dashboard. Each change creates a new version with full history, making it easy to:

* Track who changed what and when
* Compare prompt versions side-by-side with diff views
* Roll back to previous versions if needed
* Test new conversation approaches without affecting production

<CodeGroup>
  ```python Python theme={null}
  from promptlayer import PromptLayer

  pl = PromptLayer(api_key="your_api_key")

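  # Text produced by your speech-to-text (STT) service (placeholder value)
  transcribed_text = "What time do you open tomorrow?"

  # Run with conversation context using the production version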
  result = pl.run(
      prompt_name="customer-service-assistant",
      prompt_release_label="production",
      input_variables={
          "customer_query": transcribed_text,
          "business_hours": "Monday-Friday 8 AM - 6 PM",
          "current_time": "7:30 PM"
      },
      tags=["voice-agent", "after-hours"]
  )
  ```

  ```javascript JavaScript theme={null}
  import { PromptLayer } from "promptlayer";

  const pl = new PromptLayer({ apiKey: "your_api_key" });

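  // Text produced by your speech-to-text (STT) service (placeholder value)
  const transcribedText = "What time do you open tomorrow?";

  // Run with conversation context using the production version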
  const result = await pl.run({
    promptName: "customer-service-assistant",
    promptReleaseLabel: "production",
    inputVariables: {
      customer_query: transcribedText,
      business_hours: "Monday-Friday 8 AM - 6 PM",
      current_time: "7:30 PM"
    },
    tags: ["voice-agent", "after-hours"]
  });
  ```
</CodeGroup>

### A/B Testing Conversation Strategies

Use [Dynamic Release Labels](/features/prompt-registry/dynamic-release-labels) to test different conversation approaches in production. For example, test two different greeting styles:

* **Version A**: Warm and conversational ("Hi there! Thanks for calling...")
* **Version B**: Professional and concise ("Thank you for calling. How can I help?")

Route 50% of calls to each version and use PromptLayer's analytics to determine which yields better customer satisfaction scores or appointment booking rates.

## Building Multi-Step Voice Workflows

Voice agents often require complex logic: transcribe speech → understand intent → fetch information → generate response → synthesize speech. PromptLayer's [Workflows](/why-promptlayer/workflows) feature lets you design these workflows visually.

### Agent Workflow Example

Here's how you might structure a voice agent workflow in PromptLayer:

1. **Input Node**: Receives transcribed customer query from your STT service
2. **Prompt Template Node**: Processes the query with your conversation prompt
3. **Conditional Logic**: Branches based on customer intent
   * If asking about hours → Provide recorded answer
   * If upset (detected via sentiment) → Route to empathetic response path
   * If requesting appointment → Proceed to booking flow
4. **Callback Endpoint Node**: Calls external APIs (e.g., ElevenLabs for TTS, your scheduling system)
5. **Output Node**: Returns final response to speak to the customer
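
The conditional step can be pictured as ordinary branching logic. The sketch below is plain Python rather than a PromptLayer workflow definition; the intent labels, sentiment labels, and path names are made up for illustration, and the branches are checked in the order listed above.

```python
def choose_path(intent, sentiment):
    """Pick the next workflow branch from classified intent and sentiment (illustrative)."""
    if intent == "hours":
        return "recorded_hours_answer"
    if sentiment == "upset":
        return "empathetic_response"
    if intent == "appointment":
        return "booking_flow"
    return "default_response"


print(choose_path("appointment", "neutral"))  # booking_flow
print(choose_path("other", "upset"))  # empathetic_response
```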

### Integrating Voice APIs with PromptLayer Agents

PromptLayer Agents let you orchestrate your entire voice workflow visually. Within your agent, use **Callback Endpoint Nodes** to integrate external voice services like ElevenLabs for text-to-speech, OpenAI's Realtime API for voice-enabled responses, or your own telephony platform.

These callback nodes can:

* Convert your agent's text responses to speech (TTS)
* Call your scheduling system to check appointment availability
* Trigger webhooks to your voice platform (VAPI, Twilio, etc.)
* Return results that feed into subsequent nodes in your workflow

All of these integrations are logged and traced by PromptLayer, giving you full visibility into your voice agent's execution flow.

## Evaluating Voice Agent Quality

Rigorous evaluation is critical for voice agents where mistakes directly impact customer experience. PromptLayer's [Evaluations](/features/evaluations/overview) framework provides multiple approaches to test and improve conversation quality.

### 1. Conversation Simulator (Text Content)

The [Conversation Simulator](/features/evaluations/eval-types#conversation-simulator) tests the **conversational content and logic** of your voice agent—not the audio quality itself. Define realistic customer personas and let PromptLayer simulate entire text-based conversations:

```python theme={null}
# Define a test persona
difficult_customer_persona = """
You are a frustrated customer calling after hours about a missed appointment.
You are upset and won't provide your phone number until the assistant apologizes.
You speak in short, terse sentences.
"""

# The simulator will automatically:
# 1. Generate user messages based on the persona
# 2. Get text responses from your voice agent prompt
# 3. Continue the conversation for 8-10 turns
# 4. Return full transcript as JSON
```

This helps you test the **conversation quality** (what your agent says):

* Context retention across multiple turns
* Goal achievement (did agent collect name, phone, and appointment time?)
* Handling difficult personalities
* Recovery from misunderstandings

<Note>
  The Conversation Simulator evaluates text content only. For voice-specific quality (pronunciation, tone, audio clarity), you'll need to test with actual voice output using your TTS provider's tools.
</Note>

### 2. Dataset-Driven Testing

Create evaluation datasets from typical customer queries:

| Input Query                         | Expected Behavior                     | Expected Information       |
| ----------------------------------- | ------------------------------------- | -------------------------- |
| "What are your hours tomorrow?"     | Provide hours, offer to take message  | Must mention opening time  |
| "Do you service electric vehicles?" | Provide info or offer callback        | Must not make false claims |
| "I need an emergency tow"           | Urgent tone, provide emergency number | Must prioritize urgency    |

Run your agent against each test case and use PromptLayer's evaluation types, including LLM-based assertions, to judge subjective quality criteria (e.g., "Does this response address the customer's question directly?" or "Is the tone appropriate?"). PromptLayer scores your agent's responses automatically, giving you pass/fail metrics across hundreds of test cases.

### 3. Human Feedback Integration

For production calls, capture customer satisfaction scores using the [Scoring API](/features/prompt-history/scoring-requests):

<CodeGroup>
  ```python Python theme={null}
  # After call completes, log customer rating
  pl.track.score(
      request_id=voice_call_request_id,
      score=customer_rating  # 0-100 scale (or convert 1-5 stars)
  )
  ```

  ```javascript JavaScript theme={null}
  // Track customer satisfaction
  await pl.track.score({
    requestId: voiceCallRequestId,
    score: customerRating  // 0-100
  });
  ```
</CodeGroup>

Aggregate these scores by prompt version to identify which conversation approaches yield higher satisfaction.
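
If you collect 1-5 star ratings, one way to map them onto the 0-100 scale and then compare prompt versions is sketched below. The linear mapping and the `(prompt_version, stars)` record shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def stars_to_score(stars):
    """Map a 1-5 star rating linearly onto 0-100 (1 star -> 0, 5 stars -> 100)."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return round((stars - 1) * 25)


def average_score_by_version(ratings):
    """ratings: iterable of (prompt_version, stars) pairs."""
    scores = defaultdict(list)
    for version, stars in ratings:
        scores[version].append(stars_to_score(stars))
    return {version: mean(values) for version, values in scores.items()}


ratings = [(1, 3), (1, 5), (2, 4), (2, 5)]
print(stars_to_score(3))  # 50
print(average_score_by_version(ratings))
```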

### 4. Voice-Specific Quality Checks

#### Speech Content Parity

Verify your TTS output matches intended text:

1. Generate audio with ElevenLabs/OpenAI TTS
2. Transcribe it back with Whisper
3. Compare transcript to original text
4. Flag mismatches indicating pronunciation issues
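
Steps 3 and 4 can be sketched with the Python standard library. The TTS and transcription calls are left out; the function below only compares the two strings. The 0.9 threshold is an arbitrary starting point, not a recommended value.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip punctuation, and split into words so formatting differences are ignored."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def parity_check(original_text, transcript, threshold=0.9):
    """Return (similarity, flagged) comparing the word sequences of the two texts."""
    similarity = SequenceMatcher(None, normalize(original_text), normalize(transcript)).ratio()
    return similarity, similarity < threshold


similarity, flagged = parity_check(
    "We have openings at 9 AM or 2 PM.",
    "We have openings at 9 a.m. or 2 p.m.",
)
print(flagged)  # False
```

Comparing word sequences rather than raw characters keeps punctuation and casing differences from triggering false alarms.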

#### Latency Benchmarks

PromptLayer evaluations automatically track and display latency for each request, helping you monitor response times throughout your voice agent workflow. You can use PromptLayer's analytics to ensure your agent stays under acceptable thresholds (typically under 2000ms for voice interactions) and identify any bottlenecks in your LLM processing.
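
A simple offline check against such a threshold might look like the sketch below. The per-stage latency records are hypothetical data you would export or log yourself; the 2000 ms budget is the figure mentioned above.

```python
LATENCY_BUDGET_MS = 2000  # threshold mentioned above; tune for your use case


def over_budget(calls, budget_ms=LATENCY_BUDGET_MS):
    """Return (call_id, total_ms, slowest_stage) for calls exceeding the budget."""
    flagged = []
    for call in calls:
        stages = call["stages_ms"]
        total = sum(stages.values())
        if total > budget_ms:
            slowest = max(stages, key=stages.get)
            flagged.append((call["id"], total, slowest))
    return flagged


calls = [
    {"id": "call-1", "stages_ms": {"stt": 300, "llm": 900, "tts": 400}},
    {"id": "call-2", "stages_ms": {"stt": 350, "llm": 1600, "tts": 450}},
]
print(over_budget(calls))  # [('call-2', 2400, 'llm')]
```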

## Observability for Voice Interactions

PromptLayer's [Observability](/features/observability) suite gives you full visibility into every voice interaction, even though the audio itself flows through external services.

### What You Can Track

* **Full Conversation Context**: See the transcribed text of what customers said and how your agent responded
* **Prompt Versions Used**: Know exactly which prompt template was active for each call
* **Token Usage & Costs**: Track spending per conversation, per shop location, or per time period
* **Latency Breakdown**: Identify slow points in your workflow (STT, LLM, TTS)
* **Metadata Filtering**: Tag calls with `customer_id`, `shop_location`, `call_type` for granular analysis

### Traces for Multi-Step Workflows

When using PromptLayer Agents for voice workflows, [traces](/running-requests/traces) show each step:

```
Voice Call Trace #1234
├─ Input: "I need an oil change for tomorrow"
├─ Node 1: Intent Classification → "appointment_request"
├─ Node 2: Slot Filling Prompt → extracted {service: "oil change", timeframe: "tomorrow"}
├─ Node 3: Availability Check (Callback) → slots available at 9 AM, 2 PM
├─ Node 4: Confirmation Prompt → "We have openings at 9 AM or 2 PM..."
└─ Output: Confirmation message + collected phone number
```

This makes debugging failed conversations straightforward—you can see exactly where logic went wrong.

## Best Practices for Voice Agent Evaluation

Test with diverse conversation scenarios (cooperative customers, difficult cases, edge cases) and track metrics aligned with your business goals:

* **Conversation quality**: Information capture rate, task completion, customer satisfaction
* **Continuous improvement**: Build regression test suites from failed conversations, backtest new prompts against production data

## Getting Started

To begin building voice agents with PromptLayer:

1. **Create voice agent prompts** in the [Prompt Registry](/features/prompt-registry/new-overview)
2. **Design multi-step workflows** with [Workflows](/why-promptlayer/workflows) if needed
3. **Build evaluation datasets** covering your expected call types
4. **Set up evaluation pipelines** with relevant quality checks
5. **Integrate with your voice platform** (VAPI, ElevenLabs, etc.) via API
6. **Monitor production calls** using observability and analytics
7. **Iterate based on data** using A/B tests and regression testing

PromptLayer provides the prompt management, workflow orchestration, observability, and evaluation infrastructure you need to build production-ready voice agents that continuously improve over time.


# Workflows
Source: https://docs.promptlayer.com/why-promptlayer/workflows


<iframe title="YouTube video player" />

PromptLayer Workflows let you build, deploy, and manage AI workflows that use multiple LLMs and business rules. You create and test them in a visual drag-and-drop editor, then deploy them without managing the underlying infrastructure.

<img alt="Workflow DAG" />

## Use Cases

### 1. **Combining Multiple LLM Calls into a Single Output**

Improve AI-generated responses by using results from multiple LLM calls, either by merging outputs or choosing the best one. This can lead to:

* More thorough and precise outputs
* Enhanced decision-making by considering multiple perspectives
* Higher reliability through comparing multiple AI answers

### 2. **Building Complex Workflows**

Create advanced AI systems that can handle multi-step tasks and solve complex problems. These systems can:

* Integrate multiple LLM calls
* Incorporate external data sources
* Automate complex decision-making processes

## Key Concepts

### 1. Input Variables

Input Variables are the data you feed into a Workflow. They can be text, numbers, or other information the Workflow uses in its various steps to produce the final result.

<img alt="Input Variables" />

### 2. Nodes

Nodes are the building blocks of the Workflow. Each node represents a specific action or decision. Types include:

* **Prompt Template**: Make an LLM call using a prompt template from the registry or an [inline template](#inline-templates) defined directly in the node configuration.
* **Callback Endpoint**: Make external API calls (e.g., RAG steps) or trigger callback requests after workflow processes finish.
* **Coding Agent**: Execute AI coding agents (such as Claude Code) in a sandboxed environment for data transformations, file processing, and complex analysis. [Learn more about Coding Agent](/features/evaluations/eval-types#coding-agent)
* **For Loop**: Iterate over collections, running a prompt or sub-workflow on each item.
* **While Loop**: Execute repeatedly until a condition is met.
* **Math Operator**: Perform numerical comparisons or calculations between different data sources.
* **Parse Value**: Extract and process specific data types like strings, numbers, or JSON from inputs.

<Info>
  **Want to learn about all available node types?** Workflow nodes use the same
  building blocks as evaluation types. [View all eval
  types](/features/evaluations/eval-types) to see the full catalog of nodes you
  can use in your workflows, including LLM assertions, data extraction,
  conversation simulators, and more.
</Info>

<img alt="Nodes" />

### 3. Conditional Edges

Conditional Edges allow you to create branching logic within your Workflows. By clicking on an edge between nodes, you can define conditions that determine the path your workflow will take. Conditions can be combined using logical operators such as **AND** or **OR**, and support comparisons including:

* Equal (`==`)
* Not Equal (`!=`)
* Less Than (`<`)
* Greater Than (`>`)
* Less Than or Equal To (`<=`)
* Greater Than or Equal To (`>=`)

You can compare values against numbers or booleans. This lets your Workflow route execution dynamically based on intermediate results or external data.

<img alt="Conditional Edges" />
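
To make the branching rules concrete, here is a minimal Python sketch of how a set of conditions on an edge could be evaluated. It is only a model of the behavior described above, not PromptLayer's implementation; the `edge` dictionary layout and the `source`, `op`, `value`, and `combine` keys are assumptions made for this example.

```python
import operator

# The comparison operators listed above, mapped to Python equivalents.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def check(condition, values):
    """Evaluate one comparison against the values produced so far."""
    compare = COMPARATORS[condition["op"]]
    return compare(values[condition["source"]], condition["value"])


def edge_is_open(edge, values):
    """Combine all of an edge's conditions with AND or OR."""
    results = [check(condition, values) for condition in edge["conditions"]]
    return all(results) if edge["combine"] == "AND" else any(results)


# Hypothetical edge: follow it only when the score is high AND nothing is flagged.
edge = {
    "combine": "AND",
    "conditions": [
        {"source": "score", "op": ">=", "value": 0.8},
        {"source": "flagged", "op": "==", "value": False},
    ],
}

print(edge_is_open(edge, {"score": 0.9, "flagged": False}))  # True
print(edge_is_open(edge, {"score": 0.9, "flagged": True}))   # False
```

Switching `combine` to `"OR"` would open the edge when either condition holds.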

### 4. Output Nodes

Output Nodes determine what your Workflow returns as its final result. When using Conditional Edges to create different paths in your workflow, you can place multiple Output Nodes at the end of different branches. Similar to a "return statement" in programming, whichever Output Node executes successfully first will provide the final output. This allows your Workflow to deliver different results based on the specific conditions that were met during the workflow.

<img alt="Output Nodes" />

### 5. Inline Templates

Prompt Template nodes can reference a template from the Prompt Registry or define a template inline. Inline templates are useful for quick iteration and experimentation without committing a prompt to the registry.

When creating or updating a workflow programmatically, use `inline_template` instead of `template` in a Prompt Template node's configuration:

```json theme={null}
{
  "name": "Generate Summary",
  "node_type": "PROMPT_TEMPLATE",
  "configuration": {
    "inline_template": {
      "inline": true,
      "prompt_template": {
        "type": "chat",
        "messages": [
          {
            "role": "system",
            "content": [{"type": "text", "text": "Summarize the following text concisely."}]
          },
          {
            "role": "user",
            "content": [{"type": "text", "text": "{input_text}"}]
          }
        ]
      },
      "metadata": {
        "model": {
          "provider": "openai",
          "name": "gpt-4",
          "parameters": {"temperature": 0.3}
        }
      }
    },
    "prompt_template_variable_mappings": {
      "input_text": "document"
    }
  },
  "dependencies": ["document"],
  "is_output_node": false
}
```

<Info>
  You must provide exactly one of `template` (registry reference) or `inline_template` (inline content) in a Prompt Template node's configuration. They are mutually exclusive. You can convert an inline template to a registry template at any time from the UI using "Save to Registry".
</Info>
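
If you build node configurations in code, a small client-side check can catch a missing or duplicated template key before you send the request. The sketch below is one possible way to do that; `template_source` is a hypothetical helper, not part of any PromptLayer SDK, and the registry reference is an opaque placeholder because its exact shape is not covered here.

```python
def template_source(configuration):
    """Return which template key a Prompt Template node configuration uses.

    Raises ValueError unless exactly one of `template` or `inline_template` is set.
    """
    keys = ("template", "inline_template")
    present = [key for key in keys if configuration.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {keys}, found {present or 'none'}"
        )
    return present[0]


# Trimmed version of the inline configuration shown above.
inline_config = {"inline_template": {"inline": True}}
print(template_source(inline_config))  # inline_template

# Setting both keys is rejected. The `template` value is a placeholder only.
try:
    template_source({"template": {"placeholder": True}, **inline_config})
except ValueError as error:
    print("rejected:", error)
```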

## Versioning

Workflow versioning automatically tracks changes over time. Each update creates a new version, allowing you to safely experiment with new ideas while keeping the current production version stable. You can view the full history of your Workflow's changes, which helps with team collaboration and iterative development.

<img alt="Versioning" />

### Programmatic Updates

When updating a workflow through the REST API, use [Update Workflow](/reference/patch-workflow) to create a new version from the latest workflow version or from a specific `base_version`.

Patch updates merge nodes by name:

* Unmentioned nodes are preserved.
* Object values add or update a node.
* `null` removes a node.
* Node `configuration` is deep-merged.
* `dependencies`, `required_input_variables`, and `edges` are replaced when provided.
* `release_labels` move to the newly created version.
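
The merge rules for nodes can be modeled locally to predict what a patch will produce. The sketch below is an illustration of the rules listed above, not PromptLayer's implementation. It assumes the patch is a mapping from node name to either an object or `null` (`None` in Python); `patch_nodes` and `deep_merge` are hypothetical helper names, and edges, input variables, and release labels are left out.

```python
import copy


def deep_merge(base, patch):
    """Recursively merge `patch` into a copy of `base`; patch values win."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def patch_nodes(current, patch):
    """Apply a patch keyed by node name to the current nodes."""
    result = copy.deepcopy(current)  # unmentioned nodes are preserved
    for name, node_patch in patch.items():
        if node_patch is None:
            result.pop(name, None)  # null removes the node
        elif name not in result:
            result[name] = copy.deepcopy(node_patch)  # object adds a node
        else:
            node = result[name]
            for key, value in node_patch.items():
                if key == "configuration" and isinstance(node.get(key), dict):
                    node[key] = deep_merge(node[key], value)  # deep-merged
                else:
                    node[key] = copy.deepcopy(value)  # e.g. dependencies replaced
    return result


current = {
    "Generate Summary": {
        "configuration": {"inline_template": {"metadata": {"model": {
            "name": "gpt-4", "parameters": {"temperature": 0.3}}}}},
        "dependencies": ["document"],
    },
    "Legacy Step": {"configuration": {}, "dependencies": []},
    "Untouched Step": {"configuration": {}, "dependencies": ["document"]},
}
patch = {
    "Generate Summary": {
        "configuration": {"inline_template": {"metadata": {"model": {
            "parameters": {"temperature": 0.0}}}}},
        "dependencies": ["document", "style"],
    },
    "Legacy Step": None,
}

updated = patch_nodes(current, patch)
model = updated["Generate Summary"]["configuration"]["inline_template"]["metadata"]["model"]
print(model)            # {'name': 'gpt-4', 'parameters': {'temperature': 0.0}}
print(sorted(updated))  # ['Generate Summary', 'Untouched Step']
```

Note that the model name survives because `configuration` is deep-merged, while `dependencies` is replaced wholesale.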

## Running a Workflow

You can run a Workflow in three ways: using the [Python SDK](/sdks/python#running-workflows) or [JavaScript SDK](/sdks/javascript#running-workflows), via the [REST API with polling](/reference/workflow-version-execution-results), or with the [REST API using callback webhooks](/reference/run-workflow) for long-running workflows.
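
For the polling approach, the client repeatedly asks for results until the run finishes or a timeout passes. The following sketch shows that loop in a generic form. `fetch` is a hypothetical callable that you would write to wrap the REST request and return `None` while the workflow is still running; the real endpoint's request and response shapes are documented on the linked reference page and are not reproduced here.

```python
import time


def poll_until_done(fetch, interval_s=1.0, timeout_s=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until it returns a non-None result or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("workflow did not finish before the timeout")
        sleep(interval_s)


# Stand-in for a real request: reports "still running" twice, then a result.
responses = iter([None, None, {"output": "done"}])
result = poll_until_done(lambda: next(responses), interval_s=0.01)
print(result)  # {'output': 'done'}
```

For workflows that run longer than you want to hold a polling loop open, the callback webhook option avoids the loop entirely.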

After running a Workflow, the full trace, including spans from all nodes, will be visible in the left traces menu. This allows you to visualize the execution path and see intermediate outputs at each step, helping you debug and optimize your Workflow.

<img alt="Traces" />


# Workspaces
Source: https://docs.promptlayer.com/why-promptlayer/workspaces

Create shared environments for teams, projects, and deployment stages in PromptLayer.

## Workspaces

Workspaces are collaborative environments within your organization where teams can:

* Share prompt templates and evaluations
* Collaborate on projects
* Organize resources by team or project
* Maintain separate environments (dev, staging, production)

### Creating a Workspace

You can create a new workspace by clicking the 'Create Workspace' button from the workspace dropdown.

<img alt="Create Workspaces Button" />

Choose a unique name so team members can easily recognize the workspace.

<img alt="Create Workspaces Modal" />

### Workspace Management

When you create a workspace, you become its administrator. As the administrator, you can:

* Invite new members to the workspace
* Manage member permissions
* Remove members from the workspace
* Configure workspace-specific settings
* Delete the workspace if needed

<img alt="Workspaces Management" />

### Workspace Roles and Permissions

Each workspace supports granular permission control:

* **Workspace Admin**: Full control over workspace settings and members
* **Editor**: Can create, edit, and delete resources within the workspace
* **Viewer**: Read-only access to workspace resources

## Best Practices

Organizations typically structure workspaces by team (Engineering, Data Science, Product), by project, or by environment (dev, staging, production). When managing access, follow the principle of least privilege and regularly review workspace memberships to ensure proper access control.