The easiest way to use PromptLayer is with the run() method. It fetches a prompt template from the Prompt Registry, executes it against your configured LLM provider, and logs the result — all in one call.
Your LLM API keys (OpenAI, Anthropic, etc.) are never sent to our servers. All LLM requests are made locally from your machine; PromptLayer just logs the request.
The run() method works with any provider configured in your prompt template — OpenAI, Anthropic, Google, and more. See the Run documentation for full details.

After making your first few requests, you should be able to see them in the PromptLayer dashboard!
For any LLM provider you plan to use, you must set its corresponding API key as an environment variable (for example, OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY etc.). The PromptLayer client does not support passing these keys directly in code. If the relevant environment variables are not set, any requests to those LLM providers will fail.
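A quick way to catch a missing key before any request is made is to check the environment up front. This helper is just a convenience sketch, not part of the SDK:

```python
import os

def missing_provider_keys(keys=("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")):
    """Return the provider key names that are not set in the environment."""
    return [key for key in keys if not os.environ.get(key)]

# Fail fast instead of getting an opaque provider error later
for key in missing_provider_keys():
    print(f"Warning: {key} is not set; requests to that provider will fail")
```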
stream (bool, default=False): Whether to stream the response.
provider (str, optional): The LLM provider to use (e.g., "openai", "anthropic", "google"). This is useful if you want to override the provider specified in the prompt template.
model (str, optional): The model to use (e.g., "gpt-4o", "claude-3-7-sonnet-latest", "gemini-2.5-flash"). This is useful if you want to override the model specified in the prompt template.
```python
for chunk in pl.run(prompt_name="your-prompt", stream=True):
    # Access raw streaming response
    print(chunk["raw_response"])

    # Access progressively built prompt blueprint
    if chunk["prompt_blueprint"]:
        current_response = chunk["prompt_blueprint"]["prompt_template"]["messages"][-1]
        if current_response.get("content"):
            print(f"Current response: {current_response['content']}")
```
When streaming is enabled, each chunk includes both the raw streaming response and the progressively built prompt_blueprint, allowing you to track how the response is constructed in real-time. The request_id is only included in the final chunk.
You can also override provider and model at runtime to run the request against a different LLM than the one specified in the prompt template. PromptLayer will automatically return the correct llm_kwargs for the specified provider and model, with default values for that provider's parameters.
Provider-Specific Schema Notice

The llm_kwargs and raw_response objects have provider-specific structures that may change as LLM providers update their APIs. PromptLayer passes through the native format required by each provider. For stable, provider-agnostic prompt data, use prompt_blueprint.prompt_template instead of relying on the structure of provider-specific objects.
Python SDK
```python
response = pl.run(
    prompt_name="your-prompt",
    provider="openai",  # or "anthropic", "google", etc.
    model="gpt-4",      # or "claude-2", "gemini-1.5-pro", etc.
)
```
Make sure to set both model and provider so the request runs against the correct LLM provider with the correct parameters.
Use run_workflow() to execute a PromptLayer Workflow from the Python SDK. Workflows are multi-step pipelines that can combine prompt, tool, code, and conditional nodes.
By default, run_workflow() returns the final output node’s value. When return_all_outputs=True, it returns a dictionary keyed by node name, including each node’s status, value, errors, and whether the node is an output node.
The PromptLayer Python SDK supports an in-memory template cache to reduce fetch latency and improve resilience when the PromptLayer API has transient failures. Enable the cache when you want to:
Reduce repeated template fetch latency
Lower dependency on real-time PromptLayer API availability
Continue serving recently known-good templates during temporary API issues
Pass cache_ttl_seconds when creating a client:
```python
from promptlayer import PromptLayer

promptlayer_client = PromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,  # each prompt template is cached for 5 minutes
)
```
Async client works the same way:
```python
from promptlayer import AsyncPromptLayer

async_promptlayer_client = AsyncPromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,
)
```
If you need more control — for example, using your own LLM client, a custom provider, or background processing — you can use log_request to manually log requests to PromptLayer.
```python
import time

from openai import OpenAI
from promptlayer import PromptLayer

pl_client = PromptLayer()
client = OpenAI()

messages = [
    {"role": "system", "content": "You are an AI."},
    {"role": "user", "content": "Compose a poem please."},
]

request_start_time = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
request_end_time = time.time()

# Log to PromptLayer
pl_client.log_request(
    provider="openai",
    model="gpt-4o",
    input={"type": "chat", "messages": [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in messages
    ]},
    output={"type": "chat", "messages": [
        {"role": "assistant", "content": [{"type": "text", "text": completion.choices[0].message.content}]}
    ]},
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    tags=["getting-started"],
)
```
This works with any LLM provider, including Anthropic:
By default, PromptLayer throws exceptions when errors occur. You can control this behavior using the throw_on_error parameter:
```python
from promptlayer import PromptLayer

# Default behavior: throws exceptions on errors
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=True)

# Alternative: logs warnings instead of throwing exceptions
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=False)
```
Example with exception handling:
```python
from promptlayer import PromptLayer, PromptLayerNotFoundError, PromptLayerValidationError

promptlayer_client = PromptLayer()

try:
    # Attempt to get a template that might not exist
    template = promptlayer_client.templates.get("NonExistentTemplate")
except PromptLayerNotFoundError as e:
    print(f"Template not found: {e}")
except PromptLayerValidationError as e:
    print(f"Invalid input: {e}")
```
Example with warnings (throw_on_error=False):
```python
from promptlayer import PromptLayer

# Initialize with throw_on_error=False to get warnings instead of exceptions
promptlayer_client = PromptLayer(throw_on_error=False)

# This will log a warning instead of throwing an exception if the template doesn't exist
template = promptlayer_client.templates.get("NonExistentTemplate")
# Returns None if not found, with a warning logged
```
PromptLayer includes a built-in retry mechanism to handle transient failures gracefully. This ensures your application remains resilient when temporary issues occur.

Retry Behavior:
Total Attempts: 4 attempts (1 initial + 3 retries)
Max Wait Time: 15 seconds maximum wait between retries
What Triggers Retries:
5xx Server Errors: Internal server errors, service unavailable, etc.
429 Rate Limit Errors: When rate limits are exceeded
What Fails Immediately (No Retries):
Connection Errors: Network connectivity issues
Timeout Errors: Request timeouts
4xx Client Errors (except 429): Bad requests, authentication errors, not found, etc.
The retry mechanism operates transparently in the background. You don’t need to implement retry logic yourself; PromptLayer handles it automatically for recoverable errors.
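As an illustration only (the SDK implements this internally), the documented policy of doubling delays capped at 15 seconds looks roughly like:

```python
def backoff_delays(retries=3, base=2, cap=15):
    """Illustrative sketch of the documented retry policy: exponential
    backoff, capped at 15 seconds, for up to 3 retries."""
    return [min(base ** attempt, cap) for attempt in range(1, retries + 1)]

print(backoff_delays())  # → [2, 4, 8]
```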
PromptLayer uses Python’s built-in logging module for all log output:
```python
import logging

from promptlayer import PromptLayer

# Configure logging to see PromptLayer logs
logging.basicConfig(level=logging.INFO)

promptlayer_client = PromptLayer()
# Now you'll see log output from PromptLayer operations
```
Setting log levels:
```python
import logging

# Get the PromptLayer logger
logger = logging.getLogger("promptlayer")

# Set to WARNING to only see warnings and errors
logger.setLevel(logging.WARNING)

# Set to DEBUG to see detailed information
logger.setLevel(logging.DEBUG)
```
Viewing Retry Logs:

When retries occur, PromptLayer logs warnings before each retry attempt:
```python
import logging

from promptlayer import PromptLayer

# Set up logging to see retry attempts
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)

promptlayer_client = PromptLayer()

# If a retry occurs, you'll see log messages like:
# "Retrying in 2 seconds..."
# "Retrying in 4 seconds..."
```
PromptLayer supports asynchronous operations, ideal for managing concurrent tasks in non-blocking environments like web servers, microservices, or Jupyter notebooks.
To use asynchronous non-blocking methods, initialize AsyncPromptLayer as shown:
```python
from promptlayer import AsyncPromptLayer

# Initialize an asynchronous client with your API key
async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")
```
Example 5: Asynchronous Streaming Prompt Execution with run Method
You can run a streaming prompt template using the run method as well.
```python
import asyncio
import os

from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(
        api_key=os.environ.get("PROMPTLAYER_API_KEY")
    )

    response_generator = await async_promptlayer_client.run(
        prompt_name="TestPrompt",
        input_variables={"variable1": "value1", "variable2": "value2"},
        stream=True,
    )

    async for response in response_generator:
        # Access raw streaming response
        print("Raw streaming response:", response["raw_response"])

        # Access progressively built prompt blueprint
        if response["prompt_blueprint"]:
            current_response = response["prompt_blueprint"]["prompt_template"]["messages"][-1]
            if current_response.get("content"):
                print(f"Current response: {current_response['content']}")

# Run the async function
asyncio.run(main())
```
In this example, replace “TestPrompt” with the name of your prompt template, and provide any required input variables. When streaming is enabled, each chunk includes both the raw streaming response and the progressively built prompt_blueprint, allowing you to track how the response is constructed in real-time.

Want to say hi 👋, submit a feature request, or report a bug? ✉️ Contact us