The easiest way to use PromptLayer is with the run() method. It fetches a prompt template from the Prompt Registry, executes it against your configured LLM provider, and logs the result — all in one call.
Your LLM API keys (OpenAI, Anthropic, etc.) are never sent to our servers. All LLM requests are made locally from your machine; PromptLayer just logs the request.
The run() method works with any provider configured in your prompt template — OpenAI, Anthropic, Google, and more. See the Run documentation for full details.

After making your first few requests, you should be able to see them in the PromptLayer dashboard!
For any LLM provider you plan to use, you must set its corresponding API key as an environment variable (for example, OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY etc.). The PromptLayer client does not support passing these keys directly in code. If the relevant environment variables are not set, any requests to those LLM providers will fail.
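A quick way to catch a missing key before any request is made is to check the environment up front. This helper is just a convenience sketch, not part of the SDK:

```python
import os

def missing_provider_keys(keys=("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")):
    """Return the provider key names that are not set in the environment."""
    return [key for key in keys if not os.environ.get(key)]

# Fail fast instead of getting an opaque provider error later
for key in missing_provider_keys():
    print(f"Warning: {key} is not set; requests to that provider will fail")
```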
stream (bool, default=False): Whether to stream the response.
provider (str, optional): The LLM provider to use (e.g., "openai", "anthropic", "google"). This is useful if you want to override the provider specified in the prompt template.
model (str, optional): The model to use (e.g., "gpt-4o", "claude-3-7-sonnet-latest", "gemini-2.5-flash"). This is useful if you want to override the model specified in the prompt template.
```python
for chunk in pl.run(prompt_name="your-prompt", stream=True):
    # Access raw streaming response
    print(chunk["raw_response"])

    # Access progressively built prompt blueprint
    if chunk["prompt_blueprint"]:
        current_response = chunk["prompt_blueprint"]["prompt_template"]["messages"][-1]
        if current_response.get("content"):
            print(f"Current response: {current_response['content']}")
```
When streaming is enabled, each chunk includes both the raw streaming response and the progressively built prompt_blueprint, allowing you to track how the response is constructed in real-time. The request_id is only included in the final chunk.
You can also override provider and model at runtime to run the request against a different LLM than the one specified in the prompt template. PromptLayer will automatically return the correct llm_kwargs for the specified provider and model, with default values for that provider's parameters.
Provider-Specific Schema Notice

The llm_kwargs and raw_response objects have provider-specific structures that may change as LLM providers update their APIs. PromptLayer passes through the native format required by each provider. For stable, provider-agnostic prompt data, use prompt_blueprint.prompt_template instead of relying on the structure of provider-specific objects.
Python SDK
```python
response = pl.run(
    prompt_name="your-prompt",
    provider="openai",  # or "anthropic", "google", etc.
    model="gpt-4",      # or "claude-2", "gemini-1.5-pro", etc.
)
```
Make sure to set both model and provider so the request runs against the correct LLM provider with the correct parameters.
Use run_workflow() to execute a PromptLayer Workflow from the Python SDK. Workflows are multi-step pipelines that can combine prompt, tool, code, and conditional nodes.
By default, run_workflow() returns the final output node’s value. When return_all_outputs=True, it returns a dictionary keyed by node name, including each node’s status, value, errors, and whether the node is an output node.
The PromptLayer Python SDK supports an in-memory template cache to reduce fetch latency and improve resilience when the PromptLayer API has transient failures. Enable the cache when you want to:
Reduce repeated template fetch latency
Lower dependency on real-time PromptLayer API availability
Continue serving recently known-good templates during temporary API issues
Pass cache_ttl_seconds when creating a client:
```python
from promptlayer import PromptLayer

promptlayer_client = PromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,  # each prompt template is cached for 5 minutes
)
```
Async client works the same way:
```python
from promptlayer import AsyncPromptLayer

async_promptlayer_client = AsyncPromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,
)
```
If you need more control — for example, using your own LLM client, a custom provider, or background processing — you can use log_request to manually log requests to PromptLayer.
```python
import time

from openai import OpenAI
from promptlayer import PromptLayer

pl_client = PromptLayer()
client = OpenAI()

messages = [
    {"role": "system", "content": "You are an AI."},
    {"role": "user", "content": "Compose a poem please."},
]

request_start_time = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
request_end_time = time.time()

# Log to PromptLayer
pl_client.log_request(
    provider="openai",
    model="gpt-4o",
    input={"type": "chat", "messages": [
        {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
        for m in messages
    ]},
    output={"type": "chat", "messages": [
        {"role": "assistant", "content": [{"type": "text", "text": completion.choices[0].message.content}]}
    ]},
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    tags=["getting-started"],
)
```
This works with any LLM provider, including Anthropic:
By default, PromptLayer throws exceptions when errors occur. You can control this behavior using the throw_on_error parameter:
```python
from promptlayer import PromptLayer

# Default behavior: throws exceptions on errors
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=True)

# Alternative: logs warnings instead of throwing exceptions
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=False)
```
Example with exception handling:
```python
from promptlayer import PromptLayer, PromptLayerNotFoundError, PromptLayerValidationError

promptlayer_client = PromptLayer()

try:
    # Attempt to get a template that might not exist
    template = promptlayer_client.templates.get("NonExistentTemplate")
except PromptLayerNotFoundError as e:
    print(f"Template not found: {e}")
except PromptLayerValidationError as e:
    print(f"Invalid input: {e}")
```
Example with warnings (throw_on_error=False):
```python
from promptlayer import PromptLayer

# Initialize with throw_on_error=False to get warnings instead of exceptions
promptlayer_client = PromptLayer(throw_on_error=False)

# This will log a warning instead of throwing an exception if the template doesn't exist
template = promptlayer_client.templates.get("NonExistentTemplate")
# Returns None if not found, with a warning logged
```
PromptLayer includes a built-in retry mechanism to handle transient failures gracefully. This ensures your application remains resilient when temporary issues occur.

Retry Behavior:
Total Attempts: 4 attempts (1 initial + 3 retries)
Max Wait Time: 15 seconds maximum wait between retries
What Triggers Retries:
5xx Server Errors: Internal server errors, service unavailable, etc.
429 Rate Limit Errors: When rate limits are exceeded
What Fails Immediately (No Retries):
Connection Errors: Network connectivity issues
Timeout Errors: Request timeouts
4xx Client Errors (except 429): Bad requests, authentication errors, not found, etc.
The retry mechanism operates transparently in the background. You don’t need to implement retry logic yourself; PromptLayer handles it automatically for recoverable errors.
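As an illustration only (the SDK implements this internally), the documented policy of doubling delays capped at 15 seconds looks roughly like:

```python
def backoff_delays(retries=3, base=2, cap=15):
    """Illustrative sketch of the documented retry policy: exponential
    backoff, capped at 15 seconds, for up to 3 retries."""
    return [min(base ** attempt, cap) for attempt in range(1, retries + 1)]

print(backoff_delays())  # → [2, 4, 8]
```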
PromptLayer uses Python’s built-in logging module for all log output:
```python
import logging

from promptlayer import PromptLayer

# Configure logging to see PromptLayer logs
logging.basicConfig(level=logging.INFO)

promptlayer_client = PromptLayer()
# Now you'll see log output from PromptLayer operations
```
Setting log levels:
```python
import logging

# Get the PromptLayer logger
logger = logging.getLogger("promptlayer")

# Set to WARNING to only see warnings and errors
logger.setLevel(logging.WARNING)

# Set to DEBUG to see detailed information
logger.setLevel(logging.DEBUG)
```
Viewing Retry Logs:

When retries occur, PromptLayer logs warnings before each retry attempt:
```python
import logging

from promptlayer import PromptLayer

# Set up logging to see retry attempts
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)

promptlayer_client = PromptLayer()

# If a retry occurs, you'll see log messages like:
# "Retrying in 2 seconds..."
# "Retrying in 4 seconds..."
```
PromptLayer supports asynchronous operations, ideal for managing concurrent tasks in non-blocking environments like web servers, microservices, or Jupyter notebooks.
To use asynchronous non-blocking methods, initialize AsyncPromptLayer as shown:
```python
from promptlayer import AsyncPromptLayer

# Initialize an asynchronous client with your API key
async_promptlayer_client = AsyncPromptLayer(api_key="pl_****")
```
Example 5: Asynchronous Streaming Prompt Execution with run Method
You can run a streaming prompt template using the run method as well.
```python
import asyncio
import os

from promptlayer import AsyncPromptLayer

async def main():
    async_promptlayer_client = AsyncPromptLayer(
        api_key=os.environ.get("PROMPTLAYER_API_KEY")
    )

    response_generator = await async_promptlayer_client.run(
        prompt_name="TestPrompt",
        input_variables={"variable1": "value1", "variable2": "value2"},
        stream=True,
    )

    async for response in response_generator:
        # Access raw streaming response
        print("Raw streaming response:", response["raw_response"])

        # Access progressively built prompt blueprint
        if response["prompt_blueprint"]:
            current_response = response["prompt_blueprint"]["prompt_template"]["messages"][-1]
            if current_response.get("content"):
                print(f"Current response: {current_response['content']}")

# Run the async function
asyncio.run(main())
```
In this example, replace “TestPrompt” with the name of your prompt template, and provide any required input variables. When streaming is enabled, each chunk includes both the raw streaming response and the progressively built prompt_blueprint, allowing you to track how the response is constructed in real-time.

Want to say hi 👋, submit a feature request, or report a bug? ✉️ Contact us