The easiest way to use PromptLayer is with the run() method. It fetches a prompt template from the Prompt Registry, executes it against your configured LLM provider, and logs the result — all in one call.
Your LLM API keys (OpenAI, Anthropic, etc.) are never sent to our servers. All LLM requests are made locally from your machine; PromptLayer just logs the request.
The run() method works with any provider configured in your prompt template — OpenAI, Anthropic, Google, and more. See the Run documentation for full details.

After making your first few requests, you should be able to see them in the PromptLayer dashboard!
The PromptLayer Python SDK supports an in-memory template cache to reduce fetch latency and improve resilience when the PromptLayer API has transient failures.

Enable the cache when you want to:
Reduce repeated template fetch latency
Lower dependency on real-time PromptLayer API availability
Continue serving recently known-good templates during temporary API issues
Pass cache_ttl_seconds when creating a client:
```python
from promptlayer import PromptLayer

promptlayer_client = PromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,  # each prompt template is cached for 5 minutes
)
```
Async client works the same way:
```python
from promptlayer import AsyncPromptLayer

async_promptlayer_client = AsyncPromptLayer(
    api_key="pl_****",
    cache_ttl_seconds=300,
)
```
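Conceptually, cache_ttl_seconds gives each fetched template its own expiry: a template served from the cache is reused until its TTL elapses, after which the next access forces a fresh fetch. A minimal sketch of per-entry TTL caching (illustrative only, not the SDK's actual implementation):

```python
import time


class TTLCache:
    """Minimal sketch of per-entry TTL caching (illustrative, not the SDK's code)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = {}  # template name -> (value, stored_at)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._entries[name]  # expired: caller must fetch a fresh copy
            return None
        return value

    def put(self, name, value):
        self._entries[name] = (value, time.monotonic())
```

With cache_ttl_seconds=300, each template can be served from memory for up to five minutes before being refetched, which is also what lets the client keep serving a recently known-good template during a brief API outage.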
If you need more control — for example, using your own LLM client, a custom provider, or background processing — you can use log_request to manually log requests to PromptLayer.
```python
import time

from openai import OpenAI
from promptlayer import PromptLayer

pl_client = PromptLayer()
client = OpenAI()

messages = [
    {"role": "system", "content": "You are an AI."},
    {"role": "user", "content": "Compose a poem please."},
]

request_start_time = time.time()
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
request_end_time = time.time()

# Log to PromptLayer
pl_client.log_request(
    provider="openai",
    model="gpt-4o",
    input={
        "type": "chat",
        "messages": [
            {"role": m["role"], "content": [{"type": "text", "text": m["content"]}]}
            for m in messages
        ],
    },
    output={
        "type": "chat",
        "messages": [
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": completion.choices[0].message.content}
                ],
            }
        ],
    },
    request_start_time=request_start_time,
    request_end_time=request_end_time,
    tags=["getting-started"],
)
```
This works with any LLM provider, including Anthropic.
By default, PromptLayer throws exceptions when errors occur. You can control this behavior using the throw_on_error parameter:
```python
from promptlayer import PromptLayer

# Default behavior: throws exceptions on errors
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=True)

# Alternative: logs warnings instead of throwing exceptions
promptlayer_client = PromptLayer(api_key="pl_****", throw_on_error=False)
```
Example with exception handling:
```python
from promptlayer import PromptLayer, PromptLayerNotFoundError, PromptLayerValidationError

promptlayer_client = PromptLayer()

try:
    # Attempt to get a template that might not exist
    template = promptlayer_client.templates.get("NonExistentTemplate")
except PromptLayerNotFoundError as e:
    print(f"Template not found: {e}")
except PromptLayerValidationError as e:
    print(f"Invalid input: {e}")
```
Example with warnings (throw_on_error=False):
```python
from promptlayer import PromptLayer

# Initialize with throw_on_error=False to get warnings instead of exceptions
promptlayer_client = PromptLayer(throw_on_error=False)

# This will log a warning instead of throwing an exception if the template doesn't exist
template = promptlayer_client.templates.get("NonExistentTemplate")
# Returns None if not found, with a warning logged
```
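Both modes follow a common raise-or-warn pattern that can be sketched generically. The get_template function and the locally defined exception class below are stand-ins for illustration, not the SDK's internals:

```python
import logging

logger = logging.getLogger("promptlayer")


class PromptLayerNotFoundError(Exception):
    """Stand-in for the SDK's exception of the same name (illustrative only)."""


def get_template(name, registry, throw_on_error=True):
    """Return the template, or raise / warn-and-return-None depending on the flag."""
    if name in registry:
        return registry[name]
    if throw_on_error:
        # Default mode: surface the problem immediately
        raise PromptLayerNotFoundError(f"Template not found: {name}")
    # Lenient mode: log a warning and let the caller handle None
    logger.warning("Template not found: %s", name)
    return None
```

The design trade-off: throw_on_error=True fails fast, which suits development and batch jobs, while throw_on_error=False keeps a request-serving path alive at the cost of having to check for None.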
PromptLayer includes a built-in retry mechanism to handle transient failures gracefully. This ensures your application remains resilient when temporary issues occur.

Retry Behavior:
Total Attempts: 4 attempts (1 initial + 3 retries)
Max Wait Time: up to 15 seconds between retries
What Triggers Retries:
5xx Server Errors: Internal server errors, service unavailable, etc.
429 Rate Limit Errors: When rate limits are exceeded
What Fails Immediately (No Retries):
Connection Errors: Network connectivity issues
Timeout Errors: Request timeouts
4xx Client Errors (except 429): Bad requests, authentication errors, not found, etc.
The retry mechanism operates transparently in the background. You don't need to implement retry logic yourself; PromptLayer handles it automatically for recoverable errors.
PromptLayer uses Python’s built-in logging module for all log output:
```python
import logging

from promptlayer import PromptLayer

# Configure logging to see PromptLayer logs
logging.basicConfig(level=logging.INFO)

promptlayer_client = PromptLayer()
# Now you'll see log output from PromptLayer operations
```
Setting log levels:
```python
import logging

# Get the PromptLayer logger
logger = logging.getLogger("promptlayer")

# Set to WARNING to only see warnings and errors
logger.setLevel(logging.WARNING)

# Set to DEBUG to see detailed information
logger.setLevel(logging.DEBUG)
```
Viewing Retry Logs:

When retries occur, PromptLayer logs warnings before each retry attempt:
```python
import logging

from promptlayer import PromptLayer

# Set up logging to see retry attempts
logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

promptlayer_client = PromptLayer()

# If a retry occurs, you'll see log messages like:
# "Retrying in 2 seconds..."
# "Retrying in 4 seconds..."
```
PromptLayer supports asynchronous operations, ideal for managing concurrent tasks in non-blocking environments like web servers, microservices, or Jupyter notebooks.
To use asynchronous non-blocking methods, initialize AsyncPromptLayer as shown:
from promptlayer import AsyncPromptLayer# Initialize an asynchronous client with your API keyasync_promptlayer_client = AsyncPromptLayer(api_key="pl_****")
Example 5: Asynchronous Streaming Prompt Execution with run Method
You can run a streaming prompt template using the run method as well.
```python
import asyncio
import os

from promptlayer import AsyncPromptLayer


async def main():
    async_promptlayer_client = AsyncPromptLayer(
        api_key=os.environ.get("PROMPTLAYER_API_KEY")
    )

    response_generator = await async_promptlayer_client.run(
        prompt_name="TestPrompt",
        input_variables={"variable1": "value1", "variable2": "value2"},
        stream=True,
    )

    final_response = ""
    async for response in response_generator:
        # Access raw streaming response
        print("Raw streaming response:", response["raw_response"])

        # Access progressively built prompt blueprint
        if response["prompt_blueprint"]:
            current_response = response["prompt_blueprint"]["prompt_template"]["messages"][-1]
            if current_response.get("content"):
                print(f"Current response: {current_response['content']}")


# Run the async function
asyncio.run(main())
```
In this example, replace "TestPrompt" with the name of your prompt template, and provide any required input variables. When streaming is enabled, each chunk includes both the raw streaming response and the progressively built prompt_blueprint, allowing you to track how the response is constructed in real time.

Want to say hi 👋, submit a feature request, or report a bug? ✉️ Contact us