Streaming with `promptlayer_client.run` is easy. Streaming allows the API to return responses incrementally, delivering partial outputs as they're generated rather than waiting for the complete response. This can significantly improve the perceived responsiveness of your application.
Each streamed chunk includes the updated `prompt_blueprint`, allowing you to track how the response is constructed in real time. The `request_id` is only included in the final chunk, indicating completion of the streaming response.
Learn more about OpenAI streams.
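A runnable sketch of consuming such a stream is below. The chunk dictionaries are simulated stand-ins for what `promptlayer_client.run(..., stream=True)` would yield, and the contents of each `prompt_blueprint` are illustrative; only the `prompt_blueprint`/`request_id` keys follow the description above.

```python
def consume_stream(chunks):
    """Collect each intermediate prompt_blueprint and the final request_id."""
    blueprints = []
    request_id = None
    for chunk in chunks:
        # Every chunk carries the blueprint as constructed so far.
        blueprints.append(chunk["prompt_blueprint"])
        # request_id is only present on the final chunk.
        if chunk.get("request_id") is not None:
            request_id = chunk["request_id"]
    return blueprints, request_id

# Simulated chunks; in real code these come from:
#   promptlayer_client.run(prompt_name="...", input_variables={...}, stream=True)
chunks = [
    {"prompt_blueprint": {"partial_text": "Hel"}, "request_id": None},
    {"prompt_blueprint": {"partial_text": "Hello"}, "request_id": None},
    {"prompt_blueprint": {"partial_text": "Hello!"}, "request_id": 101},
]

blueprints, request_id = consume_stream(chunks)
print(len(blueprints), request_id)  # 3 101
```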
While you can use `response["raw_response"]` to access the LLM response (as done in earlier code snippets), we recommend using the standardized `response["prompt_blueprint"]` instead.
Using it looks something like this:
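The `response` dictionary below is a mocked stand-in for a real `promptlayer_client.run` result, so the snippet runs without an API key; the exact message/content nesting shown is illustrative.

```python
# Mocked response; in real code this comes from promptlayer_client.run(...).
response = {
    "raw_response": None,  # the provider-specific payload lives here
    "prompt_blueprint": {
        "prompt_template": {
            "messages": [
                {"role": "user",
                 "content": [{"type": "text", "text": "Write a haiku."}]},
                {"role": "assistant",
                 "content": [{"type": "text", "text": "Waves fold into foam"}]},
            ]
        }
    },
}

# The last message in the blueprint is the model's reply, in the same
# standardized shape regardless of which provider served the request.
last_message = response["prompt_blueprint"]["prompt_template"]["messages"][-1]
print(last_message["content"][-1]["text"])  # Waves fold into foam
```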
The returned `prompt_blueprint` follows the same schema as the `prompt_template` return type of get-prompt-template.
If you're using a new model, make sure to add the new key to your `.env` file. Install `python-dotenv` and load the environment variables in `app.py`.
PromptLayer also lets you fine-tune a `gpt-3.5-turbo` model on more expensive `gpt-4` historical request data.
Be warned, fine-tuning is hard to get right. We wrote a blog post on why most teams should not rely on fine-tuning.