Eval Examples
Building & Evaluating a RAG Chatbot
This example shows how you can use PromptLayer to evaluate Retrieval Augmented Generation (RAG) systems. RAG systems improve question answering by retrieving relevant context from large datasets before the model generates an answer.
We will create a RAG system for financial data analysis using a dataset from the New York Stock Exchange. The tutorial video walks through building the pipeline step by step: prompt creation, data retrieval, and evaluating how well the system answers finance-related questions.
Most importantly, you can use PromptLayer to build end-to-end evaluation tests for RAG systems.
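To make the pipeline concrete, here is a minimal retrieve-then-answer sketch. It is not the tutorial's exact code: the sample rows, model names, and prompt wording are illustrative assumptions, and in the tutorial the LLM calls are logged through PromptLayer so the outputs can be scored in an evaluation pipeline.

```python
# Minimal RAG sketch: embed a few (hypothetical) NYSE-style facts, retrieve the
# most relevant ones for a question, and answer using only that context.
# Assumes OPENAI_API_KEY is set; documents, models, and prompts are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "AAPL reported revenue of roughly $383B in fiscal 2023.",
    "MSFT gross margin was roughly 69% in fiscal 2023.",
    # ...in practice, rows loaded from the NYSE dataset
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question, k=2):
    q_vec = embed([question])[0]
    # cosine similarity between the question and every document
    scores = doc_vectors @ q_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(documents[i] for i in scores.argsort()[-k:][::-1])
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

print(answer("What was Apple's revenue in fiscal 2023?"))
```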
Migrating Prompts to Open-Source Models
Click Here to Read the Tutorial
This tutorial demonstrates how to use PromptLayer to migrate prompts between language models, with a focus on open-source models like Mistral. It covers batch model comparisons, which let you evaluate how a prompt performs across multiple models. The example migrates an existing prompt from a RAG system to the open-source Mistral model and compares the new outputs against the old ones with visual diffs.
The key steps include:
- Setting up a batch evaluation pipeline that runs the prompt on both the original model (e.g., GPT) and the new target model (Mistral) and diffs the outputs (a minimal sketch of this comparison follows the list).
- Analyzing the results, including accuracy scores, cost/latency metrics, and string output diffs, to assess the impact of migrating to the new model.
- Seamlessly updating the prompt template to use the new model (Mistral) if the migration is beneficial.
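The sketch below illustrates the batch-comparison idea in plain Python: run the same prompt through the original model and a Mistral model, then diff the outputs. The base URL, API key placeholder, model names, and test cases are assumptions for illustration; in the tutorial, the batch run, scoring, and diff view happen inside PromptLayer's evaluation pipeline.

```python
# Hypothetical batch comparison: run one prompt on the original model and on a
# Mistral model, then print a unified diff of the two outputs per test case.
import difflib
from openai import OpenAI

gpt_client = OpenAI()  # uses OPENAI_API_KEY
mistral_client = OpenAI(
    base_url="https://api.mistral.ai/v1",  # assumes an OpenAI-compatible endpoint
    api_key="MISTRAL_API_KEY_HERE",        # placeholder, not a real key
)

PROMPT = (
    "Using the context below, answer the question.\n\n"
    "Context: {context}\n\nQuestion: {question}"
)

test_cases = [
    {"context": "AAPL fiscal 2023 revenue: $383B.",
     "question": "What was Apple's 2023 revenue?"},
    # ...more rows from the evaluation dataset
]

def run(client, model, case):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(**case)}],
    )
    return resp.choices[0].message.content

for case in test_cases:
    old = run(gpt_client, "gpt-3.5-turbo", case)
    new = run(mistral_client, "mistral-small-latest", case)
    # the unified diff highlights where the migrated model's output diverges
    print("\n".join(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")))
```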
This example highlights PromptLayer’s capabilities for efficient prompt iteration and evaluation across different language models, facilitating the adoption of open-source alternatives like Mistral.