- Select Your Dataset: Choose or upload datasets to serve as the basis for your evaluations, whether for scoring, regression testing, or bulk job processing.
- Build Your Pipeline: Start by visually constructing your evaluation pipeline, defining each step from input data processing to final evaluation.
- Run Evaluations: Execute your pipeline, observe the results in a spreadsheet-like interface, and make informed decisions based on comprehensive metrics and scores.
Creating a Pipeline
- Initiate a Batch Run: Start by creating a new batch run, which requires specifying a name and selecting a dataset.
- Dataset Selection: Upload a CSV/JSON dataset, or create a dataset from historical data using filters such as time range, prompt template logs, scores, and metadata.
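As an illustration, a minimal CSV dataset for an evaluation run could be built like this. The column names `input` and `expected_output` are assumptions for the sketch, not a required schema:

```python
import csv

# Hypothetical evaluation rows; the columns "input" and
# "expected_output" are illustrative, not a required schema.
rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "What is 2 + 2?", "expected_output": "4"},
]

with open("eval_dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "expected_output"])
    writer.writeheader()
    writer.writerows(rows)
```

The same rows could equally be saved as a JSON array of objects; the important part is that each row supplies the variables your pipeline steps will consume.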
Setting up the Pipeline
Adding Steps
Click ‘Add Step’ to start building your pipeline; each column represents a step in the evaluation process. Steps execute in order from left to right, so if a column depends on a previous column, make sure it appears to the right of that dependency.
Common Step Types
- Prompt Template: Select a prompt template from the registry, set model parameters, LLM, arguments, and template version.
- Custom API Endpoint: Define a URL to send and receive data, suitable for custom evaluators or external systems.
- Human Input: Engage human graders by adding a step that allows for textual input.
- String Comparison: Use this step to compare the outputs of two previous steps, showing a visual diff when relevant.
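A string-comparison step of this kind can be sketched with Python's standard `difflib`; this is only an approximation of the behavior, and the product's actual diff rendering will differ:

```python
import difflib

def diff_strings(a: str, b: str) -> str:
    """Return a unified diff of two step outputs, line by line."""
    diff = difflib.unified_diff(
        a.splitlines(), b.splitlines(),
        fromfile="step_a", tofile="step_b", lineterm="",
    )
    return "\n".join(diff)

# Compare two hypothetical step outputs.
out = diff_strings("The cat sat.\nOn the mat.", "The cat sat.\nOn a mat.")
print(out)
```

Lines prefixed with `-` appear only in the first output and lines prefixed with `+` only in the second, which is the same information a visual diff surfaces in the spreadsheet view.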