The overall process of building an evaluation pipeline looks like this:
Select Your Dataset: Choose or upload datasets to serve as the basis for your evaluations, whether for scoring, regression testing, or bulk job processing.
Build Your Pipeline: Visually construct your evaluation pipeline, defining each step from input data processing to final evaluation.
Run Evaluations: Execute your pipeline, observe the results in a spreadsheet-like interface, and make informed decisions based on comprehensive metrics and scores.
Initiate a Batch Run: Start by creating a new batch run, which requires specifying a name and selecting a dataset.
Dataset Selection: Upload a CSV/JSON dataset, or create a dataset from historical data using filters like time range, prompt template logs, scores, and metadata.
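For the upload path, the sketch below shows one way a small CSV dataset could be prepared before uploading. The column names `question` and `expected_answer` and the helper `to_csv` are hypothetical placeholders for illustration; this guide does not prescribe a schema, so treat the layout as an assumption.

```python
import csv
import io

# Hypothetical rows: the column names are placeholders, not a required schema.
rows = [
    {"question": "What is 2 + 2?", "expected_answer": "4"},
    {"question": "What is the capital of France?", "expected_answer": "Paris"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
```

Writing `csv_text` to a file gives one header row followed by one line per dataset row.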
You now have a pipeline. Preview mode lets you iterate with live feedback and adjust steps in real time.
If every cell in the last step of your evaluation pipeline is a boolean, or every cell is numeric, that value is considered the score for the row. Your full evaluation report will include a scorecard showing the average of this last step.
NOTE: All cells in the last column must be boolean, or all must be numeric. If any cell deviates, the score will not be calculated.
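The scoring rule can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the function name `scorecard` is hypothetical, and counting `True` as 1 and `False` as 0 when averaging booleans is an assumption.

```python
from numbers import Real
from typing import Optional, Sequence


def scorecard(last_column: Sequence[object]) -> Optional[float]:
    """Average the last column, or return None when no score can be calculated."""
    if not last_column:
        return None
    # bool is a subclass of int in Python, so test for booleans first.
    if all(isinstance(cell, bool) for cell in last_column):
        # Assumption: True counts as 1 and False as 0, giving a pass rate.
        return sum(1.0 if cell else 0.0 for cell in last_column) / len(last_column)
    if all(isinstance(cell, Real) and not isinstance(cell, bool) for cell in last_column):
        return sum(float(cell) for cell in last_column) / len(last_column)
    # Mixed booleans and numbers, or any other type: no score.
    return None
```

Under these assumptions, `scorecard([True, False, True, True])` returns `0.75`, while a mixed column such as `[True, 2]` returns `None`, mirroring the case where the score is not calculated.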