Eval Types
This page provides an overview of the various evaluation column types available on our platform.
Primary Types
Prompt Template
The Prompt Template evaluation type allows you to execute a prompt template from the Prompt Registry. You can select the latest version, a specific label, or a particular version of the template, and you can map its input variables to the dataset or to other columns. You can also override the model parameters that are set in the Prompt Registry. This functionality is particularly useful for testing a prompt template within a larger evaluation pipeline, comparing different model parameters, or implementing an “LLM as a judge” prompt template.
Custom API Endpoint
The Custom API Endpoint evaluation type lets you register a webhook that our system calls with a POST request whenever a cell in that column is executed. Because cells are processed sequentially, the endpoint receives every column to the left of it, and the value returned by your endpoint is displayed in the cell. This feature allows for extensive customization to accommodate specific use cases and integrate with external systems or custom evaluators.
The payload contains the values of all columns to the left of the endpoint column.
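The exact request schema isn’t reproduced here; as a purely hypothetical illustration, a grid with columns named `question` and `response` might receive a body like:

```json
{
  "question": "What is the capital of France?",
  "response": "Paris is the capital of France."
}
```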
Human Input
The Human Input evaluation type adds a numeric or text input column where an evaluator can provide feedback via a slider or a text box. This input can then be used by subsequent columns in the evaluation pipeline, allowing you to incorporate human judgment.
Code Execution
The Code Execution evaluation type allows you to write and execute code for each row in your dataset. You can access the row’s data through the `data` variable and return the cell value. Note that stdout will be ignored; only the returned value is used.
Code example to return a list of the names of each column:
Python Runtime
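A minimal sketch, assuming `data` is a dictionary mapping column names to the current row’s values:

```python
# `data` maps column names to the current row's values (assumed shape).
# The returned value becomes this cell's value; stdout is ignored.
return list(data.keys())
```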
JavaScript Runtime
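The equivalent sketch in JavaScript, under the same assumption about the shape of `data`:

```javascript
// `data` maps column names to the current row's values (assumed shape).
// The returned value becomes this cell's value.
return Object.keys(data);
```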
Simple Evals
Equality Comparison
Equality Comparison allows you to compare two different columns as strings. It provides a visual diff if there is a difference between the columns. Note that the diff is not used when calculating the score; the column is treated as a boolean for scoring purposes. If there is no difference, the column returns `true`.
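Conceptually, the check is a plain string comparison (the helper name below is hypothetical):

```python
def equality_comparison(a, b):
    # The visual diff is for display only; the score is just this boolean.
    return str(a) == str(b)
```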
Contains Value
The Contains evaluation type enables you to search for a substring within a column. For instance, you could search for a specific word or phrase within each cell in the column. It uses the Python `in` operator to check whether the substring is in the cell, and the check is case-insensitive.
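A minimal sketch, assuming case-insensitivity is achieved by lowercasing both sides before the `in` check:

```python
def contains_value(cell, substring):
    # Case-insensitive substring check via Python's `in` operator.
    return substring.lower() in str(cell).lower()
```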
Regex Match
The Regex Match evaluation type allows you to define a regular expression pattern to search within the column. This provides powerful pattern matching capabilities for complex text analysis tasks.
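For example, you could flag rows whose output contains an ISO-style date (the pattern below is illustrative):

```python
import re

# True if the cell contains a YYYY-MM-DD date anywhere in the text.
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
matched = bool(pattern.search("Shipped on 2024-05-17"))  # True
```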
Absolute Numeric Distance
The Absolute Numeric Distance evaluation type allows you to select two different columns and output the absolute distance between their numeric values in a new column. Both source columns must contain numeric values.
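The computation is simply the absolute difference between the two parsed numbers:

```python
# Both source columns must parse as numbers.
distance = abs(float("3.5") - float("5"))  # 1.5
```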
LLM Evals
Run LLM Assertion
The LLM Assertion evaluation type enables you to run an assertion on a column using natural language prompts. You can create prompts such as “Does this contain an API key?”, “Is this sensitive content?”, or “Is this in English?”. Our system uses a backend prompt template that processes your assertion and returns either true or false. Assertions should be framed as questions.
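The backend prompt template itself is internal to the platform, but conceptually the assertion works like the sketch below, which poses the question to an LLM judge (the model and prompt wording are assumptions):

```python
from openai import OpenAI

client = OpenAI()

def llm_assertion(assertion: str, value: str) -> bool:
    # Illustrative only: the platform's actual judge prompt is internal.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {"role": "system",
             "content": "Answer the question about the provided text "
                        "with exactly 'true' or 'false'."},
            {"role": "user",
             "content": f"Question: {assertion}\n\nText: {value}"},
        ],
    )
    return response.choices[0].message.content.strip().lower() == "true"
```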
Cosine Similarity
Cosine Similarity allows you to compare the vector distance between two columns. The system takes the two columns you supply, converts them into strings, and embeds them using OpenAI embeddings. It then calculates the cosine similarity, resulting in a number between 0 and 1. This metric is useful for understanding how semantically similar two bodies of text are, which can be valuable in assessing topic adherence or content similarity.
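A sketch of the computation described above; the specific embedding model shown is an assumption, not necessarily what the platform uses:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def cosine_similarity(text_a: str, text_b: str) -> float:
    # Embed both strings, then compute the cosine of the angle
    # between the two embedding vectors.
    result = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model
        input=[str(text_a), str(text_b)],
    )
    a = np.array(result.data[0].embedding)
    b = np.array(result.data[1].embedding)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```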
Helper Functions
JSON Extraction
The JSON Extraction evaluation type allows you to define a JSON path and extract either the first match or all matches for that path. We will automatically cast the source column into a JSON object. This is particularly useful for parsing structured data within your evaluations.
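The platform’s implementation isn’t shown here; the sketch below illustrates the same idea using the jsonpath-ng library:

```python
import json
from jsonpath_ng import parse

cell = '{"choices": [{"message": {"content": "Paris"}}]}'
data = json.loads(cell)  # the source column is cast to a JSON object

matches = [m.value for m in parse("choices[0].message.content").find(data)]
first_match = matches[0] if matches else None  # "Paris"
```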
Parse Value
The Parse Value column type enables you to convert another column into one of the following value types: string, number, Boolean, or JSON.
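Conceptually, the conversion is a cast to the chosen type; a minimal sketch (the function name and the accepted boolean spellings are assumptions):

```python
import json

def parse_value(raw: str, target: str):
    # Cast the raw cell text to the requested type.
    if target == "string":
        return str(raw)
    if target == "number":
        return float(raw)
    if target == "boolean":
        return raw.strip().lower() in ("true", "1", "yes")  # assumed forms
    if target == "json":
        return json.loads(raw)
    raise ValueError(f"Unsupported type: {target}")
```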
Static Value
The Static Value evaluation type allows you to pre-populate a column with a specific value. This is useful for adding constant values or context that you may need to use later in one of the other columns in your evaluation pipeline.
Type Validation
Type Validation returns a boolean indicating whether the given source column fits one of the specified types. The types supported for validation are JSON, number, and SQL. It will return `true` if the value is valid for the specified type, and `false` otherwise. For SQL validation, the system utilizes the SQLGlot library.
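A sketch of what each check might look like (the platform’s exact rules may differ); SQLGlot raises a ParseError for SQL it cannot parse:

```python
import json
import sqlglot
from sqlglot.errors import ParseError

def type_validation(value: str, expected: str) -> bool:
    # Returns True if `value` parses as the expected type, else False.
    try:
        if expected == "json":
            json.loads(value)
        elif expected == "number":
            float(value)
        elif expected == "sql":
            sqlglot.parse_one(value)  # raises ParseError on invalid SQL
        else:
            return False
        return True
    except (ValueError, ParseError):
        return False
```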
Coalesce
The Coalesce evaluation type allows you to take multiple different columns and coalesce them, returning the first non-null value, similar to SQL’s COALESCE function.
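A minimal sketch; whether empty strings are also skipped is an assumption:

```python
def coalesce(*values):
    # Return the first value that is not None (and, assumed, not empty).
    return next((v for v in values if v is not None and v != ""), None)

coalesce(None, "", "fallback")  # "fallback"
```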
Count
The Count evaluation type allows you to select a source column and count either the characters, words, or paragraphs within it. This will output a numeric value, which can be useful for analyzing the length or complexity of LLM outputs.
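A sketch of the three counting modes; the exact delimiters (e.g., what separates paragraphs) are assumptions:

```python
def count(text: str, unit: str) -> int:
    if unit == "characters":
        return len(text)
    if unit == "words":
        return len(text.split())  # whitespace-delimited words (assumed)
    if unit == "paragraphs":
        # Paragraphs assumed to be separated by blank lines.
        return len([p for p in text.split("\n\n") if p.strip()])
    raise ValueError(f"Unsupported unit: {unit}")
```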
Please reach out to us if you have any other evaluation types you would like to see on the platform. We are always looking to expand our evaluation capabilities to better serve your needs.