autoevals
AutoEvals is a tool to quickly and easily evaluate AI model outputs.
Quickstart
pip install autoevals
Example
from autoevals.llm import *
# Create a new LLM-based evaluator
evaluator = Factuality()
# Evaluate an example LLM completion
input = "Which country has the highest population?"
output = "People's Republic of China"
expected = "China"
result = evaluator(output, expected, input=input)
# The evaluator returns a score from [0,1] and includes the raw outputs from the evaluator
print(f"Factuality score: {result.score}")
print(f"Factuality metadata: {result.metadata['rationale']}")
autoevals.llm
LLMClassifier Objects
class LLMClassifier(OpenAILLMClassifier)
An LLM-based classifier that wraps OpenAILLMClassifier and provides a standard way to apply chain of thought, parse the output, and score the result.
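For example, a custom classifier can be defined from a prompt template and a mapping from choices to scores. This is a minimal sketch; the template, choice labels, and example values are illustrative, not part of the library:
from autoevals.llm import LLMClassifier

# Hypothetical prompt template: decides whether a title summarizes an article.
prompt_template = """
You are evaluating whether a title accurately summarizes an article.

Article: {{input}}
Title: {{output}}

Answer 1 if the title is accurate, and 2 if it is not.
"""

# Map each choice the model can return to a score in [0, 1].
classifier = LLMClassifier(
    name="TitleQuality",
    prompt_template=prompt_template,
    choice_scores={"1": 1, "2": 0},
    use_cot=True,  # ask the model to reason before choosing an answer
)

result = classifier(output="A short title", input="The full article text...")
print(result.score)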
Battle Objects
class Battle(SpecFileClassifier)
Test whether an output performs the instructions better than the original (expected) value.
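A minimal usage sketch; passing the instructions through an instructions keyword is an assumption here, as is the example data:
from autoevals.llm import Battle

evaluator = Battle()

# Assumption: the instructions are supplied as a template variable named `instructions`.
result = evaluator(
    instructions="Add the following numbers: 1, 2, 3",
    output="6",
    expected="600",
)
print(result.score)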
ClosedQA Objects
class ClosedQA(SpecFileClassifier)
Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.
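A minimal usage sketch; passing the criteria through a criteria keyword is an assumption based on the description above, and the example values are made up:
from autoevals.llm import ClosedQA

evaluator = ClosedQA()

# Assumption: extra keyword arguments (here, `criteria`) are substituted into the prompt.
result = evaluator(
    input="What is the capital of France?",
    output="Paris",
    criteria="The answer must be a single city name.",
)
print(result.score)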
Humor Objects
class Humor(SpecFileClassifier)
Test whether an output is funny.
Factuality Objects
class Factuality(SpecFileClassifier)
Test whether an output is factual, compared to an original (expected) value.
Possible Objects
class Possible(SpecFileClassifier)
Test whether an output is a possible solution to the challenge posed in the input.
Security Objects
class Security(SpecFileClassifier)
Test whether an output is malicious.
Sql Objects
class Sql(SpecFileClassifier)
Test whether a SQL query is semantically the same as a reference (expected) query.
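For example, two queries that differ only in syntax should score highly. The schema and queries below are made up for illustration:
from autoevals.llm import Sql

evaluator = Sql()

result = evaluator(
    input="How many users signed up in 2023?",
    output="SELECT COUNT(*) FROM users WHERE signup_year = 2023",
    expected="SELECT COUNT(id) FROM users WHERE signup_year = 2023",
)
print(result.score)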
Summary Objects
class Summary(SpecFileClassifier)
Test whether an output is a better summary of the input than the original (expected) value.
Translation Objects
class Translation(SpecFileClassifier)
Test whether an output is as good a translation of the input in the specified language as an expert (expected) value.
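A minimal usage sketch; supplying the target language through a language keyword is an assumption here, and the example sentences are illustrative:
from autoevals.llm import Translation

evaluator = Translation()

# Assumption: the target language is passed as a template variable named `language`.
result = evaluator(
    language="French",
    input="Hello, how are you?",
    output="Bonjour, comment allez-vous ?",
    expected="Bonjour, comment vas-tu ?",
)
print(result.score)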
autoevals.string
Levenshtein Objects
class Levenshtein(Scorer)
A simple scorer that uses the Levenshtein distance to compare two strings.
LevenshteinScorer is an alias for Levenshtein, kept for backwards compatibility.
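For example (the strings are arbitrary):
from autoevals.string import Levenshtein

scorer = Levenshtein()

# The score is 1 for identical strings and decreases with edit distance.
result = scorer(output="kitten", expected="sitting")
print(result.score)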
EmbeddingSimilarity Objects
class EmbeddingSimilarity(Scorer)
A simple scorer that uses cosine similarity to compare two strings.
__init__
def __init__(prefix="",
model=MODEL,
expected_min=0.7,
api_key=None,
base_url=None)
Create a new EmbeddingSimilarity scorer.
Arguments:
prefix: A prefix to prepend to the prompt. This is useful for specifying the domain of the inputs.
model: The model to use for the embedding distance. Defaults to "text-embedding-ada-002".
expected_min: The minimum expected score. Defaults to 0.7. Values below this will be scored as 0, and values between this and 1 will be scaled linearly.
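A minimal usage sketch; it assumes an OpenAI API key is available in the environment, and the prefix and example strings are made up:
from autoevals.string import EmbeddingSimilarity

scorer = EmbeddingSimilarity(prefix="Customer support response: ", expected_min=0.5)

result = scorer(
    output="Your refund has been processed.",
    expected="We have issued your refund.",
)
print(result.score)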
autoevals.number
NumericDiff Objects
class NumericDiff(Scorer)
A simple scorer that compares numbers by normalizing their difference.
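For example:
from autoevals.number import NumericDiff

scorer = NumericDiff()

# Close numbers score near 1; large relative differences score near 0.
result = scorer(output=104, expected=100)
print(result.score)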
autoevals.json
JSONDiff Objects
class JSONDiff(Scorer)
A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).
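For example, comparing two similar objects (the values are arbitrary):
from autoevals.json import JSONDiff

scorer = JSONDiff()

result = scorer(
    output={"name": "Alice", "age": 30},
    expected={"name": "Alicia", "age": 31},
)
print(result.score)  # between 0 and 1, combining string and numeric similarity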