autoevals

AutoEvals is a tool to quickly and easily evaluate AI model outputs.

Quickstart

npm install autoevals

Example

Use AutoEvals to model-grade an example LLM completion using the Factuality prompt.

```typescript
import { Factuality } from "autoevals";

(async () => {
  const input = "Which country has the highest population?";
  const output = "People's Republic of China";
  const expected = "China";

  const result = await Factuality({ output, expected, input });
  console.log(`Factuality score: ${result.score}`);
  console.log(`Factuality metadata: ${result.metadata.rationale}`);
})();
```
Functions
Battle

▸ Battle(args): Score | Promise<Score>

Test whether an output better performs the instructions than the original (expected) value.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ instructions: string }>> |

Returns

Score | Promise<Score>
ClosedQA

▸ ClosedQA(args): Score | Promise<Score>

Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ criteria: any; input: string }>> |

Returns

Score | Promise<Score>
Factuality

▸ Factuality(args): Score | Promise<Score>

Test whether an output is factual, compared to an original (expected) value.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ expected?: string; input: string; output: string }>> |

Returns

Score | Promise<Score>
Humor

▸ Humor(args): Score | Promise<Score>

Test whether an output is funny.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{}>> |

Returns

Score | Promise<Score>
JSONDiff

▸ JSONDiff(args): Score | Promise<Score>

A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, { numberScorer?: Scorer<number, {}>; stringScorer?: Scorer<string, {}> }> |

Returns

Score | Promise<Score>
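The recursive combination of per-type scorers can be sketched as follows. This is an illustrative reimplementation, not the library's code: it scores strings by exact match (where the real default is Levenshtein), numbers by a normalized difference, and objects and arrays by averaging over the union of their keys; autoevals' actual traversal and weighting may differ.

```typescript
// Illustrative sketch of a JSONDiff-style scorer combining per-type scorers.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

// Stand-in for a NumericDiff-style number scorer (assumed normalization).
function numberScore(a: number, b: number): number {
  if (a === b) return 1;
  return 1 - Math.abs(a - b) / Math.max(Math.abs(a), Math.abs(b));
}

function jsonScore(output: Json, expected: Json): number {
  if (typeof output === "number" && typeof expected === "number") {
    return numberScore(output, expected);
  }
  if (typeof output === "string" && typeof expected === "string") {
    return output === expected ? 1 : 0; // stand-in for a string scorer
  }
  if (
    output !== null &&
    expected !== null &&
    typeof output === "object" &&
    typeof expected === "object"
  ) {
    // Average the scores over the union of keys (works for arrays too,
    // since their indices are their keys).
    const keys = Array.from(
      new Set([...Object.keys(output), ...Object.keys(expected)]),
    );
    if (keys.length === 0) return 1;
    let total = 0;
    for (const k of keys) {
      const a = (output as any)[k];
      const b = (expected as any)[k];
      total += a === undefined || b === undefined ? 0 : jsonScore(a, b);
    }
    return total / keys.length;
  }
  // Mismatched or other primitive types: exact match or nothing.
  return output === expected ? 1 : 0;
}
```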
LLMClassifierFromSpec

▸ LLMClassifierFromSpec<RenderArgs>(name, spec): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name |
---|
RenderArgs |

Parameters

Name | Type |
---|---|
name | string |
spec | ModelGradedSpec |

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>
LLMClassifierFromSpecFile

▸ LLMClassifierFromSpecFile<RenderArgs>(name, templateName): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name |
---|
RenderArgs |

Parameters

Name | Type |
---|---|
name | string |
templateName | "battle" \| "closed_q_a" \| "factuality" \| "humor" \| "possible" \| "security" \| "sql" \| "summary" \| "translation" |

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>
LLMClassifierFromTemplate

▸ LLMClassifierFromTemplate<RenderArgs>(«destructured»): Scorer<string, LLMClassifierArgs<RenderArgs>>

Type parameters

Name |
---|
RenderArgs |

Parameters

Name | Type |
---|---|
«destructured» | Object |
› choiceScores | Record<string, number> |
› model? | string |
› name | string |
› promptTemplate | string |
› temperature? | number |
› useCoT? | boolean |

Returns

Scorer<string, LLMClassifierArgs<RenderArgs>>
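The choiceScores map assigns a numeric score to each choice the grader model can select; after classification, the chosen label is looked up in that map. A minimal sketch of that final step (the throw-on-unknown-choice behavior is an assumption for illustration, not the library's documented behavior):

```typescript
// Map a grader model's selected choice to a numeric score using a
// choiceScores table, as passed to LLMClassifierFromTemplate.
function scoreFromChoice(
  choiceScores: Record<string, number>,
  choice: string,
): number {
  if (!(choice in choiceScores)) {
    // Assumption: real handling of unknown choices may differ.
    throw new Error(`Unknown choice: ${choice}`);
  }
  return choiceScores[choice];
}

// Hypothetical table for a yes / partial / no grading scheme.
const choiceScores = { Y: 1, P: 0.5, N: 0 };
```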
LevenshteinScorer

▸ LevenshteinScorer(args): Score | Promise<Score>

A simple scorer that uses the Levenshtein distance to compare two strings.

Parameters

Name | Type |
---|---|
args | Object |
args.expected? | string |
args.output | string |

Returns

Score | Promise<Score>
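For reference, a self-contained sketch of a Levenshtein-based scorer. The normalization shown (one minus the edit distance divided by the longer string's length) is a common convention and an assumption here, not taken from autoevals' source:

```typescript
// Single-row dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => i);
  for (let j = 1; j <= b.length; j++) {
    let prev = dp[0]; // distance(a[0..0], b[0..j-1])
    dp[0] = j;
    for (let i = 1; i <= a.length; i++) {
      const tmp = dp[i];
      dp[i] = Math.min(
        dp[i] + 1, // deletion
        dp[i - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[a.length];
}

// Normalize into a 0..1 score (assumed convention).
function levenshteinScore(output: string, expected: string): number {
  const maxLen = Math.max(output.length, expected.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(output, expected) / maxLen;
}
```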
NumericDiff

▸ NumericDiff(args): Score | Promise<Score>

A simple scorer that compares numbers by normalizing their difference.

Parameters

Name | Type |
---|---|
args | Object |
args.expected? | number |
args.output | number |

Returns

Score | Promise<Score>
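One way to normalize a numeric difference into a 0-to-1 score is to divide by the larger magnitude and clamp at zero. This sketch is an assumption about the normalization, not autoevals' exact formula:

```typescript
// Normalized numeric difference: 1 for equal values, trending toward 0
// as the relative difference grows. Clamping at 0 is an assumption.
function numericDiffScore(output: number, expected: number): number {
  if (output === expected) return 1; // also covers 0 vs 0
  const denom = Math.max(Math.abs(output), Math.abs(expected));
  return Math.max(0, 1 - Math.abs(output - expected) / denom);
}
```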
OpenAIClassifier

▸ OpenAIClassifier<RenderArgs, Output>(args): Promise<Score>

Type parameters

Name |
---|
RenderArgs |
Output |

Parameters

Name | Type |
---|---|
args | ScorerArgs<Output, OpenAIClassifierArgs<RenderArgs>> |

Returns

Promise<Score>
Possible

▸ Possible(args): Score | Promise<Score>

Test whether an output is a possible solution to the challenge posed in the input.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |

Returns

Score | Promise<Score>
Security

▸ Security(args): Score | Promise<Score>

Test whether an output is malicious.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{}>> |

Returns

Score | Promise<Score>
Sql

▸ Sql(args): Score | Promise<Score>

Test whether a SQL query is semantically the same as a reference (output) query.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |

Returns

Score | Promise<Score>
Summary

▸ Summary(args): Score | Promise<Score>

Test whether an output is a better summary of the input than the original (expected) value.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |

Returns

Score | Promise<Score>
Translation

▸ Translation(args): Score | Promise<Score>

Test whether an output is as good a translation of the input in the specified language as an expert (expected) value.

Parameters

Name | Type |
---|---|
args | ScorerArgs<any, LLMClassifierArgs<{ input: string ; language: string }>> |

Returns

Score | Promise<Score>
buildClassificationFunctions

▸ buildClassificationFunctions(useCoT): { description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } }; required: string[]; title: string = "FunctionResponse"; type: string = "object" } }[]

Parameters

Name | Type |
---|---|
useCoT | boolean |

Returns

{ description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } }; required: string[]; title: string = "FunctionResponse"; type: string = "object" } }[]
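Reading the return type above as a literal value may be easier: the function produces an array of OpenAI function-calling schemas like the sketch below. The contents of required and any effect of useCoT beyond the documented fields are assumptions, since this documentation only records the default values shown:

```typescript
// Literal rendering of the documented return type.
// Assumptions: required is ["choice"]; useCoT's effect on the schema is
// not described here, so the parameter is accepted but unused.
function buildClassificationFunctions(useCoT: boolean) {
  void useCoT; // effect not documented; see note above
  return [
    {
      name: "select_choice",
      description: "Call this function to select a choice.",
      parameters: {
        title: "FunctionResponse",
        type: "object",
        properties: {
          choice: {
            title: "Choice",
            description: "The choice",
            type: "string",
          },
        },
        required: ["choice"],
      },
    },
  ];
}
```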
Type Aliases

LLMClassifierArgs

Ƭ LLMClassifierArgs<RenderArgs>: { model?: string; useCoT?: boolean } & LLMArgs & RenderArgs

Type parameters

Name |
---|
RenderArgs |
OpenAIClassifierArgs

Ƭ OpenAIClassifierArgs<RenderArgs>: { cache?: ChatCache; choiceScores: Record<string, number>; classificationFunctions: ChatCompletionCreateParams.Function[]; messages: ChatCompletionMessage[]; model: string; name: string } & LLMArgs & RenderArgs

Type parameters

Name |
---|
RenderArgs |
Scorer

Ƭ Scorer<Output, Extra>: ((args: ScorerArgs<Output, Extra>) => Promise<Score>) | ((args: ScorerArgs<Output, Extra>) => Score)

Type parameters

Name |
---|
Output |
Extra |
ScorerArgs

Ƭ ScorerArgs<Output, Extra>: { expected?: Output; output: Output } & Extra

Type parameters

Name |
---|
Output |
Extra |
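Because Scorer and ScorerArgs are plain types, writing a custom scorer only requires matching their shape. The sketch below uses a minimal stand-in for the Score interface (the real one has more fields, such as metadata) and a hypothetical ExactMatch scorer whose Extra type adds a caseSensitive option:

```typescript
// Minimal stand-in for autoevals' Score interface (assumption: the real
// interface carries additional fields such as metadata and error).
interface Score {
  name: string;
  score: number | null;
}

// Mirrors the ScorerArgs and Scorer type aliases documented above.
type ScorerArgs<Output, Extra> = { expected?: Output; output: Output } & Extra;
type Scorer<Output, Extra> = (
  args: ScorerArgs<Output, Extra>,
) => Score | Promise<Score>;

// A hypothetical custom scorer: exact string match, with a
// caseSensitive flag supplied through the Extra type parameter.
const ExactMatch = ({
  output,
  expected,
  caseSensitive,
}: ScorerArgs<string, { caseSensitive?: boolean }>): Score => {
  if (expected === undefined) {
    return { name: "ExactMatch", score: null };
  }
  const [a, b] = caseSensitive
    ? [output, expected]
    : [output.toLowerCase(), expected.toLowerCase()];
  return { name: "ExactMatch", score: a === b ? 1 : 0 };
};

// Type-level check: ExactMatch satisfies the Scorer shape.
const _check: Scorer<string, { caseSensitive?: boolean }> = ExactMatch;
```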
Variables
templates
• Const templates: Object
Type declaration
Name | Type |
---|---|
battle | string |
closed_q_a | string |
factuality | string |
humor | string |
possible | string |
security | string |
sql | string |
summary | string |
translation | string |