braintrust
An isomorphic JS library for logging data to Braintrust. braintrust is distributed as a library on NPM. It is also open source and available on GitHub.
Quickstart
Install the library with npm (or yarn).
npm install braintrust
Then, run a simple experiment with the following code. Before running it, set the BRAINTRUST_API_KEY environment variable to your Braintrust API key:
import { Eval } from "braintrust";

// Scorer: returns 1 if the task output exactly matches the expected value, otherwise 0.
function isEqual({ output, expected }: { output: string; expected?: string }) {
  return { name: "is_equal", score: output === expected ? 1 : 0 };
}

Eval("Say Hi Bot", {
  data: () => {
    return [
      {
        input: "Foo",
        expected: "Hi Foo",
      },
      {
        input: "Bar",
        expected: "Hello Bar",
      },
    ]; // Replace with your eval dataset
  },
  task: (input: string) => {
    return "Hi " + input; // Replace with your LLM call
  },
  scores: [isEqual],
});
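Before sending anything to Braintrust, it can help to sanity-check the dataset, task, and scorer locally. The sketch below does not use the library. `isEqual`, the data, and the task are copied from the quickstart. `Case`, `Score`, and `localDryRun` are hypothetical names: the helper calls the task on each case and averages every scorer's results by name.

```typescript
// Hypothetical local harness; it does not contact Braintrust.
type Case = { input: string; expected?: string };
type Score = { name: string; score: number };

// Same scorer as in the quickstart.
function isEqual({ output, expected }: { output: string; expected?: string }): Score {
  return { name: "is_equal", score: output === expected ? 1 : 0 };
}

const data: Case[] = [
  { input: "Foo", expected: "Hi Foo" },
  { input: "Bar", expected: "Hello Bar" },
];

const task = (input: string): string => "Hi " + input;

// Run the task on every case, then average each scorer's results by score name.
function localDryRun(
  cases: Case[],
  run: (input: string) => string,
  scorers: Array<(args: { input: string; output: string; expected?: string }) => Score>,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const c of cases) {
    const output = run(c.input);
    for (const scorer of scorers) {
      const { name, score } = scorer({ ...c, output });
      totals[name] = (totals[name] ?? 0) + score;
    }
  }
  const averages: Record<string, number> = {};
  for (const [name, total] of Object.entries(totals)) {
    averages[name] = total / cases.length;
  }
  return averages;
}

console.log(localDryRun(data, task, [isEqual])); // { is_equal: 0.5 }
```

The second case fails on purpose ("Hi Bar" is not "Hello Bar"), so the mean is 0.5.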
Classes
Interfaces
- DataSummary
- DatasetSummary
- Evaluator
- ExperimentSummary
- LogOptions
- MetricSummary
- ObjectMetadata
- ParentExperimentIds
- ParentProjectLogIds
- ScoreSummary
- Span
Functions
BaseExperiment
▸ BaseExperiment<Input, Expected, Metadata>(options?): BaseExperiment<Input, Expected, Metadata>
Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).
Type parameters
Name | Type |
---|---|
Input | unknown |
Expected | unknown |
Metadata | extends BaseMetadata = void |
Parameters
Name | Type | Description |
---|---|---|
options | Object | |
options.name? | string | The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps). |
Returns
BaseExperiment<Input, Expected, Metadata>
Eval
▸ Eval<Input, Output, Expected, Metadata>(name, evaluator): Promise<ExperimentSummary>
Type parameters
Name | Type |
---|---|
Input | Input |
Output | Output |
Expected | Expected |
Metadata | extends BaseMetadata = void |
Parameters
Name | Type |
---|---|
name | string |
evaluator | Evaluator <Input , Output , Expected , Metadata > |
Returns
Promise<ExperimentSummary>
_internalGetGlobalState
▸ _internalGetGlobalState(): BraintrustState
Returns
BraintrustState
_internalSetInitialState
▸ _internalSetInitialState(): void
Returns
void
currentExperiment
▸ currentExperiment(): Experiment | undefined
Returns the currently-active experiment (set by braintrust.init). Returns undefined if no current experiment has been set.
Returns
Experiment | undefined
currentLogger
▸ currentLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush> | undefined
Returns the currently-active logger (set by braintrust.initLogger). Returns undefined if no current logger has been set.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean |
Parameters
Name | Type |
---|---|
options? | AsyncFlushArg <IsAsyncFlush > |
Returns
Logger<IsAsyncFlush> | undefined
currentSpan
▸ currentSpan(): Span
Returns the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.
See Span for full details.
Returns
Span
getSpanParentObject
▸ getSpanParentObject<IsAsyncFlush>(options?): Span | Experiment | Logger<IsAsyncFlush>
Mainly for internal use. Return the parent object for starting a span in a global context.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean |
Parameters
Name | Type |
---|---|
options? | AsyncFlushArg <IsAsyncFlush > |
Returns
Span | Experiment | Logger<IsAsyncFlush>
init
▸ init<IsOpen>(options): InitializedExperiment<IsOpen>
Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean = false |
Parameters
Name | Type | Description |
---|---|---|
options | Readonly <FullInitOptions <IsOpen >> | Options for configuring init(). |
Returns
InitializedExperiment<IsOpen>
The newly created Experiment.
▸ init<IsOpen>(project, options?): InitializedExperiment<IsOpen>
Legacy form of init, which accepts the project name as the first parameter, separately from the remaining options. See init(options) for full details.
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean = false |
Parameters
Name | Type |
---|---|
project | string |
options? | Readonly <InitOptions <IsOpen >> |
Returns
InitializedExperiment<IsOpen>
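Both call forms of `init` carry the same information. To show how the legacy positional form relates to the options form, here is a hypothetical `normalizeInitArgs` built with TypeScript overloads. `SketchInitOptions` is a trimmed stand-in for `FullInitOptions` that keeps only a few of its fields. This is an illustration, not the library's implementation.

```typescript
// Trimmed stand-in for FullInitOptions (only a few fields kept for illustration).
type SketchInitOptions = {
  project?: string;
  experiment?: string;
  description?: string;
};

// Overloads mirror init(options) and the legacy init(project, options?).
function normalizeInitArgs(options: Readonly<SketchInitOptions>): SketchInitOptions;
function normalizeInitArgs(
  project: string,
  options?: Readonly<Omit<SketchInitOptions, "project">>,
): SketchInitOptions;
function normalizeInitArgs(
  projectOrOptions: string | Readonly<SketchInitOptions>,
  options: Readonly<Omit<SketchInitOptions, "project">> = {},
): SketchInitOptions {
  if (typeof projectOrOptions === "string") {
    // Legacy form: fold the positional project name into the options object.
    return { ...options, project: projectOrOptions };
  }
  // Options form: the project name is already one of the options.
  return { ...projectOrOptions };
}

const a = normalizeInitArgs({ project: "my-project", experiment: "baseline" });
const b = normalizeInitArgs("my-project", { experiment: "baseline" });
console.log(a.project === b.project && a.experiment === b.experiment); // true
```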
initDataset
▸ initDataset<IsLegacyDataset>(options): Dataset<IsLegacyDataset>
Create a new dataset in a specified project. If the project does not exist, it will be created.
Type parameters
Name | Type |
---|---|
IsLegacyDataset | extends boolean = true |
Parameters
Name | Type | Description |
---|---|---|
options | Readonly <FullInitDatasetOptions <IsLegacyDataset >> | Options for configuring initDataset(). |
Returns
Dataset<IsLegacyDataset>
The newly created Dataset.
▸ initDataset<IsLegacyDataset>(project, options?): Dataset<IsLegacyDataset>
Legacy form of initDataset, which accepts the project name as the first parameter, separately from the remaining options. See initDataset(options) for full details.
Type parameters
Name | Type |
---|---|
IsLegacyDataset | extends boolean = true |
Parameters
Name | Type |
---|---|
project | string |
options? | Readonly <InitDatasetOptions <IsLegacyDataset >> |
Returns
Dataset<IsLegacyDataset>
initExperiment
▸ initExperiment<IsOpen>(options): InitializedExperiment<IsOpen>
Alias for init(options).
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean = false |
Parameters
Name | Type |
---|---|
options | Readonly <InitOptions <IsOpen >> |
Returns
InitializedExperiment<IsOpen>
▸ initExperiment<IsOpen>(project, options?): InitializedExperiment<IsOpen>
Alias for init(project, options).
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean = false |
Parameters
Name | Type |
---|---|
project | string |
options? | Readonly <InitOptions <IsOpen >> |
Returns
InitializedExperiment<IsOpen>
initLogger
▸ initLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush>
Create a new logger in a specified project. If the project does not exist, it will be created.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean = false |
Parameters
Name | Type | Description |
---|---|---|
options? | Readonly <InitLoggerOptions <IsAsyncFlush >> | Options for configuring initLogger(). |
Returns
Logger<IsAsyncFlush>
The newly created Logger.
log
▸ log(event): string
Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
Parameters
Name | Type | Description |
---|---|---|
event | ExperimentLogFullArgs | The event to log. See Experiment.log for full details. |
Returns
string
The id of the logged event.
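`log` hands back an id right away while the upload happens later. Below is a minimal sketch of that batching idea. It assumes an in-memory queue and a caller-supplied `upload` callback. `BatchingLoggerSketch`, `LogEvent`, and the `event-N` id format are hypothetical and do not describe Braintrust internals.

```typescript
type LogEvent = { output: unknown; scores: Record<string, number | null>; id?: string };
type QueuedEvent = LogEvent & { id: string };

class BatchingLoggerSketch {
  private queue: QueuedEvent[] = [];
  private counter = 0;

  constructor(private upload: (batch: QueuedEvent[]) => void) {}

  // Record the event locally and return its id without waiting for any I/O.
  log(event: LogEvent): string {
    const id = event.id ?? `event-${++this.counter}`;
    this.queue.push({ ...event, id });
    return id;
  }

  // Send everything queued so far as a single batch; do nothing if the queue is empty.
  flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.upload(batch);
  }
}

const uploaded: string[][] = [];
const logger = new BatchingLoggerSketch((batch) => uploaded.push(batch.map((e) => e.id)));
const first = logger.log({ output: "Hi Foo", scores: { is_equal: 1 } });
const second = logger.log({ output: "Hi Bar", scores: { is_equal: 0 }, id: "custom-id" });
logger.flush();
console.log(first, second); // event-1 custom-id
```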
login
▸ login(options?): Promise<void>
Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrustdata.com/app/token. This method is called automatically by init().
Parameters
Name | Type | Description |
---|---|---|
options | Object | Options for configuring login(). |
options.apiKey? | string | The API key to use. If it is not specified, the BRAINTRUST_API_KEY environment variable is used. If neither is available, the user is prompted to log in. |
options.appUrl? | string | The URL of the Braintrust App. Defaults to https://www.braintrustdata.com. |
options.forceLogin? | boolean | Log in again, even if you have already logged in. By default, this function returns early if you are already logged in. |
options.orgName? | string | (Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations. |
Returns
Promise<void>
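The `apiKey` and `appUrl` options follow a simple fallback order. The sketch below illustrates that order with two hypothetical helpers, `resolveApiKey` and `resolveAppUrl`. The environment is passed in as a plain object so the code runs without Node typings. A result of `undefined` stands for the case where the user would be prompted.

```typescript
type LoginOptionsSketch = { apiKey?: string; appUrl?: string };

const DEFAULT_APP_URL = "https://www.braintrustdata.com";

// Explicit option first, then BRAINTRUST_API_KEY; undefined means "prompt the user".
function resolveApiKey(
  options: LoginOptionsSketch,
  env: Record<string, string | undefined>,
): string | undefined {
  return options.apiKey ?? env["BRAINTRUST_API_KEY"];
}

// Explicit option first, then the documented default URL.
function resolveAppUrl(options: LoginOptionsSketch): string {
  return options.appUrl ?? DEFAULT_APP_URL;
}

const env = { BRAINTRUST_API_KEY: "from-env" };
console.log(resolveApiKey({ apiKey: "explicit" }, env)); // explicit
console.log(resolveApiKey({}, env)); // from-env
console.log(resolveApiKey({}, {})); // undefined
console.log(resolveAppUrl({})); // https://www.braintrustdata.com
```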
startSpan
▸ startSpan<IsAsyncFlush>(args?): Span
Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations where you cannot use callbacks. However, spans started with startSpan will not be marked as the "current span", so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead.
See traced for full details.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean = false |
Parameters
Name | Type |
---|---|
args? | StartSpanArgs & AsyncFlushArg <IsAsyncFlush > |
Returns
Span
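The difference between `startSpan` and `traced` depends on whether the span is registered as current while your code runs. The sketch below models that with a module-level stack. `SpanSketch`, `startSpanSketch`, `tracedSketch`, and `currentSpanSketch` are hypothetical names, not the library's API. The no-op fallback imitates what `currentSpan` is documented to return when no span is active.

```typescript
class SpanSketch {
  constructor(public readonly name: string) {}
  log(_event: Record<string, unknown>): void {
    // A real span would record the event; this sketch ignores it.
  }
}

// Shared stand-in for NOOP_SPAN: same interface, does no logging.
const NOOP_SPAN_SKETCH = new SpanSketch("noop");

const spanStack: SpanSketch[] = [];

function currentSpanSketch(): SpanSketch {
  return spanStack[spanStack.length - 1] ?? NOOP_SPAN_SKETCH;
}

// Creates a span but never marks it as current.
function startSpanSketch(name: string): SpanSketch {
  return new SpanSketch(name);
}

// Marks the span as current for the duration of the callback.
function tracedSketch<R>(callback: (span: SpanSketch) => R, name: string): R {
  const span = new SpanSketch(name);
  spanStack.push(span);
  try {
    return callback(span);
  } finally {
    spanStack.pop();
  }
}

const manual = startSpanSketch("manual");
console.log(currentSpanSketch() === manual); // false
tracedSketch((span) => {
  console.log(currentSpanSketch() === span); // true
}, "wrapped");
console.log(currentSpanSketch().name); // noop
```

A plain stack like this only works for synchronous callbacks. Tracking the current span across `await` needs something like Node's `AsyncLocalStorage`, which this sketch does not attempt.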
summarize
▸ summarize(options?): Promise<ExperimentSummary>
Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
Parameters
Name | Type | Description |
---|---|---|
options | Object | Options for summarizing the experiment. |
options.comparisonExperimentId? | string | The experiment to compare against. If unspecified, the most recent experiment on the origin's main branch will be used. |
options.summarizeScores? | boolean | Whether to summarize the scores. If false, only the metadata will be returned. |
Returns
Promise<ExperimentSummary>
A summary of the experiment, including the scores (compared to the closest reference experiment) and metadata.
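Comparing against a reference experiment comes down to pairing up cases and counting which scores went up or down. Below is a hypothetical `compareScores` that does this for per-case scores keyed by a case id. The field names (`score`, `diff`, `improvements`, `regressions`) are illustrative. They do not claim to match the exact shape of `ScoreSummary`.

```typescript
type CaseScores = Record<string, number>; // case id -> score in [0, 1]

type ScoreComparison = {
  score: number; // mean score of the current experiment
  diff: number; // current mean minus reference mean, over the cases both share
  improvements: number;
  regressions: number;
};

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

function compareScores(current: CaseScores, reference: CaseScores): ScoreComparison {
  // Only cases present in both experiments can be compared one-to-one.
  const shared = Object.keys(current).filter((id) => id in reference);
  let improvements = 0;
  let regressions = 0;
  for (const id of shared) {
    if (current[id] > reference[id]) improvements++;
    else if (current[id] < reference[id]) regressions++;
  }
  return {
    score: mean(Object.values(current)),
    diff: mean(shared.map((id) => current[id])) - mean(shared.map((id) => reference[id])),
    improvements,
    regressions,
  };
}

const comparison = compareScores({ foo: 1, bar: 1 }, { foo: 0, bar: 1 });
console.log(comparison.diff, comparison.improvements, comparison.regressions); // 0.5 1 0
```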
traced
▸ traced<IsAsyncFlush, R>(callback, args?): PromiseUnless<IsAsyncFlush, R>
Top-level function for starting a span. It checks the following, in order of precedence:
- Currently-active span
- Currently-active experiment
- Currently-active logger
and creates a span under the first one that is active. If none of these are active, it returns a no-op span object.
See Span.traced for full details.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean = false |
R | void |
Parameters
Name | Type |
---|---|
callback | (span : Span ) => R |
args? | StartSpanArgs & SetCurrentArg & AsyncFlushArg <IsAsyncFlush > |
Returns
PromiseUnless<IsAsyncFlush, R>
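The precedence rule `traced` follows (span, then experiment, then logger, then a no-op) is a first-match lookup. The sketch below uses placeholder types. `ParentSketch` and `pickParent` are hypothetical, not library exports.

```typescript
type ParentSketch = { kind: "span" | "experiment" | "logger" | "noop"; name: string };

const NOOP_PARENT: ParentSketch = { kind: "noop", name: "noop" };

// Return the first active parent in precedence order: span, experiment, logger.
function pickParent(active: {
  span?: ParentSketch;
  experiment?: ParentSketch;
  logger?: ParentSketch;
}): ParentSketch {
  return active.span ?? active.experiment ?? active.logger ?? NOOP_PARENT;
}

const logger: ParentSketch = { kind: "logger", name: "my-project logs" };
const experiment: ParentSketch = { kind: "experiment", name: "baseline" };
console.log(pickParent({ logger, experiment }).kind); // experiment
console.log(pickParent({ logger }).kind); // logger
console.log(pickParent({}).kind); // noop
```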
withDataset
▸ withDataset<R, IsLegacyDataset>(project, callback, options?): R
This function is deprecated. Use initDataset instead.
Type parameters
Name | Type |
---|---|
R | R |
IsLegacyDataset | extends boolean = true |
Parameters
Name | Type |
---|---|
project | string |
callback | (dataset : Dataset <IsLegacyDataset >) => R |
options | Readonly <InitDatasetOptions <IsLegacyDataset >> |
Returns
R
withExperiment
▸ withExperiment<R>(project, callback, options?): R
This function is deprecated. Use init instead.
Type parameters
Name |
---|
R |
Parameters
Name | Type |
---|---|
project | string |
callback | (experiment : Experiment ) => R |
options | Readonly <{ apiKey? : string ; appUrl? : string ; baseExperiment? : string ; baseExperimentId? : string ; dataset? : AnyDataset ; description? : string ; experiment? : string ; gitMetadataSettings? : GitMetadataSettings ; isPublic? : boolean ; metadata? : Record <string , unknown > ; orgName? : string ; projectId? : string ; repoInfo? : RepoInfo ; setCurrent? : boolean ; update? : boolean } & InitOpenOption <false > & SetCurrentArg > |
Returns
R
withLogger
▸ withLogger<IsAsyncFlush, R>(callback, options?): R
This function is deprecated. Use initLogger instead.
Type parameters
Name | Type |
---|---|
IsAsyncFlush | extends boolean = false |
R | void |
Parameters
Name | Type |
---|---|
callback | (logger : Logger <IsAsyncFlush >) => R |
options | Readonly <{ apiKey? : string ; appUrl? : string ; forceLogin? : boolean ; orgName? : string ; projectId? : string ; projectName? : string ; setCurrent? : boolean } & AsyncFlushArg <IsAsyncFlush > & SetCurrentArg > |
Returns
R
wrapOpenAI
▸ wrapOpenAI<T>(openai): T
Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is not configured, this is a no-op. Currently, this only supports the v4 API.
Type parameters
Name | Type |
---|---|
T | extends object |
Parameters
Name | Type |
---|---|
openai | T |
Returns
T
The wrapped OpenAI object.
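Because the wrapper returns the same type `T`, a natural way to add tracing is to intercept property access. The following sketch does that with a `Proxy`. `wrapWithTracing`, the `CallRecord` shape, and the fake client are all hypothetical. A real tracing wrapper would also capture outputs and timing, while this one records only the call path and arguments.

```typescript
type CallRecord = { path: string; args: unknown[] };

// Recursively proxy nested objects and record every method call by its dotted path.
function wrapWithTracing<T extends object>(target: T, records: CallRecord[], path = ""): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      const name = path ? `${path}.${String(prop)}` : String(prop);
      if (typeof value === "function") {
        return (...args: unknown[]) => {
          records.push({ path: name, args });
          return value.apply(obj, args);
        };
      }
      if (value !== null && typeof value === "object") {
        return wrapWithTracing(value, records, name);
      }
      return value;
    },
  });
}

// Fake client shaped loosely like `client.chat.completions.create(...)`.
const fakeClient = {
  chat: {
    completions: {
      create(request: { model: string; prompt: string }) {
        return { model: request.model, text: `echo: ${request.prompt}` };
      },
    },
  },
};

const records: CallRecord[] = [];
const wrapped = wrapWithTracing(fakeClient, records);
const result = wrapped.chat.completions.create({ model: "test-model", prompt: "hi" });
console.log(result.text); // echo: hi
console.log(records[0].path); // chat.completions.create
```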
wrapOpenAIv4
▸ wrapOpenAIv4<T>(openai): T
Type parameters
Name | Type |
---|---|
T | extends OpenAILike |
Parameters
Name | Type |
---|---|
openai | T |
Returns
T
Type Aliases
AnyDataset
Ƭ AnyDataset: Dataset<boolean>
BaseExperiment
Ƭ BaseExperiment<Input, Expected, Metadata>: Object
Type parameters
Name | Type |
---|---|
Input | Input |
Expected | Expected |
Metadata | extends BaseMetadata = DefaultMetadataType |
Type declaration
Name | Type |
---|---|
_phantom? | [Input , Expected , Metadata ] |
_type | "BaseExperiment" |
name? | string |
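The `_type` literal lets code tell a base-experiment reference apart from ordinary data, and `_phantom` only carries the generic parameters at the type level. Below is a sketch of how such a discriminant might be consumed. It uses a simplified local copy of the type (`BaseExperimentRef`). `DataSource`, `isBaseExperiment`, and `describeData` are hypothetical helpers.

```typescript
// Simplified local copy of the BaseExperiment type alias documented above.
type BaseExperimentRef<Input, Expected, Metadata = void> = {
  _phantom?: [Input, Expected, Metadata];
  _type: "BaseExperiment";
  name?: string;
};

type Row<Input, Expected> = { input: Input; expected?: Expected };

type DataSource<Input, Expected> =
  | (() => Row<Input, Expected>[])
  | BaseExperimentRef<Input, Expected>;

// Runtime check on the `_type` discriminant.
function isBaseExperiment(value: unknown): value is BaseExperimentRef<unknown, unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { _type?: unknown })._type === "BaseExperiment"
  );
}

function describeData<Input, Expected>(data: DataSource<Input, Expected>): string {
  if (isBaseExperiment(data)) {
    return data.name ? `base experiment "${data.name}"` : "automatically chosen base experiment";
  }
  return `inline data with ${data().length} row(s)`;
}

const inline: DataSource<string, string> = () => [{ input: "Foo", expected: "Hi Foo" }];
const base: DataSource<string, string> = { _type: "BaseExperiment", name: "previous-run" };
console.log(describeData(inline)); // inline data with 1 row(s)
console.log(describeData(base)); // base experiment "previous-run"
```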
BaseMetadata
Ƭ BaseMetadata: Record<string, unknown> | void
CommentEvent
Ƭ CommentEvent: IdField & { _audit_metadata?: Record<string, unknown>; _audit_source: Source; comment: { text: string }; created: string; origin: { id: string } } & Omit<ParentExperimentIds | ParentProjectLogIds, "kind">
DatasetRecord
Ƭ DatasetRecord<IsLegacyDataset>: IsLegacyDataset extends true ? LegacyDatasetRecord : NewDatasetRecord
Type parameters
Name | Type |
---|---|
IsLegacyDataset | extends boolean = typeof DEFAULT_IS_LEGACY_DATASET |
DefaultMetadataType
Ƭ DefaultMetadataType: void
EndSpanArgs
Ƭ EndSpanArgs: Object
Type declaration
Name | Type |
---|---|
endTime? | number |
EvalCase
Ƭ EvalCase<Input, Expected, Metadata>: { input: Input } & (Expected extends void ? {} : { expected: Expected }) & (Metadata extends void ? {} : { metadata: Metadata })
Type parameters
Name |
---|
Input |
Expected |
Metadata |
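Because of the conditional types, `expected` and `metadata` appear on a case only when their type parameters are not `void`. The sketch below uses a locally re-declared copy, `EvalCaseSketch`, with parentheses that make the grouping explicit. The example constants are illustrative.

```typescript
type EvalCaseSketch<Input, Expected, Metadata> = { input: Input } &
  (Expected extends void ? {} : { expected: Expected }) &
  (Metadata extends void ? {} : { metadata: Metadata });

// No expected value and no metadata: only `input` is required.
const inputOnly: EvalCaseSketch<string, void, void> = { input: "Foo" };

// Expected output but no metadata.
const withExpected: EvalCaseSketch<string, string, void> = { input: "Foo", expected: "Hi Foo" };

// Both fields present; omitting either one would be a compile-time error.
const full: EvalCaseSketch<string, string, { difficulty: string }> = {
  input: "Bar",
  expected: "Hello Bar",
  metadata: { difficulty: "hard" },
};

console.log(Object.keys(inputOnly), Object.keys(withExpected), Object.keys(full));
```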
EvalScorerArgs
Ƭ EvalScorerArgs<Input, Output, Expected, Metadata>: EvalCase<Input, Expected, Metadata> & { output: Output }
Type parameters
Name | Type |
---|---|
Input | Input |
Output | Output |
Expected | Expected |
Metadata | extends BaseMetadata = DefaultMetadataType |
EvalTask
Ƭ EvalTask<Input, Output>: ((input: Input, hooks: EvalHooks) => Promise<Output>) | ((input: Input, hooks: EvalHooks) => Output)
Type parameters
Name |
---|
Input |
Output |
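`EvalTask` is a union of a synchronous and an asynchronous signature, so a harness can handle both the same way by awaiting the result. The sketch below uses a local copy of the type. `EvalHooksSketch` is a hypothetical empty placeholder for `EvalHooks`, whose members are not documented here, and `runTask` is an illustrative helper.

```typescript
// Placeholder for EvalHooks (its members are not documented in this reference).
type EvalHooksSketch = Record<string, never>;

type EvalTaskSketch<Input, Output> =
  | ((input: Input, hooks: EvalHooksSketch) => Promise<Output>)
  | ((input: Input, hooks: EvalHooksSketch) => Output);

const syncTask: EvalTaskSketch<string, string> = (input: string) => "Hi " + input;
const asyncTask: EvalTaskSketch<string, string> = async (input: string) => "Hi " + input;

// `await` accepts both a plain value and a promise, so one code path serves both forms.
async function runTask(task: EvalTaskSketch<string, string>, input: string): Promise<string> {
  return await task(input, {});
}

runTask(syncTask, "Foo").then((out) => console.log(out)); // Hi Foo
runTask(asyncTask, "Bar").then((out) => console.log(out)); // Hi Bar
```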
ExperimentLogFullArgs
Ƭ ExperimentLogFullArgs: Partial<Omit<OtherExperimentLogFields, "output" | "scores">> & Required<Pick<OtherExperimentLogFields, "output" | "scores">> & Partial<InputField | InputsField> & Partial<IdField>
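Taken as a whole, `ExperimentLogFullArgs` makes `output` and `scores` mandatory and leaves every other field optional, including `input`/`inputs` and `id`. A local re-declaration shows what the compiler accepts. It copies the field list from `OtherExperimentLogFields` and is only a sketch. `LogFullArgsSketch` and the example events are illustrative.

```typescript
type OtherFields = {
  datasetRecordId: string;
  expected: unknown;
  metadata: Record<string, unknown>;
  metrics: Record<string, unknown>;
  output: unknown;
  scores: Record<string, number | null>;
};
type InputField = { input: unknown };
type InputsField = { inputs: unknown };
type IdField = { id: string };

type LogFullArgsSketch = Partial<Omit<OtherFields, "output" | "scores">> &
  Required<Pick<OtherFields, "output" | "scores">> &
  Partial<InputField | InputsField> &
  Partial<IdField>;

// Minimal valid event: only `output` and `scores` are required.
// Omitting either of them would be a compile-time error.
const minimal: LogFullArgsSketch = { output: "Hi Foo", scores: { is_equal: 1 } };

// A fuller event; individual scores may be null.
const detailed: LogFullArgsSketch = {
  input: "Bar",
  output: "Hi Bar",
  expected: "Hello Bar",
  scores: { is_equal: 0, fluency: null },
  metadata: { model: "example-model" },
  id: "row-2",
};

console.log(Object.keys(minimal).length, Object.keys(detailed).length); // 2 6
```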
ExperimentLogPartialArgs
Ƭ ExperimentLogPartialArgs: Partial<OtherExperimentLogFields> & Partial<InputField | InputsField>
FullInitOptions
Ƭ FullInitOptions<IsOpen>: { project?: string } & InitOptions<IsOpen>
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean |
IdField
Ƭ IdField: Object
Type declaration
Name | Type |
---|---|
id | string |
InitOptions
Ƭ InitOptions<IsOpen>: { apiKey?: string; appUrl?: string; baseExperiment?: string; baseExperimentId?: string; dataset?: AnyDataset; description?: string; experiment?: string; gitMetadataSettings?: GitMetadataSettings; isPublic?: boolean; metadata?: Record<string, unknown>; orgName?: string; projectId?: string; repoInfo?: RepoInfo; setCurrent?: boolean; update?: boolean } & InitOpenOption<IsOpen>
Type parameters
Name | Type |
---|---|
IsOpen | extends boolean |
InputField
Ƭ InputField: Object
Type declaration
Name | Type |
---|---|
input | unknown |
InputsField
Ƭ InputsField: Object
Type declaration
Name | Type |
---|---|
inputs | unknown |
LogCommentFullArgs
Ƭ LogCommentFullArgs: IdField & { _audit_metadata?: Record<string, unknown>; _audit_source: Source; comment: { text: string }; created: string; origin: { id: string } } & Omit<ParentExperimentIds | ParentProjectLogIds, "kind">
LogFeedbackFullArgs
Ƭ LogFeedbackFullArgs: IdField & Partial<Omit<OtherExperimentLogFields, "output" | "metrics" | "datasetRecordId"> & { comment: string; source: Source }>
OtherExperimentLogFields
Ƭ OtherExperimentLogFields: Object
Type declaration
Name | Type |
---|---|
datasetRecordId | string |
expected | unknown |
metadata | Record <string , unknown > |
metrics | Record <string , unknown > |
output | unknown |
scores | Record <string , number | null > |
PromiseUnless
Ƭ PromiseUnless<B, R>: B extends true ? R : Promise<Awaited<R>>
Type parameters
Name |
---|
B |
R |
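`PromiseUnless` is what lets `traced` return either the callback's value directly or a promise, depending on `IsAsyncFlush`. The sketch below pairs a local copy of the alias with a hypothetical `runWithFlushMode` helper that selects the return shape from a boolean argument. The double cast is needed because TypeScript cannot narrow a conditional return type from a runtime check on a generic boolean.

```typescript
type PromiseUnless<B, R> = B extends true ? R : Promise<Awaited<R>>;

function runWithFlushMode<B extends boolean, R>(
  asyncFlush: B,
  callback: () => R,
): PromiseUnless<B, R> {
  const result = callback();
  // true: hand back the value as-is; false: wrap it in a promise.
  return (asyncFlush ? result : Promise.resolve(result)) as unknown as PromiseUnless<B, R>;
}

const direct = runWithFlushMode(true, () => 42); // inferred type: number
const deferred = runWithFlushMode(false, () => 42); // inferred type: Promise<number>
console.log(direct); // 42
deferred.then((value) => console.log(value)); // 42
```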
SetCurrentArg
Ƭ SetCurrentArg: Object
Type declaration
Name | Type |
---|---|
setCurrent? | boolean |
StartSpanArgs
Ƭ StartSpanArgs: Object
Type declaration
Name | Type |
---|---|
event? | StartSpanEventArgs |
name? | string |
parentId? | string |
spanAttributes? | Record <any , any > |
startTime? | number |
WithTransactionId
Ƭ WithTransactionId<R>: R & { _xact_id: TransactionId }
Type parameters
Name |
---|
R |
Variables
NOOP_SPAN
• Const NOOP_SPAN: NoopSpan