braintrust
An isomorphic JS library for logging data to Braintrust. braintrust is distributed as a library on NPM.
It is also open source and available on GitHub.
Quickstart
Install the library with npm (or yarn).

    npm install braintrust

Then, run a simple experiment with the following code (replace YOUR_API_KEY with
your Braintrust API key):

    import { Eval } from "braintrust";

    function isEqual({ output, expected }: { output: string; expected?: string }) {
      return { name: "is_equal", score: output === expected ? 1 : 0 };
    }

    Eval("Say Hi Bot", {
      data: () => {
        return [
          {
            input: "Foo",
            expected: "Hi Foo",
          },
          {
            input: "Bar",
            expected: "Hello Bar",
          },
        ]; // Replace with your eval dataset
      },
      task: (input: string) => {
        return "Hi " + input; // Replace with your LLM call
      },
      scores: [isEqual],
    });

Classes
Interfaces
- DataSummary
- DatasetSummary
- Evaluator
- ExperimentSummary
- LogOptions
- MetricSummary
- ObjectMetadata
- ParentExperimentIds
- ParentProjectLogIds
- ScoreSummary
- Span
Functions
BaseExperiment
▸ BaseExperiment<Input, Expected, Metadata>(options?): BaseExperiment<Input, Expected, Metadata>
Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).
Type parameters
| Name | Type |
|---|---|
Input | unknown |
Expected | unknown |
Metadata | extends BaseMetadata = void |
Parameters
| Name | Type | Description |
|---|---|---|
options | Object | |
options.name? | string | The name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps). |
Returns
BaseExperiment<Input, Expected, Metadata>
Eval
▸ Eval<Input, Output, Expected, Metadata>(name, evaluator): Promise<ExperimentSummary>
Type parameters
| Name | Type |
|---|---|
Input | Input |
Output | Output |
Expected | Expected |
Metadata | extends BaseMetadata = void |
Parameters
| Name | Type |
|---|---|
name | string |
evaluator | Evaluator<Input, Output, Expected, Metadata> |
Returns
Promise<ExperimentSummary>
_internalGetGlobalState
▸ _internalGetGlobalState(): BraintrustState
Returns
BraintrustState
_internalSetInitialState
▸ _internalSetInitialState(): void
Returns
void
currentExperiment
▸ currentExperiment(): Experiment | undefined
Returns the currently-active experiment (set by braintrust.init). Returns undefined if no current experiment has been set.
Returns
Experiment | undefined
currentLogger
▸ currentLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush> | undefined
Returns the currently-active logger (set by braintrust.initLogger). Returns undefined if no current logger has been set.
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean |
Parameters
| Name | Type |
|---|---|
options? | AsyncFlushArg<IsAsyncFlush> |
Returns
Logger<IsAsyncFlush> | undefined
currentSpan
▸ currentSpan(): Span
Returns the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.
See Span for full details.
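The no-op fallback described above means callers never need a null check before logging. The following is a minimal sketch of that pattern only, not the library's implementation: `MiniSpan`, `RecordingSpan`, `NOOP_MINI_SPAN`, and `currentSpanSketch` are hypothetical names invented for illustration, and the real Span interface has more members.

```typescript
// Hypothetical, trimmed-down span interface (an assumption for this sketch).
interface MiniSpan {
  log(event: Record<string, unknown>): void;
  end(): void;
}

// A span that keeps every event it is given.
class RecordingSpan implements MiniSpan {
  public readonly events: Record<string, unknown>[] = [];
  log(event: Record<string, unknown>): void {
    this.events.push(event);
  }
  end(): void {}
}

// A no-op span: same interface, but every method does nothing.
const NOOP_MINI_SPAN: MiniSpan = {
  log(): void {},
  end(): void {},
};

// Stand-in for currentSpan(): fall back to the no-op span when nothing is active.
function currentSpanSketch(active?: MiniSpan): MiniSpan {
  return active ?? NOOP_MINI_SPAN;
}

const recording = new RecordingSpan();
currentSpanSketch(recording).log({ output: "Hi Foo" }); // stored by the recording span
currentSpanSketch().log({ output: "ignored" }); // accepted and silently dropped
```

Because both objects satisfy the same interface, the calling code is identical whether or not a span is active.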
Returns
Span
getSpanParentObject
▸ getSpanParentObject<IsAsyncFlush>(options?): Span | Experiment | Logger<IsAsyncFlush>
Mainly for internal use. Returns the parent object for starting a span in a global context.
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean |
Parameters
| Name | Type |
|---|---|
options? | AsyncFlushArg<IsAsyncFlush> |
Returns
Span | Experiment | Logger<IsAsyncFlush>
init
▸ init<IsOpen>(options): InitializedExperiment<IsOpen>
Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean = false |
Parameters
| Name | Type | Description |
|---|---|---|
options | Readonly<FullInitOptions<IsOpen>> | Options for configuring init(). |
Returns
InitializedExperiment<IsOpen>
The newly created Experiment.
▸ init<IsOpen>(project, options?): InitializedExperiment<IsOpen>
Legacy form of init which accepts the project name as the first parameter,
separately from the remaining options. See init(options) for full details.
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean = false |
Parameters
| Name | Type |
|---|---|
project | string |
options? | Readonly<InitOptions<IsOpen>> |
Returns
InitializedExperiment<IsOpen>
initDataset
▸ initDataset<IsLegacyDataset>(options): Dataset<IsLegacyDataset>
Create a new dataset in a specified project. If the project does not exist, it will be created.
Type parameters
| Name | Type |
|---|---|
IsLegacyDataset | extends boolean = true |
Parameters
| Name | Type | Description |
|---|---|---|
options | Readonly<FullInitDatasetOptions<IsLegacyDataset>> | Options for configuring initDataset(). |
Returns
Dataset<IsLegacyDataset>
The newly created Dataset.
▸ initDataset<IsLegacyDataset>(project, options?): Dataset<IsLegacyDataset>
Legacy form of initDataset which accepts the project name as the first
parameter, separately from the remaining options. See
initDataset(options) for full details.
Type parameters
| Name | Type |
|---|---|
IsLegacyDataset | extends boolean = true |
Parameters
| Name | Type |
|---|---|
project | string |
options? | Readonly<InitDatasetOptions<IsLegacyDataset>> |
Returns
Dataset<IsLegacyDataset>
initExperiment
▸ initExperiment<IsOpen>(options): InitializedExperiment<IsOpen>
Alias for init(options).
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean = false |
Parameters
| Name | Type |
|---|---|
options | Readonly<InitOptions<IsOpen>> |
Returns
InitializedExperiment<IsOpen>
▸ initExperiment<IsOpen>(project, options?): InitializedExperiment<IsOpen>
Alias for init(project, options).
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean = false |
Parameters
| Name | Type |
|---|---|
project | string |
options? | Readonly<InitOptions<IsOpen>> |
Returns
InitializedExperiment<IsOpen>
initLogger
▸ initLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush>
Create a new logger in a specified project. If the project does not exist, it will be created.
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean = false |
Parameters
| Name | Type | Description |
|---|---|---|
options | Readonly<InitLoggerOptions<IsAsyncFlush>> | Options for configuring initLogger(). |
Returns
Logger<IsAsyncFlush>
The newly created Logger.
log
▸ log(event): string
Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.
Parameters
| Name | Type | Description |
|---|---|---|
event | ExperimentLogFullArgs | The event to log. See Experiment.log for full details. |
Returns
string
The id of the logged event.
login
▸ login(options?): Promise<void>
Log in to Braintrust. If no API key is available, this will prompt you for your API token, which you can find at
https://www.braintrustdata.com/app/token. This method is called automatically by init().
Parameters
| Name | Type | Description |
|---|---|---|
options | Object | Options for configuring login(). |
options.apiKey? | string | The API key to use. If the parameter is not specified, will try to use the BRAINTRUST_API_KEY environment variable. If no API key is specified, will prompt the user to login. |
options.appUrl? | string | The URL of the Braintrust App. Defaults to https://www.braintrustdata.com. |
options.forceLogin? | boolean | Log in again, even if you have already logged in. By default, this function exits quickly if you are already logged in. |
options.orgName? | string | (Optional) The name of a specific organization to connect to. This is useful if you belong to multiple organizations. |
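The description of options.apiKey implies a three-step fallback: the explicit option, then the BRAINTRUST_API_KEY environment variable, then an interactive prompt. A minimal sketch of that order follows. It is an illustration only: `resolveApiKeySource` and `KeySource` are hypothetical names, the environment is passed in as a plain object, and nothing here performs a real login.

```typescript
// Hypothetical labels for where the key would come from.
type KeySource = "option" | "environment" | "prompt";

// Sketch of the documented fallback order; not the library's code.
function resolveApiKeySource(
  options: { apiKey?: string },
  env: Record<string, string | undefined>,
): KeySource {
  if (options.apiKey !== undefined) return "option"; // explicit parameter wins
  if (env["BRAINTRUST_API_KEY"] !== undefined) return "environment"; // then the env var
  return "prompt"; // otherwise the user is asked to log in
}

const fromOption = resolveApiKeySource({ apiKey: "key-a" }, { BRAINTRUST_API_KEY: "key-b" });
const fromEnv = resolveApiKeySource({}, { BRAINTRUST_API_KEY: "key-b" });
const fromPrompt = resolveApiKeySource({}, {});
```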
Returns
Promise<void>
startSpan
▸ startSpan<IsAsyncFlush>(args?): Span
Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations
where you cannot use callbacks. However, spans started with startSpan will not be marked as the "current span",
so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead.
See traced for full details.
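The difference between the two entry points can be sketched with a small stack of "current" span names. This is a simplified model under stated assumptions, not the library's mechanism: `tracedSketch`, `startSpanSketch`, and `currentSpanName` are hypothetical stand-ins, and the plain array only works for synchronous callbacks.

```typescript
// Hypothetical bookkeeping for the "current span"; an assumption for this sketch.
const currentStack: string[] = [];

// Name of the innermost current span, or "noop" when none is marked current.
function currentSpanName(): string {
  return currentStack[currentStack.length - 1] ?? "noop";
}

// Callback style: the span is current for the duration of the callback.
function tracedSketch<R>(name: string, callback: () => R): R {
  currentStack.push(name);
  try {
    return callback();
  } finally {
    currentStack.pop(); // always restore the previous current span
  }
}

// Manual style: the span is created but never marked as current.
function startSpanSketch(name: string): { name: string; end: () => void } {
  return { name, end: () => {} };
}

const insideTraced = tracedSketch("traced-span", () => currentSpanName());
const manual = startSpanSketch("manual-span");
const afterStartSpan = currentSpanName();
manual.end();
```

In this model `insideTraced` sees the traced span, while `afterStartSpan` still sees no current span, which mirrors the caveat described above.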
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean = false |
Parameters
| Name | Type |
|---|---|
args? | StartSpanArgs & AsyncFlushArg<IsAsyncFlush> |
Returns
Span
summarize
▸ summarize(options?): Promise<ExperimentSummary>
Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.
Parameters
| Name | Type | Description |
|---|---|---|
options | Object | Options for summarizing the experiment. |
options.comparisonExperimentId? | string | The experiment to compare against. If unspecified, the most recent experiment on the origin's main branch will be used. |
options.summarizeScores? | boolean | Whether to summarize the scores. If false, only the metadata will be returned. |
Returns
Promise<ExperimentSummary>
A summary of the experiment, including the scores (compared to the closest reference experiment) and metadata.
traced
▸ traced<IsAsyncFlush, R>(callback, args?): PromiseUnless<IsAsyncFlush, R>
Top-level function for starting a span. It checks the following (in precedence order):
- Currently-active span
- Currently-active experiment
- Currently-active logger
and creates a span under the first one that is active. If none of these are active, it returns a no-op span object.
See Span.traced for full details.
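The precedence order listed above amounts to "the first active parent wins". A minimal sketch of that selection logic, assuming a hypothetical `ActiveState` record and `pickParent` helper (neither is part of the library):

```typescript
// Hypothetical description of a possible parent for a new span.
type Parent = { kind: "span" | "experiment" | "logger"; name: string };

// Hypothetical snapshot of what is currently active.
interface ActiveState {
  span?: Parent;
  experiment?: Parent;
  logger?: Parent;
}

// Pick the first active parent in precedence order: span, experiment, logger.
// Returns undefined when nothing is active (the no-op case).
function pickParent(state: ActiveState): Parent | undefined {
  return state.span ?? state.experiment ?? state.logger;
}

const logger: Parent = { kind: "logger", name: "my-logger" };
const experiment: Parent = { kind: "experiment", name: "my-experiment" };

const both = pickParent({ experiment, logger }); // experiment outranks logger
const onlyLogger = pickParent({ logger });
const nothing = pickParent({});
```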
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean = false |
R | void |
Parameters
| Name | Type |
|---|---|
callback | (span: Span) => R |
args? | StartSpanArgs & SetCurrentArg & AsyncFlushArg<IsAsyncFlush> |
Returns
PromiseUnless<IsAsyncFlush, R>
withDataset
▸ withDataset<R, IsLegacyDataset>(project, callback, options?): R
This function is deprecated. Use initDataset instead.
Type parameters
| Name | Type |
|---|---|
R | R |
IsLegacyDataset | extends boolean = true |
Parameters
| Name | Type |
|---|---|
project | string |
callback | (dataset: Dataset<IsLegacyDataset>) => R |
options | Readonly<InitDatasetOptions<IsLegacyDataset>> |
Returns
R
withExperiment
▸ withExperiment<R>(project, callback, options?): R
This function is deprecated. Use init instead.
Type parameters
| Name |
|---|
R |
Parameters
| Name | Type |
|---|---|
project | string |
callback | (experiment: Experiment) => R |
options | Readonly<{ apiKey?: string ; appUrl?: string ; baseExperiment?: string ; baseExperimentId?: string ; dataset?: AnyDataset ; description?: string ; experiment?: string ; gitMetadataSettings?: GitMetadataSettings ; isPublic?: boolean ; metadata?: Record<string, unknown> ; orgName?: string ; projectId?: string ; repoInfo?: RepoInfo ; setCurrent?: boolean ; update?: boolean } & InitOpenOption<false> & SetCurrentArg> |
Returns
R
withLogger
▸ withLogger<IsAsyncFlush, R>(callback, options?): R
This function is deprecated. Use initLogger instead.
Type parameters
| Name | Type |
|---|---|
IsAsyncFlush | extends boolean = false |
R | void |
Parameters
| Name | Type |
|---|---|
callback | (logger: Logger<IsAsyncFlush>) => R |
options | Readonly<{ apiKey?: string ; appUrl?: string ; forceLogin?: boolean ; orgName?: string ; projectId?: string ; projectName?: string ; setCurrent?: boolean } & AsyncFlushArg<IsAsyncFlush> & SetCurrentArg> |
Returns
R
wrapOpenAI
▸ wrapOpenAI<T>(openai): T
Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is
not configured, this is a no-op.
Currently, this only supports the v4 API.
Type parameters
| Name | Type |
|---|---|
T | extends object |
Parameters
| Name | Type |
|---|---|
openai | T |
Returns
T
The wrapped OpenAI object.
wrapOpenAIv4
▸ wrapOpenAIv4<T>(openai): T
Type parameters
| Name | Type |
|---|---|
T | extends OpenAILike |
Parameters
| Name | Type |
|---|---|
openai | T |
Returns
T
Type Aliases
AnyDataset
Ƭ AnyDataset: Dataset<boolean>
BaseExperiment
Ƭ BaseExperiment<Input, Expected, Metadata>: Object
Type parameters
| Name | Type |
|---|---|
Input | Input |
Expected | Expected |
Metadata | extends BaseMetadata = DefaultMetadataType |
Type declaration
| Name | Type |
|---|---|
_phantom? | [Input, Expected, Metadata] |
_type | "BaseExperiment" |
name? | string |
BaseMetadata
Ƭ BaseMetadata: Record<string, unknown> | void
CommentEvent
Ƭ CommentEvent: IdField & { _audit_metadata?: Record<string, unknown> ; _audit_source: Source ; comment: { text: string } ; created: string ; origin: { id: string } } & Omit<ParentExperimentIds | ParentProjectLogIds, "kind">
DatasetRecord
Ƭ DatasetRecord<IsLegacyDataset>: IsLegacyDataset extends true ? LegacyDatasetRecord : NewDatasetRecord
Type parameters
| Name | Type |
|---|---|
IsLegacyDataset | extends boolean = typeof DEFAULT_IS_LEGACY_DATASET |
DefaultMetadataType
Ƭ DefaultMetadataType: void
EndSpanArgs
Ƭ EndSpanArgs: Object
Type declaration
| Name | Type |
|---|---|
endTime? | number |
EvalCase
Ƭ EvalCase<Input, Expected, Metadata>: { input: Input } & Expected extends void ? {} : { expected: Expected } & Metadata extends void ? {} : { metadata: Metadata }
Type parameters
| Name |
|---|
Input |
Expected |
Metadata |
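To see what this alias produces, here is a local re-declaration with explicit parentheses around each conditional. The grouping is an assumption about how the printed type is meant to be read, and the alias is re-declared here only so the sketch is self-contained.

```typescript
// Local copy of the alias, with parentheses added (assumed grouping).
type EvalCase<Input, Expected, Metadata> = { input: Input } &
  (Expected extends void ? {} : { expected: Expected }) &
  (Metadata extends void ? {} : { metadata: Metadata });

// Expected is a string and Metadata is void: `expected` is required, `metadata` is absent.
const withExpected: EvalCase<string, string, void> = {
  input: "Foo",
  expected: "Hi Foo",
};

// Both Expected and Metadata are void: only `input` is required.
const inputOnly: EvalCase<string, void, void> = { input: "Bar" };

// All three supplied: every field is required.
const full: EvalCase<string, string, { source: string }> = {
  input: "Baz",
  expected: "Hi Baz",
  metadata: { source: "manual" },
};
```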
EvalScorerArgs
Ƭ EvalScorerArgs<Input, Output, Expected, Metadata>: EvalCase<Input, Expected, Metadata> & { output: Output }
Type parameters
| Name | Type |
|---|---|
Input | Input |
Output | Output |
Expected | Expected |
Metadata | extends BaseMetadata = DefaultMetadataType |
EvalTask
Ƭ EvalTask<Input, Output>: (input: Input, hooks: EvalHooks) => Promise<Output> | (input: Input, hooks: EvalHooks) => Output
Type parameters
| Name |
|---|
Input |
Output |
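The alias accepts either a synchronous or an asynchronous task. The sketch below re-declares it locally to show both shapes. `EvalHooks` is not described in this section, so it is replaced with a placeholder type, and `runTask` is a hypothetical helper rather than a library function.

```typescript
// Placeholder: the real EvalHooks type is not shown in this section.
type EvalHooks = Record<string, unknown>;

// Local copy of the alias so the sketch is self-contained.
type EvalTask<Input, Output> =
  | ((input: Input, hooks: EvalHooks) => Promise<Output>)
  | ((input: Input, hooks: EvalHooks) => Output);

// A synchronous task, like the one in the quickstart.
const syncTask: EvalTask<string, string> = (input: string) => "Hi " + input;

// An asynchronous task, e.g. one that would await an LLM call.
const asyncTask: EvalTask<string, string> = async (input: string) => "Hi " + input;

// Hypothetical runner: awaiting the result handles both shapes uniformly.
async function runTask(task: EvalTask<string, string>, input: string): Promise<string> {
  return await task(input, {});
}

const syncResult = runTask(syncTask, "Foo");
const asyncResult = runTask(asyncTask, "Bar");
```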
ExperimentLogFullArgs
Ƭ ExperimentLogFullArgs: Partial<Omit<OtherExperimentLogFields, "output" | "scores">> & Required<Pick<OtherExperimentLogFields, "output" | "scores">> & Partial<InputField | InputsField> & Partial<IdField>
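Read piece by piece, this alias makes output and scores required and everything else optional. The sketch below re-declares the building blocks locally, using the field lists given in this reference, so that the effect can be checked in isolation. It is an illustration of the type arithmetic, not an import from the library.

```typescript
// Local copies of the building-block types listed in this reference.
type OtherExperimentLogFields = {
  datasetRecordId: string;
  expected: unknown;
  metadata: Record<string, unknown>;
  metrics: Record<string, unknown>;
  output: unknown;
  scores: Record<string, number | null>;
};
type InputField = { input: unknown };
type InputsField = { inputs: unknown };
type IdField = { id: string };

type ExperimentLogFullArgs = Partial<Omit<OtherExperimentLogFields, "output" | "scores">> &
  Required<Pick<OtherExperimentLogFields, "output" | "scores">> &
  Partial<InputField | InputsField> &
  Partial<IdField>;

// Minimal valid event: only the two required fields.
const minimal: ExperimentLogFullArgs = {
  output: "Hi Foo",
  scores: { is_equal: 1 },
};

// A fuller event with some of the optional fields.
const detailed: ExperimentLogFullArgs = {
  input: "Foo",
  output: "Hi Foo",
  expected: "Hi Foo",
  scores: { is_equal: 1 },
  metadata: { model: "example" },
};
```

Omitting output or scores from either object would be a compile-time error under this definition.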
ExperimentLogPartialArgs
Ƭ ExperimentLogPartialArgs: Partial<OtherExperimentLogFields> & Partial<InputField | InputsField>
FullInitOptions
Ƭ FullInitOptions<IsOpen>: { project?: string } & InitOptions<IsOpen>
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean |
IdField
Ƭ IdField: Object
Type declaration
| Name | Type |
|---|---|
id | string |
InitOptions
Ƭ InitOptions<IsOpen>: { apiKey?: string ; appUrl?: string ; baseExperiment?: string ; baseExperimentId?: string ; dataset?: AnyDataset ; description?: string ; experiment?: string ; gitMetadataSettings?: GitMetadataSettings ; isPublic?: boolean ; metadata?: Record<string, unknown> ; orgName?: string ; projectId?: string ; repoInfo?: RepoInfo ; setCurrent?: boolean ; update?: boolean } & InitOpenOption<IsOpen>
Type parameters
| Name | Type |
|---|---|
IsOpen | extends boolean |
InputField
Ƭ InputField: Object
Type declaration
| Name | Type |
|---|---|
input | unknown |
InputsField
Ƭ InputsField: Object
Type declaration
| Name | Type |
|---|---|
inputs | unknown |
LogCommentFullArgs
Ƭ LogCommentFullArgs: IdField & { _audit_metadata?: Record<string, unknown> ; _audit_source: Source ; comment: { text: string } ; created: string ; origin: { id: string } } & Omit<ParentExperimentIds | ParentProjectLogIds, "kind">
LogFeedbackFullArgs
Ƭ LogFeedbackFullArgs: IdField & Partial<Omit<OtherExperimentLogFields, "output" | "metrics" | "datasetRecordId"> & { comment: string ; source: Source }>
OtherExperimentLogFields
Ƭ OtherExperimentLogFields: Object
Type declaration
| Name | Type |
|---|---|
datasetRecordId | string |
expected | unknown |
metadata | Record<string, unknown> |
metrics | Record<string, unknown> |
output | unknown |
scores | Record<string, number | null> |
PromiseUnless
Ƭ PromiseUnless<B, R>: B extends true ? R : Promise<Awaited<R>>
Type parameters
| Name |
|---|
B |
R |
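A short illustration of how this conditional type resolves: when B is true the result is R itself, and otherwise it is a Promise of the awaited R. The alias is re-declared locally so the sketch stands alone; the variable names are illustrative.

```typescript
// Local copy of the alias so the sketch is self-contained.
type PromiseUnless<B, R> = B extends true ? R : Promise<Awaited<R>>;

// B = true: the value is returned directly.
const direct: PromiseUnless<true, number> = 42;

// B = false: the value is wrapped in a Promise.
const wrapped: PromiseUnless<false, number> = Promise.resolve(42);

// Awaited<R> flattens nested promises, so this is Promise<number>, not Promise<Promise<number>>.
const flattened: PromiseUnless<false, Promise<number>> = Promise.resolve(7);
```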
SetCurrentArg
Ƭ SetCurrentArg: Object
Type declaration
| Name | Type |
|---|---|
setCurrent? | boolean |
StartSpanArgs
Ƭ StartSpanArgs: Object
Type declaration
| Name | Type |
|---|---|
event? | StartSpanEventArgs |
name? | string |
parentId? | string |
spanAttributes? | Record<any, any> |
startTime? | number |
WithTransactionId
Ƭ WithTransactionId<R>: R & { _xact_id: TransactionId }
Type parameters
| Name |
|---|
R |
Variables
NOOP_SPAN
• Const NOOP_SPAN: NoopSpan