Create text, image, and audio embeddings

The Resources.Embed class provides methods to create text, image, and audio embeddings.

Description: The create method creates a new embedding from the provided text, image, or audio input.

Note that you must specify at least the following parameters:

  • modelName: The name of the video understanding model to use.

  • One or more of the following input types:

    • text: For text embeddings
    • audioUrl or audioFile: For audio embeddings. If you specify both, the audioUrl parameter takes precedence.
    • imageUrl or imageFile: For image embeddings. If you specify both, the imageUrl parameter takes precedence.

You must provide at least one input type, but you can include multiple types in a single function call.

Function signature:

create(
    request: TwelvelabsApi.EmbedCreateRequest,
    requestOptions?: Embed.RequestOptions
): core.HttpResponsePromise<TwelvelabsApi.EmbeddingResponse>
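A minimal usage sketch for a text embedding follows. The request fields mirror the parameters table below; the `client` variable is assumed to be an initialized SDK client, and the call itself is shown commented out because it requires a valid API key.

```typescript
// Build a request for a text embedding. Field names follow the
// parameters table below; "Marengo-retrieval-2.7" is the documented model.
const request = {
  modelName: "Marengo-retrieval-2.7",
  text: "Man with a dog crossing the street",
  textTruncate: "end" as const, // truncate overflow past 77 tokens at the end
};

// Illustrative only -- assumes `client` is an initialized SDK client:
// const response = await client.embed.create(request);
// console.log(response.textEmbedding?.segments?.[0]?.float);
```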

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| modelName | string | Yes | The name of the video understanding model to use. Available models: "Marengo-retrieval-2.7". |
| text | string | No | The text for which you want to create an embedding. Text embeddings are limited to 77 tokens. |
| textTruncate | string | No | Specifies how to truncate text that exceeds 77 tokens. Values: start, end, none. Default: end. |
| imageUrl | string | No | The publicly accessible URL of the image for which you wish to create an embedding. Required for image embeddings if imageFile is not provided. |
| imageFile | File \| fs.ReadStream \| Blob | No | A local image file. Required for image embeddings if imageUrl is not provided. |
| audioUrl | string | No | The publicly accessible URL of the audio file for which you wish to create an embedding. Required for audio embeddings if audioFile is not provided. |
| audioFile | File \| fs.ReadStream \| Blob | No | A local audio file. Required for audio embeddings if audioUrl is not provided. |
| audioStartOffsetSec | number | No | Specifies the start time, in seconds, from which the platform generates the audio embeddings. Default: 0. |
| requestOptions | Embed.RequestOptions | No | Request-specific configuration. |

Return value: Returns an HttpResponsePromise that resolves to an EmbeddingResponse object containing the embedding results.

The EmbeddingResponse interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| modelName | string | The name of the video understanding model the platform used to create this embedding. |
| textEmbedding | TextEmbeddingResult | An object that contains the generated text embedding vector and associated information. Present when text was processed. |
| imageEmbedding | ImageEmbeddingResult | An object that contains the generated image embedding vector and associated information. Present when an image was processed. |
| audioEmbedding | AudioEmbeddingResult | An object that contains the generated audio embedding vector and associated information. Present when an audio file was processed. |
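Because textEmbedding, imageEmbedding, and audioEmbedding are each present only when the corresponding input was processed, callers should check which fields exist before reading segments. A sketch, using local interfaces that mirror the tables below (these interface names are illustrative, not SDK exports):

```typescript
// Minimal local shapes mirroring the documented response structure.
interface BaseSegmentLike {
  float: number[];
}
interface EmbeddingResponseLike {
  modelName: string;
  textEmbedding?: { segments?: BaseSegmentLike[] };
  imageEmbedding?: { segments?: BaseSegmentLike[] };
  audioEmbedding?: { segments?: (BaseSegmentLike & { startOffsetSec?: number })[] };
}

// Return the first embedding vector found, regardless of input type.
function firstVector(res: EmbeddingResponseLike): number[] | undefined {
  return (
    res.textEmbedding?.segments?.[0]?.float ??
    res.imageEmbedding?.segments?.[0]?.float ??
    res.audioEmbedding?.segments?.[0]?.float
  );
}
```

Optional chaining keeps the lookup safe when a given embedding type was not requested.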

The TextEmbeddingResult interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| errorMessage | string | Error message if the embedding generation failed. |
| segments | BaseSegment[] | An array of objects that contain the embedding. |

The ImageEmbeddingResult interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| errorMessage | string | Error message if the embedding generation failed. |
| segments | BaseSegment[] | An array of objects that contain the embedding. |
| metadata | BaseEmbeddingMetadata | Metadata about the embedding. |

The AudioEmbeddingResult interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| segments | AudioSegment[] | An array of objects that contain the embedding and its start time. |
| errorMessage | string | Error message if the embedding generation failed. |
| metadata | BaseEmbeddingMetadata | Metadata about the embedding. |

The BaseSegment interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| float | number[] | An array of floating-point numbers representing the embedding. You can use this array with cosine similarity for various downstream tasks. |
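As noted above, the float array can be compared with cosine similarity. A self-contained sketch:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vector magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A score of 1 means the vectors point in the same direction; values near 0 indicate unrelated embeddings.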

The AudioSegment interface extends BaseSegment and contains the following additional properties:

| Name | Type | Description |
|---|---|---|
| startOffsetSec | number | The start time, in seconds, from which the platform generated the audio embedding. |

The BaseEmbeddingMetadata interface contains the following properties:

| Name | Type | Description |
|---|---|---|
| inputUrl | string | The URL of the media file used to generate the embedding. Present if a URL was provided in the request. |
| inputFilename | string | The name of the media file used to generate the embedding. Present if a file was provided in the request. |

API Reference: Create text, image, and audio embeddings.

Related guide:

Error codes

This section lists the most common error messages you may encounter while creating text, image, and audio embeddings.

  • parameter_invalid
    • The text parameter is invalid. The text token length should be less than or equal to 77.
    • The text_truncate parameter is invalid. You should use one of the following values: none, start, end.