Create text, image, and audio embeddings

The resources.Embed class provides methods to create text, image, and audio embeddings.

Create text, image, and audio embeddings

Description: This method creates a new embedding.

Note that you must specify at least the following parameters:

  • model_name: The name of the video understanding model to use.

  • One or more of the following input types:

    • text: For text embeddings
    • audio_url or audio_file: For audio embeddings. If you specify both, the audio_url parameter takes precedence.
    • image_url or image_file: For image embeddings. If you specify both, the image_url parameter takes precedence.

You must provide at least one input type, but you can include multiple types in a single function call.
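
The input rules above can be sketched as a small client-side check. This is an illustrative helper only: resolve_inputs is a hypothetical name and is not part of the SDK. It assumes that a URL and a file of the same media type are resolved in favor of the URL, as described above, and that a call with no input at all should be rejected before it is sent.

```python
from typing import Any, Dict, Optional


def resolve_inputs(
    text: Optional[str] = None,
    audio_url: Optional[str] = None,
    audio_file: Any = None,
    image_url: Optional[str] = None,
    image_file: Any = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return only the inputs that would take effect."""
    resolved: Dict[str, Any] = {}
    if text is not None:
        resolved["text"] = text
    # For each media type, the URL takes precedence over the file.
    if audio_url is not None:
        resolved["audio_url"] = audio_url
    elif audio_file is not None:
        resolved["audio_file"] = audio_file
    if image_url is not None:
        resolved["image_url"] = image_url
    elif image_file is not None:
        resolved["image_file"] = image_file
    # At least one input type is required.
    if not resolved:
        raise ValueError(
            "Provide at least one of: text, audio_url, audio_file, "
            "image_url, image_file"
        )
    return resolved
```

For example, resolve_inputs(audio_url="u", audio_file="f") keeps only audio_url, and resolve_inputs() raises a ValueError.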

Function signature:

def create(
    self,
    model_name: Literal["Marengo-retrieval-2.7"],
    *,
    # text params
    text: str = None,
    text_truncate: Literal["none", "start", "end"] = None,
    # audio params
    audio_url: str = None,
    audio_file: Union[str, BinaryIO, None] = None,
    # image params
    image_url: str = None,
    image_file: Union[str, BinaryIO, None] = None,
    **kwargs,
) -> models.CreateEmbeddingsResult
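
A call that follows this signature might look like the sketch below, which requests a text embedding and an image embedding in one call. Two things here are assumptions rather than documented facts: that the Embed resource is reachable as client.embed on an already-constructed SDK client, and the wrapper name create_text_and_image_embeddings, which is hypothetical. The keyword arguments themselves come from the signature above.

```python
from typing import Any


def create_text_and_image_embeddings(client: Any, text: str, image_url: str) -> Any:
    """Request a text embedding and an image embedding in a single call."""
    # Assumption: the Embed resource is exposed as `client.embed`.
    return client.embed.create(
        model_name="Marengo-retrieval-2.7",
        text=text,
        text_truncate="end",  # one of "none", "start", "end"
        image_url=image_url,
    )
```

The wrapper takes the client as an argument so that it can be exercised with a stand-in object in tests without network access.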

Parameters:

| Name | Type | Required | Description |
|---|---|---|---|
| model_name | str | Yes | The name of the video understanding model to use. Available models: “Marengo-retrieval-2.7”. |
| text | str | No | The text for which you want to create an embedding. Text embeddings are limited to 77 tokens. |
| text_truncate | str | No | Specifies how to truncate text that exceeds 77 tokens. Values: start, end, none. Default: end. |
| image_url | str | No | The publicly accessible URL of the image for which you wish to create an embedding. Required for image embeddings if image_file is not provided. |
| image_file | core.File | No | A local image file. Required for image embeddings if image_url is not provided. |
| audio_url | str | No | The publicly accessible URL of the audio file for which you wish to create an embedding. Required for audio embeddings if audio_file is not provided. |
| audio_file | core.File | No | A local audio file. Required for audio embeddings if audio_url is not provided. |
| audio_start_offset_sec | float | No | Specifies the start time, in seconds, from which the platform generates the audio embeddings. Default: 0. |
| request_options | RequestOptions | No | Request-specific configuration. |
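
To make the text_truncate values concrete, the following sketch applies them to a list of tokens. It rests on several assumptions that this page does not confirm: that "end" drops tokens from the end of the text, that "start" drops them from the beginning, and that "none" rejects text longer than the limit. It also operates on a pre-tokenized list, because the platform's tokenizer is not described here. truncate_tokens is a hypothetical name, not an SDK function.

```python
from typing import List

MAX_TOKENS = 77  # limit stated in the parameter table


def truncate_tokens(
    tokens: List[str], mode: str = "end", limit: int = MAX_TOKENS
) -> List[str]:
    """Hypothetical illustration of the text_truncate modes."""
    if mode not in ("none", "start", "end"):
        raise ValueError("text_truncate must be one of: none, start, end")
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(tokens) <= limit:
        return list(tokens)
    if mode == "end":
        # Assumption: "end" discards the tokens past the limit.
        return tokens[:limit]
    if mode == "start":
        # Assumption: "start" discards the leading tokens.
        return tokens[-limit:]
    # Assumption: "none" means over-long text is rejected.
    raise ValueError(f"text exceeds {limit} tokens and text_truncate is 'none'")
```

With 80 tokens, "end" keeps the first 77 and "start" keeps the last 77 under these assumptions.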

Return value: Returns an EmbeddingResponse object containing the embedding results.

The EmbeddingResponse class contains the following properties:

| Name | Type | Description |
|---|---|---|
| model_name | str | The name of the video understanding model the platform has used to create this embedding. |
| text_embedding | Optional[TextEmbeddingResult] | An object that contains the generated text embedding vector and associated information. Present when text was processed. |
| image_embedding | Optional[ImageEmbeddingResult] | An object that contains the generated image embedding vector and associated information. Present when an image was processed. |
| audio_embedding | Optional[AudioEmbeddingResult] | An object that contains the generated audio embedding vector and associated information. Present when an audio file was processed. |

The TextEmbeddingResult class contains the following properties:

| Name | Type | Description |
|---|---|---|
| error_message | Optional[str] | Error message if the embedding generation failed. |
| segments | Optional[List[BaseSegment]] | An object that contains the embedding. |

The AudioEmbeddingResult class contains the following properties:

| Name | Type | Description |
|---|---|---|
| segments | Optional[List[AudioSegment]] | An object that contains the embedding and its start time. |
| error_message | Optional[str] | Error message if the embedding generation failed. |
| metadata | Optional[BaseEmbeddingMetadata] | Metadata about the embedding. |

The ImageEmbeddingResult class contains the following properties:

| Name | Type | Description |
|---|---|---|
| error_message | Optional[str] | Error message if the embedding generation failed. |
| segments | Optional[List[BaseSegment]] | An object that contains the embedding. |
| metadata | Optional[BaseEmbeddingMetadata] | Metadata about the embedding. |

The BaseSegment class contains the following properties:

| Name | Type | Description |
|---|---|---|
| float_ | Optional[List[float]] | An array of floating point numbers representing the embedding. You can use this array with cosine similarity for various downstream tasks. |
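
Because the float_ array is meant to be compared with cosine similarity, a minimal pure-Python version is sketched below. It is a generic implementation of the standard formula (dot product divided by the product of the vector norms) and is not taken from the SDK. The function name cosine_similarity is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

You would pass the float_ lists of two segments, for example one from a text embedding and one from an image embedding. Values close to 1 indicate vectors pointing in nearly the same direction.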

The AudioSegment class extends BaseSegment and contains the following additional properties:

| Name | Type | Description |
|---|---|---|
| start_offset_sec | Optional[float] | The start time, in seconds, from which the platform generated the audio embedding. |

The BaseEmbeddingMetadata class contains the following properties:

| Name | Type | Description |
|---|---|---|
| input_url | Optional[str] | The URL of the media file used to generate the embedding. Present if a URL was provided in the request. |
| input_filename | Optional[str] | The name of the media file used to generate the embedding. Present if a file was provided in the request. |
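
The classes above nest three levels deep (response, per-type result, segments), so reading the vectors usually involves a small loop. The sketch below shows one way to do that. The Stub* dataclasses are stand-ins that mirror only the documented property names so that the example runs without the SDK; they are not the SDK's own classes, and collect_vectors is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StubSegment:  # stand-in for BaseSegment / AudioSegment
    float_: Optional[List[float]] = None
    start_offset_sec: Optional[float] = None


@dataclass
class StubResult:  # stand-in for the Text/Image/AudioEmbeddingResult classes
    segments: Optional[List[StubSegment]] = None
    error_message: Optional[str] = None


@dataclass
class StubResponse:  # stand-in for EmbeddingResponse
    model_name: str
    text_embedding: Optional[StubResult] = None
    image_embedding: Optional[StubResult] = None
    audio_embedding: Optional[StubResult] = None


def collect_vectors(response: Any) -> Dict[str, List[List[float]]]:
    """Gather embedding vectors per input type, failing on any reported error."""
    vectors: Dict[str, List[List[float]]] = {}
    for kind in ("text", "image", "audio"):
        result = getattr(response, f"{kind}_embedding", None)
        if result is None:
            continue  # this input type was not part of the request
        if result.error_message:
            raise RuntimeError(f"{kind} embedding failed: {result.error_message}")
        vectors[kind] = [
            seg.float_ for seg in (result.segments or []) if seg.float_ is not None
        ]
    return vectors
```

Because the helper relies only on attribute names, it should work the same way on any object that exposes the properties listed in the tables above.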

API Reference: Create text, audio, and image embeddings.

Error codes

This section lists the most common error messages you may encounter while creating text, image, and audio embeddings.

  • parameter_invalid
    • The text parameter is invalid. The text token length should be less than or equal to 77.
    • The text_truncate parameter is invalid. You should use one of the following values: none, start, end.