The resources.Embed class provides methods to create text, image, and audio embeddings.

Create text, image, and audio embeddings

Description: This method creates embeddings for the text, image, or audio inputs you provide.

Note that you must specify at least the following parameters:

- model_name: The name of the video understanding model to use.
- One or more of the following input types:
  - text: For text embeddings.
  - audio_url or audio_file: For audio embeddings. If you specify both, the audio_url parameter takes precedence.
  - image_url or image_file: For image embeddings. If you specify both, the image_url parameter takes precedence.

You must provide at least one input type, but you can include multiple types in a single function call.
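
The input rules above can be sketched as a small pre-flight check. This is only an illustration: `resolve_inputs` is a hypothetical helper that is not part of the SDK, and it treats file inputs as plain path strings for simplicity.

```python
from typing import Dict, Optional


def resolve_inputs(
    text: Optional[str] = None,
    audio_url: Optional[str] = None,
    audio_file: Optional[str] = None,
    image_url: Optional[str] = None,
    image_file: Optional[str] = None,
) -> Dict[str, str]:
    """Return the inputs that would effectively be used (hypothetical helper)."""
    resolved: Dict[str, str] = {}
    if text is not None:
        resolved["text"] = text
    # When both a URL and a file are given, the URL takes precedence.
    if audio_url is not None:
        resolved["audio_url"] = audio_url
    elif audio_file is not None:
        resolved["audio_file"] = audio_file
    if image_url is not None:
        resolved["image_url"] = image_url
    elif image_file is not None:
        resolved["image_file"] = image_file
    # At least one input type is required.
    if not resolved:
        raise ValueError(
            "Provide at least one of: text, audio_url/audio_file, image_url/image_file"
        )
    return resolved
```

Calling `resolve_inputs(text="hi", image_url="https://example.com/a.png", image_file="a.png")` keeps the text and the image URL and drops the image file, which mirrors the precedence rule.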

Function signature and example:

def create(
    self,
    model_name: Literal["Marengo-retrieval-2.7"],
    *,
    # text params
    text: str = None,
    text_truncate: Literal["none", "start", "end"] = None,
    # audio params
    audio_url: str = None,
    audio_file: Union[str, BinaryIO, None] = None,
    # image params
    image_url: str = None,
    image_file: Union[str, BinaryIO, None] = None,
    **kwargs,
) -> models.CreateEmbeddingsResult
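
A call that combines two input types might look like the sketch below. It assumes that `client` is an already initialized SDK client and that its `embed` attribute is an instance of resources.Embed; both the attribute name and the wrapper function `create_text_and_image_embeddings` are assumptions made for illustration.

```python
from typing import Any

MODEL_NAME = "Marengo-retrieval-2.7"


def create_text_and_image_embeddings(client: Any, text: str, image_url: str) -> Any:
    """Request a text embedding and an image embedding in a single call.

    Assumption: `client.embed` exposes the `create` method documented here.
    """
    return client.embed.create(
        model_name=MODEL_NAME,
        text=text,
        text_truncate="end",  # one of "none", "start", "end"
        image_url=image_url,
    )
```

Passing the client in as an argument keeps the example independent of how the client is constructed and authenticated.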

Parameters:

Name	Type	Required	Description
`model_name`	`str`	Yes	The name of the video understanding model to use. Available models: `Marengo-retrieval-2.7`.
`text`	`str`	No	The text for which you want to create an embedding. Text embeddings are limited to 77 tokens.
`text_truncate`	`str`	No	Specifies how to truncate text that exceeds 77 tokens. Values: `start`, `end`, `none`. Default: `end`.
`image_url`	`str`	No	The publicly accessible URL of the image for which you wish to create an embedding. Required for image embeddings if `image_file` is not provided.
`image_file`	`core.File`	No	A local image file. Required for image embeddings if `image_url` is not provided.
`audio_url`	`str`	No	The publicly accessible URL of the audio file for which you wish to create an embedding. Required for audio embeddings if `audio_file` is not provided.
`audio_file`	`core.File`	No	A local audio file. Required for audio embeddings if `audio_url` is not provided.
`audio_start_offset_sec`	`float`	No	Specifies the start time, in seconds, from which the platform generates the audio embeddings. Default: `0`.
`request_options`	`RequestOptions`	No	Request-specific configuration.
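
To make the `text_truncate` values concrete, here is a minimal sketch of one plausible reading: `end` drops tokens from the end, `start` drops tokens from the beginning, and `none` rejects over-long input. That interpretation is an assumption, as is the hypothetical `truncate_tokens` helper; the platform performs its own tokenization, so the function works on an already tokenized list.

```python
from typing import List

MAX_TOKENS = 77  # limit stated in the parameter table


def truncate_tokens(tokens: List[str], mode: str = "end", limit: int = MAX_TOKENS) -> List[str]:
    """Illustrate the assumed effect of text_truncate on a token list."""
    if mode not in ("none", "start", "end"):
        raise ValueError("text_truncate must be one of: none, start, end")
    if len(tokens) <= limit:
        return list(tokens)  # nothing to truncate
    if mode == "end":
        return list(tokens[:limit])  # keep the first `limit` tokens
    if mode == "start":
        return list(tokens[-limit:])  # keep the last `limit` tokens
    # mode == "none": over-long input is treated as an error (assumption)
    raise ValueError(f"text exceeds {limit} tokens and text_truncate is 'none'")
```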

Return value: Returns an EmbeddingResponse object containing the embedding results.

The EmbeddingResponse class contains the following properties:

Name	Type	Description
`model_name`	`str`	The name of the video understanding model the platform has used to create this embedding.
`text_embedding`	`Optional[TextEmbeddingResult]`	An object that contains the generated text embedding vector and associated information. Present when text was processed.
`image_embedding`	`Optional[ImageEmbeddingResult]`	An object that contains the generated image embedding vector and associated information. Present when an image was processed.
`audio_embedding`	`Optional[AudioEmbeddingResult]`	An object that contains the generated audio embedding vector and associated information. Present when an audio file was processed.

The TextEmbeddingResult class contains the following properties:

Name	Type	Description
`error_message`	`Optional[str]`	Error message if the embedding generation failed.
`segments`	`Optional[List[BaseSegment]]`	A list of segments, each containing an embedding.

The AudioEmbeddingResult class contains the following properties:

Name	Type	Description
`segments`	`Optional[List[AudioSegment]]`	A list of segments, each containing an embedding and its start time.
`error_message`	`Optional[str]`	Error message if the embedding generation failed.
`metadata`	`Optional[BaseEmbeddingMetadata]`	Metadata about the embedding.

The ImageEmbeddingResult class contains the following properties:

Name	Type	Description
`error_message`	`Optional[str]`	Error message if the embedding generation failed.
`segments`	`Optional[List[BaseSegment]]`	A list of segments, each containing an embedding.
`metadata`	`Optional[BaseEmbeddingMetadata]`	Metadata about the embedding.

The BaseSegment class contains the following properties:

Name	Type	Description
`float_`	`Optional[List[float]]`	An array of floating point numbers representing the embedding. You can use this array with cosine similarity for various downstream tasks.
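
As a sketch of the cosine similarity mentioned above, the function below compares two embedding vectors using only the standard library. The name `cosine_similarity` is chosen for illustration and is not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors that point in nearly the same direction, which is the usual basis for ranking search results by similarity.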

The AudioSegment class extends BaseSegment and contains the following additional properties:

Name	Type	Description
`start_offset_sec`	`Optional[float]`	The start time, in seconds, from which the platform generated the audio embedding.

The BaseEmbeddingMetadata class contains the following properties:

Name	Type	Description
`input_url`	`Optional[str]`	The URL of the media file used to generate the embedding. Present if a URL was provided in the request.
`input_filename`	`Optional[str]`	The name of the media file used to generate the embedding. Present if a file was provided in the request.
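
Reading a result usually means checking `error_message` first and then collecting `float_` from each segment. The sketch below uses stand-in dataclasses that mirror only the property names documented above; they are not the SDK's own classes, and `vectors_from` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaseSegment:  # stand-in for the documented class
    float_: Optional[List[float]] = None


@dataclass
class AudioSegment(BaseSegment):  # stand-in; adds the audio start time
    start_offset_sec: Optional[float] = None


@dataclass
class EmbeddingResult:  # stand-in covering the shared result properties
    error_message: Optional[str] = None
    segments: Optional[List[BaseSegment]] = None


def vectors_from(result: Optional[EmbeddingResult]) -> List[List[float]]:
    """Return the embedding vectors of a result, or raise if generation failed."""
    if result is None:
        return []  # this input type was not part of the request
    if result.error_message:
        raise RuntimeError(f"Embedding generation failed: {result.error_message}")
    return [seg.float_ for seg in (result.segments or []) if seg.float_ is not None]
```

Because the helper relies only on `error_message`, `segments`, and `float_`, the same logic would apply to text, image, and audio results.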

API Reference: Create text, audio, and image embeddings.

Error codes

This section lists the most common error messages you may encounter while creating text, image, and audio embeddings.

parameter_invalid
- The text parameter is invalid. The text token length should be less than or equal to 77.
- The text_truncate parameter is invalid. You should use one of the following values: none, start, end.