Create sync embeddings
This endpoint synchronously creates embeddings for multimodal content and returns the results immediately in the response.
<Note title="Note">
This method only supports Marengo version 3.0 or newer.
</Note>
**When to use this endpoint**:
- Create embeddings for text, images, audio, or video content
- Get immediate results without waiting for background processing
- Process audio or video content up to 10 minutes in duration
**Do not use this endpoint for**:
- Audio or video content longer than 10 minutes. Use the [`POST`](/v1.3/api-reference/create-embeddings-v2/create-async-embedding-task) method of the `/embed-v2/tasks` endpoint instead.
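The 10-minute rule above can be expressed as a small routing check. This is a minimal sketch: the helper name `pick_method` and the `"sync"`/`"async"` labels are illustrative and not part of the API. Only the 10-minute limit and the `/embed-v2/tasks` path come from this page.

```python
from typing import Optional

SYNC_MAX_SECONDS = 10 * 60            # "up to 10 minutes in duration"
ASYNC_TASKS_PATH = "/embed-v2/tasks"  # async endpoint named on this page


def pick_method(input_type: str, duration_seconds: Optional[float] = None) -> str:
    """Return "async" for audio/video longer than 10 minutes, else "sync".

    Hypothetical helper: the labels are illustrative, not API values.
    """
    is_timed_media = input_type in ("audio", "video")
    if is_timed_media and duration_seconds is not None:
        if duration_seconds > SYNC_MAX_SECONDS:
            return "async"
    return "sync"
```

For example, `pick_method("video", 900)` selects the asynchronous route, while text and image inputs always stay on this endpoint.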
<Accordion title="Input requirements">
**Text**:
- Maximum length: 500 tokens
**Images**:
- Formats: JPEG, PNG
- Minimum size: 128x128 pixels
- Maximum file size: 5 MB
**Audio and video**:
- Maximum duration: 10 minutes
- Maximum file size for base64 encoded strings: 36 MB
- Audio formats: WAV (uncompressed), MP3 (lossy), FLAC (lossless)
- Video formats: [FFmpeg supported formats](https://ffmpeg.org/ffmpeg-formats.html)
- Video resolution: 360x360 to 3840x2160 pixels
- Aspect ratio: Between 1:1 and 1:2.4, or between 2.4:1 and 1:1
</Accordion>
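Some of these limits can be checked on the client before a request is sent. The sketch below makes two assumptions that this page does not confirm: that the 36 MB limit applies to the length of the base64 string itself, and that 1 MB means 1024 * 1024 bytes. The function names are illustrative.

```python
import base64

MAX_BASE64_BYTES = 36 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_ASPECT_RATIO = 2.4               # longer side : shorter side


def encode_media(raw: bytes) -> str:
    """Base64-encode media bytes and reject strings over the 36 MB limit.

    Assumes the limit applies to the encoded string, not the source file.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64 payload exceeds 36 MB")
    return encoded


def aspect_ratio_ok(width: int, height: int) -> bool:
    """True when the frame is between 1:2.4 and 2.4:1 (either orientation)."""
    if width <= 0 or height <= 0:
        return False
    longer, shorter = max(width, height), min(width, height)
    return longer / shorter <= MAX_ASPECT_RATIO
```

Base64 encoding grows data by roughly one third, so a source file well under 36 MB can still produce a string that exceeds the limit.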
Authentication
x-api-key string
Your API key.
<Note title="Note">
You can find your API key on the <a href="https://playground.twelvelabs.io/dashboard/api-key" target="_blank">API Key</a> page.
</Note>
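As an illustration, the key travels in the `x-api-key` request header. The helper below is a minimal sketch, and the environment variable name in the comment is hypothetical.

```python
def auth_headers(api_key: str) -> dict:
    """Build the authentication header described above."""
    if not api_key:
        raise ValueError("an API key is required")
    return {"x-api-key": api_key}


# Typical use, reading the key from the environment rather than source code.
# TWELVELABS_API_KEY is a hypothetical variable name:
#   import os
#   headers = auth_headers(os.environ["TWELVELABS_API_KEY"])
```

Keeping the key out of source files avoids committing it to version control by accident.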
Request
This endpoint expects an object.
input_type
The type of content for which you wish to create embeddings.
Allowed values: text, image, text_image, audio, video
model_name
The video understanding model you wish to use.
text
This field is required if input_type is text.
image
This field is required if input_type is image.
text_image
This field is required if input_type is text_image.
audio
This field is required if input_type is audio.
video
This field is required if input_type is video.
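The pairing between input_type and its matching field can be sketched as follows. The inner structure of each content object is not described in this section, so `content` is passed through unchanged. The helper name, the model name string, and the inner key in the usage comment are all placeholders, not confirmed API values.

```python
# The five input types named by the fields above.
INPUT_TYPES = ("text", "image", "text_image", "audio", "video")


def build_request(input_type: str, model_name: str, content: dict) -> dict:
    """Assemble a request object with the field that input_type requires.

    `content` is placed under the key that matches input_type; its inner
    shape is not documented here and is left to the caller.
    """
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    return {
        "input_type": input_type,
        "model_name": model_name,
        input_type: content,
    }


# Placeholder values only; substitute the real model name and content fields:
#   body = build_request("text", "<model-name>", {"<field>": "a red bicycle"})
```

Because exactly one content field is set per request, the object can never carry a field that contradicts its input_type.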
Response
Successful request; normal operation
data
Array of embedding results
metadata
Metadata for the media input. Available for image, text_image, audio, and video inputs.
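A client might separate the two top-level response fields as sketched below. The structure of each embedding result is not described in this section, so the items are returned untouched. The helper name is illustrative.

```python
from typing import Optional, Tuple


def split_response(body: dict) -> Tuple[list, Optional[dict]]:
    """Return (data, metadata) from a parsed response body.

    `metadata` is None when absent, for example for text inputs.
    """
    data = body.get("data")
    if not isinstance(data, list):
        raise ValueError("expected 'data' to be an array of embedding results")
    return data, body.get("metadata")
```

Treating metadata as optional matches the note above that it is only available for image, text_image, audio, and video inputs.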