This guide shows how to create video embeddings using the Marengo video understanding model. For a list of available versions, along with the complete specifications and input requirements for each version, see the Marengo page.

The Marengo video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.

For details on how your usage is measured and billed, see the Pricing page.

Key concepts

This section explains the key concepts and terminology used in this guide:

Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
Embedding: Vector representation of your content.
Embedding task: An asynchronous operation for processing your content and creating embeddings. Contains a status and the resulting embeddings when complete.

Workflow

This guide shows how to upload your video as an asset and create embeddings asynchronously. You can also pass a URL or base64-encoded data inline instead of creating an asset; both are shown as commented-out lines in the code examples.
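
If you pass base64-encoded data inline, you first need to encode the file. The sketch below uses only the Python standard library; the helper name `encode_video_base64` is hypothetical, and it assumes the platform accepts plain standard base64 without a data-URI prefix, which this guide does not state.

```python
import base64
from pathlib import Path


def encode_video_base64(path):
    """Read a local file and return its contents as a base64 ASCII string."""
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("ascii")


# Possible usage with the commented-out field from the code examples:
# MediaSource(base_64_string=encode_video_base64("<PATH_TO_VIDEO_FILE>"))
```

Base64 inflates the payload by roughly a third, so an asset or a URL is likely the better fit for anything but small files.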

For videos under 10 minutes, synchronous processing returns embeddings immediately without polling. For details, see the Short videos (synchronous) section.

Customize your embeddings

You can configure embedding types (visual, audio, transcription), output format (separate, fused, or both), scope (clip or asset), and segmentation strategy (dynamic or fixed).

Use these embeddings for similarity search, content classification, clustering, recommendations, or Retrieval-Augmented Generation (RAG).
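
As an illustration of similarity search, the sketch below ranks a handful of vectors against a query vector using cosine similarity. It is plain Python with no dependency on the SDK; the function names and the toy two-dimensional vectors are made up for the example, and a real application would more likely use a vector database or a numerical library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k=3):
    """Rank (label, vector) pairs by similarity to the query, best first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data standing in for real embedding vectors
clips = [("clip-a", [0.0, 1.0]), ("clip-b", [2.0, 0.0]), ("clip-c", [1.0, 1.0])]
for label, score in top_k([1.0, 0.0], clips, k=2):
    print(f"{label}: {score:.3f}")
```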

Prerequisites

To use the platform, you need an API key:

1. If you don’t have an account, sign up for a free account.
2. Go to the API Keys page.
3. If you need to create a new key, select the Create API Key button. Enter a name and set the expiration period. The default is 12 months.
4. Select the Copy icon next to your key to copy it to your clipboard.

Install the TwelveLabs SDK for Python by entering the following command:
```
$ pip install --upgrade twelvelabs
```
Your video files must meet the following requirements:
- Upload limits: Public video URLs up to 4 GB or local video files up to 200 MB. For local files up to 4 GB, see the Upload and processing methods page.
- Embedding method: Videos up to 4 hours. This guide uses the asynchronous method. For videos under 10 minutes, see the synchronous approach below.
- Model capabilities: See the complete requirements for resolution, aspect ratio, and supported formats.

Complete example

Copy and paste the code below, replacing the placeholders surrounded by <> with your values.

```python
import time
from twelvelabs import (
    TwelveLabs,
    VideoInputRequest,
    MediaSource,
    # For dynamic segmentation uncomment the next two lines:
    # VideoSegmentation_Dynamic,
    # VideoSegmentationDynamicDynamic,
    # For fixed segmentation uncomment the next two lines:
    # VideoSegmentation_Fixed,
    # VideoSegmentationFixedFixed,
)

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>" # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 3. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 4. Create video embeddings
task = client.embed.v_2.tasks.create(
    input_type="video",
    model_name="marengo3.0",
    video=VideoInputRequest(
        media_source=MediaSource(
            asset_id=asset.id,
            # url="<YOUR_VIDEO_URL>", # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
            # base_64_string="<BASE_64_ENCODED_DATA>",
        ),
        # start_sec=0,
        # end_sec=10,
        # embedding_option=["visual", "audio", "transcription"],
        # embedding_scope=["clip", "asset"],
        # embedding_type=["separate_embedding", "fused_embedding"],
        # For dynamic segmentation:
        # segmentation=VideoSegmentation_Dynamic(
        #     dynamic=VideoSegmentationDynamicDynamic(
        #         min_duration_sec=3  # Minimum segment duration in seconds
        #     )
        # ),
        # For fixed segmentation:
        # segmentation=VideoSegmentation_Fixed(
        #     fixed=VideoSegmentationFixedFixed(
        #         duration_sec=5  # Exact segment duration in seconds
        #     )
        # ),
    ),
)
print(f"Task ID: {task.id}")

# 5. Monitor the status
while True:
    task = client.embed.v_2.tasks.retrieve(task_id=task.id)

    if task.status == "ready":
        print("Task completed")
        break
    elif task.status == "failed":
        # Stop here: a failed task has no embeddings to process in step 6
        raise RuntimeError(f"Embedding task failed: id={task.id}")
    else:
        print("Task still processing...")
        time.sleep(5)

# 6. Process the results
print(f"\n{'='*80}")
print(f"EMBEDDINGS SUMMARY: {len(task.data)} total embeddings")
print(f"{'='*80}\n")

for idx, embedding_data in enumerate(task.data, 1):
    print(f"[{idx}/{len(task.data)}] {embedding_data.embedding_option.upper()} | {embedding_data.embedding_scope.upper()}")
    print(f"├─ Time range: {embedding_data.start_sec}s - {embedding_data.end_sec}s")
    print(f"├─ Dimensions: {len(embedding_data.embedding)}")
    print(f"└─ First 10 values: {embedding_data.embedding[:10]}")
    print()
```

Code explanation

Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.
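
Rather than pasting the key into source code, you may prefer to read it from the environment. The following is a minimal sketch; the helper `load_api_key` and the variable name `TWELVELABS_API_KEY` are assumptions made for this example, not names defined by the SDK.

```python
import os


def load_api_key(env_var="TWELVELABS_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    The variable name is a convention chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Possible usage with the client from this guide:
# client = TwelveLabs(api_key=load_api_key())
```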

Upload a video

Upload a video to create an asset.
Function call: You call the assets.create function.
Parameters:

method: The upload method for your asset. Use url for a publicly accessible URL or direct to upload a local file. This example uses url.
url or file: The publicly accessible URL of your video or an opened file object in binary read mode. This example uses url.

Return value: An object of type Asset. This object contains, among other information, a field named id representing the unique identifier of your asset.

Note

For local files larger than 200 MB, use multipart uploads. Multipart uploads support automatic retry, progress tracking, parallel chunk uploads, and improved reliability, performance, and observability.
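
A small pre-flight check can decide which upload path a local file needs before you call the API. The sketch below encodes the limits stated in this guide (200 MB for direct uploads, 4 GB overall). It assumes those limits are counted in binary units, which the guide does not specify, and the returned labels are only names used in this example.

```python
# Assumption: the guide's "200 MB" and "4 GB" are treated as binary units here.
DIRECT_LIMIT_BYTES = 200 * 1024 * 1024
MAX_LOCAL_BYTES = 4 * 1024 * 1024 * 1024


def choose_local_upload(size_bytes):
    """Return "direct" or "multipart" for a local file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= DIRECT_LIMIT_BYTES:
        return "direct"
    if size_bytes <= MAX_LOCAL_BYTES:
        return "multipart"
    raise ValueError("file exceeds the 4 GB limit for local uploads")


# Possible usage: choose_local_upload(os.path.getsize("<PATH_TO_VIDEO_FILE>"))
print(choose_local_upload(50 * 1024 * 1024))
```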

Check the status of the asset

Asset processing is asynchronous. Poll the status of the asset until it is ready before you use it.
Function call: You call the assets.retrieve function.
Parameters:

asset_id: The unique identifier of your asset.

Return value: An object of type Asset containing, among other information, a field named status representing the current status of the asset. Check this field until its value is ready.

Create video embeddings

Create an embedding task to start processing your video. This operation is asynchronous.
Function call: You call the embed.v_2.tasks.create function.
Parameters:

input_type: The type of content. Set this parameter to video.
model_name: The model you want to use. This example uses marengo3.0.
video: An object containing the following properties:
- media_source: An object specifying the source of the video file. You can specify one of the following:
  - asset_id: The unique identifier of an asset from a previous upload.
  - url: The publicly accessible URL of the video file.
  - base_64_string: The base64-encoded video data.
    
    This example uses the asset ID from the previous step.
- (Optional) start_sec: The start time in seconds for processing the video file. By default, the platform processes videos from the beginning.
- (Optional) end_sec: The end time in seconds for processing the video file. By default, the platform processes videos to the end of the video file.
- (Optional) embedding_option: The types of embeddings to generate. Valid values are the following:
  - visual: Generates visual embeddings.
  - audio: Generates embeddings for non-verbal audio (musical tones, beeping, environmental sounds).
  - transcription: Generates embeddings for transcribed speech (the actual words spoken in the video).
  You can specify multiple values to generate different types of embeddings. The default value is ["visual", "audio", "transcription"].
- (Optional) embedding_scope: The scope for which to generate embeddings. Valid values are the following:
  - clip: Generates one embedding for each segment.
  - asset: Generates one embedding for the entire video file. For optimal performance, use this scope only for short videos of up to about 10 to 30 seconds.
  You can specify multiple scopes to generate embeddings at different levels. The default value is ["clip", "asset"].
- (Optional) segmentation: An object that specifies how the platform divides the video into segments. You can use one of the following strategies:
  - VideoSegmentation_Dynamic: Divides the video into segments that adapt to scene changes. Requires a property named dynamic with a min_duration_sec field specifying the minimum duration in seconds for each segment.
  - VideoSegmentation_Fixed: Divides the video into segments of a fixed length. Requires a property named fixed with a duration_sec field specifying the exact duration in seconds for each segment.
- (Optional) embedding_type: An array specifying how to structure the embedding. Use this parameter only when embedding_option specifies two or more values. Valid values are the following:
  - separate_embedding: Returns separate embeddings for each modality specified in embedding_option.
  - fused_embedding: Returns a single combined embedding that integrates all modalities into one vector.
  To receive both types in the same response, set this to ["separate_embedding", "fused_embedding"].

Return value: An object of type TasksCreateResponse containing, among other information, a field named id, which represents the unique identifier of your embedding task. You can use this identifier to track the status of your embedding task.
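
The optional parameters above interact: embedding_type only applies when two or more embedding options are in play. The sketch below checks the values client-side before a request is built. The helper `check_video_options` is hypothetical, it only reflects the rules listed in this guide, and the platform remains the authority on what it accepts.

```python
VALID_OPTIONS = {"visual", "audio", "transcription"}
VALID_SCOPES = {"clip", "asset"}
VALID_TYPES = {"separate_embedding", "fused_embedding"}


def check_video_options(embedding_option=None, embedding_scope=None, embedding_type=None):
    """Validate the optional fields and return only the ones that were provided."""
    options = {}
    for name, values, valid in (
        ("embedding_option", embedding_option, VALID_OPTIONS),
        ("embedding_scope", embedding_scope, VALID_SCOPES),
        ("embedding_type", embedding_type, VALID_TYPES),
    ):
        if values is None:
            continue  # omitted: the platform default applies
        if not values:
            raise ValueError(f"{name}: provide at least one value or omit it")
        unknown = set(values) - valid
        if unknown:
            raise ValueError(f"{name}: unsupported values {sorted(unknown)}")
        options[name] = list(values)

    if embedding_type is not None:
        # When embedding_option is omitted, the default is all three options.
        effective = embedding_option if embedding_option is not None else VALID_OPTIONS
        if len(effective) < 2:
            raise ValueError("embedding_type requires two or more embedding_option values")
    return options


# Possible usage: VideoInputRequest(media_source=..., **check_video_options(...))
print(check_video_options(embedding_option=["visual", "audio"], embedding_type=["fused_embedding"]))
```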

Monitor the status

The platform requires some time to process videos. Poll the status of the embedding task until processing completes. This example uses a loop to check the status every 5 seconds.
Function call: You repeatedly call the embed.v_2.tasks.retrieve function until the task completes.

Parameters:

task_id: The unique identifier of your embedding task.

Return value: An object of type EmbeddingTaskResponse containing, among other information, the following fields:

status: The current status of the task. The possible values are:
- processing: The platform is creating the embeddings.
- ready: Processing is complete. Embeddings are available in the data field.
- failed: The task failed.
data: When the status is ready, this field contains a list of embedding objects. Each embedding object includes:
- embedding: The embedding vector (a list of floats).
- embedding_option: The type of embedding. Possible values are visual, audio, transcription, and fused. The platform returns fused only when embedding_type includes fused_embedding.
- embedding_scope: The scope of the embedding (clip or asset).
- start_sec: The start time of the segment in seconds.
- end_sec: The end time of the segment in seconds.
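
A bare polling loop runs forever if a task never leaves the processing state. The sketch below wraps the same status checks in a helper with a timeout. `wait_until_ready` is a hypothetical name, and the default 10-minute timeout is an arbitrary choice for illustration; the helper works with any object that exposes a status field, so it can poll assets as well as embedding tasks.

```python
import time


def wait_until_ready(fetch, interval_sec=5.0, timeout_sec=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until its result has status "ready".

    Raises RuntimeError on status "failed" and TimeoutError when
    timeout_sec elapses first.
    """
    deadline = clock() + timeout_sec
    while True:
        resource = fetch()
        if resource.status == "ready":
            return resource
        if resource.status == "failed":
            raise RuntimeError("processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {timeout_sec} seconds")
        sleep(interval_sec)


# Possible usage with the calls from this guide:
# task_id = task.id
# task = wait_until_ready(lambda: client.embed.v_2.tasks.retrieve(task_id=task_id))
```

The sleep and clock parameters exist so the loop can be exercised without real delays.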

Process the results

This example iterates through the embeddings in the data field and prints the embedding type, scope, time range, dimensions, and the first 10 vector values for each segment.
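
Beyond printing, you will often want the embeddings organized by type and scope before storing them. The sketch below groups objects by the embedding_option and embedding_scope fields described above. The sample objects are stand-ins built with `SimpleNamespace`, not real API responses, and `group_embeddings` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_embeddings(items):
    """Group embedding objects by (embedding_option, embedding_scope)."""
    groups = defaultdict(list)
    for item in items:
        groups[(item.embedding_option, item.embedding_scope)].append(item)
    for segments in groups.values():
        # Order segments chronologically; a missing start time sorts as 0
        segments.sort(key=lambda s: s.start_sec or 0.0)
    return dict(groups)


# Stand-in data shaped like the fields listed in this guide
sample = [
    SimpleNamespace(embedding=[0.1, 0.2], embedding_option="visual",
                    embedding_scope="clip", start_sec=5.0, end_sec=10.0),
    SimpleNamespace(embedding=[0.3, 0.4], embedding_option="visual",
                    embedding_scope="clip", start_sec=0.0, end_sec=5.0),
    SimpleNamespace(embedding=[0.5, 0.6], embedding_option="audio",
                    embedding_scope="asset", start_sec=0.0, end_sec=10.0),
]
for (option, scope), segments in group_embeddings(sample).items():
    print(f"{option}/{scope}: {len(segments)} embedding(s)")
```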

Short videos (synchronous)

For videos shorter than 10 minutes, you can use a synchronous approach that returns embeddings immediately without requiring polling.

```python
import time
from twelvelabs import (
    TwelveLabs,
    VideoInputRequest,
    MediaSource,
    # For dynamic segmentation uncomment the next two lines:
    # VideoSegmentation_Dynamic,
    # VideoSegmentationDynamicDynamic,
    # For fixed segmentation uncomment the next two lines:
    # VideoSegmentation_Fixed,
    # VideoSegmentationFixedFixed,
)

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a file
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>" # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 3. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 4. Create video embeddings
response = client.embed.v_2.create(
    input_type="video",
    model_name="marengo3.0",
    video=VideoInputRequest(
        media_source=MediaSource(
            asset_id=asset.id,
            # url="<YOUR_VIDEO_URL>", # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
            # base_64_string="<BASE_64_ENCODED_DATA>",
        ),
        # start_sec=0,
        # end_sec=10,
        # embedding_option=["visual", "audio", "transcription"],
        # embedding_scope=["clip", "asset"],
        # embedding_type=["separate_embedding", "fused_embedding"],
        # For dynamic segmentation:
        # segmentation=VideoSegmentation_Dynamic(
        #     dynamic=VideoSegmentationDynamicDynamic(
        #         min_duration_sec=3  # Minimum segment duration in seconds
        #     )
        # ),
        # For fixed segmentation:
        # segmentation=VideoSegmentation_Fixed(
        #     fixed=VideoSegmentationFixedFixed(
        #         duration_sec=5  # Exact segment duration in seconds
        #     )
        # ),
    ),
)

# 5. Process the results
print(f"\n{'='*80}")
print(f"EMBEDDINGS SUMMARY: {len(response.data)} total embeddings")
print(f"{'='*80}\n")

for idx, embedding_data in enumerate(response.data, 1):
    print(f"[{idx}/{len(response.data)}] {embedding_data.embedding_option.upper()} | {embedding_data.embedding_scope.upper()}")
    print(f"├─ Time range: {embedding_data.start_sec}s - {embedding_data.end_sec}s")
    print(f"├─ Dimensions: {len(embedding_data.embedding)}")
    print(f"└─ First 10 values: {embedding_data.embedding[:10]}")
    print()
```

All fields of the video object work the same way as in the asynchronous approach.
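
If your application handles videos of varying length, a small helper can pick between the two approaches. The sketch below applies the duration limits stated in this guide (under 10 minutes for synchronous processing, up to 4 hours for asynchronous tasks). The function name and return labels are hypothetical, and obtaining the duration itself is left to your own tooling.

```python
SYNC_LIMIT_SEC = 10 * 60        # synchronous: videos under 10 minutes
ASYNC_LIMIT_SEC = 4 * 60 * 60   # asynchronous: videos up to 4 hours


def choose_embedding_method(duration_sec):
    """Return "sync" or "async" for a video of the given duration in seconds."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    if duration_sec < SYNC_LIMIT_SEC:
        return "sync"
    if duration_sec <= ASYNC_LIMIT_SEC:
        return "async"
    raise ValueError("videos longer than 4 hours are not supported")


print(choose_embedding_method(90))    # a 90-second clip
print(choose_embedding_method(3600))  # a one-hour recording
```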