Create embeddings

Use the Marengo video understanding model to generate embeddings from video, audio, text, and image inputs. These embeddings enable similarity search, content clustering, recommendation systems, and other machine learning applications.

Model specification

  • Model ID: twelvelabs.marengo-embed-2-7-v1:0
  • Regional availability: US East (N. Virginia), Europe (Ireland), Asia Pacific (Seoul)
  • Input: Video, audio, image, text
  • Input methods: S3 URI or base64-encoded string
  • Output: 1024-dimensional embeddings
  • Similarity metric: Cosine similarity

The model has two types of limits: the maximum input size you can submit and the portion of content that it embeds.

Input requirements

The maximum file size for each type of input is as follows:

  • Video: 2 GB (S3 URI) or 36 MB (base64)
  • Audio: 2 GB (S3 URI) or 36 MB (base64)
  • Image: 5 MB
  • Text: 77 tokens

Embedding coverage per input type

The portion of your input that the model processes into embeddings depends on the input type:

  • Video: Creates multiple embeddings for segments throughout the video. Segments are 2-10 seconds each. You can specify which portion of the video to process.
  • Audio: Creates multiple embeddings, dividing the audio into segments as close to 10 seconds as possible. You can specify which portion of the audio to process.
  • Image: Processes the entire image.
  • Text: Processes up to 77 tokens. You can configure how text exceeding this limit is handled.

Pricing

For details on pricing, see the Amazon Bedrock pricing page.

Choose the processing method

Select the processing method based on your use case and performance requirements. Synchronous processing returns embeddings immediately in the API response, while asynchronous processing handles larger files and batch operations by saving results to S3.

Note

Synchronous processing supports text and image inputs only. For video, audio, and large-scale image files, use asynchronous processing.

Use synchronous processing to:

  • Build real-time applications like chatbots, search, and recommendation systems.
  • Enable interactive features that require immediate results.

Use asynchronous processing to:

  • Build applications that process video, audio, and large-scale image files.
  • Run batch operations and background workflows.

Prerequisites

Before you start, ensure you have the following:

  • An AWS account with access to a region where the TwelveLabs models are supported.
  • An AWS IAM principal with sufficient Amazon Bedrock permissions. For details on setting permissions, see the Identity and access management for Amazon Bedrock page.
  • S3 permissions to read input files and write output files for Marengo operations.
  • The AWS CLI installed and configured with your credentials.
  • Python 3.7 or later with the boto3 library.
  • Access to the model you want to use. Navigate to the AWS Console > Bedrock > Model Access page and request access. Note that model availability varies by region; a sketch for verifying access programmatically follows this list.
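If you prefer to confirm model access programmatically instead of through the console, the following minimal sketch lists the TwelveLabs foundation models visible to your account in a given region. The region value is a placeholder and the provider filter string is an assumption; adjust both as needed.

Python
import boto3

# Placeholder: use a region where the TwelveLabs models are available
REGION = "us-east-1"

# The "bedrock" client exposes control-plane operations such as ListFoundationModels
bedrock = boto3.client("bedrock", region_name=REGION)

# Filter by provider; "TwelveLabs" is assumed to be the provider name
response = bedrock.list_foundation_models(byProvider="TwelveLabs")

for model in response["modelSummaries"]:
    print(model["modelId"], model["modelLifecycle"]["status"])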

Create embeddings

Marengo supports base64-encoded strings and S3 URIs for media input. Note that the base64 method has a 36 MB file size limit. This guide uses S3 URIs.

Note

Your S3 input and output buckets must be in the same region as the model. If regions don’t match, the API returns a ValidationException error.

To generate embeddings from your content, you use one of two Amazon Bedrock APIs, depending on your processing needs.

Synchronous processing

The InvokeModel API processes your request synchronously and returns embeddings directly in the response.

The InvokeModel API requires two parameters:

  • modelId: The inference profile ID for the model.
  • body: A JSON-encoded string containing your input parameters.

The request body contains the following fields:

  • inputType: The type of content. Values: “text” or “image”.
  • inputText: The text to embed. This parameter is required for text inputs.
  • mediaSource: The image source. This parameter is required for image inputs and contains either:
    • base64String: Your base64-encoded image for inline processing.
    • s3Location: The S3 location for images stored in S3.

Example

Ensure you replace <YOUR_TEXT> with the text for which you wish to create an embedding (example: “A man walking down the street”).

Python
import boto3
import json

# Replace the `us` prefix depending on your region
INFERENCE_PROFILE_ID = "us.twelvelabs.marengo-embed-2-7-v1:0"
INPUT_TEXT = "<YOUR_TEXT>"

model_input = {
    "inputType": "text",
    "inputText": INPUT_TEXT
}

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime')

# Make the request
response = client.invoke_model(
    modelId=INFERENCE_PROFILE_ID,
    body=json.dumps(model_input)
)

# Print the response body
response_body = json.loads(response['body'].read().decode('utf-8'))
print(response_body)
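The example above embeds text. For an image, the request body uses the mediaSource field instead of inputText. The following sketch assumes an image stored in S3; the bucket name, object key, and account ID are placeholders, and the base64String variant described above can be used instead of s3Location for inline images.

Python
import boto3
import json

# Replace the `us` prefix depending on your region
INFERENCE_PROFILE_ID = "us.twelvelabs.marengo-embed-2-7-v1:0"

# Placeholders: your bucket, image object key, and AWS account ID
BUCKET = "<YOUR_BUCKET_NAME>"
IMAGE_KEY = "<YOUR_IMAGE_FILE>"
ACCOUNT_ID = "<YOUR_ACCOUNT_ID>"

# Image input referenced by S3 URI; an inline image would use
# {"base64String": "<BASE64_DATA>"} instead of "s3Location"
model_input = {
    "inputType": "image",
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{BUCKET}/{IMAGE_KEY}",
            "bucketOwner": ACCOUNT_ID
        }
    }
}

client = boto3.client('bedrock-runtime')

response = client.invoke_model(
    modelId=INFERENCE_PROFILE_ID,
    body=json.dumps(model_input)
)

response_body = json.loads(response['body'].read().decode('utf-8'))
print(response_body)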

Asynchronous processing

The StartAsyncInvoke API processes your request asynchronously, storing the results in your S3 bucket.

To create embeddings asynchronously, you must complete the following steps:

1. Submit your request, providing an S3 location for your input media file and an S3 location for the output. Note that this example uses the same bucket.
2. Check the job status using the returned invocation ARN.
3. Retrieve the results from the S3 output location once the job has completed.

The StartAsyncInvoke API requires three parameters:

  • modelId: The model ID.
  • modelInput: A dictionary containing your input parameters.
  • outputDataConfig: A dictionary specifying where to save the results.

The modelInput dictionary contains the following fields:

  • inputType: The type of content you’re embedding (“video”, “audio”, “image”, or “text”).
  • mediaSource: The S3 location of your input file (for video, audio, and image inputs).
  • inputText: The text content (for text inputs only).

S3 output structure

Each invocation creates a unique directory in your S3 bucket with two files:

  • manifest.json: Contains metadata including the request ID.
  • output.json: Contains the actual embeddings.

Example

Ensure you replace the following placeholders with your values:

  • <YOUR_REGION>: with your AWS region (example: “eu-west-1”)
  • <YOUR_ACCOUNT_ID>: with your AWS account ID (example: “123456789012”)
  • <YOUR_BUCKET_NAME>: with the name of your S3 bucket (example: “my-bucket”)
  • <YOUR_FILE>: with the name of your video file (example: “my_file.mp4”)
  • <YOUR_INPUT_TYPE>: with the type of media you wish to provide. The following values are supported: “video”, “audio”, or “image”.
Python
import boto3
import time

REGION = "<YOUR_REGION>"
MODEL_ID = "twelvelabs.marengo-embed-2-7-v1:0"
ACCOUNT_ID = "<YOUR_ACCOUNT_ID>"
BUCKET = "<YOUR_BUCKET_NAME>"
FILE_NAME = "<YOUR_FILE>"
INPUT_TYPE = "<YOUR_INPUT_TYPE>"

bedrock_client = boto3.client(service_name="bedrock-runtime", region_name=REGION)

# Start the async embedding job
model_input = {
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{BUCKET}/{FILE_NAME}",
            "bucketOwner": ACCOUNT_ID
        }
    },
    "inputType": INPUT_TYPE
}

async_request_response = bedrock_client.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": f"s3://{BUCKET}",
            "bucketOwner": ACCOUNT_ID
        }
    }
)

print("async_request_response: ", async_request_response)

# Get the invocation ARN
invocation_arn = async_request_response.get("invocationArn")

# Poll until the job completes or fails, or the retry limit is reached
max_retries = 60
retries = 0
while True:
    response = bedrock_client.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response.get("status")
    print(f"status: {status}")
    if status in ("Completed", "Failed"):
        break
    time.sleep(1)
    retries += 1
    if retries > max_retries:
        break

print(response)

# Extract the S3 URI where results are stored
output_s3_uri = response.get("outputDataConfig", {}).get("s3OutputDataConfig", {}).get("s3Uri")
print(f"Results stored at: {output_s3_uri}")
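Once the job reports Completed, you can download the results from the output location. The following sketch continues from the variables in the example above and assumes that the S3 URI returned by GetAsyncInvoke points to the invocation-specific directory that contains manifest.json and output.json (see S3 output structure above).

Python
import json
import boto3

s3_client = boto3.client("s3", region_name=REGION)

# Split the returned URI (s3://bucket/prefix) into bucket and key prefix
path = output_s3_uri.replace("s3://", "", 1)
bucket, _, prefix = path.partition("/")
key = f"{prefix}/output.json" if prefix else "output.json"

# Download and parse output.json, which contains the embeddings
obj = s3_client.get_object(Bucket=bucket, Key=key)
output = json.loads(obj["Body"].read())
print(output)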

Use embeddings

After generating embeddings, you can store them in a vector database for efficient similarity search and retrieval.

The typical workflow is as follows:

1. Generate embeddings for your content.
2. Store embeddings with metadata in your chosen vector database.
3. Generate an embedding for user queries.
4. Use cosine similarity to find the most relevant content, as sketched in the example after this list.
5. Retrieve the original content or use the results for RAG applications.
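The following sketch illustrates steps 3 and 4 with a simple in-memory store and NumPy. It assumes you have already collected 1024-dimensional embedding vectors (for example, from output.json) into a list of items alongside your own metadata; the helper names and the stored_items structure are hypothetical, and a production system would use a vector database instead.

Python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_embedding, stored_items, top_k=5):
    """Rank stored items (each a dict with 'embedding' and 'metadata') by similarity."""
    scored = [
        (cosine_similarity(query_embedding, item["embedding"]), item["metadata"])
        for item in stored_items
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical usage:
# results = search(query_embedding, stored_items)
# for score, metadata in results:
#     print(score, metadata)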

Request parameters and response fields

For a complete list of request parameters and response fields, see the TwelveLabs Marengo Embed 2.7 page in the Amazon Bedrock documentation.