Create embeddings

Use the Marengo video understanding model to generate embeddings from video, audio, text, and image inputs. These embeddings enable similarity search, content clustering, recommendation systems, and other machine learning applications.

Model specification

  • Model ID: twelvelabs.marengo-embed-2-7-v1:0
  • Regional availability: US East (N. Virginia), Europe (Ireland), Asia Pacific (Seoul)
  • Input: Video, audio, image, text
  • Input methods: S3 URI or base64-encoded string
  • Output: 1024-dimensional embeddings
  • Similarity metric: Cosine similarity

The model has two types of limits: the maximum input size you can submit and the portion of content that it embeds.

Input requirements

The maximum file size for each type of input is as follows:

  • Video: 2 GB (S3 URI) or 36 MB (base64)
  • Audio: 2 GB (S3 URI) or 36 MB (base64)
  • Image: 5 MB
  • Text: 77 tokens

Embedding coverage per input type

The portion of your input that the model processes into embeddings depends on the input type:

  • Video: Creates multiple embeddings for segments throughout the video. Segments are 2-10 seconds each. You can specify which portion of the video to process.
  • Audio: Creates multiple embeddings, dividing the audio into segments as close to 10 seconds as possible. You can specify which portion of the audio to process.
  • Image: Processes the entire image.
  • Text: Processes up to 77 tokens. You can configure how text exceeding this limit is handled.

Pricing

For details on pricing, see the Amazon Bedrock pricing page.

Choose the processing method

Select the processing method based on your use case and performance requirements. Synchronous processing returns embeddings immediately in the API response, while asynchronous processing handles larger files and batch operations by saving results to S3.

Note

Synchronous processing supports text and image inputs only. For video, audio, and large-scale image files, use asynchronous processing.

Use synchronous processing to:

  • Build real-time applications like chatbots, search, and recommendation systems.
  • Enable interactive features that require immediate results.

Use asynchronous processing to:

  • Build applications that process video, audio, and large-scale image files.
  • Run batch operations and background workflows.

Prerequisites

Before you start, ensure you have the following:

  • An AWS account with access to a region where the TwelveLabs models are supported.
  • An AWS IAM principal with sufficient Amazon Bedrock permissions. For details on setting permissions, see the Identity and access management for Amazon Bedrock page.
  • S3 permissions to read input files and write output files for Marengo operations.
  • The AWS CLI installed and configured with your credentials.
  • Python 3.7 or later with the boto3 library.
  • Access to the model you want to use. Navigate to the AWS Console > Bedrock > Model Access page and request access. Note that model availability varies by region; a sketch for verifying access programmatically follows this list.
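If you prefer to confirm model access programmatically instead of through the console, the following minimal sketch lists the TwelveLabs foundation models visible to your account in a given region. The region value is a placeholder and the provider filter string is an assumption; adjust both as needed.

Python
import boto3

# Placeholder: use a region where the TwelveLabs models are available
REGION = "us-east-1"

# The "bedrock" client exposes control-plane operations such as ListFoundationModels
bedrock = boto3.client("bedrock", region_name=REGION)

# Filter by provider; "TwelveLabs" is assumed to be the provider name
response = bedrock.list_foundation_models(byProvider="TwelveLabs")

for model in response["modelSummaries"]:
    print(model["modelId"], model["modelLifecycle"]["status"])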

Create embeddings

Marengo supports base64-encoded strings and S3 URIs for media input. Note that the base64 method has a 36 MB file size limit. This guide uses S3 URIs.

Note

Your S3 input and output buckets must be in the same region as the model. If regions don’t match, the API returns a ValidationException error.

To generate embeddings from your content, you use one of two Amazon Bedrock APIs, depending on your processing needs.

Synchronous processing

The InvokeModel API processes your request synchronously and returns embeddings directly in the response.

The InvokeModel API requires two parameters:

  • modelId: The inference profile ID for the model.
  • body: A JSON-encoded string containing your input parameters.

The request body contains the following fields:

  • inputType: The type of content. Values: “text” or “image”.
  • inputText: The text to embed. This parameter is required for text inputs.
  • mediaSource: The image source. This parameter is required for image inputs and contains either:
    • base64String: Your base64-encoded image for inline processing.
    • s3Location: The S3 location for images stored in S3.

Example

Ensure you replace <YOUR_TEXT> with the text for which you wish to create an embedding (example: “A man walking down the street”).

Python
import boto3
import json

# Replace the `us` prefix depending on your region
INFERENCE_PROFILE_ID = "us.twelvelabs.marengo-embed-2-7-v1:0"
INPUT_TEXT = "<YOUR_TEXT>"

model_input = {
    "inputType": "text",
    "inputText": INPUT_TEXT
}

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime')

# Make the request
response = client.invoke_model(
    modelId=INFERENCE_PROFILE_ID,
    body=json.dumps(model_input)
)

# Print the response body
response_body = json.loads(response['body'].read().decode('utf-8'))
print(response_body)
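The example above embeds text. For an image, the request body uses the mediaSource field instead of inputText. The following sketch assumes an image stored in S3; the bucket name, object key, and account ID are placeholders, and the base64String variant described above can be used instead of s3Location for inline images.

Python
import boto3
import json

# Replace the `us` prefix depending on your region
INFERENCE_PROFILE_ID = "us.twelvelabs.marengo-embed-2-7-v1:0"

# Placeholders: your bucket, image object key, and AWS account ID
BUCKET = "<YOUR_BUCKET_NAME>"
IMAGE_KEY = "<YOUR_IMAGE_FILE>"
ACCOUNT_ID = "<YOUR_ACCOUNT_ID>"

# Image input referenced by S3 URI; an inline image would use
# {"base64String": "<BASE64_DATA>"} instead of "s3Location"
model_input = {
    "inputType": "image",
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{BUCKET}/{IMAGE_KEY}",
            "bucketOwner": ACCOUNT_ID
        }
    }
}

client = boto3.client('bedrock-runtime')

response = client.invoke_model(
    modelId=INFERENCE_PROFILE_ID,
    body=json.dumps(model_input)
)

response_body = json.loads(response['body'].read().decode('utf-8'))
print(response_body)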

Asynchronous processing

The StartAsyncInvoke API processes your request asynchronously, storing the results in your S3 bucket.

To create embeddings asynchronously, you must complete the following steps:

1. Submit your request, providing an S3 location for your input media file and an S3 location for the output. Note that this example uses the same bucket.
2. Check the job status using the returned invocation ARN.
3. Retrieve the results from the S3 output location once the job has completed.

The StartAsyncInvoke API requires three parameters:

  • modelId: The model ID.
  • modelInput: A dictionary containing your input parameters.
  • outputDataConfig: A dictionary specifying where to save the results.

The modelInput dictionary contains the following fields:

  • inputType: The type of content you’re embedding (“video”, “audio”, “image”, or “text”).
  • mediaSource: The S3 location of your input file (for video, audio, and image inputs).
  • inputText: The text content (for text inputs only).

S3 output structure

Each invocation creates a unique directory in your S3 bucket with two files:

  • manifest.json: Contains metadata including the request ID.
  • output.json: Contains the actual embeddings.

Example

Ensure you replace the following placeholders with your values:

  • <YOUR_REGION>: with your AWS region (example: “eu-west-1”)
  • <YOUR_ACCOUNT_ID>: with your AWS account ID (example: “123456789012”)
  • <YOUR_BUCKET_NAME>: with the name of your S3 bucket (example: “my-bucket”)
  • <YOUR_FILE>: with the name of your video file (example: “my_file.mp4”)
  • <YOUR_INPUT_TYPE>: with the type of media you wish to provide. The following values are supported: “video”, “audio”, or “image”.
Python
import boto3
import time

REGION = "<YOUR_REGION>"
MODEL_ID = "twelvelabs.marengo-embed-2-7-v1:0"
ACCOUNT_ID = "<YOUR_ACCOUNT_ID>"
BUCKET = "<YOUR_BUCKET_NAME>"
FILE_NAME = "<YOUR_FILE>"
INPUT_TYPE = "<YOUR_INPUT_TYPE>"

bedrock_client = boto3.client(service_name="bedrock-runtime", region_name=REGION)

# Start the async embedding job
model_input = {
    "mediaSource": {
        "s3Location": {
            "uri": f"s3://{BUCKET}/{FILE_NAME}",
            "bucketOwner": ACCOUNT_ID
        }
    },
    "inputType": INPUT_TYPE
}

async_request_response = bedrock_client.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": f"s3://{BUCKET}",
            "bucketOwner": ACCOUNT_ID
        }
    }
)

print("async_request_response: ", async_request_response)

# Get the invocation ARN
invocation_arn = async_request_response.get("invocationArn")

# Poll until the job completes or fails, or the retry limit is reached
max_retries = 60
retries = 0
while True:
    response = bedrock_client.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response.get("status")
    print(f"status: {status}")
    if status in ("Completed", "Failed"):
        break
    time.sleep(1)
    retries += 1
    if retries > max_retries:
        break

print(response)

# Extract the S3 URI where results are stored
output_s3_uri = response.get("outputDataConfig", {}).get("s3OutputDataConfig", {}).get("s3Uri")
print(f"Results stored at: {output_s3_uri}")
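Once the job reports Completed, you can download the results from the output location. The following sketch continues from the variables in the example above and assumes that the S3 URI returned by GetAsyncInvoke points to the invocation-specific directory that contains manifest.json and output.json (see S3 output structure above).

Python
import json
import boto3

s3_client = boto3.client("s3", region_name=REGION)

# Split the returned URI (s3://bucket/prefix) into bucket and key prefix
path = output_s3_uri.replace("s3://", "", 1)
bucket, _, prefix = path.partition("/")
key = f"{prefix}/output.json" if prefix else "output.json"

# Download and parse output.json, which contains the embeddings
obj = s3_client.get_object(Bucket=bucket, Key=key)
output = json.loads(obj["Body"].read())
print(output)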

Use embeddings

After generating embeddings, you can store them in a vector database for efficient similarity search and retrieval.

The typical workflow is as follows:

1. Generate embeddings for your content.
2. Store embeddings with metadata in your chosen vector database.
3. Generate an embedding for user queries.
4. Use cosine similarity to find the most relevant content, as sketched in the example after this list.
5. Retrieve the original content or use the results for RAG applications.
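The following sketch illustrates steps 3 and 4 with a simple in-memory store and NumPy. It assumes you have already collected 1024-dimensional embedding vectors (for example, from output.json) into a list of items alongside your own metadata; the helper names and the stored_items structure are hypothetical, and a production system would use a vector database instead.

Python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_embedding, stored_items, top_k=5):
    """Rank stored items (each a dict with 'embedding' and 'metadata') by similarity."""
    scored = [
        (cosine_similarity(query_embedding, item["embedding"]), item["metadata"])
        for item in stored_items
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical usage:
# results = search(query_embedding, stored_items)
# for score, metadata in results:
#     print(score, metadata)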

Request parameters and response fields

For a complete list of request parameters and response fields, see the TwelveLabs Marengo Embed 2.7 page in the Amazon Bedrock documentation.