Audio embeddings

This guide shows how you can create audio embeddings.

The following table lists the available models for generating audio embeddings and their key characteristics:

| Model | Description | Dimensions | Max length | Similarity metric |
| --- | --- | --- | --- | --- |
| Marengo-retrieval-2.7 | Use this model to create embeddings that you can use in various downstream tasks | 1024 | 10 seconds | Cosine similarity |

Note that the “Marengo-retrieval-2.7” video understanding model generates embeddings for all modalities in the same latent space. This shared space enables any-to-any searches across different types of content.

The platform processes audio files up to 10 seconds in length. Files longer than 10 seconds are automatically truncated.
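Because all modalities land in the same latent space, you can compare an audio embedding directly against an embedding of another modality. The sketch below is a minimal illustration of this, assuming the same embed.create call also accepts a text parameter and returns a text_embedding field; the query text "a dog barking" is a placeholder.

import math

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Embed an audio clip and a text query with the same model
audio_res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="<YOUR_AUDIO_URL>",
)
text_res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="a dog barking",  # placeholder query
)

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|)
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

audio_vec = audio_res.audio_embedding.segments[0].float_
text_vec = text_res.text_embedding.segments[0].float_
print(f"Audio-text similarity: {cosine(audio_vec, text_vec):.4f}")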

Prerequisites

  • To use the platform, you need an API key:

    1. If you don’t have an account, sign up for a free account.
    2. Go to the API Key page.
    3. Select the Copy icon next to your key.

  • Ensure the pre-release version of the TwelveLabs SDK is installed on your computer:

    $ pip install twelvelabs --pre
  • The audio files you wish to use must meet the following requirements (see the pre-flight check sketched after this list):

    • Format: WAV (uncompressed), MP3 (lossy), or FLAC (lossless).
    • File size: Must not exceed 10 MB.
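Below is a minimal pre-flight check for these requirements. The validate_audio_file helper is a hypothetical name, not part of the SDK; it only inspects the extension and size locally before you send the file.

import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB platform limit
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac"}

def validate_audio_file(path: str) -> None:
    # Hypothetical helper: checks format and size before upload
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format {ext}; use WAV, MP3, or FLAC.")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"File is {size} bytes; must not exceed 10 MB.")

validate_audio_file("<YOUR_AUDIO_FILE>")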

Complete example

This complete example shows how you can create audio embeddings. Ensure you replace the placeholders surrounded by <> with your values.

from typing import List

from twelvelabs import TwelveLabs
from twelvelabs.types import BaseSegment

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Create audio embeddings
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="<YOUR_AUDIO_URL>",
    # audio_start_offset_sec=2
)

# 3. Process the results
def print_segments(segments: List[BaseSegment], max_elements: int = 5):
    for segment in segments:
        first_few = segment.float_[:max_elements]
        print(
            f" embeddings: [{', '.join(str(x) for x in first_few)}...] (total: {len(segment.float_)} values)"
        )


print("Created audio embedding")
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    print_segments(res.audio_embedding.segments)

Step-by-step guide

1. Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.
Function call: You call the constructor of the TwelveLabs class.
Parameters:

  • api_key: The API key to authenticate your requests to the platform.

Return value: An object of type TwelveLabs configured for making API calls.
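If you prefer not to hard-code the key, you can read it from an environment variable instead; the variable name TWELVELABS_API_KEY below is an illustrative choice, not one the SDK requires.

import os

from twelvelabs import TwelveLabs

# Read the API key from the environment (variable name is illustrative)
client = TwelveLabs(api_key=os.environ["TWELVELABS_API_KEY"])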

2. Create audio embeddings

Function call: You call the embed.create function.
Parameters:

  • model_name: The name of the model you want to use (“Marengo-retrieval-2.7”).
  • audio_url or audio_file: The publicly accessible URL or the path of your audio file.
  • (Optional) audio_start_offset_sec: The start time, in seconds, from which the platform generates the audio embeddings. This parameter allows you to skip the initial portion of the audio during processing.

Return value: The response contains the following fields:

  • audio_embedding: An object that contains the embedding data for your audio file. It includes the following fields:
    • segments: An array of objects, each of which contains the following:
      • float_: An array of floats representing the embedding.
      • start_offset_sec: The start time, in seconds, of the segment.
    • metadata: An object that contains metadata about the embedding.
  • model_name: The name of the video understanding model the platform has used to create this embedding.
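For example, to embed a local file rather than a URL and skip its first two seconds, the call might look like the sketch below; the placeholder path is yours to supply, and audio_file is given a path string as described above.

# Embed a local audio file, skipping the first 2 seconds
res = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_file="<YOUR_AUDIO_FILE_PATH>",
    audio_start_offset_sec=2,
)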
3. Process the results

This example prints the results to the standard output.
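Beyond printing, you typically pass the vectors to a vector store or a similarity computation. A minimal sketch, assuming the res object from the call above:

# Collect each segment's 1024-dimensional vector for downstream use
if res.audio_embedding is not None and res.audio_embedding.segments is not None:
    vectors = [segment.float_ for segment in res.audio_embedding.segments]
    print(f"Collected {len(vectors)} segment vector(s) of {len(vectors[0])} dimensions each")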