This quickstart guide provides a simplified introduction to segmenting videos into structured, timestamped data using the TwelveLabs Video Understanding Platform. It includes the following:

- A basic working example
- Minimal implementation details
- Core parameters for common use cases

For a comprehensive guide, see the Segment videos guide.

Key concepts

This section explains the key concepts and terminology used in this guide:

- Asset: Your uploaded content. Once created, you can reference the same asset across multiple operations without uploading the file again.
- Segment definition: A description of a type of segment you want to extract. Each definition includes a unique identifier, a natural language description, and optional custom fields.
- Segment field: A custom metadata field to extract for each segment. Each field has a name, a type, and a description.
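
To make the shape concrete, the sketch below writes one segment definition as a plain Python dictionary and checks that it has the parts listed above. The `validate_definition` helper is hypothetical and not part of the TwelveLabs SDK. The key names (`id`, `description`, `fields`, `name`, `type`) follow the starter code later in this guide.

```python
def validate_definition(definition):
    """Return a list of problems found in a segment definition (empty if none).

    Hypothetical helper for illustration; not part of the TwelveLabs SDK.
    """
    problems = []
    for key in ("id", "description"):
        if not definition.get(key):
            problems.append(f"missing '{key}'")
    # Custom fields are optional; each one needs a name, a type, and a description.
    for index, field in enumerate(definition.get("fields", [])):
        for key in ("name", "type", "description"):
            if not field.get(key):
                problems.append(f"field {index}: missing '{key}'")
    return problems


scenes = {
    "id": "scenes",
    "description": "Distinct scenes based on changes in setting or topic",
    "fields": [
        {
            "name": "contains_speech",
            "type": "boolean",
            "description": "Whether the scene contains speech or dialogue",
        },
    ],
}
print(validate_definition(scenes))
```

A check like this only catches missing keys locally. It does not confirm that the platform accepts a given field type.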

Workflow

This guide shows how to upload your video as an asset, create an asynchronous segmentation task with Pegasus 1.5, and parse the timestamped metadata from the results.

Prerequisites

To use the platform, you need an API key:

1. If you don’t have an account, sign up for a free account.
2. Go to the API Keys page.
3. If you need to create a new key, select the Create API Key button. Enter a name and set the expiration period. The default is 12 months.
4. Select the Copy icon next to your key to copy it to your clipboard.
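
Rather than pasting the key into source files, you may prefer to read it from an environment variable. This is a minimal sketch: the variable name `TWELVELABS_API_KEY` and the `load_api_key` helper are assumptions for illustration, not something the platform requires.

```python
import os


def load_api_key(env_var="TWELVELABS_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key
```

You could then pass the result to the client, for example `TwelveLabs(api_key=load_api_key())`.
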
Install the TwelveLabs Python SDK by entering the following command:
```
$ pip install twelvelabs
```
Your video files must meet the following requirements:
- Upload limits: Public video URLs up to 2 GB or local video files up to 200 MB. For local files up to 2 GB, see the Upload and processing methods page.
- Analysis method: Videos up to 2 hours.
- Model capabilities: See the complete requirements for video files.
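
Before uploading a local file, you can check its size against the 200 MB limit for direct uploads. This is a sketch under stated assumptions: the `fits_direct_upload` helper is hypothetical, and it treats 1 MB as 1,000,000 bytes, which the requirements above do not specify.

```python
import os

# Assumption: 1 MB = 1,000,000 bytes. The guide does not say whether the
# limit is measured in decimal or binary megabytes.
DIRECT_UPLOAD_LIMIT_BYTES = 200 * 1_000_000


def fits_direct_upload(path, limit_bytes=DIRECT_UPLOAD_LIMIT_BYTES):
    """Return True if the file at `path` is within the direct-upload size limit."""
    return os.path.getsize(path) <= limit_bytes
```

If the check returns `False`, the file needs one of the other upload methods mentioned above.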

Starter code

Copy and paste the code below, replacing the placeholders surrounded by <> with your values. The example defines a single segment type (scenes) with three custom fields to illustrate the shape. Adapt the segment definitions to match what you want to extract from your videos.

```
import json
import time
from twelvelabs import TwelveLabs
from twelvelabs.types import AsyncResponseFormat, VideoContext_AssetId

# 1. Initialize the client
client = TwelveLabs(api_key="<YOUR_API_KEY>")

# 2. Upload a video
asset = client.assets.create(
    method="url",
    url="<YOUR_VIDEO_URL>",  # Use direct links to raw media files. Video hosting platforms and cloud storage sharing links are not supported
    # Or use method="direct" and file=open("<PATH_TO_VIDEO_FILE>", "rb") to upload a local file up to 200 MB
)
print(f"Created asset: id={asset.id}")

# 3. Check the status of the asset
print("Waiting for asset to be ready...")
while True:
    asset = client.assets.retrieve(asset.id)
    if asset.status == "ready":
        print("Asset is ready")
        break
    if asset.status == "failed":
        raise RuntimeError(f"Asset processing failed: id={asset.id}")
    time.sleep(5)

# 4. Create a video segmentation task
video = VideoContext_AssetId(asset_id=asset.id)
task = client.analyze_async.tasks.create(
    video=video,
    model_name="pegasus1.5",
    analysis_mode="time_based_metadata",
    response_format=AsyncResponseFormat(
        type="segment_definitions",
        segment_definitions=[
            {
                "id": "scenes",
                "description": "Segment the video into distinct scenes based on changes in setting, topic, or visual composition",
                "fields": [
                    {
                        "name": "sentiment",
                        "type": "string",
                        "description": "The overall sentiment of this scene",
                        "enum": ["positive", "negative", "neutral"]
                    },
                    {
                        "name": "key_objects",
                        "type": "array",
                        "description": "Notable objects visible in the scene",
                        "items": {"type": "string"}
                    },
                    {
                        "name": "contains_speech",
                        "type": "boolean",
                        "description": "Whether the scene contains speech or dialogue"
                    }
                ]
            }
        ]
    )
)
print(f"Task ID: {task.task_id}")

# 5. Monitor the status
while True:
    task = client.analyze_async.tasks.retrieve(task.task_id)
    if task.status == "ready":
        print("Task completed")
        break
    if task.status == "failed":
        # Stop here: a failed task has no result to parse in step 6
        raise RuntimeError(f"Task failed: id={task.task_id}")
    print("Task still processing...")
    time.sleep(5)

# 6. Parse and process the results
# The results are keyed by segment definition id ("scenes" in this example)
data = json.loads(task.result.data)
for segment in data["scenes"]:
    print(f"\n[{segment['start_time']:.1f}s - {segment['end_time']:.1f}s]")
    meta = segment["metadata"]
    print(f"  Sentiment: {meta['sentiment']}")
    print(f"  Key objects: {', '.join(meta['key_objects'])}")
    print(f"  Contains speech: {meta['contains_speech']}")
```

Code explanation

Import the SDK and initialize the client

Create a client instance to interact with the TwelveLabs Video Understanding Platform.

Upload a video

Upload a video to create an asset.

Check the status of the asset

You only need this step for URL uploads larger than 200 MB. The platform processes these files asynchronously.

Create a video segmentation task

Define the segments and fields you want to extract. Set the analysis_mode parameter to time_based_metadata and pass your segment definitions in the response_format parameter.

Monitor the status

Poll the task until it reaches the ready state.
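
The polling loops in the starter code run until the status changes, which can block indefinitely if processing stalls. The sketch below wraps the same idea in a helper with a timeout. `wait_until_ready` is a hypothetical function, not part of the SDK. It accepts any zero-argument callable that returns an object with a `status` attribute, and it assumes the `ready` and `failed` status values used in the starter code.

```python
import time


def wait_until_ready(fetch, interval=5.0, timeout=600.0):
    """Call `fetch()` until its result has status "ready", then return it.

    Raises RuntimeError on status "failed" and TimeoutError once `timeout`
    seconds have passed without reaching "ready".
    """
    deadline = time.monotonic() + timeout
    while True:
        resource = fetch()
        if resource.status == "ready":
            return resource
        if resource.status == "failed":
            raise RuntimeError("Processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Not ready after {timeout} seconds")
        time.sleep(interval)
```

With the names from the starter code, a call might look like `task = wait_until_ready(lambda: client.analyze_async.tasks.retrieve(task.task_id))`. The default interval and timeout are arbitrary choices, not platform recommendations.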

Parse and process the results

The result.data field is a JSON-encoded string. This example parses it, then displays the timestamps and custom metadata for each segment to the standard output.
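
To try the parsing step without calling the API, the sketch below runs similar logic on a hand-written JSON string. The sample payload is made up for illustration and only mirrors the keys the starter code reads (`start_time`, `end_time`, `metadata`); real responses may contain additional fields. `summarize_segments` is a hypothetical helper.

```python
import json

# Hand-written sample for illustration, not real API output.
sample = json.dumps({
    "scenes": [
        {
            "start_time": 0.0,
            "end_time": 12.5,
            "metadata": {
                "sentiment": "positive",
                "key_objects": ["desk", "laptop"],
                "contains_speech": True,
            },
        },
        {
            "start_time": 12.5,
            "end_time": 30.0,
            "metadata": {
                "sentiment": "neutral",
                "key_objects": [],
                "contains_speech": False,
            },
        },
    ]
})


def summarize_segments(raw, definition_id):
    """Parse the JSON string and return one summary line per segment."""
    data = json.loads(raw)
    lines = []
    # Segments are looked up by the id of the segment definition.
    for segment in data.get(definition_id, []):
        meta = segment.get("metadata", {})
        objects = ", ".join(meta.get("key_objects", [])) or "none"
        lines.append(
            f"[{segment['start_time']:.1f}s - {segment['end_time']:.1f}s] "
            f"sentiment={meta.get('sentiment')} objects={objects}"
        )
    return lines


for line in summarize_segments(sample, "scenes"):
    print(line)
```

Using `.get` with defaults keeps the loop from raising `KeyError` when a segment lacks a field, at the cost of silently printing `None` or `none` for missing values.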