Modalities
Modalities represent the sources of information that the platform processes and analyzes in a video. Choose the modalities that match your needs:
Visual includes:
- Actions, objects, and events in the video.
- Text that appears on screen (through OCR).
- Brand logos and visual elements.
Audio includes:
- Ambient sounds, music, and sound effects.
- Human speech and conversations.
You specify modalities through different parameters depending on your task:
- Model options (model_options): when you create an index.
- Search options (search_options): when you search videos.
- Embedding options (embedding_option): when you retrieve embeddings.
Model options
When you create an index, specify which modalities the platform must process. You can include the following values in the model_options array:
- visual: To process visual content.
- audio: To process audio content.
You can enable one or both model options. The platform extracts and indexes only the modalities you specify.
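The rule above (one or both of visual and audio) can be checked on the client before you send a create-index request. The following is a minimal sketch, assuming you build the model_options array yourself as a plain Python list; the function name validate_model_options and its duplicate check are hypothetical and are not part of any TwelveLabs SDK.

```python
# Hypothetical client-side helper: the name, error messages, and duplicate
# check are illustrative assumptions, not part of any TwelveLabs SDK.
ALLOWED_MODEL_OPTIONS = ("visual", "audio")


def validate_model_options(model_options: list[str]) -> list[str]:
    """Check a model_options array before creating an index.

    Accepts one or both of "visual" and "audio"; rejects an empty list,
    unknown values, and repeated values.
    """
    if not model_options:
        raise ValueError("model_options must contain at least one value")
    unknown = [opt for opt in model_options if opt not in ALLOWED_MODEL_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported model options: {unknown}")
    if len(set(model_options)) != len(model_options):
        raise ValueError("model_options must not contain duplicates")
    return list(model_options)


if __name__ == "__main__":
    print(validate_model_options(["visual", "audio"]))  # ['visual', 'audio']
    print(validate_model_options(["visual"]))           # ['visual']
```

Pass the returned list as the model_options value in whichever SDK or API call you use to create the index.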
Related topics
- Python SDK Reference > Create an index
- Node.js SDK Reference > Create an index
- API Reference > Create an index
Search options
When you search videos, specify which modalities the platform uses to find relevant matches. You can include the following values in the search_options array:
- visual: To search visual content.
- audio: To search audio content.
Notes
- Search options must be a subset of the model options specified when the index was created. For example, if only the visual model option is enabled for your index, you cannot search using the audio search option.
- You can combine multiple search options with the operator parameter to broaden or narrow your search.
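The two notes above can be illustrated with a small conceptual sketch. It assumes that the operator parameter accepts "or" (return matches from any selected modality, which broadens results) and "and" (return only matches found by every selected modality, which narrows results). The helper names and the toy sets of clip IDs are hypothetical; this models the rules locally and does not describe how the platform implements search.

```python
# Conceptual sketch only: helper names and clip IDs are hypothetical.
# It models the subset rule and the assumed "or"/"and" operator semantics.


def check_search_options(search_options: list[str], model_options: list[str]) -> None:
    """Raise if a search option was not enabled as a model option on the index."""
    missing = [opt for opt in search_options if opt not in model_options]
    if missing:
        raise ValueError(
            f"search options {missing} are not enabled for this index "
            f"(model options: {model_options})"
        )


def combine_matches(matches: dict[str, set[str]], operator: str = "or") -> set[str]:
    """Combine per-modality matches.

    "or" keeps clips matched by any modality (broader results);
    "and" keeps only clips matched by every modality (narrower results).
    """
    sets = list(matches.values())
    if not sets:
        return set()
    if operator == "or":
        return set().union(*sets)
    if operator == "and":
        return set.intersection(*sets)
    raise ValueError(f"unsupported operator: {operator}")


if __name__ == "__main__":
    check_search_options(["visual"], ["visual", "audio"])  # passes silently
    matches = {"visual": {"clip1", "clip2"}, "audio": {"clip2", "clip3"}}
    print(sorted(combine_matches(matches, "or")))   # ['clip1', 'clip2', 'clip3']
    print(sorted(combine_matches(matches, "and")))  # ['clip2']
```

Calling check_search_options(["audio"], ["visual"]) raises a ValueError, mirroring the example in the first note.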
Related topics
- Python SDK Reference > Make a search request
- Node.js SDK Reference > Make a search request
- API Reference > Make a search request
Embedding options
When you retrieve video embeddings, specify the types of embeddings the platform must return. You can include the following values in the embedding_option array:
- visual-text: To retrieve visual embeddings optimized for text search.
- audio: To retrieve audio embeddings.
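Note that the visual value for embedding_option is "visual-text", not "visual", so a list reused from model_options will not work unchanged. The following minimal sketch translates modality names into embedding_option values; the mapping and the function name embedding_options_for are illustrative assumptions based on the value descriptions above, not an SDK function.

```python
# Hypothetical helper: the mapping is an assumption based on the value
# descriptions above, not part of any TwelveLabs SDK.
EMBEDDING_OPTION_FOR_MODALITY = {
    "visual": "visual-text",  # visual embeddings optimized for text search
    "audio": "audio",         # audio embeddings
}


def embedding_options_for(modalities: list[str]) -> list[str]:
    """Translate modality names into embedding_option values, keeping order."""
    options: list[str] = []
    for modality in modalities:
        if modality not in EMBEDDING_OPTION_FOR_MODALITY:
            raise ValueError(f"unknown modality: {modality}")
        option = EMBEDDING_OPTION_FOR_MODALITY[modality]
        if option not in options:  # skip repeated modalities
            options.append(option)
    return options


if __name__ == "__main__":
    print(embedding_options_for(["visual", "audio"]))  # ['visual-text', 'audio']
```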