Amazon Nova 2 Multimodal Embeddings V1

EMBEDDER

Nova 2 Multimodal Embeddings converts text, documents, images, video, and audio into vectors in a single shared embedding space, supporting cross-modal RAG, semantic search, and classification. It accepts up to 8K tokens of context per input.

Provider

Amazon

Credits per 1k words

0.37

Max input tokens

8,000

Dimensions

256

384

1024

3072
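
As a rough guide to choosing among the dimensions listed above, the sketch below estimates raw index size for each option. It assumes vectors are stored as uncompressed 32-bit floats with no index overhead, which this page does not state; the function name `index_size_mib` and the one-million-vector corpus are hypothetical.

```python
# Assumption: each vector component is stored as a 4-byte float32, with no
# compression and no index overhead. Only the dimension values come from this page.
BYTES_PER_FLOAT32 = 4
DIMENSIONS = (256, 384, 1024, 3072)


def index_size_mib(num_vectors: int, dims: int) -> float:
    """Return the raw storage needed for num_vectors embeddings, in MiB."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / (1024 ** 2)


# Compare the footprint of a hypothetical one-million-vector corpus per dimension.
for dims in DIMENSIONS:
    print(f"{dims:>4} dims: {index_size_mib(1_000_000, dims):,.0f} MiB")
```

Under these assumptions storage grows linearly with dimension, so the 3072-dimension option needs 12 times the space of the 256-dimension one.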

MTEB retrieval score

—

Per-modality rates

The text rate above is billed per roughly 1,000 English words of text. Non-text chunks (image, video, audio) are billed at the separate rates below, each in its own unit.

Modality	Credits	Units
image	0.80	credit_per_record
video	1.33	credit_per_second
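
The rates above can be combined into a quick cost estimate for a mixed batch. This is a minimal sketch: the three rates are copied from this page, but the function `estimate_credits`, its parameter names, and the assumption of strictly linear billing with no rounding or minimum charge are illustrative and not confirmed by the source.

```python
# Rates copied from this page. Everything else here is a hypothetical helper.
TEXT_CREDITS_PER_1K_WORDS = 0.37
IMAGE_CREDITS_PER_RECORD = 0.80
VIDEO_CREDITS_PER_SECOND = 1.33


def estimate_credits(text_words: int = 0, images: int = 0,
                     video_seconds: float = 0.0) -> float:
    """Estimate credits for a batch, assuming linear billing with no rounding."""
    text_cost = (text_words / 1000) * TEXT_CREDITS_PER_1K_WORDS
    image_cost = images * IMAGE_CREDITS_PER_RECORD
    video_cost = video_seconds * VIDEO_CREDITS_PER_SECOND
    return text_cost + image_cost + video_cost


if __name__ == "__main__":
    # Example batch: 50,000 words of text, 100 images, and 60 seconds of video.
    total = estimate_credits(text_words=50_000, images=100, video_seconds=60)
    print(f"{total:.2f} credits")
```

In this example the 60 seconds of video cost about as much as the 100 images, and each costs several times more than the 50,000 words of text, so video duration is usually the figure to watch.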

Supported input media

Modalities this embedder accepts natively. Other media types are converted to text (OCR for images, transcription for audio/video) before embedding.

text

image

video

Documentation

https://aws.amazon.com/blogs/aws/amazon-nova-multimodal-embeddings-now-available-in-amazon-bedrock/