Amazon Nova 2 Multimodal Embeddings V1

EMBEDDER

Nova 2 Multimodal Embeddings converts text, documents, images, video, and audio into unified numerical vectors for cross-modal RAG, semantic search, and classification. It accepts up to 8K tokens of context.

Provider

Amazon

Credits per 1k words

0.37

Max input tokens

8,000

Dimensions

256
384
1024
3072

MTEB retrieval score

Per-modality rates

The text rate above applies to text chunks and is billed per ~1k English words. Non-text chunks (image, video, audio) are billed at the separate rates below, each with its own unit.

Modality  Credits  Units
image     0.80     credit_per_record
video     1.33     credit_per_second
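
As a rough illustration of how these rates combine, the sketch below estimates the credits for a mixed batch. The rates are copied from this page. The function name, its parameters, and the assumption that billing is strictly linear (no rounding, minimums, or per-request overhead) are hypothetical and not confirmed by the listing.

```python
from decimal import Decimal

# Rates as listed on this page.
TEXT_CREDITS_PER_1K_WORDS = Decimal("0.37")
IMAGE_CREDITS_PER_RECORD = Decimal("0.80")
VIDEO_CREDITS_PER_SECOND = Decimal("1.33")


def estimate_credits(text_words=0, image_records=0, video_seconds=0):
    """Estimate total credits, assuming simple linear billing (an assumption)."""
    # Text is billed per ~1k English words, so scale the word count by 1000.
    text = TEXT_CREDITS_PER_1K_WORDS * Decimal(str(text_words)) / Decimal(1000)
    # Images are billed per record, video per second of footage.
    images = IMAGE_CREDITS_PER_RECORD * Decimal(str(image_records))
    video = VIDEO_CREDITS_PER_SECOND * Decimal(str(video_seconds))
    return text + images + video


if __name__ == "__main__":
    # 10,000 words of text, 5 images, and 60 seconds of video.
    print(estimate_credits(text_words=10_000, image_records=5, video_seconds=60))
```

Decimal arithmetic is used here so that the listed rates are not distorted by binary floating-point rounding. Actual invoices may differ if the provider rounds or applies minimum charges.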

Supported languages

ar
de
en
es
fr
it
ja
ko
pt
zh

Supported input media

Modalities this embedder accepts natively. Other media types are converted to text (OCR for images, transcription for audio/video) before embedding.

text
image
video
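
The native-versus-converted distinction described in this section can be sketched as a simple routing check. The set of native modalities is taken from the list in this section. The function name and the returned labels are hypothetical, and the conversion step itself (OCR or transcription) is not shown.

```python
# Modalities listed in this section as accepted natively.
NATIVE_MODALITIES = frozenset({"text", "image", "video"})


def embedding_route(modality: str) -> str:
    """Return a hypothetical label for how an input would reach the embedder.

    "native" means the input is embedded as-is; "convert_to_text" means it
    would first be turned into text (for example, transcription for audio).
    """
    normalized = modality.strip().lower()
    if normalized in NATIVE_MODALITIES:
        return "native"
    return "convert_to_text"


if __name__ == "__main__":
    for kind in ("text", "image", "video", "audio"):
        print(kind, "->", embedding_route(kind))
```

A caller could use such a check to decide which per-modality rate applies, since converted inputs would presumably be billed as text.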