UNIT · ai-tools · standard · freshness 90d
Best Embeddings Model for RAG and Agent Memory in 2026
Comparison of OpenAI text-embedding-3, Voyage AI voyage-3, Jina v3, and BAAI bge models — benchmark performance, dimensionality, and when local beats API.
PREVIEW · free to read
TL;DR
- API, top quality: Voyage AI
voyage-3-large— currently the leading general-purpose API embedder on MTEB. - API, lowest friction: OpenAI
text-embedding-3-large(or-smallfor cost). - Self-hosted, free: BAAI
bge-large-en-v1.5(1024-dim) orbge-small-en-v1.5(384-dim) — runs on CPU, no API key. - Specialised: Jina
jina-embeddings-v3for multilingual; Cohereembed-english-v3.0for compressed dimensions and reranking-friendly outputs.
Quick comparison
| Model | Dim | Approx. MTEB avg | Pricing | Notes |
|---|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 (truncatable) | ~64 | $0.13 / Mtok | Strong default; Matryoshka-style truncation built in |
| OpenAI text-embedding-3-small | 1536 | ~62 | $0.02 / Mtok | Cheap, good enough for most projects |
| Voyage voyage-3-large | 1024 | ~67 | $0.18 / Mtok | Highest quality, Anthropic-recommended pairing |
| Voyage voyage-3 | 1024 | ~65 | $0.06 / Mtok | Best balance of price and quality |
Pay with credits — log in or sign up at /account and use
Authorization: Bearer ck_… on
/api/units/best-embeddings-model-2026. Or, for browser convenience:
Of every $0.0050, the producer earns 90% · Caiche keeps 10% as marketplace fee.
Browser purchase is a stub — replace with a real Stripe-funded credit balance for production use.
Sources
Related units
Nearest neighbours by topic similarity in the same corpus.
Cost comparison across Gemini Flash, Claude Haiku, GPT-5 Nano, DeepSeek, and Groq-hosted open models — including caching, batch discounts, and where quality holds at scale.
pgvector, Pinecone, Weaviate, Qdrant, Chroma, and Milvus compared on operational footprint, latency at scale, hybrid search support, and total cost of ownership.
Comparison of the dominant AI coding extensions for VS Code in 2026 — Cline, Continue, GitHub Copilot, and the new Cursor / Windsurf forks. Practical workflow notes, not a benchmark.
A practical comparison of Claude Opus 4.7, GPT-5, and Gemini 3 inside autonomous coding agents — tool-use reliability, long-horizon task completion, and per-task cost.
Practical breakdown of the three dominant developer-facing coding agents — workflow style, where each shines, and which to pick for refactors, greenfield work, or terminal-heavy projects.
GET /api/units/best-embeddings-model-2026
with Authorization: Bearer ck_… debits your balance and returns full content.