RAG Pipeline Cost Calculator
Calculate the complete monthly cost of a RAG pipeline — covering document embedding, vector database storage, and LLM inference for user queries.
Retrieval-Augmented Generation (RAG) combines a vector database with a large language model to enable accurate, up-to-date answers grounded in your own documents. This calculator breaks down the three main cost components: embedding generation, vector storage, and LLM inference.
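The three cost components above can be sketched as a simple monthly cost model. All prices and workload numbers in this example are hypothetical placeholders, not quotes from any provider's price list:

```python
# Minimal sketch of the three-part RAG cost model: embedding generation,
# vector storage, and LLM inference. Every dollar figure passed in is a
# made-up example, not real provider pricing.

def rag_monthly_cost(
    corpus_tokens: float,           # total tokens in the document corpus
    monthly_update_pct: float,      # fraction of corpus re-embedded each month
    embed_price_per_1m: float,      # $ per 1M embedding tokens
    vector_db_monthly: float,       # flat monthly vector DB bill
    queries_per_month: float,
    prompt_tokens_per_query: float, # user query + retrieved chunks
    output_tokens_per_query: float,
    llm_input_price_per_1m: float,
    llm_output_price_per_1m: float,
) -> dict:
    # Only changed documents are re-embedded each month (see caching tip below).
    embedding = corpus_tokens * monthly_update_pct * embed_price_per_1m / 1e6
    inference = queries_per_month * (
        prompt_tokens_per_query * llm_input_price_per_1m
        + output_tokens_per_query * llm_output_price_per_1m
    ) / 1e6
    return {
        "embedding": embedding,
        "vector_db": vector_db_monthly,
        "llm_inference": inference,
        "total": embedding + vector_db_monthly + inference,
    }
```

Calling it with a 10M-token corpus, 10% monthly churn, and 100k queries/month makes it easy to see which component dominates — typically LLM inference, as the optimization tips below note.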
Cost Optimization Strategies
- Embedding — Cache embeddings; only re-embed changed documents. Use voyage-3 or text-embedding-3-small for lowest cost.
- Vector DB — Self-host Chroma or Qdrant (both open source) to avoid managed-service fees at small scale.
- LLM — The LLM is usually the dominant cost. Use GPT-4o-mini, Gemini Flash, or DeepSeek V3 for 80–95% cost reduction vs GPT-4o/Claude Opus.
- Chunk size — Larger chunks mean fewer chunks to embed and store, but more prompt tokens per retrieved chunk. Tune for your use case.
- Retrieval — Retrieve fewer chunks (top-3 instead of top-10) to reduce context window usage.
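The retrieval tip above is easy to quantify: per-query input cost scales roughly linearly with top-K. A quick sketch, using a made-up chunk size and input price:

```python
# Per-query input (prompt) cost for different top-K retrieval settings.
# Chunk size, query size, and price are illustrative assumptions only.

def prompt_cost(k: int, chunk_tokens: int = 400, query_tokens: int = 50,
                input_price_per_1m: float = 0.15) -> float:
    """Dollar cost of the input side of one RAG query."""
    prompt_tokens = query_tokens + k * chunk_tokens
    return prompt_tokens * input_price_per_1m / 1e6

top3 = prompt_cost(3)    # 1,250 prompt tokens
top10 = prompt_cost(10)  # 4,050 prompt tokens
```

With these numbers, top-10 retrieval costs roughly 3.2x more per query than top-3 on the input side, before counting output tokens.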
Typical RAG Pipeline Architecture
- Documents → Chunker → Embedding model → Vector database
- User query → Embedding model → Vector search → Top-K chunks → LLM prompt → Response
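The two flows above (ingestion and query) can be sketched end to end in a few lines. This toy version uses a bag-of-words vector as a stand-in for a real embedding model and a plain Python list as the "vector database"; everything here is illustrative, not a production design:

```python
# Toy end-to-end RAG sketch: documents → chunker → embedding → vector DB,
# then query → embedding → vector search → top-K chunks.
# The bag-of-words "embedding" is a stand-in for a real embedding model.

import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in embedding

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 8) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Ingestion: documents → chunker → embedding → "vector database"
vector_db = []
for doc in ["RAG grounds LLM answers in retrieved documents",
            "Vector databases store embeddings for similarity search"]:
    for c in chunk(doc):
        vector_db.append((embed(c), c))

# Query: embed → vector search → top-K chunks (these would be injected
# into the LLM prompt in a real pipeline)
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(vector_db, key=lambda e: cosine(q, e[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

In a real pipeline the `embed` call would hit an embedding API and `vector_db` would be Pinecone, Qdrant, Chroma, or similar, but the shape of the two flows is the same.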
Related Tools
- Embedding Cost Calculator — deep-dive into embedding costs for your specific documents
- Vector Database Cost Estimator — compare Pinecone, Weaviate, Qdrant, and Milvus pricing
- Multi-Model Cost Comparison — find the cheapest LLM for your RAG inference queries
- Fine-Tuning Cost Calculator — compare RAG vs fine-tuning costs for your use case