Services / 03
Production RAG (Retrieval-Augmented Generation) systems — grounded in your private documents, with hybrid search, re-ranking and cited answers. Used in PatFace to retrieve prior art before drafting patents. Pinecone, ChromaDB, Weaviate, Qdrant, pgvector.
Ingestion, chunking, embedding, indexing, query, re-ranking, generation, citation. Every stage observable, every stage swappable as the frontier moves.
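The swappable-stages idea can be sketched as a pipeline of plain callables, so any one stage can be replaced without touching the others. A minimal illustration only: all names, the toy stages, and the keyword-overlap "retrieval" are hypothetical stand-ins for real embedding, index, and LLM components.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RAGPipeline:
    # Each stage is an injected callable, so each is independently swappable.
    chunk: Callable[[str], list[str]]
    retrieve: Callable[[str], list[str]]
    generate: Callable[[str, list[str]], str]

    def answer(self, query: str) -> str:
        passages = self.retrieve(query)        # query + retrieval
        return self.generate(query, passages)  # grounded generation

# Toy corpus and toy stages for illustration only.
docs = ["RAG grounds an LLM in private documents.",
        "Hybrid search combines BM25 and embeddings."]

pipeline = RAGPipeline(
    chunk=lambda text: text.split(". "),
    # Toy "retrieval": keyword overlap stands in for a vector index.
    retrieve=lambda q: [d for d in docs
                        if any(w.strip("?.!") in d.lower() for w in q.lower().split())],
    # Toy "generation": echo the top passage with a citation marker.
    generate=lambda q, ps: f"{ps[0]} [source 1]" if ps else "No grounded answer.",
)

print(pipeline.answer("What is RAG?"))
```

Swapping Pinecone for Qdrant, or one LLM for another, then means changing a single constructor argument rather than rewriting the pipeline.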
Dense retrieval (embeddings) + BM25 keyword + cross-encoder re-rank. Beats either approach alone on real corpora.
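One common way to combine the two retrievers' ranked lists, before the cross-encoder re-rank, is reciprocal rank fusion (RRF). A self-contained sketch with toy rankings; the doc ids are hypothetical, and in a real system the two input lists would come from BM25 and a vector index.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids.
    Each list contributes 1/(k + rank) per doc; k=60 is the conventional constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: the doc that both signals rank highly wins the fused ranking.
keyword_ranking = ["doc_clauses", "doc_sku_42"]   # BM25 catches the exact SKU
dense_ranking   = ["doc_sku_42", "doc_faq"]       # embeddings also rate it

fused = rrf_fuse([keyword_ranking, dense_ranking])
print(fused[0])  # → doc_sku_42
```

The fused top-k would then go to a cross-encoder for the final ordering.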
Semantic chunking tuned to your document type — clause-level for contracts, section-level for research, symbol-level for code.
Every answer links back to source passages. No ungrounded output. Audit trail for regulated industries.
Offline: retrieval recall, MRR (mean reciprocal rank), answer-quality scoring. Online: feedback collection and query analytics. You know when the RAG system is working and when it isn’t.
For sensitive data: self-hosted LLMs (Llama, Mistral) plus self-hosted vector DBs (Qdrant, Weaviate). No data leaves your VPC.
Vector DBs
Pinecone · ChromaDB · Weaviate · Qdrant · pgvector · FAISS · Milvus
Embeddings
OpenAI text-embedding-3-large · Cohere embed-v3 · Voyage · BGE · E5 · self-hosted sentence-transformers
Re-ranking
Cohere rerank · Jina reranker · cross-encoders (MS MARCO) · custom rerankers
Frameworks
LangChain · LlamaIndex · Haystack · DSPy · custom pipelines
Doc parsing
Unstructured · LlamaParse · PyMuPDF · pdfplumber · OCR via Tesseract/Textract
Eval
Ragas · TruLens · LangSmith · custom eval harnesses
patmaster.online
iklavya.in
Prototype RAG
$3,000 – $6,000
Single-corpus RAG with basic retrieval and a query UI. 1–2 weeks.
Production RAG
$10,000 – $22,000
Hybrid search, re-ranking, eval suite, citations, dashboard, deployed. 4–6 weeks.
Enterprise RAG
$25,000+
Multi-corpus, per-tenant, self-hosted LLM option, access controls, audit logs. 6–12 weeks.
Fixed-price or monthly retainer. NDA and IP-assignment standard. Hourly available on request.
RAG (Retrieval-Augmented Generation) is an architecture that grounds a large language model in your private documents. At query time, the system retrieves the most relevant passages from your data, feeds them to the LLM as context, and generates an answer with citations. For most knowledge tasks, it removes the need to fine-tune a model on your data.
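The "feed retrieved passages to the LLM as context, answer with citations" step boils down to prompt assembly. A minimal sketch with the LLM call itself stubbed out; the prompt wording and example passages are illustrative, not a production template.

```python
def build_grounded_prompt(query, passages):
    """Assemble a prompt that restricts the model to numbered passages
    and asks it to cite them. The actual LLM call is out of scope here."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using ONLY the passages below. "
        "Cite passage numbers like [1] after each claim.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is the termination notice period?",
    ["Either party may terminate with 30 days written notice.",
     "Fees are payable net 30."],
)
print(prompt)
```

Because every passage carries a stable number, the generated "[1]" markers can be mapped back to source documents for the audit trail.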
Use RAG when your data changes often, when you need citations, or when you want to restrict the model’s knowledge to a specific corpus. Fine-tuning is appropriate when you need the model to learn a style or behaviour, not facts. 95% of enterprise AI projects I build are RAG, not fine-tuning.
Pinecone for serverless and operational simplicity. Qdrant or Weaviate for self-hosted with advanced filtering. ChromaDB for local/prototype. pgvector if you already run Postgres and the corpus stays under roughly 10M vectors. I make the call based on volume, operational appetite, and cost.
A production RAG pipeline over 100k–1M documents typically costs $6,000–$18,000 to build. Operating costs: LLM API (~$0.002–$0.02 per query), vector DB ($70–$500/month depending on provider), embedding refresh for new documents. A small business deployment often runs under $300/month total.
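The operating-cost arithmetic above is simple enough to sanity-check in a few lines. A back-of-envelope sketch; the scenario numbers below are illustrative picks from within the ranges quoted, not a quote.

```python
def monthly_operating_cost(queries_per_month, llm_cost_per_query,
                           vector_db_monthly, embedding_refresh_monthly=0.0):
    """Rough monthly RAG operating cost: per-query LLM spend plus fixed infra."""
    return (queries_per_month * llm_cost_per_query
            + vector_db_monthly + embedding_refresh_monthly)

# Small-business scenario: 5,000 queries at ~$0.01 each, $150/mo vector DB,
# ~$20/mo re-embedding new documents — comfortably under $300/month.
print(round(monthly_operating_cost(5_000, 0.01, 150, 20), 2))  # → 220.0
```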
Semantic chunking based on document structure, not fixed token windows. For contracts: clause-level. For research papers: section-level with sliding overlap. For code: symbol-level. The chunking strategy is decided per corpus, not globally — it’s the biggest determinant of retrieval quality.
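Clause-level chunking for contracts can be as simple as splitting on numbered headings. A sketch only, assuming a "1.", "2.1." numbering convention; real corpora need a splitter tuned to their own document structure, which is exactly why the strategy is decided per corpus.

```python
import re

def chunk_contract_clauses(text):
    """Split a contract into clause-level chunks at numbered headings
    ('1.', '2.', '2.1.') found at the start of a line."""
    # Zero-width lookahead split keeps each heading with its clause body.
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\.\s)", text)
    return [p.strip() for p in parts if p.strip()]

contract = """1. Term. This agreement runs for two years.
2. Termination. Either party may terminate with 30 days notice.
2.1. Notice must be in writing.
"""
for clause in chunk_contract_clauses(contract):
    print(clause)
```

Each chunk then carries a natural citation anchor (its clause number), which feeds directly into the cited-answers requirement.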
Yes. Dense retrieval alone misses exact-match terms (product SKUs, clause numbers, IDs). I combine BM25 keyword search with dense embeddings and re-rank with a cross-encoder. On real corpora this consistently beats either approach alone.
Offline: retrieval recall, MRR, and LLM-judged answer quality on a held-out set of queries. Online: thumbs-up/down feedback, time-to-answer, escalation rate. Every RAG I ship has an evaluation suite the client can re-run.
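The two offline retrieval metrics named above have short textbook definitions. A self-contained sketch with toy doc ids; a real harness would run these over the held-out query set.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_reciprocal_rank(retrieved_lists, relevant_lists):
    """MRR: average over queries of 1/rank of the first relevant doc (0 if none)."""
    total = 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_lists):
        rr = 0.0
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(retrieved_lists)

# Two toy queries: first relevant doc at rank 1 and rank 2 -> MRR = 0.75.
retrieved = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
relevant  = [{"d1"}, {"d5"}]
print(mean_reciprocal_rank(retrieved, relevant))           # → 0.75
print(recall_at_k(["d1", "d2", "d3"], {"d1", "d9"}, k=3))  # → 0.5
```

Tracking these per corpus makes regressions visible when a chunking strategy, embedding model, or index is swapped.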
Yes. Every generated answer includes inline citations linking back to the source document and passage. This is standard — I refuse to ship RAG that can’t cite its evidence.
Same-week start. Email reply within 24 hours. Written enquiries welcome.