Services / 03

Hire a RAG developer to ground LLMs in your company knowledge.

Production RAG (Retrieval-Augmented Generation) systems — grounded in your private documents, with hybrid search, re-ranking and cited answers. Used in PatFace to retrieve prior art before drafting patents. Pinecone, ChromaDB, Weaviate, Qdrant, pgvector.

§ 01

What you get

Production-grade pipelines

Ingestion, chunking, embedding, indexing, query, re-ranking, generation, citation. Every stage observable, every stage swappable as the frontier moves.

Hybrid search

Dense retrieval (embeddings) + BM25 keyword + cross-encoder re-rank. Beats either approach alone on real corpora.

Smart chunking

Semantic chunking tuned to your document type — clause-level for contracts, section-level for research, symbol-level for code.

Cited answers

Every answer links back to source passages. No ungrounded output. Audit trail for regulated industries.

Evaluation harness

Offline: retrieval recall, MRR, answer-quality scoring. Online: feedback collection and query analytics. You know when the RAG is working and when it isn’t.

Self-hosted option

For sensitive data: self-hosted LLMs (Llama, Mistral) plus self-hosted vector DBs (Qdrant, Weaviate). No data leaves your VPC.

§ 02

How I build it

  1. Corpus review
     We look at a sample of your documents: formats, volumes, update cadence, structure. This determines the chunking and storage strategy.
  2. Pipeline v1
     Ingestion, embedding, indexing, retrieval, generation. Minimal UI for you to test queries on real data within a week.
  3. Quality engineering
     Chunking tuning, hybrid search, re-ranking, evaluation suite. The biggest quality gains happen here.
  4. Guardrails
     Citation enforcement, grounded-only generation, refusal behaviour for out-of-corpus queries. No hallucinated answers.
  5. Integration & deploy
     Embedded into your product, Slack bot, helpdesk, internal portal. Monitoring, cost dashboards, handover docs.

§ 03

Stack used

Vector DBs

Pinecone · ChromaDB · Weaviate · Qdrant · pgvector · FAISS · Milvus

Embeddings

OpenAI text-embedding-3-large · Cohere embed-v3 · Voyage · BGE · E5 · self-hosted sentence-transformers

Re-ranking

Cohere rerank · Jina reranker · cross-encoders (MS MARCO) · custom rerankers

Frameworks

LangChain · LlamaIndex · Haystack · DSPy · custom pipelines

Doc parsing

Unstructured · LlamaParse · PyMuPDF · pdfplumber · OCR via Tesseract/Textract

Eval

Ragas · TruLens · LangSmith · custom eval harnesses

§ 04

Engagements

Prototype RAG

$3,000 – $6,000

Single-corpus RAG with basic retrieval and a query UI. 1–2 weeks.

Production RAG

$10,000 – $22,000

Hybrid search, re-ranking, eval suite, citations, dashboard, deployed. 4–6 weeks.

Enterprise RAG

$25,000+

Multi-corpus, per-tenant, self-hosted LLM option, access controls, audit logs. 6–12 weeks.

Fixed-price or monthly retainer. NDA and IP-assignment standard. Hourly available on request.

§ 05

Frequently asked

What is a RAG system?

RAG (Retrieval-Augmented Generation) is an architecture that grounds a large language model in your private documents. At query time, the system retrieves the most relevant passages from your data, feeds them to the LLM as context, and generates an answer with citations. It often removes the need to fine-tune a model on your data.
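
The retrieve-then-generate flow can be sketched in a few lines of pure Python. Everything here is illustrative: the toy 3-dimensional vectors stand in for real embedding-model output, and the LLM call itself is replaced by prompt assembly.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, top_k=2):
    """Rank stored passages by similarity to the query embedding."""
    scored = sorted(index, key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return scored[:top_k]

def build_prompt(question, passages):
    """Assemble a grounded prompt: the LLM may only use the numbered context."""
    context = "\n".join(f"[{i + 1}] {p['text']}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the context below. Cite passages as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Toy index: in production the vectors come from an embedding model
# and live in a vector database, not a Python list.
index = [
    {"text": "Refunds are issued within 14 days.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 3-5 business days.", "vec": [0.1, 0.9, 0.0]},
]
hits = retrieve([0.8, 0.2, 0.1], index, top_k=1)
prompt = build_prompt("What is the refund window?", hits)
```

The prompt hands the model only retrieved passages, which is what makes the answer attributable to specific sources.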

When should I use RAG instead of fine-tuning?

Use RAG when your data changes often, when you need citations, or when you want to restrict the model’s knowledge to a specific corpus. Fine-tuning is appropriate when you need the model to learn a style or behaviour, not facts. 95% of enterprise AI projects I build are RAG, not fine-tuning.

What vector database should I use: Pinecone, ChromaDB, Weaviate, Qdrant or pgvector?

Pinecone for serverless and operational simplicity. Qdrant or Weaviate for self-hosted with advanced filtering. ChromaDB for local/prototype. pgvector if you already run Postgres and your corpus is under 10M vectors. I make the call based on volume, operational appetite, and cost.

How much does a RAG system cost to build and operate?

A production RAG pipeline over 100k–1M documents typically costs $6,000–$18,000 to build. Operating costs: LLM API (~$0.002–$0.02 per query), vector DB ($70–$500/month depending on provider), embedding refresh for new documents. A small business deployment often runs under $300/month total.

How do you handle chunking for RAG?

Semantic chunking based on document structure, not fixed token windows. For contracts: clause-level. For research papers: section-level with sliding overlap. For code: symbol-level. The chunking strategy is decided per corpus, not globally — it’s the biggest determinant of retrieval quality.
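
A minimal sketch of what clause-level chunking can look like, assuming clauses are numbered like "1." or "2.1." at the start of a line; real contracts need parser-backed structure detection rather than one regex.

```python
import re

def chunk_by_clause(contract_text):
    """Split a contract at clause headings (e.g. '1.', '2.1.') rather than
    at fixed token windows, keeping each clause intact as one chunk."""
    parts = re.split(r"(?m)^(?=\d+(?:\.\d+)*\.\s)", contract_text)
    return [p.strip() for p in parts if p.strip()]

doc = """1. Definitions. Terms used herein...
2. Term. This agreement runs for 12 months.
2.1. Renewal. It renews automatically.
"""
chunks = chunk_by_clause(doc)  # one chunk per clause, boundaries preserved
```

The zero-width lookahead split keeps the clause heading attached to its body, so no text is lost at the boundaries.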

Do you use hybrid search?

Yes. Dense retrieval alone misses exact-match terms (product SKUs, clause numbers, IDs). I combine BM25 keyword search with dense embeddings and re-rank with a cross-encoder. On real corpora this consistently beats either approach alone.
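
One common way to merge the keyword and dense rankings is Reciprocal Rank Fusion. This sketch uses hypothetical document IDs, and note it is a rank-merging step, not the cross-encoder re-ranker mentioned above (a cross-encoder re-scores each query-passage pair; RRF only combines rank positions).

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists (e.g. BM25 and dense)
    into one ranking without tuning score scales against each other."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["sku-4711", "faq-12", "doc-3"]   # exact-match keyword hits
dense_hits = ["doc-3", "faq-12", "doc-9"]     # semantic neighbours
fused = rrf_fuse([bm25_hits, dense_hits])
```

Documents that appear in both lists rise to the top, while exact-match-only hits like a product SKU still survive into the candidate set handed to the re-ranker.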

How do you evaluate a RAG system?

Offline: retrieval recall, MRR, and LLM-judged answer quality on a held-out set of queries. Online: thumbs-up/down feedback, time-to-answer, escalation rate. Every RAG I ship has an evaluation suite the client can re-run.
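
The offline retrieval metrics are simple to compute; the query set below is a hypothetical example with made-up document IDs.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant passages found in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def mrr(queries):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries) if queries else 0.0

evalset = [
    (["d2", "d7", "d1"], {"d1"}),   # first relevant hit at rank 3
    (["d5", "d8", "d9"], {"d5"}),   # first relevant hit at rank 1
]
score = mrr(evalset)  # (1/3 + 1) / 2
```

Answer-quality scoring (the LLM-judged part) sits on top of these retrieval metrics, so regressions can be traced to retrieval or to generation separately.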

Can you build RAG that cites sources?

Yes. Every generated answer includes inline citations linking back to the source document and passage. This is standard — I refuse to ship RAG that can’t cite its evidence.
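
A sketch of what citation enforcement can look like at the formatting layer, with a hypothetical `policy.pdf#p3` source locator: any answer span that arrives without a supporting source is rejected rather than rendered.

```python
def render_with_citations(answer_spans):
    """Render an answer where every claim carries a citation back to its
    source document and passage; spans with no source are refused."""
    lines, sources = [], []
    for text, src in answer_spans:
        if src is None:
            raise ValueError(f"ungrounded claim: {text!r}")
        sources.append(src)
        lines.append(f"{text} [{len(sources)}]")
    refs = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return "\n".join(lines) + "\n\nSources:\n" + refs

out = render_with_citations([
    ("Refunds are issued within 14 days.", "policy.pdf#p3"),
])
```

Rejecting ungrounded spans at render time is the last line of defence; the grounded-only generation prompt upstream is what keeps such spans rare.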

Ready to build?

Same-week start. Email reply within 24 hours. Written enquiries welcome.