# LLM System: RAG & Embeddings (Starter Template)
---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/backend/architecture.md`
- `/docs/llm/evals.md`
- `/docs/llm/safety.md`
---
This document describes retrieval-augmented generation (RAG) and embeddings.
Use it only if your archetype needs external knowledge or similarity search.
## 1. When to Use RAG
- You need grounded answers from a knowledge base.
- Inputs are large or dynamic (docs, tickets, policies).
- You want controllable citations/explainability.
Do **not** use RAG when:
- the task is purely generative with no grounding,
- retrieval latency/cost outweighs benefit.
## 2. Data Sources
- Curated docs, user-uploaded files, internal DB records, external APIs.
- Mark each source as trusted/untrusted and apply safety rules.
## 3. Chunking & Indexing
- Define chunk size/overlap per domain.
- Store embeddings in a vector index (e.g., `pgvector`, managed vector DB).
- Keep an embedding model/version field to support migrations.
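
As a minimal sketch of the bullets above, the following splits text into overlapping word-based chunks and tags each record with the embedding model and version. The names `Chunk`, `chunk_words`, and `make_chunks`, the word-based splitting, and the default sizes are illustrative assumptions, not part of this template; tune them per domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int
    text: str
    embedding_model: str    # hypothetical field: which model produced the vector
    embedding_version: str  # hypothetical field: lets a migration find stale rows


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def make_chunks(doc_id: str, text: str, model: str, version: str,
                size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Wrap each text window in a record ready to be embedded and indexed."""
    pieces = chunk_words(text, size, overlap)
    return [Chunk(doc_id, i, piece, model, version) for i, piece in enumerate(pieces)]
```

Storing the model and version on every row means a re-embedding job can select only the rows whose version differs from the current one.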
## 4. Retrieval Strategy
- Default: semantic top-k search + optional filters (tenant, type, recency).
- Rerank if quality requires it.
- Always include retrieved doc IDs in `reasoning_trace` (not raw text).
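
The default strategy above can be sketched in memory as follows: filter first, rank by cosine similarity, keep the top k, and record only document IDs for the trace. `IndexedChunk`, `retrieve`, `trace_entry`, and the `retrieved_doc_ids` key are hypothetical names; the actual shape of `reasoning_trace` is not defined in this document. A real deployment would likely push the filter and ranking into the vector index query, and the optional rerank step is omitted here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedChunk:
    doc_id: str
    tenant_id: str
    doc_type: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float], index: list[IndexedChunk], tenant_id: str,
             k: int = 3, doc_type: Optional[str] = None) -> list[IndexedChunk]:
    """Return the k most similar chunks that pass the tenant and type filters."""
    candidates = [
        c for c in index
        if c.tenant_id == tenant_id and (doc_type is None or c.doc_type == doc_type)
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:k]


def trace_entry(hits: list[IndexedChunk]) -> dict:
    """Record doc IDs only, never the raw text (assumed trace shape)."""
    return {"retrieved_doc_ids": [c.doc_id for c in hits]}
```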
## 5. RAG Prompting Pattern
- Provide retrieved snippets in a clearly delimited block.
- Instruct the model to answer **only** using retrieved context when grounding is required.
- If context is insufficient → ask for clarification or defer.
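
One possible shape for such a prompt is sketched below. The `<context>` delimiters, the `[doc_id=...]` labels, the `INSUFFICIENT_CONTEXT` sentinel, and the function name `build_rag_prompt` are all assumptions chosen for illustration; the calling code would map the sentinel to a clarification request or a deferral.

```python
def build_rag_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (doc_id, text) pairs."""
    context = "\n".join(
        f"[doc_id={doc_id}]\n{text.strip()}" for doc_id, text in snippets
    )
    return (
        "Answer the question using ONLY the text between <context> and </context>.\n"
        "Cite the doc_id of every snippet you rely on.\n"
        "If the context does not contain the answer, reply exactly: INSUFFICIENT_CONTEXT\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}"
    )
```

Snippets from untrusted sources could contain the delimiter itself; escaping or stripping it is left out of this sketch.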
## 6. Evaluating Retrieval
- Measure recall/precision of retrieval separately from generation quality.
- Add “no-answer” test cases (queries the knowledge base cannot answer) to catch hallucinations.
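
A minimal sketch of how these measurements might be computed from labeled cases follows. `RetrievalCase`, `precision_recall_at_k`, `summarize`, and the `abstained` flag are hypothetical; the sketch assumes each case lists unique retrieved doc IDs and that a no-answer case is one with no relevant documents.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCase:
    retrieved: list[str]     # doc IDs returned by retrieval, best first
    relevant: set[str]       # labeled relevant doc IDs; empty for a no-answer case
    abstained: bool = False  # whether the full system declined to answer


def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    top = retrieved[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def summarize(cases: list[RetrievalCase], k: int) -> dict:
    """Average retrieval metrics over answerable cases; track abstention separately."""
    answerable = [c for c in cases if c.relevant]
    no_answer = [c for c in cases if not c.relevant]
    scores = [precision_recall_at_k(c.retrieved, c.relevant, k) for c in answerable]
    n = len(scores) or 1
    return {
        "precision_at_k": sum(p for p, _ in scores) / n,
        "recall_at_k": sum(r for _, r in scores) / n,
        "abstain_rate": sum(c.abstained for c in no_answer) / (len(no_answer) or 1),
    }
```

Keeping the abstention rate apart from precision and recall makes it visible when retrieval is fine but generation still answers questions it should decline.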
## 7. Privacy & Multi-Tenancy
- Use tenant-scoped indexes or strict filters.
- Never retrieve across tenants.
- Redact PII before embedding if embeddings can be exposed or logged.
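
The two safeguards above could look roughly like this: a redaction pass applied to text before it is embedded, and a defensive check that every retrieved hit belongs to the requesting tenant. The regular expressions are deliberately crude placeholders (the phone pattern will also match some dates and other digit runs), and `redact_pii` and `assert_single_tenant` are hypothetical helpers, not a vetted PII solution.

```python
import re

# Crude illustrative patterns; a production system would need a reviewed PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def assert_single_tenant(hit_tenant_ids: list[str], tenant_id: str) -> None:
    """Fail loudly if any retrieved hit belongs to a different tenant."""
    leaked = sorted({t for t in hit_tenant_ids if t != tenant_id})
    if leaked:
        raise PermissionError(f"cross-tenant retrieval blocked: {leaked}")
```

The tenant check is a second line of defense after the query-time filter, so a misconfigured filter surfaces as an error rather than a silent leak.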