# LLM System: RAG & Embeddings (Starter Template)

---
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/backend/architecture.md`
- `/docs/llm/evals.md`
- `/docs/llm/safety.md`
---

This document describes retrieval‑augmented generation (RAG) and embeddings.
Use it only if your archetype needs external knowledge or similarity search.

## 1. When to Use RAG
- You need grounded answers from a knowledge base.
- Inputs are large or dynamic (docs, tickets, policies).
- You want controllable citations/explainability.

Do **not** use RAG when:
- the task is purely generative with no grounding,
- retrieval latency/cost outweighs benefit.

## 2. Data Sources
- Curated docs, user‑uploaded files, internal DB records, external APIs.
- Mark each source as trusted/untrusted and apply the corresponding safety rules (see `/docs/llm/safety.md`).

## 3. Chunking & Indexing
- Define chunk size/overlap per domain.
- Store embeddings in a vector index (e.g., `pgvector` or a managed vector DB).
- Keep an embedding model/version field on each stored vector so that re-embedding migrations can tell old vectors from new ones.

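
The chunking and versioning bullets above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it assumes character-based windows, and the names `Chunk`, `chunk_text`, `embedding_model`, and `embedding_version`, as well as the default sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int              # position of the chunk within its document
    text: str
    embedding_model: str    # hypothetical field: model that will embed this chunk
    embedding_version: str  # hypothetical field: bump when re-embedding


def chunk_text(doc_id, text, size=200, overlap=40,
               model="example-embedder", version="v1"):
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for i, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(doc_id, i, text[start:start + size], model, version))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


example = chunk_text("doc-1", "abcdefghij", size=4, overlap=1)
```

Here `example` holds three chunks (`abcd`, `defg`, `ghij`), each sharing one character with its neighbour. Token-based or structure-aware splitting may suit some domains better; the version field is what lets a migration find and replace vectors produced by an older model.
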
## 4. Retrieval Strategy
- Default: top‑k semantic search plus optional metadata filters (tenant, type, recency).
- Re‑rank the retrieved candidates if first-pass ranking quality is insufficient.
- Always record retrieved doc IDs, not raw text, in `reasoning_trace`.

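
A rough sketch of the default strategy, assuming an in-memory list stands in for the vector index and cosine similarity is the ranking score. The entry layout, the `retrieve` helper, and the shape of `reasoning_trace` are all assumptions made for illustration; a real deployment would push the filter and the similarity search down into the vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=3, **filters):
    """Return the k most similar entries whose metadata matches every filter."""
    candidates = [
        entry for entry in index
        if all(entry["meta"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates,
                    key=lambda entry: cosine(query_vec, entry["vector"]),
                    reverse=True)
    return ranked[:k]


# Hypothetical sample index: two tenants, two document types.
INDEX = [
    {"doc_id": "a", "vector": [1.0, 0.0], "meta": {"tenant": "t1", "type": "policy"}},
    {"doc_id": "b", "vector": [0.8, 0.6], "meta": {"tenant": "t1", "type": "ticket"}},
    {"doc_id": "c", "vector": [1.0, 0.0], "meta": {"tenant": "t2", "type": "policy"}},
]

hits = retrieve([1.0, 0.1], INDEX, k=2, tenant="t1")
# Only IDs go into the trace, never the retrieved text itself.
reasoning_trace = {"retrieved_doc_ids": [hit["doc_id"] for hit in hits]}
```

With the sample data, the trace lists `a` then `b`; entry `c` is excluded by the tenant filter before any scoring happens.
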
## 5. RAG Prompting Pattern
- Provide retrieved snippets in a clearly delimited block.
- Instruct the model to answer **only** from the retrieved context when grounding is required.
- If context is insufficient → ask for clarification or defer.

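
One possible way to assemble such a prompt is sketched below. The `<context>` delimiter, the instruction wording, and the `build_prompt` helper are illustrative assumptions rather than a required format.

```python
def build_prompt(question, snippets):
    """Assemble a grounded prompt; return None when there is nothing to ground on."""
    if not snippets:
        return None  # caller should ask for clarification or defer
    # Prefix each snippet with its doc ID so answers can cite sources.
    context = "\n".join(f"[{s['doc_id']}] {s['text']}" for s in snippets)
    return (
        "Answer using ONLY the text inside <context>. "
        "If the context does not contain the answer, say that you cannot answer.\n"
        f"<context>\n{context}\n</context>\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What is the refund window?",
    [{"doc_id": "policy-7", "text": "Refunds are accepted within 30 days."}],
)
```

Returning `None` for an empty snippet list keeps the "insufficient context" decision in application code instead of leaving it to the model. Snippets from untrusted sources may themselves contain the delimiter, so escaping or stripping it before insertion is worth considering.
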
## 6. Evaluating Retrieval
- Measure recall/precision of retrieval separately from generation quality.
- Add “no‑answer” test cases (queries the knowledge base cannot answer) to catch hallucinations.

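
As a small sketch of retrieval-only scoring, the helper below computes precision@k and recall@k from doc IDs alone, so no generation step is involved. The function name and the convention of returning `None` for recall on "no-answer" cases are assumptions; retrieved IDs are assumed to be unique.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Return (precision@k, recall@k); recall is None when nothing is relevant."""
    top = retrieved_ids[:k]
    hits = len(set(top) & set(relevant_ids))
    precision = hits / len(top) if top else 0.0
    # A "no-answer" case has no relevant docs, so recall is undefined.
    recall = hits / len(relevant_ids) if relevant_ids else None
    return precision, recall


# Regular case: 2 of the top 3 results are relevant, 2 of 3 relevant docs found.
regular = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=3)

# "No-answer" case: anything retrieved counts against precision.
no_answer = precision_recall_at_k(["x"], set(), k=3)
```

For "no-answer" cases the more useful check is usually downstream: the system should defer rather than answer, which belongs in the generation evals.
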
## 7. Privacy & Multi‑Tenancy
- Use tenant‑scoped indexes or strict tenant filters.
- Never retrieve across tenants.
- Redact PII before embedding if embeddings can be exposed or logged.
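
The sketch below illustrates two of these rules: redacting text before it is embedded, and a defence-in-depth check that no retrieved hit belongs to another tenant. The regular expressions are deliberately simple and will miss many PII forms; the patterns, placeholder tokens, and function names are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs a broader approach.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def assert_same_tenant(request_tenant, hits):
    """Raise if any retrieved hit belongs to a different tenant."""
    for hit in hits:
        if hit["tenant"] != request_tenant:
            raise PermissionError(f"cross-tenant hit: {hit['doc_id']}")


clean = redact("Contact jane.doe@example.com or +1 415 555 0100.")
```

Here `clean` is `Contact [EMAIL] or [PHONE].`, which is what would be passed to the embedding step. The tenant check is meant to run after retrieval as a second line of defence, in addition to scoping the index or query itself.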