Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.
docs/llm/rag-embeddings.md
@@ -0,0 +1,53 @@
# LLM System: RAG & Embeddings (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft (finalize in Phase 1)
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/backend/architecture.md`
- `/docs/llm/evals.md`
- `/docs/llm/safety.md`
---

This document describes retrieval‑augmented generation (RAG) and embeddings.
Use it only if your archetype needs external knowledge or similarity search.

## 1. When to Use RAG
- You need grounded answers from a knowledge base.
- Inputs are large or dynamic (docs, tickets, policies).
- You want controllable citations or explainability.

Do **not** use RAG when:
- the task is purely generative and needs no grounding,
- retrieval latency or cost outweighs the benefit.

## 2. Data Sources
- Typical sources: curated docs, user‑uploaded files, internal DB records, external APIs.
- Mark each source as trusted or untrusted and apply the matching safety rules.
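
One way to make the trusted/untrusted marking explicit is a small source registry. The sketch below is illustrative only: `Trust`, `DataSource`, `needs_sanitizing`, and the example source names are hypothetical and not part of this template.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str  # e.g. "curated_docs", "user_upload", "db", "external_api"
    trust: Trust


def needs_sanitizing(source: DataSource) -> bool:
    """Untrusted content takes the stricter safety path before reaching a prompt."""
    return source.trust is Trust.UNTRUSTED


# Hypothetical registry entries, for illustration only.
SOURCES = [
    DataSource("handbook", "curated_docs", Trust.TRUSTED),
    DataSource("customer_uploads", "user_upload", Trust.UNTRUSTED),
]
```

Keeping the trust level on the source record, rather than deciding it at query time, lets ingestion and prompting code apply the same rule.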

## 3. Chunking & Indexing
- Define chunk size and overlap per domain.
- Store embeddings in a vector index (e.g., `pgvector` or a managed vector DB).
- Store the embedding model and version with each record to support migrations.
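
A minimal sketch of fixed-size chunking with overlap, assuming character-based windows (token-based windows are a common alternative). `ChunkRecord`, its field names, and `build_records` are hypothetical; the point is only that each stored chunk carries the embedding model and version.

```python
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    doc_id: str
    chunk_index: int
    text: str
    embedding_model: str    # hypothetical field: which model produced the vector
    embedding_version: str  # hypothetical field: lets a migration find stale rows


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def build_records(doc_id: str, text: str, size: int, overlap: int,
                  model: str, version: str) -> list[ChunkRecord]:
    """Attach model/version metadata to every chunk before it is embedded."""
    return [
        ChunkRecord(doc_id, i, chunk, model, version)
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]
```

With the version stored per record, a migration can re-embed only the rows whose `embedding_version` differs from the current one.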

## 4. Retrieval Strategy
- Default: top‑k semantic search plus optional filters (tenant, type, recency).
- Re‑rank the results if quality requires it.
- Always record retrieved doc IDs, not raw text, in `reasoning_trace`.
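
The default strategy can be sketched as an in-memory top‑k cosine search with a tenant filter. This is a toy stand-in for a real vector index: the row schema (`id`, `tenant_id`, `type`, `vector`), the `retrieve` function, and the shape of `reasoning_trace` are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, tenant_id, k=3, doc_type=None):
    """Filter by tenant (and optionally type), then rank by similarity."""
    candidates = [
        row for row in index
        if row["tenant_id"] == tenant_id
        and (doc_type is None or row["type"] == doc_type)
    ]
    ranked = sorted(
        candidates,
        key=lambda row: cosine(query_vec, row["vector"]),
        reverse=True,
    )
    hits = ranked[:k]
    # Only IDs go into the trace, never the raw chunk text.
    reasoning_trace = {"retrieved_doc_ids": [row["id"] for row in hits]}
    return hits, reasoning_trace


# Hypothetical index rows, for illustration only.
EXAMPLE_INDEX = [
    {"id": "a", "tenant_id": "t1", "type": "policy", "vector": [1.0, 0.0]},
    {"id": "b", "tenant_id": "t1", "type": "ticket", "vector": [0.0, 1.0]},
    {"id": "c", "tenant_id": "t2", "type": "policy", "vector": [1.0, 0.0]},
    {"id": "d", "tenant_id": "t1", "type": "policy", "vector": [0.7, 0.7]},
]
```

Filtering before ranking means a row from another tenant can never appear in the results, however similar its vector is.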

## 5. RAG Prompting Pattern
- Provide retrieved snippets in a clearly delimited block.
- Instruct the model to answer **only** from the retrieved context when grounding is required.
- If the context is insufficient, ask for clarification or defer.
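
A sketch of the prompting pattern, assuming XML-like delimiters. The tag names, the instruction wording, and `build_rag_prompt` itself are illustrative choices, not a prescribed format.

```python
def build_rag_prompt(question, snippets):
    """Build a grounded prompt from (doc_id, text) pairs.

    Returns None when there is no context, so the caller can ask for
    clarification or defer instead of calling the model.
    """
    if not snippets:
        return None
    blocks = "\n".join(
        f'<doc id="{doc_id}">\n{text}\n</doc>' for doc_id, text in snippets
    )
    return (
        "Answer using only the context between <context> and </context>.\n"
        "If the context does not contain the answer, say you cannot answer "
        "and ask a clarifying question.\n\n"
        f"<context>\n{blocks}\n</context>\n\n"
        f"Question: {question}"
    )
```

Carrying the doc ID on each snippet also gives the model something concrete to cite.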

## 6. Evaluating Retrieval
- Measure recall/precision of retrieval separately from generation quality.
- Add “no‑answer” test cases (queries the knowledge base cannot answer) to catch hallucinations.
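
Retrieval metrics can be computed from doc IDs alone, independent of any generated answer. The sketch below assumes each test case lists the IDs a human judged relevant; `precision_recall_at_k` is a hypothetical helper name.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision and recall over the top-k retrieved doc IDs.

    Both values are 0.0 when their denominator is empty, which is the
    case for recall on "no-answer" queries with no relevant documents.
    """
    top_k = retrieved_ids[:k]
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For a “no‑answer” case, `relevant_ids` is empty, so these metrics say little; the useful check there is whether the generated answer defers, which belongs to the generation evals.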

## 7. Privacy & Multi‑Tenancy
- Use tenant‑scoped indexes or strict tenant filters.
- Never retrieve across tenants.
- Redact PII before embedding if embeddings can be exposed or logged.
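
A rough sketch of two guards: redaction before embedding, and a defensive tenant check after retrieval. The regexes are deliberately crude assumptions that will both over-match (for example, some dates) and miss real PII, so treat them as placeholders for a proper PII detector. The function names and row schema are hypothetical.

```python
import re

# Simplistic patterns, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers before embedding."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def assert_same_tenant(rows, tenant_id):
    """Fail loudly if any retrieved row belongs to a different tenant."""
    leaked = [row["id"] for row in rows if row["tenant_id"] != tenant_id]
    if leaked:
        raise PermissionError(f"cross-tenant rows retrieved: {leaked}")
```

The tenant check is a second line of defence; the primary control remains the tenant-scoped index or filter applied at query time.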