Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.

2025-12-12 02:31:03 +02:00
parent 5053235e95
commit c905cbb725
26 changed files with 759 additions and 65 deletions
--- a/docs/llm/rag-embeddings.md
+++ b/docs/llm/rag-embeddings.md
@@ -0,0 +1,53 @@
+# LLM System: RAG & Embeddings (Starter Template)
+
+---
+**Last Updated:** 2025-12-12  
+**Phase:** Phase 0 (Planning)  
+**Status:** Draft — finalize in Phase 1  
+**Owner:** AI/LLM Lead + Backend Architect  
+**References:**
+- `/docs/backend/architecture.md`
+- `/docs/llm/evals.md`
+- `/docs/llm/safety.md`
+---
+
+This document describes retrieval‑augmented generation (RAG) and embeddings.  
+Use it only if your archetype needs external knowledge or similarity search.
+
+## 1. When to Use RAG
+- You need grounded answers from a knowledge base.
+- Inputs are large or dynamic (docs, tickets, policies).
+- You need answers that cite their sources (controllable citations, explainability).
+
+Do **not** use RAG when:
+- the task is purely generative with no grounding,
+- retrieval latency or cost outweighs the benefit.
+
+## 2. Data Sources
+- Curated docs, user‑uploaded files, internal DB records, external APIs.
+- Mark each source as trusted/untrusted and apply safety rules.
+
+## 3. Chunking & Indexing
+- Define chunk size/overlap per domain.
+- Store embeddings in a vector index (e.g., `pgvector` or a managed vector DB).
+- Store the embedding model name and version with each vector so the index can be re‑embedded when the model changes.
+
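The chunking and versioning bullets above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it splits on character counts (a real pipeline would more likely split on tokens or sentences), and the names `ChunkRecord`, `chunk_text`, and `build_records` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    """One row destined for the vector index (hypothetical schema)."""
    doc_id: str
    chunk_index: int
    text: str
    embedding_model: str    # stored per row to support later migrations
    embedding_version: str


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def build_records(doc_id: str, text: str, size: int, overlap: int,
                  model: str, version: str) -> list[ChunkRecord]:
    """Attach document and embedding-model metadata to each chunk."""
    return [
        ChunkRecord(doc_id, i, chunk, model, version)
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]
```

Keeping `embedding_model` and `embedding_version` on every record lets a migration job select only the rows produced by an outdated model.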
+## 4. Retrieval Strategy
+- Default: semantic search top‑k + optional filters (tenant, type, recency).
+- Re‑rank if quality requires it.
+- Always record the retrieved doc IDs, not the raw text, in `reasoning_trace`.
+
+## 5. RAG Prompting Pattern
+- Provide retrieved snippets in a clearly delimited block.
+- Instruct the model to answer **only** from the retrieved context when grounding is required.
+- If the context is insufficient, ask for clarification or defer.
+
+## 6. Evaluating Retrieval
+- Measure recall/precision of retrieval separately from generation quality.
+- Add “no‑answer” test cases, where the knowledge base does not contain the answer, to catch hallucinations.
+
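Retrieval metrics can be computed without calling the generator at all. The sketch below assumes each test case provides a ranked list of retrieved doc IDs and a hand-labelled set of relevant doc IDs; the function name is hypothetical.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """Return (precision@k, recall@k) for a single query."""
    top = retrieved[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / len(top) if top else 0.0
    # With an empty relevant set (a "no-answer" case) recall is reported
    # as 0.0 here; such cases are better scored on whether the system abstains.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with `retrieved=["a", "b", "c"]`, `relevant={"a", "c"}`, and `k=2`, one of the two returned documents is relevant and one of the two relevant documents was found, so both metrics are 0.5.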
+## 7. Privacy & Multi‑Tenancy
+- Use tenant‑scoped indexes or strict tenant filters on every query.
+- Never retrieve across tenants.
+- Redact PII before embedding if embeddings can be exposed or logged.
+
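As a rough illustration of redacting before embedding, the sketch below masks email addresses with a regular expression. It is deliberately narrow: the pattern and the `redact_emails` name are assumptions, and real PII handling would need to cover names, phone numbers, identifiers, and so on, typically with a dedicated detection step.

```python
import re

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace email addresses before the text is embedded or logged."""
    return EMAIL_RE.sub(placeholder, text)
```

Running redaction before chunking ensures that neither the stored chunk text nor the embedding derived from it contains the original address.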