Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.

This commit is contained in:
olekhondera
2025-12-12 02:31:03 +02:00
parent 5053235e95
commit c905cbb725
26 changed files with 759 additions and 65 deletions


@@ -0,0 +1,53 @@
# LLM System: RAG & Embeddings (Starter Template)
---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/backend/architecture.md`
- `/docs/llm/evals.md`
- `/docs/llm/safety.md`
---
This document describes retrieval-augmented generation (RAG) and embeddings.
Use it only if your archetype needs external knowledge or similarity search.
## 1. When to Use RAG
- You need grounded answers from a knowledge base.
- Inputs are large or dynamic (docs, tickets, policies).
- You want controllable citations/explainability.
Do **not** use RAG when:
- the task is purely generative with no grounding,
- retrieval latency/cost outweighs benefit.
## 2. Data Sources
- Curated docs, user-uploaded files, internal DB records, external APIs.
- Mark each source as trusted/untrusted and apply safety rules.
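One way to make the trusted/untrusted marking explicit is a small source registry that every ingestion path consults. The sketch below is a minimal illustration, not a prescribed design: the `Trust` enum, `DataSource` fields, the example source names, and `trust_for` are all hypothetical. The concrete safety rules applied to untrusted content belong in `/docs/llm/safety.md`.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str  # e.g. "curated_docs", "user_upload", "db_record", "external_api"
    trust: Trust


# Hypothetical registry: the real sources and their trust levels are project decisions.
SOURCES = {
    "handbook": DataSource("handbook", "curated_docs", Trust.TRUSTED),
    "uploads": DataSource("uploads", "user_upload", Trust.UNTRUSTED),
    "partner_api": DataSource("partner_api", "external_api", Trust.UNTRUSTED),
}


def trust_for(source_name: str) -> Trust:
    """Return the trust level of a source; unknown sources fail safe as UNTRUSTED."""
    source = SOURCES.get(source_name)
    return source.trust if source is not None else Trust.UNTRUSTED
```

Defaulting unknown sources to `UNTRUSTED` means a newly added ingestion path gets the stricter safety handling until someone deliberately registers it.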
## 3. Chunking & Indexing
- Define chunk size/overlap per domain.
- Store embeddings in a vector index (e.g., `pgvector`, managed vector DB).
- Keep an embedding model/version field to support migrations.
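The chunking and versioning rules above can be sketched as follows. This is a minimal sketch assuming character-based windows (many pipelines count tokens instead); `ChunkConfig`, `ChunkRecord`, `build_records`, `needs_reembedding`, and the default sizes are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkConfig:
    chunk_size: int = 800  # characters per chunk (assumed default; tune per domain)
    overlap: int = 200     # characters repeated between neighbouring chunks


@dataclass(frozen=True)
class ChunkRecord:
    doc_id: str
    chunk_index: int
    text: str
    embedding_model: str    # which model produced the vector
    embedding_version: str  # bump on model change to find rows that need re-embedding


def chunk_text(text: str, cfg: ChunkConfig) -> list[str]:
    """Split text into fixed-size, overlapping character windows."""
    if cfg.chunk_size <= 0 or not 0 <= cfg.overlap < cfg.chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = cfg.chunk_size - cfg.overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + cfg.chunk_size])
        if start + cfg.chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(doc_id: str, text: str, cfg: ChunkConfig,
                  model: str, version: str) -> list[ChunkRecord]:
    """Attach the embedding model/version to every chunk stored in the index."""
    return [ChunkRecord(doc_id, i, chunk, model, version)
            for i, chunk in enumerate(chunk_text(text, cfg))]


def needs_reembedding(record: ChunkRecord, model: str, version: str) -> bool:
    """True when a stored chunk was embedded with a different model or version."""
    return (record.embedding_model, record.embedding_version) != (model, version)
```

In a `pgvector` table the two embedding fields would simply be ordinary columns next to the vector column, so a migration can select the rows where `needs_reembedding` would be true and re-embed them in batches.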
## 4. Retrieval Strategy
- Default: semantic search top-k + optional filters (tenant, type, recency).
- Add a reranking step if first-stage retrieval quality is insufficient.
- Always include retrieved doc IDs in `reasoning_trace` (not raw text).
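The retrieval defaults above can be illustrated with a brute-force, in-memory version. This is only a sketch: a real deployment would push the same filters and the similarity ordering into the vector store query. `IndexedChunk`, `retrieve`, `trace_entry`, and the `retrieved_doc_ids` key are hypothetical, and reranking is omitted.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    doc_id: str
    tenant_id: str
    doc_type: str
    updated_at: int  # Unix timestamp, used by the optional recency filter
    embedding: tuple
    text: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: list[IndexedChunk], tenant_id: str,
             k: int = 5, doc_type: Optional[str] = None,
             min_updated_at: Optional[int] = None) -> list[IndexedChunk]:
    """Brute-force top-k by cosine similarity; the tenant filter is mandatory."""
    candidates = [
        c for c in index
        if c.tenant_id == tenant_id
        and (doc_type is None or c.doc_type == doc_type)
        and (min_updated_at is None or c.updated_at >= min_updated_at)
    ]
    candidates.sort(key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    return candidates[:k]


def trace_entry(results: list[IndexedChunk]) -> dict:
    """Record only document IDs (deduplicated, rank order kept), never raw text."""
    return {"retrieved_doc_ids": list(dict.fromkeys(c.doc_id for c in results))}
```

Making `tenant_id` a required positional argument, rather than one optional filter among others, means no call site can accidentally query across tenants.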
## 5. RAG Prompting Pattern
- Provide retrieved snippets in a clearly delimited block.
- Instruct the model to answer **only** from the retrieved context when grounding is required.
- If context is insufficient → ask for clarification or defer.
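A possible shape for this prompting pattern is sketched below, assuming XML-like `<context>`/`<doc>` delimiters and a sentinel line the application checks for. The tag names, the `INSUFFICIENT_CONTEXT` sentinel, the instruction wording, and the helper names are hypothetical, not a required format.

```python
from dataclasses import dataclass

# Hypothetical sentinel the application checks for; not a standard token.
INSUFFICIENT_CONTEXT = "INSUFFICIENT_CONTEXT"


@dataclass(frozen=True)
class Snippet:
    doc_id: str
    text: str


def _strip_delimiters(text: str) -> str:
    # Stop snippet text from closing the delimited block early.
    return text.replace("</doc>", "").replace("</context>", "")


def build_rag_prompt(question: str, snippets: list[Snippet], grounded: bool = True) -> str:
    """Put retrieved snippets in a clearly delimited block after the instructions."""
    context = "\n".join(
        f'<doc id="{s.doc_id}">\n{_strip_delimiters(s.text)}\n</doc>' for s in snippets
    )
    if grounded:
        rules = (
            "Answer ONLY from the documents inside <context> and cite their ids. "
            f"If they do not contain the answer, reply with {INSUFFICIENT_CONTEXT} "
            "on the first line, followed by one clarifying question."
        )
    else:
        rules = "Use the documents inside <context> where they are relevant."
    return f"{rules}\n\n<context>\n{context}\n</context>\n\nQuestion: {question}"


def should_defer(snippets: list[Snippet], model_reply: str) -> bool:
    """Defer when nothing was retrieved or the model reports insufficient context."""
    return not snippets or model_reply.strip().startswith(INSUFFICIENT_CONTEXT)
```

Stripping closing delimiters is only a light guard against retrieved text breaking out of the context block; untrusted sources still need the rules in `/docs/llm/safety.md`.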
## 6. Evaluating Retrieval
- Measure recall/precision of retrieval separately from generation quality.
- Add “no-answer” test cases to avoid hallucinations.
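Retrieval can be scored on its own with precision@k and recall@k against labelled relevant document IDs, while “no-answer” cases check whether the system defers. The harness below is a minimal sketch. `EvalCase`, `run_eval`, and the convention that `answer` returns `None` when the system defers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EvalCase:
    query: str
    relevant_ids: frozenset  # empty set marks a "no-answer" case


def precision_at_k(retrieved: list[str], relevant: frozenset, k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: list[str], relevant: frozenset, k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        raise ValueError("recall is undefined for no-answer cases")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def _mean(values: list[float]) -> Optional[float]:
    return sum(values) / len(values) if values else None


def run_eval(cases: list[EvalCase],
             retrieve_ids: Callable[[str], list[str]],
             answer: Callable[[str], Optional[str]],
             k: int = 5) -> dict:
    """Score retrieval on answerable cases; count no-answer cases that got an answer."""
    precisions, recalls, violations = [], [], 0
    for case in cases:
        if case.relevant_ids:
            retrieved = retrieve_ids(case.query)
            precisions.append(precision_at_k(retrieved, case.relevant_ids, k))
            recalls.append(recall_at_k(retrieved, case.relevant_ids, k))
        elif answer(case.query) is not None:  # None means the system deferred
            violations += 1
    return {"precision_at_k": _mean(precisions), "recall_at_k": _mean(recalls),
            "no_answer_violations": violations}
```

Here precision divides by `k` even when fewer than `k` results come back, which penalises under-filled result lists; if you prefer dividing by the number actually returned, document that choice alongside the eval results.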
## 7. Privacy & Multi-Tenancy
- Use tenant-scoped indexes or strict tenant filters.
- Never retrieve across tenant boundaries.
- Redact PII before embedding if embeddings can be exposed or logged.
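The two privacy rules can be sketched as a redaction step applied before text reaches the embedding model, plus a fail-closed guard on retrieval results. The regular expressions are rough heuristics assumed for illustration (they will miss many PII forms and can flag things like dates); a dedicated PII detector is usually needed in practice. `redact_pii`, `enforce_tenant`, and `CrossTenantError` are hypothetical names.

```python
import re

# Rough heuristics only (assumption): real deployments usually need a dedicated PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class CrossTenantError(RuntimeError):
    """Raised when a retrieval result belongs to a different tenant."""


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers before the text is embedded."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def enforce_tenant(results: list[dict], tenant_id: str) -> list[dict]:
    """Defense in depth: fail closed if a filter bug lets another tenant's chunk through."""
    leaked = [r["chunk_id"] for r in results if r["tenant_id"] != tenant_id]
    if leaked:
        raise CrossTenantError(f"blocked cross-tenant chunks: {leaked}")
    return results
```

The guard duplicates the tenant filter on purpose: if an index or query bug ever returns another tenant's chunk, the request fails instead of leaking data into a prompt or a log.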