Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.

This commit is contained in:
olekhondera
2025-12-12 02:31:03 +02:00
parent 5053235e95
commit c905cbb725
26 changed files with 759 additions and 65 deletions

docs/llm/prompting.md
# LLM System: Prompting (Starter Template)
---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead
**References:**
- `/docs/archetypes.md`
- `/docs/llm/safety.md`
- `/docs/llm/evals.md`
---
This document defines how prompts are designed, versioned, and executed.
It is **archetype-agnostic**: adapt the “interaction surface” (chat, workflow generation, pipeline classification, agentic tasks) to your product.
## 1. Goals
- Produce **consistent, auditable outputs** across models/providers.
- Make prompt changes **safe and reversible** (versioning + evals).
- Keep sensitive data out of prompts unless strictly required (see safety).
## 2. Single LLM Entry Point
All LLM calls go through one abstraction (e.g., `callLLM()` / “LLM Gateway”):
- Centralizes model selection, temperature/top_p defaults, retries, timeouts.
- Applies redaction and policy checks before sending prompts.
- Emits structured logs + trace IDs to `EventLog`.
- Enforces output schema validation.
> Lock the exact interface and defaults in Phase 1.
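A minimal TypeScript sketch of such a gateway, under stated assumptions: `callLLM`, `redact`, the default values, and the injected `send` transport are all illustrative placeholders, not the interface to be locked in Phase 1.

```typescript
// Illustrative gateway sketch. callLLM, LLMRequest, redact, and the
// defaults below are placeholder names, not a committed interface.
type LLMRequest = {
  prompt: string;
  model?: string;        // overrides the central default
  temperature?: number;
  traceId?: string;
};

type LLMResponse = { text: string; traceId: string };

const DEFAULTS = { model: "default-model", temperature: 0.2 }; // assumed defaults

// Placeholder redaction: mask strings that look like email addresses.
function redact(prompt: string): string {
  return prompt.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]");
}

async function callLLM(
  req: LLMRequest,
  // Transport injected so the gateway stays provider-agnostic and testable.
  send: (prompt: string, model: string, temperature: number) => Promise<string>,
): Promise<LLMResponse> {
  const traceId = req.traceId ?? Math.random().toString(36).slice(2);
  const prompt = redact(req.prompt); // policy check before anything is sent
  const model = req.model ?? DEFAULTS.model;
  const temperature = req.temperature ?? DEFAULTS.temperature;

  let lastError: unknown;
  for (let attempt = 0; attempt < 3; attempt++) { // simple fixed retry budget
    try {
      const text = await send(prompt, model, temperature);
      // A structured log entry (traceId, model, latency) would go to EventLog here.
      return { text, traceId };
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Injecting the transport keeps model selection, redaction, and retries in one place while leaving the actual provider call swappable.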
## 3. Prompt Types
Define prompt families that match your archetype:
- **Chat-first:** system prompt + conversation memory + optional retrieval context.
- **Generation/workflow:** task prompt + constraints + examples + output schema.
- **Classification/pipeline:** short instruction + label set + few-shot examples + JSON output.
- **Agentic automation:** planner prompt + tool policy + step budget + “stop/ask-human” rules.
## 4. Prompt Structure (recommended)
Use a predictable layout for every prompt:
1. **System / role:** who the model is, high-level mission.
2. **Safety & constraints:** what not to do, privacy rules, refusal triggers.
3. **Task spec:** exact objective and success criteria.
4. **Context:** domain data, retrieved snippets, tool outputs (clearly delimited).
5. **Few-shot examples:** 1–3 archetype-relevant pairs.
6. **Output schema:** strict JSON/XML/markdown template.
### Example skeleton
```text
[SYSTEM]
You are ...
[CONSTRAINTS]
- Never ...
- If unsure, respond with ...
[TASK]
Given input X, produce Y.
[CONTEXT]
<untrusted_input>
...
</untrusted_input>
[EXAMPLES]
Input: ...
Output: ...
[OUTPUT_SCHEMA]
{ "label": "...", "confidence": 0..1, "reasoning_trace": {...} }
```
## 5. Prompt Versioning
- Store prompts in a dedicated location (e.g., `prompts/` folder or DB table).
- **Semantic versioning**: `prompt_name@major.minor.patch`.
- **major:** behavior change or schema change.
- **minor:** quality improvement (new examples, clearer instruction).
- **patch:** typos / no behavior change.
- Every version is linked to:
- model/provider version,
- eval suite run,
- changelog entry.
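The identifier scheme above can be sketched as a small parser plus a compatibility check; `parsePromptId` and `isCompatible` are hypothetical helper names, not part of any fixed API.

```typescript
// Hypothetical helpers for "prompt_name@major.minor.patch" identifiers.
type PromptVersion = { name: string; major: number; minor: number; patch: number };

function parsePromptId(id: string): PromptVersion {
  const m = id.match(/^([\w-]+)@(\d+)\.(\d+)\.(\d+)$/);
  if (!m) throw new Error(`invalid prompt id: ${id}`);
  return { name: m[1], major: Number(m[2]), minor: Number(m[3]), patch: Number(m[4]) };
}

// A major bump signals a behavior or schema change, so a caller can refuse
// to auto-upgrade across majors and require a fresh eval run instead.
function isCompatible(current: PromptVersion, candidate: PromptVersion): boolean {
  return current.name === candidate.name && current.major === candidate.major;
}
```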
## 6. Output Schemas & Validation
- Prefer **strict JSON** for machine-consumed outputs.
- Validate outputs server-side:
- required fields present,
- types/enum values correct,
- confidence in range,
- no disallowed keys (PII, secrets).
- If validation fails: retry with a “fix-format” prompt or fall back to a safe default.
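As an illustration of these checks, here is a sketch that validates the example schema from the skeleton above (`label`, `confidence`); the label set and disallowed keys are assumed for demonstration only.

```typescript
// Sketch of server-side checks for an output shaped like
// { "label": ..., "confidence": 0..1 }. Values below are assumptions.
const ALLOWED_LABELS = ["bug", "feature", "question"]; // assumed label set
const DISALLOWED_KEYS = ["email", "ssn", "api_key"];   // PII / secret key names

type ValidationResult =
  | { ok: true; value: { label: string; confidence: number } }
  | { ok: false; reason: string };

function validateOutput(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, reason: "not a JSON object" };
  }
  const obj = parsed as Record<string, unknown>;
  for (const key of DISALLOWED_KEYS) {
    if (key in obj) return { ok: false, reason: `disallowed key: ${key}` };
  }
  if (typeof obj.label !== "string" || !ALLOWED_LABELS.includes(obj.label)) {
    return { ok: false, reason: "label missing or not in enum" };
  }
  if (typeof obj.confidence !== "number" || obj.confidence < 0 || obj.confidence > 1) {
    return { ok: false, reason: "confidence missing or out of [0, 1]" };
  }
  return { ok: true, value: { label: obj.label, confidence: obj.confidence } };
}
```

On an `{ ok: false }` result, the caller would trigger the retry-with-fix-format path or the safe default, and the `reason` string would go into the structured log.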
## 7. Context Management
- Separate **trusted** vs **untrusted** context:
- Untrusted: user input, webhook payloads, retrieved docs.
- Trusted: system instructions, tool policies, fixed label sets.
- Delimit untrusted context explicitly to reduce prompt injection risk.
- Keep context minimal; avoid leaking irrelevant tenant/user data.
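One way to delimit untrusted context, matching the `<untrusted_input>` tags from the skeleton above. `wrapUntrusted` is a hypothetical helper, and stripping look-alike tags is only a partial injection mitigation, not a complete defense.

```typescript
// Illustrative helper: wrap untrusted text in explicit delimiters, after
// neutralizing anything inside it that mimics the delimiter tags themselves.
function wrapUntrusted(text: string): string {
  const escaped = text.replace(/<\/?untrusted_input>/g, "[stripped-tag]");
  return `<untrusted_input>\n${escaped}\n</untrusted_input>`;
}
```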
## 8. Memory (if applicable)
For chat/agentic archetypes:
- Short-term memory: last N turns.
- Long-term memory: curated summaries or embeddings with strict privacy rules.
- Never store raw PII in memory unless required and approved.
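A minimal sketch of the short-term variant, assuming a simple turn list; `Turn` and `shortTermMemory` are illustrative names.

```typescript
// Keep only the last N turns of a conversation for the next prompt.
type Turn = { role: "user" | "assistant"; content: string };

function shortTermMemory(turns: Turn[], maxTurns: number): Turn[] {
  return turns.slice(-maxTurns); // oldest turns are dropped first
}
```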
## 9. Open Questions to Lock in Phase 1
- Which models/providers are supported at launch?
- Default parameters and retry/backoff policy?
- Where prompts live (repo vs DB) and who can change them?
- How does schema validation + fallback work for each archetype?