Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.

2025-12-12 02:31:03 +02:00
parent 5053235e95
commit c905cbb725
26 changed files with 759 additions and 65 deletions
--- a/docs/llm/safety.md
+++ b/docs/llm/safety.md
@@ -0,0 +1,86 @@
+# LLM System: Safety, Privacy & Reasoning Traces (Starter Template)
+
+---
+**Last Updated:** 2025-12-12  
+**Phase:** Phase 0 (Planning)  
+**Status:** Draft — finalize in Phase 1  
+**Owner:** Security + AI/LLM Lead  
+**References:**
+- `/docs/backend/security.md`
+- `/docs/llm/prompting.md`
+---
+
+This document defines the safety posture for any LLM‑backed feature: privacy, injection defenses, tool safety, and what may be logged.
+
+## 1. Safety Goals
+- Prevent leakage of PII/tenant secrets to LLMs, logs, or UI.
+- Resist prompt injection and untrusted context manipulation.
+- Ensure outputs are safe to act on (validated, bounded, auditable).
+
+## 2. Data Classification & Handling
+Define categories for your domain:
+- **Public:** safe to send and store.
+- **Internal:** safe to send only if necessary; store minimally.
+- **Sensitive (PII/PHI/PCI/Secrets):** never send unless explicitly approved; never store in traces.
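+
+One way to make these categories enforceable is to encode them as a small policy table. The sketch below is illustrative only: the type names, the `HANDLING` table, and the `maySend()` helper are hypothetical and not part of this template.
+
+```typescript
+// Category names mirror the list above; the policy shape is an assumption.
+type DataClass = "public" | "internal" | "sensitive";
+type SendRule = "always" | "if_necessary" | "approval_required";
+type StoreRule = "allowed" | "minimal" | "never_in_traces";
+
+const HANDLING: Record<DataClass, { send: SendRule; store: StoreRule }> = {
+  public: { send: "always", store: "allowed" },
+  internal: { send: "if_necessary", store: "minimal" },
+  sensitive: { send: "approval_required", store: "never_in_traces" },
+};
+
+// Returns true only when the category's send rule is satisfied by the caller's context.
+function maySend(cls: DataClass, ctx: { necessary?: boolean; approved?: boolean } = {}): boolean {
+  const rule = HANDLING[cls].send;
+  if (rule === "always") return true;
+  if (rule === "if_necessary") return ctx.necessary === true;
+  return ctx.approved === true; // "approval_required"
+}
+```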
+
+## 3. Redaction Pipeline (before LLM)
+Apply a mandatory pre‑processing step in `callLLM()`:
+1. Detect sensitive fields (allowlist what *can* be sent rather than blocklisting what can’t).
+2. Redact or hash PII (names, emails, phone numbers, addresses, IDs, card data).
+3. Replace each redacted value with a stable placeholder, e.g. `{{USER_EMAIL_HASH}}`.
+4. Attach a “redaction summary” to logs (no raw PII).
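+
+The four steps above could be sketched roughly as follows. This is a minimal illustration, not the actual `callLLM()` implementation: `redact()`, `ALLOWED_FIELDS`, and the two regexes are hypothetical, and numbered placeholders stand in for hashes so the example needs no crypto library.
+
+```typescript
+// Hypothetical types and names; only the placeholder style comes from the steps above.
+type RedactionSummary = { sentFields: string[]; droppedFields: string[]; replaced: Record<string, number> };
+
+// Assumption: each feature defines which fields may be sent at all.
+const ALLOWED_FIELDS = new Set(["subject", "body"]);
+
+// Illustrative patterns only; real PII detection needs far more than two regexes.
+const PATTERNS: Array<[string, RegExp]> = [
+  ["EMAIL", /[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}/g],
+  ["PHONE", /\+?\d[\d\s().-]{7,}\d/g],
+];
+
+function redact(input: Record<string, string>): { safe: Record<string, string>; summary: RedactionSummary } {
+  const safe: Record<string, string> = {};
+  const summary: RedactionSummary = { sentFields: [], droppedFields: [], replaced: {} };
+  const placeholders = new Map<string, string>(); // same raw value -> same placeholder
+  for (const [field, value] of Object.entries(input)) {
+    if (!ALLOWED_FIELDS.has(field)) {
+      summary.droppedFields.push(field); // step 1: not on the allowlist, never sent
+      continue;
+    }
+    let text = value;
+    for (const [kind, pattern] of PATTERNS) {
+      // Steps 2 and 3: swap each match for a stable placeholder (numbering is shared across kinds).
+      text = text.replace(pattern, (match) => {
+        let token = placeholders.get(match);
+        if (token === undefined) {
+          token = `{{${kind}_${placeholders.size + 1}}}`;
+          placeholders.set(match, token);
+        }
+        summary.replaced[kind] = (summary.replaced[kind] ?? 0) + 1;
+        return token;
+      });
+    }
+    safe[field] = text;
+    summary.sentFields.push(field);
+  }
+  return { safe, summary }; // step 4: the summary holds counts and field names, no raw values
+}
+```
+
+Only `safe` would be passed on to the model; `summary` is the part intended for logs.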
+
+## 4. Prompt Injection & Untrusted Context
+- Delimit untrusted input (`<untrusted_input>...</untrusted_input>`).
+- Never allow untrusted text to override system constraints.
+- For RAG: treat retrieved docs as untrusted unless curated.
+- If injection is detected, refuse the request or escalate for human review.
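+
+As a rough illustration of the delimiting and escalation rules above, the sketch below wraps untrusted text and routes suspicious input to review. The helper names and the phrase list are hypothetical, and a phrase list alone is not a dependable injection detector.
+
+```typescript
+// Hypothetical helper: wraps untrusted text in the delimiter tags named above.
+function wrapUntrusted(text: string): string {
+  // Neutralize delimiter tags inside the text so it cannot close the block early.
+  const escaped = text.replace(/<\/?untrusted_input>/gi, "[removed-tag]");
+  return `<untrusted_input>\n${escaped}\n</untrusted_input>`;
+}
+
+// Illustrative heuristics only.
+const SUSPICIOUS: RegExp[] = [
+  /ignore (all |any )?(previous|prior) instructions/i,
+  /you are now/i,
+  /system prompt/i,
+];
+
+function looksLikeInjection(text: string): boolean {
+  return SUSPICIOUS.some((pattern) => pattern.test(text));
+}
+
+type Decision =
+  | { action: "proceed"; prompt: string }
+  | { action: "human_review"; reason: string };
+
+// System rules always come first; untrusted text is appended only inside the delimiters.
+function buildPrompt(systemRules: string, untrusted: string): Decision {
+  if (looksLikeInjection(untrusted)) {
+    return { action: "human_review", reason: "possible prompt injection" };
+  }
+  return { action: "proceed", prompt: `${systemRules}\n\n${wrapUntrusted(untrusted)}` };
+}
+```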
+
+## 5. Tool / Agent Safety (if applicable)
+- Tool allowlist with scopes and rate limits.
+- Confirm destructive actions with humans (“human checkpoint”).
+- Constrain tool output length and validate outputs before reuse.
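+
+The three controls above might look like this in outline. Everything here is a hypothetical sketch: the tool names, scope strings, limits, and the in-memory rate limiter are placeholders, and output validation beyond length is left out.
+
+```typescript
+// Hypothetical policy shape covering scopes, rate limits, the human checkpoint, and output bounds.
+type ToolPolicy = { scopes: string[]; maxCallsPerMinute: number; destructive: boolean; maxOutputChars: number };
+
+// A Map avoids accidental hits on inherited object keys such as "constructor".
+const TOOL_ALLOWLIST = new Map<string, ToolPolicy>([
+  ["search_docs", { scopes: ["docs:read"], maxCallsPerMinute: 30, destructive: false, maxOutputChars: 4000 }],
+  ["delete_record", { scopes: ["records:write"], maxCallsPerMinute: 2, destructive: true, maxOutputChars: 500 }],
+]);
+
+const recentCalls = new Map<string, number[]>(); // tool name -> timestamps (ms) of allowed calls
+
+type Verdict = "allow" | "deny" | "needs_human";
+
+function authorize(tool: string, grantedScopes: string[], now: number, humanApproved = false): Verdict {
+  const policy = TOOL_ALLOWLIST.get(tool);
+  if (policy === undefined) return "deny"; // not on the allowlist
+  if (!policy.scopes.every((s) => grantedScopes.includes(s))) return "deny"; // missing scope
+  const windowed = (recentCalls.get(tool) ?? []).filter((t) => now - t < 60000);
+  if (windowed.length >= policy.maxCallsPerMinute) return "deny"; // rate limit reached
+  if (policy.destructive && !humanApproved) return "needs_human"; // human checkpoint
+  recentCalls.set(tool, [...windowed, now]);
+  return "allow";
+}
+
+// Truncates tool output to the configured bound; unknown tools yield an empty string.
+function boundOutput(tool: string, output: string): string {
+  const limit = TOOL_ALLOWLIST.get(tool)?.maxOutputChars ?? 0;
+  return output.slice(0, limit);
+}
+```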
+
+## 6. `reasoning_trace` Specification
+`reasoning_trace` is **optional** and should be safe to show to humans.  
+Store only **structured, privacy‑safe metadata**, never raw prompts or user PII.
+
+### Allowed fields (example)
+```json
+{
+  "prompt_version": "classify@1.2.0",
+  "model": "provider:model",
+  "inputs": { "redacted": true, "source_ids": ["..."] },
+  "steps": [
+    { "type": "rule_hit", "rule_id": "r_123", "confidence": 0.72 },
+    { "type": "retrieval", "top_k": 5, "doc_ids": ["d1","d2"] },
+    { "type": "llm_call", "confidence": 0.64 }
+  ],
+  "output": { "label": "X", "confidence": 0.64 },
+  "trace_id": "..."
+}
+```
+
+### Explicitly disallowed in traces
+- Raw user input, webhook payloads, or document text.
+- Emails, phone numbers, addresses, names, government IDs.
+- Payment data, auth tokens, API keys, secrets.
+- Full prompts or full LLM responses (store refs or summaries only).
+
+### How we guarantee “no PII” in traces
+1. **Schema allowlist:** trace is validated against a strict schema with only allowed keys.
+2. **Redaction required:** `callLLM()` sets `inputs.redacted=true` only after redaction has succeeded.
+3. **PII linting:** the server scans the trace JSON for PII patterns (emails, phone numbers, IDs) before storing it.
+4. **UI gating:** only safe fields are rendered; raw text from a trace is never shown.
+5. **Audits:** periodic sampling in Phase 3+ to verify zero leakage.
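+
+Checks 1 to 3 above could be combined into a single pre-storage gate along these lines. This is a simplified sketch: `validateTrace()` is a hypothetical name, it checks only top-level keys (a strict schema would also constrain nested objects), and the patterns are illustrative and may flag long digit runs in IDs.
+
+```typescript
+// Allowed keys are taken from the example trace above.
+const ALLOWED_TOP_LEVEL = new Set(["prompt_version", "model", "inputs", "steps", "output", "trace_id"]);
+
+// Illustrative patterns only; production linting would need broader coverage.
+const PII_PATTERNS: RegExp[] = [
+  /[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}/, // email-like
+  /\+?\d[\d\s().-]{7,}\d/, // phone-like
+];
+
+// Returns a list of problems; the trace should be stored only when the list is empty.
+function validateTrace(trace: Record<string, unknown>): string[] {
+  const errors: string[] = [];
+  // Check 1: schema allowlist (top level only in this sketch).
+  for (const key of Object.keys(trace)) {
+    if (!ALLOWED_TOP_LEVEL.has(key)) errors.push(`unexpected key: ${key}`);
+  }
+  // Check 2: redaction must have been recorded as successful.
+  const inputs = trace["inputs"] as { redacted?: unknown } | undefined;
+  if (inputs?.redacted !== true) errors.push("inputs.redacted must be true");
+  // Check 3: lint the serialized trace for PII-like patterns.
+  const serialized = JSON.stringify(trace);
+  for (const pattern of PII_PATTERNS) {
+    if (pattern.test(serialized)) errors.push(`possible PII matched ${pattern.source}`);
+  }
+  return errors;
+}
+```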
+
+## 7. Storage & Retention
+- Traces stored per tenant; encrypted at rest.
+- Retention window aligned with compliance needs.
+- Ability to disable traces globally or per tenant.
+
+## 8. Open Questions to Lock in Phase 1
+- Exact redaction rules and allowlist fields.
+- Whether to store any raw LLM outputs outside traces (audit vault).
+- Who can access traces in UI and API.
+