# LLM System: Prompting (Starter Template)

---

**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead
**References:**

- `/docs/archetypes.md`
- `/docs/llm/safety.md`
- `/docs/llm/evals.md`

---

This document defines how prompts are designed, versioned, and executed. It is **archetype‑agnostic**: adapt the "interaction surface" (chat, workflow generation, pipeline classification, agentic tasks) to your product.

## 1. Goals

- Produce **consistent, auditable outputs** across models/providers.
- Make prompt changes **safe and reversible** (versioning + evals).
- Keep sensitive data out of prompts unless strictly required (see safety).

## 2. Single LLM Entry Point

All LLM calls go through one abstraction (e.g., `callLLM()` / "LLM Gateway"):

- Centralizes model selection, temperature/top_p defaults, retries, timeouts.
- Applies redaction and policy checks before sending prompts.
- Emits structured logs + trace IDs to `EventLog`.
- Enforces output schema validation.

> Lock the exact interface and defaults in Phase 1.

## 3. Prompt Types

Define prompt families that match your archetype:

- **Chat‑first:** system prompt + conversation memory + optional retrieval context.
- **Generation/workflow:** task prompt + constraints + examples + output schema.
- **Classification/pipeline:** short instruction + label set + few‑shot examples + JSON output.
- **Agentic automation:** planner prompt + tool policy + step budget + "stop/ask‑human" rules.

## 4. Prompt Structure (recommended)

Use a predictable layout for every prompt:

1. **System / role:** who the model is, high‑level mission.
2. **Safety & constraints:** what not to do, privacy rules, refusal triggers.
3. **Task spec:** exact objective and success criteria.
4. **Context:** domain data, retrieved snippets, tool outputs (clearly delimited).
5. **Few‑shot examples:** 1–3 archetype‑relevant pairs.
6. **Output schema:** strict JSON/XML/markdown template.
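The six-section layout above can be rendered mechanically so every prompt follows the same order. A minimal Python sketch, with hypothetical names (`build_prompt` and its parameters are not part of any defined interface — lock the real gateway types in Phase 1):

```python
# Sketch: assemble the recommended six-section prompt layout.
# All names here are hypothetical placeholders, not a committed API.

DELIM = "\n\n"

def build_prompt(
    system: str,
    constraints: list[str],
    task: str,
    context: str,
    examples: list[tuple[str, str]],
    output_schema: str,
) -> str:
    """Render the six sections in a fixed, predictable order."""
    sections = [
        "[SYSTEM]\n" + system,
        "[CONSTRAINTS]\n" + "\n".join(f"- {c}" for c in constraints),
        "[TASK]\n" + task,
        # Untrusted context is explicitly delimited (see section 7).
        "[CONTEXT]\n<untrusted>\n" + context + "\n</untrusted>",
        "[EXAMPLES]\n" + "\n".join(
            f"Input: {i}\nOutput: {o}" for i, o in examples
        ),
        "[OUTPUT_SCHEMA]\n" + output_schema,
    ]
    return DELIM.join(sections)
```

Rendering from structured parts rather than hand-editing one big string keeps each section independently versionable and makes the trusted/untrusted boundary explicit.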
### Example skeleton

```text
[SYSTEM]
You are ...

[CONSTRAINTS]
- Never ...
- If unsure, respond with ...

[TASK]
Given input X, produce Y.

[CONTEXT]
...

[EXAMPLES]
Input: ...
Output: ...

[OUTPUT_SCHEMA]
{ "label": "...", "confidence": 0..1, "reasoning_trace": {...} }
```

## 5. Prompt Versioning

- Store prompts in a dedicated location (e.g., `prompts/` folder or DB table).
- **Semantic versioning**: `prompt_name@major.minor.patch`.
  - **major:** behavior change or schema change.
  - **minor:** quality improvement (new examples, clearer instructions).
  - **patch:** typos / no behavior change.
- Every version is linked to:
  - model/provider version,
  - eval suite run,
  - changelog entry.

## 6. Output Schemas & Validation

- Prefer **strict JSON** for machine‑consumed outputs.
- Validate outputs server‑side:
  - required fields present,
  - types/enum values correct,
  - confidence in range,
  - no disallowed keys (PII, secrets).
- If validation fails: retry with a "fix‑format" prompt or fall back to a safe default.

## 7. Context Management

- Separate **trusted** vs. **untrusted** context:
  - Untrusted: user input, webhook payloads, retrieved docs.
  - Trusted: system instructions, tool policies, fixed label sets.
- Delimit untrusted context explicitly to reduce prompt‑injection risk.
- Keep context minimal; avoid leaking irrelevant tenant/user data.

## 8. Memory (if applicable)

For chat/agentic archetypes:

- Short‑term memory: last N turns.
- Long‑term memory: curated summaries or embeddings with strict privacy rules.
- Never store raw PII in memory unless required and approved.

## 9. Open Questions to Lock in Phase 1

- Which models/providers are supported at launch?
- What are the default parameters and the retry/backoff policy?
- Where do prompts live (repo vs. DB), and who can change them?
- How does schema validation and fallback work per archetype?