Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.
`docs/llm/caching-costs.md` (new file, 54 lines)
# LLM System: Caching & Cost Control (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/llm/prompting.md`
- `/docs/llm/evals.md`
---

This document defines how to keep LLM usage reliable and within budget.

## 1. Goals

- Minimize cost while preserving quality.
- Keep latency predictable for user flows.
- Avoid repeated work (idempotency + caching).

## 2. Budgets & Limits

Define per tenant and per feature:

- monthly token/cost cap,
- per‑request max tokens,
- max retries/timeouts,
- concurrency limits.
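The limits above can be captured in a small, typed config that is checked before each call. A minimal sketch in Python; `TenantBudget`, `within_budget`, and the field names are illustrative, not part of the template:

```python
from dataclasses import dataclass

@dataclass
class TenantBudget:
    """Illustrative per-tenant, per-feature limits (names are placeholders)."""
    monthly_cost_cap_usd: float
    max_tokens_per_request: int
    max_retries: int
    timeout_s: float
    max_concurrency: int

def within_budget(budget: TenantBudget, spent_usd: float,
                  requested_tokens: int) -> bool:
    """Reject a request that would exceed the monthly cap or token limit."""
    return (spent_usd < budget.monthly_cost_cap_usd
            and requested_tokens <= budget.max_tokens_per_request)
```

Retries, timeouts, and concurrency would be enforced in the call path itself rather than in this pre-flight check.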
## 3. Caching Layers

Pick what applies:

1. **Input normalization cache**
   - canonicalize inputs (trim, stable ordering) to increase hit rate.
2. **LLM response cache**
   - key: `(prompt_version, model, canonical_input_hash, retrieval_config_hash)`.
   - TTL depends on the volatility of the task.
3. **Embeddings cache**
   - store embeddings for reusable texts/items.
4. **RAG retrieval cache**
   - cache top‑k doc IDs for stable queries.

> Never cache raw PII; cache keys use hashes of redacted inputs.
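The response-cache key in layer 2 can be derived by hashing a canonicalized input. A sketch, assuming trimming and stable key ordering are sufficient canonicalization for your payloads:

```python
import hashlib
import json

def canonicalize(payload: dict) -> str:
    """Stable key ordering + trimmed strings, to maximize cache hit rate."""
    def clean(v):
        return v.strip() if isinstance(v, str) else v
    return json.dumps({k: clean(v) for k, v in sorted(payload.items())},
                      separators=(",", ":"))

def response_cache_key(prompt_version: str, model: str, payload: dict,
                       retrieval_config_hash: str = "none") -> str:
    """Builds the (prompt_version, model, canonical_input_hash,
    retrieval_config_hash) key described above."""
    input_hash = hashlib.sha256(canonicalize(payload).encode()).hexdigest()
    return f"{prompt_version}:{model}:{input_hash}:{retrieval_config_hash}"
```

Note the key hashes the (already redacted) canonical input, never raw PII, consistent with the note above.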
## 4. Cost Controls

- Prefer cheaper models for low‑risk tasks; escalate to stronger models only when needed.
- Use staged pipelines (rules/heuristics/RAG) to reduce LLM calls.
- Batch non‑interactive jobs (classification, report generation).
- Track tokens in/out per request and per tenant.

## 5. Fallbacks

- On timeouts/errors: retry with backoff, then fall back to a safe default or human review.
- On budget exhaustion: degrade gracefully (limited features, queued jobs, ask the user).

## 6. Monitoring

- Dashboards for cost, latency, cache hit rate, retry rate.
- Alerts for spikes, anomalous tenants, or runaway loops.
`docs/llm/evals.md` (new file, 73 lines)
# LLM System: Evals & Quality (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Test Engineer
**References:**
- `/docs/llm/prompting.md`
- `/docs/llm/safety.md`
---

This document defines how you measure LLM quality and prevent regressions.

## 1. Goals

- Detect prompt/model regressions before production.
- Track accuracy, safety, latency, and cost over time.
- Provide a repeatable path for improving prompts and RAG.

## 2. Eval Suite Types

Mix three layers depending on archetype:

1. **Unit evals (offline, deterministic)**
   - Small golden set, strict expected outputs.
2. **Integration evals (offline, realistic)**
   - Full pipeline including retrieval, tools, and post‑processing.
3. **Online evals (production, controlled)**
   - Shadow runs, A/B, canary prompts, RUM‑style metrics.

## 3. Datasets

- Maintain **versioned eval datasets** with:
  - input,
  - expected output or rubric,
  - metadata (domain, difficulty, edge cases).
- Include adversarial cases:
  - prompt injection,
  - ambiguous queries,
  - long/noisy inputs,
  - PII‑rich inputs (to test redaction).
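A single case in such a versioned dataset might be stored as one JSONL line. The field names and the version scheme here are illustrative:

```python
import json

# One versioned eval record (field names are illustrative).
record = {
    "dataset_version": "support-intents@1.0.0",
    "input": "I was charged twice, please refund one payment",
    "expected": {"label": "billing_refund"},
    "metadata": {"domain": "billing", "difficulty": "easy",
                 "edge_case": False},
}

line = json.dumps(record)  # one JSONL line per case
```

Keeping the dataset version in each record makes it easy to tie an eval run back to the exact data it used.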
## 4. Metrics (suggested)

Choose per archetype:

- **Task quality:** accuracy/F1, exact‑match, rubric score, human preference rate.
- **Safety:** refusal correctness, policy violations, PII leakage rate.
- **Robustness:** format‑valid rate, tool‑call correctness, retry rate.
- **Performance:** p50/p95 latency, tokens in/out, cost per task.

## 5. Regression Policy

- Every prompt or model change must run evals.
- Define gates:
  - no safety regressions,
  - quality must improve or stay within tolerance,
  - latency/cost budgets respected.
- If a gate fails: block rollout or require an explicit override in `RECOMMENDATIONS.md`.
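The gates above can be expressed as one boolean check run after every eval. A minimal sketch; the metric names and the `gates_pass` helper are illustrative, not a defined interface:

```python
def gates_pass(baseline: dict, candidate: dict,
               latency_budget_ms: float, cost_budget: float,
               quality_tolerance: float = 0.01) -> bool:
    """Rollout gates: no safety regression, quality within tolerance of the
    baseline, and latency/cost budgets respected."""
    if candidate["safety_violations"] > baseline["safety_violations"]:
        return False
    if candidate["quality"] < baseline["quality"] - quality_tolerance:
        return False
    if candidate["p95_latency_ms"] > latency_budget_ms:
        return False
    if candidate["cost_per_task"] > cost_budget:
        return False
    return True
```

In CI, a `False` result would block the rollout unless an explicit override is recorded.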
## 6. Human Review Loop

- For tasks without ground truth, use rubric‑based human grading.
- Sampling strategy:
  - new prompt versions → 100% review on a small batch,
  - stable versions → periodic audits.

## 7. Logging for Evals

- Store eval runs with:
  - prompt version,
  - model/provider version,
  - retrieval config version (if used),
  - inputs/outputs,
  - metrics + artifacts.

## 8. Open Questions to Lock in Phase 1

- Where do datasets live (repo vs storage)?
- Which metrics are hard gates for MVP?
- Online eval strategy (shadow vs A/B) and sample sizes?
`docs/llm/prompting.md` (new file, 110 lines)
# LLM System: Prompting (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead
**References:**
- `/docs/archetypes.md`
- `/docs/llm/safety.md`
- `/docs/llm/evals.md`
---

This document defines how prompts are designed, versioned, and executed.
It is **archetype‑agnostic**: adapt the “interaction surface” (chat, workflow generation, pipeline classification, agentic tasks) to your product.

## 1. Goals

- Produce **consistent, auditable outputs** across models/providers.
- Make prompt changes **safe and reversible** (versioning + evals).
- Keep sensitive data out of prompts unless strictly required (see safety).

## 2. Single LLM Entry Point

All LLM calls go through one abstraction (e.g., `callLLM()` / “LLM Gateway”):

- Centralizes model selection, temperature/top_p defaults, retries, timeouts.
- Applies redaction and policy checks before sending prompts.
- Emits structured logs + trace IDs to `EventLog`.
- Enforces output schema validation.

> Lock the exact interface and defaults in Phase 1.

## 3. Prompt Types

Define prompt families that match your archetype:

- **Chat‑first:** system prompt + conversation memory + optional retrieval context.
- **Generation/workflow:** task prompt + constraints + examples + output schema.
- **Classification/pipeline:** short instruction + label set + few‑shot examples + JSON output.
- **Agentic automation:** planner prompt + tool policy + step budget + “stop/ask‑human” rules.

## 4. Prompt Structure (recommended)

Use a predictable layout for every prompt:

1. **System / role:** who the model is, high‑level mission.
2. **Safety & constraints:** what not to do, privacy rules, refusal triggers.
3. **Task spec:** exact objective and success criteria.
4. **Context:** domain data, retrieved snippets, tool outputs (clearly delimited).
5. **Few‑shot examples:** 1–3 archetype‑relevant pairs.
6. **Output schema:** strict JSON/XML/markdown template.

### Example skeleton

```text
[SYSTEM]
You are ...

[CONSTRAINTS]
- Never ...
- If unsure, respond with ...

[TASK]
Given input X, produce Y.

[CONTEXT]
<untrusted_input>
...
</untrusted_input>

[EXAMPLES]
Input: ...
Output: ...

[OUTPUT_SCHEMA]
{ "label": "...", "confidence": 0..1, "reasoning_trace": {...} }
```

## 5. Prompt Versioning

- Store prompts in a dedicated location (e.g., a `prompts/` folder or DB table).
- **Semantic versioning**: `prompt_name@major.minor.patch`.
  - **major:** behavior change or schema change.
  - **minor:** quality improvement (new examples, clearer instruction).
  - **patch:** typos / no behavior change.
- Every version is linked to:
  - model/provider version,
  - eval suite run,
  - changelog entry.

## 6. Output Schemas & Validation

- Prefer **strict JSON** for machine‑consumed outputs.
- Validate outputs server‑side:
  - required fields present,
  - types/enum values correct,
  - confidence in range,
  - no disallowed keys (PII, secrets).
- If validation fails: retry with a “fix‑format” prompt or fall back to a safe default.
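The server-side checks can be sketched as a small validator that returns a list of problems. The allowed keys follow the output schema in the skeleton above; the label set and `validate_output` helper are illustrative:

```python
ALLOWED_KEYS = {"label", "confidence", "reasoning_trace"}
LABELS = {"billing", "technical", "other"}  # illustrative label set

def validate_output(out: dict) -> list[str]:
    """Checks required fields, enum membership, confidence range, and
    disallowed keys. Returns a list of problems (empty = valid)."""
    problems = []
    if set(out) - ALLOWED_KEYS:
        problems.append("disallowed keys present")
    if out.get("label") not in LABELS:
        problems.append("label missing or not in enum")
    conf = out.get("confidence")
    if not (isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0):
        problems.append("confidence out of range")
    return problems
```

On a non-empty result, the caller would retry with a “fix‑format” prompt or fall back to the safe default.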
## 7. Context Management

- Separate **trusted** vs **untrusted** context:
  - Untrusted: user input, webhook payloads, retrieved docs.
  - Trusted: system instructions, tool policies, fixed label sets.
- Delimit untrusted context explicitly to reduce prompt injection risk.
- Keep context minimal; avoid leaking irrelevant tenant/user data.

## 8. Memory (if applicable)

For chat/agentic archetypes:

- Short‑term memory: last N turns.
- Long‑term memory: curated summaries or embeddings with strict privacy rules.
- Never store raw PII in memory unless required and approved.

## 9. Open Questions to Lock in Phase 1

- Which models/providers are supported at launch?
- Default parameters and retry/backoff policy?
- Where do prompts live (repo vs DB), and who can change them?
- How do schema validation and fallbacks work per archetype?
`docs/llm/rag-embeddings.md` (new file, 53 lines)
# LLM System: RAG & Embeddings (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** AI/LLM Lead + Backend Architect
**References:**
- `/docs/backend/architecture.md`
- `/docs/llm/evals.md`
- `/docs/llm/safety.md`
---

This document describes retrieval‑augmented generation (RAG) and embeddings.
Use it only if your archetype needs external knowledge or similarity search.

## 1. When to Use RAG

- You need grounded answers from a knowledge base.
- Inputs are large or dynamic (docs, tickets, policies).
- You want controllable citations/explainability.

Do **not** use RAG when:

- the task is purely generative with no grounding,
- retrieval latency/cost outweighs the benefit.

## 2. Data Sources

- Curated docs, user‑uploaded files, internal DB records, external APIs.
- Mark each source as trusted/untrusted and apply safety rules.

## 3. Chunking & Indexing

- Define chunk size/overlap per domain.
- Store embeddings in a vector index (e.g., `pgvector`, a managed vector DB).
- Keep an embedding model/version field to support migrations.

## 4. Retrieval Strategy

- Default: semantic search top‑k + optional filters (tenant, type, recency).
- Re‑rank if quality requires it.
- Always include retrieved doc IDs in `reasoning_trace` (not raw text).
## 5. RAG Prompting Pattern

- Provide retrieved snippets in a clearly delimited block.
- Instruct the model to answer **only** from retrieved context when grounding is required.
- If context is insufficient → ask for clarification or defer.

## 6. Evaluating Retrieval

- Measure recall/precision of retrieval separately from generation quality.
- Add “no‑answer” test cases to avoid hallucinations.

## 7. Privacy & Multi‑Tenancy

- Tenant‑scoped indexes or strict filters.
- Never retrieve across tenant boundaries.
- Redact PII before embedding if embeddings can be exposed or logged.
`docs/llm/safety.md` (new file, 86 lines)
# LLM System: Safety, Privacy & Reasoning Traces (Starter Template)

---
**Last Updated:** 2025-12-12
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** Security + AI/LLM Lead
**References:**
- `/docs/backend/security.md`
- `/docs/llm/prompting.md`
---

This document defines the safety posture for any LLM‑backed feature: privacy, injection defenses, tool safety, and what you log.

## 1. Safety Goals

- Prevent leakage of PII/tenant secrets to LLMs, logs, or UI.
- Resist prompt injection and untrusted context manipulation.
- Ensure outputs are safe to act on (validated, bounded, auditable).

## 2. Data Classification & Handling

Define categories for your domain:

- **Public:** safe to send and store.
- **Internal:** safe to send only if necessary; store minimally.
- **Sensitive (PII/PHI/PCI/Secrets):** never send unless explicitly approved; never store in traces.

## 3. Redaction Pipeline (before LLM)

Apply a mandatory pre‑processing step in `callLLM()`:

1. Detect sensitive fields (allowlist what *can* be sent, not what can’t).
2. Redact or hash PII (names, emails, phone numbers, addresses, IDs, card data).
3. Replace with stable placeholders, e.g. `{{USER_EMAIL_HASH}}`.
4. Attach a “redaction summary” to logs (no raw PII).
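Steps 2–4 can be sketched as below. This is a deliberately simplified, pattern-based illustration covering emails only; per step 1, a production pipeline should allowlist fields rather than rely on regexes alone:

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}")

def redact(text: str) -> tuple[str, dict]:
    """Replace emails with stable hashed placeholders and return the text
    plus a redaction summary (counts only, never raw PII)."""
    count = 0
    def sub(match: re.Match) -> str:
        nonlocal count
        count += 1
        # Stable placeholder: same email always maps to the same hash prefix.
        h = hashlib.sha256(match.group().lower().encode()).hexdigest()[:8]
        return f"{{{{USER_EMAIL_HASH:{h}}}}}"
    redacted = EMAIL.sub(sub, text)
    return redacted, {"emails_redacted": count}
```

Stable placeholders keep cache keys and traces consistent across calls without exposing the underlying value.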
## 4. Prompt Injection & Untrusted Context

- Delimit untrusted input (`<untrusted_input>...</untrusted_input>`).
- Never allow untrusted text to override system constraints.
- For RAG: treat retrieved docs as untrusted unless curated.
- If injection is detected → refuse or ask for human review.

## 5. Tool / Agent Safety (if applicable)

- Tool allowlist with scopes and rate limits.
- Confirm destructive actions with humans (“human checkpoint”).
- Constrain tool output length and validate it before reuse.

## 6. `reasoning_trace` Specification

`reasoning_trace` is **optional** and should be safe to show to humans.
Store only **structured, privacy‑safe metadata**, never raw prompts or user PII.

### Allowed fields (example)

```json
{
  "prompt_version": "classify@1.2.0",
  "model": "provider:model",
  "inputs": { "redacted": true, "source_ids": ["..."] },
  "steps": [
    { "type": "rule_hit", "rule_id": "r_123", "confidence": 0.72 },
    { "type": "retrieval", "top_k": 5, "doc_ids": ["d1", "d2"] },
    { "type": "llm_call", "confidence": 0.64 }
  ],
  "output": { "label": "X", "confidence": 0.64 },
  "trace_id": "..."
}
```

### Explicitly disallowed in traces

- Raw user input, webhook payloads, or document text.
- Emails, phone numbers, addresses, names, government IDs.
- Payment data, auth tokens, API keys, secrets.
- Full prompts or full LLM responses (store refs or summaries only).

### How we guarantee “no PII” in traces

1. **Schema allowlist:** the trace is validated against a strict schema with only allowed keys.
2. **Redaction required:** `callLLM()` sets `inputs.redacted=true` only after redaction has succeeded.
3. **PII linting:** server‑side scan of the trace JSON for patterns (emails, phones, IDs) before storing.
4. **UI gating:** only safe fields are rendered; raw text is never shown from a trace.
5. **Audits:** periodic sampling in Phase 3+ to verify zero leakage.
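The PII-linting step can be sketched as a last-line check that serializes the trace and scans it for PII-looking patterns before storage. The two patterns here are illustrative; a real linter would cover more ID formats:

```python
import json
import re

PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),  # email-like
    re.compile(r"\+?\d[\d\s().-]{8,}\d"),          # phone-number-like
]

def trace_is_clean(trace: dict) -> bool:
    """Serialize the whole trace and reject it if anything PII-looking
    appears anywhere, including nested fields."""
    blob = json.dumps(trace)
    return not any(p.search(blob) for p in PII_PATTERNS)
```

This complements the schema allowlist: the schema constrains the keys, while the lint catches PII smuggled into otherwise allowed string values.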
## 7. Storage & Retention

- Traces are stored per tenant and encrypted at rest.
- Retention window aligned with compliance needs.
- Ability to disable traces globally or per tenant.

## 8. Open Questions to Lock in Phase 1

- Exact redaction rules and allowlist fields.
- Whether to store any raw LLM outputs outside traces (audit vault).
- Who can access traces in the UI and API.