Add foundational documentation templates to support product design and architecture planning, including ADR, archetypes, LLM systems, dev setup, and shared modules.

2025-12-12 02:31:03 +02:00
parent 5053235e95
commit c905cbb725
26 changed files with 759 additions and 65 deletions
--- a/docs/backend/api-design.md
+++ b/docs/backend/api-design.md
@@ -16,6 +16,8 @@
 - Tenant-scoped resources; role-based authorization.
 - Idempotent ingestion/webhook endpoints; trace IDs for debugging.

+> Resource set is archetype‑specific. Endpoints below are a **pipeline/classification example** — adapt for chat‑first, generation, or automation products.
+
 ## 2. Core Resources (high-level)
 - `/auth` — login, tenant context, token refresh.
 - `/tenants` — tenant profile, roles, invites.
--- a/docs/backend/architecture.md
+++ b/docs/backend/architecture.md
@@ -12,7 +12,10 @@
 - `/docs/backend/payment-flow.md`
 ---

-> Recommendations for Phase 0. Lock decisions in Phase 1.
+> Recommendations for Phase 0. Lock decisions in Phase 1.  
+> After Phase 1, this file is the **canonical record of locked backend architecture decisions**.  
+> Keep exploratory notes in a separate `*_PLAN.md` (if you use one) and archive/delete it after Phase 1.  
+> The module list below reflects a pipeline/classification archetype. Keep/rename/omit modules per `/docs/archetypes.md`.

 ## 1. Approach & Stack
 - Style: modular monolith with clear modules; containerized.
--- a/docs/backend/overview.md
+++ b/docs/backend/overview.md
@@ -12,22 +12,22 @@
 ---

 ## 1. Role of Backend
- Own business logic for ingestion, processing/classification (rules + embeddings + LLM fallback), approvals, reporting, billing, and audit.
+- Own business logic for integrations, AI capability (chat/generation/pipelines/automation), optional human feedback loops, reporting, billing, and audit.
 - Integrate safely with external providers (OAuth2/webhooks, payment provider, LLM provider) and expose consistent APIs + events.
- Enforce security: tenant isolation, RBAC, webhook verification, event/audit logging.
+- Enforce security appropriate to your archetype (single‑ or multi‑tenant), webhook verification, and event/audit logging.

 ## 2. Main Domain Areas
- **Auth & Tenants:** authentication/authorization, roles, tenant-scoped access.
- **Integrations:** external providers via OAuth2/webhooks; connection health.
- **Records:** normalized feeds, statuses (ingested, processed, needs_approval, approved, failed), `reasoning_trace` JSONB.
- **Rules & Processing:** rules engine, embeddings similarity, LLM fallback; logging with `source_agent`.
- **Approvals:** human-in-the-loop decisions, overrides, optional rule creation; audit trail.
- **Reports & Exports:** dashboards/summaries with export history.
- **Billing:** provider-hosted subscriptions, tenant-scoped access control, webhooks.
- **Events:** `/api/events` feed for downstream agents and internal observability.
+- **Auth & Tenancy (optional):** users, roles, tenant isolation if needed.
+- **Integrations / Ingestion (optional):** OAuth2/webhooks/files; connection health.
+- **Core AI Module:** chat, generation, classification, RAG, or agentic automation.
+- **Processing Pipeline (optional):** staged evaluation (rules/embeddings/LLM); `reasoning_trace` JSONB if used.
+- **Human Feedback Loop (optional):** approvals/edits/ratings/escalations; audit trail.
+- **Reporting & Exports (optional):** dashboards/summaries with history.
+- **Billing (optional):** provider-hosted subscriptions/usage, webhooks.
+- **Events / Audit:** `/api/events` feed for observability and downstream agents.

 ## 3. Integrations
 - **External data providers:** OAuth2 + webhooks; signatures/verification; idempotent writes via workers.
 - **Payment provider:** subscriptions, checkout/portal; webhooks for lifecycle events.
- **LLM provider:** OpenAI API via single helper; configurable model.
+- **LLM provider:** chosen LLM API via a single helper; configurable model/params.
 - **Queues:** BullMQ (Redis) for ingestion/categorization/notifications.
--- a/docs/backend/security.md
+++ b/docs/backend/security.md
@@ -40,6 +40,27 @@
 ## 6. LLM Safety
 - All LLM calls go through a single helper; centralize redaction, logging, and parameter control.
 - Strip/obfuscate sensitive fields before sending to LLM; log only references in traces.
+- Detailed LLM safety and `reasoning_trace` policy live in `/docs/llm/safety.md`.
+
+### 6.1 AI‑Specific Threats & Controls (summary)
+These apply to any archetype that uses LLMs or RAG.
+
+- **Prompt injection / jailbreak**
+  - Treat all user input and retrieved content as **untrusted**.
+  - Delimit untrusted blocks explicitly and never allow them to override system constraints.
+  - Detect injection patterns; on suspicion → refuse or route to human review.
+- **Outbound‑data policy**
+  - Use **allowlists** for what may be sent to the model.
+  - Mandatory redaction pipeline before every LLM call (PII/PHI/PCI/secrets).
+  - Never send cross‑tenant data; never send raw billing/auth secrets.
+- **Output validation**
+  - Validate model outputs against strict schemas (types, enums, bounds).
+  - Reject/repair invalid outputs; fall back to safe defaults or human checkpoints for high‑risk actions.
+  - For agentic tools: validate tool arguments and enforce per‑tool scopes.
+- **Trusted vs untrusted context (RAG)**
+  - Retrieved documents are untrusted unless curated.
+  - Keep retrieval tenant‑scoped; record only doc IDs in traces.
+  - If grounding is required and context is insufficient → ask user or defer.

 ## 7. Audit & Events
 - Log domain events to `EventLog` with `source_agent`; include user ID, tenant, timestamps, and relevant context.