# Backend: Architecture (Recommendations)

---

**Last Updated:** 2025-01-17
**Phase:** Phase 0 (Planning)
**Status:** Draft — finalize in Phase 1
**Owner:** Backend Architect
**References:**
- `/docs/project-overview.md`
- `/docs/backend/api-design.md`
- `/docs/backend/security.md`
- `/docs/backend/payment-flow.md`

---

> Recommendations for Phase 0. Lock decisions in Phase 1.
> After Phase 1, this file is the **canonical record of locked backend architecture decisions**.
> Keep exploratory notes in a separate `*_PLAN.md` (if you use one) and archive/delete it after Phase 1.
> The module list below reflects a pipeline/classification archetype. Keep/rename/omit modules per `/docs/archetypes.md`.

## 1. Approach & Stack
- Style: modular monolith with clear modules; containerized.
- Language/Runtime: Node.js (LTS) + TypeScript.
- Framework: Express or Fastify with modular structure and DI where helpful.
- DB: Postgres (managed: Supabase/RDS). Vector: `pgvector` for embeddings.
- Queue: BullMQ (Redis) for ingestion, categorization, notifications.
- Auth: Clerk/Auth.js (or equivalent) with tenant-aware RBAC.
- Payments: provider-hosted subscriptions; no raw card data stored.
- LLM: OpenAI API (or equivalent) via single helper (e.g., `callLLM()`), configurable model/params.

## 2. Modules (logical)
- `auth` — tenants, users, roles, sessions.
- `integrations` — external connectors, OAuth2, webhooks, connection health.
- `records` — normalized record store, statuses, `reasoning_trace` JSONB.
- `rules` — rule definitions, evaluation order, testing, hit stats.
- `processing` — pipeline: rule engine → embeddings similarity → LLM fallback; writes `PROCESSED`, updates records.
- `approvals` — queues for human review, overrides, optional rule creation; logs `TX_APPROVED`/`RULE_CREATED` with `source_agent`.
- `reports` — dashboards/exports, history.
- `billing` — provider checkout/portal, webhooks, plan enforcement per tenant.
- `events` — audit/event log (`EventLog`), read-only `/api/events` for downstream agents.
- `files/receipts` — attachment storage metadata (`Receipt` with file URL/mime).

## 3. API Layers
- HTTP API (REST) with versioning (`/api/v1`).
- Service layer for business logic and database transactions.
- Repositories for data access; use migrations for schema evolution.

## 4. Infrastructure & Ops
- Environments: dev/stage/prod; Docker images; CI/CD.
- Observability: structured logging, metrics, tracing; dead-letter queues for failed jobs.
- Secrets management per environment; rotate webhook/LLM/payment provider secrets.

## 5. Data & Schema Notes
- Records: store raw payload + normalized fields + `reasoning_trace` JSONB (model, rationale, confidence, source).
- EventLog: include `source_agent` (default `balance`) and payload for auditability; support filtering by tenant, time, and type.
- Embeddings: table keyed by record text fields (or other domain signals) to support similarity search; index with `pgvector`.
- Multi-tenant: all core tables carry `tenantId` and enforce scoped queries; `User` role per tenant.

## 6. Payment & Messaging (High-Level)
- Payment provider: initiate sessions via backend; handle webhooks idempotently; map provider status to internal billing/subscription states; update tenant access.
- Notifications: optional email/webhook callbacks to surface ingestion/categorization failures; keep PII out of notification payloads.

## 7. Queues (BullMQ)
- `records:ingest` — normalize webhook payloads, write `Record`, emit `INGESTED`.
- `records:process` — rule engine → embeddings similarity → LLM fallback; emit `PROCESSED` with `reasoning_trace`.
- `reports:generate` — build domain reports/exports, emit `REPORT_GENERATED`.
- Dead-letter queues per stream; retries with backoff; idempotent handlers keyed by external event IDs.
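
The `records:process` flow and the idempotency requirement above can be sketched in plain TypeScript. This is a minimal illustration under stated assumptions, not a locked design: the type names (`RecordJob`, `StageResult`, `ReasoningTrace`), the function names (`classify`, `handleProcessJob`), the toy stages, and the in-memory `Map` are all hypothetical. A real handler would run inside a BullMQ worker and back the idempotency check with a durable store, for example a unique constraint on the tenant and external event ID in Postgres.

```typescript
// Hypothetical shapes; field names mirror the doc's `reasoning_trace`
// description (model, rationale, confidence, source) but are assumptions.
type TraceSource = "rule" | "embedding" | "llm";

interface ReasoningTrace {
  source: TraceSource;
  model: string | null;
  rationale: string;
  confidence: number;
}

interface RecordJob {
  externalEventId: string; // idempotency key from the upstream event
  tenantId: string;
  text: string;
}

interface StageResult {
  category: string;
  trace: ReasoningTrace;
}

// A stage returns null when it cannot classify the record.
type Stage = (job: RecordJob) => Promise<StageResult | null>;

// Run stages in order (rule engine, then embeddings, then LLM fallback)
// and return the first non-null result.
async function classify(job: RecordJob, stages: Stage[]): Promise<StageResult> {
  for (const stage of stages) {
    const result = await stage(job);
    if (result !== null) return result;
  }
  throw new Error(`no stage classified event ${job.externalEventId}`);
}

// In-memory stand-in for a durable idempotency store (assumption).
const processed = new Map<string, StageResult>();

async function handleProcessJob(job: RecordJob, stages: Stage[]): Promise<StageResult> {
  const key = `${job.tenantId}:${job.externalEventId}`;
  const seen = processed.get(key);
  if (seen !== undefined) return seen; // duplicate delivery: reuse earlier outcome
  const result = await classify(job, stages);
  processed.set(key, result);
  return result;
}

// Toy stages for demonstration only.
const ruleStage: Stage = async (job) =>
  job.text.toLowerCase().includes("uber")
    ? {
        category: "travel",
        trace: { source: "rule", model: null, rationale: "keyword rule 'uber'", confidence: 1 },
      }
    : null;

// Pretends that no stored embedding was similar enough.
const embeddingStage: Stage = async () => null;

let llmCalls = 0; // counts fallback invocations so duplicates are observable
const llmStage: Stage = async () => {
  llmCalls += 1;
  return {
    category: "uncategorized",
    trace: { source: "llm", model: "placeholder-model", rationale: "stubbed response", confidence: 0.5 },
  };
};
```

Calling `handleProcessJob` twice with the same `tenantId` and `externalEventId` returns the stored result without re-running any stage, which is the property that makes queue retries safe. Ordering the stages from cheapest to most expensive means the LLM stage only runs when the rule and embedding stages both decline.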