# Backend: Architecture (Recommendations)

---
**Last Updated:** 2025-01-17  
**Phase:** Phase 0 (Planning)  
**Status:** Draft — finalize in Phase 1  
**Owner:** Backend Architect  
**References:**
- `/docs/project-overview.md`
- `/docs/backend/api-design.md`
- `/docs/backend/security.md`
- `/docs/backend/payment-flow.md`
---

> Recommendations for Phase 0. Lock decisions in Phase 1.  
> After Phase 1, this file is the **canonical record of locked backend architecture decisions**.  
> Keep exploratory notes in a separate `*_PLAN.md` (if you use one) and archive/delete it after Phase 1.  
> The module list below reflects a pipeline/classification archetype. Keep/rename/omit modules per `/docs/archetypes.md`.

## 1. Approach & Stack
- Style: modular monolith with clear modules; containerized.
- Language/Runtime: Node.js (LTS) + TypeScript.
- Framework: Express or Fastify with modular structure and DI where helpful.
- DB: Postgres (managed: Supabase/RDS). Vector: `pgvector` for embeddings.
- Queue: BullMQ (Redis) for ingestion, categorization, notifications.
- Auth: Clerk/Auth.js (or equivalent) with tenant-aware RBAC.
- Payments: provider-hosted subscriptions; no raw card data stored.
- LLM: OpenAI API (or equivalent) via single helper (e.g., `callLLM()`), configurable model/params.
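
A minimal sketch of the single-helper idea for LLM access, assuming a provider-agnostic transport function is injected. Every name except `callLLM()` (`LLMConfig`, `LLMTransport`, `createCallLLM`, `resolveConfig`, the model string) is a hypothetical placeholder, not a locked decision.

```typescript
// Illustrative types; field names are assumptions, not a provider's API.
interface LLMConfig {
  model: string;
  temperature: number;
  maxTokens: number;
}

interface LLMRequest {
  prompt: string;
  overrides?: Partial<LLMConfig>; // per-call parameter overrides
}

// The transport is the only piece that knows about a specific provider SDK or HTTP API.
type LLMTransport = (prompt: string, config: LLMConfig) => Promise<string>;

// Merge per-call overrides on top of the configured defaults.
function resolveConfig(defaults: LLMConfig, overrides?: Partial<LLMConfig>): LLMConfig {
  return { ...defaults, ...(overrides ?? {}) };
}

// Build the single entry point that the rest of the backend calls.
function createCallLLM(transport: LLMTransport, defaults: LLMConfig) {
  return async function callLLM(request: LLMRequest): Promise<string> {
    const prompt = request.prompt.trim();
    if (prompt.length === 0) {
      throw new Error("callLLM: prompt must not be empty");
    }
    return transport(prompt, resolveConfig(defaults, request.overrides));
  };
}

// Usage with a fake transport; a real one would wrap the chosen provider client.
const echoTransport: LLMTransport = async (prompt, config) => `[${config.model}] ${prompt}`;
const callLLM = createCallLLM(echoTransport, {
  model: "example-model",
  temperature: 0,
  maxTokens: 256,
});
```

Keeping the provider behind one function means model or parameter changes stay in configuration, and tests can swap in a fake transport as shown.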

## 2. Modules (logical)
- `auth` — tenants, users, roles, sessions.
- `integrations` — external connectors, OAuth2, webhooks, connection health.
- `records` — normalized record store, statuses, `reasoning_trace` JSONB.
- `rules` — rule definitions, evaluation order, testing, hit stats.
- `processing` — pipeline: rule engine → embeddings similarity → LLM fallback; emits a `PROCESSED` event and updates the record.
- `approvals` — queues for human review, overrides, optional rule creation; logs `TX_APPROVED`/`RULE_CREATED` with `source_agent`.
- `reports` — dashboards/exports, history.
- `billing` — provider checkout/portal, webhooks, plan enforcement per tenant.
- `events` — audit/event log (`EventLog`), read-only `/api/events` for downstream agents.
- `files/receipts` — attachment storage metadata (`Receipt` with file URL/mime).
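
The `processing` fallback chain (rules, then similarity, then LLM) could look roughly like the sketch below. It is synchronous for brevity, whereas real embedding and LLM stages would be async. The stage names, the confidence threshold, the toy two-dimensional "embeddings", and the stubbed LLM answer are all assumptions for illustration. Only the `reasoning_trace` fields follow this document.

```typescript
type TraceSource = "rule" | "embedding" | "llm";

interface ReasoningTrace {
  model: string | null; // null when no model was involved (e.g. a rule hit)
  rationale: string;
  confidence: number; // 0..1
  source: TraceSource;
}

interface StageResult {
  category: string;
  reasoning_trace: ReasoningTrace;
}

// A stage returns null when it cannot decide, letting the next stage try.
type Stage = (text: string) => StageResult | null;

function runPipeline(text: string, stages: Stage[], minConfidence: number): StageResult | null {
  for (const stage of stages) {
    const result = stage(text);
    if (result !== null && result.reasoning_trace.confidence >= minConfidence) {
      return result; // first sufficiently confident stage wins
    }
  }
  return null; // nothing confident enough: hand off to human review
}

// Stage 1: deterministic keyword rule (hypothetical rule).
const ruleStage: Stage = (text) =>
  /\binvoice\b/i.test(text)
    ? {
        category: "billing",
        reasoning_trace: { model: null, rationale: "keyword rule: invoice", confidence: 1, source: "rule" },
      }
    : null;

// Cosine similarity; returns 0 for zero-length vectors to avoid NaN.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stage 2: nearest labeled example by cosine similarity.
function embeddingStage(
  embed: (text: string) => number[],
  examples: { vector: number[]; category: string }[],
): Stage {
  return (text) => {
    const v = embed(text);
    let best: { score: number; category: string } | null = null;
    for (const ex of examples) {
      const score = cosine(v, ex.vector);
      if (best === null || score > best.score) best = { score, category: ex.category };
    }
    if (best === null) return null;
    return {
      category: best.category,
      reasoning_trace: {
        model: "example-embedding-model",
        rationale: "nearest labeled example",
        confidence: best.score,
        source: "embedding",
      },
    };
  };
}

// Stage 3: stubbed LLM fallback; a real stage would call the LLM helper.
const llmStage: Stage = () => ({
  category: "other",
  reasoning_trace: { model: "example-llm", rationale: "stubbed LLM answer", confidence: 0.6, source: "llm" },
});

// Toy embedding: presence of two keywords stands in for a real vector.
const toyEmbed = (t: string): number[] => [t.includes("refund") ? 1 : 0, t.includes("ship") ? 1 : 0];
const stages: Stage[] = [
  ruleStage,
  embeddingStage(toyEmbed, [
    { vector: [1, 0], category: "refunds" },
    { vector: [0, 1], category: "shipping" },
  ]),
  llmStage,
];
```

A `null` result is the natural hand-off point to the `approvals` queue, and the returned `reasoning_trace` records which stage made the decision.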

## 3. API Layers
- HTTP API (REST) with versioning (`/api/v1`).
- Service layer for business logic and database transactions.
- Repositories for data access; use migrations for schema evolution.
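
The three layers above can be sketched as follows, kept framework-agnostic so it applies to either Express or Fastify. The class names, status values, and the handler shape are hypothetical, and the in-memory repository stands in for a Postgres-backed one.

```typescript
// Status values are illustrative; the real set is decided in Phase 1.
type RecordStatus = "INGESTED" | "PROCESSED";

interface RecordRow {
  id: string;
  tenantId: string;
  status: RecordStatus;
}

// Repository: data access only, always scoped by tenant.
interface RecordRepository {
  findById(tenantId: string, id: string): RecordRow | undefined;
  save(row: RecordRow): void;
}

class InMemoryRecordRepository implements RecordRepository {
  private readonly rows = new Map<string, RecordRow>();

  findById(tenantId: string, id: string): RecordRow | undefined {
    const row = this.rows.get(id);
    return row !== undefined && row.tenantId === tenantId ? row : undefined;
  }

  save(row: RecordRow): void {
    this.rows.set(row.id, { ...row });
  }
}

// Service: business logic; a real implementation would open a DB transaction here.
class RecordService {
  private readonly repo: RecordRepository;

  constructor(repo: RecordRepository) {
    this.repo = repo;
  }

  markProcessed(tenantId: string, id: string): RecordRow | undefined {
    const row = this.repo.findById(tenantId, id);
    if (row === undefined) return undefined;
    const updated: RecordRow = { ...row, status: "PROCESSED" };
    this.repo.save(updated);
    return updated;
  }
}

// HTTP layer: translates between requests/responses and service calls only.
interface HttpResult {
  status: number;
  body: unknown;
}

function postProcessRecord(service: RecordService, tenantId: string, id: string): HttpResult {
  const updated = service.markProcessed(tenantId, id);
  return updated === undefined
    ? { status: 404, body: { error: "record not found" } }
    : { status: 200, body: updated };
}
```

Because the service depends on the `RecordRepository` interface rather than a concrete class, the storage implementation can change without touching business logic or handlers.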

## 4. Infrastructure & Ops
- Environments: dev/stage/prod; Docker images; CI/CD.
- Observability: structured logging, metrics, tracing; dead-letter queues for failed jobs.
- Secrets management per environment; rotate webhook/LLM/payment provider secrets.

## 5. Data & Schema Notes
- Records: store raw payload + normalized fields + `reasoning_trace` JSONB (model, rationale, confidence, source).
- EventLog: include `source_agent` (default `balance`) and payload for auditability; support filtering by tenant, time, and type.
- Embeddings: table of vectors computed from record text fields (or other domain signals) to support similarity search; index the vector column with a `pgvector` index (e.g., HNSW or IVFFlat).
- Multi-tenant: all core tables carry `tenantId` and every query is scoped by it; each `User` has a role per tenant.
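
As one way to picture the EventLog notes, the sketch below models an entry with the `balance` default for `source_agent` and a filter in which the tenant is mandatory while type and time range are optional. The field names other than `source_agent` and `tenantId`, and the half-open time range, are assumptions.

```typescript
interface EventLogEntry {
  tenantId: string;
  type: string; // e.g. "INGESTED", "PROCESSED"
  source_agent: string;
  payload: unknown;
  createdAt: Date;
}

interface NewEvent {
  tenantId: string;
  type: string;
  payload: unknown;
  createdAt: Date;
  source_agent?: string;
}

// Apply the documented default when the caller does not name an agent.
function makeEvent(input: NewEvent): EventLogEntry {
  return { ...input, source_agent: input.source_agent ?? "balance" };
}

// tenantId is required by the type, so an unscoped query cannot be expressed.
interface EventFilter {
  tenantId: string;
  type?: string;
  from?: Date; // inclusive
  to?: Date; // exclusive
}

function filterEvents(events: EventLogEntry[], filter: EventFilter): EventLogEntry[] {
  return events.filter(
    (e) =>
      e.tenantId === filter.tenantId &&
      (filter.type === undefined || e.type === filter.type) &&
      (filter.from === undefined || e.createdAt.getTime() >= filter.from.getTime()) &&
      (filter.to === undefined || e.createdAt.getTime() < filter.to.getTime()),
  );
}
```

In Postgres the same filter would translate to a `WHERE` clause on tenant, type, and timestamp, which suggests a composite index covering those columns.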

## 6. Payment & Messaging (High-Level)
- Payment provider: initiate sessions via backend; handle webhooks idempotently; map provider status to internal billing/subscription states; update tenant access.
- Notifications: optional email/webhook callbacks to surface ingestion/categorization failures; keep PII out of notification payloads.
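
Idempotent webhook handling and status mapping might be sketched as below. The provider status strings, internal state names, and class name are hypothetical; the real mapping depends on the chosen provider. Signature verification is omitted, and a production version would persist processed event IDs in Postgres (for example under a unique constraint) rather than in memory.

```typescript
type SubscriptionState = "active" | "past_due" | "canceled";

interface ProviderEvent {
  id: string; // provider-assigned event ID, used as the idempotency key
  tenantId: string;
  providerStatus: string;
}

// Hypothetical provider statuses mapped to internal subscription states.
const STATUS_MAP = new Map<string, SubscriptionState>([
  ["paid", "active"],
  ["payment_failed", "past_due"],
  ["cancelled", "canceled"],
]);

type WebhookOutcome = "applied" | "duplicate" | "ignored";

class BillingWebhookHandler {
  private readonly seenEventIds = new Set<string>();
  private readonly tenantStates = new Map<string, SubscriptionState>();

  handle(event: ProviderEvent): WebhookOutcome {
    // Redelivered events are acknowledged without being applied twice.
    if (this.seenEventIds.has(event.id)) return "duplicate";
    this.seenEventIds.add(event.id);

    const next = STATUS_MAP.get(event.providerStatus);
    if (next === undefined) return "ignored"; // unknown status: record it, change nothing

    this.tenantStates.set(event.tenantId, next);
    return "applied";
  }

  // Tenant access follows the internal state, not the raw provider status.
  hasAccess(tenantId: string): boolean {
    return this.tenantStates.get(tenantId) === "active";
  }
}
```

Returning an outcome for duplicates instead of throwing lets the HTTP layer still answer with a success status, so the provider stops retrying.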

## 7. Queues (BullMQ)
- `records:ingest` — normalize webhook payloads, write `Record`, emit `INGESTED`.
- `records:process` — rule engine → embeddings similarity → LLM fallback; emit `PROCESSED` with `reasoning_trace`.
- `reports:generate` — build domain reports/exports, emit `REPORT_GENERATED`.
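- All queues: a dead-letter queue per queue, retries with backoff, and idempotent handlers keyed by external event IDs.
- Naming: recent BullMQ releases reject `:` in queue names; verify against the pinned version in Phase 1 and switch to a separator such as `-` (e.g., `records-ingest`) if needed.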
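
The retry, dead-letter, and idempotency behavior can be simulated without Redis, as in the sketch below. It is not BullMQ code: `JobRunner`, `RetryPolicy`, and the doubling backoff formula are assumptions for illustration, and delays are recorded rather than slept. In BullMQ itself, the corresponding knobs are the `attempts`, `backoff`, and `jobId` job options.

```typescript
interface Job<T> {
  id: string; // derived from the external event ID, so redeliveries share an ID
  data: T;
}

interface RetryPolicy {
  attempts: number; // total tries, including the first
  baseDelayMs: number;
}

// Exponential backoff: base, 2x base, 4x base, ... (BullMQ's own formula may differ).
function backoffDelayMs(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attemptsMade - 1);
}

interface RunResult {
  outcome: "completed" | "skipped" | "dead-lettered";
  attemptsMade: number;
  plannedDelaysMs: number[]; // delays that would be waited between attempts
}

class JobRunner<T> {
  readonly deadLetter: Job<T>[] = [];
  private readonly completedIds = new Set<string>();
  private readonly handler: (data: T) => void;
  private readonly policy: RetryPolicy;

  constructor(handler: (data: T) => void, policy: RetryPolicy) {
    this.handler = handler;
    this.policy = policy;
  }

  run(job: Job<T>): RunResult {
    // Idempotency: a job whose ID already completed is not handled again.
    if (this.completedIds.has(job.id)) {
      return { outcome: "skipped", attemptsMade: 0, plannedDelaysMs: [] };
    }
    const plannedDelaysMs: number[] = [];
    for (let attempt = 1; attempt <= this.policy.attempts; attempt++) {
      try {
        this.handler(job.data);
        this.completedIds.add(job.id);
        return { outcome: "completed", attemptsMade: attempt, plannedDelaysMs };
      } catch {
        if (attempt < this.policy.attempts) {
          plannedDelaysMs.push(backoffDelayMs(attempt, this.policy.baseDelayMs));
        }
      }
    }
    // Retries exhausted: park the job for inspection instead of dropping it.
    this.deadLetter.push(job);
    return { outcome: "dead-lettered", attemptsMade: this.policy.attempts, plannedDelaysMs };
  }
}
```

Jobs that land in `deadLetter` are the ones to surface through the observability and notification paths described earlier.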