Backend: Architecture (Recommendations)

Phase: Phase 0 (Planning)
Status: Draft — finalize in Phase 1
Owner: Backend Architect
References:

/docs/project-overview.md
/docs/backend/api-design.md
/docs/backend/security.md
/docs/backend/payment-flow.md

Recommendations for Phase 0. Lock decisions in Phase 1.
After Phase 1, this file is the canonical record of locked backend architecture decisions.
Keep exploratory notes in a separate *_PLAN.md (if you use one) and archive/delete it after Phase 1.
The module list below reflects a pipeline/classification archetype. Keep/rename/omit modules per /docs/archetypes.md.

1. Approach & Stack

Style: modular monolith with clear modules; containerized.
Language/Runtime: Node.js (LTS) + TypeScript.
Framework: Express or Fastify with modular structure and DI where helpful.
DB: Postgres (managed: Supabase/RDS). Vector: pgvector for embeddings.
Queue: BullMQ (Redis) for ingestion, categorization, notifications.
Auth: Clerk/Auth.js (or equivalent) with tenant-aware RBAC.
Payments: provider-hosted subscriptions; no raw card data stored.
LLM: OpenAI API (or equivalent) via single helper (e.g., callLLM()), configurable model/params.
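
A minimal sketch of the single-helper idea, assuming the provider call is injected as a transport function so that the model and parameters stay configurable in one place. The type names, the default values, and `buildRequest` are illustrative assumptions, not part of any provider SDK.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real SDK.
interface LLMParams {
  model: string;
  temperature: number;
  maxTokens: number;
}

interface LLMRequest extends LLMParams {
  prompt: string;
}

interface LLMResult {
  text: string;
  model: string;
}

// The transport is injected so the provider (OpenAI or equivalent) can be swapped or stubbed.
type LLMTransport = (request: LLMRequest) => Promise<LLMResult>;

// Assumed defaults; in practice these would come from per-environment config.
const DEFAULT_PARAMS: LLMParams = {
  model: "example-model",
  temperature: 0,
  maxTokens: 512,
};

// Merge per-call overrides onto the defaults to build the final request.
function buildRequest(prompt: string, overrides: Partial<LLMParams> = {}): LLMRequest {
  return { ...DEFAULT_PARAMS, ...overrides, prompt };
}

// Single entry point for every LLM call made by the backend.
async function callLLM(
  transport: LLMTransport,
  prompt: string,
  overrides: Partial<LLMParams> = {},
): Promise<LLMResult> {
  if (prompt.trim() === "") {
    throw new Error("callLLM: prompt must not be empty");
  }
  return transport(buildRequest(prompt, overrides));
}
```

Routing every call through one helper keeps model selection, parameter defaults, and any later concerns such as logging or rate limiting in a single module.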

2. Modules (logical)

auth — tenants, users, roles, sessions.
integrations — external connectors, OAuth2, webhooks, connection health.
records — normalized record store, statuses, reasoning_trace JSONB.
rules — rule definitions, evaluation order, testing, hit stats.
processing — pipeline: rule engine → embeddings similarity → LLM fallback; writes PROCESSED, updates records.
approvals — queues for human review, overrides, optional rule creation; logs TX_APPROVED/RULE_CREATED with source_agent.
reports — dashboards/exports, history.
billing — provider checkout/portal, webhooks, plan enforcement per tenant.
events — audit/event log (EventLog), read-only /api/events for downstream agents.
files/receipts — attachment storage metadata (Receipt with file URL/mime).
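
The processing order listed above (rule engine, then embeddings similarity, then LLM fallback) can be sketched as a chain of stages where the first sufficiently confident result wins. The `Stage` type, the 0.8 threshold, and the choice to return `null` for human review are assumptions made for illustration.

```typescript
type DecisionSource = "rule" | "embedding" | "llm";

interface Classification {
  category: string;
  confidence: number; // expected range 0..1
  source: DecisionSource;
  rationale: string;
}

interface RecordInput {
  id: string;
  text: string;
}

// A stage returns a classification, or null to defer to the next stage.
type Stage = (record: RecordInput) => Promise<Classification | null>;

// Run the stages in order and return the first result that meets the
// confidence threshold. The default threshold is an assumed value.
async function classify(
  record: RecordInput,
  stages: Stage[],
  minConfidence = 0.8,
): Promise<Classification | null> {
  for (const stage of stages) {
    const result = await stage(record);
    if (result !== null && result.confidence >= minConfidence) {
      return result;
    }
  }
  // No stage was confident enough; the caller could route the record to approvals.
  return null;
}
```

Placing the cheap deterministic stage first means the LLM is only called for records that neither rules nor similarity search can settle. The winning `Classification` carries the fields that reasoning_trace is meant to hold.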

3. API Layers

HTTP API (REST) with versioning (/api/v1).
Service layer for business logic and database transactions.
Repositories for data access; use migrations for schema evolution.
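
One way to picture the three layers is the sketch below. It assumes an in-memory repository standing in for Postgres, synchronous calls for brevity, and a hypothetical route such as POST /api/v1/records/:id/process behind `handleMarkProcessed`. All class and function names are illustrative.

```typescript
type RecordStatus = "INGESTED" | "PROCESSED";

interface RecordRow {
  id: string;
  tenantId: string;
  status: RecordStatus;
}

// Repository layer: data access only, always scoped by tenant.
interface RecordRepository {
  findById(tenantId: string, id: string): RecordRow | undefined;
  save(row: RecordRow): void;
}

// In-memory stand-in for a Postgres-backed repository.
class InMemoryRecordRepository implements RecordRepository {
  private readonly rows = new Map<string, RecordRow>();

  findById(tenantId: string, id: string): RecordRow | undefined {
    const row = this.rows.get(id);
    return row !== undefined && row.tenantId === tenantId ? row : undefined;
  }

  save(row: RecordRow): void {
    this.rows.set(row.id, { ...row });
  }
}

// Service layer: business rules. A real implementation would wrap the
// read and the write in one database transaction.
class RecordService {
  private readonly repo: RecordRepository;

  constructor(repo: RecordRepository) {
    this.repo = repo;
  }

  markProcessed(tenantId: string, id: string): RecordRow | undefined {
    const row = this.repo.findById(tenantId, id);
    if (row === undefined) return undefined;
    const updated: RecordRow = { ...row, status: "PROCESSED" };
    this.repo.save(updated);
    return updated;
  }
}

interface HttpResult {
  status: number;
  body: unknown;
}

// HTTP layer: translates service results into status codes and bodies.
function handleMarkProcessed(service: RecordService, tenantId: string, id: string): HttpResult {
  const updated = service.markProcessed(tenantId, id);
  return updated === undefined
    ? { status: 404, body: { error: "record not found" } }
    : { status: 200, body: updated };
}
```

Because the service depends only on the `RecordRepository` interface, it can be tested without a database, and the framework choice (Express or Fastify) stays confined to the HTTP layer.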

4. Infrastructure & Ops

Environments: dev/stage/prod; Docker images; CI/CD.
Observability: structured logging, metrics, tracing; dead-letter queues for failed jobs.
Secrets management per environment; rotate webhook/LLM/payment provider secrets.

5. Data & Schema Notes

Records: store raw payload + normalized fields + reasoning_trace JSONB (model, rationale, confidence, source).
EventLog: include source_agent (default balance) and payload for auditability; support filtering by tenant, time, and type.
Embeddings: table keyed by record text fields (or other domain signals) to support similarity search; index with pgvector.
Multi-tenant: all core tables carry tenantId and enforce scoped queries; User role per tenant.
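
The notes above can be sketched as TypeScript shapes plus a tenant-scoped event filter. The camelCase field names, the `StoredRecord` and `EventFilter` types, and the half-open time range are assumptions; the filter is an in-memory stand-in for the WHERE clause a repository would issue.

```typescript
// Mirrors the reasoning_trace JSONB fields named in the notes above.
interface ReasoningTrace {
  model: string;
  rationale: string;
  confidence: number;
  source: string;
}

interface StoredRecord {
  id: string;
  tenantId: string;
  rawPayload: unknown; // original payload as received
  normalized: { [field: string]: string }; // normalized fields
  reasoningTrace: ReasoningTrace | null; // null until the record is processed
}

interface EventLogEntry {
  tenantId: string;
  type: string; // e.g. INGESTED, PROCESSED
  sourceAgent: string;
  payload: unknown;
  createdAt: Date;
}

// tenantId is required, so an unscoped query cannot be expressed.
interface EventFilter {
  tenantId: string;
  type?: string;
  from?: Date; // inclusive
  to?: Date; // exclusive
}

function filterEvents(events: EventLogEntry[], filter: EventFilter): EventLogEntry[] {
  return events.filter((e) => {
    const t = e.createdAt.getTime();
    return (
      e.tenantId === filter.tenantId &&
      (filter.type === undefined || e.type === filter.type) &&
      (filter.from === undefined || t >= filter.from.getTime()) &&
      (filter.to === undefined || t < filter.to.getTime())
    );
  });
}
```

Making `tenantId` a mandatory field of the filter type moves the scoping rule into the type system, so a missing tenant is a compile error instead of a data leak.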

6. Payment & Messaging (High-Level)

Payment provider: initiate sessions via backend; handle webhooks idempotently; map provider status to internal billing/subscription states; update tenant access.
Notifications: optional email/webhook callbacks to surface ingestion/categorization failures; keep PII out of notification payloads.
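
A rough sketch of idempotent payment webhook handling with status mapping, under several assumptions: the provider status strings are invented and do not belong to any specific provider, the in-memory `Set` stands in for a table with a unique constraint on the provider event ID, and only the `active` state grants access.

```typescript
type SubscriptionState = "active" | "past_due" | "canceled";

// Hypothetical provider status strings mapped to internal states.
const STATUS_MAP = new Map<string, SubscriptionState>([
  ["paid", "active"],
  ["payment_failed", "past_due"],
  ["cancelled", "canceled"],
]);

interface WebhookEvent {
  id: string; // provider event ID, used as the idempotency key
  tenantId: string;
  providerStatus: string;
}

type HandleResult = "applied" | "duplicate" | "ignored";

class BillingWebhookHandler {
  // Stand-in for persistent storage of processed event IDs.
  private readonly seen = new Set<string>();
  private readonly states = new Map<string, SubscriptionState>();

  handle(event: WebhookEvent): HandleResult {
    // Redelivered events are acknowledged without changing state.
    if (this.seen.has(event.id)) return "duplicate";
    this.seen.add(event.id);

    const next = STATUS_MAP.get(event.providerStatus);
    if (next === undefined) return "ignored"; // unknown status: record it, change nothing

    this.states.set(event.tenantId, next);
    return "applied";
  }

  // Assumed access rule: only an active subscription unlocks the tenant.
  hasAccess(tenantId: string): boolean {
    return this.states.get(tenantId) === "active";
  }
}
```

Providers typically redeliver webhooks after a timeout or error response, so keying on the provider event ID keeps a retry from applying the same state change twice.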

7. Queues (BullMQ)

records:ingest — normalize webhook payloads, write Record, emit INGESTED.
records:process — rule engine → embeddings similarity → LLM fallback; emit PROCESSED with reasoning_trace.
reports:generate — build domain reports/exports, emit REPORT_GENERATED.
Dead-letter queues per stream; retries with backoff; idempotent handlers keyed by external event IDs.
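
The retry and dead-letter behaviour can be illustrated without a queue library. The sketch below assumes exponential backoff with a 1 s base and a 60 s cap, three attempts, and an array standing in for the dead-letter queue. It records the delays it would wait instead of sleeping, and every name in it is hypothetical.

```typescript
interface Job<T> {
  id: string;
  data: T;
}

interface Outcome {
  result: "completed" | "dead-lettered";
  attempts: number;
  delaysMs: number[]; // backoff delays that would be waited between attempts
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// Try the handler up to maxAttempts times; on final failure move the job
// to the dead-letter list so it can be inspected and replayed.
function processWithRetries<T>(
  job: Job<T>,
  handler: (job: Job<T>) => void,
  deadLetter: Job<T>[],
  maxAttempts = 3,
): Outcome {
  const delaysMs: number[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(job);
      return { result: "completed", attempts: attempt, delaysMs };
    } catch (err) {
      // Only schedule a delay if another attempt will follow.
      if (attempt < maxAttempts) delaysMs.push(backoffDelayMs(attempt));
    }
  }
  deadLetter.push(job);
  return { result: "dead-lettered", attempts: maxAttempts, delaysMs };
}
```

With BullMQ itself, retries are normally configured through the `attempts` and `backoff` job options, not a hand-written loop. Because a job may run more than once, handlers need to be idempotent, which is why the external event ID serves as the key.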
