
Bliss Finance — Architecture

High-Level Overview

Bliss Finance is a self-hostable financial dashboard built as a monorepo with three application services, one PostgreSQL database (with pgvector), and a Redis instance for job queues and caching.


                Browser (React SPA)
                        |
                  :8080 (nginx)
                        |
        +---------------+---------------+
        |                               |
  Next.js API (:3000)          Express Backend (:3001)
  - Auth (JWT + cookies)       - BullMQ workers (10)
  - REST endpoints             - AI classification
  - Prisma ORM                 - Portfolio valuation
  - File upload                - Plaid sync
        |                               |
        +---------- PostgreSQL ---------+
                    (pgvector)
                        |
              Redis (queues + cache)

Key properties:

  • The Next.js API is the only service exposed to the browser. It handles authentication, serves REST endpoints, manages file uploads, and owns the Prisma ORM layer.
  • The Express Backend is an internal service. It runs long-lived BullMQ workers for async processing (AI classification, Plaid sync, portfolio valuation, analytics). It is never called directly by the browser.
  • PostgreSQL 16 stores all application data. The pgvector extension powers embedding-based similarity search for transaction classification.
  • Redis 7 acts as the message broker for BullMQ and provides ephemeral caching.

Monorepo Structure

bliss-finance-monorepo/
|
+-- apps/
|   +-- api/        Next.js Pages Router (ESM, "type": "module")
|   +-- backend/    Express + BullMQ (CJS, require())
|   +-- web/        React SPA -- Vite, shadcn/ui, Tailwind CSS
|   +-- docs/       Nextra 4 documentation site (port 3002)
|
+-- packages/
|   +-- shared/     Dual ESM/CJS package (built with tsup)
|       - encryption.js (AES-256-GCM helpers)
|       - storage adapter (local / GCS)
|
+-- prisma/         Single Prisma schema shared by api and backend
|   +-- schema.prisma
|   +-- migrations/
|   +-- seed.js
|
+-- docker/             Dockerfiles for each service
+-- docker-compose.yml  Orchestrates all 5 containers
+-- scripts/            Dev and deployment helper scripts
+-- docs/               Canonical documentation (synced to apps/docs)

Module System Split

App               Module System   Why
apps/api          ESM             Next.js 13+ default; "type": "module"
apps/backend      CJS             BullMQ worker sandboxing requires require()
packages/shared   Dual            tsup builds both .mjs and .cjs outputs

The shared package exposes conditional exports so that apps/api resolves the ESM entry and apps/backend resolves the CJS entry, with no runtime import mismatch.
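A conditional exports map of the shape described might look like the following sketch in packages/shared/package.json (the file names under dist/ are assumptions; the "import"/"require" conditions are standard Node.js package entry points):

```json
{
  "name": "@bliss/shared",
  "type": "module",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    }
  }
}
```

With this map, `import { encrypt } from "@bliss/shared"` in apps/api resolves the `.mjs` build, while `require("@bliss/shared")` in apps/backend resolves the `.cjs` build.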


Service Communication

Browser -> API (apps/web -> apps/api)

React SPA                          Next.js API
(Vite, :5173 dev)                  (:3000)
    |                                  |
    +--- fetch / axios --------------->|
    |    Cookie: jwt=...               |
    |                                  +--- withAuth middleware
    |                                  |    validates JWT
    |<-------- JSON response ---------+     scopes by tenantId
  • The SPA calls the API via fetch or axios. Base URL is set by NEXT_PUBLIC_API_URL.
  • Authentication is handled by JWT tokens stored in httpOnly cookies (issued by NextAuth.js).
  • Every API route that requires auth wraps its handler with withAuth, which decodes the JWT, loads the User/Tenant from the database, and attaches them to req.user.
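A minimal sketch of the withAuth wrapper described above, with a stand-in for JWT verification and simplified request/response shapes (all names here are illustrative, not the actual implementation):

```typescript
// Sketch of a withAuth higher-order handler. The request/response shapes and
// the verifyJwt stand-in are illustrative, not the actual implementation.
type AuthedUser = { userId: number; tenantId: number; email: string; role: string };
type Req = { cookies: Record<string, string>; user?: AuthedUser };
type Res = { status: (code: number) => { json: (body: unknown) => void } };
type Handler = (req: Req, res: Res) => void | Promise<void>;

// Stand-in for JWT verification plus the Prisma user/tenant lookup.
const verifyJwt = (token: string): AuthedUser | null =>
  token === "valid-token"
    ? { userId: 1, tenantId: 42, email: "a@b.co", role: "owner" }
    : null;

export function withAuth(handler: Handler): Handler {
  return async (req, res) => {
    const user = verifyJwt(req.cookies["jwt"] ?? "");
    if (!user) {
      res.status(401).json({ error: "Unauthorized" });
      return;
    }
    req.user = user; // downstream queries scope by req.user.tenantId
    await handler(req, res);
  };
}
```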

API -> Backend (apps/api -> apps/backend)

Next.js API                        Express Backend
(:3000)                            (:3001)
    |                                  |
    +--- POST /api/events ------------>|  Header: x-api-key
    |    { type, payload }             |
    |                                  +--- eventSchedulerWorker
    |                                  |    routes event to queue
    |                                  |
    +--- GET  /api/similar ----------->|  Vector search proxy
    +--- POST /api/feedback ---------->|  Classification feedback
  • Internal HTTP calls from the API to the Backend.
  • Protected by a shared secret (INTERNAL_API_KEY) sent in the x-api-key header.
  • The primary pattern is event-based: the API posts a typed event to BACKEND_URL/api/events, and the eventSchedulerWorker routes it to the correct BullMQ queue.
  • A few routes are called directly for synchronous responses (e.g., /api/similar for vector search).
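The event post can be sketched as a small request builder. Only the /api/events path, the x-api-key header, and the { type, payload } body come from the description above; the helper itself and the event type string are hypothetical:

```typescript
// Sketch of the internal event call from the API to the backend.
type InternalEvent = { type: string; payload: Record<string, unknown> };
type EventRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

export function buildEventRequest(
  backendUrl: string,
  apiKey: string,
  event: InternalEvent,
): EventRequest {
  return {
    url: `${backendUrl}/api/events`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey, // the shared INTERNAL_API_KEY secret
      },
      body: JSON.stringify(event),
    },
  };
}

// Usage from an API route:
//   const { url, init } = buildEventRequest(process.env.BACKEND_URL!, key, event);
//   await fetch(url, init);
```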

Backend Workers (Internal)

Workers consume jobs from BullMQ queues. They never receive direct HTTP traffic from the browser. Communication flow:

API posts event
      |
      v
eventSchedulerWorker
      |
      +---> smartImportQueue ---> smartImportWorker
      +---> plaidSyncQueue   ---> plaidSyncWorker
      +---> classifyQueue    ---> classificationWorker
      +---> portfolioQueue   ---> portfolioValuationWorker
      +---> analyticsQueue   ---> analyticsWorker
      +---> insightsQueue    ---> insightsWorker
      ...
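The routing step above can be sketched as a plain lookup table. The event type strings are assumptions (only PORTFOLIO_STALE_REVALUATION appears elsewhere in this document); the queue names follow the queue list in the Queue System section:

```typescript
// Sketch of how eventSchedulerWorker might route typed events to BullMQ
// queues. Event type names and the exact mapping are illustrative.
const eventToQueue: Record<string, string> = {
  SMART_IMPORT_REQUESTED: "smart-import",
  PLAID_SYNC_REQUESTED: "plaid-sync",
  CLASSIFY_REQUESTED: "classification",
  PORTFOLIO_STALE_REVALUATION: "portfolio-valuation",
  ANALYTICS_REFRESH: "analytics",
  INSIGHTS_REQUESTED: "insights",
};

export function routeEvent(type: string): string {
  const queue = eventToQueue[type];
  if (!queue) throw new Error(`Unknown event type: ${type}`);
  return queue;
}
```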

Authentication Flow

1. User signs up / signs in (credentials or Google OAuth via NextAuth.js)
      |
      v
2. JWT issued containing: { userId, tenantId, email, role }
      |
      v
3. JWT set as httpOnly cookie (secure, sameSite: lax)
      |
      v
4. Every API request: withAuth middleware --> decode JWT --> load User --> attach req.user
      |
      v
5. All database queries scoped by req.user.tenantId (multi-tenant isolation)
  • NextAuth.js handles the OAuth/credentials flow and session management.
  • JWTs are short-lived. Refresh is handled by NextAuth session callbacks.
  • The withAuth higher-order function wraps API route handlers. It returns 401 if the token is missing or invalid.
  • Multi-tenancy is enforced at the query level: every Prisma where clause includes tenantId from the authenticated user.
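The query-level scoping in the last step can be sketched as a small helper that merges the authenticated tenantId into every where clause (the helper name is illustrative; the pattern of always including tenantId is from the description above):

```typescript
// Minimal sketch of query-level tenant scoping: merge the authenticated
// tenantId into every where clause before it reaches Prisma.
type Where = Record<string, unknown>;

export function tenantScoped(tenantId: number, where: Where = {}): Where {
  // tenantId is spread last so a caller-supplied value can never override it
  return { ...where, tenantId };
}

// Usage:
//   prisma.transaction.findMany({
//     where: tenantScoped(req.user.tenantId, { accountId }),
//   })
```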

Database (PostgreSQL 16 + pgvector)

Schema Overview

The database has 50+ migrations managed by Prisma. Key models:

Tenant
|--- User (1:N)
|--- Account (1:N)
|    |--- Transaction (1:N)
|    |--- Holding (1:N)
|
|--- Category (1:N, hierarchical via parentId)
|--- Tag (1:N)
|--- Budget (1:N)
|
|--- PlaidItem (1:N)
|    |--- PlaidSyncLog (1:N)
|
|--- StagedImport (1:N)
|    |--- StagedImportRow (1:N)
|
|--- TransactionEmbedding (1:N)
|--- ImportAdapter (1:N)

pgvector

The TransactionEmbedding table stores 768-dimensional vectors generated by Gemini embedding-001. An IVFFlat index supports fast cosine similarity queries for the vector classification tier.

-- Simplified schema
CREATE TABLE "TransactionEmbedding" (
  id              SERIAL PRIMARY KEY,
  "tenantId"      INTEGER NOT NULL,
  description     TEXT NOT NULL,
  embedding       vector(768) NOT NULL,
  "categoryId"    INTEGER,
  "transactionId" INTEGER UNIQUE,
  UNIQUE ("tenantId", description)
);

CREATE INDEX ON "TransactionEmbedding"
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
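For intuition, the metric the vector_cosine_ops index ranks by can be computed directly (pgvector's <=> operator returns cosine distance, which is 1 minus this similarity):

```typescript
// Cosine similarity between two embedding vectors, the metric the IVFFlat
// index approximates for the Tier 2 search.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```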

Encryption at Rest

Sensitive fields are encrypted with AES-256-GCM before being written to the database. This is handled transparently by Prisma middleware:

Model         Encrypted Fields
Transaction   description, details
Account       accountNumber
PlaidItem     accessToken
  • Encryption uses @bliss/shared/encryption, which reads ENCRYPTION_SECRET from the environment.
  • Dual-key rotation is supported: if ENCRYPTION_SECRET_PREVIOUS is set, reads attempt decryption with the new key first, then fall back to the previous key. Writes always use the current key.
  • The Prisma middleware intercepts create, update, and findMany operations to encrypt/decrypt automatically. Application code never handles ciphertext directly.
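A minimal sketch of AES-256-GCM encryption with the dual-key fallback described above, using Node's built-in crypto module. The iv:tag:ciphertext hex layout is an assumption, not the actual wire format used by @bliss/shared/encryption:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

export function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, recommended for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), ct].map((b) => b.toString("hex")).join(":");
}

function decryptWith(payload: string, key: Buffer): string {
  const [iv, tag, ct] = payload.split(":").map((h) => Buffer.from(h, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // GCM authenticates; wrong key fails at final()
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

// Reads try the current key first, then fall back to the previous key
// (ENCRYPTION_SECRET_PREVIOUS) if set. Writes always use the current key.
export function decrypt(payload: string, current: Buffer, previous?: Buffer): string {
  try {
    return decryptWith(payload, current);
  } catch (err) {
    if (previous) return decryptWith(payload, previous);
    throw err;
  }
}
```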

AI Classification Pipeline (3-Tier Waterfall)

Every incoming transaction (Plaid or CSV import) is classified into a category using a three-tier waterfall. Each tier is progressively more expensive:

Transaction Description
      |
[Tier 1: EXACT_MATCH]
  In-memory Map backed by DescriptionMapping table
  Keyed by SHA-256(normalize(description)), O(1) lookup
      |
  confidence >= autoPromoteThreshold? --YES--> classified
      |                                        source: EXACT_MATCH
      NO
      |
[Tier 2: VECTOR_MATCH]
  pgvector cosine similarity search
  Gemini embedding-001, 768 dimensions
  Top-1 result above reviewThreshold
      |
  similarity >= reviewThreshold? --YES--> classified
      |                                   source: VECTOR_MATCH
      NO
      |
[Tier 3: LLM]
  Gemini Flash with structured prompt
  Full transaction context + tenant categories
      |
      --> classified
          source: LLM
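Tier 1's cache key can be sketched as SHA-256 over a normalized description, as in the diagram above. The normalize() rules used here (lowercase, strip digits, collapse whitespace) are assumptions for illustration, not the actual normalization:

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalization: lowercase, strip digits (card/reference
// numbers), collapse whitespace.
export function normalize(description: string): string {
  return description.toLowerCase().replace(/\d+/g, "").replace(/\s+/g, " ").trim();
}

// Tier 1 cache key: SHA-256(normalize(description)), O(1) Map lookup.
export function exactMatchKey(description: string): string {
  return createHash("sha256").update(normalize(description)).digest("hex");
}
```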

Feedback Loop

When a user overrides a classification (corrects a category), the system:

  1. Immediately updates the in-memory exact-match cache and the DescriptionMapping table (write-through via addDescriptionEntry()), so the next identical description is classified instantly.
  2. Asynchronously generates a new embedding via Gemini and upserts it into TransactionEmbedding (Tier 2), so similar descriptions benefit from the correction.

This creates a flywheel: the more the user corrects, the fewer LLM calls are needed, and classification accuracy improves over time.

Configurable Thresholds

Each tenant has two tunable thresholds on the Tenant model:

Threshold              Default   Purpose
autoPromoteThreshold   0.95      Exact matches above this are auto-committed (no review needed)
reviewThreshold        0.70      Vector matches above this are accepted; below triggers LLM
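How the two thresholds gate the waterfall tiers can be sketched as a single decision function (the signature and parameter names are illustrative; the tier order and threshold semantics come from the pipeline description above):

```typescript
type Source = "EXACT_MATCH" | "VECTOR_MATCH" | "LLM";

// null means the tier produced no candidate at all.
export function pickSource(
  exactConfidence: number | null,
  vectorSimilarity: number | null,
  autoPromoteThreshold = 0.95,
  reviewThreshold = 0.7,
): Source {
  if (exactConfidence !== null && exactConfidence >= autoPromoteThreshold) return "EXACT_MATCH";
  if (vectorSimilarity !== null && vectorSimilarity >= reviewThreshold) return "VECTOR_MATCH";
  return "LLM"; // Tier 3: fall through to Gemini Flash
}
```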

Classification Sources

Every classified transaction records its classificationSource:

  • EXACT_MATCH — from the in-memory cache
  • VECTOR_MATCH — from pgvector similarity
  • LLM — from Gemini Flash
  • USER_OVERRIDE — manually set by the user

Storage Abstraction

File storage (CSV/XLSX uploads) uses a factory pattern:

createStorageAdapter(config)
      |
      +-- STORAGE_BACKEND=local --> LocalStorageAdapter
      |                             Files in LOCAL_STORAGE_DIR (default: ./data/uploads)
      |
      +-- STORAGE_BACKEND=gcs ----> GCSStorageAdapter
                                    Files in GCS bucket (GCS_BUCKET_NAME)

Both adapters implement the same interface: upload(key, buffer), download(key), delete(key), exists(key).

The shared package (packages/shared) exports the factory and both adapters. File uploads use formidable for multipart parsing with bodyParser: false in the Next.js API config. Temp files are cleaned up after upload completes.
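The shared interface can be sketched as follows; the in-memory adapter here is a test stand-in, not one of the real LocalStorageAdapter/GCSStorageAdapter implementations:

```typescript
// The interface both adapters implement, per the description above.
export interface StorageAdapter {
  upload(key: string, buffer: Buffer): Promise<void>;
  download(key: string): Promise<Buffer>;
  delete(key: string): Promise<void>;
  exists(key: string): Promise<boolean>;
}

// Hypothetical in-memory adapter, standing in for the local/GCS variants.
class MemoryStorageAdapter implements StorageAdapter {
  private files = new Map<string, Buffer>();
  async upload(key: string, buffer: Buffer) { this.files.set(key, buffer); }
  async download(key: string) {
    const buf = this.files.get(key);
    if (!buf) throw new Error(`not found: ${key}`);
    return buf;
  }
  async delete(key: string) { this.files.delete(key); }
  async exists(key: string) { return this.files.has(key); }
}

// Sketch of the factory; the real one branches on STORAGE_BACKEND=local | gcs.
export function createStorageAdapter(config: { backend: string }): StorageAdapter {
  if (config.backend === "memory") return new MemoryStorageAdapter();
  throw new Error(`Unsupported backend: ${config.backend}`);
}
```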


Queue System (Redis + BullMQ)

Architecture

Redis 7
  |
  +-- BullMQ Queues (reliable, persistent)
  |     smart-import
  |     plaid-sync
  |     plaid-processor
  |     classification
  |     portfolio-valuation
  |     event-scheduler
  |     analytics
  |     import (legacy)
  |     insights
  |     plaid-webhook
  |
  +-- Cache (ephemeral)
        Description cache (in-memory, backed by DescriptionMapping table)
        Rate limit counters

Worker Details

Worker                     Queue                 Concurrency   Purpose
smartImportWorker          smart-import          1             CSV parse, dedup, classify, stage rows
plaidSyncWorker            plaid-sync            3             Incremental transaction fetch from Plaid
plaidProcessorWorker       plaid-processor       1             Classify and persist Plaid transactions
classificationWorker       classification        5             3-tier AI classification pipeline
portfolioValuationWorker   portfolio-valuation   1             Fetch prices, calculate holdings P&L
eventSchedulerWorker       event-scheduler       3             Route typed events to appropriate queues
analyticsWorker            analytics             1             Compute and cache spending analytics
importWorker               import                1             Legacy CSV import (kept for backward compat)
insightsWorker             insights              1             Generate AI financial insights
plaidWebhookWorker         plaid-webhook         3             Process Plaid webhook payloads

Scheduled Jobs (Nightly Cron)

Three workers register BullMQ repeatable jobs that run on a nightly schedule:

Job                        Worker                   Cron (UTC)         Purpose
refresh-all-fundamentals   securityMasterWorker     0 3 * * * (3 AM)   Refresh stock prices, profiles, earnings, dividends
revalue-all-tenants        portfolioWorker          0 4 * * * (4 AM)   Enqueue per-tenant portfolio revaluation (investments, cash, debts)
generate-all-insights      insightGeneratorWorker   0 6 * * * (6 AM)   Generate AI financial insights for all tenants

The schedule chain is intentional: fresh prices (3 AM) feed into revaluation (4 AM), which feeds into insights (6 AM). The portfolio revaluation ensures history has no gaps even when no transactions occur for days.

On-access fallback: The GET /api/portfolio/history endpoint also checks if the most recent history record is before today. If stale, it fires a PORTFOLIO_STALE_REVALUATION event to trigger revaluation for that tenant. This covers self-hosters where the nightly job may not be running reliably.
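The staleness check on that endpoint can be sketched as a date comparison against the start of the current UTC day (the helper name and the UTC day boundary are assumptions):

```typescript
// Sketch of the on-access staleness check: the most recent portfolio history
// record is stale if it predates the start of today (UTC).
export function isHistoryStale(lastRecordAt: Date | null, now = new Date()): boolean {
  if (!lastRecordAt) return true; // no history yet: always revalue
  const startOfTodayUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return lastRecordAt.getTime() < startOfTodayUtc;
}

// When stale, the route fires a PORTFOLIO_STALE_REVALUATION event for the tenant.
```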

Queue Patterns

  • Singletons: Each queue is created once in src/queues/ and imported by the corresponding worker. This avoids duplicate Redis connections.
  • Retries: Jobs have configurable retry counts with exponential backoff.
  • TLS: Supports TLS connections to Redis in production. Set REDIS_SKIP_TLS_CHECK=true for local development without TLS.

Docker Architecture

Compose Services

services:
  postgres:   # PostgreSQL 16 + pgvector extension
  redis:      # Redis 7
  api:        # Next.js API (apps/api)
  backend:    # Express workers (apps/backend)
  web:        # nginx serving the React SPA (apps/web)

Startup Order

postgres ----+
             |--- api (runs prisma migrate deploy + seed, then starts)
redis -------+
             |--- backend (connects to postgres + redis, starts workers)
             |
             +--- web (nginx, no dependencies beyond api being routable)
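Expressed with Compose's depends_on conditions, the ordering might look like the sketch below (service names come from the block above; the use of healthchecks and the exact conditions are assumptions about this compose file):

```yaml
# Sketch only: service_healthy requires a healthcheck on the dependency.
services:
  api:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
  backend:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
  web:
    depends_on:
      - api
```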

Key Configuration

  • Shared volume (uploads_data): Mounted in both api and backend so that uploaded files (written by the API during file upload) can be read by backend workers (during smart import processing).
  • Multi-stage Dockerfiles: Each service uses a multi-stage build to minimize final image size (install deps, build, copy only production artifacts).
  • nginx SPA routing: The web service serves static assets and falls back to index.html for client-side routing.

Multi-Tenancy

Isolation Model

Bliss uses query-level tenant isolation (shared database, shared schema):

Request arrives
      |
      v
withAuth extracts tenantId from JWT
      |
      v
Every Prisma query includes:
  { where: { tenantId: req.user.tenantId, ... } }
      |
      v
Response contains only the tenant's data

What Is Scoped

  • All user-created data: accounts, transactions, categories, tags, budgets, imports, embeddings, audit logs, Plaid connections, holdings.
  • Per-tenant settings: classification thresholds, display currency, adapter configurations.

What Is Shared

  • Reference data: countries, currencies, supported banks, exchange rates.
  • System configuration: feature flags, global rate limits.

There is no row-level security (RLS) in PostgreSQL. Isolation is enforced entirely in the application layer through Prisma query scoping. Every query that touches tenant data must include tenantId in its where clause.


Environment Variables

Required (all services)

Variable            Service(s)     Purpose
DATABASE_URL        api, backend   PostgreSQL connection string
REDIS_URL           backend        Redis connection string
ENCRYPTION_SECRET   api, backend   AES-256-GCM key (32 bytes, base64)
NEXTAUTH_SECRET     api            NextAuth.js JWT signing secret
INTERNAL_API_KEY    api, backend   Shared secret for internal API calls
BACKEND_URL         api            URL of the Express backend

Optional

Variable                     Default          Purpose
ENCRYPTION_SECRET_PREVIOUS   (none)           Previous key for rotation
STORAGE_BACKEND              local            "local" or "gcs"
LOCAL_STORAGE_DIR            ./data/uploads   Path for local file storage
GCS_BUCKET_NAME              (none)           Google Cloud Storage bucket
PLAID_CLIENT_ID              (none)           Plaid API client ID
PLAID_SECRET                 (none)           Plaid API secret
GEMINI_API_KEY               (none)           Google Gemini API key
REDIS_SKIP_TLS_CHECK         false            Skip TLS verification (dev only)
NEXT_PUBLIC_API_URL          (none)           API base URL for the frontend SPA
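A fail-fast startup check for the required variables can be sketched as follows (the list shown is the api service's required set per the table above; the helper name and error format are illustrative):

```typescript
// Sketch of startup validation: fail fast if required env vars are missing.
const REQUIRED = [
  "DATABASE_URL",
  "ENCRYPTION_SECRET",
  "NEXTAUTH_SECRET",
  "INTERNAL_API_KEY",
  "BACKEND_URL",
] as const;

export function assertRequiredEnv(env: Record<string, string | undefined>): void {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}

// Usage at boot: assertRequiredEnv(process.env);
```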

Security Summary

Layer                Mechanism
Transport            HTTPS (TLS termination at nginx/load balancer)
Authentication       JWT in httpOnly cookies (NextAuth.js)
Authorization        Tenant-scoped queries; role field on User
Internal services    INTERNAL_API_KEY header
Data at rest         AES-256-GCM on sensitive fields
Secrets management   Environment variables (not committed to repo)
Input validation     Zod/Joi schemas on API routes
Rate limiting        Per-route rate limiting middleware
File uploads         formidable with size limits; temp file cleanup
CSRF                 httpOnly + sameSite cookie policy