Building MainTrac from Scratch
A deep-dive into choosing a modern tech stack for a standalone maintenance management product — with a field-first mobile experience at the center of every decision.
Recommended Stack — TL;DR
Before the detail, here is the recommended stack at a glance. Every choice below is justified in Section 4 with explicit tradeoffs considered.
Design Principles
Every tech stack choice is downstream of what you are building for. Before evaluating options, lock in the constraints and values that will govern tradeoffs throughout the project.
| Principle | What it means for this product | Stack implication |
|---|---|---|
| Field workers first | The primary daily users are maintenance technicians in parks and facilities — not office staff on fast Wi-Fi. Phones, outdoor lighting, gloves, intermittent connectivity. | Native mobile app, not just responsive web. Offline capability is non-negotiable. |
| Data integrity above all | Work order logs, inspection results, and cost records must be accurate and auditable. Lost or duplicated records destroy trust. | Postgres with strong constraints and transactions. Optimistic-lock sync on mobile. Append-only event log for audit trail. |
| Type safety end-to-end | Small team building fast. Runtime type errors against the domain model cost disproportionately in a maintenance product. | TypeScript throughout. Shared types package between frontend, mobile, and API. Zod schemas as the single source of truth for validation. |
| API-first | The decoupling thesis depends on MainTrac being consumable by NextRec, RecDesk, and third-party systems without a shared database. | REST API with OpenAPI 3.1 spec generated from code. No private DB access by other products — ever. |
| Multi-tenant by design | Vermont Systems hosts dozens of agencies. Each agency must have fully isolated data. | PostgreSQL Row-Level Security (RLS) with tenant_id on every table. Enforced at the DB layer, not just the application layer. |
| Incrementally replaceable | This will co-exist with RecTrac during migration. It must run alongside the old system for months. | No shared database with RecTrac. Auth abstracted behind a service boundary from day one. |
| AI-ready | The decomp program is tied to the AI Transformation initiative. Work order analysis, predictive maintenance, and NL query are on the near-term roadmap. | Postgres with pgvector support. Event-driven architecture so AI agents can subscribe to work order lifecycle events. |
The Mobile UX Challenge
MainTrac's most significant user-experience problem — and the one most likely to make or break adoption of a standalone product — is the mobile experience. The current RecTrac implementation has documented failures here: work order email deep links break on mobile, there is no staff-based filtering in the field view, and there is no logging wizard. Customers have explicitly called these out.
Understanding why the current system fails on mobile is the starting point for making better architecture decisions.
PWA vs. Native App — The Core Decision
The choice between a Progressive Web App (PWA) and a native mobile app (React Native / Expo) affects offline behavior, camera performance, push notifications, app store distribution, and the entire developer workflow. It deserves a serious analysis, not a default answer.
| Dimension | PWA (web-based) | React Native / Expo (native) |
|---|---|---|
| Offline support | Service Workers + IndexedDB. Works but complex to implement correctly for sync-heavy apps. Quotas apply (~50MB typical). | SQLite (WatermelonDB) on device. Robust, large storage, designed for offline-first. No browser quotas. |
| Camera / photo | Web MediaDevices API. Works on modern mobile browsers but inconsistent quality on older Android. No fine-grained camera control. |
expo-camera / expo-image-picker. Full native camera experience, compression control, barcode scanning. |
| Push notifications | Web Push API. Supported on iOS 16.4+ Safari only. Android Chrome works well. Requires user to "Add to Home Screen" for reliable iOS behavior. | expo-notifications wraps APNs + FCM. Works reliably on all iOS and Android versions. Background delivery guaranteed. |
| Performance / feel | Good on modern hardware. Scroll jank on complex lists is a real risk. No access to native animations. | Native components render via the platform's own UI layer. Smooth 60fps scrolling. Feels like a real app. |
| Distribution | No app store. URL-based install. Great for enterprise MDM. No store review delays. | App Store / Play Store. Review process adds delay. Enterprise distribution possible (TestFlight, App Distribution). |
| GPS / location | Geolocation API. Background location not available in browsers. | expo-location. Foreground and background location. Important for field routing. |
| Deep linking | URL-based — works naturally from email links. | Universal Links (iOS) / App Links (Android). Requires setup but resolves the current email-link failure completely. |
| Shared code with web | 100% — same codebase | ~60-70% via shared TypeScript packages (business logic, API calls, types). UI components are different. |
| Verdict for MainTrac | Not recommended as primary | Recommended |
The verdict: The customer-reported pain points (camera photos, push notifications, deep-link from email, smooth work order listing) all point to native. PWA is a reasonable supplement — a lightweight web view for situations where the app isn't installed — but it should not be the primary mobile delivery vehicle for field maintenance workers.
Offline-First Architecture for Field Workers
A maintenance technician in a park or utility tunnel cannot depend on connectivity. The mobile app must treat offline as the default state, not an exception to handle.
Optimistic Writes
Log entries, inspection results, and status updates are written locally first, then synced to the server. The user never waits for a network round-trip to continue their workflow.
Sync Queue
All writes enter a persistent queue on the device. When connectivity is restored, the queue drains in order. Failed items are retried with exponential backoff.
Conflict Resolution
If the same work order is modified by two people offline simultaneously, a deterministic merge strategy resolves it (last-write-wins per field, or server-authoritative for status transitions).
Local DB Mirror
Assigned work orders, asset lists, and task codes are synced down to the device. The app reads from local DB — instant, no spinner. Sync happens in the background.
WatermelonDB (by Nozbe) is a high-performance offline database for React Native apps. It uses lazy loading, SQLite on device, and a sync protocol designed for exactly this pattern: a master Postgres database on the server with per-device partial mirrors. Used by large-scale React Native apps to handle thousands of records without UI jank. This is the recommended offline layer for the MainTrac mobile app.
Mobile UX Patterns to Build In From Day One
| Problem (Current) | Pattern to Implement | Technology |
|---|---|---|
| Email links go to login page, not work order | Universal Links / App Links configured so email URLs open directly in the app, deep-linked to the specific work order after login | expo-linking, Universal Links entitlement, AASA file on server |
| No staff-based filtering on mobile WO list | First-class filter bar at the top of the work order list: status, assignee, priority, asset type. Persisted as local user preference. | WatermelonDB queries + MMKV for persisted filter state |
| Complex logging with no wizard | Step-by-step "Log Work" wizard: select assets → select tasks → log labor → log materials → add photos → submit. Progress saved at each step. | Multi-step form with local draft save (MMKV) |
| No push notifications | Push on: work order assignment, status change, PM plan due, inspection due | expo-notifications + FCM/APNs + server-side trigger on work order events |
| Photo quality and management | Native camera capture with compression, before/after photo mode per asset, photo annotationv | expo-camera, expo-image-manipulator, direct S3 presigned upload |
| Outdoor readability | High-contrast mode, large tap targets (min 44×44pt), no subtle hover states | Design system tokens — enforce minimum touch target sizes in Tailwind/NativeWind |
Stack Decisions by Layer
4.1 Architecture Shape — Modular Monolith
The temptation when building a greenfield product as part of a decomposition program is to immediately decompose into microservices. Resist it. Microservices are an organizational scaling solution, not a greenfield architecture pattern. With a small team, premature service splitting multiplies operational complexity before the domain boundaries are proven. Build a well-structured modular monolith that can be seamed apart later if needed.
The recommended shape is a Turborepo monorepo with clear internal module boundaries:
# Repository structure maintrac/ ├── apps/ │ ├── web/ # Next.js management UI │ ├── mobile/ # Expo React Native app │ └── api/ # Hono REST API server ├── packages/ │ ├── core/ # Domain types, business rules, validation (shared) │ ├── db/ # Drizzle schema, migrations, query helpers │ ├── auth/ # Clerk helpers, permission helpers (shared) │ └── ui/ # shadcn/ui components for web (shared between web screens) └── turbo.json
This gives you the benefits of a monorepo (shared types, single version lock, one CI pipeline) while preserving the ability to deploy apps/api independently as the API grows.
4.2 Frontend Framework — Management UI
Management UI Framework
Recommended
- React Server Components for data-heavy report pages — renders on server, ships less JS
- App Router with file-based routing — easy to onboard new devs
- Massive ecosystem; shadcn/ui, Tailwind, every charting lib
- Vercel deployment or self-host on Docker
- Best-in-class TypeScript integration
- React's mental model (hooks, suspense) has a learning curve vs Vue
- RSC caching behavior can be surprising
Reasonable
- Team already knows Vue 2 — migration path familiar
- Nuxt 3 is excellent; Nitro server is fast
- Smaller ecosystem than React; fewer UI component libraries at this quality level
- Cannot share components with Expo (which uses React)
- Vue 2 knowledge doesn't fully transfer — Composition API is a significant change
Skip
- Smallest bundle size; excellent DX
- Smallest ecosystem; hard to hire for
- No shared component model with Expo
Component Library: shadcn/ui + Tailwind CSS
Don't use a heavyweight component library like MUI or Ant Design for a greenfield build. Use shadcn/ui — a collection of unstyled, accessible, copy-paste components built on Radix UI primitives, styled with Tailwind. You own the source code; nothing is locked in a dependency you can't modify. This matters for a maintenance product where you'll have bespoke UI patterns (work order cards, inspection checklist forms, asset trees) that no off-the-shelf library handles well.
4.3 Mobile Framework
Mobile App Framework
Recommended
- Managed workflow handles native builds without Xcode/Android Studio expertise
- Expo Router gives the same file-based routing as Next.js — one mental model
- Best-in-class libraries for camera, notifications, location, file system
- OTA updates via EAS Update — push fixes without app store review
- Shares React component patterns with Next.js web app
- NativeWind brings Tailwind classes to React Native components
- Some very specialized native modules require ejecting to bare workflow
- Build times longer than web (EAS cloud builds mitigate this)
Reasonable
- Share 100% of web code; wrap in a native shell
- Faster if team has strong web skills and limited native experience
- WebView performance — scrolling large work order lists will feel sluggish
- Camera and push notifications require Capacitor plugins; less polished than Expo's
- Doesn't solve the "native feel" problem that's core to the rebuild value prop
Skip
- Maximum performance and platform integration
- Two separate codebases — doubled maintenance burden
- Requires iOS and Android specialists; VSI team doesn't have this today
4.4 Backend API Framework
API Server
Recommended
- Ultra-fast TypeScript HTTP framework — benchmarks 3–5× faster than Express
- First-class Zod validation middleware — schemas auto-generate OpenAPI spec
- Typed RPC client
hono/client— end-to-end type safety without GraphQL overhead - Runs on Bun, Node, Cloudflare Workers, Deno — deploy anywhere
- Same language as frontend — one hire profile
- Bun is newer; some npm packages have edge-case compatibility issues
- Smaller community than Express/Fastify (but growing fast)
Strong alternative
- Fastest runtime performance; lowest memory footprint per instance
- Strong concurrency model — great for high-volume work order log writes
- Compiled binary simplifies deployment
- Split language stack (Go backend, TS frontend) — two hire profiles, no code sharing
- More verbose; slower iteration speed for CRUD-heavy product
Skip
- Familiar if team has .NET experience; excellent for enterprise
- No code sharing with TypeScript frontend; heavy ceremony for a CRUD product
- Doesn't align with the startup-speed goal of this rebuild
4.5 Database
The interesting decisions here are not which database (PostgreSQL, clearly) but which ORM, how to handle multi-tenancy, and how to structure the schema for the MainTrac domain.
ORM: Drizzle over Prisma
| Factor | Prisma | Drizzle Recommended |
|---|---|---|
| Type safety | Excellent | Excellent — types derived directly from schema definition |
| SQL control | Prisma generates SQL; escape hatches exist but are clunky | SQL-like API — what you write is close to what executes. Reporting queries are readable. |
| Performance | Binary Prisma engine adds latency on cold start; heavier memory | Pure JS/TS — zero-overhead abstraction, fast cold starts, edge-deployable |
| Migrations | Mature migration tooling | Good migration tooling; slightly less mature but actively developed |
| Verdict | — | Drizzle's SQL-first API is better for the complex reporting queries MainTrac needs (multi-join cost aggregations, date-windowed work order stats) |
Multi-Tenancy via Row-Level Security
Every table in the MainTrac schema gets a tenant_id column. PostgreSQL Row-Level Security policies enforce isolation at the database layer — even if application code has a bug, one tenant cannot see another's data. The API sets a session variable (SET app.current_tenant = '...') after authentication, and all queries are automatically filtered.
-- Example RLS policy (applied to every MT table) CREATE POLICY tenant_isolation ON work_orders USING (tenant_id = current_setting('app.current_tenant')::uuid); ALTER TABLE work_orders ENABLE ROW LEVEL SECURITY;
Schema Design Notes for the MainTrac Domain
- Use UUIDs (v7 for sortability) as primary keys — safe for distributed IDs, works across sync scenarios
- Work order status transitions enforced via a CHECK constraint or a state machine table — don't rely solely on application logic
- All log entries (labor, material, equipment, combo) use an append-only pattern — never UPDATE a log row; INSERT a reversal record instead (matches the existing MTLogReverse.p pattern)
- Add
pgvectorextension from day one — enables semantic search on work order descriptions and AI-assisted task suggestions later at no migration cost
4.6 API Design
| Approach | Pros | Cons | |
|---|---|---|---|
| REST + OpenAPI | Standard; every third-party integration tool understands it; easy to generate typed clients; simple mental model | Multiple round-trips for complex screens; over/under-fetching | Recommended |
| GraphQL | Flexible queries; single endpoint; strong for complex relational reads | Complex to cache; n+1 problem requires DataLoader; overkill for a mostly-CRUD maintenance product; harder to generate OpenAPI for third parties | Skip |
| tRPC | Best-in-class type safety; RPC-style calls with no serialization layer | TypeScript-only; makes third-party integration harder; not appropriate for the "API-first for NextRec/RecDesk" mandate | Use internally only |
Decision: REST + OpenAPI 3.1, generated automatically from Hono route definitions + Zod schemas. Use Hono's typed client internally between web/mobile apps and the API (giving tRPC-like type safety), while publishing the OpenAPI spec externally for third-party integrations and for NextRec/RecDesk to consume.
4.7 Authentication & Authorization
Auth is the #1 coupling blocker from the RecTrac decomp analysis. It must be a first-class standalone capability, not bolted on. The choices are:
| Option | Best for | MainTrac verdict |
|---|---|---|
| Clerk | Fast time to production; org/role management; SSO; pre-built UI components | Recommended — Clerk's organization model maps directly to tenants. Roles (admin, supervisor, technician) map to MainTrac permission codes. Webhook sync keeps your DB in sync with Clerk state. |
| Auth0 | Enterprise SSO; complex permission hierarchies; established vendor | Reasonable — more configuration overhead than Clerk; better if VSI already has an Auth0 contract. CNIC/government customers may prefer Auth0's compliance certifications. |
| Supabase Auth | If using Supabase as the full backend (Auth + DB + Storage together) | Skip if not using Supabase DB — don't use Supabase Auth in isolation. |
| Self-hosted (Keycloak, Ory) | Government/compliance requirements for data residency; total control | Consider for CNIC/military customers — data residency requirements on DoD networks may require on-premise auth. Keep auth abstracted behind an interface so the provider can be swapped. |
MainTrac has fine-grained permissions — who can approve work orders, who can view which facilities, who can see cost data. Define these as permission codes stored in your own database (mirroring the existing SAPermissionData pattern), and use Clerk roles only for coarse-grained access (admin, supervisor, technician). Populate permissions from Clerk's JWT claims via a middleware that maps roles → permission sets on each request.
4.8 Real-Time & Notifications
MainTrac needs real-time updates in two scenarios: a supervisor watching work order statuses change during the day, and a technician being notified of a new assignment. These have different requirements.
| Scenario | Technology | Why |
|---|---|---|
| Supervisor dashboard — live status updates | Server-Sent Events (SSE) | Unidirectional — server pushes, client reads. No WebSocket complexity. Works over HTTP/2. Reconnects automatically. Sufficient for status board updates. |
| Mobile push — work order assignment | Expo Push + FCM/APNs | Only channel that reliably wakes a backgrounded mobile app. Expo abstracts platform differences into a single API call from your server. |
| In-app notification center | Polling + SSE fallback | Read-model: when user opens the app, fetch unread notifications. SSE badge count updates while app is in foreground. |
| Work order comment threads | Optimistic UI + polling | Comment threads are low-frequency enough that 30-second polling is acceptable. Optimistic UI makes it feel instant from the commenter's side. |
4.9 File & Photo Storage
Photos are a central part of the mobile workflow — before/after shots of work, inspection photo documentation. The wrong architecture here creates server bottlenecks.
Pattern: direct-to-S3 upload via presigned URLs.
- Mobile app requests a presigned upload URL from the API (
POST /uploads/presign) - API generates and returns a short-lived S3 presigned URL
- Mobile app uploads directly to S3 — the photo bytes never touch your API server
- On upload completion, mobile app calls
POST /work-orders/:id/photoswith the S3 key to save the reference in the DB
This pattern keeps your API server stateless and avoids saturating it with multi-megabyte photo uploads during busy inspection runs. Cloudflare R2 is a cost-effective S3-compatible alternative (no egress fees) if cost is a concern.
4.10 Reporting & Analytics
| Report Type | Approach | Technology |
|---|---|---|
| Standard operational reports (Work Order Analysis, Project Results, Facility Task) | Built-in — server-side query + React rendering | Drizzle query → JSON → @react-pdf/renderer for PDF, native CSV serialization for CSV export |
| Dashboard charts (work orders by status, cost by asset category) | Built-in — React charting library | Recharts or Tremor — both are React-native, lightweight, Tailwind-compatible |
| Ad-hoc / management analytics | Embedded BI tool | Metabase (self-hosted, open-source). Point at a read replica of your Postgres DB. Admins can build custom queries without engineering. Embeddable with signed tokens so it appears inside the MainTrac UI. |
| AI Natural Language Reporting (future) | LLM-driven query generation | Text-to-SQL using Claude or GPT-4o against the Postgres schema + pgvector for semantic matching. Aligns with exec priority #3 from the decomp meeting. |
4.11 Cloud & Hosting
| Component | AWS Service | Azure Equivalent | Notes |
|---|---|---|---|
| API containers | ECS Fargate | Azure Container Apps | Serverless containers — no EC2 fleet management. Auto-scale to zero on low traffic. |
| Database | RDS PostgreSQL (Multi-AZ) | Azure Database for PostgreSQL | Managed Postgres with automatic failover, backups, and PITR. Use a read replica for reporting queries. |
| Cache (session, rate limit) | ElastiCache (Redis) | Azure Cache for Redis | Rate limiting, job queues, real-time presence |
| Static frontend | CloudFront + S3 | Azure Static Web Apps | Next.js can export static or deploy as a container |
| File storage | S3 | Azure Blob Storage | Photos, export files, report PDFs |
| Background jobs | SQS + Lambda or ECS task | Service Bus + Azure Functions | PM plan work order generation, email notifications, report generation |
| Secrets | AWS Secrets Manager | Azure Key Vault | Never environment variables in production for secrets |
AWS adds significant operational overhead before you've proven the product. For the first 6–12 months, Railway (Postgres + containers in one platform) or Render lets the team ship fast without an SRE. Migrate to AWS when you need multi-region, compliance certifications, or government cloud (GovCloud) for CNIC customers.
Resolving the RecTrac Couplings
The previous document identified eight coupling dependencies. Here is exactly how the recommended stack resolves each one.
| Dependency (from coupling analysis) | Old coupling | Resolution in new stack |
|---|---|---|
| Authentication & Sessions | SAUserSettings, SASecurityFile in shared Progress DB |
Clerk (or Auth0) is the new auth provider. Staff log in via Clerk. JWTs issued and validated independently of RecTrac. Zero shared DB dependency. |
| Permission System | SAPermissionData, SAModule — shared across all RecTrac modules |
Permission codes live in MainTrac's own Postgres DB (permissions table). Mapped from Clerk roles via webhook sync. No RecTrac dependency. |
| Staff / User Records | Shared staff lookup tables in RecTrac Progress DB | Staff profile data synced from Clerk (via webhook) into a staff_profiles table owned by MainTrac. Work orders reference staff_profiles.id. If RecTrac staff data needs to flow in during migration, a one-time import script copies it. |
| GL Code Configuration | Shared with FinTrac module in RecTrac DB | GL codes imported into a gl_codes reference table in MainTrac's own DB. Seeded from RecTrac during migration; maintained independently thereafter. If a future financial service is built, MainTrac can subscribe to it via API. |
| Progress Database Schema | MT* tables in shared rectrac.df Progress DB schema |
Entire domain re-implemented in PostgreSQL. Drizzle ORM schema is the new source of truth. Data migration: write a Progress → Postgres ETL script during cutover. The MT* table namespace makes data extraction straightforward. |
| Web Handler / Routing | web-handler.p routes all RecTrac requests |
Hono API server is MainTrac's own entry point. Deployed as a separate container. No shared routing infrastructure with RecTrac. |
| Application Logging & Utilities | Shared LogManager, CleanupMemory.p |
Standard Node.js logging (Pino — structured JSON logs, extremely fast). No shared utilities needed. |
| Vue.js Frontend Shell | Shared Vue Router + Vuex store across all RecTrac NextGen modules | Next.js App Router is MainTrac's own routing. No shared state with RecTrac. The React-based Expo mobile app is similarly independent. |
The clean data namespace (MT* tables), the absence of unified transaction table coupling, and the well-scoped permission model all make this a tractable extraction. The hardest work is the one-time data migration from Progress DB to Postgres — which is an engineering effort, not an architectural blocker.
Migration Path from Progress/OpenEdge
The migration from the current Progress-based implementation to the new stack must be done without a "big bang" cutover. Customers need continuity. The recommended approach is the Strangler Fig pattern — new functionality goes into the new stack while the old system continues to run, and screens are migrated one by one until the old system can be decommissioned.
| Phase | What happens | Risk |
|---|---|---|
| Phase 0 — Schema design | Map every MT* Progress table to a Postgres equivalent. Write the Drizzle schema. Generate and validate the migration script against a copy of production data. No customer impact. | Low — purely additive |
| Phase 1 — Parallel run (shadow mode) | New Postgres DB runs alongside Progress DB. Writes go to both. New stack reads from Postgres and validates results match Progress. No users on new stack yet. | Low — old system is authoritative |
| Phase 2 — Mobile app launch | New Expo mobile app reads from the new Postgres-backed API. This is the highest-value, lowest-risk entry point — it's new functionality (no equivalent native app existed) and surfaces the improved mobile UX immediately. | Medium — new surface; test thoroughly |
| Phase 3 — Web module migration | Migrate screens one at a time: Work Orders → Assets → PM Plans → Logging → Reporting. Each screen goes live on Next.js with the old RecTrac screen available as fallback. Dual-write continues. | Medium — feature parity required per screen |
| Phase 4 — Cutover | New stack is authoritative. Progress DB is read-only archive. Stop dual-write. Decommission RecTrac MT screens. Announce standalone MainTrac product. | Medium — coordination across customers required |
| Phase 5 — New feature velocity | Ship the roadmap: API endpoints (VSXORG-246), logging wizard, advanced mobile filtering, AI work order suggestions, natural language reporting. | Low — clean codebase, no legacy constraints |
The Progress → Postgres ETL must be run against a full production data snapshot and validated for every MT record type before Phase 1 goes live. Build reconciliation tooling that counts records, validates key relationships (work order → asset links, log → work order links), and flags discrepancies. Do not skip this. A lost work order log from 2023 discovered in 2027 will undermine customer confidence in the entire product.
Build Sequence — What to Build in What Order
Assuming a small team (3–5 engineers), sequence the work to deliver value to users as early as possible while managing technical risk.
-
1
Weeks 1–4: Foundation
Monorepo scaffold (Turborepo), Postgres schema + Drizzle ORM, Clerk auth wired up, Hono API skeleton with health check, Next.js shell deployed, Expo app bootstrapped with Expo Router. CI/CD pipeline running. No feature work — just infrastructure that everything else builds on.
-
2
Weeks 5–10: Core Domain (API-first)
Work Orders CRUD, Asset management (all 5 types), Task codes, PM Plans. Build as API endpoints first with full OpenAPI spec. Integration tests against a test Postgres DB. This is the skeleton — all later UI is built on these endpoints.
-
3
Weeks 11–18: Mobile App (Field Priority)
Expo mobile app: work order list (with staff filtering from day one), work order detail + status update, logging wizard (labor + materials in one flow), photo capture + S3 upload, offline sync (WatermelonDB). Push notifications for assignment. Deep link from email. This is the flagship differentiator — ship it before the web redesign.
-
4
Weeks 19–28: Web Management UI
Next.js screens: Work Order management board, Asset registry, PM Plan scheduler, Combination Log review, supervisor dashboard with charts. shadcn/ui component library established. SSE for live status updates.
-
5
Weeks 29–36: Reporting + Inspections
Work Order Analysis Report (PDF + CSV), Project Results Report, Facility Task Report. Mobile inspection forms with offline capture. Inspection Results Report. Metabase embedded for ad-hoc analytics.
-
6
Weeks 37+: Gap Closure + AI Features
Contract Processing as a first-class workflow (fix the hybrid approach from 10.2 migration). Public API launch (VSXORG-246 delivered). Natural language work order search. AI-suggested PM plan schedules based on historical completion data. Predictive cost alerts.
The current RecTrac web UI for MainTrac, while dated, is functional. The mobile experience is where customers are actively in pain today. Delivering a polished native mobile app — with offline support, staff filtering, a logging wizard, and push notifications — before the full web redesign demonstrates immediate product value and builds customer confidence. It is also lower-risk: it is new surface area, not a migration of existing workflows.
Scale & Deployment
The previous sections assume a modern shared SaaS deployment but don't directly address the question of how MainTrac handles ~80 customer organizations — each with multiple employees — in a single running application. This section covers the multi-tenancy model options, what the scale numbers actually mean in practice, the connection pooling requirement that the rest of the document omits, and how to offer both a shared SaaS tier and a dedicated tier for customers who need it.
Each RecTrac customer is currently deployed as a separate instance — their own Progress database, their own application server. This gives strong data isolation but makes upgrades, monitoring, and infrastructure management expensive to operate across 80+ customers. A shared SaaS model inverts this: one codebase, one deployment, one place to ship updates — with isolation enforced by the application and database rather than by physical separation.
8.1 Multi-Tenancy Models — Three Options
There are three standard approaches to multi-tenancy in a PostgreSQL-backed SaaS. The right choice for MainTrac depends on how you weight operational simplicity against customer isolation guarantees.
| Model | How it works | Isolation level | Operational cost | Verdict |
|---|---|---|---|---|
| A — Shared schema (Row-Level Security) |
All tenants share the same tables. Every table has a tenant_id column. PostgreSQL RLS policies ensure each query only sees its own rows. The API sets a session variable on connect; all queries are automatically scoped. |
Row-level. Data is logically isolated but physically co-located. | Lowest. One schema to migrate, one place to add indexes, one DB to monitor. | Recommended for most customers |
| B — Per-tenant schema | Each tenant gets their own PostgreSQL schema (namespace) within the same database: tenant_abc.work_orders, tenant_xyz.work_orders, etc. Schemas are isolated at the DB level without separate connections. |
Schema-level. Stronger than RLS — a misconfigured query against the wrong schema returns nothing, not another tenant's data. | Medium. Migrations must run against every schema. 80 schemas means 80× DDL operations on every deploy. Manageable with tooling but adds friction. | Good middle ground if customers push back on Model A |
| C — Per-tenant database | Each tenant gets a fully separate PostgreSQL database (or RDS instance). Closest to the current RecTrac per-instance model. | Database-level. Maximum isolation — a credential leak for one tenant cannot touch another's data at all. | Highest. 80 databases to provision, migrate, back up, and monitor. Connection pooling becomes complex. Cost scales linearly with customer count. | Reserved for compliance-tier customers only |
Run the vast majority of customers on the shared schema (Model A) with RLS. Offer Model C — a dedicated database — as a premium "Compliance Tier" for CNIC/DoD customers or any customer with contractual data residency requirements. The application code is identical between tiers; only the database connection string changes. This keeps operational costs low while giving you a clean answer for compliance-sensitive customers.
How RLS Works in Practice
Every table in the MainTrac schema carries a tenant_id column. The API middleware sets a Postgres session variable immediately after authenticating the request. All subsequent queries in that session are automatically filtered — no application-level WHERE clause required.
-- 1. Every table has tenant_id CREATE TABLE work_orders ( id uuid PRIMARY KEY DEFAULT gen_random_uuid(), tenant_id uuid NOT NULL, -- always present, never nullable title text NOT NULL, status work_order_status NOT NULL DEFAULT 'new', ... ); -- 2. Enable RLS and add the policy ALTER TABLE work_orders ENABLE ROW LEVEL SECURITY; CREATE POLICY tenant_isolation ON work_orders AS RESTRICTIVE USING (tenant_id = current_setting('app.current_tenant', true)::uuid); -- 3. API middleware sets the session variable on every request -- (Hono middleware — runs before any route handler) app.use('*', async (c, next) => { const tenantId = getTenantFromJWT(c) // extracted from Clerk JWT await db.execute(sql`SELECT set_config('app.current_tenant', ${tenantId}, true)`) await next() })
With this in place, a query like SELECT * FROM work_orders automatically returns only the current tenant's rows — even if a developer forgets to add a WHERE clause. The database enforces it.
8.2 Scale Analysis — What 80 Customers Actually Means
Before over-engineering for scale, it's worth putting concrete numbers against the workload. MainTrac is a maintenance management product, not a consumer app. The usage patterns are predictable and modest.
| Metric | Conservative estimate | Postgres implication |
|---|---|---|
| Customers (tenants) | 80 organizations | 80 values in the tenant_id dimension — trivial for RLS |
| Staff per org | ~50 employees (mix of field + office) | ~4,000 total registered users |
| Concurrent active users | ~10% at peak — ~400 simultaneous | 400 concurrent API requests; well within a single app instance |
| Work orders per org per month | ~500 (larger parks depts) | 40,000/month across all customers; ~500k/year total rows — small |
| Log entries per work order | ~5 (labor + materials) | ~2.5M log rows/year — still comfortably within a single Postgres instance |
| Photos per work order | ~3 (before/after + detail) | ~1.5M photos/year; stored in S3, not the DB — no DB impact |
| Report queries | ~50/day per org | Route heavy reports to a read replica; primary DB unaffected |
A single AWS RDS db.m6g.large instance (2 vCPU, 8GB RAM, ~$130/mo) comfortably handles this workload with room to grow. You would need to be at 10–20× this scale before considering read replicas for the primary workload. The constraint you'll hit first is concurrent database connections — not compute or storage — which is why the next section matters.
8.3 Connection Pooling — The Gap in the Original Analysis
This is the one infrastructure requirement the rest of this document omits, and it matters at 80 customers. PostgreSQL allocates memory per connection — typically ~5–10MB each. The default max_connections on RDS is 80–170 depending on instance size. With 400 concurrent users each holding a connection from the API server, you will exceed this limit without a pooler.
| Option | How it works | Verdict |
|---|---|---|
| AWS RDS Proxy | Managed connection pooler that sits between your app and RDS. Maintains a small pool of long-lived DB connections and multiplexes thousands of app connections through them. Handles failover transparently. Integrated with IAM auth. | Recommended for AWS deployment — zero ops overhead, handles failover, works with RLS session variables via connection pinning. |
| PgBouncer | Open-source connection pooler, typically deployed as a sidecar container. Transaction-mode pooling multiplexes connections efficiently. Industry standard. | Good for self-hosted / Railway deployments — requires one more container to manage but gives you full control over pooling behavior. |
| Supabase Pooler (Supavisor) | If using Supabase as the managed Postgres host, their pooler is built in at the connection string level. | Built-in if using Supabase — simplest path if the team wants to avoid managing DB infrastructure altogether. |
The SET app.current_tenant = '...' session variable used by RLS must persist for the entire duration of a request. In transaction-mode pooling (PgBouncer default), connections are returned to the pool between statements, which resets session variables. You must either use session-mode pooling (less efficient) or connection pinning (RDS Proxy supports this natively). Design this before you write the first query — retrofitting it is painful.
8.4 Deployment Tiers — Shared SaaS vs Dedicated
The recommended production architecture has two tiers. Most customers land on the Shared tier automatically; the Dedicated tier exists for the handful of customers with compliance or contractual requirements.
| Shared SaaS Tier | Dedicated Tier | |
|---|---|---|
| Target customers | Standard municipal parks & rec departments | CNIC/DoD, customers with data residency contracts, large enterprise accounts |
| Infrastructure | Single ECS cluster, single RDS instance (with RDS Proxy), shared S3 bucket (per-tenant key prefix) | Dedicated RDS instance per customer, dedicated S3 bucket, optionally dedicated ECS service or AWS GovCloud |
| Data isolation | PostgreSQL RLS (Model A) | Separate database (Model C) |
| Application code | Identical — same Docker image, same codebase. Only the DATABASE_URL env var differs. | |
| Upgrades | One deploy reaches all shared customers simultaneously | Must deploy and migrate each dedicated instance — same process as RecTrac today |
| Pricing model | Standard subscription | Premium subscription (offset the higher infrastructure cost) |
| Ops overhead | Low — one system to monitor | Scales with number of dedicated customers — keep this tier small |
Infrastructure Diagram — Shared Tier
┌─────────────────────────────────────────────────────────────┐
│ AWS (or Azure) │
│ │
│ CloudFront CDN │
│ │ │
│ ▼ │
│ S3 (Next.js static) Expo / App Store │
│ │ │ │
│ └────────────┬───────────┘ │
│ ▼ │
│ ECS Fargate │
│ ┌────────────────────┐ │
│ │ Hono API (N tasks)│ ◄── auto-scale on CPU/req │
│ └────────┬───────────┘ │
│ │ │
│ ▼ │
│ RDS Proxy (connection pooler) │
│ │ │
│ ▼ │
│ RDS PostgreSQL 16 │
│ ├── primary (writes) │
│ └── read replica (reports, analytics) │
│ │
│ ElastiCache Redis ──► rate limiting, job queues │
│ S3 bucket ──► photos (per-tenant key prefix) │
│ SQS + Lambda ──► PM plan auto-generation, emails │
└─────────────────────────────────────────────────────────────┘
8.5 Moving Existing Customers from Per-Instance to Shared SaaS
The 80 existing RecTrac customers running MainTrac today are each on their own instance. Migrating them into the shared SaaS model is a one-time data migration per customer — not a technical blocker, but it needs a process.
| Phase | What happens |
|---|---|
| 1. Provision tenant | Create a row in the tenants table for the customer. Generate their tenant_id UUID. Provision their Clerk organization. This takes seconds and can be done via a simple admin CLI command. |
| 2. Extract MT data | Run the Progress → Postgres ETL script against their specific Progress database (the same migration tooling built in Phase 0 of the build sequence). This produces a set of INSERT statements scoped to their tenant_id. |
| 3. Load into shared DB | Load the tenant-scoped data into the shared Postgres instance. RLS ensures it's immediately isolated. Validate record counts match the source. |
| 4. Cutover | Point the customer at the new MainTrac URL. Their data is there, their Clerk accounts are set up. The old per-instance MainTrac is decommissioned on a grace period. |
| 5. Repeat | Do this customer by customer — no big bang. Each migration is independent and takes a few hours of engineering time, not days. At 80 customers, this is a 2–3 month migration program running in the background while the product is live. |
Some customers have been running MainTrac for 10+ years. Their combination log history and work order archives could be large. Profile the top 5–10 customers' data sizes before finalizing the shared DB instance size — you may find a handful of power users with 10× the data of the average customer. These are also the most likely Dedicated Tier candidates.
Summary: What Changes in the Recommended Stack
Adding these deployment considerations refines the stack in three specific ways:
- Add RDS Proxy between ECS and RDS — required to handle concurrent connections safely at 80+ customers. Without it you will hit Postgres connection limits under normal load.
- Use RESTRICTIVE RLS policies (not just USING — the default is permissive) and set session variables before the first query in every request, not lazily. This is a correctness requirement, not a performance optimization.
- Design a
tenantstable and a deployment-tier flag from day one. The columntenants.deployment_tier ENUM('shared', 'dedicated')drives which DATABASE_URL the API resolves at request time. Retrofitting this after launch is expensive.