Case study · A European Business · 9 months (ongoing)

WhatsApp Business Admin Panel with AI

A full-stack SaaS platform for managing WhatsApp Business communications with AI-powered response generation, real-time messaging, and multi-tenant deployment.

Client: A European Business
Duration: 9 months (ongoing)
Role: Solo Full Stack Developer

  • 118K+ lines of code
  • 2,157 commits
  • 35+ database tables
  • 29 feature modules
  • 39 database migrations

Key challenges.

  • Building a production-grade AI assistant that could answer business queries using a custom knowledge base
  • Real-time message streaming with sub-second latency
  • Multi-tenant architecture supporting multiple business instances
  • Zero-downtime deployments with automatic rollback
  • WhatsApp Cloud API integration with reliable webhook processing

Implementation highlights.

01

Intelligent Intent Classification

Problem

Customer messages range from simple greetings to complex product inquiries. A naive approach would either waste AI tokens on simple queries or fail to provide adequate responses for complex ones.

Solution

Implemented a tiered intent classification system using Claude Haiku for fast, cheap classification. Messages are routed to specialized handlers: knowledge queries hit the RAG system, chitchat gets simple responses, and clarification requests trigger follow-up prompts. Each handler has optimized prompts and token budgets.
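A minimal sketch of the routing step, assuming the classifier returns one of three labels. The handler names, token budgets, and the keyword-based stand-in classifier are hypothetical. In the real system the label would come from a call to a small model, which is injected here as a plain function so the routing logic can be read on its own.

```typescript
// Hypothetical intent labels, following the three routes described above.
type Intent = "knowledge" | "chitchat" | "clarification";

interface HandlerConfig {
  handler: string;   // illustrative handler name
  maxTokens: number; // illustrative token budget, not the project's real value
}

// Each intent maps to one handler with its own token budget.
const ROUTES: Record<Intent, HandlerConfig> = {
  knowledge: { handler: "ragHandler", maxTokens: 1024 },
  chitchat: { handler: "smallTalkHandler", maxTokens: 128 },
  clarification: { handler: "followUpHandler", maxTokens: 256 },
};

// The classifier is injected. A production version would call a model
// asynchronously; it is synchronous here to keep the sketch short.
type Classifier = (message: string) => string;

function isIntent(label: string): label is Intent {
  return Object.prototype.hasOwnProperty.call(ROUTES, label);
}

function routeMessage(message: string, classify: Classifier): HandlerConfig {
  const label = classify(message).trim().toLowerCase();
  // Unrecognized labels fall back to the knowledge route, so a classifier
  // mistake costs tokens rather than answer quality.
  return ROUTES[isIntent(label) ? label : "knowledge"];
}

// Stand-in classifier for illustration only (keyword rules, not a model).
const keywordClassifier: Classifier = (message) =>
  /^(hi|hello|ciao|thanks)\b/i.test(message) ? "chitchat" : "knowledge";

const greetingRoute = routeMessage("Ciao!", keywordClassifier);
// greetingRoute.handler is "smallTalkHandler" with a 128-token budget
```

Falling back to the most capable route on an unknown label is one possible design choice; a stricter system could instead ask the user to rephrase.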

Impact

90% reduction in AI costs for simple queries while maintaining high-quality responses for complex inquiries. Response categorization enables analytics on customer intent patterns.

02

Hybrid RAG with Multi-Provider AI

Problem

The client needed AI responses grounded in their actual business content (services, pricing, policies) rather than generic AI responses. Content comes from multiple sources: WordPress sites, uploaded documents, and manually entered FAQs.

Solution

Built a hybrid search system combining BM25 text search with Voyage AI vector embeddings. Content is chunked with token counting so retrieved passages fit the context window. Anthropic prompt caching (cache_control) cuts the cost of cached input tokens by roughly 90% on cache hits. The system falls back gracefully: knowledge base → web search → decline with explanation.
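The case study does not say how the BM25 and vector rankings are merged. The sketch below assumes reciprocal rank fusion (RRF), one common way to combine ranked lists without comparing raw scores, and pairs it with the three-step fallback chain. The chunk ids, the `k = 60` constant, and the function names are illustrative assumptions.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
// k = 60 is a conventional default, assumed here rather than taken from the project.
function fuseRankings(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A tier returns an answer, or null when it has nothing grounded to say.
type Tier = (query: string) => string | null;

interface Answer {
  source: "kb" | "web" | "declined";
  text: string;
}

// Fallback chain: knowledge base first, then web search, then a polite decline.
function answerWithFallback(query: string, fromKb: Tier, fromWeb: Tier): Answer {
  const kb = fromKb(query);
  if (kb !== null) return { source: "kb", text: kb };
  const web = fromWeb(query);
  if (web !== null) return { source: "web", text: web };
  return { source: "declined", text: "I could not find this in the available sources." };
}

// Illustrative chunk ids only.
const bm25Hits = ["pricing-1", "faq-3", "policy-2"];
const vectorHits = ["faq-3", "services-4", "pricing-1"];
const fused = fuseRankings([bm25Hits, vectorHits]);
// fused is ["faq-3", "pricing-1", "services-4", "policy-2"]
```

Chunks that appear in both lists rise to the top, which is the main reason to combine keyword and semantic retrieval in the first place.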

Impact

AI responses cite actual business content with source attribution. Customers get accurate, contextual answers rather than hallucinated information. Usage tracking shows which content sources are most valuable.

03

Real-Time Streaming Architecture

Problem

AI responses can take several seconds to generate. Blocking the user during this time creates a poor UX, especially when the AI is 'thinking' through complex queries.

Solution

Implemented real-time streaming over Socket.IO. The pipeline emits progress events (searching, found N sources, generating), then streams the response token by token. Abort controllers let users cancel a generation in flight. Thinking blocks are filtered out of the stream so only the final response reaches the user.
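A simplified sketch of the event sequence, assuming the model output arrives as chunks tagged either "thinking" or "text". The event shapes and names are hypothetical. A synchronous generator stands in for the real asynchronous stream, and `CancelSignal` is a minimal interface that a standard `AbortSignal` also satisfies.

```typescript
// Hypothetical chunk shape: the model interleaves reasoning and answer text.
type ModelChunk =
  | { kind: "thinking"; text: string }
  | { kind: "text"; text: string };

// Hypothetical events sent to the client.
type StreamEvent =
  | { type: "progress"; stage: "searching" | "sources_found" | "generating"; count?: number }
  | { type: "token"; text: string }
  | { type: "done"; cancelled: boolean };

// Only the `aborted` flag is needed; an AbortSignal has this shape too.
interface CancelSignal {
  readonly aborted: boolean;
}

function* streamResponse(
  sourceCount: number,
  chunks: Iterable<ModelChunk>,
  signal: CancelSignal,
): Generator<StreamEvent> {
  yield { type: "progress", stage: "searching" };
  yield { type: "progress", stage: "sources_found", count: sourceCount };
  yield { type: "progress", stage: "generating" };

  for (const chunk of chunks) {
    // Check for cancellation before doing any more work.
    if (signal.aborted) {
      yield { type: "done", cancelled: true };
      return;
    }
    // Thinking blocks never reach the client.
    if (chunk.kind === "thinking") continue;
    yield { type: "token", text: chunk.text };
  }
  yield { type: "done", cancelled: false };
}

const demoChunks: ModelChunk[] = [
  { kind: "thinking", text: "The user asks about opening hours..." },
  { kind: "text", text: "We are open " },
  { kind: "text", text: "9 to 18." },
];
const demoEvents = Array.from(streamResponse(2, demoChunks, { aborted: false }));
// demoEvents holds 3 progress events, 2 token events, and a final done event
```

In a live setup each yielded event would be emitted on the client's socket as it is produced instead of being collected into an array.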

Impact

Users see immediate feedback as the AI works. Progress indicators show exactly what the system is doing. Cancellation prevents wasted compute on abandoned queries.

04

Zero-Downtime Multi-Instance Deployment

Problem

Multiple business clients need isolated instances with their own databases, but all running the same codebase. Updates must not disrupt active conversations.

Solution

A GitHub Actions workflow deploys each instance with its own SOPS-encrypted secrets. Each deployment extracts the build to a temporary directory, installs dependencies, performs an atomic swap, and runs health checks. A failed health check triggers an automatic rollback to the previous version. Graphile Worker handles background jobs with idempotent task processing.
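The swap-and-rollback logic can be sketched independently of the actual workflow. The interface below is hypothetical: `activate` stands for whatever atomic operation the host uses (for example, repointing a symlink), and `healthCheck` for a request to a health endpoint. The retry count is an assumption.

```typescript
// Hypothetical operations a deployment host would expose.
interface DeploySteps {
  currentRelease(): string;
  activate(release: string): void; // the atomic swap
  healthCheck(): boolean;
}

interface DeployResult {
  active: string;
  rolledBack: boolean;
}

function deployWithRollback(
  newRelease: string,
  steps: DeploySteps,
  attempts: number = 3, // assumed number of health check attempts
): DeployResult {
  // Remember what was running so it can be restored.
  const previous = steps.currentRelease();
  steps.activate(newRelease);

  for (let i = 0; i < attempts; i++) {
    if (steps.healthCheck()) return { active: newRelease, rolledBack: false };
  }

  // Every health check failed: swap back to the previous release.
  steps.activate(previous);
  return { active: previous, rolledBack: true };
}

// In-memory fake host for illustration; no files or processes are touched.
function fakeHost(initial: string, healthyReleases: string[]): DeploySteps {
  let active = initial;
  return {
    currentRelease: () => active,
    activate: (release) => {
      active = release;
    },
    healthCheck: () => healthyReleases.indexOf(active) !== -1,
  };
}

const brokenDeploy = deployWithRollback("v2", fakeHost("v1", ["v1"]));
// brokenDeploy is { active: "v1", rolledBack: true }
```

Because the previous release stays on disk until the new one passes its checks, rollback is just a second swap rather than a rebuild.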

Impact

Multiple production instances managed from one codebase. Deployments complete in under 2 minutes with automatic rollback on failure. Zero customer-facing downtime during updates.

Tech stack.

TypeScript · React 19 · Express 5 · PostgreSQL · Drizzle ORM · Socket.IO · Claude API · OpenAI API · Voyage AI · Graphile Worker · Docker · GitHub Actions

Overview

Built a WhatsApp Business admin panel that lets teams manage customer conversations with AI-powered assistance. The platform handles message routing, team collaboration, template management, and intelligent response generation, all while maintaining enterprise-grade reliability.

The Challenge

The client needed to modernize their customer communication workflow. Their requirements included:

  • Centralized inbox for WhatsApp Business messages
  • AI assistant that understands their specific business context
  • Team collaboration features with role-based access
  • Multi-language support (Italian primary, English secondary)
  • Reliable message delivery with retry mechanisms
  • Analytics and usage tracking

The technical challenge was building an AI system that could provide genuinely useful responses without hallucinating information or incurring high per-query costs.

Technical Approach

Architecture Decisions

Monorepo with Workspaces: Chose npm workspaces over microservices for this project size. Shared types via a packages/shared workspace ensure API contracts are enforced at compile time. Single deployment artifact simplifies ops.
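To illustrate how a shared workspace enforces a contract at compile time, here is a single-file sketch. The type and function names are hypothetical. In a real monorepo the two interfaces would live in the shared package and be imported by both the server and the client.

```typescript
// --- shared package (hypothetical contents) ---
interface SendMessageRequest {
  conversationId: string;
  body: string;
}

interface SendMessageResponse {
  messageId: string;
  status: "queued" | "sent" | "failed";
}

// --- server side: must return exactly the shared response shape ---
function handleSendMessage(req: SendMessageRequest): SendMessageResponse {
  // Illustrative id scheme only.
  return { messageId: `msg_${req.conversationId}_${req.body.length}`, status: "queued" };
}

// --- client side: the switch is checked against the shared status union ---
function describeStatus(res: SendMessageResponse): string {
  switch (res.status) {
    case "queued":
      return "Waiting to send";
    case "sent":
      return "Sent";
    case "failed":
      return "Delivery failed";
  }
}

const sendResult = handleSendMessage({ conversationId: "c1", body: "hi" });
const statusLabel = describeStatus(sendResult);
// statusLabel is "Waiting to send"
```

If a new status were added to the shared union, the client function would stop compiling until it handled the new case, which is the kind of drift a shared types package is meant to catch.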

PostgreSQL + Drizzle ORM: Selected for type-safe queries, excellent JSON support, and Graphile Worker compatibility. Drizzle's schema-as-code approach makes migrations predictable. 35+ tables handle everything from messages to AI content to analytics.

Socket.IO for Real-Time: WebSocket connections enable instant message delivery and AI streaming. Four namespaces isolate different concerns: /whatsapp, /team-chat, /notifications, /ai-assistant.

Express 5 + TypeScript: Express 5's native Promise support simplifies async error handling. Feature-based directory structure keeps related code together. Each feature has router → service → repository layers.
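A sketch of the router → service → repository layering for one hypothetical feature (message templates). All names are illustrative, storage is an in-memory array, and the calls are synchronous for brevity, whereas real database access would be asynchronous. The route function is framework-free; in an Express app it would sit inside a request handler.

```typescript
interface Template {
  id: string;
  name: string;
  body: string;
}

// Repository: the only layer that knows how data is stored.
interface TemplateRepository {
  findByName(name: string): Template | undefined;
  insert(template: Template): void;
}

function inMemoryTemplateRepository(): TemplateRepository {
  const rows: Template[] = [];
  return {
    findByName: (name) => rows.filter((t) => t.name === name)[0],
    insert: (template) => {
      rows.push(template);
    },
  };
}

// Service: business rules, with no knowledge of HTTP or SQL.
class TemplateService {
  private readonly repo: TemplateRepository;

  constructor(repo: TemplateRepository) {
    this.repo = repo;
  }

  create(name: string, body: string): Template {
    if (this.repo.findByName(name)) {
      throw new Error(`Template "${name}" already exists`);
    }
    const template: Template = { id: `tpl_${name}`, name, body };
    this.repo.insert(template);
    return template;
  }
}

// Router layer: validates input and maps service outcomes to status codes.
interface HttpResult {
  status: number;
  body: unknown;
}

function createTemplateRoute(
  service: TemplateService,
  input: { name?: string; body?: string },
): HttpResult {
  if (!input.name || !input.body) {
    return { status: 400, body: { error: "name and body are required" } };
  }
  try {
    return { status: 201, body: service.create(input.name, input.body) };
  } catch (err) {
    return { status: 409, body: { error: err instanceof Error ? err.message : "conflict" } };
  }
}

const templateService = new TemplateService(inMemoryTemplateRepository());
const firstCreate = createTemplateRoute(templateService, { name: "welcome", body: "Ciao!" });
// firstCreate.status is 201; creating "welcome" again would return 409
```

Keeping the repository behind an interface is what lets the service be exercised with an in-memory fake, as above, without a database.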

AI Pipeline Architecture

The AI assistant follows a pipeline pattern:

  1. Context Preparation: Load conversation history, business facts, AI settings
  2. Intent Detection: Claude Haiku classifies the query type
  3. Handler Selection: Route to appropriate handler (knowledge, chat, clarification, web-search)
  4. Context Building: Handler fetches relevant content via hybrid search
  5. Execution: Stream response with progress events
  6. Post-Processing: Log usage, extract facts, update analytics
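The six steps above can be sketched as a list of stages that each take a context object and return an updated copy. The context fields, stage bodies, and the question-mark heuristic are placeholders for illustration; the real stages call models, search, and the database.

```typescript
interface PipelineContext {
  message: string;
  history: string[];
  intent: string | null;
  sources: string[];
  reply: string | null;
  trace: string[]; // names of the stages that have run, in order
}

type Stage = (ctx: PipelineContext) => PipelineContext;

// Wraps a stage body so that every stage records itself in the trace.
function stage(
  name: string,
  apply: (ctx: PipelineContext) => Partial<PipelineContext>,
): Stage {
  return (ctx) => ({ ...ctx, ...apply(ctx), trace: [...ctx.trace, name] });
}

// Placeholder stage bodies, one per step of the pipeline.
const stages: Stage[] = [
  stage("context", (ctx) => ({ history: ctx.history.slice(-10) })),
  stage("intent", (ctx) => ({ intent: /\?\s*$/.test(ctx.message) ? "knowledge" : "chat" })),
  stage("handler", () => ({})),
  stage("retrieval", (ctx) => ({ sources: ctx.intent === "knowledge" ? ["faq-3"] : [] })),
  stage("execution", (ctx) => ({ reply: `Answer based on ${ctx.sources.length} source(s)` })),
  stage("post-processing", () => ({})),
];

function runPipeline(message: string, history: string[]): PipelineContext {
  const initial: PipelineContext = {
    message,
    history,
    intent: null,
    sources: [],
    reply: null,
    trace: [],
  };
  // Each stage receives the output of the previous one.
  return stages.reduce((ctx, run) => run(ctx), initial);
}

const pipelineResult = runPipeline("What are your opening hours?", []);
// pipelineResult.intent is "knowledge"
// pipelineResult.reply is "Answer based on 1 source(s)"
```

Because every stage has the same signature, adding a step such as voice transcription means inserting one more entry in the list rather than changing the runner.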

Security & Reliability

  • JWT authentication with refresh token rotation
  • Rate limiting on AI endpoints to prevent abuse
  • SOPS encryption for secrets at rest
  • Health check endpoints for deployment verification
  • Automatic rollback on failed deployments
  • Structured logging with Pino for debugging
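As one example from the list above, rate limiting on AI endpoints can be sketched as a fixed-window counter per user. The case study does not specify the algorithm or the limits, so the class name, the window approach, and the numbers below are assumptions. The current time is passed in explicitly to keep the behavior easy to follow.

```typescript
// Fixed-window limiter: at most `limit` requests per key in each window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and counts it.
  allow(key: string, now: number): boolean {
    const current = this.windows.get(key);
    // Start a new window on the first request or once the old one has expired.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}

// Illustrative limit: 2 AI requests per user per second.
const aiLimiter = new FixedWindowLimiter(2, 1000);
const limiterDecisions = [
  aiLimiter.allow("user-1", 0),
  aiLimiter.allow("user-1", 10),
  aiLimiter.allow("user-1", 20),
  aiLimiter.allow("user-1", 1000),
];
// limiterDecisions is [true, true, false, true]
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket would smooth those out at the cost of more state.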

Results

The system has been running in production for 9 months, handling real customer conversations daily. Key outcomes:

  • Consistent Uptime: Zero-downtime deployments with automatic rollback
  • Cost Efficiency: Tiered AI approach keeps costs predictable
  • Developer Experience: Full-stack TypeScript enables rapid iteration
  • Extensibility: New features (voice transcription, image description) integrated with minimal architecture changes

Tech Stack Summary

Backend: Express 5, TypeScript, PostgreSQL, Drizzle ORM, Graphile Worker, Socket.IO, Pino

Frontend: React 19, Vite, TanStack Router + Query, Tailwind CSS, Radix UI, i18next

AI/ML: Claude API (Anthropic), OpenAI API, Voyage AI, pgvector, hybrid BM25+semantic search

Infrastructure: Docker, GitHub Actions, SOPS, systemd, Caddy reverse proxy

Integrations: WhatsApp Cloud API, AWS S3, Firebase Cloud Messaging, WordPress sync

Want something similar?

Ready to ship?

I build production-grade applications with the same attention to architecture and reliability.