WhatsApp Business Admin Panel with AI

Key challenges.

Building a production-grade AI assistant that could answer business queries using a custom knowledge base

Real-time message streaming with sub-second latency

Multi-tenant architecture supporting multiple business instances

Zero-downtime deployments with automatic rollback

WhatsApp Cloud API integration with reliable webhook processing

Implementation highlights.

Intelligent Intent Classification

Problem

Customer messages range from simple greetings to complex product inquiries. A naive approach would either waste AI tokens on simple queries or fail to provide adequate responses for complex ones.

Solution

Implemented a tiered system in which Claude Haiku classifies each message quickly and cheaply before any larger model is involved. Messages are then routed to specialized handlers: knowledge queries hit the RAG system, chitchat gets simple responses, and clarification requests trigger follow-up prompts. Each handler has its own optimized prompt and token budget.
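The routing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the intent labels mirror the handlers named above, while the handler names, the token budgets, and the fallback to the knowledge handler for unrecognized labels are all assumptions. The classifier call itself is omitted; the function only receives the label the classifier returned.

```typescript
// Hypothetical intent labels, mirroring the handler types named in the text.
type Intent = "knowledge" | "chitchat" | "clarification";

interface HandlerConfig {
  handler: string;         // hypothetical handler name
  maxOutputTokens: number; // per-handler token budget (illustrative value)
}

const INTENTS: readonly Intent[] = ["knowledge", "chitchat", "clarification"];

// Illustrative budgets only; the real values are not given in the source.
const ROUTES: Record<Intent, HandlerConfig> = {
  knowledge: { handler: "rag", maxOutputTokens: 1024 },
  chitchat: { handler: "smallTalk", maxOutputTokens: 128 },
  clarification: { handler: "followUp", maxOutputTokens: 256 },
};

// Normalize the classifier's raw text output into a known intent.
// Assumption: unknown labels fall back to "knowledge", the most thorough path.
function parseIntent(raw: string): Intent {
  const label = raw.trim().toLowerCase();
  const match = INTENTS.find((intent) => intent === label);
  return match ?? "knowledge";
}

// Pick the handler configuration for a classified message.
function route(rawLabel: string): HandlerConfig {
  return ROUTES[parseIntent(rawLabel)];
}
```

Keeping the budgets in one table makes the cost trade-off explicit: a greeting never gets more output tokens than the chitchat entry allows.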

Impact

90% reduction in AI costs for simple queries while maintaining high-quality responses for complex inquiries. Response categorization enables analytics on customer intent patterns.

Hybrid RAG with Multi-Provider AI

Problem

The client needed AI responses grounded in their actual business content (services, pricing, policies) rather than generic AI responses. Content comes from multiple sources: WordPress sites, uploaded documents, and manually entered FAQs.

Solution

Built a hybrid search system combining BM25 text search with Voyage AI vector embeddings. Content is chunked by token count so that retrieved passages fit the model's context window. Anthropic's prompt caching (cache_control) cuts the input-token cost of the cached prompt prefix by roughly 90% on cache hits. The system falls back gracefully: KB → web search → decline with explanation.
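The source does not say how the BM25 and vector results are merged. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The function name, the document ids, and the constant k = 60 are illustrative.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal rank fusion: combine several ranked id lists into one ranking.
// Each list contributes 1 / (k + rank) for every id it contains (rank starts at 1).
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical result lists from the two retrievers.
const bm25Hits = ["pricing", "hours", "refunds"];
const vectorHits = ["refunds", "pricing", "contact"];
const fused = reciprocalRankFusion([bm25Hits, vectorHits]);
```

RRF needs only ranks, so the BM25 and cosine-similarity scores never have to be normalized onto a common scale. Documents found by both retrievers ("pricing", "refunds") end up ahead of those found by only one.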

Impact

AI responses cite actual business content with source attribution. Customers get accurate, contextual answers rather than hallucinated information. Usage tracking shows which content sources are most valuable.

Real-Time Streaming Architecture

Problem

AI responses can take several seconds to generate. Blocking the user during this time creates a poor UX, especially when the AI is 'thinking' through complex queries.

Solution

Implemented real-time streaming over Socket.IO. The pipeline emits progress events (searching, found N sources, generating) followed by token-by-token response streaming. Abort controllers allow cancellation. Thinking blocks are filtered out of the stream so that only the final response reaches the user.
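The event flow described here might look roughly like the sketch below. It is simplified to a synchronous array of model chunks so that it stays self-contained; a real implementation would consume an async stream. The event names, the chunk shape, and the `emit` callback (standing in for a socket's emit method) are all hypothetical.

```typescript
// Hypothetical chunk shapes; the real provider stream format is not shown in the source.
type ModelChunk =
  | { kind: "thinking"; text: string }
  | { kind: "text"; text: string };

// Stand-in for a socket's emit method.
type Emit = (event: string, payload: unknown) => void;

// Minimal structural type; a real AbortSignal also satisfies it.
interface AbortLike {
  readonly aborted: boolean;
}

function streamAnswer(
  chunks: ModelChunk[],
  sourceCount: number,
  emit: Emit,
  signal: AbortLike,
): string {
  // Progress events first, so the user sees activity before any token arrives.
  emit("progress", { stage: "searching" });
  emit("progress", { stage: "found_sources", count: sourceCount });
  emit("progress", { stage: "generating" });

  let answer = "";
  for (const chunk of chunks) {
    // Stop as soon as the client cancels; report what was produced so far.
    if (signal.aborted) {
      emit("aborted", { partial: answer });
      return answer;
    }
    // Thinking blocks are dropped and never forwarded to the client.
    if (chunk.kind === "thinking") continue;
    answer += chunk.text;
    emit("token", { text: chunk.text });
  }
  emit("done", { text: answer });
  return answer;
}
```

Checking the abort flag once per chunk keeps cancellation latency bounded by the time between two chunks.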

Impact

Users see immediate feedback as the AI works. Progress indicators show exactly what the system is doing. Cancellation prevents wasted compute on abandoned queries.

Zero-Downtime Multi-Instance Deployment

Problem

Multiple business clients need isolated instances with their own databases, but all running the same codebase. Updates must not disrupt active conversations.

Solution

GitHub Actions workflow with SOPS-encrypted secrets per instance. Deployments extract to a temp directory, install deps, perform atomic swap, and run health checks. Failed health checks trigger automatic rollback to the previous version. Graphile Worker handles background jobs with idempotent task processing.
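The source does not show the workflow itself, so the sketch below models only the control flow of the deployment: stage, swap, verify, and roll back on failure. The step names, the retry count, and the synchronous signatures are assumptions made for brevity; real steps would be asynchronous shell or filesystem operations.

```typescript
// Hypothetical deployment steps, injected so the control flow can be shown in isolation.
interface DeploySteps {
  stage: () => void;          // extract the release to a temp directory and install deps
  swap: () => void;           // atomically point the live path at the new release
  healthCheck: () => boolean; // true when the new release responds correctly
  rollback: () => void;       // point the live path back at the previous release
}

type DeployResult = "deployed" | "rolled_back";

// Run a deployment: the live version is only touched after staging succeeds,
// and a release that never passes its health check is rolled back.
function deploy(steps: DeploySteps, healthAttempts = 3): DeployResult {
  steps.stage(); // if this throws, the live version has not been touched
  steps.swap();
  for (let attempt = 0; attempt < healthAttempts; attempt++) {
    if (steps.healthCheck()) return "deployed";
  }
  steps.rollback();
  return "rolled_back";
}
```

Because staging happens in a separate directory, a failed install never affects the running version. Only the swap and the rollback change what is live.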

Impact

Multiple production instances managed from one codebase. Deployments complete in under 2 minutes with automatic rollback on failure. Zero customer-facing downtime during updates.

Overview

Built a comprehensive WhatsApp Business admin panel that enables teams to manage customer conversations with AI-powered assistance. The platform handles message routing, team collaboration, template management, and intelligent response generation while maintaining enterprise-grade reliability.

The Challenge

The client needed to modernize their customer communication workflow. Their requirements included:

Centralized inbox for WhatsApp Business messages
AI assistant that understands their specific business context
Team collaboration features with role-based access
Multi-language support (Italian primary, English secondary)
Reliable message delivery with retry mechanisms
Analytics and usage tracking

The technical challenge was building an AI system that could provide genuinely useful responses without hallucinating information or incurring high per-query costs.

Technical Approach

Architecture Decisions

Monorepo with Workspaces: Chose npm workspaces over microservices for this project size. Shared types via a packages/shared workspace ensure API contracts are enforced at compile time. Single deployment artifact simplifies ops.

PostgreSQL + Drizzle ORM: Selected for type-safe queries, excellent JSON support, and Graphile Worker compatibility. Drizzle's schema-as-code approach makes migrations predictable. 35+ tables handle everything from messages to AI content to analytics.

Socket.IO for Real-Time: WebSocket connections enable instant message delivery and AI streaming. Four namespaces isolate different concerns: /whatsapp, /team-chat, /notifications, /ai-assistant.

Express 5 + TypeScript: Express 5's native Promise support simplifies async error handling. Feature-based directory structure keeps related code together. Each feature has router → service → repository layers.
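A minimal sketch of the router → service → repository layering, using a hypothetical "template" feature. The type and class names are invented for illustration, the repository is an in-memory stand-in for a database-backed one, and the handler is a plain function rather than an actual Express route, so that the example stays self-contained.

```typescript
// Shared contract; in the monorepo described above this would live in packages/shared.
interface TemplateDto {
  id: string;
  name: string;
  language: "it" | "en";
}

// Repository layer: the only place that knows about storage.
interface TemplateRepository {
  findById(id: string): TemplateDto | undefined;
}

// In-memory stand-in for a database-backed repository.
class InMemoryTemplateRepository implements TemplateRepository {
  private readonly rows = new Map<string, TemplateDto>();

  constructor(seed: TemplateDto[]) {
    for (const row of seed) this.rows.set(row.id, row);
  }

  findById(id: string): TemplateDto | undefined {
    return this.rows.get(id);
  }
}

// Service layer: business rules, with no HTTP and no SQL.
class TemplateService {
  private readonly repo: TemplateRepository;

  constructor(repo: TemplateRepository) {
    this.repo = repo;
  }

  getTemplate(id: string): TemplateDto | undefined {
    return this.repo.findById(id.trim());
  }
}

// Router layer: maps request parameters onto the service and shapes the response.
interface HttpResponse {
  status: number;
  body: TemplateDto | { error: string };
}

function getTemplateHandler(service: TemplateService, params: { id: string }): HttpResponse {
  const template = service.getTemplate(params.id);
  return template
    ? { status: 200, body: template }
    : { status: 404, body: { error: "template not found" } };
}
```

Because the service depends on the `TemplateRepository` interface rather than a concrete class, it can be tested against the in-memory version without a database.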

AI Pipeline Architecture

The AI assistant follows a pipeline pattern:

Context Preparation: Load conversation history, business facts, AI settings
Intent Detection: Claude Haiku classifies the query type
Handler Selection: Route to appropriate handler (knowledge, chat, clarification, web-search)
Context Building: Handler fetches relevant content via hybrid search
Execution: Stream response with progress events
Post-Processing: Log usage, extract facts, update analytics
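The pipeline pattern above can be sketched as a list of stages that each take a context object and return an updated copy. Everything here is illustrative: the context fields, the stage names, and the stub logic (a trailing question mark standing in for the model-based classifier) are assumptions, and only three of the six stages are modeled.

```typescript
// Hypothetical context object passed from stage to stage.
interface PipelineContext {
  message: string;
  intent: string | null;
  sources: string[];
  response: string | null;
  trace: string[]; // names of the stages that have run, useful for logging
}

interface Stage {
  name: string;
  run: (ctx: PipelineContext) => PipelineContext;
}

// Run the stages in order, recording each stage name in the trace.
function runPipeline(stages: Stage[], message: string): PipelineContext {
  let ctx: PipelineContext = { message, intent: null, sources: [], response: null, trace: [] };
  for (const stage of stages) {
    ctx = stage.run(ctx);
    ctx = { ...ctx, trace: [...ctx.trace, stage.name] };
  }
  return ctx;
}

// Stub stages standing in for the real work (model calls, hybrid search, streaming).
const stages: Stage[] = [
  {
    name: "intent",
    run: (ctx) => ({ ...ctx, intent: ctx.message.endsWith("?") ? "knowledge" : "chat" }),
  },
  {
    name: "context",
    run: (ctx) => ({ ...ctx, sources: ctx.intent === "knowledge" ? ["faq#pricing"] : [] }),
  },
  {
    name: "execute",
    run: (ctx) => ({ ...ctx, response: `answered using ${ctx.sources.length} source(s)` }),
  },
];
```

Since each stage only reads and returns the context, stages can be reordered, replaced, or tested individually without touching the runner.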

Security & Reliability

JWT authentication with refresh token rotation
Rate limiting on AI endpoints (prevent abuse)
SOPS encryption for secrets at rest
Health check endpoints for deployment verification
Automatic rollback on failed deployments
Structured logging with Pino for debugging
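The source does not name the rate-limiting algorithm used on the AI endpoints. A token bucket is one common choice and is sketched below as an assumption: each client gets a bucket that refills at a steady rate, and a request is allowed only if a token is available. The class name, capacity, and refill rate are illustrative, and the clock is passed in explicitly to keep the example deterministic.

```typescript
// Token-bucket rate limiter with an injected clock (milliseconds).
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if the request is allowed at time nowMs.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

In a server, one bucket would typically be kept per user or per API key, with `Date.now()` supplied as the clock. The capacity bounds the burst size and the refill rate bounds the sustained request rate.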

Results

The system has been running in production for 9 months, handling real customer conversations daily. Key outcomes:

Consistent Uptime: Zero-downtime deployments with automatic rollback
Cost Efficiency: Tiered AI approach keeps costs predictable
Developer Experience: Full-stack TypeScript enables rapid iteration
Extensibility: New features (voice transcription, image description) integrated with minimal architecture changes

Tech Stack Summary

Backend: Express 5, TypeScript, PostgreSQL, Drizzle ORM, Graphile Worker, Socket.IO, Pino

Frontend: React 19, Vite, TanStack Router + Query, Tailwind CSS, Radix UI, i18next

AI/ML: Claude API (Anthropic), OpenAI API, Voyage AI, pgvector, hybrid BM25+semantic search

Infrastructure: Docker, GitHub Actions, SOPS, systemd, Caddy reverse proxy

Integrations: WhatsApp Cloud API, AWS S3, Firebase Cloud Messaging, WordPress sync
