Conversational AI | CRM | Business Automation | SaaS
Built a multi-channel AI assistant platform that automates phone, SMS, and chat interactions at scale.
Project X is a production-grade multi-channel AI SaaS platform designed for continuous, human-like conversations across voice and messaging. It integrates with business systems (including CRMs) to automate front-desk and operational workflows, reducing overhead while delivering near real-time responses.
What it solved
- Missed calls turning directly into lost revenue.
- Expensive call centers and bloated support teams.
- Slow response times across disconnected channels.
- Generic chatbots that delivered poor accuracy and robotic UX.
What was built
- Ultra-low-latency voice pipeline with streaming interaction design.
- Channel-agnostic agent core with context shared across channels.
- Industry-specific AI agents with grounding from business knowledge + CRM data.
- Workflow automation engine (Trigger → Intent → Action → Response) with observability.
Why it mattered
- Customers received human-like conversations without waiting for staffed teams.
- Businesses replaced a large share of their support workload with automated front-desk execution.
- Exact same agent behavior worked across phone, SMS, and web chat.
- Compliance-friendly, vertical-aware workflows (example: healthcare data) reduced risk.
The Challenge
Before this platform, businesses relied on fragmented communication systems and carried expensive operational overhead. No unified system could handle real-time voice, messaging, and workflow execution with high accuracy and low latency, especially under strict concurrency and compliance constraints.
Environment constraints & goals
- Handle customer interactions 24/7.
- Provide natural voice conversations with a seamless phone UX.
- Integrate with CRMs and business workflow systems.
- Deliver 10–50x ROI by reducing operational costs and support load.
- Support compliance-heavy, industry-specific workflows (e.g., healthcare data).
Core constraints to win
- Critical latency: real-time voice latency requirements for natural, interruptible conversations.
- High concurrency: multiple simultaneous calls without performance degradation.
- Unified UX: channel consistency across Phone, SMS, and Web.
- Accuracy under real-world conditions: reduce hallucinations in strict business workflows.
The primary goal was to deliver human-like AI conversations (voice and text), enable full automation of front-desk workflows, and provide industry-specific intelligence.
The Solution
The core idea was a modular AI agent platform that understands user intent via LLMs, acts through workflow integrations, responds naturally, and continuously learns from business data. A real-time communication stack and a channel-agnostic agent core unified voice and messaging experiences.
Solution overview & architecture
- LLM-based intent understanding to interpret user requests across channels.
- Workflow integration layer to execute business actions beyond Q&A.
- Natural response generation designed for voice and text UX.
- Continuous learning grounded in scraped and business-uploaded knowledge.
- Unified memory/context shared across phone, SMS, and web chat sessions.
Design principles
- Ultra-low latency first for voice (streaming-first pipeline).
- Channel-agnostic agent core for consistent UX across protocols.
- Retrieval-grounded answers using semantic retrieval at runtime.
- Event-driven automation with explicit intent → action → response flow.
- Observability baked in: latency tracking, failure logging, and escalation signals.
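The observability principle above (latency tracking, failure logging, escalation signals) can be illustrated with a small sliding-window monitor. This is a minimal sketch, not the production code: the class name `TurnMonitor`, the window size, the 1.5 s latency budget, and the failure threshold are all assumptions chosen for illustration.

```python
import logging
from collections import deque
from statistics import mean

logger = logging.getLogger("voice_agent")  # hypothetical logger name


class TurnMonitor:
    """Tracks per-turn latency and failures over a sliding window."""

    def __init__(self, window: int = 50, latency_budget_s: float = 1.5,
                 max_failures: int = 3):
        # Thresholds are illustrative assumptions, not values from the platform.
        self.latencies = deque(maxlen=window)
        self.failures = deque(maxlen=window)
        self.latency_budget_s = latency_budget_s
        self.max_failures = max_failures

    def record(self, latency_s: float, failed: bool = False) -> None:
        """Store one conversation turn and log it if it failed."""
        self.latencies.append(latency_s)
        self.failures.append(failed)
        if failed:
            logger.warning("turn failed after %.2fs", latency_s)

    def should_escalate(self) -> bool:
        """Signal a handoff when the recent window looks degraded."""
        if not self.latencies:
            return False
        too_slow = mean(self.latencies) > self.latency_budget_s
        too_flaky = sum(self.failures) >= self.max_failures
        return too_slow or too_flaky


monitor = TurnMonitor(window=5)
for latency in (0.9, 1.1, 1.2):
    monitor.record(latency)
healthy = monitor.should_escalate()

for latency in (2.4, 2.6, 2.8):
    monitor.record(latency)
degraded = monitor.should_escalate()
```

Keeping the check on a bounded window means old turns age out, so the signal reflects current conditions rather than the whole history of a call.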
Capability Snapshot
Project X delivered more than AI conversations. It created a unified business automation layer that could answer, retrieve context, trigger workflows, and update external systems with production reliability.
Real-time conversation engine
Streaming STT for partial transcripts, incremental LLM token streaming, and TTS orchestration.
Knowledge grounding
Scraping + parsing ingestion pipelines with chunking/embedding and FAISS-backed vector search.
Workflow automation
Event-driven execution: Trigger → Intent → Action → Response across business APIs and CRM signals.
Multi-channel consistency
Channel-agnostic agent core with shared memory and context across phone, SMS, and web chat.
Industry-specific intelligence
Vertical-specific prompting plus grounding from FAQs, local documents, and CRM data.
Observability & safe operations
Conversation logs, latency and failure monitoring, sentiment analysis, and escalation dashboards.
Implementation Highlights
Ultra-low latency voice pipeline
Natural phone UX requires near real-time response.
Problem: Traditional AI voice systems feel laggy and unnatural, breaking the flow of human conversations.
Solution: Implemented streaming STT for partial transcripts, used incremental LLM token streaming combined with parallel TTS generation, and optimized buffering for real-time audio playback.
Impact: Achieved ~1–1.5s response latency, delivering interruptible, human-like conversations that feel immediate.
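The overlap between token streaming and speech synthesis can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for the streaming LLM and TTS services, and splitting on sentence punctuation is one simple chunking choice among several.

```python
import asyncio
from typing import AsyncIterator, List

SENTENCE_END = (".", "?", "!")


async def fake_llm_tokens(reply: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM API: yields the reply word by word."""
    for token in reply.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token + " "


async def fake_tts(text: str) -> bytes:
    """Stand-in for a TTS call: returns fake audio bytes for one text chunk."""
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def stream_reply(tokens: AsyncIterator[str]) -> List[bytes]:
    """Start synthesizing each sentence as soon as it completes,
    while the model is still generating the next one."""
    pending: List[asyncio.Task] = []
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            pending.append(asyncio.create_task(fake_tts(buffer.strip())))
            buffer = ""
    if buffer.strip():  # flush a trailing fragment without punctuation
        pending.append(asyncio.create_task(fake_tts(buffer.strip())))
    # gather returns results in submission order, so audio stays in sentence order
    return await asyncio.gather(*pending)


audio_chunks = asyncio.run(
    stream_reply(fake_llm_tokens("Sure, I can help. What time works for you?"))
)
```

The point of the pattern is that the first sentence can begin playing before the model has finished the second, which is where most of the perceived latency is saved.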
Multi-channel unified engine
One intelligence layer across phone, SMS, and web.
Problem: Phone, SMS, and web chat have different protocols and UX expectations, but businesses need consistent assistant behavior across all of them.
Solution: Built a channel-agnostic agent core and abstracted the communication layer so the same experience could be delivered everywhere. Shared memory and context were implemented across channels.
Impact: The exact same AI agent works seamlessly across phone calls, SMS interactions, and embedded web chat.
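One way to picture the channel-agnostic core is a set of thin adapters that normalize inbound traffic into a single message type, with memory keyed by customer rather than by channel. The sketch below is an assumption-laden illustration: `Message`, `AgentCore`, the adapter functions, and the payload field names are hypothetical, and the reply is a placeholder where a real core would call the LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    customer_id: str  # stable identity across channels (hypothetical key)
    channel: str      # "phone", "sms", or "web"
    text: str


@dataclass
class AgentCore:
    """Channel-agnostic core: it only ever sees normalized messages."""
    memory: Dict[str, List[str]] = field(default_factory=dict)

    def handle(self, msg: Message) -> str:
        history = self.memory.setdefault(msg.customer_id, [])
        history.append(f"{msg.channel}: {msg.text}")
        # Placeholder reply; a real core would pass `history` to the LLM here.
        return f"Reply using {len(history)} message(s) of shared context"


def from_sms(payload: dict) -> Message:
    """Hypothetical SMS adapter: maps a webhook payload to a Message."""
    return Message(payload["from"], "sms", payload["body"])


def from_phone(caller: str, transcript: str) -> Message:
    """Hypothetical voice adapter: wraps an STT transcript."""
    return Message(caller, "phone", transcript)


core = AgentCore()
core.handle(from_phone("+15550100", "I need to book an appointment"))
reply = core.handle(from_sms({"from": "+15550100", "body": "Tuesday works"}))
```

Because the SMS arrives with the same customer key as the earlier call, the second turn sees the phone context without the core knowing which channel either message came from.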
Industry-specific AI agents
Vertical workflows needed more than generic models.
Problem: Generic LLMs struggle in strict, domain-specific business workflows and can hallucinate when accuracy requirements are high.
Solution: Tuned prompts per vertical (Healthcare, Retail, Finance) and grounded interactions in business FAQs, local documents, and Zoho Desk CRM data.
Impact: Higher accuracy, reduced hallucination rates, and superior task completion.
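A per-vertical prompt with grounding might be assembled roughly like this. The rule text, the `build_prompt` helper, and the prompt wording are hypothetical; the actual prompts used on the platform are not shown here.

```python
from typing import Dict, List

# Hypothetical per-vertical rules, invented for illustration only.
VERTICAL_RULES: Dict[str, str] = {
    "healthcare": "Never give medical advice. Offer to book an appointment instead.",
    "retail": "Quote prices and stock only from the provided context.",
    "finance": "Do not speculate about rates. Escalate account disputes to a human.",
}


def build_prompt(vertical: str, context: List[str], question: str) -> str:
    """Combine vertical rules, retrieved facts, and the customer's question."""
    rules = VERTICAL_RULES[vertical]
    facts = "\n".join(f"- {c}" for c in context) or "- (no context retrieved)"
    return (
        f"You are a front-desk assistant for a {vertical} business.\n"
        f"Rules: {rules}\n"
        "Answer only from the context below. If the answer is not there, say so.\n"
        f"Context:\n{facts}\n"
        f"Customer: {question}"
    )


prompt = build_prompt(
    "healthcare",
    ["Clinic hours: Mon-Fri 9am-5pm"],
    "Are you open Saturday?",
)
```

Restricting the model to the supplied context, and telling it to admit when the answer is missing, is a common way to reduce hallucination in strict workflows.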
Business knowledge ingestion & retrieval
Fresh, grounded context for accurate conversations.
Problem: Assistants underperform without fresh, business-specific context.
Solution: Built ingestion pipelines that scrape websites and parse business-uploaded PDFs and SOPs. Chunked and embedded the data into a self-hosted FAISS vector store, then combined semantic retrieval at runtime with synced structured CRM signals for context-aware replies.
Impact: Faster vertical onboarding and highly accurate, workflow-aware responses.
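The chunk, embed, and retrieve steps can be shown end to end with the standard library alone. To keep the example dependency-free, it swaps in a toy bag-of-words embedding and a brute-force cosine search in place of a learned embedding model and the FAISS index; the function names and chunk size are assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 8) -> List[str]:
    """Split text into fixed-size word windows (real pipelines often overlap them)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(index: List[Tuple[str, Counter]], query: str, k: int = 1) -> List[str]:
    """Brute-force nearest-neighbour search, standing in for the vector store lookup."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


doc = (
    "Our clinic is open Monday to Friday from nine to five. "
    "Parking is free for patients behind the main building entrance."
)
index = [(c, embed(c)) for c in chunk(doc)]
top = search(index, "Where can patients find parking?")
```

The retrieved chunk is what would be passed to the model as grounding context; in a production setup the index build and the lookup would be handled by the vector store instead of the list and sort used here.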
Workflow automation engine & observability
Automation that can be monitored and improved.
Workflow: Built an event-driven system (Trigger → Intent → Action → Response) integrating external APIs (Calendars, Zoho Desk, databases) to automate appointments, orders, and routing.
Observability: Logged conversations, latency, and failures. Added sentiment analysis and escalation tracking dashboards to ensure continuous improvement and fast detection of degradation.
Impact: Replaced human intervention for end-to-end tasks while ensuring system degradation was caught immediately.
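The Trigger → Intent → Action → Response flow described above can be sketched as a small dispatcher. Everything named here is hypothetical: the keyword-based `detect_intent` stands in for the LLM intent step, and the two handlers stand in for calls to calendar, CRM, or database APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Event:
    channel: str
    text: str


def detect_intent(text: str) -> str:
    """Keyword stand-in for the LLM-based intent classifier."""
    lowered = text.lower()
    if "appointment" in lowered or "book" in lowered:
        return "book_appointment"
    if "order" in lowered:
        return "order_status"
    return "unknown"


# Hypothetical action handlers; real ones would call external business APIs.
def book_appointment(event: Event) -> str:
    return "Your appointment request has been recorded."


def order_status(event: Event) -> str:
    return "Let me look up that order."


ACTIONS: Dict[str, Callable[[Event], str]] = {
    "book_appointment": book_appointment,
    "order_status": order_status,
}


def run_workflow(event: Event) -> dict:
    """Trigger -> Intent -> Action -> Response for one inbound event."""
    intent = detect_intent(event.text)
    action = ACTIONS.get(intent)
    if action is None:
        # No automated path: hand off to a person rather than guess.
        return {"intent": intent, "escalate": True,
                "response": "Connecting you with a team member."}
    return {"intent": intent, "escalate": False, "response": action(event)}


result = run_workflow(Event("sms", "Can I book an appointment for Friday?"))
fallback = run_workflow(Event("phone", "My invoice looks wrong"))
```

Keeping the intent-to-action mapping in an explicit table makes each step easy to log and makes the escalation path a first-class outcome rather than an error case.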
Business & System Results
The platform achieved scale and reliability by combining near real-time conversational AI with workflow execution, knowledge grounding, and continuous operational observability.
Scale & performance
- Handles multiple simultaneous calls without callers having to wait.
- 24/7 automated communication across phone, SMS, and web.
- Voice latency engineered for near real-time user experience.
Conversions & customer experience
- Increased response speed, directly improving conversions.
- Human-like, interruptible conversations rather than robotic scripts.
- Accurate, industry-aware assistance from grounded knowledge and CRM signals.
Cost & operations
- Significant reduction in call center workload and hiring dependencies.
- Reduced overhead through end-to-end automation of front-desk workflows.
- Operational degradation caught immediately via observability dashboards.
My Role & Scope
Timeline: 2.5 years. Users: SMBs, Enterprises, Healthcare. Role: Lead AI Engineer. I owned the system end-to-end, from the real-time communication stack to AI/LLM orchestration.
Owned directly
- System architecture for real-time AI + communication stack.
- AI/LLM layer: prompting, orchestration, and optimization.
- Knowledge pipeline: web scraping, parsing, and FAISS vector search.
- Infrastructure for multi-channel (Phone, SMS, Web chat) and workflows.
- Integrations with Zoho Desk CRM, scheduling systems, and custom APIs.
Hands-on engineering
- Wrote production-grade code across backend systems and AI pipelines.
- Built and maintained real-time communication services (voice + messaging).
- Optimized system performance for low-latency voice and high concurrency.
- Designed and implemented APIs and internal services used by the platform.
- Debugged live call flows, LLM hallucinations, and async race conditions.
Lessons Learned
This project demonstrates the ability to build production-grade AI systems, solve real-world latency and scaling challenges, design end-to-end SaaS platforms, and deliver measurable business impact.
What worked
- Real-time performance is non-negotiable for voice AI.
- Vertical-specific agents consistently outperform generic models.
- Multi-channel unification is a massive market differentiator.
Challenges conquered
- Balancing the strict trade-off between latency and accuracy in LLMs.
- Handling unpredictable edge cases in real human voice conversations.