GoToBuddy Case Study: AI Calling Application
Project Overview
GoToBuddy is an innovative software development and AI project designed to revolutionize phone interactions by enabling users to have conversations with specialized AI characters. Users can dial a specific number and choose from six distinct AI personas, each an expert in their respective field. The project also includes a comprehensive website for managing user profiles and subscriptions, ensuring a seamless user experience.
Project Domain
Artificial Intelligence (AI)
Technologies and Technical Specifications
Integrations with Different Platforms
- Twilio
- Twilio Streaming
- Deepgram
- Perplexity
- OpenAI (GPT-4)
- OpenAI TTS
- Langchain
Architecture:
Describes the technical details of what happens when a user calls the AI phone number.
The six distinct personas are as follows:
- Lisa, The News Anchor: Provides the latest news updates and discusses current events.
- Ben, The Storyteller: Narrates engaging stories and discusses literature.
- Ethan, The Sustainable Chef: Offers recipes, cooking tips, and advice on sustainable eating.
- Jack, The Sports Analyst: Discusses sports news, game analyses, and player statistics.
- Chloe, The Sass Master: Engages in witty banter and fashion talk.
- Maya, The Coach Extraordinaire: Offers motivational talks and personal development advice.
Understanding The Project Challenges
When the client approached us, they had a passionate vision for harnessing AI technology. They proposed creating an AI-based application capable of engaging in autonomous conversations with end users. While developing this project we faced the following challenges:
- Natural Language Understanding and Processing: Ensuring the AI can accurately understand diverse user inputs and respond appropriately. Handling accents, slang, and varied speech patterns to provide a seamless conversation experience.
- AI Persona Development: Creating distinct and engaging personalities for each AI character. Maintaining consistency in the character’s knowledge base, tone, and interaction style.
- Real-Time Response and Latency: Achieving minimal latency in AI responses to maintain natural, real-time conversations. Ensuring high performance and reliability of the telecommunication infrastructure.
- Approval for OTP Campaigns: Navigating the complexities of obtaining approval from the US government for sending OTP (One-Time Password) messages to end users. Ensuring compliance with regulatory requirements and addressing potential legal hurdles.
- Personalization and Context Awareness: Implementing mechanisms to personalize interactions based on user preferences and history. Ensuring the AI retains context within a conversation to provide coherent and relevant responses.
- User Experience (UX) Design: Developing an intuitive and user-friendly interface for both the phone interaction system and the website. Ensuring smooth navigation and interaction flow to enhance user satisfaction.
- Tool and Model Exploration: Exploring and integrating various tools and models posed significant challenges:
- Voice Audio Tools:
- Twilio: Initially considered for communication infrastructure but lacked advanced voice synthesis capabilities.
- Whisper: Tested for enhancing voice quality and naturalness in AI interactions.
- Deepgram: Implemented for accurate speech recognition to improve user interaction precision.
- Eleven Labs: Explored for integrating voice analytics and advanced features.
- Language Models (LLMs): Transitioned from OpenAI GPT-3.5 to GPT-4 to GPT-4o to leverage advancements in natural language processing and dialogue generation. Each question is sent with appropriate context from the previous conversation.
- Function calling: Since the AI lacks real-time knowledge, it retrieves information based on user queries from an online model (Perplexity).
How ScaleReal Made It Happen
Natural Language Processing (NLP) and AI Integration
Objective: To ensure the AI characters can understand and interact with users naturally and accurately.
- Actions:
- Implemented state-of-the-art NLP models (e.g., fine-tuned versions of GPT-4).
- Used third-party services for speech-to-text (Deepgram) and text-to-speech (OpenAI TTS).
Outcomes: High accuracy in understanding user inputs. Natural, engaging, and context-aware AI responses.
User Experience (UX) Design
Objective: To create an intuitive, user-friendly interface for both the phone interaction system and the website.
- Actions:
- Designed a seamless and responsive website for user profile management and subscription services.
- Developed a clear and straightforward phone menu for selecting AI characters.
- Incorporated user feedback to continually refine and enhance the design.
Outcomes: Easy-to-navigate interfaces that enhance user satisfaction. Positive user engagement and retention.
Data Security and Privacy
Objective: To protect user data and ensure compliance with data protection regulations.
- Actions:
- Implemented robust data encryption and security protocols.
- Conducted regular security and compliance checks.
- Developed secure systems for storing and managing personal information and call logs.
Outcomes: High level of data security and user trust. Compliance with relevant regulations.
By focusing on these three key areas, we ensured that the AI GoToBuddy project not only delivered advanced AI interactions but also provided a secure and user-friendly experience.
Call Handling and Real-Time Interaction Flow
Initial Call Handling: When a user calls the AI phone number, Twilio manages the call and forwards the request to our Python backend server via a webhook.
User Details Check: The Python service processes the webhook event to verify the user’s registration status. If the user is registered, it proceeds; otherwise, a temporary user object is created. The service then responds to Twilio to establish a two-way communication channel using Twilio Streaming (the connect verb).
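As a rough illustration of this step, the webhook response can be expressed as TwiML with a `<Connect>` verb wrapping a `<Stream>` element. The sketch below uses only the Python standard library. The function name `build_stream_twiml` and the WebSocket URL are hypothetical, and the actual service may use Twilio's helper library instead.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML asking Twilio to open a two-way media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> nested in <Connect> requests a bidirectional audio stream.
    ET.SubElement(connect, "Stream", {"url": websocket_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

Calling `build_stream_twiml("wss://example.com/media")` (a placeholder URL) yields the XML body that the webhook handler would return to Twilio.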
Establishing Data Streaming: Once the streaming connection is established, the user’s phone continuously sends data over a WebSocket connection to our Python service through Twilio Streaming.
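Messages on this WebSocket are JSON objects with an `event` field (such as `start`, `media`, and `stop`), and `media` events carry base64-encoded audio in `media.payload`. The following minimal sketch shows one way to unpack them. The helper name `parse_stream_message` is an assumption for illustration, not the project's actual code.

```python
import base64
import json
from typing import Optional, Tuple


def parse_stream_message(raw: str) -> Tuple[str, Optional[str], bytes]:
    """Return (event, stream_sid, audio_bytes) for one stream message."""
    message = json.loads(raw)
    event = message.get("event", "")
    stream_sid = message.get("streamSid")
    audio = b""
    if event == "media":
        # Audio arrives base64 encoded inside the media object.
        audio = base64.b64decode(message["media"]["payload"])
    return event, stream_sid, audio
```

Non-media events return an empty byte string, so the caller can branch on the event name alone.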
Streaming Setup and Data Handling:
- Database Interaction: Fetches constants such as pause times (stored in microseconds) from the database to manage sentence completion.
- Deepgram Integration: Configures parameters to handle Twilio data, which arrives in x-mu-law encoding format with 8k samples per second.
- Ensures streaming buffer sizes range between 20ms and 250ms of audio, compliant with Deepgram's requirements.
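To make the numbers above concrete: mu-law audio uses one byte per sample, so at 8,000 samples per second a 20ms buffer is 160 bytes and a 250ms buffer is 2,000 bytes. The sketch below computes these bounds and assembles a Deepgram streaming URL. The query parameters `encoding`, `sample_rate`, and `channels` reflect our understanding of Deepgram's streaming options, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

SAMPLE_RATE_HZ = 8000   # Twilio telephony audio
BYTES_PER_SAMPLE = 1    # mu-law stores one byte per sample


def bytes_for_ms(duration_ms: int) -> int:
    """Number of bytes needed to hold duration_ms of 8 kHz mu-law audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * duration_ms // 1000


MIN_CHUNK_BYTES = bytes_for_ms(20)    # lower bound of the buffer range
MAX_CHUNK_BYTES = bytes_for_ms(250)   # upper bound of the buffer range


def deepgram_listen_url(base: str = "wss://api.deepgram.com/v1/listen") -> str:
    """Build the streaming URL with the audio format Twilio delivers."""
    params = {"encoding": "mulaw", "sample_rate": SAMPLE_RATE_HZ, "channels": 1}
    return f"{base}?{urlencode(params)}"
```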
Live Data Collection: The Python service collects live data from the user in mu-law encoding format at an 8k sample rate via WebSocket. It buffers the data until it holds at least 20ms of audio, then sends it to Deepgram for speech-to-text conversion.
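A minimal sketch of this buffering step is shown below, assuming 8 kHz mu-law audio at one byte per sample (so 20ms equals 160 bytes). The class name `AudioChunker` is hypothetical, and the real service may flush on a different policy.

```python
from typing import List

MIN_CHUNK_BYTES = 160  # 20 ms of 8 kHz mu-law audio, one byte per sample


class AudioChunker:
    """Accumulate small audio frames and release chunks of at least 20 ms."""

    def __init__(self, min_bytes: int = MIN_CHUNK_BYTES) -> None:
        self._min_bytes = min_bytes
        self._buffer = bytearray()

    def push(self, frame: bytes) -> List[bytes]:
        """Add a frame; return a list with zero or one chunk ready to send."""
        self._buffer.extend(frame)
        if len(self._buffer) < self._min_bytes:
            return []
        chunk = bytes(self._buffer)
        self._buffer.clear()
        return [chunk]
```

Each chunk returned by `push` would then be forwarded to the speech-to-text connection.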
Speech-to-Text Conversion: Deepgram converts live audio data into text. Once the final speech pause time is reached, the service gathers the transcribed sentences and forwards them to the OpenAI LLM through Langchain.
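The pause-based gathering described here can be sketched as a small collector that joins transcript fragments and releases them once the speaker has been silent for the configured pause time. The class name `UtteranceCollector` and the use of seconds for timestamps are assumptions made for illustration.

```python
from typing import List, Optional


class UtteranceCollector:
    """Join transcript fragments and release them after a speech pause."""

    def __init__(self, pause_seconds: float) -> None:
        self._pause = pause_seconds
        self._parts: List[str] = []
        self._last_heard: Optional[float] = None

    def add(self, text: str, now: float) -> None:
        """Record a non-empty transcript fragment heard at time `now`."""
        if text.strip():
            self._parts.append(text.strip())
            self._last_heard = now

    def poll(self, now: float) -> Optional[str]:
        """Return the full utterance once the pause has elapsed, else None."""
        if self._last_heard is None or now - self._last_heard < self._pause:
            return None
        utterance = " ".join(self._parts)
        self._parts.clear()
        self._last_heard = None
        return utterance
```

The string returned by `poll` is what would be handed to the language model as the user query.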
Query Processing: The transcribed text (user query), along with a prompt fetched from the database, is passed to the OpenAI LLM module through Langchain. OpenAI (GPT-4o) analyzes the query to determine if a function call is necessary, fetching the latest information via Perplexity if required.
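The function-call decision can be illustrated with a tool definition in the JSON-schema style used by OpenAI function calling, plus a small dispatcher. This is a simplified sketch: the tool name `search_latest_info`, the flattened `tool_call` dictionary, and the injected `search` callable (standing in for a Perplexity request) are all hypothetical.

```python
import json
from typing import Callable, Optional

# Tool definition the model can choose to call (hypothetical name).
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_latest_info",
        "description": "Look up current news or events the model cannot know.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def resolve_tool_call(
    tool_call: Optional[dict], search: Callable[[str], str]
) -> Optional[str]:
    """Run the search backend if the model requested it, else return None."""
    if not tool_call or tool_call.get("name") != SEARCH_TOOL["function"]["name"]:
        return None
    # The model supplies its arguments as a JSON-encoded string.
    arguments = json.loads(tool_call.get("arguments") or "{}")
    return search(arguments["query"])
```

When `resolve_tool_call` returns text, that text would be passed back to the model so it can phrase the final answer. When it returns `None`, the model's direct reply is used.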
Generating and Formatting the Response: Retrieved information is formatted into a human-readable response by OpenAI. The entire AI-generated response is then processed through OpenAI TTS for conversion from text to speech.
Final Response Delivery: The output from OpenAI TTS is base64 encoded with a 24k sample rate and converted to 8k mu-law encoding for compatibility with Twilio. The formatted audio response is sent through Twilio Streaming, allowing users to hear the response in real time on their phone.
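The conversion in this step can be sketched in pure Python, assuming the TTS audio is available as raw 16-bit little-endian PCM at 24 kHz. Because 24,000 / 8,000 = 3, the sketch averages each group of three samples, which is a crude substitute for a proper resampling filter. The mu-law encoder follows the standard G.711 bias-and-segment scheme. The function names and the stream identifier are hypothetical.

```python
import base64
import json
import struct

BIAS = 0x84    # standard G.711 mu-law bias
CLIP = 32635   # largest magnitude allowed before the bias is added


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = magnitude.bit_length() - 8           # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm24k_to_mulaw8k(pcm: bytes) -> bytes:
    """Downsample 24 kHz 16-bit little-endian PCM to 8 kHz mu-law."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    out = bytearray()
    # Average each full group of three samples (crude low-pass + decimate).
    for i in range(0, count - count % 3, 3):
        out.append(linear_to_mulaw(sum(samples[i:i + 3]) // 3))
    return bytes(out)


def twilio_media_message(stream_sid: str, mulaw: bytes) -> str:
    """Wrap mu-law audio in a media message for the Twilio stream."""
    payload = base64.b64encode(mulaw).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )
```

The resulting JSON string is what would be written back over the WebSocket so the caller hears the response.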
Repetition for Continuous Interaction: The steps from Live Data Collection through Final Response Delivery are repeated continuously during the call, ensuring each user query is processed, answered, and delivered promptly.
These steps outline the detailed process flow for handling user interactions in real-time through the AI system, integrating voice handling, speech recognition, natural language processing, and audio response delivery seamlessly.
Key Features Of The Project
Through our development process, we introduced several key features to enhance user engagement and operational efficiency:
- AI Greeting and Personal Context: The AI greets users with a personal touch by addressing them by name, leveraging Google social login for user registration. The greeting message includes a fun fact about the current date and concludes with a query to engage the user. The AI maintains a summary of previous conversations, ensuring seamless context switching throughout the interaction.
- Personas: To enhance user engagement and interest, we introduced six distinct AI personas. Each persona provides responses characterized by its unique traits, ensuring varied and personalized interactions.
- Current Updates: Our application is powered by OpenAI’s GPT-4o. Since GPT-4o does not provide the latest news or current events, we integrated Perplexity AI. By using specific keywords, our system determines whether to source information from OpenAI or Perplexity based on the user’s query.
- Subscriptions: We offer three types of subscriptions:
- Pre-trial: Users get 5 minutes of conversation with the AI without any registration.
- Trial: Users must register on the website to receive 15 minutes of conversation time.
- Paid Subscription: For extended interaction, users can subscribe to a paid plan for unlimited conversation time.
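As a small illustration of how the subscription tiers above could be enforced during a call, the sketch below maps each plan to its conversation allowance and computes the time remaining. The plan keys and the function name `remaining_seconds` are hypothetical, and the actual billing logic may differ.

```python
from typing import Optional

# Minutes of conversation per plan; None means unlimited.
PLAN_MINUTES = {"pre_trial": 5, "trial": 15, "paid": None}


def remaining_seconds(plan: str, used_seconds: int) -> Optional[int]:
    """Return seconds left on the plan, or None when the plan is unlimited."""
    allowance = PLAN_MINUTES[plan]
    if allowance is None:
        return None
    return max(0, allowance * 60 - used_seconds)
```

For example, a pre-trial caller who has used two minutes has `remaining_seconds("pre_trial", 120)` seconds left, which evaluates to 180.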
By refining these features, we ensure a personalized, engaging, and up-to-date experience for all users of the AI GoToBuddy system.
How ScaleReal Made A Difference
The intervention led to transformative results for GoToBuddy:
Advanced Technology Integration:
- Twilio Integration: Seamless implementation for handling calls, ensuring reliable communication channels.
- Deepgram Integration: Precision in speech recognition capabilities, enhancing interaction accuracy.
- OpenAI Models: Utilization of state-of-the-art natural language processing for dynamic and context-aware responses.
User-Centric Design Approach: Tailored user experiences through personalized interactions with distinct AI personas. Real-time response capabilities ensure immediate and relevant feedback to user queries.
Scalable Infrastructure: Implemented scalable infrastructure to accommodate growing user demands. Optimized data handling processes for efficient and effective operations.
Innovation and Adaptability: Constant exploration and integration of new tools and technologies to refine system functionalities. Agile development methodologies ensure adaptability to evolving project requirements and user needs.
Client Success and Satisfaction: Exceeded client expectations by delivering a robust and feature-rich AI-powered communication platform. Enhanced user engagement and satisfaction through intuitive design and seamless functionality.
Beta Launch: The application has successfully catered to approximately 50-100 users during the beta launch phase. We received overwhelmingly positive feedback, highlighting the system’s engaging interactions and seamless user experience. Users appreciated the personalized greetings, context-aware conversations, and diverse AI personas.
Conclusion
ScaleReal’s partnership with GoToBuddy has been transformative. Integrating Twilio for seamless calls, Deepgram for accurate speech recognition, and OpenAI’s latest models has created a robust platform. Personalized interactions through AI personas and real-time responses have enhanced user engagement. Scalable infrastructure ensures readiness for growth, underscoring ScaleReal’s commitment to AI innovation and client success.
This case study exemplifies ScaleReal’s dedication to pushing the boundaries of AI technology to deliver exceptional user experiences and drive meaningful business outcomes.