Voice-Enabled AI Coach for Jobcase App

What is the Jobcase App?
The Jobcase app is the go-to platform for everyday workers seeking career advice, profile-building tools, and networking opportunities. With guidance from career coaches, users can create standout profiles that attract employers, and connect with a supportive community to advance their careers. Jobcase believes that in an era where automation is reshaping the job market, these resources are crucial for helping workers navigate and succeed in the evolving world of work.

Problem Statement
Given the demanding schedules of workers who juggle family responsibilities, transition between various tasks, and explore new employment opportunities, the existing keyboard-based interaction with their AI coach can be labor-intensive and inefficient.
Hypothesis
Introducing voice interaction to the AI Coach will save users time by enabling them to dictate responses instead of typing on their phones. This enhancement will increase engagement with the coach and create a more natural flow for updating profiles and resumes, searching for jobs, seeking advice, and practicing for interviews. We envision this feature expanding to include biometrics and camera/video functionality, providing real-time sentiment feedback along with analysis of stress levels, facial expressions, eye contact, and body language. How might we design a voice-enabled interface that facilitates seamless interaction, allowing users to engage with their AI coach efficiently during their busy daily routines?
Design Goals
1. Intuitive User Experience:
• Ensure the voice interface is user-friendly and easy to navigate.
• Implement natural language processing for effective understanding and response to user inputs.
2. Benchmarking and Best Practices:
• Use the existing web voice service as a benchmark.
• Incorporate successful elements and industry best practices from other AI voice interfaces.
3. Feature Awareness and Adoption:
• Automatically introduce the voice feature to users during initial interactions.
• Include onboarding tutorials to highlight the benefits and usage of the voice modality.
The people we set out to design for

Competitive Intelligence
We looked at Copilot, ChatGPT, Snapchat, Perplexity, and Replika for competitive intelligence insights and for inspiration from their voice-enabled functionality. I reviewed their common design patterns to understand timing and transitions, with the end goal of replicating those same patterns in our app.

My research surfaced diverse approaches to implementing voice modality, with each application using its own method. There is also no universal design standard for indicating voice input: different apps use different icons, such as headsets, microphones, and phones, to represent this feature.
Exploratory User Research
To understand user pain points with the current keyboard-based AI coach, I conducted surveys and interviews with a diverse group of users. The exploratory research indicates a growing acceptance and appreciation for voice-activated AI job assistants. Users are particularly drawn to the efficiency and ease of use, as well as the potential for personalized career assistance. Future improvements should focus on enhancing voice recognition accuracy, providing user guidance, offering customization options, integrating with productivity tools, and ensuring transparent, detailed feedback. These enhancements can drive greater user satisfaction and broader adoption of AI job assistants.
Key insights:
1. Users found typing responses to be time-consuming and preferred hands-free interaction, especially while multitasking at home.
2. Users also didn't always know what to ask the coach besides help looking for a job.
3. There was significant interest in AI coaches or job assistants that could provide personalized advice, job matching, and resume improvement suggestions.
Mobile Web Voice Modality Benchmark
To ensure we were building the right product for our members in the native app, I decided to user-test the mobile web version of the voice modality. The insights gained from this testing were invaluable in improving the native version. This approach also benefited the web engineers, who had designed the web version with minimal guidance from the design team so that they could quickly develop an MVP for testing.

Key Issues Identified:
Users encountered several challenges with the current design, particularly with the tap states of input forms and icons. While most users understood that the phone icon indicated they could call their coach, many suggested that a microphone icon would be more intuitive for speech-to-text functionality.
Response time was a notable issue, with users experiencing unresponsive input forms and unexpected keyboard pop-ups. Additionally, some users were surprised that the AI automatically sent their replies without giving them a chance to edit their text. This highlighted a desire for more control over the final message.
Accents also posed a problem, as users with accents felt the AI misunderstood them, impacting their experience. Despite these issues, users were pleased with the thoroughness and helpfulness of the AI coach.
Exploratory concepts
Concept 1: Voice-Based Human-AI Interaction with Animated Coach

The concept involves a voice-based interaction system between users and an AI career coach named Jaycee, who is depicted as an animated, half-cartoon, half-human figure to make clear that it is an AI. This system is designed to offer personalized career advice and assistance through natural voice interactions.
Key Features:
- Voice Interaction: Users speak directly to the AI coach via a microphone, simulating a two-way conversation.
- Animated AI Coach: Jaycee uses deep learning to animate a realistic face, enhancing user engagement.
- Permission Prompt: The app requests microphone access to enable voice interactions.
- Interactive Dialog: Jaycee listens and responds to user queries, offering assistance such as building resumes and finding jobs.
- Chat History: Users can review their conversation history with Jaycee, ensuring continuity and context in ongoing career guidance.
This design aims to provide a more intuitive and efficient way for users to interact with their AI career coach, leveraging voice-to-voice communication and animated visuals to create a seamless and engaging user experience.
Concept 2: Voice-Based Real-Time Translation: Voice-to-Voice with Animated Shape
This concept focuses on enabling real-time dictation and translation for users interacting with an AI career coach, represented by an animated blob. The system listens to user input through a microphone and provides instant transcription and audio responses, enhancing the efficiency of communication.
Key Features:
- Real-Time Dictation and Translation: Users speak into the microphone, and the system transcribes and translates the input on the fly, displaying the text and responding with audio.
- Animated Blob Interface: The interface features an animated blob that visually indicates listening and processing states, offering a dynamic and engaging user experience.
- Permission Prompt: Similar to the previous concept, the app requests microphone access to facilitate voice interactions.
- Interactive Dialog: Users can ask questions or seek assistance from the AI coach, who provides immediate, voice-based responses.
- Chat History: Users can review their conversation history to maintain context and continuity in their interactions with the AI coach.
This design aims to streamline the user experience by providing instant, voice-based interactions with an AI career coach, leveraging real-time transcription and translation to improve communication and user satisfaction.

Strategic Hybrid Approach for Enhanced User Experience and MVP Direction
Due to limitations in animating the AI coach, we opted for a hybrid approach that combines elements from Concept 1 and Concept 2 until technology allows for animating a flat image. This solution was chosen to provide the best possible experience for our users in the current context. To complement this, we’ve thoughtfully crafted an onboarding process that guides users on how to engage with the coach, including suggestions on what to ask if they remain silent. This approach ensures users receive a meaningful interaction even as we await advancements in animation technology.
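As a rough illustration of the "suggestions on what to ask if they remain silent" behavior, the sketch below returns a suggestion once the user has been quiet for a set time. It is only a sketch under stated assumptions: the function name, the 8-second threshold, and the exact wording of the suggestions are hypothetical, with the suggestion topics borrowed from the use cases named in the hypothesis (resumes, job searches, interview practice).

```typescript
// Assumed value: the product's real silence threshold is not stated in this case study.
const SILENCE_TIMEOUT_MS = 8000;

// Illustrative wording only; topics mirror the use cases listed in the hypothesis.
const SUGGESTED_PROMPTS: readonly string[] = [
  "Help me update my resume",
  "Find jobs near me",
  "Practice interview questions with me",
];

// Returns a suggestion to show once the user has been silent long enough,
// or null while the user is still within the silence threshold.
// promptsShown rotates through the suggestions so the same one is not repeated back to back.
function promptForSilence(msSinceLastSpeech: number, promptsShown: number): string | null {
  if (msSinceLastSpeech < SILENCE_TIMEOUT_MS) return null;
  return SUGGESTED_PROMPTS[promptsShown % SUGGESTED_PROMPTS.length] ?? null;
}
```

Keeping the function pure (elapsed time in, suggestion out) would let the threshold and copy be tuned from usability testing without touching the audio code.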

Enhancing User Experience Through AI Microanimations
Microanimations in an AI coach’s interface, such as flat lines for listening, wave patterns for talking, and pulsing or rotating circles for thinking, are designed to enrich user interaction by visually communicating the AI's state. A flat line during listening signifies steady, passive attentiveness, indicating that the AI is focused on the user's input. When the AI is processing or thinking, pulsing or rotating circles offer a dynamic visual cue of activity, suggesting that the system is engaged in deep analysis. During speaking, varying wave patterns animate the AI’s responses, representing the fluidity and variability of conversation. These signals not only enhance the user experience by providing clear, visual feedback but also create a more engaging and intuitive interaction by mirroring the natural flow of communication.
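The mapping from AI state to microanimation described above can be sketched as a small state machine. The three states and their visual cues come from the description; the type names, event names, and transition order are assumptions made for illustration and may not match the shipped implementation.

```typescript
// States named in the case study.
type CoachState = "listening" | "thinking" | "speaking";

// Visual cues named in the case study, one per state.
type Microanimation = "flat-line" | "pulsing-circles" | "wave-pattern";

const animationFor: Record<CoachState, Microanimation> = {
  listening: "flat-line",      // steady, passive attentiveness
  thinking: "pulsing-circles", // processing the user's input
  speaking: "wave-pattern",    // the coach is responding
};

// Hypothetical events that move the coach between states.
type CoachEvent = "userFinishedSpeaking" | "responseReady" | "responseFinished";

// Assumed cycle: listening -> thinking -> speaking -> listening.
// Events that do not apply to the current state leave it unchanged.
function nextState(state: CoachState, event: CoachEvent): CoachState {
  if (state === "listening" && event === "userFinishedSpeaking") return "thinking";
  if (state === "thinking" && event === "responseReady") return "speaking";
  if (state === "speaking" && event === "responseFinished") return "listening";
  return state;
}
```

Because each state maps to exactly one animation, the interface never shows two cues at once, which keeps the visual feedback unambiguous.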

Personalizing Voice Modality and Close-Ups for a Deeper Connection
The final UI designs emphasize voice modality and detailed close-ups of the AI coach, allowing users to forge a more personal connection and fully appreciate the Midjourney artwork. By showcasing the coach's visual and vocal presence in greater detail, users can engage more intimately with their coach, enhancing the overall experience and fostering a stronger sense of familiarity and connection. These images of coaches were created with Midjourney using the following prompt:
Prompt: a 3d portrait of a [insert gender, and category] in the style of unreal engine 5, soft-focus, wearing glasses [insert any type of clothing or style i wanted them to have] approachable smile, realistic lighting, maya rendering, realistic and hyper-detailed renderings, in the office isolated illustration, daytime soft natural lighting

Next Steps: Integrating Vasa 1 and Hume.AI for Future Enhancements
We are committed to continual improvement and innovation. The future integration of Vasa 1 and Hume.AI promises to elevate the AI coach experience by animating static images and providing natural, empathetic voice-overs. This advanced combination would create a more immersive and expressive interaction. However, due to current constraints on timing, finances, and ChatGPT token costs, the implementation of Vasa 1 and Hume.AI is postponed. In the interim, we will employ a more basic animation approach to maintain an engaging experience with the resources available.

Conclusion
To ensure consistency and a cohesive user experience across Jobcase, all teams will follow a unified voice modality feature set that aligns with our brand’s guidelines and values. This standardization covers voice interactions across the AI Digital Hiring Assistant, AI Career Counselors, and other voice-activated tools, creating a seamless experience for users whether they're engaging with Jobcase directly or through a third party. By establishing clear and consistent voice rules, we also reduce engineering, product, and design time, enabling teams to work more efficiently and streamline feature development. This consistency sets the stage for expanding our offerings as a white-label service, allowing external partners to leverage our voice technology within their own branded ecosystems.