A real-time voice assistant powered by Google Gemini Live API. Talk to it like you'd talk to a friend—it's creative, conversational, and actually useful.
- Listens in real-time using Web Audio API and AudioWorklet for low-latency processing
- Responds naturally with a personality that feels like chatting with someone who gets you
- Uses tools to search the web, get news, play music, tell jokes, and more
- Streams responses so you hear answers as they're being generated
- Visualizes audio with an animated waveform that reacts to what's happening
- Frontend: React + TypeScript + Vite
- Audio: Web Audio API + AudioWorklet for real-time PCM processing
- API: Google Gemini Live API for streaming responses
- Styling: Tailwind CSS
- Development: HMR with ngrok tunneling for remote testing
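For tunneled HMR, the dev server needs to accept external connections and route its websocket through the tunnel. A minimal sketch of what that could look like (exact settings depend on your ngrok setup and Vite version):

```typescript
// vite.config.ts - illustrative sketch, not the project's actual config
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    host: true,        // listen on all interfaces so the tunnel can reach it
    port: 3000,
    hmr: {
      clientPort: 443, // HMR websocket connects back through the https tunnel
    },
  },
});
```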
- Node.js 16+ and npm
- A Gemini API key from Google AI Studio
- Clone and install dependencies:
```bash
npm install
```

- Create a `.env.local` file:

```
VITE_GEMINI_API_KEY=your_api_key_here
VITE_GEMINI_MODEL=gemini-2.0-flash-exp
```

- Start the dev server:

```bash
npm run dev
```

- Open http://localhost:3000 in your browser
- Captures microphone input using the Web Audio API
- Uses an AudioWorklet (`PCMProcessor`) to process audio on a separate thread
- Buffers audio into 2048-sample chunks (~128 ms at 16 kHz) to keep latency low
- Converts to base64 for transmission to the Gemini API
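The buffering and encoding steps above can be sketched as a few pure helpers. Names here (`PcmChunker`, `floatTo16BitPCM`, `pcmToBase64`) are illustrative, not the app's actual code:

```typescript
// Sketch of the capture pipeline's buffering and encoding steps.

const CHUNK_SIZE = 2048; // samples (~128 ms at 16 kHz)

// Accumulates the small frames an AudioWorklet delivers (typically 128
// samples per process() call) into fixed-size chunks for transmission.
class PcmChunker {
  private buffer: number[] = [];

  push(frame: Float32Array): Float32Array[] {
    this.buffer.push(...frame);
    const chunks: Float32Array[] = [];
    while (this.buffer.length >= CHUNK_SIZE) {
      chunks.push(Float32Array.from(this.buffer.splice(0, CHUNK_SIZE)));
    }
    return chunks;
  }
}

// Convert Web Audio's [-1, 1] float samples to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Base64-encode the raw little-endian PCM bytes for the API payload.
function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary); // btoa is available in browsers and Node 16+
}
```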
- Sends audio chunks to Gemini Live API as they arrive
- Receives text responses incrementally and renders them in real-time
- Plays back audio responses using the Web Audio API
The assistant can be extended with tools; the demo examples can:
- Search the web for current info
- Get the time and set reminders
- Check weather and news
- Play music and tell jokes
- Calculate and translate
- Generate code and look up definitions
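A tool pairs a declaration (so the model knows when to call it) with a local handler (so the app can execute it). A hypothetical sketch in the style of Gemini function calling; the tool name, schema, and handler below are illustrative, not the app's actual tool set:

```typescript
// Hypothetical tool declaration + dispatcher sketch.

interface ToolDeclaration {
  name: string;
  description: string;
  parameters: {
    type: string;
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Declaration the model sees: name, purpose, and argument schema.
const tellJoke: ToolDeclaration = {
  name: "tell_joke",
  description: "Tell a short joke, optionally about a given topic",
  parameters: {
    type: "object",
    properties: { topic: { type: "string", description: "Joke topic" } },
  },
};

// Handlers the app runs when the model requests a tool call.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  tell_joke: (args) =>
    `Why did the ${args.topic ?? "computer"} cross the road? To fetch the other side.`,
};

function executeTool(name: string, args: Record<string, unknown>): string {
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```

The model chooses which tool to call and with what arguments; the app executes the matching handler and sends the result back as the tool response.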
src/
├── components/ # React components (Visualizer, Chat, etc.)
├── hooks/ # Custom React hooks (useLiveApi, etc.)
├── utils/ # Utilities (audio processing, tool execution)
├── audio/ # AudioWorklet processor
├── types/ # TypeScript interfaces
├── config/ # Configuration (API settings, tools)
└── styles/ # Global styles
- `npm run dev` - Start dev server
- `npm run build` - Build for production
- `npm run preview` - Preview production build locally
- Audio is captured and buffered locally, then sent to Google's Gemini API for processing
- The visualizer shows real-time audio activity with animated rings and particles
- Animations only run when the assistant is speaking to save CPU
- HMR is configured to work with ngrok for testing on remote devices
MIT