cris-m/voice-assistant

Voice Assistant

A real-time voice assistant powered by the Google Gemini Live API. Talk to it like you'd talk to a friend: it's creative, conversational, and actually useful.

What it does

  • Listens in real time using the Web Audio API and AudioWorklet for low-latency processing
  • Responds naturally with a personality that feels like chatting with someone who gets you
  • Uses tools to search the web, get news, play music, tell jokes, and more
  • Streams responses so you hear answers as they're being generated
  • Visualizes audio with an animated waveform that reacts to what's happening

Tech stack

  • Frontend: React + TypeScript + Vite
  • Audio: Web Audio API + AudioWorklet for real-time PCM processing
  • API: Google Gemini Live API for streaming responses
  • Styling: Tailwind CSS
  • Development: HMR with ngrok tunneling for remote testing

Getting started

Prerequisites

  • Node.js and npm
  • A Google Gemini API key

Setup

  1. Clone and install dependencies:
     npm install
  2. Create a .env.local file:
     VITE_GEMINI_API_KEY=your_api_key_here
     VITE_GEMINI_MODEL=gemini-2.0-flash-exp
  3. Start the dev server:
     npm run dev
  4. Open http://localhost:3000 in your browser
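Vite exposes variables prefixed with VITE_ to client code on import.meta.env. A minimal sketch of validating the two variables at startup might look like this; loadGeminiConfig and GeminiConfig are hypothetical names, not taken from the repo.

```typescript
// Hypothetical config shape; the field names are illustrative, not from the repo.
interface GeminiConfig {
  apiKey: string;
  model: string;
}

// Accepts any string-keyed record so it can be called with import.meta.env in a Vite app.
function loadGeminiConfig(env: Record<string, string | undefined>): GeminiConfig {
  const apiKey = env.VITE_GEMINI_API_KEY?.trim();
  // Treat the placeholder from the setup step as "not configured".
  if (!apiKey || apiKey === "your_api_key_here") {
    throw new Error("VITE_GEMINI_API_KEY is missing; set it in .env.local");
  }
  // Assumption: fall back to the model named in the setup step when none is set.
  const model = env.VITE_GEMINI_MODEL?.trim() || "gemini-2.0-flash-exp";
  return { apiKey, model };
}
```

In the app this would be called once as loadGeminiConfig(import.meta.env), so a missing key fails fast instead of surfacing later as a connection error.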

How it works

Audio Input

  • Captures microphone input using the Web Audio API
  • Uses AudioWorklet (PCMProcessor) to process audio in a separate thread
  • Buffers audio into 2048-sample chunks (2048 / 16,000 Hz ≈ 128 ms per chunk) to keep latency low without sending a message for every small audio block
  • Converts to base64 for transmission to the Gemini API
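The steps above can be sketched roughly as follows. This is not the repo's PCMProcessor, just an illustration of the buffering and encoding. It assumes 16-bit PCM (the README does not state the bit depth), and ChunkBuffer, floatToPcm16, and pcm16ToBase64 are hypothetical names. AudioWorklet delivers audio in small blocks (128 frames by default), so 16 of them fill one 2048-sample chunk.

```typescript
const CHUNK_SIZE = 2048; // samples per chunk, as described above

// Clamp each float sample to [-1, 1] and scale it to a signed 16-bit integer.
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

// Encode the raw bytes of the PCM buffer as base64.
// Byte order follows the platform, which is little-endian on typical hardware.
function pcm16ToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Accumulates arbitrary-sized input blocks and emits fixed-size chunks.
class ChunkBuffer {
  private buffer = new Float32Array(CHUNK_SIZE);
  private filled = 0;
  private readonly onChunk: (chunk: Float32Array) => void;

  constructor(onChunk: (chunk: Float32Array) => void) {
    this.onChunk = onChunk;
  }

  push(input: Float32Array): void {
    let offset = 0;
    while (offset < input.length) {
      const n = Math.min(CHUNK_SIZE - this.filled, input.length - offset);
      this.buffer.set(input.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === CHUNK_SIZE) {
        this.onChunk(this.buffer.slice()); // copy so the caller owns the chunk
        this.filled = 0;
      }
    }
  }
}
```

A caller would create new ChunkBuffer((chunk) => send(pcm16ToBase64(floatToPcm16(chunk)))), where send is whatever function forwards data to the API.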

Streaming

  • Sends audio chunks to Gemini Live API as they arrive
  • Receives text responses incrementally and renders them in real time
  • Plays back audio responses using the Web Audio API
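For playback, a sketch along these lines could decode incoming audio and schedule chunks back to back. It assumes responses arrive as base64-encoded 16-bit little-endian PCM; the README does not state the response format or sample rate, so treat both as assumptions. base64ToFloat32 and PlaybackClock are hypothetical names.

```typescript
// Decode base64 into signed 16-bit samples, then scale to floats in [-1, 1).
function base64ToFloat32(data: string): Float32Array {
  const binary = atob(data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // true = little-endian
  }
  return out;
}

// Tracks when the next chunk should start so consecutive chunks play without gaps.
class PlaybackClock {
  private nextStart = 0;

  // now: current audio clock in seconds (AudioContext.currentTime in a browser).
  // Returns the time at which this chunk should start playing.
  schedule(now: number, sampleCount: number, sampleRate: number): number {
    const startAt = Math.max(now, this.nextStart);
    this.nextStart = startAt + sampleCount / sampleRate;
    return startAt;
  }
}
```

In the browser, each decoded chunk would be copied into an AudioBuffer created with createBuffer, attached to an AudioBufferSourceNode, and started at the time returned by schedule.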

Tools

The assistant can be extended with tools (demo examples):

  • Search the web for current info
  • Get the time and set reminders
  • Check weather and news
  • Play music and tell jokes
  • Calculate and translate
  • Generate code and look up definitions
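One way to structure tools like these is a registry that maps tool names to handlers, with a single dispatcher that turns a tool call into a result. The sketch below is illustrative only: the types, the tools map, executeTool, and the two demo tools are hypothetical and do not reflect the repo's actual tool code.

```typescript
// Hypothetical shapes; the repo's actual tool types may differ.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<unknown> | unknown;

interface ToolCall {
  name: string;
  args: ToolArgs;
}

interface ToolResult {
  name: string;
  ok: boolean;
  output: unknown;
}

const tools = new Map<string, ToolHandler>();

// Two demo tools: current time and a tiny calculator.
tools.set("get_time", () => new Date().toISOString());
tools.set("calculate", (args) => {
  const a = Number(args.a);
  const b = Number(args.b);
  const op = String(args.op);
  switch (op) {
    case "add":
      return a + b;
    case "multiply":
      return a * b;
    default:
      throw new Error(`unsupported op: ${op}`);
  }
});

// Look up the handler, run it, and report failures as data instead of throwing,
// so the assistant can describe the error to the user.
async function executeTool(call: ToolCall): Promise<ToolResult> {
  const handler = tools.get(call.name);
  if (!handler) {
    return { name: call.name, ok: false, output: `unknown tool: ${call.name}` };
  }
  try {
    return { name: call.name, ok: true, output: await handler(call.args) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { name: call.name, ok: false, output: message };
  }
}
```

Adding a tool then means registering one more handler with tools.set; the dispatcher does not change.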

Project structure

src/
├── components/     # React components (Visualizer, Chat, etc.)
├── hooks/          # Custom React hooks (useLiveApi, etc.)
├── utils/          # Utilities (audio processing, tool execution)
├── audio/          # AudioWorklet processor
├── types/          # TypeScript interfaces
├── config/         # Configuration (API settings, tools)
└── styles/         # Global styles

Development

  • npm run dev - Start dev server
  • npm run build - Build for production
  • npm run preview - Preview production build locally

Notes

  • Audio is captured and buffered locally, then sent to Google's Gemini API for processing
  • The visualizer shows real-time audio activity with animated rings and particles
  • Animations only run when the assistant is speaking to save CPU
  • HMR is configured to work with ngrok for testing on remote devices
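Running animations only while the assistant is speaking can be done by gating the frame loop. The sketch below shows one possible approach, not the repo's Visualizer code; AnimationGate is a hypothetical name, and the schedulers are injected so the same class works with requestAnimationFrame in a browser or a stub elsewhere.

```typescript
// Runs a draw callback once per frame, but only between start() and stop().
class AnimationGate {
  private handle: number | null = null;
  private readonly draw: () => void;
  private readonly request: (cb: () => void) => number;
  private readonly cancel: (handle: number) => void;

  constructor(
    draw: () => void,
    request: (cb: () => void) => number,
    cancel: (handle: number) => void,
  ) {
    this.draw = draw;
    this.request = request;
    this.cancel = cancel;
  }

  get running(): boolean {
    return this.handle !== null;
  }

  start(): void {
    if (this.handle !== null) return; // already running
    const tick = () => {
      this.draw();
      // Reschedule only if stop() was not called during draw().
      if (this.handle !== null) this.handle = this.request(tick);
    };
    this.handle = this.request(tick);
  }

  stop(): void {
    if (this.handle === null) return;
    this.cancel(this.handle);
    this.handle = null;
  }
}
```

In the browser the gate would be built with (cb) => requestAnimationFrame(cb) and (h) => cancelAnimationFrame(h), calling start() when the assistant begins speaking and stop() when it finishes.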

License

MIT
