Voice Assistant

A real-time voice assistant powered by the Google Gemini Live API. Talk to it like you'd talk to a friend: it's conversational, creative, and can call tools to get things done.

What it does

Listens in real time using the Web Audio API and an AudioWorklet for low-latency processing
Responds naturally, with a conversational personality rather than canned replies
Uses tools to search the web, get news, play music, tell jokes, and more
Streams responses so you hear answers as they're being generated
Visualizes audio with an animated waveform that reacts to what's happening

Tech stack

Frontend: React + TypeScript + Vite
Audio: Web Audio API + AudioWorklet for real-time PCM processing
API: Google Gemini Live API for streaming responses
Styling: Tailwind CSS
Development: HMR with ngrok tunneling for remote testing

Getting started

Prerequisites

Node.js 16+ and npm
A Gemini API key from Google AI Studio

Setup

Clone and install dependencies:

npm install

Create a .env.local file:

VITE_GEMINI_API_KEY=your_api_key_here
VITE_GEMINI_MODEL=gemini-2.0-flash-exp

Start the dev server:

npm run dev

Open http://localhost:3000 in your browser
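
The two variables from .env.local could be read and validated at startup along these lines. This is a minimal sketch, not the project's actual code: the names GeminiConfig and loadGeminiConfig are hypothetical, and falling back to the model name from the example above is an assumption.

```typescript
// Hypothetical config shape; the real project may organize this differently.
interface GeminiConfig {
  apiKey: string;
  model: string;
}

// Assumption: fall back to the model named in the example .env.local.
const DEFAULT_MODEL = "gemini-2.0-flash-exp";

/** Validate the VITE_* variables and return a typed config object. */
function loadGeminiConfig(env: Record<string, string | undefined>): GeminiConfig {
  const apiKey = env.VITE_GEMINI_API_KEY?.trim();
  // Reject a missing key and the placeholder value from the example file.
  if (!apiKey || apiKey === "your_api_key_here") {
    throw new Error("VITE_GEMINI_API_KEY is not set; add it to .env.local");
  }
  const model = env.VITE_GEMINI_MODEL?.trim() || DEFAULT_MODEL;
  return { apiKey, model };
}
```

In a Vite app, variables prefixed with VITE_ are exposed on import.meta.env, so the call would look like loadGeminiConfig(import.meta.env).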

How it works

Audio Input

Captures microphone input using the Web Audio API
Uses AudioWorklet (PCMProcessor) to process audio in a separate thread
Buffers audio into 2048-sample chunks (about 128 ms at 16 kHz), trading a little latency for fewer, larger messages
Converts to base64 for transmission to the Gemini API
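
The buffering and encoding steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's PCMProcessor: the names ChunkBuffer, floatTo16BitPCM, and pcmToBase64 are hypothetical, and 16-bit signed PCM is assumed as the sample format.

```typescript
// Chunk size from the README: 2048 samples (about 128 ms at 16 kHz).
const CHUNK_SIZE = 2048;

/** Convert Float32 samples in [-1, 1] to 16-bit signed PCM (assumed format). */
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

/** Base64-encode the raw bytes of a PCM buffer. */
function pcmToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

/** Accumulates small input frames and emits fixed-size chunks. */
class ChunkBuffer {
  private buffer = new Float32Array(CHUNK_SIZE);
  private filled = 0;

  /** Add a frame; returns zero or more completed chunks. */
  push(frame: Float32Array): Float32Array[] {
    const chunks: Float32Array[] = [];
    let offset = 0;
    while (offset < frame.length) {
      const n = Math.min(CHUNK_SIZE - this.filled, frame.length - offset);
      this.buffer.set(frame.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === CHUNK_SIZE) {
        chunks.push(this.buffer.slice()); // copy, so the buffer can be reused
        this.filled = 0;
      }
    }
    return chunks;
  }
}
```

An AudioWorklet typically receives 128-sample frames, so a chunk would complete on every 16th frame. Each completed chunk could then be passed through floatTo16BitPCM and pcmToBase64 before being sent.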

Streaming

Sends audio chunks to Gemini Live API as they arrive
Receives text responses incrementally and renders them in real time
Plays back audio responses using the Web Audio API
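
The playback step might look something like the sketch below. It assumes response audio arrives as base64-encoded 16-bit little-endian PCM, which this README does not state; base64ToFloat32 and PlaybackScheduler are hypothetical names used only for illustration.

```typescript
/** Decode base64 16-bit little-endian PCM into Float32 samples in [-1, 1). */
function base64ToFloat32(b64: string): Float32Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

/** Tracks when the next chunk should start so streamed chunks play back to back. */
class PlaybackScheduler {
  private nextStart = 0;

  /** Returns the start time for a chunk of the given duration, in seconds. */
  schedule(now: number, durationSec: number): number {
    // Never schedule in the past; otherwise queue right after the previous chunk.
    const start = Math.max(now, this.nextStart);
    this.nextStart = start + durationSec;
    return start;
  }
}
```

In the browser, the decoded samples could be copied into an AudioBuffer created with AudioContext.createBuffer, attached to an AudioBufferSourceNode, and started at the time returned by schedule(ctx.currentTime, buffer.duration). The output sample rate depends on the API response and is not specified here.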

Tools

The assistant can be extended with tools (demo examples):

Search the web for current info
Get the time and set reminders
Check weather and news
Play music and tell jokes
Calculate and translate
Generate code and look up definitions
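
One way to structure such tools is a small registry that maps a tool name to a handler. The sketch below is illustrative only: ToolDefinition, registerTool, executeTool, and the sample calculate tool are hypothetical and do not describe the project's actual tool code or the Gemini function-calling schema.

```typescript
// Hypothetical handler signature: named arguments in, any result out.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown> | unknown;

interface ToolDefinition {
  name: string;
  description: string;
  handler: ToolHandler;
}

interface ToolResult {
  ok: boolean;
  result?: unknown;
  error?: string;
}

const tools = new Map<string, ToolDefinition>();

/** Register a tool under its name, replacing any earlier definition. */
function registerTool(tool: ToolDefinition): void {
  tools.set(tool.name, tool);
}

/** Look up a tool by name and run it, turning failures into error results. */
async function executeTool(name: string, args: Record<string, unknown>): Promise<ToolResult> {
  const tool = tools.get(name);
  if (!tool) {
    return { ok: false, error: `Unknown tool: ${name}` };
  }
  try {
    return { ok: true, result: await tool.handler(args) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Example tool: adds two numbers passed as "a" and "b".
registerTool({
  name: "calculate",
  description: "Add two numbers",
  handler: ({ a, b }) => Number(a) + Number(b),
});
```

Returning an error object instead of throwing lets the caller report a failed tool call back to the model without interrupting the session.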

Project structure

src/
├── components/     # React components (Visualizer, Chat, etc.)
├── hooks/          # Custom React hooks (useLiveApi, etc.)
├── utils/          # Utilities (audio processing, tool execution)
├── audio/          # AudioWorklet processor
├── types/          # TypeScript interfaces
├── config/         # Configuration (API settings, tools)
└── styles/         # Global styles

Development

npm run dev - Start dev server
npm run build - Build for production
npm run preview - Preview production build locally

Notes

Audio is captured and buffered locally, then sent to Google's Gemini API for processing
The visualizer shows real-time audio activity with animated rings and particles
To save CPU, animations run only while the assistant is speaking
HMR is configured to work with ngrok for testing on remote devices
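
Running animations only while the assistant is speaking could be done with a small gate around the frame loop. This is a sketch under assumptions: GatedAnimation is a hypothetical name, and the scheduler is injected so the logic does not depend on a browser.

```typescript
type Schedule = (cb: () => void) => number;
type Cancel = (id: number) => void;

/** Runs a draw loop only while "speaking" is true. */
class GatedAnimation {
  private frameId: number | null = null;
  private readonly draw: () => void;
  private readonly schedule: Schedule;
  private readonly cancel: Cancel;

  constructor(draw: () => void, schedule: Schedule, cancel: Cancel) {
    this.draw = draw;
    this.schedule = schedule;
    this.cancel = cancel;
  }

  // Draw one frame, then request the next one.
  private tick = (): void => {
    this.draw();
    this.frameId = this.schedule(this.tick);
  };

  get running(): boolean {
    return this.frameId !== null;
  }

  /** Start the loop when speaking begins; cancel the pending frame when it ends. */
  setSpeaking(speaking: boolean): void {
    if (speaking && this.frameId === null) {
      this.tick();
    } else if (!speaking && this.frameId !== null) {
      this.cancel(this.frameId);
      this.frameId = null;
    }
  }
}
```

In the browser, the scheduler arguments would be wrapped rather than passed directly, for example (cb) => requestAnimationFrame(cb) and (id) => cancelAnimationFrame(id), because these functions must be called with window as their receiver.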

License

MIT
