babelcloud/novnc-audio-server

noVNC Audio Server

A standalone audio server for noVNC that captures system audio via PulseAudio, encodes with Opus, and streams over WebSocket.

📌 Important: This server requires the babelcloud/noVNC fork which includes client-side audio support. The upstream novnc/noVNC does not support audio playback.

Features

  • 🎵 Audio Capture: Captures system audio via PulseAudio monitor sources
  • 🗜️ Opus Encoding: Efficient audio compression with the Opus codec (falls back to PCM when no native encoder is available)
  • 🌐 WebSocket Streaming: Real-time audio streaming to multiple clients
  • ⏱️ Timestamp Sync: Adds timestamps for audio-video synchronization
  • 🔧 TypeScript: Full type safety and excellent developer experience
  • ✅ Tested: Comprehensive test suite with Vitest

⚠️ Important: Audio playback requires user interaction in the browser (click or keypress) due to browser autoplay policies. This is a browser security feature, not a bug.
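
For readers integrating their own client, the autoplay requirement can be handled roughly as follows. This is a minimal sketch, not the code used in babelcloud/noVNC: `unlockAudioOnGesture` and the `ResumableAudio` interface are hypothetical names, and the sketch assumes playback goes through a Web Audio `AudioContext` that starts in the `suspended` state.

```typescript
// Minimal shape of the parts of AudioContext this sketch relies on.
interface ResumableAudio {
  state: string;
  resume(): Promise<void>;
}

// Hypothetical helper: resumes `ctx` on the first click or keypress seen on `target`.
function unlockAudioOnGesture(ctx: ResumableAudio, target: EventTarget): void {
  const gestures = ['click', 'keydown'];

  const unlock = (): void => {
    // Remove the listeners first so the resume is attempted only once.
    for (const type of gestures) target.removeEventListener(type, unlock);
    if (ctx.state === 'suspended') {
      void ctx.resume();
    }
  };

  for (const type of gestures) target.addEventListener(type, unlock);
}
```

In a browser this would be called as `unlockAudioOnGesture(new AudioContext(), document)`. Frames that arrive before the first gesture can be dropped or buffered; which is better depends on how much start-up delay is acceptable.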

Architecture

Audio is streamed over its own WebSocket connection, separate from the VNC video stream:

┌─────────────────────────────────────────┐
│  Server Container                       │
│                                         │
│  ┌──────────┐      ┌───────────────┐    │
│  │  x11vnc  │      │  PulseAudio   │    │
│  └────┬─────┘      └───────┬───────┘    │
│       │                    │ monitor    │
│  ┌────▼─────┐      ┌───────▼────────┐   │
│  │websockify│      │  novnc-audio-  │   │
│  │:6080/vnc │      │  server :6090  │   │
│  └──────────┘      └────────────────┘   │
└─────────────────────────────────────────┘
           │                  │
           ↓ Video            ↓ Audio
    ┌──────────────────────────────────┐
    │  Browser (noVNC Client)          │
    └──────────────────────────────────┘

Prerequisites

Server Side (This Project)

  • Node.js >= 18.0.0
  • pnpm (recommended) or npm
  • PulseAudio (Linux only)
    # Ubuntu/Debian
    sudo apt-get install pulseaudio pulseaudio-utils
    
    # Fedora/RHEL
    sudo dnf install pulseaudio pulseaudio-utils

Note: This server requires Linux with PulseAudio. macOS and Windows are not supported.

Client Side (Browser)

  • babelcloud/noVNC - Required!

    The upstream noVNC does not include audio support. You must use the babelcloud fork:

    # Clone the audio-enabled noVNC fork
    git clone https://github.com/babelcloud/noVNC.git
    cd noVNC
    
    # Install and build
    pnpm install
    pnpm run build
    
    # Serve the client
    # ... (use your web server or noVNC's built-in server)

    What's included in the fork:

    • WebSocket audio stream connection
    • Opus decoder (opus-decoder)
    • Audio playback via Web Audio API
    • Audio/video synchronization
    • Audio controls in UI

Native Opus Encoder (Recommended)

For better performance and lower bandwidth, install a native Opus encoder:

# Recommended - actively maintained
pnpm add @discordjs/opus

⚠️ Build Requirements: Native Opus encoder requires compilation:

# Ubuntu/Debian
sudo apt-get install build-essential python3 libopus-dev

# Fedora/RHEL
sudo dnf install gcc-c++ make python3 opus-devel

# Alpine Linux
apk add --no-cache python3 make g++ opus-dev

Docker Users: Install build tools before pnpm install:

RUN apt-get update && apt-get install -y \
    python3 make g++ pkg-config libopus-dev
RUN pnpm install

⚠️ pnpm v10+ Security Policy: If build scripts are blocked, manually trigger compilation:

# After pnpm install, manually build native module
RUN cd node_modules/.pnpm/@discordjs+opus@*/node_modules/@discordjs/opus && \
    npm run install

Verification:

# Check if native module is built
ls -la node_modules/.pnpm/@discordjs+opus@*/node_modules/@discordjs/opus/prebuild/

# You should see compiled .node files

Without a native encoder, the server automatically falls back to PCM passthrough, which needs far more bandwidth (about 1.5 Mbps for 48 kHz stereo).

Quick Start

1. Setup noVNC Client (Required)

# Clone the audio-enabled noVNC fork
git clone https://github.com/babelcloud/noVNC.git
cd noVNC
pnpm install && pnpm build

# Start noVNC server (example)
./utils/novnc_proxy --vnc localhost:5900 --listen 6080

2. Setup Audio Server (This Project)

# Clone this repository
git clone https://github.com/babelcloud/novnc-audio-server.git
cd novnc-audio-server

# Install dependencies (including native Opus encoder)
pnpm install

# Build
pnpm build

# Start audio server
pnpm start -- --port 6090 --device default

3. Connect

  1. Open browser: http://localhost:6080/vnc.html
  2. Connect to VNC server
  3. Click anywhere on the page to enable audio (browser autoplay policy)
  4. Audio starts playing as soon as the page has received that click or keypress

Audio URL Configuration: The noVNC client automatically builds the audio URL from the VNC URL. For example:

  • VNC: ws://localhost:6080/websockify → Audio: ws://localhost:6090/audio
  • VNC: wss://example.com:6081/vnc → Audio: wss://example.com:6091/audio
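
The two examples suggest a simple mapping: keep the scheme and host, add 10 to the port, and replace the path with `/audio`. The sketch below implements that reading. The rule is inferred from the examples only, so treat the `portOffset` default and the `deriveAudioUrl` name as assumptions rather than the fork's actual logic.

```typescript
// Hypothetical helper: the "+10" port offset and the "/audio" path are inferred
// from the two examples above, not taken from the noVNC fork's source.
function deriveAudioUrl(vncUrl: string, portOffset = 10, audioPath = '/audio'): string {
  const url = new URL(vncUrl);

  // An empty port means the scheme default (80 for ws, 443 for wss).
  const defaultPort = url.protocol === 'wss:' ? 443 : 80;
  const vncPort = url.port === '' ? defaultPort : Number(url.port);

  url.port = String(vncPort + portOffset);
  url.pathname = audioPath;
  url.search = '';
  url.hash = '';
  return url.toString();
}

console.log(deriveAudioUrl('ws://localhost:6080/websockify')); // ws://localhost:6090/audio
console.log(deriveAudioUrl('wss://example.com:6081/vnc'));     // wss://example.com:6091/audio
```

If the audio server listens on a different port or path (see `--port` and `--path` under CLI Options), the client-side mapping has to change to match.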

Installation

# Clone repository
git clone https://github.com/babelcloud/novnc-audio-server.git
cd novnc-audio-server

# Install dependencies
pnpm install

# Build
pnpm build

Usage

Development Mode

# Hot reload for development
pnpm dev

# With custom options
pnpm dev -- --port 6090 --device default

Production Mode

Option 1: Using pnpm (Recommended)

# Build and run
pnpm build
pnpm start

# With custom options
pnpm start -- --port 6090 --device default

Option 2: Global Installation (Most Convenient)

# Install globally
pnpm link --global

# Now you can use it anywhere
novnc-audio --port 6090 --device default

Option 3: Direct Execution (For Scripts/Containers)

# Direct execution (after build)
./dist/main.js --port 6090 --device default

# Or with node
node dist/main.js --port 6090 --device default

CLI Options

Options:
  -p, --port <port>              WebSocket port (default: "6090")
  -H, --host <host>              Host to bind to (default: "0.0.0.0")
  --path <path>                  WebSocket path (default: "/audio")
  -d, --device <device>          PulseAudio device name (default: "default")
  -r, --sample-rate <rate>       Audio sample rate in Hz (default: "48000")
  -c, --channels <channels>      Audio channels - 1=mono, 2=stereo (default: "2")
  -b, --bitrate <bitrate>        Opus bitrate in bps (default: "64000")
  -l, --latency <latency>        Capture latency in microseconds (default: "20000")
  --no-monitor                   Do not append .monitor to device name
  --codec <codec>                Audio codec: opus or pcm (default: "opus")
  -h, --help                     Display help

Examples

# Basic usage (after global install)
novnc-audio --port 6090 --device default

# Or with pnpm
pnpm start -- --port 6090 --device default

# Custom PulseAudio device
novnc-audio --device alsa_output.pci-0000_00_1f.3.analog-stereo

# Low latency, high quality
novnc-audio --latency 10000 --bitrate 128000 --sample-rate 48000

# PCM mode (no encoding)
novnc-audio --codec pcm

Programmatic Usage

import { AudioServer, AudioCodec } from 'novnc-audio-server';

const server = new AudioServer({
  capture: {
    device: 'default',
    sampleRate: 48000,
    channels: 2,
    format: 's16le',
    latency: 20000,
    useMonitor: true,
  },
  encoder: {
    sampleRate: 48000,
    channels: 2,
    bitrate: 64000,
    frameSize: 960,
  },
  websocket: {
    port: 6090,
    path: '/audio',
    host: '0.0.0.0',
  },
  codec: AudioCodec.OPUS,
});

await server.start();

// Get statistics
const stats = server.getStats();
console.log(stats);

// Graceful shutdown
await server.stop();
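
The `frameSize` and `sampleRate` values in the encoder config are linked: 960 samples at 48000 Hz is one 20 ms frame, which matches the default capture latency of 20000 microseconds. The helpers below are a small sketch for checking a custom combination before starting the server. They are not part of this package's API, and they assume `frameSize` is counted in samples per channel, as in the Opus API.

```typescript
// Frame durations (in ms) that Opus accepts for a single frame.
const OPUS_FRAME_DURATIONS_MS = [2.5, 5, 10, 20, 40, 60];

// Assumes frameSize is counted in samples per channel.
function frameDurationMs(frameSize: number, sampleRate: number): number {
  return (frameSize * 1000) / sampleRate;
}

function isValidOpusFrame(frameSize: number, sampleRate: number): boolean {
  const duration = frameDurationMs(frameSize, sampleRate);
  return OPUS_FRAME_DURATIONS_MS.some((allowed) => allowed === duration);
}

// Bytes of interleaved s16le PCM that make up one frame before encoding.
function pcmBytesPerFrame(frameSize: number, channels: number): number {
  return frameSize * channels * 2; // 2 bytes per 16-bit sample
}

console.log(frameDurationMs(960, 48000));  // 20
console.log(isValidOpusFrame(960, 48000)); // true
console.log(pcmBytesPerFrame(960, 2));     // 3840
```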

Audio Frame Format

Audio frames sent over WebSocket follow this binary format:

┌──────────────┬──────────┬──────────────┐
│  Timestamp   │  Codec   │  Audio Data  │
│  (8 bytes)   │ (1 byte) │   (variable) │
│  BigUInt64BE │  0x00/01 │    Buffer    │
└──────────────┴──────────┴──────────────┘
  • Timestamp: Milliseconds since server start (for A/V sync)
  • Codec: 0x00 = PCM, 0x01 = Opus
  • Audio Data: Encoded audio bytes
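
The layout above can be written and read with Node's `Buffer` as in the sketch below. `packAudioFrame`, `unpackAudioFrame`, and the `Codec` object are illustrative names, not exports of this package; only the byte layout comes from the description above.

```typescript
import { Buffer } from 'node:buffer';

const Codec = { PCM: 0x00, OPUS: 0x01 } as const;
const HEADER_BYTES = 9; // 8-byte timestamp + 1-byte codec

// Hypothetical helper: packs one frame as [timestamp BE u64][codec u8][audio].
function packAudioFrame(timestampMs: number, codec: number, audio: Uint8Array): Buffer {
  const frame = Buffer.alloc(HEADER_BYTES + audio.length);
  frame.writeBigUInt64BE(BigInt(Math.floor(timestampMs)), 0);
  frame.writeUInt8(codec, 8);
  frame.set(audio, HEADER_BYTES);
  return frame;
}

// Hypothetical helper: the inverse of packAudioFrame.
function unpackAudioFrame(frame: Buffer): { timestampMs: bigint; codec: number; audio: Buffer } {
  if (frame.length < HEADER_BYTES) {
    throw new RangeError('frame is shorter than the 9-byte header');
  }
  return {
    timestampMs: frame.readBigUInt64BE(0),
    codec: frame.readUInt8(8),
    audio: frame.subarray(HEADER_BYTES), // a view, not a copy
  };
}

const frame = packAudioFrame(1234, Codec.OPUS, Uint8Array.of(1, 2, 3));
console.log(frame.length);                  // 12
console.log(unpackAudioFrame(frame).codec); // 1
```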

PulseAudio Device Management

# List all audio output devices
pactl list short sinks

# List monitor sources (for capturing output)
pactl list short sources | grep monitor

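# Test audio capture (the sample format, rate, and channel count must match on both sides)
parec --device=default.monitor --format=s16le --rate=48000 --channels=2 | aplay -t raw -f S16_LE -r 48000 -c 2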

Client Integration

📌 Note: Client-side audio support is already implemented in babelcloud/noVNC. You don't need to implement this yourself!

The noVNC fork includes:

  • core/audio/audiostream.js - WebSocket audio connection
  • core/audio/audio-decoder.js - Opus/PCM decoder
  • core/audio/audio-player.js - Web Audio API playback
  • Automatic connection to audio server when VNC connects

For reference, the audio frame format is:

// Audio frame structure (handled automatically by babelcloud/noVNC)
const audioWs = new WebSocket('ws://server:6090/audio');
audioWs.binaryType = 'arraybuffer';

audioWs.onmessage = (event) => {
  const data = new Uint8Array(event.data);
  
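  // Parse frame (8 bytes timestamp + 1 byte codec + audio data)
  // DataView reads big-endian by default, matching the server's BigUInt64BE timestamp
  const timestamp = new DataView(event.data).getBigUint64(0);
  const codec = data[8]; // 0x00 = PCM, 0x01 = Opus
  const audioData = data.subarray(9); // view into the frame, no copy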
  
  // Decode and play (implemented in babelcloud/noVNC)
  // ...
};
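
For PCM frames (codec `0x00`), the "decode" step is only a sample-format conversion, because the Web Audio API works with 32-bit floats in the range -1 to 1. The sketch below assumes the payload is interleaved signed 16-bit little-endian samples, based on the `format: 's16le'` capture setting. `pcmS16leToFloat32` is a hypothetical helper, and the decoder in babelcloud/noVNC may be structured differently.

```typescript
// Converts interleaved s16le PCM bytes into one Float32Array per channel.
// Assumption: samples are interleaved (L, R, L, R, ...) as produced by parec.
function pcmS16leToFloat32(audioData: Uint8Array, channels: number): Float32Array[] {
  const view = new DataView(audioData.buffer, audioData.byteOffset, audioData.byteLength);
  const frames = Math.floor(audioData.byteLength / (2 * channels));
  const out = Array.from({ length: channels }, () => new Float32Array(frames));

  for (let i = 0; i < frames; i++) {
    for (let ch = 0; ch < channels; ch++) {
      const sample = view.getInt16((i * channels + ch) * 2, true); // true = little-endian
      out[ch][i] = sample / 32768; // scale to [-1, 1)
    }
  }
  return out;
}

// One stereo sample pair: left = 0x4000 (16384), right = 0x8000 (-32768).
const [left, right] = pcmS16leToFloat32(Uint8Array.of(0x00, 0x40, 0x00, 0x80), 2);
console.log(left[0], right[0]); // 0.5 -1
```

Each returned array can then be copied into one channel of an `AudioBuffer` with `copyToChannel`.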

If you want to integrate into your own client:

See the implementation in babelcloud/noVNC:

  • core/audio/ directory for complete audio stack
  • core/rfb.js - Audio initialization and connection logic
  • app/ui.js - Audio controls UI

Performance

Bandwidth Usage

Configuration            Bandwidth    Quality    Use Case
Opus 32 kbps, mono       ~32 kbps     Basic      Voice, minimal bandwidth
Opus 64 kbps, stereo     ~64 kbps     Good       General use (recommended)
Opus 128 kbps, stereo    ~128 kbps    High       Music, high quality
PCM 48 kHz, stereo       ~1.5 Mbps    Lossless   Development only
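
The PCM figure follows directly from sample rate × channels × bits per sample. The sketch below reproduces it and also estimates the overhead of the 9-byte frame header. The header estimate assumes one WebSocket message per 20 ms frame and ignores WebSocket and TCP framing, so it is a lower bound rather than a measurement.

```typescript
const HEADER_BYTES = 9; // 8-byte timestamp + 1-byte codec

// Raw PCM bitrate in bits per second.
function pcmBitrateBps(sampleRate: number, channels: number, bitsPerSample = 16): number {
  return sampleRate * channels * bitsPerSample;
}

// Bits per second spent on frame headers, assuming one header per frame.
function headerOverheadBps(frameDurationMs: number): number {
  const framesPerSecond = 1000 / frameDurationMs;
  return HEADER_BYTES * 8 * framesPerSecond;
}

const pcmBps = pcmBitrateBps(48000, 2);        // 1536000 bps, about 1.5 Mbps
const opusBps = 64000 + headerOverheadBps(20); // 67600 bps including headers

console.log(pcmBps, opusBps, pcmBps / 64000);  // 1536000 67600 24
```

At the recommended 64 kbps setting, Opus therefore needs roughly 1/24 of the bandwidth of raw 48 kHz stereo PCM.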

Latency

  • Audio Capture: 20ms (configurable)
  • Encoding: 5-10ms (Opus)
  • Network: Variable
  • Total: ~50-150ms typical

Related Projects

  • babelcloud/noVNC ⭐ - Required noVNC fork with audio support
    • This is the only compatible noVNC client for this audio server
    • Includes Opus decoder and audio playback components
    • Based on novnc/noVNC with audio extensions
    • See FORK_CHANGES.md for details

LICENSE

MIT
