
murmure

On rend le murmure visible. ("We make the murmur visible.")

A live-transcription display for hearing-impaired audiences at podcast festivals and live events.



murmure captures audio from a microphone, transcribes it in real time using your choice of Speechmatics (default) or AssemblyAI, and projects the transcript full-screen onto a secondary monitor with typography you control. It's a small, focused desktop app for one job: making the spoken word readable in the room.

The operator UI ships in French and English (auto-detected from your OS, switchable from a globe icon next to the version). The transcription language is selectable in Réglages / Setup (French and English in v1.2; more on request).

It runs locally, talks directly to your chosen STT provider, and stores nothing in the cloud.


Why this exists

Live captioning at small events is usually expensive, technical, or both. The tools that do exist tend to assume a broadcast operator, a captioning workstation, and a software stack that costs more than the festival's coffee budget.

murmure is the opposite: one laptop, one microphone, one HDMI cable to the audience screen, one API key. An operator can set it up in five minutes, and the audience reads the transcript in whatever font, size, and contrast suits the room.

If you're running a small French-language event and you'd like to make it accessible, this might be the simplest path.


Install

Pre-built installers are published on the Releases page — grab the .dmg for macOS or .exe for Windows.

The first time you open an unsigned .dmg, macOS Gatekeeper will refuse with "developer cannot be verified." Right-click the app → Open → confirm in the dialog; subsequent launches work normally. The same applies to Windows SmartScreen (More info → Run anyway).

macOS first-launch quirks

murmure isn't signed with an Apple Developer ID (yet — see below), which makes macOS treat it with extra suspicion. You may bump into one or both of these on first install:

The mic prompt re-appears even after clicking Allow. macOS's TCC (the privacy framework) uses an app's signature as part of its identity. Because we're ad-hoc signed and not notarized, TCC sometimes can't reliably remember the grant. Workaround:

# Wipe murmure's TCC entry so macOS treats the next launch as a clean first run
tccutil reset Microphone app.murmure

Then launch the app, click Allow on the mic prompt, and grant access in System Settings → Privacy & Security → Microphone if prompted again. After macOS commits the grant (it sometimes takes two or three clicks under Sequoia), the prompt is gone for good — you should be able to quit and relaunch with no further dialogs.

The icon shows as a generic placeholder in System Settings. macOS caches app icons aggressively. After the first install, run killall Finder Dock to flush the caches; the murmure logo will show up correctly the next time you open it.

These rough edges go away once the app is properly signed with a paid Apple Developer ID + notarized. That's on the roadmap; until then, the workarounds above are the path.


Develop

You need Node.js (brew install node on macOS) and an API key from one of the supported providers — Speechmatics (recommended; 480 free real-time minutes/month) or AssemblyAI.

git clone https://github.com/KevinGallaccio/murmure.git
cd murmure
npm install
npm run dev

Then in the app:

  1. Open Réglages / Setup, pick a provider (Speechmatics is the default), set the transcription language, then paste your API key and click Tester / Test to verify.
  2. Pick your microphone under Source audio / Audio source — the VU meter goes live as soon as a device is selected.
  3. Click Ouvrir l'affichage / Open display to spawn the audience-facing window. With a second monitor connected, it opens fullscreen there automatically.
  4. Click Diffuser / Broadcast to start streaming.

While idle, the display shows a Victor Hugo extract through the same partial/final pipeline as live transcription, so you can dial in typography from the operator's chair without needing anyone to speak in the room.

To produce installers locally:

npm run dist:mac    # produces dist-electron/*.dmg
npm run dist:win    # produces dist-electron/*.exe (NSIS)

How it works

┌─────────────────────────────────────────────────────────────┐
│                    Main process (Node)                      │
│  • Window lifecycle (control + display)                     │
│  • STT WebSocket client (Speechmatics or AssemblyAI)        │
│  • Encrypted settings (electron-store + safeStorage)        │
│  • Session duration tracking → local cost estimate          │
│  • IPC hub between the two renderers                        │
└──────┬──────────────────────────────────┬───────────────────┘
       │ IPC                              │ IPC
┌──────▼─────────────────┐       ┌────────▼───────────────────┐
│  Control renderer      │       │  Display renderer          │
│  (operator's screen)   │       │  (audience screen, HDMI)   │
│                        │ audio │                            │
│  • API key & device    │ + txt │  • Full-bleed transcript   │
│  • Audio capture       │◄─────►│  • Live restyling via CSS  │
│  • Diffuser button     │       │  • No chrome, no controls  │
│  • Style + preview     │       │                            │
└────────────────────────┘       └────────────────────────────┘

Three processes, two windows, one WebSocket. Audio is captured in the control renderer (browser APIs: getUserMedia + an AudioWorklet that downsamples 48 kHz float → 16 kHz int16 PCM in ~100 ms chunks), forwarded to the main process over IPC, and streamed to whichever STT provider is selected via ws. Both providers implement a small STTClient interface in src/main/stt-client.ts, so the audio path doesn't care which is active. Transcripts come back, get broadcast to both renderers, and the display restyles in real time via CSS custom properties pushed over IPC.
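To illustrate the capture format, here is a naive 3:1 decimator from 48 kHz Float32 PCM to 16 kHz Int16 PCM. The function name and the averaging filter are assumptions for this sketch; murmure's actual AudioWorklet may resample differently:

```typescript
// Hedged sketch: decimate 48 kHz Float32 PCM to 16 kHz Int16 PCM.
// Averages each group of 3 samples as a crude low-pass before decimating.
function downsample48kTo16k(input: Float32Array): Int16Array {
  const out = new Int16Array(Math.floor(input.length / 3));
  for (let i = 0; i < out.length; i++) {
    const avg = (input[3 * i] + input[3 * i + 1] + input[3 * i + 2]) / 3;
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const clamped = Math.max(-1, Math.min(1, avg));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}
```

At 16 kHz, a ~100 ms chunk is 1,600 Int16 samples (3,200 bytes), which matches the chunking described above.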

API keys never enter the renderer's JS heap — they stay decrypted only in main, encrypted at rest per-provider via OS-level safeStorage (Keychain on macOS, DPAPI on Windows).
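The provider abstraction mentioned above might look roughly like the following. This is a hypothetical shape, not a reproduction of src/main/stt-client.ts; the method names are assumptions, shown only to make concrete how one audio path can stay provider-agnostic:

```typescript
// Hypothetical provider interface (names are illustrative).
interface STTClient {
  connect(apiKey: string, language: string): Promise<void>;
  sendAudio(chunk: Int16Array): void;
  onTranscript(cb: (text: string, isFinal: boolean) => void): void;
  close(): void;
}

// Minimal in-memory fake, handy for exercising the IPC plumbing
// without an API key or a network connection.
class FakeSTTClient implements STTClient {
  private cb: ((text: string, isFinal: boolean) => void) | null = null;
  async connect(): Promise<void> {}
  sendAudio(chunk: Int16Array): void {
    // Echo a fake partial so downstream consumers can be tested.
    this.cb?.(`received ${chunk.length} samples`, false);
  }
  onTranscript(cb: (text: string, isFinal: boolean) => void): void {
    this.cb = cb;
  }
  close(): void {}
}
```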

The control window is a three-column workspace: a collapsible sidebar (Configuration / Source / Journal / Costs) on the left, the audience preview as a centered "stage" in the middle, and an always-visible Apparence inspector on the right. The pattern is borrowed from professional creative tools (Figma, Logic Pro): you tweak typography on the right and watch the result on the stage at the same time, never scrolling.

Project layout

murmure/
├── resources/
│   ├── icon.svg / icon.icns / icon.ico   # identity
│   └── fonts/                            # bundled woff2 (Inter, Manrope, …)
├── src/
│   ├── main/                # Electron main process
│   ├── preload/             # contextBridge APIs (per-window)
│   ├── renderer-control/    # operator window (React)
│   ├── renderer-display/    # audience window (React)
│   └── shared/              # IPC types, style types, constants
├── design/
│   └── logos.html           # logo studies (review document)
├── electron-builder.yml     # .dmg / .exe build config
└── .github/workflows/       # CI: build & publish on tag push

Configuration & cost

Neither provider exposes a public balance API, so murmure tracks streaming duration locally and estimates cost at a per-provider configurable rate (Speechmatics $0.24/h, AssemblyAI $0.45/h — both the published base rates at time of writing). Switching providers updates the rate displayed in the cost card; per-provider customizations persist independently. The figure under Coûts & usage / Costs & usage is an estimate — for the canonical number, click Tableau de bord / Dashboard to open the active provider's billing portal in your browser.
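The estimate is simple arithmetic: streamed duration times the provider's hourly rate. A minimal sketch, using the base rates quoted above (function and table names are illustrative, not murmure's actual code):

```typescript
// Published base rates at time of writing, per the README.
const HOURLY_RATE_USD: Record<string, number> = {
  speechmatics: 0.24,
  assemblyai: 0.45,
};

// Estimate session cost from streamed seconds at the provider's hourly rate.
function estimateCostUSD(provider: string, streamedSeconds: number): number {
  const rate = HOURLY_RATE_USD[provider] ?? 0;
  return (streamedSeconds / 3600) * rate;
}
```

A one-hour Speechmatics session therefore shows as roughly $0.24; a 30-minute AssemblyAI session as roughly $0.23.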

The display style — typeface, size, line height, colors, padding, alignment, max visible lines — is fully customizable from the inspector with a live preview. Three presets ship: Grand contraste / High contrast (for low-vision audiences), Sobre / Subtle (softer, for shorter reading sessions), and Lecture longue / Long reading (extended sessions with looser line spacing).

Six typefaces ship bundled for the audience display (no remote fonts at runtime): Inter, Manrope, Atkinson Hyperlegible (the one designed by the Braille Institute for low-vision readers — recommended), IBM Plex Sans, Roboto Slab, and JetBrains Mono. The operator UI uses Fraunces for editorial titles and Manrope for body text.

How each provider commits text

The two providers have different models for when a transcript becomes "final." Both produce the same audience experience in murmure (one line per sentence, with words fading in as they're recognized), but the mechanism differs:

  • Speechmatics emits partial transcripts continuously every ~500 ms while you're talking, regardless of silence, and commits a final every max_delay (we ship 1.5 s). This is the right choice for continuous speech — interviews, debates, fast monologues. murmure aggregates the per-window commits into sentence-bounded finals client-side so the audience sees one line per sentence rather than one line per 1-3 words.
  • AssemblyAI is turn-based: a final commits when the model detects a silence pause. murmure ships with tightened defaults — min_end_of_turn_silence_when_confident: 100 ms, max_turn_silence: 1400 ms — and a 5 s ForceEndpoint watchdog for monologues without natural pauses. See src/shared/constants.ts → ASSEMBLY_PARAMS.
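The sentence-bounded aggregation described for Speechmatics can be sketched as a small buffer that emits a line whenever the accumulated fragments reach sentence-ending punctuation. This is a simplification under stated assumptions — the class name and regex are illustrative, not murmure's actual aggregator:

```typescript
// Hedged sketch: accumulate per-window finals into sentence-bounded lines.
class SentenceAggregator {
  private buffer = "";

  // Feed one committed fragment; returns any completed sentences.
  push(fragment: string): string[] {
    this.buffer += (this.buffer ? " " : "") + fragment.trim();
    const sentences: string[] = [];
    // Match runs of text ending in sentence punctuation, keeping the delimiter.
    const re = /[^.!?]*[.!?]+\s*/g;
    let match: RegExpExecArray | null;
    let consumed = 0;
    while ((match = re.exec(this.buffer)) !== null) {
      sentences.push(match[0].trim());
      consumed = re.lastIndex;
    }
    // Keep the incomplete tail for the next commit.
    this.buffer = this.buffer.slice(consumed);
    return sentences;
  }
}
```

Fragments without terminal punctuation stay buffered, so the audience display only advances one full sentence at a time.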

Identity

The icon — three discrete dots transitioning into a continuous line — visualizes the brand promise: scattered sound becoming continuous text. The leftmost dot is rendered in #2745CF, exactly the color the app uses for partial (in-flight) transcripts, deliberately closing the loop between identity and interface. The wordmark is set in Manrope.

See design/logos.html for the full review document (open it in a browser).


Contributing & forking

This is a small, deliberate project — I'm happy to take PRs that fix bugs or improve accessibility. The scope is intentionally narrow. If you want to fork it for a different language, different STT backend, or different deployment shape, please do; that's why it's MIT-licensed.

Things explicitly out of scope: transcript export to file (live-only by design), speaker diarization, multi-language switching mid-stream, cloud sync, auto-update, telemetry.


Credits

  • Speechmatics — real-time WebSocket API, broadcast model
  • AssemblyAI — Universal-3 Pro Streaming model
  • Atkinson Hyperlegible — Braille Institute, designed for low-vision readers
  • Electron, Vite, React — the boring-but-correct desktop stack
  • Victor Hugo, Demain dès l'aube — used as the demo-mode placeholder text

License

MIT — see LICENSE.
