OmniDictate: Real-time AI Dictation GUI for Windows

Summary: Free, open-source, real-time dictation for Windows. Runs locally (no cloud!), uses AI (faster-whisper), and types directly into any application via a user-friendly GUI.

OmniDictate provides a modern desktop application for real-time speech-to-text on your Windows PC. It utilizes the optimized faster-whisper library (based on OpenAI's Whisper) for accurate transcription directly on your machine, ensuring privacy and offline capability. Text is typed directly into your active window.

Looking for the original command-line version? Find it here: OmniDictate-CLI Repository

Demo

Omnidictate.mp4

Features

Premium Slate & Glass UI (v2.0): A stunning, modern interface featuring a dark slate theme with frosted glass accents, designed for clarity and focus.
Whisper Ultra Support: Now supports the large-v3-turbo model for state-of-the-art accuracy with improved speed.
Streamlined Controls: Simplified experience with unified VAD/PTT toggle and removal of complex stop hotkeys.
Real-time Transcription: Converts speech to text with low latency.
Local & Private: All processing happens on your machine; no cloud required.
Type Anywhere: Simulates keyboard input into virtually any active Windows application (except OmniDictate itself).
Configurable Settings: Adjust model size, language (Auto-Detect or specific), VAD sensitivity, typing delay, hotkeys, and more via the interface. Settings are saved automatically.
Voice Activity Detection (VAD): Toggle automatic start/stop based on speech via a GUI button.
Push-to-Talk (PTT): Use a configurable global hotkey (Default: Right Shift) for manual control.
Voice Commands:
- Spoken Punctuation (e.g., "comma", "period").
Hallucination Filtering: Add/Remove specific repetitive phrases to filter from the output via the GUI.
Transcription Display: View the transcribed text within the application.
Copy Functionality: Easily copy the displayed transcription using a button.
Restore Defaults: Reset all configurable settings to their original values.
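
The Hallucination Filtering feature above removes user-listed phrases from the output. A minimal sketch of how such a filter could work is shown here. The function name `filter_phrases` and the case-insensitive matching are assumptions for illustration, not OmniDictate's actual implementation.

```python
import re


def filter_phrases(text, phrases):
    """Remove each listed phrase from text (case-insensitive) and tidy spacing.

    Hypothetical helper: the real application may match phrases differently.
    """
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # ignore empty entries in the filter list
        text = re.sub(re.escape(phrase), "", text, flags=re.IGNORECASE)
    # Collapse the gaps left behind by removed phrases.
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    print(filter_phrases("Thanks for watching. Hello there", ["thanks for watching."]))
```

Escaping each phrase with `re.escape` keeps punctuation in a filter entry from being read as a regular-expression operator.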

System Requirements

Before installing, ensure your system meets these requirements.

Hardware

Operating System: Windows 10 or 11 (64-bit).
Processor: Intel Core i5 or equivalent (quad-core+).
RAM: 8GB (16GB+ recommended).
Storage: Approximately 4.5 GB of free space for the Whisper model.
GPU (Highly Recommended): NVIDIA GPU with CUDA support (4GB+ VRAM, 6GB+ for larger models) for acceptable performance. CPU mode is supported but significantly slower.

Software & Drivers

Microsoft Visual C++ Redistributable: REQUIRED for both CPU and GPU usage.
- Download Visual Studio 2015-2022 Redistributable (x64)
NVIDIA Users (Critical): If you intend to use GPU acceleration, you MUST install the following before running the application:
1. NVIDIA Driver: Download Latest Driver
2. CUDA Toolkit 12.6: Download from NVIDIA Archive
3. Verify: Ensure nvcc --version works in your terminal.
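
The verification step above can also be scripted. The sketch below looks for `nvcc` on the PATH and runs `nvcc --version`; the helper name `nvcc_version` is hypothetical, and the script only reports what it finds rather than checking for a specific toolkit version.

```python
import shutil
import subprocess


def nvcc_version():
    """Return the output of `nvcc --version`, or None if nvcc is unavailable."""
    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return None  # nvcc is not on PATH
    try:
        result = subprocess.run(
            [nvcc_path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    output = nvcc_version()
    print(output if output else "nvcc not found; check the CUDA Toolkit install and PATH.")
```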

Installation & Downloads

Important Considerations:

Large File Size: Includes Python runtime and AI models.
Unsigned Application: You may see a Windows SmartScreen warning ("Windows protected your PC"). Click "More info" -> "Run anyway" to proceed.

A. Using the Installer (`.exe`) - Recommended

Prerequisites: Ensure you have installed the items listed in System Requirements.
Download: Go to the Releases Page and download the latest OmniDictate_Setup_vX.Y.Z.exe.
Run Installer: Double-click the .exe. Accept the SmartScreen warning. Follow the prompts.
Launch: Use the Start Menu or Desktop shortcut.

B. Using the Portable Archive (`.7z`)

Prerequisites: Ensure you have installed the items listed in System Requirements.
Download: Go to the Releases Page and download OmniDictate_Portable_vX.Y.Z.7z.
Extract: Use 7-Zip to extract the archive to a folder of your choice.
Run: Open the extracted folder and double-click OmniDictate.exe.

Usage Guide

Launch OmniDictate.
Configure (Optional): Adjust settings in the "Configuration" section. Settings save automatically.
Start Dictation: Click the "Start Dictation" button. The status will update.
Dictate:
- VAD Mode (Default): Simply speak when the status is "Listening". Transcription starts automatically. Pause speaking to stop recording.
- Push-to-Talk (PTT): Hold down the configured PTT key (Default: Right Shift). Transcription occurs only while the key is held. This overrides VAD. (Toggle VAD off using the button if you only want PTT).
- Output: Text appears in the active application window (unless it's OmniDictate) and in the "Transcription Output" area in the GUI.
Use Commands: Say "comma", "at", "open bracket", etc., during dictation.
Stop Dictation: Click the "Stop" button.
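
To illustrate how spoken commands such as "comma" or "open bracket" might be turned into symbols, here is a small sketch. The command table, the symbols chosen, and the function name `apply_spoken_commands` are assumptions; the application's real command list and replacement rules may differ.

```python
import re

# Hypothetical command table; the app's actual commands and symbols may differ.
SPOKEN_COMMANDS = {
    "comma": ",",
    "period": ".",
    "open bracket": "[",
}

# Longest phrases first so multi-word commands are matched before shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(SPOKEN_COMMANDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_spoken_commands(text):
    """Replace spoken command words with their symbols."""
    replaced = _PATTERN.sub(lambda m: SPOKEN_COMMANDS[m.group(1).lower()], text)
    # Attach commas and periods to the preceding word: "hello ," -> "hello,"
    return re.sub(r"\s+([,.])", r"\1", replaced)


if __name__ == "__main__":
    print(apply_spoken_commands("hello comma world period"))
```

The "at" command is left out of this sketch on purpose: a plain word-for-symbol replacement would also rewrite ordinary uses of the word, so a real implementation needs extra context to tell them apart.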

Configuration via GUI

Model: Select Whisper model size (including large-v3-turbo).
Language: Supports Auto Detection and multiple languages (English, Spanish, French, German, etc.).
VAD Toggle: Enable/disable Voice Activity Detection.
Silence Threshold: VAD sensitivity (lower = more sensitive).
Typing Delay: Time (seconds) between typed characters.
PTT Hotkey: Click "Set" and press the desired key.
Filter Words: Add/Remove exact phrases to ignore.
Restore Defaults: Reset all settings.
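
The note that a lower Silence Threshold is more sensitive can be illustrated with a simple energy-based check. This is only a sketch under the assumption that speech is detected by comparing a frame's RMS level with the threshold; the README does not say how OmniDictate's VAD is implemented or what scale the setting uses.

```python
import math


def rms(samples):
    """Root-mean-square level of a frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, silence_threshold):
    """Treat a frame as speech when its RMS level exceeds the threshold."""
    return rms(samples) > silence_threshold


if __name__ == "__main__":
    quiet_frame = [0.02, -0.02] * 100  # a soft voice or background noise
    print(is_speech(quiet_frame, 0.05))  # higher threshold: frame is ignored
    print(is_speech(quiet_frame, 0.01))  # lower threshold: frame counts as speech
```

With this model, lowering the threshold makes quieter input count as speech, which is also why a very low value can cause background noise to trigger recording.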

Troubleshooting

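CUDA is not available / Slow Performance: Confirm that an NVIDIA GPU is present, that the NVIDIA driver is installed, and that the CUDA Toolkit and cuDNN versions match your CUDA-enabled PyTorch build (see Tested Versions). Also check that the CUDA directories are on your PATH.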
Failed to load Python DLL... (when running .exe): Ensure Microsoft Visual C++ Redistributable (VS 2015-2022 x64) is installed.
Garbled Typing: Increase the "Typing Delay" setting. Test in Notepad first.
No Audio/VAD Not Working: Check that the correct default microphone is selected in Windows Sound settings (16000 Hz sample rate, exclusive mode disabled), then adjust the "Silence Threshold".
ModuleNotFoundError (when running from source): Activate the virtual environment and run pip install -r requirements.txt.
Hotkey Issues: Ensure no other app uses the same global hotkeys. Restart the app after changing keys.
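
When diagnosing the "CUDA is not available" case from a source install, a short script can report what the Python libraries see. The sketch below uses `torch.cuda.is_available()`, `torch.version.cuda`, and `ctranslate2.get_cuda_device_count()`; the function name `cuda_report` and the report keys are hypothetical, and a missing library is reported as None rather than raising.

```python
def cuda_report():
    """Collect basic CUDA visibility information from torch and ctranslate2."""
    report = {
        "torch_cuda_available": None,
        "torch_cuda_version": None,
        "ctranslate2_cuda_devices": None,
    }
    try:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_version"] = torch.version.cuda  # None on CPU-only builds
    except (ImportError, OSError):
        pass  # torch is missing or failed to load its native libraries
    try:
        import ctranslate2

        report["ctranslate2_cuda_devices"] = ctranslate2.get_cuda_device_count()
    except (ImportError, OSError):
        pass  # ctranslate2 is missing or failed to load
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run it inside the same virtual environment as the application so that it inspects the packages the app would actually load.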

Tested Versions (v2.0.x Build)

The pre-built application was compiled with the following versions. If you are building from source or troubleshooting GPU issues, aim for these:

Component	Version	Notes
Python	3.11.9	Download
PyTorch	2.6.0+cu126	Get Started, or: pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
CUDA Toolkit	12.6	Download
faster-whisper	1.1.1	Source
ctranslate2	4.5.0	Source
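
If you are building from source, the installed package versions can be compared with the table above. This is a small sketch using the standard-library `importlib.metadata`; the `TESTED` dictionary simply restates the table, and the helper name `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Tested Versions table.
TESTED = {
    "torch": "2.6.0+cu126",
    "faster-whisper": "1.1.1",
    "ctranslate2": "4.5.0",
}


def compare_versions(tested=TESTED):
    """Return (package, expected, installed, matches) for each tested package."""
    rows = []
    for package, expected in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        rows.append((package, expected, installed, installed == expected))
    return rows


if __name__ == "__main__":
    for package, expected, installed, matches in compare_versions():
        status = "OK" if matches else "differs"
        print(f"{package}: expected {expected}, installed {installed} ({status})")
```

A mismatch is not necessarily a problem; the table lists the versions the pre-built application was compiled with, which are a sensible target when troubleshooting.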

Building from Source (Optional)

If you prefer to build from source:

Install Python 3.11.9 and Git.
Clone Repo: git clone https://github.com/gurjar1/OmniDictate.git && cd OmniDictate
Create & Activate Venv: Run python -m venv venv, then activate it with venv\Scripts\activate.

Install PyTorch (CUDA Version):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Install Other Dependencies: pip install -r requirements.txt
Run: python main_gui.py
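
After installing the dependencies, it can help to confirm that faster-whisper loads and transcribes before launching the GUI. The sketch below uses the library's `WhisperModel` class and its `transcribe` method. The "tiny" model, the CPU/int8 settings, and the file name `sample.wav` are assumptions chosen to keep the check lightweight; this is not how OmniDictate itself calls the library.

```python
def join_segments(segments):
    """Join the text of transcription segments into one string."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_file(path, model_size="tiny", device="cpu", compute_type="int8"):
    """Transcribe an audio file and return (detected_language, text).

    The import is local so this module still loads when faster-whisper
    is not installed.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, info = model.transcribe(path)
    return info.language, join_segments(segments)


if __name__ == "__main__":
    # "sample.wav" is a placeholder; point this at any short recording.
    language, text = transcribe_file("sample.wav")
    print(f"[{language}] {text}")
```

To exercise the GPU path, pass `device="cuda"` and a compute type such as `"float16"`; the first run downloads the selected model.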

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

✅ Free for personal/non-commercial use.
🚫 Commercial use requires explicit permission.

(See the LICENSE file for full details).
