Bloom Words is a lightweight Go library that validates English words using a Bloom filter, giving fast lookups with a minimal memory footprint. It is well suited to spell-checking, word games, and text validation or filtering.
- 🚀 Fast Lookup: O(1) word lookup with a Bloom filter, independent of dictionary size
- 💾 Memory Efficient: Compact bitset-backed filter, much smaller than storing every word
- 📖 Comprehensive Dictionary: Pre-built filter with 370,000+ English words
- ⚡ Streaming Support: Efficiently handle large datasets with minimal memory usage
- 🧪 Well Tested: Includes comprehensive test suite
Quick Stats:
- 370K+ English words compressed into just ~500KB of data
- Sub-microsecond lookups - testing a word takes less than 1 microsecond
- ~1% false positive rate - on average, about 1 in 100 non-words is reported as valid
- Zero false negatives - if a word exists, you'll always find it
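
The size and false-positive figures above fit the standard Bloom filter sizing formulas. The sketch below is a back-of-the-envelope check, not code from this library: the helper names `optimalBits` and `optimalHashes` are hypothetical, and the inputs (370,000 words, 1% false positives) are taken from the stats above.

```go
package main

import (
	"fmt"
	"math"
)

// optimalBits returns the bit-array size m that minimizes memory for n items
// at a target false-positive rate p: m = -n * ln(p) / (ln 2)^2.
// (Hypothetical helper, not part of bloom-words.)
func optimalBits(n int, p float64) int {
	return int(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

// optimalHashes returns the matching number of hash probes: k = (m/n) * ln 2.
// (Hypothetical helper, not part of bloom-words.)
func optimalHashes(m, n int) int {
	return int(math.Round(float64(m) / float64(n) * math.Ln2))
}

func main() {
	n, p := 370000, 0.01 // word count and false-positive rate from the stats above
	m := optimalBits(n, p)
	k := optimalHashes(m, n)
	fmt.Printf("bits: %d (about %d KiB), hash probes: %d\n", m, m/8/1024, k)
}
```

Under these assumptions the filter needs about 3.5 million bits, a little under 10 bits per word, which is in line with the ~500KB figure. The parameters the shipped filter actually uses may differ.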
Install the package:

    go get github.com/oosawy/bloom-words

Basic usage:

    package main

    import (
        "fmt"

        bw "github.com/oosawy/bloom-words"
    )

    func main() {
        // A true result means the word is very likely in the dictionary.
        if bw.Test("hello") {
            fmt.Println("'hello' is a valid word")
        }

        // A false result is definitive: the filter has no false negatives.
        if !bw.Test("xyzabc") {
            fmt.Println("'xyzabc' is not in the dictionary")
        }
    }

Bloom Words uses Go's go:embed directive to embed the pre-built Bloom filter (filter/bloom_words.bf) directly into the binary, so no data files need to be shipped or loaded at runtime. The embedded filter is loaded into memory during initialization, and all subsequent word lookups run in O(1) time against this in-memory data.
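
To see why a lookup takes constant time and can never miss a word that was added, here is a toy Bloom filter. It is a minimal sketch and not this library's implementation: the `bloom` type, the FNV-based double hashing, and the sizes (1024 bits, 7 probes) are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a toy Bloom filter: a bit array probed at k derived positions.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per word
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes returns two base hashes that are combined by double hashing.
func hashes(word string) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write([]byte(word))
	h2 := fnv.New64()
	h2.Write([]byte(word))
	return h1.Sum64(), h2.Sum64() | 1 // force an odd step so it is never zero
}

// Add sets the k bits selected by the word's hashes.
func (b *bloom) Add(word string) {
	h1, h2 := hashes(word)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// Test checks the same k bits; any unset bit proves the word was never added.
func (b *bloom) Test(word string) bool {
	h1, h2 := hashes(word)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 7)
	for _, w := range []string{"hello", "world", "bloom"} {
		f.Add(w)
	}
	fmt.Println(f.Test("hello"))  // always true: added words are never missed
	fmt.Println(f.Test("xyzabc")) // false unless all 7 probes happen to collide
}
```

Each lookup touches exactly k bits regardless of how many words were added, which is where the constant-time behavior comes from. A word can be wrongly accepted only when every one of its probes lands on bits set by other words.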
To rebuild the Bloom filter from the word list:

    go run ./cmd/build.go

This reads from datasets/words_alpha.txt and generates a new filter/bloom_words.bf.
The English word dataset used in this project is sourced from dwyl/english-words.
Run the test suite:

    go test ./tests -v

License: MIT