Ultra-High-Performance NUMA-aware Zero-Copy Batch-IO, Lock-free Library for Go
![]() |
Dear sponsors! We are looking for partners to provide a high-end testing environment in Moscow for hioload-ws — a WebSocket library delivering microsecond-level end-to-end latency under extreme loads. If your infrastructure can host millions of simultaneous connections and supports advanced p99 latency analytics — get in touch to collaborate! Contact: [email protected] |
|---|
Features
- Zero-copy data paths with NUMA-aware buffer pools
- Poll-mode I/O (epoll, IOCP, io_uring ready)
- Lock-free and wait-free concurrency primitives
- Batch processing for maximum throughput
- Pluggable transports: TCP, DPDK, io_uring, Windows IOCP
- CPU/NUMA affinity and hugepage support for ultra-low latency
- Hot-reloadable configuration, runtime metrics, and debug probes
Platform Support
- Linux (kernel ≥6.20)
- Windows Server (2016+)
- Go 1.21+ (1.23 recommended)
2×Intel Xeon Gold 6248R (24C, 48T, 3.0GHz), 256ГБ DDR4, 2×1.92ТБ NVMe, 2×100GbE, NUMA, Gentoo Linux 6.12 / Windows Server 2019 Datacenter
hioload-ws is a cross-platform, production-grade, extensible skeleton for building ultra-performing WebSocket and custom network servers in Go. The project fuses best-in-class approaches from DPDK-like data plane architectures—zero-copy, NUMA-awareness, affinity, batch-processing, lock-free data structures—with scalable event-driven design. It is crafted for maximum throughput, lowest possible latency, and industrial integrity when serving millions of concurrent connections on modern Linux or Windows servers.
This repository targets backend engineers, distributed-systems architects, and hardware-oriented developers needing extreme performance and resource efficiency (RAM/CPU per connection) combined with architectural modularity and testability—whether for powering messaging infrastructure, cloud backends, edge gateways, or research prototypes.
-
Zero-Copy Data Path
- Buffers are always managed via NUMA-aware pools.
- Send/receive methods operate in batch, directly referencing pooled memory.
- All framed protocol operations are zero-alloc and zero-copy by design.
-
NUMA Awareness and Locality
- Pools and executors can be instantiated per NUMA node or CPU.
- Transport/buffer layers offer thread/core/Numa pinning to minimize cross-node traffic.
- Memory and connection pools are segmented for strict data locality.
-
Batching and Poll-Mode IO
- Pollers and reactors operate in batch, receive/send up to N frames/packets at once.
- Event loops avoid thread-per-connection; all IO is poll-driven (epoll, IOCP, future io_uring).
- Offers easy extension to DPDK/XDP/AF_XDP/container native deployment models.
-
Lock-Free and Wait-Free Critical Path
- Ring buffers, queues, and handler registries are lock-free to avoid contention and context switches.
- Event loops implement adaptive spin-wait backoff, replacing traditional mutexes.
-
Transport Flexibility
- Unified transport abstraction covering standard sockets, DPDK, io_uring, and Windows IOCP.
- OS detection and conditional compilation for full cross-platform compatibility.
-
Extensible Pool/Buffer and Reactor Interfaces
- Modular, strongly validated contracts for buffer pools, executors, pollers.
- All interfaces documented and engineered for mocking and fast development.
- Seamless switch between production-grade and test/mock components.
-
Advanced Testing/Mock/Profiling Integrations
- Fakes for all core interfaces, enabling deep unit and integration testing.
- Runtime metrics, hot-reload config, and debug probes for live observability.
-
English Language Documentation and Inline Comments
- Every source file features detailed English commentary explaining domain logic.
- Multi-level README structure for onboarding, architecture, and system integration.
- Project Goals
- Design Principles
- System Requirements
- Directory Overview
- Component Breakdown
- Quick Start & Usage
- Testing and Best Practices
- License & Notices
- Roadmap & Contribution
- Frequently Asked Questions
hioload-ws was conceived to answer the needs of organizations and independent developers running the world’s largest, most latency-sensitive networking workloads. Core objectives:
- Exploit full modern hardware capabilities: multi-core, NUMA, hugepages, kernel zero-copy, and affinity.
- Expose a clean, extensible interface spanning all performance-critical domains: pooling, reactor, handler, protocol, transport.
- Guarantee production observability and testability without sacrificing throughput.
- Offer easy extensibility for adding advanced transports (DPDK/io_uring), metrics, and session/state management.
- Zero-copy from transport to protocol: pool-backed, page-aligned buffer management everywhere.
- NUMA and affinity first: explicit CPU/NUMA assignment and memory partitioning.
- Event loop-centric model: Poll-mode service, batched dispatch, minimal per-connection Syscalls.
- Lock-free/wait-free concurrency: no blocking Mutexes in hotpaths.
- Testable, replaceable, auditable: each component is an interface, mockable for CI/CD.
- Professional, clear English documentation: code and comments readable by SREs and systems architects worldwide.
- Go version: 1.21 or above (1.23 recommended for stability and performance).
- Operating Systems: Linux (kernel 6.20+ preferred for io_uring), Windows Server 2016 or later.
- CPU: Modern x86_64/amd64 with multi-core, NUMA (for production deployments).
- Optional: DPDK or io_uring libraries for cutting-edge transport integration.
| Directory | Description |
|---|---|
/api/ |
Public interfaces (buffer pool, reactor, handler, transport, etc.) |
/adapters/ |
Glue layers adapting internal logic to API contracts |
/internal/ |
System-specific, performance-critical implementations |
/pool/ |
Object and byte pools, lock-free ring buffers, batching |
/transport/ |
High-speed, zero-copy transport adapters for OS/hardware |
/protocol/ |
WebSocket protocol, framing, parsing, (zero-copy everywhere) |
/session/ |
NUMA- and concurrency-aware session/context management |
/control/ |
Config, live metrics, hot-reload, hooks, debug/probes |
/examples/ |
Realistic echo server, fake/mock-based tests, stress suites |
/benchmarks/ |
Performance measurement and regression tracking |
/tests/ |
Automated integration testing, system-level correctness checks |
/fake/ |
Mock/fake implementations for every interface and layer |
- Buffer/BufferPool: Page-based, NUMA-aware, reference-counted memory management.
- Transport: Zero-copy, batch-oriented sender/receiver for network communication.
- Poller/Reactor: DPDK/NAPI-style reactor pattern, high-rate/conflict-free handler dispatch.
- Handler/Batch: Typed, modular data/event processing (pluggable middleware).
- Executor/ThreadPool: NUMA- and concurrency-aware background task runner.
- Affinity: CPU and NUMA pinning with runtime reconfiguration.
- Control: Hot-reload configuration registry (atomic snapshot), runtime metrics, debug probes.
- Ring: Lock-free FIFO for high-throughput message passing between goroutines and threads.
- Scheduler: High-res timer/event scheduling, all cancelable.
- RingBuffer: Fast, lock-free array-based ring buffer for inter-thread/goroutine transfer.
- Batch: Non-allocating wrapper over slices for batch traversal and split/slice ops.
- BufferPoolManager: Segments buffer pools by NUMA node—fast O(1) get/put memory regions.
- Linux (epoll, SendmsgBuffers): Kernel sockets, zero-copy batch using slices, extensible for io_uring in the future.
- Windows (WSASend/Recv, IOCP): Overlapped IO, batch-capable, direct use of system handles.
- Cross-platform transport registry: Conditional compilation, safe instantiation, and feature query everywhere.
- WebSocket frame encode/decode: Strict RFC6455, masking/unmasking, header/payload, direct buffer references.
- Frame parsing: Zero-allocation, zero-copy, all payload slices reference pooled memory.
- Pluggable protocol actions: Simple extension for ping/pong, control frames, and future extensions (compression, multiplexing).
git clone https://github.com/momentics/hioload-ws.git
cd hioload-ws
go mod tidy
go test ./...
go run ./examples/lowlevel/echo/main.go
# Connect with any ws-compatible client (browser, wscat, websocat, etc.)
- Use provided fakes/mocks for unit tests on every public interface.
- Always obtain buffers via pool (
Get(size, numa)), release withRelease()—never allocate []byte directly in hot path. - Prefer batch transport methods, not per-message send; use slice-based batch ops for throughput.
- Pin event loops/reactors and executors per NUMA region for best cache locality.
- Use debug probes, metrics registry, and live config reload for tuning in production.
hioload-ws is licensed under the Apache License, Version 2.0.
NOTICE and LICENSE files include a summary of all included third-party components (MIT, BSD) and special attributions.
- hioload-ws core: Apache 2.0
- github.com/eapache/queue: MIT License (included via Go Modules)
- golang.org/x/sys: BSD-like License (included via Go Modules)
- No GPL/LGPL copyleft code; all dependencies are compatible and vetted.
Future work will focus on:
- Full io_uring and DPDK transport extensions.
- Advanced NUMA-based buffer reuse and buffer cache tuning.
- Autoscaling/reactor rebalance scheduling.
- Protocol extensibility (compression, multiplex, HTTP2 upgrades).
- Richer metrics/telemetry and built-in live admin APIs.
- Open integration interfaces for shared memory, RDMA, and kernel-bypass transports.
- Sample deployments for Kubernetes/bare-metal scaling/edge.
Contributions are welcome!
Open PRs, discuss features, file bugs, and prototype blueprints in the GitHub repository.
Q: Is this library "production-ready"?
A: This repository is an advanced skeleton, architected for production—but not a monolithic drop-in for all use-cases. Deployment, protocol policy, security, and low-level tuning are up to integrators.
Q: Does it require DPDK/io_uring?
A: No. All core components work on standard kernels with or without advanced backends.
Q: Is Windows supported?
A: Yes. All modern Windows with IOCP and overlapped sockets.
Q: How are dependencies managed?
A: Go Modules with full version pinning and auditing in LICENSE/NOTICE.
Q: Where to find more usage/extension examples?
A: See /examples/, detailed file headers, and open community discussions on GitHub.
Developed and maintained by momentics [email protected]
Special thanks to the Go, DPDK, and io_uring communities for foundational work and inspiration.
For full technical and legal details, see LICENSE and NOTICE in the project root.



