NoKV's flush path converts immutable memtables into L0 SST files, then advances the manifest WAL checkpoint and reclaims obsolete WAL segments. The queue and timing bookkeeping live directly in lsm/flush_runtime.go; SST persistence and manifest install are in lsm/builder.go and lsm/levels.go.
- Persistence: materialize immutable memtables into SST files.
- Ordering: publish SST metadata to manifest only after the SST is durably installed (strict mode).
- Cleanup: remove WAL segments once checkpoint and raft constraints allow removal.
- Observability: export queue/build/release timing through flush metrics.

```mermaid
flowchart LR
    Active[Active MemTable]
    Immutable[Immutable MemTable]
    FlushQ[flush queue]
    Build[Build SST]
    Install[Install SST]
    Release[Release MemTable]
    Active -->|threshold reached| Immutable --> FlushQ
    FlushQ --> Build --> Install --> Release --> Active
```

- Enqueue: lsm.submitFlush pushes the immutable memtable into the concrete flush queue and records wait-start time.
- Build: worker pulls the next task, builds the SST (levelManager.flush -> openTable -> tableBuilder.flush).
- Install: after SST + manifest edits succeed, the worker records install timing.
- Release: worker removes the immutable from memory, closes the memtable, records release timing, and completes the task.

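
The four stages above can be sketched as a single worker loop. This is a minimal illustration, not the code in lsm/flush_runtime.go: the names flushTask, stageTimes, stageFunc, and runFlushWorker are hypothetical, the queue is a plain channel, and the sketch assumes that build time is measured from dequeue until install completes.

```go
package main

import (
	"fmt"
	"time"
)

// flushTask is a hypothetical stand-in for one queued immutable memtable.
type flushTask struct {
	id       int
	enqueued time.Time // wait-start time, recorded at enqueue
}

// stageTimes holds the durations measured for one task.
type stageTimes struct {
	wait, build, release time.Duration
}

// stageFunc performs one stage for the task with the given id.
type stageFunc func(id int) error

// runFlushWorker drains queue until it is closed, running
// build -> install -> release for each task and timing the stages.
func runFlushWorker(queue <-chan flushTask, build, install, release stageFunc) ([]stageTimes, error) {
	var done []stageTimes
	for task := range queue {
		dequeued := time.Now()
		st := stageTimes{wait: dequeued.Sub(task.enqueued)}

		if err := build(task.id); err != nil {
			return done, fmt.Errorf("build task %d: %w", task.id, err)
		}
		if err := install(task.id); err != nil {
			return done, fmt.Errorf("install task %d: %w", task.id, err)
		}
		installed := time.Now()
		st.build = installed.Sub(dequeued) // assumption: build time ends at install

		if err := release(task.id); err != nil {
			return done, fmt.Errorf("release task %d: %w", task.id, err)
		}
		st.release = time.Since(installed)
		done = append(done, st)
	}
	return done, nil
}

func main() {
	queue := make(chan flushTask, 2)
	for id := 1; id <= 2; id++ {
		queue <- flushTask{id: id, enqueued: time.Now()}
	}
	close(queue)

	// step returns a stage that only logs; real stages would do I/O.
	step := func(name string) stageFunc {
		return func(id int) error {
			fmt.Printf("task %d: %s\n", id, name)
			return nil
		}
	}
	times, err := runFlushWorker(queue, step("build"), step("install"), step("release"))
	if err != nil {
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println("completed tasks:", len(times))
}
```

A failed stage stops the loop before release, so in this sketch a memtable is only dropped after its SST has been installed.
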
Flush uses two write modes controlled by Options.ManifestSync:
- Fast path (ManifestSync=false)
  - Writes SST directly to final filename with O_CREATE|O_EXCL.
  - No temp file/rename step.
  - Highest throughput, weaker crash-consistency guarantees.
- Strict path (ManifestSync=true)
  - Writes to "<table>.tmp.<pid>.<ns>".
  - tmp.Sync() to persist SST bytes.
  - RenameNoReplace(tmp, final) installs file atomically. If unsupported by platform/filesystem, returns vfs.ErrRenameNoReplaceUnsupported.
  - SyncDir(workdir) is called before manifest edit so directory entry is durable.

This is the durability ordering used by current code.
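
The difference between the two modes can be illustrated with the standard library alone. The sketch below is an approximation under stated assumptions, not NoKV's vfs layer: writeFast, writeStrict, syncDir, and demoInstall are hypothetical names, the file names are made up, and os.Link followed by os.Remove stands in for a rename-no-replace primitive (the link fails if the final name already exists). Directory fsync as written works on Unix-like systems.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeFast creates the final file directly. O_EXCL makes the open fail
// if the name already exists. This sketch does not fsync anything.
func writeFast(final string, data []byte) error {
	f, err := os.OpenFile(final, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// syncDir fsyncs a directory so that new entries in it are durable.
func syncDir(dir string) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

// writeStrict writes to a temp name, fsyncs the bytes, installs the file
// without replacing an existing one, then fsyncs the directory.
func writeStrict(dir, name string, data []byte) error {
	final := filepath.Join(dir, name)
	tmp := fmt.Sprintf("%s.tmp.%d.%d", final, os.Getpid(), time.Now().UnixNano())
	f, err := os.OpenFile(tmp, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer os.Remove(tmp) // best effort cleanup on error paths
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Stand-in for rename-no-replace: os.Link fails if final exists.
	if err := os.Link(tmp, final); err != nil {
		return err
	}
	if err := os.Remove(tmp); err != nil {
		return err
	}
	return syncDir(dir)
}

// demoInstall exercises both paths in a scratch directory.
func demoInstall() error {
	dir, err := os.MkdirTemp("", "nokv-flush-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	if err := writeStrict(dir, "000001.sst", []byte("strict")); err != nil {
		return err
	}
	// A second install under the same name must not replace the first.
	if err := writeStrict(dir, "000001.sst", []byte("again")); !errors.Is(err, os.ErrExist) {
		return fmt.Errorf("strict: expected already-exists, got %v", err)
	}
	if err := writeFast(filepath.Join(dir, "000002.sst"), []byte("fast")); err != nil {
		return err
	}
	if err := writeFast(filepath.Join(dir, "000002.sst"), []byte("again")); !errors.Is(err, os.ErrExist) {
		return fmt.Errorf("fast: expected already-exists, got %v", err)
	}
	return nil
}

func main() {
	if err := demoInstall(); err != nil {
		fmt.Println("sketch failed:", err)
		return
	}
	fmt.Println("both paths refused to overwrite an existing SST")
}
```

In the strict variant, a caller would write the manifest edit only after writeStrict returns, which mirrors the ordering listed above: file bytes, then directory entry, then manifest.
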
- lsm.Set/lsm.SetBatch detects walSize + estimate > MemTableSize and rotates memtable.
- Rotated memtable is submitted to the flush queue (lsm.submitFlush).
- Worker executes levelManager.flush(mt):
  - iterates memtable entries,
  - builds SST via tableBuilder,
  - prepares manifest edits: EditAddFile + EditLogPointer.
- In strict mode, SyncDir runs before manifest.LogEdits(...).
- On successful manifest commit, table is added to L0 and wal.RemoveSegment runs when allowed.

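
The rotation trigger in the first step is a simple size check made before the write is applied. The following sketch shows only that check; lsmSketch, memTable, and their fields are hypothetical stand-ins, and the slice flushQueue takes the place of lsm.submitFlush.

```go
package main

import "fmt"

// memTable is a hypothetical stand-in that tracks only its WAL size.
type memTable struct {
	id      int
	walSize int64
}

// lsmSketch holds the active memtable and a queue of rotated ones.
type lsmSketch struct {
	active       *memTable
	memTableSize int64       // rotation threshold in bytes
	flushQueue   []*memTable // stands in for the flush queue
	nextID       int
}

func newLSMSketch(memTableSize int64) *lsmSketch {
	return &lsmSketch{active: &memTable{id: 1}, memTableSize: memTableSize, nextID: 2}
}

// set accounts for a write of estimate bytes. If the write would push the
// active memtable past the threshold, the memtable is rotated first and
// the write lands in the fresh one. It reports whether a rotation happened.
func (l *lsmSketch) set(estimate int64) bool {
	rotated := false
	if l.active.walSize+estimate > l.memTableSize {
		l.flushQueue = append(l.flushQueue, l.active)
		l.active = &memTable{id: l.nextID}
		l.nextID++
		rotated = true
	}
	l.active.walSize += estimate
	return rotated
}

func main() {
	l := newLSMSketch(100)
	for _, n := range []int64{60, 40, 1} {
		rotated := l.set(n)
		fmt.Printf("write %d bytes: rotated=%v queued=%d\n", n, rotated, len(l.flushQueue))
	}
}
```

Because the comparison is strict, a write that fills the memtable exactly to the threshold does not rotate it; the next write does.
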
- Startup rebuild (levelManager.build) validates manifest SST entries against disk.
- Missing or unreadable SSTs fail startup; normal restart does not repair manifest state by deleting referenced files.
- Temp SST names are only used in strict mode and are created in WorkDir with suffix .tmp.<pid>.<ns> (not a dedicated tmp/ directory).

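
The fail-fast startup check can be approximated as follows. This is a sketch under assumptions: validateSSTs and demoValidate are hypothetical names, the SST file names are invented, and "readable" here only means the file exists, is a regular file, and can be opened. The real levelManager.build presumably checks more than that.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// validateSSTs checks that every file name referenced by the manifest
// exists in dir as a regular file that can be opened. It returns the
// first problem found and never deletes or repairs anything.
func validateSSTs(dir string, referenced []string) error {
	for _, name := range referenced {
		path := filepath.Join(dir, name)
		info, err := os.Stat(path)
		if err != nil {
			return fmt.Errorf("manifest references %q: %w", name, err)
		}
		if !info.Mode().IsRegular() {
			return fmt.Errorf("manifest references %q: not a regular file", name)
		}
		f, err := os.Open(path)
		if err != nil {
			return fmt.Errorf("manifest references %q: %w", name, err)
		}
		f.Close()
	}
	return nil
}

// demoValidate exercises validateSSTs in a scratch directory.
func demoValidate() error {
	dir, err := os.MkdirTemp("", "nokv-recovery-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "000001.sst"), []byte("sst"), 0o644); err != nil {
		return err
	}
	if err := validateSSTs(dir, []string{"000001.sst"}); err != nil {
		return fmt.Errorf("expected success: %w", err)
	}
	// 000002.sst is referenced but was never written.
	err = validateSSTs(dir, []string{"000001.sst", "000002.sst"})
	if !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("expected a not-exist error, got %v", err)
	}
	return nil
}

func main() {
	if err := demoValidate(); err != nil {
		fmt.Println("sketch failed:", err)
		return
	}
	fmt.Println("missing SST detected; startup would fail")
}
```
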
flushRuntime.stats() feeds StatsSnapshot.Flush:
- pending, queue, active
- wait/build/release totals, counts, last, max
- completed

Use nokv stats --workdir <dir> to inspect flush backlog and latency.
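
For readers unfamiliar with this style of metric, the total/count/last/max bookkeeping can be sketched as below. The type and field names (durationStat, flushSnapshot, and so on) are hypothetical and do not claim to match StatsSnapshot.Flush; the sketch is also not safe for concurrent use, which a real runtime would need to handle.

```go
package main

import (
	"fmt"
	"time"
)

// durationStat accumulates total, count, last, and max for one stage.
type durationStat struct {
	Total time.Duration
	Count int64
	Last  time.Duration
	Max   time.Duration
}

// observe records one measured duration.
func (s *durationStat) observe(d time.Duration) {
	s.Total += d
	s.Count++
	s.Last = d
	if d > s.Max {
		s.Max = d
	}
}

// mean returns the average duration, or zero if nothing was observed.
func (s durationStat) mean() time.Duration {
	if s.Count == 0 {
		return 0
	}
	return s.Total / time.Duration(s.Count)
}

// flushSnapshot groups queue gauges with per-stage timing.
type flushSnapshot struct {
	Pending, Queue, Active int64
	Wait, Build, Release   durationStat
	Completed              int64
}

func main() {
	var snap flushSnapshot
	samples := []time.Duration{5 * time.Millisecond, 7 * time.Millisecond, 3 * time.Millisecond}
	for _, d := range samples {
		snap.Build.observe(d)
		snap.Completed++
	}
	b := snap.Build
	fmt.Printf("build: count=%d total=%s last=%s max=%s mean=%s\n",
		b.Count, b.Total, b.Last, b.Max, b.mean())
}
```

Keeping total and count separately lets a consumer derive the mean, while last and max expose recent and worst-case latency without storing every sample.
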
- lsm/flush_runtime_test.go: queue lifecycle and timing counters.
- db_test.go::TestRecoveryWALReplayRestoresData: replay still restores data after crash before flush completion.
- db_test.go::TestRecoveryFailsOnMissingSST and db_test.go::TestRecoveryFailsOnCorruptSST: startup fails when manifest SSTs are missing or corrupt.

See also recovery.md, memtable.md, and wal.md.