This package implements content cleanup on each target node. The goal is to reclaim storage space from work files, erasure coding (EC) artifacts, and chunked upload leftovers while ensuring correctness and avoiding premature deletion.
Cleanup operates per-bucket, per-mountpath and works strictly on local state — it does not coordinate across targets.
Table of Contents
- Overview
- Relation to rebalance cleanup mode
- Cleanup Policies
- Implementation Details
- Corner Cases & Constraints
- Future Enhancements
Scope: for each specified bucket (or all buckets):

- scan all content items under a mountpath's bucket namespace

Main method: `clnJ.visit()`, which in turn calls `visitCT` or `visitObj`.
Global recency guard:

- a configurable (cluster-level) knob, `dont_cleanup_time`, prevents premature deletion during ongoing operations
- any file with mtime + `dont_cleanup_time` > now is skipped to avoid racing against:
  - EC slice => metafile write sequences
  - replica => metafile write sequences
  - other concurrent operations
Invalid entries (malformed FQNs, bucket mismatches) are logged and removed.
## Relation to rebalance cleanup mode

`ais space-cleanup` is a general local-storage cleanup tool. It walks local mountpaths and removes several classes of safely reclaimable files, including objects with corrupted or missing local metadata, zero-size objects (when so configured), extra local copies, misplaced EC artifacts, local mountpath orphans, and verified migrated-away leftovers.

AIStore also provides `ais start rebalance --cleanup`. Rebalance cleanup is narrower and more explicit: it reuses the global rebalance lifecycle and monitoring machinery, does not migrate object payloads, and is intended specifically for reclaiming source-side copies left behind after topology changes and regular data-moving rebalance.
In short:

- use `ais start rebalance --cleanup` when the goal is post-rebalance, placement-specific cleanup after maintenance, decommission, scale-out, scale-in, or node return (from maintenance);
- use `ais space-cleanup` for broader local-storage hygiene and capacity reclamation.
## Cleanup Policies

### Work files

- parsed via `ParseUbase`
- invalid encoding => removed
- PID mismatch (from an old process) => removed as old work
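The tag/PID rule can be illustrated with a toy parser. The `<tag>.<pid>` suffix format below is an assumption made for illustration; it does not reproduce `ParseUbase`'s real encoding:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseWork splits a hypothetical work-file suffix of the form "<tag>.<pid>".
// It mirrors the two rules above: an invalid encoding means the file is
// removed outright, and a non-empty tag plus PID must both be present.
func parseWork(suffix string) (tag string, pid int, ok bool) {
	i := strings.LastIndexByte(suffix, '.')
	if i <= 0 || i == len(suffix)-1 {
		return "", 0, false // invalid encoding => removed
	}
	pid, err := strconv.Atoi(suffix[i+1:])
	if err != nil {
		return "", 0, false // non-numeric PID => invalid => removed
	}
	return suffix[:i], pid, true
}

func main() {
	tag, pid, ok := parseWork("put.12345")
	fmt.Println(tag, pid, ok) // put 12345 true

	// A PID differing from the current process marks the file as old work
	// (unless this process happens to have that PID).
	fmt.Println(ok && pid != os.Getpid())
}
```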
### EC artifacts

Behavior depends on whether EC is enabled for the bucket.

When EC is disabled for the bucket:

- all EC slices and metafiles are removed as old work

When EC is enabled:

- Slices (`fs.ECSliceCT`)
  - missing the corresponding metafile → flagged as misplaced EC
  - removal batched under `flagRmMisplacedEC`
- Metafiles (`fs.ECMetaCT`)
  - kept if a local slice OR replica (`ObjCT`) exists
  - removed only when both slice and replica are missing locally
  - removal batched as old work

Note: all decisions use the local perspective only. A metafile orphaned locally may still have valid slices/replicas on other targets.
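The local-only decision rules above can be condensed into two small functions. This is a sketch: the `action` values and function names are placeholders, and the booleans stand in for actual local filesystem lookups:

```go
package main

import "fmt"

// action is what the cleanup pass decides for a given EC content item.
type action string

const (
	keep          action = "keep"
	rmOldWork     action = "remove (old work)"
	rmMisplacedEC action = "remove (misplaced EC)"
)

// decideSlice: a slice without its corresponding metafile is misplaced EC.
func decideSlice(hasMetafile bool) action {
	if !hasMetafile {
		return rmMisplacedEC
	}
	return keep
}

// decideMetafile: kept if a local slice OR a local replica exists;
// removed as old work only when both are missing locally.
func decideMetafile(hasLocalSlice, hasLocalReplica bool) action {
	if hasLocalSlice || hasLocalReplica {
		return keep
	}
	return rmOldWork
}

func main() {
	fmt.Println(decideSlice(false))           // remove (misplaced EC)
	fmt.Println(decideMetafile(true, false))  // keep
	fmt.Println(decideMetafile(false, false)) // remove (old work)
}
```

The asymmetry is deliberate: a slice is judged against exactly one companion (its metafile), while a metafile survives if either of two local companions exists.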
### Chunked uploads

- Chunks (`fs.ChunkCT`)
  - must encode a valid `(uploadID, chunkNum)` pair
  - invalid encodings → removed
  - valid chunks are validated against manifest state in `visitChunk`
- Manifests (`fs.ChunkMetaCT`)
  - completed manifests (no extras) are kept
  - partial manifests (extras include the uploadID) are removed as old partials
  - handled in `visitObj()`
- For EC-enabled buckets: objects missing their corresponding metafiles are flagged as misplaced EC
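The `(uploadID, chunkNum)` validity rule for chunks can be sketched as below. The `<uploadID>.<chunkNum>` encoding is an assumption for illustration only; the real encoding lives behind the `fs.ChunkCT` parsing code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChunk validates a hypothetical "<uploadID>.<chunkNum>" encoding:
// both parts must be non-empty and chunkNum must be a positive integer.
// Anything that fails these checks falls under "invalid encodings → removed".
func parseChunk(s string) (uploadID string, chunkNum int, ok bool) {
	i := strings.LastIndexByte(s, '.')
	if i <= 0 { // missing separator or empty uploadID
		return "", 0, false
	}
	n, err := strconv.Atoi(s[i+1:])
	if err != nil || n <= 0 { // missing or non-positive chunk number
		return "", 0, false
	}
	return s[:i], n, true
}

func main() {
	id, n, ok := parseChunk("upload-7f3a.4")
	fmt.Println(id, n, ok) // upload-7f3a 4 true

	_, _, ok = parseChunk("noChunkNum")
	fmt.Println(ok) // false: invalid encoding, removed by cleanup
}
```

Chunks that pass this syntactic check are still only candidates: the authoritative decision is made against manifest state (in `visitChunk`).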
## Implementation Details

### Load-aware throttling

Space cleanup uses the unified `cmn/load` throttling (`load.Advice`) to avoid I/O and CPU spikes during large scans:

- Each mountpath keeps a `load.Advice` instance initialized with `FlMem|FlCla|FlDsk` and `RW=false` (metadata-only).
- On every N-th visit (`adv.ShouldCheck(nvisits)`), it refreshes node pressure and may insert a small sleep.
- Under `Critical` memory, CPU, or disk pressure, cleanup backs off; under merely `High` load it keeps progressing, but with gentler pacing.
Cleanup relies on filesystem mtimes; operator-initiated clock changes on a node may therefore influence cleanup decisions.
## Corner Cases & Constraints

- Race protection: slice => meta and replica => meta write sequences are covered by the global recency guard
- Local scope: cleanup does not consult cluster maps; global orphan detection is out of scope
- Encoding requirements: `fs.WorkCT` tags, chunk uploadIDs, and chunk numbers must never be empty
- Legacy state: partial manifests are treated as invalid and always removed
## Future Enhancements

- Delay removal when conflicting generations exist; prefer the newest metadata.
- Consult cluster-wide state to distinguish local vs. global orphans.
- Move questionable artifacts to a quarantine directory instead of deleting them immediately.
### Metrics

Add Prometheus counters for:
- Misplaced EC artifacts
- Old work removal
- Invalid FQN detection
- Cleanup performance metrics
### Dry-run mode

A non-destructive cleanup pass that reports what would be removed:
- Categorized reasons (old work, misplaced EC, invalid FQN)
- Output formats: logs, xaction stats, JSON/CSV export
- Integration with monitoring dashboards
### Deep validation

Extend beyond filename heuristics by loading and validating metadata:
- EC metafile → slice/replica consistency
- Chunk manifest → chunk file validation
- Cross-reference integrity checks
- Detailed mismatch reporting