perf(l1): optimize storage layer for block execution#6207
Conversation
Hold a RwLockReadGuard across read transactions instead of re-acquiring per get(). Also replace Mutex with RwLock for trie_cache and last_computed_flatkeyvalue to allow concurrent readers. Combined, these changes reduce block execution time by ~9.3% (57ms → 52ms) on block 24443168 (442 txs, 37.5M gas) benchmarked on a 32-core AMD server.
Hold a pre-acquired read view in BackendTrieDB for the entire trie traversal. InMemoryBackend now wraps its Database in Arc, so begin_read() clones the Arc (O(1)) and releases the lock immediately. InMemoryReadTx and InMemoryLocked hold an owned Arc<Database> snapshot — all subsequent gets are lock-free HashMap lookups with no RwLock contention. BackendTrieDB now acquires a single read view in its constructor and reuses it for all get() calls during the trie traversal. This eliminates the per-node-lookup Box allocation and lock acquisition that previously happened ~8000+ times per block. StorageReadView gains Send + Sync bounds and begin_read() returns a 'static view, enabling BackendTrieDB to own the read view.
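The snapshotting scheme described above can be sketched as a stand-alone model. This is a simplified illustration, not the actual ethrex code: the Database alias, key types, and method names here are stand-ins for the real types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type Database = HashMap<Vec<u8>, Vec<u8>>;

// The outer RwLock only guards the pointer swap; readers clone the
// inner Arc and drop the lock immediately, so gets never take a lock.
struct InMemoryBackend {
    inner: Arc<RwLock<Arc<Database>>>,
}

// Owned snapshot: every get() is a plain lock-free HashMap lookup.
struct InMemoryReadTx {
    snapshot: Arc<Database>,
}

impl InMemoryBackend {
    fn begin_read(&self) -> InMemoryReadTx {
        // O(1): clones the Arc, not the map; the read guard is
        // released at the end of the statement.
        let snapshot = self.inner.read().unwrap().clone();
        InMemoryReadTx { snapshot }
    }

    fn write(&self, key: Vec<u8>, value: Vec<u8>) {
        let mut guard = self.inner.write().unwrap();
        // Copy-on-write: clones the map only if a reader still
        // holds the previous Arc snapshot.
        let db = Arc::make_mut(&mut *guard);
        db.insert(key, value);
    }
}

impl InMemoryReadTx {
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.snapshot.get(key)
    }
}

fn main() {
    let backend = InMemoryBackend {
        inner: Arc::new(RwLock::new(Arc::new(Database::new()))),
    };
    let tx0 = backend.begin_read(); // snapshot taken before the write
    backend.write(b"k".to_vec(), b"v".to_vec());
    let tx1 = backend.begin_read(); // snapshot taken after the write
    assert!(tx0.get(b"k").is_none()); // old snapshot is unaffected
    assert_eq!(tx1.get(b"k").map(|v| v.as_slice()), Some(b"v".as_slice()));
}
```

The snapshot semantics also explain the later test fixes in this PR: a reader constructed before a write never observes that write.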
Change begin_read() to return Arc<dyn StorageReadView> instead of Box, allowing the read view to be shared across multiple BackendTrieDB instances. In get_storage_at_root (the SLOAD hot path), pre-acquire the read view, trie cache, and last_written once and share them between the state trie and storage trie opens. This eliminates per-query duplicate RwLock acquisitions and Arc allocations.
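The sharing this enables can be sketched as follows; the trait shape, table parameter, and DummyView implementation are illustrative stand-ins for the real backends, showing only why `Arc<dyn StorageReadView>` lets one snapshot back both trie opens.

```rust
use std::sync::Arc;

// Send + Sync bounds are what make Arc sharing across threads legal.
trait StorageReadView: Send + Sync {
    fn get(&self, table: &str, key: &[u8]) -> Option<Vec<u8>>;
}

// Hypothetical trie handle that holds a shared read view.
struct BackendTrieDB {
    read_view: Arc<dyn StorageReadView>,
    table: &'static str,
}

impl BackendTrieDB {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.read_view.get(self.table, key)
    }
}

struct DummyView;
impl StorageReadView for DummyView {
    fn get(&self, table: &str, key: &[u8]) -> Option<Vec<u8>> {
        Some(format!("{table}:{}", key.len()).into_bytes())
    }
}

fn main() {
    // One snapshot, shared by the state trie and the storage trie:
    // cloning the Arc is O(1), unlike re-running begin_read() twice.
    let view: Arc<dyn StorageReadView> = Arc::new(DummyView);
    let state_trie = BackendTrieDB { read_view: view.clone(), table: "state" };
    let storage_trie = BackendTrieDB { read_view: view, table: "storage" };
    assert_eq!(state_trie.get(b"ab"), Some(b"state:2".to_vec()));
    assert_eq!(storage_trie.get(b"ab"), Some(b"storage:2".to_vec()));
}
```

With `Box<dyn StorageReadView>` the second trie open would have needed its own snapshot; with `Arc` both opens observe the same consistent view.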
apply_prefix was called on every trie node lookup, creating 3 Vec allocations each time (from_bytes + append_new + concat). Pre-computing the prefix nibbles once in the TrieWrapper constructor reduces this to 1 allocation (just concat) per lookup. Also adds TrieWrapper::new() constructor to encapsulate this.
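The allocation counting can be illustrated with a toy Nibbles type; this is a stand-in for the real trie type, and only the allocation structure (three allocations per call vs one) is meant to match.

```rust
// Toy stand-in for the trie's Nibbles type.
#[derive(Clone, Debug, PartialEq)]
struct Nibbles(Vec<u8>);

impl Nibbles {
    fn from_bytes(b: &[u8]) -> Self {
        // allocation #1
        Nibbles(b.iter().flat_map(|x| [x >> 4, x & 0x0f]).collect())
    }
    fn append_new(&self, n: u8) -> Self {
        // allocation #2
        let mut v = self.0.clone();
        v.push(n);
        Nibbles(v)
    }
    fn concat(&self, other: &Nibbles) -> Nibbles {
        // allocation #3 — the only one that must happen per lookup
        let mut v = Vec::with_capacity(self.0.len() + other.0.len());
        v.extend_from_slice(&self.0);
        v.extend_from_slice(&other.0);
        Nibbles(v)
    }
}

// Before: 3 allocations on every trie node lookup.
fn apply_prefix(prefix: &[u8; 32], key: &Nibbles) -> Nibbles {
    Nibbles::from_bytes(prefix).append_new(17).concat(key)
}

struct TrieWrapper {
    prefix_nibbles: Option<Nibbles>,
}

impl TrieWrapper {
    // After: from_bytes + append_new run once, in the constructor.
    fn new(prefix: Option<[u8; 32]>) -> Self {
        TrieWrapper {
            prefix_nibbles: prefix.map(|p| Nibbles::from_bytes(&p).append_new(17)),
        }
    }
    // Each lookup now pays only the final concat.
    fn make_key(&self, key: Nibbles) -> Nibbles {
        match &self.prefix_nibbles {
            Some(p) => p.concat(&key),
            None => key,
        }
    }
}

fn main() {
    let prefix = [0xab; 32];
    let w = TrieWrapper::new(Some(prefix));
    let key = Nibbles::from_bytes(&[0x12]);
    // Same key bytes either way; only the allocation count differs.
    assert_eq!(w.make_key(key.clone()), apply_prefix(&prefix, &key));
}
```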
🤖 Kimi Code Review

Review Summary

This PR introduces significant performance optimizations to the storage layer, primarily through lock-contention reduction and allocation elimination on the trie read path.

Issues Found

1. Critical: Atomicity Violation in InMemoryBackend (in_memory.rs:33-34, 153-154, 168-169)

The use of Arc::make_mut on the write path:

    // Problem: Each write creates a new Arc, breaking atomicity
    let db_mut = Arc::make_mut(&mut *db);

Fix: Use …

2. Performance: Unnecessary Arc Cloning (in_memory.rs:44-48)

The in-memory backend clones the entire database on every read transaction, which is O(n) complexity. This defeats the purpose of the optimization:

    let snapshot = self.inner.read()?.clone(); // Clones entire database

Fix: Return a reference-based read view that holds the RwLock guard, or use a proper MVCC implementation.

3. Memory Safety: Lifetime Mismatch (api/mod.rs:37)

The trait signature change from …

4. Code Duplication (store.rs:2422-2599)

Multiple nearly identical …

5. Error Handling Inconsistency (layering.rs:195-210)

The …

Recommendations
The RocksDB backend changes look correct as RocksDB handles its own MVCC.

Automated review by Kimi (Moonshot AI) · custom prompt
🤖 Codex Code Review

Findings
Notes / Questions
Positive
If you want, I can scan for any callers that depend on deterministic prefix iteration order to confirm whether Point 1 is a real behavioral change.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

PR Review: Performance Optimization of Storage Layer (-15.5%)

Summary

This PR implements four incremental optimizations to the storage hot path, targeting lock contention and allocation overhead identified through profiling. The changes are well-motivated and demonstrate clear performance wins (15.5% speedup, 18.2% throughput increase).

Critical Issues

1. Type inconsistency in InMemoryBackend (crates/storage/backend/in_memory.rs:11)

    type Table = FxHashMap<Vec<u8>, Vec<u8>>;
    type Database = HashMap<&'static str, Table>; // ← Standard HashMap, not FxHashMap

Issue: The outer Database alias still uses the standard HashMap while the inner Table uses FxHashMap.

Fix:

    type Database = FxHashMap<&'static str, Table>;

Impact: Minor performance regression for table name lookups, though table count is small so impact is minimal in practice.

2. Potential Arc clone overhead in get_storage_at_root (crates/storage/store.rs:2104-2106, 2118-2120)

The code clones the pre-acquired resources for each trie open:

    read_view.clone(),
    cache.clone(),
    last_written.clone(), // ← This is Vec<u8>, not Arc!

Issue: …

Recommendation: Change …

Impact: Moderate - each last_written.clone() copies the full Vec<u8>.

3. Missing Send + Sync bounds verification (crates/storage/api/mod.rs:58)

The trait now requires:

    pub trait StorageReadView: Send + Sync {

Verification needed: Ensure all concrete implementations satisfy these bounds.
Status: Appears correct, but worth confirming with a compile-time assertion or test if not already verified.

Performance Concerns

4. Arc::make_mut potential COW overhead (crates/storage/backend/in_memory.rs:33, 153, 169)

The RCU pattern uses:

    let db_mut = Arc::make_mut(&mut *db);

Concern: If any reader still holds a reference to the inner Arc<Database>, Arc::make_mut will deep-clone the entire database before the write.

Analysis: The design assumes writes are infrequent (background worker only, 2-3 times per block). Readers clone the inner Arc and release the RwLock immediately, so the window for contention is small. However, if a reader holds the snapshot for a long time while a write occurs, the entire DB gets cloned.

Recommendation: Add a comment documenting this behavior and the assumption that reads are short-lived. Consider profiling to verify COW isn't triggered frequently in practice.

5. Prefix nibble computation still allocates (crates/storage/layering.rs:207-208, 233-236, 241-244)

The optimization pre-computes prefix nibbles, but the application still allocates:

    let key = match &self.prefix_nibbles {
        Some(prefix) => prefix.concat(&key), // ← Still allocates a new Nibbles
        None => key,
    };

Issue: …

Potential optimization: If …

Impact: Lower priority - already reduced by 67%, further optimization requires trie library changes.

Code Quality Issues

6. Redundant field in BackendTrieDB (crates/storage/trie.rs:14-15)

The struct now holds both the backend and a read view:

    /// Reference to the storage backend (used only for writes)
db: Arc<dyn StorageBackend>,
/// Pre-acquired read view held for the lifetime of this struct.
    read_view: Arc<dyn StorageReadView>,

Question: Is the db field still needed now that reads go through read_view? Looking at usage, it appears to be kept only for the write path.

7. Removed apply_prefix usage not fully deleted (crates/storage/trie.rs:6, 111)

The import and function are still present but largely unused:

    use crate::layering::apply_prefix; // Line 6
    // ...
    fn make_key(&self, path: Nibbles) -> Vec<u8> {
        apply_prefix(self.address_prefix, path).into_vec() // Line 111
    }

While make_key still calls apply_prefix, the hot-path lookups no longer go through it.
Correctness

8. RwLock poisoning behavior (crates/storage/store.rs:2093-2094)

Lock errors map to StoreError::LockError:

    .read()
    .map_err(|_| StoreError::LockError)?

Question: What happens if a thread panics while holding a write lock? The RwLock will be poisoned, and all subsequent operations will fail. Is this the desired behavior, or should poisoned locks be recovered?

Analysis: For blockchain storage, fail-fast on poisoned locks is likely correct - if a storage write panicked, the database is potentially in an inconsistent state. The current behavior seems appropriate.

9. Trie cache RCU consistency (crates/storage/store.rs:2759-2760)

During trie updates, the cache is updated with RCU:

    let trie = Arc::new(trie_mut);
    *trie_cache.write().map_err(|_| StoreError::LockError)? = trie.clone();

Observation: After the write lock is released, readers will see the new cache state. There's a small window where readers might miss cache entries if they grabbed the old cache before the update but perform lookups after the database is updated.

Analysis: This is inherent to the RCU pattern and acceptable - readers will simply hit the database instead of the cache, which is correct (slower but not incorrect). The tradeoff is worth it for lock-free reads.

Minor Issues

10. Missing type documentation (crates/storage/backend/in_memory.rs:10-11)

The type aliases should document why FxHashMap is chosen:

    // Using FxHashMap for O(1) lookups vs BTreeMap's O(log n).
    // FxHash is faster than SipHash for small keys (byte vectors, table names).
    type Table = FxHashMap<Vec<u8>, Vec<u8>>;
    type Database = FxHashMap<&'static str, Table>; // Fixed: use FxHashMap here too

11. TrieWrapper::new could take ownership (crates/storage/layering.rs:201-215)

The constructor currently takes owned arguments:

    pub fn new(
        state_root: H256,
        inner: Arc<TrieLayerCache>,
        db: Box<dyn TrieDB>,
        prefix: Option<H256>,
    ) -> Self {
        let prefix_nibbles = prefix.map(|p| Nibbles::from_bytes(p.as_bytes()).append_new(17));
        // ...
    }

This is fine, but the …

Positive Observations
Recommendations

Must fix: …

Should fix: …

Nice to have: …

Future optimization: …

Security Assessment

No security vulnerabilities identified. The changes maintain the same correctness guarantees as before.

Verdict

APPROVE with minor fixes requested (Item 1 and optionally Item 2). The performance gains are significant and well-justified. The implementation is sound with only minor consistency issues to address.

Automated review by Claude (Anthropic) · custom prompt
Greptile Overview

Greptile Summary

This PR implements four incremental storage layer optimizations that collectively achieve a 15.5% reduction in block execution time through targeted elimination of lock contention, allocation overhead, and algorithmic complexity.

Key Changes: …

Performance Impact: …

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc<Database>>>, replaced BTreeMap with FxHashMap for O(1) lookups, snapshots now clone inner Arc for lock-free reads |
| crates/storage/layering.rs | Pre-computes prefix nibbles in TrieWrapper::new() constructor to avoid repeated allocations on every trie node lookup, reducing allocations from 3 to 1 per lookup |
| crates/storage/store.rs | Changed trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added shared read view pattern in get_storage_at_root, new *_with_view methods enable resource sharing across multiple trie opens |
| crates/storage/trie.rs | Added read_view field to BackendTrieDB to hold pre-acquired read view, eliminating per-lookup allocations, new *_with_view constructors support sharing a single read view across multiple trie instances |
Sequence Diagram

```mermaid
sequenceDiagram
    participant App as Block Executor
    participant Store as Store
    participant Backend as StorageBackend
    participant TrieDB as BackendTrieDB
    participant Cache as TrieLayerCache
    Note over App,Cache: Optimized get_storage_at_root flow
    App->>Store: get_storage_at_root(state_root, address, storage_key)
    Note over Store: Pre-acquire shared resources (optimization #2 & #3)
    Store->>Backend: begin_read()
    Backend-->>Store: Arc<StorageReadView> (cloneable snapshot)
    Store->>Cache: trie_cache.read() (RwLock instead of Mutex)
    Cache-->>Store: Arc<TrieLayerCache> clone
    Store->>Store: last_written() (RwLock.read())
    Note over Store: Open state trie with shared resources
    Store->>TrieDB: new_for_accounts_with_view(backend, read_view, last_written)
    Note over TrieDB: Holds read_view for entire traversal (no per-lookup allocation)
    Store->>TrieDB: state_trie.get(account_hash)
    Note over TrieDB: Pre-computed prefix nibbles (optimization #4)
    TrieDB->>Cache: inner.get(state_root, prefixed_key)
    Cache-->>TrieDB: cache hit/miss
    alt cache miss
        TrieDB->>Backend: read_view.get() (lock-free with RCU)
        Backend-->>TrieDB: trie node data
    end
    TrieDB-->>Store: encoded_account
    Note over Store: Open storage trie reusing same read_view
    Store->>TrieDB: new_for_storages_with_view(backend, read_view.clone(), last_written)
    Store->>TrieDB: storage_trie.get(hashed_key)
    Note over TrieDB: Reuses held read_view (no new snapshot)
    TrieDB->>Cache: inner.get(state_root, prefixed_key)
    alt cache miss
        TrieDB->>Backend: read_view.get() (same snapshot)
        Backend-->>TrieDB: storage value
    end
    TrieDB-->>Store: storage_value
    Store-->>App: U256 result
    Note over App,Cache: Key optimizations:<br/>1. BTreeMap→FxHashMap (O(log n)→O(1))<br/>2. RCU pattern (lock-free reads)<br/>3. Shared read_view (no duplicate snapshots)<br/>4. Pre-computed prefix (1 alloc vs 3)
```
Last reviewed commit: d4555a6
Pull request overview
This PR optimizes the storage layer for block execution through four targeted performance improvements, achieving a 15.5% reduction in execution time. The optimizations target bottlenecks identified via profiling: data structure access patterns (BTreeMap→FxHashMap), lock contention (Mutex→RwLock), per-lookup allocations (RCU pattern + held read views), and repeated prefix computations.
Changes:
- Replaced Mutex with RwLock for read-heavy trie_cache and last_computed_flatkeyvalue to enable concurrent readers
- Implemented RCU pattern in InMemoryBackend with Arc<RwLock<Arc<Database>>> for lock-free reads via snapshots
- Modified storage API to return Arc<dyn StorageReadView> instead of Box, enabling cheap sharing of read views across multiple trie operations
- Added constructors for BackendTrieDB that accept pre-acquired shared read views to eliminate per-trie-open allocations
- Pre-computed prefix nibbles in TrieWrapper constructor to reduce allocations from 3 to 1 per trie node lookup
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| crates/storage/api/mod.rs | Changed StorageBackend::begin_read() to return Arc instead of Box, added Send + Sync bounds to StorageReadView trait |
| crates/storage/backend/rocksdb.rs | Updated begin_read() to return Arc |
| crates/storage/backend/in_memory.rs | Implemented RCU pattern with Arc<RwLock<Arc<Database>>>, switched Table from BTreeMap to FxHashMap, updated begin_read() for snapshot-based lock-free reads |
| crates/storage/trie.rs | Added BackendTrieDB constructors with shared read view support (*_with_view variants), changed read view field to Arc for sharing |
| crates/storage/layering.rs | Added TrieWrapper::new() constructor with pre-computed prefix nibbles, replaced apply_prefix calls with direct concat operations |
| crates/storage/store.rs | Converted trie_cache and last_computed_flatkeyvalue from Mutex to RwLock, added *_shared trie opening methods, updated get_storage_at_root to pre-acquire and share resources, updated all TrieWrapper instantiations to use new constructor |
Benchmark Block Execution Results Comparison Against Main
…2929 warm/cold tracking

Switch the internal data structures used for EIP-2929 accessed storage slot tracking from sorted tree structures to hash-based structures for O(1) lookups.

Changes:
- Substate.accessed_storage_slots: BTreeMap<Address, BTreeSet<H256>> → FxHashMap<Address, FxHashSet<H256>>
- VM.storage_original_values: BTreeMap<(Address, H256), U256> → FxHashMap<(Address, H256), U256>

Output boundaries that require determinism (make_access_list, get_accessed_storage_slots) create fresh BTreeMap/BTreeSet locally, preserving sorted output regardless of field type.
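The deterministic-output-boundary pattern from this commit can be sketched as below. To keep the example dependency-free, std HashMap/HashSet stand in for FxHashMap/FxHashSet (same API surface for this purpose), and the field and method names mirror the commit message but are illustrative.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};

type Address = [u8; 20];
type H256 = [u8; 32];

// Hash-based field: O(1) warm/cold membership checks on the hot path.
struct Substate {
    accessed_storage_slots: HashMap<Address, HashSet<H256>>,
}

impl Substate {
    fn mark_accessed(&mut self, addr: Address, slot: H256) {
        self.accessed_storage_slots.entry(addr).or_default().insert(slot);
    }

    // Output boundary: build fresh sorted structures locally, so callers
    // see deterministic (sorted) order regardless of the field's hash order.
    fn get_accessed_storage_slots(&self) -> BTreeMap<Address, BTreeSet<H256>> {
        self.accessed_storage_slots
            .iter()
            .map(|(addr, slots)| (*addr, slots.iter().copied().collect()))
            .collect()
    }
}

fn main() {
    let mut s = Substate { accessed_storage_slots: HashMap::new() };
    let addr = [1u8; 20];
    // Insert out of order; HashSet iteration order is unspecified.
    s.mark_accessed(addr, [9u8; 32]);
    s.mark_accessed(addr, [2u8; 32]);
    let sorted = s.get_accessed_storage_slots();
    let slots: Vec<H256> = sorted[&addr].iter().copied().collect();
    // The boundary restores sorted output regardless of hash order.
    assert_eq!(slots, vec![[2u8; 32], [9u8; 32]]);
}
```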
…eature flag

Add SloadCounters struct with 4 AtomicU64 counters to measure SLOAD cache effectiveness across the cache hierarchy:
- sload_l1_hit: value found in per-tx GenDB storage (L1 cache)
- sload_l2_hit: value found in CachingDatabase (L2 cross-tx cache)
- sload_l2_miss: fell through to Store/trie lookup
- sload_duplicate_miss_race: read miss became hit on write recheck (warmer race)

Counters are global statics (const-initialized, no LazyLock needed for atomics) and output via [PERF] log line alongside existing opcode timings. All behind #[cfg(feature = "perf_opcode_timings")] for zero overhead in production builds.
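A minimal sketch of the counter scheme described in the commit message. The #[cfg(feature = "perf_opcode_timings")] gating is omitted so the snippet runs standalone, only three of the four counters are shown, and the exact log format is illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Const-initialized globals: atomics need no LazyLock wrapper.
static SLOAD_L1_HIT: AtomicU64 = AtomicU64::new(0);
static SLOAD_L2_HIT: AtomicU64 = AtomicU64::new(0);
static SLOAD_L2_MISS: AtomicU64 = AtomicU64::new(0);

fn record_l1_hit() {
    // Relaxed is enough: counters are independent, no ordering needed.
    SLOAD_L1_HIT.fetch_add(1, Ordering::Relaxed);
}

fn record_l2_miss() {
    SLOAD_L2_MISS.fetch_add(1, Ordering::Relaxed);
}

// Emitted alongside the existing [PERF] opcode-timing output.
fn report() -> String {
    format!(
        "[PERF] sload_l1_hit={} sload_l2_hit={} sload_l2_miss={}",
        SLOAD_L1_HIT.load(Ordering::Relaxed),
        SLOAD_L2_HIT.load(Ordering::Relaxed),
        SLOAD_L2_MISS.load(Ordering::Relaxed),
    )
}

fn main() {
    record_l1_hit();
    record_l1_hit();
    record_l2_miss();
    println!("{}", report());
}
```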
…tion
Two coupled optimizations that reduce cold SLOAD overhead:
1. Block-scoped locked RocksDB snapshot (Phase B):
- New SharedLockedTrieDB wrapper reusing Arc<BackendTrieDBLocked>
- Single RocksDB snapshot created per block, shared across all trie reads
- Eliminates up to 16 heap allocations + transaction creations per cold SLOAD
- 5 new locked methods in Store mirroring existing unlocked variants
- Locked backend stored in StoreVmDatabase, auto-dropped at block end
2. Storage-root memoization (Phase C):
- Cache address→storage_root in StoreVmDatabase (Arc<Mutex<FxHashMap>>)
- On cache hit, skips state trie traversal entirely
- Reduces trie opens from 2N to N+1 per N unique slot reads per account
- New get_storage_at_storage_root_locked() takes pre-resolved root
- Non-existent accounts intentionally not cached (rare path, simple types)
Architecture after both optimizations:
get_storage_slot → check storage_root_cache
HIT: skip state trie → open storage trie (locked snapshot) → 8 node reads
MISS: open state trie (locked snapshot) → cache root → open storage trie → 8 reads
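The storage-root memoization (Phase C) above can be sketched as follows. This is an illustrative model, not the ethrex code: StoreVmDatabase here is a stand-in holding only the cache, and the resolve callback models the state-trie traversal that the cache hit skips.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Address = [u8; 20];
type H256 = [u8; 32];

// Block-scoped memo: address → storage_root, filled on first resolution.
struct StoreVmDatabase {
    storage_root_cache: Arc<Mutex<HashMap<Address, H256>>>,
}

impl StoreVmDatabase {
    // On a cache hit the state-trie walk (`resolve`) is skipped entirely,
    // turning 2 trie opens per slot read into 1.
    fn storage_root(&self, addr: Address, resolve: impl Fn(Address) -> H256) -> H256 {
        let mut cache = self.storage_root_cache.lock().unwrap();
        *cache.entry(addr).or_insert_with(|| resolve(addr))
    }
}

fn main() {
    let walks = Cell::new(0u32);
    let db = StoreVmDatabase {
        storage_root_cache: Arc::new(Mutex::new(HashMap::new())),
    };
    // Models an expensive state-trie traversal.
    let resolve = |_addr: Address| {
        walks.set(walks.get() + 1);
        [7u8; 32]
    };
    let addr = [1u8; 20];
    db.storage_root(addr, &resolve);
    db.storage_root(addr, &resolve);
    // Second read for the same account hits the memo: one traversal total.
    assert_eq!(walks.get(), 1);
}
```

For N unique slot reads against one account this is exactly the 2N → N+1 reduction in trie opens described above: one state-trie walk, then N storage-trie opens.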
…ot semantics

BackendTrieDB now snapshots at construction time (pre-acquired read view). Tests were writing then reading from the same instance, but the snapshot predated the writes. Use separate instances so the reader gets a fresh view.
    let account_nibbles = Nibbles::from_bytes(account.as_bytes());
    let last_computed_flatkeyvalue = self.last_written()?;
    Ok(&last_computed_flatkeyvalue[0..64] > account_nibbles.as_ref())
    &last_written[0..64] > account_nibbles.as_ref()
nit: &last_written[0..64] will panic if the slice is shorter than 64 bytes. The caller (self.last_written()) always returns ≥64 bytes (initialized from unwrap_or_else(|| vec![0u8; 64])), but this static method's &[u8] signature doesn't encode that invariant. Consider:

    last_written.get(0..64).is_some_and(|lw| lw > account_nibbles.as_ref())

    trie_mut.put_batch(parent_state_root, child_state_root, new_layer);
    let trie = Arc::new(trie_mut);
    *trie_cache.lock().map_err(|_| StoreError::LockError)? = trie.clone();
    *trie_cache.write().map_err(|_| StoreError::LockError)? = trie.clone();
nit: The .read() → clone → modify → .write() RCU pattern (here and again at line 2833) is safe because only the background worker thread calls apply_trie_updates. Worth adding a brief comment at the .write() sites to make this single-writer invariant explicit — e.g., // Single writer: only called from trie update worker thread.
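The read → clone → modify → write pattern this comment refers to can be sketched with a stand-in cache type (the real TrieLayerCache is more involved); as the nit notes, it is safe only under the single-writer invariant.

```rust
use std::sync::{Arc, RwLock};

// Stand-in for TrieLayerCache: an append-only list of layers.
#[derive(Clone, Default)]
struct TrieLayerCache {
    layers: Vec<u64>,
}

fn apply_trie_update(cache: &RwLock<Arc<TrieLayerCache>>, layer: u64) {
    // Read → clone → modify: build the next snapshot off-lock.
    let mut next = (**cache.read().unwrap()).clone();
    next.layers.push(layer);
    // Single writer: only one thread may run this read-modify-write
    // sequence, otherwise concurrent updates would be lost.
    *cache.write().unwrap() = Arc::new(next);
}

fn main() {
    let cache = RwLock::new(Arc::new(TrieLayerCache::default()));
    let before = cache.read().unwrap().clone(); // reader snapshot
    apply_trie_update(&cache, 42);
    // Readers keep their old Arc; new readers see the published snapshot.
    assert!(before.layers.is_empty());
    assert_eq!(cache.read().unwrap().layers, vec![42]);
}
```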
Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
Benchmark Results: SstoreBench_no_opt
I feel like the addition of the SLOAD counters might belong in another PR.
…nd associated fixes

Reverts commits 3d279be..d0896cf (4 commits):
- d0896cf Run rustfmt
- e4a016c fix(test): use separate writer/reader in trie_db_tests for RCU snapshot semantics
- 8345d1d perf(l1): add block-scoped locked trie reads and storage-root memoization
- 3d279be perf(l1): add SLOAD attribution counters behind perf_opcode_timings feature flag
Two fixes:
- Format long FxHashMap type annotation in vm.rs to satisfy cargo fmt
- Fix trie_db tests that wrote via put_batch then read back on the same BackendTrieDB instance. With the RCU snapshot pattern, the read view is captured at construction time and won't see subsequent writes. Create a fresh BackendTrieDB after writing to get an updated snapshot.
Motivation
Profiling block execution on mainnet revealed that lock contention, per-lookup allocations, and O(log n) data structure access in the storage layer accounted for a significant portion of execution time. The storage hot path (get_storage_at_root → trie lookups) was hitting multiple bottlenecks on every trie node access.

Description

Four incremental optimizations to the storage layer, each targeting a specific bottleneck identified via perf profiling on a 32-core AMD server:

1. BTreeMap → FxHashMap + Mutex → RwLock
- Replace BTreeMap<Vec<u8>, Vec<u8>> with FxHashMap for in-memory storage tables — O(1) lookups instead of O(log n)
- Replace Mutex with RwLock for trie_cache and last_computed_flatkeyvalue — allows concurrent readers (only the background worker writes, 2-3 times per block)

2. RCU pattern + held read view
- InMemoryBackend: wrap the database as Arc<RwLock<Arc<Database>>> so readers clone the inner Arc (O(1)) and read entirely lock-free
- BackendTrieDB: acquire the read view once at construction and reuse it for all trie node lookups, eliminating ~8000+ Box allocations per block

3. Shared read view across trie opens
- Change begin_read() to return Arc<dyn StorageReadView> instead of Box<dyn StorageReadView>, enabling cheap Arc::clone() sharing
- In get_storage_at_root, pre-acquire the read view, trie cache, and last_written value once, then share across both the state trie and storage trie opens — eliminates duplicate RwLock acquisitions
- Add the StorageReadView: Send + Sync bound (required for Arc sharing)

4. Pre-compute prefix nibbles in TrieWrapper
- apply_prefix was called on every trie node lookup, creating 3 Vec allocations per call (from_bytes + append_new + concat)
- Pre-compute the prefix nibbles once in the TrieWrapper constructor, reducing to 1 allocation (just concat) per lookup
- Add a TrieWrapper::new() constructor to encapsulate prefix pre-computation

Benchmark Results (ethrex-replay)
30 runs each, back-to-back on same server (32-core AMD, ethrex-office-4):
14-15% improvement on heavy blocks, 8% on light blocks. Lock contention dropped from 8.1% to 1.4% of block executor time.
How to Test
- cargo test -p ethrex-storage
- cargo test -p ethrex-blockchain
- cargo clippy -p ethrex-storage -p ethrex-blockchain -- -D warnings