
Legacy codegen ~5000× slower than via-IR on multiplied modifier (m m m m) with many _; placeholders #16699

@msooseth

Description


A small (~3.2 KB, 126 lines) Solidity source consisting of a single modifier with many _; placeholders applied four times (m m m m) on a function causes the legacy codegen + optimizer pipeline to take 141.6 s to compile, compared to 28 ms for --via-ir --optimize — roughly a 5000× slowdown. The legacy pipeline without --optimize is also slow (10.3 s); both legacy configurations use ~1.5 GB of peak RAM, vs 19 MB for the via-IR pipelines.

| Configuration | Time | Peak RSS |
| --- | --- | --- |
| --via-ir | 46 ms | 19 MB |
| --via-ir --optimize | 28 ms | 19 MB |
| --via-ir --optimize --experimental --via-ssa-cfg | 30 ms | 19 MB |
| legacy (no opt) | 10 313 ms | 1547 MB |
| legacy --optimize | 141 595 ms | 1522 MB |

Found via differential fuzzing.

Environment

  • Compiler version: 0.8.35-develop.2026.5.7+commit.b83005c9.Linux.g++
  • Compilation pipeline (legacy, IR, EOF): legacy is affected (both with and without --optimize); all three IR-based configurations are fast.
  • Target EVM version (as per compiler settings): osaka
  • Framework/IDE: solc command line
  • EVM execution environment / backend / blockchain client: N/A — pure compilation
  • Operating system: Linux 7.0.3-arch1-2

Steps to Reproduce

solc --bin --evm-version osaka --optimize C.sol  # ~141 s
solc --bin --evm-version osaka            C.sol  # ~10 s

Full source (126 lines): source.sol.

Inline source (abridged)
contract C {
    uint256 public x;
    modifier m() {
        for (uint256 i; i < 10; i++) {
            _;
            return;
            for ( i; i < 10; i++) {
            {
            {
            assembly {
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
            }
            }
            return;
            for (uint256 i; i < 10; i++) {
                _;
                uint t;
                uint8 x = 0xff;
                for (uint256 i; i < 10; i++) { _; return; ++x; }
            }
            }
            { /* more nested for/_;/return/assembly … */ }
            /* 12 `_;` placeholders in total in this modifier body, interleaved
               with `return;`, deeply-nested for-loops and inline assembly */
            ...
        }
    }

    function f() public m m m m returns (uint) {  // modifier applied 4 times
        for (uint256 i = 0; i < 10; i++) {
            ++x;
        }
    }
}

The full text is in the gist. The salient feature is that modifier m contains 12 _; placeholder sites, and is applied four times on f(). In the legacy codegen each placeholder site inlines the next modifier's body, so the chain m m m m f materialises on the order of 12⁴ ≈ 2 × 10⁴ inlined copies of the inner code path, embedded inside heavily-nested control-flow (nested for-loops, inline assembly with continue/dead-store patterns, and many return; statements that the legacy frontend leaves in place).
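The multiplier can be checked with simple arithmetic (an illustrative back-of-the-envelope model, not compiler code):

```python
# Back-of-the-envelope model of legacy modifier inlining:
# each `_;` in a modifier body inlines one full copy of the next
# element in the chain (the next modifier, or finally f's body).
PLACEHOLDERS = 12   # `_;` sites in modifier m
APPLICATIONS = 4    # `m m m m` on f()

copies_of_function_body = PLACEHOLDERS ** APPLICATIONS
print(copies_of_function_body)  # 20736, i.e. ~2e4
```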

Nature of the slowdown

The slowdown is a legacy-codegen compile-time issue, not a runtime / output bug. All five configurations produce bytecode successfully. The via-IR pipeline (with or without optimizer, and with or without SSA-CFG) handles this source in under 50 ms; only the legacy pipeline is slow:

  1. Without --optimize, legacy already takes ~10 s and 1.5 GB of RAM. Even the no-optimize legacy build runs the peephole optimiser and JumpdestRemover, which dominate.
  2. With --optimize, legacy jumps to ~142 s. The added cost is overwhelmingly in evmasm::BlockDeduplicator::deduplicate() on the assembled EVM code.

The Solidity feature driving the blow-up is modifier application multiplied across _; placeholders: m m m m on f, where m contains 12 _;, produces a multiplicative expansion of the inner body inside the legacy ContractCompiler::appendModifierOrFunctionCode chain. Each instance generates structurally similar EVM blocks, feeding the assembly optimiser an enormous list of basic blocks.
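As a rough sketch (illustrative only, not measured from solc), the replication is geometric across the chain: the outermost modifier body exists once, and each deeper level is copied once per `_;` site of the level above it:

```python
# Rough replication model for the inlined chain m m m m f.
# Level 0 is the outermost modifier instance; level 4 is f's body.
P = 12        # `_;` sites per modifier body
LEVELS = 4    # modifier applications on f

copies_per_level = [P ** k for k in range(LEVELS + 1)]
total_copies = sum(copies_per_level)

print(copies_per_level)  # [1, 12, 144, 1728, 20736]
print(total_copies)      # 22621 replicated bodies in total
```

Every one of those replicated bodies emits structurally similar assembly, which is what the optimiser then has to chew through.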

Relevant perf data

Full perf top-50 reports and flamegraphs are in the attached gist; the salient parts:

legacy --optimize — 141.6 s (flamegraph, perf top50)

93.09%  evmasm::Assembly::optimiseInternal
65.99%  evmasm::BlockDeduplicator::deduplicate
57.02% 25.19%  BlockDeduplicator::deduplicate lambda                  ← 25% self
17.31%  9.66%  BlockDeduplicator::BlockIterator::operator++           ← ~10% self
 9.80%  9.79%  evmasm::AssemblyItem::instruction                      ← ~10% self
 8.68%  CommonSubexpressionEliminator::getOptimizedItems
 7.91%  5.98%  evmasm::SemanticInformation::altersControlFlow         ← ~6% self
 7.06%  PeepholeOptimiser::optimise

BlockDeduplicator::deduplicate alone accounts for ~66% of the entire 141 s run, with ~25% self-time in its per-pair comparison lambda.
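That profile shape is consistent with pairwise block comparison; a toy cost model (not the actual BlockDeduplicator implementation) makes the scaling concrete:

```python
# Toy model: a deduplicator that compares every pair of basic blocks
# does n*(n-1)/2 comparisons. With ~2e4 inlined copies each emitting
# several similar blocks, the pair count explodes quadratically.
def pair_comparisons(n_blocks: int) -> int:
    return n_blocks * (n_blocks - 1) // 2

for n in (100, 1_000, 20_000):
    print(n, pair_comparisons(n))
# 100    -> 4950
# 1000   -> 499500
# 20000  -> 199990000
```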

legacy (no opt) — 10.3 s (flamegraph, perf top50)

84.49%  evmasm::Assembly::optimiseInternal     ← runs even without --optimize
66.42%  6.41%   PeepholeOptimiser::optimise
50.93% 16.78%   applyMethods<PushPop, OpPop, OpStop, …>     ← ~17% self
14.56%  9.32%   AssemblyItem::operator==(Instruction)        ← ~9% self
14.35%  3.64%   JumpdestRemover::optimise
13.76% 13.59%   std::vector<AssemblyItem>::push_back
 9.13%  9.12%   evmasm::AssemblyItem::instruction
 7.50%  7.43%   AssemblyItem::bytesRequired

Even without --optimize, the legacy assembler still runs a peephole pass and JumpdestRemover; together they dominate the 10 s run — the input EVM assembly is simply very large.
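The dominating frames match what a simple rewrite pass over an item vector looks like. The sketch below is a hypothetical PushPop-style peephole pass with a toy item encoding (not solc's AssemblyItem): the element comparisons and vector rebuilds in this loop correspond to exactly the operator== and push_back frames in the profile:

```python
# Hypothetical PushPop-style peephole pass (toy encoding, not solc's):
# drop adjacent PUSH-then-POP pairs, rebuilding the item vector.
# On a multi-million-item vector, even this linear pass is expensive.
def peephole_push_pop(items):
    out = []
    i = 0
    while i < len(items):
        if (i + 1 < len(items)
                and items[i][0] == "PUSH"
                and items[i + 1][0] == "POP"):
            i += 2  # dead push/pop pair: skip both items
        else:
            out.append(items[i])  # the push_back-equivalent rebuild
            i += 1
    return out

print(peephole_push_pop([("PUSH", 1), ("POP",), ("ADD",)]))  # [('ADD',)]
```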

via-IR pipelines — under 50 ms each

All three via-IR configurations finish in ~30–46 ms and use 19 MB. Profiles are essentially flat (see attached). The IR pipeline either avoids producing the same item explosion (dead-code / reachability done earlier) or handles it without quadratic blow-up.

Potential reasons

A few observations, presented without proposing a fix:

  • BlockDeduplicator::deduplicate is ~66% of the legacy --optimize run, with 25% self-time in the per-pair comparison lambda and 10% self in the iterator advance. The dominant frames are consistent with roughly quadratic block-pair comparison over a very long block list.

  • Modifier-inlining multiplier. m has 12 _; sites, and f is decorated with m m m m. In the legacy ContractCompiler pipeline each _; inlines the next modifier's body in full, so the chain produces on the order of 12⁴ ≈ 2 × 10⁴ copies of the inner body — plus the modifier-body assembly itself replicated similarly. The legacy frontend does not drop post-return; regions before emitting EVM, so dead branches still reach the optimizer.

  • Even without --optimize (legacy, 10 s), peephole + JumpdestRemover dominate, with applyMethods<…> at 17% self and AssemblyItem::operator== at 9% self. The size of the emitted item vector — driven by the modifier-inlining blow-up above — appears to be the underlying driver across both legacy configurations.

  • All three via-IR pipelines are unaffected (28–46 ms, 19 MB). The two codegens scale very differently on modifier-heavy / dead-code-heavy inputs (contrast with e.g. Slow compilation under via-ir & SSA-CFG on deeply nested try/catch #16697, where via-IR is the slow path).

Attachments

All artefacts are in a single gist: https://gist.github.com/msooseth/b221f0ac40d878147f1209370050558a

  • source.sol — the full reproducer (126 lines)
  • noOpt_viaIR=false.perf_top50.txt, opt_viaIR=false.perf_top50.txt — slow legacy profiles
  • noOpt_viaIR=true.perf_top50.txt, opt_viaIR=true.perf_top50.txt, opt_ssaCFG.perf_top50.txt — fast via-IR reference profiles
  • corresponding *.flamegraph.svg files for each configuration
