A small (~3.2 KB, 126 lines) Solidity source consisting of a single modifier with many `_;` placeholders, applied four times (`m m m m`) to one function, causes the legacy codegen + optimizer pipeline to take 141.6 s to compile, compared to 28 ms for `--via-ir --optimize`, roughly a 5000× slowdown. The legacy pipeline without `--optimize` is also slow (10.3 s); both legacy configurations peak at ~1.5 GB of RAM, vs 19 MB for the via-IR pipelines. Five configurations were tested: legacy with and without `--optimize`, plus `--via-ir`, `--via-ir --optimize`, and `--via-ir --optimize --experimental --via-ssa-cfg`. Found via differential fuzzing.
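For reference, the headline ratio follows directly from the two timings above (quick arithmetic check, not a measurement):

```python
legacy_optimize_s = 141.6   # legacy codegen with --optimize
via_ir_optimize_s = 0.028   # --via-ir --optimize (28 ms)

# Ratio of the two compile times, rounded to the nearest integer.
print(round(legacy_optimize_s / via_ir_optimize_s))  # 5057, i.e. roughly 5000×
```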
```solidity
contract C {
    uint256 public x;
    modifier m() {
        for (uint256 i; i < 10; i++) { _; return;
            for ( i; i < 10; i++) {
                {
                    {
                        assembly {
                            for { let a := 0 } lt(a, 1) { a := add(a, 1) } { continue let b := 42 }
                            for { let a := 0 } lt(a, 1) { a := add(a, 1) } { continue let b := 42 }
                        }
                    }
                    return;
                    for (uint256 i; i < 10; i++) {
                        _; uint t;
                        uint8 x = 0xff;
                        for (uint256 i; i < 10; i++) { _; return; ++x; }
                    }
                }
            }
            { /* more nested for/_;/return/assembly … */ }
            /* 12 `_;` placeholders in total in this modifier body, interleaved with
               `return;`, deeply-nested for-loops and inline assembly */
            ...
        }
    }
    function f() public m m m m returns (uint) { // modifier applied 4 times
        for (uint256 i = 0; i < 10; i++) {
            ++x;
        }
    }
}
```
The full text is in the gist. The salient feature is that modifier m contains 12 _; placeholder sites, and is applied four times on f(). In the legacy codegen each placeholder site inlines the next modifier's body, so the chain m m m m f materialises on the order of 12⁴ ≈ 2 × 10⁴ inlined copies of the inner code path, embedded inside heavily-nested control-flow (nested for-loops, inline assembly with continue/dead-store patterns, and many return; statements that the legacy frontend leaves in place).
Nature of the slowdown
The slowdown is a legacy-codegen compile-time issue, not a runtime / output bug. All five configurations produce bytecode successfully. The via-IR pipeline (with or without optimizer, and with or without SSA-CFG) handles this source in under 50 ms; only the legacy pipeline is slow:
- Without `--optimize`, legacy already takes ~10 s and 1.5 GB of RAM. Even the no-optimize legacy build runs the peephole optimiser and `JumpdestRemover`, which dominate.
- With `--optimize`, legacy jumps to ~142 s. The added cost is overwhelmingly in `evmasm::BlockDeduplicator::deduplicate()` on the assembled EVM code.
The Solidity feature driving the blow-up is modifier application multiplied across _; placeholders: m m m m on f, where m contains 12 _;, produces a multiplicative expansion of the inner body inside the legacy ContractCompiler::appendModifierOrFunctionCode chain. Each instance generates structurally similar EVM blocks, feeding the assembly optimiser an enormous list of basic blocks.
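As a back-of-the-envelope model of that expansion (plain Python, not compiler code; the numbers mirror the reproducer, 12 placeholders and 4 applications of `m`):

```python
# Toy model of legacy modifier inlining: each `_;` placeholder splices in the
# body of the next modifier in the chain; the innermost level is the function
# body itself.
def function_body_copies(placeholders: int, applications: int) -> int:
    """How many copies of the function body the expansion materialises."""
    return placeholders ** applications

def total_inlined_bodies(placeholders: int, applications: int) -> int:
    """All inlined bodies across levels: 12 + 12^2 + 12^3 + 12^4 for this input."""
    return sum(placeholders ** level for level in range(1, applications + 1))

print(function_body_copies(12, 4))   # 20736, i.e. 12^4 ≈ 2 × 10⁴
print(total_inlined_bodies(12, 4))   # 22620 structurally similar inlined bodies
```

Each of those copies emits its own run of EVM assembly items, which is what the assembler and optimizer then have to chew through.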
Relevant perf data
Full perf top-50 reports and flamegraphs are in the attached gist; the salient parts:
- legacy `--optimize` (141.6 s): `BlockDeduplicator::deduplicate` alone accounts for ~66% of the entire run, with ~25% self-time in its per-pair comparison lambda.
- legacy, no optimizer (10.3 s): even without `--optimize`, the legacy assembler still runs a peephole pass and `JumpdestRemover`; together they dominate the run. The input EVM assembly is simply very large.
- via-IR pipelines (under 50 ms each): all three via-IR configurations finish in ~30–46 ms and use 19 MB. Profiles are essentially flat (see attached). The IR pipeline either avoids producing the same item explosion (dead-code / reachability elimination happens earlier) or handles it without quadratic blow-up.
Potential reasons
A few observations, presented without proposing a fix:
- `BlockDeduplicator::deduplicate` is ~66% of the legacy `--optimize` run, with 25% self-time in the per-pair comparison lambda and 10% self-time in the iterator advance. The dominant frames are consistent with roughly quadratic block-pair comparison over a very long block list.
- Modifier-inlining multiplier: `m` has 12 `_;` sites, and `f` is decorated with `m m m m`. In the legacy `ContractCompiler` pipeline each `_;` inlines the next modifier's body in full, so the chain produces on the order of 12⁴ ≈ 2 × 10⁴ copies of the inner body, plus the modifier-body assembly itself replicated similarly. The legacy frontend does not drop post-`return;` regions before emitting EVM code, so dead branches still reach the optimizer.
- Even without `--optimize` (legacy, 10 s), peephole + `JumpdestRemover` dominate, with `applyMethods<…>` at 17% self-time and `AssemblyItem::operator==` at 9%. The size of the emitted item vector, driven by the modifier-inlining blow-up above, appears to be the underlying driver across both legacy configurations.
- All three via-IR pipelines are unaffected (28–46 ms, 19 MB). The two codegens scale very differently on modifier-heavy / dead-code-heavy inputs (contrast with e.g. "Slow compilation under via-ir & SSA-CFG on deeply nested try/catch" #16697, where via-IR is the slow path).
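To illustrate the quadratic shape suggested by the profile (a toy model, not the actual `evmasm::BlockDeduplicator` implementation):

```python
# Toy model: deduplicating B mostly-distinct blocks by comparing each new
# block against every block kept so far costs O(B^2) comparisons, which is
# the shape the dominant frames suggest on ~10⁴ structurally similar blocks.
def dedup_pairwise(blocks):
    kept, comparisons = [], 0
    for block in blocks:
        duplicate = False
        for other in kept:
            comparisons += 1
            if other == block:
                duplicate = True
                break
        if not duplicate:
            kept.append(block)
    return kept, comparisons

# 500 similar-but-distinct basic blocks, standing in for the inlined copies.
blocks = [("PUSH1", hex(i), "ADD") for i in range(500)]
kept, comparisons = dedup_pairwise(blocks)
print(len(kept), comparisons)  # 500 124750  (124750 = 500·499/2)
```

With no duplicates to collapse, the comparison count grows as B·(B−1)/2, so an item vector a few orders of magnitude larger turns a fast pass into the dominant cost.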
Environment
- Compiler version: 0.8.35-develop.2026.5.7+commit.b83005c9.Linux.g++
- Target EVM version: osaka
- Framework/IDE: solc command line
- Operating system: Linux 7.0.3-arch1-2

Only the legacy codegen is slow (with or without `--optimize`); all three IR-based configurations are fast.
Attachments
All artefacts are in a single gist: https://gist.github.com/msooseth/b221f0ac40d878147f1209370050558a
- source.sol — the full reproducer (126 lines)
- noOpt_viaIR=false.perf_top50.txt, opt_viaIR=false.perf_top50.txt — slow legacy profiles
- noOpt_viaIR=true.perf_top50.txt, opt_viaIR=true.perf_top50.txt, opt_ssaCFG.perf_top50.txt — fast via-IR reference profiles
- *.flamegraph.svg files for each configuration