
Legacy codegen ~5000× slower than via-IR on multiplied modifier (m m m m) with many _; placeholders #16699

@msooseth

Description


A small (~3.2 KB, 126 lines) Solidity source consisting of a single modifier with many _; placeholders applied four times (m m m m) on a function causes the legacy codegen + optimizer pipeline to take 141.6 s to compile, compared to 28 ms for --via-ir --optimize — roughly a 5000× slowdown. The legacy pipeline without --optimize is also slow (10.3 s); both legacy configurations use ~1.5 GB of peak RAM, vs 19 MB for the via-IR pipelines.

| Configuration | Time | Peak RSS |
| --- | --- | --- |
| --via-ir | 46 ms | 19 MB |
| --via-ir --optimize | 28 ms | 19 MB |
| --via-ir --optimize --experimental --via-ssa-cfg | 30 ms | 19 MB |
| legacy (no opt) | 10 313 ms | 1547 MB |
| legacy --optimize | 141 595 ms | 1522 MB |

Found via differential fuzzing.

Environment

  • Compiler version: 0.8.35-develop.2026.5.7+commit.b83005c9.Linux.g++
  • Compilation pipeline (legacy, IR, EOF): legacy is affected (both with and without --optimize); all three IR-based configurations are fast.
  • Target EVM version (as per compiler settings): osaka
  • Framework/IDE: solc command line
  • EVM execution environment / backend / blockchain client: N/A — pure compilation
  • Operating system: Linux 7.0.3-arch1-2

Steps to Reproduce

solc --bin --evm-version osaka --optimize C.sol  # ~141 s
solc --bin --evm-version osaka            C.sol  # ~10 s

Full source (126 lines): source.sol.

Inline source (abridged)
contract C {
    uint256 public x;
    modifier m() {
        for (uint256 i; i < 10; i++) {
            _;
            return;
            for ( i; i < 10; i++) {
            {
            {
            assembly {
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
                for { let a := 0} lt(a,1) { a := add(a, 1) } { continue let b := 42 }
            }
            }
            return;
            for (uint256 i; i < 10; i++) {
                _;
                uint t;
                uint8 x = 0xff;
                for (uint256 i; i < 10; i++) { _; return; ++x; }
            }
            }
            { /* more nested for/_;/return/assembly … */ }
            /* 12 `_;` placeholders in total in this modifier body, interleaved
               with `return;`, deeply-nested for-loops and inline assembly */
            ...
        }
    }

    function f() public m m m m returns (uint) {  // modifier applied 4 times
        for (uint256 i = 0; i < 10; i++) {
            ++x;
        }
    }
}

The full text is in the gist. The salient feature is that modifier m contains 12 _; placeholder sites, and is applied four times on f(). In the legacy codegen each placeholder site inlines the next modifier's body, so the chain m m m m f materialises on the order of 12⁴ ≈ 2 × 10⁴ inlined copies of the inner code path, embedded inside heavily-nested control-flow (nested for-loops, inline assembly with continue/dead-store patterns, and many return; statements that the legacy frontend leaves in place).
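The multiplier can be checked with simple arithmetic (an illustrative back-of-the-envelope model, not compiler code):

```python
# Back-of-the-envelope model of legacy modifier inlining:
# each `_;` in a modifier body inlines one full copy of the next
# element in the chain (the next modifier, or finally f's body).
PLACEHOLDERS = 12   # `_;` sites in modifier m
APPLICATIONS = 4    # `m m m m` on f()

copies_of_function_body = PLACEHOLDERS ** APPLICATIONS
print(copies_of_function_body)  # 20736, i.e. ~2e4
```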

Nature of the slowdown

The slowdown is a legacy-codegen compile-time issue, not a runtime / output bug. All five configurations produce bytecode successfully. The via-IR pipeline (with or without optimizer, and with or without SSA-CFG) handles this source in under 50 ms; only the legacy pipeline is slow:

  1. Without --optimize, legacy already takes ~10 s and 1.5 GB of RAM. Even the no-optimize legacy build runs the peephole optimiser and JumpdestRemover, which dominate.
  2. With --optimize, legacy jumps to ~142 s. The added cost is overwhelmingly in evmasm::BlockDeduplicator::deduplicate() on the assembled EVM code.

The Solidity feature driving the blow-up is modifier application multiplied across _; placeholders: m m m m on f, where m contains 12 _;, produces a multiplicative expansion of the inner body inside the legacy ContractCompiler::appendModifierOrFunctionCode chain. Each instance generates structurally similar EVM blocks, feeding the assembly optimiser an enormous list of basic blocks.
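As a rough sketch (illustrative only, not measured from solc), the replication is geometric across the chain: the outermost modifier body exists once, and each deeper level is copied once per `_;` site of the level above it:

```python
# Rough replication model for the inlined chain m m m m f.
# Level 0 is the outermost modifier instance; level 4 is f's body.
P = 12        # `_;` sites per modifier body
LEVELS = 4    # modifier applications on f

copies_per_level = [P ** k for k in range(LEVELS + 1)]
total_copies = sum(copies_per_level)

print(copies_per_level)  # [1, 12, 144, 1728, 20736]
print(total_copies)      # 22621 replicated bodies in total
```

Every one of those replicated bodies emits structurally similar assembly, which is what the optimiser then has to chew through.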

Relevant perf data

Full perf top-50 reports and flamegraphs are in the attached gist; the salient parts:

legacy --optimize — 141.6 s (flamegraph, perf top50)

93.09%  evmasm::Assembly::optimiseInternal
65.99%  evmasm::BlockDeduplicator::deduplicate
57.02% 25.19%  BlockDeduplicator::deduplicate lambda                  ← 25% self
17.31%  9.66%  BlockDeduplicator::BlockIterator::operator++           ← ~10% self
 9.80%  9.79%  evmasm::AssemblyItem::instruction                      ← ~10% self
 8.68%  CommonSubexpressionEliminator::getOptimizedItems
 7.91%  5.98%  evmasm::SemanticInformation::altersControlFlow         ← ~6% self
 7.06%  PeepholeOptimiser::optimise

BlockDeduplicator::deduplicate alone accounts for ~66% of the entire 141 s run, with ~25% self-time in its per-pair comparison lambda.
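That profile shape is consistent with pairwise block comparison; a toy cost model (not the actual BlockDeduplicator implementation) makes the scaling concrete:

```python
# Toy model: a deduplicator that compares every pair of basic blocks
# does n*(n-1)/2 comparisons. With ~2e4 inlined copies each emitting
# several similar blocks, the pair count explodes quadratically.
def pair_comparisons(n_blocks: int) -> int:
    return n_blocks * (n_blocks - 1) // 2

for n in (100, 1_000, 20_000):
    print(n, pair_comparisons(n))
# 100    -> 4950
# 1000   -> 499500
# 20000  -> 199990000
```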

legacy (no opt) — 10.3 s (flamegraph, perf top50)

84.49%  evmasm::Assembly::optimiseInternal     ← runs even without --optimize
66.42%  6.41%   PeepholeOptimiser::optimise
50.93% 16.78%   applyMethods<PushPop, OpPop, OpStop, …>     ← ~17% self
14.56%  9.32%   AssemblyItem::operator==(Instruction)        ← ~9% self
14.35%  3.64%   JumpdestRemover::optimise
13.76% 13.59%   std::vector<AssemblyItem>::push_back
 9.13%  9.12%   evmasm::AssemblyItem::instruction
 7.50%  7.43%   AssemblyItem::bytesRequired

Even without --optimize, the legacy assembler still runs a peephole pass and JumpdestRemover; together they dominate the 10 s run — the input EVM assembly is simply very large.
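The dominating frames match what a simple rewrite pass over an item vector looks like. The sketch below is a hypothetical PushPop-style peephole pass with a toy item encoding (not solc's AssemblyItem): the element comparisons and vector rebuilds in this loop correspond to exactly the operator== and push_back frames in the profile:

```python
# Hypothetical PushPop-style peephole pass (toy encoding, not solc's):
# drop adjacent PUSH-then-POP pairs, rebuilding the item vector.
# On a multi-million-item vector, even this linear pass is expensive.
def peephole_push_pop(items):
    out = []
    i = 0
    while i < len(items):
        if (i + 1 < len(items)
                and items[i][0] == "PUSH"
                and items[i + 1][0] == "POP"):
            i += 2  # dead push/pop pair: skip both items
        else:
            out.append(items[i])  # the push_back-equivalent rebuild
            i += 1
    return out

print(peephole_push_pop([("PUSH", 1), ("POP",), ("ADD",)]))  # [('ADD',)]
```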

via-IR pipelines — under 50 ms each

All three via-IR configurations finish in ~30–46 ms and use 19 MB. Profiles are essentially flat (see attached). The IR pipeline either avoids producing the same item explosion (dead-code / reachability done earlier) or handles it without quadratic blow-up.

Potential reasons

A few observations, presented without proposing a fix:

  • BlockDeduplicator::deduplicate is ~66% of the legacy --optimize run, with 25% self-time in the per-pair comparison lambda and 10% self in the iterator advance. The dominant frames are consistent with roughly quadratic block-pair comparison over a very long block list.

  • Modifier-inlining multiplier. m has 12 _; sites, and f is decorated with m m m m. In the legacy ContractCompiler pipeline each _; inlines the next modifier's body in full, so the chain produces on the order of 12⁴ ≈ 2 × 10⁴ copies of the inner body — plus the modifier-body assembly itself replicated similarly. The legacy frontend does not drop post-return; regions before emitting EVM, so dead branches still reach the optimizer.

  • Even without --optimize (legacy, 10 s), peephole + JumpdestRemover dominate, with applyMethods<…> at 17% self and AssemblyItem::operator== at 9% self. The size of the emitted item vector — driven by the modifier-inlining blow-up above — appears to be the underlying driver across both legacy configurations.

  • All three via-IR pipelines are unaffected (28–46 ms, 19 MB). The two codegens scale very differently on modifier-heavy / dead-code-heavy inputs (contrast with e.g. Slow compilation under via-ir & SSA-CFG on deeply nested try/catch #16697, where via-IR is the slow path).

Attachments

All artefacts are in a single gist: https://gist.github.com/msooseth/b221f0ac40d878147f1209370050558a

  • source.sol — the full reproducer (126 lines)
  • noOpt_viaIR=false.perf_top50.txt, opt_viaIR=false.perf_top50.txt — slow legacy profiles
  • noOpt_viaIR=true.perf_top50.txt, opt_viaIR=true.perf_top50.txt, opt_ssaCFG.perf_top50.txt — fast via-IR reference profiles
  • corresponding *.flamegraph.svg files for each configuration
