fish-speech.tools.api_server --compile Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. #967

Open

Description

@corporate9601

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

Windows 10, Python 3.11, torch==2.6.0+cu126, latest Triton for Windows

Steps to Reproduce

I run the command:

python -m fish-speech.tools.api_server --listen 0.0.0.0:8080 --llama-checkpoint-path "checkpoints/fish-speech-1.5" --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile

✔️ Expected Behavior

I expect the fish-speech server to start and compile with Torch so that inference is fast (I need real-time TTS).

❌ Actual Behavior

INFO: Started server process [29352]
INFO: Waiting for application startup.
2025-05-07 13:21:20.841 | INFO | fish_speech.models.text2semantic.inference:load_model:683 - Restored model from checkpoint
2025-05-07 13:21:20.841 | INFO | fish_speech.models.text2semantic.inference:load_model:689 - Using DualARTransformer
2025-05-07 13:21:20.842 | INFO | fish_speech.models.text2semantic.inference:load_model:697 - Compiling function...
2025-05-07 13:21:20.907 | INFO | tools.server.model_manager:load_llama_model:99 - LLAMA model loaded.
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:445: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:630: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\finite_scalar_quantization.py:147: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\lookup_free_quantization.py:209: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@autocast(enabled = False)
2025-05-07 13:21:23.808 | INFO | fish_speech.models.vqgan.inference:load_model:46 - Loaded model:
2025-05-07 13:21:23.809 | INFO | tools.server.model_manager:load_decoder_model:107 - Decoder model loaded.
2025-05-07 13:21:23.824 | INFO | fish_speech.models.text2semantic.inference:generate_long:790 - Encoded text: Hello world.
2025-05-07 13:21:23.826 | INFO | fish_speech.models.text2semantic.inference:generate_long:808 - Generating sentence 1/1 of sample 1/1
0%| | 0/1023 [00:00<?, ?it/s]D:\Python\Python311\Lib\contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
0%| | 1/1023 [03:45<64:03:51, 225.67s/it]D:\Python\Python311\Lib\contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
self.gen = func(*args, **kwds)
0%| | 1/1023 [03:45<64:04:06, 225.68s/it]
ERROR: Traceback (most recent call last):
File "D:\Python\Python311\Lib\site-packages\kui\asgi\lifespan.py", line 36, in call
await result
File "D:\2025\Call Center Agent X\fish-speech\tools\api_server.py", line 83, in initialize_app
app.state.model_manager = ModelManager(
^^^^^^^^^^^^^
File "D:\2025\Call Center Agent X\fish-speech\tools\server\model_manager.py", line 65, in init
self.warm_up(self.tts_inference_engine)
File "D:\2025\Call Center Agent X\fish-speech\tools\server\model_manager.py", line 121, in warm_up
list(inference(request, tts_inference_engine))
File "D:\2025\Call Center Agent X\fish-speech\tools\server\inference.py", line 25, in inference_wrapper
raise HTTPException(
baize.exceptions.HTTPException: (500, 'Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "D:\\2025\\Call Center Agent X\\fish-speech\\fish_speech\\models\\text2semantic\\inference.py", line 307, in decode_one_token_ar\n codebooks = torch.stack(codebooks, dim=0). To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.')

ERROR: Application startup failed. Exiting.
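
For reference, the error text itself names two generic workarounds. Below is a minimal sketch of both, assuming the decode step is compiled with torch.compile(mode="reduce-overhead") (the mode that enables CUDA graphs). decode_one_token here is a hypothetical stand-in for decode_one_token_ar in fish_speech/models/text2semantic/inference.py, not the real signature:

```python
import torch

# Hypothetical stand-in for the compiled decode step; the real function is
# decode_one_token_ar in fish_speech/models/text2semantic/inference.py.
def decode_one_token(x: torch.Tensor) -> torch.Tensor:
    return x * 2

# "reduce-overhead" is the torch.compile mode that turns on CUDA graphs.
compiled_decode = torch.compile(decode_one_token, mode="reduce-overhead")

x = torch.randn(4, device="cuda")

# Workaround 1: mark the start of a new iteration before every invocation,
# so outputs of the previous graph replay are no longer treated as live.
torch.compiler.cudagraph_mark_step_begin()
out = compiled_decode(x)

# Workaround 2: clone the output outside the compiled region; the clone is
# a fresh tensor that later graph replays cannot overwrite.
out = compiled_decode(x).clone()
```

Applied to the generation loop, that would mean either calling cudagraph_mark_step_begin() once before each token step, or cloning the stacked codebooks tensor where it leaves the compiled function.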

Labels

bug (Something isn't working)