Description
Self Checks
- This template is only for bug reports. For questions, please visit Discussions.
- I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
Windows 10, Python 3.11, torch==2.6.0+cu126, latest Triton build for Windows
Steps to Reproduce
I ran the following command:
python -m tools.api_server --listen 0.0.0.0:8080 --llama-checkpoint-path "checkpoints/fish-speech-1.5" --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile
✔️ Expected Behavior
I expect the Fish Speech server to start and to compile the model with torch.compile so inference is fast (I need real-time TTS).
❌ Actual Behavior
INFO: Started server process [29352]
INFO: Waiting for application startup.
2025-05-07 13:21:20.841 | INFO | fish_speech.models.text2semantic.inference:load_model:683 - Restored model from checkpoint
2025-05-07 13:21:20.841 | INFO | fish_speech.models.text2semantic.inference:load_model:689 - Using DualARTransformer
2025-05-07 13:21:20.842 | INFO | fish_speech.models.text2semantic.inference:load_model:697 - Compiling function...
2025-05-07 13:21:20.907 | INFO | tools.server.model_manager:load_llama_model:99 - LLAMA model loaded.
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:445: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\vector_quantize_pytorch.py:630: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\finite_scalar_quantization.py:147: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
D:\Python\Python311\Lib\site-packages\vector_quantize_pytorch\lookup_free_quantization.py:209: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
  @autocast(enabled = False)
2025-05-07 13:21:23.808 | INFO | fish_speech.models.vqgan.inference:load_model:46 - Loaded model:
2025-05-07 13:21:23.809 | INFO | tools.server.model_manager:load_decoder_model:107 - Decoder model loaded.
2025-05-07 13:21:23.824 | INFO | fish_speech.models.text2semantic.inference:generate_long:790 - Encoded text: Hello world.
2025-05-07 13:21:23.826 | INFO | fish_speech.models.text2semantic.inference:generate_long:808 - Generating sentence 1/1 of sample 1/1
  0%|          | 0/1023 [00:00<?, ?it/s]D:\Python\Python311\Lib\contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
  0%|          | 1/1023 [03:45<64:03:51, 225.67s/it]D:\Python\Python311\Lib\contextlib.py:105: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
  0%|          | 1/1023 [03:45<64:04:06, 225.68s/it]
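The sdp_kernel deprecation likewise comes from upstream code, not from anything in my setup. For reference, the replacement context manager PyTorch points to looks like this (a minimal sketch, assuming PyTorch >= 2.3 and a CUDA device):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Old: with torch.backends.cuda.sdp_kernel(enable_flash=True, ...):
# New: list the allowed backends explicitly; PyTorch picks from the list.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v)
```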
ERROR: Traceback (most recent call last):
File "D:\Python\Python311\Lib\site-packages\kui\asgi\lifespan.py", line 36, in call
await result
File "D:\2025\Call Center Agent X\fish-speech\tools\api_server.py", line 83, in initialize_app
app.state.model_manager = ModelManager(
^^^^^^^^^^^^^
File "D:\2025\Call Center Agent X\fish-speech\tools\server\model_manager.py", line 65, in init
self.warm_up(self.tts_inference_engine)
File "D:\2025\Call Center Agent X\fish-speech\tools\server\model_manager.py", line 121, in warm_up
list(inference(request, tts_inference_engine))
File "D:\2025\Call Center Agent X\fish-speech\tools\server\inference.py", line 25, in inference_wrapper
raise HTTPException(
baize.exceptions.HTTPException: (500, 'Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run. Stack trace: File "D:\\2025\\Call Center Agent X\\fish-speech\\fish_speech\\models\\text2semantic\\inference.py", line 307, in decode_one_token_ar\n  codebooks = torch.stack(codebooks, dim=0). To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.')
ERROR: Application startup failed. Exiting.
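The exception message itself suggests two workarounds: clone the tensor outside of torch.compile(), or call torch.compiler.cudagraph_mark_step_begin() before each model invocation. A minimal self-contained sketch of both, using a toy compiled function rather than fish-speech's actual decode_one_token_ar:

```python
import torch

@torch.compile(mode="reduce-overhead")  # this mode enables CUDA graphs
def step(x: torch.Tensor) -> torch.Tensor:
    return x * 2

x = torch.randn(8, device="cuda")

# Workaround 1: mark the start of each iteration so the runtime knows
# outputs from the previous graph replay may safely be overwritten.
torch.compiler.cudagraph_mark_step_begin()
y1 = step(x)

# Workaround 2: clone the output outside the compiled region so it
# survives the next replay reusing the graph's output buffer.
torch.compiler.cudagraph_mark_step_begin()
y2 = step(x).clone()
```

If the fix belongs in fish_speech/models/text2semantic/inference.py around line 307 (the location the traceback names), cloning the stacked codebooks before returning might be the simpler patch, but I haven't verified this.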