
Enable precision-preserving mel training + 16-bit W&B logging for unconditional example #1

Open
turian wants to merge 11 commits into main from bigvgan_wandb

Conversation


@turian turian commented Nov 8, 2025

Summary

  • Add _prepare_sample_images/_log_sample_images helpers plus --image_bit_depth so sample grids can be emitted as true uint16 PNGs (with 8-bit previews for TensorBoard) and uploaded losslessly to W&B, including a temp-file workaround so wandb.Image accepts 16-bit payloads. WANDB_AUDIO_HOOK lets us call back into a user-provided module:function to attach generated audio to the same step.
    Paths: examples/unconditional_image_generation/train_unconditional.py:76, 256, 788.
  • Introduce --preserve_input_precision and a matching transform pipeline that skips the default .convert("RGB") cast, keeps planar uint16 data via PILToTensor, and only normalizes after we’ve enforced three channels. This lets us feed mel PNGs without data degradation or redundant quantization.
    Paths: examples/unconditional_image_generation/train_unconditional.py:394, 597-625.
  • Document the new flag in the unconditional README so users know how to opt into 16-bit logging and how the previews behave across TensorBoard vs. W&B.
    Path: examples/unconditional_image_generation/README.md:45.
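
The uint16/uint8 scaling described in the first bullet can be sketched as follows. This is a hypothetical reimplementation for illustration (the function name and signature mirror the summary above, not the helper's actual code):

```python
import numpy as np

def prepare_sample_images(images, bit_depth=8):
    """Sketch of the scaling step: NHWC floats in [0, 1] become
    uint8 or uint16 arrays; for 16-bit output an 8-bit preview is
    also produced for TensorBoard. Illustrative only."""
    images = np.clip(np.asarray(images, dtype=np.float32), 0.0, 1.0)
    if bit_depth == 16:
        full = (images * 65535.0).round().astype(np.uint16)
        preview = (images * 255.0).round().astype(np.uint8)
        return full, preview
    full = (images * 255.0).round().astype(np.uint8)
    return full, full  # 8-bit output doubles as its own preview
```

The key point is that the float-to-integer quantization happens exactly once, at the requested bit depth, so the W&B upload sees the full 16-bit range while TensorBoard gets a cheap 8-bit view of the same batch.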

Details

  1. Sample logging upgrades (train_unconditional.py:76-177, 788-839)
    • _prepare_sample_images scales NHWC floats into uint8/uint16 arrays and produces a TensorBoard-safe 8-bit preview when the requested --image_bit_depth is 16.
    • _log_sample_images routes to the chosen tracker: TensorBoard sees the preview tensor, while W&B either uploads uint8 arrays directly or encodes each uint16 frame to disk via Pillow before creating wandb.Image objects. Cleanup is handled even on failure.
    • When WANDB_AUDIO_HOOK=package.module:fn_name and --logger=wandb, the generated numpy images/metadata are passed to that callback. Any dict it returns is merged into the log payload, so BigVGAN (or other vocoders) can push aligned audio without modifying diffusers core code.
  2. Precision-preserving dataloader (train_unconditional.py:597-625)
    • Keeping mel PNGs in 16-bit space previously forced a lossy image.convert("RGB"). The new --preserve_input_precision flag switches to precision_augmentations, which runs PILToTensor → _ensure_three_channels → ConvertImageDtype(torch.float32) before spatial ops. Palette images still get promoted once, but standard uint16 PNGs stay untouched until normalization.
  3. Docs & ergonomics (README.md:45)
    • Quick-start instructions now mention --image_bit_depth 16, clarify that previews remain 8-bit, and point W&B users to the Files/Artifacts tab for the high-precision grids. This keeps the branch self-documenting for downstream researchers.
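
A small round-trip check illustrates what the precision-preserving path buys. This is a standalone sketch (not code from the PR): a uint16 "mel" frame survives a 16-bit PNG save/load exactly, while any cast down to 8 bits, as the default RGB path implies, collapses 65,536 levels into 256:

```python
import io

import numpy as np
from PIL import Image

# A synthetic 16-bit "mel" frame with values well beyond the 8-bit range.
mel = np.linspace(0, 65535, 64, dtype=np.uint16).reshape(8, 8)

# Lossless path: uint16 data round-trips through a 16-bit grayscale PNG.
buf = io.BytesIO()
Image.fromarray(mel).save(buf, format="PNG")  # Pillow writes mode "I;16" as 16-bit PNG
buf.seek(0)
restored = np.asarray(Image.open(buf)).astype(np.uint16)

# Illustrative 8-bit cast: this is the quantization the flag avoids.
eight_bit = (mel // 257).astype(np.uint8)
```

After the round trip, `restored` is bit-for-bit identical to `mel`, whereas `eight_bit` can represent at most 256 distinct levels.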

How To Use

  1. Quantization-safe training:

    accelerate launch examples/unconditional_image_generation/train_unconditional.py \
      --train_data_dir=…/mels_png --resolution 128 --image_bit_depth 16 \
      --preserve_input_precision --logger wandb
  2. Optional W&B audio:

    export WANDB_AUDIO_HOOK=scripts.audio_hooks:log_bigvgan_audio

    (or your own module). The hook receives images, epoch, global_step, and args, and returns a dict of additional metrics/files to merge into the log payload.
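
  The module:function string can be resolved with a few lines of importlib; a sketch of that resolution plus a minimal hook skeleton (the hook body below is hypothetical, standing in for a real vocoder call):

```python
import importlib
import os

def resolve_hook(spec):
    """Resolve a 'package.module:fn_name' string to a callable,
    mirroring the WANDB_AUDIO_HOOK format described above (a sketch,
    not the script's actual loader)."""
    module_name, _, fn_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, fn_name)

# Hypothetical hook: receives the generated images plus step metadata
# and returns extra entries to merge into the W&B log payload.
def log_bigvgan_audio(images, epoch, global_step, args):
    # ...run the vocoder on each mel image, build audio objects here...
    return {"audio/num_clips": len(images)}

hook = resolve_hook("os.path:join")  # stdlib example of the spec format
```

  Returning a plain dict keeps the contract simple: whatever keys the hook emits are logged at the same W&B step as the image grid.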

Testing

  • accelerate launch … --image_bit_depth 16 --logger=tensorboard (verifies TB preview pipeline).
  • accelerate launch … --logger=wandb --image_bit_depth 16 --preserve_input_precision with a WANDB_AUDIO_HOOK pointing at our BigVGAN helper to confirm 16-bit uploads + audio payloads.

