
MARVEL - A Multi-Depth Lookup-Free Quantization 1D Tokenizer

We introduce a residual tokenizer with lookup-free quantization that modifies the well-known 1D tokenizer scheme, TiTok.
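
As a rough illustration of the idea, the minimal sketch below composes multi-depth (residual) lookup-free quantization: each depth binarizes the current residual with a sign test, derives a token index from the bit pattern, and passes the remaining residual to the next depth. This is not the exact module in this repository; the class name `ResidualLFQ`, the per-depth scales, and the dimensions are assumptions for illustration.

```python
# Minimal sketch of multi-depth (residual) lookup-free quantization.
# Illustration only; not the exact module used in this repository.
import torch
import torch.nn as nn


class ResidualLFQ(nn.Module):
    def __init__(self, dim: int = 12, num_depths: int = 4):
        super().__init__()
        self.dim = dim                     # bits per token -> codebook size 2**dim
        self.num_depths = num_depths       # number of residual quantization levels
        # per-depth scales so later depths can encode finer residuals (an assumption)
        self.scales = nn.Parameter(torch.ones(num_depths))

    def forward(self, z: torch.Tensor):
        # z: (batch, num_tokens, dim) continuous latents from the 1D encoder
        residual = z
        quantized = torch.zeros_like(z)
        indices = []
        bit_weights = 2 ** torch.arange(self.dim, device=z.device)
        for d in range(self.num_depths):
            # lookup-free: the "codebook entry" is just the sign pattern, no embedding table
            bits = (residual > 0).to(z.dtype)            # {0, 1} per channel
            q = (2 * bits - 1) * self.scales[d]          # {-scale, +scale}
            # straight-through estimator so gradients flow back to the encoder
            q = residual + (q - residual).detach()
            quantized = quantized + q
            residual = residual - q
            indices.append((bits.long() * bit_weights).sum(-1))  # token id per depth
        # quantized approximates z; indices holds num_depths token maps
        return quantized, torch.stack(indices, dim=1)
```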

🚀 Contributions

We introduce a novel 1D image tokenization framework that breaks the grid constraints of 2D tokenization methods, leading to a much more flexible and compact latent image representation.

The proposed 1D tokenizer can tokenize a 256 × 256 image into as few as 32 discrete tokens, yielding a significant speed-up in the generation process (hundreds of times faster than diffusion models) while maintaining state-of-the-art generation quality (a token-count comparison is sketched after this list).

We conduct a series of experiments to probe the properties of the rarely studied 1D image tokenization, paving the way towards a compact latent space for efficient and effective image representation.
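
For a concrete sense of the compactness claim, the token-count arithmetic is sketched below; the 16× spatial downsampling assumed for the 2D baseline is an illustrative choice, not a figure from this repository.

```python
# Token-count comparison behind the compactness claim (simple arithmetic; the
# 16x downsampling factor for the 2D baseline is an assumption for illustration).
image_size = 256
downsample_2d = 16
tokens_2d = (image_size // downsample_2d) ** 2   # 16 x 16 grid -> 256 tokens
tokens_1d = 32                                    # TiTok-L-32 / MARVEL-L-32
print(tokens_2d, tokens_1d, tokens_2d // tokens_1d)  # 256 32 8
```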

Model Zoo

| Model | Link | FID |
| --- | --- | --- |
| TiTok-L-32 Tokenizer | checkpoint | 2.21 (reconstruction) |
| TiTok-B-64 Tokenizer | checkpoint | 1.70 (reconstruction) |
| TiTok-S-128 Tokenizer | checkpoint | 1.71 (reconstruction) |
| MARVEL-L-32 Tokenizer | _ | 2.21 (reconstruction) |
| MARVEL-L-32*2 Tokenizer | _ | 1.36 (reconstruction) |
| MARVEL-L-32*4 Tokenizer | _ | 1.06 (reconstruction) |

Please note that these models are trained only on the limited academic dataset ImageNet, and they are intended for research purposes only.
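
For reference, a round-trip with one of these checkpoints might look like the hypothetical sketch below; the `encode`/`decode` method names and tensor shapes are assumptions for illustration, so please consult the modeling code and demo scripts in this repository for the real interface.

```python
# Hypothetical usage sketch; method names and shapes are assumptions, not this
# repository's confirmed API.
import torch

@torch.no_grad()
def reconstruct(tokenizer: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, 256, 256) tensor in [0, 1]; returns the reconstructed image."""
    tokenizer.eval()
    tokens = tokenizer.encode(image)   # assumed: (1, 32) discrete token ids for a *-32 model
    recon = tokenizer.decode(tokens)   # assumed: maps token ids back to pixels
    return recon.clamp(0, 1)
```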

Installation

pip3 install -r requirements.txt

Training Preparation

We use the webdataset format for data loading. To begin with, you need to convert the dataset into webdataset format. An example script to convert ImageNet to wds format is provided here.
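
As a rough illustration (a minimal sketch, not the repository's actual conversion script; the shard pattern, keys, and shard size are assumptions), an image-folder dataset can be written to webdataset shards like this:

```python
# Minimal sketch: write (JPEG path, class label) pairs into webdataset shards.
# Shard pattern, keys, and samples-per-shard are assumptions for illustration.
import webdataset as wds

def convert_to_wds(image_label_pairs, out_pattern="imagenet-train-%06d.tar", maxcount=10000):
    """image_label_pairs: iterable of (path_to_jpeg, integer_class_label)."""
    with wds.ShardWriter(out_pattern, maxcount=maxcount) as sink:
        for idx, (path, label) in enumerate(image_label_pairs):
            with open(path, "rb") as f:
                jpeg_bytes = f.read()      # keep the already-encoded JPEG bytes as-is
            sink.write({
                "__key__": f"{idx:08d}",   # unique key per sample
                "jpg": jpeg_bytes,         # image payload
                "cls": label,              # class label
            })
```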

Furthermore, the stage 1 training relies on a pre-trained MaskGIT-VQGAN to generate proxy codes as learning targets. You can convert the official JAX weights to a PyTorch version using this script. Alternatively, we provide a converted version at HuggingFace and Google Drive. The MaskGIT-VQGAN weights will be downloaded automatically when you run the training script.

You may also pretokenize the dataset for a training speed-up; please refer to the example pretokenization script.
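
Conceptually, pretokenization just runs the frozen MaskGIT-VQGAN over the dataset once and caches the resulting proxy codes. The sketch below illustrates that loop under stated assumptions (the `encode` signature and the code shape are assumptions; the repository's pretokenization script is the authoritative reference).

```python
# Hedged sketch of offline pretokenization into MaskGIT-VQGAN proxy codes.
# The vqgan.encode signature and returned shape are assumptions for illustration.
import torch

@torch.no_grad()
def pretokenize(vqgan: torch.nn.Module, dataloader, device: str = "cuda"):
    vqgan = vqgan.to(device).eval()
    all_codes = []
    for images, _labels in dataloader:            # images: (B, 3, 256, 256) in [0, 1]
        codes = vqgan.encode(images.to(device))   # assumed: (B, 256) discrete proxy code ids
        all_codes.append(codes.cpu())
    return torch.cat(all_codes)                   # cache these as stage 1 learning targets
```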

Training

We provide example commands to train TiTok as follows:

# Training for TiTok-L-32*4
# Stage 1
WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=8 --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 --same_network scripts/train_titok.py config=configs/training/TiTok/stage1/finetune_titok_l32.yaml \
    experiment.project="finetune_titok_l32_stage1" \
    experiment.name="finetune_titok_l32_stage1_run1" \
    experiment.output_dir="finetune_titok_l32_stage1_run1" \
    training.per_gpu_batch_size=32

# Stage 2
WANDB_MODE=offline accelerate launch --num_machines=1 --num_processes=8 --machine_rank=0 --main_process_ip=127.0.0.1 --main_process_port=9999 --same_network scripts/train_titok.py config=configs/training/TiTok/stage2/finetune_titok_l32.yaml \
    experiment.project="finetune_titok_l32_stage2" \
    experiment.name="finetune_titok_l32_stage2_run1" \
    experiment.output_dir="finetune_titok_l32_stage2_run1" \
    training.per_gpu_batch_size=32 \
    experiment.init_weight=${PATH_TO_STAGE1_WEIGHT}

You may remove the flag "WANDB_MODE=offline" to enable online wandb logging, if you have configured it.

The l32 config used above can be replaced with its titok_b64.yaml or titok_s128.yaml counterpart to train other TiTok variants.

Visualizations

(Teaser figures.)

Acknowledgement

MaskGIT

Taming-Transformers

Open-MUSE

MUSE-Pytorch
