This repository provides tools and scripts for processing the Natural Scenes Dataset (NSD), extracting features, training regression models, and reconstructing images using state-of-the-art deep learning models.
Reconstruction results may differ between runs due to the randomness in the diffusion and VAE sampling. Refer to the report for more details on the methods used.
Ensure the following tools and libraries are installed on your system:
- Python 3.8+
- pip
- AWS CLI
- NVIDIA GPU with CUDA support
- Basic Linux utilities (`wget`, `curl`, `unzip`, etc.)
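Before starting, it can help to confirm that PyTorch actually sees the GPU. A minimal check, assuming PyTorch is installed via `requirements.txt`:

```python
import torch

# Sanity-check the CUDA setup before running the heavy extraction and
# reconstruction scripts; they are impractically slow on CPU.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected -- check your driver / CUDA installation.")
```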
- Install the Python dependencies:
```
pip install -r requirements.txt
```
- Install the AWS CLI (if not already installed) and log in (configure your credentials). This is required to access the NSD data stored in a public bucket.
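To verify that your AWS setup can reach the NSD bucket before kicking off the full download, a short boto3 check can be used. The bucket name and prefix below are assumptions based on the public NSD layout; `data/download_nsdata.py` defines the exact paths it actually fetches.

```python
import boto3

# Uses the credentials set up via `aws configure`. Bucket and prefix are
# illustrative; the download script defines the exact objects it pulls.
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="natural-scenes-dataset", Prefix="nsddata/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```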
- Navigate to the `data/` directory:
```
cd data/
```
- Download the NSD data:
```
python3 download_nsdata.py
```
- Navigate to the `annots/` directory:
```
cd annots/
```
- Download the curated COCO annotations:
```
wget https://huggingface.co/datasets/pscotti/naturalscenesdataset/resolve/main/COCO_73k_annots_curated.npy
```
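Once downloaded, the annotation file can be inspected quickly. A minimal sketch; only the shape, dtype, and one entry are printed, since the exact layout is whatever the curated file provides:

```python
import numpy as np

# allow_pickle=True covers the case where entries are stored as Python objects
# rather than a plain fixed-width string array.
annots = np.load("COCO_73k_annots_curated.npy", allow_pickle=True)
print("shape:", annots.shape, "dtype:", annots.dtype)
print("example entry:", annots[0])
```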
- Navigate back to the parent directory:
```
cd ..
```
- Prepare the data for each subject (1, 2, 5, and 7), replacing `x` with the subject number (see the sketch after this step for running all four in one go):
```
python3 prepare_nsddata.py -sub x
```
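A small wrapper loop can run the preparation for all four subjects in one go. This is a minimal sketch that simply shells out to the same script:

```python
import subprocess

# The four NSD subjects used throughout this repo.
for sub in (1, 2, 5, 7):
    subprocess.run(["python3", "prepare_nsddata.py", "-sub", str(sub)], check=True)
```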
- Navigate to the `vdvae/model` directory:
```
cd vdvae/model
```
- Download the pre-trained VDVAE model files:
```
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-opt.th
```
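After the downloads finish, you can quickly confirm that a checkpoint deserializes. This is a minimal sketch and assumes the `.th` files are standard PyTorch checkpoints; the VDVAE code in `vdvae/` is what actually builds the model around these weights.

```python
import torch

# Load on CPU only to verify the downloaded file is intact and readable.
state = torch.load("imagenet64-iter-1600000-model-ema.th", map_location="cpu")
print(type(state), len(state) if hasattr(state, "__len__") else "")
```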
- Navigate back to the main directory:
```
cd ../..
```
- Extract VDVAE features:
```
python3 scripts/vdvae_extract_features.py -sub x
```
- Train the regressor:
```
python3 scripts/vdvae_regression.py -sub x
```
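Conceptually, this stage fits a linear mapping from the training-set fMRI voxels to the VDVAE latents extracted in the previous step. A minimal sketch of that idea with scikit-learn ridge regression; the actual script's file names, preprocessing, dimensions, and regularization settings may differ:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder shapes: n_train stimuli x n_voxels fMRI responses, and the
# VDVAE latents extracted for the same training stimuli.
X_train = np.random.randn(800, 10000).astype(np.float32)  # fMRI voxels (placeholder)
Y_train = np.random.randn(800, 8192).astype(np.float32)   # VDVAE latents (placeholder)

reg = Ridge(alpha=50000.0, fit_intercept=True)  # alpha is illustrative
reg.fit(X_train, Y_train)

X_test = np.random.randn(100, 10000).astype(np.float32)
Y_pred = reg.predict(X_test)  # predicted latents, later fed to the VDVAE decoder
print(Y_pred.shape)
```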
- Reconstruct images from the predicted latents:
```
python3 scripts/vdvae_reconstruct_images.py -sub x
```
- Navigate to the `versatile_diffusion/pretrained/` directory:
```
cd versatile_diffusion/pretrained/
```
- Download the pre-trained Versatile Diffusion model files:
```
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/vd-four-flow-v1-0-fp16-deprecated.pth
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/kl-f8.pth
wget https://huggingface.co/shi-labs/versatile-diffusion/resolve/main/pretrained_pth/optimus-vae.pth
```
- Navigate back to the main directory:
```
cd ../..
```
- Extract CLIP text and CLIP vision features:
```
python3 scripts/cliptext_extract_features.py -sub x
python3 scripts/clipvision_extract_features.py -sub x
```
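These scripts embed each caption and each stimulus image with the CLIP text and vision encoders bundled with Versatile Diffusion. The snippet below is a rough, stand-alone illustration of the same kind of features using Hugging Face `transformers`; it is not the code path the repo actually takes, and the model name and image path are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Stand-alone illustration of CLIP text/vision features; the repo itself
# uses the CLIP encoders inside versatile_diffusion rather than this model.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("example_stimulus.png").convert("RGB")  # placeholder path
inputs = processor(text=["a person riding a surfboard on a wave"],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
print(out.text_embeds.shape, out.image_embeds.shape)
```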
- Train the regressors:
```
python3 scripts/cliptext_regression.py -sub x
python3 scripts/clipvision_regression.py -sub x
```
- Reconstruct images:
```
python3 scripts/versatilediffusion_reconstruct_images.py -sub x
```
- Save test images:
```
python3 scripts/save_test_images.py
```
Compare the reconstructed images in the following folders:
- `data/nsddata_stimuli/test_data/` (test images)
- `results/vdvae/sub0x/` (VDVAE reconstructions)
- `results/versatile_diffusion/sub0x/` (Versatile Diffusion reconstructions)
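For a quick side-by-side look, a short matplotlib sketch can tile the test image and the two reconstructions for a given index. The folder layout and file-name pattern below are assumptions; adjust them to whatever `save_test_images.py` and the reconstruction scripts actually write.

```python
import os
import matplotlib.pyplot as plt
from PIL import Image

sub = "sub01"  # sub0x with x = subject number
idx = 0        # index of the test stimulus to inspect
paths = {
    "test image": f"data/nsddata_stimuli/test_data/{idx}.png",
    "VDVAE": f"results/vdvae/{sub}/{idx}.png",
    "Versatile Diffusion": f"results/versatile_diffusion/{sub}/{idx}.png",
}

# One panel per image; missing files simply leave an empty panel.
fig, axes = plt.subplots(1, len(paths), figsize=(9, 3))
for ax, (title, path) in zip(axes, paths.items()):
    if os.path.exists(path):
        ax.imshow(Image.open(path))
    ax.set_title(title)
    ax.axis("off")
plt.tight_layout()
plt.show()
```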