This repository is the official implementation of our paper MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration

First, create a new conda environment with Python 3.10.
conda create -n mc-cot python=3.10
conda activate mc-cot
Then, visit the official PyTorch website and install the PyTorch build that matches your system:
https://pytorch.org/get-started/locally/
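For example, a typical pip install for CUDA 12.1 looks like the following; this is only a sketch, so copy the exact command the site generates for your OS, package manager, and CUDA version:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121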
Lastly, you can install the required packages by running the following command.
pip install -r requirements.txt
Please download PATH-VQA from its official website, unzip it, and move PATH-VQA_test_open.json from the ./dataset/PATH-VQA/ folder into the unzipped folder.
Please download SLAKE from its official website, unzip it, and move Slake_test_open.json from the ./dataset/Slake/ folder into the unzipped folder.
Please download VQA-RAD from its official website and move VQA-RAD_test_open.json from the ./dataset/VQA-RAD/ folder into the unzipped folder.
Note: In this repository, your_path_to_{Dataset}_dir refers to the path to the unzipped folder.
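As a sketch, preparing SLAKE might look like this (the archive name and target path are placeholders; adjust them to your actual download):
unzip Slake.zip -d /your_path_to_slake_dir
mv ./dataset/Slake/Slake_test_open.json /your_path_to_slake_dir/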
We have provided implementations of MCCoT as well as various CoT frameworks, along with code for calling 4 types of LLMs and 2 types of MLLMs.
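Before running, export the required API keys as environment variables. A minimal sketch, assuming the code reads OPENAI_API_KEY from the environment (other LLMs may require their own keys):
export OPENAI_API_KEY=your_openai_api_key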
With these configured, you can execute MCCoT using GPT-3.5 and LLaVA-v1.5-7B on the SLAKE dataset with the following command:
python run.py --method MCCoT \
--language_model_name GPT \
--visual_model_name LLava \
--dataset_name Slake \
--slake_path /your_path_to_slake_dir
Other example scripts can be found in the ./scripts/run/ directory.
Attention: the output path strictly adheres to the following format: ./outputs/{LLM}/{MLLM}/{Method}/{Method}_{Dataset}.jsonl
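For example, the run above should write its results to ./outputs/GPT/LLava/MCCoT/MCCoT_Slake.jsonl.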
To evaluate the recall rate, you can run the following command:
python eval.py \
--mode recall \
--method MCCoT IICoT \
--dataset_name PATH-VQA VQA-RAD Slake \
--v_model LLava \
--l_model ChatGLM Qwen2 Deepseek
This command evaluates the recall rates of ChatGLM, Qwen2, and Deepseek as LLMs, with LLava as the MLLM, using MCCoT and IICoT across all three datasets.
Parameters can be adjusted to assess different combinations.
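For instance, a minimal sketch that evaluates recall for MCCoT alone, with GPT as the LLM and QwenVL as the MLLM, on SLAKE (all flag values are taken from the examples in this README):
python eval.py \
--mode recall \
--method MCCoT \
--dataset_name Slake \
--v_model QwenVL \
--l_model GPT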
To evaluate the accuracy score, you can run the following command:
python eval.py \
--mode acc \
--method MMCoT IICoT \
--dataset_name PATH-VQA VQA-RAD Slake \
--v_model LLava QwenVL \
--l_model GPT \
--parallel \
--max_workers 8
To calculate and display the scaled score, please use:
python eval_show.py \
--v_model LLava \
--l_model GPT \
--method DDCoT IICoT MMCoT \
--dataset_name PATH-VQA VQA-RAD Slake