This repository contains an implementation of distributed training techniques using PyTorch, including AllReduce, Distributed Data Parallel (DDP), and custom Gather/Scatter operations. The code is designed for training models using the CIFAR-10 dataset with a VGG11 architecture. The setup is intended for CPU-only execution.
The AllReduce implementation averages gradients across multiple nodes in a distributed setup, keeping the model weights synchronized during training.
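As a rough illustration, gradient averaging with `all_reduce` often looks like the sketch below; the helper name and structure are assumptions, not the repository's exact code.

```python
import torch.distributed as dist

def average_gradients(model):
    """Average gradients across all processes after loss.backward()."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this gradient tensor across every rank, then divide to get the mean.
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size
```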
Distributed Data Parallel (DDP) is a PyTorch feature that improves training efficiency by replicating the model in multiple processes and synchronizing gradients automatically during the backward pass. This file sets up the environment for DDP training.
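A minimal sketch of a CPU-only DDP setup, assuming the `gloo` backend and illustrative function names (the actual script may organize this differently):

```python
import os
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup(rank, world_size, master_ip="127.0.0.1", master_port="29500"):
    # The "gloo" backend supports CPU-only training.
    os.environ["MASTER_ADDR"] = master_ip
    os.environ["MASTER_PORT"] = master_port
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

def wrap(model):
    # DDP hooks into backward() and all-reduces gradients automatically.
    return DDP(model)
```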
Custom Gather and Scatter operations collect gradients from all nodes and distribute the averaged result back, allowing synchronized updates to the model parameters during training.
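One way such a gather/scatter synchronization can be written is sketched below; this is an assumed illustration of the pattern, not the file's exact contents.

```python
import torch
import torch.distributed as dist

def sync_gradients(model, rank, world_size):
    """Rank 0 gathers each gradient, averages it, and scatters the mean back to every rank."""
    for param in model.parameters():
        if param.grad is None:
            continue
        grad = param.grad.data
        if rank == 0:
            gathered = [torch.zeros_like(grad) for _ in range(world_size)]
            dist.gather(grad, gather_list=gathered, dst=0)
            mean_grad = torch.stack(gathered).mean(dim=0)
            # scatter writes the received chunk into `grad` in place on every rank.
            dist.scatter(grad, scatter_list=[mean_grad.clone() for _ in range(world_size)], src=0)
        else:
            dist.gather(grad, dst=0)
            dist.scatter(grad, src=0)
```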
The VGG11 model architecture is used for training on the CIFAR-10 dataset. The model consists of several convolutional layers followed by fully connected layers for classification.
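For orientation, a compact VGG11-style network for 32x32 CIFAR-10 inputs might look like the following; the layer configuration is a common convention and only an approximation of the file's actual definition.

```python
import torch.nn as nn

VGG11_CFG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]

class VGG11(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        layers, in_channels = [], 3
        for v in VGG11_CFG:
            if v == "M":
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers += [
                    nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                    nn.BatchNorm2d(v),
                    nn.ReLU(inplace=True),
                ]
                in_channels = v
        self.features = nn.Sequential(*layers)
        # Five 2x2 poolings reduce a 32x32 CIFAR-10 image to 1x1x512.
        self.classifier = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```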
- Python 3.x
- PyTorch
- NumPy
- CIFAR-10 dataset (automatically downloaded)
To set up and run the code:
- Clone the repository:

  git clone https://github.com/yourusername/distributed-training.git
  cd distributed-training

- Install the required dependencies:

  pip install torch torchvision numpy
The code uses PyTorch's distributed framework. To start training on multiple nodes, run the ddp.py script with the appropriate arguments. Example for running on 2 nodes:
python ddp.py --master-ip "127.0.0.1" --num-nodes 2 --rank 0
- `--master-ip` is the IP address of the master node.
- `--num-nodes` specifies the number of nodes in the distributed setup.
- `--rank` is the rank of the current node (0 for the master node).
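The sketch below shows how these flags could be wired into process-group initialization; it is an assumption based on the flag names above, and the actual handling in ddp.py may differ.

```python
import argparse
import os
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--master-ip", type=str, default="127.0.0.1")
parser.add_argument("--num-nodes", type=int, default=2)
parser.add_argument("--rank", type=int, default=0)
args = parser.parse_args()

os.environ["MASTER_ADDR"] = args.master_ip
os.environ["MASTER_PORT"] = "29500"  # assumed port; the real script may use another value
dist.init_process_group(backend="gloo", rank=args.rank, world_size=args.num_nodes)
```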
- Gather/Scatter Operations: The model's parameters are updated during training using custom gather and scatter functions, ensuring that all gradients are synchronized across nodes.
- DDP Setup: The Distributed Data Parallel framework improves training efficiency by running one process per node and synchronizing gradients automatically during the backward pass.
- Model: The VGG11 model is trained on the CIFAR-10 dataset using the CrossEntropyLoss criterion and the SGD optimizer (see the training-loop sketch below).
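A simplified epoch of such a training loop might look like this; the `average_gradients` call refers to the all_reduce helper sketched earlier and stands in for whichever synchronization method is used.

```python
import torch

def train_one_epoch(model, loader, optimizer, criterion, rank):
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        average_gradients(model)  # e.g. the all_reduce helper sketched above
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    if rank == 0:
        print(f"loss={total_loss / seen:.4f}  accuracy={correct / seen:.4f}")

# Typical usage (illustrative values):
# criterion = torch.nn.CrossEntropyLoss()
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
```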
- Model: VGG11 - A deep neural network with convolutional layers, batch normalization, and ReLU activation.
- Dataset: CIFAR-10 - A benchmark dataset for image classification with 60,000 32x32 color images in 10 classes (see the data-loading sketch below).
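One common way to load CIFAR-10 so that each node sees a distinct shard of the data is via a DistributedSampler, as sketched below; the repository may partition the data differently.

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
# Assumes dist.init_process_group has already been called.
sampler = DistributedSampler(train_set)   # gives each rank a distinct shard of the data
train_loader = DataLoader(train_set, batch_size=128, sampler=sampler)
```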
- The code will output the training loss and accuracy after each epoch.
- On each node, gradients are synchronized using the AllReduce or custom gather/scatter operations.