# Tech Arena 2025 — Anthropometric Measurement Extraction (RGB-D → 11D)

Predicts 11 anthropometric measurements (1 head width and 5 per ear) from multi-view RGB-D images using a ResNet-18 CNN trained on 2D images.
This project was developed for Tech Arena 2025 (Anthropometric Data Extraction Track) by Team HuddsPros.
## 🧠 Core Idea

Instead of complex 3D reconstruction or point-cloud methods, this system predicts anthropometric measurements directly from multiple 2D RGB images of a subject. A pretrained ResNet-18 (ImageNet) backbone is fine-tuned for regression on an 11-dimensional vector. During inference, predictions from multiple views are averaged to form a subject-level output.

**Pipeline summary:**

1. Convert `.HEIC` → `.PNG` (RGB + depth separation).
2. Resize and normalise using ImageNet statistics.
3. Load images and 11D ground truths into a PyTorch `Dataset`.
4. Fine-tune ResNet-18 (frozen backbone + new 11-output regression head).
5. Average predictions across all subject views at inference.
## 📂 Repository Structure

```
Heads_ears/
├─ Environment/
│  ├─ environment_cpu.yml
│  └─ environment.yml
├─ PythonScripts/
│  ├─ train_model.py                          # Main training script
│  ├─ resnet18_anthropometry_best_weights.pt  # Trained weights
│  └─ sorting_data/                           # Preprocessing and utility scripts
│     ├─ Process_DATA.py          # Converts HEIC → PNG, separates RGB & depth
│     ├─ preprocess_size.py       # Resizes large images
│     ├─ csvfileread.py           # Loads anthropometrics.csv → subject labels
│     ├─ checkresolution.py       # Verifies image resolution
│     ├─ remove_depth_images.py   # Moves depth maps into one folder
│     ├─ Split_depth_maps.py      # Reassigns depth maps to subjects
│     ├─ Split_training_test.py   # Splits subjects into train/val/test
│     ├─ HIEF_PNG_liam.py         # Optional HEIC-to-PNG converter (legacy)
│     └─ prediction.py            # Inference and averaging across views
├─ .gitignore
└─ README.md
```
## ⚙️ Setup Instructions

### 1️⃣ Create environment

```bash
conda env create -f Environment/environment_cpu.yml
conda activate anthropometry
pip install pillow-heif
```

### 2️⃣ Verify installation

```bash
python -c "import torch; print('Torch:', torch.__version__, '| CUDA available?', torch.cuda.is_available())"
```
## 🧾 Ground Truth Format

Each subject’s entry in `anthropometrics.csv` corresponds to an 11D vector:

```csv
subject,headwidth,p1_left,p2_left,p3_left,p4_left,p5_left,p1_right,p2_right,p3_right,p4_right,p5_right
P0001,0.14399,0.02679,0.01669,0.02307,0.07029,0.03025,0.02550,0.01709,0.02457,0.07021,0.03342
```
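A CSV in this format can be turned into a subject → vector lookup in a few lines. This is a hedged sketch of what a loader like `csvfileread.py` might do (the function name `load_labels` is ours); it assumes only the column layout shown above.

```python
import numpy as np
import pandas as pd

def load_labels(csv_path: str) -> dict[str, np.ndarray]:
    """Map each subject ID to its 11D ground-truth measurement vector."""
    df = pd.read_csv(csv_path)
    cols = [c for c in df.columns if c != "subject"]  # the 11 measurement columns
    return {row["subject"]: row[cols].to_numpy(dtype=np.float32)
            for _, row in df.iterrows()}
```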
## 🚀 Running the Pipeline

### 1️⃣ Convert HEIC → PNG

```bash
python PythonScripts/sorting_data/Process_DATA.py
```

### 2️⃣ Train the model

```bash
python PythonScripts/train_model.py
```

Training configuration:

- Model: ResNet-18 (ImageNet pretrained)
- Optimiser: SGD (momentum = 0.9, lr = 1e-3, StepLR step = 7, γ = 0.1)
- Loss: MSE
- Batch size: 16
- Epochs: 25
- Device: CUDA if available, else CPU
- Saves weights to `resnet18_anthropometry_best_weights.pt`

### 3️⃣ Run predictions

```bash
python PythonScripts/sorting_data/prediction.py
```

- Loads trained weights
- Averages predictions across all views per subject
- Outputs the final 11D prediction vector per subject
## 📊 Evaluation Metric

Competition metric:

d = √( Σᵢ (ŝᵢ − sᵢ)² )

where both vectors are standardised using the training-set mean and standard deviation. Lower distance = higher accuracy.
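In code, the metric above is a Euclidean distance between z-scored vectors. A minimal sketch (the function name `competition_distance` is ours; `train_mean` and `train_std` are the per-dimension statistics of the training set, as the metric definition requires):

```python
import numpy as np

def competition_distance(pred, truth, train_mean, train_std) -> float:
    """d = sqrt(sum_i (s_hat_i - s_i)^2) over standardised 11D vectors."""
    p = (np.asarray(pred, dtype=np.float64) - train_mean) / train_std
    t = (np.asarray(truth, dtype=np.float64) - train_mean) / train_std
    return float(np.sqrt(np.sum((p - t) ** 2)))
```

A perfect prediction gives d = 0; because both vectors are standardised with the same statistics, errors in millimetre-scale ear measurements and the larger head-width measurement contribute on a comparable scale.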
## 🧮 Model Summary

| Stage    | Description                                                 |
|----------|-------------------------------------------------------------|
| Input    | RGB PNG images (224×224)                                    |
| Backbone | ResNet-18 (ImageNet pretrained)                             |
| Head     | Linear (512 → 11) regression layer                          |
| Loss     | Mean Squared Error (MSE)                                    |
| Training | Image-level; predictions averaged across views at inference |
## 🛠️ Tools & Libraries

| Component     | Tool                          |
|---------------|-------------------------------|
| Language      | Python 3.10                   |
| Framework     | PyTorch, TorchVision          |
| Image I/O     | Pillow, Pillow-HEIF           |
| Data Handling | Pandas, NumPy                 |
| Environment   | Conda (`environment_cpu.yml`) |
| IDE           | VSCode                        |
## 📈 Future Work

- Integrate depth as a fourth CNN channel (RGB-D fusion).
- Automate ear cropping for more localised features.
- Explore attention-based multi-view fusion instead of mean pooling.
- Deploy the model as a lightweight Extractor API for leaderboard testing.
## 👥 Authors

- William Meredith
- Jonathan Southwell

University of Huddersfield — Team HuddsPros
## 📚 References

1. He, K. et al. *Deep Residual Learning for Image Recognition*. CVPR 2016.
2. Fantini, D. et al. *A Survey on ML Techniques for HRTF Individualization*. IEEE OJSP 2025.
3. Torres-Gallegos, E. A. et al. *Photo-Anthropometry for HRTF Personalization*. Applied Acoustics 2015.
4. PyTorch Documentation – https://pytorch.org
5. ResNet-18 – https://pytorch.org/vision/stable/models/generated/torchvision.models.resnet18.html