Closed
Description
Is there an existing issue for this?
- I have searched the existing issues
Bug description
Hello,
I ran into this issue while training dlcrnet_stride32_ms5 with the new PyTorch engine in DLC 3.0. Training dlcrnet_stride16_ms5 on the same dataset also threw an error during the evaluation step, but unfortunately I didn't save the output, so I am not sure whether it fails in exactly the same way. The problem might be related to #2631, which also occurs during evaluation.
Operating System
Windows 11 Home 64-bit
DeepLabCut version
DLC 3.0.0rc1
DeepLabCut mode
multi animal
Device type
NVIDIA GeForce RTX 3070 (unbranded OEM card from HP, 8GB of VRAM)
Steps To Reproduce
- I created a conda environment for DLC 3.0 following the instructions
- Generated the following project config using the GUI
# Project definitions (do not edit)
Task: han_lab_pinned_ball_cropped
scorer: Radu
date: Jul1
multianimalproject: true
identity: false
# Project path (change when moving around)
project_path: D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01
# Default DeepLabCut engine to use for shuffle creation (either pytorch or tensorflow)
engine: pytorch
# Annotation data set configuration (and individual video cropping parameters)
video_sets:
  D:\data\leo\132727\132727_DLC\cropped\posttrial_11-37-03_LEFT.mp4:
    crop: 0, 960, 0, 720
  [...]
  D:\data\leo\132724\motion_data\132724_DLC\cropped\rec8_exp_2_14-44-08_RIGHT.mp4:
    crop: 0, 960, 0, 720
individuals:
- individual1
uniquebodyparts:
- ledon
- ledoff
multianimalbodyparts:
- snout
- tailbase
- leftforepaw
- rightforepaw
- lefthindpaw
- righthindpaw
bodyparts: MULTI!
# Fraction of video to start/stop when extracting frames for labeling/refinement
start: 0
stop: 1
numframes2pick: 10
# Plotting configuration
skeleton:
- - snout
  - tailbase
- - snout
  - leftforepaw
- - snout
  - rightforepaw
- - tailbase
  - lefthindpaw
- - tailbase
  - righthindpaw
skeleton_color: white
pcutoff: 0.6
dotsize: 12
alphavalue: 0.7
colormap: tab10
# Training,Evaluation and Analysis configuration
TrainingFraction:
- 0.95
iteration: 0
default_net_type: dlcrnet_ms5
default_augmenter: multi-animal-imgaug
default_track_method: ellipse
snapshotindex: -1
detector_snapshotindex: -1
batch_size: 8
# Cropping Parameters (for analysis and outlier frame detection)
cropping: false
#if cropping is true for analysis, then set the values here:
x1: 0
x2: 640
y1: 277
y2: 624
# Refinement configuration (parameters from annotation dataset configuration also relevant in this stage)
corner2move2:
- 50
- 50
move2corner: true
# Conversion tables to fine-tune SuperAnimal weights
SuperAnimalConversionTables:
- Labeled ~300 images and trained using the following PyTorch config. I believe I only changed minor settings, such as the batch size and the number of epochs to save.
data:
  colormode: RGB
  inference:
    normalize_images: true
  train:
    affine:
      p: 0.5
      rotation: 30
      scaling:
      - 1.0
      - 1.0
      translation: 0
    collate:
      type: ResizeFromDataSizeCollate
      min_scale: 0.4
      max_scale: 1.0
      min_short_side: 128
      max_short_side: 1152
      multiple_of: 32
      to_square: false
    covering: false
    gaussian_noise: 12.75
    hist_eq: false
    motion_blur: false
    normalize_images: true
device: auto
metadata:
  project_path: D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01
  pose_config_path:
    D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01\dlc-models-pytorch\iteration-0\han_lab_pinned_ball_croppedJul1-trainset95shuffle1\train\pose_cfg.yaml
  bodyparts:
  - snout
  - tailbase
  - leftforepaw
  - rightforepaw
  - lefthindpaw
  - righthindpaw
  unique_bodyparts:
  - ledon
  - ledoff
  individuals:
  - individual1
  with_identity: false
method: bu
model:
  backbone:
    type: DLCRNet
    model_name: resnet50
    pretrained: true
    output_stride: 32
  backbone_output_channels: 2304
  pose_model:
    stride: 8
  heads:
    bodypart:
      type: DLCRNetHead
      predictor:
        type: PartAffinityFieldPredictor
        num_animals: 1
        num_multibodyparts: 6
        num_uniquebodyparts: 0
        nms_radius: 5
        sigma: 1.0
        locref_stdev: 7.2801
        min_affinity: 0.05
        graph: &id001
        - - 0
          - 1
        - - 0
          - 2
        - - 0
          - 3
        - - 0
          - 4
        - - 0
          - 5
        - - 1
          - 2
        - - 1
          - 3
        - - 1
          - 4
        - - 1
          - 5
        - - 2
          - 3
        - - 2
          - 4
        - - 2
          - 5
        - - 3
          - 4
        - - 3
          - 5
        - - 4
          - 5
        edges_to_keep:
        - 0
        - 1
        - 2
        - 3
        - 4
        - 5
        - 6
        - 7
        - 8
        - 9
        - 10
        - 11
        - 12
        - 13
        - 14
      target_generator:
        type: SequentialGenerator
        generators:
        - type: HeatmapPlateauGenerator
          num_heatmaps: 6
          pos_dist_thresh: 17
          heatmap_mode: KEYPOINT
          generate_locref: true
          locref_std: 7.2801
        - type: PartAffinityFieldGenerator
          graph: *id001
          width: 20
      criterion:
        heatmap:
          type: WeightedBCECriterion
          weight: 1.0
        locref:
          type: WeightedHuberCriterion
          weight: 0.05
        paf:
          type: WeightedHuberCriterion
          weight: 0.1
      heatmap_config:
        channels:
        - 2304
        - 1152
        - 6
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      locref_config:
        channels:
        - 2304
        - 1152
        - 12
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      paf_config:
        channels:
        - 2304
        - 1152
        - 30
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      num_stages: 5
    unique_bodypart:
      type: HeatmapHead
      weight_init: normal
      predictor:
        type: HeatmapPredictor
        apply_sigmoid: false
        clip_scores: true
        location_refinement: true
        locref_std: 7.2801
      target_generator:
        type: HeatmapGaussianGenerator
        num_heatmaps: 2
        pos_dist_thresh: 17
        heatmap_mode: KEYPOINT
        generate_locref: true
        locref_std: 7.2801
        label_keypoint_key: keypoints_unique
      criterion:
        heatmap:
          type: WeightedMSECriterion
          weight: 1.0
        locref:
          type: WeightedHuberCriterion
          weight: 0.05
      heatmap_config:
        channels:
        - 2304
        - 2
        kernel_size:
        - 3
        strides:
        - 2
      locref_config:
        channels:
        - 2304
        - 4
        kernel_size:
        - 3
        strides:
        - 2
net_type: dlcrnet_stride32_ms5
runner:
  type: PoseTrainingRunner
  gpus:
  key_metric: test.mAP
  key_metric_asc: true
  eval_interval: 25
  optimizer:
    type: AdamW
    params:
      lr: 0.0001
  scheduler:
    type: LRListScheduler
    params:
      lr_list:
      - - 1e-05
      - - 1e-06
      milestones:
      - 160
      - 190
  snapshots:
    max_snapshots: 5
    save_epochs: 5
    save_optimizer_state: false
train_settings:
  batch_size: 2
  dataloader_workers: 0
  dataloader_pin_memory: true
  display_iters: 1000
  epochs: 200
  seed: 42
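
As a sanity check on the config above, the head sizes appear internally consistent with the 6 multi-animal bodyparts. The sketch below is not DeepLabCut code. The helper names `head_channels` and `effective_stride` are made up for illustration, and it assumes a fully connected bodypart graph, 2 location-refinement channels per bodypart, 2 part-affinity channels per edge, and that each stride-2 layer in the head doubles the resolution.

```python
from itertools import combinations


def head_channels(num_bodyparts):
    """Channel counts implied by a fully connected graph (hypothetical helper)."""
    # Every unordered pair of bodyparts is one edge, in lexicographic order.
    graph = [list(edge) for edge in combinations(range(num_bodyparts), 2)]
    return {
        "graph": graph,
        "heatmap": num_bodyparts,     # one heatmap per bodypart
        "locref": 2 * num_bodyparts,  # assumed: x and y offset per bodypart
        "paf": 2 * len(graph),        # assumed: 2D vector field per edge
    }


def effective_stride(backbone_stride, upsampling_strides):
    """Output stride after the head's upsampling layers (assumed behavior)."""
    stride = backbone_stride
    for s in upsampling_strides:
        stride //= s
    return stride


channels = head_channels(6)
print(len(channels["graph"]), channels["heatmap"], channels["locref"], channels["paf"])
# prints: 15 6 12 30
print(effective_stride(32, [2, 2]))
# prints: 8
```

Under these assumptions the numbers match the final entries of `heatmap_config`, `locref_config`, and `paf_config` (6, 12, 30), the 15 entries in `graph` and `edges_to_keep`, and `pose_model.stride: 8`.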
Relevant log output
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\gui\tabs\train_network.py", line 190, in train_network
    compat.train_network(config, shuffle, **kwargs)
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\compat.py", line 245, in train_network
    return train_network(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 336, in train_network
    train(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 189, in train
    runner.fit(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 181, in fit
    valid_loss = self._epoch(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 221, in _epoch
    losses_dict = self.step(batch, mode)
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 346, in step
    self._update_epoch_predictions(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 422, in _update_epoch_predictions
    vis = kpts[-1]
IndexError: invalid index to scalar variable.
Anything else?
No response
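
For context on the IndexError in the log output above: NumPy raises this exact message when a NumPy scalar is indexed, which can happen when code iterates over a 1-D array while expecting rows of a 2-D one. The sketch below only reproduces the message. It is not DeepLabCut code, `last_entries` and the array shapes are made up for illustration, and whether this is the actual cause of the failure has not been confirmed.

```python
import numpy as np


def last_entries(keypoints):
    """Read the last value of each row, mirroring the `vis = kpts[-1]` access."""
    return [float(kpts[-1]) for kpts in keypoints]


# Assumed layout for illustration: one row of (x, y, visibility) per keypoint.
pose_2d = np.array([[10.0, 20.0, 2.0], [30.0, 40.0, 0.0]])
print(last_entries(pose_2d))  # each `kpts` is a 1-D row, so indexing works

# With one dimension fewer, iteration yields NumPy scalars instead of rows.
pose_1d = np.array([10.0, 20.0, 2.0])
try:
    last_entries(pose_1d)
except IndexError as err:
    print(err)  # invalid index to scalar variable.
```

If something similar happens during evaluation, checking the shape of the keypoint array right before the failing line would show whether a dimension is missing.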
Code of Conduct
- I agree to follow this project's Code of Conduct