Description
Is there an existing issue for this?
- I have searched the existing issues
Operating System
Windows 11 Pro
DeepLabCut version
3.0.0rc9
What engine are you using?
pytorch
DeepLabCut mode
multi animal
Device type
Intel(R) Core(TM) i9-14900HX (2.20 GHz)
Bug description 🐛
After training previous multi-animal projects successfully on DLC 2.3.5, I created a project on a new device with DLC 3.0 and progressed without problems to training the new network. As we intend to use this pose estimation for later analysis in SimBA, we were encouraged to manually label ~1000 images (33 per video from 32 videos). We trained the network for 950 epochs with a batch size of 8 (further details below), but evaluation showed very poor results, and the tracklets were empty during analysis.
While experimenting with the training, I realized that partway through (around 80 epochs, or sometimes as few as 40) the slow improvement of the network abruptly stops: the RMSE begins to increase, the RMSE pcutoff becomes NaN (though in network evaluation it's usually quite high), and the mAP/mAR drop to 0.00. This persists for the rest of training, up to the maximum number of epochs, and training neither throws an error nor stops.
Steps To Reproduce
- Create a multi-animal DLC project on 3.0 with 2 animals and 8 body parts per animal, then progress past labelling to creating a training dataset and training the network.
- Training, Evaluation and Analysis configuration
- TrainingFraction: [0.95]
- iteration: 0
- default_net_type: resnet_50
- default_augmenter: albumentations
- default_track_method: ellipse
- snapshotindex: -1
- detector_snapshotindex: -1
- batch_size: 8
- 950 epochs, saved every 50 epochs
- Observe error (40+ epochs into training)
- Later: evaluate the network and see the errors below. (I don't know whether this is related to the training issue, but these are problems I noticed downstream that I haven't encountered with previous multi-animal DLC analyses, and I'd appreciate any insight.)
Relevant log output
Original (950 epochs) output at end of training:
Epoch 950/950 (lr=1e-05), train loss 0.00642, valid loss 0.66643
Model performance:
metrics/test.rmse: 78.88
metrics/test.rmse_pcutoff: nan
metrics/test.mAP: 0.00
metrics/test.mAR: 0.00
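To pin down the epoch at which the metrics collapse, a log can be scanned for the first NaN test metric. This is only a minimal sketch: it assumes the log lines look exactly like the ones quoted above, and `first_nan_epoch` is a hypothetical helper, not part of DeepLabCut.

```python
import math
import re

# These patterns match the log lines quoted in this issue; other versions
# of the training log may be formatted differently (assumption).
EPOCH_RE = re.compile(r"Epoch (\d+)/(\d+)")
METRIC_RE = re.compile(r"metrics/test\.(\w+):\s*(\S+)")


def first_nan_epoch(log_lines):
    """Return the first epoch that reports a NaN test metric, or None."""
    epoch = None
    for line in log_lines:
        match = EPOCH_RE.search(line)
        if match:
            epoch = int(match.group(1))
            continue
        match = METRIC_RE.search(line)
        if match and epoch is not None:
            try:
                value = float(match.group(2))
            except ValueError:
                continue  # skip anything that is not a number
            if math.isnan(value):
                return epoch
    return None


# The end-of-training lines quoted above.
sample = [
    "Epoch 950/950 (lr=1e-05), train loss 0.00642, valid loss 0.66643",
    "Model performance:",
    "metrics/test.rmse: 78.88",
    "metrics/test.rmse_pcutoff: nan",
    "metrics/test.mAP: 0.00",
    "metrics/test.mAR: 0.00",
]
print(first_nan_epoch(sample))  # 950
```

Run over a full training log, this would report the earliest evaluated epoch where `rmse_pcutoff` (or any other test metric) is already NaN.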
Original training (950 epochs) evaluation:
INFO:console:Evaluation results for DLC_Resnet50_HW_ResidentIntruderDLC_1Aug7shuffle1_snapshot_760-results.csv (pcutoff: 0.6):
INFO:console:train rmse 30.23
train rmse_pcutoff 43.44
train mAP 0.03
train mAR 0.01
train id_head_Ear_left_accuracy 0.49
train id_head_Ear_right_accuracy 0.49
train id_head_Nose_accuracy 0.50
train id_head_Center_accuracy 0.48
train id_head_Lateral_left_accuracy 0.48
train id_head_Lateral_right_accuracy 0.48
train id_head_Tail_base_accuracy 0.49
train id_head_Tail_end_accuracy 0.50
test rmse 85.63
test rmse_pcutoff 61.84
test mAP 0.00
test mAR 0.00
test id_head_Ear_left_accuracy 0.51
test id_head_Ear_right_accuracy 0.52
test id_head_Nose_accuracy 0.51
test id_head_Center_accuracy 0.49
test id_head_Lateral_left_accuracy 0.48
test id_head_Lateral_right_accuracy 0.48
test id_head_Tail_base_accuracy 0.52
test id_head_Tail_end_accuracy 0.49
Evaluate Network error #1:
:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\pose_estimation_pytorch\data\postprocessor.py:489: RuntimeWarning: invalid value encountered in cast
heatmap_indices = np.rint(individual_keypoints).astype(int)
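For context, recent NumPy versions emit this warning when NaN or infinite values are cast to an integer type, so it seems consistent with NaN keypoint predictions reaching this line. The sketch below is not DeepLabCut's postprocessor code; `round_finite_keypoints` is a hypothetical helper and the keypoint values are made up. It only illustrates how non-finite rows could be masked out before rounding.

```python
import numpy as np


def round_finite_keypoints(keypoints):
    """Round (x, y) keypoints to integer indices, skipping non-finite rows.

    Returns the integer indices of the finite rows and a boolean mask
    marking which input rows were kept.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    finite = np.isfinite(keypoints).all(axis=-1)
    indices = np.rint(keypoints[finite]).astype(int)
    return indices, finite


# Illustrative values only: the second keypoint mimics a NaN prediction.
demo = np.array([[10.4, 20.6], [np.nan, np.nan], [3.5, 7.2]])
indices, kept = round_finite_keypoints(demo)
print(indices.tolist())  # [[10, 21], [4, 7]]
print(kept.tolist())     # [True, False, True]
```

Counting the `False` entries in the mask would show how many predictions are NaN at evaluation time.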
Evaluate Network error #2 (Only when I check Plot)
Traceback (most recent call last):
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\gui\tabs\evaluate_network.py", line 235, in evaluate_network
_ = launch_napari(image_dir)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\deeplabcut\gui\widgets.py", line 46, in launch_napari
viewer.open(files, plugin=plugin, stack=stack)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari\components\viewer_model.py", line 1092, in open
self._add_layers_with_plugins(
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari\components\viewer_model.py", line 1292, in _add_layers_with_plugins
layer_data, hookimpl = read_data_with_plugins(
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari\plugins\io.py", line 77, in read_data_with_plugins
res = _npe2.read(paths, plugin, stack=stack)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari\plugins\_npe2.py", line 63, in read
layer_data, reader = io_utils.read_get_reader(
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\npe2\io_utils.py", line 66, in read_get_reader
return _read(
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\npe2\io_utils.py", line 165, in _read
read_func = rdr.exec(
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\npe2\manifest\contributions\_readers.py", line 61, in exec
callable_ = super().exec(args=args, kwargs=kwargs, _registry=_registry)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\npe2\manifest\utils.py", line 61, in exec
return self.get_callable(reg)(*args, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari_deeplabcut\_reader.py", line 79, in get_folder_parser
layers.extend(read_images(images))
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\napari_deeplabcut\_reader.py", line 112, in read_images
return [(imread(path), params, "image")]
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\dask_image\imread\__init__.py", line 48, in imread
with pims.open(sfname) as imgs:
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\pims\api.py", line 161, in open
return ImageSequence(sequence, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\pims\image_sequence.py", line 68, in __init__
tmp = self.imread(self._filepaths[0], **self.kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\pims\image_sequence.py", line 85, in imread
return imread(filename, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\skimage\_shared\utils.py", line 328, in fixed_func
return func(*args, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\skimage\io\_io.py", line 82, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\skimage\_shared\utils.py", line 538, in wrapped
return func(*args, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\skimage\io\manage_plugins.py", line 254, in call_plugin
return func(*args, **kwargs)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\skimage\io\_plugins\imageio_plugin.py", line 11, in imread
out = np.asarray(imageio_imread(*args, **kwargs))
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\imageio\v3.py", line 53, in imread
with imopen(uri, "r", **plugin_kwargs) as img_file:
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\imageio\core\imopen.py", line 113, in imopen
request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\imageio\core\request.py", line 249, in __init__
self._parse_uri(uri)
File "C:\Users\itilton\AppData\Local\anaconda3\envs\DEEPLABCUT\lib\site-packages\imageio\core\request.py", line 409, in _parse_uri
raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: 'C:\Users\itilton\Desktop\HW_ResidentIntruderDLC_1-haoyu-2025-08-07\evaluation-results-pytorch\iteration-0\HW_ResidentIntruderDLC_1Aug7-trainset95shuffle1\LabeledImages_DLC_Resnet50_HW_ResidentIntruderDLC_1Aug7shuffle1_snapshot_760\Test-237_d1_pursuit-img086.png'
Anything else?
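One thing that may be worth ruling out for the FileNotFoundError above is the path length, since the missing file sits in a deeply nested folder with long names. The check below is a minimal sketch under the assumption that the classic Windows `MAX_PATH` limit of 260 characters applies (it may not if long-path support is enabled); `exceeds_max_path` is a hypothetical helper, and this is not a confirmed cause.

```python
WINDOWS_MAX_PATH = 260  # classic Win32 limit, including the terminating null


def exceeds_max_path(path, limit=WINDOWS_MAX_PATH):
    """Return True if the path is too long for APIs bound by MAX_PATH."""
    # One slot is reserved for the null terminator, so the usable
    # length is limit - 1 characters.
    return len(path) > limit - 1


# The path reported in the traceback above.
missing = (
    r"C:\Users\itilton\Desktop\HW_ResidentIntruderDLC_1-haoyu-2025-08-07"
    r"\evaluation-results-pytorch\iteration-0"
    r"\HW_ResidentIntruderDLC_1Aug7-trainset95shuffle1"
    r"\LabeledImages_DLC_Resnet50_HW_ResidentIntruderDLC_1Aug7shuffle1_snapshot_760"
    r"\Test-237_d1_pursuit-img086.png"
)
print(len(missing), exceeds_max_path(missing))
```

If the check returns `True`, moving the project to a shorter location or enabling long paths in Windows might be worth trying before looking further.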
I noticed that a similar issue was brought up in #2697, but on my side training starts strong and improves until it suddenly fails (instead of beginning with RMSE pcutoff NaN and mAP/mAR = 0), and I also have very high RMSE. I'm not sure whether these are related or actually different.
I also found that the labelling on the images in the evaluation-results-pytorch folder doesn't resemble the labelling from earlier on: there's a variety of colors for different body parts. I've attached 1) what is shown when one clicks 'check labels' under label frames and 2) an image from the evaluation results.
Thank you for any help or insight with this issue!
Code of Conduct
- I agree to follow this project's Code of Conduct