IndexError during evaluation step of training with the new pytorch engine #2648

### Is there an existing issue for this?

- [X] I have searched the existing issues

### Bug description

Hello,

I ran into this error while training with the new PyTorch engine in DLC 3.0, using the `dlcrnet_stride32_ms5` architecture. Training on the same dataset with `dlcrnet_stride16_ms5` also threw an error during the evaluation step, but I didn't save the output, so I can't confirm that it fails in exactly the same way. The problem may be related to #2631, which also occurs during evaluation.

### Operating System

Windows 11 Home 64-bit

### DeepLabCut version

DLC 3.0.0rc1

### DeepLabCut mode

multi animal

### Device type

NVIDIA GeForce RTX 3070 (unbranded OEM card from HP, 8GB of VRAM)


### Steps To Reproduce

1. Created a conda environment for DLC 3.0 following the instructions.
2. Generated the following project config using the GUI:
```yaml
# Project definitions (do not edit)
Task: han_lab_pinned_ball_cropped
scorer: Radu
date: Jul1
multianimalproject: true
identity: false


# Project path (change when moving around)
project_path: D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01


# Default DeepLabCut engine to use for shuffle creation (either pytorch or tensorflow)
engine: pytorch


# Annotation data set configuration (and individual video cropping parameters)
video_sets:
  D:\data\leo\132727\132727_DLC\cropped\posttrial_11-37-03_LEFT.mp4:
    crop: 0, 960, 0, 720
[...]
  D:\data\leo\132724\motion_data\132724_DLC\cropped\rec8_exp_2_14-44-08_RIGHT.mp4:
    crop: 0, 960, 0, 720
individuals:
- individual1
uniquebodyparts:
- ledon
- ledoff
multianimalbodyparts:
- snout
- tailbase
- leftforepaw
- rightforepaw
- lefthindpaw
- righthindpaw
bodyparts: MULTI!


# Fraction of video to start/stop when extracting frames for labeling/refinement
start: 0
stop: 1
numframes2pick: 10


# Plotting configuration
skeleton:
- - snout
  - tailbase
- - snout
  - leftforepaw
- - snout
  - rightforepaw
- - tailbase
  - lefthindpaw
- - tailbase
  - righthindpaw

skeleton_color: white
pcutoff: 0.6
dotsize: 12
alphavalue: 0.7
colormap: tab10


# Training,Evaluation and Analysis configuration
TrainingFraction:
- 0.95
iteration: 0
default_net_type: dlcrnet_ms5
default_augmenter: multi-animal-imgaug
default_track_method: ellipse
snapshotindex: -1
detector_snapshotindex: -1
batch_size: 8


# Cropping Parameters (for analysis and outlier frame detection)
cropping: false
#if cropping is true for analysis, then set the values here:
x1: 0
x2: 640
y1: 277
y2: 624


# Refinement configuration (parameters from annotation dataset configuration also relevant in this stage)
corner2move2:
- 50
- 50
move2corner: true


# Conversion tables to fine-tune SuperAnimal weights
SuperAnimalConversionTables:
```

3. Labeled ~300 images and trained using the following PyTorch config. I believe I only changed minor settings, such as the batch size and the number of epochs to save.
```yaml
data:
  colormode: RGB
  inference:
    normalize_images: true
  train:
    affine:
      p: 0.5
      rotation: 30
      scaling:
      - 1.0
      - 1.0
      translation: 0
    collate:
      type: ResizeFromDataSizeCollate
      min_scale: 0.4
      max_scale: 1.0
      min_short_side: 128
      max_short_side: 1152
      multiple_of: 32
      to_square: false
    covering: false
    gaussian_noise: 12.75
    hist_eq: false
    motion_blur: false
    normalize_images: true
device: auto
metadata:
  project_path: D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01
  pose_config_path: 
    D:\data\deeplabcut_projects\han_lab_pinned_ball_cropped-Radu-2024-07-01\dlc-models-pytorch\iteration-0\han_lab_pinned_ball_croppedJul1-trainset95shuffle1\train\pose_cfg.yaml
  bodyparts:
  - snout
  - tailbase
  - leftforepaw
  - rightforepaw
  - lefthindpaw
  - righthindpaw
  unique_bodyparts:
  - ledon
  - ledoff
  individuals:
  - individual1
  with_identity: false
method: bu
model:
  backbone:
    type: DLCRNet
    model_name: resnet50
    pretrained: true
    output_stride: 32
  backbone_output_channels: 2304
  pose_model:
    stride: 8
  heads:
    bodypart:
      type: DLCRNetHead
      predictor:
        type: PartAffinityFieldPredictor
        num_animals: 1
        num_multibodyparts: 6
        num_uniquebodyparts: 0
        nms_radius: 5
        sigma: 1.0
        locref_stdev: 7.2801
        min_affinity: 0.05
        graph: &id001
        - - 0
          - 1
        - - 0
          - 2
        - - 0
          - 3
        - - 0
          - 4
        - - 0
          - 5
        - - 1
          - 2
        - - 1
          - 3
        - - 1
          - 4
        - - 1
          - 5
        - - 2
          - 3
        - - 2
          - 4
        - - 2
          - 5
        - - 3
          - 4
        - - 3
          - 5
        - - 4
          - 5
        edges_to_keep:
        - 0
        - 1
        - 2
        - 3
        - 4
        - 5
        - 6
        - 7
        - 8
        - 9
        - 10
        - 11
        - 12
        - 13
        - 14
      target_generator:
        type: SequentialGenerator
        generators:
        - type: HeatmapPlateauGenerator
          num_heatmaps: 6
          pos_dist_thresh: 17
          heatmap_mode: KEYPOINT
          generate_locref: true
          locref_std: 7.2801
        - type: PartAffinityFieldGenerator
          graph: *id001
          width: 20
      criterion:
        heatmap:
          type: WeightedBCECriterion
          weight: 1.0
        locref:
          type: WeightedHuberCriterion
          weight: 0.05
        paf:
          type: WeightedHuberCriterion
          weight: 0.1
      heatmap_config:
        channels:
        - 2304
        - 1152
        - 6
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      locref_config:
        channels:
        - 2304
        - 1152
        - 12
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      paf_config:
        channels:
        - 2304
        - 1152
        - 30
        kernel_size:
        - 3
        - 3
        strides:
        - 2
        - 2
      num_stages: 5
    unique_bodypart:
      type: HeatmapHead
      weight_init: normal
      predictor:
        type: HeatmapPredictor
        apply_sigmoid: false
        clip_scores: true
        location_refinement: true
        locref_std: 7.2801
      target_generator:
        type: HeatmapGaussianGenerator
        num_heatmaps: 2
        pos_dist_thresh: 17
        heatmap_mode: KEYPOINT
        generate_locref: true
        locref_std: 7.2801
        label_keypoint_key: keypoints_unique
      criterion:
        heatmap:
          type: WeightedMSECriterion
          weight: 1.0
        locref:
          type: WeightedHuberCriterion
          weight: 0.05
      heatmap_config:
        channels:
        - 2304
        - 2
        kernel_size:
        - 3
        strides:
        - 2
      locref_config:
        channels:
        - 2304
        - 4
        kernel_size:
        - 3
        strides:
        - 2
net_type: dlcrnet_stride32_ms5
runner:
  type: PoseTrainingRunner
  gpus:
  key_metric: test.mAP
  key_metric_asc: true
  eval_interval: 25
  optimizer:
    type: AdamW
    params:
      lr: 0.0001
  scheduler:
    type: LRListScheduler
    params:
      lr_list:
      - - 1e-05
      - - 1e-06
      milestones:
      - 160
      - 190
  snapshots:
    max_snapshots: 5
    save_epochs: 5
    save_optimizer_state: false
train_settings:
  batch_size: 2
  dataloader_workers: 0
  dataloader_pin_memory: true
  display_iters: 1000
  epochs: 200
  seed: 42
```
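
As a quick sanity check on the config above, the following minimal sketch recomputes the head dimensions from the number of bodyparts and compares them with the values listed in the config. The relationships it uses (two PAF channels per graph edge, two locref channels per keypoint, a fully connected graph over the 6 multi-animal bodyparts) are assumptions inferred from the numbers in the config, not taken from the DeepLabCut source; the dictionary keys are hypothetical names chosen for this illustration.

```python
from itertools import combinations

num_bodyparts = 6         # predictor.num_multibodyparts in the config
num_unique_bodyparts = 2  # unique_bodypart target_generator.num_heatmaps

# Assumption: the graph is every unordered pair of bodyparts,
# which matches the 15 edges listed under `graph` in the config.
graph = [list(pair) for pair in combinations(range(num_bodyparts), 2)]

# Values recomputed under the assumptions stated above.
expected = {
    "num_edges": len(graph),
    "bodypart_heatmap_channels": num_bodyparts,
    "bodypart_locref_channels": 2 * num_bodyparts,
    "paf_channels": 2 * len(graph),
    "unique_heatmap_channels": num_unique_bodyparts,
    "unique_locref_channels": 2 * num_unique_bodyparts,
}

# Values copied by hand from the config above.
from_config = {
    "num_edges": 15,                  # len(edges_to_keep)
    "bodypart_heatmap_channels": 6,   # heatmap_config.channels[-1]
    "bodypart_locref_channels": 12,   # locref_config.channels[-1]
    "paf_channels": 30,               # paf_config.channels[-1]
    "unique_heatmap_channels": 2,     # unique_bodypart heatmap_config.channels[-1]
    "unique_locref_channels": 4,      # unique_bodypart locref_config.channels[-1]
}

# Collect any entries where the recomputed value differs from the config.
mismatches = {
    key: (expected[key], from_config[key])
    for key in expected
    if expected[key] != from_config[key]
}
print(mismatches)  # {}
```

Under these assumptions the channel counts are internally consistent, so the head dimensions themselves do not look like the source of the error.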

### Relevant log output

```shell
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\gui\tabs\train_network.py", line 190, in train_network
    compat.train_network(config, shuffle, **kwargs)
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\compat.py", line 245, in train_network
    return train_network(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 336, in train_network
    train(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 189, in train
    runner.fit(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 181, in fit
    valid_loss = self._epoch(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 221, in _epoch
    losses_dict = self.step(batch, mode)
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 346, in step
    self._update_epoch_predictions(
  File "C:\Users\radud\.conda\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 422, in _update_epoch_predictions
    vis = kpts[-1]
IndexError: invalid index to scalar variable.
```
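
For context on the error message itself: `invalid index to scalar variable.` is what NumPy raises when a NumPy scalar (rather than an array) is indexed. The sketch below is not DeepLabCut's code; it only shows one way the statement `vis = kpts[-1]` could hit this error, assuming that `kpts` is normally a per-keypoint row such as `(x, y, visibility)` and that the array being iterated has one dimension fewer than expected. The function name `visibility_flags` and both input arrays are hypothetical.

```python
import numpy as np


def visibility_flags(keypoints):
    """Return the last entry of each keypoint row, mimicking `vis = kpts[-1]`."""
    return [kpts[-1] for kpts in keypoints]


# Assumed layout: one row per keypoint, e.g. (x, y, visibility).
well_formed = np.array([[10.0, 20.0, 2.0], [30.0, 40.0, 0.0]])
flags = visibility_flags(well_formed)

# Hypothetical malformed input: a 1-D array, so iterating over it
# yields NumPy scalars instead of rows, and scalars cannot be indexed.
collapsed = np.array([10.0, 20.0, 2.0])
try:
    visibility_flags(collapsed)
    error_message = None
except IndexError as err:
    error_message = str(err)

print(flags)
print(error_message)
```

If this is what happens in `_update_epoch_predictions`, the keypoint array reaching it during evaluation would have fewer dimensions than the code expects, but I have not confirmed that.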


### Anything else?

_No response_

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/DeepLabCut/DeepLabCut/blob/master/CODE_OF_CONDUCT.md)
