# Awesome-Video-Hallucination


A curated and structured collection of papers on hallucination in Video Large Language Models (Vid-LLMs), covering 19 evaluation benchmarks and 23 mitigation methods. Automatically updated monthly via arXiv search.

📄 Based on the survey: Distorted or Fabricated? A Survey on Hallucination in Video LLMs

Framework overview

## Overview at a Glance

| 📊 19 | 🛠️ 23 | 🏛️ 15+ | 📅 2023–2026 | 🤖 Auto |
|---|---|---|---|---|
| Evaluation Benchmarks | Mitigation Methods | Top-tier Venues | Coverage Period | Monthly Paper Update |

## 🔔 News

- **[2026/03]** 🤖 Automated monthly arXiv paper update is now live! A GitHub Action runs on the 1st of each month, finds new video-hallucination papers, and commits them directly to the main branch. Newly discovered papers that have not yet been classified are listed in `new_papers.md`.
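The update workflow itself lives in this repository's CI configuration; as a minimal sketch of the search step it automates, a script could query the public arXiv API (`http://export.arxiv.org/api/query`). The search terms and result cap below are illustrative assumptions, not the repository's actual configuration.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(terms, max_results=50):
    """Build an arXiv API query URL matching all given terms.

    The API's search_query field accepts fielded terms joined with AND;
    the all: prefix searches titles, abstracts, and other metadata.
    """
    search = " AND ".join(f'all:"{t}"' for t in terms)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

# The URL a monthly job might fetch, then parse the Atom XML response
# for titles and abstracts of newly submitted papers.
url = build_arxiv_query(["video", "hallucination", "large language model"])
```

A scheduled job would fetch this URL, diff the results against papers already in the list, and append the unclassified remainder to `new_papers.md`.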

## 📖 Table of Contents

- 📋 Taxonomy of Video Hallucinations
- 📊 Evaluation Benchmarks — 19 benchmarks
  - 🔵 Spatiotemporal Dynamics
  - 🟢 Referential Inconsistency
  - 🟠 Context-Driven Fabrication
  - 🟣 Audio-Visual Conflict
- 🛠️ Mitigation Strategies — 23 methods
  - 🔵 Spatiotemporal Dynamics
  - 🟢 Referential Inconsistency
  - 🟠 Context-Driven Fabrication
  - 🟣 Audio-Visual Conflict
- 🤝 Contributing


## 📋 Taxonomy of Video Hallucinations

We propose a mechanism-driven taxonomy that classifies hallucinations in Video Large Language Models (Vid-LLMs) into two primary types:

### 🔷 Dynamic Distortion

The model correctly detects entities but misrepresents their temporal progression or referential consistency.

- 🔵 **Spatiotemporal Dynamics** — errors in event ordering, duration estimation, or frequency counting.
- 🟢 **Referential Inconsistency** — characters or scenes are conflated across temporal boundaries.

### 🔶 Content Fabrication

The model produces outputs that lack grounding in visual evidence and are instead influenced by learned priors.

- 🟠 **Context-Driven Fabrication** — common object–action or scene–event associations lead to unsupported predictions.
- 🟣 **Audio-Visual Conflict** — dominant auditory cues override visual input.


*Mechanism-driven taxonomy of Vid-LLM hallucinations. Solid fill = benchmarks; striped fill = mitigation methods.*


## 📊 Evaluation Benchmarks

> [!NOTE]
> Benchmarks are organized by our mechanism-driven taxonomy. Each entry includes venue, date, and links to code/project pages where available.

**Legend:** page = project page · code = GitHub repository · - = not available

### 🔵 Spatiotemporal Dynamics Benchmarks (Dynamic Distortion)

**Event Misordering (4 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | VidHalluc | CVPR 2025 | 12/2024 | page, code |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | HAVEN | arXiv 2025 | 03/2025 | code |
| MHBench: Demystifying Motion Hallucination in VideoLLMs | MHBench | AAAI 2025 | 01/2025 | code |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | ARGUS | ICCV 2025 | 06/2025 | code |

**Duration Distortion (2 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | VideoHallucer | arXiv 2024 | 06/2024 | code |
| Online Video Understanding: OVBench and VideoChat-Online | OVBench | CVPR 2025 | 01/2025 | page, code |

**Frequency Confusion (2 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | VidHal | arXiv 2024 | 11/2024 | code |
| Vript: A Video Is Worth Thousands of Words | Vript | NeurIPS 2024 | 06/2024 | code |

### 🟢 Referential Inconsistency Benchmarks (Dynamic Distortion)

**Character Conflation (2 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding | EGOILLUSION | EMNLP 2025 | 11/2025 | page |
| MESH: Measuring Hallucinations in Large Video Models | MESH | ACM MM 2025 | 09/2025 | code |

**Scene Conflation (1 paper)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc | arXiv 2025 | 08/2025 | code |

### 🟠 Context-Driven Fabrication Benchmarks (Content Fabrication)

**Object-Action Hallucination (2 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu | NeurIPS 2025 | 05/2025 | code |
| Models See Hallucinations: Evaluating the Factuality in Video Captioning | FactVC | EMNLP 2023 | 03/2023 | code |

**Scene-Event Hallucination (3 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | EventHallusion | arXiv 2024 | 09/2024 | code |
| NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models | NOAH | arXiv 2025 | 11/2025 | page, code |
| RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives | RoadSocial | CVPR 2025 | 02/2025 | page, code |

### 🟣 Audio-Visual Conflict Benchmarks (Content Fabrication)

**Action Attribution (2 papers)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHBench | ICLR 2025 | 10/2024 | code |
| The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | CMM | arXiv 2024 | 10/2024 | page, code |

**Emotion Inference (1 paper)**

| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | EmotionHallucer | arXiv 2025 | 05/2025 | code |

## 🛠️ Mitigation Strategies

> [!NOTE]
> Methods are classified by the type of hallucination they target. The Training-Free column indicates whether a method requires additional training (✘) or not (✔︎).

### 🔵 Spatiotemporal Dynamics Mitigation (Dynamic Distortion)

**Event Misordering (3 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| SEASON: Mitigating Temporal Hallucination in Video LLMs via Self-Diagnostic Contrastive Decoding | SEASON | arXiv 2025 | 12/2025 | ✔︎ | - |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Video-thinking (TDPO) | arXiv 2025 | 03/2025 | | code |
| SmartSight: Mitigating Hallucination in Video-LLMs via Temporal Attention Collapse | SmartSight | AAAI 2026 | 12/2025 | ✔︎ | - |

**Duration Distortion (3 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Temporal Insight Enhancement: Mitigating Temporal Hallucination in Video Understanding by MLLMs | Temporal Insight | ICPR 2024 | 01/2024 | ✔︎ | - |
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | DINO-HEAL | CVPR 2025 | 12/2024 | ✔︎ | page, code |
| Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering | TAAE | arXiv 2025 | 05/2025 | | - |

**Frequency Confusion (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | VTG-LLM | AAAI 2025 | 05/2024 | | code |
| Vript: A Video Is Worth Thousands of Words | Vriptor | NeurIPS 2024 | 06/2024 | | code |

### 🟢 Referential Inconsistency Mitigation (Dynamic Distortion)

**Character Conflation (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | Vista-LLaMA | CVPR 2024 | 12/2023 | | page, code |
| Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding | VideoPLR | arXiv 2025 | 11/2025 | | code |

**Scene Conflation (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc-DPO | arXiv 2025 | 08/2025 | | code |
| Online Video Understanding: OVBench and VideoChat-Online | VideoChat-Online | CVPR 2025 | 01/2025 | | page, code |

### 🟠 Context-Driven Fabrication Mitigation (Content Fabrication)

**Object-Action Hallucination (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment | SANTA | WACV 2026 | 12/2025 | | page |
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | TCD | arXiv 2024 | 09/2024 | ✔︎ | code |

**Scene-Event Hallucination (3 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | MASH-VLM | CVPR 2025 | 03/2025 | | - |
| PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning | PaMi-VDPO | arXiv 2025 | 04/2025 | | - |
| Hallucination Reduction in Video-Language Models via Hierarchical Multimodal Consistency | MMA | IJCAI 2025 | 08/2025 | | - |

**Both Object-Action & Scene-Event (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | VistaDPO | ICML 2025 | 04/2025 | | code |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu-GRPO | NeurIPS 2025 | 05/2025 | | code |

### 🟣 Audio-Visual Conflict Mitigation (Content Fabrication)

**Action Attribution (2 papers)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHModel-Align-FT | ICLR 2025 | 10/2024 | | code |
| AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding | AVCD | NeurIPS 2025 | 05/2025 | ✔︎ | code |

**Emotion Inference (1 paper)**

| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | PEP-MEK | arXiv 2025 | 05/2025 | ✔︎ | code |

## 🤝 Contributing

> [!TIP]
> We welcome contributions from the community! Here's how you can help:
>
> - 🔀 **Pull Request** — add new papers, update code links, or correct errors
> - 🐛 **Open an Issue** — report mistakes, suggest missing papers, or request features

### 📝 PR Format Guide

Please follow this table structure when adding new entries:

| [**Paper Title**](paper_link) | Method/Benchmark Name | Venue | MM/YYYY | [code](code_link) |
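For bulk additions, rows in this format can also be generated with a small helper. The `format_entry` function below is a hypothetical convenience for contributors, not a script shipped with this repository; the example paper, link, and benchmark name are placeholders.

```python
def format_entry(title, link, name, venue, date, code_link=None):
    """Render one paper entry as a Markdown table row in the PR format above.

    code_link may be omitted, in which case the Code cell becomes "-"
    (the repository's marker for "not available").
    """
    code = f"[code]({code_link})" if code_link else "-"
    return f"| [**{title}**]({link}) | {name} | {venue} | {date} | {code} |"

# Placeholder values for illustration only.
row = format_entry(
    "Example Paper", "https://example.com/paper",
    "ExampleBench", "arXiv 2025", "01/2025",
)
```

Pasting the resulting row under the matching taxonomy subsection keeps new entries consistent with the existing tables.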

If you find this repository helpful, please consider giving it a ⭐!

Maintained by the SmileLab team at Northeastern University.
