A curated and structured collection of papers on hallucination in Video Large Language Models (Vid-LLMs), covering 19 evaluation benchmarks and 23 mitigation methods. Automatically updated monthly via arXiv search.
📄 Based on the survey: Distorted or Fabricated? A Survey on Hallucination in Video LLMs
- [2026/03] 🤖 Automated monthly arXiv paper update is now live! A GitHub Action runs on the 1st of each month to find new video hallucination papers and commits them directly to the main branch. Newly discovered papers that have not yet been classified can be found in `new_papers.md`.
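The Action itself is not shown in this README. As a rough illustration of how such a monthly job could query arXiv (the function name, search terms, and parameters below are assumptions for the sketch, not the repository's actual pipeline), the public arXiv API accepts a `search_query` over all fields, sorted by submission date:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(terms, max_results=50):
    """Build an arXiv API URL that requires every term to appear,
    returning the newest submissions first."""
    search = " AND ".join(f'all:"{t}"' for t in terms)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

# Example: papers mentioning both "video" and "hallucination"
url = build_arxiv_query(["video", "hallucination"])
print(url)
```

Fetching this URL returns an Atom feed whose entries can then be diffed against the papers already listed here before being appended to `new_papers.md`.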
📖 Table of Contents
📋 Taxonomy of Video Hallucinations
📊 Evaluation Benchmarks — 19 benchmarks
🔵 Spatiotemporal Dynamics
🟢 Referential Inconsistency
🟠 Context-Driven Fabrication
🟣 Audio-Visual Conflict
🛠️ Mitigation Strategies — 23 methods
🔵 Spatiotemporal Dynamics
🟢 Referential Inconsistency
🟠 Context-Driven Fabrication
🟣 Audio-Visual Conflict
🤝 Contributing
We propose a mechanism-driven taxonomy that classifies hallucinations in Video Large Language Models (Vid-LLMs) into two primary types:
- 🔷 **Dynamic Distortion**: the model correctly detects entities but misrepresents their temporal progression or referential consistency.
- 🔶 **Content Fabrication**: the model produces outputs that lack grounding in visual evidence and are instead influenced by learned priors.

*Mechanism-driven taxonomy of Vid-LLM hallucinations. Solid fill = benchmarks; striped fill = mitigation methods.*
Note
Benchmarks are organized by our mechanism-driven taxonomy. Each entry includes venue, date, and links to code/project pages where available.
Legend: entries in the Code column link to a project page or a GitHub repository; `-` = not available.
Event Misordering (4 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | VidHalluc | CVPR 2025 | 12/2024 | |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | HAVEN | arXiv 2025 | 03/2025 | |
| MHBench: Demystifying Motion Hallucination in VideoLLMs | MHBench | AAAI 2025 | 01/2025 | |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | ARGUS | ICCV 2025 | 06/2025 | |
Duration Distortion (2 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | VideoHallucer | arXiv 2024 | 06/2024 | |
| Online Video Understanding: OVBench and VideoChat-Online | OVBench | CVPR 2025 | 01/2025 | |
Frequency Confusion (2 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | VidHal | arXiv 2024 | 11/2024 | |
| Vript: A Video Is Worth Thousands of Words | Vript | NeurIPS 2024 | 06/2024 | |
Character Conflation (2 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding | EGOILLUSION | EMNLP 2025 | 11/2025 | |
| MESH: Measuring Hallucinations in Large Video Models | MESH | ACM MM 2025 | 09/2025 | |
Scene Conflation (1 paper)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc | arXiv 2025 | 08/2025 | |
Object-Action Hallucination (2 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu | NeurIPS 2025 | 05/2025 | |
| Models See Hallucinations: Evaluating the Factuality in Video Captioning | FactVC | EMNLP 2023 | 03/2023 | |
Scene-Event Hallucination (3 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | EventHallusion | arXiv 2024 | 09/2024 | |
| NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models | NOAH | arXiv 2025 | 11/2025 | |
| RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives | RoadSocial | CVPR 2025 | 02/2025 | |
Action Attribution (2 papers)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHBench | ICLR 2025 | 10/2024 | |
| The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | CMM | arXiv 2024 | 10/2024 | |
Emotion Inference (1 paper)
| Title | Benchmark | Venue | Date | Code |
|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | EmotionHallucer | arXiv 2025 | 05/2025 | |
Note
Methods are classified by the type of hallucination they target. The Training-Free column indicates whether the method is training-free (✔︎) or requires additional training (✘).
Event Misordering (3 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| SEASON: Mitigating Temporal Hallucination in Video LLMs via Self-Diagnostic Contrastive Decoding | SEASON | arXiv 2025 | 12/2025 | ✔︎ | - |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Video-thinking (TDPO) | arXiv 2025 | 03/2025 | ✘ | |
| SmartSight: Mitigating Hallucination in Video-LLMs via Temporal Attention Collapse | SmartSight | AAAI 2026 | 12/2025 | ✔︎ | - |
Duration Distortion (3 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Temporal Insight Enhancement: Mitigating Temporal Hallucination in Video Understanding by MLLMs | Temporal Insight | ICPR 2024 | 01/2024 | ✔︎ | - |
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | DINO-HEAL | CVPR 2025 | 12/2024 | ✔︎ | |
| Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering | TAAE | arXiv 2025 | 05/2025 | ✘ | - |
Frequency Confusion (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | VTG-LLM | AAAI 2025 | 05/2024 | ✘ | |
| Vript: A Video Is Worth Thousands of Words | Vriptor | NeurIPS 2024 | 06/2024 | ✘ | |
Character Conflation (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | Vista-LLaMA | CVPR 2024 | 12/2023 | ✘ | |
| Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding | VideoPLR | arXiv 2025 | 11/2025 | ✘ | |
Scene Conflation (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc-DPO | arXiv 2025 | 08/2025 | ✘ | |
| Online Video Understanding: OVBench and VideoChat-Online | VideoChat-Online | CVPR 2025 | 01/2025 | ✘ | |
Object-Action Hallucination (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment | SANTA | WACV 2026 | 12/2025 | ✘ | |
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | TCD | arXiv 2024 | 09/2024 | ✔︎ | |
Scene-Event Hallucination (3 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | MASH-VLM | CVPR 2025 | 03/2025 | ✘ | - |
| PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning | PaMi-VDPO | arXiv 2025 | 04/2025 | ✘ | - |
| Hallucination Reduction in Video-Language Models via Hierarchical Multimodal Consistency | MMA | IJCAI 2025 | 08/2025 | ✘ | - |
Both Object-Action & Scene-Event (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | VistaDPO | ICML 2025 | 04/2025 | ✘ | |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu-GRPO | NeurIPS 2025 | 05/2025 | ✘ | |
Action Attribution (2 papers)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHModel-Align-FT | ICLR 2025 | 10/2024 | ✘ | |
| AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding | AVCD | NeurIPS 2025 | 05/2025 | ✔︎ | |
Emotion Inference (1 paper)
| Title | Method | Venue | Date | Training-Free | Code |
|---|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | PEP-MEK | arXiv 2025 | 05/2025 | ✔︎ | |
Tip
We welcome contributions from the community! Here's how you can help:
🔀 Pull Request — Add new papers, update code links, or correct errors
🐛 Open an Issue — Report mistakes, suggest missing papers, or request features
📝 PR Format Guide
Please follow this table structure when adding new entries (for mitigation methods, also include the Training-Free column with ✔︎/✘):
| [**Paper Title**](paper_link) | Method/Benchmark Name | Venue | MM/YYYY | [code](code_link) |
If you find this repository helpful, please consider giving it a ⭐
Maintained by the SmileLab team at Northeastern University.
