DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang✉

ReLER, CCAI, Zhejiang University

✉ Corresponding Author

arXiv 2024

Overview. Given a video and a question or task, DoraemonGPT first extracts a Task-related Symbolic Memory. Two types of memory are available for selection: space-dominant memory, organized by instances, and time-dominant memory, organized by time frames/clips. The memory is queried by sub-task tools, which are driven by LLMs with different prompts and generate symbolic language (i.e., SQL statements) to perform different kinds of reasoning. Tools for querying external knowledge and general utility tools are also supported. For planning, DoraemonGPT employs an MCTS Planner that decomposes the question into an action sequence by exploring N feasible solutions, which can then be summarized into an informative answer.
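Since the official code is not yet released, the following is only a minimal sketch of how a symbolic memory could be stored and queried with SQL. It assumes SQLite as the backend; the table names (`space_memory`, `time_memory`), their columns, the sample rows, and the helper `run_sub_task_sql` are all hypothetical and are not taken from the DoraemonGPT implementation. In the real system the SQL would be generated by an LLM-driven sub-task tool; here it is written by hand.

```python
import sqlite3


def build_memory() -> sqlite3.Connection:
    """Create a toy symbolic memory (hypothetical schema, illustrative rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Space-dominant memory: one row per instance observation.
        CREATE TABLE space_memory (
            instance_id INTEGER,
            category    TEXT,
            frame       INTEGER,
            action      TEXT
        );
        -- Time-dominant memory: one row per clip.
        CREATE TABLE time_memory (
            clip_id INTEGER,
            start_s REAL,
            end_s   REAL,
            caption TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO space_memory VALUES (?, ?, ?, ?)",
        [
            (1, "person", 0, "walking"),
            (1, "person", 30, "sitting"),
            (2, "dog", 30, "running"),
        ],
    )
    conn.executemany(
        "INSERT INTO time_memory VALUES (?, ?, ?, ?)",
        [
            (0, 0.0, 2.0, "a person walks into the room"),
            (1, 2.0, 4.0, "the person sits while a dog runs by"),
        ],
    )
    conn.commit()
    return conn


def run_sub_task_sql(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Stand-in for a sub-task tool: run one SQL query against the memory."""
    return conn.execute(sql, params).fetchall()


memory = build_memory()

# "What does the person do over time?" -> query the space-dominant memory.
actions = run_sub_task_sql(
    memory,
    "SELECT frame, action FROM space_memory WHERE category = ? ORDER BY frame",
    ("person",),
)

# "In which clip does a dog appear?" -> query the time-dominant memory.
clips = run_sub_task_sql(
    memory,
    "SELECT clip_id FROM time_memory WHERE caption LIKE ?",
    ("%dog%",),
)

print(actions)
print(clips)
```

Keeping the two memory types in separate tables mirrors the space/time split described above and lets each sub-task tool address only the table relevant to its question.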

The official release of DoraemonGPT will be accessible soon!
