Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models


JiajunHong1/DoraemonGPT

 
 


DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

ReLER, CCAI, Zhejiang University
Corresponding Author
Overview. Given a video and a question or task, DoraemonGPT first extracts a Task-related Symbolic Memory. Two types of memory are available for selection: space-dominant memory, organized around object instances, and time-dominant memory, organized around time frames or clips. The memory is queried by sub-task tools, each driven by an LLM with its own prompt, which generate symbolic language (i.e., SQL statements) to perform different kinds of reasoning. Tools for querying external knowledge and general utility tools are also supported. For planning, DoraemonGPT employs an MCTS (Monte Carlo Tree Search) Planner that decomposes the question into a sequence of actions by exploring N feasible solutions, which can then be summarized into a single informative answer.
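To make the idea of a SQL-queryable symbolic memory more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from the DoraemonGPT code, which has not been released yet. The table names (`instances`, `frames`), their columns, the sample rows, and the helpers `build_memory` and `run_query` are all hypothetical and chosen only for illustration. The SQL is written by hand here, whereas the overview describes it as being generated by LLM-driven sub-task tools.

```python
import sqlite3


def build_memory() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical memory tables."""
    conn = sqlite3.connect(":memory:")
    # Space-dominant memory: one row per object instance (assumed schema).
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, category TEXT, "
        "first_frame INTEGER, last_frame INTEGER)"
    )
    # Time-dominant memory: one row per sampled frame or clip (assumed schema).
    conn.execute(
        "CREATE TABLE frames (frame_id INTEGER PRIMARY KEY, caption TEXT)"
    )
    # Made-up sample data standing in for what perception models would extract.
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?, ?)",
        [(1, "person", 0, 120), (2, "dog", 30, 90), (3, "person", 60, 120)],
    )
    conn.executemany(
        "INSERT INTO frames VALUES (?, ?)",
        [(0, "a person enters the room"), (60, "a dog runs past two people")],
    )
    conn.commit()
    return conn


def run_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Execute one SQL statement against the memory and return all rows."""
    return conn.execute(sql, params).fetchall()


conn = build_memory()

# A spatial question ("how many people appear?") maps to the instance table.
people_count = run_query(
    conn, "SELECT COUNT(*) FROM instances WHERE category = ?", ("person",)
)[0][0]

# A temporal question ("what happens from frame 30 on?") maps to the frame table.
captions = run_query(
    conn,
    "SELECT caption FROM frames WHERE frame_id >= ? ORDER BY frame_id",
    (30,),
)

print(people_count)
print(captions)
```

Keeping the two memory types in separate tables mirrors the space-dominant versus time-dominant split described above: a tool can pick whichever table matches the question before issuing its query.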

The official release of DoraemonGPT will be accessible soon!
