- 🔭 I’m currently working on multi-modal transformers and multi-task learning
- 🌱 I’m currently learning to play Table Tennis 🏓
- 📫 How to reach me: [email protected]
I am a final-year Ph.D. student in the Computer Vision Department at MBZUAI, working under the supervision of Dr. Salman Khan and Prof. Fahad Khan.
- 📍 Abu Dhabi, UAE / San Francisco, USA
- 🌐 Website: https://www.mmaaz60.com
- 💼 LinkedIn: in/mmaaz60
- 🐦 Twitter/X: @mmaaz60
Pinned Repositories
- facebookresearch/perception_models: State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
- mbzuai-oryx/Video-ChatGPT: [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
- mbzuai-oryx/groundingLMM: [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
- mbzuai-oryx/VideoGPT-plus: Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
- mbzuai-oryx/LLaVA-pp: 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)