jeho-lee/Awesome-On-Device-AI-Systems
# Awesome On-Device AI Systems

A curated list of efficient on-device AI systems, including practical inference engines, benchmarks, and state-of-the-art research papers for mobile and edge devices.

This repository bridges the gap between Systems Research (academic papers) and Practical Deployment (engineering frameworks), focusing on optimizing ML models (e.g., LLMs/VLMs, ViTs) on resource-constrained hardware.

## 📂 Table of Contents

- 🚀 Inference Engines
- 📝 Research Papers
- Profilers

## 🚀 Inference Engines

Frameworks and runtimes designed for deploying models on edge devices.

### General ML Workloads

- LiteRT (formerly TensorFlow Lite) - Google's framework for on-device inference.
- ExecuTorch - PyTorch’s end-to-end solution for enabling on-device AI.
- ONNX Runtime - Cross-platform inference engine for ONNX models.
- MNN - Lightweight deep learning framework by Alibaba.
- NCNN - High-performance neural network inference framework by Tencent.

### LLM & GenAI Specialized

- llama.cpp - LLM inference in C/C++ with minimal dependencies.
- MLC LLM - Universal solution for deploying LLMs on any hardware (based on TVM).
- mllm - A fast and lightweight LLM inference engine for mobile and edge devices.
- OmniInfer - High-performance, on-device VLM inference with hybrid NPU acceleration.
- RunAnywhere - Open-source SDK for running LLMs and multimodal models on-device across iOS, Android, and cross-platform apps.
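
All of these engines optimize the same autoregressive decode loop, and one of the biggest on-device wins is reusing a key/value cache across steps instead of re-running attention over the whole prefix. A toy, pure-Python sketch of that idea (the "model" below is a stand-in with made-up embeddings, not any engine's real API):

```python
import math

VOCAB, DIM = 8, 4

def embed(tok):
    # Deterministic toy "embedding" so the sketch needs no real weights.
    return [math.sin(tok * (i + 1)) for i in range(DIM)]

def attend(q, keys, vals):
    # Scaled dot-product attention over the cached keys/values.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
    m = max(scores)
    wts = [math.exp(s - m) for s in scores]
    z = sum(wts)
    return [sum(w * v[i] for w, v in zip(wts, vals)) / z for i in range(DIM)]

def greedy_decode(prompt, steps):
    keys, vals = [], []                  # the KV cache: one entry per token
    for tok in prompt:                   # prefill: cache the prompt once
        keys.append(embed(tok)); vals.append(embed(tok))
    out = []
    for _ in range(steps):
        ctx = attend(embed(tok), keys, vals)
        # Greedy pick: vocab entry whose embedding best matches the context.
        tok = max(range(VOCAB),
                  key=lambda v: sum(a * b for a, b in zip(ctx, embed(v))))
        out.append(tok)
        keys.append(embed(tok)); vals.append(embed(tok))  # append, never recompute
    return out

completion = greedy_decode([1, 2, 3], steps=4)
```

Real engines layer quantized caches, batching, and paged memory on top, but the append-only KV cache is the core trick that keeps per-token cost flat.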

### Vendor-Specific SDKs

- Qualcomm QNN - Qualcomm AI Stack for Snapdragon NPUs/DSPs.
- Apple Core ML - Framework to integrate ML models into iOS/macOS apps.
- FluidAudio - Local audio AI SDK for Apple platforms with ASR, speaker diarization, VAD, and TTS optimized for Apple Neural Engine.
- NVIDIA TensorRT - SDK for high-performance deep learning inference on NVIDIA GPUs (including Jetson).
- Intel OpenVINO - Toolkit for optimizing and deploying AI inference on Intel hardware (CPU/GPU/NPU).
- MediaTek NeuroPilot - AI ecosystem and SDK for MediaTek NPUs.

## 📝 Research Papers

> **Note:** Some of these works target inference acceleration on cloud/server infrastructure, which has far more computational resources; they are included here when their techniques could plausibly generalize to on-device inference.

### Attention Acceleration

### LLM Inference on Mobile SoCs

### Compiler-based ML Optimization

### Hardware-aware Quantization
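
To make the topic concrete, here is a toy, framework-agnostic sketch of symmetric per-tensor int8 quantization, the baseline scheme that hardware-aware methods refine per layer (the values and helper names are illustrative, not from any listed paper):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: one scale maps floats to int8."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)   # round-trip error is at most scale / 2 per value
```

Hardware-aware variants pick the scale, bit width, and per-channel grouping per layer to match what the target NPU's integer datapath executes fastest, rather than applying one uniform scheme.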

### Inference Acceleration using Heterogeneous Processors (e.g., CPU, GPU, NPU)

### Adaptive Inference for Optimized Resource Utilization
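
A common instance of adaptive inference is confidence-based early exit: cheap stages run first and the model stops as soon as a prediction clears a threshold. A minimal sketch with stand-in stage functions (names and numbers are illustrative, not from any listed paper):

```python
def early_exit(x, stages, threshold=0.9):
    """stages: callables returning (label, confidence), cheapest first."""
    for depth, stage in enumerate(stages, start=1):
        label, conf = stage(x)
        if conf >= threshold:
            break                     # exit early: later stages never run
    return label, depth

# Toy stages whose confidence grows with depth.
stages = [
    lambda x: ("cat", 0.60),
    lambda x: ("cat", 0.85),
    lambda x: ("cat", 0.97),
]
label, depth = early_exit(None, stages)   # exits at depth 3 with threshold 0.9
```

The threshold is the runtime knob: lowering it trades accuracy headroom for latency and energy, which is exactly the resource/quality trade-off these systems tune on-device.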

### On-device Training and Model Adaptation
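
At its core, on-device adaptation is a handful of cheap gradient steps on local data. A toy sketch fitting a scalar linear model y = w·x with plain SGD (illustrative only; real systems typically freeze most weights and update only small adapters or biases to fit memory and energy budgets):

```python
def sgd_adapt(samples, w=0.0, lr=0.1, epochs=50):
    """A few epochs of plain SGD on squared error for the model y = w * x."""
    for _ in range(epochs):
        for x, y in samples:
            grad = 2.0 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# Local data generated by y = 2x; adaptation should recover w close to 2.
w = sgd_adapt([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```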

## Profilers
