AirLens-Vision is a high-speed computer vision application that identifies and labels objects in real time. Built with the MediaPipe Tasks API and OpenCV, it uses the EfficientDet-Lite0 model to run object detection directly on a live camera feed.
- Real-Time Inference: Optimized for live video streams with minimal latency.
- 80 Categories: Detects a wide range of common objects (people, vehicles, laptops, bottles, etc.) drawn from the 80 object classes of the COCO dataset.
- Dynamic UI: Overlays precise bounding boxes, class labels, and confidence scores.
- Lightweight Architecture: Designed to run efficiently on standard hardware without requiring a dedicated high-end GPU.
git clone https://github.com/NKumar-B/ObjectSense-AI.git
cd ObjectSense-AI
python -m venv .venv
.\.venv\Scripts\activate
Ensure you have Python 3.9+ installed, then run:
pip install -r requirements.txt
You must download the EfficientDet-Lite0 (float32) model and place it in the project root:
- Model Name: efficientdet_lite0.tflite
- Source: Google MediaPipe Model Garden.
- Run the application:
python ObjectDetect.py
- Interaction:
- Point your webcam at objects to see bounding boxes and labels in real time.
- The Confidence Score (0.0 to 1.0) indicates how certain the model is about each detection.
- Exit: Press 'q' on your keyboard to close the window.
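One way the confidence score could be put to use is to hide low-certainty detections. The sketch below is illustrative only: the `Detection` tuple, the `filter_by_confidence` helper, and the 0.5 default threshold are assumptions made for this example, not names taken from ObjectDetect.py or from MediaPipe.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one detected object."""
    label: str
    score: float  # confidence in the range 0.0 to 1.0


def filter_by_confidence(detections: List[Detection],
                         threshold: float = 0.5) -> List[Detection]:
    """Keep only detections whose score is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [d for d in detections if d.score >= threshold]


# Made-up sample values, used only to exercise the helper.
sample = [
    Detection("person", 0.91),
    Detection("bottle", 0.42),
    Detection("laptop", 0.67),
]
kept = filter_by_confidence(sample, threshold=0.5)
print([d.label for d in kept])
```

Raising the threshold shows fewer, more certain labels; lowering it shows more labels at the cost of more false positives.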
- Preprocessing: The input frame is mirrored and converted from BGR (OpenCV standard) to RGB (MediaPipe standard).
- Inference: The frame is passed to the ObjectDetector task, which performs a single pass to identify multiple objects simultaneously.
- Visualization: The result contains normalized coordinates which are mathematically mapped back to your screen's pixel dimensions to draw the bounding boxes accurately.
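The visualization step described above can be sketched as follows. This is a minimal illustration, assuming each box arrives as normalized `(x_min, y_min, width, height)` values in the range 0.0 to 1.0; the `to_pixel_box` helper and that tuple layout are hypothetical and are not taken from ObjectDetect.py or from the MediaPipe API.

```python
from typing import Tuple


def to_pixel_box(norm_box: Tuple[float, float, float, float],
                 frame_width: int,
                 frame_height: int) -> Tuple[int, int, int, int]:
    """Map a normalized (x_min, y_min, width, height) box to pixel corners.

    Returns (left, top, right, bottom), clamped so the box stays inside
    the frame.
    """
    x_min, y_min, width, height = norm_box

    # Scale each normalized value by the matching frame dimension.
    left = int(round(x_min * frame_width))
    top = int(round(y_min * frame_height))
    right = int(round((x_min + width) * frame_width))
    bottom = int(round((y_min + height) * frame_height))

    # Clamp to valid pixel indices: 0 .. dimension - 1.
    left = max(0, min(left, frame_width - 1))
    right = max(0, min(right, frame_width - 1))
    top = max(0, min(top, frame_height - 1))
    bottom = max(0, min(bottom, frame_height - 1))
    return left, top, right, bottom


# Example with an assumed 640x480 frame.
box = to_pixel_box((0.25, 0.5, 0.5, 0.25), 640, 480)
print(box)
```

The resulting corner pair is the form that a rectangle-drawing call such as OpenCV's `cv2.rectangle` expects for its two points.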
Distributed under the MIT License. See LICENSE for more information.
- Google MediaPipe: For providing the robust Object Detector Tasks API and pre-trained .tflite models.
- OpenCV (Open Source Computer Vision Library): For the powerful real-time image processing and visualization tools.
- The COCO Dataset Team: For their foundational work in standardizing computer vision training data.
- NumPy: For the efficient numerical processing required for coordinate mapping.