Introduction
This article explains how to use the Python app to parse an MCAP file. The app lets the user view each H.264 frame with OpenCV and overlay the recorded bounding boxes on the frame. It can also be given an ONNX model, which runs inference on each frame and draws an additional box in a different color so you can verify your results.
Requirements
Before proceeding, ensure you have the following prerequisites:
- Windows or Linux (WSL will not work).
- An MCAP file.
- Python 3.8 or above.
For Ubuntu 20.04/22.04, the Python Tkinter package must be installed:
sudo apt install python3-tk
If this package is not installed, the images will not appear on the screen. You can confirm that Tkinter is installed by importing the Matplotlib library and checking which backend it uses:
$ python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib
>>> matplotlib.get_backend()
'TkAgg'
>>> exit()
If the output is 'TkAgg', the Tk backend is installed. If the output is 'Agg', no graphical backend is installed and the images will not be displayed.
Understanding the Code
The provided code consists of several components:
- Imports: Import necessary libraries and modules for video processing, object detection, and logging.
- Initialization: Initialize logging configuration and create a logger object to handle messages.
- Reading Video Frames: Define functions to read video frames from an MCAP file and decode them using an AV (PyAV) container. Because the video is H.264, where only key (I) frames are self-contained and the remaining frames are encoded as deltas against them, the stream has to be decoded in order (see the sketch after this list).
- Object Detection: Implement functions to run an object detection model on the video frames and draw bounding boxes around detected objects.
- Displaying Frames: Define a function to display the processed frames with bounding boxes using OpenCV.
- Setting Image Size: Implement a function to set the size of the image based on camera information.
- Visualization: Combine the above components to visualize the video stream, detect objects, and display bounding boxes.
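To make the frame-decoding step concrete, here is a minimal sketch of decoding raw H.264 bytes with PyAV. The function name and the assumption that each message payload is a chunk of raw H.264 bytes are illustrative only, not the app's exact implementation:

import io
import av

def decode_h264_frames(h264_payload: bytes):
    """Decode whatever frames are contained in a chunk of raw H.264 bytes.

    Minimal sketch: the real viewer keeps the buffer and container alive
    across messages so that delta frames can reference the key frames
    decoded before them.
    """
    raw_data = io.BytesIO(h264_payload)
    container = av.open(raw_data, format="h264", mode="r")
    for frame in container.decode(video=0):
        # Convert to a BGR array that OpenCV can draw on and display directly.
        yield frame.to_ndarray(format="bgr24")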
How It Works
The code starts by importing the necessary libraries and setting up logging to capture informational, warning, and error messages. It then initializes an in-memory input/output buffer for storing raw data and opens an AV container to parse the H.264 stream. Functions are defined to read video frames, run the object detection model, draw bounding boxes, and display the frames.
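The drawing and display steps can be sketched with OpenCV as shown below. The function name, box format, and window title are assumptions rather than the app's actual API:

import cv2
import numpy as np

def draw_and_show(frame: np.ndarray, boxes, color=(255, 0, 0), thickness=2, scale=1.0):
    """Draw (x1, y1, x2, y2) bounding boxes on a frame and display it.

    The box format and colors are assumptions: in the app, blue boxes come
    from the MCAP and red boxes come from the optional ONNX model.
    """
    for x1, y1, x2, y2 in boxes:
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), color, thickness)
    if scale != 1.0:
        frame = cv2.resize(frame, None, fx=scale, fy=scale)
    cv2.imshow("MCAP viewer", frame)
    cv2.waitKey(1)  # brief pause so OpenCV can refresh the window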
The visualization process begins by opening an MCAP file, reading camera information, and setting the image size accordingly. Frames are then iteratively processed: decoded, objects detected, bounding boxes drawn, and displayed to the user.
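Putting the pieces together, the loop below is a rough sketch of that flow using the mcap Python package. The topic name, the omission of camera-info handling, and the empty box list are placeholders, not the viewer's real logic:

from mcap.reader import make_reader

def visualize(mcap_path: str, image_topic: str = "/camera/h264"):
    """Sketch of the overall flow: read messages, decode frames, draw, display."""
    with open(mcap_path, "rb") as f:
        reader = make_reader(f)
        for schema, channel, message in reader.iter_messages(topics=[image_topic]):
            for image in decode_h264_frames(message.data):
                boxes = []  # in the real app, read from the MCAP or produced by the model
                draw_and_show(image, boxes)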
How to Use
To use the provided code:
- Clone the Git repo from here.
- Ensure all prerequisites are installed on your system.
- Run the script with appropriate command-line arguments:
python viewer.py <path_to_mcap_file>
Usage for the application:
python viewer.py -h
usage: viewer.py [-h] [-m [MODEL]] [-s SCALE] [-t THICKNESS] [-b DISPLAY_BBOX] [-c] mcap_file
Process MCAP to view images with bounding boxes.
positional arguments:
mcap_file MCAP that needs to be parsed
options:
-h, --help show this help message and exit
-m [MODEL], --model [MODEL]
Run the frame through a custom model to display bounding box. Specify the model name after --model. Default: False
-s SCALE, --scale SCALE
Resizing factor to view the final image 0.1-1.0. Default: 1.0
-t THICKNESS, --thickness THICKNESS
Choose the thickness of the bounding box. Default: 2
-b DISPLAY_BBOX, --display_bbox DISPLAY_BBOX
Choose to view the bounding box [yes, no]. Default: yes
-c, --custom Choose to view the bounding box. Default: False
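For example, to run a custom ONNX model against every frame and scale the display down to half size (the file names below are placeholders):

python viewer.py -m my_model.onnx -s 0.5 recording.mcap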
The image below is an example of the output produced by the app: the red boxes are produced by the model and the blue boxes are read from the MCAP: