This example is a specialization of the "Inference with user callbacks to process output tensors" example, adjusted to demonstrate CenterNet post-processing specifically.
In this example, a pipeline consisting of a camera feed (either v4l2src or vslsrc) forwards the video stream to the deepviewrt element, which runs a CenterNet model trained on playing cards. The deepviewrt element then forwards its output to an appsink with a registered callback, from which we read the model's output tensors and perform the CenterNet box decoding.
The video/x-raw caps are applied to the camera element. When using vslsrc the caps should not be set, but with v4l2src they can be used to adjust the camera settings, for example to request VGA or 720p resolution.
The application can be run with the following command line and will print the decoded box parameters for detected playing cards. The "-s 0.02" option lowers the detection threshold, which is required for detecting the cards, while "-c /dev/video3" selects the correct camera device on the i.MX8M Plus EVK.
./centernet_decode -m centernet_256x512_cards.rtm -s 0.02 -c /dev/video3
You should then see output like the following, assuming a playing card is in view. Note that it can take around 15 seconds to load the model onto the NPU before output starts appearing.
root@imx8mpevk:~# ./centernet_decode -m centernet_256x512_cards.rtm -s 0.02 -c /dev/video3
Using the following options:
Camera = /dev/video3
Model = centernet_256x512_cards.rtm
Engine = NPU
Capture size: W=640, H=480
Loading model on NPU...
[ 5392.780968] bypass csc
[ 5392.783402] input fmt YUV4
[ 5392.786124] output fmt YUYV
Output dim for output array 0 = 1
Output dim for output array 0 = 64
Output dim for output array 0 = 128
Output dim for output array 0 = 7
Output dim for output array 1 = 1
Output dim for output array 1 = 64
Output dim for output array 1 = 128
Output dim for output array 1 = 2
Output dim for output array 2 = 1
Output dim for output array 2 = 64
Output dim for output array 2 = 128
Output dim for output array 2 = 2
Centernet decode time = 12.764 ms
Threshold = 0.02
bbx: xmin=498.60, ymin=222.97, xmax=498.68, ymax=223.02, score 0.03, ID 3
bbx: xmin=491.57, ymin=254.93, xmax=492.06, ymax=255.49, score 0.02, ID 6
The sample application can use a vslsrc from VideoStream instead of a v4l2src by passing the --vsl parameter. For example, once the following VSL path is defined, centernet_decode can use it instead of the camera directly.
gst-launch-1.0 v4l2src device=/dev/video3 ! video/x-raw,width=640,height=480 ! vslsink path=/tmp/cam.3
./centernet_decode -m centernet_256x512_cards.rtm -s 0.02 -v /tmp/cam.3
To implement a custom decoder, modify the new_sample() function to call your decoder instead of the provided CenterNet one.
- DeepViewRT 2.5.4 or newer with CenterNet support.
- VAAL 1.0.5 or newer.
- VideoStream 1.0.1 or newer.
- Added debug printing of the processed frame timestamps (requires VAAL 1.0.6).