On i.MX8 platforms with GPU or NPU accelerators, DeepViewRT is able to use the deepview-rt-openvx engine accelerator plug-in to leverage these devices. When using these accelerators, DeepViewRT generates an OpenVX graph representation of the DeepViewRT Model (RTM), allowing for accelerated execution. One drawback is that the OpenVX driver takes a very long time to interpret these graphs: this is not the time it takes to generate the graph from the RTM, which is fast, but the time it takes to initially load the OpenVX graph.
Loading a model through DeepViewRT or various wrappers such as ModelRunner or the VAAL GStreamer plugins can take a minute or longer. This can cause timeout issues or simply concerns about unreasonably long boot times.
The OpenVX driver provides a graph caching mechanism that significantly speeds up future reload times for the graph. The cache stores a binary representation of the in-memory graph, not the model used to generate the graph, to the file system. When loading a model, graph generation proceeds as normal, but the time-consuming graph loading step first attempts to load the cached representation.
The driver allows many graphs to be stored. Both enabling the feature and choosing the cache location are controlled through a pair of environment variables, so a user can decide on a per-process basis whether to enable the graph cache and where to store it.
The VIV_VX_ENABLE_CACHE_GRAPH_BINARY environment variable should be set to 1 to enable the caching feature or to 0 to disable it. The feature is disabled by default; if the environment variable is not defined, it is treated as 0.
A second environment variable controls the location of the graph cache. If that variable is unset, the default is to store the graph in the process's current working directory.
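As a sketch of the per-process control described above, the enable variable can either be exported for a whole shell session or prefixed to a single command. The command `true` below is a placeholder for a real application:

```shell
# Enable the OpenVX graph cache for every process started from this shell:
export VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1

# Or scope the setting to a single invocation by prefixing the command;
# "true" stands in for the real application here:
VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 true
```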
Here we show loading a model using modelrunner in benchmark mode (-b 10) with the OpenVX plug-in and a CenterNet model, which has many layers (more layers take longer to load). We then run it again with caching enabled, showing that the time remains the same for the first run, but subsequent runs are much faster.
# time modelrunner -e deepview-rt-openvx.so -m model.rtm -b 10
Average model run time: 19.8771 ms (layer sum: 0.0000 ms)
25.36user 0.24system 0:26.30elapsed 97%CPU
NOTE: with DeepViewRT versions 2.5.6 and later, you must not include the "deepview-rt-" prefix or the ".so" suffix in the engine parameter. The above command would be written as:
# time modelrunner -e openvx -m model.rtm -b 10
Average model run time: 19.8771 ms (layer sum: 0.0000 ms)
25.36user 0.24system 0:26.30elapsed 97%CPU
Note that you will likely need to run this twice before the cache takes effect. The issue has been reported to VSI, and a fix is expected in a future driver update.
# VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 time modelrunner -e deepview-rt-openvx.so -m centernet_256x512_cards.rtm -b 10
Average model run time: 19.8823 ms (layer sum: 0.0000 ms)
25.32user 0.25system 0:25.87elapsed 98%CPU
Cache Enabled Reload
# VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 time modelrunner -e deepview-rt-openvx.so -m centernet_256x512_cards.rtm -b 10
Average model run time: 19.9010 ms (layer sum: 0.0000 ms)
1.79user 0.11system 0:02.16elapsed 87%CPU
The cache files are saved as .nb (network binary) files. We can see here that the cache file is about twice the size of the .rtm file.
root@imx8mpevk:~# ls -lh *.nb
-rw-r--r-- 1 root root 31M Jan 25 16:27 5edf5e26c73205bafd4815d17e969f15.nb
root@imx8mpevk:~# ls -lh centernet_256x512_cards.rtm
-rw-r--r-- 1 root root 14M Jan 14 17:07 centernet_256x512_cards.rtm
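Because cache files are named by a hash (as in the listing above) rather than after the model, old .nb files may remain on disk as models change. A minimal cleanup sketch, assuming the cache is stored in the current working directory (the default):

```shell
# Remove cached OpenVX graph binaries; the driver regenerates the cache
# (at the cost of one slow load) the next time a model is loaded.
cache_dir="${PWD}"        # adjust if the cache is stored elsewhere
rm -f "${cache_dir}"/*.nb
```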