On i.MX8 platforms with GPU or NPU accelerators, DeepViewRT is able to use the deepview-rt-openvx engine accelerator plug-in to leverage these devices. When using these accelerators, DeepViewRT generates an OpenVX graph representation of the DeepViewRT Model (RTM), allowing for accelerated execution. One drawback is that the OpenVX driver takes a very long time to interpret these graphs: this is not the time it takes to generate the graph from the RTM, which is fast, but the time it takes to initially load the OpenVX graph.
Loading a model through DeepViewRT or various wrappers such as ModelRunner or the VAAL GStreamer plugins can take a minute or longer. This can cause timeout issues or simply concerns about unreasonably long boot times.
The OpenVX driver provides a graph caching mechanism that significantly speeds up future reload times for the graph. The cache stores a binary representation of the in-memory graph, not the model used to generate the graph, to the file system. When loading a model, graph generation proceeds as normal, but the time-consuming graph loading step first attempts to load the cached representation.
The driver allows many graphs to be stored. Both enabling the feature and choosing the cache location are controlled through a pair of environment variables, so a user can decide on a per-process basis whether to enable the graph cache and where to store it.
The VIV_VX_ENABLE_CACHE_GRAPH_BINARY environment variable should be set to 1 to enable the caching feature or to 0 to disable it. The feature is disabled by default; if the environment variable is not defined, it is treated as 0.
A second environment variable controls the location of the graph cache. If that variable is unset, the default is to store the graph in the process's current working directory.
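As a sketch of the per-process control described above, the enable variable can either be exported for a whole shell session or prefixed to a single command. The command `true` below is a placeholder for a real application:

```shell
# Enable the OpenVX graph cache for every process started from this shell:
export VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1

# Or scope the setting to a single invocation by prefixing the command;
# "true" stands in for the real application here:
VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 true
```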
Here we show loading a model using modelrunner in benchmark mode (-b 10) with the OpenVX plug-in and a CenterNet model, which has many layers (more layers take longer to load). We then run it again with caching enabled, showing that the time remains the same for the first run, but subsequent runs are much faster.
# time modelrunner -e deepview-rt-openvx.so -m model.rtm -b 10
Average model run time: 19.8771 ms (layer sum: 0.0000 ms)
25.36user 0.24system 0:26.30elapsed 97%CPU
NOTE: with DeepViewRT versions 2.5.6 and later, you must not include the "deepview-rt-" prefix or the ".so" suffix in the engine parameter. The above command would be written as:
# time modelrunner -e openvx -m model.rtm -b 10
Average model run time: 19.8771 ms (layer sum: 0.0000 ms)
25.36user 0.24system 0:26.30elapsed 97%CPU
Note that you will likely need to run this twice before the cache takes effect. The issue has been reported to VSI, and a fix is expected in a future driver update.
# VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 time modelrunner -e deepview-rt-openvx.so -m centernet_256x512_cards.rtm -b 10
Average model run time: 19.8823 ms (layer sum: 0.0000 ms)
25.32user 0.25system 0:25.87elapsed 98%CPU
Cache Enabled Reload
# VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1 time modelrunner -e deepview-rt-openvx.so -m centernet_256x512_cards.rtm -b 10
Average model run time: 19.9010 ms (layer sum: 0.0000 ms)
1.79user 0.11system 0:02.16elapsed 87%CPU
The cache files are saved as .nb (network binary) files. We can see here that the cache file is about twice the size of the .rtm file.
root@imx8mpevk:~# ls -lh *.nb
-rw-r--r-- 1 root root 31M Jan 25 16:27 5edf5e26c73205bafd4815d17e969f15.nb
root@imx8mpevk:~# ls -lh centernet_256x512_cards.rtm
-rw-r--r-- 1 root root 14M Jan 14 17:07 centernet_256x512_cards.rtm
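Because cache files are named by a hash (as in the listing above) rather than after the model, old .nb files may remain on disk as models change. A minimal cleanup sketch, assuming the cache is stored in the current working directory (the default):

```shell
# Remove cached OpenVX graph binaries; the driver regenerates the cache
# (at the cost of one slow load) the next time a model is loaded.
cache_dir="${PWD}"        # adjust if the cache is stored elsewhere
rm -f "${cache_dir}"/*.nb
```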