In this article, we describe the conversion process used to generate a model that can run on the Hailo accelerator. We will go through the available arguments, their purpose, and when to use the various options. First, however, we will cover how to access the converter.
Conversion is performed through a Docker container that we provide on Docker Hub. To see all of the available options, run the container as follows.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --help
For the model to quantize properly, a GPU must be available to the Docker container. This is handled by the following argument.
--gpus all
On Linux machines, nvidia-docker must be installed and the base command changes to
nvidia-docker
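For example, on a Linux machine the help command above becomes something like the following. This is a sketch that assumes nvidia-docker exposes the GPUs on its own; adjust it if your setup still requires the --gpus flag.
nvidia-docker run -it --rm -v $PWD:/work \
deepview/converter:hailo --help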
This conversion can take 25-35 minutes on a GeForce RTX 4060 and can take longer on lower-powered GPUs.
From this run, we can see all of the options available to us during conversion:
usage: conversion_hailo.py [-h] [-s SAMPLES] [-c NUM_SAMPLES] [-n {raw,unsigned,signed,imagenet}] [-i INPUT_NAMES]
[-o OUTPUT_NAMES] [--iou_threshold IOU_THRESHOLD] [--score_threshold SCORE_THRESHOLD]
[-d DEFAULT_SHAPE] [-q] [--include_nms] [--int16] [-b BATCH_SIZE] [--model_arch MODEL_ARCH]
[--opt_level {0,1,2,3,4}] [--comp_level COMP_LEVEL]
input_file output_file
Hailo Converter Tool
positional arguments:
input_file The input model to be converted
output_file The output path for the Hailo model
optional arguments:
-h, --help show this help message and exit
-s SAMPLES, --samples SAMPLES
Folder containing image files to use as samples for quantization
-c NUM_SAMPLES, --num_samples NUM_SAMPLES
Number of samples to use for quantization
-n {raw,unsigned,signed,imagenet}, --quant_normalization {raw,unsigned,signed,imagenet}
Normalization used for the model
-i INPUT_NAMES, --input_names INPUT_NAMES
Comma delimited list of the input names
-o OUTPUT_NAMES, --output_names OUTPUT_NAMES
Comma delimited list of the input names
--iou_threshold IOU_THRESHOLD
IOU threshold for detection models
--score_threshold SCORE_THRESHOLD
Score threshold for detection models
-d DEFAULT_SHAPE, --default_shape DEFAULT_SHAPE
Semicolon delimited list of comma delimited lists of the input shapes
-q, --quantize Flag whether to quantize the model
--include_nms Flag whether to include nms into the HEF model.
--int16 Flag whether to convert the model as int16 instead of int8 given quantization
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size to use for quantization of the HEF model
--model_arch MODEL_ARCH
The architecture of the model, required for NMS inclusion (yolov5 = yolov7)
--opt_level {0,1,2,3,4}
The optimization to use for HEF quantization, default=2
--comp_level COMP_LEVEL
The compression level to use for quantization to HEF, default=0
To break these down, we will start with the most important arguments and work our way down. First are the two positional arguments, input_file and output_file. These are straightforward: input_file is your input model (Keras H5, TFLite, and ONNX are supported) and output_file is the output path, which should end with .hef. In addition to the .hef, the converter also produces a .har file, which contains more information about the model and allows you to reload the model for further work if required.
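As a simple illustration, the two positional arguments come at the end of the command. The file names below are placeholders, and in practice the flags described in the following sections (such as --quantize and --default_shape) must still be added.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo \
/work/model.onnx /work/model.hef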
The next two arguments are flags that drastically affect the output model.
--quantize must always be included if you want to produce a viable .hef file; without it, Hailo cannot generate a .hef file, only a floating-point .har archive. This may change in the future if support for floating-point models is added, but for the time being, add this argument to every conversion command.
--include_nms is a flag that tells the converter to append model decoding and NMS to the Hailo model. The NMS will run on the CPU; however, it is still recommended to incorporate NMS into any detection model.
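For example, a detection-model conversion would typically include both flags, as sketched below with placeholder file names; the shape, sample, and architecture arguments described in the following sections are omitted here for brevity.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --quantize --include_nms \
/work/yolov7.onnx /work/yolov7.hef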
The next three arguments set up the samples used for quantization.
--samples is the path to a folder of images that will be used for quantization. It is recommended that this folder contain at least 1024 images for optimal quantization of the Hailo model.
--num_samples is the number of samples to be used for quantization. As stated above, it is recommended that this is set to at least 1024.
--quant_normalization is the normalization used in the model. This is baked into the model at conversion time, so you do not need to normalize images before running inference with the output model. Currently supported are unsigned [0, 1] and signed [-1, 1].
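Putting these together, a quantized conversion might point at a mounted folder of calibration images like this; the /work/samples path and model names are only examples.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --quantize \
--samples /work/samples --num_samples 1024 \
--quant_normalization unsigned \
/work/model.onnx /work/model.hef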
The next three arguments are used to modify the model architecture.
--default_shape is a comma-delimited list giving the input shape in NHWC format. Most ONNX models are in NCHW format (e.g. 1,3,640,640), but they will be converted to NHWC in the Hailo model. This argument is currently required.
--input_names is a comma-delimited list of node names used to trim the model so that these nodes become the inputs, removing all earlier nodes in the model graph.
--output_names is a comma-delimited list of node names used to trim the model so that these nodes become the outputs, removing all deeper nodes in the model graph. If this is left out when converting a detection model and the model does not already expose the expected layers as outputs, the converter will exit and recommend which layers to use.
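A fuller detection-model conversion combining these arguments could look like the sketch below. The shape and node names are illustrative and depend on your model; if you are unsure of the output names, run the conversion without --output_names and the converter will suggest them.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --quantize --include_nms \
--samples /work/samples --num_samples 1024 \
--quant_normalization unsigned \
--default_shape 1,640,640,3 \
--output_names output0,output1,output2 \
/work/yolov7.onnx /work/yolov7.hef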
The remaining arguments are only needed for niche conversions or are awaiting bug fixes.
--iou_threshold sets the IOU threshold for NMS when --include_nms is used. Due to a bug in the Dataflow Compiler library, this does not currently change the default value in the model. The value can be set at runtime in the pipeline, which is the recommended way of setting the IOU threshold. Please visit Hailo Inference for an example.
--score_threshold sets the score threshold for NMS when --include_nms is used. This value can be set at runtime in the pipeline, which is the recommended way of setting the score threshold. Please visit Hailo Inference for an example.
--int16 is currently unsupported; we are working with Hailo to make it available for conversion. This argument tells the converter to generate int16 layers instead of the int8 layers it would otherwise produce.
--batch_size is the batch size used for quantization. The higher the value, the quicker the quantization, but it also increases the strain on your GPU and can cause an out-of-memory error. The main use for this argument is to lower the batch size to 1 when required by the GPU performing the conversion.
--model_arch determines which version of NMS to append to the model when --include_nms is used. This is primarily used for YOLO models. The current default is yolov5, which is the NMS used for both YOLOv5 and YOLOv7.
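For instance, lowering the batch size and specifying the architecture explicitly might look like the following; the remaining arguments from the earlier examples are omitted for brevity and the file names are placeholders.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --quantize --include_nms \
--model_arch yolov5 --batch_size 1 \
/work/yolov7.onnx /work/yolov7.hef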
--opt_level determines the optimization level of quantization to be performed. This defaults to 2, which performs proper quantization on most models. Each level is described below, with a usage sketch after the list.
- 0 - Equalization
- 1 - Equalization + Iterative bias correction
- 2 - Equalization + Finetune with 4 epochs & 1024 images
- 3 - Equalization + Adaround with 320 epochs & 256 images on all layers
- 4 - Equalization + Adaround with 320 epochs & 1024 images on all layers
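As a sketch, a quick trial conversion could drop to optimization level 0, while a final conversion might raise it at the cost of additional quantization time; the other arguments are the same as in the earlier examples.
docker run -it --rm -v $PWD:/work --gpus all \
deepview/converter:hailo --quantize --opt_level 0 \
/work/model.onnx /work/model.hef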
--comp_level determines the compression applied to the model. It is recommended to leave this at 0. Each level is described as follows.
- 0 - nothing is applied
- 1 - auto 4bit is set to 0.2 if network is large enough (20% of the weights)
- 2 - auto 4bit is set to 0.4 if network is large enough (40% of the weights)
- 3 - auto 4bit is set to 0.6 if network is large enough (60% of the weights)
- 4 - auto 4bit is set to 0.8 if network is large enough (80% of the weights)
- 5 - auto 4bit is set to 1.0 if network is large enough (100% of the weights)
For a complete example of a conversion performed using a YOLOv7 model, please visit Hailo QuickStart Guide.