In this article we present a basic example showcasing ModelPack conversion and quantization. Conversion ensures the final model runs properly on the selected runtime environment, while quantization makes the model execute its operations in a low-bit representation (uint8, int8, etc.). Quantization is extremely important for saving energy, and it takes advantage of custom operations that are highly optimized for embedded hardware such as NPUs.
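To give a rough numerical picture of what quantization does, the sketch below (plain Python, not part of deepview-converter) maps float32 values to uint8 with a scale and zero-point, and maps them back. Real converters derive these parameters per tensor or per channel; this is only the core affine mapping:

```python
def quantize_params(fmin, fmax, qmin=0, qmax=255):
    """Compute scale and zero-point for an affine uint8 mapping."""
    fmin = min(fmin, 0.0)  # the representable range must include 0.0
    fmax = max(fmax, 0.0)
    scale = (fmax - fmin) / (qmax - qmin)
    zero_point = int(round(qmin - fmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float value to the quantized integer grid, clamped to range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map a quantized integer back to its approximate float value."""
    return (q - zero_point) * scale

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # close to 0.5, within one step of `scale`
```

The round trip is lossy by at most one quantization step, which is the trade-off the model accepts in exchange for smaller size and integer arithmetic.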
Model Conversion and Quantization
To convert or quantize the model we will use a Docker container that encapsulates deepview-converter. This tool handles both conversion and quantization in a single step from the command line. The container is shared under the name deepview/converter:latest and is available on the DeepView Converter Page.
To convert the model we need to call the deepview/converter container in the following way:
docker run -it \
  -v /data/checkpoints/<time-stamp>/:/models \
  deepview/converter \
  /models/last.h5 \
  /models/last.rtm
The first thing the container needs is access to the checkpoints folder, which is why we mount it into /models. To perform a simple conversion (a float model), we only need to pass the source model and the destination model to the converter on the command line. In this case they are referenced as /models/last.h5 and /models/last.rtm. The converter will create last.rtm within the checkpoint folder.
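Because the container only sees what is mounted under /models, a quick pre-flight check on the host can catch a missing checkpoint before invoking Docker. The helper below is hypothetical (it is not part of the converter tooling), and the file names are the ones the trainer produces per this article:

```python
from pathlib import Path

def missing_checkpoint_files(checkpoint_dir, required=("last.h5",)):
    """Return the names of required files not present in the checkpoint folder.

    `missing_checkpoint_files` is a hypothetical convenience helper;
    the converter itself only sees whatever ends up mounted at /models.
    """
    folder = Path(checkpoint_dir)
    return [name for name in required if not (folder / name).is_file()]

missing = missing_checkpoint_files("/data/checkpoints/<time-stamp>/")
if missing:
    print("cannot convert, missing:", ", ".join(missing))
```

For the quantization workflow later in this article, labels.txt could be added to the `required` tuple as well.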
Post-training quantization is a process that maps float32 values to int8 values. This makes the model smaller, so it consumes less memory. To do that we need to call the container in the following way:
docker run -it \
  -v /data/outputs/checkpoints/<time-stamp>/:/models \
  -v /data/datasets/sagemaker/sugartbeet-stage2-v1:/images \
  deepview/converter \
  --quantize \
  --quant_normalization unsigned \
  --input_type uint8 \
  --output_type int8 \
  --default_shape 1,270,480,3 \
  --samples /images \
  --num_samples 100 \
  --labels /models/labels.txt \
  --quantize-tensor \
  /models/last.h5 \
  /models/last_uint8_int8_tensor.rtm
- -v /data/outputs/checkpoints/<time-stamp>/:/models: Mount point for accessing the models from inside the container
- -v /data/datasets/sagemaker/sugartbeet-stage2-v1:/images: Mount point for accessing the images from the dataset
- --quantize: Performs quantization
- --quant_normalization unsigned: Normalization function used to train the model. unsigned is mandatory.
- --input_type uint8: Input type uint8 allows the images to go directly from the recording sensor to the model during inference without any preprocessing. Allowed types are int8, uint8 and float32.
- --output_type int8: Defines the output type of the model. In this case it is int8, since TensorFlow quantizes models internally to int8; this output type avoids dequantization operations at the end of the model. Allowed types are int8, uint8 and float32.
- --default_shape 1,270,480,3: Input shape used for training the model. The checkpoint folder contains a JSON file with the parameters used to train the model, including the input dimensions. Note: the batch dimension must be prepended at the beginning.
- --samples /images: Internal path to the images folder.
- --num_samples 100: Number of calibration samples used to quantize the model
- --labels /models/labels.txt: A text file with the labels for the model. This file is automatically created by the trainer and saved in the models directory.
- --quantize-tensor: Enables per-tensor quantization, which can run faster on some devices such as NPUs. Per-tensor quantization means a single set of quantization parameters for each tensor in the model.
- /models/last.h5: Checkpoint model
- /models/last_uint8_int8_tensor.rtm: Output quantized and converted model.
Note: Always leave the source model and destination model as the last two parameters.
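The calibration step behind --samples and --num_samples works by running representative images through the model and recording the observed value range of each tensor, from which the quantization parameters are chosen. A minimal per-tensor version of that idea, in plain Python and independent of the actual deepview-converter implementation, might look like:

```python
def calibrate_per_tensor(batches, qmin=-128, qmax=127):
    """Pick one (scale, zero_point) pair for a whole tensor.

    `batches` stands in for the activation values observed while running
    the calibration images; this is a conceptual sketch of per-tensor
    calibration, not the converter's actual algorithm.
    """
    fmin = min(min(b) for b in batches)
    fmax = max(max(b) for b in batches)
    fmin, fmax = min(fmin, 0.0), max(fmax, 0.0)  # range must include 0.0
    scale = (fmax - fmin) / (qmax - qmin)
    zero_point = int(round(qmin - fmin / scale))
    return scale, zero_point

# 100 hypothetical calibration batches, mirroring --num_samples 100
batches = [[i / 100.0, -i / 200.0] for i in range(1, 101)]
scale, zero_point = calibrate_per_tensor(batches)
```

Too few samples can miss the true activation range and clip real values at inference time, which is why a representative calibration set (here, 100 images from the training dataset) matters.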
In this article we have shown how to use a Docker container to convert and quantize a model with deepview-converter.