deploy.py

Compiles and deploys a model (or full pipeline) from a YAML definition file. Used when deploying custom weights or new model architectures.

./deploy.py <network-name-or-yaml-path>

After deployment, the compiled pipeline is cached in build/. Subsequent calls to inference.py with the same network name use the cached build without recompiling.
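For example, a first deployment followed by a cached run might look like this (a sketch of the caching behaviour described above; the inference.py invocation is illustrative and may take further arguments):

./deploy.py <network-name>       # compiles and caches under build/
./inference.py <network-name>    # reuses the cached build; no recompilation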


Output

Artifacts are written to build/<network-name>/<model-name>/, under a numbered version subdirectory:

build/
└── yolov8n-licenseplate/
    └── yolov8n-licenseplate/
        └── 1/
            ├── model.json       # pipeline descriptor
            ├── model.axmodel    # compiled AIPU binary
            └── manifest.json    # quantization parameters

See Model Formats for what each file contains.


Options

Option                  Description
------                  -----------
--build-root <path>     Write compiled output here instead of build/
--data-root <path>      Look for datasets here instead of data/
--num-cal-images <N>    Number of calibration images for quantization (default: 200; range: 100–400)
--aipu-cores <N>        Target N AIPU cores (1–4). Only valid for single-model networks.
--pipe <type>           Pipeline type: gst (default, AIPU), torch (CPU/ONNX), torch-aipu (Python + AIPU)
--mode <mode>           Deployment mode (see below)
--model <model>         Compile only the named model within the network; skip pipeline deployment
--models-only           Compile all models in the network; skip pipeline deployment
--pipeline-only         Deploy the pipeline only; skip model compilation (uses pre-compiled models)
--export                Produce a zip archive: <model>-prequantized.zip in QUANTIZE mode, <model>.zip in other modes. Saved to exported/.
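Options can be combined. For instance (network name and paths illustrative), targeting all four AIPU cores and writing output to a custom location:

./deploy.py yolov8n-licenseplate --aipu-cores 4 --build-root /tmp/mybuild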

--mode values

Mode            Description
----            -----------
PREQUANTIZED    (default) Use a pre-quantized model if available; otherwise quantize, then compile
QUANTIZE        Quantize only; do not compile. Outputs quantized_model_manifest.json.
QUANTCOMPILE    Quantize, then compile in one step
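A typical split workflow, based on the mode descriptions above (network name illustrative): quantize first to inspect the result, then let the default PREQUANTIZED mode pick up the quantized model and compile it:

./deploy.py yolov8n-licenseplate --mode QUANTIZE    # writes quantized_model_manifest.json
./deploy.py yolov8n-licenseplate                    # PREQUANTIZED: reuses the quantized model, then compiles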

Examples

Deploy a custom YAML:

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml

Increase calibration images for better quantization accuracy:

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --num-cal-images 400

Quantize only (inspect before committing to full compile):

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --mode QUANTIZE
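Quantize and export a shareable archive (per the --export entry above, this should write <model>-prequantized.zip to exported/):

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --mode QUANTIZE --export

Deploy as a CPU/ONNX pipeline instead of the default GStreamer/AIPU one:

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --pipe torch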

All options:

./deploy.py --help

What happens during deployment

When you run ./deploy.py <network-name>, the following steps execute:

1. Calibration data preparation

The transforms in the YAML preprocess: section (resize, normalize, etc.) are applied to calibration images from the dataset (default: 200 images). These preprocessed images feed the quantization step. For example:

preprocess:
  - letterbox:
      width: 640
      height: 640
  - torch-totensor:
  - normalize:
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
Note: YAML preprocessing is used only during deployment to prepare calibration data. It is not part of the runtime inference pipeline; the GStreamer operators handle runtime preprocessing independently.

2. Compilation

The compiler takes the ONNX model plus calibration data and produces:

  • Quantized model — weights converted from FP32 to INT8
  • AIPU binary (.axmodel) — optimized for Metis hardware
  • Preamble/postamble — CPU-side transforms extracted by the compiler
  • Manifest — quantization parameters for runtime dequantization
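Assuming the example network from the Output section above, the results can be inspected on disk (a sketch; the exact file set follows the layout shown earlier):

ls build/yolov8n-licenseplate/yolov8n-licenseplate/1/
# manifest.json  model.axmodel  model.json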

3. Pipeline deployment

The compiled model is wrapped into a GStreamer pipeline descriptor (model.json) that the runtime can load directly.


See also