Version: v1.7

deploy.py

Compiles and deploys a model (or full pipeline) from a YAML definition file. Used when deploying custom weights or new model architectures.

./deploy.py <network-name-or-yaml-path>

After deployment, the compiled pipeline is cached in build/. Subsequent calls to inference.py with the same network name use the cached build without recompiling.


Output

Artifacts are written to build/<network-name>/<model-name>/:

build/
└── yolov8n-licenseplate/
    └── yolov8n-licenseplate/
        └── 1/
            ├── model.json      # pipeline descriptor
            ├── model.axmodel   # compiled AIPU binary
            └── manifest.json   # quantization parameters

See Model Formats for what each file contains.
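A script can confirm that a deployment produced the files listed above before handing the build to inference. The following is a minimal sketch, not part of the SDK: the helper name missing_artifacts is hypothetical, and the "1" subdirectory is assumed from the example tree and may differ for other networks.

```python
from pathlib import Path

# File names taken from the example tree above.
EXPECTED_FILES = ("model.json", "model.axmodel", "manifest.json")


def missing_artifacts(build_root, network, model, version="1"):
    """Return the expected artifact names that are absent from the build directory.

    Hypothetical helper; the <network>/<model>/<version> layout is assumed
    from the example tree, not from a documented API.
    """
    artifact_dir = Path(build_root) / network / model / version
    return [name for name in EXPECTED_FILES if not (artifact_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts("build", "yolov8n-licenseplate", "yolov8n-licenseplate")
    print("build looks complete" if not missing else f"missing: {missing}")
```

An empty list means all three files are present; it says nothing about whether their contents are valid.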


Options

Option                  Description
--build-root <path>     Write compiled output here instead of build/
--data-root <path>      Look for datasets here instead of data/
--num-cal-images <N>    Number of calibration images for quantization (default: 200; range: 100–400)
--cal-seed <N>          Seed an RNG that shuffles the calibration image order. Omit for the default sorted-deterministic order; set a seed to sweep different orders for accuracy ablation. See Reproducibility.
--aipu-cores <N>        Target N AIPU cores (1–4). Only valid for single-model networks.
--pipe <type>           Pipeline type: gst (default, AIPU), torch (CPU/ONNX), torch-aipu (Python + AIPU)
--mode <mode>           Deployment mode (see below)
--model <model>         Compile only the named model within the network; skip pipeline deployment
--models-only           Compile all models in the network; skip pipeline deployment
--pipeline-only         Deploy the pipeline only; skip model compilation (uses pre-compiled models)
--export                Produce a zip archive. For QUANTIZE mode: <model>-prequantized.zip. For other modes: <model>.zip. Saved to exported/.

--mode values

Mode            Description
PREQUANTIZED    (default) Use a pre-quantized model if available; otherwise quantize then compile
QUANTIZE        Quantize only; do not compile. Outputs quantized_model_manifest.json.
QUANTCOMPILE    Quantize then compile in one step

Examples

Deploy a custom YAML:

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml

Increase calibration images for better quantization accuracy:

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --num-cal-images 400

Quantize only (inspect before committing to full compile):

./deploy.py customers/mymodels/yolov8n-licenseplate.yaml --mode QUANTIZE

All options:

./deploy.py --help
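When deploy.py is driven from a script (for example, to sweep calibration settings), it can help to assemble the argument list in one place. The sketch below is a hypothetical wrapper that uses only flags documented in the Options table. The range check mirrors the documented 100–400 range; whether deploy.py itself rejects out-of-range values is not stated here, so treat the check as an assumption.

```python
VALID_MODES = {"PREQUANTIZED", "QUANTIZE", "QUANTCOMPILE"}


def build_deploy_command(network, num_cal_images=None, cal_seed=None,
                         mode=None, build_root=None):
    """Assemble an argv list for ./deploy.py (hypothetical wrapper, not SDK code)."""
    cmd = ["./deploy.py", network]
    if num_cal_images is not None:
        # Documented range for --num-cal-images is 100-400.
        if not 100 <= num_cal_images <= 400:
            raise ValueError("--num-cal-images must be between 100 and 400")
        cmd += ["--num-cal-images", str(num_cal_images)]
    if cal_seed is not None:
        cmd += ["--cal-seed", str(cal_seed)]
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        cmd += ["--mode", mode]
    if build_root is not None:
        cmd += ["--build-root", str(build_root)]
    return cmd


if __name__ == "__main__":
    cmd = build_deploy_command(
        "customers/mymodels/yolov8n-licenseplate.yaml",
        num_cal_images=400,
        mode="QUANTIZE",
    )
    print(" ".join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps the command safe to pass to subprocess.run without shell quoting.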

What happens during deployment

When you run ./deploy.py <network-name>, the following steps execute:

1. Calibration data preparation

The YAML preprocess: section (Resize, Normalize, etc.) is applied to calibration images from the dataset (default: 200 images). These preprocessed images are used for the quantization step.

preprocess:
  - letterbox:
      width: 640
      height: 640
  - torch-totensor:
  - normalize:
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
Note: YAML preprocessing is used only during deployment to prepare calibration data. It is not part of the runtime inference pipeline; the GStreamer operators handle runtime pre-processing independently.
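To make the arithmetic behind the example preprocess: chain concrete, here is a small sketch of what letterbox and normalize compute for a single image size and a single pixel. It is illustrative only and is not the SDK's implementation: it assumes centred padding for the letterbox and assumes torch-totensor scales 8-bit values to [0, 1] (as torchvision's ToTensor does). The function names are hypothetical.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def letterbox_geometry(src_w, src_h, dst_w=640, dst_h=640):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize.

    Assumes the resized image is centred in the destination canvas.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))


if __name__ == "__main__":
    print(letterbox_geometry(1920, 1080))  # -> (640, 360, 0, 140)
    print(normalize_pixel((255, 255, 255)))
```

A 1920x1080 frame is scaled by 1/3 to 640x360 and padded with 140 rows above and below to fill the 640x640 canvas.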

2. Compilation

The compiler takes the ONNX model plus calibration data and produces:

  • Quantized model — weights converted from FP32 to INT8
  • AIPU binary (.axmodel) — optimized for Metis hardware
  • Preamble/postamble — CPU-side transforms extracted by the compiler
  • Manifest — quantization parameters for runtime dequantisation
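The role of the quantization parameters in the manifest can be illustrated with generic INT8 quantization. The sketch below is an assumption-laden illustration, not the compiler's actual scheme: it uses a single symmetric scale derived from calibration values, whereas the real compiler may use per-channel scales, non-zero zero points, or a different calibration method. All function names are hypothetical.

```python
def symmetric_scale(calibration_values, num_bits=8):
    """Pick a scale so the largest calibration magnitude maps to the top INT8 code."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(x, scale, zero_point=0):
    """Map a float to an INT8 code, clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point=0):
    """Recover an approximate float from an INT8 code."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale = symmetric_scale([-2.0, 0.5, 1.27])
    code = quantize(0.5, scale)
    print(code, dequantize(code, scale))
```

The runtime needs scale and zero_point to turn INT8 outputs back into real values, which is why such parameters are stored alongside the compiled binary.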

3. Pipeline deployment

The compiled model is wrapped into a GStreamer pipeline descriptor (model.json) that the runtime can load directly.


Reproducibility

deploy.py produces bit-identical output by default: two back-to-back runs against the same network and the same calibration image directory produce the same quantized_model.pt (equal sha256). The behaviour depends on whether --cal-seed is set:

Knob                 Behaviour
(omit --cal-seed)    Default. Calibration images are read in lexicographically sorted filename order with shuffle=False. Same inputs → same artifact every run.
--cal-seed <N>       The image order is shuffled by a torch RNG seeded with <N>. Two runs with the same <N> produce the same artifact; different <N> produce different artifacts. Use this to sweep calibration orders during accuracy experiments.
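The two orderings can be sketched in a few lines. This is an illustration of the idea only: deploy.py is described as using a torch RNG, while the sketch substitutes Python's random.Random so it runs without torch, which means the shuffled orders will not match those produced by deploy.py. The function name calibration_order is hypothetical.

```python
import random


def calibration_order(filenames, cal_seed=None):
    """Return filenames in sorted order, or in a seeded shuffle of that order.

    Stand-in for the behaviour described above; uses random.Random instead
    of a torch RNG, so the exact permutation differs from deploy.py.
    """
    ordered = sorted(filenames)  # lexicographic order is the deterministic baseline
    if cal_seed is not None:
        random.Random(cal_seed).shuffle(ordered)
    return ordered


if __name__ == "__main__":
    files = ["img_003.jpg", "img_001.jpg", "img_002.jpg"]
    print(calibration_order(files))              # sorted order
    print(calibration_order(files, cal_seed=7))  # same permutation on every run
```

Sorting before shuffling matters: it makes the seeded permutation independent of the order in which the filesystem lists the files.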

Hash determinism (advanced)

deploy.py does not itself control Python's per-process hash randomisation, which changes str and bytes hash values and therefore set iteration order (dict iteration is insertion-ordered since Python 3.7 and is not affected). Assigning PYTHONHASHSEED from inside the CLI at runtime has no effect, because the interpreter reads the variable only at startup. If you need hash-order determinism across runs (rare; the calibration path is already covered by sorted-glob + explicit shuffle), set the seed in the shell environment when invoking deploy.py:

PYTHONHASHSEED=0 ./deploy.py <network>
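The effect of setting the variable before interpreter startup can be observed with plain Python, independent of deploy.py. The sketch below launches fresh interpreters with PYTHONHASHSEED in their environment and compares string hashes; the helper name child_hash is hypothetical.

```python
import os
import subprocess
import sys


def child_hash(seed):
    """Run a fresh interpreter with PYTHONHASHSEED set and return hash('calibration')."""
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('calibration'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # Assigning the variable here does not change hashing in *this* process;
    # it only affects interpreters started afterwards with this environment.
    os.environ["PYTHONHASHSEED"] = "0"
    print("same seed, same hash:", child_hash(0) == child_hash(0))
```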

Validating reproducibility

To confirm your environment produces identical artifacts:

./deploy.py <network> --mode QUANTIZE --build-root /tmp/runA
./deploy.py <network> --mode QUANTIZE --build-root /tmp/runB
sha256sum /tmp/run{A,B}/<network>/<network>/quantized/quantized_model.pt
# both hashes must match
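On systems without sha256sum, or inside a test suite, the same comparison can be done with Python's hashlib. This is a minimal sketch with hypothetical helper names; pass it the two quantized_model.pt paths produced by the runs above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_match(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A call such as artifacts_match(path_from_run_a, path_from_run_b) returning False means the two runs diverged somewhere before or during quantization.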

See also