Model Compilation
Alpha
Core operators (detection, classification, pose, segmentation, tracking) are stable. Cascade (op.foreach, op.croproi) and streaming APIs are still in development.
The Voyager SDK compiler quantizes your model to mixed precision and compiles it for the Metis
AIPU, producing an .axm file. This page covers two paths: through a supported third-party integration (e.g.
Ultralytics, where compilation is handled for you) or directly via the compiler
API for any ONNX or PyTorch model.
Ultralytics Integration
If your model is trained with Ultralytics, a single call handles quantization and compilation:
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
model.export(format="axelera")
# Output: yolo11n_axelera_model/yolo11n.axm
The output directory yolo11n_axelera_model/ contains the .axm file ready for
op.load().
To compare accuracy on the AIPU against the original model, use yolo val:
yolo val model=yolo11n_axelera_model format=axelera
For full details, see the Ultralytics Axelera integration guide.
What Happens Under the Hood
This section shows what the Ultralytics integration does internally, which is useful if you are curious how the exporter works or want to build a similar integration for another framework.
The Ultralytics exporter calls the same compiler API you can use directly:
from axelera import compiler
from axelera.compiler import CompilerConfig
from axelera.compiler.config.model_specific import extract_ultralytics_metadata
# 1. Extract Ultralytics-specific metadata (task type, class names, keypoint shape, etc.)
# Generic models: skip this step and omit model_metadata from CompilerConfig.
metadata = extract_ultralytics_metadata(model)
# 2. Configure the compiler
config = CompilerConfig(
    model_metadata=metadata,
    aipu_cores_used=1,
    output_axm_format=True,
    model_name=model_name,
)
# 3. Quantize and compile
qmodel = compiler.quantize(
    model="yolo11n.onnx",
    calibration_dataset=calibration_images(),
    config=config,
    transform_fn=transform_fn,
)
compiler.compile(model=qmodel, config=config, output_dir=export_path)
The exporter does two things your own code needs to handle manually:
- model_metadata -- Ultralytics-specific. The exporter extracts task type, class names, and keypoint shape and embeds them in the .axm; at runtime, op.load() reads this to auto-select optimized C++ postprocessing. For generic models, omit model_metadata and wire up your own postprocessing pipeline instead.
- Auto-tuned CompilerConfig -- the exporter picks the right resources_used, quantization_scheme, tiling_depth, etc. for each model architecture. When you compile your own model, you set these yourself (sensible defaults work for most models).
Generic Path: From ONNX or PyTorch
For models not trained with Ultralytics, use the compiler API directly.
Required settings for the pipeline builder:
output_axm_format=True must be set so the compiler produces an .axm file that op.load() can consume. The pipeline builder supports aipu_cores_used=1 only.
From an ONNX Model
from pathlib import Path
from axelera import compiler
from axelera.compiler import CompilerConfig
config = CompilerConfig(
    model_name="my_detector",
    aipu_cores_used=1,
    resources_used=0.25,     # fraction of device memory to use
    output_axm_format=True,  # required for op.load()
)
# Provide a calibration dataset: an iterator yielding numpy arrays
# matching the model's input shape and dtype (typically float32 NCHW)
def calibration_data():
    for path in Path("calibration_images/").glob("*.jpg"):
        img = preprocess(path)  # your preprocessing: resize, normalize, etc.
        yield img
qmodel = compiler.quantize(
    model="model.onnx",
    calibration_dataset=calibration_data(),
    config=config,
)
compiler.compile(
    model=qmodel,
    config=config,
    output_dir=Path("compiled_output/"),
)
For model-specific tuning (e.g., quantization_scheme, tiling_depth), see the
Full Compiler Config Reference.
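The preprocess() call in the calibration generator above is left to you. As a rough illustration only, the sketch below turns an already-decoded HxWx3 uint8 image into a float32 NCHW array. The function name preprocess_array, the 640x640 target size, the nearest-neighbour resize, and the [0, 1] scaling are all assumptions made for this example and are not part of the compiler API; your calibration preprocessing should match whatever your model was trained with. Decoding the image file (for example with Pillow or OpenCV) is omitted.

```python
import numpy as np


def preprocess_array(image_hwc: np.ndarray, size: int = 640) -> np.ndarray:
    """Convert an HxWx3 uint8 image to a 1x3xSIZExSIZE float32 array (NCHW).

    Hypothetical helper for illustration; not part of the Voyager SDK.
    """
    height, width = image_hwc.shape[:2]
    # Nearest-neighbour resize via integer index arrays (no extra dependencies).
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image_hwc[rows[:, None], cols[None, :]]
    # Assumption: the model expects pixel values scaled to [0, 1].
    scaled = resized.astype(np.float32) / 255.0
    # HWC -> CHW, then add a leading batch dimension.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[None, ...])
```

A generator such as calibration_data() would then yield preprocess_array(decoded_image) for each calibration image.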
From a PyTorch Model with DataLoader
from pathlib import Path
from torch.utils.data import DataLoader
from axelera import compiler
from axelera.compiler import CompilerConfig
config = CompilerConfig(
    model_name="my_classifier",
    aipu_cores_used=1,
    resources_used=0.25,
    output_axm_format=True,  # required for op.load()
)
# my_dataset and torch_model are your own Dataset and PyTorch model
loader = DataLoader(my_dataset, batch_size=1)
# transform_fn receives each batch; return only the model input
def extract_images(batch):
    images, labels = batch
    return images
qmodel = compiler.quantize(
    model=torch_model,
    calibration_dataset=loader,
    config=config,
    transform_fn=extract_images,
)
compiler.compile(
    model=qmodel,
    config=config,
    output_dir=Path("compiled_output/"),
)
Finding Your .axm
After compilation, the .axm file is named after model_name in your
CompilerConfig. The output_dir you pass to compiler.compile() receives
intermediate build artifacts (manifests, quantized graphs, etc.), while the
.axm is placed relative to it -- typically in the current working directory.
For example, with model_name="my_detector":
$ ls *.axm
my_detector.axm
The Ultralytics exporter automatically organizes output into
<model_name>_axelera_model/. When using the API directly, you handle
file placement yourself.
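Because the exact location can vary, a small search helper can save some guessing. The sketch below is a hypothetical convenience function (find_axm is not part of the SDK) that looks for <model_name>.axm under a list of candidate directories using only the standard library; the assumption that the file is named exactly after model_name comes from the description above.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_axm(model_name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first <model_name>.axm found under search_dirs, or None.

    Hypothetical helper for illustration; not part of the Voyager SDK.
    """
    target = f"{model_name}.axm"
    for directory in search_dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # skip candidates that do not exist
        # rglob searches the directory itself and all subdirectories.
        for candidate in sorted(directory.rglob(target)):
            return candidate
    return None
```

For example, find_axm("my_detector", [Path.cwd(), Path("compiled_output/")]) checks the working directory first and then the build output directory.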
Once you have the .axm, see the Pipeline Overview
for how to build inference pipelines around it.
Validate Before Deploying
After quantization, you can run the quantized model on CPU to check accuracy before compiling for hardware:
import numpy as np
# Run a sample through the quantized model on CPU.
# Random input only smoke-tests the graph; use real preprocessed images to judge accuracy.
sample = np.random.randn(1, 3, 640, 640).astype(np.float32)
output = qmodel(sample)
# Compare output against the original model to verify quantization quality
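One simple way to make that comparison concrete is to compute a couple of error metrics between the two outputs. The sketch below is a minimal example, assuming both the original model's output and the quantized model's output can be converted to NumPy arrays of the same shape (models with several outputs would need one comparison per output). The function compare_outputs is a hypothetical helper, and the choice of metrics is illustrative, not an SDK recommendation.

```python
import numpy as np


def compare_outputs(reference: np.ndarray, quantized: np.ndarray) -> dict:
    """Return max absolute error and cosine similarity between two outputs.

    Hypothetical helper for illustration; not part of the Voyager SDK.
    """
    ref = np.asarray(reference, dtype=np.float64).ravel()
    quant = np.asarray(quantized, dtype=np.float64).ravel()
    if ref.shape != quant.shape:
        raise ValueError("outputs must contain the same number of elements")
    max_abs_error = float(np.max(np.abs(ref - quant)))
    denom = float(np.linalg.norm(ref) * np.linalg.norm(quant))
    # Cosine similarity is undefined if either output is all zeros.
    cosine = float(ref @ quant) / denom if denom > 0 else float("nan")
    return {"max_abs_error": max_abs_error, "cosine_similarity": cosine}
```

A cosine similarity close to 1 and a small maximum error on real, preprocessed inputs suggest quantization has preserved the raw outputs well; task metrics such as mAP or top-1 accuracy on a validation set remain the more meaningful check.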
Next Steps
- Pipeline Overview -- Build pipelines around your .axm with examples for detection, classification, pose, segmentation, and tracking
- Compiler API Reference -- Full API details for compiler.quantize() and compiler.compile()
- Compiler Configuration Reference -- All CompilerConfig options