Compiler Python API

The Python API for compiling custom models programmatically. Use this when you need to integrate compilation into a training or evaluation script.

from axelera import compiler
from axelera.compiler import CompilerConfig

Overview

Compilation is a two-step process:

  1. Quantize — Convert an FP32 model to int8 using calibration data. Returns an AxeleraQuantizedModel that can run on CPU for accuracy validation.
  2. Compile — Optimize the quantized model for Metis hardware and write deployment artifacts to disk.

config = CompilerConfig()

quantized = compiler.quantize(model="model.onnx", calibration_dataset=data, config=config)
compiler.compile(model=quantized, config=config, output_dir=Path("./compiled/"))

quantize()

compiler.quantize(
    model: Union[torch.nn.Module, onnx.ModelProto, str],
    calibration_dataset: Iterator,
    config: CompilerConfig,
    transform_fn: Optional[Callable] = None,
) -> AxeleraQuantizedModel

Quantizes a model to int8 using calibration data.

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | torch.nn.Module, onnx.ModelProto, or str | Yes | PyTorch model, ONNX model object, or path to a .onnx file |
| calibration_dataset | iterator | Yes | Yields samples used to determine quantization scales |
| config | CompilerConfig | Yes | Quantization and compilation settings |
| transform_fn | callable | No | Preprocessing function applied to each item from calibration_dataset before it reaches the model |
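Conceptually, transform_fn is applied to each calibration sample just before it reaches the model — roughly equivalent to the following sketch (illustrative only; the compiler's internal handling may differ):

```python
def apply_transform(calibration_dataset, transform_fn=None):
    # Yield each sample, preprocessed if a transform_fn was supplied.
    for sample in calibration_dataset:
        yield transform_fn(sample) if transform_fn else sample

# The quantizer effectively consumes the (possibly transformed) stream:
samples = apply_transform([1, 2, 3], transform_fn=lambda x: x * 10)
print(list(samples))  # [10, 20, 30]
```

Because the stream is consumed lazily, calibration data never needs to fit in memory all at once.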

Returns

AxeleraQuantizedModel — a quantized model that supports CPU inference for accuracy validation before hardware deployment.

How the quantized model works

The model is split into three parts:

  • Preamble (optional): Operations that cannot run on the AIPU, executed on CPU via ONNX Runtime before the core model
  • Core: The main int8 model, compiled for and executed on Metis
  • Postamble (optional): Remaining operations executed on CPU after the core model

You can call the returned model directly to validate accuracy on CPU:

output = quantized_model(input_tensor)

Internally this runs: preamble → quantize inputs → int8 core → dequantize outputs → postamble.
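The quantize-inputs / dequantize-outputs steps around the int8 core can be sketched in plain numpy. This is an illustrative per-tensor affine quantization, not the compiler's actual implementation — the real scales come from calibration, and the core runs on the AIPU rather than in numpy:

```python
import numpy as np

def quantize_tensor(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # FP32 -> int8 via a per-tensor affine mapping, clamped to the int8 range.
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize_tensor(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # int8 -> FP32; the inverse mapping, exact up to rounding error.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 0.5, -1.0, 2.0], dtype=np.float32)
scale, zero_point = 2.0 / 127, 0  # e.g. from an observed max-abs of 2.0

q = quantize_tensor(x, scale, zero_point)       # int8 values fed to the core
y = dequantize_tensor(q, scale, zero_point)     # FP32 values for the postamble
# y approximates x to within half a quantization step
```

The round-trip error is bounded by the quantization step size, which is why choosing good calibration ranges (see ptq_scheme below) matters for accuracy.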


compile()

compiler.compile(
    model: AxeleraQuantizedModel,
    config: CompilerConfig,
    output_dir: Path,
) -> None

Compiles a quantized model for deployment on Metis hardware.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | AxeleraQuantizedModel | Output of quantize() |
| config | CompilerConfig | Compilation settings (cores, resource allocation, etc.) |
| output_dir | Path | Directory where compiled artifacts are written |

The compiled artifacts in output_dir are what you pass to inference.py or create_inference_stream(). See Model Formats for the output structure.


CompilerConfig

Controls both quantization and compilation behavior. All parameters are optional — defaults work for most models.

from axelera.compiler import CompilerConfig

config = CompilerConfig(
    ptq_scheme="per_tensor_histogram",  # Quantization scheme
    aipu_cores=4,                       # Number of AIPU cores (1–4)
    resources=1.0,                      # Memory fraction (0.0–1.0)
    save_error_artifact=True,           # Save artifacts on failure
)

| Parameter | Default | Description |
| --- | --- | --- |
| ptq_scheme | "per_tensor_min_max" | Quantization calibration scheme: per_tensor_min_max or per_tensor_histogram |
| aipu_cores | all available | Number of AIPU cores to target (1–4) |
| resources | 1.0 | Fraction of AIPU memory to use (reduce for multi-model scenarios) |
| save_error_artifact | False | Preserve intermediate files when compilation fails |
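The two ptq_scheme options differ in how the calibration range is chosen. The numpy sketch below illustrates the idea only — it is not the compiler's actual algorithm; the histogram variant is approximated here by percentile clipping:

```python
import numpy as np

def min_max_range(samples: np.ndarray) -> tuple[float, float]:
    # per_tensor_min_max: use the absolute observed extremes.
    return float(samples.min()), float(samples.max())

def histogram_range(samples: np.ndarray, pct: float = 99.9) -> tuple[float, float]:
    # per_tensor_histogram (sketch): clip rare outliers so the int8
    # range is not wasted on a handful of extreme activations.
    lo, hi = np.percentile(samples, [100 - pct, pct])
    return float(lo), float(hi)

# A single extreme outlier stretches the min-max range,
# but barely moves the percentile-based range.
acts = np.concatenate([np.random.default_rng(0).normal(0, 1, 10_000), [50.0]])
print(min_max_range(acts)[1])    # 50.0 — the outlier dominates
print(histogram_range(acts)[1])  # ≈3 — outlier clipped
```

This is why per_tensor_histogram can recover accuracy on models whose activations have heavy-tailed distributions, at the cost of a slower calibration pass.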

Examples

Raw images with preprocessing

Use this when calibration data is a directory of image files.

import cv2
import numpy as np
import glob
from pathlib import Path
from axelera import compiler
from axelera.compiler import CompilerConfig

def calibration_images():
    for path in glob.glob("calib_images/*.jpg")[:100]:
        yield cv2.imread(path)  # uint8 BGR

def preprocess(img: np.ndarray) -> np.ndarray:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 640))
    img = img.astype(np.float32) / 255.0
    img = np.transpose(img, (2, 0, 1))  # HWC → CHW
    return np.expand_dims(img, 0)  # add batch dim

config = CompilerConfig(ptq_scheme="per_tensor_min_max")

quantized = compiler.quantize(
    model="yolo.onnx",
    calibration_dataset=calibration_images(),
    config=config,
    transform_fn=preprocess,
)

compiler.compile(quantized, config, Path("./compiled/"))

Pre-processed data iterator

When your pipeline already outputs model-ready tensors, omit transform_fn.

def preprocessed_data():
    for path in glob.glob("calib_images/*.jpg")[:100]:
        img = cv2.imread(path)
        img = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0
        yield np.expand_dims(np.transpose(img, (2, 0, 1)), 0)

quantized = compiler.quantize(
    model="yolo.onnx",
    calibration_dataset=preprocessed_data(),
    config=CompilerConfig(),
    # no transform_fn needed
)

PyTorch DataLoader

Integrate with an existing PyTorch training pipeline.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((640, 640)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("calibration_images/", transform=transform)
loader = DataLoader(dataset, batch_size=1, shuffle=False)

def extract_images(batch):
    images, labels = batch
    return images.numpy()

quantized = compiler.quantize(
    model=pytorch_model,  # torch.nn.Module
    calibration_dataset=loader,
    config=CompilerConfig(),
    transform_fn=extract_images,
)

Save and reload a quantized model

Separate the quantization and compilation steps — useful for validating accuracy before deploying.

# Step 1: quantize and save
quantized = compiler.quantize(model="resnet50.onnx", ...)
quantized.export("resnet50_quantized/")

# Step 2: validate on CPU (separate script / later session)
from axelera.compiler.quantized_model import AxeleraQuantizedModel

quantized = AxeleraQuantizedModel.load("resnet50_quantized/")

# Step 3: compile for hardware
config = CompilerConfig(aipu_cores=4, resources=1.0)
compiler.compile(quantized, config, Path("resnet50_compiled/"))

See also