axelera.runtime API
axelera.runtime is the low-level Python API for loading compiled models and running them directly on the AIPU. It gives you fine-grained control over device selection, buffer management, and multi-instance execution.
For most applications, the InferenceStream API is simpler and handles all of this automatically. Use axelera.runtime when you need direct tensor-level access to the hardware.
import axelera.runtime as axr
Typical usage
import numpy as np
import axelera.runtime as axr
with axr.Context() as ctx:
    devices = ctx.list_devices()
    conn = ctx.device_connect(devices[0])
    model = ctx.load_model("build/yolov8n-coco/yolov8n-coco/1/model.json")
    instance = conn.load_model_instance(model, aipu_cores=4, double_buffer=True)
    # Prepare inputs as numpy arrays
    inputs = [np.zeros(t.unpadded_shape, dtype=np.float32) for t in model.inputs()]
    outputs = [np.zeros(t.shape, dtype=np.int8) for t in model.outputs()]
    instance.run(inputs, outputs)
Context
The root object. Manages all devices and resources. Use as a context manager or call release() explicitly.
ctx = axr.Context()
# or:
with axr.Context() as ctx:
    ...
Methods
| Method | Returns | Description |
|---|---|---|
| list_devices() | list[DeviceInfo] | Enumerate all connected Metis devices |
| device_connect(device, num_sub_devices=1) | Connection | Reserve a device (or sub-devices). Returns a Connection for loading models. |
| load_model(path) | Model | Load a compiled model.json file. The same Model can be loaded onto multiple connections. |
| configure_device(device, **kwargs) | bool | Apply a configuration property to a device. Returns True when complete, False if pending (poll with device_ready(); see the example below). |
| device_ready(device) | bool | Check if a configure_device() call has completed. |
| read_device_configuration(device) | dict[str, str] | Read all current configuration properties. |
| release() | None | Release all objects. Called automatically when used as a context manager. |
configure_device properties
| Property | Default | Description |
|---|---|---|
| clock_profile | 800 | Device clock in MHz |
| clock_profile_core_0 … clock_profile_core_3 | 800 | Per-core clock in MHz |
| mvm_utilisation_core_0 … mvm_utilisation_core_3 | 100 | Per-core MVM utilization limit (%) |
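The configure-then-poll flow works like this; a minimal sketch based on the methods above, where the 400 MHz value is purely illustrative:
import time
import axelera.runtime as axr
with axr.Context() as ctx:
    device = ctx.list_devices()[0]
    # Request a lower clock; False means the change is still being applied
    done = ctx.configure_device(device, clock_profile=400)
    while not done:
        time.sleep(0.1)
        done = ctx.device_ready(device)
    # Confirm which settings are now active
    print(ctx.read_device_configuration(device))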
Connection
A reserved connection to a device (or sub-device group). Created by ctx.device_connect().
Methods
| Method | Returns | Description |
|---|---|---|
| load_model_instance(model, **kwargs) | ModelInstance | Load a compiled model onto this connection for execution |
load_model_instance kwargs
| Property | Default | Description |
|---|---|---|
| aipu_cores | 0 | Number of AIPU cores / L2 resources. Set to the model's batch size. |
| num_sub_devices | 0 | Number of sub-devices. Set to the model's batch size (see the sketch after this table). |
| double_buffer | 0 | Enable double-buffering for higher throughput |
| input_dmabuf | 0 | Inputs are DMA buffer file descriptors instead of numpy arrays |
| output_dmabuf | 0 | Outputs are DMA buffer file descriptors instead of numpy arrays |
| device_profiling | 0 | Enable device-side profiling |
| host_profiling | 0 | Enable host-side profiling |
| elf_in_ddr | 1 | Set to 1 if the model was compiled with elf_in_ddr=True (the compiler default) |
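As a sketch of the sub-device path (the model path is illustrative, and passing num_sub_devices consistently to device_connect() and load_model_instance() is an assumption drawn from the tables above):
import axelera.runtime as axr
with axr.Context() as ctx:
    device = ctx.list_devices()[0]
    # Reserve one of the four AIPU cores as a sub-device
    conn = ctx.device_connect(device, num_sub_devices=1)
    model = ctx.load_model("build/mymodel/mymodel/1/model.json")  # illustrative path
    # Mirror the reservation when loading a batch-1 model
    instance = conn.load_model_instance(model, num_sub_devices=1)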
Model
Represents a loaded compiled model. Created by ctx.load_model().
Methods
| Method | Returns | Description |
|---|---|---|
| inputs() | list[TensorInfo] | Input tensor metadata |
| outputs() | list[TensorInfo] | Output tensor metadata |
Properties
| Property | Description |
|---|---|
| preamble_graph | Path to preamble ONNX file (host-executed prefix operations, if any) |
| postamble_graph | Path to postamble ONNX file (host-executed suffix operations, if any) |
| input_tensor_layout | Always NHWC in this version |
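To see what a loaded model expects, dump its tensor metadata and check for host-side ONNX graphs; a short sketch using the TensorInfo fields documented below:
import axelera.runtime as axr
with axr.Context() as ctx:
    model = ctx.load_model("build/yolov8n-coco/yolov8n-coco/1/model.json")
    for t in model.inputs():
        print("in ", t.name, t.shape, t.unpadded_shape, t.dtype, t.scale, t.zero_point)
    for t in model.outputs():
        print("out", t.name, t.shape, t.dtype, t.scale, t.zero_point)
    # Host-executed ONNX prefix/suffix graphs, if the compiler emitted any
    print(model.preamble_graph, model.postamble_graph, model.input_tensor_layout)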
ModelInstance
A model loaded onto a specific device connection. Created by conn.load_model_instance().
Methods
| Method | Returns | Description |
|---|---|---|
| run(inputs, outputs) | None | Execute one inference step |
inputs and outputs are lists of numpy arrays (or DMA buffer file descriptors when the corresponding dmabuf mode is enabled). Shapes must match the metadata returned by model.inputs() / model.outputs(). run() raises an exception on failure.
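A minimal inference loop that allocates buffers once and reuses them across calls; a sketch following the buffer shapes from the Typical usage example above, with frame acquisition and postprocessing left as placeholders:
import numpy as np
import axelera.runtime as axr
with axr.Context() as ctx:
    conn = ctx.device_connect(ctx.list_devices()[0])
    model = ctx.load_model("build/yolov8n-coco/yolov8n-coco/1/model.json")
    instance = conn.load_model_instance(model, aipu_cores=4, double_buffer=True)
    # Allocate once; run() writes results into the output buffers
    inputs = [np.zeros(t.unpadded_shape, dtype=np.float32) for t in model.inputs()]
    outputs = [np.zeros(t.shape, dtype=np.int8) for t in model.outputs()]
    for _ in range(100):
        # ... fill `inputs` with the next preprocessed frame here ...
        instance.run(inputs, outputs)
        # ... postprocess `outputs` here ...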
TensorInfo
Describes one input or output tensor, including quantization and padding metadata.
| Field | Type | Description |
|---|---|---|
| shape | tuple | Full tensor shape (including padding) |
| unpadded_shape | tuple | Shape without padding |
| dtype | np.dtype | Data type (default: np.int8) |
| name | str | Tensor name |
| padding | list of (start, end) tuples | Padding per dimension (numpy.pad format) |
| scale | float | Quantization scale |
| zero_point | int | Quantization zero-point |
| size | int | Size in bytes |
Quantize an input:
t = model.inputs()[0]
src = np.zeros(t.unpadded_shape, dtype=np.float32)
quant = np.round((src / t.scale) + t.zero_point).clip(-128, 127).astype(np.int8)
padded = np.pad(quant, t.padding, constant_values=t.zero_point)
Dequantize an output:
t = model.outputs()[0]
out = np.zeros(t.shape, dtype=np.int8)
# (populate out from instance.run())
depadded = out[tuple(slice(b, -e if e else None) for b, e in t.padding)]
dequant = (depadded.astype(np.float32) - t.zero_point) * t.scale
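Putting the two snippets together with run() gives a complete round trip for a single input and output tensor; a sketch, assuming the instance accepts the padded int8 buffers directly:
import numpy as np
import axelera.runtime as axr
with axr.Context() as ctx:
    conn = ctx.device_connect(ctx.list_devices()[0])
    model = ctx.load_model("build/yolov8n-coco/yolov8n-coco/1/model.json")
    instance = conn.load_model_instance(model, aipu_cores=4)
    tin, tout = model.inputs()[0], model.outputs()[0]
    src = np.random.rand(*tin.unpadded_shape).astype(np.float32)  # placeholder input
    quant = np.round((src / tin.scale) + tin.zero_point).clip(-128, 127).astype(np.int8)
    padded = np.pad(quant, tin.padding, constant_values=tin.zero_point)
    out = np.zeros(tout.shape, dtype=np.int8)
    instance.run([padded], [out])
    depadded = out[tuple(slice(b, -e if e else None) for b, e in tout.padding)]
    dequant = (depadded.astype(np.float32) - tout.zero_point) * tout.scale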
DeviceInfo
Returned by ctx.list_devices().
| Field | Type | Description |
|---|---|---|
| name | str | Device name, e.g. 'metis-0:3:0' |
| subdevice_count | int | Number of AIPU cores (4 for Metis) |
| board_type | BoardType | pcie, m2, devboard, sbc, etc. |
| firmware_version | str | Running firmware version |
| flashed_firmware_version | str | Version stored in flash |
| board_controller_firmware_version | str | Board controller firmware version |
| board_revision | int | Hardware board revision number |
| board_controller_board_type | str | Board controller's reported board type |
| max_memory | int | Maximum device memory in bytes. (Not populated in current implementation — always 0) |
| in_use | bool | Whether the device is currently reserved by a process. (Not populated in current implementation) |
| in_use_by | str | Name of the process holding the device. (Not populated in current implementation) |
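For example, to print a one-line summary of every connected board using the fields above:
import axelera.runtime as axr
with axr.Context() as ctx:
    for d in ctx.list_devices():
        print(f"{d.name}: {d.subdevice_count} cores, {d.board_type}, "
              f"firmware {d.firmware_version} (flashed {d.flashed_firmware_version})")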
Exceptions
| Exception | Description |
|---|---|
| axr.ConnectionError | Failed to connect to device |
| axr.DeviceInUse | Device is reserved by another process |
| axr.IncompatibleDevice | Model and device are incompatible |
| axr.InvalidArgument | Invalid parameter passed to an API call |
| axr.InvalidConfiguration | Device configuration is invalid |
| axr.InternalError | SDK internal error |
| axr.Pending | Async operation not yet complete |
| axr.UnknownError | Unclassified error |
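A typical guard around device acquisition, using the exception classes above (a sketch; the recovery actions are up to the application):
import axelera.runtime as axr
ctx = axr.Context()
try:
    device = ctx.list_devices()[0]
    conn = ctx.device_connect(device)
except axr.DeviceInUse:
    print("Device is already reserved by another process")
except axr.ConnectionError:
    print("Failed to connect to the device")
finally:
    ctx.release()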
See also
- InferenceStream API — higher-level Python API (recommended for most applications)
- axrunmodel — CLI tool built on the same runtime
- Model Formats — the model.json file this API loads