axelera.runtime API

The low-level Python API for loading compiled models and running them directly on the AIPU. This gives you fine-grained control over device selection, buffer management, and multi-instance execution.

For most applications, the InferenceStream API is simpler and handles all of this automatically. Use axelera.runtime when you need direct tensor-level access to the hardware.

import axelera.runtime as axr

Typical usage

import numpy as np
import axelera.runtime as axr

with axr.Context() as ctx:
    devices = ctx.list_devices()
    conn = ctx.device_connect(devices[0])

    model = ctx.load_model("build/yolov8n-coco/yolov8n-coco/1/model.json")
    instance = conn.load_model_instance(model, aipu_cores=4, double_buffer=True)

    # Prepare inputs and outputs as numpy arrays
    inputs = [np.zeros(t.unpadded_shape, dtype=np.float32) for t in model.inputs()]
    outputs = [np.zeros(t.shape, dtype=np.int8) for t in model.outputs()]

    instance.run(inputs, outputs)

Context

The root object. Manages all devices and resources. Use as a context manager or call release() explicitly.

ctx = axr.Context()
# or:
with axr.Context() as ctx:
    ...

Methods

| Method | Returns | Description |
|---|---|---|
| list_devices() | list[DeviceInfo] | Enumerate all connected Metis devices |
| device_connect(device, num_sub_devices=1) | Connection | Reserve a device (or sub-devices). Returns a Connection for loading models. |
| load_model(path) | Model | Load a compiled model.json file. The same Model can be loaded onto multiple connections. |
| configure_device(device, **kwargs) | bool | Apply a configuration property to a device. Returns True when complete, False if pending (poll with device_ready()). |
| device_ready(device) | bool | Check whether a configure_device() call has completed |
| read_device_configuration(device) | dict[str, str] | Read all current configuration properties |
| release() | None | Release all objects. Called automatically when used as a context manager. |

configure_device properties

| Property | Default | Description |
|---|---|---|
| clock_profile | 800 | Device clock in MHz |
| clock_profile_core_0 … clock_profile_core_3 | 800 | Per-core clock in MHz |
| mvm_utilisation_core_0 … mvm_utilisation_core_3 | 100 | Per-core MVM utilization limit (%) |
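Because configure_device() may return False while the change is still pending, call sites typically poll device_ready(). A small generic polling helper keeps this tidy; wait_until_ready and its parameters are illustrative names, not part of the API:

```python
import time

def wait_until_ready(is_ready, timeout_s=5.0, poll_interval_s=0.05):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_interval_s)
    return False

# Usage with a connected context (assumes ctx and device from list_devices()):
# if not ctx.configure_device(device, clock_profile=400):
#     wait_until_ready(lambda: ctx.device_ready(device))
```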

Connection

A reserved connection to a device (or sub-device group). Created by ctx.device_connect().

Methods

| Method | Returns | Description |
|---|---|---|
| load_model_instance(model, **kwargs) | ModelInstance | Load a compiled model onto this connection for execution |

load_model_instance kwargs

| Property | Default | Description |
|---|---|---|
| aipu_cores | 0 | Number of AIPU cores / L2 resources. Set to the model's batch size. |
| num_sub_devices | 0 | Number of sub-devices. Set to the model's batch size. |
| double_buffer | 0 | Enable double buffering for higher throughput |
| input_dmabuf | 0 | Inputs are DMA buffer file descriptors instead of numpy arrays |
| output_dmabuf | 0 | Outputs are DMA buffer file descriptors instead of numpy arrays |
| device_profiling | 0 | Enable device-side profiling |
| host_profiling | 0 | Enable host-side profiling |
| elf_in_ddr | 1 | True if the model was compiled with elf_in_ddr=True (the default) |

Model

Represents a loaded compiled model. Created by ctx.load_model().

Methods

| Method | Returns | Description |
|---|---|---|
| inputs() | list[TensorInfo] | Input tensor metadata |
| outputs() | list[TensorInfo] | Output tensor metadata |

Properties

| Property | Description |
|---|---|
| preamble_graph | Path to preamble ONNX file (host-executed prefix operations, if any) |
| postamble_graph | Path to postamble ONNX file (host-executed suffix operations, if any) |
| input_tensor_layout | Always NHWC in this version |

ModelInstance

A model loaded onto a specific device connection. Created by conn.load_model_instance().

Methods

| Method | Returns | Description |
|---|---|---|
| run(inputs, outputs) | None | Execute one inference step |

inputs and outputs are lists of numpy arrays (or DMA buffer file descriptors when the corresponding dmabuf mode is enabled). Shapes must match those reported by model.inputs() / model.outputs(). run() raises an exception on failure.
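The output-allocation step can be folded into a small convenience wrapper that sizes buffers from the model's tensor metadata. run_one is a hypothetical helper, not part of axelera.runtime; it only assumes the outputs() method and the shape/dtype fields documented on this page:

```python
import numpy as np

def run_one(instance, model, inputs):
    """Execute one inference step, allocating output buffers from the
    model's tensor metadata, and return the raw (quantized, padded) outputs."""
    outputs = [np.zeros(t.shape, dtype=t.dtype) for t in model.outputs()]
    instance.run(inputs, outputs)
    return outputs
```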


TensorInfo

Describes one input or output tensor, including quantization and padding metadata.

| Field | Type | Description |
|---|---|---|
| shape | tuple | Full tensor shape (including padding) |
| unpadded_shape | tuple | Shape without padding |
| dtype | np.dtype | Data type (default: np.int8) |
| name | str | Tensor name |
| padding | list of (start, end) tuples | Padding per dimension (numpy.pad format) |
| scale | float | Quantization scale |
| zero_point | int | Quantization zero point |
| size | int | Size in bytes |

Quantize an input:

t = model.inputs()[0]
src = np.zeros(t.unpadded_shape, dtype=np.float32)
quant = np.round((src / t.scale) + t.zero_point).clip(-128, 127).astype(np.int8)
padded = np.pad(quant, t.padding, constant_values=t.zero_point)

Dequantize an output:

t = model.outputs()[0]
out = np.zeros(t.shape, dtype=np.int8)
# (populate out from instance.run())
depadded = out[tuple(slice(b, -e if e else None) for b, e in t.padding)]
dequant = (depadded.astype(np.float32) - t.zero_point) * t.scale
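The two snippets above pair naturally into reusable helpers. These are illustrative functions, not part of the API; they work with any object exposing the scale, zero_point, and padding fields documented above, such as a TensorInfo:

```python
import numpy as np

def quantize(t, arr):
    """float32 -> padded int8 using the tensor's quantization parameters."""
    q = np.round(arr / t.scale + t.zero_point).clip(-128, 127).astype(np.int8)
    return np.pad(q, t.padding, constant_values=t.zero_point)

def dequantize(t, arr):
    """Padded int8 -> float32, stripping the padding first."""
    unpadded = arr[tuple(slice(b, -e if e else None) for b, e in t.padding)]
    return (unpadded.astype(np.float32) - t.zero_point) * t.scale
```

A round trip dequantize(t, quantize(t, x)) recovers x up to quantization error, at most half a scale step for in-range values.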

DeviceInfo

Returned by ctx.list_devices().

| Field | Type | Description |
|---|---|---|
| name | str | Device name, e.g. 'metis-0:3:0' |
| subdevice_count | int | Number of AIPU cores (4 for Metis) |
| board_type | BoardType | pcie, m2, devboard, sbc, etc. |
| firmware_version | str | Running firmware version |
| flashed_firmware_version | str | Firmware version stored in flash |
| board_controller_firmware_version | str | Board controller firmware version |
| board_revision | int | Hardware board revision number |
| board_controller_board_type | str | Board type reported by the board controller |
| max_memory | int | Maximum device memory in bytes (not populated in the current implementation; always 0) |
| in_use | bool | Whether the device is currently reserved by a process (not populated in the current implementation) |
| in_use_by | str | Name of the process holding the device (not populated in the current implementation) |
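On multi-board hosts it is common to select a device by one of these fields before connecting. pick_device is a hypothetical helper; it only assumes the board_type field documented above:

```python
def pick_device(devices, board_type=None):
    """Return the first device, or the first whose board_type matches."""
    for d in devices:
        if board_type is None or d.board_type == board_type:
            return d
    raise RuntimeError(f"no device with board_type={board_type!r} found")

# Usage (assumes a ctx as in the earlier examples):
# device = pick_device(ctx.list_devices(), board_type=axr.BoardType.pcie)
```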

Exceptions

| Exception | Description |
|---|---|
| axr.ConnectionError | Failed to connect to the device |
| axr.DeviceInUse | Device is reserved by another process |
| axr.IncompatibleDevice | Model and device are incompatible |
| axr.InvalidArgument | Invalid parameter passed to an API call |
| axr.InvalidConfiguration | Device configuration is invalid |
| axr.InternalError | SDK internal error |
| axr.Pending | Async operation not yet complete |
| axr.UnknownError | Unclassified error |
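Since axr.DeviceInUse only means another process currently holds the device, retrying the connection is a common pattern. connect_with_retry is a generic sketch (the attempt count and delay are illustrative); the exception class is passed in so the helper stays decoupled from the SDK:

```python
import time

def connect_with_retry(connect_fn, retriable_excs, attempts=5, delay_s=1.0):
    """Call connect_fn(), retrying while it raises a retriable exception."""
    for attempt in range(attempts):
        try:
            return connect_fn()
        except retriable_excs:
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)

# Usage (assumes ctx and device as in the earlier examples):
# conn = connect_with_retry(lambda: ctx.device_connect(device), axr.DeviceInUse)
```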

See also