
ONNX Operator Support

Which ONNX operators the Metis AIPU supports natively, and which run on the host CPU.

Auto-generated tables

The detailed per-operator constraint tables (attribute restrictions, dtype limits) are auto-generated and live in the SDK at docs/reference/onnx-opset{14,15,16,17}-support.md. This page provides the consolidated summary.


Support levels

| Level | Meaning |
| --- | --- |
| Supported | Fully accelerated on the AIPU with no restrictions |
| Constrained | Accelerated on the AIPU for specific configurations (see the per-opset tables for attribute/dtype restrictions) |
| Not supported | Falls back to the host CPU via the ONNX Runtime preamble/postamble |

A "Constrained" operator is still hardware-accelerated — it just has attribute limits (e.g. specific kernel sizes, data types, padding modes). If your model uses a constrained operator outside its supported configuration, that layer falls back to CPU.


Supported operators (opsets 14–17)

The following operators are supported or constrained across opsets 14–17. Operators added in later opsets are noted.

| Operator | Opset 14 | Opset 15 | Opset 16 | Opset 17 |
| --- | --- | --- | --- | --- |
| Add | Constrained | Constrained | Constrained | Constrained |
| AveragePool | Constrained | Constrained | Constrained | Constrained |
| BatchNormalization | Supported | Supported | Supported | Supported |
| Clip | Constrained | Constrained | Constrained | Constrained |
| Concat | Constrained | Constrained | Constrained | Constrained |
| Conv | Constrained | Constrained | Constrained | Constrained |
| ConvTranspose | Constrained | Constrained | Constrained | Constrained |
| Flatten | Constrained | Constrained | Constrained | Constrained |
| Gemm | Constrained | Constrained | Constrained | Constrained |
| GlobalAveragePool | Supported | Supported | Supported | Supported |
| GlobalMaxPool | Supported | Supported | Supported | Supported |
| HardSigmoid | Constrained | Constrained | Constrained | Constrained |
| HardSwish | Supported | Supported | Supported | Supported |
| LeakyRelu | Supported | Supported | Supported | Supported |
| MatMul | Constrained | Constrained | Constrained | Constrained |
| MaxPool | Constrained | Constrained | Constrained | Constrained |
| Mul | Constrained | Constrained | Constrained | Constrained |
| PRelu | Constrained | Constrained | Constrained | Constrained |
| Pad | Constrained | Constrained | Constrained | Constrained |
| Relu | Supported | Supported | Supported | Supported |
| Reshape | Constrained | Constrained | Constrained | Constrained |
| Resize | Constrained | Constrained | Constrained | Constrained |
| Selu | Constrained | Constrained | Constrained | Constrained |
| Sigmoid | Supported | Supported | Supported | Supported |
| Slice | Constrained | Constrained | Constrained | Constrained |
| Softmax | Constrained | Constrained | Constrained | Constrained |
| Split | Constrained | Constrained | Constrained | Constrained |
| Squeeze | Constrained | Constrained | Constrained | Constrained |
| Sub | Constrained | Constrained | Constrained | Constrained |
| Tanh | Supported | Supported | Supported | Supported |
| Transpose | Constrained | Constrained | Constrained | Constrained |
| Unsqueeze | n/a | Constrained | Constrained | Constrained |
| Gelu | n/a | n/a | Constrained | Constrained |
| GroupNormalization | n/a | n/a | n/a | Constrained |
| LayerNormalization | n/a | n/a | n/a | Constrained |
| LSTM | n/a | n/a | n/a | Constrained |
| Mish | n/a | n/a | n/a | Supported |
| NegativeLogLikelihoodLoss | n/a | n/a | n/a | Constrained |

Operators that fall back to CPU

Any operator not in the table above will be executed on the host CPU using ONNX Runtime, as part of the model's preamble or postamble sections. The compiler handles this automatically — you don't need to manually split the model.

Common CPU-fallback scenarios (a quick way to screen a model for these is sketched after the list):

  • Non-standard activation functions (e.g. Erf, Gelu in opset < 16)
  • Dynamic shape operations
  • String operations, sequence ops
  • Operators with data types not supported by the AIPU (e.g. FP32 in the core model path)
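
As a coarse first pass, you can compare the op types in your exported model against the opset-17 column of the table above. The sketch below is only an approximation: it ignores attribute and dtype constraints, so "Constrained" operators may still fall back even when they pass this check. The model path is a placeholder:

import onnx

# Operators from the opset-17 column of the table above
# (both "Supported" and "Constrained").
AIPU_OPS = {
    "Add", "AveragePool", "BatchNormalization", "Clip", "Concat", "Conv",
    "ConvTranspose", "Flatten", "Gelu", "Gemm", "GlobalAveragePool",
    "GlobalMaxPool", "GroupNormalization", "HardSigmoid", "HardSwish",
    "LSTM", "LayerNormalization", "LeakyRelu", "MatMul", "MaxPool", "Mish",
    "Mul", "NegativeLogLikelihoodLoss", "PRelu", "Pad", "Relu", "Reshape",
    "Resize", "Selu", "Sigmoid", "Slice", "Softmax", "Split", "Squeeze",
    "Sub", "Tanh", "Transpose", "Unsqueeze",
}

model = onnx.load("my-model.onnx")  # placeholder path
fallbacks = sorted({n.op_type for n in model.graph.node} - AIPU_OPS)
print("Likely CPU-fallback ops:", fallbacks or "none")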

Checking your model

The compiler will report which operators are accelerated and which fall back to CPU in the compilation output. An unsupported operator does not prevent compilation — it just means that layer runs on the host.

If a significant portion of your model falls back to CPU, the performance gap between AIPU and CPU inference narrows. Use --pipe=torch and --pipe=torch-aipu to measure how much of your model's compute is AIPU-accelerated:

# CPU baseline
./inference.py my-model dataset --no-display --pipe=torch

# AIPU with Python pipeline
./inference.py my-model dataset --no-display --pipe=torch-aipu

# Full GStreamer + AIPU (production)
./inference.py my-model dataset --no-display

The compiler defaults to opset 17 for PyTorch-to-ONNX export:

config = CompilerConfig(onnx_opset_version=17)

Opset 17 has the broadest operator support, including LayerNormalization, GroupNormalization, and LSTM on the AIPU.
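
If you export to ONNX yourself before compiling, pass the same opset explicitly so the exported operators line up with the opset-17 column above. A minimal sketch with a placeholder model; torch.onnx.export is standard PyTorch, nothing here is Metis-specific:

import torch

# Placeholder model; substitute your own nn.Module.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export with opset 17 to match the compiler default.
torch.onnx.export(model, dummy, "my-model.onnx", opset_version=17)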


See also