Model Formats

When you run a model for the first time, the SDK compiles it for your Metis hardware. This page explains what that means, what files are involved, and what happens under the hood.


The three model representations

Format               File extension   What it is                                                             Where it comes from
ONNX                 .onnx            Standard neural network format                                         Exported from PyTorch, TensorFlow, etc.
Axelera model        .axmodel         Compiled binary for the Metis AIPU                                     Produced by the SDK compiler
Pipeline descriptor  model.json       Defines the full pipeline (pre-processing + model + post-processing)   Part of the ax_models/ package

The SDK works with all three, depending on the operation:

  • Running inference → needs .axmodel (compiled)
  • Measuring accuracy → can use ONNX directly (CPU) or .axmodel (AIPU)
  • Customizing a pipeline → edit model.json

ONNX

ONNX (Open Neural Network Exchange) is an open standard for neural network models. Most ML frameworks — PyTorch, TensorFlow, JAX — can export models to ONNX format.

ONNX models describe the network architecture and weights in a portable format. They are not optimized for any specific hardware.

The SDK uses ONNX as the input to its compiler. If you want to deploy a custom model on Metis, you start with an ONNX export.
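For example, a PyTorch model can be exported with torch.onnx.export. The sketch below uses a torchvision ResNet-50 purely as a placeholder; substitute your own model, input shape, and output path:

import torch
import torchvision

# Placeholder model -- substitute your own trained network
model = torchvision.models.resnet50(weights=None)
model.eval()

# Dummy input matching the model's expected shape (N, C, H, W)
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
)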

ONNX support

Not all ONNX operators are supported by the Metis AIPU. See the ONNX operator support reference for the full list.


Axelera model (.axmodel)

An .axmodel is the compiled, hardware-optimized version of an ONNX model. It runs only on Metis hardware.

The compiler does several things during this process:

  • Quantizes weights and activations to int8 (reducing memory and increasing speed)
  • Optimizes the computation graph for the AIPU's architecture
  • Fuses operations where possible (e.g. convolution + activation → single AIPU op)
  • Tiles the computation across the AIPU's processing cores

The compiled model runs significantly faster on the AIPU than the original ONNX model does on CPU, and uses less memory.
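
As a rough illustration of what int8 quantization means (a generic affine scheme, not necessarily the compiler's exact method), each float tensor is mapped to int8 using a scale and zero-point, and mapped back during dequantization:

import numpy as np

def quantize(x, scale, zero_point):
    # Affine quantization: float -> int8
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    # Inverse mapping: int8 -> approximate float
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([0.12, -0.53, 1.20], dtype=np.float32)
q = quantize(x, scale=0.01, zero_point=0)
print(q)                       # [ 12 -53 120]
print(dequantize(q, 0.01, 0))  # close to the original values

The scale and zero-point values the compiler chooses are stored in manifest.json (described below).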

Where compiled models are stored

Compiled models land in build/<model-name>/:

build/
└── yolov5s-v7-coco/
    └── yolov5s-v7-coco/
        └── 1/
            ├── model.json       # pipeline descriptor
            ├── model.axmodel    # compiled AIPU binary
            └── manifest.json    # quantization parameters (scales, zero-points)

If this directory exists, the model runs immediately. If it doesn't, the SDK compiles on first run (takes 2–5 minutes).
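
If you want to check ahead of time whether a compile will be triggered, you can look for the .axmodel on disk. A minimal sketch (the nested directory names follow the layout shown above and may differ for other models):

from pathlib import Path

name = "yolov5s-v7-coco"  # model name, as passed to inference.py
compiled = list(Path("build", name).glob("**/model.axmodel"))

if compiled:
    print(f"Already compiled: {compiled[0]}")
else:
    print("Not compiled yet -- first run will take a few minutes")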

Forcing recompilation

To recompile a model (e.g. after an SDK update):

rm -rf build/<model-name>
./inference.py <model-name> <source>

Pipeline descriptor (model.json)

model.json describes the complete inference pipeline for a model: input resolution, pre-processing steps, post-processing configuration, and where the compiled model lives.

This is what inference.py reads when you pass a model name. The ax_models/ directory contains model.json files for every model in the Model Zoo.

Example structure (simplified):

{
  "name": "yolov5s-v7-coco",
  "model": "yolov5s-v7-coco/1/model.axmodel",
  "preprocessing": [
    { "type": "resize", "width": 640, "height": 640, "letterbox": true },
    { "type": "normalize", "mean": [0, 0, 0], "std": [1, 1, 1] }
  ],
  "postprocessing": [
    { "type": "decode_yolov5", "classes": 80, "confidence_threshold": 0.25 },
    { "type": "nms", "iou_threshold": 0.45 }
  ]
}

You don't need to edit model.json for standard inference. You would edit it to:

  • Change the confidence threshold
  • Add or remove post-processing steps (e.g. add tracking)
  • Swap in a custom compiled model
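
For example, lowering the confidence threshold can be done in an editor, or scripted. A minimal sketch assuming the simplified structure shown above; adjust the path to the descriptor you actually want to change:

import json
from pathlib import Path

path = Path("build/yolov5s-v7-coco/yolov5s-v7-coco/1/model.json")  # adjust as needed
pipeline = json.loads(path.read_text())

# Lower the detection confidence threshold in the post-processing step
for step in pipeline.get("postprocessing", []):
    if "confidence_threshold" in step:
        step["confidence_threshold"] = 0.4

path.write_text(json.dumps(pipeline, indent=2))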

manifest.json

manifest.json is generated by the compiler alongside model.axmodel. It stores quantization metadata: the scale and zero-point values needed by the dequantization operators in the pipeline.

You won't typically edit this file. It's read automatically by operators like decode_yolov5 and transform_dequantize.


The compilation process

When a model hasn't been compiled yet, inference.py (and the accuracy measurement tools) run the compiler automatically:

$ ./inference.py yolov5s-v7-coco media/traffic.mp4

Compiling yolov5s-v7-coco for Metis...
[████████████████████] 100% (3m 42s)

Running pipeline...

The compiler (compile.py) can also be run directly for batch compilation or scripting:

python compile.py yolov5s-v7-coco

See Compiler CLI for full options.
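
For batch compilation, a simple loop over model names is enough. A sketch using subprocess (the model names here are examples; see the Model Zoo for the actual list):

import subprocess

models = ["yolov5s-v7-coco", "yolov5m-v7-coco"]  # example names

for name in models:
    # Equivalent to running: python compile.py <model-name>
    subprocess.run(["python", "compile.py", name], check=True)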


The PyTorch pipeline

For debugging and CPU-only testing, you can run models without AIPU hardware using --pipe torch:

./inference.py yolov5s-v7-coco media/traffic.mp4 --pipe torch

In this mode, the ONNX model runs on CPU via ONNXRuntime or PyTorch. No compilation required, but performance is much lower. Useful for verifying model behavior before deploying to hardware.
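
One way to turn this into an automated smoke test is to run the CPU pipeline from a script and fail if the process errors out. A sketch (exit-code check only; it does not inspect the detections themselves):

import subprocess

result = subprocess.run(
    ["./inference.py", "yolov5s-v7-coco", "media/traffic.mp4", "--pipe", "torch"],
    capture_output=True,
    text=True,
)

if result.returncode != 0:
    print("CPU pipeline failed:")
    print(result.stderr)
else:
    print("CPU pipeline ran successfully")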


See also