Model Formats
When you run a model for the first time, the SDK compiles it for your Metis hardware. This page explains what that means, what files are involved, and what happens under the hood.
The three model representations
| Format | File extension | What it is | Where it comes from |
|---|---|---|---|
| ONNX | .onnx | Standard neural network format | Exported from PyTorch, TensorFlow, etc. |
| Axelera model | .axmodel | Compiled binary for the Metis AIPU | Produced by the SDK compiler |
| Pipeline descriptor | model.json | Defines the full pipeline (pre-processing + model + post-processing) | Part of the ax_models/ package |
The SDK works with all three, depending on the operation:
- Running inference → needs .axmodel (compiled)
- Measuring accuracy → can use ONNX directly (CPU) or .axmodel (AIPU)
- Customizing a pipeline → edit model.json
ONNX
ONNX (Open Neural Network Exchange) is an open standard for neural network models. Most ML frameworks — PyTorch, TensorFlow, JAX — can export models to ONNX format.
ONNX models describe the network architecture and weights in a portable format. They are not optimized for any specific hardware.
The SDK uses ONNX as the input to its compiler. If you want to deploy a custom model on Metis, you start with an ONNX export.
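If you are starting from PyTorch, a minimal export sketch looks like this (the torchvision model, file name, and input shape are illustrative only, not part of the SDK or Model Zoo):

```python
# Minimal sketch: export a PyTorch model to ONNX for use as compiler input.
# The torchvision model, file name, and input shape are illustrative only.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",
    input_names=["images"],
    output_names=["output"],
    opset_version=13,
)
```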
Not all ONNX operators are supported by the Metis AIPU. See the ONNX operator support reference for the full list.
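To see which operator types a custom model actually uses before compiling, you can inspect the graph with the standard onnx package (a general-purpose check, not an SDK tool):

```python
# List the ONNX operator types used by a model, to compare against the
# Metis operator support table. Uses only the standard onnx package.
from collections import Counter

import onnx

model = onnx.load("resnet18.onnx")  # path from the export sketch above
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")
```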
Axelera model (.axmodel)
An .axmodel is the compiled, hardware-optimized version of an ONNX model. It runs only on Metis hardware.
The compiler does several things when producing an .axmodel:
- Quantizes weights and activations to int8 (reducing memory and increasing speed)
- Optimizes the computation graph for the AIPU's architecture
- Fuses operations where possible (e.g. convolution + activation → single AIPU op)
- Tiles the computation across the AIPU's processing cores
The compiled model runs significantly faster than the original ONNX on CPU, and uses less memory.
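As a rough illustration of the int8 step: affine quantization maps each float value to an 8-bit integer through a scale and zero-point. The sketch below shows the general scheme only; the compiler's actual per-tensor or per-channel parameters are derived during compilation and stored in manifest.json.

```python
# Illustrative int8 affine quantization and dequantization. The scale and
# zero-point values here are made up; the compiler derives real ones per model.
import numpy as np

def quantize(x, scale, zero_point):
    """Map float32 values onto the int8 range [-128, 127]."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from int8."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.2, 0.0, 0.37, 2.5], dtype=np.float32)
scale, zero_point = 0.02, 0
q = quantize(x, scale, zero_point)
print(q)                                 # int8 values
print(dequantize(q, scale, zero_point))  # approximate originals
```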
Where compiled models are stored
Compiled models land in build/<model-name>/:
```
build/
└── yolov5s-v7-coco/
    └── yolov5s-v7-coco/
        └── 1/
            ├── model.json       # pipeline descriptor
            ├── model.axmodel    # compiled AIPU binary
            └── manifest.json    # quantization parameters (scales, zero-points)
```
If this directory exists, the model runs immediately. If it doesn't, the SDK compiles the model on first run (takes 2–5 minutes).
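To check this from a script, an existence test on the build tree is enough. The helper below is a sketch that assumes the layout shown above; it is not an SDK API:

```python
# Check whether a model has already been compiled by looking for its
# .axmodel in the build/ tree shown above. Not an SDK API, just a sketch.
from pathlib import Path

def compiled_model_path(model_name, version="1"):
    """Return the Path to model.axmodel if it exists, otherwise None."""
    candidate = Path("build") / model_name / model_name / version / "model.axmodel"
    return candidate if candidate.exists() else None

path = compiled_model_path("yolov5s-v7-coco")
print(path if path else "not compiled yet; the SDK will compile on first run")
```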
Forcing recompilation
To recompile a model (e.g. after an SDK update):
```bash
rm -rf build/<model-name>
./inference.py <model-name> <source>
```
Pipeline descriptor (model.json)
model.json describes the complete inference pipeline for a model: input resolution, pre-processing steps, post-processing configuration, and where the compiled model lives.
This is what inference.py reads when you pass a model name. The ax_models/ directory contains model.json files for every model in the Model Zoo.
Example structure (simplified):
```json
{
  "name": "yolov5s-v7-coco",
  "model": "yolov5s-v7-coco/1/model.axmodel",
  "preprocessing": [
    { "type": "resize", "width": 640, "height": 640, "letterbox": true },
    { "type": "normalize", "mean": [0, 0, 0], "std": [1, 1, 1] }
  ],
  "postprocessing": [
    { "type": "decode_yolov5", "classes": 80, "confidence_threshold": 0.25 },
    { "type": "nms", "iou_threshold": 0.45 }
  ]
}
```
You don't need to edit model.json for standard inference. You would edit it to:
- Change the confidence threshold
- Add or remove post-processing steps (e.g. add tracking)
- Swap in a custom compiled model
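For example, a small script along these lines (not an official SDK utility; the path assumes the build layout shown earlier) could raise the confidence threshold of the decode step:

```python
# Sketch: edit a pipeline descriptor to raise the YOLOv5 decode confidence
# threshold. Not an SDK utility; the path assumes the build/ layout above.
import json
from pathlib import Path

descriptor = Path("build/yolov5s-v7-coco/yolov5s-v7-coco/1/model.json")
pipeline = json.loads(descriptor.read_text())

for step in pipeline["postprocessing"]:
    if step["type"] == "decode_yolov5":
        step["confidence_threshold"] = 0.5  # was 0.25 in the example above

descriptor.write_text(json.dumps(pipeline, indent=2))
```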
manifest.json
manifest.json is generated by the compiler alongside model.axmodel. It stores quantization metadata — the scale and zero-point values needed by the dequantization operators in the pipeline.
You won't typically edit this file. It's read automatically by operators like decode_yolov5 and transform_dequantize.
The compilation process
When a model hasn't been compiled yet, inference.py (and the accuracy measurement tools) run the compiler automatically:
```
$ ./inference.py yolov5s-v7-coco media/traffic.mp4
Compiling yolov5s-v7-coco for Metis...
[████████████████████] 100% (3m 42s)
Running pipeline...
```
The compiler (compile.py) can also be run directly for batch compilation or scripting:
```bash
python compile.py yolov5s-v7-coco
```
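For batch work, one approach is to loop over model names and invoke the compiler as a subprocess (the second model name below is illustrative; use names from the Model Zoo):

```python
# Sketch of batch compilation by calling compile.py as a subprocess.
# The model names are illustrative; substitute names from the Model Zoo.
import subprocess

MODELS = ["yolov5s-v7-coco", "yolov8n-coco"]

for name in MODELS:
    print(f"Compiling {name}...")
    subprocess.run(["python", "compile.py", name], check=True)
```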
See Compiler CLI for full options.
The PyTorch pipeline
For debugging and CPU-only testing, you can run models without AIPU hardware using --pipe torch:
```bash
./inference.py yolov5s-v7-coco media/traffic.mp4 --pipe torch
```
In this mode, the ONNX model runs on CPU via ONNXRuntime or PyTorch. No compilation required, but performance is much lower. Useful for verifying model behavior before deploying to hardware.
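For a standalone sanity check of an ONNX export outside the SDK entirely, ONNX Runtime on CPU can run the same file (this sketch is independent of --pipe torch; the file name and input shape are illustrative):

```python
# Standalone CPU sanity check of an ONNX model with ONNX Runtime.
# Independent of the SDK's torch pipe; file name and shape are illustrative.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])
```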
See also
- First Inference — run your first compiled model
- Model Zoo — all available pre-compiled models
- Pipelines — How Inference Works — how model.json fits into the pipeline
- GStreamer Operators — the operators that appear in model.json
- Glossary: compilation — short definition