# inference.py
The SDK's command-line tool for running models on Metis hardware. It handles compilation, execution, display, and performance reporting.
## Basic usage

`./inference.py <model-name> <source> [options]`
| Argument | What it is | Example |
|---|---|---|
| `<model-name>` | A model from the Model Zoo | `yolov5s-v7-coco` |
| `<source>` | Where the video comes from (see Video Sources) | `usb:0`, `media/traffic1_1080p.mp4` |
## Examples

Run object detection on a USB camera:

`./inference.py yolov5s-v7-coco usb:0`

Run classification on a video file with no display:

`./inference.py resnet50-imagenet media/traffic1_1080p.mp4 --no-display`

Measure accuracy against a validation dataset:

`./inference.py yolov5s-v7-coco dataset --no-display`

Run on multiple sources simultaneously:

`./inference.py yolov8s-coco-onnx usb:0 usb:1 media/traffic1_1080p.mp4`
## What happens when you run it
- **First run only:** The pipeline compiler builds the model for your hardware. This takes a few minutes and shows a progress bar. The result is cached.
- **Every run:** The pipeline starts: pre-processing, inference on the AIPU, post-processing.
- **Display:** A window shows the video with results overlaid (bounding boxes for detection, labels for classification) and performance metrics.
- **On completion:** Average throughput and CPU usage are printed to the terminal.
If you pass a single image instead of a video, the window stays open until you press `q`. No end-of-run summary is printed.
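For instance, to inspect detections on a single frame (the image path here is hypothetical; substitute any still image on disk):

```bash
# Hypothetical image path; the window stays open until you press q.
./inference.py yolov5s-v7-coco media/example.jpg
```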
## Options

### Display
| Option | What it does |
|---|---|
| `--no-display` | Run headless: no window, just terminal output. Use for benchmarking or remote sessions. |
| `--display opengl` | Force the OpenGL renderer (the default if available; most efficient). |
| `--display opencv` | Force the OpenCV renderer (slower, but works on more systems). |
| `--display console` | Render to the terminal using ANSI colors. Useful over SSH. |
| `--window-size WxH` | Set the window size, e.g. `--window-size 1920x1080`. |
| `--window-size fullscreen` | Display fullscreen. |
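For example, to force the OpenCV renderer in a fixed-size window (a sketch assuming the two flags combine freely; both come from the table above):

```bash
# Combining display flags from the table above; combination is an assumption.
./inference.py yolov5s-v7-coco usb:0 --display opencv --window-size 1280x720
```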
### Performance
| Option | What it does |
|---|---|
| `--frames N` | Stop after `N` frames (across all sources). Default: run all frames. |
| `--aipu-cores N` | Use `N` AIPU cores (default: all available, typically 4). Useful for testing multi-model scenarios. |
| `--show-host-fps` | Display the host's FPS alongside the default metrics. |
| `--show-stream-timing` | Show latency and jitter information during the run. |
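A typical headless benchmark combines these with `--no-display`; for example (the frame count here is arbitrary):

```bash
# Run 1000 frames with no window, then read the throughput summary.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --frames 1000 --no-display
```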
### Pipeline
| Option | What it does |
|---|---|
| `--pipe gst` | Use the GStreamer pipeline (default). Runs on the AIPU. |
| `--pipe torch` | Use the PyTorch pipeline with ONNXRuntime. Runs on the CPU. |
| `--pipe torch-aipu` | Use the PyTorch pipeline with the model offloaded to the AIPU. |
The default `gst` pipeline is almost always what you want. The `torch` option is useful for comparing AIPU results against CPU-only execution, as in the sketch below.
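For instance, running the same clip through both pipelines back to back gives a CPU reference to compare against (a sketch; no extra options are assumed to be required):

```bash
# Default GStreamer pipeline: inference on the AIPU.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display
# PyTorch/ONNXRuntime pipeline: same model, CPU-only, for comparison.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --pipe torch --no-display
```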
### Hardware acceleration
| Option | What it does |
|---|---|
| `--enable-hardware-codec` | Prefer hardware video decoding. The default is software decoding, which gives better pipeline performance on most systems. |
| `--enable-vaapi` / `--disable-vaapi` | Control Intel VA-API acceleration for pre-processing. Auto-detected by default. |
| `--enable-opencl` / `--disable-opencl` | Control OpenCL acceleration for pre-processing. Auto-detected by default. |
| `--enable-opengl` / `--disable-opengl` | Control OpenGL for rendering. Auto-detected by default. |
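When debugging pre-processing or rendering differences, it can help to force the pure-software paths (a sketch; combining the three disable flags in one run is an assumption):

```bash
# Force software pre-processing and rendering to isolate
# acceleration-related differences. Flag combination is an assumption.
./inference.py yolov5s-v7-coco usb:0 --disable-vaapi --disable-opencl --disable-opengl
```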
### Output
| Option | What it does |
|---|---|
| `--save-output path.mp4` | Save the rendered output to an MP4 file. |
| `--save-output output%02d.mp4` | Save multiple streams separately (e.g., output00.mp4, output01.mp4). |
When saving output, all frames must be rendered. This may reduce system FPS as the pipeline waits for video encoding.
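For example, a two-camera run saved to per-stream files (the pattern follows the table above; the file names are illustrative):

```bash
# Writes output00.mp4 and output01.mp4, one file per input stream.
./inference.py yolov8s-coco-onnx usb:0 usb:1 --save-output output%02d.mp4
```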
## Output metrics
When a run completes, `inference.py` reports:
| Metric | What it means |
|---|---|
| System FPS | End-to-end throughput including all processing and display |
| Device FPS | Raw AIPU throughput (what the hardware can do) |
| CPU usage | How much host CPU the pipeline consumes |
| mAP | Accuracy score (only when using dataset source) |
## Full option list

For all available options:

`./inference.py --help`