# inference.py
The SDK's command-line tool for running models on Metis hardware. It handles compilation, execution, display, and performance reporting.
## Basic usage

`./inference.py <model-name> <source> [options]`
| Argument | What it is | Example |
|---|---|---|
| `<model-name>` | A model from the Model Zoo | `yolov5s-v7-coco` |
| `<source>` | Where the video comes from (see Video Sources) | `usb:0`, `media/traffic1_1080p.mp4` |
## Examples

Run object detection on a USB camera:

`./inference.py yolov5s-v7-coco usb:0`

Run classification on a video file with no display:

`./inference.py resnet50-imagenet media/traffic1_1080p.mp4 --no-display`

Measure accuracy against a validation dataset:

`./inference.py yolov5s-v7-coco dataset --no-display`

Run on multiple sources simultaneously:

`./inference.py yolov8s-coco-onnx usb:0 usb:1 media/traffic1_1080p.mp4`
## What happens when you run it
- **First run only:** The pipeline compiler builds the model for your hardware. This takes a few minutes and shows a progress bar. The result is cached.
- **Every run:** The pipeline starts: pre-processing, inference on the AIPU, post-processing.
- **Display:** A window shows the video with results overlaid (bounding boxes for detection, labels for classification) and performance metrics.
- **On completion:** Average throughput and CPU usage are printed to the terminal.
If you pass a single image instead of a video, the window stays open until you press `q`. No end-of-run summary is printed.
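For instance, to inspect detections on a single frame (the image path here is hypothetical; substitute any still image on disk):

```bash
# Hypothetical image path; the window stays open until you press q.
./inference.py yolov5s-v7-coco media/example.jpg
```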
## Options

### Display
| Option | What it does |
|---|---|
| `--no-display` | Run headless: no window, just terminal output. Use for benchmarking or remote sessions. |
| `--display opengl` | Force the OpenGL renderer (the default if available; most efficient). |
| `--display opencv` | Force the OpenCV renderer (slower, but works on more systems). |
| `--display console` | Render to the terminal using ANSI colors. Useful over SSH. |
| `--window-size WxH` | Set the window size, e.g. `--window-size 1920x1080`. |
| `--window-size fullscreen` | Display fullscreen. |
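For example, to force the OpenCV renderer in a fixed-size window (a sketch assuming the two flags combine freely; both come from the table above):

```bash
# Combining display flags from the table above; combination is an assumption.
./inference.py yolov5s-v7-coco usb:0 --display opencv --window-size 1280x720
```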
### Performance
| Option | What it does |
|---|---|
| `--frames N` | Stop after `N` frames (across all sources). Default: run all frames. |
| `--aipu-cores N` | Use `N` AIPU cores (default: all available, typically 4). Useful for testing multi-model scenarios. |
| `--show-host-fps` | Display the host's FPS alongside the default metrics. |
| `--show-stream-timing` | Show latency and jitter information during the run. |
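A typical headless benchmark combines these with `--no-display`; for example (the frame count here is arbitrary):

```bash
# Run 1000 frames with no window, then read the throughput summary.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --frames 1000 --no-display
```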
### Pipeline
| Option | What it does |
|---|---|
| `--pipe gst` | Use the GStreamer pipeline (default). Runs on the AIPU. |
| `--pipe torch` | Use the PyTorch pipeline with ONNXRuntime. Runs on the CPU. |
| `--pipe torch-aipu` | Use the PyTorch pipeline with the model offloaded to the AIPU. |
The default `gst` pipeline is almost always what you want. The `torch` option is useful for comparing AIPU results against CPU-only execution, as in the sketch below.
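For instance, running the same clip through both pipelines back to back gives a CPU reference to compare against (a sketch; no extra options are assumed to be required):

```bash
# Default GStreamer pipeline: inference on the AIPU.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display
# PyTorch/ONNXRuntime pipeline: same model, CPU-only, for comparison.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --pipe torch --no-display
```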
### Hardware acceleration
| Option | What it does |
|---|---|
| `--enable-hardware-codec` | Prefer hardware video decoding. The default is software decoding, which gives better pipeline performance on most systems. |
| `--enable-vaapi` / `--disable-vaapi` | Control Intel VA-API acceleration for pre-processing. Auto-detected by default. |
| `--enable-opencl` / `--disable-opencl` | Control OpenCL acceleration for pre-processing. Auto-detected by default. |
| `--enable-opengl` / `--disable-opengl` | Control OpenGL for rendering. Auto-detected by default. |
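When debugging pre-processing or rendering differences, it can help to force the pure-software paths (a sketch; combining the three disable flags in one run is an assumption):

```bash
# Force software pre-processing and rendering to isolate
# acceleration-related differences. Flag combination is an assumption.
./inference.py yolov5s-v7-coco usb:0 --disable-vaapi --disable-opencl --disable-opengl
```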
### Output
| Option | What it does |
|---|---|
| `--save-output path.mp4` | Save the rendered output to an MP4 file. |
| `--save-output output%02d.mp4` | Save multiple streams separately (e.g., output00.mp4, output01.mp4). |
When saving output, all frames must be rendered. This may reduce system FPS as the pipeline waits for video encoding.
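For example, a two-camera run saved to per-stream files (the pattern follows the table above; the file names are illustrative):

```bash
# Writes output00.mp4 and output01.mp4, one file per input stream.
./inference.py yolov8s-coco-onnx usb:0 usb:1 --save-output output%02d.mp4
```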
## Output metrics
When a run completes, `inference.py` reports:
| Metric | What it means |
|---|---|
| System FPS | End-to-end throughput including all processing and display |
| Device FPS | Raw AIPU throughput (what the hardware can do) |
| CPU usage | How much host CPU the pipeline consumes |
| mAP | Accuracy score (only when using dataset source) |
## Full option list

For all available options:

`./inference.py --help`