inference.py

The SDK's command-line tool for running models on Metis hardware. It handles compilation, execution, display, and performance reporting.

Basic usage

./inference.py <model-name> <source> [options]
| Argument | What it is | Example |
| --- | --- | --- |
| <model-name> | A model from the Model Zoo | yolov5s-v7-coco |
| <source> | Where the video comes from (see Video Sources) | usb:0, media/traffic1_1080p.mp4 |

Examples

Run object detection on a USB camera:

./inference.py yolov5s-v7-coco usb:0

Run classification on a video file with no display:

./inference.py resnet50-imagenet media/traffic1_1080p.mp4 --no-display

Measure accuracy against a validation dataset:

./inference.py yolov5s-v7-coco dataset --no-display

Run on multiple sources simultaneously:

./inference.py yolov8s-coco-onnx usb:0 usb:1 media/traffic1_1080p.mp4

What happens when you run it

  1. First run only: The pipeline compiler builds the model for your hardware. This takes a few minutes and shows a progress bar. The result is cached (see the example after this list).
  2. Every run: The pipeline starts — pre-processing, inference on the AIPU, post-processing.
  3. Display: A window shows the video with results overlaid (bounding boxes for detection, labels for classification) and performance metrics.
  4. On completion: Average throughput and CPU usage are printed to the terminal.
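
Because the compiled model is cached, repeating a command skips straight to execution:

# First run: compiles the model (a few minutes, with a progress bar), then runs.
./inference.py yolov5s-v7-coco usb:0

# Any later run of the same model reuses the cached build and starts immediately.
./inference.py yolov5s-v7-coco usb:0
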
Single images

If you pass a single image instead of a video, the window stays open until you press q. No end-of-run summary is printed.
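
For example (the image path here is illustrative, not a file shipped with the SDK):

./inference.py yolov5s-v7-coco media/example.jpg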

Options

Display

| Option | What it does |
| --- | --- |
| --no-display | Run headless: no window, just terminal output. Use for benchmarking or remote sessions. |
| --display opengl | Force the OpenGL renderer (the default when available; most efficient). |
| --display opencv | Force the OpenCV renderer (slower, but works on more systems). |
| --display console | Render to the terminal using ANSI colors. Useful over SSH. |
| --window-size WxH | Set the window size, e.g., --window-size 1920x1080. |
| --window-size fullscreen | Display fullscreen. |
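
Display options combine with any source. For instance, to force the OpenCV renderer in a 1280x720 window, reusing the model and camera from the earlier examples:

./inference.py yolov5s-v7-coco usb:0 --display opencv --window-size 1280x720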

Performance

| Option | What it does |
| --- | --- |
| --frames N | Stop after N frames (across all sources). Default: run all frames. |
| --aipu-cores N | Use N AIPU cores (default: all available, typically 4). Useful for testing multi-model scenarios. |
| --show-host-fps | Display host-specific FPS alongside the default metrics. |
| --show-stream-timing | Show latency and jitter information during the run. |
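
For example, a headless benchmark over a fixed frame count with timing detail (the flag combination is illustrative):

./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 500 --show-stream-timing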

Pipeline

| Option | What it does |
| --- | --- |
| --pipe gst | Use the GStreamer pipeline (default). Runs on the AIPU. |
| --pipe torch | Use the PyTorch pipeline with ONNX Runtime. Runs on the CPU. |
| --pipe torch-aipu | Use the PyTorch pipeline with the model offloaded to the AIPU. |
Tip

The default gst pipeline is almost always what you want. The torch option is useful for comparing AIPU results against CPU-only execution.
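
One way to make that comparison is to run the same clip through both pipelines with otherwise identical flags:

# AIPU, via the default GStreamer pipeline.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 300

# CPU only, via the PyTorch pipeline with ONNX Runtime.
./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 300 --pipe torch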

Hardware acceleration

| Option | What it does |
| --- | --- |
| --enable-hardware-codec | Prefer hardware video decoding. The default uses software decoding, which gives better pipeline performance on most systems. |
| --enable-vaapi / --disable-vaapi | Control Intel VA-API acceleration for pre-processing. Auto-detected by default. |
| --enable-opencl / --disable-opencl | Control OpenCL acceleration for pre-processing. Auto-detected by default. |
| --enable-opengl / --disable-opengl | Control OpenGL for rendering. Auto-detected by default. |
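
For example, to try hardware video decoding while ruling out OpenCL as a variable (a diagnostic combination, not a recommendation):

./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --enable-hardware-codec --disable-opencl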

Output

| Option | What it does |
| --- | --- |
| --save-output path.mp4 | Save the rendered output to an MP4 file. |
| --save-output output%02d.mp4 | Save multiple streams to separate files (e.g., output00.mp4, output01.mp4). |
Note

When saving output, all frames must be rendered. This may reduce system FPS as the pipeline waits for video encoding.
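
For example, to record the earlier multi-source run with one file per stream:

./inference.py yolov8s-coco-onnx usb:0 usb:1 media/traffic1_1080p.mp4 --save-output output%02d.mp4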

Output metrics

When a run completes, inference.py reports:

| Metric | What it means |
| --- | --- |
| System FPS | End-to-end throughput, including all processing and display |
| Device FPS | Raw AIPU throughput (what the hardware alone can do) |
| CPU usage | How much host CPU the pipeline consumes |
| mAP | Accuracy score (reported only when using the dataset source) |
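
To keep the end-of-run summary for later comparison, capture the terminal output with standard shell tools (tee is not part of inference.py):

./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 1000 | tee run.log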

Full option list

For all available options:

./inference.py --help