Model Zoo — Pre-trained Models
The Voyager Model Zoo is a collection of pre-trained AI models ready to run on Axelera hardware. When you run `./inference.py yolov5s-v7-coco usb:0`, the model name (`yolov5s-v7-coco`) comes from the Model Zoo.
Listing available models
From the SDK root directory (with environment activated):
```bash
make
```
This lists three categories:
| Category | What it contains |
|---|---|
| ZOO | Individual models — one model, one task |
| REFERENCE APPLICATION PIPELINES | Multi-model pipelines (e.g., detection cascaded into pose estimation) |
| TUTORIALS | Example models used by the tutorial documentation |
How model names work
Model names follow a pattern:
`<architecture>-<dataset>[-<variant>]`
Examples:
| Name | Architecture | Dataset | Notes |
|---|---|---|---|
| `yolov5s-v7-coco` | YOLOv5 small | COCO | v7 release of YOLOv5 |
| `yolov8s-coco-onnx` | YOLOv8 small | COCO | ONNX format |
| `resnet50-imagenet` | ResNet-50 | ImageNet | Classification model |
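Because the names are consistent, you can filter the model listing to find every variant of an architecture. A minimal sketch, assuming `make` prints the list as described above:

```bash
# List all YOLOv8 variants available in the Model Zoo
make | grep -i yolov8
```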
Task types
| Task | What it does | Example model |
|---|---|---|
| Object detection | Finds and labels objects with bounding boxes | `yolov5s-v7-coco` |
| Classification | Identifies what's in an image (single label) | `resnet50-imagenet` |
| Semantic segmentation | Labels every pixel by category | U-Net FCN (Cityscapes; see tables below) |
| Instance segmentation | Labels every pixel and distinguishes individual object instances | `yolov8sseg-coco-onnx` |
| Keypoint detection | Finds body joints and pose landmarks | `yolov8lpose-coco-onnx` |
| Depth estimation | Estimates distance of each pixel from camera | `fastdepth-nyuv2` |
| License plate recognition | Reads license plates | LPRNet (see tables below) |
| Face recognition | Identifies or verifies faces | FaceNet (see tables below) |
Running a model
```bash
# Object detection with USB camera
./inference.py yolov5s-v7-coco usb:0

# Classification with a video file
./inference.py resnet50-imagenet media/traffic1_1080p.mp4

# Headless benchmarking (no display)
./inference.py yolov8s-coco-onnx usb:0 --no-display --frames 1000
```
The first time you run a model, the SDK:
1. Downloads the pre-trained weights (if not cached)
2. Compiles the model for the AIPU
3. Caches the compiled model for subsequent runs
4. Runs inference

Subsequent runs skip steps 1-3 and start immediately.
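To see the caching in action, time two consecutive runs: the first includes download and compilation, the second starts almost immediately. A minimal sketch using the commands shown above:

```bash
# First run: downloads weights and compiles for the AIPU (can take several minutes)
time ./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 100

# Second run: reuses the cached compiled model, so startup is fast
time ./inference.py yolov5s-v7-coco media/traffic1_1080p.mp4 --no-display --frames 100
```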
Datasets
Models are trained on specific datasets. The dataset name in the model identifier tells you what the model can recognize:
| Dataset | What it contains | Typical use |
|---|---|---|
| COCO | 80 object categories (person, car, dog, etc.) | General object detection |
| ImageNet | 1000 image categories | Image classification |
| VOC | 20 object categories | Object detection (smaller set) |
Non-redistributable datasets
Most datasets download automatically when you first run a model. A few require manual registration and download due to licensing restrictions. Download these archives from the dataset providers yourself, then place them in the directory shown below (relative to your SDK installation). If a required dataset is missing, the SDK raises an error that includes the expected path.
| Dataset | Archive | Place in directory |
|---|---|---|
| Cityscapes (val) | gtFine_val.zip | data/cityscapes |
| Cityscapes (val) | leftImg8bit_val.zip | data/cityscapes |
| Cityscapes (test) | gtFine_test.zip | data/cityscapes |
| Cityscapes (test) | leftImg8bit_test.zip | data/cityscapes |
| ImageNet (train) | ILSVRC2012_devkit_t12.tar.gz | data/ImageNet |
| ImageNet (train) | ILSVRC2012_img_train.tar | data/ImageNet |
| ImageNet (val) | ILSVRC2012_devkit_t12.tar.gz | data/ImageNet |
| ImageNet (val) | ILSVRC2012_img_val.tar | data/ImageNet |
| WiderFace (train) | widerface_train.zip | data/widerface |
| WiderFace (val) | widerface_val.zip | data/widerface |
You are responsible for adhering to the terms and conditions of each dataset's license.
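For example, to stage the ImageNet validation set (assuming you downloaded the archives from the dataset provider into `~/Downloads`):

```bash
# Create the expected directory and move the archives into place
mkdir -p data/ImageNet
mv ~/Downloads/ILSVRC2012_devkit_t12.tar.gz data/ImageNet/
mv ~/Downloads/ILSVRC2012_img_val.tar data/ImageNet/
```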
Performance characteristics
The tables below list all Model Zoo models for this SDK release. Columns:
- Ref FP32 — accuracy of the original floating-point model
- Accuracy loss — FP32 accuracy minus quantized int8 accuracy (lower is better)
- Ref PCIe FPS — host throughput on Intel Core i9-13900K + Metis 1× PCIe card
- Ref M.2 FPS — host throughput on Intel Core i5-1145G7E + Metis 1× M.2 card
Accuracy is measured using:
```bash
./inference.py <model> dataset --pipe=torch-aipu --no-display
```
Throughput is measured using a 720p H.264 video file:
```bash
./inference.py <model> media/traffic2_720p.mp4 --pipe=gst --no-display
```
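To recover a model's quantized accuracy, subtract the accuracy loss from the FP32 reference. For example, ResNet-50 v1.5 below lists a Ref FP32 Top1 of 76.15 and an accuracy loss of 0.18:

```bash
# Quantized Top1 = Ref FP32 Top1 - accuracy loss
awk 'BEGIN { printf "%.2f\n", 76.15 - 0.18 }'   # prints 75.97
```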
Image Classification
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 Top1 | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| DenseNet-121 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 74.44 | 0.86 | 281 | 156 | BSD-3-Clause |
| EfficientNet-B0 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 77.67 | 1.12 | 1429 | 1450 | BSD-3-Clause |
| EfficientNet-B1 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 77.6 | 0.47 | 972 | 960 | BSD-3-Clause |
| EfficientNet-B2 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 77.79 | 0.46 | 903 | 863 | BSD-3-Clause |
| EfficientNet-B3 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 78.54 | 0.50 | 787 | 721 | BSD-3-Clause |
| EfficientNet-B4 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 79.27 | 0.71 | 576 | 436 | BSD-3-Clause |
| MobileNetV2 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 71.87 | 1.50 | 3670 | 3638 | BSD-3-Clause |
| MobileNetV4-small | 🔗 | 🔗 | 224x224 | ImageNet-1K | 73.74 | 5.07 | 4937 | 4807 | Apache 2.0 |
| MobileNetV4-medium | 🔗 | 🔗 | 224x224 | ImageNet-1K | 79.04 | 0.90 | 2517 | 2395 | Apache 2.0 |
| MobileNetV4-large | 🔗 | 🔗 | 384x384 | ImageNet-1K | 82.92 | 0.95 | 761 | 460 | Apache 2.0 |
| MobileNetV4-aa_large | 🔗 | 🔗 | 384x384 | ImageNet-1K | 83.22 | 1.96 | 667 | 391 | Apache 2.0 |
| SqueezeNet 1.0 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 58.1 | 2.80 | 953 | 811 | BSD-3-Clause |
| SqueezeNet 1.1 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 58.19 | 1.86 | 7298 | 7264 | BSD-3-Clause |
| Inception V3 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 69.85 | 0.25 | 1136 | 636 | BSD-3-Clause |
| RegNetX-1_6GF | 🔗 | 🔗 | 224x224 | ImageNet-1K | 79.33 | 0.22 | 695 | 369 | BSD-3-Clause |
| RegNetX-400MF | 🔗 | 🔗 | 224x224 | ImageNet-1K | 74.48 | 0.36 | 1199 | 636 | BSD-3-Clause |
| RegNetY-1_6GF | 🔗 | 🔗 | 224x224 | ImageNet-1K | 80.73 | 0.24 | 595 | 322 | BSD-3-Clause |
| RegNetY-400MF | 🔗 | 🔗 | 224x224 | ImageNet-1K | 75.63 | 0.13 | 1642 | 975 | BSD-3-Clause |
| ResNet-18 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 69.76 | 0.36 | 3904 | 3749 | BSD-3-Clause |
| ResNet-34 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 73.3 | 0.12 | 2282 | 2075 | BSD-3-Clause |
| ResNet-50 v1.5 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 76.15 | 0.18 | 1946 | 1756 | BSD-3-Clause |
| ResNet-101 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 77.37 | 0.79 | 1049 | 673 | BSD-3-Clause |
| ResNet-152 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 78.31 | 0.23 | 493 | 261 | BSD-3-Clause |
| ResNet-10t | 🔗 | 🔗 | 224x224 | ImageNet-1K | 68.22 | 1.06 | 5212 | 5015 | Apache 2.0 |
| ResNeXt50_32x4d | 🔗 | 🔗 | 224x224 | ImageNet-1K | 77.61 | 0.08 | 437 | 236 | BSD-3-Clause |
| Wide ResNet-50 | 🔗 | 🔗 | 224x224 | ImageNet-1K | 78.48 | 0.36 | 436 | 236 | BSD-3-Clause |
Object Detection
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 mAP | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| GELAN-s | 🔗 | 🔗 | 640x640 | COCO2017 | 46.41 | 2.99 | 376 | 237 | GPL-3.0 |
| GELAN-m | 🔗 | 🔗 | 640x640 | COCO2017 | 50.86 | 1.06 | 203 | 148 | GPL-3.0 |
| GELAN-c | 🔗 | 🔗 | 640x640 | COCO2017 | 52.3 | 0.49 | 199 | 144 | GPL-3.0 |
| RetinaFace - Resnet50 | 🔗 | 🔗 | 840x840 | WiderFace | 95.25 | 0.25 | 90 | 51 | MIT |
| RetinaFace - mb0.25 | 🔗 | 🔗 | 640x640 | WiderFace | 89.44 | 1.36 | 1020 | 774 | MIT |
| SSD-MobileNetV1 | 🔗 | 🔗 | 300x300 | COCO2017 | 24.77 | -0.05 | 3356 | 3019 | Apache 2.0 |
| SSD-MobileNetV2 | 🔗 | 🔗 | 300x300 | COCO2017 | 19.25 | 0.87 | 2261 | 2195 | Apache 2.0 |
| YOLOv3 | 🔗 | 🔗 | 640x640 | COCO2017 | 46.61 | 0.79 | 163 | 96 | AGPL-3.0 |
| YOLOv5s-Relu | 🔗 | 🔗 | 640x640 | COCO2017 | 35.09 | 0.52 | 785 | 536 | AGPL-3.0 |
| YOLOv5s-v5 | 🔗 | 🔗 | 640x640 | COCO2017 | 36.18 | 0.37 | 790 | 526 | AGPL-3.0 |
| YOLOv5n | 🔗 | 🔗 | 640x640 | COCO2017 | 27.72 | 0.87 | 1028 | 656 | AGPL-3.0 |
| YOLOv5s | 🔗 | 🔗 | 640x640 | COCO2017 | 37.25 | 0.80 | 865 | 824 | AGPL-3.0 |
| YOLOv5m | 🔗 | 🔗 | 640x640 | COCO2017 | 44.94 | 0.85 | 455 | 322 | AGPL-3.0 |
| YOLOv5l | 🔗 | 🔗 | 640x640 | COCO2017 | 48.67 | 0.84 | 299 | 204 | AGPL-3.0 |
| YOLOv7 | 🔗 | 🔗 | 640x640 | COCO2017 | 51.02 | 0.58 | 212 | 173 | GPL-3.0 |
| YOLOv7-tiny | 🔗 | 🔗 | 416x416 | COCO2017 | 33.12 | 0.49 | 1441 | 1110 | GPL-3.0 |
| YOLOv7 640x480 | 🔗 | 🔗 | 640x480 | COCO2017 | 50.78 | 0.52 | 242 | 164 | GPL-3.0 |
| YOLOv8n | 🔗 | 🔗 | 640x640 | COCO2017 | 37.12 | 1.18 | 834 | 764 | AGPL-3.0 |
| YOLOv8s | 🔗 | 🔗 | 640x640 | COCO2017 | 44.8 | 0.93 | 643 | 524 | AGPL-3.0 |
| YOLOv8m | 🔗 | 🔗 | 640x640 | COCO2017 | 50.16 | 1.32 | 242 | 177 | AGPL-3.0 |
| YOLOv8l | 🔗 | 🔗 | 640x640 | COCO2017 | 52.83 | 2.06 | 181 | 142 | AGPL-3.0 |
| YOLOv8n-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 48.73 | 5.68 | 269 | 162 | AGPL-3.0 |
| YOLOv8l-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 56.06 | 4.41 | 36 | 19 | AGPL-3.0 |
| YOLOX-s | 🔗 | 🔗 | 640x640 | COCO2017 | 39.24 | -0.81 | 642 | 423 | Apache-2.0 |
| YOLOX-m | 🔗 | 🔗 | 640x640 | COCO2017 | 46.26 | -0.37 | 349 | 268 | Apache-2.0 |
| YOLOX-x Human | 🔗 | 🔗 | 1440x800 | COCO2017 | 57.66 | 3.38 | 21 | - | MIT |
| YOLOv9t | 🔗 | 🔗 | 640x640 | COCO2017 | 37.81 | 1.25 | 415 | 247 | AGPL-3.0 |
| YOLOv9s | 🔗 | 🔗 | 640x640 | COCO2017 | 46.28 | 1.12 | 374 | 237 | AGPL-3.0 |
| YOLOv9m | 🔗 | 🔗 | 640x640 | COCO2017 | 51.24 | 2.29 | 203 | 148 | AGPL-3.0 |
| YOLOv9c | 🔗 | 🔗 | 640x640 | COCO2017 | 52.67 | 2.35 | 194 | 150 | AGPL-3.0 |
| YOLOv10n | 🔗 | 🔗 | 640x640 | COCO2017 | 38.08 | 0.74 | 738 | 561 | AGPL-3.0 |
| YOLOv10s | 🔗 | 🔗 | 640x640 | COCO2017 | 45.74 | 0.45 | 580 | 461 | AGPL-3.0 |
| YOLOv10b | 🔗 | 🔗 | 640x640 | COCO2017 | 51.79 | 0.45 | 251 | 217 | AGPL-3.0 |
| YOLO11n | 🔗 | 🔗 | 640x640 | COCO2017 | 39.17 | 0.71 | 759 | 574 | AGPL-3.0 |
| YOLO11s | 🔗 | 🔗 | 640x640 | COCO2017 | 46.54 | 0.55 | 565 | 426 | AGPL-3.0 |
| YOLO11m | 🔗 | 🔗 | 640x640 | COCO2017 | 51.31 | 0.55 | 269 | 196 | AGPL-3.0 |
| YOLO11l | 🔗 | 🔗 | 640x640 | COCO2017 | 53.23 | 0.49 | 183 | 125 | AGPL-3.0 |
| YOLO11x | 🔗 | 🔗 | 640x640 | COCO2017 | 54.67 | 0.58 | 53 | 31 | AGPL-3.0 |
| YOLO11n-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 50.01 | 1.07 | 250 | 172 | AGPL-3.0 |
| YOLO11l-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 56.41 | 1.08 | 36 | 20 | AGPL-3.0 |
| YOLO26n | 🔗 | 🔗 | 640x640 | COCO2017 | 40.18 | 1.95 | 662 | 487 | AGPL-3.0 |
| YOLO26s | 🔗 | 🔗 | 640x640 | COCO2017 | 47.66 | 2.05 | 498 | 396 | AGPL-3.0 |
| YOLO26m | 🔗 | 🔗 | 640x640 | COCO2017 | 52.45 | 2.14 | 258 | 192 | AGPL-3.0 |
| YOLO26l | 🔗 | 🔗 | 640x640 | COCO2017 | 54.11 | 2.03 | 179 | 122 | AGPL-3.0 |
| YOLO26x | 🔗 | 🔗 | 640x640 | COCO2017 | 56.92 | 2.43 | 53 | 31 | AGPL-3.0 |
| YOLO26n-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 49.41 | 3.12 | 206 | 139 | AGPL-3.0 |
| YOLO26s-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 54.02 | 2.01 | 167 | 114 | AGPL-3.0 |
| YOLO26m-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 56.66 | 1.72 | 58 | 33 | AGPL-3.0 |
| YOLO26l-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 57.35 | 1.05 | 34 | 19 | AGPL-3.0 |
| YOLO26x-obb | 🔗 | 🔗 | 1024x1024 | DOTAv1DetectionOBBDataset | 58.4 | 5.34 | 15 | - | AGPL-3.0 |
| YOLO-NAS S | 🔗 | 🔗 | 640x640 | COCO2017 | 47.06 | - | 450 | 318 | Apache-2.0 |
| YOLO-NAS M | 🔗 | 🔗 | 640x640 | COCO2017 | 51.0 | - | 285 | 221 | Apache-2.0 |
| YOLO-NAS L | 🔗 | 🔗 | 640x640 | COCO2017 | 51.79 | - | 157 | 96 | Apache-2.0 |
Semantic Segmentation
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 mIoU | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| U-Net FCN 256 | 🔗 | 🔗 | 256x256 | Cityscapes | 57.75 | 0.34 | 249 | 198 | Apache 2.0 |
| U-Net FCN 512 | 🔗 | - | 512x512 | Cityscapes | 66.62 | 0.01 | 34 | 19 | Apache 2.0 |
Instance Segmentation
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 mAP | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 29.98 | 0.92 | 639 | 433 | AGPL-3.0 |
| YOLOv8s-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 36.32 | 0.57 | 482 | 345 | AGPL-3.0 |
| YOLOv8m-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 40.39 | 0.65 | 198 | 156 | AGPL-3.0 |
| YOLOv8l-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 42.27 | 1.11 | 167 | 134 | AGPL-3.0 |
| YOLO11n-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 31.84 | 1.11 | 598 | 406 | AGPL-3.0 |
| YOLO11l-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 43.26 | 0.13 | 156 | 107 | AGPL-3.0 |
| YOLO26n-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 32.95 | 2.64 | 516 | 352 | AGPL-3.0 |
| YOLO26s-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 39.28 | 3.28 | 385 | 292 | AGPL-3.0 |
| YOLO26m-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 43.34 | 1.40 | 201 | 156 | AGPL-3.0 |
| YOLO26l-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 45.09 | 1.87 | 141 | 96 | AGPL-3.0 |
| YOLO26x-seg | 🔗 | 🔗 | 640x640 | COCO2017 | 46.54 | 2.00 | 46 | 28 | AGPL-3.0 |
Keypoint Detection
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 mAP | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 51.11 | 1.75 | 822 | 723 | AGPL-3.0 |
| YOLOv8s-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 60.65 | 2.98 | 592 | 471 | AGPL-3.0 |
| YOLOv8m-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 65.58 | 1.91 | 231 | 168 | AGPL-3.0 |
| YOLOv8l-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 68.39 | 1.47 | 186 | 145 | AGPL-3.0 |
| YOLO11n-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 51.15 | 3.23 | 759 | 532 | AGPL-3.0 |
| YOLO11l-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 67.44 | 3.14 | 179 | 122 | AGPL-3.0 |
| YOLO26n-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 57.66 | 6.54 | 658 | 450 | AGPL-3.0 |
| YOLO26s-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 63.61 | 5.12 | 467 | 359 | AGPL-3.0 |
| YOLO26m-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 69.54 | 4.83 | 235 | 166 | AGPL-3.0 |
| YOLO26l-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 71.05 | 3.02 | 174 | 120 | AGPL-3.0 |
| YOLO26x-pose | 🔗 | 🔗 | 640x640 | COCO2017 | 72.75 | 16.62 | 51 | 30 | AGPL-3.0 |
Depth Estimation
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 RMSE | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| FastDepth | 🔗 | 🔗 | 224x224 | NYUDepthV2 | 0.6574 | -0.0065 | 974 | 855 | MIT |
License Plate Recognition
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 WLA | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| LPRNet | 🔗 | - | 94x24 | LPRNetDataset | 89.4 | 1.90 | 10268 | 9335 | Apache-2.0 |
Image Enhancement (Super Resolution)
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 PSNR | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| Real-ESRGAN-x4plus | 🔗 | 🔗 | 128x128 | SuperResolutionCustomSet128x128 | 24.77 | - | - | - | BSD-3-Clause |
Face Recognition
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 top1_avg | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| FaceNet - InceptionResnetV1 | 🔗 | 🔗 | 160x160 | LFWTorchvisionPair | 98.35 | 0.00 | 1321 | 720 | MIT |
Re-Identification
| Model | ONNX | Repo | Resolution | Dataset | Ref FP32 mAP | Accuracy loss | Ref PCIe FPS | Ref M.2 FPS | Model license |
|---|---|---|---|---|---|---|---|---|---|
| OSNet x1_0 | 🔗 | 🔗 | 256x128 | Market1501ReIdDataset | 82.55 | 0.93 | 1732 | 1770 | Apache-2.0 |
| SBS50 | 🔗 | 🔗 | 384x128 | Market1501ReIdDataset | 89.02 | -0.16 | 666 | 405 | Apache-2.0 |
Large Language Models
For usage details see the LLM Inference guide.
| Model | Max context (tokens) | Required PCIe card RAM |
|---|---|---|
| microsoft/Phi-3-mini-4k-instruct | 512 | 4 GB |
| microsoft/Phi-3-mini-4k-instruct | 1024 | 16 GB |
| microsoft/Phi-3-mini-4k-instruct | 2048 | 16 GB |
| meta-llama/Llama-3.2-1B-Instruct | 1024 | 4 GB |
| meta-llama/Llama-3.2-3B-Instruct | 1024 | 4 GB |
| meta-llama/Llama-3.1-8B-Instruct | 1024 | 16 GB |
| Almawave/Velvet-2B | 1024 | 4 GB |
Experimenting with optimized input shapes
Most models are trained on square inputs (640×640), but real-world video is often rectangular (16:9). Standard pipelines pad the input ("letterboxing"), forcing the model to process empty pixels.
By switching to a rectangular input shape that matches your video's aspect ratio, you can often achieve significant speedups with minimal accuracy impact. This is especially effective for fixed-camera applications like surveillance or traffic monitoring.
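As a sanity check on the shapes in the table below: a 16:9 stream at 640 pixels wide has a native height of 360, and model input dimensions typically must be a multiple of the network stride (32 pixels for YOLO-family models), so the height rounds up to 384:

```bash
# Round the 16:9 height (640 * 9 / 16 = 360) up to the nearest multiple of 32
echo $(( (640 * 9 / 16 + 31) / 32 * 32 ))   # prints 384
```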
How to test
Export models with dynamic input shapes, then compare:
```bash
# Standard 640×640
./inference.py yolox-m-coco-onnx dataset --pipe=torch-aipu --no-display

# Rectangular 640×480
./inference.py yolox-m-coco-onnx-rect dataset --pipe=torch-aipu --no-display
```
Expected results
| Configuration | Input shape | Speedup | mAP impact | Best for |
|---|---|---|---|---|
| Standard | 640×640 | Baseline | Baseline | General purpose, diverse content |
| Optimized | 640×480 | +24% | −0.3% | Near-square content, balanced performance |
| Optimized | 640×384 | +47% | −2.0% | Landscape video (16:9), maximum throughput |
Custom weights
You can use your own trained weights with any model architecture. This involves updating the model's YAML configuration to point to your custom weight file. See Deploy Custom Weights for the full walkthrough.
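As an illustrative sketch only (the key name below is hypothetical; the actual schema is documented in the Deploy Custom Weights guide), the change amounts to pointing the model's YAML entry at your weight file:

```yaml
# Hypothetical field name for illustration only -- see Deploy Custom Weights
# for the actual YAML schema used by the SDK
weight_path: /path/to/my_custom_weights.pt
```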
See also
- First Inference — run your first model
- Measure Accuracy — benchmark model performance
- inference.py — full command reference
- Glossary — definitions of COCO, YOLO, mAP, and other terms