
GStreamer Operators

The Voyager SDK ships a set of built-in GStreamer operators that cover the most common pre-processing, post-processing, and utility tasks. These operators appear in the pipeline YAML files inside ax_models/ and can be combined to build custom pipelines.

See Pipelines — How Inference Works for an overview of how operators fit into a pipeline.


How operators are specified

Each operator entry in a pipeline YAML has three core fields (axinplace operators may also set a mode field, as shown in the examples below):

```yaml
- instance: axtransform           # GStreamer element type
  lib: libtransform_resize.so     # shared library implementing the operator
  options: width:640;height:640   # semicolon-separated key:value pairs
```

Lists within options use commas: mean:0.485,0.456,0.406
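As a minimal sketch, the options string could be parsed like this (the helper name is illustrative, not part of the SDK):

```python
def parse_options(options: str) -> dict:
    """Parse a Voyager-style options string into a dict.
    Pairs are separated by ';', keys from values by ':', list items by ','."""
    result = {}
    for pair in options.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition(":")
        items = value.split(",")
        # Single values stay scalar strings; comma-separated values become lists.
        result[key] = items[0] if len(items) == 1 else items
    return result
```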


Pre-processing operators

These operators prepare each video frame before it reaches the AI model.

transform_resize

Resizes a video frame. Supports letterboxing (preserving aspect ratio with padding) or stretching.

| Option | Type | Default | Description |
|---|---|---|---|
| width | int | | Output width in pixels |
| height | int | | Output height in pixels |
| size | int | | Alternative to width/height: scales the shorter edge to this value |
| letterbox | int | 1 | 1 = preserve aspect ratio with padding; 0 = stretch to fit |
| padding | int | | Pixel fill value for letterbox padding (e.g. 114 for gray) |
| to_tensor | int | 0 | 1 = output as NHWC tensor instead of video frame (requires RGBA input) |

```yaml
- instance: axtransform
  lib: libtransform_resize.so
  options: width:640;height:640;padding:114;letterbox:1
```
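The letterbox geometry reduces to a small calculation. The following sketch shows one common way to derive the scaled size and per-side padding; the exact rounding the operator uses is an assumption:

```python
def letterbox_geometry(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_left, pad_right, pad_top, pad_bottom)
    for an aspect-preserving resize into a dst_w x dst_h canvas."""
    scale = min(dst_w / src_w, dst_h / src_h)  # fit the longer relative edge
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst_w - new_w, dst_h - new_h
    # Split the leftover pixels between the two sides.
    return (new_w, new_h,
            pad_x // 2, pad_x - pad_x // 2,
            pad_y // 2, pad_y - pad_y // 2)
```

For example, a 1280x720 frame letterboxed into 640x640 scales to 640x360 and pads 140 rows above and below.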

transform_resizeratiocropexcess

Resizes while keeping aspect ratio, then crops excess. Common for classification models that expect a square input.

| Option | Type | Description |
|---|---|---|
| resize_size | int | Resize so the shorter edge equals this value |
| final_size_after_crop | int | Optional: center-crop to this square size after resize |

```yaml
- instance: axtransform
  lib: libtransform_resizeratiocropexcess.so
  options: resize_size:256;final_size_after_crop:224
```
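A sketch of the geometry, assuming the conventional shorter-edge resize followed by a center crop (rounding details are an assumption):

```python
def resize_then_crop_geometry(src_w, src_h, resize_size, final_size):
    """Return the resized frame size and the center-crop box (l, t, r, b)."""
    # Scale so the shorter edge equals resize_size.
    scale = resize_size / min(src_w, src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Center-crop a final_size x final_size square from the resized frame.
    left = (new_w - final_size) // 2
    top = (new_h - final_size) // 2
    return (new_w, new_h), (left, top, left + final_size, top + final_size)
```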

transform_cropresize

Crops a region of interest (ROI) from the frame and resizes it. Used in cascaded pipelines where a detector's output feeds a classifier.

| Option | Type | Description |
|---|---|---|
| meta_key | string | Key in the metadata map containing bounding box ROIs |
| width | int | Output width |
| height | int | Output height |
| respect_aspectratio | int | 1 = preserve aspect ratio and crop excess; 0 = stretch |

```yaml
- instance: axtransform
  lib: libtransform_cropresize.so
  options: meta_key:detections;width:224;height:224;respect_aspectratio:1
```

transform_totensor

Converts a video frame to a tensor for model input.

| Option | Type | Description |
|---|---|---|
| type | string | int8 — NHWC format, copies uint8 values; float32 — NCHW format, normalizes to [0, 1] |

```yaml
- instance: axtransform
  lib: libtransform_totensor.so
  options: type:int8
```
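The two modes can be sketched in numpy for a single HWC uint8 frame (an illustration of the described layouts, not the operator's implementation):

```python
import numpy as np

def to_tensor_sketch(frame: np.ndarray, out_type: str) -> np.ndarray:
    """frame: HWC uint8 image. Returns a batched tensor per the 'type' option."""
    if out_type == "int8":
        # NHWC, raw uint8 values copied through unchanged.
        return frame[np.newaxis, ...]
    # float32: NHWC -> NCHW, values normalized to [0, 1].
    return frame[np.newaxis, ...].transpose(0, 3, 1, 2).astype(np.float32) / 255.0
```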

inplace_normalize

Normalizes a tensor by applying mean/std scaling and optional quantization. Used to prepare float inputs for int8 quantized models.

Formula: x = (x - mean) / std, then optionally quantize.

| Option | Type | Description |
|---|---|---|
| mean | float list | Per-channel mean values (comma-separated) |
| std | float list | Per-channel standard deviation values (comma-separated) |
| quant_scale | float | Quantization scale parameter |
| quant_zeropoint | float | Quantization zero-point parameter |
| simd | string | SIMD acceleration: avx2 or avx512 (int8 only) |

```yaml
- instance: axinplace
  lib: libinplace_normalize.so
  mode: write
  options: mean:0.485,0.456,0.406;std:0.229,0.224,0.225;quant_scale:0.01863;quant_zeropoint:-14
```
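The normalize-then-quantize arithmetic can be sketched in numpy (assuming the standard affine quantization q = x / scale + zero_point, the inverse of the dequantize formula given below for transform_dequantize):

```python
import numpy as np

def normalize_quantize(x, mean, std, quant_scale, quant_zeropoint):
    """x: float array in [0, 1], channels-last. Returns int8 values."""
    y = (x - np.asarray(mean)) / np.asarray(std)     # x = (x - mean) / std
    q = np.round(y / quant_scale + quant_zeropoint)  # affine quantization
    return np.clip(q, -128, 127).astype(np.int8)     # clamp to int8 range
```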

transform_dequantize

Converts a quantized int8 tensor to float32. Formula: y = scale × (x − zero_point).

| Option | Type | Description |
|---|---|---|
| dequant_scale | float list | Dequantisation scale per tensor (comma-separated) |
| dequant_zeropoint | int list | Dequantisation zero-point per tensor (comma-separated) |
| transpose | int | 1 = transpose NHWC → NCHW; 0 = leave as-is |

```yaml
- instance: axtransform
  lib: libtransform_dequantize.so
  options: dequant_scale:0.1304;dequant_zeropoint:-70
```
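The stated formula maps directly to numpy; this sketch just illustrates the arithmetic and the optional layout change:

```python
import numpy as np

def dequantize_sketch(x_int8, scale, zero_point, transpose=False):
    """Apply y = scale * (x - zero_point) to an int8 NHWC tensor."""
    y = scale * (x_int8.astype(np.float32) - zero_point)
    if transpose:
        y = y.transpose(0, 3, 1, 2)  # NHWC -> NCHW
    return y
```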

transform_padding

Adds padding to a tensor. Padding values are available in manifest.json after model compilation.

| Option | Type | Description |
|---|---|---|
| padding | int list | Padding before and after each dimension, comma-separated (length = 2 × number of dimensions) |
| fill | int | Fill value for padded bytes (typically the quantization zero-point) |
| input_shape | int list | Optional reshape before padding |
| output_shape | int list | Optional reshape after padding |

```yaml
- instance: axtransform
  lib: libtransform_padding.so
  options: padding:0,0,0,0,0,8,0,0;fill:114
```
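Interpreting the flat padding list as (before, after) pairs per dimension, the effect corresponds to numpy's pad (a sketch of the layout, not the operator's code):

```python
import numpy as np

def pad_tensor_sketch(x, padding, fill):
    """padding: flat list [before0, after0, before1, after1, ...],
    one (before, after) pair per tensor dimension."""
    pairs = list(zip(padding[0::2], padding[1::2]))
    return np.pad(x, pairs, mode="constant", constant_values=fill)
```

With padding:0,0,0,0,0,8,0,0 on an NHWC tensor, 8 rows of fill bytes are appended after dimension 2.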

transform_yolopreproc

Executes the first layer of a YOLO network on the host CPU before sending to the AIPU. Reshapes 2×2 pixel patches into single pixels with 4× channels (e.g. 640×640×3 → 320×320×12), then adds required tensor padding.

| Option | Type | Description |
|---|---|---|
| padding | int list | Padding at beginning and end of each dimension |
| fill | int | Fill value for padded bytes |

```yaml
- instance: axtransform
  lib: libtransform_yolopreproc.so
  options: padding:0,0,0,0,0,0,0,52;fill:114
```
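The 2×2-patch rearrangement is a space-to-depth transform. A numpy sketch of the shape change follows; the exact patch ordering the operator uses may differ from this illustration:

```python
import numpy as np

def space_to_depth_2x2(x):
    """NHWC tensor: fold each 2x2 pixel patch into channels,
    so (N, H, W, C) becomes (N, H/2, W/2, 4*C)."""
    n, h, w, c = x.shape
    x = x.reshape(n, h // 2, 2, w // 2, 2, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # group the 2x2 patch dims together
    return x.reshape(n, h // 2, w // 2, 4 * c)
```

This reproduces the documented 640×640×3 → 320×320×12 shape change; tensor padding is then applied separately.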

Post-processing operators

These operators convert raw model output tensors into structured results (bounding boxes, class labels, etc.).

decode_yolov5

Decodes YOLOv5 and YOLOv7/v8-compatible model output into object detection bounding boxes. Handles dequantisation, sigmoid activation, and anchor-based decoding.

| Option | Type | Description |
|---|---|---|
| meta_key | string | Key in the metadata map to store detection results |
| anchors | float list | Anchor values from model_info.json |
| classes | int | Number of object classes the model detects |
| confidence_threshold | float | Minimum confidence to keep a detection (e.g. 0.25) |
| topk | int | Maximum number of boxes before NMS |
| multiclass | int | 1 = consider all classes per box; 0 = top class only |
| sigmoid_in_postprocess | int | 1 = apply sigmoid here; 0 = model already applied it |
| transpose | int | 1 = transpose NHWC → NCHW |
| scales | float list | Dequantisation scales from manifest.json |
| zero_points | int list | Dequantisation zero-points from manifest.json |
| label_filter | int list | Keep only these class IDs (optional) |

```yaml
- instance: decode_muxer
  lib: libdecode_yolov5.so
  options: meta_key:detections;classes:80;confidence_threshold:0.25;topk:100;multiclass:0;sigmoid_in_postprocess:1;transpose:1;anchors:1.25,1.625,2.0,3.75,4.125,2.875;scales:0.0814,0.0950,0.0929;zero_points:70,82,66
```
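For intuition, the anchor-based decoding step can be sketched for a single grid cell using YOLOv5's published box parameterisation; whether this operator matches it term-for-term is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(raw, anchor_w, anchor_h, cx, cy, stride):
    """Decode one anchor's (x, y, w, h) from the raw 4-value box prediction
    at grid cell (cx, cy), for a feature map with the given stride."""
    tx, ty, tw, th = sigmoid(np.asarray(raw, dtype=np.float64))
    x = (tx * 2.0 - 0.5 + cx) * stride          # box centre in input pixels
    y = (ty * 2.0 - 0.5 + cy) * stride
    w = (tw * 2.0) ** 2 * anchor_w * stride     # anchors are in stride units
    h = (th * 2.0) ** 2 * anchor_h * stride
    return x, y, w, h
```

The scales/zero_points options would be applied first to dequantize the raw int8 outputs before this step.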

decode_ssd2

Decodes SSD (Single Shot MultiBox Detector) model output. Handles sigmoid/softmax activation, anchor generation, dequantisation.

| Option | Type | Description |
|---|---|---|
| meta_key | string | Key in the metadata map to store detection results |
| classes | int | Number of object classes |
| confidence_threshold | float | Minimum confidence to keep a detection |
| topk | int | Maximum number of boxes |
| class_agnostic | int | 1 = NMS across all classes; 0 = per-class NMS |
| transpose | int | 1 = transpose NHWC → NCHW |
| softmax | int | 1 = apply softmax; 0 = apply sigmoid |
| scales | float list | Dequantisation scales from manifest.json |
| zero_points | int list | Dequantisation zero-points from manifest.json |
| saved_anchors | string | Path to anchor file, or omit to auto-generate |
| label_filter | int list | Keep only these class IDs (optional) |

```yaml
- instance: decode_muxer
  lib: libdecode_ssd2.so
  options: meta_key:detections;classes:90;confidence_threshold:0.4;topk:1000;class_agnostic:1;transpose:1;scales:0.9;zero_points:0
```

decode_classification

Decodes classifier model output into top-K class predictions.

| Option | Type | Description |
|---|---|---|
| meta_key | string | Key in the metadata map to store classification results |
| classlabels_file | string | Path to a text file with one label per line |
| top_k | int | Number of top predictions to keep |
| sorted | int | 1 = sort results by confidence |
| largest | int | 1 = highest-confidence label first |
| softmax | int | 1 = apply softmax before selecting top-K |
| box_meta | string | If set, attach classification to existing detection boxes instead of creating new metadata |

```yaml
- instance: decode_muxer
  lib: libdecode_classification.so
  options: meta_key:classification;classlabels_file:labels.txt;top_k:5;softmax:0
```
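The top-K selection with optional softmax can be sketched as follows (an illustration of the described behaviour, not the operator's code):

```python
import numpy as np

def topk_classification(scores, labels, top_k=5, softmax=False):
    """Return the top_k (label, score) pairs, highest confidence first."""
    scores = np.asarray(scores, dtype=np.float64)
    if softmax:
        e = np.exp(scores - scores.max())  # shift for numerical stability
        scores = e / e.sum()
    order = np.argsort(scores)[::-1][:top_k]
    return [(labels[i], float(scores[i])) for i in order]
```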

inplace_nms

Non-Maximum Suppression. Removes duplicate bounding boxes, keeping only the most confident detection for each object.

| Option | Type | Description |
|---|---|---|
| meta_key | string | Key in the metadata map containing detections to filter |
| max_boxes | int | Maximum number of boxes to keep after NMS |
| nms_threshold | float | IoU threshold above which overlapping boxes are suppressed (e.g. 0.45) |
| class_agnostic | int | 1 = suppress across all classes; 0 = suppress per class |
| location | string | CPU or GPU (OpenCL) |

```yaml
- instance: axinplace
  lib: libinplace_nms.so
  options: meta_key:detections;max_boxes:300;nms_threshold:0.45;class_agnostic:0;location:CPU
```
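Greedy NMS is a standard algorithm; this minimal class-agnostic sketch shows how the nms_threshold and max_boxes options interact (the SDK's implementation will differ in detail):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, nms_threshold=0.45, max_boxes=300):
    """Greedy class-agnostic NMS: keep the most confident box, drop any
    box whose IoU with an already-kept box exceeds nms_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
        if len(keep) >= max_boxes:
            break
    return keep
```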

Utility operators

inplace_draw

Draws inference results (bounding boxes, labels, classification overlays) onto the video frame. Iterates over all metadata entries and calls each one's draw function.

No options required.

```yaml
- instance: axinplace
  lib: libinplace_draw.so
  mode: write
```

inplace_tracker

Assigns persistent tracking IDs to detected objects across frames. Supports multiple tracking algorithms.

| Option | Type | Description |
|---|---|---|
| algorithm | string | sort, oc-sort, bytetrack, or scalarmot |
| detection_meta_key | string | Key in metadata map for input detections |
| tracking_meta_key | string | Key in metadata map to store tracker output |
| streamid_meta_key | string | Key for stream ID in multi-stream applications |
| history_length | int | Number of time steps to retain per tracked object |
| algo_params_json | string | Path to JSON file with additional algorithm parameters |

```yaml
- instance: axinplace
  lib: libinplace_tracker.so
  options: detection_meta_key:detections;tracking_meta_key:tracks;algorithm:sort
```

inplace_addstreamid

Adds a stream ID to frame metadata. Required in multi-stream pipelines where multiple input streams are merged before inference.

| Option | Type | Default | Description |
|---|---|---|---|
| stream_id | int | | The ID to assign to this stream |
| meta_key | string | stream_id | Key in the metadata map |

```yaml
- instance: axinplace
  lib: libinplace_addstreamid.so
  mode: meta
  options: stream_id:0
```

See also