Raw Tensor Output with Custom Postprocessing

This example shows how to access raw output tensors from the neural network and perform your own postprocessing in Python. Instead of using the SDK's built-in detection pipeline, it retrieves the YOLOv8 output tensor as a NumPy array, decodes bounding boxes manually, and renders them with OpenCV.

Use this when you need full control over postprocessing logic, want to implement a custom decoder, or are working with a model whose output format is not yet supported by the built-in task handlers.

What you'll learn

  • How to use a network configuration that outputs raw tensors (yolov8n-output-tensor)
  • How to extract NumPy tensors from frame result metadata
  • How to implement YOLOv8 postprocessing (anchor decoding, letterbox correction, NMS)
  • How to render detections manually with OpenCV

Prerequisites

  • Voyager SDK installed and activated
  • OpenCV (cv2) and NumPy installed
  • Sample media files available in media/ (included with the SDK)

Source

This example is included in the SDK at examples/application_tensor.py.

application_tensor.py
#!/usr/bin/env python
# Copyright Axelera AI, 2025
import cv2
import numpy as np

from axelera.app import config, display
from axelera.app.stream import create_inference_stream

stream = create_inference_stream(
    network="yolov8n-output-tensor",
    sources=[
        str(config.env.framework / "media/traffic1_1080p.mp4"),
    ],
)


def postprocess_yolov8(
    data, shape, orig_w, orig_h, model_w=640, model_h=640, conf_threshold=0.25, letterboxed=True
):
    # YOLOv8 output: (1, 84, 8400) => (batch, channels, num_anchors)
    # Each anchor: [x, y, w, h, score_0, ..., score_79]
    # We'll use only the first batch
    while data.ndim > 3:
        data = np.squeeze(data, axis=1)
    shape = data.shape
    num_classes = shape[1] - 4
    num_anchors = shape[2]
    detections = []
    for i in range(num_anchors):
        x = data[0, 0, i]
        y = data[0, 1, i]
        w = data[0, 2, i]
        h = data[0, 3, i]
        scores = data[0, 4:, i]
        class_id = np.argmax(scores)
        score = scores[class_id]
        if score > conf_threshold:
            # Convert from center x, y, w, h to x1, y1, x2, y2
            x1 = x - w / 2
            y1 = y - h / 2
            x2 = x + w / 2
            y2 = y + h / 2

            # Map to original image coordinates
            if letterboxed:
                # Calculate scale and padding
                r = min(model_w / orig_w, model_h / orig_h)
                new_w, new_h = int(orig_w * r), int(orig_h * r)
                pad_w, pad_h = (model_w - new_w) // 2, (model_h - new_h) // 2

                # Undo letterbox
                x1 = (x1 - pad_w) / r
                y1 = (y1 - pad_h) / r
                x2 = (x2 - pad_w) / r
                y2 = (y2 - pad_h) / r
            else:
                # Simple resize
                x1 = x1 * orig_w / model_w
                y1 = y1 * orig_h / model_h
                x2 = x2 * orig_w / model_w
                y2 = y2 * orig_h / model_h

            detections.append((x1, y1, x2, y2, class_id, float(score)))
    return detections


def render_detections(image, detections, labels=None):
    if labels is None:
        labels = [f"object_{i}" for i in range(80)]
    for x1, y1, x2, y2, class_id, score in detections:
        pt1 = (int(x1), int(y1))
        pt2 = (int(x2), int(y2))
        cv2.rectangle(image, pt1, pt2, (0, 255, 255), 2)
        label = labels[class_id] if class_id < len(labels) else str(class_id)
        text = f"{label} {score:.2f}"
        cv2.putText(
            image, text, (int(x1), int(y1) - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 255), 2
        )


def main(window, stream):
    display_w, display_h = 640, 360  # or any size you prefer
    for frame_result in stream:
        tensor_wrapper = frame_result.meta['detections']
        tensor = tensor_wrapper.tensors[0]  # numpy array
        rgb_img = frame_result.image.asarray()
        # Resize image first for faster processing and display
        rgb_img_small = cv2.resize(rgb_img, (display_w, display_h))
        orig_h, orig_w = rgb_img_small.shape[:2]
        detections = postprocess_yolov8(
            tensor, tensor.shape, orig_w, orig_h, model_w=640, model_h=640, letterboxed=True
        )
        bgr_img = cv2.cvtColor(rgb_img_small, cv2.COLOR_RGB2BGR)
        render_detections(bgr_img, detections)
        cv2.imshow('Detections', bgr_img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break


with display.App(renderer=False) as app:
    app.start_thread(main, (None, stream), name='InferenceThread')
    app.run()
stream.stop()

Key concepts

Raw tensor access is enabled by using a network configuration that ends with -output-tensor (here yolov8n-output-tensor). Instead of producing parsed detections, the pipeline places raw NumPy arrays into frame_result.meta['detections'].tensors. This gives you direct access to the model's output for custom decoding.
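
Because the tensors arrive as plain NumPy arrays, the per-anchor Python loop in postprocess_yolov8 can also be written as whole-array operations, which is considerably faster for 8400 anchors. A vectorized sketch (the helper name decode_yolov8_vectorized is ours, not part of the SDK; it assumes the same (1, 4 + num_classes, num_anchors) layout and returns model-space coordinates, so letterbox correction would still be applied afterwards):

```python
import numpy as np

def decode_yolov8_vectorized(data, conf_threshold=0.25):
    """Decode a raw YOLOv8 head of shape (1, 4 + num_classes, num_anchors)."""
    boxes = data[0, :4, :]    # (4, N): cx, cy, w, h in model pixels
    scores = data[0, 4:, :]   # (num_classes, N)
    class_ids = scores.argmax(axis=0)                     # best class per anchor
    best = scores[class_ids, np.arange(scores.shape[1])]  # its score, shape (N,)
    keep = best > conf_threshold
    cx, cy, w, h = boxes[:, keep]
    # Center format to corner format, stacked as (K, 4) rows of x1, y1, x2, y2
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return xyxy, class_ids[keep], best[keep]
```

The thresholding happens once on the whole score array instead of inside a loop, so only the surviving anchors are ever converted to corner format.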

YOLOv8 postprocessing is implemented in the postprocess_yolov8 function. The model outputs a tensor of shape (1, 84, 8400) where each of the 8400 anchors contains 4 box coordinates plus 80 class scores. The function converts center-format boxes to corner-format, applies confidence thresholding, and corrects for letterbox padding to map coordinates back to the original image dimensions.
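
Note that the decoder above applies only a confidence threshold: the raw head proposes many overlapping boxes per object, so in practice you would follow it with non-maximum suppression before rendering. A minimal greedy NMS sketch in plain NumPy (the helper name nms and the 0.45 IoU default are our choices, not SDK API):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS on (N, 4) boxes in x1, y1, x2, y2 format; returns kept indices."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the current best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too strongly
        order = rest[iou <= iou_threshold]
    return keep
```

For class-aware suppression you would typically run this per class, or offset boxes by class_id times a large constant so different classes can never overlap.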

Manual rendering with OpenCV replaces the SDK's built-in renderer. Notice that display.App(renderer=False) disables the SDK renderer entirely, and the code uses cv2.imshow and cv2.rectangle directly. This is the pattern to follow when you need pixel-level control over the output visualization or are integrating with an existing OpenCV-based pipeline.