Version: v1.7

Coordinate System Tutorial

How coordinates work in Pipeline Builder pipelines, from single-model detection to multi-level cascades.

Quick Reference

# Get pixel coordinates for drawing (detection, pose, segmentation)
x0, y0, x1, y1 = det.bbox.to_pixels(img_width, img_height)

# Get frame-level pixels (works at any nesting depth)
x0, y0, x1, y1 = det.bbox.frame_pixels(img_width, img_height)

# OBB: get frame-level rotated corners
points = obb_det.frame_corners(img_width, img_height)

# Keypoints: get frame-level pixel coordinates
px, py = kp.frame_x(img_width), kp.frame_y(img_height)

# Manual coordinate composition (power-user)
frame_bbox = child_bbox.in_frame_of(parent_bbox)

Key rule: BBox stores normalized [0,1] coordinates. Call to_pixels() or frame_pixels() when you need integers for drawing.

BBox API

Storage: Normalized [0,1]

All BBox coordinates are normalized relative to their local region:

det.bbox.x0  # float in [0, 1] -- left edge as fraction of region width
det.bbox.y0  # float in [0, 1] -- top edge as fraction of region height
det.bbox.x1  # float in [0, 1] -- right edge
det.bbox.y1  # float in [0, 1] -- bottom edge

Why normalized? Because normalized coordinates compose via pure arithmetic. parent.x0 + child.x0 * parent.width gives you the child's position in the parent's coordinate system -- no image dimensions needed until you want pixels. This is what makes multi-level cascades work without passing image sizes around.
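To make the arithmetic concrete, here is a minimal sketch that uses plain tuples instead of the library's BBox class. The `compose` helper and the sample boxes are hypothetical and exist only for illustration.

```python
# Hypothetical helper: boxes are (x0, y0, x1, y1) tuples of normalized floats.
def compose(parent, child):
    """Express `child` (normalized to `parent`) in the parent's coordinate system."""
    px0, py0, px1, py1 = parent
    pw, ph = px1 - px0, py1 - py0
    cx0, cy0, cx1, cy1 = child
    return (px0 + cx0 * pw, py0 + cy0 * ph, px0 + cx1 * pw, py0 + cy1 * ph)


vehicle = (0.25, 0.5, 0.75, 1.0)  # relative to the full frame
plate = (0.5, 0.5, 1.0, 1.0)      # relative to the vehicle crop
plate_in_frame = compose(vehicle, plate)
print(plate_in_frame)  # (0.5, 0.75, 0.75, 1.0)
```

No image size appears anywhere in the calculation; width and height are only needed at the final conversion to pixels.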

`to_pixels(w, h)` -- Convert to pixel integers

x0, y0, x1, y1 = bbox.to_pixels(img_width, img_height)
# Returns clamped integers: (int, int, int, int)

Multiplies normalized coords by image dimensions and clamps to [0, w) / [0, h). Use this when you have a BBox relative to a known image and need pixel coordinates.
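The conversion described above might look like the following sketch. It is not the library's implementation: the function name is hypothetical, and truncating with int() rather than rounding is an assumption.

```python
def to_pixels_sketch(box, w, h):
    """Scale a normalized (x0, y0, x1, y1) box to pixels and clamp to the image."""
    def clamp(value, size):
        # Assumption: truncate toward zero, then clamp into [0, size).
        return max(0, min(int(value * size), size - 1))

    x0, y0, x1, y1 = box
    return clamp(x0, w), clamp(y0, h), clamp(x1, w), clamp(y1, h)


# A box that touches the right edge and overshoots the bottom edge.
pixels = to_pixels_sketch((0.25, 0.5, 1.0, 1.2), 640, 480)
print(pixels)  # (160, 240, 639, 479)
```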

`frame_pixels(w, h)` -- Frame-level pixel coordinates

x0, y0, x1, y1 = bbox.frame_pixels(img_width, img_height)

If bbox._frame is set (the pipeline populated frame-level coordinates), uses _frame to compute pixels. Otherwise falls back to to_pixels().

This is the function to use in rendering code. It returns correct pixel positions in the original frame regardless of nesting depth.
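The fallback behavior can be sketched with a small stand-in class. `SketchBBox` is hypothetical and only mirrors the behavior described above; the truncation and clamping details inside `to_pixels` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchBBox:
    """Hypothetical stand-in for BBox; not the library's class."""
    x0: float
    y0: float
    x1: float
    y1: float
    _frame: Optional["SketchBBox"] = None

    def to_pixels(self, w: int, h: int) -> Tuple[int, int, int, int]:
        def clamp(value: float, size: int) -> int:
            return max(0, min(int(value * size), size - 1))
        return (clamp(self.x0, w), clamp(self.y0, h),
                clamp(self.x1, w), clamp(self.y1, h))

    def frame_pixels(self, w: int, h: int) -> Tuple[int, int, int, int]:
        # Prefer the frame-level box when it was populated; otherwise use self.
        source = self._frame if self._frame is not None else self
        return source.to_pixels(w, h)


root = SketchBBox(0.25, 0.25, 0.5, 0.5)
nested = SketchBBox(0.0, 0.0, 1.0, 1.0, _frame=SketchBBox(0.5, 0.5, 0.75, 0.75))

root_px = root.frame_pixels(640, 480)      # no _frame: same as to_pixels()
nested_px = nested.frame_pixels(640, 480)  # _frame set: local coords are ignored
print(root_px, nested_px)  # (160, 120, 320, 240) (320, 240, 480, 360)
```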

`in_frame_of(parent)` -- Compose coordinates

frame_bbox = child_bbox.in_frame_of(parent_bbox)

Computes the child's position within the parent's coordinate frame:

result.x0 = parent.x0 + child.x0 * parent.width
result.y0 = parent.y0 + child.y0 * parent.height
result.x1 = parent.x0 + child.x1 * parent.width
result.y1 = parent.y0 + child.y1 * parent.height

Returns a new BBox. The _frame field is NOT set on the result -- use this for manual coordinate arithmetic (see Line-by-Line Patterns below).

`width` / `height` properties

bbox.width   # x1 - x0 (normalized width)
bbox.height  # y1 - y0 (normalized height)

`_frame` -- Pre-computed frame coordinates

bbox._frame  # BBox | None

Set automatically by the pipeline (AxDetection, AxPose, AxSegmentation) when running inside a ForEach. Contains the bbox composed all the way up to the original frame level. It is excluded from repr() and ignored by == comparison.

Why on BBox and not on DetectedObject? Because _frame travels WITH the bbox. If someone replaces det.bbox, the old _frame detaches naturally. And it works for all object types (DetectedObject, PoseObject, SegmentedObject) without modifying each one.
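One way to get a field that is hidden from repr() and ignored by == in Python is `dataclasses.field(repr=False, compare=False)`. Whether BBox is actually implemented this way is an assumption; the `SketchBBox` class below is a hypothetical stand-in that only demonstrates the observable behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchBBox:
    """Hypothetical stand-in for BBox; not the library's class."""
    x0: float
    y0: float
    x1: float
    y1: float
    # Excluded from repr() and from ==, mirroring the behavior described above.
    _frame: Optional["SketchBBox"] = field(default=None, repr=False, compare=False)


plain = SketchBBox(0.1, 0.2, 0.3, 0.4)
tagged = SketchBBox(0.1, 0.2, 0.3, 0.4, _frame=SketchBBox(0.5, 0.5, 0.6, 0.6))

print(repr(tagged))     # shows x0..y1 only; _frame does not appear
print(plain == tagged)  # True: _frame does not take part in equality
```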

Pipeline Patterns (op.seq users)

For regular pipeline users, coordinate composition is automatic.

Single-level detection

pipeline = op.seq(
    op.color_convert('RGB', 'BGR'),
    op.letterbox(640, 640),
    op.to_tensor(),
    op.load('yolov8n-coco.axm'),
    op.decode_detections(algo='yolov8', num_classes=80),
    op.nms(),
    op.to_image_space(),      # MODEL_PIXEL -> NORMALIZED [0,1]
    op.ax_detection(),         # creates BBox with normalized coords
)

with op.frame_context(img):
    detections = pipeline(img)

h, w = img.shape[:2]  # NumPy image shape is (height, width, channels)
for det in detections:
    x0, y0, x1, y1 = det.bbox.to_pixels(w, h)
    cv2.rectangle(frame, (x0, y0), (x1, y1), color, 2)

At root level (no ForEach), _frame is None, so to_pixels() and frame_pixels() return the same values.

Two-level cascade (nested detection)

pipeline = op.seq(
    op.color_convert('RGB', 'BGR'),
    # Level 1: Detect vehicles in full frame
    op.letterbox(640, 640),
    op.to_tensor(),
    op.load('yolov8n-coco.axm'),
    op.decode_detections(algo='yolov8', num_classes=80),
    op.nms(),
    op.to_image_space(),
    op.ax_detection(class_id_type=op.CocoClasses),
    # Level 2: For each vehicle, detect plates
    op.for_each(
        'plates',
        op.crop_roi(property='bbox'),
        op.resize(size=(640, 640)),
        op.to_tensor(),
        op.load('plate-detector.axm', name='plates'),
        op.decode_detections(algo='yolov8', num_classes=1),
        op.nms(),
        op.to_image_space(),
        op.ax_detection(),
    ),
)

What happens automatically:

ForEach sets _parent_frame_box_var from the vehicle's bbox._frame (or bbox)
CropRoi reads _source_image_var (the source image for the current level) and crops the vehicle region, which becomes the input to the plate detector
AxDetection reads _parent_frame_box_var and calls bbox.in_frame_of(parent)
The plate's bbox._frame now holds frame-level coordinates

To render plate detections on the original frame:

for plate_det in result.plates:
    # frame_pixels() uses _frame to map back to original image
    x0, y0, x1, y1 = plate_det.bbox.frame_pixels(w, h)
    cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 255), 1)

See rt-demo.py: nested_detection() for a complete runnable example.

Three or more levels

The same pattern extends to arbitrary depth. Each ForEach/CropRoi/AxDetection cycle composes one more level of coordinates. The _frame field always holds the full composition back to the original frame.
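Composing across several levels is a left fold over the chain of boxes, from outermost to innermost. The sketch below uses plain tuples and a hypothetical `compose` helper; the three sample boxes (vehicle, plate, character) are made up for illustration.

```python
from functools import reduce


# Hypothetical helper: boxes are (x0, y0, x1, y1) tuples of normalized floats.
def compose(parent, child):
    """Express `child` (normalized to `parent`) in the parent's coordinate system."""
    px0, py0, px1, py1 = parent
    pw, ph = px1 - px0, py1 - py0
    cx0, cy0, cx1, cy1 = child
    return (px0 + cx0 * pw, py0 + cy0 * ph, px0 + cx1 * pw, py0 + cy1 * ph)


# Outermost to innermost: vehicle in frame, plate in vehicle, character in plate.
chain = [
    (0.5, 0.0, 1.0, 0.5),
    (0.5, 0.5, 1.0, 1.0),
    (0.0, 0.0, 0.5, 0.5),
]
char_in_frame = reduce(compose, chain)
print(char_in_frame)  # (0.75, 0.25, 0.875, 0.375)
```

Each step of the fold corresponds to one nesting level handing its frame-level box down to the next.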

Line-by-Line Patterns (power users)

If you write detection loops manually instead of using op.seq, you can compose coordinates explicitly with in_frame_of().

Manual two-level cascade

with op.frame_context(img):
    # Level 1
    vehicles = detect_pipeline(img)

    for vehicle in vehicles:
        # Crop the vehicle region
        vx0, vy0, vx1, vy1 = vehicle.bbox.to_pixels(w, h)
        roi = img[vy0:vy1, vx0:vx1]

        # Level 2: detect plates in ROI
        with op.frame_context(roi):
            plates = plate_pipeline(roi)

        # Compose coordinates manually
        for plate in plates:
            frame_bbox = plate.bbox.in_frame_of(vehicle.bbox)
            px0, py0, px1, py1 = frame_bbox.to_pixels(w, h)
            cv2.rectangle(frame, (px0, py0), (px1, py1), color, 1)

The key difference from pipeline usage: you call in_frame_of() explicitly, and you use to_pixels() on the composed result (not frame_pixels(), since _frame is not set in manual mode).

See rt-demo.py: debug_nested_detection() for a complete runnable example with coordinate values printed at each level.

Internals

ContextVars for nesting state

Two module-level ContextVars in op/_core.py carry nesting state:

_source_image_var: ContextVar[np.ndarray | None] -- The image for the current nesting level
_parent_frame_box_var: ContextVar[BBox | None] -- The parent's frame-level bbox

Why ContextVars instead of FrameContext? ForEach is the ONLY operator that creates nesting, so it should own the nesting state. ContextVars have proper set/reset semantics that pair naturally with try/finally. FrameContext stays unchanged at 182 lines because it does not need to know about nesting.

ForEach manages the vars

For each item in the collection, ForEach:

Saves the current var values (for restore after iteration)
Sets _source_image_var to the current image (from var or fc.input)
Sets _parent_frame_box_var to the item's bbox._frame (or bbox)
Runs the inner operators
Restores the saved values (via try/finally)

For items without a bbox attribute, ForEach sets _parent_frame_box_var to None.
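The save/set/run/restore cycle can be sketched with the standard library's `contextvars` tokens. This is an illustration under assumptions, not the ForEach source: `parent_box_var` and `for_each_sketch` are hypothetical names, and token-based reset is just one way to implement the restore step.

```python
from contextvars import ContextVar
from types import SimpleNamespace

# Illustrative stand-in for _parent_frame_box_var.
parent_box_var: ContextVar = ContextVar("parent_box", default=None)


def for_each_sketch(items, inner):
    results = []
    for item in items:
        bbox = getattr(item, "bbox", None)
        parent = None  # items without a bbox leave the parent box unset
        if bbox is not None:
            frame = getattr(bbox, "_frame", None)
            parent = frame if frame is not None else bbox
        token = parent_box_var.set(parent)  # the token remembers the old value
        try:
            results.append(inner(item))
        finally:
            parent_box_var.reset(token)     # restore even if inner() raises
    return results


root_box = SimpleNamespace(_frame=None)
nested_box = SimpleNamespace(_frame="frame-level box")
items = [
    SimpleNamespace(bbox=root_box),    # no _frame: the bbox itself is the parent
    SimpleNamespace(bbox=nested_box),  # _frame present: it takes precedence
    SimpleNamespace(),                 # no bbox attribute at all
]
seen = for_each_sketch(items, lambda item: parent_box_var.get())
```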

CropRoi reads and updates the var

CropRoi in object-mode:

Reads _source_image_var.get() for the source image (falls back to fc.input)
Converts normalized bbox to pixels via bbox.to_pixels(img_w, img_h)
Crops the region
Sets _source_image_var to the cropped image (so nested CropRoi gets the right source)
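The four steps above can be sketched as follows, assuming NumPy images in (height, width, channels) layout. `source_image_var` and `crop_roi_sketch` are hypothetical names, and clamping is omitted for brevity.

```python
from contextvars import ContextVar
from typing import Optional

import numpy as np

# Illustrative stand-in for _source_image_var.
source_image_var: ContextVar[Optional[np.ndarray]] = ContextVar(
    "source_image", default=None)


def crop_roi_sketch(box, fallback_image):
    """Crop `box` (normalized to the current source image) and update the var."""
    image = source_image_var.get()
    if image is None:
        image = fallback_image  # plays the role of fc.input
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    px0, py0 = int(x0 * w), int(y0 * h)
    px1, py1 = int(x1 * w), int(y1 * h)
    crop = image[py0:py1, px0:px1]
    source_image_var.set(crop)  # a nested crop now starts from this region
    return crop


frame = np.zeros((480, 640, 3), dtype=np.uint8)
vehicle_roi = crop_roi_sketch((0.25, 0.5, 0.75, 1.0), frame)
plate_roi = crop_roi_sketch((0.5, 0.5, 1.0, 1.0), frame)  # cropped from vehicle_roi
print(vehicle_roi.shape, plate_roi.shape)  # (240, 320, 3) (120, 160, 3)
```

The second call ignores `frame` because the var already holds the vehicle crop, which is why a nested crop lands in the right region.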

Result operators populate `_frame`

AxDetection, AxPose, AxSegmentation:

Read _parent_frame_box_var.get()
If set, call bbox.in_frame_of(parent_frame_box) and store in bbox._frame
If None (root level), _frame stays None
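A minimal sketch of the read side, assuming a dataclass stand-in for BBox. `SketchBBox`, `parent_frame_box_var`, and `populate_frame` are hypothetical names used only to illustrate the three steps above.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBBox:
    """Hypothetical stand-in for BBox; not the library's class."""
    x0: float
    y0: float
    x1: float
    y1: float
    _frame: Optional["SketchBBox"] = None

    def in_frame_of(self, parent: "SketchBBox") -> "SketchBBox":
        pw, ph = parent.x1 - parent.x0, parent.y1 - parent.y0
        return SketchBBox(parent.x0 + self.x0 * pw, parent.y0 + self.y0 * ph,
                          parent.x0 + self.x1 * pw, parent.y0 + self.y1 * ph)


# Illustrative stand-in for _parent_frame_box_var.
parent_frame_box_var: ContextVar[Optional[SketchBBox]] = ContextVar(
    "parent_frame_box", default=None)


def populate_frame(bbox: SketchBBox) -> SketchBBox:
    parent = parent_frame_box_var.get()
    if parent is not None:
        bbox._frame = bbox.in_frame_of(parent)  # nested: store frame-level box
    return bbox                                 # root level: _frame stays None


root_det = populate_frame(SketchBBox(0.25, 0.5, 0.75, 1.0))

token = parent_frame_box_var.set(root_det)      # what ForEach would do per item
try:
    plate_det = populate_frame(SketchBBox(0.5, 0.5, 1.0, 1.0))
finally:
    parent_frame_box_var.reset(token)
```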

AxObb (oriented bounding boxes):

Read _parent_frame_box_var.get()
If set, create an axis-aligned BBox from the OBB center/size, compose via in_frame_of(), and store in _frame_bbox
frame_corners(w, h) maps the rotated corners through _frame_bbox

AxPose (keypoints):

Read _parent_frame_box_var.get()
If set, compute frame-level keypoint coords: frame_x = parent.x0 + kp.x * parent.width
Store in kp._frame_x / kp._frame_y
kp.frame_x(w) / kp.frame_y(h) return frame-level pixel coords
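The keypoint mapping is the same arithmetic applied to a single point. The function below is a hypothetical sketch: the parent box is a plain tuple, and truncating to int at the end is an assumption.

```python
def keypoint_to_frame_pixels(kp_x, kp_y, parent, w, h):
    """Map a keypoint normalized to `parent` into pixel coords of the full frame."""
    px0, py0, px1, py1 = parent
    frame_x = px0 + kp_x * (px1 - px0)  # normalized, frame level
    frame_y = py0 + kp_y * (py1 - py0)
    return int(frame_x * w), int(frame_y * h)


# A keypoint halfway across and a quarter of the way down the parent box.
pixel = keypoint_to_frame_pixels(0.5, 0.25, (0.25, 0.5, 0.75, 1.0), 640, 480)
print(pixel)  # (320, 300)
```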

Zero overhead for non-cascade pipelines

ContextVar.get(None) costs ~20ns, and that lookup is the only nesting-related work a result operator does at root level. ForEach sets the vars only while it runs, so non-cascade pipelines never write any nesting state.
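The lookup cost depends on hardware and Python version, so it is worth measuring locally. The sketch below times `ContextVar.get(None)` on an unset variable with `timeit`; the variable name is hypothetical, and the measurement includes the overhead of the lambda call.

```python
import timeit
from contextvars import ContextVar

probe_var = ContextVar("probe")  # never set, so get(None) returns the default

n = 200_000
seconds = timeit.timeit(lambda: probe_var.get(None), number=n)
ns_per_call = seconds / n * 1e9
print(f"{ns_per_call:.0f} ns per get(None), including lambda call overhead")
```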

Migration Guide (IMAGE_PIXEL -> NORMALIZED)

Before (IMAGE_PIXEL)

# BBox stored pixel integers
det.bbox.x0  # e.g., 320 (pixels)
cv2.rectangle(frame, (int(det.bbox.x0), int(det.bbox.y0)),
              (int(det.bbox.x1), int(det.bbox.y1)), color, 2)

After (NORMALIZED)

# BBox stores normalized [0,1] floats
det.bbox.x0  # e.g., 0.5 (50% of image width)
x0, y0, x1, y1 = det.bbox.frame_pixels(w, h)
cv2.rectangle(frame, (x0, y0), (x1, y1), color, 2)

Search patterns for code that needs updating

# Direct pixel access (needs frame_pixels/to_pixels)
grep -rn 'int(.*bbox\.x0\|int(.*bbox\.y0\|int(.*box\.x0' .

# BBox with pixel integer values (needs normalization)
grep -rn 'BBox([0-9][0-9]' .

# Old coordinate format checks
grep -rn 'IMAGE_PIXEL' .

Key changes

| Before | After |
| --- | --- |
| `int(det.bbox.x0)` | `det.bbox.frame_pixels(w, h)[0]` |
| `BBox(100, 200, 300, 400)` | `BBox(0.15, 0.4, 0.45, 0.8)` |
| `CoordSpace.IMAGE_PIXEL` | `CoordSpace.NORMALIZED` |
| Pixel coords in tests | Normalized [0,1] coords in tests |
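When updating test fixtures, pixel boxes can be converted by dividing by the image size the fixture was written for. `normalize_box` is a hypothetical helper, not part of the library, and the 1000x500 image size is made up for the example.

```python
def normalize_box(x0, y0, x1, y1, w, h):
    """Convert pixel corners to normalized [0,1] floats for a w x h image."""
    return (x0 / w, y0 / h, x1 / w, y1 / h)


normalized = normalize_box(100, 200, 300, 400, 1000, 500)
print(normalized)  # (0.1, 0.4, 0.3, 0.8)
```

The normalized values depend on the image dimensions, so each fixture needs the size of the image it originally referred to.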

Validation

Run these demos to verify nested coordinate composition works for each task type. All commands use --no-display --save-dir rtout so they work headless; check the output images in rtout/ to confirm correct rendering.

source containerless.sh

# 1. Nested detection (det -> det) -- the baseline
python3 rt-demo.py nested_detection data/coco/images/val2017 --no-display --save-dir rtout
# Expect: "Level 1: N detections, Level 2: M sub-detections"
# Check rtout/: green L1 boxes + cyan L2 boxes at correct frame positions

# 2. Nested pose (det -> pose with keypoints)
python3 rt-demo.py nested_pose data/coco/images/val2017 --no-display --save-dir rtout
# Expect: "L1: N detections, L2: M poses, K keypoints"
# Check rtout/: green L1 boxes + yellow L2 pose boxes + cyan keypoint dots at frame positions

# 3. Nested segmentation (det -> instance segmentation)
python3 rt-demo.py nested_segmentation data/coco/images/val2017 --no-display --save-dir rtout
# Expect: "L1: N detections, L2: M segmented objects"
# Check rtout/: green L1 boxes + yellow L2 segmentation boxes at frame positions

# 4. Nested OBB (obb -> det via AABB crop)
python3 rt-demo.py nested_obb data/P0019_0_2304.png --no-display --save-dir rtout
# Expect: "L1: N OBB, L2: M sub-detections"
# Check rtout/: green L1 rotated boxes + yellow L2 axis-aligned boxes

What to look for in the output images:

L2 boxes should appear INSIDE their L1 parent box -- if L2 boxes are in the wrong position (e.g., top-left corner), the frame coordinate composition is broken
Keypoints should align with visible body parts -- if dots cluster at (0,0), the keypoint frame mapping is not working
L2 boxes should NOT be tiny -- if they are, the coordinates may still be in ROI-local space instead of frame space
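The first check can also be automated on the normalized frame-level boxes instead of inspecting images by eye. `is_inside` is a hypothetical helper that takes plain (x0, y0, x1, y1) tuples; the tolerance value is an arbitrary choice.

```python
def is_inside(child, parent, tol=1e-6):
    """True if `child` lies within `parent`; both are frame-level normalized boxes."""
    cx0, cy0, cx1, cy1 = child
    px0, py0, px1, py1 = parent
    return (cx0 >= px0 - tol and cy0 >= py0 - tol
            and cx1 <= px1 + tol and cy1 <= py1 + tol)


l1_box = (0.25, 0.5, 0.75, 1.0)
composed_l2 = (0.5, 0.75, 0.75, 1.0)  # correctly composed into frame space
roi_local_l2 = (0.5, 0.5, 1.0, 1.0)   # never composed: still ROI-local

print(is_inside(composed_l2, l1_box))   # True
print(is_inside(roi_local_l2, l1_box))  # False
```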
