Hardware Overview

What Metis is, how it works, and what numbers like "TOPS" and "4 cores" mean in practice.


Metis: the AIPU

Metis is an AIPU (AI Processing Unit) designed specifically for neural network inference. Unlike a CPU or GPU, it is purpose-built for the matrix-vector multiplications that dominate neural network computation.

Key specifications

| Specification | Value |
| --- | --- |
| Architecture | In-Memory Compute (IMC) |
| AI cores | 4 |
| Precision | INT8 (weights and activations) |
| Clock speed | 800 MHz (default) |
| Peak compute | ~214 TOPS (INT8) |
| DDR memory | 4 GB (PCIe / M.2) |
| Form factors | PCIe, M.2, Compute Board (SBC) |
| Interface | PCIe Gen 3 x4 |

What "In-Memory Compute" means

Traditional processors move data between separate memory and compute units. Metis uses In-Memory Compute: the matrix-vector multiply (MVM) happens inside the SRAM arrays where the model weights are stored. This eliminates the memory bandwidth bottleneck that limits GPU efficiency on small-to-medium models.

The practical effect: Metis achieves high throughput on the neural network core while consuming much less power than a GPU for the same task.
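The operation that IMC accelerates is the matrix-vector multiply at the heart of every neural network layer. As a purely illustrative sketch (this models the arithmetic, not the hardware), with weights and activations in INT8 and accumulation at higher precision:

```python
import numpy as np

# Illustration only: the INT8 matrix-vector multiply (MVM) that dominates
# neural network inference, modeled in NumPy. On Metis, the weights stay
# resident in the SRAM arrays where the multiply happens.
weights = np.array([[1, -2], [3, 4]], dtype=np.int8)  # model weights (INT8)
activations = np.array([10, 20], dtype=np.int8)       # layer input (INT8)

# Accumulate in int32 to avoid INT8 overflow, as fixed-point hardware does.
result = weights.astype(np.int32) @ activations.astype(np.int32)
print(result)  # → [-30 110]
```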


The 4 cores

Metis has 4 AI cores that share the on-chip SRAM. Each core can run an independent model instance, or all 4 cores can work together on a single model, sharing memory. This choice determines your compilation strategy:

  • Independent cores (resources_used: 0.25 per model): Lower latency, more flexible — good for real-time pipelines with a single model type.
  • Shared-memory batch (resources_used: 1.0, aipu_cores_used: 4): Higher throughput on memory-intensive models — good for throughput-first workloads.

See Compiler Configuration for how to configure this.
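The two strategies differ only in the compiler parameters named above. The sketch below is illustrative: the parameter names come from this page, but the surrounding dict layout is hypothetical, not the SDK's actual configuration schema (see Compiler Configuration for the real format):

```python
# Illustrative only: how the two deployment strategies differ in the
# compiler parameters named above. The dict structure is a hypothetical
# sketch, not the SDK's actual config schema.
independent_cores = {
    "resources_used": 0.25,  # each compiled model claims one of the 4 cores
}

shared_memory_batch = {
    "resources_used": 1.0,   # one model claims the whole chip
    "aipu_cores_used": 4,    # all 4 cores cooperate, sharing SRAM
}

# With independent cores, up to 4 model instances fit at once:
assert 4 * independent_cores["resources_used"] == shared_memory_batch["resources_used"]
```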


The host pipeline

Metis does not work alone. A typical inference pipeline involves both the host CPU and the AIPU:

  1. Camera / file input
  2. Host: decode compressed video (H.264, etc.)
  3. Host: color convert, resize, letterbox
  4. Host: normalize, quantize to INT8
  5. PCIe transfer → Metis AIPU
  6. AIPU: neural network forward pass (INT8)
  7. PCIe transfer ← results
  8. Host: dequantize, decode output tensors
  9. Host: NMS, bounding box extraction
  10. Application: FrameResult with metadata

Host-side acceleration (VA-API for video decode, OpenCL for pre/post-processing, OpenGL for rendering) can be enabled to reduce CPU load. See inference.py for the relevant flags.
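The host-side quantize and dequantize steps can be sketched in NumPy. The scale and zero-point values below are made up for illustration; in practice they come from the compiled model's quantization parameters:

```python
import numpy as np

# Illustrative affine quantization, as done on the host before PCIe
# transfer. Scale and zero-point here are example values, not real
# parameters from a compiled model.
scale, zero_point = 0.02, 0

frame = np.array([0.0, 0.5, 1.0], dtype=np.float32)   # normalized pixels
q = np.clip(np.round(frame / scale) + zero_point, -128, 127).astype(np.int8)

# ... PCIe transfer, AIPU forward pass, results transferred back ...

# Host-side dequantize of an output tensor (same affine scheme).
dq = (q.astype(np.float32) - zero_point) * scale
print(q, dq)
```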


What TOPS means

TOPS = Tera Operations Per Second. For INT8 matrix multiply:

  • 1 TOPS = 10¹² multiply-accumulate operations per second

Metis delivers ~214 INT8 TOPS. This is the theoretical peak for INT8 matrix-vector multiply across all 4 cores at 800 MHz.

Real-world throughput depends on the model: how much of the computation fits in on-chip memory, how much PCIe transfer is needed, and how efficient the pipeline stages are. Use --show-stats with inference.py to measure actual pipeline throughput rather than relying on peak TOPS numbers.
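As a back-of-envelope exercise, measured throughput can be related to peak TOPS. The FPS and per-inference operation count below are made-up example numbers, not Metis benchmarks:

```python
# Example numbers only (not Metis benchmarks): estimate what fraction of
# the ~214 TOPS peak a hypothetical workload actually uses.
peak_tops = 214                 # INT8 peak across 4 cores at 800 MHz
ops_per_inference = 8.7e9       # e.g. a detector needing ~8.7 G-ops/frame
measured_fps = 900              # as reported by `--show-stats`

used_tops = ops_per_inference * measured_fps / 1e12
utilization = used_tops / peak_tops
print(f"{used_tops:.2f} TOPS used, {utilization:.1%} of peak")
```

Low utilization is normal and not necessarily a problem: the pipeline may be bound by video decode, PCIe transfer, or post-processing rather than by the AIPU itself.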


Form factors

| Form factor | Typical use |
| --- | --- |
| PCIe | Workstation, server, edge server |
| M.2 | Embedded, compact edge devices |
| Compute Board (SBC) | Standalone evaluation and prototyping; mini-ITX with ARM host onboard |

All form factors use the same AIPU die and software stack. Each is available in multiple variants differing by AIPU count (1× or 4×), memory, and cooling (active or passive) — see the product datasheets for specific configurations.


Operating environment

  • Ambient temperature range: −20°C to +70°C
  • Software throttling: configurable (default: disabled — see Thermal Management)
  • Hardware shutdown: 120°C junction temperature (automatic, requires power cycle to recover)

See also