types

Model specification types.

Defines ModelSpec, which encapsulates a model's identity, input tile requirements, normalisation parameters, output contract, and compute hints. It reuses TileSpec for tile geometry, so the same spec drives both the extraction pipeline and the model's expectations.

Example:

```python
from bioptimus.extraction.wsi.types import TileSpec
from bioptimus.io.wsi.types import MPP, MeasurementUnit
from bioptimus.models.types import ModelSpec

spec = ModelSpec(
    model_name="h1",
    version="1.0.0",
    tile_spec=TileSpec(
        size=(224, 224),
        stride=(224, 224),
        resolution=MPP(0.5),
        unit=MeasurementUnit.PIXELS,
    ),
    embedding_dim=1536,
    output_type="cls+patch_mean",
    num_prefix_tokens=1,
    patch_size=14,
    mean=(0.707, 0.579, 0.704),
    std=(0.212, 0.230, 0.178),
    weights_source="bioptimus/H1",
)
```

Models

class Models(str, Enum)

Known model identifiers. Values correspond to the model_name field in each YAML config under bioptimus/models/configs/.
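A minimal sketch of what the str mixin provides in practice. The member H1 is a hypothetical stand-in (the real members are not listed in this reference), chosen to match the model_name used in the example above.

```python
from enum import Enum


class Models(str, Enum):
    # Hypothetical member: the real enum's members are not listed in this reference.
    H1 = "h1"


# Because the enum mixes in str, a member compares equal to its raw value.
matches_raw_string = Models.H1 == "h1"

# A model_name string read from a config can be turned back into a member by value.
looked_up = Models("h1")

# An unknown identifier raises ValueError instead of passing through silently.
try:
    Models("not-a-model")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```

Looking a member up by value is one way to validate a model_name early, before any config is loaded.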

ModelSpec

@dataclass(frozen=True)
class ModelSpec()

Describes a backbone model's input and output contract. Combines tile extraction requirements (via TileSpec) with preprocessing, architecture, and output metadata so that data pipelines can prepare inputs and interpret outputs correctly without manual configuration.

The spec is framework-agnostic: it contains only plain Python types and can be consumed by PyTorch, TensorFlow, JAX, or any other framework.

Fields fall into five groups:

- Identity & provenance: who is this model?
- Input / preprocessing: what does the model expect?
- Architecture: structural hints for generic code.
- Output: how to interpret the raw model output.
- Compute: precision & hardware hints.
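Since ModelSpec is declared with @dataclass(frozen=True), instances cannot be mutated after construction. The sketch below illustrates that behaviour with SpecSketch, a hypothetical two-field stand-in rather than the real class, and shows dataclasses.replace as one way to derive a modified copy.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SpecSketch:
    # Stand-in with two fields borrowed from the example above; not the real ModelSpec.
    model_name: str
    version: str


spec = SpecSketch(model_name="h1", version="1.0.0")

try:
    spec.version = "2.0.0"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance and leaves the original untouched.
bumped = replace(spec, version="2.0.0")

# Frozen dataclasses with default eq=True are hashable, so specs can key a dict.
registry = {spec: "original", bumped: "bumped"}
```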

Unique identifier / registry key (e.g. "h1").

Model version string (e.g. "1.0.0"). Ensures embeddings extracted with v1 are not mixed with v2.

HuggingFace repo, URL, or local path to weights (e.g. "bioptimus/H1"). None = no auto-download.

SPDX identifier or short description (e.g. "proprietary", "apache-2.0").

Tile geometry the model expects (size, stride, resolution, measurement unit). Directly reusable by TileExtractor.

Number of input image channels (default 3 for RGB).

Per-channel normalisation mean (channel order matches input).

Per-channel normalisation std.
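As a rough illustration of how the mean and std fields are typically applied, the sketch below normalises a single pixel in pure Python. It assumes pixel values are already scaled to [0, 1], which the source does not state. The helper name normalise_pixel is hypothetical.

```python
from typing import Sequence, Tuple


def normalise_pixel(
    pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]
) -> Tuple[float, ...]:
    """Apply (value - mean) / std per channel to one pixel scaled to [0, 1]."""
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std must have one entry per channel")
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


# Values taken from the ModelSpec example at the top of this page.
mean = (0.707, 0.579, 0.704)
std = (0.212, 0.230, 0.178)

# A pixel equal to the mean maps to zero in every channel.
centred = normalise_pixel(mean, mean, std)

# A pixel one standard deviation above the mean maps to roughly 1.0 per channel.
one_sigma = normalise_pixel(tuple(m + s for m, s in zip(mean, std)), mean, std)
```

In a real pipeline the same arithmetic would usually be applied to whole image arrays by the framework's own transform.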

Resize interpolation mode string (e.g. "bicubic", "bilinear").

Whether the resize operation should use antialiasing.

Expected colour space of the input image ("RGB", "BGR", "HED").

Stain normalisation method applied before the model (e.g. "macenko", "reinhard", "vahadane"), or None for no stain normalisation.

Architecture family ("vit", "swin", "resnet", "convnext", …).

ViT patch size (e.g. 14, 16). Determines the number of patch tokens in the output: (tile_size / patch_size)². None for non-ViT architectures.

Number of non-spatial prefix tokens before patch tokens in the output sequence (CLS, register tokens, …). For example, DINOv2 with registers has 5 (1 CLS + 4 register tokens); most ViTs have 1.
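The token arithmetic implied by the patch size and the prefix-token count can be sketched as follows. Both helper names are hypothetical. The divisibility check and the assumption that the output sequence is exactly the prefix tokens followed by the patch tokens are illustrative choices, not documented behaviour.

```python
def num_patch_tokens(tile_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square tile: (tile_size / patch_size) squared."""
    if tile_size % patch_size != 0:
        raise ValueError("tile_size must be a multiple of patch_size")
    return (tile_size // patch_size) ** 2


def split_tokens(tokens, num_prefix_tokens: int):
    """Split a token sequence into (prefix tokens, patch tokens)."""
    return tokens[:num_prefix_tokens], tokens[num_prefix_tokens:]


# Values from the ModelSpec example: 224-pixel tiles, patch_size=14, one prefix token.
n_patches = num_patch_tokens(224, 14)  # 16 patches per side

# A list of indices stands in for a (tokens, dim) output array.
sequence = list(range(1 + n_patches))
prefix, patches = split_tokens(sequence, num_prefix_tokens=1)
```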

Dimensionality of the final output feature vector (after any post-processing like CLS + mean pooling).

How the raw model output is consumed: "cls" | "patch_mean" | "cls+patch_mean" | "dense" | "token_sequence".
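A hedged sketch of how two of these modes might be applied to a token sequence. It assumes the CLS token is the first prefix token, which is a common ViT layout but is not stated in the source. The function pool is hypothetical. The "cls+patch_mean" mode is left out because this reference does not say how the two vectors are combined.

```python
from statistics import fmean
from typing import List, Sequence


def pool(
    tokens: Sequence[Sequence[float]], num_prefix_tokens: int, output_type: str
) -> List[float]:
    """Reduce a (tokens, dim) sequence to one feature vector."""
    if output_type == "cls":
        # Assumption: the CLS token sits at index 0 of the sequence.
        return list(tokens[0])
    if output_type == "patch_mean":
        # Drop the prefix tokens, then average each feature dimension over patches.
        patches = tokens[num_prefix_tokens:]
        return [fmean(column) for column in zip(*patches)]
    raise NotImplementedError(f"output_type {output_type!r} is not covered by this sketch")


# Toy output: one prefix (CLS) token followed by two patch tokens, dim = 2.
toy_tokens = [[1.0, 2.0], [0.0, 2.0], [4.0, 6.0]]

cls_vector = pool(toy_tokens, num_prefix_tokens=1, output_type="cls")
mean_vector = pool(toy_tokens, num_prefix_tokens=1, output_type="patch_mean")
```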

Spatial size of the output feature map for dense / segmentation models, e.g. (16, 16). None for pooled-output models.

Recommended inference precision ("fp32", "fp16", "bf16").
