Model specification types. Defines ModelSpec, which encapsulates a model’s identity, input tile requirements, normalisation parameters, output contract, and compute hints. ModelSpec reuses TileSpec for tile geometry, so the same spec drives both the extraction pipeline and the model’s input expectations.
Example:
from bioptimus.extraction.wsi.types import TileSpec
from bioptimus.io.wsi.types import MPP, MeasurementUnit
from bioptimus.models.types import ModelSpec

spec = ModelSpec(
    model_name="h0-mini",
    version="1.0.0",
    tile_spec=TileSpec(
        size=(224, 224),
        stride=(224, 224),
        resolution=MPP(0.5),
        unit=MeasurementUnit.PIXELS,
    ),
    embedding_dim=1536,
    output_type="cls+patch_mean",
    num_prefix_tokens=1,
    patch_size=14,
    mean=(0.707, 0.579, 0.704),
    std=(0.212, 0.230, 0.178),
    weights_source="bioptimus/H0-mini",
)

Models

class Models(str, Enum)
Known model identifiers. Values correspond to the model_name field in each YAML config under bioptimus/models/configs/.
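Because Models subclasses both str and Enum, its members can be looked up by value and compared directly with plain strings. The sketch below uses a stand-in enum to show that behaviour; the class name ExampleModels and the member name H0_MINI are assumptions, and only the value "h0-mini" comes from the example above.

```python
from enum import Enum


class ExampleModels(str, Enum):
    """Stand-in for Models; the member name is an assumption."""

    H0_MINI = "h0-mini"  # value taken from model_name in the example above


# Look up a member from a string, e.g. one read from a YAML config.
member = ExampleModels("h0-mini")

# Members of a str-mixin enum compare equal to plain strings.
is_same = member == "h0-mini"
```

An unknown string passed to the enum constructor raises ValueError, which gives an early failure for misspelled model names.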

ModelSpec

@dataclass(frozen=True)
class ModelSpec()
Describes a backbone model’s input and output contract. Combines tile extraction requirements (via TileSpec) with preprocessing, architecture, and output metadata so that data pipelines can prepare inputs and interpret outputs correctly without manual configuration. The spec is framework-agnostic: it contains only plain Python types and can be consumed by PyTorch, TensorFlow, JAX, or any other framework.
Groups:
Identity & provenance: who is this model?
Input / preprocessing: what does the model expect?
Architecture: structural hints for generic code.
Output: how to interpret the raw model output.
Compute: precision & hardware hints.
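Since the class is declared with @dataclass(frozen=True), instances cannot be modified after construction; a variant is derived by building a new instance. The sketch below illustrates this with a hypothetical stand-in class, ExampleSpec, that carries only a few ModelSpec-like fields. Its default values are placeholders, not the library's defaults.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleSpec:
    """Hypothetical stand-in with a few ModelSpec-like fields."""

    model_name: str
    version: str
    precision: str = "fp32"  # placeholder default, not taken from the library


spec = ExampleSpec(model_name="h0-mini", version="1.0.0")

# Frozen dataclasses reject attribute assignment.
try:
    spec.precision = "fp16"
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace() returns a new instance with the given fields changed.
fp16_spec = replace(spec, precision="fp16")
```

A frozen dataclass with the default eq=True is also hashable, so a spec of this kind can serve as a dictionary key, provided all of its field values are hashable.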
model_name
Unique identifier / registry key (e.g. "h0-mini").
version
Model version string (e.g. "1.0.0"). Ensures embeddings extracted with v1 are not mixed with v2.
weights_source
HuggingFace repo, URL, or local path to weights (e.g. "bioptimus/H0-mini"). None = no auto-download.
license
SPDX identifier or short description (e.g. "proprietary", "apache-2.0").
tile_spec
Tile geometry the model expects (size, stride, resolution, measurement unit). Directly reusable by TileExtractor.
num_channels
Number of input image channels (default 3 for RGB).
mean
Per-channel normalisation mean (channel order matches input).
std
Per-channel normalisation std.
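As an illustration of how mean and std are typically applied, the sketch below standardises a single RGB pixel. It assumes the common convention of scaling 8-bit values to [0, 1] before subtracting the mean and dividing by the std; the source does not state the scale, and normalise_pixel is a hypothetical helper, not part of the library.

```python
from typing import Sequence, Tuple


def normalise_pixel(
    pixel: Sequence[int],
    mean: Sequence[float],
    std: Sequence[float],
) -> Tuple[float, ...]:
    """Scale an 8-bit pixel to [0, 1], then standardise each channel.

    Assumes mean and std are expressed on the [0, 1] scale.
    """
    return tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std))


# Values from the h0-mini example spec.
mean = (0.707, 0.579, 0.704)
std = (0.212, 0.230, 0.178)

# A pure white pixel, one value per channel in the same order as mean/std.
normalised = normalise_pixel((255, 255, 255), mean, std)
```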
interpolation
Resize interpolation mode string (e.g. "bicubic", "bilinear").
antialias
Whether the resize operation should use antialiasing.
color_space
Expected colour space of the input image ("RGB", "BGR", "HED").
stain_normalization
Stain normalisation method applied before the model (e.g. "macenko", "reinhard", "vahadane"), or None for no stain normalisation.
architecture
Architecture family ("vit", "swin", "resnet", "convnext", …).
patch_size
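ViT patch size in pixels (e.g. 14, 16). Determines the number of patch tokens in the output, (tile_size / patch_size)², where tile_size is the tile side length; prefix tokens are counted separately. None for non-ViT architectures.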
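The token arithmetic can be checked with a short sketch using the values from the h0-mini example (224-pixel tiles, patch size 14, one prefix token). The helper patch_grid is hypothetical and assumes square tiles whose side is an exact multiple of the patch size.

```python
from typing import Tuple


def patch_grid(tile_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square tile."""
    if tile_size % patch_size != 0:
        raise ValueError("tile_size must be a multiple of patch_size")
    side = tile_size // patch_size
    return side, side * side


# 224 / 14 = 16 patches per side, so 16 * 16 = 256 patch tokens.
side, num_patch_tokens = patch_grid(224, 14)

# The full output sequence also includes the prefix tokens (1 in the example spec).
total_tokens = num_patch_tokens + 1
```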
num_prefix_tokens
Number of non-spatial prefix tokens that precede the patch tokens in the output sequence (CLS, register tokens, …). For example, 5 for DINOv2 with registers (1 CLS + 4 register tokens) and 1 for most ViTs.
embedding_dim
Dimensionality of the final output feature vector (after any post-processing like CLS + mean pooling).
output_type
How the raw model output is consumed: "cls" | "patch_mean" | "cls+patch_mean" | "dense" | "token_sequence".
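To make the pooled output types concrete, the sketch below reduces a token sequence to a single feature vector. It is written in plain Python for readability and rests on assumptions the source does not confirm: that the CLS token is the first prefix token, and that "cls+patch_mean" concatenates the CLS vector followed by the patch mean. The "dense" and "token_sequence" types are not covered, and pool_tokens is a hypothetical helper.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def pool_tokens(tokens: List[Vector], num_prefix_tokens: int, output_type: str) -> Vector:
    """Reduce a token sequence to one vector (assumed conventions, see lead-in)."""
    cls_token = tokens[0]  # assumption: CLS is the first prefix token
    patch_tokens = tokens[num_prefix_tokens:]  # skip CLS and any register tokens
    if output_type == "cls":
        return list(cls_token)
    if output_type == "patch_mean":
        return mean_pool(patch_tokens)
    if output_type == "cls+patch_mean":
        # assumption: concatenation order is CLS first, then the patch mean
        return list(cls_token) + mean_pool(patch_tokens)
    raise ValueError(f"unsupported output_type: {output_type!r}")


# One CLS token followed by two patch tokens, each of width 2.
tokens = [[1.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
pooled = pool_tokens(tokens, num_prefix_tokens=1, output_type="cls+patch_mean")
```

Under these assumptions the concatenated vector is twice the token width, which is why embedding_dim describes the final vector rather than the width of an individual token.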
output_spatial_dims
For dense / segmentation models that output a spatial feature map, e.g. (16, 16). None for pooled-output models.
precision
Recommended inference precision ("fp32", "fp16", "bf16").
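One practical use of the precision hint is estimating memory for input batches. The sketch below maps each precision string to its element size and computes the size of a batch of tiles; the mapping reflects the standard widths of these formats, while BYTES_PER_ELEMENT and batch_bytes are hypothetical names, not part of the library.

```python
# Element size in bytes for each precision string listed above.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def batch_bytes(batch_size: int, num_channels: int, height: int, width: int, precision: str) -> int:
    """Return the memory footprint in bytes of one input batch."""
    return batch_size * num_channels * height * width * BYTES_PER_ELEMENT[precision]


# 32 RGB tiles of 224 x 224 pixels.
fp32_size = batch_bytes(32, 3, 224, 224, "fp32")
fp16_size = batch_bytes(32, 3, 224, 224, "fp16")
```

Half-precision formats halve the input footprint relative to fp32; activations and weights scale the same way, though this estimate covers only the input tensor.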