Glossary

A quick reference for technical and non-technical readers.

Products & interfaces

Term	Meaning
Bioptimus SDK	The Python client package. Handles WSI reading, tiling, tissue masking, bulk RNA alignment, and concurrent dispatch to either the on-premise server or a SageMaker endpoint. See the Bioptimus SDK.
`Inference` (pipeline)	The high-level, one-object pipeline (config + `tissue()`/`embed()`/`predict()`/`run()`/`save_config()`), with cached masks, a structured workspace, and reproducible config. Its `api_url` argument is the on-premise server URL — the same value as `Backbone`’s `base_url`. See the Inference pipeline guide.
`SlideInference`	The lower-level, per-slide inference class (`Backbone` + mask provider + writer) for explicit control.
`Cohort`	A batch manifest of slides (and optional bulk RNA) that is the single source of truth for an experiment. Supports late-binding bulk RNA.
`AWSClient`	The low-level per-tile client for a SageMaker endpoint; injects `model_name`/`mode` into each request.
`OutputFormat`	The output writer format — Zarr (default), HDF5, or NPZ.
Model Server / API	The FastAPI inference server (shipped in the on-premise container) that exposes the REST endpoints (`/api/embed/h1`, `/api/predict/m-optimus`, …) and a Swagger UI at `/docs`. See the API reference.
`Backbone`	The Bioptimus SDK factory class used to obtain a model client, e.g. `Backbone(Models.H1, backend="remote", base_url=...)` or `backend="aws"`. Takes a `Models` enum member (`Models.H1`, `Models.M_OPTIMUS`); the companion tissue model uses the string `"tissue-seg"`.
Inference endpoint	A running model service that accepts tile requests and returns outputs — an on-premise container or a SageMaker endpoint.
SageMaker	AWS’s managed ML hosting service used for Bioptimus cloud deployment; the Bioptimus SDK reaches it via the `/invocations` dispatch endpoint.
Model package	The on-premise container variant — H1 (H-Optimus + tissue-seg) or M (M-Optimus + tissue-seg).

Model & backend identifiers

The same model is referred to by a product name, a version, an SDK enum, and a server/endpoint id depending on context. This table is the canonical mapping.

Model	Current version	SDK identifier	Server id	Endpoints	Hugging Face
H-Optimus	H-Optimus-1	`Models.H1` (`"h1"`)	`h1`	`/api/embed/h1`	`bioptimus/H-optimus-1`
M-Optimus	M-Optimus-1	`Models.M_OPTIMUS` (`"m-optimus"`)	`m-optimus`	`/api/embed/m-optimus`, `/api/predict/m-optimus`	—
Tissue segmentation	—	`"tissue-seg"` (string)	`tissue-seg`	`/api/predict/tissue-seg`	—
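
As a rough illustration of the mapping above, the sketch below resolves a server id and task to a full REST URL. The dictionary is copied from the table; the helper `endpoint_url` and the host `http://localhost:8000` are hypothetical names chosen for this example and are not part of the Bioptimus SDK.

```python
# Server ids and REST paths copied from the identifier table above.
ENDPOINTS = {
    "h1": {"embed": "/api/embed/h1"},
    "m-optimus": {
        "embed": "/api/embed/m-optimus",
        "predict": "/api/predict/m-optimus",
    },
    "tissue-seg": {"predict": "/api/predict/tissue-seg"},
}


def endpoint_url(base_url: str, server_id: str, task: str) -> str:
    """Return the full URL for a server id and task ("embed" or "predict").

    Hypothetical helper for illustration only; not an SDK function.
    """
    try:
        path = ENDPOINTS[server_id][task]
    except KeyError:
        raise ValueError(f"no {task!r} endpoint for model {server_id!r}") from None
    # Strip a trailing slash so the joined URL has exactly one separator.
    return base_url.rstrip("/") + path


# The host and port here are assumptions; use your own server URL.
print(endpoint_url("http://localhost:8000", "m-optimus", "predict"))
```

Asking for a combination the table does not list, such as `("h1", "predict")`, raises `ValueError` instead of silently building a URL that the server would reject.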

The Bioptimus SDK reaches a deployment through a backend argument. The product term and the code value differ:

Platform (product term)	SDK `backend`	Key connection args
On-premise container	`"remote"`	`base_url` (`api_url` on the `Inference` pipeline)
AWS SageMaker	`"aws"`	`endpoint_name`, `region_name`
In-process (local GPU, no server)	`"local"`	`model_dir` / `checkpoint`
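
The backend table can also be read as a checklist of connection arguments. The following is a minimal sketch of such a check, written independently of the SDK: `missing_connection_args` is a hypothetical helper, and treating `model_dir` and `checkpoint` as alternatives for the `"local"` backend is an assumption drawn from the "/" in the table.

```python
# Connection arguments per SDK backend, from the table above.
# Each backend maps to a list of acceptable argument sets.
# Assumption: for "local", either model_dir or checkpoint is sufficient.
REQUIRED_ARGS = {
    "remote": [{"base_url"}],
    "aws": [{"endpoint_name", "region_name"}],
    "local": [{"model_dir"}, {"checkpoint"}],
}


def missing_connection_args(backend: str, **kwargs) -> list:
    """Return the sorted names of arguments still needed for a backend.

    Hypothetical helper for illustration only; not an SDK function.
    """
    if backend not in REQUIRED_ARGS:
        raise ValueError(f"unknown backend {backend!r}")
    provided = {name for name, value in kwargs.items() if value is not None}
    # Pick the alternative that is closest to being satisfied.
    missing = min(
        (alternative - provided for alternative in REQUIRED_ARGS[backend]),
        key=len,
    )
    return sorted(missing)


print(missing_connection_args("aws", endpoint_name="my-endpoint"))
```

An empty list means the supplied arguments cover what the table lists for that backend.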

Models & outputs

Term	Meaning
Foundation model (FM)	A large model pre-trained on vast data that produces general-purpose representations reused across many downstream tasks.
H-Optimus	Bioptimus’s histology foundation model; outputs tile embeddings. Current version: H-Optimus-1.
M-Optimus	Bioptimus’s multimodal model; predicts spatial gene expression from histology (and optional bulk RNA), and also outputs embeddings.
Embedding	A numeric feature vector summarizing a tile. H-Optimus returns the CLS token of size 1536.
1536-d	The dimensionality of H-Optimus and M-Optimus tile embeddings — a 1536-number feature vector per tile.
CLS token	The transformer’s summary output used as the tile embedding. H-Optimus returns the 1536-d CLS token; M-Optimus returns its MLP output. The embedding type is fixed (not user-selectable).
Spatial gene expression	Gene expression mapped to locations across a slide. M-Optimus predicts this from H&E tiles.
Tissue segmentation (`tissue-seg`)	A companion model (bundled in both packages) that produces a binary tissue/background mask.

Imaging & data

Term	Meaning
WSI (whole slide image)	A digitized microscope slide — often gigapixel-scale — split into tiles for processing.
H&E	Hematoxylin and eosin, the standard tissue stain in routine pathology.
Tile (patch)	A fixed-size crop of a WSI. Embeddings use 224×224; tissue segmentation uses 512×512.
MPP (microns per pixel)	Physical image resolution. Embeddings use 0.5 µm/px; tissue segmentation uses 8.0 µm/px.
Bulk RNA-seq	Aggregate (non-spatial) gene expression for a sample. Optional input to M-Optimus prediction.
Ensembl gene ID	Standardized gene identifier (e.g. `ENSG00000000003`) used in bulk RNA inputs and gene-set metadata.
Zarr / HDF5 / NPZ	Output file formats the Bioptimus SDK writes per-slide results to.
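
Tile size and MPP together determine how much tissue a tile covers. The short calculation below uses only the values from the Tile and MPP entries; the function name `tile_footprint_um` is made up for this example.

```python
def tile_footprint_um(tile_px: int, mpp: float) -> float:
    """Side length of a square tile in microns: pixels times microns per pixel."""
    return tile_px * mpp


# Values taken from the Tile and MPP glossary entries.
embedding_side = tile_footprint_um(224, 0.5)     # embedding tiles
segmentation_side = tile_footprint_um(512, 8.0)  # tissue-segmentation tiles

print(embedding_side)     # 112.0
print(segmentation_side)  # 4096.0
```

So an embedding tile covers roughly 112 µm per side, while a tissue-segmentation tile covers about 4.1 mm per side, which is why the mask is computed at a much coarser resolution.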

Infrastructure

Term	Meaning
CUDA Compute Capability	An NVIDIA GPU architecture version. The container runs out of the box on x86-64 hosts with Compute Capability 8.6 GPUs; other architectures require recompilation.
NVIDIA Container Toolkit	Enables GPU access inside Docker containers (`--gpus all`).
`ml.g5.xlarge`	The recommended SageMaker instance type (a single 24 GB A10G-class GPU).

Benchmarks

Term	Meaning
PathBench	A multi-task, multi-organ pathology FM benchmark (HKUST). H-Optimus-1 ranks first overall.
HEST	A benchmark for predicting gene expression from histology (Harvard; Jaume et al. 2025), scored by Pearson correlation.
MIL (multiple instance learning)	A method that aggregates many tile-level features into a slide-level prediction.
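
Two of the ideas above are easy to show in a few lines: aggregating tile-level features into one slide-level vector, and the Pearson correlation used as a score. The sketch below uses mean pooling, which is only the simplest possible MIL aggregator and is not claimed to be what either benchmark uses; both function names are illustrative.

```python
import math


def mean_pool(tile_embeddings):
    """Average tile-level vectors into a single slide-level vector."""
    if not tile_embeddings:
        raise ValueError("need at least one tile embedding")
    n = len(tile_embeddings)
    # zip(*rows) walks the vectors dimension by dimension.
    return [sum(dim) / n for dim in zip(*tile_embeddings)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Toy 2-d "embeddings" for two tiles; real tile embeddings are 1536-d.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

In practice, learned aggregators such as attention-based pooling are common alternatives to the plain mean shown here.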
