A quick reference for technical and non-technical readers.

Products & interfaces

| Term | Meaning |
| --- | --- |
| SDK (bioptimus) | The Python client package. Handles WSI reading, tiling, tissue masking, bulk RNA alignment, and concurrent dispatch against either the on-premise server or a SageMaker endpoint. See the SDK guide. |
| Inference (facade) | The high-level, one-object pipeline (config + tissue()/embed()/run()/save()) with cached masks, a structured workspace, and reproducible config. See the Inference facade guide. |
| SlideInference | The lower-level, per-slide inference class (Backbone + mask provider + writer) for explicit control. |
| Cohort | A batch manifest of slides (and optional bulk RNA) that serves as the single source of truth for an experiment. Supports late-binding bulk RNA. |
| AWSClient | The low-level per-tile client for a SageMaker endpoint; injects model_name/mode into each request. |
| OutputFormat | The output writer format: Zarr (default), HDF5, or NPZ. |
| Model Server / API | The FastAPI inference server (shipped in the on-premise container) that exposes the REST endpoints (/api/embed/h1, /api/predict/m-optimus, …) and a Swagger UI at /docs. See the API reference. |
| Backbone | The SDK factory class used to obtain a model client, e.g. Backbone(Models.H1, backend="remote", base_url=...) or backend="aws". Takes a Models enum member (Models.H1, Models.M_OPTIMUS); the companion tissue model uses the string "tissue-seg". |
| Inference endpoint | A running model service that accepts tile requests and returns outputs: either an on-premise container or a SageMaker endpoint. |
| SageMaker | AWS’s managed ML hosting service used for Bioptimus cloud deployment; the SDK reaches it via the /invocations dispatch endpoint. |
| Model package | The on-premise container variant: H1 (H-Optimus + tissue-seg) or M (M-Optimus + tissue-seg). |

Models & outputs

| Term | Meaning |
| --- | --- |
| Foundation model (FM) | A large model pre-trained on vast data that produces general-purpose representations reused across many downstream tasks. |
| H-Optimus | Bioptimus’s histology foundation model; outputs tile embeddings. Current version: H-Optimus-1. |
| M-Optimus | Bioptimus’s multimodal model; predicts spatial gene expression from histology (and optional bulk RNA), and also outputs embeddings. |
| Embedding | A numeric feature vector summarizing a tile. H-Optimus returns the CLS token of size 1536. |
| 1536-d | The dimensionality of H-Optimus and M-Optimus tile embeddings: each tile is represented by a vector of 1536 numbers. |
| CLS token | The transformer’s summary output used as the tile embedding. H-Optimus returns the 1536-d CLS token; M-Optimus returns its MLP output. The embedding type is fixed (not user-selectable). |
| Spatial gene expression | Gene expression mapped to locations across a slide. M-Optimus predicts this from H&E tiles. |
| Tissue segmentation (tissue-seg) | A companion model (bundled in both packages) that produces a binary tissue/background mask. |

Imaging & data

| Term | Meaning |
| --- | --- |
| WSI (whole slide image) | A digitized microscope slide, often gigapixel-scale, that is split into tiles for processing. |
| H&E | Hematoxylin and eosin, the standard tissue stain in routine pathology. |
| Tile (patch) | A fixed-size crop of a WSI. Embeddings use 224×224 px tiles; tissue segmentation uses 512×512 px tiles. |
| MPP (microns per pixel) | Physical image resolution. Embeddings use 0.5 µm/px; tissue segmentation uses 8.0 µm/px. |
| Bulk RNA-seq | Aggregate (non-spatial) gene expression for a sample. Optional input to M-Optimus prediction. |
| Ensembl gene ID | Standardized gene identifier (e.g. ENSG00000000003) used in bulk RNA inputs and gene-set metadata. |
| Zarr / HDF5 / NPZ | Output file formats the SDK writes per-slide results to. |
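
The tile sizes and resolutions listed above imply very different physical footprints for embedding tiles and tissue-segmentation tiles. The arithmetic can be sketched as follows. The helper names and the 0.25 µm/px scanner resolution are illustrative assumptions and are not part of the SDK.

```python
def tile_footprint_um(tile_px: int, mpp: float) -> float:
    """Side length of a square tile in microns: pixels times microns per pixel."""
    return tile_px * mpp


def read_size_px(tile_px: int, target_mpp: float, native_mpp: float) -> int:
    """Pixels to read at the scanner's native resolution so that, after
    resampling to target_mpp, the crop is tile_px pixels wide."""
    return round(tile_px * target_mpp / native_mpp)


# Tile sizes and MPP values taken from the glossary entries above.
embed_um = tile_footprint_um(224, 0.5)    # 112.0 microns per side
tissue_um = tile_footprint_um(512, 8.0)   # 4096.0 microns per side

# Hypothetical scanner at 0.25 um/px; this value is an assumption for illustration.
native_px = read_size_px(224, 0.5, 0.25)  # 448 native pixels per embedding tile
```

Under these numbers a single tissue-segmentation tile spans roughly 4.1 mm per side, while an embedding tile spans about 0.11 mm, which is why one coarse mask tile covers many embedding tiles.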

Infrastructure

| Term | Meaning |
| --- | --- |
| CUDA Compute Capability | An NVIDIA GPU architecture version. Compute Capability 8.6 on x86-64 runs the container out of the box; other architectures require recompilation. |
| NVIDIA Container Toolkit | Enables GPU access inside Docker containers (--gpus all). |
| ml.g5.xlarge | The recommended SageMaker instance (a single 24 GB A10G-class GPU). |
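
As a rough illustration of the Compute Capability entry above, a pre-flight check might compare the host against the prebuilt target before starting the container. This is a minimal sketch: the function and constant names are hypothetical, and it assumes that "other architectures" covers both a different GPU compute capability and a non-x86-64 CPU.

```python
import platform

# From the glossary entry above: Compute Capability 8.6 on x86-64 needs no recompilation.
SUPPORTED_CAPABILITY = (8, 6)
# platform.machine() reports x86-64 as "x86_64" on Linux/macOS and "AMD64" on Windows.
SUPPORTED_MACHINES = {"x86_64", "amd64"}


def runs_out_of_the_box(capability, machine=None):
    """Return True when the (major, minor) compute capability and the CPU
    architecture both match the prebuilt target described above."""
    machine = (machine or platform.machine()).lower()
    return tuple(capability) == SUPPORTED_CAPABILITY and machine in SUPPORTED_MACHINES


# Example: an 8.6 GPU on x86-64 passes; an 8.0 GPU or an ARM host does not.
ok = runs_out_of_the_box((8, 6), "x86_64")
needs_rebuild = not runs_out_of_the_box((8, 0), "x86_64")
```

If PyTorch is installed on the host, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that can be passed in as `capability`.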

Benchmarks

| Term | Meaning |
| --- | --- |
| PathBench | A multi-task, multi-organ pathology FM benchmark (HKUST). H-Optimus-1 ranks first overall. |
| HEST | A benchmark for predicting gene expression from histology (Harvard; Jaume et al. 2025), scored by Pearson correlation. |
| MIL (multiple instance learning) | A method that aggregates many tile-level features into a slide-level prediction. |
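
Two of the ideas above can be made concrete with a short sketch using only the standard library. Mean pooling is used here as the simplest possible MIL aggregator (practical MIL models often learn attention weights instead), and `pearson` shows the correlation score itself, not how any particular benchmark aggregates it across genes or samples. Both function names are illustrative.

```python
import math


def mean_pool(tile_embeddings):
    """Simplest MIL aggregator: average tile vectors into one slide-level vector."""
    if not tile_embeddings:
        raise ValueError("need at least one tile embedding")
    n = len(tile_embeddings)
    dim = len(tile_embeddings[0])
    return [sum(tile[i] for tile in tile_embeddings) / n for i in range(dim)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences, e.g. predicted
    and measured expression values for one gene."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Toy data: two 2-d "tile embeddings" (real ones would be 1536-d).
slide_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
score = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # perfectly correlated, about 1.0
```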