- H-Optimus turns a tile into a 1536-d feature vector (embedding). You build downstream models on top of those features.
- M-Optimus produces the same 1536-d embeddings and predicts spot-level spatial gene expression from an H&E tile — optionally conditioned on bulk RNA — giving a molecular readout without a spatial assay.
- Tissue segmentation is a companion model (bundled with both) that separates tissue from background.
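Downstream models often need one vector per slide or region rather than one per tile. The sketch below assumes you already have the 1536-d tile embeddings as lists of floats (obtaining them is not shown) and mean-pools them into a single slide-level vector. Mean pooling is only one common baseline; attention-based multiple-instance pooling is a frequent alternative, and neither is prescribed by the models themselves.

```python
from typing import Sequence

EMBED_DIM = 1536  # embedding size stated for both H-Optimus and M-Optimus


def mean_pool(tile_embeddings: Sequence[Sequence[float]]) -> list[float]:
    """Average tile-level embeddings into one slide-level vector."""
    if not tile_embeddings:
        raise ValueError("need at least one tile embedding")
    for emb in tile_embeddings:
        if len(emb) != EMBED_DIM:
            raise ValueError(f"expected {EMBED_DIM}-d embeddings, got {len(emb)}")
    n = len(tile_embeddings)
    # Sum each dimension across tiles, then divide by the tile count.
    return [sum(column) / n for column in zip(*tile_embeddings)]


# Synthetic vectors standing in for real model output.
tiles = [[1.0] * EMBED_DIM, [3.0] * EMBED_DIM]
slide_vector = mean_pool(tiles)
print(len(slide_vector), slide_vector[0])  # 1536 2.0
```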
Side by side
| | H-Optimus | M-Optimus |
|---|---|---|
| Primary output | Tile embedding (1536-d) | Tile embedding (1536-d) + spatial gene-expression prediction (spot-level) |
| Morphology features | Yes | Yes |
| Extra input | None | Optional bulk RNA (Ensembl-ID counts) |
| Answers | "What does this tissue look like, as features?" | "What does it look like, and what is the gene expression here?" |
| Endpoints | /api/embed/h1 | /api/embed/m-optimus, /api/predict/m-optimus |
| Hugging Face (academic) | Yes | No |
| Typical use | Biomarker/morphology models from H&E | Biomarker/morphology models from H&E; spatial transcriptomics; multimodal discovery |
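M-Optimus optionally takes bulk RNA as Ensembl-ID counts. The exact payload the endpoint expects is not described here, so this is only a sketch of tidying counts into an `{Ensembl ID: count}` mapping. It assumes human gene IDs (`ENSG` followed by 11 digits), that version suffixes such as `.17` should be stripped, and that counts for duplicate IDs should be summed; check each of these against the API reference before use.

```python
import re
from typing import Iterable, Tuple

# Assumption: human Ensembl gene IDs ("ENSG" + 11 digits), optionally
# carrying a version suffix such as ".17" that is stripped here.
ENSEMBL_GENE = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def prepare_bulk_counts(rows: Iterable[Tuple[str, float]]) -> dict[str, float]:
    """Normalize (gene_id, count) pairs into an {Ensembl ID: count} mapping.

    Version suffixes are removed, and counts for IDs that collapse to the
    same unversioned ID are summed. Malformed IDs and negative counts raise.
    """
    counts: dict[str, float] = {}
    for gene_id, count in rows:
        match = ENSEMBL_GENE.match(gene_id.strip())
        if match is None:
            raise ValueError(f"not a human Ensembl gene ID: {gene_id!r}")
        if count < 0:
            raise ValueError(f"negative count for {gene_id}: {count}")
        base_id = match.group(1)
        counts[base_id] = counts.get(base_id, 0) + count
    return counts


bulk = prepare_bulk_counts([
    ("ENSG00000141510.17", 120),  # versioned ID
    ("ENSG00000141510", 5),       # same gene, unversioned
    ("ENSG00000146648", 300),
])
print(bulk)  # {'ENSG00000141510': 125, 'ENSG00000146648': 300}
```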
Decision guide
I want features from H&E to train my own models
Use either model: both H-Optimus and M-Optimus produce 1536-d embeddings. M-Optimus generally gives stronger features, while H-Optimus is the lighter, feature-only option.
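A linear probe (logistic regression on frozen embeddings) is a common first model to train on these features. This dependency-free sketch assumes binary labels and embeddings already loaded as lists of floats; in practice you would more likely reach for a library implementation such as scikit-learn's `LogisticRegression`. The toy vectors are synthetic, not model output.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 200,
) -> tuple[list[float], float]:
    """Fit binary logistic regression on frozen embeddings by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - label  # derivative of log-loss with respect to the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


DIM = 1536
pos = [1.0] + [0.0] * (DIM - 1)
neg = [-1.0] + [0.0] * (DIM - 1)
w, b = train_linear_probe([pos, neg], [1, 0])
print(predict_proba(w, b, pos) > 0.5, predict_proba(w, b, neg) < 0.5)  # True True
```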
I want gene expression predicted from H&E
Use M-Optimus prediction (/api/predict/m-optimus); optionally add bulk RNA for a multimodal prediction.
I need to remove background first
Use tissue segmentation before either model.
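A typical way to apply a segmentation mask is to drop tiles that are mostly background before embedding them. The segmentation model's output format is not described here, so this sketch assumes a binary mask (1 = tissue, 0 = background) on the same pixel grid as the tiles; `tissue_tiles` and the 0.5 threshold are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def tissue_tiles(
    mask: Sequence[Sequence[int]],
    tile_size: int,
    min_fraction: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (row, col) origins of non-overlapping tiles whose tissue
    fraction is at least min_fraction. mask[r][c] is 1 for tissue, 0 otherwise."""
    height, width = len(mask), len(mask[0])
    keep = []
    # Step over full tiles only; partial tiles at the right/bottom edges are skipped.
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tissue = sum(
                mask[r][c]
                for r in range(top, top + tile_size)
                for c in range(left, left + tile_size)
            )
            if tissue / (tile_size * tile_size) >= min_fraction:
                keep.append((top, left))
    return keep


# 4x4 toy mask: left half is tissue, right half is background.
mask = [[1, 1, 0, 0] for _ in range(4)]
print(tissue_tiles(mask, tile_size=2))  # [(0, 0), (2, 0)]
```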
I'm an academic, H&E features only
Load H-Optimus-1 from Hugging Face.

