
H-optimus-1 is a 1.1-billion-parameter vision transformer trained with self-supervised learning on an extensive proprietary dataset of billions of histology images, sampled from over 1 million slides from more than 800,000 patients.
The model can be accessed for academic research purposes here.
A crucial component in developing a strong foundation model (FM) is the quality and diversity of the dataset used to train it.
H-optimus-1 was trained on an extensive collection of over 1 million H&E-stained histology slides of more than 50 organs digitized with 3 scanner types across more than 4,000 clinical centers.
Importantly, the dataset used to train H-optimus-1 is, to the best of our knowledge, the most patient-diverse dataset ever used to train a pathology FM, including histology slides from more than 800,000 patients [2] with various diseases. This patient diversity exposes the model to a wide range of histology patterns and diseases during training, ultimately yielding rich, generalizable features that are useful for solving complex tasks.
H-optimus-1 was benchmarked on 13 downstream tasks encompassing 15 datasets at both the slide level and tile level, including the HEST benchmark [Jaume et al. 2025].
The HEST benchmark consists of predicting gene expression from histology images across nine different organs. More details about this benchmark can be found here.
The metric used is Pearson’s correlation coefficient (higher is better). The models are ordered by decreasing average performance. Standard deviations are reported in parentheses. Bold indicates the highest score in a column.
Table 1:
| Model | Average | IDC | PRAD | PAAD | SKCM | COAD | READ | SCCRCC | LUAD | LYMPH-IDC |
|---|---|---|---|---|---|---|---|---|---|---|
| H-optimus-1 | 0.422 (0.019) | 0.602 (0.081) | 0.378 (0.012) | 0.496 (0.051) | 0.659 (0.048) | 0.32 (0.016) | 0.242 (0.015) | 0.245 (0.125) | 0.578 (0.012) | 0.277 (0.039) |
| H-optimus-0 | 0.413 (0.021) | 0.598 (0.085) | 0.385 (0.0) | 0.491 (0.04) | 0.645 (0.062) | 0.309 (0.0) | 0.222 (0.048) | 0.255 (0.135) | 0.559 (0.032) | 0.259 (0.04) |
| UNI2-h | 0.413 (0.02) | 0.59 (0.081) | 0.357 (0.049) | 0.50 (0.04) | 0.659 (0.017) | 0.301 (0.004) | 0.223 (0.038) | 0.261 (0.132) | 0.558 (0.014) | 0.272 (0.04) |
| Virchow2 | 0.396 (0.02) | 0.592 (0.08) | 0.348 (0.031) | 0.472 (0.065) | 0.619 (0.028) | 0.259 (0.016) | 0.209 (0.05) | 0.257 (0.123) | 0.553 (0.017) | 0.255 (0.026) |
| Prov-GigaPath | 0.386 (0.02) | 0.551 (0.073) | 0.37 (0.022) | 0.475 (0.048) | 0.562 (0.061) | 0.299 (0.021) | 0.196 (0.062) | 0.232 (0.115) | 0.541 (0.036) | 0.25 (0.05) |
| UNI | 0.385 (0.02) | 0.574 (0.08) | 0.294 (0.09) | 0.481 (0.07) | 0.635 (0.04) | 0.262 (0.03) | 0.184 (0.05) | 0.238 (0.12) | 0.546 (0.02) | 0.256 (0.04) |
| GPFM | 0.378 (0.024) | 0.566 (0.08) | 0.342 (0.078) | 0.46 (0.062) | 0.589 (0.048) | 0.248 (0.024) | 0.164 (0.071) | 0.253 (0.138) | 0.547 (0.014) | 0.237 (0.041) |
| Phikon-v2 | 0.373 (0.021) | 0.541 (0.077) | 0.354 (0.015) | 0.445 (0.066) | 0.555 (0.036) | 0.25 (0.018) | 0.175 (0.059) | 0.257 (0.14) | 0.542 (0.011) | 0.244 (0.046) |
| CONCH | 0.37 (0.019) | 0.537 (0.084) | 0.357 (0.004) | 0.438 (0.065) | 0.572 (0.041) | 0.27 (0.006) | 0.161 (0.055) | 0.206 (0.108) | 0.538 (0.004) | 0.254 (0.039) |
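For readers less familiar with this setup, a HEST-style evaluation broadly amounts to fitting a simple regressor from frozen tile embeddings to gene expression and scoring each gene with Pearson's correlation. The sketch below illustrates the idea with scikit-learn on random placeholder arrays; it is a minimal illustration, not the exact HEST protocol or our benchmark code, and all array names and dimensions are assumptions.

```python
# Minimal sketch of a HEST-style gene-expression evaluation.
# `train_emb`/`test_emb` stand in for frozen tile embeddings (n_tiles, dim) and
# `train_expr`/`test_expr` for matched gene-expression matrices (n_tiles, n_genes).
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
train_emb, test_emb = rng.normal(size=(512, 1536)), rng.normal(size=(128, 1536))
train_expr, test_expr = rng.normal(size=(512, 50)), rng.normal(size=(128, 50))

# Fit one multi-output ridge regression from embeddings to expression.
reg = Ridge(alpha=1.0).fit(train_emb, train_expr)
pred = reg.predict(test_emb)

# Pearson's correlation computed per gene, then averaged (higher is better).
per_gene = [pearsonr(test_expr[:, g], pred[:, g])[0] for g in range(pred.shape[1])]
print(f"mean Pearson r over {len(per_gene)} genes: {np.mean(per_gene):.3f}")
```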
We have benchmarked H-optimus-1 and other leading pathology FMs on a diverse set of slide-level downstream tasks using multiple instance learning; the tasks and their datasets are listed in Table 2 below.
The metric used is the area under the ROC curve (higher is better). More details about the evaluation methodology can be found in the ‘Slide-level tasks evaluation methodology’ section below. The models are ordered by decreasing average performance and standard deviations are reported in parentheses. Bold indicates the highest score in a row.
Table 2:
| Task | Dataset | H-optimus-1 | UNI2-h | Virchow2 | H-optimus-0 | Prov-GigaPath | GPFM | UNI | Phikon-v2 | CONCH |
|---|---|---|---|---|---|---|---|---|---|---|
| Average | | 0.856 | 0.851 | 0.843 | 0.835 | 0.834 | 0.824 | 0.823 | 0.813 | 0.786 |
| META-BC | CAMELYON16 Test | 0.996 (0.001) | 0.996 (0.003) | 0.976 (0.002) | 0.998 (0.001) | 0.985 (0.001) | 0.992 (0.002) | 0.981 (0.006) | 0.985 (0.001) | 0.974 (0.002) |
| META-BC | SLN-Breast | 0.959 (0.003) | 0.984 (0.002) | 0.985 (0.002) | 0.953 (0.002) | 0.977 (0.001) | 0.944 (0.014) | 0.963 (0.005) | 0.938 (0.008) | 0.959 (0.001) |
| MSI-GC | TCGA-STAD Test | 0.915 (0.003) | 0.903 (0.004) | 0.923 (0.014) | 0.899 (0.006) | 0.863 (0.004) | 0.892 (0.007) | 0.907 (0.008) | 0.912 (0.003) | 0.891 (0.012) |
| MSI-CRC | PAIP2020 | 0.984 (0.003) | 0.971 (0.001) | 0.988 (0.002) | 0.974 (0.002) | 0.970 (0.005) | 0.974 (0.001) | 0.966 (0.003) | 0.972 (0.002) | 0.894 (0.015) |
| MSI-CRC | FR-CRC-Bio | 0.917 (0.002) | 0.894 (0.003) | 0.887 (0.003) | 0.888 (0.009) | 0.876 (0.008) | 0.837 (0.003) | 0.829 (0.005) | 0.865 (0.002) | 0.838 (0.007) |
| MSI-CRC | CPTAC-COAD | 0.957 (0.003) | 0.953 (0.003) | 0.959 (0.004) | 0.923 (0.010) | 0.947 (0.015) | 0.913 (0.001) | 0.928 (0.004) | 0.929 (0.003) | 0.882 (0.006) |
| MSI-CRC | SURGEN | 0.914 (0.003) | 0.899 (0.002) | 0.896 (0.003) | 0.903 (0.013) | 0.913 (0.002) | 0.865 (0.002) | 0.857 (0.006) | 0.862 (0.006) | 0.796 (0.004) |
| KRAS-CRC | CPTAC-COAD | 0.625 (0.005) | 0.649 (0.008) | 0.706 (0.006) | 0.592 (0.010) | 0.581 (0.011) | 0.688 (0.004) | 0.687 (0.010) | 0.659 (0.015) | 0.647 (0.014) |
| KRAS-CRC | SURGEN | 0.683 (0.006) | 0.675 (0.009) | 0.654 (0.005) | 0.662 (0.002) | 0.692 (0.007) | 0.638 (0.003) | 0.664 (0.004) | 0.612 (0.007) | 0.631 (0.011) |
| BRAF-CRC | CPTAC-COAD | 0.722 (0.006) | 0.758 (0.007) | 0.800 (0.011) | 0.693 (0.021) | 0.743 (0.007) | 0.813 (0.006) | 0.766 (0.015) | 0.780 (0.005) | 0.740 (0.014) |
| BRAF-CRC | SURGEN | 0.823 (0.001) | 0.827 (0.01) | 0.760 (0.023) | 0.786 (0.007) | 0.809 (0.005) | 0.780 (0.004) | 0.799 (0.013) | 0.707 (0.003) | 0.730 (0.017) |
| HER2-BC | YALE-HER2 | 0.899 (0.011) | 0.87 (0.004) | 0.826 (0.016) | 0.899 (0.009) | 0.849 (0.011) | 0.863 (0.012) | 0.825 (0.013) | 0.742 (0.030) | 0.801 (0.020) |
| HER2-BC | IMPRESS | 0.903 (0.018) | 0.853 (0.008) | 0.810 (0.041) | 0.888 (0.02) | 0.860 (0.009) | 0.625 (0.034) | 0.745 (0.033) | 0.682 (0.009) | 0.608 (0.012) |
| HER2-BC | BCNB | 0.683 (0.007) | 0.674 (0.008) | 0.692 (0.004) | 0.656 (0.005) | 0.680 (0.01) | 0.692 (0.005) | 0.677 (0.008) | 0.673 (0.003) | 0.650 (0.015) |
| ER-BC | IMPRESS | 0.836 (0.007) | 0.835 (0.005) | 0.834 (0.008) | 0.834 (0.012) | 0.821 (0.006) | 0.839 (0.004) | 0.824 (0.008) | 0.860 (0.005) | 0.759 (0.006) |
| ER-BC | BCNB | 0.903 (0.005) | 0.902 (0.002) | 0.847 (0.008) | 0.854 (0.007) | 0.848 (0.005) | 0.853 (0.005) | 0.861 (0.003) | 0.835 (0.002) | 0.814 (0.002) |
| PR-BC | IMPRESS | 0.831 (0.014) | 0.830 (0.005) | 0.834 (0.009) | 0.867 (0.004) | 0.814 (0.014) | 0.813 (0.003) | 0.761 (0.034) | 0.821 (0.006) | 0.767 (0.007) |
| PR-BC | BCNB | 0.854 (0.008) | 0.853 (0.002) | 0.803 (0.01) | 0.769 (0.022) | 0.793 (0.005) | 0.814 (0.008) | 0.777 (0.029) | 0.805 (0.004) | 0.764 (0.006) |
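For context, multiple instance learning treats a slide as a bag of frozen tile embeddings and learns how to aggregate them into a single slide-level prediction. The snippet below is a minimal attention-based (ABMIL-style) aggregator in PyTorch, shown purely as an illustration; the architecture, dimensions, and hyperparameters are placeholders rather than the configuration used for Table 2.

```python
# Illustrative attention-based MIL aggregator in PyTorch (not our exact model).
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, emb_dim: int = 1536, hidden_dim: int = 256, n_classes: int = 2):
        super().__init__()
        # Scores each tile embedding, then normalizes the scores into attention weights.
        self.attention = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(emb_dim, n_classes)

    def forward(self, tile_embeddings: torch.Tensor) -> torch.Tensor:
        # tile_embeddings: (n_tiles, emb_dim) frozen features from the foundation model.
        weights = torch.softmax(self.attention(tile_embeddings), dim=0)   # (n_tiles, 1)
        slide_embedding = (weights * tile_embeddings).sum(dim=0)          # (emb_dim,)
        return self.classifier(slide_embedding)                           # (n_classes,)

model = AttentionMIL()
logits = model(torch.randn(1000, 1536))  # e.g. 1,000 tiles from one slide
print(logits.shape)  # torch.Size([2])
```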
We have also benchmarked the different pathology FMs on tile-level tasks using linear probing; the tasks are listed in Table 3 below.
The metric used is the top-1 accuracy (higher is better). More details about the evaluation methodology can be found in the ‘Tile-level tasks evaluation methodology’ section below. The models are ordered by decreasing average performance and standard deviations are reported in parentheses. Bold indicates the highest score in a column.
Table 3:
| Model | Average | MHIST | TCGA-UNIFORM | CAM17-WILDS | CRC-NO-NORM |
|---|---|---|---|---|---|
| H-optimus-1 | 0.908 | 0.835 (0.001) | 0.851 (0.000) | 0.991 (0.000) | 0.956 (0.002) |
| UNI2-h | 0.904 | 0.826 (0.001) | 0.831 (0.000) | 0.988 (0.000) | 0.969 (0.001) |
| H-optimus-0 | 0.904 | 0.848 (0.001) | 0.835 (0.001) | 0.986 (0.001) | 0.945 (0.012) |
| Virchow2 | 0.900 | 0.851 (0.001) | 0.830 (0.000) | 0.986 (0.001) | 0.933 (0.011) |
| GPFM | 0.895 | 0.824 (0.002) | 0.827 (0.001) | 0.972 (0.004) | 0.955 (0.004) |
| Prov-GigaPath | 0.887 | 0.831 (0.003) | 0.804 (0.000) | 0.968 (0.003) | 0.945 (0.003) |
| UNI | 0.883 | 0.840 (0.002) | 0.805 (0.001) | 0.980 (0.001) | 0.906 (0.015) |
| Phikon-v2 | 0.877 | 0.797 (0.001) | 0.794 (0.000) | 0.972 (0.001) | 0.946 (0.002) |
| CONCH | 0.844 | 0.783 (0.003) | 0.679 (0.000) | 0.972 (0.000) | 0.940 (0.000) |
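Linear probing keeps the foundation model frozen and trains only a linear classifier on top of the tile embeddings. A minimal sketch with scikit-learn is shown below; the embedding dimension, labels, and regularization are placeholders and do not reflect our exact evaluation protocol.

```python
# Sketch of tile-level linear probing on frozen embeddings.
# `train_emb`, `train_labels`, `test_emb`, `test_labels` are placeholders for
# pre-extracted [CLS] embeddings and their tile-level class labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_emb, test_emb = rng.normal(size=(2000, 1536)), rng.normal(size=(500, 1536))
train_labels, test_labels = rng.integers(0, 9, 2000), rng.integers(0, 9, 500)

# The foundation model stays frozen: only this linear classifier is trained.
probe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000, C=1.0))
probe.fit(train_emb, train_labels)

# Top-1 accuracy, as reported in Table 3 (higher is better).
print(f"top-1 accuracy: {accuracy_score(test_labels, probe.predict(test_emb)):.3f}")
```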
We list in the table below the characteristics of the models benchmarked. For each model, the [CLS] token embedding was used for the downstream evaluations.
Table 4:
| Model | Authors | Model architecture (number of parameters) | Number of histology slides used for pre-training |
|---|---|---|---|
| H-optimus-1 | Bioptimus | ViT-g/14 (1.1B) | 1M+ |
| UNI2-h | Mahmood Lab | Modified ViT-H/14 (681M) | 350k+ |
| UNI | Mahmood Lab [Chen et al. 2024] | ViT-L/16 (307M) | 100k |
| H-optimus-0 | Bioptimus [Saillard et al. 2024] | ViT-g/14 (1.1B) | 500k+ |
| Virchow2 | Paige / Microsoft Research [Zimmermann et al. 2024] | ViT-H/14 (632M) | 3.1M |
| GPFM | Hong Kong University of Science and Technology [Ma et al. 2024] | ViT-L/14 (307M) | 86k |
| Prov-GigaPath | Microsoft Research [Xu et al. 2024] | ViT-g/16 (1.1B) | 171k |
| Phikon-v2 | Owkin [Filiot et al. 2024] | ViT-L/16 (307M) | 58k |
| CONCH | Mahmood Lab [Lu et al. 2024] | Modified ViT-B/16 (90M) | 21k slides & 1.2M image-text pairs |
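As an illustration of how a [CLS] embedding is typically obtained from a timm-style ViT, the sketch below runs one forward pass over a normalized tile. The hub identifier and normalization statistics used here are assumptions for illustration only; refer to the official model card of each FM for the exact loading arguments and preprocessing.

```python
# Sketch of [CLS] embedding extraction with a timm ViT backbone.
# The hub identifier and normalization statistics below are assumptions;
# always follow the official model card for the exact values.
import timm
import torch
from PIL import Image
from torchvision import transforms

model = timm.create_model(
    "hf-hub:bioptimus/H-optimus-1",  # assumed identifier; may require gated access
    pretrained=True,
    num_classes=0,                    # no classifier head: forward returns the [CLS] embedding
)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # Placeholder ImageNet statistics; substitute the values from the model card.
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

tile = Image.new("RGB", (224, 224))  # stand-in for an H&E tile
with torch.inference_mode():
    cls_embedding = model(preprocess(tile).unsqueeze(0))  # (1, embedding_dim)
print(cls_embedding.shape)
```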
We list in the table below the different tasks defined for the slide evaluation benchmark, and the datasets used to define these tasks.
FR-CRC-Bio is an internal dataset consisting of 727 CRC biopsies from multiple French hospitals. TCGA datasets were retrieved from https://portal.gdc.cancer.gov/.