[Figure: H-optimus-1 model architecture]

H-optimus-1 is a 1.1-billion-parameter vision transformer trained with self-supervised learning on an extensive proprietary dataset. This dataset consists of billions of histology images, sampled from over 1 million slides from more than 800,000 patients.

The model can be accessed for academic research purposes here.
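
As a rough sketch of how the released model can be used to extract tile-level features, the snippet below loads the weights with timm and computes a [CLS]-token embedding for a dummy tile. It assumes H-optimus-1 is distributed through the Hugging Face Hub under a repository id such as bioptimus/H-optimus-1 and that the usage and normalization statistics published for H-optimus-0 carry over; check the official model card for the exact identifiers and preprocessing.

```python
# Hedged sketch: the repository id, extra kwargs, and normalization statistics are
# assumptions borrowed from the documented H-optimus-0 usage.
import numpy as np
import timm
import torch
from torchvision import transforms

model = timm.create_model(
    "hf-hub:bioptimus/H-optimus-1",  # assumed repository id
    pretrained=True,
    init_values=1e-5,
    dynamic_img_size=False,
)
model.eval()

# Tile preprocessing with the H-optimus-0 normalization statistics (assumed to carry over).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.707223, 0.578729, 0.703617),
        std=(0.211883, 0.230117, 0.177517),
    ),
])

# Stand-in for a 224x224 H&E tile; in practice tiles are extracted from a whole-slide image.
tile = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

with torch.inference_mode():
    features = model(transform(tile).unsqueeze(0))  # (1, 1536) [CLS]-token embedding

print(features.shape)
```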

H-optimus-1 pre-training dataset

A crucial component in developing a strong FM is the quality and diversity of the dataset used for training the model.

H-optimus-1 was trained on an extensive collection of over 1 million H&E-stained histology slides of more than 50 organs digitized with 3 scanner types across more than 4,000 clinical centers.

Importantly, the dataset used to train H-optimus-1 is, to the best of our knowledge, the most patient-diverse dataset ever used to train a pathology FM, including histology slides of more than 800,000 patients[2] with various diseases. This patient diversity enables the model to learn from various histology patterns and diseases during training, ultimately resulting in rich and generalizable features that are useful for solving complex tasks.

Model evaluation

Results

H-optimus-1 was benchmarked on 13 downstream tasks encompassing 15 datasets at both the slide level and tile level, including the HEST benchmark [Jaume et al. 2025].

HEST

This task consists of predicting gene expression from histology images in nine different organs. More details about this benchmark can be found here.

The metric used is Pearson’s correlation coefficient (higher is better). The models are ordered by decreasing average performance. Standard deviations are reported in parentheses. Bold indicates the highest score in a column.

Table 1:

Model Average IDC PRAD PAAD SKCM COAD READ SCCRCC LUAD LYMPH-IDC
H-optimus-1 0.422 (0.019) 0.602 (0.081) 0.378 (0.012) 0.496 (0.051) 0.659 (0.048) 0.32 (0.016) 0.242 (0.015) 0.245 (0.125) 0.578 (0.012) 0.277 (0.039)
H-optimus-0 0.413 (0.021) 0.598 (0.085) 0.385 (0.0) 0.491 (0.04) 0.645 (0.062) 0.309 (0.0) 0.222 (0.048) 0.255 (0.135) 0.559 (0.032) 0.259 (0.04)
UNI2-h 0.413 (0.02) 0.59 (0.081) 0.357 (0.049) 0.50 (0.04) 0.659 (0.017) 0.301 (0.004) 0.223 (0.038) 0.261 (0.132) 0.558 (0.014) 0.272 (0.04)
Virchow2 0.396 (0.02) 0.592 (0.08) 0.348 (0.031) 0.472 (0.065) 0.619 (0.028) 0.259 (0.016) 0.209 (0.05) 0.257 (0.123) 0.553 (0.017) 0.255 (0.026)
Prov-GigaPath 0.386 (0.02) 0.551 (0.073) 0.37 (0.022) 0.475 (0.048) 0.562 (0.061) 0.299 (0.021) 0.196 (0.062) 0.232 (0.115) 0.541 (0.036) 0.25 (0.05)
UNI 0.385 (0.02) 0.574 (0.08) 0.294 (0.09) 0.481 (0.07) 0.635 (0.04) 0.262 (0.03) 0.184 (0.05) 0.238 (0.12) 0.546 (0.02) 0.256 (0.04)
GPFM 0.378 (0.024) 0.566 (0.08) 0.342 (0.078) 0.46 (0.062) 0.589 (0.048) 0.248 (0.024) 0.164 (0.071) 0.253 (0.138) 0.547 (0.014) 0.237 (0.041)
Phikon-v2 0.373 (0.021) 0.541 (0.077) 0.354 (0.015) 0.445 (0.066) 0.555 (0.036) 0.25 (0.018) 0.175 (0.059) 0.257 (0.14) 0.542 (0.011) 0.244 (0.046)
CONCH 0.37 (0.019) 0.537 (0.084) 0.357 (0.004) 0.438 (0.065) 0.572 (0.041) 0.27 (0.006) 0.161 (0.055) 0.206 (0.108) 0.538 (0.004) 0.254 (0.039)
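
As a concrete illustration of the metric reported in Table 1, the sketch below computes a Pearson correlation per gene between measured and predicted expression and averages it over genes. The exact aggregation and evaluation protocol follow the HEST benchmark [Jaume et al. 2025]; the per-gene averaging and random placeholder data here are assumptions for illustration only.

```python
# Illustrative metric computation: per-gene Pearson correlation, averaged over genes.
# Random arrays stand in for real measured and predicted expression values.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
y_true = rng.normal(size=(500, 50))                      # (n_spots, n_genes) measured expression
y_pred = y_true + rng.normal(scale=0.5, size=(500, 50))  # (n_spots, n_genes) predicted expression

per_gene_r = [pearsonr(y_true[:, g], y_pred[:, g])[0] for g in range(y_true.shape[1])]
print(f"mean Pearson r over genes: {np.mean(per_gene_r):.3f}")
```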

Slide-level tasks

We have benchmarked H-optimus-1 and other leading pathology FMs on a diverse set of slide-level downstream tasks using multiple instance learning; the tasks and datasets are listed in Table 2 below.

The metric used is the area under the ROC curve (higher is better). More details about the evaluation methodology can be found in the ‘Slide-level tasks evaluation methodology’ section below. The models are ordered by decreasing average performance, and standard deviations are reported in parentheses. Bold indicates the highest score in a row.

Table 2:

Task Dataset H-optimus-1 UNI2-h Virchow2 H-optimus-0 Prov-GigaPath GPFM UNI Phikon-v2 CONCH
Average 0.856 0.851 0.843 0.835 0.834 0.824 0.823 0.813 0.786
META-BC CAMELYON16 Test 0.996 (0.001) 0.996 (0.003) 0.976 (0.002) 0.998 (0.001) 0.985 (0.001) 0.992 (0.002) 0.981 (0.006) 0.985 (0.001) 0.974 (0.002)
META-BC SLN-Breast 0.959 (0.003) 0.984 (0.002) 0.985 (0.002) 0.953 (0.002) 0.977 (0.001) 0.944 (0.014) 0.963 (0.005) 0.938 (0.008) 0.959 (0.001)
MSI-GC TCGA-STAD Test 0.915 (0.003) 0.903 (0.004) 0.923 (0.014) 0.899 (0.006) 0.863 (0.004) 0.892 (0.007) 0.907 (0.008) 0.912 (0.003) 0.891 (0.012)
MSI-CRC PAIP2020 0.984 (0.003) 0.971 (0.001) 0.988 (0.002) 0.974 (0.002) 0.970 (0.005) 0.974 (0.001) 0.966 (0.003) 0.972 (0.002) 0.894 (0.015)
MSI-CRC FR-CRC-Bio 0.917 (0.002) 0.894 (0.003) 0.887 (0.003) 0.888 (0.009) 0.876 (0.008) 0.837 (0.003) 0.829 (0.005) 0.865 (0.002) 0.838 (0.007)
MSI-CRC CPTAC-COAD 0.957 (0.003) 0.953 (0.003) 0.959 (0.004) 0.923 (0.010) 0.947 (0.015) 0.913 (0.001) 0.928 (0.004) 0.929 (0.003) 0.882 (0.006)
MSI-CRC SURGEN 0.914 (0.003) 0.899 (0.002) 0.896 (0.003) 0.903 (0.013) 0.913 (0.002) 0.865 (0.002) 0.857 (0.006) 0.862 (0.006) 0.796 (0.004)
KRAS-CRC CPTAC-COAD 0.625 (0.005) 0.649 (0.008) 0.706 (0.006) 0.592 (0.010) 0.581 (0.011) 0.688 (0.004) 0.687 (0.010) 0.659 (0.015) 0.647 (0.014)
KRAS-CRC SURGEN 0.683 (0.006) 0.675 (0.009) 0.654 (0.005) 0.662 (0.002) 0.692 (0.007) 0.638 (0.003) 0.664 (0.004) 0.612 (0.007) 0.631 (0.011)
BRAF-CRC CPTAC-COAD 0.722 (0.006) 0.758 (0.007) 0.800 (0.011) 0.693 (0.021) 0.743 (0.007) 0.813 (0.006) 0.766 (0.015) 0.780 (0.005) 0.740 (0.014)
BRAF-CRC SURGEN 0.823 (0.001) 0.827 (0.01) 0.760 (0.023) 0.786 (0.007) 0.809 (0.005) 0.780 (0.004) 0.799 (0.013) 0.707 (0.003) 0.730 (0.017)
HER2-BC YALE-HER2 0.899 (0.011) 0.87 (0.004) 0.826 (0.016) 0.899 (0.009) 0.849 (0.011) 0.863 (0.012) 0.825 (0.013) 0.742 (0.030) 0.801 (0.020)
HER2-BC IMPRESS 0.903 (0.018) 0.853 (0.008) 0.810 (0.041) 0.888 (0.02) 0.860 (0.009) 0.625 (0.034) 0.745 (0.033) 0.682 (0.009) 0.608 (0.012)
HER2-BC BCNB 0.683 (0.007) 0.674 (0.008) 0.692 (0.004) 0.656 (0.005) 0.680 (0.01) 0.692 (0.005) 0.677 (0.008) 0.673 (0.003) 0.650 (0.015)
ER-BC IMPRESS 0.836 (0.007) 0.835 (0.005) 0.834 (0.008) 0.834 (0.012) 0.821 (0.006) 0.839 (0.004) 0.824 (0.008) 0.860 (0.005) 0.759 (0.006)
ER-BC BCNB 0.903 (0.005) 0.902 (0.002) 0.847 (0.008) 0.854 (0.007) 0.848 (0.005) 0.853 (0.005) 0.861 (0.003) 0.835 (0.002) 0.814 (0.002)
PR-BC IMPRESS 0.831 (0.014) 0.830 (0.005) 0.834 (0.009) 0.867 (0.004) 0.814 (0.014) 0.813 (0.003) 0.761 (0.034) 0.821 (0.006) 0.767 (0.007)
PR-BC BCNB 0.854 (0.008) 0.853 (0.002) 0.803 (0.01) 0.769 (0.022) 0.793 (0.005) 0.814 (0.008) 0.777 (0.029) 0.805 (0.004) 0.764 (0.006)
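
For readers less familiar with multiple instance learning, the sketch below shows a generic gated-attention MIL head in the style of Ilse et al. (2018) that aggregates frozen tile embeddings into a single slide-level prediction. It assumes 1,536-dimensional tile embeddings (the ViT-g/14 output size) and is an illustration only, not the exact architecture or training setup used in this benchmark; see the ‘Slide-level tasks evaluation methodology’ section for the actual protocol.

```python
# Illustrative gated-attention MIL head (after Ilse et al., 2018), not the exact
# architecture used in the benchmark. One "bag" = all tile embeddings of one slide.
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    def __init__(self, in_dim: int = 1536, hidden_dim: int = 128):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(in_dim, 1)

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (n_tiles, in_dim) frozen embeddings for one slide
        scores = self.attn_w(self.attn_v(tiles) * self.attn_u(tiles))  # (n_tiles, 1)
        weights = torch.softmax(scores, dim=0)                         # attention over tiles
        slide_embedding = (weights * tiles).sum(dim=0)                 # (in_dim,)
        return self.classifier(slide_embedding)                        # slide-level logit

# Toy usage: one slide represented by 500 tile embeddings of dimension 1536.
head = GatedAttentionMIL()
logit = head(torch.randn(500, 1536))
print(logit.shape)  # torch.Size([1])
```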

Tile-level tasks

We have also benchmarked the different pathology FMs on tile-level tasks using linear probing; the tasks are listed in Table 3 below.

The metric used is the top-1 accuracy (higher is better). More details about the evaluation methodology can be found in the ‘Tile-level tasks evaluation methodology’ section below. The models are ordered by decreasing average performance, and standard deviations are reported in parentheses. Bold indicates the highest score in a column.

Table 3:

Model Average MHIST TCGA-UNIFORM CAM17-WILDS CRC-NO-NORM
H-optimus-1 0.908 0.835 (0.001) 0.851 (0.000) 0.991 (0.000) 0.956 (0.002)
UNI2-h 0.904 0.826 (0.001) 0.831 (0.000) 0.988 (0.000) 0.969 (0.001)
H-optimus-0 0.904 0.848 (0.001) 0.835 (0.001) 0.986 (0.001) 0.945 (0.012)
Virchow2 0.9 0.851 (0.001) 0.830 (0.000) 0.986 (0.001) 0.933 (0.011)
GPFM 0.895 0.824 (0.002) 0.827 (0.001) 0.972 (0.004) 0.955 (0.004)
Prov-GigaPath 0.887 0.831 (0.003) 0.804 (0.000) 0.968 (0.003) 0.945 (0.003)
UNI 0.883 0.840 (0.002) 0.805 (0.001) 0.980 (0.001) 0.906 (0.015)
Phikon-v2 0.877 0.797 (0.001) 0.794 (0.000) 0.972 (0.001) 0.946 (0.002)
CONCH 0.844 0.783 (0.003) 0.679 (0.000) 0.972 (0.000) 0.940 (0.000)
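
To make the linear-probing setup concrete, the sketch below fits a logistic-regression classifier on frozen tile embeddings and reports top-1 accuracy, the metric of Table 3. The feature dimension, solver settings, and random placeholder data are assumptions for illustration; the actual protocol is described in the ‘Tile-level tasks evaluation methodology’ section below.

```python
# Minimal linear-probing sketch: a logistic-regression classifier on frozen tile
# embeddings, scored with top-1 accuracy. Random arrays stand in for features
# extracted with a pathology FM; hyperparameters are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 1536)), rng.integers(0, 9, size=1000)
X_test, y_test = rng.normal(size=(200, 1536)), rng.integers(0, 9, size=200)

probe = LogisticRegression(max_iter=1000)  # the foundation model itself stays frozen
probe.fit(X_train, y_train)

print(f"top-1 accuracy: {accuracy_score(y_test, probe.predict(X_test)):.3f}")
```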

Additional information

Models benchmarked

We list in the table below the characteristics of the models benchmarked. For each model, the [CLS] token embedding was used for the downstream evaluations.

Table 4:

Model Authors Model architecture (number of parameters) Number of histology slides used for pre-training
H-optimus-1 Bioptimus ViT-g/14 (1.1B) 1M+
UNI2-h Mahmood Lab Modified ViT-H/14 (681M) 350k+
UNI Mahmood Lab [Chen et al. 2024] ViT-L/16 (307M) 100k
H-optimus-0 Bioptimus [Saillard et al. 2024] ViT-g/14 (1.1B) 500k+
Virchow2 Paige / Microsoft Research [Zimmerman et al. 2024] ViT-H/14 (632M) 3.1M
GPFM Hong Kong University of Science and Technology [Ma et al. 2024] ViT-L/14 (307M) 86k
Prov-GigaPath Microsoft Research [Xu et al. 2024] ViT-g/16 (1.1B) 171k
Phikon-v2 Owkin [Filiot et al. 2024] ViT-L/16 (307M) 58k
CONCH Mahmood Lab [Lu et al. 2024] Modified ViT-B/16 (90M) 21k slides & 1.2M image-text pairs

Slide-level evaluation tasks

We list in the table below the different tasks defined for the slide evaluation benchmark, and the datasets used to define these tasks.

FR-CRC-Bio is an internal dataset consisting of 727 CRC biopsies from multiple French hospitals. TCGA datasets were retrieved from https://portal.gdc.cancer.gov/.