Getting started with H-optimus

This tutorial shows how to get started with H-optimus models using the lazyslide package. We assume you have a HuggingFace account with access to the relevant model, and that you have installed the lazyslide package (for more information, see the project page: https://github.com/rendeirolab/LazySlide).

First, log in to HuggingFace and submit your HF token when prompted.

# Login to HF
from huggingface_hub import login
login()
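
On a server or in a scheduled job there is no prompt to type into. The following is a minimal sketch of a non-interactive login, assuming the token has been exported in the HF_TOKEN environment variable. The helper names get_hf_token and login_from_env are hypothetical and are not part of huggingface_hub.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in env_var, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in to HuggingFace without a prompt (hypothetical helper)."""
    token = get_hf_token(env_var)
    if token is None:
        raise RuntimeError(f"Set {env_var} before calling login_from_env()")
    # Imported lazily so get_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
```

Calling login_from_env() in place of login() should then skip the prompt, provided the variable is set in the environment that runs the script.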

Loading data

For this tutorial we will load a small lung carcinoma slide. It serves as our test case for pre-processing slide data and then extracting features with the H-optimus model. After loading the data into a wsi object, we can inspect the slide and its metadata.

import lazyslide as zs
wsi = zs.datasets.lung_carcinoma(with_data=False)
wsi


WSI: /Users/xxxxx/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/9644d886889040fa10e757d912f249bbf936a979/lung_carcinoma.ndpi

Reader: openslide

Dimensions: 15616×16384 (h×w), 8 Pyramids

Pixel physical size: 0.23 MPP (40X)

SpatialData object
└── Images
      └── 'wsi_thumbnail': DataArray[cyx] (3, 1817, 1906)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images)

# View slide properties
wsi.properties

Slide Properties

Field	Value
shape	[15616, 16384]
n_level	8
level_shape	[[15616, 16384], [7808, 8192], [3904, 4096], [1952, 2048], [976, 1024], [488, 512], [244, 256], [122, 128]]
level_downsample	[1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
mpp	0.22731405710129116
magnification	40.0
bounds	[0, 0, 16384, 15616]
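
The pyramid fields in this table are related by simple arithmetic: each level's shape is the level-0 shape divided by that level's downsample factor, and each pixel covers proportionally more tissue. The sketch below reproduces level_shape from the values above. The function names are illustrative only and are not part of lazyslide.

```python
# Values copied from the Slide Properties table.
shape = (15616, 16384)  # (height, width) at level 0
level_downsample = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
mpp = 0.22731405710129116  # microns per pixel at level 0


def level_shapes(shape, downsamples):
    """Divide the level-0 shape by each downsample factor."""
    return [[int(shape[0] // d), int(shape[1] // d)] for d in downsamples]


def level_mpp(mpp, downsamples):
    """Microns per pixel at each level: pixels grow with the downsample."""
    return [mpp * d for d in downsamples]


shapes = level_shapes(shape, level_downsample)
print(shapes[-1])  # [122, 128]
```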

Tissue Segmentation

Before we break the tissue into tiles, we first need to detect the areas of the slide that contain tissue. This avoids the computational cost of processing tiles that are only whitespace. With lazyslide this takes a single call to the find_tissues function.

# Let's try tissue segmentation
zs.pp.find_tissues(wsi)
zs.pl.tissue(wsi)

Tiling

For tiling we can specify the tile size and any overlap between tiles. H-optimus does not need overlapping tiles, and the tile size must be 224 pixels.

# Try tiling across all tissues
zs.pp.tile_tissues(wsi, 224, mpp=0.25)
wsi.tile_spec("tiles")

Tile at: 0.25 mpp

Tile size: 224×224 (h×w)

Stride: 224×224 (0×0 overlap)

Operation size: 246×246, level=0

Base size: 246×246, level=0

Target tissue: 'tissues'
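
The operation size of 246 pixels follows from the slide resolution: the slide was scanned at about 0.227 mpp, so a tile that should measure 224 pixels at 0.25 mpp has to be read as a slightly larger region at level 0 and then resized. A small worked sketch, assuming the size is rounded to the nearest pixel (the library may round differently); operation_size is an illustrative name, not a lazyslide function.

```python
def operation_size(tile_px, target_mpp, slide_mpp):
    """Level-0 pixels needed to cover tile_px pixels at target_mpp."""
    return round(tile_px * target_mpp / slide_mpp)


tile_px = 224
target_mpp = 0.25
slide_mpp = 0.22731405710129116  # from wsi.properties

ops = operation_size(tile_px, target_mpp, slide_mpp)
print(tile_px * target_mpp)  # 56.0 (microns along each tile side)
print(ops)  # 246
```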

We can plot the tiles over the slide image to check how they are placed.

zs.pl.tiles(wsi, tissue_id="all", linewidth=0.5)

wsi["tiles"]

	tile_id	tissue_id	geometry
0	0	0	POLYGON ((2870 10646, 2870 10892, 2624 10892, …
1	1	0	POLYGON ((2870 10892, 2870 11138, 2624 11138, …
2	2	0	POLYGON ((3116 10154, 3116 10400, 2870 10400, …
3	3	0	POLYGON ((3116 10400, 3116 10646, 2870 10646, …
4	4	0	POLYGON ((3116 10646, 3116 10892, 2870 10892, …
…	…	…	…
1051	1051	0	POLYGON ((14432 4250, 14432 4496, 14186 4496, …
1052	1052	0	POLYGON ((14432 4496, 14432 4742, 14186 4742, …
1053	1053	0	POLYGON ((14432 4742, 14432 4988, 14186 4988, …
1054	1054	0	POLYGON ((14432 4988, 14432 5234, 14186 5234, …
1055	1055	0	POLYGON ((14432 5234, 14432 5480, 14186 5480, …

1056 rows × 3 columns
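
As a quick sanity check, the tile count gives a rough estimate of how much tissue was tiled. The sketch below uses only numbers shown above. It is an approximation, since tiles on the tissue border may include some background.

```python
n_tiles = 1056
tile_px = 224
target_mpp = 0.25

tile_side_um = tile_px * target_mpp          # side length in microns
tile_area_um2 = tile_side_um ** 2            # area of one tile in square microns
covered_mm2 = n_tiles * tile_area_um2 / 1e6  # 1 mm^2 = 1e6 square microns

# The first tile polygon spans y = 10646 to 10892 at level 0,
# which matches the 246-pixel operation size in the tile spec.
stride_px = 10892 - 10646

print(round(covered_mm2, 2))  # 3.31
print(stride_px)  # 246
```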

Applying Foundation Model

After tiling we have 1056 tiles covering the tissue on this slide. We can now pass each of these tiles to the H-optimus model to extract features. Each tile yields a feature vector of 1536 values. Depending on your hardware, this step may take a while.

# NOTE: Here we use "h-optimus-0" but you can replace it with "h-optimus-1" if you have been granted access

zs.tl.feature_extraction(wsi, "h-optimus-0", device="cuda")

# If your machine has no CUDA-enabled GPU, you can run on the CPU instead

# zs.tl.feature_extraction(wsi, "h-optimus-0", device="cpu")

(The above operation can take a few minutes depending on your hardware)
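
Rather than editing the device string by hand, the choice can be made at runtime. The following is a minimal sketch, assuming PyTorch is the backend in use; pick_device is a hypothetical helper, not part of lazyslide.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result can then be passed as the device argument of zs.tl.feature_extraction.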

We can now inspect the wsi object again. Under Tables there is a new slot named h-optimus-0_tiles.

wsi

WSI: /Users/xxxxx/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/9644d886889040fa10e757d912f249bbf936a979/lung_carcinoma.ndpi

Reader: openslide

Dimensions: 15616×16384 (h×w), 8 Pyramids

Pixel physical size: 0.23 MPP (40X)

SpatialData object
├── Images
│     └── 'wsi_thumbnail': DataArray[cyx] (3, 1817, 1906)
├── Shapes
│     ├── 'tiles': GeoDataFrame shape: (1056, 3) (2D shapes)
│     └── 'tissues': GeoDataFrame shape: (1, 2) (2D shapes)
└── Tables
      └── 'h-optimus-0_tiles': AnnData (1056, 1536)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images), tiles (Shapes), tissues (Shapes)
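
The feature table is small compared with the slide itself. A back-of-the-envelope estimate of its in-memory size, assuming the features are stored as 32-bit floats (an assumption; the actual dtype may differ):

```python
n_tiles, n_features = 1056, 1536  # shape of the AnnData table
bytes_per_value = 4               # assumption: float32

total_bytes = n_tiles * n_features * bytes_per_value
print(total_bytes)          # 6488064
print(total_bytes / 2**20)  # 6.1875 (MiB)
```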

Feature aggregation

Now that we have features for each individual tile, we can aggregate these feature vectors. The aggregated features can then be used for downstream tasks, such as predictions based on the features extracted from the foundation model.

# Feature aggregation across the whole slide and by tissue
zs.tl.feature_aggregation(wsi, "h-optimus-0")
zs.tl.feature_aggregation(wsi, "h-optimus-0", by="tissue_id")

wsi.fetch.features_anndata("h-optimus-0")

AnnData object with n_obs × n_vars = 1056 × 1536
    obs: 'tile_id', 'tissue_id'
    uns: 'tile_spec', 'slide_properties'
    obsm: 'spatial'
    varm: 'agg_slide', 'agg_tissue_id'
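
To make the idea of aggregation concrete, here is a plain-Python sketch of mean pooling, once over all tiles and once per group. Mean pooling is an assumption chosen for illustration; check the lazyslide documentation for the method feature_aggregation actually applies. The function names are hypothetical.

```python
from collections import defaultdict


def mean_pool(vectors):
    """Element-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_pool_by(vectors, groups):
    """Mean-pool vectors separately for each group label."""
    grouped = defaultdict(list)
    for vector, group in zip(vectors, groups):
        grouped[group].append(vector)
    return {group: mean_pool(members) for group, members in grouped.items()}


# Three toy "tiles" with two features each, from two tissue pieces.
tiles = [[1.0, 2.0], [3.0, 4.0], [5.0, 12.0]]
tissue_ids = [0, 0, 1]

print(mean_pool(tiles))                 # [3.0, 6.0]
print(mean_pool_by(tiles, tissue_ids))  # {0: [2.0, 3.0], 1: [5.0, 12.0]}
```

In the tutorial slide there is a single tissue piece, so the slide-level and tissue-level aggregates would cover the same 1056 tiles.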

Save your work

Now we can save all the data in the same wsi object for later use. Once a slide has been processed, you don’t need to run it through the model again: the feature vectors and aggregations are ready for downstream analysis.

# Save all the data
wsi.write(file_path="./lung_cancer_test") # Store in current directory
wsi

WSI: /Users/xxxxx/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/9644d886889040fa10e757d912f249bbf936a979/lung_carcinoma.ndpi

Reader: openslide

Dimensions: 15616×16384 (h×w), 8 Pyramids

Pixel physical size: 0.23 MPP (40X)

SpatialData object, with associated Zarr store: /Users/kpatel/Projects/Tutorials/lung_cancer_test
├── Images
│     └── 'wsi_thumbnail': DataArray[cyx] (3, 1817, 1906)
├── Shapes
│     ├── 'tiles': GeoDataFrame shape: (1056, 3) (2D shapes)
│     └── 'tissues': GeoDataFrame shape: (1, 2) (2D shapes)
└── Tables
      └── 'h-optimus-0_tiles': AnnData (1056, 1536)
with coordinate systems:
    ▸ 'global', with elements:
        wsi_thumbnail (Images), tiles (Shapes), tissues (Shapes)

To load the data, open the slide and all your other data will be loaded automatically.

from wsidata import open_wsi

# Note: Here slide_path must be the path to your original slide image (e.g. *.svs)
wsi = open_wsi(slide_path)
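
The save step above wrote the store to ./lung_cancer_test rather than next to the slide file, so it may be necessary to tell open_wsi where to find it. The following is a sketch, assuming open_wsi accepts a store argument pointing at the Zarr store (check the wsidata documentation for your version). The helper names resolve_store and reopen are hypothetical.

```python
from pathlib import Path


def resolve_store(store_path):
    """Return the absolute store path, failing early if it does not exist."""
    path = Path(store_path).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"No Zarr store found at {path}")
    return path


def reopen(slide_path, store_path):
    """Open a slide together with a previously saved store (hypothetical helper)."""
    # Imported lazily so resolve_store can be used without wsidata installed.
    from wsidata import open_wsi

    # Assumption: `store` tells open_wsi which Zarr store to attach.
    return open_wsi(slide_path, store=str(resolve_store(store_path)))
```

With these helpers, reopen(slide_path, "./lung_cancer_test") would raise a clear error if the store directory is missing instead of silently starting from an empty object.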

Congratulations! You’ve finished the Getting started with H-optimus tutorial. Look out for other tutorials on handling multiple slides, downstream prediction, and other topics.

Resources:

Python notebook for this tutorial:

Getting_started_with_H-optimus.ipynb

Lazyslide documentation: https://lazyslide.readthedocs.io/en/latest/