This notebook demonstrates how to use the lazyslide Python package to process whole-slide images (WSIs) in batches on local hardware. We will perform a multi-step workflow consisting of:

1. Preprocessing each slide (tissue detection and tiling)
2. Extracting tile-level features with a foundation model
3. Exporting the embeddings to `.npy` files
For this tutorial we assume you have a HuggingFace account with access to the relevant model. We also assume you have installed the lazyslide package; for more information, please visit the project page: https://github.com/rendeirolab/LazySlide
First, please log in to HuggingFace and submit your HF token when prompted.
# Login to HF
from huggingface_hub import login, hf_hub_download
login()
Next, let’s import all the necessary libraries and define the paths to our data and where we’ll save the output embeddings.
import lazyslide as zs
import numpy as np
import os
import glob
from tqdm.notebook import tqdm
# Please ensure the following two directories exist in the folder where you run this notebook
DATA_DIR = "data/" # Please copy your slide images here
EMBEDDINGS_DIR = "embeddings/" # This is where we will store the final result
# Ensure the output directory exists
os.makedirs(EMBEDDINGS_DIR, exist_ok=True)
# Find all slide files
slide_paths = glob.glob(os.path.join(DATA_DIR, "*.svs"))
# We assume the slide images are in *.svs format (please change the pattern if you use a different format)
print(f"Found {len(slide_paths)} slides to process:")
for path in slide_paths:
    print(os.path.basename(path))
Found 2 slides to process:
GTEX-1117F-1026.svs
GTEX-111FC-0426.svs
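If your slides are not all in `.svs` format, the glob above can be extended to match several extensions at once. A minimal sketch (the extension list here is illustrative; adjust it to your data):

```python
import glob
import os

DATA_DIR = "data/"

# Illustrative list of common WSI extensions; adjust to your data
patterns = ["*.svs", "*.tiff", "*.ndpi"]
slide_paths = sorted(
    path
    for pattern in patterns
    for path in glob.glob(os.path.join(DATA_DIR, pattern))
)
print(f"Found {len(slide_paths)} slides")
```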
In this step, we will iterate through each slide, identify the tissue regions, and then generate tiles from those regions. These operations are stored within the .zarr directory created for each slide.
for slide_path in tqdm(slide_paths, desc="Preprocessing Slides"):
    print(f"\nProcessing {os.path.basename(slide_path)}...")

    # Open the whole-slide image
    wsi = zs.open_wsi(slide_path)

    # 1. Find tissues
    print("Finding tissues...")
    zs.pp.find_tissues(wsi)

    # 2. Tile tissues
    print("Tiling tissues...")
    zs.pp.tile_tissues(wsi, 224, mpp=0.5)

    # Save the slide data
    wsi.write()
    print(f"Finished preprocessing for {os.path.basename(slide_path)}.")
Processing GTEX-1117F-1026.svs...
Finding tissues...
Tiling tissues...
Finished preprocessing for GTEX-1117F-1026.svs.
Processing GTEX-111FC-0426.svs...
Finding tissues...
Tiling tissues...
Finished preprocessing for GTEX-111FC-0426.svs.
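As a quick sanity check on the tiling parameters used above: `tile_tissues(wsi, 224, mpp=0.5)` requests 224 × 224 pixel tiles at 0.5 microns per pixel, so each tile covers a 112 × 112 micron field of view:

```python
# Field of view implied by the tiling parameters used above
tile_px = 224   # tile side length in pixels
mpp = 0.5       # microns per pixel
fov_um = tile_px * mpp
print(f"Each tile covers {fov_um} x {fov_um} microns")  # 112.0 x 112.0
```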
Now that the slides are pre-processed, we can extract features from the tiles. We will use the h-optimus-1 model. This step can be time-consuming, and performance will depend heavily on your available hardware (a GPU is recommended).
Note: Ensure you have a CUDA-compatible GPU and the necessary drivers installed to use device="cuda".
Warning: This step can take several minutes per slide on a local machine; if you are testing on a single-GPU machine, we recommend using smaller slides.
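A minimal sketch for picking the device automatically, assuming PyTorch is installed (lazyslide uses it for feature extraction); it falls back to CPU when no CUDA-compatible GPU is available:

```python
# Fall back to CPU when no CUDA-compatible GPU (or no PyTorch) is available
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(f"Using device: {device}")
# You can then pass this variable to the extraction call, e.g. device=device
```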
for slide_path in tqdm(slide_paths, desc="Extracting Features"):
    print(f"\nExtracting features for {os.path.basename(slide_path)}...")

    # Re-open the WSI to access the pre-processed data
    wsi = zs.open_wsi(slide_path)

    # Extract features
    # If you do not have a CUDA-compatible GPU, use device="cpu"
    zs.tl.feature_extraction(wsi, "h-optimus-1", device="cuda")

    # Save the embeddings
    wsi.write()
    print(f"Finished feature extraction for {os.path.basename(slide_path)}.")
Extracting features for GTEX-1117F-1026.svs...
Finished feature extraction for GTEX-1117F-1026.svs.
Extracting features for GTEX-111FC-0426.svs...
Finished feature extraction for GTEX-111FC-0426.svs.
With the features extracted and stored in the .zarr directories, we can now load them and save them as individual .npy files. This format is convenient for loading into machine learning frameworks such as PyTorch or TensorFlow, or for general analysis with NumPy and scikit-learn.
for slide_path in tqdm(slide_paths, desc="Exporting Embeddings"):
    slide_basename = os.path.basename(slide_path)
    slide_name, _ = os.path.splitext(slide_basename)
    print(f"\nExporting embeddings for {slide_basename}...")

    # Re-open the WSI to access all data
    wsi = zs.open_wsi(slide_path)

    # Fetch features as an AnnData object
    adata = wsi.fetch.features_anndata("h-optimus-1")

    # The embeddings are stored in adata.X
    embeddings = adata.X

    # Define the output path and save the embeddings
    output_path = os.path.join(EMBEDDINGS_DIR, f"{slide_name}.npy")
    np.save(output_path, embeddings)
    print(f"Saved embeddings to {output_path} ({embeddings.shape})")
Exporting embeddings for GTEX-1117F-1026.svs...
Saved embeddings to embeddings/GTEX-1117F-1026.npy ((3254, 1536))
Exporting embeddings for GTEX-111FC-0426.svs...
Saved embeddings to embeddings/GTEX-111FC-0426.npy ((655, 1536))
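To sanity-check the export format, you can round-trip an array through `.npy` exactly as the loop above does per slide. The array below is random stand-in data (32 tiles × 1536 features, the h-optimus embedding dimensionality), not real tile features:

```python
import os
import tempfile

import numpy as np

# Stand-in embedding matrix: 32 tiles x 1536 features
embeddings = np.random.rand(32, 1536).astype(np.float32)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.npy")
np.save(path, embeddings)

# Reload and confirm that shape and dtype survive the round trip
loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (32, 1536) float32
```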
Finally, let’s check the embeddings/ directory to confirm that our .npy files have been created successfully.
print(f"Contents of the '{EMBEDDINGS_DIR}' directory:")
for file in os.listdir(EMBEDDINGS_DIR):
    if file.endswith(".npy"):
        print(file)
Contents of the 'embeddings/' directory:
GTEX-111FC-0426.npy
GTEX-1117F-1026.npy
Latest version: December 16, 2025
Support: [email protected]