WSIData (single-slide wrapper) and WSIDataset (multi-slide collection) bridge the bioptimus I/O and extraction layers into structures ready for feature extraction or training.
Typical usage:
WSIData
The slide is opened with WSI, and the extractor is fitted and executed to produce a list of RegionSpec. Each spec describes one tile/patch location. Patches are read lazily via get_patch.
path
Path to the WSI file (any format supported by WSI).
extractor
A configured TileExtractor. A fresh fit_extract is called for every slide, so the same extractor object can be reused across slides.
transform
Optional callable applied to the raw np.ndarray patch before it is returned. Receives (H, W, C) uint8 and should return a transformed array (or tensor).
Resolved slide path.
Open reader for the slide.
Extraction plan (one entry per patch).
Stem of the slide filename.
get_patch
Index into specs.
A tuple (patch, metadata), where patch is an np.ndarray of shape (H, W, C) (or whatever the transform returns) and metadata is a dict with at minimum source, x, y, width, height, slide_name, and tissue_ratio.
IndexError: If idx is out of range.
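The get_patch contract described above can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: Spec, SlideSketch, and the read_region callable are hypothetical names, the field layout of RegionSpec is assumed, and treating source as the slide path and rejecting negative indices are assumptions too. Only the metadata keys and the IndexError behaviour come from the description.

```python
from dataclasses import dataclass


@dataclass
class Spec:  # hypothetical stand-in for RegionSpec (field names assumed)
    x: int
    y: int
    width: int
    height: int
    tissue_ratio: float


class SlideSketch:  # hypothetical stand-in for WSIData
    def __init__(self, source, slide_name, specs, read_region, transform=None):
        self.source = source
        self.slide_name = slide_name
        self.specs = specs
        self._read_region = read_region  # callable(x, y, width, height) -> pixels
        self.transform = transform

    def get_patch(self, idx):
        if not 0 <= idx < len(self.specs):
            raise IndexError(f"patch index {idx} out of range")
        spec = self.specs[idx]
        # Pixels are read only here, not when the specs were built.
        patch = self._read_region(spec.x, spec.y, spec.width, spec.height)
        if self.transform is not None:
            patch = self.transform(patch)
        metadata = {
            "source": self.source,
            "x": spec.x,
            "y": spec.y,
            "width": spec.width,
            "height": spec.height,
            "slide_name": self.slide_name,
            "tissue_ratio": spec.tissue_ratio,
        }
        return patch, metadata


reads = []  # records every region read so laziness is observable


def fake_read(x, y, width, height):
    reads.append((x, y))
    return [[0] * width for _ in range(height)]


slide = SlideSketch(
    "slide_a.svs",
    "slide_a",
    [Spec(0, 0, 2, 2, 0.9), Spec(2, 0, 2, 2, 0.4)],
    fake_read,
)
patch, meta = slide.get_patch(1)
print(meta["x"], meta["tissue_ratio"], reads)  # 2 0.4 [(2, 0)]
```

Only the requested region is read, which is what makes the plan cheap to build up front even for slides with many patches.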
close
WSIDataset
A collection of WSIData wrappers with flat patch indexing.
Supports two access patterns:
- Slide-level: dataset.slides[i] returns a WSIData.
- Patch-level (flat): dataset[k] maps a global patch index across all slides and returns (patch, metadata).
Flat-index lookup is O(log N) (via bisect), so random access is fast regardless of how many slides are loaded.
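One common way to get an O(log N) flat-index lookup is to keep cumulative patch counts and binary-search them. The sketch below assumes that approach; build_offsets and locate are hypothetical helper names, and the library's internal bookkeeping may differ.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(patch_counts):
    """Return cumulative end offsets, e.g. [3, 0, 2] -> [3, 3, 5]."""
    return list(accumulate(patch_counts))


def locate(offsets, k):
    """Map a global patch index k to (slide_index, local_index)."""
    total = offsets[-1] if offsets else 0
    if not 0 <= k < total:
        raise IndexError(f"patch index {k} out of range for {total} patches")
    # First slide whose end offset is strictly greater than k.
    slide_idx = bisect_right(offsets, k)
    start = offsets[slide_idx - 1] if slide_idx else 0
    return slide_idx, k - start


offsets = build_offsets([3, 0, 2])  # the middle slide has no patches
print(locate(offsets, 0))  # (0, 0)
print(locate(offsets, 2))  # (0, 2)
print(locate(offsets, 3))  # (2, 0)
print(locate(offsets, 4))  # (2, 1)
```

Using bisect_right rather than bisect_left means a slide with zero patches is skipped automatically, because its end offset equals the previous slide's.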
data_list
Pre-built WSIData instances.
from_paths
Iterable of WSI file paths.
Shared TileExtractor (re-fitted per slide).
Optional callable applied to each extracted tile array.
A new WSIDataset.
num_slides
Slide count.
slide_patch_counts
Dict[str, int]: {slide_name: patch_count} for every slide.
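How a single shared extractor, re-fitted per slide, leads to the slide count and the per-slide patch counts can be sketched as follows. FixedGridExtractor and plan_slides are hypothetical stand-ins, and the fit_extract signature shown here (taking a path) is an assumption. The real TileExtractor inspects the slide rather than looking up a fixed number.

```python
from pathlib import Path


class FixedGridExtractor:  # hypothetical stand-in for TileExtractor
    def __init__(self, tiles_per_slide):
        self.tiles_per_slide = tiles_per_slide
        self.fit_calls = 0

    def fit_extract(self, path):
        # Stub: look up a tile count instead of analysing the slide.
        self.fit_calls += 1
        return list(range(self.tiles_per_slide[Path(path).stem]))


def plan_slides(paths, extractor):
    """Run the shared extractor once per slide; key plans by filename stem."""
    return {Path(p).stem: extractor.fit_extract(p) for p in paths}


extractor = FixedGridExtractor({"case_01": 4, "case_02": 0})
plans = plan_slides(["data/case_01.svs", "data/case_02.tiff"], extractor)

num_slides = len(plans)
slide_patch_counts = {name: len(specs) for name, specs in plans.items()}
print(num_slides, slide_patch_counts)  # 2 {'case_01': 4, 'case_02': 0}
```

Two slides with the same stem in different directories would collide in a dict keyed this way. Whether WSIDataset guards against that is not stated here.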
