WSI Data & Dataset Module
Provides WSIData (single-slide wrapper) and WSIDataset (multi-slide collection), which bridge the bioptimus I/O and extraction layers into structures ready for feature extraction or training.
Typical usage:
from bioptimus.data.wsi import WSIData, WSIDataset
from bioptimus.extraction.wsi.tile_extraction import TileExtractor
from bioptimus.extraction.wsi.types import TileSpec
from bioptimus.io.wsi.types import Level, MeasurementUnit

spec = TileSpec(size=(256, 256), stride=(256, 256), resolution=Level(0), unit=MeasurementUnit.PIXELS)
extractor = TileExtractor(tile_spec=spec, mask_threshold=0.5)

dataset = WSIDataset.from_paths(
    paths=["slide_1.svs", "slide_2.svs"],
    extractor=extractor,
)

# Iterate over all patches (flat indexing across slides)
for i in range(len(dataset)):
    patch, meta = dataset[i]  # np.ndarray (H, W, C), dict

WSIData

class WSIData()
Wraps a single Whole Slide Image together with its extraction plan. On construction the slide is opened via WSI, and the extractor is fitted and executed to produce a list of RegionSpec objects. Each spec describes one tile/patch location. Patches are read lazily via get_patch.
path
Path to the WSI file (any format supported by WSI).
extractor
A configured TileExtractor. A fresh fit_extract is called for every slide so the same extractor object can be reused across slides.
transform
Optional callable applied to the raw np.ndarray patch before it is returned. Receives (H, W, C) uint8 and should return a transformed array (or tensor).
Attributes:
path
Path
Resolved slide path.
reader
WSIReader
Open reader for the slide.
specs
List[RegionSpec]
Extraction plan (one entry per patch).
slide_name
str
Stem of the slide filename.
Example:
wsi = WSIData("tissue.svs", extractor)
patch, meta = wsi.get_patch(0)
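
The transform contract described above (an (H, W, C) uint8 array in, any array or tensor out) can be illustrated with a small callable. This is only a sketch: the name to_chw_float and the choice of a channel-first float32 layout are assumptions made for illustration, not part of the bioptimus API, and the input is a synthetic array rather than a real tile.

```python
import numpy as np


def to_chw_float(patch: np.ndarray) -> np.ndarray:
    """Convert an (H, W, C) uint8 patch to (C, H, W) float32 in [0, 1]."""
    if patch.dtype != np.uint8 or patch.ndim != 3:
        raise ValueError("expected an (H, W, C) uint8 array")
    scaled = patch.astype(np.float32) / 255.0
    # Move channels first and make the result contiguous in memory.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))


# Synthetic stand-in for a tile read from a slide (hypothetical data).
dummy = np.full((256, 256, 3), 255, dtype=np.uint8)
out = to_chw_float(dummy)
print(out.shape, out.dtype)  # (3, 256, 256) float32
```

A callable like this would presumably be passed as transform=to_chw_float, in which case get_patch returns its output in place of the raw array.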

get_patch

def get_patch(idx: int) -> Tuple[Any, Dict[str, Any]]
Reads a single patch from the slide.
idx
int
required
Index into specs.
returns
Tuple[Any, Dict[str, Any]]
A tuple (patch, metadata) where patch is an np.ndarray of shape (H, W, C) (or whatever the transform returns), and metadata is a dict with at minimum source, x, y, width, height, slide_name, and tissue_ratio.
Raises:
  • IndexError — If idx is out of range.
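
As a rough illustration of how the metadata keys listed above might be consumed, the sketch below filters patches by tissue_ratio and turns the coordinates into bounding boxes. Several points are assumptions rather than documented behaviour: that x and y denote the top-left corner, that width and height use the same unit as x and y, and that tissue_ratio lies in [0, 1]. The helper tissue_rich_boxes and the sample dicts are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Box = Tuple[str, int, int, int, int]


def tissue_rich_boxes(
    metas: Iterable[Dict[str, Any]], min_ratio: float = 0.8
) -> List[Box]:
    """Return (slide_name, x0, y0, x1, y1) for patches at or above min_ratio."""
    boxes: List[Box] = []
    for meta in metas:
        if meta["tissue_ratio"] >= min_ratio:
            # Assumption: (x, y) is the top-left corner of the patch.
            x, y = meta["x"], meta["y"]
            boxes.append(
                (meta["slide_name"], x, y, x + meta["width"], y + meta["height"])
            )
    return boxes


# Hand-written metadata standing in for the dicts returned by get_patch.
metas = [
    {"source": "tissue.svs", "slide_name": "tissue", "x": 0, "y": 0,
     "width": 256, "height": 256, "tissue_ratio": 0.95},
    {"source": "tissue.svs", "slide_name": "tissue", "x": 256, "y": 0,
     "width": 256, "height": 256, "tissue_ratio": 0.55},
]
print(tissue_rich_boxes(metas))  # [('tissue', 0, 0, 256, 256)]
```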

close

def close() -> None
Closes the underlying WSI reader and releases resources.
Example:
>>> wsi.close()

WSIDataset

class WSIDataset()
A collection of WSIData wrappers with flat patch indexing. Supports two access patterns:
  • Slide-level: dataset.slides[i] returns a WSIData.
  • Patch-level (flat): dataset[k] maps a global patch index across all slides and returns (patch, metadata).
The flat indexing uses a cumulative-sum lookup (O(log N) via bisect) so random access is fast regardless of how many slides are loaded.
data_list
Pre-built WSIData instances.
Example:
dataset = WSIDataset.from_paths(paths, extractor)
len(dataset)            # total patch count across all slides
patch, meta = dataset[42]
slide = dataset.slides[0]
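
The cumulative-sum lookup mentioned in the class description can be sketched with the standard library alone. This is not the library's actual implementation; build_offsets and locate are hypothetical helpers that show one way a global patch index could be resolved to a (slide, local patch) pair with bisect.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def build_offsets(patch_counts: Sequence[int]) -> List[int]:
    """Cumulative patch counts, e.g. [150, 350] -> [150, 500]."""
    return list(accumulate(patch_counts))


def locate(offsets: Sequence[int], k: int) -> Tuple[int, int]:
    """Map a global patch index k to (slide_index, local_patch_index)."""
    total = offsets[-1] if offsets else 0
    if not 0 <= k < total:
        raise IndexError(f"patch index {k} out of range for {total} patches")
    # First slide whose cumulative count exceeds k. bisect_right also
    # skips slides that contribute zero patches.
    slide_idx = bisect_right(offsets, k)
    start = offsets[slide_idx - 1] if slide_idx > 0 else 0
    return slide_idx, k - start


offsets = build_offsets([150, 350])
print(locate(offsets, 42))   # (0, 42)
print(locate(offsets, 150))  # (1, 0)
print(locate(offsets, 499))  # (1, 349)
```

The offsets are computed once, so each lookup costs a single binary search over the number of slides rather than a linear scan.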

from_paths

@classmethod
def from_paths(
        cls,
        paths: Sequence[Union[str, Path]],
        extractor: TileExtractor,
        transform: Optional[Callable[[np.ndarray], Any]] = None) -> WSIDataset
Creates a dataset by opening and extracting every slide.
paths
Sequence[Union[str, Path]]
required
Sequence of WSI file paths.
extractor
TileExtractor
required
Shared TileExtractor (re-fitted per slide).
transform
Optional[Callable[[np.ndarray], Any]]
Optional callable applied to each extracted tile array.
returns
WSIDataset
A new WSIDataset.
Example:
import glob
dataset = WSIDataset.from_paths(glob.glob("slides/*.svs"), extractor)

num_slides

@property
def num_slides() -> int
Returns the number of loaded slides.
returns
int
Slide count.
Example:
>>> dataset.num_slides
3

slide_patch_counts

def slide_patch_counts() -> Dict[str, int]
Returns a mapping of slide names to their patch counts.
returns
Dict[str, int]
{slide_name: patch_count} for every slide.
Example:
>>> dataset.slide_patch_counts()
{'slide_001': 150, 'slide_002': 350}

close

def close() -> None
Closes all underlying WSI readers.
Example:
>>> dataset.close()
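
Because readers hold open file handles, it may be worth making sure close() runs even when processing fails partway through. The sketch below uses contextlib.closing from the standard library, which works with any object that exposes a close() method. Whether WSIData or WSIDataset implement the context-manager protocol themselves is not stated here, so the example relies only on the documented close() method; FakeDataset is a hypothetical stand-in used so the snippet runs without slides.

```python
from contextlib import closing


class FakeDataset:
    """Stand-in exposing the same close() method as WSIDataset."""

    def __init__(self) -> None:
        self.closed = False

    def __len__(self) -> int:
        return 3

    def close(self) -> None:
        self.closed = True


fake = FakeDataset()
with closing(fake) as dataset:
    n = len(dataset)  # work with the dataset here

print(n, fake.closed)  # 3 True
```

With a real dataset the same pattern would read with closing(WSIDataset.from_paths(paths, extractor)) as dataset: and close() would be called on exit from the block.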