PyTorch adapter for WSI tile datasets. Provides PytorchTileDataset, a Dataset adapter that wraps WSIDataset for use with DataLoader. Two sampling modes are supported:
  • "sequential" — patches in slide-load order (deterministic).
  • "random" — same patches, but in a shuffled order of the global indices; call shuffle between epochs to draw a new order.
Both modes visit every patch exactly once per epoch. The data argument accepts three forms:
  • A ready-made WSIDataset.
  • A directory path (str or Path) — all supported WSI files inside it are discovered and loaded.
  • A list of file paths — each path is opened as a slide.
Example:
>>> from torch.utils.data import DataLoader
>>> from bioptimus.data.wsi.torch_dataset import PytorchTileDataset
>>> ds = PytorchTileDataset(wsi_dataset, sampling="sequential")
>>> ds = PytorchTileDataset("slides/", extractor=ext, sampling="random", seed=0)
>>> loader = DataLoader(ds, batch_size=64, num_workers=4)

SamplingMode

class SamplingMode(str, Enum)
Patch sampling strategy.
SEQUENTIAL
Iterate over every patch in slide-load order.
RANDOM
Visit the patches in a shuffled order of the global patch indices. Reshuffle between epochs with PytorchTileDataset.shuffle.
Example:
>>> mode = SamplingMode("random")
>>> mode == SamplingMode.RANDOM
True
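
Because SamplingMode subclasses str, a plain string such as "random" can be passed wherever a mode is expected. The sketch below is a stand-in, not the library's source: the member values "sequential" and "random" are assumed from the strings used in this page, and parse_mode is a hypothetical helper added only for illustration.

```python
from enum import Enum


class SamplingMode(str, Enum):
    """Stand-in mirroring the documented enum; member values are assumed."""

    SEQUENTIAL = "sequential"
    RANDOM = "random"


def parse_mode(value):
    """Hypothetical helper: coerce a string or member to SamplingMode."""
    try:
        return SamplingMode(value)
    except ValueError:
        allowed = ", ".join(m.value for m in SamplingMode)
        raise ValueError(
            f"unknown sampling mode {value!r}; expected one of: {allowed}"
        ) from None


# Members of a str-based Enum compare equal to plain strings.
assert SamplingMode.RANDOM == "random"
# Lookup by value returns the singleton member, for strings and members alike.
assert parse_mode("sequential") is SamplingMode.SEQUENTIAL
assert parse_mode(SamplingMode.RANDOM) is SamplingMode.RANDOM
```

Wrapping the lookup this way keeps the error message explicit about the accepted values when a caller mistypes the mode.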

PytorchTileDataset

class PytorchTileDataset(Dataset)
PyTorch-compatible dataset over WSI tiles with configurable ordering. Accepts a WSIDataset, a directory path, or a list of file paths. When a path or list is given, extractor is required so the slides can be opened and tiled automatically.
  • "sequential" (default) — patches are returned in the order slides were loaded: slide 0 patch 0, slide 0 patch 1, …, slide N patch M.
  • "random" — the same set of patches, but the global indices are shuffled. Call shuffle between epochs (or at init) to re-randomise.
In both modes len() equals the total patch count, and every patch is visited exactly once per full iteration.
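
The slide-load ordering described above can be pictured as one global index running across all slides. The following is a minimal sketch of one common way to resolve such an index, using cumulative patch counts and a binary search. It is not taken from the library: build_offsets, locate, and the per-slide counts are hypothetical names and values chosen for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(patches_per_slide):
    """Cumulative patch counts, e.g. [3, 2, 4] -> [3, 5, 9]."""
    return list(accumulate(patches_per_slide))


def locate(global_index, offsets):
    """Map a global patch index to (slide_index, patch_index_within_slide)."""
    if not 0 <= global_index < offsets[-1]:
        raise IndexError(global_index)
    # First slide whose cumulative count exceeds the global index.
    slide = bisect_right(offsets, global_index)
    start = offsets[slide - 1] if slide > 0 else 0
    return slide, global_index - start


offsets = build_offsets([3, 2, 4])  # three hypothetical slides
# Sequential order: slide 0 patch 0, slide 0 patch 1, ..., slide 2 patch 3.
order = [locate(i, offsets) for i in range(offsets[-1])]
```

Under this scheme a "random" mode only needs a permutation of range(total) applied before the lookup, which is why both modes would cover the same patches exactly once.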
data
One of:
  • A WSIDataset instance (used directly).
  • A str or Path pointing to a directory — all supported WSI files are discovered and loaded.
  • A list of file paths — each is opened as a slide.
extractor
Required when data is a path or list of paths. A configured TileExtractor used to tile each slide.
sampling
"sequential" or "random" (default "sequential").
transform
Optional callable applied to each patch, which is passed in as an np.ndarray of shape (H, W, C).
seed
Optional RNG seed for reproducible shuffling.
Raises:
  • ValueError — If data is a path/list but extractor is not provided.
  • FileNotFoundError — If a directory path contains no supported WSIs.
Example:
>>> ds = PytorchTileDataset(wsi_dataset, sampling="sequential")
>>> ds = PytorchTileDataset("slides/", extractor=ext, sampling="random", seed=0)
>>> ds = PytorchTileDataset(["a.svs", "b.svs"], extractor=ext)
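
As an illustration of the transform parameter, here is a sketch of a callable that converts an (H, W, C) patch to the channel-first layout most PyTorch models expect. The uint8 dtype and the [0, 255] value range are assumptions, since this page does not state the patch dtype, and to_chw_float is a hypothetical name.

```python
import numpy as np


def to_chw_float(patch):
    """Convert an (H, W, C) patch to a (C, H, W) float32 array in [0, 1].

    Assumes 8-bit pixel values in [0, 255]; adjust the divisor otherwise.
    """
    if patch.ndim != 3:
        raise ValueError(f"expected (H, W, C), got shape {patch.shape}")
    chw = np.transpose(patch, (2, 0, 1))  # move channels to the front
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0


# Quick check on a synthetic all-white patch.
patch = np.full((4, 6, 3), 255, dtype=np.uint8)
out = to_chw_float(patch)
```

Such a callable would be passed as transform=to_chw_float when constructing the dataset.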

shuffle

def shuffle(seed: Optional[int] = None) -> None
Re-shuffles the global index mapping. Call this between epochs to get a different ordering. Has no effect when sampling is "sequential".
seed
int | None
Optional RNG seed for reproducibility.
Example:
>>> ds = PytorchTileDataset(wsi_dataset, sampling="random", seed=42)
>>> ds.shuffle(seed=123)  # new ordering for next epoch
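
To make the shuffle semantics concrete, the sketch below uses a toy stand-in for the index mapping. ShuffledIndex is hypothetical and is not the library's implementation, and seeding each epoch with the epoch number is an assumed scheme, not something this page prescribes. It shows the properties described above: every index appears once per pass, a given seed reproduces the same order, and shuffle does nothing in sequential mode.

```python
import random


class ShuffledIndex:
    """Toy stand-in for the global index mapping (illustrative only)."""

    def __init__(self, n, sampling="sequential", seed=None):
        self.sampling = sampling
        self.order = list(range(n))
        if sampling == "random":
            self.shuffle(seed)

    def shuffle(self, seed=None):
        """Re-draw the permutation; a no-op in sequential mode."""
        if self.sampling != "random":
            return
        # Start from the identity so a given seed always yields the same order.
        order = list(range(len(self.order)))
        random.Random(seed).shuffle(order)
        self.order = order

    def __len__(self):
        return len(self.order)

    def __getitem__(self, i):
        return self.order[i]


idx = ShuffledIndex(10, sampling="random", seed=42)
epochs = []
for epoch in range(3):
    # One full pass visits every index exactly once.
    epochs.append([idx[i] for i in range(len(idx))])
    idx.shuffle(seed=epoch)  # assumed per-epoch seeding scheme
```

Deriving the seed from the epoch number keeps runs reproducible while still changing the order between epochs; passing no seed would give a fresh, non-reproducible order instead.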