WSI Tile Extraction Module. This module provides the TileExtractor class, which generates a grid of tile region specifications from a Whole Slide Image (WSI). It uses a binary tissue mask to filter out background tiles and supports resolution scaling via Level, Magnification, or MPP. The class follows a scikit-learn–style API:
  1. Configure — instantiate with a TileSpec, optional tissue mask, and threshold.
  2. Fit — call fit with an open WSIReader to bind the extractor to a specific slide.
  3. Extract — call extract (or the shortcut fit_extract) to compute tile positions. Results are stored in region_specs_ and also returned.
  4. Export — call save, to_csv, or to_json to persist the results. These methods use the fitted state so you do not need to pass the reader or regions again.
Usage:
>>> from bioptimus.extraction.wsi.tile_extraction import TileExtractor
>>> from bioptimus.extraction.wsi.types import TileSpec
>>> from bioptimus.io.wsi.types import Level, MeasurementUnit
>>> spec = TileSpec(size=(256, 256), stride=(256, 256),
...                 resolution=Level(0), unit=MeasurementUnit.PIXELS)
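>>> # tissue_mask is an np.ndarray (non-zero = tissue); SomeWSIReader stands for any concrete WSIReader.
>>> extractor = TileExtractor(tile_spec=spec, mask=tissue_mask, mask_threshold=0.5)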
>>> with SomeWSIReader("path/to/slide.svs") as reader:
...     extractor.fit(reader).extract()
...     extractor.to_csv("output/")   # writes output/slide.csv
...     extractor.to_json("output/")  # writes output/slide.json
...     extractor.save("tiles/")

TileExtractor

class TileExtractor(WSIExtractor)
Extracts a grid of tile regions from a WSI, filtered by a tissue mask. The extractor follows a scikit-learn–style lifecycle:
extractor = TileExtractor(tile_spec=spec, mask=mask)
extractor.fit(reader)          # bind to a slide
tiles = extractor.extract()    # compute tile positions
extractor.save("out/")         # write tile images
extractor.to_csv("out/")       # export metadata to out/<slide_name>.csv
Or more concisely via the fit_extract shortcut:
tiles = TileExtractor(tile_spec=spec, mask=mask).fit_extract(reader)
Configuration attributes (set at init, do not change after fitting):
tile_spec
TileSpec
Tile size, stride, resolution, and measurement unit.
mask
np.ndarray | None
Binary tissue mask (non-zero = tissue).
mask_provider
TissueMaskProvider | None
Provider that generates a per-slide mask during fit. When set, takes precedence over the static mask array.
mask_threshold
float | None
Minimum mean mask value for a tile to be kept.
save_mask
bool
Whether to include the per-tile mask array in each RegionMaskSpec.
Fitted attributes (populated by fit / extract):
source_
WSIReader
The WSI reader bound by fit.
slide_name_
str
Stem of the slide filename (e.g. "slide_001").
region_specs_
List[RegionSpec]
Tile regions produced by extract. Empty until extraction is run.

fit

def fit(source: WSIReader) -> "TileExtractor"
Binds the extractor to an open WSI reader. Stores the reader and slide name so that subsequent calls to extract, save, to_csv, and to_json do not need the reader passed again. If a mask_provider is set, its generate method is called here so that each slide receives its own tissue mask automatically. Following scikit-learn convention, fitted attributes are suffixed with an underscore (source_, slide_name_, region_specs_).
source
WSIReader
required
An open WSI reader for the target slide.
returns
TileExtractor
self, to allow method chaining (e.g. extractor.fit(reader).extract()).
Example:
>>> extractor.fit(reader)
<TileExtractor ...>
>>> extractor.slide_name_
'slide_001'
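The fit contract described above (store the reader, derive the slide name, return self) can be illustrated with a toy class. This is a minimal sketch, not the library's implementation: ToyReader and ToyExtractor are hypothetical stand-ins, and the real extractor does considerably more in extract.

```python
from pathlib import Path


class ToyReader:
    """Hypothetical stand-in for a WSIReader; it only exposes a file path."""

    def __init__(self, path):
        self.path = Path(path)


class ToyExtractor:
    """Illustrates the fit / extract lifecycle only, not the real tiling logic."""

    def __init__(self):
        self.region_specs_ = []  # empty until extract() runs

    def fit(self, source):
        # Fitted attributes carry a trailing underscore, as in scikit-learn.
        self.source_ = source
        self.slide_name_ = source.path.stem  # "slide_001.svs" -> "slide_001"
        return self  # returning self enables fit(...).extract()

    def extract(self, source=None):
        if source is not None:
            self.fit(source)
        if not hasattr(self, "source_"):
            raise RuntimeError("call fit() first or pass a source")
        self.region_specs_ = [(0, 0)]  # placeholder result
        return self.region_specs_

    def fit_extract(self, source):
        return self.fit(source).extract()


extractor = ToyExtractor().fit(ToyReader("data/slide_001.svs"))
print(extractor.slide_name_)  # slide_001
```

Returning self from fit is what makes the chained form extractor.fit(reader).extract() possible.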

extract

def extract(source: Optional[WSIReader] = None) -> List[RegionSpec]
Extracts tile region specs from a WSI. If source is provided it is used directly (and the extractor is automatically fit to it). If omitted, the previously fitted source is used — call fit first in that case. The method operates in four stages:
  1. Reference-level setup — identifies the lowest-resolution pyramid level and computes the relative downsample between it and the target resolution.
  2. Mask scaling — resizes the tissue mask (or creates an all-tissue mask) to match the reference level’s bounded dimensions.
  3. Tile-spec scaling — converts tile size and stride from the target resolution (pixels or microns) into reference-level pixel coordinates.
  4. Grid walk — iterates over the reference-level grid, filters by mask_threshold, and maps surviving tile coordinates back to the target resolution.
Results are stored in region_specs_ and returned.
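Stages 3 and 4 above can be sketched with NumPy. The function below is illustrative only: grid_walk and its parameters are hypothetical names, the mask is assumed to be already resized to the reference level, a single scalar downsample is assumed, tiles that would overhang the mask edge are skipped, and the comparison is assumed to be "mean >= threshold". The actual extractor may differ on each of these points.

```python
import numpy as np


def grid_walk(mask, tile_size, stride, downsample, threshold):
    """Return (top, left) target-resolution coordinates of tiles kept by the mask.

    mask       : 2-D array at the reference level (non-zero = tissue)
    tile_size  : (height, width) in target-resolution pixels
    stride     : (vertical, horizontal) step in target-resolution pixels
    downsample : target-to-reference scale factor (assumed >= 1)
    threshold  : minimum mean mask value for a tile to be kept
    """
    binary = (mask > 0).astype(float)
    # Tile-spec scaling: express size and stride in reference-level pixels.
    th = max(1, round(tile_size[0] / downsample))
    tw = max(1, round(tile_size[1] / downsample))
    sh = max(1, round(stride[0] / downsample))
    sw = max(1, round(stride[1] / downsample))

    kept = []
    for top in range(0, binary.shape[0] - th + 1, sh):
        for left in range(0, binary.shape[1] - tw + 1, sw):
            ratio = binary[top:top + th, left:left + tw].mean()
            if ratio >= threshold:
                # Map reference-level coordinates back to the target resolution.
                kept.append((top * downsample, left * downsample))
    return kept


mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :] = 1  # tissue only in the top half of the slide
tiles = grid_walk(mask, tile_size=(64, 64), stride=(64, 64),
                  downsample=32, threshold=0.5)
print(tiles)  # [(0, 0), (0, 64)]
```

Walking the grid at the lowest-resolution level keeps the mask lookups cheap; only the surviving coordinates are scaled back up.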
source
WSIReader | None
An open WSI reader. When given, the extractor is fitted to it automatically. When None (default), the extractor must already be fitted.
returns
List[RegionSpec]
Region specifications for every tile that passes the tissue-mask threshold.
Raises:
  • RuntimeError — If no source is provided and fit has not been called.
  • ValueError — If tile_spec.unit is not MeasurementUnit.PIXELS or MeasurementUnit.UM.
Example:
>>> extractor.fit(reader)
>>> tiles = extractor.extract()
>>> tiles[0].location
RegionLocation(top=1024, left=512)
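The unit handling behind the ValueError listed above can be sketched as follows. This assumes that a size given in microns is converted by dividing by the microns-per-pixel (MPP) value of the target resolution; Unit and to_pixels are hypothetical names used for illustration and are not part of the documented API.

```python
from enum import Enum


class Unit(Enum):  # hypothetical stand-in for MeasurementUnit
    PIXELS = "pixels"
    UM = "um"


def to_pixels(size, unit, mpp):
    """Convert a (height, width) tile size to whole pixels at resolution `mpp`."""
    if unit is Unit.PIXELS:
        return tuple(int(v) for v in size)
    if unit is Unit.UM:
        # microns / (microns per pixel) = pixels
        return tuple(round(v / mpp) for v in size)
    raise ValueError(f"unsupported unit: {unit!r}")


print(to_pixels((128.0, 128.0), Unit.UM, mpp=0.5))  # (256, 256)
```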

fit_extract

def fit_extract(source: WSIReader) -> List[RegionSpec]
Convenience method: fit + extract in one call.
source
WSIReader
required
An open WSI reader.
returns
List[RegionSpec]
Extracted tile region specifications.
Example:
>>> tiles = extractor.fit_extract(reader)
>>> len(tiles)
42

to_csv

def to_csv(output_dir: Union[str, Path],
           regions: Optional[List[RegionSpec]] = None) -> Path
Saves region metadata to a CSV file named after the slide. The file is written as <output_dir>/<slide_name>.csv, reusing the slide_name_ captured during fit. Uses the internally stored region_specs_ by default. Pass regions explicitly to override. Each row contains the slide name, location, shape, resolution, and tissue ratio. Mask arrays are not included — use save to persist tile images instead.
output_dir
str | Path
required
Destination directory. Created (including parents) if it does not exist.
regions
List[RegionSpec] | None
Region specifications to serialise. Defaults to region_specs_.
returns
Path
The resolved path of the written CSV file (e.g. output/slide_001.csv).
Raises:
  • RuntimeError — If the extractor has not been fitted or no regions are available.
Example:
>>> extractor.fit_extract(reader)  # slide file is 'abc.svs'
>>> extractor.to_csv("output/")
PosixPath('output/abc.csv')
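For readers who want to see the general shape of such an export, here is a minimal sketch using only the standard library. The helper name write_regions_csv and the column names are assumptions made for illustration; the real file layout produced by to_csv may differ.

```python
import csv
import tempfile
from pathlib import Path


def write_regions_csv(output_dir, slide_name, rows):
    """Write one CSV row per region to <output_dir>/<slide_name>.csv."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create parents if missing
    path = out_dir / f"{slide_name}.csv"
    # Column names are illustrative; the real file layout may differ.
    fields = ["slide", "top", "left", "height", "width",
              "resolution", "tissue_ratio"]
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [{"slide": "abc", "top": 0, "left": 0, "height": 256, "width": 256,
         "resolution": "l_0", "tissue_ratio": 0.87}]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_regions_csv(Path(tmp) / "output", "abc", rows)
    csv_text = csv_path.read_text()
print(csv_path.name)  # abc.csv
```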

to_json

def to_json(output_dir: Union[str, Path],
            regions: Optional[List[RegionSpec]] = None,
            indent: int = 2) -> Path
Saves region metadata to a JSON file named after the slide. The file is written as <output_dir>/<slide_name>.json, reusing the slide_name_ captured during fit. Uses the internally stored region_specs_ by default. Pass regions explicitly to override. The output is a JSON array of objects, one per RegionSpec, containing the slide name, location, shape, resolution, and tissue ratio. Mask arrays are not included.
output_dir
str | Path
required
Destination directory. Created (including parents) if it does not exist.
regions
List[RegionSpec] | None
Region specifications to serialise. Defaults to region_specs_.
indent
int
Number of spaces for pretty-printing. Defaults to 2. Set to 0 or None for compact output.
returns
Path
The resolved path of the written JSON file (e.g. output/slide_001.json).
Raises:
  • RuntimeError — If the extractor has not been fitted or no regions are available.
Example:
>>> extractor.fit_extract(reader)  # slide file is 'abc.svs'
>>> extractor.to_json("output/")
PosixPath('output/abc.json')
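The effect of indent can be checked with the standard json module. This sketch assumes that to_json forwards indent to the json module unchanged, which the documentation above does not state; the record fields are illustrative. Under that assumption, indent=0 still inserts newlines (without leading spaces), and only indent=None yields a single-line result.

```python
import json

# Illustrative record; the real objects contain more fields.
records = [{"slide": "abc", "top": 0, "left": 0, "tissue_ratio": 0.87}]

pretty = json.dumps(records, indent=2)        # newlines plus 2-space indentation
newline_only = json.dumps(records, indent=0)  # newlines, no indentation
compact = json.dumps(records, indent=None)    # everything on one line

print(compact)
print(len(pretty.splitlines()), len(newline_only.splitlines()))
```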

save

def save(output_dir: Union[str, Path],
         regions: Optional[List[RegionSpec]] = None,
         workers: int = 4,
         image_format: str = "png") -> None
Saves extracted tile images to disk using multi-threaded I/O. Uses the fitted source_ and internally stored region_specs_ by default. Pass regions explicitly to override the region list. Each tile is read from the WSI via read_region and written to output_dir with a metadata-rich filename:
{slide_stem}_x{left}_y{top}_w{width}_h{height}_{res}_mask_r{ratio}.{fmt}
Where {res} encodes the resolution (e.g. l_0, mpp_0.25, mag_40.0x) and {ratio} is the tissue fraction rounded to two decimal places.
The tile coordinates produced by extract are absolute slide coordinates, so this method reads regions with bounded=False to avoid double-offsetting.
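The filename pattern above can be reproduced with an f-string. This is a sketch under assumptions: tile_filename is a hypothetical helper, the resolution token is passed in already encoded (for example "l_0"), and the ratio is formatted with exactly two decimals.

```python
def tile_filename(slide_stem, left, top, width, height, res, ratio, fmt="png"):
    """Build a tile filename following the documented pattern."""
    return (f"{slide_stem}_x{left}_y{top}_w{width}_h{height}"
            f"_{res}_mask_r{ratio:.2f}.{fmt}")


name = tile_filename("slide_001", left=512, top=1024, width=256, height=256,
                     res="l_0", ratio=0.8731)
print(name)  # slide_001_x512_y1024_w256_h256_l_0_mask_r0.87.png
```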
output_dir
str | Path
required
Directory where tile images will be written. Created (including parents) if it does not exist.
regions
List[RegionSpec] | None
Region specifications to save. Defaults to region_specs_.
workers
int
Maximum number of threads for parallel I/O. Defaults to 4.
image_format
str
Image file extension / format accepted by PIL.Image.save (e.g. "png", "jpeg", "tiff"). Defaults to "png".
returns
None
Raises:
  • RuntimeError — If the extractor has not been fitted, no regions are available, or one or more tiles fail to save.
Example:
>>> extractor.fit_extract(reader)
>>> extractor.save("output/tiles")
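The multi-threaded write and the "one or more tiles fail" error can be sketched with concurrent.futures. This is an assumption about the general pattern, not the library's code: save_all and save_one are hypothetical names, and save_one stands in for whatever reads a region and writes its image.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def save_all(regions, save_one, workers=4):
    """Run save_one(region) on a thread pool and raise if any call failed."""
    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(save_one, region): region for region in regions}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # collect, so one bad tile does not hide others
                failures.append((futures[future], exc))
    if failures:
        raise RuntimeError(f"{len(failures)} tile(s) failed to save")


written = []
save_all(range(8), written.append, workers=4)  # list.append stands in for a writer
print(sorted(written))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Collecting failures and raising once at the end lets every tile be attempted before the error is reported.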