The TileExtractor class generates a grid of tile region specifications from a Whole Slide Image (WSI). It uses a binary tissue mask to filter out background tiles and supports resolution scaling via Level, Magnification, or MPP.
The class follows a scikit-learn–style API:
- Configure: instantiate with a TileSpec, optional tissue mask, and threshold.
- Fit: call fit with an open WSIReader to bind the extractor to a specific slide.
- Extract: call extract (or the shortcut fit_extract) to compute tile positions. Results are stored in region_specs_ and also returned.
- Export: call save, to_csv, or to_json to persist the results. These methods use the fitted state, so you do not need to pass the reader or regions again.
TileExtractor
Tile size, stride, resolution, and measurement unit.
Binary tissue mask (non-zero = tissue).
Provider that generates a per-slide mask during fit. When set, takes precedence over the static mask array.
Minimum mean mask value for a tile to be kept.
Whether to include the per-tile mask array in each RegionMaskSpec.
Fitted attributes (available after fit / extract):
The WSI reader bound by fit.
Stem of the slide filename (e.g. "slide_001").
Tile regions produced by extract. Empty until extraction is run.
fit
Binds the extractor to a specific slide so that extract, save, to_csv, and to_json do not need the reader passed again.
If a mask_provider is set, its generate method is called here so that each slide receives its own tissue mask automatically.
Following scikit-learn convention, fitted attributes are suffixed with an underscore (source_, slide_name_, region_specs_).
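The fit behaviour described above can be sketched with a toy class. This is only an illustration of the pattern, not the library's implementation: StubReader, StubMaskProvider, SketchExtractor, and the mask_ attribute are hypothetical names introduced for the example.

```python
from pathlib import Path


class StubReader:
    """Hypothetical stand-in for an open WSI reader; only carries a path."""

    def __init__(self, path):
        self.path = Path(path)


class StubMaskProvider:
    """Hypothetical provider; generate() returns a tiny all-tissue mask."""

    def generate(self, reader):
        return [[1, 1], [1, 1]]


class SketchExtractor:
    """Toy extractor showing the fit-then-use pattern, not the real class."""

    def __init__(self, mask=None, mask_provider=None):
        self.mask = mask
        self.mask_provider = mask_provider
        self.region_specs_ = []  # empty until extraction is run

    def fit(self, source):
        # Fitted state carries a trailing underscore.
        self.source_ = source
        self.slide_name_ = source.path.stem
        # A provider, when set, takes precedence over the static mask.
        if self.mask_provider is not None:
            self.mask_ = self.mask_provider.generate(source)
        else:
            self.mask_ = self.mask
        return self  # enables extractor.fit(reader).extract()

    def extract(self):
        if not hasattr(self, "source_"):
            raise RuntimeError("call fit() before extract()")
        self.region_specs_ = [(0, 0)]  # placeholder result
        return self.region_specs_


extractor = SketchExtractor(mask=[[0]], mask_provider=StubMaskProvider())
regions = extractor.fit(StubReader("data/slide_001.svs")).extract()
print(extractor.slide_name_, regions)
```

Returning self from fit is what makes the chained call on the second-to-last line possible.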
An open WSI reader for the target slide.
Returns self, to allow method chaining (e.g. extractor.fit(reader).extract()).
extract
If a source is passed, the extractor is fitted to it first. If omitted, the previously fitted source is used; call fit first in that case.
The method operates in four stages:
- Reference-level setup: identifies the lowest-resolution pyramid level and computes the relative downsample between it and the target resolution.
- Mask scaling: resizes the tissue mask (or creates an all-tissue mask) to match the reference level's bounded dimensions.
- Tile-spec scaling: converts tile size and stride from the target resolution (pixels or microns) into reference-level pixel coordinates.
- Grid walk: iterates over the reference-level grid, filters by mask_threshold, and maps surviving tile coordinates back to the target resolution.
The results are stored in region_specs_ and returned.
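The tile-spec scaling and grid-walk stages can be illustrated with a small pure-Python sketch. It rests on several assumptions that the text does not confirm: the helper names um_to_ref_pixels and walk_grid are hypothetical, tiles are kept when the mean mask value is greater than or equal to the threshold, partial tiles at the edges are skipped, and a single scale factor maps reference-level pixels back to the target resolution.

```python
def um_to_ref_pixels(size_um, ref_mpp):
    """Convert a length in microns to reference-level pixels (at least 1)."""
    return max(1, round(size_um / ref_mpp))


def walk_grid(mask, tile, stride, threshold, scale):
    """Return (x, y, tissue_ratio) for tiles whose mean mask value passes.

    mask      : 2-D list of 0/1 values at the reference level
    tile      : tile edge length in reference-level pixels
    stride    : step between tile origins in reference-level pixels
    threshold : minimum mean mask value for a tile to be kept
    scale     : factor mapping reference-level pixels to target pixels
    """
    height, width = len(mask), len(mask[0])
    kept = []
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            values = [mask[y + dy][x + dx]
                      for dy in range(tile) for dx in range(tile)]
            ratio = sum(values) / len(values)
            if ratio >= threshold:
                # Map the surviving origin back to target coordinates.
                kept.append((x * scale, y * scale, ratio))
    return kept


# 4x4 reference-level mask: the left half is tissue.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
tile = um_to_ref_pixels(size_um=64.0, ref_mpp=32.0)  # 2 reference pixels
tiles = walk_grid(mask, tile=tile, stride=2, threshold=0.5, scale=16)
print(tiles)
```

Filtering on the small reference-level mask keeps the walk cheap, since only the surviving origins need to be expressed in target-resolution coordinates.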
An open WSI reader. When given, the extractor is fitted to it automatically. When None (default), the extractor must already be fitted.
Region specifications for every tile that passes the tissue-mask threshold.
RuntimeError: If no source is provided and fit has not been called.
ValueError: If tile_spec.unit is not MeasurementUnit.PIXELS or MeasurementUnit.UM.
fit_extract
Shortcut that runs fit followed by extract in one call.
An open WSI reader.
Extracted tile region specifications.
to_csv
Writes the region specifications to <output_dir>/<slide_name>.csv, reusing the slide_name_ captured during fit.
Uses the internally stored region_specs_ by default. Pass regions explicitly to override.
Each row contains the slide name, location, shape, resolution, and tissue ratio. Mask arrays are not included — use save to persist tile images instead.
Destination directory. Created (including parents) if it does not exist.
Region specifications to serialise. Defaults to region_specs_.
The resolved path of the written CSV file (e.g. output/slide_001.csv).
RuntimeError: If the extractor has not been fitted or no regions are available.
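As a rough illustration of the CSV export described in this section, the sketch below writes one row per region with the standard csv module. The function name write_regions_csv and the column names are assumptions made for the example; the actual column layout is not specified here.

```python
import csv
import tempfile
from pathlib import Path


def write_regions_csv(rows, output_dir, slide_name):
    """Write one CSV row per region to <output_dir>/<slide_name>.csv."""
    if not rows:
        raise RuntimeError("no regions available to write")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create parents as needed
    path = out_dir / f"{slide_name}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path.resolve()


# Column names below are illustrative only.
rows = [
    {"slide_name": "slide_001", "x": 0, "y": 0, "width": 256, "height": 256,
     "resolution": "l_0", "tissue_ratio": 0.83},
]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_regions_csv(rows, Path(tmp) / "output", "slide_001")
    csv_name = csv_path.name
    csv_lines = csv_path.read_text().splitlines()
print(csv_name, csv_lines)
```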
to_json
Writes the region specifications to <output_dir>/<slide_name>.json, reusing the slide_name_ captured during fit.
Uses the internally stored region_specs_ by default. Pass regions explicitly to override.
The output is a JSON array of objects, one per RegionSpec, containing the slide name, location, shape, resolution, and tissue ratio. Mask arrays are not included.
Destination directory. Created (including parents) if it does not exist.
Region specifications to serialise. Defaults to region_specs_.
Number of spaces for pretty-printing. Defaults to 2. Set to 0 or None for compact output.
The resolved path of the written JSON file (e.g. output/slide_001.json).
RuntimeError: If the extractor has not been fitted or no regions are available.
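The indent behaviour can be sketched with the standard json module. Note that json.dumps with indent=0 still inserts newlines, so this example treats 0 and None alike to get compact output; that normalisation is an assumption of the sketch, and dump_regions and the record keys are hypothetical.

```python
import json


def dump_regions(records, indent=2):
    """Serialise records as a JSON array; 0 or None gives compact output."""
    if not indent:  # treat 0 and None alike (an assumption of this sketch)
        return json.dumps(records, separators=(",", ":"))
    return json.dumps(records, indent=indent)


# Keys below are illustrative only.
records = [{"slide_name": "slide_001", "x": 0, "y": 0, "tissue_ratio": 0.83}]
pretty = dump_regions(records)
compact = dump_regions(records, indent=0)
print(pretty)
print(compact)
```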
save
Saves tile images to disk, using the fitted source_ and internally stored region_specs_ by default. Pass regions explicitly to override the region list.
Each tile is read from the WSI via read_region and written to output_dir with a metadata-rich filename:
{res} encodes the resolution (e.g. l_0, mpp_0.25, mag_40.0x) and {ratio} is the tissue fraction rounded to two decimal places.
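The two filename fragments can be reproduced with ordinary string formatting. The sketch below covers only {res} and {ratio}, not the full filename; the function names and the kind strings ("level", "mpp", "magnification") are hypothetical.

```python
def resolution_tag(kind, value):
    """Build the {res} fragment for a tile filename."""
    if kind == "level":
        return f"l_{int(value)}"
    if kind == "mpp":
        return f"mpp_{value}"
    if kind == "magnification":
        return f"mag_{float(value)}x"
    raise ValueError(f"unsupported resolution kind: {kind!r}")


def ratio_tag(tissue_ratio):
    """Build the {ratio} fragment: tissue fraction with two decimals."""
    return f"{tissue_ratio:.2f}"


print(resolution_tag("level", 0))           # l_0
print(resolution_tag("mpp", 0.25))          # mpp_0.25
print(resolution_tag("magnification", 40))  # mag_40.0x
print(ratio_tag(0.8349))                    # 0.83
```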
The tile coordinates produced by extract are absolute slide coordinates, so this method reads regions with bounded=False to avoid double-offsetting.
Directory where tile images will be written. Created (including parents) if it does not exist.
Region specifications to save. Defaults to region_specs_.
Maximum number of threads for parallel I/O. Defaults to 4.
Image file extension / format accepted by PIL.Image.save (e.g. "png", "jpeg", "tiff"). Defaults to "png".
RuntimeError: If the extractor has not been fitted, no regions are available, or one or more tiles fail to save.
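One plausible shape for the threaded write loop is sketched below with concurrent.futures. It is not the library's implementation: save_tiles, the read_tile callback, and the region dictionaries are hypothetical, and writing raw bytes stands in for reading a region from the slide and saving it as an image.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def save_tiles(regions, read_tile, output_dir, max_workers=4, ext="png"):
    """Write one file per region in parallel; raise if any write fails."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def write_one(region):
        path = out_dir / f"{region['name']}.{ext}"
        path.write_bytes(read_tile(region))  # stand-in for read + image save
        return path

    saved, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(write_one, r): r for r in regions}
        for future in as_completed(futures):
            try:
                saved.append(future.result())
            except Exception as exc:  # collect, then report once at the end
                failures.append((futures[future]["name"], exc))
    if failures:
        raise RuntimeError(f"{len(failures)} tile(s) failed to save")
    return sorted(saved)


regions = [{"name": "tile_a"}, {"name": "tile_b"}]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_tiles(regions, lambda region: b"fake-bytes", Path(tmp) / "tiles")
    saved_names = [p.name for p in paths]
print(saved_names)
```

Collecting failures and raising once after the pool drains lets the remaining tiles finish writing before the error is reported.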

