WSI Mega-Tile Extraction Module
This module provides the MegaTileExtractor class, which groups individual tile positions from a Whole Slide Image (WSI) into mega tiles, that is, fixed-size grids of rows × cols tiles. It builds on the tile-level tissue filtering from TileExtractor and adds a second filtering stage: only mega tiles whose fraction of tissue-positive tiles falls within a configurable [min_valid_tiles_ratio, max_valid_tiles_ratio] band are kept.
The class follows the same scikit-learn-style lifecycle as TileExtractor:
  1. Configure — instantiate with a MegaTileSpec, optional tissue mask, and threshold.
  2. Fit — call fit with an open WSIReader.
  3. Extract — call extract (or the shortcut fit_extract). Results are stored in megatile_regions_ and also returned.
  4. Export — call to_csv or to_json to persist the results.
Usage:
>>> from bioptimus.extraction.wsi.megatile_extraction import MegaTileExtractor
>>> from bioptimus.extraction.wsi.types import MegaTileSpec, TileSpec, GridLayout
>>> from bioptimus.io.wsi.types import Level, MeasurementUnit
>>> tile = TileSpec(
...     size=(256, 256), stride=(256, 256),
...     resolution=Level(0), unit=MeasurementUnit.PIXELS,
... )
>>> mega = MegaTileSpec(
...     megatile_shape=(5, 5), tile_spec=tile,
...     grid_layout=GridLayout.RECTANGULAR,
...     min_valid_tiles_ratio=0.5,
... )
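>>> # tissue_mask is assumed to be a binary np.ndarray prepared beforehand.
>>> extractor = MegaTileExtractor(
...     megatile_spec=mega,
...     mask=tissue_mask, mask_threshold=0.5,
... )
>>> # SomeWSIReader stands in for any concrete WSIReader implementation.
>>> with SomeWSIReader("path/to/slide.svs") as reader: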
...     megatiles = extractor.fit_extract(reader)
...     extractor.to_csv("output/")
...     extractor.to_json("output/")

MegaTileExtractor

class MegaTileExtractor(WSIExtractor)
Extracts mega tiles (grids of tiles) from a WSI, filtered by tissue. A mega tile is a rectangular or hexagonal grid of individual tiles. Extraction proceeds in two stages:
  1. Tile-level filtering — a binary tissue mask and mask_threshold determine which individual tile positions contain sufficient tissue.
  2. Mega-tile-level filtering — only mega tiles where the fraction of valid (tissue-positive) tiles falls within [min_valid_tiles_ratio, max_valid_tiles_ratio] are retained.
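The two stages above can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name filter_megatiles is hypothetical, mega tiles are assumed not to overlap (stride equal to the mega-tile shape), the tile-level comparison is assumed to be `>=`, and both ends of the ratio band are assumed to be inclusive.

```python
import numpy as np

def filter_megatiles(tile_valid, shape, min_ratio=0.0, max_ratio=1.0):
    """Return (row, col, ratio) for each mega tile whose valid-tile ratio is in band.

    tile_valid: 2-D boolean array, one entry per tile position (stage 1 output).
    shape: (rows, cols) of a mega tile, in tile units.
    Assumption: mega tiles do not overlap (stride equals the mega-tile shape).
    """
    rows, cols = shape
    kept = []
    for r in range(0, tile_valid.shape[0] - rows + 1, rows):
        for c in range(0, tile_valid.shape[1] - cols + 1, cols):
            # Fraction of tissue-positive tiles inside this mega tile.
            ratio = float(tile_valid[r:r + rows, c:c + cols].mean())
            if min_ratio <= ratio <= max_ratio:
                kept.append((r, c, ratio))
    return kept

# Stage 1 (tile level): per-tile tissue fraction compared against mask_threshold.
tissue_fraction = np.array([
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.1, 0.0, 0.0],
    [0.9, 0.9, 0.6, 0.2],
    [0.9, 0.9, 0.1, 0.0],
])
tile_valid = tissue_fraction >= 0.5  # assumed comparison; mask_threshold = 0.5

# Stage 2 (mega-tile level): 2 x 2 mega tiles, keep ratios within [0.5, 1.0].
kept = filter_megatiles(tile_valid, shape=(2, 2), min_ratio=0.5, max_ratio=1.0)
print(kept)
```

In this toy grid, the two mega tiles on the left have valid ratios of 0.75 and 1.0 and are kept, while the two on the right (0.0 and 0.25) fall below the band and are dropped.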
The extractor follows the same lifecycle as TileExtractor:
extractor = MegaTileExtractor(megatile_spec=spec, mask=mask)
extractor.fit(reader)
megatiles = extractor.extract()
extractor.to_csv("output/")
Or in a single expression, using the fit_extract shortcut:
megatiles = MegaTileExtractor(megatile_spec=spec).fit_extract(reader)
Configuration attributes (set at init):
megatile_spec
MegaTileSpec
Mega-tile grid shape, per-tile spec, layout, stride, and valid-ratio bounds.
mask
np.ndarray | None
Binary tissue mask.
mask_provider
TissueMaskProvider | None
Per-slide mask generator.
mask_threshold
float | None
Per-tile tissue fraction threshold.
save_mask
bool
Whether to store per-tile masks.
max_megatiles
int | None
Optional cap on returned mega tiles.
Fitted attributes (populated by fit / extract):
source_
WSIReader
Bound WSI reader.
slide_name_
str
Stem of the slide filename.
megatile_regions_
List[MegaTileRegion]
Extracted mega tiles.

fit

def fit(source: WSIReader) -> "MegaTileExtractor"
Binds the extractor to an open WSI reader. Stores the reader and slide name. If a mask_provider is set, its generate method is called to create a per-slide tissue mask.
source
WSIReader
required
An open WSI reader.
returns
MegaTileExtractor
self for method chaining.
Example:
>>> extractor.fit(reader)
>>> extractor.slide_name_
'slide_001'

extract

def extract(source: Optional[WSIReader] = None) -> List[MegaTileRegion]
Extracts mega-tile regions from a WSI. The method operates in five stages:
  1. Reference-level setup — as in TileExtractor.
  2. Mask scaling — resize tissue mask to the reference level.
  3. Tile-spec scaling — convert tile dimensions to reference-level pixels.
  4. Per-tile tissue evaluation — build an integral image and compute the tissue fraction for every candidate tile position.
  5. Mega-tile grouping — slide a rows × cols window (in tile units) across the tile grid, collect per-tile validity, and filter by [min_valid_tiles_ratio, max_valid_tiles_ratio].
Results are stored in megatile_regions_ and returned.
source
WSIReader | None
An open WSI reader. When given, the extractor is fitted automatically. Otherwise fit must have been called.
returns
List[MegaTileRegion]
Mega-tile regions that pass filtering.
Raises:
  • RuntimeError — If no source is provided and fit has not been called.
Example:
>>> megatiles = extractor.fit(reader).extract()
>>> len(megatiles)
12
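Stage 4 of extract relies on an integral image so that the tissue fraction of each candidate tile costs only four array lookups. The following is a minimal sketch of that general technique, not the extractor's actual code: the function name tile_tissue_fractions is hypothetical, and it assumes a binary mask already scaled to the reference level and a regular grid of tile positions.

```python
import numpy as np

def tile_tissue_fractions(mask, tile_size, stride):
    """Tissue fraction for every tile position on a regular grid.

    mask: 2-D binary array (nonzero = tissue) at the reference level.
    tile_size, stride: (height, width) in reference-level pixels.
    """
    th, tw = tile_size
    sh, sw = stride
    # Integral image with a zero row and column prepended, so that
    # ii[y, x] equals the sum of mask[:y, :x].
    ii = np.zeros((mask.shape[0] + 1, mask.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = (mask > 0).cumsum(axis=0).cumsum(axis=1)

    # Top-left corners of all tiles that fit fully inside the mask.
    tops = np.arange(0, mask.shape[0] - th + 1, sh)
    lefts = np.arange(0, mask.shape[1] - tw + 1, sw)
    t, l = np.meshgrid(tops, lefts, indexing="ij")

    # Four lookups per tile give its tissue pixel count in constant time.
    counts = ii[t + th, l + tw] - ii[t, l + tw] - ii[t + th, l] + ii[t, l]
    return counts / float(th * tw)

mask = np.zeros((4, 6), dtype=np.uint8)
mask[:, :3] = 1  # the left three columns are tissue
fractions = tile_tissue_fractions(mask, tile_size=(2, 2), stride=(2, 2))
print(fractions)
```

Thresholding the resulting array (for example with mask_threshold) yields the per-tile validity grid that the mega-tile grouping stage then scans.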

fit_extract

def fit_extract(source: WSIReader) -> List[MegaTileRegion]
Convenience method: fit + extract in one call.
source
WSIReader
required
An open WSI reader.
returns
List[MegaTileRegion]
Extracted mega-tile regions.
Example:
>>> megatiles = extractor.fit_extract(reader)
>>> len(megatiles)
12

to_csv

def to_csv(output_dir: Union[str, Path],
           regions: Optional[List[MegaTileRegion]] = None) -> Path
Saves mega-tile metadata to a CSV file named after the slide. Each row represents one tile within a mega tile. Mega-tile–level columns (megatile_index, megatile_top, etc.) allow grouping.
output_dir
str | Path
required
Destination directory (created if needed).
regions
List[MegaTileRegion] | None
Override the stored regions. Defaults to megatile_regions_.
returns
Path
Path of the written CSV file.
Raises:
  • RuntimeError — If no regions are available.
Example:
>>> extractor.fit_extract(reader)
>>> extractor.to_csv("output/")
PosixPath('output/slide_001.csv')
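Because each CSV row describes one tile, downstream code typically regroups rows by mega tile. The sketch below does this with the standard library only. The megatile_index column comes from the description above; the tile_left and tile_top columns and the helper name group_by_megatile are hypothetical stand-ins, and an in-memory string replaces the written file.

```python
import csv
import io
from collections import defaultdict

def group_by_megatile(csv_file):
    """Group per-tile CSV rows by their megatile_index column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[int(row["megatile_index"])].append(row)
    return dict(groups)

# Stand-in for a file written by to_csv; tile_left and tile_top are
# hypothetical column names used only for illustration.
sample = io.StringIO(
    "megatile_index,tile_left,tile_top\n"
    "0,0,0\n"
    "0,256,0\n"
    "1,1280,0\n"
)
groups = group_by_megatile(sample)
tiles_per_megatile = {index: len(rows) for index, rows in groups.items()}
print(tiles_per_megatile)
```

With a real export, passing an open file handle for the path returned by to_csv in place of the StringIO object should work the same way, provided the column name matches.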

to_json

def to_json(output_dir: Union[str, Path],
            regions: Optional[List[MegaTileRegion]] = None,
            indent: int = 2) -> Path
Saves mega-tile metadata to a JSON file named after the slide. The output is a JSON array of mega-tile objects, each containing the mega-tile location, grid info, valid ratio, and a nested tiles array with per-tile metadata.
output_dir
str | Path
required
Destination directory (created if needed).
regions
List[MegaTileRegion] | None
Override the stored regions.
indent
int
JSON pretty-printing indent. Defaults to 2.
returns
Path
Path of the written JSON file.
Raises:
  • RuntimeError — If no regions are available.
Example:
>>> extractor.fit_extract(reader)
>>> extractor.to_json("output/")
PosixPath('output/slide_001_megatiles.json')