A Cohort is the single source of truth for a multi-slide experiment. You build it from a directory of whole-slide images (WSIs) or from a manifest CSV, then run a model over it; the outputs are organized into a reproducible workspace. Cohorts can also hold bulk RNA for multimodal prediction (see Spatial transcriptomics).

Runnable notebook

m-jumpstart includes a cohort batch-processing example.

1. Build a cohort

from bioptimus.data.cohort import Cohort

cohort = Cohort.from_directories(wsi_dir="/data/wsi/tcga_mini_coad")
print(cohort.summary())
print(cohort.wsi_ids)
print(cohort[0].available_modalities)   # e.g. ['image']
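
If you would rather start from a manifest CSV than a directory, a file like that can be generated with the standard library alone. The sketch below is only an illustration: this page does not show the manifest schema or the call that loads it, so the column names (wsi_id, wsi_path), the extension list, and the helper name write_manifest are all assumptions to adapt to the documented format.

```python
import csv
from pathlib import Path

# Assumed slide extensions; adjust to the formats in your own data.
SLIDE_EXTENSIONS = {".svs", ".tif", ".tiff", ".ndpi", ".mrxs"}


def write_manifest(wsi_dir, manifest_path):
    """Write one CSV row per slide file found directly under wsi_dir.

    The columns (wsi_id, wsi_path) are hypothetical placeholders,
    not the library's documented schema.
    """
    # Collect the slides first so the manifest itself is never scanned.
    slides = sorted(
        p for p in Path(wsi_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SLIDE_EXTENSIONS
    )
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wsi_id", "wsi_path"])
        for slide in slides:
            # Use the file name without its extension as the slide id.
            writer.writerow([slide.stem, str(slide)])
    return len(slides)
```

Calling write_manifest("/data/wsi/tcga_mini_coad", "manifest.csv") would list every slide in the directory used above and return how many rows were written.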

2. Run a model over the cohort

Create one Inference for the cohort, then run it. Tissue masks are cached and runs resume automatically, so re-running only processes what’s missing. The backend settings live in the common dict (the only part that differs between backends); the model is chosen when you construct the Inference.
common = dict(
    api_url="http://localhost:8080",
    tissue=True, mask_threshold=0.5,
    output_path="/data/output", experiment="tcga_coad", run=1,
    workers=5,
)
from bioptimus.inference import Inference
from bioptimus.models.types import Models

infer = Inference(model_name=Models.H1, cohort=cohort, variant="mini", **common)
infer.tissue()              # shared masks; only computes what's missing
infer.run(mode="embed")     # H-Optimus produces embeddings
infer.report()              # status summary
Each output is tagged with the modalities used (e.g. ["image"]).
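
The resume behaviour described in this step follows a common pattern: check which outputs already exist and process only the rest. The sketch below shows that general idea with marker files. It is not the library's implementation; the function names, the .done marker suffix, and the directory layout are hypothetical.

```python
from pathlib import Path


def pending_ids(wsi_ids, output_dir, suffix=".done"):
    """Return the ids that have no marker file yet, preserving input order."""
    output_dir = Path(output_dir)
    return [
        wsi_id for wsi_id in wsi_ids
        if not (output_dir / f"{wsi_id}{suffix}").exists()
    ]


def run_resumable(wsi_ids, output_dir, process):
    """Process only the missing ids and write a marker after each success."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for wsi_id in pending_ids(wsi_ids, output_dir):
        process(wsi_id)  # if this raises, no marker is written for wsi_id
        (output_dir / f"{wsi_id}.done").touch()
        processed.append(wsi_id)
    return processed
```

Because the marker is written only after process returns, a run that crashes midway leaves the failed slide unmarked, and the next call picks it up again.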

3. Extract & save tiles (optional)

Independent of inference — useful for QC or external pipelines:
from bioptimus.extraction.wsi.tile_extraction import TileExtractor
from bioptimus.io.wsi.factory import WSI

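# mask, wsi_path, tile_dir, and csv_dir are not defined in this snippet; supply your own values.
# Note: _model is a private attribute (leading underscore), so this access may change between versions.
tile_spec = infer._model.model_spec.tile_spec
extractor = TileExtractor(tile_spec=tile_spec, mask=mask, mask_threshold=0.5)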
with WSI(wsi_path) as reader:
    extractor.fit_extract(reader)
    extractor.save(tile_dir, image_format="png", workers=4)
    extractor.to_csv(csv_dir)
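
To build intuition for mask_threshold=0.5, here is one way a mask threshold could drive tile selection. This page does not define the parameter, so the sketch assumes it means the minimum fraction of tissue pixels a tile must contain; the functions tissue_fraction and select_tiles are hypothetical and are not part of TileExtractor.

```python
def tissue_fraction(mask, top, left, size):
    """Fraction of tissue pixels (value 1) in a size x size window of a 2D 0/1 mask."""
    window = [row[left:left + size] for row in mask[top:top + size]]
    total = sum(len(row) for row in window)
    return sum(sum(row) for row in window) / total if total else 0.0


def select_tiles(mask, size, threshold=0.5):
    """Return (top, left) of non-overlapping tiles with tissue fraction >= threshold.

    Expects a non-empty rectangular mask; partial tiles at the edges are skipped.
    """
    height, width = len(mask), len(mask[0])
    return [
        (top, left)
        for top in range(0, height - size + 1, size)
        for left in range(0, width - size + 1, size)
        if tissue_fraction(mask, top, left, size) >= threshold
    ]


# A tiny 4x4 mask: tissue in the top-left and bottom-right corners.
demo_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kept = select_tiles(demo_mask, size=2, threshold=0.5)
```

Under this assumption, raising the threshold keeps fewer, more tissue-dense tiles, while lowering it admits tiles that are mostly background.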