Cohort is a typed registry that tracks the mapping between patients, WSIs, bulk RNA samples, timepoints, labels, and arbitrary clinical metadata. WSIRecord is the single source of truth for each WSI, including per-model outputs and processing status.
Construction paths:
- From a user-provided CSV via from_csv.
- Auto-matching from directories via from_directories.
- From a YAML manifest via load (for resume).
A cohort can be written to disk via save and loaded back for full reproducibility and resume support.
ModelOutput
model_name
Name of the model that produced this output.
embedding_path
Path to the embedding output file.
prediction_path
Path to the prediction output file.
tiles_csv_path
Path to the tile coordinates CSV.
tiles_dir
Directory containing exported tile images.
modalities
Input modalities used for this output (e.g. ["image"] or ["image", "bulk_rna"]).
timestamp
ISO 8601 string of when the output was produced.
status
Processing status ("pending", "done", "error").
error
Error message if status is "error".
is_done
True if status is "done".
has_embedding
True if an embedding path is set and exists.
has_prediction
True if a prediction path is set and exists.
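The fields and derived properties above can be pictured as a small dataclass. This is a minimal sketch rather than the library's implementation: the class name ModelOutputSketch, the default values, the file name, and the use of pathlib are assumptions made for illustration. Only the field names and the stated meaning of is_done, has_embedding, and has_prediction come from the descriptions above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional
import tempfile


@dataclass
class ModelOutputSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    model_name: str
    embedding_path: Optional[str] = None
    prediction_path: Optional[str] = None
    modalities: List[str] = field(default_factory=lambda: ["image"])
    status: str = "pending"  # one of "pending", "done", "error"
    error: Optional[str] = None

    @property
    def is_done(self) -> bool:
        return self.status == "done"

    @property
    def has_embedding(self) -> bool:
        # The path must be set AND the file must exist on disk.
        return self.embedding_path is not None and Path(self.embedding_path).exists()

    @property
    def has_prediction(self) -> bool:
        return self.prediction_path is not None and Path(self.prediction_path).exists()


with tempfile.TemporaryDirectory() as tmp:
    emb = Path(tmp) / "slide_001.h5"  # hypothetical embedding file
    emb.write_bytes(b"")
    out = ModelOutputSketch("demo_model", embedding_path=str(emb), status="done")
    checks = (out.is_done, out.has_embedding, out.has_prediction)

print(checks)
```

Because has_embedding checks the filesystem, it turns False again once the temporary directory is removed, even though embedding_path is still set.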
set_stage_output
embedding_path / prediction_path).
"embed" or "predict".
Modalities used for this output.
Output file path.
ISO 8601 timestamp.
get_stage_path
"embed" or "predict".
Modality combination to look up. Falls back to the top-level path when None.
Resolved path, or None if not recorded.
to_dict
A dictionary representation of the model output.
from_dict
Dictionary with serialised model output fields.
A new ModelOutput instance.
WSIRecord
patient_id
Unique patient identifier.
wsi_id
Unique WSI identifier (typically the filename stem).
wsi_path
Path to the WSI file.
bulk_rna_id
Identifier for the paired bulk RNA sample.
bulk_rna_path
Path to the bulk RNA CSV/TSV file.
mask_path
Path to a tissue mask (pre-computed or cached).
mask_timestamp
When the mask was computed/discovered.
timepoint
Temporal ordering label (e.g. "t0", "t1").
outputs
Per-model output records keyed by model name.
labels
Arbitrary categorical annotations.
metadata
Arbitrary clinical/treatment metadata.
get_output
Model identifier string.
The ModelOutput for the given model.
is_stage_done
When modality_outputs are tracked, the check is scoped to the record’s current available_modalities, so linking new data (e.g. bulk RNA) automatically surfaces pending work without requiring force=True.
Model identifier.
"embed" or "predict".
True if the output path exists on disk.
has_mask
True if a mask path is set and exists on disk.
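The scoping rule described for is_stage_done can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the "+"-joined key format, the helper names, and the file paths are hypothetical, "image" is assumed to always be an available modality, and the on-disk existence check is left out to keep the example self-contained.

```python
from typing import Dict, List, Optional


def available_modalities(bulk_rna_path: Optional[str]) -> List[str]:
    """Assumes "image" is always present; adds "bulk_rna" when a path is set."""
    mods = ["image"]
    if bulk_rna_path is not None:
        mods.append("bulk_rna")
    return mods


def modality_key(modalities: List[str]) -> str:
    # Hypothetical key format: sorted modality names joined with "+".
    return "+".join(sorted(modalities))


def is_stage_done(done_paths: Dict[str, str], bulk_rna_path: Optional[str]) -> bool:
    """Scope the check to the modalities the record has right now."""
    return modality_key(available_modalities(bulk_rna_path)) in done_paths


# An embedding was produced from the image alone.
done = {modality_key(["image"]): "emb/slide_001.h5"}

before = is_stage_done(done, bulk_rna_path=None)
after = is_stage_done(done, bulk_rna_path="rna/sample_001.tsv")
print(before, after)
```

Once bulk RNA is linked, the lookup key changes, so the same record reports the stage as not done and shows up as pending work without any force flag.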
available_modalities
Always includes "image"; also includes "bulk_rna" when bulk_rna_path is set.
to_dict
A dictionary representation of the WSI record.
from_dict
Dictionary with serialised WSI record fields.
A new WSIRecord instance.
PatientRecord
patient_id
Unique patient identifier.
wsis
Ordered list of WSI records (by timepoint).
labels
Patient-level categorical annotations.
metadata
Patient-level clinical metadata.
Cohort
WSIRecord entries grouped by patient. It stores all information needed to reproduce and resume an inference run: file locations, pairing logic, timepoints, per-model outputs, labels, and clinical metadata.
Construction: Use from_csv, from_directories, or load rather than calling the constructor directly.
records
Initial list of WSIRecord entries.
from_csv
- wsi_id: WSI identifier (filename stem or full name).
- patient_id: if absent, derived from wsi_id.
- bulk_rna_id: paired RNA sample identifier.
- timepoint: temporal label.
- wsi_path: explicit path override.
- bulk_rna_path: explicit path override.
- Any other columns are treated as labels if prefixed with label_ or as metadata otherwise.
wsi_id, patient_id, bulk_rna_id, timepoint, wsi_path, bulk_rna_path.
path
Path to the CSV manifest.
wsi_dir
Directory to resolve WSI paths from.
bulk_rna_dir
Directory to resolve bulk RNA paths from.
columns
Mapping of canonical field names to actual CSV column names. Unmapped fields fall back to their canonical name.
Cohort.
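The columns mapping described above can be illustrated with a short, self-contained sketch. It is not the library's parser: the helper name resolve_fields, the sample column names slide and case, and the choice to reuse wsi_id unchanged when patient_id is missing are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Optional

CANONICAL = ("wsi_id", "patient_id", "bulk_rna_id", "timepoint", "wsi_path", "bulk_rna_path")


def resolve_fields(
    row: Dict[str, str], columns: Optional[Dict[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Read canonical fields from a CSV row via a canonical -> actual column mapping."""
    columns = columns or {}
    out: Dict[str, Optional[str]] = {}
    for name in CANONICAL:
        actual = columns.get(name, name)  # unmapped fields fall back to the canonical name
        value = row.get(actual)
        out[name] = value if value else None  # treat missing or empty cells as None
    if out["patient_id"] is None:
        # Assumption: the simplest derivation, reusing wsi_id unchanged.
        out["patient_id"] = out["wsi_id"]
    return out


manifest = io.StringIO("slide,case,timepoint\nS1,P1,t0\nS2,,t1\n")
mapping = {"wsi_id": "slide", "patient_id": "case"}
rows = [resolve_fields(r, mapping) for r in csv.DictReader(manifest)]
print(rows[0]["patient_id"], rows[1]["patient_id"])
```

Here timepoint is not listed in the mapping, so it is read from the column with the canonical name.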
from_directories
t0, t1, …) in alphabetical order.
Directory containing WSI files.
Directory containing bulk RNA files. When None, WSIs are registered without RNA.
Directory containing pre-computed masks (PNG files whose stem matches the WSI stem).
Optional callable that maps a filename stem to a patient ID.
A populated Cohort.
save
Destination file path.
The resolved output path.
load
Path to the YAML manifest file.
A fully-populated Cohort.
upsert
WSI identifier.
Patient identifier (defaults to wsi_id).
Path to WSI file.
Path to bulk RNA file.
Path to tissue mask.
Timepoint label.
The inserted or updated WSIRecord.
get_wsi
None.
Unique WSI identifier (typically the file stem).
The matching WSIRecord, or None if not found.
link_bulk_rna
bulk_rna_path are skipped.
This enables late-binding of bulk RNA data: build a cohort from WSIs first, then call this method to attach RNA when it becomes available.
Directory containing bulk RNA files.
File extensions to match. Defaults to {".csv", ".tsv"}.
Column delimiter for parsing. When None, the separator is inferred from the file extension.
Column name containing gene identifiers. Required (with value_column) for long-format files (e.g. GDC/TCGA gene quantification TSVs).
Column name containing expression values to read (e.g. "tpm_unstranded").
When True, strips version suffixes from gene identifiers (e.g. ENSG…00003.15 → ENSG…00003).
Number of records that were linked.
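Two of the parsing options above, separator inference and version-suffix stripping, are easy to picture in isolation. The helpers below are a hedged sketch: the names infer_separator and strip_version are hypothetical, and the rule that ".tsv" means tab while everything else means comma is an assumption, not a statement about how the library decides.

```python
from pathlib import Path
from typing import Optional


def infer_separator(path: str, sep: Optional[str] = None) -> str:
    """Use the explicit separator if given; otherwise guess from the extension."""
    if sep is not None:
        return sep
    # Assumption: ".tsv" is tab-delimited, anything else comma-delimited.
    return "\t" if Path(path).suffix.lower() == ".tsv" else ","


def strip_version(gene_id: str) -> str:
    """Drop a trailing ".<digits>" version suffix; leave other identifiers untouched."""
    base, dot, version = gene_id.rpartition(".")
    return base if dot and version.isdigit() else gene_id


# Illustrative Ensembl-style identifier with a version suffix.
print(strip_version("ENSG00000000003.15"))
print(repr(infer_separator("sample_001.tsv")))
```

Identifiers without a numeric suffix, such as gene symbols, pass through strip_version unchanged.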
add_labels
Mapping of {identifier: {label_name: value}}.
Key to match on: "patient_id" or "wsi_id".
add_metadata
Mapping of {identifier: {field: value}}.
Key to match on: "patient_id" or "wsi_id".
add_labels_from_csv
patient_id). Columns prefixed with label_ are treated as labels; remaining columns as metadata.
Path to the labels CSV.
Join key column name.
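The label/metadata split described for this method can be sketched with the standard csv module. The function name split_row and the sample columns are hypothetical, and stripping the label_ prefix from the stored label name is an assumption; the source only says that prefixed columns are treated as labels and the rest as metadata.

```python
import csv
import io
from typing import Dict, Tuple


def split_row(
    row: Dict[str, str], key: str = "patient_id"
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (identifier, labels, metadata) for one CSV row."""
    labels: Dict[str, str] = {}
    metadata: Dict[str, str] = {}
    for column, value in row.items():
        if column == key:
            continue  # the join key is neither a label nor metadata
        if column.startswith("label_"):
            # Assumption: the "label_" prefix is dropped from the label name.
            labels[column[len("label_"):]] = value
        else:
            metadata[column] = value
    return row[key], labels, metadata


text = "patient_id,label_response,age\nP1,responder,63\n"
ident, labels, metadata = next(split_row(r) for r in csv.DictReader(io.StringIO(text)))
print(ident, labels, metadata)
```

The resulting pair of dictionaries has the same shape as the mappings accepted by add_labels and add_metadata, keyed by the identifier returned first.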
num_patients
patients
get_patient
PatientRecord for a given patient.
Patient identifier string.
The matching PatientRecord.
KeyError: If patient_id is not found.
wsi_ids
patient_ids
wsi_paths
None).
pending
Model identifier.
"embed" or "predict".
List of WSIRecord entries still needing processing.
to_csv
save.
Destination file path.
The resolved output path.
summary
Multi-line summary string.

