BulkRNAData — a lightweight wrapper around a CSV/TSV of gene expression counts for a single patient sample.
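Wide format (default): one row per sample, genes as columns.
Long format: one row per gene, activated by setting gene_column and value_column.
The separator is inferred from the file extension (.tsv → tab, otherwise comma) but can be overridden via separator.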
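A minimal sketch of the separator rule just described. It assumes a hypothetical helper named infer_separator (not part of the documented API) and an exact, case-sensitive match on the .tsv suffix:

```python
from pathlib import Path
from typing import Optional


def infer_separator(path: str, separator: Optional[str] = None) -> str:
    """Hypothetical helper: pick the column delimiter for a CSV/TSV file."""
    # An explicit separator always wins over the file extension.
    if separator is not None:
        return separator
    # .tsv means tab; any other extension falls back to comma.
    # Assumption: the suffix comparison is case-sensitive.
    return "\t" if Path(path).suffix == ".tsv" else ","


print(repr(infer_separator("patient_01.tsv")))       # '\t'
print(repr(infer_separator("patient_01.csv")))       # ','
print(repr(infer_separator("patient_01.tsv", ";")))  # ';'
```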
BulkRNAData
Expression values are exposed as an np.float32 array for zero-copy tensor creation via counts.
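To illustrate why a float32 array matters here, the plain-NumPy sketch below (not the class itself; the values are illustrative) converts parsed values to float32 once, then shows that requesting float32 again shares the same buffer while a dtype change forces a copy. PyTorch's torch.from_numpy likewise shares memory with its source array, which is presumably what zero-copy tensor creation refers to.

```python
import numpy as np

# Illustrative expression values, e.g. as parsed from one CSV row.
raw_values = [5.0, 0.0, 12.5]

# Convert once to float32, the dtype described for counts.
counts = np.asarray(raw_values, dtype=np.float32)

# Requesting the same dtype again reuses the buffer: no copy is made.
same = np.asarray(counts, dtype=np.float32)
print(np.shares_memory(same, counts))    # True

# Changing the dtype (here to float64) always produces a new buffer.
as_f64 = counts.astype(np.float64)
print(np.shares_memory(as_f64, counts))  # False
```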
Supports two layouts:
- Wide (default): one row per sample, genes as columns. The first column is the sample identifier (index).
- Long: one row per gene, activated by setting gene_column and value_column.
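A minimal standard-library sketch of how the two layouts map to a sample name, a gene list and a value list. parse_expression and its stem argument are hypothetical names for illustration; the real class reads from path and, in long format, takes the sample name from the file stem. In wide format the sketch uses only the first data row, since the wrapper represents a single sample.

```python
import csv
import io
from typing import List, Optional, Tuple


def parse_expression(
    text: str,
    sep: str = ",",
    gene_column: Optional[str] = None,
    value_column: Optional[str] = None,
    stem: str = "sample",
) -> Tuple[str, List[str], List[float]]:
    """Hypothetical parser returning (sample_name, gene_names, values)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]

    if gene_column is not None and value_column is not None:
        # Long layout: one row per gene; read the two named columns.
        g = header.index(gene_column)
        v = header.index(value_column)
        genes = [row[g] for row in body]
        values = [float(row[v]) for row in body]
        return stem, genes, values  # sample name comes from the file stem

    # Wide layout: first column is the sample identifier, the rest are genes.
    first = body[0]
    return first[0], header[1:], [float(x) for x in first[1:]]


wide_text = "sample_id,TP53,BRCA1\nP001,5,7\n"
long_text = "gene\tcount\nTP53\t5\nBRCA1\t7\n"
print(parse_expression(wide_text))
# ('P001', ['TP53', 'BRCA1'], [5.0, 7.0])
print(parse_expression(long_text, sep="\t", gene_column="gene",
                       value_column="count", stem="P001"))
# ('P001', ['TP53', 'BRCA1'], [5.0, 7.0])
```

The layout is chosen by whether both column names are supplied, mirroring the rule that gene_column and value_column activate the long layout.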
Parameters:
- path: Path to the CSV or TSV file.
- gene_names: Optional gene list to filter/align columns to. Missing genes are zero-filled, extras are dropped. When None, all columns are kept.
- omic_transform: Optional preprocessing transform applied when accessing counts.
- separator: Column delimiter. When None, the separator is inferred from the file extension (.tsv → tab, otherwise comma).
- gene_column: Column name that contains gene identifiers. Required for long-format files.
- value_column: Column name that contains the expression values to read. Required for long-format files.
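The gene_names rule (zero-fill missing genes, drop extras, keep everything when None) can be sketched as follows. align_genes is a hypothetical name, and the dict-based representation is an assumption; the class itself may operate on a DataFrame or array. Output order follows gene_names, in line with the "align" wording above.

```python
from typing import Dict, Optional, Sequence


def align_genes(
    values_by_gene: Dict[str, float],
    gene_names: Optional[Sequence[str]] = None,
) -> Dict[str, float]:
    """Hypothetical helper mirroring the gene_names filtering rule."""
    # None: keep every column unchanged.
    if gene_names is None:
        return dict(values_by_gene)
    # Otherwise follow gene_names order: missing genes become 0.0,
    # and genes that are not listed are dropped.
    return {g: values_by_gene.get(g, 0.0) for g in gene_names}


parsed = {"TP53": 5.0, "BRCA1": 7.0, "MYC": 3.0}
print(align_genes(parsed, ["BRCA1", "EGFR", "TP53"]))
# {'BRCA1': 7.0, 'EGFR': 0.0, 'TP53': 5.0}
```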
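Attributes:
- sample_name: Sample identifier (the first row's index in wide format, or the file stem in long format).
- gene_names: Ordered list of gene column names.
- num_genes: Number of genes, i.e. the length of gene_names.
- counts: Expression values as an np.float32 array, with omic_transform applied on access when one is configured.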
get_counts
Return the expression values. When no omic_transform is configured, returns a plain list[float]; with a transform, returns the transform output (typically an np.ndarray).
Arguments: ignored (kept for backward compatibility).
Returns: gene expression values in column order.
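A sketch of the return-type contract described for get_counts, written as a hypothetical free function so it runs without the class. It omits the ignored backward-compatibility argument, and it assumes the transform receives a list of floats; the real method may pass it the np.float32 array instead.

```python
from typing import Any, Callable, List, Optional, Sequence


def get_counts(
    values: Sequence[float],
    omic_transform: Optional[Callable[[List[float]], Any]] = None,
) -> Any:
    """Hypothetical stand-in for BulkRNAData.get_counts."""
    as_list = [float(v) for v in values]
    # No transform configured: a plain list[float] in column order.
    if omic_transform is None:
        return as_list
    # Transform configured: return whatever it produces (often an ndarray).
    return omic_transform(as_list)


print(get_counts([5, 7]))  # [5.0, 7.0]
# Example transform: scale the values so the sample sums to 1.
print(get_counts([1, 3], lambda xs: [x / sum(xs) for x in xs]))  # [0.25, 0.75]
```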

