Bulk RNA-seq data reader. Provides BulkRNAData, a lightweight wrapper around a CSV/TSV of gene expression counts for a single patient sample.
Wide format (default), one row per sample with genes as columns:
sample_id,gene_1,gene_2,...,gene_N
patient_001,12.3,4.5,...,0.1
The first column is treated as the sample identifier (index). All remaining columns are gene expression values. Only the first data row is used; additional rows are ignored.
Long format, one row per gene, activated by setting gene_column and value_column:
gene_name,raw_count,normalized
gene_1,12.3,0.5
gene_2,4.5,0.2
The separator is inferred from the file extension (.tsv → tab, otherwise comma) but can be overridden via separator.
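The two layouts and the separator rule described above can be sketched with the standard library alone. This is an illustrative approximation, not the BulkRNAData implementation: the helper names infer_separator, read_wide, and read_long are hypothetical, and the sample data is made up to match the layouts shown.

```python
import csv
import io
from pathlib import Path


def infer_separator(path, separator=None):
    """Explicit separator wins; otherwise .tsv means tab, anything else comma."""
    if separator is not None:
        return separator
    return "\t" if Path(path).suffix == ".tsv" else ","


def read_wide(text, sep=","):
    """Wide layout: first column is the sample id, only the first data row is used."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    first_row = next(reader)  # later rows are never read
    genes = header[1:]
    values = [float(v) for v in first_row[1:]]
    return first_row[0], genes, values


def read_long(text, gene_column, value_column, sep=","):
    """Long layout: one row per gene, with named gene and value columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    genes, values = [], []
    for row in reader:
        genes.append(row[gene_column])
        values.append(float(row[value_column]))
    return genes, values


# Made-up example data following the two layouts.
wide_text = "sample_id,gene_1,gene_2\npatient_001,12.3,4.5\npatient_002,1.0,2.0\n"
long_text = "gene_name,raw_count,normalized\ngene_1,12.3,0.5\ngene_2,4.5,0.2\n"

sample, wide_genes, wide_values = read_wide(wide_text)
long_genes, long_values = read_long(long_text, "gene_name", "raw_count")
```

In the long-format call, choosing "raw_count" rather than "normalized" plays the role that value_column plays in the reader.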

BulkRNAData

class BulkRNAData()
Single-sample bulk RNA-seq counts for PyTorch pipelines. Reads a CSV or TSV with gene expression counts and stores the values as a contiguous np.float32 array for zero-copy tensor creation via counts.
Supports two layouts:
  • Wide (default): one row per sample, genes as columns. The first column is the sample identifier (index).
  • Long: one row per gene, activated by setting gene_column and value_column.
path
Path to the CSV or TSV file.
gene_names
Optional gene list to filter/align columns to. Missing genes are zero-filled, extras are dropped. When None all columns are kept.
omic_transform
Optional preprocessing transform applied when accessing counts.
separator
Column delimiter. When None the separator is inferred from the file extension (.tsv → tab, otherwise comma).
gene_column
Column name that contains gene identifiers. Required for long-format files.
value_column
Column name that contains expression values to read. Required for long-format files.
sample_name
Sample identifier (first row index in wide format, or the file stem in long format).
gene_names
Ordered list of gene column names.
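The gene_names filtering rule (missing genes zero-filled, extras dropped, None keeps everything) can be illustrated with a small sketch. The function align_to_gene_list is a hypothetical helper written for this example, not part of the library, and it assumes the output follows the order of the requested gene list.

```python
def align_to_gene_list(genes, values, gene_names=None):
    """Align (genes, values) to gene_names; assumes output follows gene_names order."""
    if gene_names is None:
        # No filter requested: keep every column as read from the file.
        return list(genes), list(values)
    lookup = dict(zip(genes, values))
    # Genes absent from the file become 0.0; genes not requested are dropped.
    return list(gene_names), [lookup.get(g, 0.0) for g in gene_names]


file_genes = ["gene_1", "gene_2", "gene_3"]
file_values = [12.3, 4.5, 0.1]

aligned_genes, aligned_values = align_to_gene_list(
    file_genes, file_values, gene_names=["gene_3", "gene_9", "gene_1"]
)
unfiltered_genes, unfiltered_values = align_to_gene_list(file_genes, file_values)
```

Here gene_9 is not in the file and is zero-filled, while gene_2 is not requested and is dropped.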

num_genes

@property
def num_genes() -> int
Number of gene columns.

counts

@property
def counts() -> list[float]
Gene counts as a list of floats.

get_counts

def get_counts(sample_name: str | None = None) -> list[float]
Return gene counts as a flat sequence. When no omic_transform is configured, returns a plain list[float]. With a transform, returns the transform output (typically an np.ndarray).
sample_name
str | None
Ignored (kept for backward compatibility).
returns
list[float]
Gene expression values in column order.
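The return behavior of get_counts can be sketched with a stand-in class. CountsHolder and log1p_transform are hypothetical names used only for illustration; how the real reader passes values to omic_transform (list or array) is an assumption here.

```python
import math


class CountsHolder:
    """Hypothetical stand-in that mimics the documented get_counts behavior."""

    def __init__(self, values, omic_transform=None):
        self._values = [float(v) for v in values]
        self._omic_transform = omic_transform

    def get_counts(self, sample_name=None):
        # sample_name is accepted for backward compatibility and ignored.
        if self._omic_transform is None:
            return list(self._values)
        # Assumption: the transform receives the values and its output is returned as-is.
        return self._omic_transform(self._values)


def log1p_transform(values):
    """Example transform: log(1 + x) applied to each value."""
    return [math.log1p(v) for v in values]


raw = CountsHolder([12.3, 4.5, 0.0]).get_counts()
logged = CountsHolder([12.3, 4.5, 0.0], omic_transform=log1p_transform).get_counts("ignored")
```

Because the return type depends on whether a transform is configured, callers that need a specific container may want to convert the result explicitly.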