bioptimus.data.omics.bulkrna

Bulk RNA-seq data reader. Provides BulkRNAData, a lightweight wrapper around a CSV/TSV file of gene expression counts for a single patient sample.

Wide format (default), with one row per sample and genes as columns:

sample_id,gene_1,gene_2,...,gene_N
patient_001,12.3,4.5,...,0.1

The first column is treated as the sample identifier (index). All remaining columns are gene expression values. Only the first data row is used; additional rows are ignored.

Long format, with one row per gene, is activated by setting gene_column and value_column:

gene_name,raw_count,normalized
gene_1,12.3,0.5
gene_2,4.5,0.2
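The two layouts above can be illustrated with a small standard-library sketch. This is not the BulkRNAData implementation; the helper names read_wide and read_long are hypothetical, and the sample data simply mirrors the examples above (with a second wide row added to show that it is ignored).

```python
import csv
import io

# Sample contents mirroring the documented layouts.
WIDE = (
    "sample_id,gene_1,gene_2,gene_3\n"
    "patient_001,12.3,4.5,0.1\n"
    "patient_002,1.0,2.0,3.0\n"  # extra row: ignored in wide format
)
LONG = (
    "gene_name,raw_count,normalized\n"
    "gene_1,12.3,0.5\n"
    "gene_2,4.5,0.2\n"
)


def read_wide(text, delimiter=","):
    """Hypothetical helper: return (sample_id, genes, values) from the first data row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    first_row = next(reader)  # later rows are never read
    genes = header[1:]        # first column is the sample identifier
    values = [float(v) for v in first_row[1:]]
    return first_row[0], genes, values


def read_long(text, gene_column, value_column, delimiter=","):
    """Hypothetical helper: return (genes, values), one entry per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    genes, values = [], []
    for row in reader:
        genes.append(row[gene_column])
        values.append(float(row[value_column]))
    return genes, values


sample_id, wide_genes, wide_values = read_wide(WIDE)
long_genes, long_values = read_long(
    LONG, gene_column="gene_name", value_column="raw_count"
)
```

In the long layout, value_column selects which of several numeric columns is read, which is why both column names must be given.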

The separator is inferred from the file extension (.tsv → tab, otherwise comma) but can be overridden via separator.
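As a rough illustration of the inference rule, the sketch below uses a hypothetical infer_separator helper. It assumes an exact, case-sensitive match on the final file extension, which the description above does not specify.

```python
from pathlib import Path


def infer_separator(path, separator=None):
    """Hypothetical helper: pick a delimiter the way the rule above describes."""
    if separator is not None:
        # An explicit separator always overrides inference.
        return separator
    # Assumption: only the final suffix is checked, case-sensitively.
    return "\t" if Path(path).suffix == ".tsv" else ","


tab = infer_separator("counts.tsv")
comma = infer_separator("counts.csv")
fallback = infer_separator("counts.txt")
override = infer_separator("counts.tsv", separator=";")
```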

BulkRNAData

class BulkRNAData()

Single-sample bulk RNA-seq counts for PyTorch pipelines. Reads a CSV or TSV with gene expression counts and stores the values as a contiguous np.float32 array for zero-copy tensor creation via counts. Supports two layouts:

Wide (default): one row per sample, genes as columns. The first column is the sample identifier (index).
Long: one row per gene, activated by setting gene_column and value_column.

Path to the CSV or TSV file.

Optional gene list to filter/align columns to. Genes missing from the file are zero-filled; genes not in the list are dropped. When None, all columns are kept.

omic_transform: Optional preprocessing transform applied when accessing counts.

separator: Column delimiter. When None, the separator is inferred from the file extension (.tsv → tab, otherwise comma).

gene_column: Column name that contains gene identifiers. Required for long-format files.

value_column: Column name that contains expression values to read. Required for long-format files.

Sample identifier (first row index in wide format, or the file stem in long format).

Ordered list of gene column names.
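The gene-list filtering described above (zero-fill missing genes, drop extras) can be sketched as follows. The function name align_to_gene_list is hypothetical, and the sketch assumes the output follows the order of the supplied gene list, which "align" suggests but the description does not state outright.

```python
def align_to_gene_list(genes, values, gene_list=None):
    """Hypothetical helper: align (genes, values) to an optional target gene list."""
    if gene_list is None:
        # No gene list: keep every column as read.
        return list(genes), list(values)
    lookup = dict(zip(genes, values))
    # Genes absent from the file get 0.0; genes not in gene_list are dropped.
    aligned = [lookup.get(gene, 0.0) for gene in gene_list]
    return list(gene_list), aligned


file_genes = ["gene_1", "gene_2", "gene_3"]
file_values = [12.3, 4.5, 0.1]

kept_genes, kept_values = align_to_gene_list(file_genes, file_values)
aligned_genes, aligned_values = align_to_gene_list(
    file_genes, file_values, gene_list=["gene_2", "gene_9"]
)
```

Aligning every sample to the same gene list gives vectors of identical length and order, which is what allows them to be stacked into a batch.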

num_genes

@property
def num_genes() -> int

Number of gene columns.

counts

@property
def counts() -> list[float]

Gene counts as a list of floats.

get_counts

def get_counts(sample_name: str | None = None) -> list[float]

Return gene counts as a flat sequence. When no omic_transform is configured, returns a plain list[float]. With a transform, returns the transform output (typically an np.ndarray).

sample_name (str | None): Ignored (kept for backward compatibility).

Returns (list[float]): Gene expression values in column order.
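The return behaviour of get_counts can be mimicked with a small stand-in class. CountsSketch and log1p_transform are hypothetical names, not part of the library, and the sketch assumes the transform is a callable that receives the list of values. To stay dependency-free the transform here returns a list, whereas a real transform would typically return an np.ndarray.

```python
import math


class CountsSketch:
    """Hypothetical stand-in that mirrors the documented get_counts behaviour."""

    def __init__(self, values, omic_transform=None):
        self._values = [float(v) for v in values]
        self.omic_transform = omic_transform

    def get_counts(self, sample_name=None):
        # sample_name is accepted but ignored.
        if self.omic_transform is None:
            return list(self._values)  # plain list[float]
        # Assumption: the transform is called with the raw values.
        return self.omic_transform(self._values)


def log1p_transform(values):
    """Illustrative transform: log(1 + x) per gene."""
    return [math.log1p(v) for v in values]


plain = CountsSketch([12.3, 4.5, 0.1]).get_counts()
transformed = CountsSketch(
    [0.0, 1.0], omic_transform=log1p_transform
).get_counts("ignored")
```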
