sparank.data.simulate
- sparank.data.simulate(adata, vocabs, modality_names, top_ks, cell_types, *, celltype_key, save_dir, cfg, all_features=None, context2id=None, sim_batch_key=None, context_key=None)
Batch-proportional pseudo-spot simulation with memmap writing.
The input adata is a pre-concatenated multi-modal AnnData whose .X contains features from all modalities (columns prefixed by modality name, e.g. rna-GAPDH, adt-CD3). Pseudo-spots are generated independently per batch, with the number of spots proportional to each batch’s share of the total cells. Results are tokenised on the fly and streamed to memory-mapped files.
- Parameters:
adata (AnnData) – Concatenated multi-modal single-cell reference. .obs must contain celltype_key (and optionally sim_batch_key and context_key). .X has shape (n_cells, sum_of_features_across_modalities).
vocabs (Dict[str, Dict[str, int]]) – Per-modality token vocabularies mapping prefixed feature names to integer token IDs.
modality_names (List[str]) – Ordered list of modalities, e.g. ["rna"] or ["rna", "adt"].
top_ks (Dict[str, int]) – Per-modality mapping defining the number of top-ranked features to keep per spot.
cell_types (List[str]) – Ordered list of cell-type names that defines the order of the label vector.
celltype_key (str) – Column name in adata.obs storing the cell-type labels.
save_dir (str) – Directory where the output memmap files are saved.
cfg (SimulationConfig) – Configuration object containing the simulation hyperparameters.
all_features (List[str], optional) – Flat list of all prefixed feature names across modalities. If provided, adata is subset to these features before simulation, as an optimization.
context2id (Dict[str, int], optional) – Mapping from context labels to integer IDs.
sim_batch_key (str, optional) – Column in adata.obs whose identifiers define the batches the simulation is split by.
context_key (str, optional) – Column in adata.obs storing context labels.
- Returns:
A tuple containing:
  - real_total: Number of valid samples actually written.
  - inp_path: Path to the tokenised-input memmap file.
  - lbl_path: Path to the label memmap file.
  - ctx_path: Path to the context memmap file, or None if unused.
- Return type:
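To make the batch-proportional allocation, per-modality top-k tokenisation, and memmap streaming described above concrete, here is a minimal NumPy sketch. It is not sparank's implementation. The helper names allocate_spots and simulate_sketch, the file names inputs.dat and labels.dat, and the n_spots_total and cells_per_spot arguments (stand-ins for whatever cfg actually holds) are hypothetical. The sketch also makes several further assumptions. A spot is the summed profile of cells sampled from one batch, the label vector holds cell-type fractions, and tokens are vocabulary IDs of the highest-valued features concatenated in modality order. Spot counts are split with largest-remainder rounding. X is dense, each top_ks value is no larger than that modality's feature count, and context handling is omitted.

```python
import os

import numpy as np


def allocate_spots(batch_sizes, n_spots_total):
    """Split n_spots_total across batches in proportion to their cell counts.

    Largest-remainder rounding (an assumed choice) keeps the total exact.
    """
    sizes = np.asarray(batch_sizes, dtype=float)
    raw = n_spots_total * sizes / sizes.sum()
    counts = np.floor(raw).astype(int)
    leftover = n_spots_total - int(counts.sum())
    # Give the remaining spots to the batches with the largest fractional parts.
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


def simulate_sketch(X, feature_names, batch_ids, labels, cell_types, modality_names,
                    vocabs, top_ks, n_spots_total, cells_per_spot, save_dir, seed=0):
    """Hypothetical stand-in for the pipeline described above, not sparank's code.

    Assumes each top_ks[m] <= number of features prefixed with m + "-".
    """
    rng = np.random.default_rng(seed)
    feature_names = np.asarray(feature_names)
    batch_ids = np.asarray(batch_ids)
    labels = np.asarray(labels)
    seq_len = sum(top_ks[m] for m in modality_names)

    # Preallocate memory-mapped outputs; rows are filled one spot at a time.
    inp_path = os.path.join(save_dir, "inputs.dat")
    lbl_path = os.path.join(save_dir, "labels.dat")
    inp = np.memmap(inp_path, dtype=np.int64, mode="w+", shape=(n_spots_total, seq_len))
    lbl = np.memmap(lbl_path, dtype=np.float32, mode="w+",
                    shape=(n_spots_total, len(cell_types)))

    batches, sizes = np.unique(batch_ids, return_counts=True)
    row = 0
    for batch, n_spots in zip(batches, allocate_spots(sizes, n_spots_total)):
        pool = np.flatnonzero(batch_ids == batch)
        for _ in range(n_spots):
            # A pseudo-spot sums the profiles of cells drawn from a single batch.
            picked = rng.choice(pool, size=cells_per_spot, replace=True)
            profile = X[picked].sum(axis=0)
            tokens = []
            for mod in modality_names:
                # Rank this modality's columns by value and keep the top_k as token IDs.
                cols = np.flatnonzero(np.char.startswith(feature_names, mod + "-"))
                top = cols[np.argsort(-profile[cols], kind="stable")[: top_ks[mod]]]
                tokens.extend(vocabs[mod][name] for name in feature_names[top])
            inp[row] = tokens
            # Label vector: fraction of sampled cells of each type, in cell_types order.
            lbl[row] = [np.mean(labels[picked] == ct) for ct in cell_types]
            row += 1

    inp.flush()
    lbl.flush()
    return row, inp_path, lbl_path, None
```

Largest-remainder rounding is used so the per-batch counts always add up to the requested total. Rounding each batch's share independently can miss it; for three equal batches and 10 spots, each share rounds to 3, giving 9.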