sparank.data.build_vocab
- sparank.data.build_vocab(modality_features, cell_types, context_categories=None)
Build per-modality vocabularies.
- Parameters:
modality_features (Dict[str, List[str]]) – A dictionary mapping modality names to lists of prefixed feature names, e.g. {"rna": ["rna-GAPDH", ...], "adt": ["adt-CD3", ...]}. For unimodal workflows without prefixes, use any key (e.g., "rna").
cell_types (List[str]) – A list of sorted cell-type labels.
context_categories (List[str], optional) – A list of context labels. None indicates that no context vocabulary should be generated.
- Returns:
A dictionary containing the generated mappings:
  - vocabs: {mod_name: {feature: id, ...}}
  - mask_ids: {mod_name: int}
  - type2id: {cell_type: int}
  - cell_types: Original list of cell types.
  - context2id: {context: int} (only if context_categories is given).
- Return type:
Dict[str, Any]
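
To make the shape of the returned dictionary concrete, here is a minimal sketch of a stand-in function that produces the keys documented above. It is not the sparank implementation: the name build_vocab_sketch, the feature "rna-ACTB", and the sample cell-type and context labels are hypothetical. The ID scheme is also an assumption, with IDs following list order from 0 and each modality's mask ID taking the next free integer. The real function may number entries differently.

```python
from typing import Any, Dict, List, Optional


def build_vocab_sketch(
    modality_features: Dict[str, List[str]],
    cell_types: List[str],
    context_categories: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in that mimics the documented return structure."""
    vocabs: Dict[str, Dict[str, int]] = {}
    mask_ids: Dict[str, int] = {}
    for mod_name, features in modality_features.items():
        # Assumption: feature IDs follow list order, starting at 0.
        vocabs[mod_name] = {feat: i for i, feat in enumerate(features)}
        # Assumption: the mask token takes the next free ID in the modality.
        mask_ids[mod_name] = len(features)

    result: Dict[str, Any] = {
        "vocabs": vocabs,
        "mask_ids": mask_ids,
        "type2id": {ct: i for i, ct in enumerate(cell_types)},
        "cell_types": cell_types,
    }
    # context2id is only present when context categories are supplied.
    if context_categories is not None:
        result["context2id"] = {c: i for i, c in enumerate(context_categories)}
    return result


# Two modalities, no context vocabulary.
out = build_vocab_sketch(
    {"rna": ["rna-GAPDH", "rna-ACTB"], "adt": ["adt-CD3"]},
    cell_types=["B cell", "T cell"],
)

# Unimodal input with context categories.
with_ctx = build_vocab_sketch(
    {"rna": ["rna-GAPDH"]},
    cell_types=["B cell"],
    context_categories=["ctrl", "stim"],
)
```

In this sketch, out has no context2id key because context_categories was left as None, while with_ctx includes it.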