sparank.data.build_vocab

sparank.data.build_vocab(modality_features, cell_types, context_categories=None)

Build per-modality vocabularies.

Parameters:
  • modality_features (Dict[str, List[str]]) – A dictionary mapping modality names to lists of prefixed feature names. E.g., {"rna": ["rna-GAPDH", ...], "adt": ["adt-CD3", ...]}. For unimodal workflows without prefixes, use any key (e.g., "rna").

  • cell_types (List[str]) – A sorted list of cell-type labels.

  • context_categories (List[str], optional) – A list of context labels. If None (the default), no context vocabulary is generated.
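As an illustration of the expected input shapes, the following sketch builds a prefixed modality_features dictionary from raw feature names. The helper prefix_features and the example gene, protein, and cell-type names are hypothetical and not part of sparank; only the "<modality>-<feature>" naming pattern is taken from the parameter description above.

```python
from typing import Dict, List


def prefix_features(raw: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Prefix each feature name with its modality, e.g. "GAPDH" -> "rna-GAPDH".

    Hypothetical helper for illustration only; not part of sparank.
    """
    return {mod: [f"{mod}-{name}" for name in names] for mod, names in raw.items()}


# Made-up example inputs.
raw_features = {"rna": ["GAPDH", "ACTB"], "adt": ["CD3", "CD4"]}

modality_features = prefix_features(raw_features)
cell_types = sorted({"T cell", "B cell", "Monocyte"})  # sorted label list
context_categories = None  # no context vocabulary requested

print(modality_features)
print(cell_types)
```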

Returns:

A dictionary containing the generated mappings:

  • vocabs: {mod_name: {feature: id, ...}}

  • mask_ids: {mod_name: int}

  • type2id: {cell_type: int}

  • cell_types: The original list of cell types.

  • context2id: {context: int} (only if context_categories is given).

Return type:

Dict[str, Any]
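To make the shape of the returned dictionary concrete, here is a minimal stand-in written only from the description above. It is not sparank's implementation: the function name build_vocab_sketch is hypothetical, and both the ID assignment (list position) and the choice of mask ID (the first integer after a modality's feature IDs) are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_vocab_sketch(
    modality_features: Dict[str, List[str]],
    cell_types: List[str],
    context_categories: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Illustrative stand-in that mirrors the documented return keys.

    ASSUMPTIONS (not confirmed by the sparank docs): IDs follow list
    position, and each modality's mask ID is the first unused integer
    after its feature IDs.
    """
    vocabs = {
        mod: {feat: i for i, feat in enumerate(feats)}
        for mod, feats in modality_features.items()
    }
    mask_ids = {mod: len(vocab) for mod, vocab in vocabs.items()}
    out: Dict[str, Any] = {
        "vocabs": vocabs,
        "mask_ids": mask_ids,
        "type2id": {ct: i for i, ct in enumerate(cell_types)},
        "cell_types": cell_types,
    }
    # The context mapping is included only when context labels are supplied.
    if context_categories is not None:
        out["context2id"] = {c: i for i, c in enumerate(context_categories)}
    return out


result = build_vocab_sketch(
    {"rna": ["rna-GAPDH", "rna-ACTB"], "adt": ["adt-CD3"]},
    cell_types=["B cell", "T cell"],
)
print(sorted(result))  # keys present when no context categories are passed
```

With the library itself, the corresponding call would be sparank.data.build_vocab(modality_features, cell_types); the concrete ID values it returns may differ from this sketch.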