mixer.generate and model.fit error #9

Yooooopick · 2023-06-25T09:09:06Z

Hello,
Thank you for your hard work on Kassandra. It is a nice and useful tool for estimating cell fractions from bulk RNA-seq data. After running git clone https://github.com/BostonGene/Kassandra/ and then the "Model Training.ipynb" notebook with the example data in the "/data" directory, I get the following error:

expr,values = mixer.generate('General_cells')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/Kassandra/core/mixer.py", line 133, in generate
    **self.generate_pure_cell_expressions(genes, self.num_av, [modeled_cell])}
  File "/root/Kassandra/core/mixer.py", line 189, in generate_pure_cell_expressions
    cells_index = self.change_subtype_proportions(cell=cell,
  File "/root/Kassandra/core/mixer.py", line 288, in change_subtype_proportions
    subtype_proportions = {cell: dict(self.proportions.loc[specified_subtypes])}
  File "/root/anaconda3/envs/kassandra/lib/python3.8/site-packages/pandas/core/indexing.py", line 1091, in __getitem__
    check_dict_or_set_indexers(key)
  File "/root/anaconda3/envs/kassandra/lib/python3.8/site-packages/pandas/core/indexing.py", line 2618, in check_dict_or_set_indexers
    raise TypeError(
TypeError: Passing a set as an indexer is not supported. Use a list instead.

and then,
>>> model.fit(mixer)
============== L1 models ==============
Generating mixes for B_cells model
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/Kassandra/core/model.py", line 78, in fit
    expr, values = mixer.generate(cell, genes=self.cell_types[cell].genes, random_seed=i+1)
  File "/root/Kassandra/core/mixer.py", line 132, in generate
    average_cells = {**self.generate_pure_cell_expressions(genes, 1, cells_to_mix),
  File "/root/Kassandra/core/mixer.py", line 189, in generate_pure_cell_expressions
    cells_index = self.change_subtype_proportions(cell=cell,
  File "/root/Kassandra/core/mixer.py", line 288, in change_subtype_proportions
    subtype_proportions = {cell: dict(self.proportions.loc[specified_subtypes])}
  File "/root/anaconda3/envs/kassandra/lib/python3.8/site-packages/pandas/core/indexing.py", line 1091, in __getitem__
    check_dict_or_set_indexers(key)
  File "/root/anaconda3/envs/kassandra/lib/python3.8/site-packages/pandas/core/indexing.py", line 2618, in check_dict_or_set_indexers
    raise TypeError(
TypeError: Passing a set as an indexer is not supported. Use a list instead.

Do you know what the problem might be?
Thank you!
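
Both tracebacks end in the same pandas error, raised when `.loc` receives a Python set. The sketch below reproduces that message outside Kassandra and shows the list-based form the message suggests. The Series contents and variable values are made up for illustration; only the names `proportions` and `specified_subtypes` are borrowed from the traceback. Whether this is the root cause inside Kassandra is an assumption that the thread does not confirm.

```python
import pandas as pd

# Toy stand-in for the mixer's proportions (values are invented for illustration).
proportions = pd.Series({"Monocytes": 0.6, "Macrophages": 0.4, "B_cells": 0.2})
specified_subtypes = {"Monocytes", "Macrophages"}  # a set, as in the traceback

try:
    proportions.loc[specified_subtypes]
    set_indexing_error = None  # older pandas versions still accept a set
except TypeError as err:
    # pandas 2.0 and later reject set indexers with the message seen in the traceback
    set_indexing_error = str(err)

# Converting the set to a list is accepted regardless of the pandas version.
subtype_proportions = dict(proportions.loc[sorted(specified_subtypes)])

print(set_indexing_error)
print(subtype_proportions)
```

If the first print shows the "Passing a set as an indexer" message, the installed pandas is a version that no longer accepts sets in `.loc`.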


shpakb · 2023-06-27T13:38:51Z

Hi @Yooooopick,

Some cell type terms are missing from the "Cell_type" column of the cells annotation data frame. Here is some code to check which ones you are missing:

missing_cts = [x for x in cell_types.get_all_subtypes('General_cells') if x not in cells_annot['Cell_type'].unique()]
missing_cts

There shouldn't be any problems if you just run "Model Training.ipynb" as it is. I just checked it.
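
To illustrate what such a check reports, here is a minimal, self-contained sketch on toy data. The `children` map and the `all_subtypes` helper are hypothetical stand-ins for the YAML config and for `cell_types.get_all_subtypes`; they are not the Kassandra API, and the hierarchy shown is invented. The point is that a parent-level label is reported as missing whenever samples are annotated only with its child types.

```python
import pandas as pd

# Hypothetical parent -> children map; a toy stand-in for the YAML config.
children = {
    "General_cells": ["Immune_general"],
    "Immune_general": ["Monocytic_cells", "B_cells"],
    "Monocytic_cells": ["Monocytes", "Macrophages"],
}

def all_subtypes(root):
    """Return every descendant of `root`, depth-first, excluding `root` itself."""
    found = []
    for child in children.get(root, []):
        found.append(child)
        found.extend(all_subtypes(child))
    return found

# Toy annotation: samples are labelled only with leaf-level cell types.
cells_annot = pd.DataFrame(
    {"Cell_type": ["Monocytes", "Macrophages", "B_cells"]},
    index=["s1", "s2", "s3"],
)

annotated = set(cells_annot["Cell_type"].unique())
missing_cts = [ct for ct in all_subtypes("General_cells") if ct not in annotated]
print(missing_cts)
```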

Yooooopick · 2023-06-28T08:59:58Z

Thank you for your kind reply.
I ran the code, and the result is shown below:
> ['Immune_general', 'Monocytic_cells']
I think 'Immune_general' and 'Monocytic_cells' are upper-level annotations (parents of types such as Monocytes and Macrophages), so they cannot actually appear in the training data.
To work around this, I edited the "/config/cell_types.yaml" file: I removed the 'Immune_general' and 'Monocytic_cells' entries and changed the parent_type to "General_cells", even though the cell proportions and related values will then no longer be accurate. The same error appeared again.

In fact, I ran the "Model Training.ipynb" notebook with the example data in the "/data" directory right after cloning the repository, loading the data as shown below, and the error still occurs.
cancer_sample_annot = pd.read_csv('data/cancer_samples_annot.tsv.tar.gz', sep='\t', index_col=0)
cancer_expr = pd.read_csv('data/cancer_expr.tsv.tar.gz', sep='\t', index_col=0)
cells_sample_annot = pd.read_csv('data/cells_samples_annot.tsv.tar.gz', sep='\t', index_col=0)
cells_expr = pd.read_csv('data/cells_expr.tsv.tar.gz', sep='\t', index_col=0)

I would appreciate any solution you can recommend.

shpakb · 2023-06-28T12:21:17Z

Here is some code to patch annotation for missing cell types:

# Add the missing (parent-level) cell types.
# Assumes pd, CellTypes, cells_annot, and cells_expr are already defined, as in the notebook.
cell_types = CellTypes.load('configs/full_blood_model.yaml')
missing_cts = [x for x in cell_types.get_all_subtypes('General_cells') if x not in cells_annot['Cell_type'].unique()]

for ct in missing_cts:
    subtypes = cell_types.get_direct_subtypes(ct)
    # Copy so the assignments below do not write into slices of cells_annot / cells_expr
    annot = cells_annot.loc[cells_annot['Cell_type'].isin(subtypes)].copy()
    expr = cells_expr[annot.index].copy()
    # Relabel the copies as the parent type and give them unique sample names
    annot['Cell_type'] = ct
    annot.index = annot.index + f'_{ct}'
    annot['Dataset'] = annot.index
    expr.columns = expr.columns + f'_{ct}'
    cells_expr = pd.concat([cells_expr, expr], axis=1)
    cells_annot = pd.concat([cells_annot, annot])

It will duplicate annotation and expressions for all the direct subtypes of "Monocytic_cells" (Monocytes, Macrophages) and "Immune_general" (T, B, NK, mono, etc). Then you can proceed with the training using original config.
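
The effect of this duplication can be seen on a toy example. The sketch below runs the same steps on invented data; the `direct_subtypes` dictionary is a hypothetical stand-in for `cell_types.get_direct_subtypes`, and the sample names, gene names, and values are made up.

```python
import pandas as pd

# Hypothetical stand-in for cell_types.get_direct_subtypes(); not the Kassandra API.
direct_subtypes = {"Monocytic_cells": ["Monocytes", "Macrophages"]}

# Toy annotation (samples as rows) and expression matrix (genes x samples).
cells_annot = pd.DataFrame(
    {"Cell_type": ["Monocytes", "Macrophages", "B_cells"],
     "Dataset": ["ds1", "ds2", "ds3"]},
    index=["s1", "s2", "s3"],
)
cells_expr = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["GENE_A", "GENE_B"],
    columns=["s1", "s2", "s3"],
)

for ct, subtypes in direct_subtypes.items():
    # Copy the samples of the direct subtypes so the originals stay untouched.
    annot = cells_annot.loc[cells_annot["Cell_type"].isin(subtypes)].copy()
    expr = cells_expr[annot.index].copy()
    # Relabel the copies as the parent type and rename them to keep sample names unique.
    annot["Cell_type"] = ct
    annot.index = annot.index + f"_{ct}"
    annot["Dataset"] = annot.index
    expr.columns = expr.columns + f"_{ct}"
    cells_expr = pd.concat([cells_expr, expr], axis=1)
    cells_annot = pd.concat([cells_annot, annot])

print(cells_annot)
print(cells_expr.shape)
```

In this toy run the two monocyte and macrophage samples appear twice: once under their original labels and once, with a suffixed sample name, under the parent label "Monocytic_cells".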

jsangalang · 2023-11-18T22:28:44Z

Hello, I still believe there is an error in the training dataset provided on the website. I tried the additional patch you included, but the training dataset still contains no samples of the "Dendritic_cells" cell type listed in cell_types.yaml.
Commenting out Dendritic_cells in cell_types.yaml worked. Please address this issue in your dataset annotation.

model_column = 'Tumor_model_annot'
samples = data_annot.loc[data_annot['Tumor_model_annot'] == 'cancer_cells'].index
cancer_expr = data_expr[samples]
cancer_annot = data_annot.loc[samples].copy()
cancer_annot['Tumor_type'] = cancer_annot['Dataset']
cancer_annot = cancer_annot[['Tumor_type', 'Dataset']]

samples = data_annot.loc[~data_annot[model_column].isna()].index
cells_expr = data_expr[samples]

cells_annot = data_annot.loc[samples]
cells_annot = cells_annot[[model_column, 'Dataset']]
cells_annot.columns = ['Cell_type', 'Dataset']
cells_annot = pd.concat([lab_annot, cells_annot])
cells_annot.loc[cells_annot['Dataset'].isna(), 'Dataset'] = cells_annot.loc[cells_annot['Dataset'].isna()].index
cells_expr = pd.concat([lab_expr, cells_expr], axis=1)

# to make sure that there is no repeated samples
samples = sorted(list(set(cells_annot.index).intersection(set(cells_expr.columns))))
cells_expr = cells_expr[samples]
cells_annot = cells_annot.loc[samples]

print(cells_expr.shape, cells_annot.shape)
print(cancer_expr.shape, cancer_annot.shape)

##############################

# Load cell types model

cell_types = CellTypes.load('configs/cell_types.yaml')
missing_cts = [x for x in cell_types.get_all_subtypes('General_cells') if not x in cells_annot['Cell_type'].unique()]
missing_cts

for ct in missing_cts:
    subtypes = cell_types.get_direct_subtypes(ct)
    # Copy so the assignments below do not write into slices of cells_annot / cells_expr
    annot = cells_annot.loc[cells_annot['Cell_type'].isin(subtypes)].copy()
    expr = cells_expr[annot.index].copy()
    annot['Cell_type'] = ct
    annot.index = annot.index + f'_{ct}'
    annot['Dataset'] = annot.index
    expr.columns = expr.columns + f'_{ct}'
    cells_expr = pd.concat([cells_expr, expr], axis=1)
    cells_annot = pd.concat([cells_annot, annot])

# to make sure that there is no repeated samples
samples = sorted(list(set(cells_annot.index).intersection(set(cells_expr.columns))))
cells_expr = cells_expr[samples]
cells_annot = cells_annot.loc[samples]
print(cells_expr.shape, cells_annot.shape)
