key error #255

Closed
anke-king opened this issue Mar 14, 2024 · 5 comments

@anke-king

Setup

I am reporting a problem with GSEApy version, Python version, and operating
system as follows:

3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:36:46) 
[Clang 16.0.6 ]
CPython
macOS-12.4-x86_64-i386-64bit
1.1.1

(Please copy and run the above in your Python, and copy-and-paste the output)

Expected behaviour

gp.prerank runs without raising an error.

Actual behaviour

KeyError: 'gene_name'

Steps to reproduce

pre_res = gp.prerank(rnk = ranking, gene_sets = 'GO_Biological_Process_2021', seed = 6, permutation_num = 100, min_size = 5 )
ranking is a DataFrame with 2 columns.

@zqfang
Owner

zqfang commented Mar 14, 2024

Can you print the first 5 rows of your ranking?

Make sure the first column is the gene symbol and the second column is the ranking value.
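
The layout described above can be checked with pandas before calling gp.prerank. The following is only a sketch: the helper name check_ranking and the sample values are hypothetical and not part of GSEApy.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_ranking(rnk: pd.DataFrame) -> list:
    """Return a list of problems found in a two-column ranking table."""
    problems = []
    if rnk.shape[1] != 2:
        problems.append(f"expected 2 columns, got {rnk.shape[1]}")
        return problems
    ids, scores = rnk.iloc[:, 0], rnk.iloc[:, 1]
    if is_numeric_dtype(ids):
        problems.append("first column looks numeric; it should hold gene symbols")
    if not is_numeric_dtype(scores):
        problems.append("second column is not numeric")
    if ids.isna().any() or scores.isna().any():
        problems.append("missing values present")
    if ids.duplicated().any():
        problems.append("duplicated gene symbols present")
    return problems


# Hypothetical example data, for illustration only.
example = pd.DataFrame({"gene_name": ["A", "B", "B"], "Rank": [2.0, 1.0, -1.0]})
print(check_ranking(example))
```

An empty list means none of these checks flagged the table; it does not guarantee that prerank will accept it.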

@anke-king
Author

This is how I calculate the rank:
df['Rank']=np.log10(df.padj)*df.log2FoldChange
df = df.sort_values('Rank', ascending = False).reset_index(drop = True)
ranking = df[['gene_name', 'Rank']]

and ranking.head(5) yields:
   gene_name        Rank
0  mCherry_F  715.775855
1    ORF_032  297.716912
2    ORF_061  286.317331
3     IFI44L  244.128925
4    ORF_029  223.571278

@zqfang
Owner

zqfang commented Mar 15, 2024

Can you print out the full error message?

I can't tell where the error comes from.

@anke-king
Author

Here is the full error message:

KeyError Traceback (most recent call last)
Cell In[21], line 1
----> 1 pre_res = gp.prerank(rnk = ranking, gene_sets = 'GO_Biological_Process_2021', seed = 6, permutation_num = 100, min_size = 5 )

File ~/miniconda3/lib/python3.9/site-packages/gseapy/__init__.py:396, in prerank(rnk, gene_sets, outdir, pheno_pos, pheno_neg, min_size, max_size, permutation_num, weight, ascending, threads, figsize, format, graph_num, no_plot, seed, verbose, *arg, **kwargs)
375 weight = kwargs["weighted_score_type"]
377 pre = Prerank(
378 rnk,
379 gene_sets,
(...)
394 verbose,
395 )
--> 396 pre.run()
397 return pre

File ~/miniconda3/lib/python3.9/site-packages/gseapy/gsea.py:444, in Prerank.run(self)
441 assert self.min_size <= self.max_size
443 # parsing rankings
--> 444 dat2 = self.load_ranking()
445 assert len(dat2) > 1
446 self.ranking = dat2

File ~/miniconda3/lib/python3.9/site-packages/gseapy/gsea.py:418, in Prerank.load_ranking(self)
415 rank_metric = self._load_data(self.rnk) # gene id is the first column
416 if rank_metric.select_dtypes(np.number).shape[1] == 1:
417 # return series
--> 418 return self._load_ranking(rank_metric)
419 ## In case the input type multi-column ranking dataframe
420 # drop na gene id values
421 rank_metric = rank_metric.dropna(subset=rank_metric.columns[0])

File ~/miniconda3/lib/python3.9/site-packages/gseapy/gsea.py:385, in Prerank._load_ranking(self, rank_metric)
383 rank_metric.dropna(how="any", inplace=True)
384 # rename duplicate id, make them unique
--> 385 rank_metric = self.make_unique(rank_metric, col_idx=0)
386 # reset ranking index, because you have sort values and drop duplicates.
387 rank_metric.reset_index(drop=True, inplace=True)

File ~/miniconda3/lib/python3.9/site-packages/gseapy/base.py:246, in GSEAbase.make_unique(self, rank_metric, col_idx)
243 self.logger.info("Input gene rankings contains duplicated IDs")
244 mask = rank_metric.duplicated(subset=id_col, keep=False)
245 dups = (
--> 246 rank_metric.loc[mask, id_col]
247 .groupby(id_col)
248 .cumcount()
249 .map(lambda c: "_" + str(c) if c else "")
250 )
251 rank_metric.loc[mask, id_col] = rank_metric.loc[mask, id_col] + dups
252 return rank_metric

File ~/miniconda3/lib/python3.9/site-packages/pandas/core/series.py:2238, in Series.groupby(self, by, axis, level, as_index, sort, group_keys, observed, dropna)
2235 raise TypeError("as_index=False only valid with DataFrame")
2236 axis = self._get_axis_number(axis)
-> 2238 return SeriesGroupBy(
2239 obj=self,
2240 keys=by,
2241 axis=axis,
2242 level=level,
2243 as_index=as_index,
2244 sort=sort,
2245 group_keys=group_keys,
2246 observed=observed,
2247 dropna=dropna,
2248 )

File ~/miniconda3/lib/python3.9/site-packages/pandas/core/groupby/groupby.py:1329, in GroupBy.__init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, observed, dropna)
1326 self.dropna = dropna
1328 if grouper is None:
-> 1329 grouper, exclusions, obj = get_grouper(
1330 obj,
1331 keys,
1332 axis=axis,
1333 level=level,
1334 sort=sort,
1335 observed=False if observed is lib.no_default else observed,
1336 dropna=self.dropna,
1337 )
1339 if observed is lib.no_default:
1340 if any(ping._passed_categorical for ping in grouper.groupings):

File ~/miniconda3/lib/python3.9/site-packages/pandas/core/groupby/grouper.py:1043, in get_grouper(obj, key, axis, level, sort, observed, validate, dropna)
1041 in_axis, level, gpr = False, gpr, None
1042 else:
-> 1043 raise KeyError(gpr)
1044 elif isinstance(gpr, Grouper) and gpr.key is not None:
1045 # Add key to exclusions
1046 exclusions.add(gpr.key)

KeyError: 'gene_name'
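
The last frames show groupby being called on a single selected column (a Series) with the column label as the key. The sketch below reproduces that KeyError with pandas alone, assuming the ranking contains at least one duplicated ID, since the part of make_unique shown in the traceback builds a mask of duplicated rows. The data are made up, and the to_frame() variant is just one possible way to avoid the error, not necessarily the change made in GSEApy.

```python
import pandas as pd

# Hypothetical ranking with a duplicated gene ID.
rank_metric = pd.DataFrame(
    {"gene_name": ["A", "B", "B", "C"], "Rank": [3.0, 2.0, 1.0, -1.0]}
)
id_col = rank_metric.columns[0]
mask = rank_metric.duplicated(subset=id_col, keep=False)

# Selecting one column yields a Series; it has no column called "gene_name"
# to group on, so grouping by that label fails.
try:
    rank_metric.loc[mask, id_col].groupby(id_col).cumcount()
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Converting back to a one-column DataFrame makes the label resolvable.
counts = rank_metric.loc[mask, id_col].to_frame().groupby(id_col).cumcount()
# The "_<n>" suffix scheme is an assumption chosen for this illustration.
suffix = counts.map(lambda c: "_" + str(c) if c else "")
rank_metric.loc[mask, id_col] = rank_metric.loc[mask, id_col] + suffix
print(rank_metric[id_col].tolist())
```

If this reading is right, a ranking without duplicated gene names would not reach the failing line, which may explain why the error only shows up for some inputs.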

@zqfang
Owner

zqfang commented Mar 19, 2024

I think this is the same issue as #251. Can you install v1.1.2 and try again? The error should be gone in the fixed version (v1.1.2).
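
If upgrading is not immediately possible, removing duplicated IDs before the call should keep the ranking away from the duplicate-handling code shown in the traceback. A minimal sketch, assuming it is acceptable to keep the entry with the largest absolute score per gene (that policy, the helper name drop_duplicate_ids, and the data are assumptions, not something taken from GSEApy):

```python
import pandas as pd


def drop_duplicate_ids(rnk: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per gene ID: the one with the largest absolute score."""
    id_col, score_col = rnk.columns[0], rnk.columns[1]
    # Row labels ordered by absolute score, largest first.
    order = rnk[score_col].abs().sort_values(ascending=False, kind="stable").index
    deduped = rnk.loc[order].drop_duplicates(subset=id_col, keep="first")
    # Restore a descending ranking and a clean index.
    return deduped.sort_values(score_col, ascending=False).reset_index(drop=True)


# Hypothetical example data, for illustration only.
ranking = pd.DataFrame(
    {"gene_name": ["A", "B", "B", "C"], "Rank": [3.0, 0.5, -2.0, 1.0]}
)
clean = drop_duplicate_ids(ranking)
print(clean)
```

The resulting clean table can then be passed as rnk to gp.prerank in place of the original ranking.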

zqfang closed this as completed on Dec 4, 2024