Remove filtering of duplicate gene symbols when rownames of counts matrix are Ensembl IDs #111

allyhawkins · 2024-05-16T19:29:25Z

I ran into an error when running SCEVAN on an object whose row names were Ensembl IDs rather than gene symbols. In the annotateGenes() function, I got the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 7104, 7107

I narrowed this down to the line below, which removes every gene with a duplicated gene symbol from the reference edb. The same filtering is not applied to the mtx variable, so edb and mtx end up with different numbers of rows.

SCEVAN/R/preProcessing.R, line 32 in 228beea:

edb <- edb[!duplicated(edb$gene_name),]

This step is probably necessary with gene symbols, because duplicated symbols in edb would otherwise make its dimensions differ from those of mtx. Ensembl IDs, however, are never duplicated, so the step isn't needed for them. More generally, duplicates should only be removed from the column selected by use_geneID, and if that column is gene_id, I would skip this step altogether.
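To make the mismatch concrete, here is a minimal base-R sketch on toy data. The data, the helper name filter_reference(), and its key_col argument are hypothetical. key_col stands in for whatever column use_geneID selects. This is not SCEVAN's actual code. It only illustrates why deduplicating on gene_name drops rows when matching is done on gene_id.

```r
# Hypothetical toy reference: two distinct Ensembl IDs share the symbol "GENE_B"
edb <- data.frame(
  gene_id   = c("ENSG0001", "ENSG0002", "ENSG0003"),
  gene_name = c("GENE_A", "GENE_B", "GENE_B"),
  stringsAsFactors = FALSE
)

# Hypothetical counts matrix with Ensembl IDs as row names
mtx <- matrix(1:6, nrow = 3,
              dimnames = list(c("ENSG0001", "ENSG0002", "ENSG0003"),
                              c("cell1", "cell2")))

# Deduplicating on gene_name regardless of the key drops ENSG0003,
# leaving 2 reference rows for 3 matrix rows
old <- edb[!duplicated(edb$gene_name), ]

# Hypothetical helper: deduplicate only on the column actually used for matching
filter_reference <- function(edb, key_col) {
  edb[!duplicated(edb[[key_col]]), , drop = FALSE]
}

# Matching on gene_id keeps all three rows, aligned to rownames(mtx)
new <- filter_reference(edb, key_col = "gene_id")
ref <- new[match(rownames(mtx), new$gene_id), ]
```

With this toy data, old has 2 rows while mtx has 3. The gene_id-keyed reference has the same rows as mtx, in the same order.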


AntonioDeFalco · 2024-06-07T08:48:37Z

Thanks @allyhawkins,
I fixed it in the last commit f1394b3.

Regards

allyhawkins changed the title from "Remove filtering by duplicate gene symbols when rownames of counts matrix are Ensembl IDs" to "Remove filtering of duplicate gene symbols when rownames of counts matrix are Ensembl IDs" on May 16, 2024

allyhawkins mentioned this issue May 17, 2024

Use copy number inference to estimate tumor/normal cells in a Ewing sarcoma sample AlexsLemonade/OpenScPCA-analysis#403

Closed

AntonioDeFalco closed this as completed Jun 7, 2024
