Transcript ID in reference and expression profile do not match #237
Comments
Hi @SimonHegele, I'm not sure that I understand the issue - It looks like I hope that helps - thank you for your interest in NanoSim!
Hi, The error message: I added two lines of code:
Hi @SimonHegele, Thanks for that extra information and context! After tracing back the error message from the Without knowing the background of why that was done in that way from those who initially implemented NanoSim and trans-NanoSim, I am hesitant to change the code. I think that your workaround makes sense (although I know it isn't ideal) - I can add more information to that error message, suggesting your fix, so that it is less cryptic in the future.
Hi @SimonHegele, Thanks for using NanoSim and bringing this issue to our attention. To better assist you, could you please share how you quantified the expression profile for the reference transcripts? To clarify, NanoSim provides built-in functionality to quantify expression profiles. If you use the same reference transcriptome file for both the Just as a reminder, NanoSim allows users to supply any expression profile of their own (calculated by other tools, or one with manually adjusted values), provided that the file adheres to a pre-defined format: a three-column, tab-delimited file with Transcript ID, Count, and TPM value, including a header. NanoSim doesn’t necessarily detect variations in the IDs across files automatically, so users need to ensure that the transcript IDs in the reference transcriptome and the quantification file match if they decide to provide their own file directly and not run NanoSim's For more clarity and for your reference: When running NanoSim in Feel free to reach out if you need further assistance. Thank you @lcoombe for looking into this. Cheers.
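For anyone supplying their own expression profile, the format and ID requirements described above can be checked before running a simulation. The following is only a minimal sketch, not part of NanoSim: the function names are hypothetical, the header column names and the count/TPM values in the usage lines are placeholders, and it assumes that the transcript ID in a FASTA header is the first whitespace-delimited token after `>`.

```python
import csv
import io


def read_profile_ids(handle):
    """Return transcript IDs from a three-column, tab-delimited profile with a header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    ids = []
    for row_number, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise ValueError(f"line {row_number}: expected 3 columns, got {len(row)}")
        ids.append(row[0])
    return ids


def read_fasta_ids(handle):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def ids_missing_from_reference(profile_ids, fasta_ids):
    """List profile IDs that have no exact match among the reference IDs."""
    known = set(fasta_ids)
    return [tid for tid in profile_ids if tid not in known]


# Usage with in-memory placeholder data (header names and values are made up).
profile = io.StringIO("transcript_id\tcount\ttpm\nNM_001001130.3\t10\t5.0\n")
reference = io.StringIO(
    ">NM_001001130.3 Mus musculus zinc finger protein 85 (Zfp85), mRNA\nACGT\n"
)
missing = ids_missing_from_reference(read_profile_ids(profile), read_fasta_ids(reference))
print(missing)  # prints []
```

An empty list means every ID in the profile has an exact counterpart in the reference; with real files, pass open file handles instead of the `io.StringIO` objects.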
Thanks for looking at this @SaberHQ! Please correct me if I'm wrong, @SimonHegele - I believe the issue is that the IDs did match between the quantification file and the reference transcriptome, but were not being recognized as matching by NanoSim? For example,
You’re absolutely right @lcoombe —this is the key question, and it’s why I asked @SimonHegele for clarification earlier. Understanding how the expression file was created is crucial. Specifically, whether it was generated using NanoSim’s quantify mode or another method makes a big difference. From our extensive benchmarking, we’ve consistently seen that IDs remain identical when the quantify mode is used. This has been reliable for years, so I’m hesitant to label this as a bug without further details. However, there’s always the possibility of a specific edge case we might have overlooked, which we should definitely investigate if needed. I also suspect the issue might be linked to the following lines of code in https://github.com/bcgsc/NanoSim/blob/master/src/get_primary_sam.py#L252-L261 |
@SaberHQ I used NanoSim for the quantification.
Thanks for the clarification @SimonHegele. Then it looks like there is a bug, and we will look into it for sure. I suspect that is either related to the code lines in PS: Would you be able to also share a couple lines from the alignment file? I wanna make sure it is not due to minimap2 sam format output etc... Thanks.
After quantification I tried simulating
IDs in expression profile:
NM_001001130.3
IDs in reference:
>NM_001001130.3 Mus musculus zinc finger protein 85 (Zfp85), mRNA
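The two IDs quoted above are identical once the FASTA header is cut at the first whitespace, so a plain string comparison should succeed. To narrow down where a mismatch could creep in, a small diagnostic like the one below may help. It is only a sketch: the helper names are hypothetical, and the version-stripping step (dropping a trailing `.3`) is an assumed normalization used for illustration, not something this thread confirms NanoSim does.

```python
def first_token(fasta_header):
    """Return the ID part of a FASTA header: text after '>' up to the first whitespace."""
    return fasta_header.lstrip(">").split()[0]


def strip_version(transcript_id):
    """Drop a trailing numeric version such as '.3'; leave other IDs unchanged."""
    base, dot, suffix = transcript_id.rpartition(".")
    return base if dot and suffix.isdigit() else transcript_id


def compare(profile_id, fasta_header):
    """Compare a profile ID with a reference header under a few normalizations."""
    ref_id = first_token(fasta_header)
    return {
        "exact": profile_id == ref_id,
        "profile_vs_unversioned_reference": profile_id == strip_version(ref_id),
        "both_unversioned": strip_version(profile_id) == strip_version(ref_id),
    }


result = compare(
    "NM_001001130.3",
    ">NM_001001130.3 Mus musculus zinc finger protein 85 (Zfp85), mRNA",
)
print(result)
```

If `exact` is true for the input files but the simulation still reports a mismatch, the IDs are probably being transformed on one side somewhere between reading and comparing, and the other two entries hint at whether a dropped version suffix could be the cause.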