agat_sp_merge_annotations.pl output incompatible with cellranger #457
Comments
It sounds like the code involved in your issue has been updated since v1.0.0.
The sanitization is performed by all scripts with |
We are still getting an error when trying to build a custom Cell Ranger index based on the AGAT-merged GTF, even when using v.1.4.0. I'll try to provide more details tomorrow. |
Could you share a sample of the data used? |
I suspect the problematic entries come from this GFF file which defines some pairs of genes with the exact same coordinates and structure but different IDs. |
Following this post in parallel https://www.biostars.org/p/9601455/#9601546 |
To avoid getting mRNA instead of transcript in the output, you should avoid converting in "relax" mode, which allows any feature type in column 3. |
There are probably two issues. One is that you have multi-value attributes like gene_id "name1" "name2";. Could you send me the FASTA file corresponding to the GFF file you provided? |
@Juke34: Is there anything I can do to troubleshoot this further or do I need to give up on using AGAT and find an alternative approach to generate my GTF? @LliliansCalvo: Would you mind sharing your post-processing script that enabled you to feed AGAT's output into Cell Ranger? |
@mschilli87 I can easily add a patch to prevent the gene_id attribute from having several values, but I do need the GFF sample that produced the
GTF output. |
I wanted the FASTA to run some tests directly with Cell Ranger. |
@mschilli87 Could you try the code from branch 457 and let me know if the error is gone? |
I am still seeing some multi-value tags (incl. edit: Could this be related to me having |
Yes, I remove double attributes only for P.S.: Merging annotations should merge two isoforms into one if and only if they are 100% identical. Multi-value attributes are not problematic in general, except for the ID/Parent attributes in GFF and gene_id, transcript_id, where they should be forbidden to keep the file consistent. Does it work with cellranger? |
No. It seems to be throwing off their parser. A line with double |
Also, would it be possible to preserve that information somewhere? In our case, one gene_id might be an ENSEMBL Gene ID while the other is just a random number. While I don't expect AGAT to pick the more useful one, it would be great to be able to check whether the random number ID masks a well-known gene if it comes up as interesting in a downstream analysis. Maybe an alternative IDs attribute, or at least mentioning the IDs in a parsable way in the warning messages. This would also make it easier to quantify the problem now and to trace some of the cases upstream to figure out why some genes have shadow copies in that annotation. |
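As a stopgap for quantifying the problem, the multi-value tags can be counted directly in the GTF. The following is only a minimal sketch, not part of AGAT: the function names are hypothetical, and it assumes attribute values are double-quoted and do not themselves contain double quotes.

```python
import re
from collections import Counter

# One GTF attribute: a key followed by one or more double-quoted values, then ';'.
ATTR_RE = re.compile(r'(\w+)((?:\s+"[^"]*")+)\s*;')
VALUE_RE = re.compile(r'"([^"]*)"')


def multi_value_attributes(gtf_line):
    """Return {key: [values]} for attributes on this line that carry more than one value."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:  # not a regular 9-column feature line
        return {}
    found = {}
    for key, raw in ATTR_RE.findall(fields[8]):
        values = VALUE_RE.findall(raw)
        if len(values) > 1:
            found[key] = values
    return found


def count_multi_value(lines):
    """Count, per attribute key, how many feature lines carry several values."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header and comment lines
            continue
        for key in multi_value_attributes(line):
            counts[key] += 1
    return counts
```

Calling `count_multi_value(open(path))` on the converted file (with `path` being whatever GTF you want to check) gives a per-attribute tally, and `multi_value_attributes` returns the competing IDs for any single line.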
In the current code it is saved in |
I still don't get how you end up with several values in gene_id and transcript_id. I should definitely fix that if AGAT is the one making this mistake. |
OK. Here is an update:
After merging them with our annotation into GFF3, those lines are reformatted by AGAT to contain
From there we use AGAT to convert the GFF3 to GTF where we get
Which seems to be throwing off CellRanger:
Note that the error points towards the
However, when substituting Regarding the issue with several ID values, I am still investigating and hope this turns out to be an error on my end somehow. |
Update 2: I definitely still get multiple gene IDs after merging. Input GFF:
Output GFF3:
|
Thank you, this helps a lot. |
I have been swamped. |
You can try branch 457. The fix will be available in the next release (I have not planned one anytime soon...) |
Features may still have several values in an attribute after merging, but we now avoid that for the gene_id, ID, and Parent attributes (which are sensitive because they are used to understand the relationships between features). If cellranger absolutely needs to avoid multi-value attributes, I may add an extra script to clean this up line by line. |
Dear @Juke34, We are still getting the same error from CellRanger. If possible, I would like to test with a version that has single-value attributes throughout, to make sure CellRanger processes such a file just fine. Then we could maybe try to come up with a reasonable interface to specify the attributes that must not contain more than one value, so I could test which ones I could preserve (if any) for the CellRanger analysis. Thank you again for your support. |
OK, then I will add an extra option to the GFF2GTF conversion to deduplicate multi-value attributes, i.e. gene_name "NameA" "NameB" "NameC" becomes gene_name "NameA"; gene_name1 "NameB"; gene_name1 "NameC" |
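For readers who need the same effect outside AGAT, the splitting described above can be roughly sketched as follows. This is not AGAT's implementation: the function name is hypothetical, the numbering of the extra keys (gene_name1, gene_name2, ...) is this sketch's own assumption, and attributes whose values are not double-quoted are dropped rather than preserved.

```python
import re

# One GTF attribute: a key followed by one or more double-quoted values, then ';'.
ATTR_RE = re.compile(r'(\w+)((?:\s+"[^"]*")+)\s*;')
VALUE_RE = re.compile(r'"([^"]*)"')


def split_multi_values(attributes):
    """Rewrite a GTF attribute column so that every key carries exactly one value.

    The first value keeps the original key; each further value gets the key
    plus a running number (an assumed naming scheme, not AGAT's).
    """
    parts = []
    for key, raw in ATTR_RE.findall(attributes):
        for i, value in enumerate(VALUE_RE.findall(raw)):
            name = key if i == 0 else f"{key}{i}"
            parts.append(f'{name} "{value}";')
    return " ".join(parts)
```

Applied to column 9 of each feature line, this keeps the first ID under the key Cell Ranger reads and moves the remaining values to separate keys, so the information is still available for later lookup.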
That sounds very useful. Thank you! |
Should be fine by now. Get the latest commit from the 457 branch. Do the install:
then activate the new param: then run your test: |
Actually, I will make a new release. I will wait for your feedback before including this fix in it. |
I'll try to get back to you by tomorrow. |
Sorry for the delay. While I cannot report 100% success, this seems to be due to a problem with the genome FASTA rather than the GTF. At least the error message above is gone now. So feel free to close this issue with the new release. Thank you again for your support in this. |
Thx for your feedback |
agat_sp_merge_annotations.pl output incompatible with cellranger
I have two gff files I want to merge.
One is an old annotation, and the other is a new annotation I have just made using braker3.
To merge them, I am using agat_1.0.0 in the Singularity container.
Here is the code with all the steps I have done:
As you can see agat generates 2 gene_id and 2 transcript_id for this transcript. I have fixed manually this one but this also happens for other genes.
Hope you can help !
Thanks !!