-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Funcotator Update for Datasource Release V1.8 #8512
Conversation
@jonn-smith I did not fully tackle the tests for the uplaoder script and I suspect they are pretty broken now... I have spot-checked that they work with all 4 bundles... |
Github actions tests reported job failures from actions build 6053420535
|
Github actions tests reported job failures from actions build 6053391476
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I have a few questions / comments. Once tests are passing I'll review them
@@ -18,6 +18,7 @@ set -e | |||
|
|||
COSMIC_FILE=CosmicCompleteTargetedScreensMutantExport.tsv | |||
OUT_DB_FILE="Cosmic.db" | |||
OUT_TMP_FOLDER="~/tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you decide to remove the temp dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other way around... i added a tmp folder and i needed to do so because i was running into space issues on-prem
@@ -2,5 +2,5 @@ | |||
|
|||
# Downloads the HGNC data source from the HGNC website. | |||
|
|||
curl 'https://www.genenames.org/cgi-bin/download?col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_locus_type&col=gd_locus_group&col=gd_prev_sym&col=gd_prev_name&col=gd_aliases&col=gd_name_aliases&col=gd_pub_chrom_map&col=gd_date_mod&col=gd_date_sym_change&col=gd_date_name_change&col=gd_pub_acc_ids&col=gd_enz_ids&col=gd_pub_eg_id&col=gd_pub_ensembl_id&col=gd_pubmed_ids&col=gd_pub_refseq_ids&col=family.id&col=family.name&col=gd_ccds_ids&col=gd_vega_ids&col=md_eg_id&col=md_mim_id&col=md_refseq_id&col=md_prot_id&col=md_ensembl_id&col=md_ucsc_id&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&hgnc_dbtag=on&submit=submit' > hgnc_download_$(date +%b%d%Y).tsv | |||
curl 'https://www.genenames.org/cgi-bin/download/custom?col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_locus_type&col=gd_locus_group&col=gd_prev_sym&col=gd_prev_name&col=gd_aliases&col=gd_name_aliases&col=gd_pub_chrom_map&col=gd_date_mod&col=gd_date_sym_change&col=gd_date_name_change&col=gd_pub_acc_ids&col=gd_enz_ids&col=gd_pub_eg_id&col=gd_pub_ensembl_id&col=gd_pubmed_ids&col=gd_pub_refseq_ids&col=family.id&col=family.name&col=gd_ccds_ids&col=gd_vega_ids&col=md_eg_id&col=md_mim_id&col=md_refseq_id&col=md_prot_id&col=md_ensembl_id&col=md_ucsc_id&status=Approved&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&hgnc_dbtag=on&submit=submi' > hgnc_download_$(date +%b%d%Y).tsv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last characters before the >
look to be truncated.
Is it really submit=submi
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i just checked and that link works... ask the HGNC devs? Evidently both submi and submit work there...
@@ -2094,7 +2096,7 @@ private static List<Event> simpleSplitIntoBiallelics(final VariantContext vc) { | |||
final List<Event> result = new ArrayList<>(); | |||
|
|||
if (vc.isBiallelic()) { | |||
return Collections.singletonList(Event.ofWithoutAttributes(vc)); | |||
return Collections.singletonList(Event.ofWithoutAttributes(GATKVariantContextUtils.trimAlleles(vc, true, true))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you need this for testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this is the fix from #8445. Let me rebase and this should disappear from the diff..
@@ -53,7 +53,9 @@ private DataSourceUtils() {} | |||
private static final String MANIFEST_SOURCE_LINE_START = "Source:"; | |||
private static final String MANIFEST_ALT_SOURCE_LINE_START = "Alternate Source:"; | |||
@VisibleForTesting | |||
static final Pattern VERSION_PATTERN = Pattern.compile(MANIFEST_VERSION_LINE_START + "\\s+(\\d+)\\.(\\d+)\\.(\\d\\d\\d\\d)(\\d\\d)(\\d\\d)(.*)"); | |||
static final Pattern OLD_VERSION_PATTERN = Pattern.compile(MANIFEST_VERSION_LINE_START + "\\s+(\\d+)\\.(\\d+)\\.(\\d\\d\\d\\d)(\\d\\d)(\\d\\d)(.*)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its in the tests... i'm somewhat remiss to get rid of the old syntax entirely since it does correspond to real releases that still exist and pretending every old version had the ref version included seems incorrect...
b502006
to
62abde0
Compare
@jonn-smith back to you. I fixed the broken test... Should be good for you to do your final datasource testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accompanying this branch is and will be an official new release of the Funcotator Datasource bundles:
gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.8.hg19.20230908g.tar.gz
gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.8.hg19.20230908s.tar.gz
gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.8.hg38.20230908g.tar.gz
gs://broad-public-datasets/funcotator/funcotator_dataSources.v1.8.hg38.20230908s.tar.gz
Note that the format of the datasources bundles has changed somewhat, importantly they are now split into separate hg19 and hg38 bundles to cut down on size. In this branch are:
Fixes #8296