Funcotator - Need Performance Upgrades #4586
Comments
A large speedup has already been achieved in #4740. According to @lbergelson:
We should investigate tuning the caching parameter to increase speed further. We still need to profile and compare against Oncotator.
We also need to profile on multiple machines and verify that
The caching fix turns out to be only a partial fix: b37 and hg19 use different contig names, and the data sources are inconsistent about which of the two they use.
Fixes #4586 Intermediate checkin. The data sources still need to be fixed to go along with this change: they must be uniformly set to HG19, and the minimum data source version must be updated to the new release. HG19 annotations are now fast. The code in GATK now assumes the data sources are HG19. Requires a new set of data sources (1.4.20180615). Simplified `Funcotator::enqueueAndHandleVariant`. It is not clear that the `--allow-hg19-gencode-b37-contig-matching-override` flag does anything anymore. `testCanAnnotateMixedContigHg19Clinvar` will now pass.
Fixes #4586 Released a new version of the data sources to go with this release (1.4.20180615). This was necessary because the data sources needed to be made consistent with hg19 (before, they were a mix of hg19 and b37 contig names). Funcotator now assumes all data sources for the hg19 reference are compliant with hg19 contig names. Updated the minimum data source version to the new release (1.4.20180615). Requires a new set of data sources (). Simplified `Funcotator::enqueueAndHandleVariant`. It is not clear that the `--allow-hg19-gencode-b37-contig-matching-override` flag does anything anymore. Updated the `getDbSNP.sh` and `createSqliteCosmicDb.sh` data source scripts to preprocess those data sources so that they have hg19-compliant contig names.
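To illustrate the kind of preprocessing described for the data source scripts, here is a minimal sketch of rewriting the contig column of VCF-style records to hg19 names. It is not the contents of `getDbSNP.sh` or `createSqliteCosmicDb.sh`; the class `VcfContigRewriteSketch` and its methods are hypothetical, and the name mapping (autosomes, X, Y, and MT only) is an assumption that ignores unplaced contigs and `##contig` header lines.

```java
/** Hypothetical sketch, not GATK code: rewrites b37-style contig names in VCF data lines. */
final class VcfContigRewriteSketch {
    private VcfContigRewriteSketch() {}

    /** Assumed name-level mapping: "1".."22", "X", "Y" gain a "chr" prefix; "MT" becomes "chrM". */
    static String toHg19(String contig) {
        if (contig.startsWith("chr")) {
            return contig; // already hg19-style
        }
        if (contig.equals("MT")) {
            return "chrM";
        }
        if (contig.matches("([1-9]|1[0-9]|2[0-2]|X|Y)")) {
            return "chr" + contig;
        }
        return contig; // unknown names pass through unchanged
    }

    /** Rewrites the first tab-separated column of a data line; header lines are left as-is. */
    static String rewriteLine(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return line;
        }
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return line; // no columns to rewrite
        }
        return toHg19(line.substring(0, tab)) + line.substring(tab);
    }
}
```

Applying `rewriteLine` to each line of a b37-named file would give every record an hg19-style contig, which is the consistency the new data source release is meant to provide.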
Funcotator now assumes all data sources for the HG19 reference are compliant with HG19 contig names, and translates B37 contig names to their HG19 equivalents as needed. This fixes a major performance issue with HG19/B37 inputs, where we were systematically getting cache misses when querying the data sources with the wrong contig names. Released a new version of the data sources to go with this release (1.4.20180615). This was necessary because the data sources needed to be made consistent with HG19 (before, they were a mix of HG19 and B37 contig names). Fixes #4586
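The cache-miss problem and its fix can be sketched as follows. This is only an illustration under stated assumptions, not Funcotator's implementation: the class `ContigNameSketch`, its methods, and the string cache key format are hypothetical, and the mapping covers only the primary contigs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not GATK code: normalizes b37-style contig names to hg19-style names. */
final class ContigNameSketch {
    // Assumed name-level mapping for the primary contigs only.
    private static final Map<String, String> B37_TO_HG19 = new HashMap<>();
    static {
        for (int i = 1; i <= 22; i++) {
            B37_TO_HG19.put(Integer.toString(i), "chr" + i);
        }
        B37_TO_HG19.put("X", "chrX");
        B37_TO_HG19.put("Y", "chrY");
        B37_TO_HG19.put("MT", "chrM");
    }

    private ContigNameSketch() {}

    /** Returns the hg19-style name for a b37-style contig; other names pass through unchanged. */
    static String toHg19(String contig) {
        return B37_TO_HG19.getOrDefault(contig, contig);
    }

    /** Builds a cache key from the normalized contig, so "1" and "chr1" map to the same entry. */
    static String cacheKey(String contig, int start, int end) {
        return toHg19(contig) + ":" + start + "-" + end;
    }
}
```

Without the normalization step, a query for `1:100-200` against a cache populated under `chr1:100-200` would never match, which is the systematic miss described above.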
Funcotator can be slow.
Need to profile it and make it fast all the time.