Funcotator - Need Performance Upgrades #4586

Closed
jonn-smith opened this issue Mar 26, 2018 · 4 comments

@jonn-smith
Collaborator

Funcotator can be slow.

We need to profile it and make it consistently fast.

@droazen
Contributor

droazen commented May 9, 2018

A large speedup has already been achieved in #4740. According to @lbergelson:

Original (46a8661): 31.1 minutes, 1658 variants / minute
Changes (#4740): 1.26 minutes, 40933 variants / minute

~24.6x speedup
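As a quick consistency check on the figures above (this only recomputes the quoted numbers and is not a new measurement), the wall-time ratio and the throughput ratio agree at roughly 24.7x, in line with the quoted ~24.6x. Both runs also imply about the same input size, roughly 51,600 variants.

```python
# Benchmark figures quoted in the comment above (not re-measured here).
orig_minutes, new_minutes = 31.1, 1.26
orig_rate, new_rate = 1658, 40933  # variants per minute

# Speedup computed two independent ways.
speedup_from_time = orig_minutes / new_minutes
speedup_from_rate = new_rate / orig_rate

# Implied size of the test input: minutes * (variants / minute).
variants_orig = orig_minutes * orig_rate
variants_new = new_minutes * new_rate

print(f"speedup (wall time):  {speedup_from_time:.1f}x")
print(f"speedup (throughput): {speedup_from_rate:.1f}x")
print(f"implied variants: ~{variants_orig:,.0f} vs ~{variants_new:,.0f}")
```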

@jonn-smith
Collaborator Author

We should investigate tuning the caching parameter to gain more speed.

We still need to profile, make further speed improvements, and compare performance against Oncotator.
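To illustrate why a caching parameter is worth tuning, here is a minimal sketch of a hypothetical lookahead cache. `LookaheadCache`, `lookahead_bp`, and the synthetic features are invented for illustration; this is not Funcotator's actual caching code. On a miss it prefetches every feature in a window starting at the queried position, so for position-sorted input a larger window means fewer trips to the data source, at the cost of holding more features in memory.

```python
from dataclasses import dataclass, field


@dataclass
class LookaheadCache:
    """Hypothetical single-window feature cache; lookahead_bp is the tunable knob."""

    features: list      # (contig, start, end, name) tuples standing in for a data source
    lookahead_bp: int   # how far past a missed position to prefetch
    hits: int = 0
    misses: int = 0
    _contig: str = ""
    _win_start: int = 0
    _win_end: int = 0
    _window: list = field(default_factory=list)

    def query(self, contig, pos):
        """Return names of features containing pos, refilling the window on a miss."""
        if contig == self._contig and self._win_start <= pos < self._win_end:
            self.hits += 1
        else:
            # Miss: prefetch every feature overlapping [pos, pos + lookahead_bp).
            self.misses += 1
            self._contig = contig
            self._win_start, self._win_end = pos, pos + self.lookahead_bp
            self._window = [f for f in self.features
                            if f[0] == contig and f[1] < self._win_end and f[2] > pos]
        return [f[3] for f in self._window if f[1] <= pos < f[2]]


# Sorted queries every 250 bp over 1 Mb of chr1, with 500 bp features every 1 kb.
features = [("chr1", s, s + 500, f"feature{i}")
            for i, s in enumerate(range(0, 1_000_000, 1000))]
results = {}
for lookahead in (1_000, 100_000):
    cache = LookaheadCache(features, lookahead)
    for p in range(0, 1_000_000, 250):
        cache.query("chr1", p)
    results[lookahead] = (cache.hits, cache.misses)
    print(f"lookahead_bp={lookahead:>7}: hits={cache.hits}, misses={cache.misses}")
```

With sorted input, each miss costs one scan of the data source, so growing the window cuts misses roughly in proportion; unsorted input would see much less benefit from this particular design.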

@jonn-smith
Collaborator Author

We also need to profile on multiple machines and verify that Funcotator will run at the same speed on all of them.

@jonn-smith
Collaborator Author

It turns out the caching fix is only a partial fix, because b37 and hg19 use different contig names (e.g. `1` vs. `chr1`) and the data sources are inconsistent, mixing the two conventions.
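The failure mode can be shown with a toy lookup keyed on contig name. The dictionaries and gene names below are made up for illustration. A source that is uniformly hg19 misses every b37-named query, and a source that mixes the two conventions misses some queries whichever convention the input uses, so no single naming choice on the input side avoids the misses.

```python
# Hypothetical data sources: feature lists keyed by contig name.
hg19_source = {"chr1": ["GENE_A"], "chr2": ["GENE_B"], "chrX": ["GENE_C"]}
# A source that mixes conventions, as the data sources did before the fix.
mixed_source = {"1": ["GENE_A"], "chr2": ["GENE_B"], "X": ["GENE_C"]}

b37_queries = ["1", "2", "X"]
hg19_queries = ["chr1", "chr2", "chrX"]


def count_misses(source, queries):
    """Count queries whose contig name is not a key in the source."""
    return sum(1 for contig in queries if contig not in source)


for name, source in (("hg19", hg19_source), ("mixed", mixed_source)):
    print(name,
          "b37 misses:", count_misses(source, b37_queries),
          "hg19 misses:", count_misses(source, hg19_queries))
```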

jonn-smith added a commit that referenced this issue Jun 15, 2018
Fixes #4586

Intermediate checkin.

Need to fix data sources to go along with this change.  They must be
uniformly set to HG19.

Must update the minimum data source version to the new release.
jonn-smith added a commit that referenced this issue Jun 19, 2018
Fixes #4586

Intermediate checkin.

Need to fix data sources to go along with this change.  They must be
uniformly set to HG19.

Must update the minimum data source version to the new release.

Now HG19 annotations go fast.

Made code in GATK assume data sources are HG19.
Requires a new set of data sources (1.4.20180615).
Simplified `Funcotator::enqueueAndHandleVariant`.

Not clear that the `--allow-hg19-gencode-b37-contig-matching-override`
flag does anything anymore.

Now testCanAnnotateMixedContigHg19Clinvar will pass.
jonn-smith added a commit that referenced this issue Jun 21, 2018
Fixes #4586

Released new version of datasources to go with this release (1.4.20180615).
This was necessary because the data sources needed to be made
consistent with hg19 (before they were a mix of hg19 and b37
contig names).

Now Funcotator assumes all data sources for the hg19 reference are
compliant with hg19 contig names.

Updated the minimum data source version to the new release (1.4.20180615).

Requires a new set of data sources ().

Simplified `Funcotator::enqueueAndHandleVariant`.

Not clear that the `--allow-hg19-gencode-b37-contig-matching-override`
flag does anything anymore.

Updated the `getDbSNP.sh` and `createSqliteCosmicDb.sh` data source
scripts to preprocess those data sources to have hg19-compliant
contig names.
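The preprocessing step can be sketched as a line-by-line rename of VCF records. This is a hypothetical Python illustration, not the contents of `getDbSNP.sh` or `createSqliteCosmicDb.sh`. It only maps the primary b37 contigs (`1`-`22`, `X`, `Y`) to their `chr`-prefixed hg19 names and passes everything else through: b37 `MT` and hg19 `chrM` are not the same sequence, and unplaced contigs are named differently in the two builds, so those would need separate handling.

```python
import re

# Primary-assembly b37 contig names (the only ones this sketch renames).
PRIMARY_B37 = {str(n) for n in range(1, 23)} | {"X", "Y"}

# Matches the ID field of a VCF "##contig=<ID=...,...>" header line.
CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def b37_to_hg19(contig):
    """Prefix primary b37 contig names with 'chr'; leave anything else unchanged."""
    return "chr" + contig if contig in PRIMARY_B37 else contig


def rename_vcf_line(line):
    """Rewrite the contig name in a ##contig header or a data record's CHROM column."""
    if line.startswith("##contig="):
        return CONTIG_HEADER.sub(lambda m: m.group(1) + b37_to_hg19(m.group(2)), line)
    if line.startswith("#"):
        return line  # other header lines pass through untouched
    chrom, sep, rest = line.partition("\t")
    return b37_to_hg19(chrom) + sep + rest


print(rename_vcf_line("##contig=<ID=1,length=249250621>"))
print(rename_vcf_line("1\t12345\trs1\tA\tG\t.\t.\t."))
```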
jonn-smith added a commit that referenced this issue Jun 21, 2018
jonn-smith added a commit that referenced this issue Jun 25, 2018
jonn-smith added a commit that referenced this issue Jun 27, 2018
jonn-smith added a commit that referenced this issue Jun 28, 2018
droazen pushed a commit that referenced this issue Jun 29, 2018

Now Funcotator assumes all data sources for the HG19 reference are compliant with HG19 contig names, and translates B37 contig names to their HG19 equivalents as needed. This fixes a major performance issue with HG19/B37 inputs where we were systematically getting cache misses when querying the datasources with the wrong contig names.

Released new version of datasources to go with this release (1.4.20180615). This was necessary because the data sources needed to be made consistent with HG19 (before they were a mix of HG19 and B37 contig names).

Fixes #4586
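The query-side translation can be sketched as normalizing the contig name before every lookup, so that b37-named input hits an hg19-named data source instead of missing. The `b37_to_hg19` helper and the `NormalizingLookup` class are hypothetical illustrations, not Funcotator's API, and they only cover the primary assembly.

```python
from collections import Counter

# Primary-assembly b37 contig names (the only ones this sketch translates).
PRIMARY_B37 = {str(n) for n in range(1, 23)} | {"X", "Y"}


def b37_to_hg19(contig):
    """Translate a primary b37 contig name to hg19; pass other names through."""
    return "chr" + contig if contig in PRIMARY_B37 else contig


class NormalizingLookup:
    """Hypothetical wrapper around an hg19-named source that translates queries first."""

    def __init__(self, hg19_source):
        self.source = hg19_source
        self.stats = Counter()

    def query(self, contig):
        found = self.source.get(b37_to_hg19(contig))
        self.stats["hit" if found is not None else "miss"] += 1
        return found or []


lookup = NormalizingLookup({"chr1": ["GENE_A"], "chrX": ["GENE_C"]})
for contig in ["1", "chr1", "X", "MT"]:
    print(contig, "->", lookup.query(contig))
print(dict(lookup.stats))
```

Names that are already hg19 (`chr1`) pass through unchanged, so the same lookup serves both conventions; `MT` still misses here because the sketch deliberately does not translate it.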