TASK-6722 - Variant Walker to enable user defined variant analysis #2522
Merged
68 commits
4b8dad2 storage: Add variant-walker tool #TASK-6722 (j-coll)
9ea00eb storage: Add STDERR to exception thrown. Fix max_bytes_per_map. #TASK… (j-coll)
7558a26 storage: Add status details when throwing exceptions. #TASK-6722 (j-coll)
bc7c6ae storage: Fix walker output file name #TASK-6722 (j-coll)
ab4dff5 storage: Properly configure task java heap #TASK-6722 (j-coll)
7af8020 storage: Run docker image prune on cleanup. #TASK-6722 (j-coll)
c5375ea storage: Ensure walker output is sorted. #TASK-6722 (j-coll)
663c03a storage: Extract walker STDERR file from MR execution. #TASK-6722 (j-coll)
154befa storage: Do not write multiple headers. #TASK-6722 (j-coll)
85aac6d storage: Fix NoSuchMethodError creating StopWatch. #TASK-6722 (j-coll)
697b08b storage: Ensure stderr file is moved from scratch dir. #TASK-6722 (j-coll)
356567e storage: Fix stderr sorting. #TASK-6722 (j-coll)
6253da3 storage: Write `\n` after the json header #TASK-6722 (j-coll)
5789628 storage: Do not interrupt header with empty records. #TASK-6722 (j-coll)
4ff0655 storage: Add a custom Partitioner to ensure sorted data with multiple… (j-coll)
8268266 storage: Fix partitioner. #TASK-6722 (j-coll)
4147d01 storage: Restart process when changing chromosome to ensure correct s… (j-coll)
7fd439a storage: Fix GenomeHelper generateBootPreSplits. #TASK-6722 (j-coll)
e6128b0 storage: Do not interrupt header with empty lines while concat. #TASK… (j-coll)
100fecf storage: Replace ImmutableBytesWritable with VariantLocusKey as map o… (j-coll)
0df69dc storage: Use VariantLocusKey and VariantLocusPartitioner in VariantEx… (j-coll)
f6fd3d4 storage: Fix VariantLocusKey serialization. #TASK-6722 (j-coll)
fa3c9f2 storage: Fix "Request body is too large" #TASK-6722 (j-coll)
b528c03 analysis: Do not try to close the same ERM twice. #TASK-6722 (j-coll)
96e5679 storage: Do not use flush on outputstream. HADOOP-16548 #TASK-6722 (j-coll)
bcd8185 storage: Add VariantExporterDirectMultipleOutputsMapper to ensure sor… (j-coll)
c4c3d3b storage: Do not use reduce step on variant-walker. #TASK-6722 (j-coll)
0100097 storage: Fix VariantRecordWriter bytes_written counter. #TASK-6722 (j-coll)
b52ca27 storage: Reduce number of intermediate mapper files. #TASK-6722 (j-coll)
ad3521e storage: Use SNAPPY as intermediate compression algorithm. #TASK-6722 (j-coll)
ab50d6e storage: Disable flush on AbfsOutputStream. HADOOP-16548 #TASK-6722 (j-coll)
212f8ce storage: Centralize variantMapperJob initialization. #TASK-6722 (j-coll)
2a39303 storage: Fix NoClassDefFoundError tephra. #TASK-7194 #TASK-6722 (j-coll)
ae26598 storage: Fix NPE exporting from sampleindex. #TASK-6722 (j-coll)
b000ec7 storage: Ensure variant-exports are sorted even from Phoenix. #TASK-6722 (j-coll)
0a741d5 storage: Use HDFS to store intermediate MapReduce files. Concat local… (j-coll)
cd50a3c storage: Improve MapReduceOutputFile concatMrOutputToLocal. #TASK-6722 (j-coll)
d430391 storage: Increase mapreduce.task.timeout to 30min #TASK-6722 (j-coll)
e35ee83 storage: Fix temporary mapreduce outdir. #TASK-6722 (j-coll)
0c48603 storage: Do not double copy hdfs files #TASK-6722 (j-coll)
ccf7438 storage: Use reducer to concat binary files #TASK-6722 (j-coll)
f87686e storage: Do not fail variant-walker if no output is produced. #TASK-6722 (j-coll)
a389e10 storage: Split PhoenixInputSplits into smaller splits. #TASK-6722 (j-coll)
f453090 storage: Improve log message. #TASK-6722 (j-coll)
47535c1 storage: Add HadoopVariantWalkerTest. #TASK-6722 (j-coll)
003e467 storage: Rename some variant-walker params. Add descriptions #TASK-6722 (j-coll)
48e1592 storage: Fix NPE running SampleVariantStats #TASK-6722 (j-coll)
1d86756 storage: Fix CustomPhoenixInputFormat generateSplit for first and las… (j-coll)
5141031 analysis: Fix NPE at relatedness tool. #TASK-6722 (j-coll)
c48ce0a Merge branch 'release-3.x.x' into TASK-6722 (j-coll)
f2bc782 cicd: Upload tests logs as artifacts. Reduce action log size. #TASK-6722 (j-coll)
dd684aa storage: Fix NPE at CohortVariantStatsDriver. #TASK-6722 (j-coll)
9795c6a cicd: Fix NPE. #TASK-6722 (j-coll)
923651c storage: Fix AIOOBE SampleVariantStatsDriver #TASK-6722 (j-coll)
90010ac storage: Do not produce a .crc checksum file copying from hdfs. #TASK… (j-coll)
9f326d9 storage: Improve docker process failure. Do not close the stdin twice… (j-coll)
627e56a storage: Fix AIOOBE SampleVariantStatsDriver #TASK-6722 (j-coll)
98ce6f8 storage: Do not produce a .crc checksum file copying from hdfs. #TASK… (j-coll)
14c07d9 analysis: Do not use the scratchDir as intermediate folder for export… (j-coll)
050c1ee storage: Improve collections usage in SampleVariantStatsDriver. #TASK… (j-coll)
a0c2a5f analysis: Fix VariantAnalysisTest. #TASK-6722 (j-coll)
3853c63 app: Regenerate cli. #TASK-6722 (j-coll)
eb61609 storage: Fix junit tests. #TASK-6722 (j-coll)
54acc28 cicd: Increase "Publish Test Report on GitHub" memory #TASK-6722 (j-coll)
4e96492 core: Fix NumberFormatException from IOUtils. #TASK-6722 (j-coll)
0851ea9 Merge branch 'release-3.x.x' into TASK-6722 (j-coll)
852ffca core: Remove unused method. #TASK-6722 (j-coll)
005855f storage: Do not add new abstract methods to VariantStorageEngine. #TA… (j-coll)
84 changes: 84 additions & 0 deletions
opencga-analysis/src/main/java/org/opencb/opencga/analysis/variant/VariantWalkerTool.java
@@ -0,0 +1,84 @@
/*
 * Copyright 2015-2020 OpenCB
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.opencb.opencga.analysis.variant;

import org.apache.solr.common.StringUtils;
import org.opencb.commons.datastore.core.Query;
import org.opencb.commons.datastore.core.QueryOptions;
import org.opencb.opencga.analysis.tools.OpenCgaTool;
import org.opencb.opencga.core.models.common.Enums;
import org.opencb.opencga.core.models.variant.VariantWalkerParams;
import org.opencb.opencga.core.tools.annotations.Tool;
import org.opencb.opencga.core.tools.annotations.ToolParams;
import org.opencb.opencga.storage.core.variant.io.VariantWriterFactory;

import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

@Tool(id = VariantWalkerTool.ID, description = VariantWalkerTool.DESCRIPTION,
        scope = Tool.Scope.PROJECT, resource = Enums.Resource.VARIANT)
public class VariantWalkerTool extends OpenCgaTool {
    public static final String ID = "variant-walk";
    public static final String DESCRIPTION = "Filter and walk variants from the variant storage to produce a file";

    @ToolParams
    protected VariantWalkerParams toolParams = new VariantWalkerParams();

    private VariantWriterFactory.VariantOutputFormat format;

    @Override
    protected void check() throws Exception {
        super.check();

        if (StringUtils.isEmpty(toolParams.getInputFormat())) {
            toolParams.setInputFormat(VariantWriterFactory.VariantOutputFormat.VCF.toString());
        }

        format = VariantWriterFactory.toOutputFormat(toolParams.getInputFormat(), toolParams.getOutputFileName());
        if (format.isBinary()) {
            throw new IllegalArgumentException("Binary format not supported for VariantWalkerTool");
        }
        if (!format.isPlain()) {
            format = format.inPlain();
        }

        if (StringUtils.isEmpty(toolParams.getOutputFileName())) {
            toolParams.setOutputFileName("output.txt.gz");
        } else if (!toolParams.getOutputFileName().endsWith(".gz")) {
            toolParams.setOutputFileName(toolParams.getOutputFileName() + ".gz");
        }
    }

    @Override
    protected List<String> getSteps() {
        return Arrays.asList(ID, "move-files");
    }

    @Override
    protected void run() throws Exception {
        step(ID, () -> {
            Path outDir = getOutDir();
            String outputFile = outDir.resolve(toolParams.getOutputFileName()).toString();
            Query query = toolParams.toQuery();
            QueryOptions queryOptions = new QueryOptions().append(QueryOptions.INCLUDE, toolParams.getInclude())
                    .append(QueryOptions.EXCLUDE, toolParams.getExclude());
            variantStorageManager.walkData(outputFile,
                    format, query, queryOptions, toolParams.getDockerImage(), toolParams.getCommandLine(), token);
        });
    }
}
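
For reviewers who want to check the output file name handling in `check()` on its own, the following is a minimal standalone sketch of that rule: an empty name falls back to `output.txt.gz`, and any other name gets a `.gz` suffix unless it already has one. The class `OutputFileNameSketch` and the method `normalize` are hypothetical names used only for illustration; they are not part of this PR, and the null/empty check stands in for `StringUtils.isEmpty`.

```java
public class OutputFileNameSketch {
    // Default taken from the diff above; used when no output name is given.
    static final String DEFAULT_NAME = "output.txt.gz";

    // Hypothetical helper mirroring the name handling in check():
    // null or empty -> default name, otherwise ensure a ".gz" suffix.
    static String normalize(String name) {
        if (name == null || name.isEmpty()) {
            return DEFAULT_NAME;
        }
        return name.endsWith(".gz") ? name : name + ".gz";
    }

    public static void main(String[] args) {
        System.out.println(normalize(null));              // output.txt.gz
        System.out.println(normalize("variants.vcf"));    // variants.vcf.gz
        System.out.println(normalize("variants.vcf.gz")); // variants.vcf.gz
    }
}
```

The sketch only covers the file name; the format resolution through `VariantWriterFactory` is left out because it depends on the OpenCGA classes.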
You should add `--no-transfer-progress` so that the download progress of the libraries is not written to the logs on every run.