Map IntervalList format column four to feature name #1159

Closed
wants to merge 2 commits into from

Conversation

heuermh
Member

@heuermh heuermh commented Sep 9, 2016

Fixes #1152, #1168

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1477/

Build result: FAILURE

GitHub pull request #1159 of commit e1b6772 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1159/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 6d7e26f # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1159/merge^{commit} # timeout=10
Checking out Revision 6d7e26f (origin/pr/1159/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 6d7e26f9c46f4086de5bd3974336b0ae56d35520
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

if (fields.length < 8 || fields.length > 9) {
log.warn("Empty or invalid GTF/GFF2 line: {}", line)
return Seq()
if (stringency == ValidationStringency.STRICT) {
Member

As an aside, I feel like we've got enough of this pattern (print error message if STRICT, log if LENIENT, and return None if not STRICT) around that we might as well wrap it up into a function and factor it out.

Member Author

@heuermh heuermh Sep 13, 2016

The way I'm seeing it, if it were made a function, the error messages would always be formatted.

Member

How so?
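
If the concern above is that a shared helper would build every error message eagerly, even when the result is discarded, one possible answer in Scala is a by-name parameter. The following is only a minimal sketch under stated assumptions: `Stringency` is a local stand-in for htsjdk's `ValidationStringency`, and `reject` and its `warn` callback are hypothetical names that do not appear in this pull request.

```scala
object LazyMessageSketch {
  // Local stand-in for htsjdk.samtools.ValidationStringency (assumption),
  // so that the sketch compiles on its own.
  sealed trait Stringency
  case object Strict extends Stringency
  case object Lenient extends Stringency
  case object Silent extends Stringency

  // Hypothetical helper: throw if Strict, warn if Lenient, and return None
  // whenever it does not throw. `message` is a by-name parameter, so the
  // string is only built in the branches that actually use it.
  def reject[T](stringency: Stringency, warn: String => Unit)(message: => String): Option[T] =
    stringency match {
      case Strict =>
        throw new IllegalArgumentException(message)
      case Lenient =>
        warn(message)
        None
      case Silent =>
        None
    }
}
```

A call site might then read `return reject(stringency, log.warn(_))(s"Empty or invalid GTF/GFF2 line: $line").toSeq`, with the interpolation skipped entirely in the silent case.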

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1487/

fb.setEnd(fields(2).toLong)
val f = Feature.newBuilder()
.setContigName(fields(0))
.setStart(fields(1).toLong) // NarrowPeak ranges are 0-based
Member

We should try/catch around the fields(x).toLong/fields(x).toDouble calls, both here and below and above and everywhere. We could factor it out like so:

def checkAndSet[T](field: String,
  convFn: String => T,
  setFn: T => Unit,
  stringency: ValidationStringency): Unit = {
  try {
    // convert the raw field, then hand the typed value to the builder setter
    val t = convFn(field)
    setFn(t)
  } catch {
    case e: Throwable => {
      if (stringency == ValidationStringency.LENIENT) {
        log.warn("Failed to convert %s.".format(field))
      } else if (stringency == ValidationStringency.STRICT) {
        throw new IllegalArgumentException("Setting field from %s failed with %s.".format(
          field,
          e.getMessage))
      }
    }
  }
}

Perhaps we factor this out into a trait:

sealed trait FeatureConverterCanBeValidated {
  val stringency: ValidationStringency

  def validateOrNone(...): Option[Feature]

  def checkAndSet[T](field: String,
    convFn: String => T,
    setFn: T => Unit): Unit
}

All of the classes in here could be turned into case classes that extend that trait. What do you think? Should you proceed, the trait name I proposed is terrible, so please change it.
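
To make the proposal above concrete, here is a self-contained sketch of how such a trait might look. It is an illustration only, not the code that was merged: `Stringency` is a local stand-in for htsjdk's `ValidationStringency`, and `StringencyChecked`, `warn`, and `Recorder` are hypothetical names. It also catches `NonFatal` rather than `Throwable`, which is a deliberate deviation from the snippet above.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

object CheckAndSetSketch {
  // Local stand-in for htsjdk.samtools.ValidationStringency (assumption).
  sealed trait Stringency
  case object Strict extends Stringency
  case object Lenient extends Stringency
  case object Silent extends Stringency

  // Hypothetical trait name; the review comment asks for a better one.
  trait StringencyChecked {
    def stringency: Stringency
    def warn(msg: String): Unit

    // Convert `field` with `convFn` and pass the result to `setFn`.
    // On failure: throw if Strict, warn if Lenient, do nothing if Silent.
    def checkAndSet[T](field: String, convFn: String => T, setFn: T => Unit): Unit =
      try {
        setFn(convFn(field))
      } catch {
        case NonFatal(e) =>
          stringency match {
            case Strict =>
              throw new IllegalArgumentException(
                s"Setting field from $field failed with ${e.getMessage}.", e)
            case Lenient =>
              warn(s"Failed to convert $field.")
            case Silent =>
              ()
          }
      }
  }

  // Small implementation that records warnings instead of logging them.
  final class Recorder(val stringency: Stringency) extends StringencyChecked {
    val warnings: ListBuffer[String] = ListBuffer.empty[String]
    def warn(msg: String): Unit = { warnings += msg; () }
  }
}
```

With a real builder, a call could look like `checkAndSet[Long](fields(1), _.toLong, v => fb.setStart(v))`, so a malformed coordinate is reported according to the configured stringency instead of surfacing as a raw `NumberFormatException`.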

Member Author

Will check to see what htsjdk does with NumberFormatExceptions and validation stringency.

Member Author

This seems too fiddly to me, so I've punted for now.

@fnothaft
Member

Generally LGTM! Thanks for taking this on @heuermh. I've dropped a variety of line notes inline.

@heuermh
Member Author

heuermh commented Sep 13, 2016

Addressed some review comments and rebased.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1490/

@fnothaft
Member

LGTM! I will leave this open until Friday morning to wait for any further review comments.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1495/

@heuermh heuermh changed the title Map IntervalList format column four to feature name [ADAM-1152] Map IntervalList format column four to feature name Sep 20, 2016
@heuermh heuermh changed the title [ADAM-1152] Map IntervalList format column four to feature name Map IntervalList format column four to feature name Sep 20, 2016
@fnothaft
Member

Thanks @heuermh! Merged as 53b8f48 and dff43ef.

@fnothaft fnothaft closed this Sep 26, 2016
@heuermh heuermh deleted the interval-list-name branch September 26, 2016 21:22
@heuermh
Member Author

heuermh commented Sep 26, 2016

Thanks!


3 participants