Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the 'index 0' bug in features2adam #308

Merged
merged 2 commits into from
Jul 23, 2014

Conversation

tdanford
Copy link
Contributor

Fixes the bug where the error message "Argument with index 0 is used more than once" appears and prevents us from running features2adam from the command line.

Also, added a test for Features2ADAM, and updated PrintADAM (in support of the test) to use scala.Console.out rather than System.out, which appears to play better with the scala.Console.withOut method used in tests.

@tdanford
Copy link
Contributor Author

This fixes Issue #307

@fnothaft
Copy link
Member

Hmmm, no dice on the tests...?

@tdanford
Copy link
Contributor Author

Weird, I ... don't understand what's happening. Running it again locally. It was all working fine 10 minutes ago :-(

@fnothaft
Copy link
Member

Gremlins.

@tdanford
Copy link
Contributor Author

@fnothaft can you do me a favor when you get a chance? Grab the branch and try building it yourself? It's still working for me, I don't quite understand what the problem is :-/

@fnothaft
Copy link
Member

@tdanford sure! Will do now...

@fnothaft
Copy link
Member

@tdanford ran OK for me as well...

Jenkins, retest this please.

@fnothaft
Copy link
Member

Jenkins, test this please.

@tdanford
Copy link
Contributor Author

I'm at a loss.

@fnothaft
Copy link
Member

And this is why we can't have nice things.

@fnothaft
Copy link
Member

Jenkins, retest this please.

@carlyeks
Copy link
Member

The test failure seems to be mis-sorted values. Not sure why the server is reading them in a different order, but we should probably not use the Parquet printer for this, and instead sort them for stability of the test.

@tdanford
Copy link
Contributor Author

@carlyeks @fnothaft What release do we want this PR to go into, 0.12.1 or 0.13?

If 0.12.1, can one of you help me understand why this latest rewrite of the Features2ADAMSuite is still failing? I'm using the ParquetLister class, but I'm getting a Thrift error (!) when I try to read the Parquet file header.

@tdanford tdanford added this to the 0.12.1 milestone Jul 23, 2014
@tdanford
Copy link
Contributor Author

This PR will fix Issue #322 as well.

@tdanford
Copy link
Contributor Author

Doing a mvn clean install, I get an error that looks like:

Features2ADAMSuite:
2014-07-23 08:28:25 WARN  Utils:70 - Your hostname, wm1a5-fef resolves to a loopback address: 127.0.0.1; using 192.168.1.112 instead (on interface en0)
2014-07-23 08:28:25 WARN  Utils:70 - Set SPARK_LOCAL_IP if you need to bind to another address
2014-07-23 08:28:27.168 java[73274:1703] Unable to load realm info from SCDynamicStore
2014-07-23 08:28:27 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
- can convert a simple BED file *** FAILED ***
  java.io.IOException: can not read class parquet.format.PageHeader: null
  at parquet.format.Util.read(Util.java:50)
  at parquet.format.Util.readPageHeader(Util.java:26)
  at org.bdgenomics.adam.rdd.ParquetColumnChunk.readPageHeader(ParquetCommon.scala:190)
  at org.bdgenomics.adam.rdd.ParquetColumnChunk.readAllPages(ParquetCommon.scala:246)
  at org.bdgenomics.adam.parquet_reimpl.ParquetPartition$$anonfun$2.apply(ParquetPartition.scala:111)
  at org.bdgenomics.adam.parquet_reimpl.ParquetPartition$$anonfun$2.apply(ParquetPartition.scala:111)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  ...
  Cause: parquet.org.apache.thrift.transport.TTransportException:
  at parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
  at parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
  at parquet.org.apache.thrift.protocol.TCompactProtocol.readBinary(TCompactProtocol.java:641)
  at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:102)
  at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
  at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:138)
  at parquet.org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
  at parquet.format.PageHeader.read(PageHeader.java:851)
  at parquet.format.Util.read(Util.java:47)
  at parquet.format.Util.readPageHeader(Util.java:26)
  ...

For some reason, we'd neglected to include an ADAMFeatureField class in the adam.projections
package!  This commit fixes that.
@tdanford
Copy link
Contributor Author

@fnothaft Carl and I talked, and while neither of us can get this test (Features2ADAMSuite) to work locally or on Jenkins, it's clear that the bug in Features2ADAM itself has been fixed by these commits. I've marked the test as "ignore" for now, and I've opened a ticket (Issue #323) to fix it. As soon as Jenkins successfully builds this, I think it's ready to be reviewed-and-merged.

@fnothaft
Copy link
Member

@tdanford that sounds good here, I will wait for Jenkins to build it and willl then review.

@fnothaft
Copy link
Member

Looks like this is still failing...

@carlyeks
Copy link
Member

Jenkins isn't building the right revisions (again).

The general build 118 shows the right revision, but the test matrix isn't building that one...

@carlyeks
Copy link
Member

Jenkins, retest this please.

Features2ADAM fails because it doesn't include an 'index' field in its
second command-line argument, and therefore Args4j can't parse the Features2ADAMArgs
object off a command line.

This commit adds the missing 'index' argument to the Features2ADAMArgs class.

There is a test for this additional feature -- however, that test is marked as IGNORE
because we can't get it to work either locally or on Jenkins.  This should be addressed,
and I'll open up a ticket for it.  In the meantime, however, we've verified that manually
that this patch works as advertised.

Also modified PrintADAM to use the scala.Console.out rather than System.out.
This seems to allow us to (in tests) redirect the output stream using the
Console.withOut command.
@carlyeks
Copy link
Member

Formatting issue -- just pushed up an update.

@fnothaft
Copy link
Member

Jenkins, retest this please.

fnothaft added a commit that referenced this pull request Jul 23, 2014
Fixing the 'index 0' bug in features2adam
@fnothaft fnothaft merged commit 05535b7 into bigdatagenomics:master Jul 23, 2014
@fnothaft
Copy link
Member

Merged! Thanks @tdanford and @carlyeks!

@tdanford tdanford deleted the features2adam-bug branch July 23, 2014 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants