
Docker run failed: command failed: /tmp/ggp-494856422: line 16: type: gsutil: not found\ndebconf #27

Closed
chenshan03 opened this issue Dec 28, 2017 · 12 comments

Comments

@chenshan03

Dear All,

I am trying to run gcloud alpha genomics but have repeatedly encountered the same issues with authentication and docker run.

The bash file for DeepVariant and the error logs are below:
BASH file https://storage.googleapis.com/wgs-test-shan/test_samples/deepVariant.sh
YAML file https://storage.googleapis.com/wgs-test-shan/test_samples/deepvariant_wes_pipeline.yaml
LOG file https://storage.googleapis.com/wgs-test-shan/test_samples/runner_logs/ENjW7s2JLBjf3aql19nvyv8BIKeM6-b_FyoPcHJvZHVjdGlvblF1ZXVl-stderr.log

I have contacted the Cloud support center and received the suggestions below. However, this did not fix the problem. What is your suggestion?
https://enterprise.google.com/supportcenter/managecases#Case/001f200001TaEgT/U-14552728

Thank you.
I would appreciate your help.
Best,
Shan

@depristo

depristo commented Jan 2, 2018

@arostamianfar This seems like an issue for you.

@arostamianfar
Contributor

The actual error is "The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw') which means you might need to rerun make_examples to genenerate the examples again."

@pichuan @depristo this is odd since the pipeline ran as a single workflow. The model and docker binary paths also seem correct. One issue I can think of is most of the shards being empty (the output has 64 shards, but it's only 1.3KB in total). Do you know if empty shards could cause such an error?

P.S. the 'gsutil not found' error is actually harmless. I think we should provide a 'parser' for these errors based on the logs that provides a meaningful error message.
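
For reference, a rough way to confirm whether the example shards are actually empty is to count the records in each one. A quick sketch (not part of DeepVariant; it assumes the shards were copied down locally and uses the TF 1.x record-reading API that DeepVariant used at the time):

import glob
import tensorflow as tf

# Count records in every shard of the examples output; shards with 0 records
# are the ones that can trip up call_variants.
opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
for path in sorted(glob.glob("examples_output.tfrecord-*-of-00064.gz")):
    n = sum(1 for _ in tf.python_io.tf_record_iterator(path, options=opts))
    print(path, n)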

@arostamianfar
Contributor

Yep, it's caused by empty shards. I was able to reproduce this by using 64 shards with the quickstart test data. @depristo should I file a separate issue for this, as it's not really a docker issue?

@chenshan03: thanks for the report. As a workaround until this bug is fixed, you may reduce the number of shards to avoid having empty ones.

@depristo

depristo commented Jan 3, 2018

@pichuan @scott7z I believe the empty shards bug has been fixed, is that correct?

@pichuan
Collaborator

pichuan commented Jan 7, 2018

Hi Mark and Asha,
here's what I believe the current status is:
(1) If there is just one empty shard (a shard file that exists but contains 0 records) out of many, the code will move on to the next shard to attempt to read image/format. -- This is the previously fixed empty-shards bug that Mark mentioned.
(2) However, if all the shard files exist but all of them contain 0 records, the current code can fail with the error message above.

In this case, if the actual error message observed is:
The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw')

It seems like this call_variants run is specifically being done on that one file. And if that file has 0 records, unfortunately it will currently fail with that error. :-(

So I think this is a real bug that we should fix, because we do expect the use case where users run 64 separate call_variants and some of them get a completely empty single input file. Is that correct?
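
To illustrate case (2), here is a simplified sketch (not the actual DeepVariant code) of why a zero-record input fails: the check peeks at the first record of the input shard, and when there is none there is no image/format to find.

import tensorflow as tf

def peek_image_format(shard_path):
    # Return the image/format value of the first record in the shard, if any.
    opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    for raw in tf.python_io.tf_record_iterator(shard_path, options=opts):
        example = tf.train.Example.FromString(raw)
        fmt = example.features.feature["image/format"].bytes_list.value
        return fmt[0] if fmt else None
    return None  # empty shard: nothing to peek at

fmt = peek_image_format("examples_output.tfrecord-00000-of-00064.gz")
if fmt != b"raw":
    raise ValueError("TF examples have image/format %r (expected 'raw')" % fmt)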

@arostamianfar
Contributor

Yes, I think this is a real bug that still exists.
Due to the distributed nature of the cloud process, some machines may get shards that are all empty. Also, we actually only supply one of the shards to each process, so (1) doesn't really apply (there is no 'next shard').
You can reproduce this by adding "--shards 64" to the quickstart test data configuration in https://cloud.google.com/genomics/deepvariant.

@depristo

depristo commented Jan 8, 2018

My view is that if all shards are empty we should just write an empty CVO file. If that's not what happens right now, let's add a bug to buganizer and fix it.
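
Roughly, the desired behavior would be something like the sketch below (illustrative only, not the real call_variants code; the function and path names are made up):

import tensorflow as tf

def call_variants_on_shard(examples_path, output_path):
    opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    records = list(tf.python_io.tf_record_iterator(examples_path, options=opts))
    if not records:
        # No input examples: still write a valid (but empty) CVO file so
        # downstream steps find the output they expect, then stop.
        with tf.python_io.TFRecordWriter(output_path, opts):
            pass
        return
    # ... the normal image/format check and inference over `records` would follow here ...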

@pichuan
Collaborator

pichuan commented Jan 8, 2018

I filed a bug in buganizer.

@cmclean
Collaborator

cmclean commented Feb 9, 2018

This has been fixed by the DeepVariant 0.5.1 release that just came out a few minutes ago. Thank you for calling attention to this issue.

@cmclean cmclean closed this as completed Feb 9, 2018
@pgrosu

pgrosu commented Feb 10, 2018

Hi Cory (@cmclean),

Thank you for the new release, but the timings for the 0.5.1 release seem to have gotten longer than those for the previous version:

Commit v0.5.1

Timings: Whole Genome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[image: whole-genome-case-study-timing]

Timings: Exome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[image: exome-case-study-timings]

What is the cause of the additional delay in version 0.5.1 as compared to the previous one?

Thanks,
Paul

@depristo

Hi Paul,

Two quick suggestions. First, I'd recommend posting this question in a separate issue, to keep the discussion clean since this is a very interesting and general observation.

Second, it's unclear to us whether this is just normal variation in cloud timing; not all machines you create are identical. For example, the case study command:

gcloud beta compute instances create "${USER}-deepvariant-casestudy"  --scopes "compute-rw,storage-full,cloud-platform" --image-family "ubuntu-1604-lts" --image-project "ubuntu-os-cloud" --machine-type "custom-64-131072" --boot-disk-size "300" --boot-disk-type "pd-ssd" --zone "us-west1-b"

The command doesn't specify the exact CPU platform, so we're likely getting Skylake processors sometimes and Broadwell processors other times. That alone could account for the variation in timing we are seeing here.

@pichuan
Collaborator

pichuan commented May 1, 2018

Hi all,
It has recently been reported again that the crash on empty shards in call_variants wasn't fully resolved last time. I have just released v0.6.1, which should truly resolve this issue now:
https://github.com/google/deepvariant/releases/tag/v0.6.1

The issue was that I didn't properly return in the if branch where an empty shard was detected:
12f9e67
(And the unit test I had for it was flawed. We'll fix the unit test in a later release.)

This time I've tested it manually on an empty shard, and confirmed that call_variants works when there are zero records.
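
For reference, an empty shard for this kind of manual test can be created with a couple of lines (a sketch; the filename is just an example):

import tensorflow as tf

# Create a gzipped TFRecord shard that exists but contains zero records.
opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
with tf.python_io.TFRecordWriter("empty_examples.tfrecord-00000-of-00001.gz", opts):
    pass  # write nothing: the shard is valid but empty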

Please feel free to report if you see any issues again. Thank you!
