
Docker run failed: command failed: /tmp/ggp-494856422: line 16: type: gsutil: not found\ndebconf #27

Closed
chenshan03 opened this issue Dec 28, 2017 · 12 comments

Comments

@chenshan03

Dear All,

I am trying to run gcloud alpha genomics but have repeatedly encountered the same issues with authentication and docker run.

The bash file for DeepVariant and the error logs are below:
BASH file https://storage.googleapis.com/wgs-test-shan/test_samples/deepVariant.sh
YAML file https://storage.googleapis.com/wgs-test-shan/test_samples/deepvariant_wes_pipeline.yaml
LOG file https://storage.googleapis.com/wgs-test-shan/test_samples/runner_logs/ENjW7s2JLBjf3aql19nvyv8BIKeM6-b_FyoPcHJvZHVjdGlvblF1ZXVl-stderr.log

I have contacted the Cloud support center and received the suggestions below. However, this did not fix the problem. What is your suggestion?
https://enterprise.google.com/supportcenter/managecases#Case/001f200001TaEgT/U-14552728

Thank you.
I would appreciate your help.
Best,
Shan

@depristo

depristo commented Jan 2, 2018

@arostamianfar This seems like an issue for you.

@arostamianfar
Contributor

The actual error is "The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw') which means you might need to rerun make_examples to genenerate the examples again."

@pichuan @depristo this is odd since the pipeline ran as a single workflow. The model and docker binary paths also seem correct. One issue I can think of is most of the shards being empty (the output has 64 shards, but it's only 1.3KB in total). Do you know if empty shards could cause such an error?

P.S. the 'gsutil not found' error is actually harmless. I think we should provide a 'parser' for these errors based on the logs that provides a meaningful error message.
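
For reference, a rough way to confirm whether the example shards are actually empty is to count the records in each one. A quick sketch (not part of DeepVariant; it assumes the shards were copied down locally and uses the TF 1.x record-reading API that DeepVariant used at the time):

import glob
import tensorflow as tf

# Count records in every shard of the examples output; shards with 0 records
# are the ones that can trip up call_variants.
opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
for path in sorted(glob.glob("examples_output.tfrecord-*-of-00064.gz")):
    n = sum(1 for _ in tf.python_io.tf_record_iterator(path, options=opts))
    print(path, n)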

@arostamianfar
Contributor

Yep, it's caused by empty shards. I was able to reproduce this by using 64 shards with the quickstart test data. @depristo should I file a separate issue for this, as it's not really a docker issue?

@chenshan03: thanks for the report. As a workaround until this bug is fixed, you may reduce the number of shards to avoid having empty ones.

@depristo

depristo commented Jan 3, 2018

@pichuan @scott7z I believe the empty shards bug has been fixed, is that correct?

@pichuan
Collaborator

pichuan commented Jan 7, 2018

Hi Mark and Asha,
here's what I believe the current status is:
(1) If there is just one empty shard (a shard file that exists but contains 0 records) out of many, the code will move on to the next shard to attempt to read image/format. -- This is the previously fixed empty-shards bug that Mark mentioned.
(2) However, if all the shard files exist but all of them contain 0 records, the current code can fail with the error message above.

In this case, if the actual error message observed is:
The TF examples in /mnt/data/input/gs/wgs-test-shan/test_samples/UDN689484temp/examples/examples_output.tfrecord-00000-of-00064.gz has image/format 'None' (expected 'raw')

It seems like this call_variants run is specifically being done on that one file. And if that file has 0 records, unfortunately it will currently fail with that error. :-(

So I think this is a real bug that we should fix, because we do expect the use case where users run 64 separate call_variants and some of them get a completely empty single input file. Is that correct?
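
To illustrate case (2), here is a simplified sketch (not the actual DeepVariant code) of why a zero-record input fails: the check peeks at the first record of the input shard, and when there is none there is no image/format to find.

import tensorflow as tf

def peek_image_format(shard_path):
    # Return the image/format value of the first record in the shard, if any.
    opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    for raw in tf.python_io.tf_record_iterator(shard_path, options=opts):
        example = tf.train.Example.FromString(raw)
        fmt = example.features.feature["image/format"].bytes_list.value
        return fmt[0] if fmt else None
    return None  # empty shard: nothing to peek at

fmt = peek_image_format("examples_output.tfrecord-00000-of-00064.gz")
if fmt != b"raw":
    raise ValueError("TF examples have image/format %r (expected 'raw')" % fmt)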

@arostamianfar
Contributor

Yes, I think this is a real bug that still exists.
Due to the distributed nature of the cloud process, some machines may get shards that are all empty. Also, we actually only supply one of the shards to each process, so (1) doesn't really apply (there is no 'next shard').
You can reproduce this by adding "--shards 64" to the quickstart test data configuration in https://cloud.google.com/genomics/deepvariant.

@depristo

depristo commented Jan 8, 2018

My view is that if all shards are empty we should just write an empty CVO file. If that's not what happens right now, let's add a bug to buganizer and fix it.
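
Roughly, the desired behavior would be something like the sketch below (illustrative only, not the real call_variants code; the function and path names are made up):

import tensorflow as tf

def call_variants_on_shard(examples_path, output_path):
    opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
    records = list(tf.python_io.tf_record_iterator(examples_path, options=opts))
    if not records:
        # No input examples: still write a valid (but empty) CVO file so
        # downstream steps find the output they expect, then stop.
        with tf.python_io.TFRecordWriter(output_path, opts):
            pass
        return
    # ... the normal image/format check and inference over `records` would follow here ...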

@pichuan
Collaborator

pichuan commented Jan 8, 2018

I filed a bug in buganizer.

@cmclean
Collaborator

cmclean commented Feb 9, 2018

This has been fixed by the DeepVariant 0.5.1 release that just came out a few minutes ago. Thank you for calling attention to this issue.

@cmclean cmclean closed this as completed Feb 9, 2018
@pgrosu

pgrosu commented Feb 10, 2018

Hi Cory (@cmclean),

Thank you for the new release, but the timings for the 0.5.1 release seem to have gotten longer than those for the previous version:

Commit v0.5.1

Timings: Whole Genome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[image: whole-genome-case-study-timing]

Timings: Exome Case Study - [0.5 (pink) vs. 0.5.1 (green)]

[image: exome-case-study-timings]

What is the cause of the additional delay in version 0.5.1 as compared to the previous one?

Thanks,
Paul

@depristo

Hi Paul,

Two quick suggestions. First, I'd recommend posting this question in a separate issue, to keep the discussion clean since this is a very interesting and general observation.

Second, it's unclear to us whether this is just normal variation in cloud timing; not all machines you create are identical. For example, the case study command:

gcloud beta compute instances create "${USER}-deepvariant-casestudy"  --scopes "compute-rw,storage-full,cloud-platform" --image-family "ubuntu-1604-lts" --image-project "ubuntu-os-cloud" --machine-type "custom-64-131072" --boot-disk-size "300" --boot-disk-type "pd-ssd" --zone "us-west1-b"

The command doesn't specify the exact CPU platform, so we're likely getting Skylake processors sometimes and Broadwell processors other times. That alone could account for the variation in timing we are seeing here.

@pichuan
Collaborator

pichuan commented May 1, 2018

Hi all,
It has recently been reported again that the crash on empty shards in call_variants wasn't fully resolved last time. I have just released v0.6.1, which should truly resolve this issue now:
https://github.com/google/deepvariant/releases/tag/v0.6.1

The issue was that I didn't properly return in the if branch where an empty shard was detected:
12f9e67
(And the unit test I had for it was flawed. We'll fix the unit test in a later release.)

This time I've tested it manually on an empty shard, and confirmed that call_variants works when there are zero records.
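
For reference, an empty shard for this kind of manual test can be created with a couple of lines (a sketch; the filename is just an example):

import tensorflow as tf

# Create a gzipped TFRecord shard that exists but contains zero records.
opts = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
with tf.python_io.TFRecordWriter("empty_examples.tfrecord-00000-of-00001.gz", opts):
    pass  # write nothing: the shard is valid but empty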

Please feel free to report if you see any issues again. Thank you!
