Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible signature bug #1108

Closed
minqiao opened this issue Dec 14, 2018 · 22 comments
Closed

Possible signature bug #1108

minqiao opened this issue Dec 14, 2018 · 22 comments

Comments

@minqiao
Copy link

minqiao commented Dec 14, 2018

I come up with this weird issue that I cannot reproduce with a truly minimal example. I got a 38Mb file that minimally reproduce the bug at least on my computer:

https://www.dropbox.com/s/rgchwiltaidbbxt/signature_bug.tar.gz?dl=0

To reproduce:

screenshot from 2018-12-14 09-54-28

The weirdness are:

  1. If I use less input (currently 20 input groups), say, 15, I cannot reproduce the issue.
  2. If I insert one line print(_input) in between input: and output: statment I do not have the issue.

I suspect it is related to stop_if(os.stat(_input[0]).st_size == 0). Basically I want to filter input (from a previous output step in reality, not shown in this example i uploaded) such that the step will skip if the input file size is zero. Not sure how to do that properly yet. But apart from that, the bug above is a weird one.

@BoPeng
Copy link
Contributor

BoPeng commented Dec 14, 2018

Just to make sure. Are you using the latest release or trunk?

@minqiao
Copy link
Author

minqiao commented Dec 14, 2018

I had issues with version 0.17.5 initially, then I updated to trunk, still the same issue.

@gaow
Copy link
Member

gaow commented Dec 14, 2018

@BoPeng This problem should possibly be dealt with more elegantly with our proposed interface #1106

input: sos_group([x for x in output_from(-1) if os.stat(x).st_size > 0], by = 2)

Notice here I introduced the -1 convention to indicate the previous step ... or, we can also change the syntax to

from_output()

and by default it means from previous output so we do not need -1 there.

@BoPeng
Copy link
Contributor

BoPeng commented Dec 27, 2018

Let me know if there is still a bug..

@gaow
Copy link
Member

gaow commented Dec 31, 2018

Okay an alternative implementation to it has led to another error. Here is a MWE:

[1]

chunks = [(1,2,3)]

input: for_each = 'chunks'
output: summary = f"{_chunks[-1] if len(_chunks) > 3 else '%s_%s_%s' % (_chunks[0], _chunks[1], _chunks[2])}.pkl"
task:
bash: expand = True
   touch {_output}

[2]
input: [x for x in output_from(-1)], group_by = 1
output: f"{_input['summary']}.txt"
_output.touch()

You see after input: [x for x in output_from(-1)] the attributes are lost. So the output file name are wrong.

Relevant to this ticket though, the command I recommand @minqiao to use was:

input: [x for x in output_from(-1) if os.stat(x).st_size > 0], group_by = 2

and failed for the reason you can see in the MWE.

@BoPeng
Copy link
Contributor

BoPeng commented Dec 31, 2018

This MWE works with the latest latest master as I have just pushed some changes that passes all tests.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

Sorry it does not work on my end:

INFO: Running 2: 
INFO: 2 (index=0) is ignored due to saved signature
INFO: output:   .txt
INFO: Workflow default (ID=ec9c175fc7bf8aa6) is ignored with 2 ignored steps.

notice the output of the 2nd step: .txt is not what it is meant to bel. This is because it interoplates _input['summary'] as empty string '' -- should possibly throw an error instead even if I put in some crazy keys for attributes?

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

By converting

output_from(-1)

from a sos_targets to list,

[x for x in output_from(-1)]

You throw away the source information so there is no summary.

input: output_from(-1), group_by = 1

works just fine.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

Yes I understand that, but the actual usage was

input: [x for x in output_from(-1) if os.stat(x).st_size > 0], group_by = 2

Hence posted to this ticket. Are there any other suggestions?

BTW I thought each x is a target anyways. But looks like the attribute is with step_input and step_output?

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

Yes, the source is a sos_targets only attribute.

I do not have any good suggestion at this point, but I can think of something like

input: output_from(-1).filter(lambda x: x.stat().st_size>0)

BoPeng pushed a commit that referenced this issue Jan 1, 2019
@gaow
Copy link
Member

gaow commented Jan 1, 2019

Looks like you are implementing that :) another option for @minqiao is to use her original implementation stop_if after _output is specified. Which did not work when the issue was reported, but I just tested the MWE in the first thread of the ticket. it seems to have been fixed.

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

There is a more or less hidden slice function for sos_targets, used for slice(1), slice('source'), slice(1:5) etc. I have just added slice(func) so that you can do

input: output_from(-1).slice(lambda x: x.stat().st_size>0)

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

Note that we had from the very beginning something like filetype, which accepts a lambda function to remove (or keep?) targets, but I removed it due to lack of good use of it. This slice thing is just for testing. It returns a separate object like slice(1) but it is possible to implement something to modify the sos_targets in place....

slice is meant to be hidden. If we expose it, it might be called something like select().

@gaow
Copy link
Member

gaow commented Jan 1, 2019

but I removed it due to lack of good use of it.

and possible complications too? Do u see this example here a good use of it?

I assume Min can stick to stop_if now that it works. What do you suggest?

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

cannot remember clearly but the original usage was users send a bunch of input files with extensions fastq fastq.gz and even README and VPT steps use filetyoe to get what they can process.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

Okay I guess another problem using stop_if as we've been discussing earlier #1132 is if I use:

output: f"{_input['summary_stats']:nn}.{anno}.gz"
stop_if(_input.stat().st_size == 0)

then it should properly also remove those relevant output. Maybe inserting stop_if in between input: and output will do? it is a bit ugly though.

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

Yes, there are many subtleties ... but let us worry about it next year. Family time now.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

BTW the line

input: output_from(-1).slice(lambda x: x.stat().st_size>0)

does work for this example. Just not sure if it is meant to be permanent.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

Certainly, that is the most important particularly tonight. Happy New Year!

@BoPeng
Copy link
Contributor

BoPeng commented Jan 1, 2019

In general, we have ways to compose input, but not a way to remove them. I am ok with adding another interface but not sure what to use. To keep things consistent, we should probably add a parameter like

input: whatever, select=??

instead of

input: sos_targets(whatever).select(??)

but then we have more freedom in a function to do something like

input: sos_targets(whatever).select(size=0, type=file_target)

to make certain things easier.

@gaow
Copy link
Member

gaow commented Jan 1, 2019

I guess we are not that motivated because we haven't had that need in the past, and #1132 if properly implemented, should cover users cases like this -- basically skipping some input and adjusting the output accordingly. Also for many cases input: [x for x in output_from(-1) ...] will work (though not the case in this ticket).

So instead of adding a select feature, maybe we want to think of finalizing #1132? Unless there is an appealing reason not to do that, or an appealing reason to add select here.

@gaow
Copy link
Member

gaow commented Jan 2, 2019

I close this issue because the original bug cannot be reproduced in current master. Also we will use #1132 to discuss and possibly revisit the need of a select feature after.

@gaow gaow closed this as completed Jan 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants