Adjusting tests to julia. #2

hildebrandecon · 2022-01-14T13:55:45Z

Closes #1 as far as I can tell: I adjusted the tests in

test_parallel.py and
test_parametrize.py.

Now, the run through.

I also adjusted the ReadMe.

for more information, see https://pre-commit.ci

hildebrandecon · 2022-01-14T13:57:52Z

I think this should be fine, however, I couldn't figure out (so far) how exactly julia treats wrong cmd line arguments.

codecov · 2022-01-14T14:27:20Z

Codecov Report

Merging #2 (c99b27c) into main (2b15b07) will increase coverage by 29.47%.
The diff coverage is 100.00%.

❗ Current head c99b27c differs from pull request most recent head 4ff5c48. Consider uploading reports for the commit 4ff5c48 to get more accurate results

@@             Coverage Diff             @@
##             main       #2       +/-   ##
===========================================
+ Coverage   69.06%   98.54%   +29.47%     
===========================================
  Files          13       13               
  Lines         320      343       +23     
===========================================
+ Hits          221      338      +117     
+ Misses         99        5       -94

Flag	Coverage Δ
end_to_end	`85.42% <91.07%> (?)`
unit	`67.93% <44.64%> (-1.14%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/pytask_julia/execute.py	`100.00% <ø> (ø)`
src/pytask_julia/collect.py	`100.00% <100.00%> (+3.77%)`	⬆️
tests/test_collect.py	`98.61% <100.00%> (+0.01%)`	⬆️
tests/test_execute.py	`100.00% <100.00%> (+48.07%)`	⬆️
tests/test_parallel.py	`96.15% <100.00%> (+65.38%)`	⬆️
tests/test_parametrize.py	`100.00% <100.00%> (+63.33%)`	⬆️
tests/conftest.py	`100.00% <0.00%> (+14.28%)`	⬆️
tests/test_config.py	`100.00% <0.00%> (+33.33%)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b15b07...4ff5c48. Read the comment docs.

hmgaudecker

Looks great to me! Just added some pausing for the parallel tasks, those failed on my machine.

I was wondering, however, whether we should use the great momentum right away to implement dict-style passing of arguments as well?

I could see either using argparse or some standardized (hidden) json file. The latter might be more flexible and actually make it easier to run from editors / ides?

Something like

.pytask.[task_file].[task_name].[id].json

with contents:

{
    "depends_on": ["[dict]", "[list]", "[single file]", null],
    "produces": ["[dict]", "[list]", "[single file]", null],
    "other_keys": "[whatever]"
}

hmgaudecker · 2022-01-14T15:07:33Z

I understand this is a free service, but are those i386 machines? 😆

tobiasraabe · 2022-01-14T15:21:01Z

@hmgaudecker If one test occasionally fails I have no problem with that. Same in the other repos but it is really not that likely.

tobiasraabe · 2022-01-14T15:22:33Z

@hildebrandecon Please, follow the link of the failing the pre-commit ci step and see whether you can fix the errors to make the pipeline pass.

hmgaudecker · 2022-01-14T15:32:08Z

@tobiasraabe, what do you think about implementing passing of options via json, see above.

Also, we may want to check that command line options recognized by Julia are parsed correctly. Something like pass number of threads and check within Julia that the number is correct? Once with 1, once with 4?

tobiasraabe · 2022-01-14T15:39:58Z

@hmgaudecker Just for consistency and finishing stuff, I would advocate for merging this PR first to align with the other plugins. We will release it as v0.1.0.

After that, and with a potential v0.2.0 release the interface can be reworked. I am nor sure I understand how you want to use the json. Do you want to be able to define tasks in json files?

hmgaudecker · 2022-01-14T15:45:18Z

Sounds great.

No, I would just want to use it to pass the dependencies and targets (and other options) in a way that resembles Python tasks. Right now, you can only pass arguments by order (=2 lists and on Julia/R side impossible to discriminate between deps/targets without knowing the mapping of list elements to files). Very error prone.

hildebrandecon · 2022-01-14T19:21:24Z

@hildebrandecon Please, follow the link of the failing the pre-commit ci step and see whether you can fix the errors to make the pipeline pass.

@tobiasraabe I managed to resolve some issues. I'm not sure how to fix/treat the remaining ones, unfortunately.

hmgaudecker · 2022-01-15T12:34:27Z

Just fix by ignoring (although I left that explicitly out from the course 😄)

Also, we may want to check that command line options recognized by Julia are parsed correctly. Something like pass number of threads and check within Julia that the number is correct? Once with 1, once with 4?

@hildebrandecon @mbrambeer Could you draft a test for this? That is, pass explicitly to Julia that it should use only 1 thread at most, then check inside Julia that this is caught correctly. Repeat with 4. Test should include depends_on/produces because the order mattered in our experiments yesterday.

I think this is more important than checking what happens with wrong options. Not really pytask's Job to check whether the user typed stuff correctly (it would be nice, of course, but could also become a maintenance burden).

tobiasraabe · 2022-01-15T13:29:56Z

Sounds great.

No, I would just want to use it to pass the dependencies and targets (and other options) in a way that resembles Python tasks. Right now, you can only pass arguments by order (=2 lists and on Julia/R side impossible to discriminate between deps/targets without knowing the mapping of list elements to files). Very error prone.

Yes, makes sense. To be more bold, you could pass the json data via the command line as an argument and parsing the string inside the program to get rid of the .pytask.....json file.

hmgaudecker · 2022-01-15T13:43:44Z

In principle yes, but the nice thing about the file (also relative to command line options) is that it makes it easy to run Julia/R from editors/IDEs/any way other than pytask.

hildebrandecon · 2022-01-17T08:02:05Z

Just fix by ignoring (although I left that explicitly out from the course 😄)

Also, we may want to check that command line options recognized by Julia are parsed correctly. Something like pass number of threads and check within Julia that the number is correct? Once with 1, once with 4?

@hildebrandecon @mbrambeer Could you draft a test for this? That is, pass explicitly to Julia that it should use only 1 thread at most, then check inside Julia that this is caught correctly. Repeat with 4. Test should include depends_on/produces because the order mattered in our experiments yesterday.

I think this is more important than checking what happens with wrong options. Not really pytask's Job to check whether the user typed stuff correctly (it would be nice, of course, but could also become a maintenance burden).

@hmgaudecker I must say that I don't understand what exactly you have in mind, probably because I'm not as an IT expert. However, here would be my rough idea what you would want to do:

we can tell julia how many threads to use via julia -t nthreads, where nthreads is the number of threads.
then, we can check within julia that all went well with Threads.nthreads() which should evaluate to nthreads.
apart from that, I'm not sure how to continue, I think because I don't have a clear idea how I can check the 2nd step within pytask.

tobiasraabe · 2022-01-17T08:27:22Z

Just add something to the script which makes it fail if nthreads does not have the desired number. In Python you would use the assert statement or raising an exception. Both make the script fail with nonzero exit codes which pytask will detect and mark the task as failed.

hmgaudecker · 2022-01-17T08:42:11Z

Yip, that's precisely what I had in mind!

hildebrandecon · 2022-01-17T09:22:43Z

Ok, I wrote a draft but in fact, it doesn't work. The test should fail since in line 141 we pass 4 threads. If I run the same stuff in the cmd line, it in fact works.

Replication:

create a file script.jl that contains write("out.txt", "So, so you think you can tell heaven from hell?") \n @assert Threads.nthreads() == 1
in cmd line, run julia -t 1 script.jl: works
in cmd line, run julia -t 4 script.jl: gives error
note: if we run julia script.jl -t 4, we don't get an error (this is the order-sensitivity that we expected...)

hildebrandecon · 2022-01-17T09:32:14Z

It seems that one has to pass arguments regarding Julia startup before the script but arguments used in the function itself after the script, replication:

script.jl contains:

for arg in ARGS
    println(arg)
end
@assert Threads.nthreads() == 1

In the shell, run: julia --threads 1 script.jl "hello1" "hello2" "hello3" without error but julia --threads 4 script.jl "hello1" "hello2" "hello3" with error. Putting --threads 1 at the end yields that it is printed too.

hmgaudecker · 2022-01-17T09:33:20Z

Thanks! I would hope writing the test is enough, @tobiasraabe has too much of an advantage of fixing the call from pytask for us to worry much about it 😉

However, can you parametrize the test with both 1 and 4 threads? I think it defaults to 4 on a machine with 4 cores (I might misremember) so that would be just to avoid random passes based on the machine type.

tobiasraabe · 2022-01-17T09:37:00Z

tests/test_execute.py

+    os.chdir(tmp_path)
+    session = main({"paths": tmp_path})


You dont need to switch paths and better use the runner patter from here

pytask-julia/tests/test_parallel.py

Line 58 in 2b15b07

result = runner.invoke(cli, [tmp_path.as_posix()])

tobiasraabe · 2022-01-17T14:00:44Z

I don't really have a good idea. I feel like this forces us to replicate the command line interface and understand whether options are flags, accept and input, accept multiple inputs and what are arguments to the script, split them and place them to the left or right of the script.

One approach to circumvent the replication would be to ask users to add -- when they are adding flags/options and arguments to the script in the decorator as a separator. For example, @pytask.mark.julia(("--threads", "1", "--", "0")). -- is also a valid separator on the command line to do that. Then we could split arguments at the double-dash, add the script just before the separator and recreate the command line string again.

hmgaudecker · 2022-01-17T14:19:55Z

Ah, sorry, I thought you had some sense or order in there. My understanding is that all you'd need to do is to swap the contents of mark.julia(...) with whatever is in depends_on / produces? "Options first" seems a good enough understanding for how Julia parses the command line, it seems to me...

tobiasraabe · 2022-01-17T15:20:43Z

Unfortunately, it is not enough to match the content of the decorator with depends_on and produces since you could also pass something like seeds to the script. And those other values could be related to a preceding command line option or not. I don't want to resolve something like this

--threads 1  # belongs together.
--verbose 1  # does not belong together and 1 should be passed to the script.

Ok, plan for the implementation.

A decorator with no content will default to subprocess.run(("julia", "--", "script.jl")).
If content is provided, I would suggest that the double dash is required and we raise an error with a helpful error message if not. I think it will avoid confusion which justifies entering -- from time to time.

Everyone agrees?

hmgaudecker · 2022-01-17T15:30:18Z

Okay, talking past each other here.

As per our JSON discussion, I agree that pytask should not worry about resolving stuff.

I would say the pytask.mark("...") content is exclusively reserved for options passed to Julia as opposed to the script. pytask just needs to ensure this ends up before script.jl in subprocess.run(("julia", "--", "script.jl")), contrary to what is currently happening.

Anything else will be solved by different means in a different PR.

hmgaudecker · 2022-01-17T15:33:40Z

contrary to what is currently happening.

That is, what is happening based on what @hildebrandecon posted above, I did not check in detail and it might not catch the option for other reasons.

tobiasraabe · 2022-01-17T16:23:07Z

Okay, talking past each other here.

As per our JSON discussion, I agree that pytask should not worry about resolving stuff.

I would say the pytask.mark("...") content is exclusively reserved for options passed to Julia as opposed to the script. pytask just needs to ensure this ends up before script.jl in subprocess.run(("julia", "--", "script.jl")), contrary to what is currently happening.

Anything else will be solved by different means in a different PR.

This would mean that v0.1.0 does not support all functionality. It does not allow users to pass any arguments to script as it is possible with the other plugins. We could pass depends on and produces by default to the script and the user can optionally parse them or not. But, something like passing a seed to a script would not work.

I see the json approach as an alternative interface and a way to move forward, but I would prefer something fully working before that.

hmgaudecker · 2022-01-17T19:13:26Z

Fair enough, I cannot delve deeply enough into it. So your view is to not pass any options to Julia, just to scripts?

hildebrandecon · 2022-01-17T20:27:50Z

I think I cannot follow the discussion entirely, given that I don't understand most of the pytask details. Hence, I'm also not sure whether I can give any educated opinion about what best/not to do. Of course, let me know if I can be of help!

tobiasraabe · 2022-01-18T00:43:05Z

I am sorry. I feel like I have not expressed myself clearly enough and caused some unnecessary confusion. Let me try it once more (although rather lengthy).

The decorator @pytask.mark.julia or @pytask.mark.r are used to pass options (like --threads) to the executable and arguments (like a seed or paths from dependencies or products) to the script.

@pytask.mark.r(("input.md", "1", "--verbose"))
@pytask.mark.depends_on(["script.R", "input.md"])
@pytask.mark.produces("out.md")
def task_write_number_with_r_to_file():
    pass

This way, although highly ugly, the R scripts receives the path to the markdown file and a number which can be written to the output file.

We also pass the --verbose flag which is just representative of any option which can be passed to Rscript.

The main point here is that pytask-r can just take the tuple in the decorator and pass it on the command line to Rscript like

$ Rscript script.R input.md 1 --verbose

With julia this is not possible since a flag like --verbose or --threads need to be put on the command line after the executable julia and before the script.

This means pytask-julia needs to know whether something is an option for the executable or input to the script.

As a super lightweight workaround and ugly hack, I propose to require users, if they use the @pytask.mark.julia decorator, to separate options to the executable and inputs to the script via --. Taking the example from above, this would look like this:

@pytask.mark.julia(("--verbose", "--", "input.md", "1"))
@pytask.mark.depends_on(["script.jl", "input.md"])
@pytask.mark.produces("out.md")
def task_write_number_with_julia_to_file():
    pass

This way pytask-julia knows how to build the input to subprocess.run. Put everything before -- right after the executable and before adding the path to the script taking from the first dependency. Then, add everything to the right side of --.

$ julia --verbose -- script.jl input.md 1

I now also want to make it mandatory to always specify -- if the user passes any values to @pytask.mark.julia so that it is always clear whether input refers to the executable or script. Two examples

@pytask.mark.julia(("--", "input.md"))  # to pass a path to the script

@pytask.mark.julia(("--threads", "4", "--"))  # to enable 4 threads while running the program.

The reason why I believe -- should be mandatory is that it is not clear looking at the two cases above whether one case could omit the -- because it would feel more natural to pass something to the executable or to the script.

I know this is an extremely dirty hack and I am deeply unsatisfied with the situation. It is not an excuse to say waf was equally bad. We need to circumvent the positional indexing and get something label-based soon. But, in this PR, I just want to make pytask-julia as powerful as the other plugins so we have a common base to start from.

hildebrandecon · 2022-01-18T10:01:25Z

thanks @tobiasraabe for the extensive explanation, I think your arguments make perfect sense!

hmgaudecker · 2022-01-18T11:49:23Z

Sounds great, thanks for the detailed explanation.

I think my view is to just not spend too much time on this right now. There will be few users and even less power users for the moment.

Looking back at Waf and the current state of pytask-{R, Stata, Julia), I do think that requiring command-line options is a major inhibitor for take-up. For the typical (student) data cruncher, the most important feature is that starting from an IDE is seamless and the same as starting from pytask. But that's for another discussion!

…into dev_seb

tobiasraabe · 2022-01-18T19:18:52Z

@hildebrandecon @hmgaudecker Hi, I implemented the described approach. The tests checking the number of threads inside the script works as well.

Please review the readme and test whether the package works for you.

hildebrandecon · 2022-01-18T21:54:33Z

@tobiasraabe tests run smoothly on my machine & I think the readme is in line with all changes. Thanks a lot!

hmgaudecker

Looks great, just a couple of comments on documenting things and a suggestion for another trivial test.

README.rst

src/pytask_julia/collect.py

hmgaudecker · 2022-01-19T07:25:59Z

tests/test_execute.py

 def test_check_passing_cmd_line_options(tmp_path):
    task_source = """
    import pytask



I think this should be doubled up with a second number, different from 4. Just to avoid that whatever machine it is run on, it passes for reasons of the default happening to be 4.

for more information, see https://pre-commit.ci

hildebrandecon · 2022-01-19T07:57:46Z

I extended the cmd line option test as @hmgaudecker has proposed. I tried to do it with parametrize at first but I'm not sure whether one can check the session code separately then.

hildebrandecon added 4 commits January 14, 2022 12:05

Minor fixes, adding ToDo.

2e20e6f

Fixing test_parallel.py

cef6170

Fixing test_parametrize.py

3edca63

Adjusting the ReadMe.

b3cc4ba

hildebrandecon requested review from hmgaudecker and tobiasraabe January 14, 2022 13:55

[pre-commit.ci] auto fixes from pre-commit.com hooks

f2d2654

for more information, see https://pre-commit.ci

Make parallel timing meaningful.

2e8d4ad

hmgaudecker reviewed Jan 14, 2022

View reviewed changes

Half a second was too optimistic for GA machines...

7f09b9b

[WIP] Fixing the pre-commit issues.

5ad6a9d

Cheap way around pre-commit failure.

c545bec

tobiasraabe added 4 commits January 15, 2022 19:34

Last fixes.

c638c0f

Fix tests.

2a99009

Increase sleep in paralell tests.

51edea3

remove julia reference in github actions.

5249296

[WIP] Test for cmd line options.

e2e6677

tobiasraabe reviewed Jan 17, 2022

View reviewed changes

tobiasraabe added 2 commits January 18, 2022 20:05

Implement suggestion and align readme.

b2113aa

Merge branch 'dev_seb' of https://github.com/pytask-dev/pytask-julia …

5940f66

…into dev_seb

tobiasraabe added this to the v0.1.0 milestone Jan 18, 2022

hmgaudecker approved these changes Jan 19, 2022

View reviewed changes

hildebrandecon and others added 2 commits January 19, 2022 08:56

Extending cmd line options test.

7f31d5c

[pre-commit.ci] auto fixes from pre-commit.com hooks

9ce4f30

for more information, see https://pre-commit.ci

tobiasraabe added 2 commits January 19, 2022 19:01

Integrate all review comments.

7d38f89

Fix.

4ff5c48

tobiasraabe merged commit e47e8db into main Jan 19, 2022

tobiasraabe deleted the dev_seb branch January 19, 2022 18:09

Adjusting tests to julia. #2

Adjusting tests to julia. #2

Uh oh!

Conversation

hildebrandecon commented Jan 14, 2022 • edited by tobiasraabe Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hildebrandecon commented Jan 14, 2022

Uh oh!

codecov bot commented Jan 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

hmgaudecker left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hmgaudecker commented Jan 14, 2022

Uh oh!

tobiasraabe commented Jan 14, 2022

Uh oh!

tobiasraabe commented Jan 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hmgaudecker commented Jan 14, 2022

Uh oh!

tobiasraabe commented Jan 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hmgaudecker commented Jan 14, 2022

Uh oh!

hildebrandecon commented Jan 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hmgaudecker commented Jan 15, 2022

Uh oh!

tobiasraabe commented Jan 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hmgaudecker commented Jan 15, 2022

Uh oh!

hildebrandecon commented Jan 17, 2022

Uh oh!

tobiasraabe commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022

Uh oh!

hildebrandecon commented Jan 17, 2022

Uh oh!

hildebrandecon commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022

Uh oh!

tobiasraabe Jan 17, 2022

Choose a reason for hiding this comment

Uh oh!

tobiasraabe commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022

Uh oh!

tobiasraabe commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tobiasraabe commented Jan 17, 2022

Uh oh!

hmgaudecker commented Jan 17, 2022

Uh oh!

hildebrandecon commented Jan 17, 2022

Uh oh!

tobiasraabe commented Jan 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hildebrandecon commented Jan 18, 2022

Uh oh!

hmgaudecker commented Jan 18, 2022

Uh oh!

hildebrandecon commented Jan 14, 2022 •

edited by tobiasraabe

Loading

codecov bot commented Jan 14, 2022 •

edited

Loading

hmgaudecker left a comment •

edited

Loading

tobiasraabe commented Jan 14, 2022 •

edited

Loading

tobiasraabe commented Jan 14, 2022 •

edited

Loading

hildebrandecon commented Jan 14, 2022 •

edited

Loading

tobiasraabe commented Jan 15, 2022 •

edited

Loading

hmgaudecker commented Jan 17, 2022 •

edited

Loading

tobiasraabe commented Jan 18, 2022 •

edited

Loading

hildebrandecon commented Jan 18, 2022 •

edited

Loading