[Ready for review] use gfile to support remote directories #2164

f4hy · 2020-06-12T20:46:56Z

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?
If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

f4hy · 2020-06-13T21:03:21Z

Thanks @samj1912, fixed the one missing call. I also cleaned up how this was being called and fixed up the tests. The tests were using the tmpdir pytest fixture, which gives a py.path.local object, whereas the contract in many places is type annotated as str or Optional[str]. So I felt the best choice was to update the tests. However, it might be a better solution to just make the contracts accept some sort of path-like object.

f4hy · 2020-06-13T21:05:35Z

Rebased since the requirements paths changed on master.

f4hy · 2020-06-13T21:07:25Z

I will leave this a draft PR in case a different solution is suggested for issue #2161

f4hy · 2020-06-14T04:40:41Z

Hmm, ok. The tests here are now failing on just the 'minimal' builds, while they pass on the latest. A better approach here might be to wrap all the IO in something that can detect what is installed: when running on the minimum set of dependencies, only support local disk writes, and if more/later deps are installed, support cloud file paths.
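
A minimal sketch of that wrapping idea, not the code in this PR. The names `open_file` and `is_remote_path` are hypothetical, and it assumes TensorBoard exposes a gfile-compatible API at `tensorboard.compat.tf.io.gfile`. Local paths always go through the built-in `open`; remote paths only work when the optional dependency is importable.

```python
import os
import tempfile

try:
    # Assumption: TensorBoard provides a gfile-compatible API at this location.
    from tensorboard.compat import tf
    _gfile = tf.io.gfile
except Exception:  # minimal install: TensorBoard (or a usable gfile) is missing
    _gfile = None


def is_remote_path(path: str) -> bool:
    """Treat anything with a URL-style scheme (s3://, gs://, hdfs://) as remote."""
    return "://" in path


def open_file(path: str, mode: str = "r"):
    """Hypothetical wrapper: built-in open for local paths, gfile for remote ones."""
    if not is_remote_path(path):
        return open(path, mode)
    if _gfile is None:
        raise RuntimeError(
            f"Remote path {path!r} needs an optional dependency that provides gfile."
        )
    return _gfile.GFile(path, mode)


# Usage: local paths behave the same with or without the optional dependency.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hparams.txt")
    with open_file(target, "w") as f:
        f.write("lr=0.01")
    with open_file(target) as f:
        content = f.read()
```

With this shape the 'minimal' builds would never touch the optional import path for local files, which is the failure mode described above.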

Borda · 2020-06-24T20:11:11Z

tests/trainer/test_trainer.py

@@ -501,7 +501,7 @@ def test_benchmark_option(tmpdir):

    # fit model
    trainer = Trainer(
-        default_root_dir=tmpdir,
+        default_root_dir=str(tmpdir),


rather do this casting in Trainer

Sounds good. Will make that change.

The reason to do it here is that the Trainer and other places correctly type annotate that it should be a str or Optional[str], and it is the tests which are wrong in passing a path object. @Borda I think if the tests are going to do this, we should update the type annotations to accept a union of string and path. What do you think?

yes, then pls update annotation :]
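
For illustration, a sketch of what accepting a union of string and path could look like. The helper `normalize_root_dir` is hypothetical and is not the Trainer's actual code; it assumes any path object passed in implements `os.PathLike`, as `pathlib.Path` does.

```python
import os
from pathlib import Path
from typing import Optional, Union


def normalize_root_dir(root_dir: Optional[Union[str, Path]]) -> Optional[str]:
    """Hypothetical helper: accept str or path objects, always hand back a str."""
    if root_dir is None:
        return None
    # os.fspath returns str inputs unchanged and converts os.PathLike objects.
    return os.fspath(root_dir)


# Usage: both call styles end up as the same plain string.
from_str = normalize_root_dir(os.path.join("logs", "run1"))
from_path = normalize_root_dir(Path("logs") / "run1")
```

Doing the conversion once at the boundary keeps the rest of the code working with plain strings while the tests can keep passing path objects.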

codecov · 2020-06-24T22:49:36Z

Codecov Report

Merging #2164 into master will decrease coverage by 0%.
The diff coverage is 86%.

@@          Coverage Diff           @@
##           master   #2164   +/-   ##
======================================
- Coverage      90%     90%   -0%     
======================================
  Files          79      79           
  Lines        7236    7275   +39     
======================================
+ Hits         6530    6541   +11     
- Misses        706     734   +28

pep8speaks · 2020-06-25T02:10:23Z

Hello @f4hy! Thanks for updating this PR.

In the file pytorch_lightning/utilities/cloud_io.py:

Line 41:50: E231 missing whitespace after ':'

Comment last updated at 2020-08-08 23:37:43 UTC

williamFalcon · 2020-06-26T13:38:18Z

@f4hy is this ready for 0.8.2 today?

f4hy · 2020-06-26T16:26:38Z

@f4hy is this ready for 0.8.2 today?

Yep! Just removed some extra cruft. If you, @williamFalcon, think this is a good solution, then I think it's ready to go now. I was worried that the tests are not passing on the 'minimal' builds. I tried to fix it, but it looks like it's failing on master for minimal right now.

Borda · 2020-06-26T22:11:04Z

btw, how does it go with #2175?

f4hy · 2020-06-27T01:00:58Z

btw, how does it go with #2175?

So s3 support covers just one cloud hosting provider. My PR should support any of the backends that tensorboard currently supports, which is s3/hdfs/gcs/local, and maybe others have been added. So if #2175 is merged, it won't solve the issue I have in #2161.

If the other PR does something this one does not we can add it here, but my understanding is that this is a superset of that.

mergify · 2020-06-27T01:39:54Z