
Extend CI #44

Merged
merged 25 commits on Aug 7, 2019
Conversation

@Borda (Member) commented Aug 5, 2019

Extending actual CI and linked fixes/updates, reflecting #43

@codecov-io commented Aug 5, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a79de1e).
The diff coverage is 39%.

@@          Coverage Diff           @@
##             master   #44   +/-   ##
======================================
  Coverage          ?   78%           
======================================
  Files             ?    13           
  Lines             ?   833           
  Branches          ?     0           
======================================
  Hits              ?   652           
  Misses            ?   181           
  Partials          ?     0

@Borda (Member Author) commented Aug 5, 2019

@williamFalcon it seems that your test-tube does not support python3.5, do you want to drop it also for this package? See: https://travis-ci.org/Borda/pytorch-lightning/jobs/568119944

@Borda (Member Author) commented Aug 5, 2019

@williamFalcon pls create your free accounts and update the following badges:

  • Build status
  • codecov
  • CodeFactor

@williamFalcon (Contributor) commented Aug 5, 2019

@Borda the code coverage badge was generated by doing this:
https://github.com/williamFalcon/pytorch-lightning/tree/master/tests#running-coverage

@williamFalcon (Contributor):
@williamFalcon pls create your free accounts and update the following badges:

  • Build status
  • codecov
  • CodeFactor

Ah, was looking for something like codefactor. Good addition.

Wondering why you suggest changing from CircleCI to AppVeyor? What are the advantages?

As mentioned above, I was thinking about auto codecov but I realized it was going to show something super low because a lot of code is GPU specific. Thus, I opted for running codecov and GPU tests on a GPU machine. Sadly, not sure auto-codecov will work for this repo. Any suggestions on bridging that gap? (i.e., the need for GPU use)

@Borda (Member Author) commented Aug 6, 2019

#44 (comment) ok, then the coverage is fair, but it's not very transparent from my point of view... as it is part of the repo, at first glance it looks like just an added illustration (especially when you see 99% there, it looks like an ideal case...) no offence, just trying to help :)
The codecov.io is made exactly for this purpose and you can also push coverage results to codecov... I believe that the use-case is

pip install codecov
codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc

where the hash is a unique token for this/your project...
The advantage is that it interactively visualises the touched lines and gives some statistics

@Borda (Member Author) commented Aug 6, 2019

#44 (comment) a good alternative to Codefactor is Codacy, which has almost the same features...
I have not proposed to change CircleCI (you are not using it yet); the badges listed in #44 (comment) are newly added, so you need to adjust the link to your project account. All the past badges are still there, as you can see at https://github.com/Borda/pytorch-lightning/tree/extend-CI
CircleCI is also a good option but almost the same as Travis, so probably no need for adding CircleCI now... AppVeyor is an alternative which runs on Windows (Travis runs Linux and macOS)
I see your point with missing testing on GPU... Personally, I would use the automatic CPU Codecov even if it gives a lower score, and I will try to have a look if there is a platform which allows GPU for CI testing... maybe have a look at Azure

@Borda (Member Author) commented Aug 6, 2019

maybe Running GPU Executors - CircleCI but it seems to be running on AWS

Adam Hartley (CircleCI), Aug 6, 02:10 PDT

Hi Jirka, thank you for reaching out to us!
GPU instances are not available publically at this time as they are still in the development phase. 
Please stay tuned for news of availability in the future.
Please let us know if there is anything else we can help with and happy building!

Adam, Customer Support Engineer @ CircleCI

@Borda (Member Author) commented Aug 6, 2019

just added also CircleCI for python3.6 and python 3.7... CircleCI

@Borda (Member Author) commented Aug 6, 2019

it seems that some tests are not very suitable for CI, they take too long... https://circleci.com/gh/Borda/pytorch-lightning/16
would it be feasible to make them smaller, e.g. fewer epochs, fewer examples...?

@williamFalcon (Contributor) commented Aug 6, 2019

Great options. Let me address each individually.

Re travis vs circle vs AppVeyor

I mistyped haha I meant advantage over Travis. I agree there's no need for both (i've used both in the past, but i think i picked Travis for this because it could handle long tests and it was free).

Re: Windows:

Didn't officially try to support Windows as I think most people doing AI are using linux/macs (i know, i know, haha...), but in efforts to get this adopted by big older corps, I assume Windows support might be necessary.

So, let's add the windows tests, and modify whatever we need to change in the library to achieve windows compatibility. (I really have no idea if it'll be that much different tbh). Hopefully we get Windows support for free.

Re GPU tests

Thanks for reaching out to CircleCI. I spoke with the PyTorch team here at FB and the general consensus is that there really isn't a free way to run GPU tests. So, the suggestion was to allow people to run them on their own GPU machines (especially devs on the package).

So, let's maybe table this for now until we find a good free solution? Azure would be great if they can support it (give the AI community some of that OpenAI money haha).

Re test length

A place to maybe pick up speed is to not download MNIST for every test (i realized this week that clearing the build folder also removes MNIST). That should provide a big speed-up.

I think all the other tests are only training for 1 epoch and 1/10th of the data. There are 1 or 2 tests (CPU, GPU) respectively which train on more epochs to make sure it can achieve SOTA results on MNIST as a test.
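The MNIST caching idea above can be sketched as a generic download-once helper (a hypothetical illustration, not the repo's actual test code — `cached_fetch` and its arguments are made-up names):

```python
import os


def cached_fetch(path, fetch_fn):
    """Run fetch_fn (e.g. a dataset download) only if `path` is missing,
    so repeated CI runs reuse the cached file instead of re-downloading."""
    if not os.path.exists(path):
        data = fetch_fn()
        with open(path, "wb") as f:
            f.write(data)
    with open(path, "rb") as f:
        return f.read()
```

In practice, pointing the MNIST download root at a directory outside the build folder (so the clean step does not delete it) achieves the same effect without any helper.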

Codecov

I agree it's not optimal and I also hate not having a third-party way to validate that the coverage wasn't faked. I didn't know about the submit option, so why don't we just do that? We can ask devs to run codecov and submit the results with a PR.

I'd like to keep the coverage at 99%+ which means PRs have to be well covered.

Summary

To summarize, I think this is what we've converged on:

  1. Keep Travis.
  2. Add AppVeyor for Windows.
  3. Add codefactor.
  4. Table auto-GPU tests until a free version is available. In the meantime, a team dev with GPUs can run GPU tests before merging a PR. (I can do that in the meantime; it doesn't take long and it's pretty easy for me.) Academics and corporate contributors can do this on their clusters.
  5. Switch to codecov and we'll push outputs of local codecov with:
pip install codecov
codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc 
  6. Cache MNIST during tests.

Anything I'm missing?

Things I owe:

  • create an Appveyor acct.
  • create codefactor acct.
  • create codecov acct.

Anything else?

@Borda (Member Author) commented Aug 6, 2019

what about python3.5? your setup says python_requires=">=3.5" but you are using test-tube which has invalid syntax for py3.5 (formatting outputs using f"some text {variable}..."), so do you want to drop py3.5 support for this project too, or fix it in the other one... the fix is quite simple, see https://github.com/Borda/pytorch-lightning/blob/extend-CI/pytorch_lightning/models/trainer.py#L430

@williamFalcon (Contributor):

what about python3.5, your setup say python_requires=">=3.5" but you are using test-tube which has invalid syntax (formatting outputs using f"some text {variable}...") for py3.5 so you want to drop support also py3.5 also for this project of fixing it in the other one... the fix is quite simple, see https://github.com/Borda/pytorch-lightning/blob/extend-CI/pytorch_lightning/models/trainer.py#L430

Oh yeah... let's make the fix. Seems simple enough. I don't have strong opinions about which python version to support. At a minimum no support for 2. But if you have good reasons for starting at some version let's do that.

But let's get rid of the f-string formatting there, which seems trivial enough
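The py3.5 fix being discussed amounts to swapping f-strings for str.format; a minimal sketch (hypothetical function, not the actual trainer code):

```python
def describe_gpu(available):
    # f"gpu available: {available}" is a SyntaxError on Python 3.5;
    # str.format parses fine on 3.5+ and produces the same output
    return "gpu available: {}".format(available)
```

The change is purely syntactic, so behaviour on py3.6+ is identical.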

@Borda Borda mentioned this pull request Aug 6, 2019
@williamFalcon (Contributor) left a comment


i thought we were cutting out circle-ci

.codecov.yml

  # make DP and DDP mutually exclusive
  # single GPU will also use DP with devices=[0]
- have_gpus = self.data_parallel_device_ids is not None and len(self.data_parallel_device_ids) > 0
- if have_gpus:
+ if self.data_parallel_device_ids:
@williamFalcon (Contributor), Aug 6, 2019:


unfortunately the check needs to be:

self.data_parallel_device_ids is not None and len(self.data_parallel_device_ids) > 0

Case 1: it is None (user didn't pass gpus), so skip the statement
Case 2: user did pass gpus AND it's more than a single GPU, so we do whatever backend the user wants.
Case 3: user passed in a single GPU. In this case, we don't want to do DDP. We want to keep it as DP because DDP won't work well with a single GPU. So the above check leaves the default as 'dp' in this case, which is what we want.

@Borda (Member Author):

I feel lost, what is the case when you want to enter the if block? a non-empty array?

Python 3.6.8 (default, Jan 14 2019, 11:02:34)
>>> var = None
>>> True if var else False
False
>>> var = []
>>> True if var else False
False
>>> var = [2]
>>> True if var else False
True

@williamFalcon (Contributor):

1+ GPUs. But if you don't do the None check before the length check, it'll crash...

You want:

if more_than_1_gpus:
    # enter

But can't do this:

more_than_1_gpus = len(self.data_parallel_device_ids) > 0

because self.data_parallel_device_ids is None, so it'll crash.

Thus you have to check for that first as well (which is the case when no GPU ids are passed)
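The three cases above can be condensed into a small sketch (a hypothetical `resolve_backend` helper for illustration, not the repo's actual trainer code):

```python
def resolve_backend(device_ids, user_backend):
    """Illustrates the review discussion: None-check first, then length."""
    # len(None) would raise TypeError, hence the explicit None check first
    have_gpus = device_ids is not None and len(device_ids) > 0
    if not have_gpus:
        return None            # Case 1: no GPUs requested, skip GPU setup
    if len(device_ids) > 1:
        return user_backend    # Case 2: multi-GPU, honour requested backend
    return "dp"                # Case 3: single GPU, keep DP (DDP not useful)
```

The truthiness shortcut `if self.data_parallel_device_ids:` handles None and `[]` the same way, but loses the explicit single-vs-multi GPU distinction that motivates the longer check.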

@@ -0,0 +1,47 @@
# this file is *not* meant to cover or endorse the use of tox or pytest or testing in general,
@williamFalcon (Contributor):

what is this file for?

@Borda (Member Author), Aug 6, 2019:

it is a configuration for testing, defining test and formatting configurations in one place... see https://tox.readthedocs.io/en/latest

@williamFalcon (Contributor):

Good changes. Added comments inline

@Borda (Member Author) commented Aug 6, 2019

./pytorch_lightning/models/trainer.py:183: [E501] line too long (111 > 100 characters) @williamFalcon 😉

@williamFalcon (Contributor):

there's no pytorch 1.1.0 support for windows... requires custom install

pip3 install https://download.pytorch.org/whl/cu90/torch-1.1.0-cp36-cp36m-win_amd64.whl
pip3 install https://download.pytorch.org/whl/cu90/torchvision-0.3.0-cp36-cp36m-win_amd64.whl

@Borda (Member Author) commented Aug 7, 2019

I know about the missing PyTorch for Win, just didn't have time yet to resolve it, as you mentioned that Win is not the priority :)

@Borda (Member Author) commented Aug 7, 2019

requested help with installing PyTorch...
https://help.appveyor.com/discussions/support/2527-failing-installing-python-wheel-pytorch
@williamFalcon or any idea how to properly install PyTorch on Windows?

@williamFalcon (Contributor):

@Borda let's remove the build failing badge for windows until we have it resolved. i don't want to give the impression that the project is failing at the moment haha.

@williamFalcon (Contributor):

@Borda can't access codecov getting a 504 error on their page...

is this a stable service? (https://codecov.io/login/gh)

@williamFalcon (Contributor):

they're back up

@Borda (Member Author) commented Aug 7, 2019

it seems to be working for me, could you try to reproduce it? (I do not have a sufficiently large GPU)

jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ coverage report -m
Name                                                       Stmts   Miss  Cover   Missing
----------------------------------------------------------------------------------------
pytorch_lightning/__init__.py                                 11      0   100%
pytorch_lightning/callbacks/__init__.py                        2      0   100%
pytorch_lightning/models/__init__.py                           0      0   100%
pytorch_lightning/models/trainer.py                          388     72    81%   24, 167-171, 201-208, 357, 409, 420-433, 444, 472-477, 493-543, 554-569, 572-579, 751, 781, 797-799, 806-808
pytorch_lightning/pt_overrides/__init__.py                     0      0   100%
pytorch_lightning/pt_overrides/override_data_parallel.py      20      2    90%   62-63
pytorch_lightning/root_module/__init__.py                      0      0   100%
pytorch_lightning/root_module/decorators.py                    5      0   100%
pytorch_lightning/root_module/grads.py                        17      1    94%   22
pytorch_lightning/root_module/hooks.py                        11      0   100%
pytorch_lightning/root_module/memory.py                       88      1    99%   42
pytorch_lightning/root_module/model_saving.py                 96      0   100%
pytorch_lightning/root_module/root_module.py                  46      1    98%   106
pytorch_lightning/testing/__init__.py                          0      0   100%
pytorch_lightning/testing/lm_test_module.py                  101     20    80%   100, 109, 130, 134-135, 144-148, 207-209, 242-266
pytorch_lightning/utilities/__init__.py                        0      0   100%
pytorch_lightning/utilities/arg_parse.py                      47     45     4%   12-99
pytorch_lightning/utilities/debugging.py                       1      0   100%
----------------------------------------------------------------------------------------
TOTAL                                                        833    142    83%
jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ coverage xml
jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc  -v

      _____          _
     / ____|        | |
    | |     ___   __| | ___  ___ _____   __
    | |    / _ \ / _  |/ _ \/ __/ _ \ \ / /
    | |___| (_) | (_| |  __/ (_| (_) \ V /
     \_____\___/ \____|\___|\___\___/ \_/
                                    v2.0.15

==> Detecting CI provider
  -> Got branch from git/hg
  -> Got sha from git/hg
==> Preparing upload
==> Processing gcov (disable by -X gcov)
    Executing gcov (find /home/jb/Dropbox/Workspace/pytorch-lightning -not -path './bower_components/**' -not -path './node_modules/**' -not -path './vendor/**' -type f -name '*.gcno'  -exec gcov -pb  {} +)
==> Collecting reports
    + /home/jb/Dropbox/Workspace/pytorch-lightning/coverage.xml bytes=34380
==> Uploading
    .url https://codecov.io
    .query yaml=.codecov.yml&token=<secret>&commit=421c4fab7dda431887026166ef767fab4f3174b0&branch=extend-CI&package=py2.0.15
    Pinging Codecov...
    Uploading to S3...
    https://codecov.io/github/Borda/pytorch-lightning/commit/421c4fab7dda431887026166ef767fab4f3174b0

@williamFalcon williamFalcon merged commit 04de151 into Lightning-AI:master Aug 7, 2019
@Borda Borda deleted the extend-CI branch August 7, 2019 13:04
@Borda (Member Author) commented Aug 7, 2019

@williamFalcon it seems that we (me) missed the new badge for the license somewhere in the process, could you please fix it... Thx

luiscape pushed a commit to luiscape/pytorch-lightning that referenced this pull request Jan 17, 2020
* [PYT-210] Update Gallery cards

* Tweak gallery sizing
luiscape pushed a commit to luiscape/pytorch-lightning that referenced this pull request Jan 17, 2020
Update script to load Search functionality
@anthonytec2 anthonytec2 mentioned this pull request Jun 18, 2020