
Extend CI #44

Merged
merged 25 commits on Aug 7, 2019
Conversation

@Borda (Member) commented Aug 5, 2019

Extending actual CI and linked fixes/updates, reflecting #43

@codecov-io commented Aug 5, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a79de1e).
The diff coverage is 39%.

@@          Coverage Diff           @@
##             master   #44   +/-   ##
======================================
  Coverage          ?   78%           
======================================
  Files             ?    13           
  Lines             ?   833           
  Branches          ?     0           
======================================
  Hits              ?   652           
  Misses            ?   181           
  Partials          ?     0

@Borda (Member Author) commented Aug 5, 2019

@williamFalcon it seems that your test-tube does not support python3.5, do you want to drop it also for this package? See: https://travis-ci.org/Borda/pytorch-lightning/jobs/568119944

@Borda (Member Author) commented Aug 5, 2019

@williamFalcon pls create your free accounts and update the following badges:

  • Build status
  • codecov
  • CodeFactor

@williamFalcon (Contributor) commented Aug 5, 2019

@Borda the code coverage badge was generated by doing this:
https://github.com/williamFalcon/pytorch-lightning/tree/master/tests#running-coverage

@williamFalcon (Contributor):
@williamFalcon pls create your free accounts and update the following badges:

  • Build status
  • codecov
  • CodeFactor

Ah, was looking for something like codefactor. Good addition.

Wondering why you suggest changing from CircleCI to AppVeyor? What are the advantages?

As mentioned above, I was thinking about auto codecov but I realized it was going to show something super low because a lot of code is GPU specific. Thus, I opted for running codecov and GPU tests on a GPU machine. Sadly, not sure auto-codecov will work for this repo. Any suggestions on bridging that gap? (i.e., the need for GPU use)

@Borda (Member Author) commented Aug 6, 2019

#44 (comment) ok, then the coverage is fair, but it's not very transparent from my point of view... as it is part of the repo, at first glance it looks like just an added illustration (especially when you see 99% there, it looks like an ideal case...) no offence, just trying to help :)
The codecov.io is made exactly for this purpose and you can also push coverage results to codecov... I believe that the use-case is

pip install codecov
codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc

where the hash is a unique token for this/your project...
The advantage is that it interactively visualises the touched lines and gives some statistics

@Borda (Member Author) commented Aug 6, 2019

#44 (comment) a good alternative to Codefactor is Codacy, which has almost the same features...
I have not proposed to change CircleCI (you are not using it yet); the badges listed in #44 (comment) are newly added, so you need to adjust the link to your project account. All the past badges are still there, as you can see at https://github.com/Borda/pytorch-lightning/tree/extend-CI
CircleCI is also a good option but almost the same as Travis, so probably no need for adding CircleCI now... AppVeyor is an alternative which runs on Windows (Travis runs Linux and macOS)
I see your point with missing testing on GPU... Personally, I would use the automatic CPU Codecov even if it gives a lower score, and I will try to have a look if there is a platform which allows GPU for CI testing... maybe have a look at Azure

@Borda (Member Author) commented Aug 6, 2019

maybe Running GPU Executors - CircleCI but it seems to be running on AWS

Adam Hartley (CircleCI), Aug 6, 02:10 PDT

Hi Jirka, thank you for reaching out to us!
GPU instances are not available publically at this time as they are still in the development phase. 
Please stay tuned for news of availability in the future.
Please let us know if there is anything else we can help with and happy building!

Adam, Customer Support Engineer @ CircleCI

@Borda (Member Author) commented Aug 6, 2019

just added also CircleCI for python3.6 and python 3.7... CircleCI

@Borda (Member Author) commented Aug 6, 2019

it seems that some tests are not very suitable for CI, they take too long... https://circleci.com/gh/Borda/pytorch-lightning/16
would it be feasible to make them smaller, e.g. fewer epochs, fewer examples...?

@williamFalcon (Contributor) commented Aug 6, 2019

Great options. Let me address each individually.

Re travis vs circle vs AppVeyor

I mistyped haha I meant advantage over Travis. I agree there's no need for both (i've used both in the past, but i think i picked Travis for this because it could handle long tests and it was free).

Re: Windows:

Didn't officially try to support Windows as I think most people doing AI are using linux/macs (i know, i know, haha...), but in efforts to get this adopted by big older corps, I assume Windows support might be necessary.

So, let's add the windows tests, and modify whatever we need to change in the library to achieve windows compatibility. (I really have no idea if it'll be that much different tbh). Hopefully we get Windows support for free.

Re GPU tests

Thanks for reaching out to CircleCI. I spoke with the PyTorch team here at FB and the general consensus is that there really isn't a free way to run GPU tests. So, the suggestion was to allow people to run them on their own GPU machines (especially devs on the package).

So, let's maybe table this for now until we find a good free solution? Azure would be great if they can support it (give the AI community some of that OpenAI money haha).

Re test length

A place to maybe pick up speed is to not download MNIST for every test (i realized this week that clearing the build folder also removes MNIST). That should provide a big speed-up.

I think all the other tests are only training for 1 epoch and 1/10th of the data. There are 1 or 2 tests (CPU, GPU) respectively which train on more epochs to make sure it can achieve SOTA results on MNIST as a test.
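The MNIST caching idea above can be sketched as a generic download-once helper (a hypothetical illustration, not the repo's actual test code — `cached_fetch` and its arguments are made-up names):

```python
import os


def cached_fetch(path, fetch_fn):
    """Run fetch_fn (e.g. a dataset download) only if `path` is missing,
    so repeated CI runs reuse the cached file instead of re-downloading."""
    if not os.path.exists(path):
        data = fetch_fn()
        with open(path, "wb") as f:
            f.write(data)
    with open(path, "rb") as f:
        return f.read()
```

In practice, pointing the MNIST download root at a directory outside the build folder (so the clean step does not delete it) achieves the same effect without any helper.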

Codecov

I agree it's not optimal and I also hate not having a third-party way to validate that the coverage wasn't faked. I didn't know about the submit option, so why don't we just do that? We can ask devs to run codecov and submit the results with a PR.

I'd like to keep the coverage at 99%+ which means PRs have to be well covered.

Summary

To summarize, I think this is what we've converged on:

  1. Keep Travis.
  2. Add AppVeyor for Windows.
  3. Add codefactor.
  4. Table auto-GPU tests until a free version is available. In the meantime, a team dev with GPUs can run GPU tests before merging a PR. (I can do that in the meantime; it doesn't take long and it's pretty easy for me.) Academics and corporate contributors can do this on their clusters.
  5. Switch to codecov and we'll push outputs of local codecov with:
pip install codecov
codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc 
  6. Cache MNIST during tests.

Anything I'm missing?

Things I owe:

  • create an Appveyor acct.
  • create codefactor acct.
  • create codecov acct.

Anything else?

@Borda (Member Author) commented Aug 6, 2019

what about python3.5? your setup says python_requires=">=3.5" but you are using test-tube which has invalid syntax for py3.5 (formatting outputs using f"some text {variable}..."), so do you want to drop py3.5 support for this project too, or fix it in the other one... the fix is quite simple, see https://github.com/Borda/pytorch-lightning/blob/extend-CI/pytorch_lightning/models/trainer.py#L430

@williamFalcon (Contributor):

what about python3.5, your setup say python_requires=">=3.5" but you are using test-tube which has invalid syntax (formatting outputs using f"some text {variable}...") for py3.5 so you want to drop support also py3.5 also for this project of fixing it in the other one... the fix is quite simple, see https://github.com/Borda/pytorch-lightning/blob/extend-CI/pytorch_lightning/models/trainer.py#L430

Oh yeah... let's make the fix. Seems simple enough. I don't have strong opinions about which python version to support. At a minimum no support for 2. But if you have good reasons for starting at some version let's do that.

But let's get rid of the f-string formatting there, which seems trivial enough
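The py3.5 fix being discussed amounts to swapping f-strings for str.format; a minimal sketch (hypothetical function, not the actual trainer code):

```python
def describe_gpu(available):
    # f"gpu available: {available}" is a SyntaxError on Python 3.5;
    # str.format parses fine on 3.5+ and produces the same output
    return "gpu available: {}".format(available)
```

The change is purely syntactic, so behaviour on py3.6+ is identical.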

@Borda Borda mentioned this pull request Aug 6, 2019
@williamFalcon (Contributor) left a comment


i thought we were cutting out circle-ci

.codecov.yml

  # make DP and DDP mutually exclusive
  # single GPU will also use DP with devices=[0]
- have_gpus = self.data_parallel_device_ids is not None and len(self.data_parallel_device_ids) > 0
- if have_gpus:
+ if self.data_parallel_device_ids:
@williamFalcon (Contributor), Aug 6, 2019:


unfortunately the check needs to be:

self.data_parallel_device_ids is not None and len(self.data_parallel_device_ids) > 0

Case 1: it is None (user didn't pass gpus), so skip the statement
Case 2: user did pass gpus AND it's more than a single GPU, so we do whatever backend the user wants.
Case 3: user passed in a single GPU. In this case, we don't want to do DDP. We want to keep it as DP because DDP won't work well with a single GPU. So the above check leaves the default as 'dp' in this case, which is what we want.

@Borda (Member Author):

I feel lost, what is the case when you want to enter the if block? a non-empty array?

Python 3.6.8 (default, Jan 14 2019, 11:02:34)
>>> var = None
>>> True if var else False
False
>>> var = []
>>> True if var else False
False
>>> var = [2]
>>> True if var else False
True

@williamFalcon (Contributor):

1+ GPUs. But if you don't do the None check before the length check, it'll crash...

You want:

if more_than_1_gpus:
    # enter

But can't do this:

more_than_1_gpus = len(self.data_parallel_device_ids) > 0

because self.data_parallel_device_ids is None, so it'll crash.

Thus you have to check for that first as well (which is the case when no GPU ids are passed)
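The three cases above can be condensed into a small sketch (a hypothetical `resolve_backend` helper for illustration, not the repo's actual trainer code):

```python
def resolve_backend(device_ids, user_backend):
    """Illustrates the review discussion: None-check first, then length."""
    # len(None) would raise TypeError, hence the explicit None check first
    have_gpus = device_ids is not None and len(device_ids) > 0
    if not have_gpus:
        return None            # Case 1: no GPUs requested, skip GPU setup
    if len(device_ids) > 1:
        return user_backend    # Case 2: multi-GPU, honour requested backend
    return "dp"                # Case 3: single GPU, keep DP (DDP not useful)
```

The truthiness shortcut `if self.data_parallel_device_ids:` handles None and `[]` the same way, but loses the explicit single-vs-multi GPU distinction that motivates the longer check.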

@@ -0,0 +1,47 @@
# this file is *not* meant to cover or endorse the use of tox or pytest or testing in general,
@williamFalcon (Contributor):

what is this file for?

@Borda (Member Author), Aug 6, 2019:

it is a configuration for testing, defining test and formatting configurations in one place... see https://tox.readthedocs.io/en/latest

@williamFalcon (Contributor):

Good changes. Added comments inline

@Borda (Member Author) commented Aug 6, 2019

./pytorch_lightning/models/trainer.py:183: [E501] line too long (111 > 100 characters) @williamFalcon 😉

@williamFalcon (Contributor):

there's no pytorch 1.1.0 support for windows... requires custom install

pip3 install https://download.pytorch.org/whl/cu90/torch-1.1.0-cp36-cp36m-win_amd64.whl
pip3 install https://download.pytorch.org/whl/cu90/torchvision-0.3.0-cp36-cp36m-win_amd64.whl

@Borda (Member Author) commented Aug 7, 2019

I know about the missing PyTorch for Win, just didn't have time yet to resolve it, as you mentioned that Win is not the priority :)

@Borda (Member Author) commented Aug 7, 2019

requested help with installing PyTorch...
https://help.appveyor.com/discussions/support/2527-failing-installing-python-wheel-pytorch
@williamFalcon or any idea how to properly install PyTorch on Windows?

@williamFalcon (Contributor):

@Borda let's remove the build failing badge for windows until we have it resolved. i don't want to give the impression that the project is failing at the moment haha.

@williamFalcon (Contributor):

@Borda can't access codecov getting a 504 error on their page...

is this a stable service? (https://codecov.io/login/gh)

@williamFalcon (Contributor):

they're back up

@Borda (Member Author) commented Aug 7, 2019

it seems to be working for me, could you try to reproduce it? (I do not have a sufficiently large GPU)

jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ coverage report -m
Name                                                       Stmts   Miss  Cover   Missing
----------------------------------------------------------------------------------------
pytorch_lightning/__init__.py                                 11      0   100%
pytorch_lightning/callbacks/__init__.py                        2      0   100%
pytorch_lightning/models/__init__.py                           0      0   100%
pytorch_lightning/models/trainer.py                          388     72    81%   24, 167-171, 201-208, 357, 409, 420-433, 444, 472-477, 493-543, 554-569, 572-579, 751, 781, 797-799, 806-808
pytorch_lightning/pt_overrides/__init__.py                     0      0   100%
pytorch_lightning/pt_overrides/override_data_parallel.py      20      2    90%   62-63
pytorch_lightning/root_module/__init__.py                      0      0   100%
pytorch_lightning/root_module/decorators.py                    5      0   100%
pytorch_lightning/root_module/grads.py                        17      1    94%   22
pytorch_lightning/root_module/hooks.py                        11      0   100%
pytorch_lightning/root_module/memory.py                       88      1    99%   42
pytorch_lightning/root_module/model_saving.py                 96      0   100%
pytorch_lightning/root_module/root_module.py                  46      1    98%   106
pytorch_lightning/testing/__init__.py                          0      0   100%
pytorch_lightning/testing/lm_test_module.py                  101     20    80%   100, 109, 130, 134-135, 144-148, 207-209, 242-266
pytorch_lightning/utilities/__init__.py                        0      0   100%
pytorch_lightning/utilities/arg_parse.py                      47     45     4%   12-99
pytorch_lightning/utilities/debugging.py                       1      0   100%
----------------------------------------------------------------------------------------
TOTAL                                                        833    142    83%
jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ coverage xml
jb@PH-NTB-009:~/Dropbox/Workspace/pytorch-lightning$ codecov -t 17327163-8cca-4a5d-86c8-ca5f2ef700bc  -v

      _____          _
     / ____|        | |
    | |     ___   __| | ___  ___ _____   __
    | |    / _ \ / _  |/ _ \/ __/ _ \ \ / /
    | |___| (_) | (_| |  __/ (_| (_) \ V /
     \_____\___/ \____|\___|\___\___/ \_/
                                    v2.0.15

==> Detecting CI provider
  -> Got branch from git/hg
  -> Got sha from git/hg
==> Preparing upload
==> Processing gcov (disable by -X gcov)
    Executing gcov (find /home/jb/Dropbox/Workspace/pytorch-lightning -not -path './bower_components/**' -not -path './node_modules/**' -not -path './vendor/**' -type f -name '*.gcno'  -exec gcov -pb  {} +)
==> Collecting reports
    + /home/jb/Dropbox/Workspace/pytorch-lightning/coverage.xml bytes=34380
==> Uploading
    .url https://codecov.io
    .query yaml=.codecov.yml&token=<secret>&commit=421c4fab7dda431887026166ef767fab4f3174b0&branch=extend-CI&package=py2.0.15
    Pinging Codecov...
    Uploading to S3...
    https://codecov.io/github/Borda/pytorch-lightning/commit/421c4fab7dda431887026166ef767fab4f3174b0

@williamFalcon williamFalcon merged commit 04de151 into Lightning-AI:master Aug 7, 2019
@Borda Borda deleted the extend-CI branch August 7, 2019 13:04
@Borda (Member Author) commented Aug 7, 2019

@williamFalcon it seems that we (me) missed the new badge for the license somewhere in the process, could you please fix it... Thx

luiscape pushed a commit to luiscape/pytorch-lightning that referenced this pull request Jan 17, 2020
* [PYT-210] Update Gallery cards

* Tweak gallery sizing
luiscape pushed a commit to luiscape/pytorch-lightning that referenced this pull request Jan 17, 2020
Update script to load Search functionality
@anthonytec2 anthonytec2 mentioned this pull request Jun 18, 2020