
[WIP] n3fit unit testing #590

Merged
merged 19 commits into from
Nov 7, 2019

Conversation

scarlehoff
Member

@scarlehoff scarlehoff commented Oct 10, 2019

This PR adds test files for n3fit: unit tests wherever possible (mostly for the backend), while for the fit it will rather be a regression test.

I will edit this PR with the things that are tested (so I can copy it verbatim to the documentation at the end).

Unit tests

  • Tests that all backend operations do exactly what you would expect numpy to do (a minimal sketch of this kind of check is shown below).
  • Tests that loss functions are doing what is expected from them.
  • Tests for the DIS and hadronic convolutions.
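As an illustration (not the actual test code), the backend checks boil down to comparisons of this kind; this sketch is written directly against the Keras backend so it is self-contained:

import numpy as np
import tensorflow.keras.backend as K
from numpy.testing import assert_allclose


def test_backend_multiply_matches_numpy():
    """The backend multiplication should reproduce the numpy result
    up to floating point tolerance."""
    a = np.random.rand(4, 3)
    b = np.random.rand(4, 3)
    # Run the operation through the backend and bring the result back to numpy
    backend_result = K.eval(K.constant(a) * K.constant(b))
    assert_allclose(backend_result, a * b, rtol=1e-6)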

@scarlehoff scarlehoff added the n3fit Issues and PRs related to n3fit label Oct 10, 2019
@scarlehoff scarlehoff self-assigned this Oct 10, 2019
@scarlehoff scarlehoff added the good first issue Good for newcomers label Oct 11, 2019
@scarlehoff
Member Author

scarlehoff commented Oct 11, 2019

I think adding tests for the different pieces of n3fit could be a good issue for getting to learn the code. The ones that are already there can be used as a starting point.

The one I will add myself is the integration test at the fit level, as that one is a bit trickier.

@Zaharid
Contributor

Zaharid commented Oct 11, 2019

Can this run on the CI?

@Zaharid
Contributor

Zaharid commented Oct 11, 2019

Also I have to say that a big reason for merging the libnnpdf, nnpdfcpp and validphys repos was not to have to deal with repeated build infrastructure such as conda recipes, test scripts, CI configurations and the like.

That suggests that it wouldn't be the worst idea to move n3fit into the validphys namespace as discussed at some point.

@scarlehoff
Member Author

Oops, forgot. I was very happy because none of the commits failed to pass.

@Zaharid
Contributor

Zaharid commented Oct 11, 2019

Good. This seems to be running something now.

@Zaharid
Contributor

Zaharid commented Oct 11, 2019

Btw, given the embarrassing experience with #587, being able to run the whole new machinery in the CI would be quite useful (although I am not sure it is really possible with apfel).

@scarlehoff
Member Author

Apfel is not possible, but apfel is not part of the whole thing in this case (thank god!)

Contributor

@Zaharid Zaharid left a comment


This looks very good and useful.

@Zaharid
Contributor

Zaharid commented Oct 15, 2019

However, it would be good if the various functions had some docstrings.

@scarlehoff
Member Author

However, it would be good if the various functions had some docstrings.

I've added docstrings to the auxiliary functions.
For the rest I am not sure what the "protocol" is, in the sense that the docstring for a test_function (or test method) is basically "tests function"?
Let me know.

@scarlehoff
Member Author

scarlehoff commented Oct 17, 2019

@Zaharid @scarrazza have a look at the test_fit.py file when you have time; it contains one example of what can be used to test a fit.

At the moment I am considering the .fitinfo file because it contains all the chi2, the arclengths, the epoch at which the fit stopped and whether the positivity passed (adding all other files is trivial ofc, this is just a first prototype).

The pipeline for travis is:

  • create a tmp folder
  • move there a quick runcard and run it
  • check the result hasn't changed with respect to the regression folder

Then we want to have some kind of flag so that if you know something is supposed to change you can regenerate the result or part of it. What I don't know is whether this should be regenerated by Travis (the results depend on many random seeds coming from different places, so I am guessing that even when running under conda, Mac and Linux won't get exactly the same results).

Note: for now it is not running because it won't pass as I am not sure how to tell Travis to create regression data. Maybe it needs to be created, uploaded to the nnpdf server and then downloaded back for the next iterations?
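In rough pseudo-test form, the pipeline above amounts to something like this (the runcard name, the output layout and the n3fit invocation are placeholders rather than the actual test_fit.py):

import pathlib
import shutil
import subprocess as sp

REGRESSION_FOLDER = pathlib.Path(__file__).with_name("regressions")
QUICKCARD = "quickcard.yml"  # placeholder name for a small, fast runcard
REPLICA = 1


def test_performfit(tmp_path):
    """Run a quick fit in a tmp folder and compare the produced .fitinfo
    with the stored regression file."""
    shutil.copy(REGRESSION_FOLDER / QUICKCARD, tmp_path)
    sp.run(f"n3fit {QUICKCARD} {REPLICA}", shell=True, check=True, cwd=tmp_path)
    # The output path below is illustrative only
    new_fitinfo = (tmp_path / "nnfit" / f"replica_{REPLICA}" / "quickcard.fitinfo").read_text()
    old_fitinfo = (REGRESSION_FOLDER / "quickcard.fitinfo").read_text()
    assert new_fitinfo.split() == old_fitinfo.split()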

import logging
import pathlib
import subprocess as sp

log = logging.getLogger(__name__)
REGRESSION_FOLDER = pathlib.Path().absolute() / "regressions"
Contributor


I wonder if we can use importlib.resources instead? Not sure it is good for anything other than being the new recommended way. It does have the theoretical advantage that you can put the package in a zip file and have it working.
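For illustration, the importlib.resources variant could look roughly like this (using the files() API from Python 3.9+ or the importlib_resources backport; the package name n3fit.tests is an assumption):

from importlib.resources import files  # Python 3.9+; older versions can use the importlib_resources backport

# Resolve the regressions folder through the package metadata rather than the
# filesystem path, so it would also work if the package were shipped as a zip.
REGRESSION_FOLDER = files("n3fit.tests") / "regressions"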

@scarlehoff
Member Author

scarlehoff commented Oct 31, 2019

@Zaharid question.
The regression "data" should be generated by Travis and uploaded to the vp server?
If so, how do I do it?

To first approximation I see I can shamelessly do

from validphys.uploadutils import FileUploader
uploader = FileUploader()
uploader.target_dir = 'WEB/thisisatesttodelete/'
uploader.root_url = 'https://vp.nnpdf.science/'
with uploader.upload_context('.'):
    pass

for instance. But is this the way to do it? Want to know for sure before cluttering the server with random tests.

@scarlehoff
Member Author

scarlehoff commented Oct 31, 2019

Want to know for sure before cluttering the server with random tests.

By which I mean before keeping on cluttering the server. I'll remove what I already uploaded.

@scarlehoff
Member Author

scarlehoff commented Oct 31, 2019

I imagine I can also create a fit called regression and upload the full fit and keep overwriting that one. Don't know.
To download things later I was planning on using loader.download_file btw.

@Zaharid
Contributor

Zaharid commented Oct 31, 2019

We keep the existing test regression data in the git repo itself. The idea is that you compute it and you compare with the existing local file anywhere. I'd say that's doable for a fit if you don't look at things like the replica files, and would be preferred because it is simpler.

If not, there is a way to more or less transparently download a validphys output file.

def check_vp_output_file(self, filename, extra_paths=('.',)):

def download_vp_output_file(self, filename, **kwargs):

@scarlehoff
Member Author

We keep the existing test regression data in the git repo itself.

You mean uploaded by the user? The problem is that I don't think the Mac and Linux regression tests, for instance, will produce the same results. Or the user and Travis for that matter, since we are depending on several random seeds.

@Zaharid
Contributor

Zaharid commented Oct 31, 2019

Apparently there is something like that:

https://stackoverflow.com/a/57094699/1007990

While it is a surprisingly difficult problem (see e.g. #77), it seems to me that it is something that should have been considered rather early in the design (seems clear tf is not really for scientists). In any case you cannot claim to have any handling of the random seeds (as per the sphinx docs) if it turns out it's going to lead to different results across the board.
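For reference, the linked answer boils down to pinning down every RNG that feeds into the fit, roughly along these lines (whether that is enough to get identical results across platforms, compilers and TF builds is exactly the open question here):

import os
import random

import numpy as np
import tensorflow as tf


def set_all_seeds(seed=7):
    """Seed every source of randomness that could influence the result.
    Note this alone does not guarantee bitwise-identical results across
    different platforms or TF builds."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # In TF 2 this is tf.random.set_seed; with the v1 compatibility layer
    # the equivalent call is tf.compat.v1.set_random_seed(seed)
    tf.random.set_seed(seed)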

@scarlehoff
Member Author

scarlehoff commented Oct 31, 2019

Can't manage to reproduce results between system 1 (gcc 9, pip-installed tf) and system 2 (gcc 8, conda-installed tf); other than that the two systems should be equivalent.

I'll study a bit more. I don't know whether I cannot use the tf seeding from that stackoverflow thread because I'm letting Keras manage the initialization or because I am using the v1 compatibility features (edit: or because I am doing something wrong, which is ofc entirely possible).

If I don't manage, I'll fall back to making Travis upload the fit.

@scarlehoff
Member Author

scarlehoff commented Nov 4, 2019

The test runs correctly on Linux and it does work on OS X with the KMP_DUPLICATE_LIB_OK flag. I don't know whether the error is on Travis' side or in the run script (I don't have a Mac to test outside of Travis' debug environment).

Asking for community input: should I set the flag so that it works on OS X (the test is single threaded, so the fact that two OMP libs are linked should make no difference), or should we try to find out whether the problem is in the build/run script, so that this test also checks whether the OpenMP installation is correct?

edit: I'll push the change anyway so I can be sure it works without me touching anything...

Contributor

@Zaharid Zaharid left a comment


Other than the environment thing, which I think needs to be changed, this is good to go as far as I am concerned. The other things are minor but of course wouldn't hurt.

import logging
import pathlib

from numpy.testing import assert_almost_equal

log = logging.getLogger(__name__)
REGRESSION_FOLDER = pathlib.Path(__file__).with_name("regressions")
Contributor


I'd prefer using importlib.resources here, but don't mind it particularly.

THRESHOLD = 1e-6

# Helper functions
def generate_input_had(flavs=3, xsize=2, ndata=4, n_combinations=None):
Contributor


This would look really nice with hypothesis.
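For context, the suggestion amounts to letting hypothesis generate the inputs, along these lines (backend_multiply is a toy stand-in for the real backend call, and the array shapes and bounds are made up):

import numpy as np
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays
from numpy.testing import assert_allclose


def backend_multiply(a, b):
    """Toy stand-in for the backend operation being tested."""
    return a * b


@settings(deadline=None, max_examples=20)
@given(
    arrays(np.float64, (4, 3), elements=st.floats(0.0, 1.0)),
    arrays(np.float64, (4, 3), elements=st.floats(0.0, 1.0)),
)
def test_multiply_matches_numpy(a, b):
    # hypothesis drives the input generation instead of fixed random arrays
    assert_allclose(backend_multiply(a, b), a * b)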

Member Author


Just tried it and it takes too long unless I put very strict min/max limits, which sort of defeats the purpose :(

@Zaharid
Contributor

Zaharid commented Nov 6, 2019

Hello, this is @Zaharid's automated QA script. Please note it is highly experimental. I ran pylint on your changes and found some new issues.

On n3fit/src/n3fit/backends/keras_backend/internal_state.py, pylint has reported the following new issues:

  • Line 31: Module 'tensorflow._api.v1.random' has no 'set_seed' member; maybe 'get_seed'?

On n3fit/src/n3fit/fit.py, pylint has reported the following new issues:

  • Line 15: Too many statements (104/50)
  • Line 15: Too many branches (23/12)

On n3fit/src/n3fit/tests/test_fit.py, pylint has reported the following new issues:

  • Line 69: Using subprocess.run without explicitly set check is not recommended.

@Zaharid
Contributor

Zaharid commented Nov 6, 2019

Pylint makes a good point that the test should fail if the process returns a non-zero exit status.

@Zaharid
Contributor

Zaharid commented Nov 7, 2019 via email

@scarlehoff
Member Author

env={'foo':'bar', **os.environ})

So I did understand it correctly. But ok, I'll pass the whole of os.environ to sp. I suppose it is preferable in order to isolate the extra variable.

# The flag KMP_DUPLICATE_LIB_OK is necessary to avoid some errors
# related to the linking of OMP in travis when running under MacOS
sp.run(f"{EXE} {QUICKCARD} {REPLICA}", shell=True)
proc = sp.run(f"{EXE} {QUICKCARD} {REPLICA}", shell=True, env=new_environment, cwd=tmp_path)
assert proc.returncode == 0
Contributor


Just add check=True to subprocess.run.
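i.e., in terms of the snippet above (EXE, QUICKCARD, REPLICA and tmp_path as defined there), something like:

import os
import subprocess as sp

# Inherit the full environment and add only the extra flag needed on MacOS
new_environment = {**os.environ, "KMP_DUPLICATE_LIB_OK": "TRUE"}
proc = sp.run(
    f"{EXE} {QUICKCARD} {REPLICA}",
    shell=True,
    check=True,  # raises CalledProcessError on a non-zero exit status
    env=new_environment,
    cwd=tmp_path,
)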

@Zaharid
Contributor

Zaharid commented Nov 7, 2019 via email

@scarlehoff
Member Author

The recommendation is not to use shell=True

I know, I was trying to see whether it worked in OS X but Travis kicked me out :(

@scarlehoff
Member Author

Ok, ready. I think I did not forget anything?

@Zaharid Zaharid merged commit d35489b into master Nov 7, 2019
@scarrazza scarrazza deleted the n3fit-unitTesting branch April 22, 2020 15:34