Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements, and associated API changes #138

Merged
merged 58 commits into from
May 10, 2021

Conversation

davidhassell
Copy link
Contributor

Fixes #130
Fixes #132
Fixes #137

@davidhassell davidhassell marked this pull request as draft April 23, 2021 07:25
@davidhassell davidhassell marked this pull request as ready for review April 23, 2021 07:25
@davidhassell davidhassell marked this pull request as draft April 23, 2021 07:35
@davidhassell davidhassell marked this pull request as ready for review April 23, 2021 07:36
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! Users should be very pleased with the performance gains. Great work.

I'm happy with the API changes proposed in the assocaited issues and satisfied that they are implemented successfully in this PR.

Regarding the code, I have made a few minor comments as provided in-line.

For your reference, below I have outlined the results of some speed tests I conducted to gauge the performance gains. All tests go faster, though there wasn't nearly as much of a relative speed up as you have reported, especially in the latter case. Do you know why that might be?

Observed performance improvements

Test suite

As a quick indication for the speed, as reported in the pytest message "Ran tests in s" rather than by using a time command), of running the full test suite in my local environment, across five runs on each branch I saw:

  • current master: "Ran 171 tests in ..." between 30.2 and 32.3 s;
  • this branch: "Ran 195 tests in ..." between 27.7 and 29.6 s.

So the suite is running faster despite having more tests to get through, which is great.

Snippet from (end of) #130 (comment)

Running that snippet I see:

  • current master: ~0.021 (0.020565794229769382)
  • this branch: ~0.017 (0.016937515400059056)
    which is faster after this PR but not by nearly as much as suggested when you did the same test?

cfdm/constructs.py Outdated Show resolved Hide resolved
cfdm/constructs.py Outdated Show resolved Hide resolved
cfdm/constructs.py Outdated Show resolved Hide resolved
Copy link
Member

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just registering a few more minor comments to complete my review started above. Thanks.

cfdm/mixin/container.py Outdated Show resolved Hide resolved
* If ignore_type=False and the LHS operand is not of the same type, or
a squblcass of, the RHS operand then return False
* If ignore_type=False and the LHS operand is not of the same
type, or a squblcass of, the RHS operand then return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type, or a squblcass of, the RHS operand then return False
type, or a subclass of, the RHS operand then return False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in next commit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, it looks like! We can always sort it later for the pre-release checks, though, if you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - I see! How about now?

cfdm/mixin/fielddomain.py Outdated Show resolved Hide resolved
cfdm/mixin/fielddomain.py Outdated Show resolved Hide resolved
cfdm/mixin/fielddomain.py Show resolved Hide resolved
cfdm/mixin/fielddomain.py Outdated Show resolved Hide resolved
@davidhassell
Copy link
Contributor Author

davidhassell commented May 10, 2021

Re. Snippet from (end of) #130 (comment)

I still get a factor of 10 speed up:

In [1]: import cfdm, timeit

In [2]: f = cfdm.example_field(1)

In [3]: cfdm.write(f, 'tmp.nc')

In [4]: assert cfdm.__version__ == '1.8.9.0'

In [5]: sum(timeit.repeat("cfdm.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
Out[5]: 0.02655579654999656
In [1]: import cfdm, timeit

In [2]: assert cfdm.__version__ == '1.8.8.0'

In [3]: sum(timeit.repeat("cfdm.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
Out[3]: 0.2563364723699988

???

@sadielbartholomew
Copy link
Member

I still get a factor of 10 speed up

🤔 Ah, wait, sorry @davidhassell I was about to ask which master state you are using to compare to and but I see from the versioning outputs it is instead probably the tagged 1.8.8.0 release, in which case that could be the root of it because I was comparing against the current master. Still, I think my test is the fairer one for this PR, as testing against 1.8.8.0, though you acknowledge that it I think:

Note that much of the improvement comes from unrelated changes (such as removing unnecessary repr calls and unnecessary deep copies).

My bad, let me re-do the test comparing against 1.8.8.0, though it is noteworthy that we're only seeing factor ~1.25 speed up from this PR, at least in my environment. Still, that's good in itself!

@sadielbartholomew
Copy link
Member

sadielbartholomew commented May 10, 2021

OK when attempting comparison with the v1.8.8.0 release branch, I keep seeing a seg fault ☹️, even if I reduce the repeat parameter to 2 and also 1!:

>>> import cfdm, timeit
>>> f = cfdm.example_field(1)
>>> cfdm.write(f, 'tmp.nc')
>>> assert cfdm.__version__ == '1.8.8.0'
>>> sum(timeit.repeat("cfdm.read('tmp.nc')", globals=globals(), repeat=100, number=1))/100
Segmentation fault (core dumped)

which isn't useful but potentially illuminating in the seg faulting saga. I'm not sure I've seen one outside of the test suite before. Luckily I didn't get any such seg faults on the current master and/or this branch, so that could imply something has been fixed in a dependency else in our library...

Either way, because of that I can't do the comparison test with 1.8.8.0. Let me try an earlier version, one moment...

@sadielbartholomew
Copy link
Member

sadielbartholomew commented May 10, 2021

Every release branch I try below 1.8.8.0 is seg faulting so let's assume there is some issue with my environment there and that I can't do the comparison test. I'm happy to take your factor of 10 comparison from v1.8.8.0 to master after this is merged as representative.

@sadielbartholomew
Copy link
Member

All feedback except for one typo (which can be fixed later if you like) addressed well now in the feedback response commit, thanks. So I think we are ready to merge?

@davidhassell
Copy link
Contributor Author

Great - thanks @sadielbartholomew, as ever, for the careful review. I agree that my various PRs relating to this are not that clean (sorry!), but I think we've got there.

I'll merge when I've heard back from you on that typo (0a60a47), and done a final run of the unit tests (and checked that the upstream cf-python tests still pass).

I'm looking forward to reviewing the append mode PR later.

@sadielbartholomew
Copy link
Member

All good, thanks @davidhassell. Please merge away!

@davidhassell
Copy link
Contributor Author

Great - will do!

@davidhassell davidhassell merged commit 4120dcf into NCAS-CMS:master May 10, 2021
@davidhassell davidhassell changed the title Performance improvement, and associate API changes Performance improvements, and associated API changes May 10, 2021
@davidhassell davidhassell added this to the 1.8.9.0 milestone May 10, 2021
@davidhassell davidhassell deleted the performance2 branch November 14, 2022 09:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants