Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lazy PMap reader #917

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

gonzaponte
Copy link
Collaborator

Cities that read PMaps consume a large amount of RAM because all events are kept in memory simultaneously. This PR adds new functions to read the PMaps on an event-by-event basis, in a lazy manner. The old readers have been renamed using the suffix _eager for clarity, while the new ones contain the suffix _lazy. The names of the old readers have been reused in the interfacing functions that allow the user to switch between the lazy and the eager readers. The readers' default to the eager mode for backward compatibility. However, the cities will read them lazily to avoid large memory allocations.

This PR is currently being tested in production mode.

@gonzaponte gonzaponte changed the title Add lazy pmap reader Add lazy PMap reader Nov 6, 2024
@gonzaponte gonzaponte marked this pull request as ready for review November 13, 2024 15:57
Copy link
Collaborator

@jwaiton jwaiton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read through the PR, very interesting work! My comments at the moment are mostly cosmetic/about tests.

I have read all the new lazy functions, but my knowledge of generators is a bit murky so I need to take more time to understand exactly how they work here in the pmap loader.


assert isinstance(lazy, Generator)
element = next(lazy)
assert len(element ) == 5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this spacing after element is the way it is? It doesn't appear to align with anything

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a leftover from a previous version. Will fix.

invisible_cities/io/pmaps_io_test.py Show resolved Hide resolved
assert_PMap_equality(pmap_lazy, pmaps_eager[evt])


def test_load_pmaps_eager_lazy(KrMC_pmaps_filename):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test appears to be ensuring that the content from load_pmaps is as expected (type and length-wise), but the test name is a big vague in that sense. Especially since the previous test (test_load_pmaps_lazy) is almost the same name, but accomplishes a different task. Maybe this should be changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to find a good name. Do you have any suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess test_load_pmaps_returns_correct_format or ...correct_type_and_shape? The lazy and eager are both tested so I don't think they need to be specified distinctly.

@@ -203,7 +233,7 @@ def test_load_pmaps_as_df_without_ipmt(KrMC_pmaps_without_ipmt_filename, KrMC_pm


@given(stg.data())
def test_load_pmaps(output_tmpdir, data):
def test_load_pmaps_eager(output_tmpdir, data):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise in naming convention for this test, its a bit vague since it matches the other related tests in naming convention but differs in content.

It appears to be writing a pmap, then reading it in and comparing it to itself 'pre-writing'. Is that correct? In that case, is there a reason that the pmap is generated here explicitly but in the other tests KrMC_pmaps_filename is used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I didn't read the test, actually, I just renamed it and refactored it to make it work with the new version. But I don't see a reason why it shouldn't follow the same logic as the other ones, so I will change it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that some of other tests are wrong, as they compare the output of load_pmaps_as_df_eager with the output of a fixture, which only reads pmaps using that very function. So it's testing nothing essentially. I will refactor this test to make it simpler (but keep the logic) and fix the other ones.

Verified

This commit was signed with the committer’s verified signature.
... and related tests
Now input file must have the /Run/events node in order to check for
the file integrity. We fix the test by adding this table.
This now includes an explicit pmap coded in a fixture instead of using
the composite implemented in a different file. Hopefully this makes
the test more clear.
This test tested nothing. Now at least it makes sense.
This might come back in the future, but it will have to be refactored
anyway.

Moreover, this test was not doing anything sensible. so it will be removed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants