Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add core population #1279

Merged
merged 10 commits into from
Oct 17, 2024
Merged

feat: add core population #1279

merged 10 commits into from
Oct 17, 2024

Conversation

bonjourmauko
Copy link
Member

Depends on #1274
Depended upon by #1238

New features

  • Introduce populations.CorePopulation
    • Allows for testing and better subclassing custom populations

@bonjourmauko bonjourmauko added the kind:feat A feature request, a feature deprecation label Oct 14, 2024
@bonjourmauko bonjourmauko requested review from a team October 14, 2024 17:00
@bonjourmauko bonjourmauko self-assigned this Oct 14, 2024
@bonjourmauko bonjourmauko force-pushed the docs/add-docs-to-experimental branch from 0da8886 to 0e10d12 Compare October 16, 2024 15:26
@bonjourmauko bonjourmauko requested a review from benjello October 16, 2024 15:27
@bonjourmauko bonjourmauko force-pushed the feat/add-core-population branch 2 times, most recently from 1d53eeb to 20887aa Compare October 16, 2024 15:30
entity: t.CoreEntity

#: ???
ids: Sequence[str] = []
Copy link
Member Author

@bonjourmauko bonjourmauko Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjello What are the ids here? household, or Juan?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be the ids of the different households o the different people for the atomic/person entity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a string, like household_1, person_2, and so on. Attributes that only exist in the context of a simulation.

Copy link
Member

@benjello benjello Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say so. Bit O am not the person who refactored that part. @Morendil is your manI

Copy link
Member Author

@bonjourmauko bonjourmauko Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, in a way, although a bit confusing, that there is no data class to represent an invidivual member or entry of a population, as these are finally just the value at index n on the corresponding variable vectors.

So, say we have tax_1 and tax_2, on a SinglePopulation with count = 3 of entity Person, behind the scenes, we should have two Holder for a given period:

holder_tax_1 = numpy.array([100, 200, 300])
holder_tax_2 = numpy.array([300, 200, 100])
ids = ["Mauko", "Mahdi", "Sandra"]

How much of tax_1 have we got for "Mauko"?

tax_1_mauko = holder_tax_1[[ids.index("Mauko")]] # [100]

Now we group "Mauko" and "Mahdi" in a GroupEntity, Foundation.

How much of tax_1 have we got for "OpenFisca Foundation"? Fancy-indexing:

tax_1_foundation = sum(holder_tax_1[[ids.index("Mauko"), ids.index("Mahdi")]]) # [300]

So, this attribute id of each entity entry in a simulation, is sort of ephemeral. There is no class modeling it, and they exist internally only as reference records to get values out of holders.

Is that how it works, @Morendil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjello Why can't we model a population as a structured array or a dataframe? I mean, it looks like it is more or less the way it works now. As long as you have such indices, then you can link columns the way you like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that pandas and polars are great for this kind of use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A challenge there are the periods. It would be like a huge 3D sparse matrix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dataframe per period would be ok.
Or adding the period to the index: every value would be indentified by an index (period, id) for a column variable in the corresponding entity dataframe.

I am concerned about performance, but we may experiment with different backend.

Historically, we stuck to numpy because it was the more stable package at the time. Pandas was in its infancy. Polars didn't exist yet.

Copy link
Member Author

@bonjourmauko bonjourmauko Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example:

import numpy as np

dtype = [
    ("people", "U10"),
    ("variables", [("tax", np.int32), ("pension", bool), ("benefit", np.uint8)]),
]

array = np.array(
    [("juan", (100, True, 1)), ("jean", (200, False, 2)), ("john", (300, True, 3))],
    dtype=dtype,
)

print(array["people"])
# ['juan' 'jean' 'john']

print(array["variables"])
# [(100,  True, 1) (200, False, 2) (300,  True, 3)]

print(array["variables"]["tax"])
# [100 200 300]

print(array[array['people'] == 'juan']['variables']['tax'])
# [100]

# operation only in sub-vector
array["variables"]["tax"] *= 2

# it didn't copy data
assert array["variables"]["tax"].base is not None

Base automatically changed from docs/add-docs-to-experimental to master October 17, 2024 12:31
@bonjourmauko bonjourmauko force-pushed the feat/add-core-population branch from 20887aa to 6b12b80 Compare October 17, 2024 12:37
@coveralls
Copy link

Coverage Status

coverage: 82.273% (+0.006%) from 82.267%
when pulling 6b12b80 on feat/add-core-population
into 877d87f on master.

1 similar comment
@coveralls
Copy link

Coverage Status

coverage: 82.273% (+0.006%) from 82.267%
when pulling 6b12b80 on feat/add-core-population
into 877d87f on master.

@bonjourmauko bonjourmauko merged commit 5a131cf into master Oct 17, 2024
29 checks passed
@bonjourmauko bonjourmauko deleted the feat/add-core-population branch October 17, 2024 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feat A feature request, a feature deprecation
Projects
Development

Successfully merging this pull request may close these issues.

4 participants