Population.freq and Population.size are returning the weighted hh count not person count #89

Open
fredshone opened this issue Jun 18, 2021 · 3 comments
Labels
good first issue Good for newcomers

Comments

@fredshone
Collaborator

No description provided.

@fredshone fredshone added the good first issue Good for newcomers label Jun 18, 2021
@JosePazNoguera JosePazNoguera self-assigned this Oct 26, 2021
@brynpickering
Contributor

To help me understand freq some more (I have already read the relevant section in the docs), I have a few questions:

  1. If we have a population of 100 people distributed across 40 households, would we generally set up a model such that sum(hh.freq for hh in population.households.values()) = 40 and sum(person.freq for hh in population.households.values() for person in hh.people.values()) = 100?

  2. If we have two Person objects in one household, one with a freq of 2 and another with a freq of 5, does that imply 7 people live in this household type, represented by two agents?

  3. If that household then has a freq of 10, does that imply 70 people of our population are accounted for by that household?

Finally, why is it freq? Why not weight or size? (I can understand avoiding the latter, since size already has a meaning for objects in Python.)
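The two sums in question 1 can be written out against a toy model. This is a minimal sketch: the `Household` and `Person` dataclasses are hypothetical stand-ins that only mimic the `population.households` and `hh.people` dictionaries used in the question, not pam's actual classes. It also shows the distinction in the issue title: summing household freqs gives a weighted household count, while summing person freqs gives a weighted person count.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    freq: float


@dataclass
class Household:
    freq: float
    # Hypothetical stand-in for pam's hh.people mapping (person id -> Person).
    people: dict = field(default_factory=dict)


def weighted_household_count(households: dict) -> float:
    """Sum of household freqs: the quantity question 1 expects to be 40."""
    return sum(hh.freq for hh in households.values())


def weighted_person_count(households: dict) -> float:
    """Sum of person freqs: the quantity question 1 expects to be 100."""
    return sum(
        person.freq
        for hh in households.values()
        for person in hh.people.values()
    )


# Toy data: 40 real households containing 100 real people, represented by
# 2 household agents. Each household agent stands for 20 real households
# (20 two-person households plus 20 three-person households).
households = {
    "hh_a": Household(freq=20, people={"p1": Person(20), "p2": Person(20)}),
    "hh_b": Household(freq=20, people={"p3": Person(20), "p4": Person(20), "p5": Person(20)}),
}
print(weighted_household_count(households))  # 40
print(weighted_person_count(households))     # 100
```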

@fredshone
Collaborator Author

  1. If we have a population of 100 people distributed across 40 households, would we generally set up a model such that sum(hh.freq for hh in population.households.values()) = 40 and sum(person.freq for hh in population.households.values() for person in hh.people.values()) = 100?

Nice questions. Freq (or weight) is intended to represent how many of a given entity would be found in a (representative) population. It is used to sample individual agents in a representative way.

Back to basics: in pam we represent a real population of size N with a subset of agents. The freq of all hhs should add up to N, the freq of all persons should add up to N, and the freq of all trips should add up to N.

BUT we have historically found trip, person and household "weights" to be inconsistent in input data sets: within households, between a hh and the persons within it, within a person's trips, and so on. This is because we were using data intended for use in (for example) a trip-based model, where there was no requirement for consistency at higher levels.
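To make this kind of inconsistency concrete, the sketch below flags households whose person or trip weights disagree with the household weight. It assumes, as a simplification, that "consistent" means every person and trip shares its household's freq (as in a uniform MATSim-style sample); the `Household`/`Person` classes, the `trip_freqs` field and the `inconsistent_households` helper are all hypothetical, not part of pam.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Person:
    freq: float
    # Hypothetical: one freq per trip made by this person.
    trip_freqs: list = field(default_factory=list)


@dataclass
class Household:
    freq: float
    people: dict = field(default_factory=dict)


def inconsistent_households(households: dict, rel_tol: float = 1e-9) -> list:
    """Return ids of households whose person or trip freqs differ from the hh freq."""
    flagged = []
    for hid, hh in households.items():
        member_freqs = []
        for person in hh.people.values():
            member_freqs.append(person.freq)
            member_freqs.extend(person.trip_freqs)
        if any(not math.isclose(f, hh.freq, rel_tol=rel_tol) for f in member_freqs):
            flagged.append(hid)
    return flagged


households = {
    # Consistent: the person and all their trips carry the household weight.
    "hh_ok": Household(10, {"p1": Person(10, [10, 10])}),
    # Inconsistent: person and trip weights differ from the household weight.
    "hh_bad": Household(10, {"p2": Person(12, [8.5, 10])}),
}
print(inconsistent_households(households))  # ['hh_bad']
```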

A user might fix these inconsistencies themselves before sampling a MATSim population (all agents in MATSim have the same weight*). More commonly now, weights are calculated at household level by our own population synthesis process, so the weighting should be consistent. Or, more simply, weights are only specified at hh level.
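When weights are only specified at household level, one simple way to make the person level consistent is to copy each household's freq down to its members. This is a minimal sketch under that assumption; the classes and the `propagate_household_freq` helper are hypothetical and do not reflect how pam stores or resolves weights.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    freq: Optional[float] = None


@dataclass
class Household:
    freq: float
    people: dict = field(default_factory=dict)


def propagate_household_freq(households: dict) -> None:
    """Overwrite every person's freq with their household's freq, in place."""
    for hh in households.values():
        for person in hh.people.values():
            person.freq = hh.freq


households = {
    # p1 has no weight; p2 has an inconsistent one. Both get the hh weight.
    "hh_1": Household(10, {"p1": Person(), "p2": Person(7)}),
    "hh_2": Household(4, {"p3": Person()}),
}
propagate_household_freq(households)
print([p.freq for hh in households.values() for p in hh.people.values()])  # [10, 10, 4]
```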

The assumption is that any weighted sampler will use the weight of whatever it samples, so a "hh sampler" would use the household weighting. There are mechanics in pam that use a weighted average of person weights for a hh if the hh has no weight. This was convenient in the past but is perhaps now redundant. I would be open to removing it.
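A "hh sampler" in this sense can be sketched with Python's `random.choices`, which draws with replacement in proportion to the given weights. The fallback here uses a plain mean of person freqs when a household has no freq; that is an assumption for illustration, and pam's actual fallback may weight the average differently. `Household`, `Person`, `household_weight` and `sample_households` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    freq: float


@dataclass
class Household:
    freq: Optional[float] = None
    people: dict = field(default_factory=dict)


def household_weight(hh: Household) -> float:
    """Use the household freq if set, else fall back to the mean person freq."""
    if hh.freq is not None:
        return hh.freq
    # Assumed fallback: unweighted mean of person freqs (fails on an empty household).
    freqs = [p.freq for p in hh.people.values()]
    return sum(freqs) / len(freqs)


def sample_households(households: dict, k: int, seed: Optional[int] = None) -> list:
    """Draw k household ids with replacement, proportional to household weight."""
    rng = random.Random(seed)
    ids = list(households)
    weights = [household_weight(households[i]) for i in ids]
    return rng.choices(ids, weights=weights, k=k)


households = {
    "hh_1": Household(freq=30, people={"p1": Person(30)}),
    "hh_2": Household(freq=None, people={"p2": Person(5), "p3": Person(15)}),
}
print(household_weight(households["hh_2"]))  # 10.0
sample = sample_households(households, k=5, seed=42)
```

Sampling with replacement matches the idea that a heavily weighted household agent may be drawn several times; sampling without replacement would need a different routine.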

I am happy to call it either frequency or weight. We made heavy use of "observed frequency based sampling" in the past, hence freq. I think size would be a bit misleading (for a hh I would expect it to be the number of occupants).

  * Earlier I said "all agents in MATSim have the same weight". For example, in a 10% sample of the population, all agents would have a weight of 10. This could be relaxed in future if there were a good reason, but I'm not sure what that would be.
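The arithmetic behind the footnote is that a uniform sample fraction f gives every agent the same expansion weight 1/f, so a 10% sample yields a weight of 10 and the weights sum back to the real population size. A minimal sketch; the population of 1000 and the `uniform_sample_with_weights` helper are invented for illustration.

```python
import random


def uniform_sample_with_weights(person_ids: list, fraction: float, seed: int = 0) -> dict:
    """Sample a fixed fraction of people and give each the same weight 1/fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = round(len(person_ids) * fraction)
    weight = 1 / fraction
    return {pid: weight for pid in rng.sample(person_ids, k)}


real_population = [f"person_{i}" for i in range(1000)]
sampled = uniform_sample_with_weights(real_population, fraction=0.1)
print(len(sampled))           # 100
print(sum(sampled.values()))  # 1000.0
```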

@fredshone
Collaborator Author

It seems .size should be reserved for "unweighted" counts and either .freq or .weight for "weighted" ones?

So I propose that:

  • Population.size be changed to return the count of hhs (rather than a weighted count)
  • .freq be kept as the "weighted" sum
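The proposal can be sketched with a toy `Population` where `size` is a plain (unweighted) household count and `freq` is the weighted sum. This is a hypothetical illustration of the proposed semantics, not pam's implementation; following the issue title, it takes `freq` as the sum of household freqs.

```python
from dataclasses import dataclass, field


@dataclass
class Household:
    freq: float


@dataclass
class Population:
    households: dict = field(default_factory=dict)

    @property
    def size(self) -> int:
        """Proposed: unweighted count of household agents."""
        return len(self.households)

    @property
    def freq(self) -> float:
        """Weighted sum over households (the behaviour the issue title describes)."""
        return sum(hh.freq for hh in self.households.values())


population = Population({"hh_1": Household(10), "hh_2": Household(10), "hh_3": Household(5)})
print(population.size)  # 3
print(population.freq)  # 25
```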

I would also be open to being more explicit about what is being counted, e.g.:

  • add Population.persons_freq()
  • add Population.hhs_freq()
  • add Household.persons_freq()
  • remove Population.freq
  • remove Household.freq()
  • remove Leg.freq
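A sketch of the more explicit naming follows. The method names come from the list above, but their behaviour is assumed: `hhs_freq` sums household freqs, and `persons_freq` sums person freqs (per household, or across the whole population). The classes are hypothetical stand-ins, not pam's.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    freq: float


@dataclass
class Household:
    freq: float
    people: dict = field(default_factory=dict)

    def persons_freq(self) -> float:
        """Weighted person count for this household."""
        return sum(p.freq for p in self.people.values())


@dataclass
class Population:
    households: dict = field(default_factory=dict)

    def hhs_freq(self) -> float:
        """Weighted household count."""
        return sum(hh.freq for hh in self.households.values())

    def persons_freq(self) -> float:
        """Weighted person count across all households."""
        return sum(hh.persons_freq() for hh in self.households.values())


population = Population({
    "hh_1": Household(10, {"p1": Person(10), "p2": Person(10)}),
    "hh_2": Household(5, {"p3": Person(5)}),
})
print(population.hhs_freq())      # 15
print(population.persons_freq())  # 25
```

Separate method names make the unit of each count explicit, which is exactly the ambiguity this issue reports for Population.freq.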
