Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for calmar on number of positive value #324

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

sylvainipp
Copy link
Contributor

@sylvainipp sylvainipp commented Dec 4, 2024

New features

  • Introduce the possibility to use calmar on variables changed with expressions starting with a space, with a target associated, for instance 'wage > 0' to have the number of positive wage in population, or '(wage > 0) * (pension > 0)' to have the number of person with both a positive wage and a positive amount of pension.
    • Update test

CHANGELOG.md Outdated

# 3.0.1 [#322](https://github.com/openfisca/openfisca-survey-manager/pull/322)
* New feature
- Introduce the possibility to use calmar on variables with a suffix "_number", with target the weighted sum of the number of positive value of the variable. Only the variable without "_number" should be in the tax benefit system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't catch this change.
I would use another suffix like _positive.
But the best solution is to use an expression like 'X > 0'. And check the type of the result of the variable or the expression. If it is a boolean result than target on count otherwise on mass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is for instance to calibrate on the number of wages. Do you mean we could use ' > 0' as suffix, or something else ? The problem I was trying to tackle is to get back the original variable to use on it the adaptative_calculate_variable when calibrating on a number of entities with a positive value.

Copy link
Member

@benjello benjello Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are at least two problems.

  • the _number suffix is IMHO inappropriate (at least use _positive_count or _strictly_positive_count)
  • dealing only with strictly positive value could be easily generalized, so it would be nice to have it

You can use any expression like 'wage > 0' as a key for the margin and do the computation (just replace wage with self.simulationadaptative_calculate_variable(wage, period = period) and evaluate. You can then use any suffix you like if it is linked to the expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I understand better. I try to implement it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen your changes. This is better. But can't we deal with more general expressions ?
If you are in a hurry, we can leave it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of expressions do you have in mind ? Here we can already deal with inequalities and arithmetic changes, the only constraint being that it starts with a name of a variable (we can do something like : 'var ** 2 * (var>0)' for example). I will be happy to generalize it, but I don't see in which direction this could be useful. Maybe using several variables at once, but in this case it is more natural to create a new variable in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you have a lot of targets you want to satisfy with as many expressions for example conditionnaly on satisfying some boolean expressions (male average salary, women averay salary, age dependency, etc).
It may be very tiedous to create new variables.
And you can use numexpr to help you with that.
But again, if you do not need it now, it is okay to leave it as is.

Copy link
Contributor Author

@sylvainipp sylvainipp Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to generalize by allowing for several variables of the same entity in the expression. The code doesn't allow text content outside of openfisca variable names (no sum for instance), but I think it is already quite better (one can calibrate on (sex == 1) * salary and (sex == 1) * (salary > 0), even if not directly on np.mean((sex == 1) * salary) or (sex == 1) * salary * weights/ sum(weights * (salary == 1)). Thanks for your advice ! I will still search a bit but I'm not sure I will be able to separate correctly the openfisca variable from code words and to keep a good level of error handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that OK for you @benjello, or is it preferable to allow for text outside variables (for instance by searching explicitely if the string is in the list of the tax and benefit system variables) ? I think it might be error-prone (and make the simulation longer), but if they are some use-case it might be worth it. Thanks again for the improvement you propose !

@sylvainipp sylvainipp requested a review from benjello December 5, 2024 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants