Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LinearModel: precise output type of predict() fnc #1479

Merged
merged 5 commits into from
Dec 13, 2024

Conversation

EvaJanouskova
Copy link
Collaborator

@EvaJanouskova EvaJanouskova commented Oct 7, 2024

I'm getting an error, because the predict() fnc does in some cases return numpy.bool instead of pd.Series.

(I've added assert in here to point it out.) Could we change the function to return pd.Series in all cases as declared when the function defined?

@tbhallett
Copy link
Collaborator

From the doctring, I think you can guanratee return of series by setting 'squeeze_single_row_output' to False.
If this doesn't solve the question, then please raise an issue with a demonstration of what you're doing and the behaviour you expect.
Thanks

@matt-graham
Copy link
Collaborator

Hi @EvaJanouskova - as @tbhallett said I think the squeeze_single_row_output argument here should allow you to override the default behaviour of the predict method to consistently give a pandas.Series output. As various parts of the code rely on the existing behaviour of treating single row outputs differently from multi row outputs, globally changing to always return a pandas.Series would be quite disruptive.

@EvaJanouskova
Copy link
Collaborator Author

Thank you, @tbhallett and @matt-graham, it is working.

Still shouldn't the declaration reflect that it may sometimes return different type? Smt like -> Unit[pd.Series, np.bool]?

@EvaJanouskova
Copy link
Collaborator Author

EvaJanouskova commented Oct 8, 2024

And if rng attribute not set to a NumPy RandomState instance, it will also return model output directly. So I would say -> Unit[pd.Series, np.bool, float]:?

@matt-graham
Copy link
Collaborator

@EvaJanouskova you're right that the current type annotation is a bit misleading (though this won't have any functional effect) - I think pd.Series | np.bool_ (or equivalently Union[pd.Series, np.bool_] would be more accurate. I don't think it's possible to get out a scalar float as the squeeze_single_row_output=True argument only has an effect if rng is not None in which case a boolean outcome is returned.

@EvaJanouskova
Copy link
Collaborator Author

@matt-graham, What I meant is that it returns float if rng is None. Isn't it so?

@matt-graham
Copy link
Collaborator

@matt-graham, What I meant is that it returns float if rng is None. Isn't it so?

If rng is None then the method will return a pandas.Series object irrespective of the value of squeeze_single_row_output:

TLOmodel/src/tlo/lm.py

Lines 454 to 463 in 8d0cfee

if rng:
outcome = rng.random_sample(len(result)) < result
# pop the boolean out of the series if we have a single row,
# otherwise return the series
if len(outcome) == 1 and squeeze_single_row_output:
return outcome.iloc[0]
else:
return outcome
else:
return result

The dtype of the panda.Series object will be a floating point type in this case but the data will be wrapped in a series even if there is only one value

@EvaJanouskova
Copy link
Collaborator Author

what about in this case:

TLOmodel/src/tlo/lm.py

Lines 439 to 442 in 8d0cfee

# Ensure result of floating point type even if all predictor coefficients
# are integer but intercept is floating point
if isinstance(self.intercept, float) and result.dtype == int:
result = result.astype(float)

@matt-graham
Copy link
Collaborator

what about in this case:

TLOmodel/src/tlo/lm.py

Lines 439 to 442 in 8d0cfee

# Ensure result of floating point type even if all predictor coefficients
# are integer but intercept is floating point
if isinstance(self.intercept, float) and result.dtype == int:
result = result.astype(float)

The result object there is still a pandas.Series - the Series.astype method returns another series with the dtype changed to the specified datatype (or potentially the same series if the specified datatype is the same as what is currently set to).

@EvaJanouskova
Copy link
Collaborator Author

Ah, yeah. I can see it now, thank you @matt-graham.

@EvaJanouskova EvaJanouskova changed the title LinearModel: predict() does not always return pd.Series LinearModel: precise output type of predict() fnc Oct 8, 2024
@tamuri tamuri merged commit 1a36962 into master Dec 13, 2024
62 checks passed
@tamuri tamuri deleted the EvaJ/master/lm_predict branch December 13, 2024 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants