Parse CSV output for all parameter values read outside read parameters method #1541

Merged
merged 7 commits into from
Dec 11, 2024
37 changes: 37 additions & 0 deletions src/tlo/util.py
@@ -531,3 +531,40 @@ def clean_dataframe(dataframes_dict: dict[str, DataFrame]) -> None:
    # return a dictionary if return_dict flag is set to True else return a dataframe
    return all_data if return_dict else next(iter(all_data.values()))


def parse_csv_values_for_columns_with_mixed_datatypes(value: Any):
    """Parse a value read from a CSV column with mixed data types into its best possible data type.

    :py:func:`pandas.read_csv` handles columns with mixed data types by defaulting to the ``object`` data type,
    which often results in values being interpreted as strings. In TLO this most commonly happens when reading
    parameters. It is not a problem when parameters are read in the ``read_parameters`` method using
    ``load_parameters_from_dataframe``, because parameter values are then mapped to their defined data types.

    Problems arise when the output of the CSV files is used directly, as it is in a few files in TLO. This function
    provides a fix for those places by parsing each value to the first type that fits: bool, int, float, list, or
    otherwise the whitespace-stripped string.

    :param value: mixed datatype column value
    :return: the parsed value, or ``value`` unchanged if it is not a string
    """
    # Non-string values have already been given a type by pandas, so return them unchanged
    if not isinstance(value, str):
        return value

    value = value.strip()  # Remove leading/trailing whitespace
    # Catch booleans early: int() and float() reject 'true'/'false', and the list check below would otherwise
    # leave them as strings
    if value.lower() in ['true', 'false']:
        return value.lower() == 'true'

    try:
        return int(value)  # try converting the value to an integer; raises ValueError otherwise
    except ValueError:
        try:
            return float(value)  # try converting the value to a float; raises ValueError otherwise
        except ValueError:
            # Check if it's a list using `ast.literal_eval`
            try:
                parsed = ast.literal_eval(value)
                if isinstance(parsed, list):
                    return parsed
            except (ValueError, SyntaxError):
                pass
            return value  # Return as a string if no other type fits
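
To make the parsing order concrete, here is a minimal, standard-library-only sketch of the same sequence of checks (bool, then int, then float, then list, else string). The name `parse_mixed_value` and the sample inputs are hypothetical stand-ins chosen for illustration; this is not the code merged in this PR, only a compact restatement of its logic.

```python
import ast
from typing import Any


def parse_mixed_value(value: Any) -> Any:
    """Hypothetical stand-in that mirrors the parsing order used in the diff above."""
    # Values that are not strings already carry a type, so leave them alone
    if not isinstance(value, str):
        return value
    text = value.strip()
    # Booleans first: neither int() nor float() accepts 'true'/'false'
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    # Try the numeric types from most to least specific
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    # Finally accept list literals such as "[1, 2, 3]"
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text
    return parsed if isinstance(parsed, list) else text


# Sample raw cell values, as they might look in an object-typed column (assumed data)
raw_values = [" 12 ", "0.75", "TRUE", "[1, 2, 3]", "rural", 3.5]
parsed_values = [parse_mixed_value(v) for v in raw_values]
print(parsed_values)  # [12, 0.75, True, [1, 2, 3], 'rural', 3.5]
```

On a pandas column, a function like this could presumably be applied element-wise with `Series.map`; whether that is how the call sites in TLO use it is not shown in this diff.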