-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: parse pandas pivot null values #29898
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,7 @@ | |
from io import StringIO | ||
from typing import Any, Optional, TYPE_CHECKING, Union | ||
|
||
import numpy as np | ||
import pandas as pd | ||
from flask_babel import gettext as __ | ||
|
||
|
@@ -83,10 +84,11 @@ def pivot_df( # pylint: disable=too-many-locals, too-many-arguments, too-many-s | |
else: | ||
axis = {"columns": 1, "rows": 0} | ||
|
||
# pivoting with null values will create an empty df | ||
df = df.fillna("SUPERSET_PANDAS_NAN") | ||
|
||
# pivot data; we'll compute totals and subtotals later | ||
if rows or columns: | ||
# pivoting with null values will create an empty df | ||
df = df.fillna("NULL") | ||
df = df.pivot_table( | ||
index=rows, | ||
columns=columns, | ||
|
@@ -151,6 +153,18 @@ def pivot_df( # pylint: disable=too-many-locals, too-many-arguments, too-many-s | |
# add subtotal for each group and overall total; we start from the | ||
# overall group, and iterate deeper into subgroups | ||
groups = df.columns | ||
if not apply_metrics_on_rows: | ||
for col in df.columns: | ||
# we need to replace the temporary placeholder with either a string | ||
# or np.nan, depending on the column type so that they can sum correctly | ||
if pd.api.types.is_numeric_dtype(df[col]): | ||
df[col].replace("SUPERSET_PANDAS_NAN", np.nan, inplace=True) | ||
else: | ||
df[col].replace("SUPERSET_PANDAS_NAN", "nan", inplace=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I chose the string "nan" here because that is the default behavior when there is a null value when pivoting without sums. |
||
else: | ||
# when we applied metrics on rows, we switched the columns and rows | ||
# so checking column type doesn't apply. Replace everything with np.nan | ||
df.replace("SUPERSET_PANDAS_NAN", np.nan, inplace=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @betodealmeida we need this section here (from line 156) when totaling so that we 1) can sum with numbers (by converting the string "SUPERSET_PANDAS_NAN" with np.nan or 2) can sum with a string value. I'm using "nan" so that we don't print "SUPERSET_PANDAS_NAN". |
||
for level in range(df.columns.nlevels): | ||
subgroups = {group[:level] for group in groups} | ||
for subgroup in subgroups: | ||
|
@@ -171,7 +185,7 @@ def pivot_df( # pylint: disable=too-many-locals, too-many-arguments, too-many-s | |
for subgroup in subgroups: | ||
slice_ = df.index.get_loc(subgroup) | ||
subtotal = pivot_v2_aggfunc_map[aggfunc]( | ||
df.iloc[slice_, :].apply(pd.to_numeric), axis=0 | ||
df.iloc[slice_, :].apply(pd.to_numeric, errors="coerce"), axis=0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think this might be the only change needed to deal with this issue: #27499 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pandas is so opaque, especially when you haven't touched it for years - There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's |
||
) | ||
depth = df.index.nlevels - len(subgroup) - 1 | ||
total = metric_name if level == 0 else __("Subtotal") | ||
|
@@ -186,6 +200,14 @@ def pivot_df( # pylint: disable=too-many-locals, too-many-arguments, too-many-s | |
if apply_metrics_on_rows: | ||
df = df.T | ||
|
||
# replace the remaining temporary placeholder string for np.nan after pivoting | ||
df.replace("SUPERSET_PANDAS_NAN", np.nan, inplace=True) | ||
df.rename( | ||
index={"SUPERSET_PANDAS_NAN": np.nan}, | ||
columns={"SUPERSET_PANDAS_NAN": np.nan}, | ||
inplace=True, | ||
) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Converting the values back so that we don't print "SUPERSET_PANDAS_NAN" |
||
return df | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmh, seems the frontend should be doing this ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch I'm not sure I got this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch the problem here is that we do the pivot in Pandas (for reports and CSV download), and it will fail if the dataframe has
NaN
s.