fix: numeric operations on numeric values only #381

RogerHYang · 2023-03-15T16:28:07Z

resolves #380

mikeldking · 2023-03-16T17:56:06Z

src/phoenix/metrics/metrics.py

@@ -27,7 +27,7 @@ def calc(self, df: pd.DataFrame) -> int:

 class Sum(UnaryOperator, BaseMetric):
    def calc(self, df: pd.DataFrame) -> float:
-        return cast(float, df.loc[:, self.operand].sum())
+        return cast(float, pd.to_numeric(df.loc[:, self.operand], errors="coerce").sum())


Slightly confused about this fix. Does this coerce strings to ascii based numbers or?

Yes, it does. It's a bit of trade-off in the sense that this is guaranteed to work, because the flip-side is that, at least for the time being, there's no sure-fire way--in pandas exclusively (without the help of Arrows or Parquet)--to determine if a column is numeric or not a priori. Below is one such example of an integer column.

>>> import pandas as pd >>> from pandas.api.types import is_numeric_dtype >>> s = pd.Series([1,2,None], dtype=object) >>> s 0 1 1 2 2 None dtype: object >>> s.max() 2 >>> is_numeric_dtype(s) False >>> pd.to_numeric(s) 0 1.0 1 2.0 2 NaN dtype: float64

mikeldking · 2023-03-16T17:58:27Z

src/phoenix/metrics/metrics.py



 class Max(UnaryOperator, BaseMetric):
    def calc(self, df: pd.DataFrame) -> float:
-        return cast(float, df.loc[:, self.operand].max())
+        return cast(float, pd.to_numeric(df.loc[:, self.operand], errors="coerce").max())


I don't want a crazy number of unit tests either but it might be good to start scaffolding a small set so that we just start testing the edge cases (numeric, string, vector, timestamp) just so that we have assurance about these cases going forward and so that the expected outputs are captured - e.g. the tests can act as a form of documentation.

Nothing really robust, just trying to get us in the right direction. Maybe you were planning on some of this as part of the benchmarking effort?

yea, determining column type in pandas is tricky because it's intentionally very lenient (unlike arrows or numpy). let me scope this out some more and will add more tests in the future

* feat: clean-up tooltip styles (#382) * feat: add links to app / docs on launch (#383) * v0.0.2rc5 * convert to numeric (#381) * feat: add descriptions to use-cases (#387) * feat: add descriptions to use-cases * fix ruff styling * Fix prettier * Remove readme * pr comments * pr comments * rewrap * v0.0.2rc6 * docs: add module level docstring (#391) * docs: add module level docstring * docs: add module level docstring * Update README.md --------- Co-authored-by: Roger Yang <80478925+RogerHYang@users.noreply.github.com>

* docs: No subject (GITBOOK-34) * docs: sync 3-16-2023 (#390) * feat: clean-up tooltip styles (#382) * feat: add links to app / docs on launch (#383) * v0.0.2rc5 * convert to numeric (#381) * feat: add descriptions to use-cases (#387) * feat: add descriptions to use-cases * fix ruff styling * Fix prettier * Remove readme * pr comments * pr comments * rewrap * v0.0.2rc6 * docs: add module level docstring (#391) * docs: add module level docstring * docs: add module level docstring * Update README.md --------- Co-authored-by: Roger Yang <80478925+RogerHYang@users.noreply.github.com> * docs: serve local image files (GITBOOK-33) * remove "target_score" column from how-to example * docs: define actuals in schema how-to (GITBOOK-35) * docs: unnest api section (GITBOOK-36) * docs: small note to define actuals in quickstart (GITBOOK-37) * docs: change embedding vector column name in schema how-to example (GITBOOK-38) * rearrange columns in text embedding example * docs: clarify language in quickstart (GITBOOK-39) * docs: No subject (GITBOOK-43) * docs: remove tutorial sub-pages and add notebooks sub-page (GITBOOK-40) * docs: add explanation to warning in how-to schema (GITBOOK-44) * docs: using example datasets how-to guide (GITBOOK-47) * docs: fix broken link on phoenix basics page (GITBOOK-48) * docs: remove session.view() from quickstart (GITBOOK-49) * docs: update default parameters to Session.view (GITBOOK-50) * docs: sentiment classification notebook link (GITBOOK-51) * docs: remove note (GITBOOK-52) * docs: added regression schema example (GITBOOK-53) * docs: added ranking model schema examples (GITBOOK-54) * docs: What is Phoenix section (GITBOOK-55) * docs: updated images (GITBOOK-56) * fix merge confict in main * chore: clear output of quickstart.ipnb --------- Co-authored-by: Jason Lopatecki <jason@arize.com> Co-authored-by: Roger Yang <80478925+RogerHYang@users.noreply.github.com> Co-authored-by: Xander Song <xsong@arize.com> Co-authored-by: Alexander Song <axiomofjoy@gmail.com> Co-authored-by: Amit Goren <amit@arize.com>

convert to numeric

799e912

mikeldking approved these changes Mar 16, 2023

View reviewed changes

RogerHYang merged commit be2e2d5 into main Mar 16, 2023

RogerHYang deleted the numeric-operations branch March 16, 2023 19:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: numeric operations on numeric values only #381

fix: numeric operations on numeric values only #381

RogerHYang commented Mar 15, 2023

mikeldking Mar 16, 2023

RogerHYang Mar 16, 2023 •

edited

Loading

mikeldking Mar 16, 2023

RogerHYang Mar 16, 2023

fix: numeric operations on numeric values only #381

fix: numeric operations on numeric values only #381

Conversation

RogerHYang commented Mar 15, 2023

mikeldking Mar 16, 2023

Choose a reason for hiding this comment

RogerHYang Mar 16, 2023 • edited Loading

Choose a reason for hiding this comment

mikeldking Mar 16, 2023

Choose a reason for hiding this comment

RogerHYang Mar 16, 2023

Choose a reason for hiding this comment

RogerHYang Mar 16, 2023 •

edited

Loading