
implement pyspark numeric operations to pass all/test_numeric.py #9

Conversation

@hjoo commented on Aug 2, 2019

This PR passes all numeric tests in ibis/tests/all/test_numeric.py except for test_divide_by_zero, which is skipped for the PySpark backend (PySpark does not support col / 0 = inf; division by zero evaluates to NULL instead).
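For context, a minimal sketch (assuming a local SparkSession; not part of this PR) of the behavior that forces the skip: Spark SQL evaluates division by zero to NULL rather than infinity.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()
# 1.0 / 0.0 compiles to a SQL divide, which yields NULL, not inf
spark.range(1).select((F.lit(1.0) / F.lit(0.0)).alias("q")).show()
# +----+
# |   q|
# +----+
# |null|
# +----+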

Changes made:

  1. Implemented the following operations: least, abs, round, ceil, floor, exp, sign, sqrt, log, ln, log2, log10, modulus, negate, add, divide, floor divide, isnan, isinf (see the sketch after this list).

  2. Implemented column expression execution for PySpark (needed to pass some of the column tests in test_numeric.py).

  3. Fixed selection operation in the PySpark compiler.

  4. Fixed a few lint errors.
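To give a flavor of item 1, here is a hypothetical sketch of the one-to-one mapping most of these operations have onto pyspark.sql.functions; the compile_* names, the translate call, and the rule-registration mechanism are assumptions, not the PR's exact code:

import pyspark.sql.functions as F

def compile_abs(t, expr, scope, **kwargs):
    # translate the ibis argument to a pyspark Column, then apply F.abs
    op = expr.op()
    src_column = t.translate(op.arg, scope)
    return F.abs(src_column)

def compile_sqrt(t, expr, scope, **kwargs):
    op = expr.op()
    src_column = t.translate(op.arg, scope)
    return F.sqrt(src_column)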

elif isinstance(expr, types.ColumnExpr):
    # expression must be named for the projection
    expr = expr.name('tmp')
    return self.compile(expr.to_projection()).toPandas()['tmp']
icexelloss (Owner) commented:
What does to_projection() do?

hjoo (Author) replied:

It turns the column expression into a table projection, so that self.compile() returns a DataFrame.
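A minimal sketch of that flow (t and client are hypothetical stand-ins for an ibis table and the PySpark client):

expr = t.double_col + 1            # a ColumnExpr
named = expr.name('tmp')           # projections require named columns
proj = named.to_projection()       # wraps the column in a TableExpr projection
df = client.compile(proj)          # the table compiler returns a pyspark DataFrame
series = df.toPandas()['tmp']      # materialize the single column as a pandas Series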

# attach result column to a fake DataFrame and
# select the result
compiled = self._session.range(0, 1) \
    .withColumn("result", compiled) \
icexelloss (Owner) commented:
I think you can just do

self._session.range(0, 1).select(compiled)

hjoo (Author) replied:

Yep that works.
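For comparison, a sketch of the before/after (assuming compiled is a scalar pyspark Column; the final extraction step is an assumption):

# before: attach the scalar to a fake one-row DataFrame via withColumn
compiled_df = self._session.range(0, 1).withColumn("result", compiled)

# after: select the compiled column directly off the one-row frame
compiled_df = self._session.range(0, 1).select(compiled)
value = compiled_df.toPandas().iloc[0, 0]   # pull out the scalar result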

scale = op.digits.op().value if op.digits is not None else 0
rounded = F.round(src_column, scale=scale)
if scale == 0:
    rounded = rounded.astype('long')
icexelloss (Owner) commented:
Why do we need to cast type here?

hjoo (Author) replied:

The expected result is np.int64 when decimals (the round scale) is 0. See https://github.com/ibis-project/ibis/blob/master/ibis/tests/backends.py#L34
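A quick illustration of the mismatch (a sketch assuming a local SparkSession): F.round on a double column still has dtype double, so the cast to long is what lines the result up with np.int64.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.range(1).select(F.lit(2.5).alias("x"))

df.select(F.round("x", 0)).dtypes                   # double, even at scale 0
df.select(F.round("x", 0).astype('long')).dtypes    # bigint -> np.int64 in pandas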

import numpy as np
op = expr.op()

@pandas_udf('double', PandasUDFType.SCALAR)
icexelloss (Owner) commented:
Can we rewrite this using SQL operations? e.g.

F.when(src_column == 0, F.lit(0)).otherwise(F.when(src_column > 0, F.lit(1)).otherwise(-1))

hjoo (Author) replied:

Done
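For reference, a hypothetical sketch of what the reworked sign rule looks like with the suggested SQL operations (the function name, the translate call, and the float literals are assumptions):

import pyspark.sql.functions as F

def compile_sign(t, expr, scope, **kwargs):
    op = expr.op()
    src_column = t.translate(op.arg, scope)
    # nested CASE WHEN instead of a pandas UDF
    return F.when(src_column == 0, F.lit(0.0)).otherwise(
        F.when(src_column > 0, F.lit(1.0)).otherwise(-1.0)
    )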

icexelloss merged commit baa54c4 into icexelloss:pyspark-backend-prototype on Aug 7, 2019
icexelloss pushed commits referencing this pull request on Aug 7, Aug 13, Aug 15 (twice), and Aug 22, 2019, each with the message:
* implement pyspark compiler numeric operations to pass all/test_numeric.py
icexelloss added a commit that referenced this pull request Sep 12, 2019
This is a PySpark backend for ibis. It is different from the Spark backend, where the ibis expr is compiled to a SQL string; instead, the PySpark backend compiles the ibis expr to pyspark DataFrame expressions.
Author: Li Jin <ice.xelloss@gmail.com>
Author: Hyonjee Joo <5000208+hjoo@users.noreply.github.com>

Closes ibis-project#1913 from icexelloss/pyspark-backend-prototype and squashes the following commits:

213e371 [Li Jin] Add pyspark/__init__.py
8f1c35e [Li Jin] Address comments
f173425 [Li Jin] Fix tests
0969b0a [Li Jin] Skip unimplemented tests
1f9409b [Li Jin] Change pyspark imports to optional
26b041c [Li Jin] Add importskip
108ccd8 [Li Jin] Add scope
e00dc00 [Li Jin] Address PR comments
4764a4e [Li Jin] Add pyspark marker to setup.cfg
7cc2a9e [Li Jin] Remove dead code
72b45f8 [Li Jin] Fix rebase errors
9ad663f [Hyonjee Joo] implement pyspark numeric operations to pass all/test_numeric.py (#9)
675a89f [Li Jin] Implement compiler rules to pass all/test_aggregation.py
215c0d9 [Li Jin] Link existing tests with PySpark backend (#7)
88705fe [Li Jin] Implement basic join
c4a2b79 [Hyonjee Joo] add pyspark compile rule for greatest, fix bug with selection (#4)
fa4ad23 [Li Jin] Implement basic aggregation, group_by and window (#3)
54c2f2d [Li Jin] Initial commit of pyspark DataFrame backend (#1)