Adding new backend for MapD #1419

Closed
wants to merge 161 commits into from

Contributor

@xmnlab xmnlab commented Apr 13, 2018

Also resolves #1418 and resolves #893

xmnlab added 26 commits April 5, 2018 15:23
Added mapd backend initial files.
Improved mapd client and compiler; Added initial documentation.
README updated; Initial changes to use execute method.
Improving ibis.mapd client and compiler
Added Math, trigonometric and geometric operations
Member

@cpcloud cpcloud left a comment

Thanks @xmnlab! This is solid progress. Let's do a few more review cycles before we merge this in and try to clean up a bit of the duplication. Overall, though, this is pretty close.

ibis/expr/api.py Outdated
@@ -388,6 +389,8 @@ def row_number():

e = ops.E().to_expr()

pi = ops.Pi().to_expr().name('pi')
Member

I would leave this unnamed for now. No reason to make this choice for users.

ibis/expr/api.py Outdated
acos = _unary_op('acos', ops.Acos)
asin = _unary_op('asin', ops.Asin)
atan = _unary_op('atan', ops.Atan)
atan2 = _generic_op('atan2', ops.Atan2)
Member

This is a binary operation right? There should be something like _binary_op function lying around here.

Contributor Author

you're right. I think this is the right function: _binop_expr
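
To illustrate why a two-argument helper fits `atan2`, here is a minimal, self-contained sketch. It does not use ibis: `make_unary_op` and `make_binary_op` are hypothetical stand-ins for helpers such as `_unary_op` and `_binop_expr`, whose real signatures are not shown in this thread, and Python's `math` functions stand in for the `ops` classes.

```python
import math


def make_unary_op(name, func):
    """Hypothetical factory: wrap a one-argument function under `name`."""
    def op(arg):
        return func(arg)
    op.__name__ = name
    return op


def make_binary_op(name, func):
    """Hypothetical factory: wrap a two-argument function under `name`."""
    def op(left, right):
        return func(left, right)
    op.__name__ = name
    return op


acos = make_unary_op('acos', math.acos)
# atan2 takes two arguments (y, x), so it belongs with the binary helper.
atan2 = make_binary_op('atan2', math.atan2)

print(acos(1.0))        # angle whose cosine is 1.0
print(atan2(1.0, 1.0))  # angle of the point with y=1, x=1
```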

@@ -516,6 +521,53 @@ class Log10(Logarithm):
"""Logarithm base 10"""


# TRIGONOMETRIC OPERATIONS

class TrigonometryUnary(UnaryOp):
Member

Can you change the name from TrigonometryUnary to TrigonometricUnary, and do the same for TrigonometryBinary.

@@ -2183,6 +2235,14 @@ def output_type(self):
return partial(ir.FloatingScalar, dtype=dt.float64)


class Pi(Constant):
"""
Member

I guess you could say something here like "The constant pi."

elif GPUDataFrame is not None and isinstance(
    self.cursor, GPUDataFrame
):
    result = self.cursor
Member

Since you're executing the same code (assigning self.cursor to result) in the case that the cursor is a pandas DataFrame or a GPUDataFrame, can you remove the last two elifs? Is there a case where self.cursor is not None and it's not either a pandas DataFrame or a GPUDataFrame?

Contributor Author

you're right! thanks!

# compile the argument
compiled_arg = translator.translate(arg)

return 'CHAR_LENGTH(%s)' % compiled_arg
Member

Format strings.
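
One reading of this comment is to build the SQL fragment with `str.format` rather than `%` interpolation. A minimal sketch, assuming a hypothetical helper `char_length_sql` that receives the already-compiled argument (the real translator call is left out):

```python
def char_length_sql(compiled_arg):
    """Render a CHAR_LENGTH call using str.format instead of %-interpolation."""
    return 'CHAR_LENGTH({})'.format(compiled_arg)


print(char_length_sql('string_col'))
```

`str.format` also works on Python 3.5, where f-strings are not available.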

import ibis.expr.datatypes as dt


def test_timestamp_accepts_date_literals(alltypes):
Member

This is a BigQuery specific test. Do you really need it?

ibis/mapd/udf.py Outdated
@@ -0,0 +1,5 @@
"""
User Defined Function
Member

I wouldn't add this module unless there's support for this in MapD.

assert result == expected

'''
def test_simple_aggregate_execute(alltypes, df):
Member

Many of these tests are BigQuery specific. Can you remove them from here?

The preferred alternative is to add appropriate tests in ibis/tests/all/test_*.py. To do that, you'll also need to add a MapD class in ibis/tests/all/backends.py. There are many examples in that file to get you started.

@@ -0,0 +1,798 @@
from six import StringIO
Member

It looks like a lot of the functions and objects in here are duplicated from either the impala or bigquery backends. Can you see if you can reuse some of their functions so we have only what's needed for MapD?

Contributor Author

Ok. I will take a look :) thanks!

@@ -22,6 +22,7 @@ dependencies:
- plumbum
- psycopg2
- pyarrow>=0.6.0
- pymapd
Member

does this require a version constraint?

Contributor Author

you're right! it is better to pin the version here. thanks!

ibis/__init__.py Outdated
@@ -71,6 +72,12 @@
    # pip install ibis-framework[bigquery]
    import ibis.bigquery.api as bigquery

with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info[0] < 3:
Member

Use sys.version_info.major here

Contributor Author

ok! thanks

ibis/__init__.py Outdated
with suppress(ImportError):
    # pip install ibis-framework[mapd]
    if sys.version_info[0] < 3:
        raise ImportError('ibis.mapd is not allowed it for Python 2.')
Member

typo here, should read "ibis.mapd is not allowed for Python 2" or "The mapd backend is not supported under Python 2"

Contributor Author

thanks!
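
Taken together, the two suggestions on this guard (use `sys.version_info.major` and reword the message) might look like the sketch below. It is an illustration only: `require_python3`, `backend_available`, and `FakeVersion` are hypothetical names, and the actual backend import is replaced by a flag so the example runs on its own.

```python
import sys
from collections import namedtuple
from contextlib import suppress

# Hypothetical stand-in exposing the same `.major` attribute as sys.version_info.
FakeVersion = namedtuple('FakeVersion', ['major', 'minor'])


def require_python3(version_info=sys.version_info):
    """Raise ImportError when running under Python 2."""
    if version_info.major < 3:
        raise ImportError('The mapd backend is not supported under Python 2')


def backend_available(version_info=sys.version_info):
    """Return True if the guarded import block ran to completion."""
    available = False
    with suppress(ImportError):
        require_python3(version_info)
        # The real code would import the backend module here.
        available = True
    return available


print(backend_available(FakeVersion(2, 7)))
print(backend_available(FakeVersion(3, 6)))
```

Because the `ImportError` is raised inside `suppress(ImportError)`, the backend is silently skipped under Python 2 rather than breaking the import of the package.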

Contributor Author

@xmnlab xmnlab left a comment

Thanks @cpcloud!

I've pushed new changes.

dtype = self.left.type().largest
else:
dtype = dt.float64
return dtype.scalar_type()
Contributor Author

Oh I see. Sorry, my mistake. I was a little bit confused. I will change that.

how = Arg(rlz.isin({'sample', 'pop'}), default=None)
where = Arg(rlz.boolean, default=None)

def output_type(self):
Contributor Author

Ok! thanks!


class Distance(ValueOp):
"""
Calculates distance in meters between two WGS-84 positions.
Contributor Author

Yes, we can do that. We just cannot test it because mapd doesn't have a Euclidean operation.
I am just not sure whether a common Euclidean distance function works with lat/lon parameters, so maybe it would be better to remove this function and create an issue to follow up on this discussion.

Contributor Author

xmnlab commented Jun 7, 2018

@cpcloud thanks a lot for reviewing this PR.

Here is a summary of the main fixes:

  1. ci/requirements-dev-3.5.yml: I just rolled it back and it works fine.
  2. Degrees and Radians: I changed the input to numeric and the output to float64. I also added tests for them to the mapd tests.
  3. Correlation and Covariance: Sorry, I misunderstood that, thanks for the patience :) I changed the output to dt.float64.scalar_type() and also added tests for these operations to the mapd tests.
  4. Distance: I am not sure whether common Euclidean distance functions work with lat/lon, so I removed it from this PR and will create an issue now to follow up on this.

If these points are OK with you, I think it is ready for a new review. I also changed the left and right parameters of Correlation to numeric columns.

Again, thank you so much for your attention.
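
The Degrees and Radians change in item 2 (numeric input, floating-point output) can be illustrated with plain Python, whose `float` is typically a 64-bit double. This is only an analogy using the standard `math` module, not the ibis operations themselves:

```python
import math

# Integer and float inputs are both accepted; the result is always a float.
for value in (180, 180.0):
    radians = math.radians(value)
    print(type(radians).__name__, radians)

# Converting back recovers the original angle up to rounding.
print(math.degrees(math.pi))
```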

Contributor Author

xmnlab commented Jun 13, 2018

hi @cpcloud @kszucs

any update about this PR?

thanks a lot!

Contributor Author

xmnlab commented Jun 13, 2018

there is a conflict now .. I will rebase here now.

Member

@cpcloud cpcloud left a comment

@xmnlab After you address this round of comments I will approve and merge! Thanks for the effort!!

self, name, password=None, is_super=None, insert_access=None
):
"""
Create a new MapD database
Member

This docstring looks wrong.

Contributor Author

you're right. thanks!

statement = ddl.DropDatabase(name)
self._execute(statement)

def create_user(self, name, password, is_super=False):
Member

This looks like a new API. @xmnlab can you create a follow up issue to add this API to the clients that have support for such functionality?

Contributor Author

Ok I will do it.

)
self._execute(statement)

def drop_user(self, name):
Member

Same as above.

Contributor Author

ok I will do it! thanks!

@@ -41,9 +41,11 @@ def test_timestamp_extract(backend, alltypes, df, attr):
 @pytest.mark.parametrize('unit', [
     'Y', 'M', 'D',
     param('W', marks=pytest.mark.xfail),
-    'h', 'm', 's', 'ms', 'us', 'ns'
+    'h', 'm', 's', 'ms', 'us',
+    param('ns', marks=pytest.mark.xfail)
Member

Does this pass now that you added the skipif_backend?

Contributor Author

@xmnlab xmnlab Jun 18, 2018

Yes. For some reason, testing the ns unit raises an error that breaks pytest.
I tried to use a pytest mark for ns and it didn't work, so I needed to skip that for mapd.
For the other tests here, I could remove skipif_backend and just add xfail for some units, and that works fine.

Member

Is it necessary to have both xfail on 'ns' and the skipif_backend('MapD') decorator? Shouldn't it be enough to just skip this on MapD altogether?

Contributor Author

you're right! sorry, I will remove this now.

pytest.param(Impala, marks=pytest.mark.impala)
]

if sys.version_info.major == 3:
Member

should be > 2

Contributor Author

ok thanks!

),
param(
lambda t: t.double_col.cov(t.float_col),
91.67005567565313,
Member

Ultimately we will want to change these to use a pandas call or numpy call, so that we don't have to depend on hard coded values.

This is fine for now though.

Contributor Author

Ok thanks!
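
The idea of deriving the expected value from the data instead of hard-coding it could look like the sketch below. It uses a small pure-Python `covariance` helper (hypothetical, written for this example) so that it runs without pandas or numpy. The `how` values mirror the 'sample' and 'pop' options that appear elsewhere in this PR, and the column data is made up.

```python
def covariance(xs, ys, how='sample'):
    """Covariance of two equal-length sequences.

    how='sample' divides by n - 1, how='pop' divides by n.
    """
    if how not in ('sample', 'pop'):
        raise ValueError("how must be 'sample' or 'pop'")
    if len(xs) != len(ys):
        raise ValueError('inputs must have the same length')
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if how == 'sample' else n)


# Hypothetical column data standing in for double_col and float_col.
double_col = [1.0, 2.0, 3.0, 4.0]
float_col = [2.0, 4.0, 6.0, 8.0]

# The expected value is computed from the same data the backend sees.
expected = covariance(double_col, float_col)
print(expected)
```

In the test suite, the helper would presumably be replaced by a pandas or numpy call on the test DataFrame, as the review comment suggests.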

@@ -0,0 +1,422 @@
from ibis.sql.compiler import DDL, DML
Member

Do you have tests for the classes in this file? If not, please add them in a follow-up PR.

Contributor Author

No ... I had some problems adding DDL tests because it was breaking the mapd database container. I will create an issue for that now.

@@ -1752,6 +1814,34 @@ def _string_like(self, patterns):
)


def _string_ilike(self, patterns):
Member

Does this API have a test in this PR?

Contributor Author

@xmnlab xmnlab left a comment

changes done. I will commit my changes.

Member

cpcloud commented Jun 18, 2018

Merging on green!

Contributor Author

xmnlab commented Jun 18, 2018

@cpcloud thank you so much for your attention and support!

Member

cpcloud commented Jun 18, 2018

nice! bombs away!

@cpcloud cpcloud closed this in 037db67 Jun 18, 2018
@xmnlab xmnlab changed the title from "[WiP] Adding new backend for MapD" to "Adding new backend for MapD" Jun 19, 2018
This was referenced Jun 22, 2018
Labels
feature Features or general enhancements
Development

Successfully merging this pull request may close these issues:

The best approach to implement PI() function
ENH: Trigonometric functions
4 participants