What happened:
The query "SELECT -2613 FROM <table> HAVING (<TIMESTAMP> NOT BETWEEN <TIMESTAMP> AND MAX(<TIMESTAMP>))" raises an error when run on GPU.
The same query returns a result when run on CPU.
What you expected to happen:
The query should not raise an error when run on GPU.
Minimal Complete Verifiable Example:
import pandas as pd
import dask.dataframe as dd
from dask_sql import Context

c = Context()
df0=pd.DataFrame({
'c0': ['CAST((12998) AS SMALLINT)'],
})
t0=dd.from_pandas(df0, npartitions=1)
c.create_table('t0', t0, gpu=False)
c.create_table('t0_gpu', t0, gpu=True)
print('CPU Result::')
result1=c.sql("SELECT -2613 FROM t0 HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
print(result1)
print('GPU Result::')
result2=c.sql("SELECT -2613 FROM t0_gpu HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
print(result2)
Result:
INFO:numba.cuda.cudadrv.driver:init
CPU Result::
Empty DataFrame
Columns: [Int64(-2613)]
Index: []
GPU Result::
Traceback (most recent call last):
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 193, in raise_on_meta_error
yield
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6470, in elemwise
meta = partial_by_order(*parts, function=op, other=other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 1327, in partial_by_order
return function(*args2, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3375, in __array_ufunc__
ret = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1761, in __array_ufunc__
return _array_ufunc(self, ufunc, method, inputs, kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/utils.py", line 93, in _array_ufunc
return getattr(obj, op)(other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
return method(self, *args1, *args2, **kwargs1, **kwargs2)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3350, in _binaryop
ColumnAccessor(type(self)._colwise_binop(operands, op)),
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1750, in _colwise_binop
else getattr(operator, fn)(left_column, right_column)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
return method(self, *args1, *args2, **kwargs1, **kwargs2)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/datetime.py", line 405, in _binaryop
other = self._wrap_binop_normalization(other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 606, in _wrap_binop_normalization
other = other.dtype.type(other.item())
ValueError: Converting an integer to a NumPy datetime requires a specified unit
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/bug19/bug19.py", line 19, in <module>
result2= c.sql("SELECT -2613 FROM t0_gpu HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 513, in sql
return self._compute_table_from_rel(rel, return_futures)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 839, in _compute_table_from_rel
dc = RelConverter.convert(rel, context=self)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/convert.py", line 61, in convert
df = plugin_instance.convert(rel, context=context)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/logical/project.py", line 28, in convert
(dc,) = self.assert_inputs(rel, 1, context)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/base.py", line 84, in assert_inputs
return [RelConverter.convert(input_rel, context) for input_rel in input_rels]
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/base.py", line 84, in <listcomp>
return [RelConverter.convert(input_rel, context) for input_rel in input_rels]
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/convert.py", line 61, in convert
df = plugin_instance.convert(rel, context=context)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/logical/filter.py", line 65, in convert
df_condition = RexConverter.convert(rel, condition, dc, context=context)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/convert.py", line 74, in convert
df = plugin_instance.convert(rel, rex, dc, context=context)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 1129, in convert
return operation(*operands, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 77, in __call__
return self.f(*operands, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 140, in reduce
return reduce(partial(self.operation, **kwargs), operands)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 617, in __array_ufunc__
return elemwise(numpy_ufunc, *inputs, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6469, in elemwise
with raise_on_meta_error(funcname(op)):
File "/opt/conda/envs/rapids/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 214, in raise_on_meta_error
raise ValueError(msg) from e
ValueError: Metadata inference failed in `greater`.
Original error is below:
------------------------
ValueError('Converting an integer to a NumPy datetime requires a specified unit')
Traceback:
---------
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/utils.py", line 193, in raise_on_meta_error
yield
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 6470, in elemwise
meta = partial_by_order(*parts, function=op, other=other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 1327, in partial_by_order
return function(*args2, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3375, in __array_ufunc__
ret = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1761, in __array_ufunc__
return _array_ufunc(self, ufunc, method, inputs, kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/utils.py", line 93, in _array_ufunc
return getattr(obj, op)(other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
return method(self, *args1, *args2, **kwargs1, **kwargs2)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/indexed_frame.py", line 3350, in _binaryop
ColumnAccessor(type(self)._colwise_binop(operands, op)),
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/frame.py", line 1750, in _colwise_binop
else getattr(operator, fn)(left_column, right_column)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
return method(self, *args1, *args2, **kwargs1, **kwargs2)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/datetime.py", line 405, in _binaryop
other = self._wrap_binop_normalization(other)
File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/core/column/column.py", line 606, in _wrap_binop_normalization
other = other.dtype.type(other.item())
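The innermost frame above, `other = other.dtype.type(other.item())`, can be illustrated with NumPy alone. The following is a minimal sketch, assuming the operand that reaches cuDF is a zero-dimensional `datetime64[ns]` value (that assumption is not confirmed by the traceback; the variable names are only illustrative). At nanosecond resolution, `.item()` returns a plain integer, and constructing `numpy.datetime64` from an integer without a unit raises a `ValueError`.

```python
import numpy as np

# Assumed stand-in for the scalar operand: a 0-d datetime64[ns] array.
other = np.array("2006-08-05T07:29:26", dtype="datetime64[ns]")

# At nanosecond resolution, .item() yields a plain int (ns since the epoch),
# not a datetime object.
raw = other.item()
print(type(raw).__name__)

# Rebuilding the scalar from the bare integer fails: no unit is given.
message = None
try:
    other.dtype.type(raw)
except ValueError as exc:
    message = str(exc)
    print(message)

# Supplying the unit explicitly round-trips the value.
restored = np.datetime64(raw, "ns")
print(restored == other)
```

This only shows how the error message can arise from that one line; it does not identify which part of the dask-sql query plan produces the scalar.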
Thanks for filing @qwebug! Over the past few months I haven't had as much capacity to be active on the issue tracker here, so I apologize in advance if many of the issues you've filed don't get addressed right away, though we always welcome external contributors if you have any interest in digging into this 😉 From your example, it's a little difficult to tell what in particular is causing the bug, but it does look like we are passing an object that isn't supported into cuDF's datetime column mechanics.
I'd recommend trimming your example down a bit so that the root cause is more immediately obvious. For example, I notice that the table in your example contains a SQL query; is this relevant to the failure you encountered? If not, it might make sense to use more trivial data here, e.g. ['a', 'b', 'c'], to quickly convey "this thing doesn't work on string data in general." It's also difficult to tell which part of the query causes things to break. Do things work if we select a column instead of a scalar integer? Or if we choose a different type of scalar? Do things work if we include the MAX operation on one of the timestamps? If I were to rewrite your example, it'd probably look something like this (I haven't tested any of this locally, it's purely an illustrative example):
import pandas as pd
from dask_sql import Context

c = Context()
df=pd.DataFrame({
"a": list("abcde"),
})
c.create_table('df', df, gpu=True)
# this works!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND TIMESTAMP '2006-08-05 07:29:26')").compute()
# this doesn't work!
res = c.sql("SELECT -2613 FROM df HAVING (TIMESTAMP '1991-02-28 13:42:12' NOT BETWEEN TIMESTAMP '1985-12-14 23:59:41' AND MAX(TIMESTAMP '2006-08-05 07:29:26'))").compute()
Finally, I'm interested in whether there's any additional context on how you encountered this issue (and the others you've filed). Some of these queries seem like pretty carefully designed edge cases, which are great for unit testing even if it's sometimes difficult to find the bug in them 😄
This problem occurred with dask-sql version 2023.6.0.
I have verified that it is fixed in dask-sql version 2024.3.0.
Thanks to the developers for their contributions.