Add Python 3.12 support #1317
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1317      +/-   ##
==========================================
- Coverage   85.63%   85.60%   -0.04%
==========================================
  Files          51       51
  Lines       11286    11287       +1
==========================================
- Hits         9665     9662       -3
- Misses       1621     1625       +4 |
Tests passed, but I get this warning in Python 3.12:
|
Things I have done in this PR, other than upgrading to Python 3.12:
|
Do you have a reproducer for the invalid line? |
Searching that file for the text |
One option would be to convert to a legacy dask dataframe, where all of those things would continue working, at least until we entirely remove that system. |
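A hedged sketch of that option, assuming the dask-expr builds of the time expose a to_legacy_dataframe method (the method name is my assumption; check the installed version):
import dask
dask.config.set({"dataframe.query-planning": True})

import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"x": range(4)}), npartitions=2)

# Assumption: dask-expr collections provide `to_legacy_dataframe`, which
# returns a classic dask.dataframe collection backed by a plain task graph.
legacy = ddf.to_legacy_dataframe()
print(type(legacy))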
An MRE for the dask keys could be this:
import dask.dataframe as dd
import pandas as pd
import numpy as np
import datashader as ds
df = pd.DataFrame(
    {
        "x": np.array([0.0] * 10 + [1] * 10),
        "y": np.array([0.0] * 5 + [1] * 5 + [0] * 5 + [1] * 5),
        "i32": np.arange(20, dtype="i4"),
    }
)
ddf = dd.from_pandas(df, npartitions=2)
ddf = ddf.repartition(npartitions=2)  # If I comment out this line, it does not error.
c = ds.Canvas(plot_width=2, plot_height=2, x_range=(0, 1), y_range=(0, 1))
c.points(ddf, "x", "y", ds.count("i32")) |
Thanks @hoxbro. The issue at play here is as follows:
Both change the expression (an optimization step), which makes the keys in the graph no longer match (see the diagnostic sketch after this comment). There are different ways of addressing this:
|
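To make the key mismatch concrete, here is a small diagnostic sketch (my illustration, not code from this PR) that checks whether the keys recorded from a collection are still present in the graph that is later executed; with the dask-expr version discussed here and the repartition step from the MRE above, this is where they diverged:
import numpy as np
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"x": np.arange(4.0), "y": np.arange(4.0)})
ddf = dd.from_pandas(df, npartitions=2).repartition(npartitions=2)

keys = ddf.__dask_keys__()    # keys recorded up front, as datashader does
graph = ddf.__dask_graph__()  # graph that is later optimized and executed

# Diagnostic: are the recorded keys still present in the graph?
print([k in graph for k in keys])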
I will push it and see how the CI likes it. However, I can see at least one problem still exists:
import dask
dask.config.set({"dataframe.query-planning": True})
import geopandas as gpd
import dask.dataframe as dd
import spatialpandas as sp
from geodatasets import get_path
world = sp.GeoDataFrame(gpd.read_file(get_path("nybb")))
type(dd.from_pandas(world, npartitions=2))  # dask_expr._collection.DataFrame
Edit: I ran one cell, restarted the kernel, and then ran the second. |
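My reading of that edit (stated as an assumption, not confirmed in the thread): the query-planning setting is read when dask.dataframe is first imported, so it must be set before that import, hence the kernel restart between cells:
import dask
dask.config.set({"dataframe.query-planning": False})  # must run before the import below

import dask.dataframe as dd  # picks up the setting at first import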
There seems to be one other type of failing test:
import dask.dataframe as dd
import pandas as pd
import datashader as ds
df = pd.DataFrame({"x": [0] * 5, "y": [0] * 5})
ddf = dd.from_pandas(df, npartitions=2)
cvs = ds.Canvas(plot_width=10, plot_height=10)
agg = cvs.line(ddf, "x", "y", line_width=0, agg=ds.reductions.any())
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[5], line 9
6 ddf = dd.from_pandas(df, npartitions=2)
8 cvs = ds.Canvas(plot_width=10, plot_height=10)
----> 9 agg = cvs.line(ddf, "x", "y", line_width=0, agg=ds.reductions.any())
File ~/projects/holoviz/repos/datashader/datashader/core.py:472, in Canvas.line(self, source, x, y, agg, axis, geometry, line_width, antialias)
464 if not isinstance(non_cat_agg, (
465 rd.any, rd.count, rd.max, rd.min, rd.sum, rd.summary, rd._sum_zero,
466 rd._first_or_last, rd.mean, rd.max_n, rd.min_n, rd._first_n_or_last_n,
467 rd._max_or_min_row_index, rd._max_n_or_min_n_row_index, rd.where,
468 )):
469 raise NotImplementedError(
470 f"{type(non_cat_agg)} reduction not implemented for antialiased lines")
--> 472 return bypixel(source, self, glyph, agg, antialias=glyph.antialiased)
File ~/projects/holoviz/repos/datashader/datashader/core.py:1335, in bypixel(source, canvas, glyph, agg, antialias)
1333 with warnings.catch_warnings():
1334 warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered')
-> 1335 return bypixel.pipeline(source, schema, canvas, glyph, agg, antialias=antialias)
File ~/projects/holoviz/repos/datashader/datashader/utils.py:114, in Dispatcher.__call__(self, head, *rest, **kwargs)
112 typ = type(head)
113 if typ in lk:
--> 114 return lk[typ](head, *rest, **kwargs)
115 for cls in getmro(typ)[1:]:
116 if cls in lk:
File ~/projects/holoviz/repos/datashader/datashader/data_libraries/dask.py:51, in dask_pipeline(df, schema, canvas, glyph, summary, antialias, cuda)
48 graph = df.__dask_graph__()
50 dsk.update(optimize(graph, keys))
---> 51 return scheduler(dsk, name)
File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask/threaded.py:90, in get(dsk, keys, cache, num_workers, pool, **kwargs)
87 elif isinstance(pool, multiprocessing.pool.Pool):
88 pool = MultiprocessingPoolExecutor(pool)
---> 90 results = get_async(
91 pool.submit,
92 pool._max_workers,
93 dsk,
94 keys,
95 cache=cache,
96 get_id=_thread_get_id,
97 pack_exception=pack_exception,
98 **kwargs,
99 )
101 # Cleanup pools associated to dead threads
102 with pools_lock:
File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask/local.py:512, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
510 _execute_task(task, data) # Re-execute locally
511 else:
--> 512 raise_exception(exc, tb)
513 res, worker_id = loads(res_info)
514 state["cache"][key] = res
File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask/local.py:320, in reraise(exc, tb)
318 if exc.__traceback__ is not tb:
319 raise exc.with_traceback(tb)
--> 320 raise exc
File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask/local.py:225, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
223 try:
224 task, data = loads(task_info)
--> 225 result = _execute_task(task, data)
226 id = get_id()
227 result = dumps((result, id))
File ~/miniconda3/envs/holoviz/lib/python3.12/site-packages/dask/core.py:127, in _execute_task(arg, cache, dsk)
123 func, args = arg[0], arg[1:]
124 # Note: Don't assign the subtask results to a variable. numpy detects
125 # temporaries by their reference count and can execute certain
126 # operations in-place.
--> 127 return func(*(_execute_task(a, cache) for a in args))
128 elif not ishashable(arg):
129 return arg
File ~/projects/holoviz/repos/datashader/datashader/data_libraries/dask.py:245, in line.<locals>.chunk(df, df2)
243 plot_start = True
244 if df2 is not None:
--> 245 df = concat([df.iloc[-1:], df2])
246 plot_start = False
247 aggs = create(shape)
AttributeError: 'tuple' object has no attribute 'iloc' |
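For context, a toy graph (my illustration, not datashader's actual code) that reproduces the mechanism behind this AttributeError: when a task references a key that is not present in the graph, dask passes the key tuple through as a literal argument:
import pandas as pd
from dask.threaded import get

def make_part():
    return pd.DataFrame({"x": [1, 2]})

def tail(df):
    return df.iloc[-1:]

dsk = {
    ("part", 0): (make_part,),
    # ("part", 1) is not a key in this graph (as if an optimizer had
    # renamed it away), so dask hands the tuple itself to `tail`:
    "out": (tail, ("part", 1)),
}

get(dsk, "out")  # AttributeError: 'tuple' object has no attribute 'iloc'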
That's not as trivial as this one. There are two options:
What's the priority for getting this to pass? |
I'll have a look at the geopandas PR and see if I can crib from them. In the long run we definitely want full compatibility, but we will make a call on whether to release with a short-term patch based on the effort required. |
I think the only test failing now is related to spatialpandas's Dask support, which likely needs to be updated in spatialpandas itself. Because of this, the environment variable is needed for now. The last failing test outside of that was related to attribute assignment being ignored with dask-expr, whereas it works with classic dask.dataframe; see the MRE below. It seems to be related to dask/dask-expr#785, and I can open an issue upstream if needed.
import dask
import importlib
import pandas as pd
import dask.dataframe as dd
for b in [True, False]:
    print("\ndataframe.query-planning =", b)
    dask.config.set({"dataframe.query-planning": b})
    dd = importlib.reload(dd)

    df = pd.DataFrame(data=dict(cat1=["a", "b", "c", "a", "b"], cat2=["a", "b", "c", "a", "b"]))
    ddf = dd.from_pandas(df, npartitions=2)
    ddf["cat1"] = ddf["cat1"].astype("category")
    ddf.cat2 = ddf.cat2.astype("category")
    print(ddf.dtypes) |
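A possible workaround (my suggestion, not something settled in this thread) is to stick to item assignment, which both backends honor:
# Continuing the MRE above: item assignment is respected by both classic
# dask.dataframe and dask-expr, while attribute assignment is silently
# ignored by dask-expr (see dask/dask-expr#785):
ddf["cat2"] = ddf["cat2"].astype("category")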
This was fixed on Friday, fortunately. |
Wow, thanks everybody! It takes a village! |
With numba now supporting Python 3.12, we should also add support for it.
We need conda-build to be released on conda-forge before we can solve on conda.
Builds on top of #1314