Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SerializebleLock in to_snowflake #39

Merged
merged 11 commits into from
Jul 17, 2023
Merged

Remove SerializebleLock in to_snowflake #39

merged 11 commits into from
Jul 17, 2023

Conversation

phobson
Copy link
Contributor

@phobson phobson commented Nov 23, 2022

Fixes #29

Just testing things out for now

@phobson
Copy link
Contributor Author

phobson commented Nov 23, 2022

With the larger dataframe, the partition sizes when we read back from snowflake are coming back a bit erratically sized. This is without any changes to core.py

Locally, I get:

(Pdb) from dask.utils import format_bytes
(Pdb) partition_sizes.map(format_bytes).to_frame("result").assign(expected="2 MiB")
        result expected
0     1.60 MiB    2 MiB
1     1.71 MiB    2 MiB
2     2.18 MiB    2 MiB
3     3.51 MiB    2 MiB
4     1.60 MiB    2 MiB
5     1.71 MiB    2 MiB
6     2.18 MiB    2 MiB
7     4.36 MiB    2 MiB
8     1.39 MiB    2 MiB
9   875.77 kiB    2 MiB
10    1.71 MiB    2 MiB
11    2.18 MiB    2 MiB
12    3.72 MiB    2 MiB
13    1.60 MiB    2 MiB
14    1.70 MiB    2 MiB
15    2.18 MiB    2 MiB
16    1.69 MiB    2 MiB
17    1.28 MiB    2 MiB
18    1.70 MiB    2 MiB
19    2.18 MiB    2 MiB
20    4.37 MiB    2 MiB
21    1.82 MiB    2 MiB
22    1.70 MiB    2 MiB
23    2.18 MiB    2 MiB
24    3.30 MiB    2 MiB
25    1.60 MiB    2 MiB
26    1.71 MiB    2 MiB
27    2.18 MiB    2 MiB
28    4.37 MiB    2 MiB
29    2.79 MiB    2 MiB
30    1.60 MiB    2 MiB
31    1.71 MiB    2 MiB
32    2.18 MiB    2 MiB
33    4.37 MiB    2 MiB
34    1.29 MiB    2 MiB

@phobson
Copy link
Contributor Author

phobson commented Nov 23, 2022

Push a commit to remove the SerializableLock. The difference in test durations is substantial:

Without 399da14 (status quo):

============================================= slowest 10 durations =============================================
708.74s call     dask_snowflake/tests/test_core.py::test_result_batching_nparitions_and_equality[twelve months]
664.69s call     dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[twelve months]
76.58s call     dask_snowflake/tests/test_core.py::test_result_batching_nparitions_and_equality[one month]
63.95s call     dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[one month]
9.74s call     dask_snowflake/tests/test_core.py::test_application_id_config
8.94s call     dask_snowflake/tests/test_core.py::test_application_id_default
8.27s call     dask_snowflake/tests/test_core.py::test_application_id_explicit
7.54s call     dask_snowflake/tests/test_core.py::test_write_read_roundtrip
7.01s call     dask_snowflake/tests/test_core.py::test_execute_params
6.94s call     dask_snowflake/tests/test_core.py::test_read_empty_result
=========================================== short test summary info ============================================
XFAIL dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[twelve months] - inconsistent partition sizing
=========================== 11 passed, 1 xfailed, 46 warnings in 1601.74s (0:26:41) ============================

With 399da14 (this PR):

============================================= slowest 10 durations =============================================
175.20s call     dask_snowflake/tests/test_core.py::test_result_batching_nparitions_and_equality[twelve months]
115.40s call     dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[twelve months]
27.99s call     dask_snowflake/tests/test_core.py::test_result_batching_nparitions_and_equality[one month]
26.80s call     dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[one month]
7.11s call     dask_snowflake/tests/test_core.py::test_application_id_explicit
7.11s call     dask_snowflake/tests/test_core.py::test_write_read_roundtrip
7.11s call     dask_snowflake/tests/test_core.py::test_application_id_config_on_cluster
6.97s call     dask_snowflake/tests/test_core.py::test_execute_params
6.93s call     dask_snowflake/tests/test_core.py::test_arrow_options
6.71s call     dask_snowflake/tests/test_core.py::test_application_id_default
=========================================== short test summary info ============================================
XFAIL dask_snowflake/tests/test_core.py::test_result_batching_partition_sizes[twelve months] - inconsistent partition sizing
============================ 11 passed, 1 xfailed, 46 warnings in 425.89s (0:07:05) ============================

@phobson phobson marked this pull request as ready for review November 23, 2022 22:30
@phobson
Copy link
Contributor Author

phobson commented Nov 23, 2022

@dchudz
Copy link

dchudz commented May 15, 2023

@jrbourbeau is this something we should revisit?

@fjetter
Copy link
Member

fjetter commented May 16, 2023

If the threading problem is resolved, I'd recommend removing the parallel=1 kwarg as well. Since everything is IO bound we should be able to insert with many more threads than we have CPUs.

@jrbourbeau jrbourbeau mentioned this pull request May 17, 2023
@jrbourbeau jrbourbeau changed the title Seeing if we can remove the SerializebleLock to speed up writes Remove SerializebleLock in to_snowflake May 26, 2023
@fjetter
Copy link
Member

fjetter commented Jul 10, 2023

@jrbourbeau are you still on this?

@jrbourbeau jrbourbeau merged commit e251a70 into main Jul 17, 2023
16 checks passed
@jrbourbeau jrbourbeau deleted the breaking-locks branch July 17, 2023 22:08
@phobson
Copy link
Contributor Author

phobson commented Jul 27, 2023

Heyyy look that it. Finally merged! 🎉

@mrocklin
Copy link
Member

You just gotta plant a lotta seeds 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Questioning lock necessity in write_snowflake
5 participants