Optimize bitmask finding for chunk size 1 and single chunk cases #360

Merged: dcherian merged 6 commits into main from cohorts-size-1-chunks on Apr 27, 2024

dcherian
Collaborator

@dcherian dcherian commented Apr 26, 2024

  1. With size-1 chunks, we needn't find unique labels in each chunk. There's only one label in each chunk!
  2. When there is only a single chunk, we should just run blockwise.

| Before [497e7bc1] <main> | After [4935636b] | Ratio | Benchmark (Parameter)                       |
|--------------------------+------------------+-------+---------------------------------------------|
| 876±6μs                  | 640±80μs         |  0.73 | cohorts.SingleChunk.time_graph_construct    |
| 1.86±0.01ms              | 229±30μs         |  0.12 | cohorts.ERA5Google.time_find_group_cohorts  |
| 329±10μs                 | 1.58±0.1μs       |     0 | cohorts.SingleChunk.time_find_group_cohorts |
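
The two shortcuts described above can be sketched roughly as follows. This is a minimal illustration with NumPy, not the code changed in this PR: the helper names `chunk_label_mask_generic`, `chunk_label_mask_size1`, and `choose_method` are hypothetical, labels are assumed to be integer codes in `[0, nlabels)`, and a dense boolean array stands in for whatever bitmask representation the library actually uses.

```python
import numpy as np


def chunk_label_mask_generic(labels, chunk_sizes, nlabels):
    """General path: find the unique labels inside every chunk.

    Returns a boolean (nchunks, nlabels) array where entry [i, j] is True
    when label j occurs in chunk i.
    """
    mask = np.zeros((len(chunk_sizes), nlabels), dtype=bool)
    start = 0
    for i, size in enumerate(chunk_sizes):
        # np.unique is the per-chunk work that the fast path avoids.
        mask[i, np.unique(labels[start : start + size])] = True
        start += size
    return mask


def chunk_label_mask_size1(labels, nlabels):
    """Fast path (assumes every chunk has size 1).

    Chunk i holds exactly one element, so its only label is labels[i]
    and no per-chunk search for unique values is needed.
    """
    mask = np.zeros((labels.size, nlabels), dtype=bool)
    mask[np.arange(labels.size), labels] = True
    return mask


def choose_method(nchunks, requested="cohorts"):
    """With a single chunk there is nothing to group into cohorts,
    so fall back to a plain blockwise reduction (hypothetical helper)."""
    return "blockwise" if nchunks == 1 else requested


labels = np.array([0, 1, 2, 0, 1, 2])
generic = chunk_label_mask_generic(labels, (1,) * labels.size, nlabels=3)
fast = chunk_label_mask_size1(labels, nlabels=3)
print(np.array_equal(generic, fast))
print(choose_method(nchunks=1))
```

Both paths should produce the same mask for size-1 chunks; the fast path simply replaces a loop of per-chunk `np.unique` calls with one vectorized assignment.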

@dcherian dcherian force-pushed the cohorts-size-1-chunks branch from 55ff59e to 00601ba April 26, 2024 04:32
@dcherian dcherian enabled auto-merge (squash) April 27, 2024 03:21
@dcherian dcherian changed the title from "Optimize bitmask finding for chunk size 1" to "Optimize bitmask finding for chunk size 1 and single chunk cases" Apr 27, 2024
@dcherian dcherian disabled auto-merge April 27, 2024 03:29
@dcherian dcherian enabled auto-merge (squash) April 27, 2024 03:40
@dcherian dcherian disabled auto-merge April 27, 2024 04:48
@dcherian dcherian merged commit 627bf2b into main Apr 27, 2024
15 checks passed
@dcherian dcherian deleted the cohorts-size-1-chunks branch April 27, 2024 04:48
dcherian added a commit that referenced this pull request Apr 27, 2024
* main:
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
dcherian added a commit that referenced this pull request May 2, 2024
* main: (64 commits)
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)
  Trim CI (#355)
  [pre-commit.ci] pre-commit autoupdate (#350)
  Initial minimal working Cubed example for "map-reduce" (#352)
  Bump codecov/codecov-action from 4.1.0 to 4.1.1 (#349)
  `method` heuristics: Avoid dot product as much as possible (#347)
  Fix nanlen with strings (#344)
  Fix direct quantile reduction (#343)
  Fix upstream-dev CI, silence warnings (#341)
  Bump codecov/codecov-action from 4.0.0 to 4.1.0 (#338)
  Fix direct reductions of Xarray objects (#339)
  Test with py3.12 (#336)
  ...
dcherian added a commit that referenced this pull request Jun 30, 2024
* main:
  Bump codecov/codecov-action from 4.3.1 to 4.4.1 (#366)
  Cubed blockwise (#357)
  Remove errant print statement
  import `normalize_axis_index` from `numpy.lib` on `numpy>=2` (#364)
  Optimize `min_count` when `expected_groups` is not provided. (#236)
  Use threadpool for finding labels in chunk (#327)
  Manually fuse reindexing intermediates with blockwise reduction for cohorts. (#300)
  Bump codecov/codecov-action from 4.1.1 to 4.3.1 (#362)
  Add cubed notebook for hourly climatology example using "map-reduce" method (#356)
  Optimize bitmask finding for chunk size 1 and single chunk cases (#360)
  Edits to climatology doc (#361)
  Fix benchmarks (#358)