@jleifnf jleifnf commented Aug 11, 2023

Update the StreamSet filter to replace multiple for-loops with a hidden metadata table stored as a DataFrame.
Tested on a StreamSet of 6471 streams: streamset.filter(unit=re.compile(r'FREQ|FLAG'), name=re.compile(r'FQ|FLAG')), which narrows the set to 294 streams, originally took ~50 seconds.

With this code change, the same filter took ~10 seconds (about 5x faster).
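
For readers without the diff at hand, here is a rough sketch of the general idea, not the code in this PR: stream metadata is collected once into a pandas DataFrame, and each filter argument becomes a boolean mask over a column instead of a separate loop over the streams. The helper names (`build_metadata_table`, `filter_table`), the column names, and the sample streams are all hypothetical, and the sketch assumes that regex arguments are matched with `search` semantics.

```python
import re

import pandas as pd


def build_metadata_table(streams):
    """Collect per-stream metadata (a list of dicts here) into one DataFrame."""
    return pd.DataFrame(streams)


def filter_table(table, **criteria):
    """Return the rows of `table` that satisfy every criterion.

    A compiled regex is applied with `search`; any other value is
    compared for equality. Both behaviors are assumptions of this sketch.
    """
    mask = pd.Series(True, index=table.index)
    for column, wanted in criteria.items():
        if isinstance(wanted, re.Pattern):
            # Non-string values (e.g. missing metadata) never match.
            matches = table[column].map(
                lambda value: isinstance(value, str)
                and wanted.search(value) is not None
            )
        else:
            matches = table[column] == wanted
        mask &= matches.astype(bool)
    return table[mask]


# Hypothetical sample metadata, for illustration only.
streams = [
    {"name": "FQ_A", "unit": "FREQ"},
    {"name": "FLAG_B", "unit": "FLAG"},
    {"name": "V_C", "unit": "VOLTS"},
    {"name": "FQ_D", "unit": "VOLTS"},
]
table = build_metadata_table(streams)
kept = filter_table(
    table,
    unit=re.compile(r"FREQ|FLAG"),
    name=re.compile(r"FQ|FLAG"),
)
print(kept["name"].tolist())
```

Building the table once and combining masks keeps the work to a single pass per criterion, which is the kind of saving the timings above describe; the actual speedup depends on how the real metadata is fetched.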

@jleifnf jleifnf requested a review from justinGilmer August 11, 2023 00:36
@jleifnf jleifnf self-assigned this Aug 11, 2023
@jleifnf jleifnf marked this pull request as draft August 23, 2023 16:35
@andrewchambers

I don't know if we want to add dask in this PR.

2 participants