Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve glob_to_regex performance and add case sensitivity flag #268

Closed
wants to merge 2 commits into from

Conversation

squahtx
Copy link
Contributor

@squahtx squahtx commented Nov 5, 2021

Fixing #255 will involve more frequent usage of glob_to_regex, so it'd be nice if it didn't do so many string concatenations.

@squahtx squahtx requested a review from a team as a code owner November 5, 2021 12:14
@erikjohnston
Copy link
Member

Is this something we can also do to the Synapse implementation?

@richvdh
Copy link
Member

richvdh commented Nov 5, 2021

ARGH. Please can we draw a line in the sand and resurrect syutil, rather than having two or three copies of everything?

@squahtx
Copy link
Contributor Author

squahtx commented Nov 5, 2021

Is this something we can also do to the Synapse implementation?

I didn't know Synapse had an equivalent. Synapse's one seems far more clever and collapses runs of wildcards, which the technique here can't do.

Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing #255 will involve more frequent usage of glob_to_regex, so it'd be nice if it didn't do so many string concatenations.

I'm not quite sure how — hopefully we're only running a few on startup anyway (since re.compile is expensive), but I can't argue that this is probably a better way of doing it.

Though it is probably about time to have syutil; we do wind up with quite a bit of duplication :/.

sygnal/utils.py Outdated
Comment on lines 48 to 53

Args:
glob (str)
glob

Returns:
re.RegexObject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just knock these lines out, they're noise at this point

@richvdh
Copy link
Member

richvdh commented Nov 5, 2021

Synapse's one seems far more clever and collapses runs of wildcards

Turns out that if someone sends you a glob with three * in a row, that DoSes your entire process if you naively convert it to .*.*.*.

That sort of lesson is one of the reasons I'm really keen we pull it out to a shared library, so we don't have to keep learning the same lesson.

@squahtx
Copy link
Contributor Author

squahtx commented Nov 5, 2021

Fixing #255 will involve more frequent usage of glob_to_regex, so it'd be nice if it didn't do so many string concatenations.

I'm not quite sure how — hopefully we're only running a few on startup anyway (since re.compile is expensive), but I can't argue that this is probably a better way of doing it.

I did a direct translation of re.search to glob_to_regex().match. I've put up #269 and #270 for judgement

@squahtx
Copy link
Contributor Author

squahtx commented Nov 5, 2021

Turns out that if someone sends you a glob with three * in a row, that DoSes your entire process if you naively convert it to .*.*.*.

All the globs that Sygnal has to deal with come from the config thankfully. I do agree that we should move glob handling into syutil at some point and ditch this implementation.

@reivilibre
Copy link
Contributor

I wonder if @richvdh feels strongly enough about making reviving syutil a blocker for this or not? There's the usual story of there being nothing more permanent than a temporary solution :P.

Otherwise, I'm happy to accept this now

@richvdh
Copy link
Member

richvdh commented Nov 8, 2021

I wonder if @richvdh feels strongly enough about making reviving syutil a blocker for this or not?

I guess not, but honestly I feel like we have this discussion every time someone touches some code that ought to be in syutil - "it's done now, let's leave it until next time". At some point we really have to just draw a line.

IOW: I'd be much happier to see this fixed with syutil, but if you just want to get on and land the fixes, I won't cry too much provided people actually promise to use syutil next time.

Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not, but honestly I feel like we have this discussion every time someone touches some code that ought to be in syutil - "it's done now, let's leave it until next time". At some point we really have to just draw a line.

I'd have to agree with this (without wanting to pick on you!); I've opened matrix-org/matrix-python-common#1, it's up to you if you want to take it on or not right now (though I think I'll raise it at the next triage meeting :)).

@callahad
Copy link
Contributor

callahad commented Nov 9, 2021

Let's land it right now with the copy+pasted code and commit to extracting it out before the end of the month.

@richvdh If we don't extract it in time, I'll owe you a tipple of your choice.

@squahtx
Copy link
Contributor Author

squahtx commented Dec 3, 2021

Replaced by #281

@squahtx squahtx closed this Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants