-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excludes: During directory traversal, use QRegularExpression #6063
Conversation
Done with a lot of help from @ckamm ! |
49ca551
to
781d704
Compare
src/csync/csync_exclude.cpp
Outdated
QString pattern = "^(" + exclude_only + ")$|^(" + exclude_and_remove + ")$"; | ||
ctx->parsed_traversal_excludes.regexp_exclude.setPattern(pattern); | ||
ctx->parsed_traversal_excludes.regexp_exclude.setPatternOptions(QRegularExpression::OptimizeOnFirstUsageOption | ||
#if defined(Q_OS_WIN) || defined(Q_OS_MAC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be using the check for case sensitivity in utility
} | ||
return false; | ||
|
||
return false; | ||
} | ||
|
||
static CSYNC_EXCLUDE_TYPE _csync_excluded_common(c_strlist_t *excludes, const char *path, int filetype, bool check_leading_dirs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My benchmark test shows that this is still significant when there are no exclude. Can they not be put in the regexp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean? That excluded_common still shows up in benchmarks even when there are no excludes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes exactly. So this part is still slow and perhaps could be improved if merged within the same regexp.
(possibly in another capture?)
Altough i did not test the new patch which check that the file is not starting with .
so maybe it is ok.
src/csync/csync_exclude.cpp
Outdated
|
||
if (len_filename == 3 || (len_filename > 3 && filename[3] == '.')) { | ||
size_t n_words = sizeof(win_reserved_words_3) / sizeof(char *); | ||
for (size_t j = 0; j < n_words; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be written:
for (const char *word : win_reserved_words_3)
(less error prone)
src/csync/csync_exclude.cpp
Outdated
|
||
if (len_filename == 4 || (len_filename > 4 && filename[4] == '.')) { | ||
size_t n_words = sizeof(win_reserved_words_4) / sizeof(char *); | ||
for (size_t j = 0; j < n_words; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same here)
src/csync/csync_exclude.cpp
Outdated
} | ||
|
||
size_t n_words = sizeof(win_reserved_words_n) / sizeof(char *); | ||
for (size_t j = 0; j < n_words; j++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and here)
src/csync/csync_exclude.cpp
Outdated
return s; | ||
} | ||
|
||
void csync_exclude_traversal_prepare(CSYNC *ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if this should not be a member function of the parsed_traversal_excludes class
src/csync/csync_exclude.cpp
Outdated
} else { | ||
bname = path; | ||
} | ||
QString p = QString::fromUtf8(bname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move under the if
781d704
to
e1c625d
Compare
@ogoffart updated |
From jenkins:
|
@ogoffart Jenkins passing, review needed:) |
b574e42
to
b61b294
Compare
src/csync/csync_private.h
Outdated
void prepare(c_strlist_t *excludes); | ||
|
||
QRegularExpression regexp_exclude; | ||
c_strlist_t *list_patterns_with_slashes = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not leaking? Where is it destroyed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 because of the leak.
src/csync/csync_private.h
Outdated
QRegularExpression regexp_exclude; | ||
c_strlist_t *list_patterns_with_slashes = nullptr; | ||
/* FIXME ^^ at a later point use QRegularExpression too if those become popular */ | ||
} parsed_traversal_excludes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be a pointer to the ExcludedFiles object instead, and everything should be put in the ExcludeFiles class. This would avoid the leak. Curently , the ~ExcludeFiles take care about destroying the excludes
pointer.
Also now i guess we could use QVector instead od c_strlist_t.
Also it seems that this does not optimize csync_excluded_no_ctx which is called by ExcludedFiles::isExcluded, this is important as well at startup as it is called for every directory while initiating the file system watcher... (see also #5245 ) |
@ogoffart Pushed a fix for the leak. Looks good? PS: I agree with optimizing the _no_ctx too. I don't agree with having the list in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with having the list in Excludes object, it feels weird to have the ownership there. The ownership of something inside CSYNCshould be from CSYNC
well, the question is if it should still be inside CSYNC. Also, the excludes
list is already owned by the Excludes obect. The journal is already owned by the folder or the sync engine. And making it a aprt of exclude also means we keep the compled regexp there even between sync.
src/csync/csync_exclude.cpp
Outdated
/* Only for bnames (not paths) */ | ||
static QString convertToBnameRegexpSyntax(QString exclude) | ||
{ | ||
QString s = QRegularExpression::escape(exclude).replace("\\*", ".*").replace("?", "."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
would be escaped to \?
, so that's wrong.
Also the [ and ] needs de-escaping, same for {}, these were supported by fnmatch, i don't know if people have used them in expressions.
And the backslashes needs de-escaping ( \\\\
-> \\
), maybe more. (What if somebody had a pattern to ingore files with a *
(yes, you can have files with *
on unix), it would look like *\**
after escaping it whould be \*\\\*\*
and we need to convert it to .*\*.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on fixing "?" (and adding tests for it!)
I'm not convinced it's worth it to try to emulate fnmatch
behavior completely. For example PathMatchSpec
we use on windows also offers quite different matching options.
If we want to be 100% backwards compatible we'll need to introduce a new pattern-prefix character to go into the 'fast regex' codepath and can never get rid of the fnmatch
based codepath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would not need a pattern-prefix. We could just detect occurence of [ or { in the pattern and then put them in the slash pattern section. they should be rare anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ogoffart True. Minor: {
does not seem to have special meaning to fnmatch, only ?, *, [ and escaping. (had to check glibc's fnmatch_loop.c)
src/csync/csync_exclude.cpp
Outdated
static QString convertToBnameRegexpSyntax(QString exclude) | ||
{ | ||
QString s = QRegularExpression::escape(exclude).replace("\\*", ".*").replace("?", "."); | ||
//qDebug() << "Converted pattern" << exclude << "to regex" << s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please cleanup.
f75fbab
to
e01b292
Compare
@ogoffart fixup pushed |
thanks! will rebase and squash my 3 commits and have @ckamm 's on top |
On Mac, this halves the time spent in csync_excluded_traversal when using check_csync_excluded_performance. A similar performance increase is seen on linux.
e01b292
to
a985b2f
Compare
On my Mac, this halves the time spent in csync_excluded_traversal
when using check_csync_excluded_performance.
On Linux the changes were not so much visible, maybe the fnmatch
there is much faster.