Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excludes: During directory traversal, use QRegularExpression #6063

Merged
merged 2 commits into from
Oct 5, 2017

Conversation

guruz
Copy link
Contributor

@guruz guruz commented Sep 27, 2017

On my Mac, this halves the time spent in csync_excluded_traversal
when using check_csync_excluded_performance.
On Linux the changes were not so much visible, maybe the fnmatch
there is much faster.

@guruz
Copy link
Contributor Author

guruz commented Sep 27, 2017

Done with a lot of help from @ckamm !
Kudos..

@ckamm ckamm force-pushed the regexp_exclude_2017_v2 branch from 49ca551 to 781d704 Compare September 28, 2017 10:15
QString pattern = "^(" + exclude_only + ")$|^(" + exclude_and_remove + ")$";
ctx->parsed_traversal_excludes.regexp_exclude.setPattern(pattern);
ctx->parsed_traversal_excludes.regexp_exclude.setPatternOptions(QRegularExpression::OptimizeOnFirstUsageOption
#if defined(Q_OS_WIN) || defined(Q_OS_MAC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be using the check for case sensitivity in utility

}
return false;

return false;
}

static CSYNC_EXCLUDE_TYPE _csync_excluded_common(c_strlist_t *excludes, const char *path, int filetype, bool check_leading_dirs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My benchmark test shows that this is still significant when there are no exclude. Can they not be put in the regexp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean? That excluded_common still shows up in benchmarks even when there are no excludes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes exactly. So this part is still slow and perhaps could be improved if merged within the same regexp.
(possibly in another capture?)

Altough i did not test the new patch which check that the file is not starting with . so maybe it is ok.


if (len_filename == 3 || (len_filename > 3 && filename[3] == '.')) {
size_t n_words = sizeof(win_reserved_words_3) / sizeof(char *);
for (size_t j = 0; j < n_words; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be written:

for (const char *word : win_reserved_words_3)

(less error prone)


if (len_filename == 4 || (len_filename > 4 && filename[4] == '.')) {
size_t n_words = sizeof(win_reserved_words_4) / sizeof(char *);
for (size_t j = 0; j < n_words; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same here)

}

size_t n_words = sizeof(win_reserved_words_n) / sizeof(char *);
for (size_t j = 0; j < n_words; j++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and here)

return s;
}

void csync_exclude_traversal_prepare(CSYNC *ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if this should not be a member function of the parsed_traversal_excludes class

} else {
bname = path;
}
QString p = QString::fromUtf8(bname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move under the if

@ckamm ckamm force-pushed the regexp_exclude_2017_v2 branch from 781d704 to e1c625d Compare September 28, 2017 10:49
@ckamm
Copy link
Contributor

ckamm commented Sep 28, 2017

@ogoffart updated

@ogoffart
Copy link
Contributor

From jenkins:

CMakeFiles/owncloudsync.dir/objects.a(discoveryphase.cpp.obj):discoveryphase.cpp:(.text+0x573d): undefined reference to `csync_exclude_traversal_prepare(csync_s*)'

@guruz
Copy link
Contributor Author

guruz commented Sep 29, 2017

@ogoffart Jenkins passing, review needed:)

@ckamm ckamm force-pushed the regexp_exclude_2017_v2 branch from b574e42 to b61b294 Compare September 29, 2017 12:41
void prepare(c_strlist_t *excludes);

QRegularExpression regexp_exclude;
c_strlist_t *list_patterns_with_slashes = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not leaking? Where is it destroyed?

Copy link
Contributor

@ogoffart ogoffart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 because of the leak.

QRegularExpression regexp_exclude;
c_strlist_t *list_patterns_with_slashes = nullptr;
/* FIXME ^^ at a later point use QRegularExpression too if those become popular */
} parsed_traversal_excludes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be a pointer to the ExcludedFiles object instead, and everything should be put in the ExcludeFiles class. This would avoid the leak. Curently , the ~ExcludeFiles take care about destroying the excludes pointer.
Also now i guess we could use QVector instead od c_strlist_t.

@ogoffart
Copy link
Contributor

ogoffart commented Oct 2, 2017

Also it seems that this does not optimize csync_excluded_no_ctx which is called by ExcludedFiles::isExcluded, this is important as well at startup as it is called for every directory while initiating the file system watcher... (see also #5245 )

@guruz
Copy link
Contributor Author

guruz commented Oct 2, 2017

@ogoffart Pushed a fix for the leak. Looks good?

PS: I agree with optimizing the _no_ctx too. I don't agree with having the list in Excludes object, it feels weird to have the ownership there. The ownership of something inside CSYNC* should be from CSYNC*

Copy link
Contributor

@ogoffart ogoffart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with having the list in Excludes object, it feels weird to have the ownership there. The ownership of something inside CSYNCshould be from CSYNC

well, the question is if it should still be inside CSYNC. Also, the excludes list is already owned by the Excludes obect. The journal is already owned by the folder or the sync engine. And making it a aprt of exclude also means we keep the compled regexp there even between sync.

/* Only for bnames (not paths) */
static QString convertToBnameRegexpSyntax(QString exclude)
{
QString s = QRegularExpression::escape(exclude).replace("\\*", ".*").replace("?", ".");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? would be escaped to \?, so that's wrong.

Also the [ and ] needs de-escaping, same for {}, these were supported by fnmatch, i don't know if people have used them in expressions.
And the backslashes needs de-escaping ( \\\\ -> \\ ), maybe more. (What if somebody had a pattern to ingore files with a * (yes, you can have files with * on unix), it would look like *\** after escaping it whould be \*\\\*\* and we need to convert it to .*\*.*

Copy link
Contributor

@ckamm ckamm Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on fixing "?" (and adding tests for it!)

I'm not convinced it's worth it to try to emulate fnmatch behavior completely. For example PathMatchSpec we use on windows also offers quite different matching options.

If we want to be 100% backwards compatible we'll need to introduce a new pattern-prefix character to go into the 'fast regex' codepath and can never get rid of the fnmatch based codepath.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would not need a pattern-prefix. We could just detect occurence of [ or { in the pattern and then put them in the slash pattern section. they should be rare anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ogoffart True. Minor: { does not seem to have special meaning to fnmatch, only ?, *, [ and escaping. (had to check glibc's fnmatch_loop.c)

static QString convertToBnameRegexpSyntax(QString exclude)
{
QString s = QRegularExpression::escape(exclude).replace("\\*", ".*").replace("?", ".");
//qDebug() << "Converted pattern" << exclude << "to regex" << s;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please cleanup.

@guruz guruz force-pushed the regexp_exclude_2017_v2 branch from f75fbab to e01b292 Compare October 4, 2017 07:29
@guruz
Copy link
Contributor Author

guruz commented Oct 4, 2017

@ogoffart fixup pushed

@guruz
Copy link
Contributor Author

guruz commented Oct 5, 2017

thanks!

will rebase and squash my 3 commits and have @ckamm 's on top

guruz and others added 2 commits October 5, 2017 10:16
On Mac, this halves the time spent in csync_excluded_traversal
when using check_csync_excluded_performance. A similar performance
increase is seen on linux.
@guruz guruz force-pushed the regexp_exclude_2017_v2 branch from e01b292 to a985b2f Compare October 5, 2017 08:19
@guruz guruz merged commit 99d4060 into master Oct 5, 2017
@guruz guruz deleted the regexp_exclude_2017_v2 branch October 5, 2017 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants