snapshotter: use wcmatch.glob.globmatch function #154
Force-pushed from 5fbbe41 to 341351f.
If we have to add a dependency, then so be it :( I'm worried about two things:
I'm wondering if we could just replace the
As far as I know, the only case where we have subfolders in frozen parts is if we have projections in the table: In any case, we should have some projection in (m3db also uses
Yes. I have tried to quarantine its use to the
This is a good point: compiling the pattern first is roughly 20x faster, and it shouldn't be too much of a change.
In [6]: %timeit Path('hello/asdf/sdf/world').match('hello/*/*/world')
3.26 µs ± 34.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [7]: %timeit path_matches_glob('hello/asdf/sdf/world', 'hello/*/*/world')
3.63 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [8]: %timeit path_matches_glob('hello/asdf/sdf/world', 'hello/**/world')
3.96 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [13]: compiled = re.compile(wcmatch.glob.translate('hello/**/world')[0][0])
In [14]: %timeit compiled.match('hello/asdf/sdf/world') is not None
196 ns ± 1.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [15]: Path('abc/123').match('**')
Out[15]: True
In [16]: Path('abc/123').match('*')
Out[16]: True
In [20]: Path('abc/123/456').match('abc/**')
Out[20]: False
In [21]: Path('abc/123/456').match('abc/*')
Out[21]: False
In [23]: Path('abc/123/456').match('**/456')
Out[23]: True
In [24]: Path('abc/123/456').match('*/456')
Out[24]: True
In [25]: Path('abc/123/456/456').match('abc/*/456')
Out[25]: False
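A minimal sketch of the compile-once idea from the timings above, using only the standard library. The names `glob_to_regex`, `compile_glob`, and `sketch_matches` are hypothetical, and the hand-rolled translation below is an assumption for illustration, not what `wcmatch.glob.translate` produces. It assumes `/`-separated paths, treats `*` and `?` as never crossing a `/`, and treats a whole-segment `**` as zero or more path segments.

```python
import re
from functools import lru_cache


def glob_to_regex(pattern: str) -> str:
    """Translate a '/'-separated glob into a regex source string (illustrative only)."""
    segments = pattern.split("/")
    parts = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # A whole-segment '**' spans any number of path segments, including none.
            parts.append(".*" if is_last else "(?:[^/]+/)*")
            continue
        # Within a segment, '*' and '?' must never consume a '/'.
        translated = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in segment
        )
        parts.append(translated if is_last else translated + "/")
    return "".join(parts)


@lru_cache(maxsize=None)
def compile_glob(pattern: str) -> "re.Pattern[str]":
    """Compile each distinct glob once; later calls reuse the cached pattern object."""
    return re.compile(glob_to_regex(pattern))


def sketch_matches(path: str, pattern: str) -> bool:
    """Return True if the whole path matches the glob."""
    return compile_glob(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    print(sketch_matches("hello/asdf/sdf/world", "hello/**/world"))
    print(sketch_matches("hello/asdf/sdf/world", "hello/*/world"))
```

Caching the compiled pattern means the translation cost is paid once per distinct glob, and each subsequent check is a single regex match with no intermediate `Path` objects.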
👍 In addition to the pure speed visible in the microbenchmark, this will avoid thrashing memory with thousands of short-lived objects.
That's fair, let's use the library then.
Force-pushed from 6a66cbf to 1db69cf.
Okay, I think this one is ready now. I have put all the SnapshotGroup glob stuff into a little library.
Force-pushed from 605ed0e to 31c89d1.
All existing Python glob match functions seem to have issues. Path.match does not accept '**'. When using fnmatch, '*' matches a '/' character, which is not what we want. To fix this mess, I've introduced a new third-party library, wcmatch, to handle the glob matching.
* also fix bug with not reporting progress "done"
Force-pushed from 31c89d1 to 63b5063.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   87.02%   87.25%   +0.22%
==========================================
  Files         122      141      +19
  Lines        7648     9980    +2332
==========================================
+ Hits         6656     8708    +2052
- Misses        992     1272     +280
All existing Python glob match functions equivalent to `match(path: Path, glob: str) -> bool` have serious issues.

* `Path.match` does not accept `**` until Python 3.13, see here. We need this for ClickHouse.
* `fnmatch` accepts `*`, but this character is translated to the regex `.*` and matches a `/` character, which is not what we want.

Neither of these functions passed the test I wrote.

To fix this mess, I've introduced a new third-party library, `wcmatch`, to handle the glob matching. I did not want to have to use a third-party library, but the matching logic is pretty complex and I did not want to write it myself. In the (very distant) future, when our minimum supported Python is 3.13, we can strip this out.
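For reference, a small standard-library sketch that reproduces two of the behaviours discussed in this thread: `fnmatch` letting `*` cross a `/`, and `Path.match` matching a relative pattern from the right, as in the `Path('abc/123').match('*')` check earlier. The function names are hypothetical, and `PurePosixPath` is used only so the result does not depend on the host OS.

```python
import fnmatch
from pathlib import PurePosixPath


def fnmatch_star_crosses_slash() -> bool:
    """'*' becomes '.*' in fnmatch, so one '*' swallows 'asdf/sdf' here."""
    return fnmatch.fnmatchcase("hello/asdf/sdf/world", "hello/*/world")


def path_match_is_right_anchored() -> bool:
    """A relative pattern is matched from the right, so '*' accepts a nested path."""
    return PurePosixPath("abc/123").match("*")


if __name__ == "__main__":
    print(fnmatch_star_crosses_slash())
    print(path_match_is_right_anchored())
```

Both calls return `True`, although neither path should match under the semantics this PR wants.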