
Commit

Rewrite fuzzy rule to not contain the trailing ? when querystring is empty
benoit74 committed Mar 18, 2024
1 parent fcaf1ad commit 60d174b
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/warc2zim/url_rewriting.py
@@ -71,7 +71,7 @@
r"=[^&]+).*",
"replace": r"youtube.fuzzy.replayweb.page/\1\2",
},
-{"pattern": r"([^?]+\?)[\d]+$", "replace": r"\1"},
+{"pattern": r"([^?]+)\?[\d]+$", "replace": r"\1"},
{
"pattern": r"(?:www\.)?youtube(?:-nocookie)?\.com\/(youtubei\/[^?]+).*(videoId["
r"^&]+).*",
2 changes: 1 addition & 1 deletion tests/test_fuzzy_rules.py
@@ -87,7 +87,7 @@ def test_fuzzyrules_google_video_infos(google_video_info_case):
params=[
ContentForTests(
"www.example.com/page?1234",
-"www.example.com/page?",
+"www.example.com/page",
),
ContentForTests(
"www.example.com/page?foo=1234",
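
To make the effect of the one-character move in the capture group concrete, here is a minimal sketch that applies both versions of the rule to the URLs used in the test. The two patterns are copied from the diff; the `apply_rule` helper and the use of `re.sub` are assumptions made for illustration and may differ from how warc2zim actually applies its fuzzy rules.

```python
import re

# Patterns copied from the diff. Applying them with re.sub is an assumption
# for illustration, not necessarily what warc2zim does internally.
OLD_RULE = {"pattern": r"([^?]+\?)[\d]+$", "replace": r"\1"}
NEW_RULE = {"pattern": r"([^?]+)\?[\d]+$", "replace": r"\1"}


def apply_rule(rule: dict, url: str) -> str:
    """Hypothetical helper: rewrite `url` with a single fuzzy rule."""
    return re.sub(rule["pattern"], rule["replace"], url)


# Old rule: the "?" sits inside the capture group, so it survives the rewrite.
old_result = apply_rule(OLD_RULE, "www.example.com/page?1234")

# New rule: the "?" is matched outside the group and dropped with the digits.
new_result = apply_rule(NEW_RULE, "www.example.com/page?1234")

# A non-numeric query string does not match the rule and is left unchanged.
untouched = apply_rule(NEW_RULE, "www.example.com/page?foo=1234")

print(old_result)  # www.example.com/page?
print(new_result)  # www.example.com/page
print(untouched)   # www.example.com/page?foo=1234
```

Only the position of `\?` relative to the closing parenthesis changes, which is why the commit touches a single line in each file.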
