url process logical error #26

Open
332plim opened this issue Nov 9, 2024 · 2 comments

332plim commented Nov 9, 2024

In playwrite_scraper.py, lines 306-308:

        if part.isdigit():
            path_parts[i] = "{page}"
            return '/'.join(path_parts)

This code treats every all-digit path segment as a page number, so the number ends up replaced with 1.
For example, if the URL path is purely numeric, such as "https://example.com/742148134880162944", it is changed to "https://example.com/1".
I reproduced this locally.
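
A minimal sketch of the behavior described above, for anyone who wants to reproduce it without the full scraper. The function name `replace_digit_segment` and the surrounding loop are assumptions made for illustration; only the three quoted lines come from the project, and the real code may rebuild the URL differently.

```python
from urllib.parse import urlparse


def replace_digit_segment(url: str) -> str:
    """Hypothetical stand-in for the logic around the quoted lines."""
    parsed = urlparse(url)
    path_parts = parsed.path.split('/')
    for i, part in enumerate(path_parts):
        # Any all-digit segment is assumed to be a page number,
        # even when it is really a long numeric ID.
        if part.isdigit():
            path_parts[i] = "{page}"
            return f"{parsed.scheme}://{parsed.netloc}" + '/'.join(path_parts)
    return url


pattern = replace_digit_segment("https://example.com/742148134880162944")
print(pattern)                 # https://example.com/{page}
print(pattern.format(page=1))  # https://example.com/1
```

Filling the placeholder with page 1 gives the rewritten URL from the report, and the original numeric ID is lost.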

Author

332plim commented Nov 9, 2024

I suggest adding a limit, like:

        if part.isdigit() and int(part) < 1000:
            path_parts[i] = "{page}"
            return '/'.join(path_parts)

Pagination rarely runs to that many pages. Otherwise, please change the logic entirely.
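
The suggested limit could be sketched as a standalone function like the one below. The names `page_pattern` and `MAX_PAGE` are hypothetical, the loop around the condition is assumed, and `isdecimal()` is swapped in for `isdigit()` because `int()` rejects some characters that `isdigit()` accepts (for example superscript digits).

```python
from typing import Optional
from urllib.parse import urlparse

MAX_PAGE = 1000  # threshold taken from the suggestion; tune as needed


def page_pattern(url: str, max_page: int = MAX_PAGE) -> Optional[str]:
    """Return a URL pattern with a {page} placeholder, or None if no
    path segment looks like a page number (hypothetical helper)."""
    parsed = urlparse(url)
    path_parts = parsed.path.split('/')
    for i, part in enumerate(path_parts):
        # Only small decimal numbers are treated as page numbers;
        # long numeric IDs are left untouched.
        if part.isdecimal() and int(part) < max_page:
            path_parts[i] = "{page}"
            return f"{parsed.scheme}://{parsed.netloc}" + '/'.join(path_parts)
    return None


print(page_pattern("https://example.com/articles/3"))
print(page_pattern("https://example.com/742148134880162944"))
```

With this check, the long ID from the report is no longer rewritten. A short numeric ID below the threshold would still be mistaken for a page number, so the limit reduces the problem rather than removing it.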

Owner

itsOwen commented Nov 9, 2024

You can open a pull request and I will check it out!
