At the moment I have this simple scraper - https://haste.c0rn3j.com/ahiyofahuf.py
It takes a word and scrapes it in two languages. However, this seems to send two requests to Wiktionary instead of just one (it is, after all, requesting the same page).
Is there a way I can scrape both languages in one request, to make the process faster and reduce the load on Wiktionary?
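For illustration, here is a rough sketch of the kind of thing I mean, done outside WiktionaryParser with requests and BeautifulSoup: the page is downloaded once and both language sections are sliced out locally. The h2/mw-headline structure is an assumption about Wiktionary's current HTML, not anything the library exposes.

```python
# Sketch only: download the page once, then split out each language section.
# Assumes every language section starts with an <h2> whose .mw-headline span
# has the language name as its id -- check this against the live markup.
import requests
from bs4 import BeautifulSoup

def fetch_sections(word, languages):
    url = f"https://en.wiktionary.org/wiki/{word}"
    html = requests.get(url, timeout=10).text          # one request total
    soup = BeautifulSoup(html, "html.parser")
    sections = {}
    for lang in languages:
        headline = soup.find("span", {"class": "mw-headline", "id": lang})
        if headline is None:
            sections[lang] = None
            continue
        # Collect everything between this language's <h2> and the next one.
        parts = []
        for sibling in headline.parent.find_next_siblings():
            if sibling.name == "h2":
                break
            parts.append(str(sibling))
        sections[lang] = "".join(parts)
    return sections

# One HTTP request, two language sections.
sections = fetch_sections("pes", ["Czech", "English"])
```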
EDIT: Assuming this is not currently implemented.
The parser could save the whole pages to /tmp/WiktionaryParser/. On every decent distro /tmp/ gets cleaned after a reboot, and on most of them it is a tmpfs (RAM-backed storage).
So the parser would just check /tmp to see whether the file is already there and not older than, say, 24 hours (user-configurable?), and act accordingly.
I think this should be user-configurable behavior, since scraping XXk pages could take a lot of memory.
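A minimal sketch of what I have in mind (the fetch_cached helper and the cache_seconds parameter are hypothetical names of mine, not anything WiktionaryParser currently ships):

```python
# Sketch of the proposed /tmp cache with a configurable expiry.
import os
import time
import requests

CACHE_DIR = "/tmp/WiktionaryParser"

def fetch_cached(word, cache_seconds=24 * 3600):
    """Return page HTML, re-downloading only if the cached copy is missing or stale."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, word)
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < cache_seconds:
        with open(path, encoding="utf-8") as f:
            return f.read()                      # cache hit: no request sent
    html = requests.get(f"https://en.wiktionary.org/wiki/{word}", timeout=10).text
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html

# cache_seconds=0 effectively disables the cache for users worried about
# tmpfs memory use when scraping tens of thousands of pages.
```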
If implemented, it should be mentioned in the README.