This repository has been archived by the owner on Feb 23, 2025. It is now read-only.

Optimization when scraping the same page for multiple languages #32

Open
C0rn3j opened this issue Jul 6, 2018 · 0 comments

C0rn3j commented Jul 6, 2018

At the moment I have this simple scraper - https://haste.c0rn3j.com/ahiyofahuf.py

It takes a word and scrapes it in two languages. This, however, seems to send two requests to Wiktionary instead of just one (it is, after all, requesting the same page).

Is there a way I can scrape both languages in one request, so as to make the process faster and the load on Wiktionary smaller?
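For illustration, the single-request idea could look like this. This is a hypothetical sketch, not WiktionaryParser's actual API: `fetch_page` and `parse_language` are invented names, and the download is stubbed out. The point is that memoizing the raw page means a second language lookup reuses the first fetch instead of hitting Wiktionary again.

```python
# Hypothetical sketch: memoize the raw page download so parsing a
# second language reuses the first fetch. Not WiktionaryParser's API.
from functools import lru_cache

@lru_cache(maxsize=None)
def fetch_page(word):
    # One network request per word; stubbed out here for illustration.
    return "<html>Wiktionary page for %s</html>" % word

def parse_language(word, language):
    html = fetch_page(word)  # cached after the first call for this word
    # A real parser would extract the section for `language` from html.
    return (language, len(html))

parse_language("kot", "english")
parse_language("kot", "czech")  # reuses the cached page, no second fetch
```

With this shape, scraping the same word in N languages costs one request instead of N.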

EDIT: Assuming this is not currently implemented.

The parser could save the whole pages to /tmp/WiktionaryParser/. On every decent distro, /tmp/ gets cleaned on reboot, and on most distros it is a tmpfs (RAM-backed storage).

The parser would then check /tmp for an existing file that is not older than, say, 24 hours (user-configurable?), and act accordingly.

I think this should be user-configurable behavior, since scraping XXk pages could take a lot of memory.

If implemented, it should be mentioned in the README.
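The proposed on-disk cache could be sketched roughly like this. Everything here is an assumption for illustration: `cached_fetch` and the caller-supplied `download` callable are invented names, and the TTL constant stands in for the user-configurable setting mentioned above.

```python
# Hypothetical sketch of the proposed cache: pages saved under
# /tmp/WiktionaryParser/ and reused if younger than a configurable TTL.
import hashlib
import os
import tempfile
import time

CACHE_DIR = os.path.join(tempfile.gettempdir(), "WiktionaryParser")
TTL = 24 * 60 * 60  # 24 hours; would be user-configurable

def cached_fetch(word, download):
    """Return the page for `word`, downloading only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Hash the word so any Unicode title maps to a safe filename.
    path = os.path.join(CACHE_DIR, hashlib.sha1(word.encode("utf-8")).hexdigest())
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < TTL:
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = download(word)  # caller-supplied downloader (network request)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

Because the cache lives in /tmp, a reboot wipes it automatically; the TTL check handles staleness between reboots.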
