Wiktionary/Wikivoyage zim databases lag website by five months #1397
Comments
Are you talking about Wiktionary in English? Which content exactly is missing (two screenshots would be helpful)? |
Yes English. Here is an example of a diff from August which is missing from the December 2020 Kiwix Wiktionary Zim file. I just picked it at random, so far the December Zim file seems to be missing everything since around June or so. https://en.wiktionary.org/w/index.php?title=rocker&diff=prev&oldid=60027083 Someone added a sense to "rocker", number 4 here: Here's the Kiwix screenshot where you can see that it's missing: I guess the answer to my other question is that there is no reason for the Zim file to be out of date then? Certainly as a software developer I would expect the Zim file to have embedded in it a date corresponding to when it was compiled, so that this kind of ad-hoc testing would not be necessary. Or does it get updated one word at a time, so different dictionary entries are out of date by different amounts? But in that case I would expect each entry to come with a timestamp... |
@archenemies I will have a look (and move the ticket), but it looks like a problem with a root cause in the Wikimedia infrastructure. |
@archenemies BTW, the revision ID, like the revision date, is available in the upstream link in the footer of each article. |
That's interesting about the upstream link in the footer. However, "rocker" has the wrong link, https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509: it points to a revision from 4 November 2020 with the "breve below" sense #4 filled in, but the page that Kiwix serves me lacks that sense. |
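For anyone who wants to compare these links programmatically, here is a minimal sketch that pulls the `oldid` revision ID out of a MediaWiki URL such as the two quoted above. The function name `upstream_revision_id` is made up for illustration and is not part of Kiwix or MediaWiki.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def upstream_revision_id(url: str) -> Optional[int]:
    """Return the `oldid` query parameter of a MediaWiki URL, or None if absent.

    Hypothetical helper for illustration only.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("oldid")
    return int(values[0]) if values else None


# Both URLs are taken from the comments above.
footer_link = "https://en.wiktionary.org/wiki/?title=rocker&oldid=61038509"
diff_link = "https://en.wiktionary.org/w/index.php?title=rocker&diff=prev&oldid=60027083"

print(upstream_revision_id(footer_link))
print(upstream_revision_id(diff_link))
```

Since the footer link names a newer revision than the August diff, the footer alone would suggest the article is up to date, which is exactly the mismatch described above.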
It looks like a bug in the Wikimedia REST API, because it simply does not deliver the latest version (as you reported). See: https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker. This is the root of the bug. I will do what is necessary on both sides to improve the situation. |
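A rough sketch of how this kind of staleness could be checked from an already-downloaded response. It assumes the mobile-sections JSON carries the served revision as a string under `lead.revision`; that layout, both function names, and the sample response are assumptions for illustration, and the latest revision ID has to come from elsewhere (for example the article's history page).

```python
from typing import Any, Mapping, Optional


def served_revision(mobile_sections: Mapping[str, Any]) -> Optional[int]:
    """Read the revision ID from a parsed mobile-sections response.

    Assumption: the ID is stored under lead.revision.
    """
    revision = mobile_sections.get("lead", {}).get("revision")
    return int(revision) if revision is not None else None


def is_stale(mobile_sections: Mapping[str, Any], latest_revision: int) -> bool:
    """True if the response is older than latest_revision or has no revision.

    Relies on MediaWiki revision IDs increasing over time within one wiki.
    """
    served = served_revision(mobile_sections)
    return served is None or served < latest_revision


# Made-up response for illustration; 59000000 is not a real observation.
sample_response = {"lead": {"revision": "59000000"}}

# 61038509 is the revision from the footer link quoted earlier in the thread.
print(is_stale(sample_response, latest_revision=61038509))
```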
A bug ticket has been opened upstream at https://phabricator.wikimedia.org/T274359 |
@MananJethwani Here again this is "complicated" to change due to the architecture. |
@kelson42 Thank you so much for tracking that down and re-reporting the bug |
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions. |
Just to track and keep this issue fresh, it is still impossible to open the article "Cambridge" from the 2021-09 English Wikivoyage ostensibly due to this bug. (Cambridge is a major tourist destination pre- and post-pandemic, so it is a quite serious upstream bug!) |
See also https://phabricator.wikimedia.org/T226931. It seems there is momentum these days to fix it upstream... |
"Cambridge" still inaccessible in the December Wikivoyage in English... The lag hasn't caught up yet... |
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions. |
It looks like it would in some cases at least... |
A slight advance: the article |
I know this is just adding more examples, but just to underline the gravity of this issue, now almost anywhere I look in the latest English Wikivoyage ZIM ( At this point, it's not possible to recommend travelling with the latest Wikivoyage ZIMs! Do we have a timeline on switching to the new API? It is becoming quite urgent, unfortunately. |
It looks like this may finally be fixed in principle: see https://phabricator.wikimedia.org/T226931. The caveat is that articles will only get updated once they are edited after today, in which case the mobile-sections endpoint for the article should update. If an article is not edited since the fix, then the cache won't be changed, and it will still continue to serve out-of-date content. |
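To make the caveat concrete, here is a small sketch of the rule as described: an article is presumably still served from the old cache if its last edit predates the fix. The function name, article titles, and all dates are placeholders, not real data.

```python
from datetime import date


def awaits_refresh(last_edit: date, fix_deployed: date) -> bool:
    """True if the article has not been edited since the fix was deployed.

    An edit on the deployment day itself is treated as refreshed here,
    which is a guess; the thread does not say how same-day edits behave.
    """
    return last_edit < fix_deployed


# Placeholder deployment date and made-up articles, for illustration only.
fix_deployed = date(2023, 2, 8)
last_edits = {
    "Article A": date(2022, 11, 3),
    "Article B": date(2023, 2, 20),
}

pending = sorted(
    title for title, edited in last_edits.items()
    if awaits_refresh(edited, fix_deployed)
)
print(pending)
```

Under this rule only a later edit, or a full cache purge upstream, would move an article out of the pending list.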
@Jaifroid They should purge the full cache, otherwise our bug won't be fixed. |
Well, I agree, but the maintainers are being cautious and want to watch the change for a couple of weeks to make sure they haven't introduced a regression. I think we can push for a full cache purge if everything seems OK... |
I'm happy to report that the latest English Wikivoyage (February 2023) is now showing the latest revision for the article on Mompox (one of my test articles). I updated that page after the fix in https://phabricator.wikimedia.org/T226931, as a test, and it's pleasing to see that the fix has worked and made it into the Wikivoyage ZIM. Additionally, the dated COVID-19 warning no longer appears on the Argentina country page, though it still appears on some other Latin American country pages (e.g. Colombia). This is because some pages will have been edited since the fix, and others not yet. Even a null edit will, apparently, update the cache now. It would be worth testing Wiktionary pages that have been updated since round about 8th February. After confirmation, I think this issue could be closed. |
@Jaifroid Should I restart a specific scrape for Wiktionary? |
@kelson42 The last English Wiktionary scrape appears to be 31st October 2022 (at least that's the last one on download.kiwix.org), so yes, it would be good to try to get a new scrape if possible, though we could test other languages if we can identify a page updated since 8th Feb (or update a page manually with a minor edit). It might be worth doing this in a controlled way: make a minor edit to a page we know to be problematic, then run the scrape? |
OK, I'll edit one of the reported pages above and will let you know when done so you can initiate a scrape. |
@Jaifroid any new recipe to relaunch? |
Sorry, I realized I would have to download the latest available Wiktionary archive to find an article that is not updated... Nearly there. |
@kelson42 OK, I've made a minor edit (adding a derived word) to the So you could run a new scrape of English Wiktionary. We need a new one in any case, since the last one is a bit old now. |
@kelson42 Please note I meant WIKTIONARY, not Wikivoyage! |
Also something seems to be up these days with https://download.kiwix.org/zim/wiktionary/ . The last nopics were 2022-10 and 2022-09 and the last maxis were 2022-09 and 2022-07. The nopic used to be released every month, and the maxi used to be every 3 months. I use the maxi zims, but it's already 2023-02 now. |
@danielzgtg There was an issue about this here: #1789. It's been fixed very recently (fingers crossed). |
@Jaifroid Actually I have checked and your revision is delivered properly by the API, see https://en.wiktionary.org/api/rest_v1/page/mobile-sections/rocker. Closing the ticket. |
Thanks, @kelson42 -- great to be able to close this issue finally! |
I have a Wiktionary ZIM file from December 2020, which I downloaded using the GUI kiwix-desktop interface (2020-12-10; "Pictures, Fulltext index"; 5.65 GB).
This works great for me but I'm not sure how to figure out which Wiktionary it is based on.
It lacks changes to Wiktionary made in August 2020, although it contains changes from May 2020.
Where can I find out which Wiktionary dump a Zim file is based on, and how do I find a Zim file which is based on a current version of Wiktionary?
(And where should I submit this issue?)
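One rough way to sanity-check a file, sketched under the assumption that the ZIM filename ends in `_YYYY-MM.zim` as the files on download.kiwix.org appear to: read the build period from the name and compare it with the date of an edit you expect to see. The helper names and the example filename are illustrative. The period only says when the file was built, and as this issue shows the content can be older than that.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed naming convention: "<name>_YYYY-MM.zim".
_PERIOD = re.compile(r"_(\d{4})-(\d{2})\.zim$")


def zim_period(filename: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) parsed from a ZIM filename, or None if no match."""
    match = _PERIOD.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def months_between(period: Tuple[int, int], edit: date) -> int:
    """Whole calendar months from the edit's month to the build period."""
    year, month = period
    return (year - edit.year) * 12 + (month - edit.month)


# Illustrative filename; the edit is "August 2020" with the day unknown.
period = zim_period("wiktionary_en_all_maxi_2020-12.zim")
print(period)
print(months_between(period, date(2020, 8, 1)))
```

A positive gap means the edit predates the build and should be in the file; if it is missing anyway, the problem is likely upstream of the scrape.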