kiwix-serve should reload automatically content (library/files) #243

Closed
kelson42 opened this issue Dec 6, 2018 · 11 comments · Fixed by kiwix/libkiwix#636 or #497

@kelson42
Contributor

kelson42 commented Dec 6, 2018

If the opened files (library/ZIM files) change, kiwix-serve should automatically detect this and reload them (or load new files and stop serving removed ones). All of this should happen without service interruption. The idea is to allow the library to be updated without having to restart the whole service, which would interrupt it. The reload (as opposed to a restart) should ideally be smart and modify the internal library only where necessary.
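One simple way to detect such changes, sketched here in Python purely for illustration (kiwix-serve itself is C++, and nothing below is its actual code), is to periodically record each watched file's modification time and size and compare snapshots. Polling is an assumption here (OS notification APIs such as Linux inotify would be an alternative), and `take_snapshot` / `diff_snapshots` are hypothetical names.

```python
import os
from typing import Dict, Iterable, Set, Tuple

# Hypothetical snapshot type: path -> (modification time in ns, size in bytes).
Snapshot = Dict[str, Tuple[int, int]]

def take_snapshot(paths: Iterable[str]) -> Snapshot:
    """Record mtime and size for every watched path that currently exists."""
    snap: Snapshot = {}
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # a missing file simply does not appear in the snapshot
        snap[path] = (st.st_mtime_ns, st.st_size)
    return snap

def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    # A file counts as modified if its mtime or its size differs.
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, modified
```

Only the paths reported by `diff_snapshots` would then need attention, which fits the wish to touch the internal library only where necessary.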

@kelson42
Contributor Author

kelson42 commented Apr 4, 2021

Another approach might be to send a signal to kiwix-serve, which would then reload the library.

@kelson42
Contributor Author

kelson42 commented Aug 19, 2021

@mgautierfr @veloman-yunkan It seems we need this feature more and more urgently. We will launch dev.library.kiwix.org, whose purpose seems quite obvious to me, and for this purpose we would really benefit from an automatic refresh whenever a dev ZIM file has been newly created.

After discussing this ticket with @rgaudin, it seems that we should split it in two:

  • Implement first a function to reparse the library XML file (if existing) and reparse ZIM files/paths to ensure they have not changed, and reload without any service interruption (this ticket)
  • Implement an optional feature which would monitor the loaded library XML/ZIM files and reload them if they change; this would be easier to deal with for "normal" users (new ticket created: kiwix-serve should reload automatically content (library/files) if files change #476)

So I propose to hook this reload function to the SIGHUP (1) signal. This would make the feature POSIX-only, but that does not sound like a big problem to me.
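A minimal sketch of such a SIGHUP hook, written in Python for illustration and assuming a server with its own main loop. The names `reload_requested`, `on_sighup`, `install_reload_handler` and `maybe_reload` are hypothetical and not part of kiwix-serve. The handler only records that a reload was requested and the actual reload runs outside it, which is the usual pattern because very little work is safe inside a signal handler.

```python
import signal
import threading
from typing import Callable

# Set by the signal handler, polled by the (hypothetical) server main loop.
reload_requested = threading.Event()

def on_sighup(signum, frame) -> None:
    # Keep the handler minimal: only record that a reload was requested.
    reload_requested.set()

def install_reload_handler() -> None:
    # signal.SIGHUP only exists on POSIX systems, hence the POSIX-only caveat.
    signal.signal(signal.SIGHUP, on_sighup)

def maybe_reload(reload_library: Callable[[], None]) -> bool:
    """Run reload_library() once if a SIGHUP arrived since the last check."""
    if reload_requested.is_set():
        reload_requested.clear()
        reload_library()
        return True
    return False
```

An operator (or an external tool) would then trigger a reload with `kill -HUP <pid>`; several signals arriving before the next check collapse into a single reload.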

@mgautierfr @veloman-yunkan Does that sound right?

@kelson42
Contributor Author

kelson42 commented Sep 5, 2021

@mgautierfr @veloman-yunkan Quick feedback would be much appreciated, as this is at the very top of my feature list.

@kelson42
Contributor Author

kelson42 commented Oct 3, 2021

We should implement #482 first.

@kelson42
Contributor Author

@veloman-yunkan Now that #482 is more or less sorted out, can you please implement this ticket?

kelson42 added this to the 3.3.0 milestone Oct 16, 2021
@veloman-yunkan
Collaborator

@kelson42 Yes. I will do it after #488 is merged.

@veloman-yunkan
Collaborator

veloman-yunkan commented Oct 20, 2021

  • Implement first a function to reparse the library XML file (if existing) and reparse ZIM files/paths to ensure they have not changed, and reload without any service interruption (this ticket)

@kelson42 This sentence is a little vague. Could you please elaborate on it? What kinds of changes to the content should we handle?

  1. A new ZIM file is added to the library
  2. A ZIM file is removed from the library
  3. The path of a ZIM file with a given id changes
  4. The path of a ZIM file stays the same but its contents change
  5. For a ZIM file with some path, its attributes specified in the XML (such as id, url, title, tags, etc) change
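Assuming, as later comments in this thread do, that a book's id in library.xml is the ZIM UUID, the five cases could be told apart by comparing the old and new book lists keyed by id. This is a hypothetical Python sketch (`Book`, `LibraryDiff` and `diff_libraries` are illustrative names, not libkiwix API). Note that case 4 is only visible this way when the new file at the same path carries a different UUID.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class Book:
    # Hypothetical in-memory view of one <book> entry of library.xml.
    id: str
    path: str
    metadata: tuple = ()  # e.g. (("title", "..."), ("tags", "...")), kept hashable

@dataclass
class LibraryDiff:
    added: List[str] = field(default_factory=list)             # case 1 (ids)
    removed: List[str] = field(default_factory=list)           # case 2 (ids)
    moved: List[str] = field(default_factory=list)             # case 3 (ids)
    replaced_paths: List[str] = field(default_factory=list)    # case 4 (paths)
    metadata_changed: List[str] = field(default_factory=list)  # case 5 (ids)

def diff_libraries(old: Dict[str, Book], new: Dict[str, Book]) -> LibraryDiff:
    d = LibraryDiff()
    d.added = sorted(new.keys() - old.keys())
    d.removed = sorted(old.keys() - new.keys())
    for book_id in sorted(old.keys() & new.keys()):
        before, after = old[book_id], new[book_id]
        if before.path != after.path:
            d.moved.append(book_id)
        if before.metadata != after.metadata:
            d.metadata_changed.append(book_id)
    # Case 4: a path that lost one id and gained another, i.e. the file was
    # replaced in place. Those ids also remain listed in added/removed.
    removed_paths = {old[i].path for i in d.removed}
    added_paths = {new[i].path for i in d.added}
    d.replaced_paths = sorted(removed_paths & added_paths)
    return d
```

A book that is both moved and re-described appears in both `moved` and `metadata_changed`, so each kind of change can be applied independently.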

@kelson42
Contributor Author

@veloman-yunkan Sorry for the late feedback; I will try to answer your questions.

All five cases: 1, 2, 3, 4 and 5.

To rephrase what I wrote: whatever is given to kiwix-serve (XML + ZIM, or ZIM files directly) has to be re-evaluated. For a ZIM file, I think this means checking the ZIM UUID and reloading the file if it differs. For the library.xml, if anything has changed, apply the change accordingly to the internal libkiwix library.

For now, our catalogue, available at https://library.kiwix.org, is refreshed once a day: a new library.xml is generated and then the kiwix-serve behind https://library.kiwix.org is reloaded. We want to leave this behaviour behind so that a ZIM file becomes available in the catalogue "immediately" once it is published. This should be handled by a new piece of software whose development has just started in the openzim/cms repository. CMS will be informed immediately when the Zimfarm makes a new ZIM available and then, modulo a few checks, will rewrite the library.xml and send a signal to https://library.kiwix.org to reload the library.

So typically this will happen every 10 minutes on average, therefore it is important that this process:

  • runs efficiently (don't reload things if not necessary)
  • runs without any service interruption.
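One common way to get both properties is to build the new library off to the side and then swap a single reference, so that requests already in flight keep using the old snapshot. A minimal Python sketch of that pattern follows; `SnapshotHolder` is a hypothetical name, and in C++ a `std::shared_ptr` swapped under a mutex could play the same role.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class SnapshotHolder(Generic[T]):
    """Holds the current library; readers grab a reference, reloads swap it."""

    def __init__(self, initial: T):
        self._lock = threading.Lock()         # guards the reference itself
        self._reload_lock = threading.Lock()  # serializes concurrent reloads
        self._current = initial

    def get(self) -> T:
        # Requests take the current snapshot and keep using it even if a
        # reload replaces it meanwhile.
        with self._lock:
            return self._current

    def replace(self, build_new: Callable[[T], T]) -> T:
        with self._reload_lock:
            old = self.get()
            # The slow part runs without holding _lock, so readers never block on it.
            new = build_new(old)
            with self._lock:
                self._current = new
            return new
```

Because `build_new` receives the previous snapshot, it can copy over whatever did not change instead of rebuilding everything.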

@mgautierfr
Member

@kelson42 Do we agree that if the (path, uuid) pair of a ZIM file in the XML has not changed, we can assume the file does not have to be reloaded (and, if it is cached, we do not need to drop it from the cache)?
I.e., we assume that we will never have two different ZIM files with the same UUID, and that the UUID in the XML is always the real UUID (the same as in the ZIM file).


Other metadata may have changed, so we must update our internal library with the metadata from the XML (not read it from the ZIM file itself).
The basic behavior is to always trust the metadata in the library.xml. CMS may change the metadata to something different from what is in the ZIM file, and we want to use the metadata in the XML.
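Under that (path, uuid) assumption, a reload could reuse already-open archives and only open files whose pair is new, while always taking metadata from the XML. A hypothetical Python sketch: `XmlBook`, `reload_archives` and the `open_archive` callback are illustrative names, and a single title field stands in for the full metadata.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class XmlBook:
    # Hypothetical parsed <book> entry; uuid is trusted to match the ZIM's real UUID.
    uuid: str
    path: str
    title: str

Key = Tuple[str, str]  # (path, uuid)

def reload_archives(
    cache: Dict[Key, object],
    xml_books: Dict[str, XmlBook],
    open_archive: Callable[[str], object],
) -> Tuple[Dict[Key, object], Dict[str, str]]:
    """Return (new archive cache, uuid -> title taken from the XML)."""
    new_cache: Dict[Key, object] = {}
    metadata: Dict[str, str] = {}
    for book in xml_books.values():
        key = (book.path, book.uuid)
        # Unchanged (path, uuid) pair: keep the already-open archive.
        # Otherwise open the file through the hypothetical callback.
        new_cache[key] = cache[key] if key in cache else open_archive(book.path)
        # Metadata always comes from library.xml, never from the ZIM file.
        metadata[book.uuid] = book.title
    # Archives whose pair is absent from the XML are not carried over,
    # i.e. they are dropped from the cache.
    return new_cache, metadata
```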

@kelson42
Contributor Author

@kelson42 Do we agree that if the (path, uuid) pair of a ZIM file in the XML has not changed, we can assume the file does not have to be reloaded (and, if it is cached, we do not need to drop it from the cache)?

Yes, I don't see any scenario in which this could go wrong, and it sounds important for the overall performance of the reload.

@veloman-yunkan
Collaborator

Though kiwix/libkiwix#636 provides the bulk of the code for this enhancement, it will only be finalized when #497 is merged.

3 participants