Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify/document the technical details to help contributors get into it #218

Closed
mossroy opened this issue Apr 22, 2017 · 14 comments
Closed
Assignees
Milestone

Comments

@mossroy
Copy link
Contributor

mossroy commented Apr 22, 2017

As initally mentioned by @sharun-s in #196 :

It would be great if you could add some more detail on the difference between these two modes? I think it will help future contributors to get a better idea of the code and why they exist. Right now given the two different formats zim/evo archive, two different content injection modes (probably a third coming with webrequest), the weird zim format(whose advantages aren't clear), references to firefox os/geo location etc the barrier to entry for a dev to figure out what is going on is a bit high.

I think this is a fantastic project with lot of potential for all kinds of offline use-cases and I feel you will get more devs involved and exploring those use cases, if the core pieces of this project are well documented. Thanks for all the work you guys do!

@mossroy mossroy added this to the v2.1 milestone Apr 22, 2017
@mossroy mossroy self-assigned this Apr 22, 2017
@mossroy
Copy link
Contributor Author

mossroy commented Apr 22, 2017

@sharun-s : you're probably right, even if some of this info is already available.
Here is the situation of this application :

  • It was initially targeting Firefox OS, with Evopedia archive format. Both are now discontinued (as mentioned at the end of https://github.com/kiwix/kiwix-html5/blob/master/README.md)
  • Now the main target is desktop (as browser extensions), with ZIM archive format (the format used by the kiwix project). But Firefox OS and Evopedia archives are still supported
  • Drop compatibility with Evopedia archives #195 discusses about dropping Evopedia compatibility. It's true that dropping it would remove some legacy code, and help understand the code for developers. The main reason to keep it is for the geolocation feature, that might be implemented in ZIM files too, but I agree it's not a strong argument
  • I agree with you that mentions of Firefox OS should not bother the user or the developer. But I'm more reluctant to drop compatibility with it. There's not a lot of specific code for it, and I still use it on my phone
  • There's some more legacy code that might be removed about Phonegap : there has been an attempt to port this code on Phonegap, but it has never been finished. This could be removed to simplify, too

Regarding the archive file formats :

  • the Evopedia file format is discontinued with Evopedia itself : there will be no more new archives
  • the ZIM file format is actively used, and much more generic : it can hold wikimedia content, but also many more websites, content-types etc

Regarding the different content injection modes :

Regarding the geolocation feature :

  • the idea is to be able to find articles around your location (when visiting, for example). For that to work, you first need a compatible archive format (Evopedia is, ZIM is not yet as far as I know, but some work has been already done on it, so it might be in the future). Then you need to implement the algorithm to use it (it has been done for Evopedia, and should probably be adapted for ZIM)
  • for now, I kept it in the hope it will be soon available in ZIM files. It's very convenient to be able to compare your work with a working implementation. But, if it's unsure or too far in the future, it might be better to remove it

@mossroy
Copy link
Contributor Author

mossroy commented Apr 22, 2017

I would really love to have more contributors on this project.
If this is a strong barrier, it's worth working on it.

After writing the comment above, I feel that a "quick win" would be to remove legacy code and drop compatibility for Evopedia, geolocation feature, and Phonegap. I'll try when I find time to do it (pull requests are always welcome).
Dropping Firefox OS support will come, but I'm not ready for that ;-)
Simplifying the injection modes does not seem to be possible for now because of various API support.

@sharun-s : with the explanations above, what would be your expectation about documentation? I would suggest putting more detail on the 2 (or 3) injection modes in the README : would it be enough? (supposing I also drop the legacy code mentioned above)

@sharun-s
Copy link
Contributor

sharun-s commented Apr 22, 2017

@mossroy Thanks for the detailed reply. Appreciate it!
I am maintaining some notes as I go through the code. I'll add them here once I am done. I don't think you need to remove "jquery method"/Firefox OS/geolocation, Evopedia archive support as long as the core/basic pieces of the project are documented and module/function naming is clearer.

Some examples of issues that weren't clear to me as a newcomer to the code (please take it with a pinch of salt, as I haven't been through the entire code yet) -

  • Archive Format Differences
    A line or two about how the data/index is stored/loaded/searched for each currently supported format. I will post when done what I understand. You can then correct any bad assumptions/missing info etc.
  • Single archive/multiple archive files
    As I just downloaded a single ZIM file (I am testing with wiktionary) and only saw single ZIM's on the kiwix site, I assumed it meant different ZIM's as in wiktionary, wikiquotes, wikipedia etc as opposed to multiple files of one ZIM/archive?.
  • Naming as a source of ambiguity
    seeing selectedArchive = localarchive OR zimarchive was another source of confusion. I assumed localarchive referred to an evopedia archive. Add the above variables and it gets fun :-)
    There is overloading of the meaning of words like "archive" and "loading". Loading for example can refer to design choice (jquery/serviceworker/webrequest), loading process (devicestorage scan/file select ui/cookie/browser history) and ofcourse loading of different file format/number of files. Sometimes these things are also encoded where the word archive is used localArchive, archivefiles, archivelist etc.

That said I see now (thanks to your notes) that the code is in transition (Evo=>FirefoxOS=>WebExtension) and these things will clear up eventually. I don't think anything has to be deleted right now. But thinking about explicit naming/interfaces/better docs going forward will make things much clearer even during transition the next time. I'll add some suggestions when I am done with my notes. Will post in a day or two.

@sharun-s
Copy link
Contributor

I am taking my own sweet time plodding through the code. Will submit docs when done. Just wanted to make a note here on something I observed.

I am realizing there are 4-5 "app modes" existing side by side with this project -
FirefoxOS, WebExtension, Standalone*, LocalWebApp*, Mobile(Cordova/PhoneGap).

  • Standalone meaning it works just by clicking the index.html
  • LocalWebApp meaning running a little local http server to serve things at localhost

Each one operating under slightly different constraints (mostly related to cross-domain access and local file access policy) and supporting different use cases/features. The UI and the configurable data needs to be handled differently depending on the mode.

I see why this has happened as most of the code is reusable across all modes, but this is also complicating life. When I see an issue, it is not clear which mode to target the fix for, or if the fix can apply to all modes (for eg what I mentioned using a URL as a way to directly access content - #177 this is easy to do (with FirefoxOS/Cordova) by adding a archiveName param to the url to match against files on device, but not straightforward for the others ).

One way maybe to deal with these modes is to add a "mode" tag to each issue. And maybe down the road build targeting specific modes. To devs, it will help get an idea which mode has which issues and support/require which features. Right now I can say creating docs involves writing a lot of if conditions.

@mossroy
Copy link
Contributor Author

mossroy commented Apr 26, 2017

You also can multiply these "app modes" by the different browsers/versions, which can have different behaviors ;-) And there are also other ideas like making an Electron desktop app...
That's the pros and cons of web technologies : most of the code is exactly the same, but you sometimes have to do a lot of adjustments to be widely compatible.

I don't know exactly what notes/docs you're making, but don't spend too much time on all these different modes. As I said earlier, some of them might be removed to reduce the effort of documenting & maintaining.
For now, the main target is WebExtensions (for Firefox and Chrome) on ZIM files.

Using tags on issues can be a good idea. But we need to keep them simple, like having only one tag to identify issues on secondary modes : something like "not-main-usage"?

@kelson42
Copy link
Collaborator

@mossroy I tend to think that code which is not used and where we do not have immediate plan to use should be removed (still in the history) or for parts which might be still of interest, just put in a dedicated branch. Having a code base which is easily understandable for new comers is important - keeping only the pieces of code which are effectively useful in the current product, is an easy step to do to implement that vision.

@mossroy
Copy link
Contributor Author

mossroy commented May 10, 2017

I merged #226 and #227, which should help for this issue.
I also created a "not-main-usage" tag for issues regarding Firefox OS (or other usages of this code that are not the main target)
Please let me know if you would expect something else, or if I can close this issue

@sharun-s
Copy link
Contributor

sharun-s commented May 10, 2017

Issue with "non-main-usage" tag is that the meaning of main usage has changed over time and it's hard to say it won't again in future ( given your luck ;-) ) .

For example I have been using the code from a month back and service worker mode doesn't work. Would be convenient to lookup a tagged feature commit, but I leave it to you as I am no git expert. There are probably ways to search and find it.

The Readme should have a line with something like -
"For old features currently unsupported please refer to the github history or contact xyz.
They include - FirefoxOS webapp implementation, a partial mobile app, search nearby articles implementation [, whatever else]". Dev's who are interested in this stuff should know code has already been written. Also it might spark someone's imagination to do find nearby images etc.

A simple summary of the ZIM file format would have helped me greatly at the outset. Like so-

  1. All web resources[html, js, images, css] are stored as compressed\uncompressed blobs
  2. Multiple blobs form a cluster
  3. Content is accessible via a URL or a Article Title.
  4. Mapping from URL\Title to cluster+blob is maintained in a Directory Entry metadata struct.
  5. A list of pointers to all DirectoryEntry structs ordered by URL and Title is maintained in the ZIM.
    For further details please look at the openzim website.

I don't know how many times I visited the openzim page and closed it because the first thing I saw was just the header table referring to terms as yet undefined. Even though the image makes sense in hindsight, a short summary makes that page much less intimidating.

Since Title.js has been removed references to Title object can/should also be removed.
Like readArticle and displayArticleInForm use title everywhere when they really mean DirEnt. Haven't had time to check but there are probably other function/variable names where title refers to dirent.

Need a comment somewhere on TitleID
Kind of undocumented where this title id string came from, who put it there, it's format, why it was reqd etc. I guess it's a perf (lookup) optimization.

All that said good to see things are looking much simpler!

@mossroy
Copy link
Contributor Author

mossroy commented May 28, 2017

After reading the code again, I agree with you that the notions of Title (and TitleId) are unclear, I'll try to refactor that.
I'll also add some explanations on old features in the README.

Regarding explanations on the ZIM file format, I thing it would be better to put everything in the openzim web page, so that it is always up-to-date, and available for all the applications using this file format.
@kelson42 might have an opinion on that.
Technically, this web page is a wiki : I think you can create an account on it, and edit the content

@mossroy
Copy link
Contributor Author

mossroy commented May 28, 2017

@sharun-s : I created the PR #241. Could you please review it?

@sharun-s
Copy link
Contributor

@mossroy ok. Just post comments if any there right?

@mossroy
Copy link
Contributor Author

mossroy commented May 28, 2017

Please post your comments in the PR itself

@kelson42
Copy link
Collaborator

@mossroy @sharun-s

The ZIM format specs has been created and put on the wiki pretty quickly. I agree that we should improve them. That said I do not really have a process clear in my mind about how to do that.

I suspect we should have a proper rewrite of that spec next time we will make a major change, which I hope will happen before end of year.

Meanwhile, I think it would be helpful to gather complains about the spec explanaition directly in the taslk page http://www.openzim.org/wiki/Talk:ZIM_file_format

People will be contacted back at the time we would be so far.

@mossroy
Copy link
Contributor Author

mossroy commented May 28, 2017

I close this issue as it is mainly fixed by PR #241 and the previous ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants