Dead links to categories / Support categories #15

Closed
Popolechien opened this issue Sep 1, 2016 · 33 comments

@Popolechien

The Wikisource home page links directly to categories for books. However, this is not handled correctly and most links are unclickable.
[Screenshot: screenshot_2016-09-01-16-14-21]

@kelson42
Collaborator

Yes, categories are not mirrored. This is work to be done in mwoffliner. Probably the top priority.

@kelson42 kelson42 added the bug label Sep 17, 2016
@kelson42
Collaborator

@ISNIT0 This is currently the TOP priority topic in mwoffliner. It's not extremely complicated but needs a bit of work. Let me know if you are interested in having a look so I can explain it to you a bit.

@ISNIT0
Contributor

ISNIT0 commented Jan 31, 2017

@kelson42 I'm interested :) What's the best place to start looking?

@kelson42
Collaborator

kelson42 commented Jan 31, 2017

@ISNIT0 Let's have a video conference about that. Let me know when you have time.

@kelson42 kelson42 reopened this Jan 31, 2017
@kelson42 kelson42 changed the title from "Links to categories do not work" to "Dear links to categories / Support categories" Jul 10, 2017
@tim-moody
Contributor

This was 'probably the top priority' in Sept 2016, yet it is still not implemented. Any time frame?

@kelson42
Collaborator

No, it is still the top priority, but there is nobody to work on this so far.

@WikiDocJames

Thanks for pointing me to this. Hope to see it fixed sometime soon. Maybe a Google Summer of Code project for someone?

@kelson42
Collaborator

@WikiDocJames Maybe, even though the first GSoC we are managing is focused on Kiwix-Android. That said, if someone motivated and capable comes to me, I might consider mentoring it myself.

@Popolechien Popolechien changed the title from "Dear links to categories / Support categories" to "Dead links to categories / Support categories" Apr 9, 2018
@holta

holta commented Apr 10, 2018

Further context: this issue directly affects schools in Haiti, which have made clear they would use Vikidia IF its link ("84 super articles") in the top right were clickable, as seen in the current Vikidia ZIM here:
http://iiab.me:3000/vikidia_fr_all_novid_2018-03/

Current Vikidia ZIM downloaded from:
http://download.kiwix.org/zim/vikidia/vikidia_fr_all_2018-03.zim

By comparison, the original (online) version at https://fr.vikidia.org works far better. The offline version (the ZIM file above) is extremely frustrating to educators and children because the most important link ("84 super articles") is not yet fixed. In future, these essential materials should appear much as they do online here:
https://fr.vikidia.org/wiki/Cat%C3%A9gorie:Super_article

PS @kelson42 has clarified that he's hopeful this will be fixed before the end of 2018.

@kelson42 kelson42 added this to the 2.0 milestone Sep 18, 2018
@kelson42
Collaborator

Things to do (the ones I can see):

  • Include the "Category" namespace in the namespaces scraped by default
  • Verify that category pages are scraped properly
  • Ensure that links from article pages to categories work properly
  • Ensure the category links at the bottom of the page are displayed properly
  • Ensure the list of articles is displayed offline as it is online (sorted alphabetically)
  • Ensure the category pagination works properly
  • Remove articles which are not mirrored from the category's list of articles
  • Mirror only categories that have at least one article in them
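
A minimal sketch of the last two items, assuming category membership has already been fetched into a plain object keyed by category title. The name `pruneCategories` and the data shapes are hypothetical and are not mwoffliner's actual code:

```typescript
// Hypothetical shape: category title -> titles of the articles it lists upstream.
type CategoryMembers = Record<string, string[]>;

// Keep only mirrored articles in each category listing, sort the listing
// alphabetically, and drop categories that end up with no article at all.
function pruneCategories(
  members: CategoryMembers,
  mirroredTitles: string[]
): CategoryMembers {
  // Null-prototype objects avoid clashes with keys such as "constructor".
  const mirrored: Record<string, boolean> = Object.create(null);
  for (const title of mirroredTitles) {
    mirrored[title] = true;
  }

  const pruned: CategoryMembers = Object.create(null);
  for (const category of Object.keys(members)) {
    const kept = (members[category] || [])
      .filter((title) => mirrored[title] === true)
      .sort((a, b) => a.localeCompare(b));
    if (kept.length > 0) {
      pruned[category] = kept;
    }
  }
  return pruned;
}
```

Pagination and the links at the bottom of article pages are left out here; the sketch only covers filtering and the "at least one article" rule.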

@kelson42 kelson42 modified the milestones: 2.0, 1.9 Sep 20, 2018
@ISNIT0
Contributor

ISNIT0 commented Mar 12, 2019

What is the best thing to do for an articleList selection? Keep all the many parent categories? Not keep categories? Keep only one level of categories? Something else?

@kelson42
Collaborator

@ISNIT0 Keep each category with at least one non-category child and merge all categories (to the top one) if there is only one sub-category.
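
One possible reading of this rule, sketched under assumptions: a category with no articles of its own and exactly one sub-category is collapsed, with the top category keeping its title and taking over the contents of the single child. The `CategoryNode` type and `simplify` function are hypothetical, and the input is assumed to be a tree (no cycles):

```typescript
// Hypothetical tree node; "articles" are the non-category children.
interface CategoryNode {
  title: string;
  articles: string[];
  subcategories: CategoryNode[];
}

// Returns a simplified copy of the tree. Children are simplified first, so a
// chain of article-less, single-child categories collapses into its top node.
function simplify(node: CategoryNode): CategoryNode {
  let articles = node.articles.slice();
  let subcategories = node.subcategories.map(simplify);

  const only = subcategories.length === 1 ? subcategories[0] : undefined;
  if (articles.length === 0 && only) {
    // Merge the single sub-category into this (top) category.
    articles = only.articles;
    subcategories = only.subcategories;
  }
  return { title: node.title, articles, subcategories };
}
```

A category that has both articles and a single sub-category is left untouched, since it already has a non-category child. Whether the merged node should keep the top title or the child title is a design choice the comment does not settle.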

@ISNIT0
Contributor

ISNIT0 commented Mar 15, 2019

What about categories with media? e.g. https://commons.wikimedia.org/wiki/Category:Birds_in_art

@ISNIT0
Contributor

ISNIT0 commented Mar 15, 2019

There doesn't seem to be a way to get structured data on the order in which to show the sub-categories. It's not just alphabetical:
e.g. https://bm.wikipedia.org/wiki/Cat%C3%A9gorie:Lien_th%C3%A9matique_pour_cat%C3%A9gories
The single category is in the "G" namespace

and
https://en.wikipedia.org/wiki/Category:London
There are *, Β (Greek letter), Ι, Ξ, and Σ

Any suggestions here @Popolechien?

The query I'm currently using is this: https://bm.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtype=subcat&cmlimit=500&format=json&cmtitle=Cat%C3%A9gorie%3ALien_th%C3%A9matique_pour_cat%C3%A9gories
This only gives back the article namespace, pageid, and title.
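
For reference, a small sketch of how that query URL can be assembled from its parts. The helper name `buildSubcategoryQuery` is hypothetical; the parameters are exactly the ones in the URL quoted above:

```typescript
// Hypothetical helper: builds the list=categorymembers request quoted above
// for a given wiki API endpoint and category title.
function buildSubcategoryQuery(
  apiUrl: string,
  categoryTitle: string,
  limit: number = 500
): string {
  const params: Array<[string, string]> = [
    ["action", "query"],
    ["list", "categorymembers"],
    ["cmtype", "subcat"], // only sub-categories, not pages or files
    ["cmlimit", String(limit)],
    ["format", "json"],
    ["cmtitle", categoryTitle],
  ];
  const query = params
    .map((pair) => `${pair[0]}=${encodeURIComponent(pair[1])}`)
    .join("&");
  return `${apiUrl}?${query}`;
}
```

Calling it with `"https://bm.wikipedia.org/w/api.php"` and `"Catégorie:Lien_thématique_pour_catégories"` reproduces the URL in the comment.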

@ISNIT0
Contributor

ISNIT0 commented Apr 19, 2019

Progress:
I've added a work-in-progress --getCategories flag which enables category scraping.
There are certainly issues with the current implementation, so it should not be used yet.

Each article has a Categories section added to the bottom with a list of links to category pages. Each Category page has a Sub-categories section which links to sub-category pages.

TODO:

  • Display categories as on wikipedia.org
  • List pages within a category
  • Improve efficiency of category page scraping

ISNIT0 added a commit that referenced this issue Apr 19, 2019
@ISNIT0
Contributor

ISNIT0 commented Apr 19, 2019

It seems that to display the categories the same way MediaWiki displays them, we need information that isn't available through the API. Instead, I'm just grouping them alphabetically, which is pretty close.
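
A minimal sketch of what such alphabetical grouping could look like, assuming grouping by the upper-cased first character of each title. The function name `groupAlphabetically` is hypothetical and this is not the code from the referenced commit:

```typescript
// Groups titles under the upper-cased first character of each title.
// Titles are sorted first, so every group is itself in alphabetical order.
function groupAlphabetically(titles: string[]): Record<string, string[]> {
  // Null-prototype object so that headings cannot collide with built-in keys.
  const groups: Record<string, string[]> = Object.create(null);
  const sorted = titles.slice().sort((a, b) => a.localeCompare(b));
  for (const title of sorted) {
    const heading = title.charAt(0).toUpperCase();
    const bucket = groups[heading] || (groups[heading] = []);
    bucket.push(title);
  }
  return groups;
}
```

This ignores MediaWiki sort keys entirely, which is why the result is only close to the online layout: entries that the wiki files under headings such as * or Σ land under the first letter of their title instead.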

ISNIT0 added a commit that referenced this issue Apr 19, 2019
@ISNIT0
Contributor

ISNIT0 commented Apr 29, 2019

Progress so far:
https://framadrop.org/r/R1A5MJwaey#hxz6gNnGy7mFqv23Hf9SJNQSVL9JPS+pHOGQgVcyDvc=

Known issues:

@Popolechien
Author

Yeah, the hidden categories not being weeded out is a real blocker. These are useless and take up quite a lot of space. @ISNIT0 what's your plan for those?

@ISNIT0
Contributor

ISNIT0 commented Apr 29, 2019

@Popolechien I've just updated the comment above. We're now not scraping them at all. Is this okay?

@Popolechien
Author

perfect.

ISNIT0 added a commit that referenced this issue Apr 29, 2019
ISNIT0 added a commit that referenced this issue May 1, 2019
✨ Progress on #15 and implemented #677
@ISNIT0
Contributor

ISNIT0 commented May 1, 2019

@kelson42 @Popolechien
For review:

@Popolechien
Author

Niiiice.
Am I right to understand that all categories within the categories will also be showcased (i.e. not only the categories in the articles themselves)?
Either way, good job!

@ISNIT0
Contributor

ISNIT0 commented May 1, 2019

BM Full nopic: https://framadrop.org/r/cyk0sHthFk#vjOsZMdLvq9vqrulrpSOO/WUqSAlZ7ehMf6Zv36aVy0=

No, the current logic is to check each article for categories as it's downloaded. Then we only end up with categories that contain at least one article as per @kelson42's spec:

"Mirror only categories that have at least one article in them."
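
The logic described here can be sketched as follows, assuming each downloaded article carries the list of categories it belongs to. The `ArticleRecord` type and `indexCategories` function are hypothetical names, not mwoffliner's:

```typescript
// Hypothetical record for an article as it comes out of the downloader.
interface ArticleRecord {
  title: string;
  categories: string[];
}

// Builds category -> member articles from the articles themselves. A category
// only appears in the result if at least one downloaded article names it, so
// the "at least one article" rule holds by construction.
function indexCategories(articles: ArticleRecord[]): Record<string, string[]> {
  const members: Record<string, string[]> = Object.create(null);
  for (const article of articles) {
    for (const category of article.categories) {
      const list = members[category] || (members[category] = []);
      if (list.indexOf(article.title) === -1) {
        list.push(article.title);
      }
    }
  }
  return members;
}
```

Because the index is built from the selection outward, sub-categories and parent categories that no selected article names directly never show up in it.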

@kelson42
Collaborator

kelson42 commented May 7, 2019

I have tested with https://framadrop.org/r/D1EE0C6YxL#SwJO6719lYGfukNN1i71HHy1glAK4MaJTdKiifDHBlo=:

  • In Category pages, if an article is not in the selection it should not be listed (currently in black)
  • I think we should choose another namespace to put category pages in. The ZIM spec talks about "U", see https://wiki.openzim.org/wiki/ZIM_file_format#Namespaces
  • Up categories should be migrated too; this is not the case in "Category:2010s in Austria".

@ISNIT0
Contributor

ISNIT0 commented May 7, 2019

@kelson42 What do you mean by "Up categories should be migrated too"?

@kelson42
Collaborator

kelson42 commented May 7, 2019

@ISNIT0 I mean "categories' parent categories": the full ancestor tree should be downloaded (but of course in a simplified version).
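
A minimal sketch of collecting the full set of ancestors, assuming the parent categories of each category are already known. The function name `collectAncestors` and the `parentsOf` lookup are hypothetical; the visited check is there because category graphs on a wiki are not guaranteed to be free of cycles:

```typescript
// Breadth-first walk from the seed categories up through their parents.
// Returns the seeds plus every ancestor, each listed once.
function collectAncestors(
  seeds: string[],
  parentsOf: Record<string, string[]>
): string[] {
  const seen: Record<string, boolean> = Object.create(null);
  const queue = seeds.slice();
  const result: string[] = [];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (seen[current]) {
      continue; // already handled, also guards against cycles
    }
    seen[current] = true;
    result.push(current);

    let parents: string[] = [];
    if (Object.prototype.hasOwnProperty.call(parentsOf, current)) {
      parents = parentsOf[current] || [];
    }
    for (const parent of parents) {
      queue.push(parent);
    }
  }
  return result;
}
```

The simplification step mentioned in the comment is not part of this sketch; it only gathers which categories would need to be downloaded.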

@ISNIT0
Contributor

ISNIT0 commented May 7, 2019

@kelson42 You previously said:

"Mirror only categories that have at least one article in them."

@samkellerhals

@ISNIT0 @kelson42 @Popolechien thanks a lot for working on this. I think the addition of live category links will be a huge improvement! So far I've been working with Kirundi/Kinyarwanda/French ZIMs for use in refugee camps, and they also appeared with dead links on the index.html page. Are you thinking of applying these changes (active category links) to all ZIMs currently available for download via the Kiwix website?

@kelson42
Collaborator

@samkellerhals This is the goal, but it might take a few additional months to see it happening everywhere.

@ISNIT0
Contributor

ISNIT0 commented May 12, 2019

@kelson42 This is now doing the tree-shaking/graph simplification:
https://framadrop.org/r/dIaIeQVRtO#zZjY9W6s5P6ukctJPxU8GDvEQpzAUPdsqSKXbQohwII=

Because this is done using the top 100 articles, there is not a lot of shared categorisation, but Mantis is a good example

@ISNIT0
Contributor

ISNIT0 commented May 12, 2019

@kelson42 I'd like to move the namespacing item you mentioned into a separate ticket and add it to 2.0

I can see it causing lots of back-and-forth with routing edge-cases

@kelson42
Collaborator

kelson42 commented May 19, 2019

@ISNIT0 From what I can see in the last file you proposed, https://framadrop.org/r/P1S5xi6PRm#A6fiUMsysQsdZzr72yXsT6i/QaYm/Dc97iJZZtYktVg=, this looks quite good :) That said, I was not able to check whether the pagination works fine! Do you have a demo ZIM for that?

@kelson42
Collaborator

AFAIK everything has now been implemented in 1.9, except #762 to be done in 2.0

7 participants