Recent change in Wikipedia's Layout breaks the table of contents #143
Table of Contents redesign

Data Structure and Representation

First, we need to know exactly what information we want in the end. We need to ask ourselves: what information do we need for our current functionality to work (we don't want to overcomplicate this)? Well, we need the following things:

For every item, we would need the following things:
When we look at what the API gives us, we can see that everything we need is given to us:

```json
{
    "parse": {
        "title": "GitHub",
        "pageid": 18545292,
        "sections": [
            {
                "toclevel": 1,
                "level": "2",
                "line": "History",
                "number": "1",
                "index": "1",
                "fromtitle": "GitHub",
                "byteoffset": 5788,
                "anchor": "History",
                "linkAnchor": "History"
            },
            ...
        ],
        ...
    }
}
```

When we now try to put these requirements into a Rust data type, we come out with the following (this is just a first draft; variable names and types are not fixed):

```rust
struct TableOfContents {
    id: usize,
    title: String,
    items: Vec<TableOfContentsItem>,
}

struct TableOfContentsItem {
    id: usize,
    level: usize, // or an enum, don't know yet
    text: String,
    anchor: String,
}
```

The Integration with the Article

The Problem

The current implementation separates the table of contents from the article. The UI has to fetch the table of contents from the article and then do its actions there manually (matching the items against headers, fetching the title, formatting the item title, etc.). This adds a lot of unnecessary overhead in the UI (we can add functions for everything the UI needs from the article to [...]).

When we look at the things we currently do with the table of contents, we can see that it's all over the place. For example, look at what happens when you click on an item in the toc:
There are quite a few things about this that are wrong or outright dangerous. First, the whole thing depends on the headers being in the same order everywhere (in the [...]).

Redesigned Solution

The solution for this is actually quite simple: we use identifiers to refer to an item instead of its index. Currently, we need the following functionality:
As we can see, that's not too complicated. One thing to note about the text of these items: because the displayed text can be changed in the config, we format the text right in the parsing stage. This configuration change only affects the text displayed in the [...]

Naming

I believe this is the right time to change the naming of some of the structs and variables of the table of contents. A better name for it would be Section or Sections. This is also the name the Wikipedia API uses for the toc. This would mean that the [...]
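The identifier-based lookup from the Redesigned Solution, combined with formatting the text at parse time, could be sketched roughly as follows. This is only an illustration under assumptions: the `Section` and `Sections` names follow the naming proposal above, the mapping of `id` and `level` to the API's `index` and `toclevel` fields is a guess, and the `{NUMBER}`/`{TEXT}` template placeholders are hypothetical stand-ins for whatever the config actually offers.

```rust
/// One table-of-contents entry ("section" in Wikipedia API terms).
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub id: usize,      // assumption: taken from the API's "index" field
    pub level: usize,   // assumption: taken from the API's "toclevel" field
    pub text: String,   // display text, formatted once at parse time
    pub anchor: String, // taken from the API's "anchor" field
}

impl Section {
    /// Builds a section and formats its display text right away.
    /// `template` is a hypothetical config value with `{NUMBER}` and `{TEXT}` placeholders.
    pub fn new(
        id: usize,
        level: usize,
        number: &str,
        line: &str,
        anchor: &str,
        template: &str,
    ) -> Self {
        let text = template.replace("{NUMBER}", number).replace("{TEXT}", line);
        Section { id, level, text, anchor: anchor.to_string() }
    }
}

/// All sections of one article.
#[derive(Debug, Default)]
pub struct Sections {
    items: Vec<Section>,
}

impl Sections {
    pub fn new(items: Vec<Section>) -> Self {
        Sections { items }
    }

    /// Finds a section by its identifier, independent of its position in the list.
    pub fn by_id(&self, id: usize) -> Option<&Section> {
        self.items.iter().find(|s| s.id == id)
    }

    /// Finds a section by its anchor, e.g. to jump to the matching header.
    pub fn by_anchor(&self, anchor: &str) -> Option<&Section> {
        self.items.iter().find(|s| s.anchor == anchor)
    }
}
```

With this shape, the UI would only hold on to an `id` (or an anchor) and ask `Sections` for the rest, so nothing depends on the order in which items or headers happen to be stored.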
Making API Calls

One Struct handles it all

Currently, interacting with the Wikipedia API happens directly inside of the [...]

Structure and current Requirements

Currently, we only need four different main things for the handler to do:
When we only implement these four things, we not only keep the struct clean and simple, but we can also expand it in the future. The handler would then have private helper functions that validate arguments and check for Wikimedia errors. The return type of each of these functions would be JSON, keeping it simple (we don't have to create a struct for each API response and don't have to deal with many, many [...]).

For the error handling, we would create an [...]

Testing

Creating one struct for handling all of the Wikimedia API interactions enables us to test it better. For example, we can mock the Wikipedia API and then test all the different things that can happen for each of the requirements.
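To make the handler idea more concrete, here is a minimal sketch of one struct that owns all API interaction, with private helpers for argument validation and Wikimedia error checks, and a mockable transport for tests. Everything named here (`ApiHandler`, `ApiError`, `Transport`, `MockTransport`, the `sections` method) is hypothetical and not taken from the existing code. The sketch returns the raw JSON text as a `String` to stay dependency-free, and the error check is deliberately crude: it only looks for a response that starts with a top-level `"error"` object.

```rust
use std::fmt;

/// Hypothetical error type for everything that can go wrong in the handler.
#[derive(Debug, Clone, PartialEq)]
pub enum ApiError {
    InvalidArgument(String),
    Transport(String),
    Wikimedia(String),
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::InvalidArgument(msg) => write!(f, "invalid argument: {}", msg),
            ApiError::Transport(msg) => write!(f, "transport error: {}", msg),
            ApiError::Wikimedia(msg) => write!(f, "wikimedia error: {}", msg),
        }
    }
}

impl std::error::Error for ApiError {}

/// Abstraction over the HTTP layer so that tests can swap in a mock.
pub trait Transport {
    fn get(&self, params: &[(&str, &str)]) -> Result<String, ApiError>;
}

/// The single place that talks to the Wikipedia API.
pub struct ApiHandler<T: Transport> {
    transport: T,
}

impl<T: Transport> ApiHandler<T> {
    pub fn new(transport: T) -> Self {
        ApiHandler { transport }
    }

    /// Example request: fetch the sections of an article as raw JSON text.
    pub fn sections(&self, title: &str) -> Result<String, ApiError> {
        Self::validate_title(title)?;
        let params = [
            ("action", "parse"),
            ("page", title),
            ("prop", "sections"),
            ("format", "json"),
        ];
        let body = self.transport.get(&params)?;
        Self::check_wikimedia_error(body)
    }

    // Private helper: reject obviously invalid arguments before any request is made.
    fn validate_title(title: &str) -> Result<(), ApiError> {
        if title.trim().is_empty() {
            return Err(ApiError::InvalidArgument("title must not be empty".to_string()));
        }
        Ok(())
    }

    // Private helper: crude check for a response that starts with an "error" object.
    fn check_wikimedia_error(body: String) -> Result<String, ApiError> {
        if body.trim_start().starts_with("{\"error\"") {
            return Err(ApiError::Wikimedia(body));
        }
        Ok(body)
    }
}

/// Test double that returns a canned response instead of doing a request.
pub struct MockTransport {
    pub response: Result<String, ApiError>,
}

impl Transport for MockTransport {
    fn get(&self, _params: &[(&str, &str)]) -> Result<String, ApiError> {
        self.response.clone()
    }
}
```

In a test, `ApiHandler::new(MockTransport { response: Ok(...) })` lets each requirement be exercised against a success body, an error body, or a transport failure without touching the network.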
Things that need to be done:
| Linked Issues | #143 |
|---------------|------|

Changes made:

* feat: add Parser, ElementParser and Element traits
* feat: add Article and Section and implement Article::from_mediawiki
* feat: add Text element
* feat: add Header element
* feat: add MediawikiParagraphParser
* feat: add MediawikiUnsupportedElementParser
* feat: add MediawikiParser::new
* feat: finish basic parsing
* feat: add Header element
* feat: implement link element
* feat: add italic and bold element parser
* feat: add MediawikiListParser
TLDR: The table of contents won't be visible until the issue with the parser has been resolved. All other features should be working correctly. Check this issue frequently, as all news about the progress will be posted here.
Issue Status
A checklist of what needs to be done can be found here. Currently, the progress in development can be tracked by looking at the following branches related to the issue:
Recently, the Wikimedia Foundation announced its first change to the design of its site (including the articles) in over a decade, in which they "Put Usability at the Forefront".
One of the changes was a renewal of the table of contents, now on the left of the article. This changed its appearance in the HTML code considerably, revealing some significant flaws in the parsing design (We've always assumed that the table of contents would be at a specific location in a specific format. This is now no longer the case).
I've already come up with a solution to this problem. We can get the table of contents easily from the Wikipedia JSON API, already formatted with all of the required data. A quick example of the returned table of contents (internally called `sections`) for the article Meaning: [...]

As we can see, the API gives us all of the necessary data (`level`, `number`, `linkAnchor`). To implement this change, however, we would have to rewrite the whole table of contents section in the parser. Because we only get the id of the span that contains the header, we would also have to rewrite the header handling system. This requires tapping into the code of the UI.

These are some pretty big changes, but it'll be a high priority for me to fix this as soon as possible. I'll start with designing the interfaces for the new parser and, once the design is fleshed out, I'll start implementing the changes.
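Since the API only tells us the id of the span that contains each header, the header handling would have to match on that id rather than on position. A rough sketch of that matching step is below. The `ApiSection` and `Header` types and the `link_sections_to_headers` function are made up for illustration, and the sketch assumes the span id equals the section's `anchor` value, which would need to be confirmed against real article HTML.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced view of one entry of the API's `sections` array.
pub struct ApiSection {
    pub index: usize,
    pub anchor: String,
}

/// Hypothetical header found while parsing the article content.
pub struct Header {
    pub span_id: String, // id of the span that contains the header
    pub position: usize, // where the header sits in the rendered article
}

/// Maps each section index to the position of the header whose span id
/// equals the section's anchor. Sections without a matching header are skipped.
pub fn link_sections_to_headers(
    sections: &[ApiSection],
    headers: &[Header],
) -> HashMap<usize, usize> {
    // Index the headers by span id once, so the order of either list is irrelevant.
    let by_span_id: HashMap<&str, usize> = headers
        .iter()
        .map(|h| (h.span_id.as_str(), h.position))
        .collect();

    sections
        .iter()
        .filter_map(|s| by_span_id.get(s.anchor.as_str()).map(|&pos| (s.index, pos)))
        .collect()
}
```

Because the lookup goes through the span id, a header that moves, or a table of contents that is rendered somewhere else on the page, would not break the link between an item and its header.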