Redirection error in latest (2022-09) ifixit zim (English and Russian version) #826

Closed
kelson42 opened this issue Oct 3, 2022 · 11 comments · Fixed by #827

Collaborator

kelson42 commented Oct 3, 2022

From ifixit created by laggykiller: openzim/ifixit#86

The latest (2022-09) ifixit ZIM files (English and Russian versions) have a redirection error ('redirected you too many times').

The zim in question is here:
https://download.kiwix.org/zim/ifixit/ifixit_en_all_2022-09.zim
https://download.kiwix.org/zim/ifixit/ifixit_ru_all_2022-09.zim

You can test it out now from here:
https://library.kiwix.org/ifixit_en_all_2022-09
https://library.kiwix.org/ifixit_ru_all_2022-09

The previous versions (2022-06) are normal:
https://download.kiwix.org/zim/ifixit/ifixit_en_all_2022-06.zim
https://download.kiwix.org/zim/ifixit/ifixit_ru_all_2022-06.zim

Other languages are not affected

When viewed from kiwix-serve, it produces a redirection error:
image

When viewed from desktop application (e.g. Windows), the homepage is broken:
image

@kelson42 kelson42 added the bug label Oct 3, 2022
Collaborator Author

kelson42 commented Oct 3, 2022

@kelson42 this might be a kiwix-serve issue. The ZIM entries look correct:

# this is OK. That's what it always looks like
zim.main_entry.is_redirect
> True
zim.main_entry.get_redirect_entry()
> Entry(url=Main-Page, title=Main-Page)

# here we see that we redirect from Main-Page to home/home. Again, that's normal. We do it everywhere
zim.main_entry.get_redirect_entry().is_redirect
> True
zim.main_entry.get_redirect_entry().get_redirect_entry()
> Entry(url=home/home, title=iFixit: The Free Repair Manual)
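
The manual inspection above can be wrapped in a small helper. This is only a sketch: it assumes entry objects that expose `is_redirect` and `get_redirect_entry()` as in the session above. `resolve_redirects`, the `max_hops` guard, the `FakeEntry` stand-in and the `<main>` placeholder path are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeEntry:
    """Stand-in for a ZIM entry (hypothetical, for illustration only)."""
    path: str
    target: Optional["FakeEntry"] = None

    @property
    def is_redirect(self) -> bool:
        return self.target is not None

    def get_redirect_entry(self) -> "FakeEntry":
        return self.target


def resolve_redirects(entry, max_hops: int = 10):
    """Follow redirect entries until a non-redirect entry is reached.

    Returns the list of entries visited, ending with the final one.
    Raises RuntimeError if more than max_hops redirects are followed.
    """
    chain = [entry]
    while entry.is_redirect:
        if len(chain) > max_hops:
            raise RuntimeError("too many redirects")
        entry = entry.get_redirect_entry()
        chain.append(entry)
    return chain


# Same shape as the chain shown above: main entry -> Main-Page -> home/home.
home = FakeEntry("home/home")
main_page = FakeEntry("Main-Page", target=home)
main_entry = FakeEntry("<main>", target=main_page)  # placeholder path, not from the ZIM
print([e.path for e in resolve_redirects(main_entry)])
```

With a real archive, the same function could presumably be called on `zim.main_entry` directly, since it relies only on the two members used in the session above.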

Problem revolves around the handling of /:

curl -I http://192.168.5.80:9999/ifixit_en_all_2022-09/
HTTP/1.1 302 Found
Connection: close
Content-Length: 0
Location: /ifixit_en_all_2022-09/
Access-Control-Allow-Origin: *
Cache-Control: no-cache, no-store, must-revalidate
Date: Mon, 03 Oct 2022 09:55:23 GMT
======================
Requesting :
full_url  : /ifixit_en_all_2022-09/
method    : OTHER (1)
version   : HTTP/1.1
request#  : 0
headers   :
 - accept : '*/*'
 - host : '192.168.5.80:9999'
 - user-agent : 'curl/7.79.1'
arguments :
Parsed :
full_url: /ifixit_en_all_2022-09/
url   : /ifixit_en_all_2022-09/
acceptEncodingDeflate : 0
has_range : 0
is_valid_url : 1
.............
** running handle_content
Response :
httpResponseCode : 302
headers :
 - Location: '/ifixit_en_all_2022-09/'
 - Access-Control-Allow-Origin: '*'
 - Cache-Control: 'no-cache, no-store, must-revalidate'
Request time : 0.005543s
----------------------
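
The symptom in the response above is that `Location` equals the requested URL. A minimal sketch of a check for that condition follows. `is_self_redirect` and `REDIRECT_STATUSES` are hypothetical names, and the comparison deliberately ignores query strings.

```python
from typing import Optional
from urllib.parse import urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_self_redirect(request_path: str, status: int, location: Optional[str]) -> bool:
    """Return True when a redirect response points back at the requested path."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Compare only the path component, so an absolute Location URL also matches.
    return urlsplit(location).path == urlsplit(request_path).path


# Values taken from the curl output above.
print(is_self_redirect("/ifixit_en_all_2022-09/", 302, "/ifixit_en_all_2022-09/"))
```

A browser that follows such a response requests the same URL again, which is what eventually surfaces as 'redirected you too many times'.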

Accessing the home/home entry directly works as expected

curl -I http://192.168.5.80:9999/ifixit_en_all_2022-09/home/home
HTTP/1.1 200 OK
Connection: close
Content-Length: 21948
Content-Type: text/html
Access-Control-Allow-Origin: *
ETag: "1664790915539257/c"
Cache-Control: max-age=2723040, public
Date: Mon, 03 Oct 2022 09:57:59 GMT
======================
Requesting :
full_url  : /ifixit_en_all_2022-09/home/home
method    : OTHER (1)
version   : HTTP/1.1
request#  : 1
headers   :
 - accept : '*/*'
 - host : '192.168.5.80:9999'
 - user-agent : 'curl/7.79.1'
arguments :
Parsed :
full_url: /ifixit_en_all_2022-09/home/home
url   : /ifixit_en_all_2022-09/home/home
acceptEncodingDeflate : 0
has_range : 0
is_valid_url : 1
.............
** running handle_content
Found home/home
mimeType: text/html
Response :
httpResponseCode : 200
headers :
 - Content-Type: 'text/html'
 - Access-Control-Allow-Origin: '*'
 - ETag: '"1664791208912869/c"'
 - Cache-Control: 'max-age=2723040, public'
Request time : 0.003537s
----------------------

Kiwix-JS is not affected by this bug, nor is kiwix-desktop macOS (97)

Collaborator Author

kelson42 commented Oct 3, 2022

Both were created using openzim/ifixit:0.2.1 so can't be related to a code change anywhere.

Collaborator Author

kelson42 commented Oct 3, 2022

@rgaudin @benoit74 I strongly suspect this is linked to something which has changed upstream. Anyway the problem is acute! We should disable the recipes and remove the problematic ZIM files from the repo!

Collaborator Author

kelson42 commented Oct 3, 2022

@rgaudin @benoit74 Indeed, this looks super serious, the homepage seems to redirect to itself!!!

Collaborator Author

kelson42 commented Oct 3, 2022

This is weird, I have not released any change in the scraper since the previous version.
Did you release any big changes in any underlying library?

@kelson42 kelson42 added this to the 12.0.0 milestone Oct 3, 2022
Collaborator Author

kelson42 commented Oct 3, 2022

@veloman-yunkan I have migrated the ticket from the iFixit repo (but the comments are not in the right order). This is a blocker for 12.0.0 IMO, and it is pretty unclear why such a problem has gone unnoticed so far.

@kelson42 kelson42 mentioned this issue Oct 3, 2022
13 tasks
Member

rgaudin commented Oct 3, 2022

Confirming that the 2022-06 version is not affected

❯ curl -I http://192.168.5.80:9999/ifixit_en_all_2022-06/
HTTP/1.1 302 Found
Connection: close
Content-Length: 0
Location: /ifixit_en_all_2022-06/home/home
Access-Control-Allow-Origin: *
Cache-Control: no-cache, no-store, must-revalidate
Date: Mon, 03 Oct 2022 10:33:27 GMT
❯ curl -I http://192.168.5.80:9999/ifixit_en_all_2022-06/home/home
HTTP/1.1 200 OK
Connection: close
Content-Length: 21948
Content-Type: text/html
Access-Control-Allow-Origin: *
ETag: "1664793096418630/c"
Cache-Control: max-age=2723040, public
Date: Mon, 03 Oct 2022 10:33:37 GMT

ZIM entries are similar:

zim.main_entry.is_redirect
> True
zim.main_entry.get_redirect_entry()
> Entry(url=Main-Page, title=Main-Page)

zim.main_entry.get_redirect_entry().is_redirect
> True
zim.main_entry.get_redirect_entry().get_redirect_entry()
> Entry(url=home/home, title=iFixit: The Free Repair Manual)

Metadata are similar as well (not that it should matter)

{'Counter': 'application/javascript=2;application/vnd.ms-fontobject=1;font/sfnt=1;font/woff=1;font/woff2=6;image/gif=3;image/jpeg=15;image/png=24;image/svg+xml=38;image/vnd.microsoft.icon=1;image/webp=409652;text/css=18;text/html=173708;text/x-component=1',
 'Creator': 'iFixit',
 'Date': '2022-06-16',
 'Description': "iFixit is a global community of people helping each other repair things. Let's fix the world, one device at a time. Troubleshoot with experts in the Answers forum—and build your own how-to guides to share with the world. Fix your Apple and Android devices—and buy all the parts and tools needed for your DIY repair projects.",
 'FaviconPath': 'illustration',
 'Language': 'eng',
 'Name': 'ifixit_en_all',
 'Publisher': 'openZIM',
 'Tags': 'iFixit;_videos:yes;_pictures:yes;_category:iFixit',
 'Title': 'iFixit: The Free Repair Manual'}
{'Counter': 'application/javascript=2;application/vnd.ms-fontobject=1;font/sfnt=1;font/woff=1;font/woff2=6;image/gif=3;image/jpeg=15;image/png=24;image/svg+xml=38;image/vnd.microsoft.icon=1;image/webp=417315;text/css=18;text/html=178324;text/x-component=1',
 'Creator': 'iFixit',
 'Date': '2022-09-14',
 'Description': "iFixit is a global community of people helping each other repair things. Let's fix the world, one device at a time. Troubleshoot with experts in the Answers forum—and build your own how-to guides to share with the world. Fix your Apple and Android devices—and buy all the parts and tools needed for your DIY repair projects.",
 'FaviconPath': 'illustration',
 'Language': 'eng',
 'Name': 'ifixit_en_all',
 'Publisher': 'openZIM',
 'Tags': '_pictures:yes;_category:iFixit;_videos:yes;iFixit',
 'Title': 'iFixit: The Free Repair Manual'}

Collaborator

veloman-yunkan commented Oct 4, 2022

The issue with ifixit_en_all_2022-09.zim is that it contains an item at the empty path "":

$ zimdump list --url="" ifixit_en_all_2022-09.zim 
path: 
* title:          Sales Policies
* idx:            0
* type:           item
* mime-type:      text/html
* item size:      4547

Currently in kiwix-serve the main entry is returned for an empty ("") or root ("/") path only if no entry exists at that path in the ZIM file:

zim::Entry getEntryFromPath(const zim::Archive& archive, const std::string& path)
{
  try {
    return archive.getEntryByPath(path);
  } catch (zim::EntryNotFound& e) {
    if (path.empty() || path == "/") {
      return archive.getMainEntry();
    }
  }
  throw zim::EntryNotFound("Cannot find entry for non empty path");
}
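
To make the shadowing effect concrete, here is a rough Python model of the lookup quoted above. It is not the kiwix API: the archive is reduced to a plain dict of path to title, `get_entry_from_path` and both sample dicts are hypothetical, and the main entry is assumed to be `home/home` directly (the intermediate redirect entries are left out).

```python
def get_entry_from_path(entries, main_path, path):
    """Python model of the C++ lookup quoted above (not the real API).

    entries maps an entry path to its title; main_path is the path of the
    main entry. Returns a (path, title) tuple.
    """
    if path in entries:
        # Corresponds to archive.getEntryByPath(path) succeeding.
        return path, entries[path]
    if path in ("", "/"):
        # Fallback: only reached when nothing is stored at that path.
        return main_path, entries[main_path]
    raise KeyError("Cannot find entry for non empty path")


# Assumed contents, reduced to the entries that matter here.
without_empty_path = {"home/home": "iFixit: The Free Repair Manual"}
with_empty_path = {"home/home": "iFixit: The Free Repair Manual", "": "Sales Policies"}

print(get_entry_from_path(without_empty_path, "home/home", ""))
print(get_entry_from_path(with_empty_path, "home/home", ""))
```

In the second case the item stored at the empty path wins, so the main-entry fallback is never reached.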

Member

rgaudin commented Oct 4, 2022

Ah! Very interesting; I did not know that. Thanks. We'll fix the scraper. Leaving this open to discuss whether we still want that (weird?) behavior.

Collaborator

veloman-yunkan commented Oct 4, 2022

In any case, there is a bug in InternalServer::handle_content(), which assumes that the entry fetched for an empty path is a redirect entry that has to be converted to an HTTP redirect:

auto entry = getEntryFromPath(*archive, urlStr);
if (entry.isRedirect() || urlStr.empty()) {
  // If urlStr is empty, we want to mainPage.
  // We must do a redirection to the real page.
  return build_redirect(bookName, getFinalItem(*archive, entry));
}
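
The loop produced by the branch quoted above can be reproduced with a small Python model. This is a sketch under assumptions, not the server code: `old_handle_content` is a hypothetical name, entries are a dict mapping a path to its redirect target (or `None` for an item), the `Location` format `/<book>/<path>` is inferred from the curl output earlier in this thread, and the 404 branch is a simplification.

```python
def old_handle_content(book, entries, main_path, url_path):
    """Model of the quoted branch. Returns (status, location_or_None)."""
    # Lookup: an exact match wins, the main entry is only a fallback.
    if url_path in entries:
        entry_path = url_path
    elif url_path in ("", "/"):
        entry_path = main_path
    else:
        return 404, None

    if entries[entry_path] is not None or url_path == "":
        # Follow redirects to the final item, then send an HTTP redirect.
        final = entry_path
        while entries[final] is not None:
            final = entries[final]
        return 302, f"/{book}/{final}"
    return 200, None


normal = {"Main-Page": "home/home", "home/home": None}
broken = {"Main-Page": "home/home", "home/home": None, "": None}  # item at ""

print(old_handle_content("ifixit_en_all_2022-06", normal, "Main-Page", ""))
print(old_handle_content("ifixit_en_all_2022-09", broken, "Main-Page", ""))
```

In the second call the entry found at the empty path is an ordinary item, yet the `url_path == ""` condition still forces a redirect, and the target is the empty path itself, so the response points back to the URL that was requested.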

veloman-yunkan added a commit that referenced this issue Oct 4, 2022
Before this fix the root URL for a book was assumed to resolve to the
main page.  This was not true for ZIM files containing an entry at an
empty path or with a path equal to "/", resulting in issue #826. The
logic behind this behaviour is found in `kiwix::getEntryFromPath()`.

The fix to that issue is a little more general and will result in an
HTTP redirect in any case where `kiwix::getEntryFromPath(zim, path)`
returns an entry with a real path different from the requested one. In
particular, this will affect the behaviour on ZIM files with the old
namespace scheme, where the requested resource - if not found - is also
looked up in the 'A', 'I', 'J', and/or '-' namespaces. Now instead of
returning the contents of that other resource an HTTP redirect response
will be sent.
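
One possible reading of the rule described in this commit message, as a Python sketch: redirect only when the resolved entry's path differs from the requested one. The actual patch is not shown in this thread, so `fixed_handle_content`, the dict-based archive, the `/<book>/<path>` `Location` format and the 404 branch are all assumptions, and the old-namespace fallback is not modelled.

```python
def fixed_handle_content(book, entries, main_path, url_path):
    """Model of the described rule. Returns (status, location_or_None)."""
    # Lookup: an exact match wins, the main entry is only a fallback.
    if url_path in entries:
        entry_path = url_path
    elif url_path in ("", "/"):
        entry_path = main_path
    else:
        return 404, None

    # Resolve redirect entries down to the final item.
    final = entry_path
    while entries[final] is not None:
        final = entries[final]

    # Redirect only if the real path differs from what was requested.
    if final != url_path:
        return 302, f"/{book}/{final}"
    return 200, None


normal = {"Main-Page": "home/home", "home/home": None}
broken = {"Main-Page": "home/home", "home/home": None, "": None}  # item at ""

print(fixed_handle_content("book", normal, "Main-Page", ""))
print(fixed_handle_content("book", broken, "Main-Page", ""))
```

Under this rule a ZIM file with an item at the empty path serves that item for the root URL instead of redirecting to itself, while a ZIM file without one still redirects to the main page.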
Collaborator Author

kelson42 commented Oct 4, 2022

I guess we can close this now.
