Commit
Merge pull request #406 from openzim/html_as_fetch
HTML document can be retrieved as `fetch`
benoit74 authored Oct 8, 2024
2 parents c8e1e96 + 3c7363f commit 38e5902
Showing 17 changed files with 119 additions and 4 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- Enrich test website with img srcset situations (in preparation for #403)
+
+### Fixed
+
+- HTML document can be retrieved as `fetch` resource type (#405)
+
 ## [2.1.1] - 2024-09-05
 
 ### Changed
2 changes: 1 addition & 1 deletion src/warc2zim/content_rewriting/generic.py
@@ -154,7 +154,7 @@ def get_rewrite_mode(self, record, mimetype):
     def get_resourcetype_rewrite_mode(self, record, resourcetype, mimetype):
         """Get current record rewrite mode based on WARC-Resource-Type and mimetype"""
 
-        if resourcetype in ["document", "xhr"] and mimetype == "text/html":
+        if resourcetype in ["document", "xhr", "fetch"] and mimetype == "text/html":
             # TODO : Handle header "Accept" == "application/json"
             if getattr(record, "method", "GET") == "GET":
                 return "html"
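The hunk above adds `"fetch"` to the WARC resource types whose `text/html` responses are rewritten as HTML documents. A minimal standalone sketch of that selection rule follows. The function name `resourcetype_rewrite_mode`, the `method` parameter (standing in for the record's request method) and the `None` fallback are illustrative assumptions, not warc2zim's API; the real method receives the WARC record and has further branches that this diff does not show.

```python
from typing import Optional

# Resource types whose text/html responses are treated as HTML documents.
# "fetch" is the value added by this commit; the others mirror the diff.
HTML_RESOURCE_TYPES = {"document", "xhr", "fetch"}


def resourcetype_rewrite_mode(
    resourcetype: str, mimetype: str, method: str = "GET"
) -> Optional[str]:
    """Hypothetical, simplified version of the condition shown in the diff.

    Returns "html" for GET requests of an HTML-capable resource type served
    as text/html; returns None otherwise (assumed fallback, the real code has
    other branches not visible here).
    """
    if resourcetype in HTML_RESOURCE_TYPES and mimetype == "text/html":
        if method == "GET":
            return "html"
    return None


for rtype in ("document", "xhr", "fetch", "script"):
    print(rtype, resourcetype_rewrite_mode(rtype, "text/html"))
```

Before this change, a `text/html` page captured with resource type `fetch` fell through this branch and was not rewritten as HTML, which is the situation reported in #405.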
2 changes: 1 addition & 1 deletion src/warc2zim/items.py
@@ -9,7 +9,7 @@
 from pathlib import Path
 
 from jinja2.environment import Template
-from libzim.writer import Hint  # pyright: ignore[reportMissingImports]
+from libzim.writer import Hint  # pyright: ignore[reportMissingModuleSource]
 from warcio.recordloader import ArcWarcRecord
 from zimscraperlib.types import get_mime_for_name
 from zimscraperlib.zim.items import StaticItem
102 changes: 102 additions & 0 deletions test-website/content/image-srcset.html
@@ -0,0 +1,102 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Test website</title>
<link
rel="apple-touch-icon"
sizes="180x180"
href="./icons/apple-touch-icon.png"
/>
<link
rel="icon"
type="image/png"
sizes="32x32"
href="./icons/favicon-32x32.png"
/>
<link
rel="icon"
type="image/png"
sizes="16x16"
href="./icons/favicon-16x16.png"
/>
<link rel="manifest" href="./icons/site.webmanifest" />
<link rel="shortcut icon" href="./icons/favicon.ico" />
</head>

<body>
<p>
This page contains tests around varying images based on screen / device.
</p>

<h2>Image srcset</h2>

<p>
An image should be displayed below at all screen sizes and pixel ratios.
</p>

<img
srcset="
./images/image1.png,
./images/image1-1x.png 1x,
./images/image1-2x.png 2x
"
src="./images/image1.png"
alt="an image"
/>

<p>
Another image should be displayed below at all screen sizes and pixel
ratios.
</p>

<img
srcset="./images/image4-1.5x.png 1.5x"
src="./images/image4.png"
alt="an image"
/>

<h2>Picture sources - with srcset pixel ratio</h2>

<p>
An image should be displayed below at all screen sizes and pixel ratios.
</p>

<picture>
<source
srcset="
./images/image2.png,
./images/image2-1x.png 1x,
./images/image2-2x.png 2x
"
type="image/png"
/>
<img src="./images/image2.png" alt="an image" />
</picture>

<h2>Picture sources - with media queries</h2>

<p>
An image should be displayed below at all screen sizes and pixel ratios.
</p>

<picture>
<source
srcset="./images/image3-high.png"
media="all and (min-width: 1280px)"
type="image/png"
/>
<source
srcset="./images/image3-medium.png"
media="all and (min-width: 600px)"
type="image/png"
/>
<source
srcset="./images/image3-small.png"
media="all and (min-width: 0px)"
type="image/png"
/>
<img src="./images/image3.png" alt="an image" />
</picture>
</body>
</html>
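The new test page exercises several `srcset` shapes (a bare URL with an implicit `1x`, `1x`/`2x` and `1.5x` density descriptors, and `<picture>` sources) in preparation for #403. As a rough illustration of what rewriting such attributes involves, here is a simplified Python sketch that splits a `srcset` value into candidates and rewrites each URL while keeping its descriptor. `parse_srcset`, `rewrite_srcset` and the `rewrite_url` callback are hypothetical names, and splitting on commas is a simplifying assumption that ignores commas inside URLs (for example in `data:` URLs); this is not warc2zim's implementation.

```python
from typing import Callable, List, Tuple


def parse_srcset(value: str) -> List[Tuple[str, str]]:
    """Split a srcset value into (url, descriptor) pairs.

    Simplified: splits on commas, so URLs containing commas are not handled,
    unlike the full HTML "parse a srcset attribute" algorithm.
    """
    candidates = []
    for part in value.split(","):
        tokens = part.split()  # also strips newlines and indentation
        if not tokens:
            continue  # skip empty entries, e.g. from a trailing comma
        url = tokens[0]
        descriptor = " ".join(tokens[1:])  # "" when absent (implies 1x)
        candidates.append((url, descriptor))
    return candidates


def rewrite_srcset(value: str, rewrite_url: Callable[[str], str]) -> str:
    """Rewrite every candidate URL, keep its descriptor, and re-serialize."""
    rewritten = []
    for url, descriptor in parse_srcset(value):
        new_url = rewrite_url(url)
        rewritten.append(f"{new_url} {descriptor}" if descriptor else new_url)
    return ", ".join(rewritten)


# The first srcset value from image-srcset.html, including its line breaks.
srcset = """
  ./images/image1.png,
  ./images/image1-1x.png 1x,
  ./images/image1-2x.png 2x
"""
print(parse_srcset(srcset))
print(rewrite_srcset(srcset, lambda u: u.replace("./images/", "../images/")))
```

Keeping the descriptor untouched and only mapping the URL is the point of the sketch: the browser's choice between candidates depends on the descriptors, so a rewriter that dropped or reordered them could change which image is displayed.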
Binary file added test-website/content/images/image1-1x.png
Binary file added test-website/content/images/image1-2x.png
Binary file added test-website/content/images/image2-1x.png
Binary file added test-website/content/images/image2-2x.png
Binary file added test-website/content/images/image2.png
Binary file added test-website/content/images/image3-high.png
Binary file added test-website/content/images/image3-medium.png
Binary file added test-website/content/images/image3-small.png
Binary file added test-website/content/images/image3.png
Binary file added test-website/content/images/image4-1.5x.png
Binary file added test-website/content/images/image4.png
1 change: 1 addition & 0 deletions test-website/content/index.html
@@ -50,6 +50,7 @@
 <li><a href="./bad-redirections.html">Bad redirections</a></li>
 <li><a href="./content-types/index.html">Handling of content types</a></li>
 <li><a href="./http-equiv-redirect.html">Redirect with http-equiv meta directive</a></li>
+<li><a href="./image-srcset.html">Image with srcset</a></li>
 </ul>
 </body>
 
8 changes: 6 additions & 2 deletions tests/test_warc_to_zim.py
@@ -202,11 +202,15 @@ def verify_warc_and_zim(self, warcfile, zimfile):
             elif record.rec_type == "response":
                 # We must have a payload
                 assert payload
-                payload_content = payload.content.tobytes()
+                payload_content = (
+                    payload.content.tobytes()  # pyright:ignore[reportAttributeAccessIssue]
+                )
 
                 # if HTML, still need to account for the head insert, otherwise should
                 # have exact match
-                if payload.mimetype.startswith("text/html"):
+                if payload.mimetype.startswith(  # pyright:ignore[reportAttributeAccessIssue]
+                    "text/html"
+                ):
                     assert head_insert in payload_content
             elif record.rec_type == "resource":
                 # we do not want to embed resources "as-is"
