Releases: openzim/warc2zim

2.1.3

01 Nov 13:17

benoit74

2.1.3 Latest

Changed

Upgrade to wombat 3.8.3 (#414)

2.1.2

08 Oct 12:28

benoit74

Added

Enrich test website with img srcset situations (in preparation for #403)

Changed

Upgrade dependencies, including wombat 3.8.2 (#407)

Fixed

HTML document can be retrieved as fetch resource type (#405)

2.1.1

05 Sep 07:14

benoit74

Changed

Upgrade dependencies, including wombat 3.8.0 (#386)

2.1.0

09 Aug 07:43

benoit74

Added

New fuzzy rules for cheatography.com (#342), der-postillon.com (#330), iranwire.com (#363)
Properly rewrite redirect target url when present in HTML tag (#237)
New --encoding-aliases argument to pass encoding/charset aliases (#331)
Add support for SVG favicon (#148)
Automatically index PDF content and use PDF title (#289 and #290)

Changed

Upgrade to python-scraperlib 4.0.0
Generate fuzzy rules tests in Python and Javascript (#284)
Refactor HTML rewriter class to make it more open to change and expressive (#305)
Detect charset in document header only for HTML documents (#331)
Use software property from warcinfo record to set ZIM Scraper metadata (#357)
Store ContentDate as metadata, based on WARC-Date (#358)
Remove domain specific rules (#328)
Revisit retrieve_illustration logic to prefer best favicons (#352 and #369)
Upgrade dependencies (zimscraperlib 4.0.0, wombat.js 3.7.12 and others) (#376)

Fixed

Handle case where the redirect target is bad / unsupported (#332 and #356)
Fixed WARC files handling order to follow creation order (#366)
Remove subsequent slashes in URLs, both in Python and JS (#365)
Ignore non HTTP(S) WARC records (#351)
Fix vimeo_cdn_fix fuzzy rule for proper operation in Javascript (#348)
Performance issue linked to new "extensible" HTML rewriting rules (#370)
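
As an illustration of the "Remove subsequent slashes in URLs" fix (#365), here is a minimal Python sketch of collapsing runs of slashes. It is not warc2zim's actual code: the function name `collapse_slashes` is hypothetical, and restricting the change to the path component (leaving the query string alone) is an assumption made for this example.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path of a URL (hypothetical helper).

    Assumption: only the path is normalized; scheme, host, query and
    fragment are returned unchanged.
    """
    parts = urlsplit(url)
    # Replace any run of two or more slashes in the path with a single one.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    print(collapse_slashes("https://example.com//a///b/c.html?x=1//2"))
```

The changelog says the same normalization is applied in both Python and JS. That matters because the scraper and the in-browser rewriting have to compute the same entry path for a given URL.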

2.0.3

24 Jul 05:28

benoit74

Changed

Moved rules definition from JSON to YAML and documented update process (#216)
Upgrade to wombat.js 3.7.11

Added

Exit with cleaner message when no entries are expected in the ZIM (#336) and when main entry is not processable (#337)
Add debug log for items whose content is empty (#344)

Fixed

The rewrite mode of some resources is still not correctly identified (#326)

2.0.2

18 Jun 13:26

benoit74

Added

Add --ignore-content-header-charsets option to disable automatic retrieval of content charsets from content first bytes (#318)
Add --content-header-bytes-length option to specify how many first bytes to consider when searching for content charsets in header (#320)
Add --ignore-http-header-charsets option to disable automatic retrieval of content charsets from content HTTP Content-Type headers (#318)

Changed

Simplify logic deciding content charset, stop guessing with chardet (#312)

Fixed

Rewrite only content with mimetype text-html when WARC-Resource-Type is html (#313)
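
The charset options in this release mention searching the first bytes of content for a charset declaration. The Python sketch below illustrates that general idea only. The function name, the regular expression and the default of 1024 bytes are assumptions for this example, not warc2zim's implementation or its actual default.

```python
import re
from typing import Optional

# Hypothetical pattern: matches `charset=utf-8`, `charset="utf-8"`, etc.
CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_\-:.]+)""", re.IGNORECASE
)


def charset_from_header_bytes(
    content: bytes, header_bytes_length: int = 1024
) -> Optional[str]:
    """Look for a charset declaration in the first bytes of the content.

    `header_bytes_length` plays the role that --content-header-bytes-length
    is described as having; the default used here is an assumption.
    Returns the lowercased charset name, or None when nothing is found.
    """
    match = CHARSET_RE.search(content[:header_bytes_length])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


if __name__ == "__main__":
    html = b'<html><head><meta charset="ISO-8859-1"></head></html>'
    print(charset_from_header_bytes(html))
```

Limiting the search to a fixed prefix keeps the check cheap on large documents. The cost is that a declaration placed later in the document is missed, which is presumably why the length is configurable.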

2.0.1

13 Jun 10:15

benoit74

Added

Add support for multiple languages in --lang CLI argument (#300)

Changed

Use the new WARC-Resource-Type header to decide rewrite mode (when present in WARC) (#296)
Upgrade Python dependencies + wombat.js 3.7.5

Fixed

Drop integrity attribute in HTML <script> and <link> tags (#298)
Use automatic detection of content encoding also for JS, JSON and CSS files (#301)
Set correct charset in HTML documents (#253)

2.0.0

04 Jun 07:18

benoit74

Added

Allow specifying a scraper suffix for the ZIM scraper metadata at the CLI (#168)
New test website to test many known situations supposed to be handled (#166)

Changed

Replace Service Worker approach by scraper-side rewriting of static content (kiwix/overview#95)
Adopted Python bootstrap conventions (#152)
Upgrade dependencies, especially move to Python 3.12 (only) and zimscraperlib 3.3.2
Change wording in logs about the return code 100 (which is not an error code)
Added checks in converter.py to verify that the output directory exists, log appropriate error messages and exit cleanly if the checks fail (#106)
Added check for invalid zim file names (#232)
Changed default publisher metadata from 'Kiwix' to 'openZIM' (#150)

1.5.5

18 Jan 07:52

benoit74

Changed

Code restructuring in preparation for 2.x

1.5.4

18 Sep 08:16

rgaudin

Changed

Using wabac.js 2.16.11
Using cover resize method for favicon to prevent issues with too-small ones
Fixed direct link hack when inside an outer frame (kiwix-serve 3.5+) (#119)
