Releases: Own-Data-Privateer/hoardy-web

extension-v1.19.0

21 Dec 13:35

[extension-v1.19.0] - 2024-12-21: Reworked popup UI, better replay integration

Changed (1)

  • Popup UI:

    • Reorganized the whole layout by assigning tags to all elements and allowing switching between those tags as if they were tabs.

      The original idea was to unroll in steps a-la uBlock Origin, but this is superior.

    • Improved some help strings.

Added

  • Core + Popup UI + Shortcuts:

    • Added Replay from the archiving server configuration option.

      It's a tristate of: disallow, enable if Submit dumps via 'HTTP' option is enabled and the server supports it, enable even if Submit dumps via 'HTTP' option is disabled.

    • Added Include in global replays per-tab options.

    • Added a popup UI button and a keyboard shortcut, both of which re-navigate all tabs for which Include in global replays is set to their replays.

    • Added a popup UI button, a keyboard shortcut, and a context menu item, all of which re-navigate the currently active tab to its replay.

    • Added Force 'Work offline' in replayed tabs configuration option, which does the same thing the similar options for file: and data: URLs do, but for tabs that point to replay URLs.
      Enabled by default.

    • Added 🎄 Winter Days mode seasonal theme.

    • Added Escape notification messages configuration option to help support more notification daemons.
      Disabled by default.
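
The Replay from the archiving server tristate described above can be sketched as a small decision function. This is only an illustration, not the extension's code: the function and parameter names are hypothetical, the mapping of the three states to False/None/True is an assumption, and so is the choice to still require server support in the "enable even if disabled" state.

```python
from typing import Optional


def replay_enabled(
    setting: Optional[bool],
    submit_via_http: bool,
    server_supports_replay: bool,
) -> bool:
    """Decide whether replay is available (hypothetical helper).

    setting: False = disallow, None = follow the 'HTTP' submission option,
    True = enable even when 'HTTP' submission is disabled.
    """
    if setting is False:
        return False
    if setting is None:
        return submit_via_http and server_supports_replay
    # Assumption: a server without replay support cannot replay anything,
    # so this state still checks for it.
    return server_supports_replay
```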

Changed (2)

  • The Help page:

    • Merged "Handling of failures" section into "Archival".

    • Reworded some awkward places.

  • Core + manifest.json:

    • Improved server checking logic and error messages.

    • Improved keyboard shortcut descriptions.

  • Improved documentation.

Fixed

  • Core:

    • Snapshot buttons and keyboard shortcuts will no longer take DOM snapshots of replay pages, unless Capture snapshots of all URLs option is set.

    • On Chromium, fixed Hoardy-Web trying to collect and archive replay pages.

extension-v1.18.0+tool-v0.20.0+simple_server-v1.8.0

16 Dec 11:35
extension-v1.18.0

[extension-v1.18.0] - 2024-12-16: Replay integration, incremental improvements

This release integrates the extension with tool-v0.20.0, which can now do both archival and replay over HTTP, see below.

Changed

  • Core:

    • From now on, all requests to all URLs under Server URL will be ignored, allowing you to work with tool-v0.20.0-replayed pages without fiddling with any settings.

    • From now on, the extension will respect archiving server's settings and features given by its /hoardy-web/server-info endpoint, if such a thing exists.

    • The default value of Server URL does not specify /pwebarc/dump endpoint anymore, as this is now configurable server-side.

      For old configs, you can keep the old value, the archiving server handling code will silently elide that path away.

    • From now on, before the first archival, the extension will check that a working archiving server is available at the given Server URL and generate errors describing what exactly appears to be broken when not.
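
The Server URL handling described above (eliding the old /pwebarc/dump path and ignoring requests under Server URL) might look roughly like the following. This is a sketch under stated assumptions, not the extension's actual code (which is not Python): the function names are hypothetical, and the plain prefix check is a simplification.

```python
LEGACY_DUMP_PATH = "/pwebarc/dump"


def normalize_server_url(url: str) -> str:
    """Drop a trailing legacy dump path and return a base URL ending in '/'."""
    stripped = url.strip().rstrip("/")
    if stripped.endswith(LEGACY_DUMP_PATH):
        stripped = stripped[: -len(LEGACY_DUMP_PATH)]
    return stripped + "/"


def is_under_server_url(request_url: str, server_url: str) -> bool:
    """True when a request targets the archiving server and should be ignored."""
    return request_url.startswith(normalize_server_url(server_url))
```

With this, an old config value that still ends in /pwebarc/dump and a new one without it normalize to the same base URL.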

  • Popup UI:

    • From now on, if you set Server URL setting to an empty string, it will be reset to the default value.

    • Improved CPU usage when switching tabs really quickly.

[tool-v0.20.0] - 2024-12-16: Replay over HTTP, mirroring of non-GET reqres

Changed: Incompatible changes

  • export mirror:

    • Renamed export mirror sub-command to just mirror.
  • *:

    • Renamed all --no-overwrites options -> --no-overwrite.
  • *, --expr:

    • Renamed source -> agent.
    • Renamed raw_path_parts -> path_parts.
    • Renamed mq_raw_path -> mq_path.
    • Renamed qsl_urlencode atom -> unparse_query.

Fixed: Incompatible changes

  • Improved URL normalization:

    • From now on, it will preserve "=" symbols in query strings even when parameter values are empty, like browsers do.
    • URL path and query quoting and unquoting is now, hopefully, equivalent to what browsers do, too.

This changes the file names generated by organize with the default --output format a bit.
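
To illustrate why preserving "=" for empty parameter values matters, here is a small example using Python's standard urllib.parse. It is not hoardy-web's normalization code, only a demonstration of how a parse-and-reserialize round trip can silently drop such parameters; roundtrip_query is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode


def roundtrip_query(query: str, keep_blank: bool) -> str:
    """Parse a query string and serialize it back (hypothetical helper)."""
    pairs = parse_qsl(query, keep_blank_values=keep_blank)
    return urlencode(pairs)


# Keeping blank values preserves the "a=" parameter.
print(roundtrip_query("a=&b=1", keep_blank=True))   # a=&b=1
# Dropping them changes the URL, and hence any file name derived from it.
print(roundtrip_query("a=&b=1", keep_blank=False))  # b=1
```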

Added

  • serve:

    • Implemented the serve sub-command, which runs hoardy-web as a web server that can replay archived data over HTTP, a-la heritrix and pywb.

      After starting it with something like hoardy-web serve path/to/your/archives, you can then navigate to

      This is very reminiscent of the Wayback Machine by design, yes.

    • Added /hoardy-web/server-info endpoint support for future integration with the extension, similar to what simple_server (hoardy-web-sas) now does.

    • Implemented archiving server support when running serve with --archive-to option.

      This is similar to simple_server, except newly archived reqres will become available for replay immediately.

    • Implemented archiving-server-only mode when running serve with --no-replay option.

      In this mode, it is essentially equivalent to simple_server, except hoardy-web serve supports arbitrary --output formats.

    • Implemented --latest option, which only indexes and allows replay for the latest available visit to each available URL.

      Archiving new reqres updates the index accordingly, as expected.

    • Documented it all across the whole repository.

  • mirror:

    • Implemented rendering of non-GET reqres.

      So, e.g., DOM snapshots and web search answer pages done via POST will be included in the outputs now.

      If you do not want some of those, you can filter them out with --not-method POST or some such.

  • scrub, mirror, serve:

    • From now on, malformed URLs will be kept as-is instead of being voided out.

    • From now on, more types of IE-pragmas will be censored out by -iepragmas (which is the default).

    • From now on, scrub will use +verbose and +whitespace as defaults.

      This is a much nicer default, and after content-addressed outputs were implemented in tool-v0.19.0, the resulting space savings -verbose,-whitespace produce are mostly inconsequential now.

    • Simplified semantics of (+|-)pretty, it does not set the verbose option anymore.

  • *:

    • Added --structure and --raw-qbody options.
    • Added a bunch more parsed URL properties.
    • Added a bunch more similar reqres properties.

Changed

  • mirror:

    • Changed semantics of --nearest option a bit.
      From now on, it will parse its argument as a time interval and then take the middle of it as the target value.

      This is much nicer in practice since, from now on, giving --nearest 2024 is much less likely to get you the stuff from 2023.
      It will try to give you stuff nearest to 2024-07-02 00:00:00 instead.

    • Improved performance.
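
The new --nearest semantics for mirror described above (parse the argument as a time interval, then take its middle) can be checked with a few lines of Python. This sketch only handles a bare year, and interval_middle_of_year is a hypothetical name; the tool's real parser presumably accepts more formats.

```python
from datetime import datetime


def interval_middle_of_year(year: int) -> datetime:
    """Return the midpoint of the interval [year-01-01, (year+1)-01-01)."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return start + (end - start) / 2


# 2024 is a leap year (366 days), so the middle falls 183 days after January 1.
print(interval_middle_of_year(2024))  # 2024-07-02 00:00:00
```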

  • *:

    • Renamed --no-remap option -> --raw-sbody, the old name is kept as an alias.
  • Improved documentation and help strings.

    Most notably, the input filtering options are shown only once now.

  • pyproject.toml now explicitly specifies optional mitmproxy file format support.

Fixed

  • * in theory, but only ever triggered by mirror:

    • Fixed a file descriptor semi-leak when lazily reloading reqres.

[simple_server-v1.8.0] - 2024-12-16

Added

  • Added -t, --to, and --archive-to aliases for --root.

  • Added /hoardy-web/server-info endpoint for future integration with the extension.

Changed

  • From now on, "/" and most other non-word symbols (except "_", "-", and space) in bucket names are forbidden and will be removed.

    This will simplify some future things.

  • From now on, when several buckets are specified via several profile query parameters, the last one will be used.

  • Renamed --uncompressed -> --no-compress, the old name is kept as an alias.

  • Slightly improved performance.

  • Started typechecking with mypy.
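
The two bucket-related changes above can be sketched as follows. This is an approximation, not simple_server's code: the text says "most" non-word symbols are removed, while this sketch removes all of them except "_", "-", and space; the function names and the "default" fallback bucket are hypothetical.

```python
import re
from urllib.parse import parse_qs


def sanitize_bucket(name: str) -> str:
    """Remove "/" and other non-word symbols, keeping "_", "-", and space."""
    return re.sub(r"[^\w \-]", "", name)


def bucket_from_query(query: str, default: str = "default") -> str:
    """Pick the last `profile` query parameter, as described above."""
    values = parse_qs(query).get("profile", [])
    return sanitize_bucket(values[-1]) if values else default
```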

tool-v0.19.0

07 Dec 13:35

[tool-v0.19.0] - 2024-12-07: Powerful filtering, exporting of different URL visits, hybrid export modes

Changed: Semantics

  • *:

    • In --expr expressions, sha256 function changed semantics.
      From now on it returns the raw hash digest instead of the hexadecimal one.
      To get the old value, use sha256|to_hex.
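
The difference between the raw and the hexadecimal digest can be shown with Python's hashlib. The sha256_raw and to_hex helpers below are stand-ins named after the --expr atoms; they are an illustration, not the tool's implementation.

```python
import hashlib


def sha256_raw(data: bytes) -> bytes:
    """Raw 32-byte digest, analogous to the new `sha256` semantics."""
    return hashlib.sha256(data).digest()


def to_hex(data: bytes) -> str:
    """Hexadecimal rendering, analogous to piping through `to_hex`."""
    return data.hex()


body = b"example"
# Composing the two reproduces the old hexadecimal value.
assert to_hex(sha256_raw(body)) == hashlib.sha256(body).hexdigest()
```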

Added

  • * except organize --move, organize --hardlink, organize --symlink, get, and run:

    • From now on, all sub-commands except for above can take inputs in all supported file formats.

      I.e., you can now do

      hoardy-web export mirror --to ~/hoardy-web/mirror1 mitmproxy.*.dump

      on mitmproxy dumps without even importing them first.

    • By default, the above commands now also automatically dispatch between loaders of different file formats based on file extensions.
      So you can mix and match different file formats on the same command line.

    • Added a bunch of --load-* options that force a specific loader instead, e.g. --load-wrrb, --load-mitmproxy.

  • *:

    • Added a ton of new filtering options.

      For example, you can now do:

      hoardy-web find --method GET --method DOM --status-re .200C --response-mime text/html \
        --response-body-grep-re "\bPotter\b" ~/hoardy-web/raw

      As before, these filters can still be used with other commands, like stream, or export mirror, etc.

      --root-* options of export mirror now use the same syntax and machinery as the normal input filters.

      Also, the overall filtering semantics changed a bit.
      The top-level logical expression the filters compute is now a large conjunction.
      I.e. the above example now compiles to, a bit simplified, (response.method == "GET" or response.method == "DOM") and re.match(".200C", status) and (response_mime == "text/html") and re.match("\\bPotter\\b", response.body).

    • Added a bunch of new --output formats.
      Mostly, this adds a bunch of output formats that refer to stimes.
      Mainly, to simplify export mirror --all usage, described below.
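
The "large conjunction" filtering semantics described above (values of the same option are OR-ed together, different options are AND-ed) can be sketched like this. The Reqres class, its field names, and the sample status value are assumptions made for the illustration; the body filter uses re.search because a grep-style match should not be anchored at the start of the body.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Reqres:
    method: str
    status: str
    response_mime: str
    response_body: str


def any_or_unset(options: Sequence[str], test: Callable[[str], bool]) -> bool:
    """An unset option matches everything; otherwise any value may match."""
    return not options or any(test(o) for o in options)


def matches(r: Reqres, methods=(), status_res=(), mimes=(), body_res=()) -> bool:
    """Conjunction over options, disjunction over each option's values."""
    return (
        any_or_unset(methods, lambda m: r.method == m)
        and any_or_unset(status_res, lambda p: re.match(p, r.status) is not None)
        and any_or_unset(mimes, lambda m: r.response_mime == m)
        and any_or_unset(body_res, lambda p: re.search(p, r.response_body) is not None)
    )
```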

  • export mirror:

    • Implemented exporting of different URL visits.

      I.e., you can now export not just --latest visit to each URL, but an --oldest one, or one --nearest to a given date, or --all of them.

    • Implemented --latest-hybrid, --oldest-hybrid, and --nearest-hybrid options.

      These allow you to export each page with resource requisites that are date-wise closest to the stime of the page itself, instead of taking globally --latest, --oldest, or --nearest versions of all requisite URLs.

      At the moment, this takes a lot more memory, but makes the results much more consistent for websites that do not use versioned resource requisites.

    • Implemented --hardlink and --symlink options, which allow exporting into content-addressed destinations.

      I.e. export mirror --hardlink will render and write each exported file to <--to>/_content/<hash/based/path>.<ext> and only then hardlink the result to <--to>/<output/format/based/path>.<ext> target destination.
      And similarly for --symlink.

      Typically, doing this saves quite a bit of space, e.g., when pages refer to the same resource requisites by slightly different URLs, same images and fonts get distributed via different CDN hosts, when you export --all visits to some URLs and many of those are absolutely identical, etc.

      So, from now on, --hardlink is the default.
      The old behavior can be achieved by running it with --copy instead.

    • Implemented --relative and --absolute options, which control if URLs should be remapped to relative or absolute file: URLs, respectively.
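
The content-addressed --hardlink export described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the tool's implementation: the function name, the choice of SHA-256, and the exact layout under _content are all hypothetical.

```python
import hashlib
import os


def export_hardlinked(root: str, rel_target: str, data: bytes, ext: str) -> str:
    """Write `data` once under <root>/_content, then hardlink it to the target."""
    digest = hashlib.sha256(data).hexdigest()
    # Hypothetical hash-based layout: two-character fan-out directory.
    content_path = os.path.join(root, "_content", digest[:2], digest[2:] + ext)
    if not os.path.exists(content_path):
        os.makedirs(os.path.dirname(content_path), exist_ok=True)
        with open(content_path, "wb") as f:
            f.write(data)
    target = os.path.join(root, rel_target)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if os.path.lexists(target):
        os.unlink(target)
    os.link(content_path, target)
    return content_path
```

Two targets with identical content end up as links to the same file, which is where the space savings come from.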

  • Documented all the new things.

  • Added a bunch of new test-cli.sh tests.

Changed

  • export mirror:

    • Switched default --output to hupq_n to prevent collisions when using --*-hybrid and --all.

    • Improved handling of base HTML tags, _targets are supported now.

    • Links that reference a page from itself will no longer refer to the page's filename, even when the link has no fragment.

      The results can be a bit confusing, but this makes the new content de-duplication options much more effective.

    • Made export mirror default filters explicit and changed them from --method "GET" --status-re ".200C" to --method "GET" --method "DOM" --status-re ".200C".

    • Implemented --ignore-bad-inputs and --index-all-inputs options to allow you to change the above default.

    • Improved output log format.

  • Improved file loading performance a bit.

  • Improved documentation.

Fixed

  • Added a bunch of new tests for organize, which cover the organize --symlink --latest bug of tool-v0.18.0.
    Won't happen again.

  • Fixed a couple of silly filtering-related bugs.

tool-v0.18.1

30 Nov 16:34

[tool-v0.18.1] - 2024-11-30: Hotfixes

Fixed

tool-v0.18.0 introduced a bunch of issues:

  • organize:

    • Fixed organize --symlink --latest dereferencing output files, which led to it overwriting plain WRR source files containing updated URLs with symlinks to their newer versions.

      The good news is that this bug was only triggered when organize --symlink --latest was run with some newly archived data and, for each updated URL, it only overwrote the second to last WRR file with a symlink to the latest WRR file.
      Unfortunately, this error was self-propagating, so those files could then get overwritten again by the next invocation of organize --symlink --latest with some more new data.
      This could happen up to 7 times, at which point it would start crashing, because of the OS symlink dereferencing limit.

      You can check if you were affected by running:

      cd ~/web/raw ; find . -type l

      The paths it outputs will be the paths of lost WRR files.

      A reminder that it is good to do daily backups, I suppose.

      The next version will have a test for this, but I'm releasing this hotfix an hour after I discovered this.

    • Fixed it assert-crashing sometimes when running with --symlink.

    • Improved memory consumption a bit.

  • export mirror:

    • Fixed overly large memory consumption.

tool-v0.18.0

20 Nov 14:36

[tool-v0.18.0] - 2024-11-20: Incremental improvements

Added

  • export mirror:

    • Implemented the --boring option, which allows you to load some input PATHs without adding them as roots, even when no --root-* options are specified.

      This makes the CLI a bit more convenient to use.
      The README.md has a new example showcasing it.

  • export mirror, scrub:

    • Implemented support for @import CSS rules using a string token in place of a URL.

      As far as I can see, this syntax is rarely used in practice.
      But the spec allows this, so.

    • Implemented interpret_noscript option, which enables inlining of noscript tags when scrub is running with -scripts.

      That is, export mirror will now use this feature by default.

      This is needed because some websites put link tags with CSS under noscript, thus making such pages look broken when scrubbed with -scripts (which is the default) and then opened in a browser with scripts enabled.

Changed

  • *: Refactored/reworked a large chunk of internals, as a result:

    • organize can now take WRR bundles as inputs too,
    • export mirror became much faster at indexing inputs that contain archives of the same URLs, repeatedly.

    In general, these changes are aimed towards making hoardy-web completely input-agnostic.
    That is, wouldn't it be nice if you could feed mitmproxy files to export mirror directly, instead of going through import mitmproxy first?

  • export mirror, scrub:

    • From now on, it will stop generating link tags with void URLs, it will simply censor them out instead.

    • scrub with +verbose set will now also show original rel attr values for censored out tags.

    • Also, in general, the outputs of scrub with +verbose set are much prettier now.

  • Improved documentation.

tool-v0.17.0

09 Nov 17:48

extension-v1.17.2

09 Nov 10:27

[extension-v1.17.2] - 2024-11-09: Documentation fixes, mostly

Changed

  • The Help page:

    • Rewrote "Conventions" and "'Work offline' mode" sections to be much more readable.
  • *:

    • Improved contrast when running with a light CSS color scheme.

Fixed

  • Documentation:

    • Fixed some typos.
  • *:

    • Fixed some potential state display inconsistency bugs and improved UI pages' init performance when the core is very busy.

extension-v1.17.1

01 Nov 10:36

[extension-v1.17.1] - 2024-11-01: Annoyance fixes

Changed

  • Popup UI:

    • Reverted most of the block reordering bit of popup UI rework of extension-v1.17.0.

      The "Globally" block is near the top again.

    • Edited the "Persistence" block a bit more.

      Mainly, to stop graying out always-useful stat lines, even when the associated features are disabled, to prevent possible confusion there.

    • Renamed some options and stat lines, mostly to make their names shorter to make popup UI on Fenix more readable.

  • Toolbar button:

    • Edited its title format to be much shorter, especially on Fenix.

    • Reverted the ordering of parts there to how it was before extension-v1.17.0.

      The (much shorter now) "globally" part is at the front again because otherwise the badge being at the front there too without an explanation of its format is kind of confusing.

  • Core + All internal pages:

    • Improved message handling infrastructure.

    • Used it to improve initialization functions of all internal pages, improving efficiency and making the resulting UI much less flaky.

  • The Help page:

    • Documented what webNavigation permission is used for, improved the rest a bit.
  • *:

    • Renamed build.sh firefox target to firefox-mv2, for consistency.

Fixed

  • UI:

    • Fixed flaky rendering of Help and Changelog pages on Fenix.

      They render properly now the very first time you load them, no reloads needed.

    • Fixed duplication of history entries when navigating internal links.

    • Fixed source links sometimes failing to be highlighted when pressing the browser's "Back" button.

    • Fixed some small CSS nitpicks.

  • Popup UI + Documentation:

    • Realigned some help strings with reality.
  • Fixed some more mostly inconsequential things.

extension-v1.17.0

30 Oct 04:50

[extension-v1.17.0] - 2024-10-30: Halloween special: major UI and state display improvements, fine-grained Work offline mode, add-on reloading with its state preserved, new options, etc

In related news, I have 💸☕ a Patreon account now.

Fixed: Possibly important

  • Core:

    • Fixed a bug in upgradeConfig that was resetting bucket settings to their default values on upgrade to extension-v1.13.0.
      So, this is no longer relevant, but still.
      Also, refactored code there to prevent such errors in the future.

      However, just in case, if you previously set bucket settings to something other than their default values and those settings are important to you, you should probably check your settings to ensure everything there is set as you expect it to be.

Changed: Important UI

  • Core + Popup UI + Documentation:

    • Renamed failed state and related failed* stats to unarchived state and unarchived* stats.
      Introduced a new failed stat that is now a sum of unstashed and unarchived stats.
      Edited the popup UI and the other pages appropriately.

      This makes documentation's terminology more consistent, and simplifies UI a bit.

      In particular, the Retry button of Queued/Failed stat line will both retry stashing unstashed and archiving unarchived reqres now.

  • Popup UI:

    • Reworked the whole thing quite a bit:

      • Improved option names and help strings.
      • Sorted sections and options to follow a more logically consistent order.
      • Improved layout.
      • Fixed some typos there.
    • From now on, setting Bucket for the current tab will set Bucket for its new children too, similar to how the rest of those settings work.

    • From now on, setting any of the Bucket settings to nothing will reset it to the parent/default value.
      I.e.:

      • Setting Bucket of This tab's new children to nothing will reset it to Bucket value of This tab.
      • Setting Bucket of This tab to nothing will reset it to Bucket value of New root tabs.
      • Setting Bucket of New root tabs to nothing will reset it to default.
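
The reset chain above can be summarized with a small sketch. This is an illustration only, not the extension's code: the function name and the "default" bucket name are hypothetical, and an empty string stands for "set to nothing".

```python
from typing import Tuple

DEFAULT_BUCKET = "default"  # hypothetical default value


def resolve_buckets(root: str, tab: str, children: str) -> Tuple[str, str, str]:
    """Reset each empty Bucket setting to its parent's (or the default) value."""
    root_bucket = root or DEFAULT_BUCKET
    tab_bucket = tab or root_bucket
    children_bucket = children or tab_bucket
    return root_bucket, tab_bucket, children_bucket
```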
  • The Help page:

    • The previous "Desktop" JavaScript-generated layout became columns CSS layout and JS-operation mode, while the "Mobile" JavaScript-generated layout became linear CSS layout and JS-operation mode.
      The page will now automatically switch between these two layouts and modes synchronously, depending on viewport width.

      (As before, in linear mode hovering over a link does nothing, but in columns mode, hovering over a link referring to a target in popup UI scrolls the popup UI column to that target and highlights it.)

      I.e., this means that on a Desktop browser, you can now zoom the Help page to arbitrary zoom levels and it will just switch between layouts and link-hover behaviors depending on available viewport width.

    • Greatly improved the styling of all links and documented it in the "Conventions" section.

  • All internal pages:

    • All internal pages now color-code links depending on where they point to, using exactly the same CSS as the Help page.

    • All pages now use the same history state handling behaviour.

      I.e., using the "Back" button of your browser will now not only go back, but also highlight the last link you clicked.

    • All documentation pages now set viewport width to device-width, set content's max-width to 900px and width to 100% - padding, preventing horizontal scroll, when possible.

    • Improved the CSS styling in general.

  • Core + Popup UI + General UI:

    • Implemented a new popup UI tristate toggle named Color scheme which allows Hoardy-Web's color-scheme to be different from the browser's default.

    • Implemented a mechanism and popup UI settings for applying additional themes and experimental features.

    • And then I looked at the date. Which is why ◥▅◤◢▅◣◥▅◤ Hoardy-Web now has 🦇 Halloween mode. ◥▅◤◢▅◣◥▅◤.

    • Also, from now on, the neutral states of tristate toggles are displayed with toggle knobs being in the middle of the things, not on their left.
      This is not a political statement.
      This means that all tristate toggles, from left to right, now go false -> null -> true both internally (exactly as they did before) and externally (which is new).

Changed: State display

  • Core + Toolbar button + Icons:

    • Replaced toolbar button's icons representing Cartesian products of other icons with animations.

      In other words, the previous "this tab has limbo mode enabled while this tab's children do not" icon will now instead be represented with an animation that switches between "this tab has limbo mode enabled" and "this tab is idle" icons instead.

      This takes less space in the XPI/CRX, makes for a cuter UI, and is the only reasonable solution when the core wants to display more than two icons at the same time.

    • Improved toolbar button's badge and title format a bit.

      "This tab" part goes first now, then "its new children", then "globally".

      Also, the order of sub-parts of those strings is more consistent now.

    • From now on, internal UI updater will generate icon animation frames for all important statuses and setting states.

      • When per-tab and per-tab's-new-children animation frames are equal, the repeated part will be elided.
      • When per-tab and per-tab's-new-children animation frames differ, the main icon will be inserted at the end to make it obvious when the animation loop restarts (otherwise, it's easy to interpret such animation loops incorrectly).
    • The update frequency of toolbar button's icon, badge, and title now depends on the amount of not yet done stuff still queued in the core.

      I.e., from now on, when the core has a lot of stuff to do (like when re-archiving thousands of reqres at the same time), it will start updating toolbar button's properties less to trade update latency for improved performance, and vice versa.

    • Greatly improved performance of state display updates. It uses 2-1000x less CPU now, depending on what the core is doing.

  • Icons:

    • Renamed the error icon to failed and added a new error icon.

      From now on, the failed icon will only be used for archival/stashing errors, while the error icon will only be used for internal errors (i.e. bugs).

    • Improved all icons to make them more visually distinct when they are being rendered at 48x48 or less, both in light and dark mode.

    • On Chromium, all icons are now rendered with transparent backgrounds, so now they will look nice in the dark mode too.

Added: State display

  • Core + Popup UI + Toolbar button:

    • From now on, popup UI and toolbar button's badge and title will display information about currently running internal actions.

      (Implementing this took a surprising amount of effort in improvements to infrastructure code.)

  • Core + Toolbar button + Icons:

    • Added a new in_limbo icon for "this tab has data in limbo" status.
      Unlike most other icons, this icon will never be used alone, it will always be an animation frame of something longer.
  • Core + Popup UI + Toolbar button:

    • Implemented Animate toolbar icon every setting for controlling toolbar icon animation speed.

Fixed: State display

  • Core + Toolbar button:

    • Fixed a bunch of bugs that prevented updates to toolbar button's icon and badge in some cases.

    • The icon and the badge will no longer get stuck when the core is very busy, like when re-archiving a lot of stuff all at once.

Added: Work offline mode

  • Core + Popup UI + Toolbar button + Icons + Documentation:

    • Implemented Work offline mode, options, their popup UI, shortcuts, and icons.

      This mode does the same thing as File > Work Offline checkbox of Firefox, except it supports per-tab/per-other-origin operation, not just the whole-browser one.
      Also, enabling any of these options will not break requests that are still in flight, and the requests they do cancel can be logged.

      That is, enabling Work offline in a tab will start canceling all new requests that tab generates, and the resulting canceled reqres will get logged if Track new requests option is enabled in the same tab.
      Similarly for background tasks and other origins.

      This can be generally useful for debugging your own websites with dynamic responsive CSS, or if you just want to prevent a tab from accessing the network for some reason.

      However, the main reason this exists is that the files generated by hoardy-web export mirror do not get scrubbed absolutely correctly at the moment, and the resulting pages can end up with some references to remote resources (in cases when an exported page uses some rare HTML and CSS tag combinations, or lazy-loads images via JavaScript, but still).
      With Work offline options enabled in a tab, you can now be sure that opening pages generated by hoardy-web export mirror won't send any requests to the network.

      In fact, from now on, by default, Hoardy-Web will enable Work offline in all tabs pointing to file: URLs.
      This can be disabled in the settings.

    • Documented it in more detail on the Help page.

    • Added a new offline toolbar icon to display the above state.

Added: Reloading with state preserved

  • Core + Popup UI + Documentation:

    • Implemented reloadSelf action that reloads the add-on while preserving its state.

      This action is different from similar Reload buttons in browser's own UI in that trigge...


tool-v0.16.0

19 Oct 10:23