Releases: Own-Data-Privateer/hoardy-web
extension-v1.19.0
[extension-v1.19.0] - 2024-12-21: Reworked popup UI, better replay integration
Changed (1)
-
Popup UI:
-
Reorganized the whole layout by assigning tags to all elements and allowing switching between those tags as if they were tabs.
The original idea was to unroll in steps a-la
uBlock Origin
, but this is superior. -
Improved some help strings.
-
Added
-
Core + Popup UI + Shortcuts:
-
Added
Replay from the archiving server
configuration option. It's a tristate of: disallow, enable if
Submit dumps via 'HTTP'
option is enabled and the server supports it, enable even if Submit dumps via 'HTTP'
option is disabled. -
Added
Include in global replays
per-tab options. -
Added popup UI button and keyboard shortcut both of which re-navigate all tabs for which
Include in global replays
is set to their replays. -
Added popup UI button, keyboard shortcut, and context menu item all of which re-navigate a currently active tab to its replay.
-
Added
Force 'Work offline' in replayed tabs
configuration option which does the same thing the similar options for file:
and data:
URLs do, but for tabs that point to replay URLs.
Enabled by default. -
Added
🎄 Winter Days mode
seasonal theme. -
Added
Escape notification messages
configuration option to help support more notification daemons.
Disabled by default.
-
Changed (2)
-
The
Help
page:-
Merged "Handling of failures" section into "Archival".
-
Reworded some awkward places.
-
-
Core +
manifest.json
:-
Improved server checking logic and error messages.
-
Improved keyboard shortcut descriptions.
-
-
Improved documentation.
Fixed
-
Core:
-
Snapshot
buttons and keyboard shortcuts will no longer take DOM
snapshots of replay pages, unless Capture snapshots of all URLs
option is set. -
On Chromium, fixed
Hoardy-Web
trying to collect and archive replay pages.
-
extension-v1.18.0+tool-v0.20.0+simple_server-v1.8.0
[extension-v1.18.0] - 2024-12-16: Replay integration, incremental improvements
This release integrates the extension
with tool-v0.20.0
, which can now do both archival and replay over HTTP
, see below.
Changed
-
Core:
-
From now on, all requests to all URLs under
Server URL
will be ignored, allowing you to work with tool-v0.20.0
-replayed pages without fiddling with any settings. -
From now on, the
extension
will respect archiving server's settings and features given by its /hoardy-web/server-info
endpoint, if such a thing exists. -
The default value of
Server URL
does not specify /pwebarc/dump
endpoint anymore, as this is now configurable server-side. For old configs, you can keep the old value; the archiving server handling code will silently elide that path away.
-
From now on, before the first archival, the
extension
will check that a working archiving server is available at the given Server URL
and generate errors describing what exactly appears to be broken when not.
-
-
Popup UI:
-
From now on, if you set
Server URL
setting to an empty string, it will be reset to the default value. -
Improved CPU usage when switching tabs really quickly.
-
[tool-v0.20.0] - 2024-12-16: Replay over HTTP
, mirroring of non-GET
reqres
Changed: Incompatible changes
-
export mirror
:- Renamed
export mirror
sub-command to just mirror
.
-
*
:- Renamed all
--no-overwrites
options -> --no-overwrite
.
-
*
, --expr
:- Renamed
source
-> agent
. - Renamed
raw_path_parts
-> path_parts
. - Renamed
mq_raw_path
-> mq_path
. - Renamed
qsl_urlencode
atom -> unparse_query
.
Fixed: Incompatible changes
-
Improved URL normalization:
- From now on, it will preserve "=" symbols in query strings even when parameter values are empty, like browsers do.
- URL path and query quoting and unquoting is now, hopefully, equivalent to what browsers do, too.
This changes the file names generated by organize
with the default --output
format a bit.
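The "=" preservation described above can be illustrated with the Python standard library. This is only a sketch of the general idea, not the normalization code `hoardy-web` actually uses; the `normalize_query` helper is a hypothetical name introduced for this example.

```python
from urllib.parse import parse_qsl, urlencode


def normalize_query(query: str) -> str:
    """Re-encode a query string, keeping parameters whose values are empty.

    Hypothetical helper for illustration only.
    """
    # keep_blank_values=True keeps pairs like "a=" instead of dropping them.
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(pairs)


# The empty-valued parameter survives, "=" included.
print(normalize_query("a=&b=1"))  # a=&b=1

# With the default keep_blank_values=False, the "a=" pair is lost.
print(urlencode(parse_qsl("a=&b=1")))  # b=1
```

A normalizer that drops such pairs produces a different URL, and hence a different file name under the default `--output` format, than the one the browser actually requested.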
Added
-
serve
:-
Implemented the
serve
sub-command, which runs hoardy-web
as a web server that can replay archived data over HTTP
, a-la heritrix
and pywb
. After starting it with something like
hoardy-web serve path/to/your/archives
, you can then navigate to
- http://127.0.0.1:3210/web/*/* to see the list of all available URLs and their versions (visits), or to
- something like http://127.0.0.1:3210/web/2/https://archiveofourown.org/works/3733123 to view the latest archived version of that URL, or to
- something like http://127.0.0.1:3210/web/*/https://archiveofourown.org/works/3733123 to view the list of all visits to this URL,
- which also works with glob patterns http://127.0.0.1:3210/web/*/https://archiveofourown.org/works/[0-9]*.
This is very reminiscent of the Wayback Machine by design, yes.
-
Added
/hoardy-web/server-info
endpoint support for future integration with the extension
, similar to what simple_server
(hoardy-web-sas
) now does. -
Implemented archiving server support when running
serve
with--archive-to
option. This is similar to
simple_server
, except newly archived reqres will become available for replay immediately. -
Implemented archiving-server-only mode when running
serve
with--no-replay
option. In this mode, it is essentially equivalent to
simple_server
, except hoardy-web serve
supports arbitrary --output
formats. -
Implemented
--latest
option, which only indexes and allows replay for the latest available visit to each available URL. Archiving new reqres updates the index accordingly, as expected.
-
Documented it all across the whole repository.
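The replay URL scheme listed for `serve` above can be summarized with a small helper. This is a sketch based only on the example URLs in these notes; the `replay_url` function is hypothetical, and the base address is simply the one used in those examples.

```python
def replay_url(target: str, selector: str = "2",
               base: str = "http://127.0.0.1:3210") -> str:
    """Build a replay URL of the form <base>/web/<selector>/<target>.

    Hypothetical helper. Per the examples above, the selector "2" asks for
    the latest archived version, while "*" asks for the list of visits.
    """
    return f"{base}/web/{selector}/{target}"


page = "https://archiveofourown.org/works/3733123"

# Latest archived version of the page.
print(replay_url(page))

# List of all visits to that URL.
print(replay_url(page, selector="*"))

# List of all available URLs and their visits.
print(replay_url("*", selector="*"))
```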
-
-
mirror
:-
Implemented rendering of non-
GET
reqres. So, e.g.,
DOM
snapshots and web search answer pages done via POST
will be included in the outputs now. If you do not want some of those, you can filter them out with
--not-method POST
or some such.
-
-
scrub
, mirror
, serve
:-
From now on, malformed URLs will be kept as-is instead of being voided out.
-
From now on, more types of IE-pragmas will be censored out by
-iepragmas
(which is the default). -
From now on,
scrub
will use +verbose
and +whitespace
as defaults. This is a much nicer default, and after content-addressed outputs were implemented in
tool-v0.19.0
, the resulting space savings -verbose,-whitespace
produce are mostly inconsequential now. -
Simplified semantics of
(+|-)pretty
, it does not set the verbose
option anymore.
-
-
*
:- Added
--structure
and --raw-qbody
options. - Added a bunch more parsed URL properties.
- Added a bunch more similar reqres properties.
Changed
-
mirror
:-
Changed semantics of
--nearest
option a bit.
From now on, it will parse its argument as a time interval and then take the middle of it as the target value. This is much nicer in practice since, from now on, giving
--nearest 2024
is much less likely to get you the stuff from 2023.
It will try to give you stuff nearest to 2024-07-02 00:00:00
instead. -
Improved performance.
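The arithmetic behind the new `--nearest` semantics can be checked with a few lines of Python. This is a sketch that assumes a bare year is read as the interval from January 1 of that year to January 1 of the next; the function names and the sample visit dates are hypothetical, not part of the tool.

```python
from datetime import datetime


def interval_midpoint(year: int) -> datetime:
    """Middle of the interval [Jan 1 of year, Jan 1 of year + 1)."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return start + (end - start) / 2


def nearest(visits: list[datetime], target: datetime) -> datetime:
    """Pick the visit whose time is closest to the target."""
    return min(visits, key=lambda t: abs(t - target))


# 2024 is a leap year (366 days), so its middle is exactly 183 days in.
print(interval_midpoint(2024))  # 2024-07-02 00:00:00

# Made-up visit times for illustration.
visits = [datetime(2023, 12, 31), datetime(2024, 9, 1)]

# Targeting the start of the interval picks the visit from 2023 ...
print(nearest(visits, datetime(2024, 1, 1)))  # 2023-12-31 00:00:00
# ... while targeting the middle picks the one from 2024.
print(nearest(visits, interval_midpoint(2024)))  # 2024-09-01 00:00:00
```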
-
-
*
:- Renamed
--no-remap
option -> --raw-sbody
, the old name is kept as an alias.
-
Improved documentation and help strings.
Most notably, the input filtering options are shown only once now.
-
pyproject.toml
now explicitly specifies optional mitmproxy
file format support.
Fixed
-
*
in theory, but only ever triggered by mirror
:- Fixed a file descriptor semi-leak when lazily reloading reqres.
[simple_server-v1.8.0] - 2024-12-16
Added
-
Added
-t
, --to
, and --archive-to
aliases for --root
. -
Added
/hoardy-web/server-info
endpoint for future integration with the extension
.
Changed
-
From now on, "/" and most other non-word symbols (except "_", "-", and space) in bucket names are forbidden and will be removed.
This will simplify some future things.
-
From now on, when several buckets are specified via several
profile
query parameters, the last one will be used. -
Renamed
--uncompressed
-> --no-compress
, the old name is kept as an alias. -
Slightly improved performance.
-
Started typechecking with
mypy
.
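The two bucket-related changes to `simple_server` above (stripping forbidden symbols from bucket names, and the last `profile` query parameter winning) can be sketched as follows. This is not the server's actual code: the regular expression is one possible reading of "most other non-word symbols", and both function names and the `"default"` fallback are assumptions made for this example.

```python
import re
from urllib.parse import parse_qs


def sanitize_bucket(name: str) -> str:
    """Remove everything except word characters, "-", and space.

    Hypothetical reading of the rule; "\\w" already covers "_".
    """
    return re.sub(r"[^\w\- ]", "", name)


def bucket_from_query(query: str, default: str = "default") -> str:
    """Use the last `profile` parameter when several are given."""
    values = parse_qs(query).get("profile", [])
    return sanitize_bucket(values[-1]) if values else default


print(sanitize_bucket("a/b c-d_e!"))  # ab c-d_e
print(bucket_from_query("profile=one&profile=two"))  # two
```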
tool-v0.19.0
[tool-v0.19.0] - 2024-12-07: Powerful filtering, exporting of different URL visits, hybrid export modes
Changed: Semantics
-
*
:- In
--expr
expressions, sha256
function changed semantics.
From now on it returns the raw hash digest instead of the hexadecimal one.
To get the old value, use sha256|to_hex
.
Added
-
*
except organize --move
, organize --hardlink
, organize --symlink
, get
, and run
:-
From now on, all sub-commands except for the above can take inputs in all supported file formats.
I.e., you can now do
hoardy-web export mirror --to ~/hoardy-web/mirror1 mitmproxy.*.dump
on
mitmproxy
dumps without even import
ing them first. -
By default, the above commands now also automatically dispatch between loaders of different file formats based on file extensions.
So you can mix and match different file formats on the same command line. -
Added a bunch of
--load-*
options that force a specific loader instead, e.g. --load-wrrb
, --load-mitmproxy
.
-
-
*
:-
Added a ton of new filtering options.
For example, you can now do:
hoardy-web find --method GET --method DOM --status-re .200C --response-mime text/html \
  --response-body-grep-re "\bPotter\b" ~/hoardy-web/raw
As before, these filters can still be used with other commands, like
stream
, or export mirror
, etc. --root-*
options of export mirror
now use the same syntax and machinery as the normal input filters. Also, the overall filtering semantics changed a bit.
The top-level logical expression the filters compute is now a large conjunction.
I.e. the above example now compiles to, a bit simplified, (response.method == "GET" or response.method == "DOM") and re.match(".200C", status) and (response_mime == "text/html") and re.match("\\bPotter\\b", response.body)
. -
Added a bunch of new
--output
formats.
Mostly, this adds a bunch of output formats that refer to stime
s.
Mainly, to simplify export mirror --all
usage, described below.
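The "large conjunction" semantics of the new filters can be sketched in Python: repeated uses of the same option are OR-ed together, and different options are AND-ed. This is an illustration only, not the tool's filter compiler. The `Reqres` record, the `matches` function, and the sample status value `C200C` are hypothetical, and `re.search` is used for the body grep as an assumption about how a grep-style filter would behave.

```python
import re
from dataclasses import dataclass


@dataclass
class Reqres:
    """Hypothetical, heavily simplified request+response record."""
    method: str
    status: str
    response_mime: str
    response_body: str


def matches(r: Reqres, methods: set[str], status_re: str,
            mimes: set[str], body_re: str) -> bool:
    """Conjunction of per-option checks; each set is a disjunction."""
    return (
        r.method in methods                                   # --method ... --method ...
        and re.match(status_re, r.status) is not None         # --status-re
        and r.response_mime in mimes                          # --response-mime
        and re.search(body_re, r.response_body) is not None   # --response-body-grep-re
    )


page = Reqres("GET", "C200C", "text/html", "<p>Harry Potter</p>")
post = Reqres("POST", "C200C", "text/html", "<p>Harry Potter</p>")

print(matches(page, {"GET", "DOM"}, r".200C", {"text/html"}, r"\bPotter\b"))  # True
print(matches(post, {"GET", "DOM"}, r".200C", {"text/html"}, r"\bPotter\b"))  # False
```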
-
-
export mirror
:-
Implemented exporting of different URL visits.
I.e., you can now export not just
--latest
visit to each URL, but an --oldest
one, or one --nearest
to a given date, or --all
of them. -
Implemented
--latest-hybrid
, --oldest-hybrid
, and --nearest-hybrid
options. These allow you to export each page with resource requisites that are date-wise closest to the
stime
of the page itself, instead of taking globally --latest
, --oldest
, or --nearest
versions of all requisite URLs. At the moment, this takes a lot more memory, but makes the results much more consistent for websites that do not use versioned resource requisites.
-
Implemented
--hardlink
and --symlink
options, which allow exporting into content-addressed destinations. I.e.
export mirror --hardlink
will render and write each exported file to <--to>/_content/<hash/based/path>.<ext>
and only then hardlink the result to <--to>/<output/format/based/path>.<ext>
target destination.
And similarly for --symlink
. Typically, doing this saves quite a bit of space, e.g., when pages refer to the same resource requisites by slightly different URLs, same images and fonts get distributed via different CDN hosts, when you export
--all
visits to some URLs and many of those are absolutely identical, etc. So, from now on,
--hardlink
is the default.
The old behavior can be achieved by running it with --copy
instead. -
Implemented
--relative
and --absolute
options, which control if URLs should be remapped to relative or absolute file:
URLs, respectively.
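The content-addressed export behind `--hardlink` can be sketched like this: write the rendered bytes once under `_content/`, then hardlink that file to every path that needs it. This is a minimal illustration, not `hoardy-web`'s implementation. The `export_hardlinked` function is hypothetical, and the exact hash-based path layout (SHA-256, split after two hex digits) is an assumption.

```python
import hashlib
import os


def export_hardlinked(root: str, rel_path: str, data: bytes) -> str:
    """Store data under <root>/_content/ and hardlink it to <root>/<rel_path>.

    Returns the content-addressed path. Hypothetical layout, for illustration.
    """
    digest = hashlib.sha256(data).hexdigest()
    ext = os.path.splitext(rel_path)[1]
    content_path = os.path.join(root, "_content", digest[:2], digest[2:] + ext)

    # Write the content only once, no matter how many targets refer to it.
    os.makedirs(os.path.dirname(content_path), exist_ok=True)
    if not os.path.exists(content_path):
        with open(content_path, "wb") as f:
            f.write(data)

    # Replace any previous target with a hardlink to the stored content.
    target = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if os.path.lexists(target):
        os.unlink(target)
    os.link(content_path, target)
    return content_path
```

With this scheme, the same image fetched from two different CDN hosts ends up as one file on disk with two extra names, which is where the space savings come from.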
-
-
Documented all the new things.
-
Added a bunch of new
test-cli.sh
tests.
Changed
-
export mirror
:-
Switched default
--output
to hupq_n
to prevent collisions when using --*-hybrid
and --all
. -
Improved handling of
base
HTML
tags, _target
s are supported now. -
Links that reference a page from itself will no longer refer to the page's filename, even when the link has no
fragment
. The results can be a bit confusing, but this makes the new content de-duplication options much more effective.
-
Made
export mirror
default filters explicit and changed them from --method "GET" --status-re ".200C"
to --method "GET" --method "DOM" --status-re ".200C"
. -
Implemented
--ignore-bad-inputs
and --index-all-inputs
options to allow you to change the above default. -
Improved output log format.
-
-
Improved file loading performance a bit.
-
Improved documentation.
Fixed
-
Added a bunch of new tests for
organize
, which cover the organize --symlink --latest
bug of tool-v0.18.0
.
Won't happen again. -
Fixed a couple of silly filtering-related bugs.
tool-v0.18.1
[tool-v0.18.1] - 2024-11-30: Hotfixes
Fixed
tool-v0.18.0
introduced a bunch of issues:
-
organize
:-
Fixed
organize --symlink --latest
dereferencing output files, which led to it overwriting plain WRR
source files containing updated URLs with symlinks to their newer versions. The good news is that this bug was only triggered when
organize --symlink --latest
was run with some newly archived data and, for each updated URL
, it only overwrote the second to last WRR
file with a symlink to the latest WRR
file.
Unfortunately, this error was self-propagating, so those files could then get overwritten again by the next invocation of organize --symlink --latest
with some more new data.
This could happen up to 7 times, at which point it would start crashing, because of the OS symlink dereferencing limit. You can check if you were affected by running:
cd ~/web/raw ; find . -type l
The paths it outputs will be the paths of lost
WRR
files. A reminder that it is good to do daily backups, I suppose.
The next version will have a test for this, but I'm releasing this hotfix an hour after I discovered this.
-
Fixed it
assert
-crashing sometimes when running with --symlink
. -
Improved memory consumption a bit.
-
-
export mirror
:- Fixed overly large memory consumption.
tool-v0.18.0
[tool-v0.18.0] - 2024-11-20: Incremental improvements
Added
-
export mirror
:-
Implemented the
--boring
option, which allows you to load some input PATH
s without adding them as roots, even when no --root-*
options are specified. This makes the CLI a bit more convenient to use.
The README.md
has a new example showcasing it.
-
-
export mirror
, scrub
:-
Implemented support for
@import
CSS
rules using a string token in place of a URL. As far as I can see, this syntax is rarely used in practice.
But the spec allows this, so. -
Implemented
interpret_noscript
option, which enables inlining of noscript
tags when scrub
is running with -scripts
. That is,
export mirror
will now use this feature by default. This is needed because some websites put
link
tags with CSS
under noscript
, thus making such pages look broken when scrub
bed with -scripts
(which is the default) and then opened in a browser with scripts enabled.
-
Changed
-
*
: Refactored/reworked a large chunk of internals, as a result: organize
can now take WRR
bundles as inputs too, export mirror
became much faster at indexing inputs that contain archives of the same URLs, repeatedly.
In general, these changes are aimed towards making
hoardy-web
completely input-agnostic.
That is, wouldn't it be nice if you could feed mitmproxy
files to export mirror
directly, instead of going through import mitmproxy
first? -
export mirror
, scrub
:-
From now on, it will stop generating
link
tags with void URLs, it will simply censor them out instead. -
scrub
with +verbose
set will now also show original rel
attr values for censored out tags. -
Also, in general, the outputs of
scrub
with +verbose
set are much prettier now.
-
-
Improved documentation.
tool-v0.17.0
See CHANGELOG.md
.
extension-v1.17.2
[extension-v1.17.2] - 2024-11-09: Documentation fixes, mostly
Changed
-
- Rewrote "Conventions" and "'Work offline' mode" sections to be much more readable.
-
*
:- Improved contrast when running with a light
CSS
color scheme.
Fixed
-
Documentation:
- Fixed some typos.
-
*
:- Fixed some potential state display inconsistency bugs and improved UI pages' init performance when the core is very busy.
extension-v1.17.1
[extension-v1.17.1] - 2024-11-01: Annoyance fixes
Changed
-
Popup UI:
-
Reverted most of the block reordering bit of popup UI rework of
extension-v1.17.0
. The "Globally" block is near the top again.
-
Edited the "Persistence" block a bit more.
Mainly, to stop graying out always-useful stat lines, even when the associated features are disabled, to prevent possible confusion there.
-
Renamed some options and stat lines, mostly to make their names shorter to make popup UI on Fenix more readable.
-
-
Toolbar button:
-
Edited its title format to be much shorter, especially on Fenix.
-
Reverted the ordering of parts there to how it was before
extension-v1.17.0
. The (much shorter now) "globally" part is at the front again because otherwise the badge being at the front there too without an explanation of its format is kind of confusing.
-
-
Core + All internal pages:
-
Improved message handling infrastructure.
-
Used it to improve initialization functions of all internal pages, improving efficiency and making the resulting UI much less flaky.
-
-
- Documented what
webNavigation
permission is used for, improved the rest a bit.
-
*
:- Renamed
build.sh
firefox
target to firefox-mv2
, for consistency.
Fixed
-
UI:
-
Fixed flaky rendering of
Help
and Changelog
pages on Fenix. They render properly now the very first time you load them, no reloads needed.
-
Fixed duplication of history entries when navigating internal links.
-
Fixed source links sometimes failing to be highlighted when pressing the browser's "Back" button.
-
Fixed some small
CSS
nitpicks.
-
-
Popup UI + Documentation:
- Realigned some help strings with reality.
-
Fixed some more mostly inconsequential things.
extension-v1.17.0
[extension-v1.17.0] - 2024-10-30: Halloween special: major UI and state display improvements, fine-grained Work offline
mode, add-on reloading with its state preserved, new options, etc
In related news, I have 💸☕ a Patreon account now.
Fixed: Possibly important
-
Core:
-
Fixed a bug in
upgradeConfig
that was resetting bucket
settings to their default values on upgrade to extension-v1.13.0
.
So, this is no longer relevant, but still.
Also, refactored code there to prevent such errors in the future. However, just in case, if you previously set
bucket
settings to something other than their default values and those settings are important to you, you should probably check your settings to ensure everything there is set as you expect it to be.
-
Changed: Important UI
-
Core + Popup UI + Documentation:
-
Renamed
failed
state and related failed*
stats to unarchived
state and unarchived*
stats.
Introduced a new failed
stat that is now a sum of unstashed
and unarchived
stats.
Edited the popup UI and the other pages appropriately. This makes documentation's terminology more consistent, and simplifies UI a bit.
In particular, the
Retry
button of Queued/Failed
stat line will both retry stashing unstashed
and archiving unarchived
reqres now.
-
-
Popup UI:
-
Reworked the whole thing quite a bit:
- Improved option names and help strings.
- Sorted sections and options to follow a more logically consistent order.
- Improved layout.
- Fixed some typos there.
-
From now on, setting
Bucket
for the current tab will set Bucket
for its new children too, similar to how the rest of those settings work. -
From now on, setting any of the
Bucket
settings to nothing will reset it to the parent/default value.
I.e.: - Setting
Bucket
of This tab's new children
to nothing will reset it to Bucket
value of This tab
. - Setting
Bucket
of This tab
to nothing will reset it to Bucket
value of New root tabs
. - Setting
Bucket
of New root tabs
to nothing will reset it to default
.
- Setting
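The `Bucket` fallback chain described above (children fall back to the tab, the tab falls back to new root tabs, new root tabs fall back to `default`) can be written down as a small resolver. This is a sketch of the stated rules only, written in Python rather than the extension's JavaScript; the `resolve_buckets` function and its argument names are hypothetical.

```python
def resolve_buckets(new_root_tabs: str, this_tab: str,
                    children: str) -> tuple[str, str, str]:
    """Resolve empty Bucket settings to their parent/default values.

    An empty string stands for "set to nothing". Hypothetical helper.
    """
    root = new_root_tabs or "default"  # New root tabs -> default
    tab = this_tab or root             # This tab -> New root tabs
    kids = children or tab             # This tab's new children -> This tab
    return root, tab, kids


print(resolve_buckets("", "", ""))          # ('default', 'default', 'default')
print(resolve_buckets("work", "", ""))      # ('work', 'work', 'work')
print(resolve_buckets("work", "news", ""))  # ('work', 'news', 'news')
```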
-
-
-
The previous "Desktop"
JavaScript
-generated layout became columns
CSS
layout and JS
-operation mode, while the "Mobile" JavaScript
-generated layout became linear
CSS
layout and JS
-operation mode.
The page will now automatically switch between these two layouts and modes synchronously, depending on viewport width. (As before, in
linear
mode hovering over a link does nothing, but in columns
mode, hovering over a link referring to a target in popup UI scrolls the popup UI column to that target and highlights it.) I.e., this means that on a Desktop browser, you can now zoom the
Help
page to arbitrary zoom levels and it will just switch between layouts and link-hover behaviors depending on available viewport width. -
Greatly improved the styling of all links and documented it in the "Conventions" section.
-
-
All internal pages:
-
All internal pages now color-code links depending on where they point to, using exactly the same
CSS
as the Help
page. -
All pages now use the same history state handling behaviour.
I.e., using the "Back" button of your browser will now not only go back, but also highlight the last link you clicked.
-
All documentation pages now set viewport width to
device-width
, set content's max-width
to 900px
and width
to 100% - padding
, preventing horizontal scroll, when possible. -
Improved the
CSS
styling in general.
-
-
Core + Popup UI + General UI:
-
Implemented a new popup UI tristate toggle named
Color scheme
which allows Hoardy-Web
's color-scheme to be different from the browser's default. -
Implemented a mechanism and popup UI settings for applying additional themes and experimental features.
-
And then I looked at the date. Which is why ◥▅◤◢▅◣◥▅◤
Hoardy-Web
now has 🦇 Halloween mode
. ◥▅◤◢▅◣◥▅◤. -
Also, from now on, the neutral states of tristate toggles are displayed with toggle knobs being in the middle of the things, not on their left.
This is not a political statement.
This means that all tristate toggles, from left to right, now go false
-> null
-> true
both internally (exactly as they did before) and externally (which is new).
-
Changed: State display
-
Core + Toolbar button + Icons:
-
Replaced toolbar button's icons representing Cartesian products of other icons with animations.
In other words, the previous "this tab has limbo mode enabled while this tab's children do not" icon will now instead be represented with an animation that switches between "this tab has limbo mode enabled" and "this tab is idle" icons instead.
This takes less space in the
XPI
/CRX
, makes for a cuter UI, and is the only reasonable solution when the core wants to display more than two icons at the same time. -
Improved toolbar button's badge and title format a bit.
"This tab" part goes first now, then "its new children", then "globally".
Also, the order of sub-parts of those strings is more consistent now.
-
From now on, internal UI updater will generate icon animation frames for all important statuses and setting states.
- When per-tab and per-tab's-new-children animation frames are equal, the repeated part will be elided.
- When per-tab and per-tab's-new-children animation frames differ, the
main
icon will be inserted at the end to make it obvious when the animation loop restarts (otherwise, it's easy to interpret such animation loops incorrectly).
-
The update frequency of toolbar button's icon, badge, and title now depends on the amount of not yet done stuff still queued in the core.
I.e., from now on, when the core has a lot of stuff to do (like when re-archiving thousands of reqres at the same time), it will start updating toolbar button's properties less to trade update latency for improved performance, and vice versa.
-
Greatly improved performance of state display updates. It uses 2-1000x less CPU now, depending on what the core is doing.
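The two rules for combining per-tab and per-tab's-new-children animation frames, given earlier in this block, can be sketched as follows. This is an illustration of those rules in Python, not the extension's updater code; the `animation_frames` function and the `limbo` and `idle` frame names are hypothetical, while `main` is the icon name used in these notes.

```python
def animation_frames(tab_frames: list[str], children_frames: list[str],
                     main_icon: str = "main") -> list[str]:
    """Combine per-tab and per-children frames into one animation loop."""
    if tab_frames == children_frames:
        # Equal parts: elide the repetition.
        return list(tab_frames)
    # Differing parts: show both, then mark the loop restart with the main icon.
    return list(tab_frames) + list(children_frames) + [main_icon]


print(animation_frames(["limbo"], ["limbo"]))  # ['limbo']
print(animation_frames(["limbo"], ["idle"]))   # ['limbo', 'idle', 'main']
```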
-
-
Icons:
-
Renamed the
error
icon to failed
and added a new error
icon. From now on, the
failed
icon will only be used for archival/stashing errors, while the error
icon will only be used for internal errors (i.e. bugs). -
Improved all icons to make them more visually distinct when they are being rendered at 48x48 or less, both in light and dark mode.
-
On Chromium, all icons are now rendered with transparent backgrounds, so now they will look nice in the dark mode too.
-
Added: State display
-
Core + Popup UI + Toolbar button:
-
From now on, popup UI and toolbar button's badge and title will display information about currently running internal actions.
(Implementing this took a surprising amount of effort in improvements to infrastructure code.)
-
-
Core + Toolbar button + Icons:
- Added a new
in_limbo
icon for "this tab has data in limbo" status.
Unlike most other icons, this icon will never be used alone, it will always be an animation frame of something longer.
-
Core + Popup UI + Toolbar button:
- Implemented
Animate toolbar icon every
setting for controlling toolbar icon animation speed.
Fixed: State display
-
Core + Toolbar button:
-
Fixed a bunch of bugs that prevented updates to toolbar button's icon and badge in some cases.
-
The icon and the badge will no longer get stuck when the core is very busy, like when re-archiving a lot of stuff all at once.
-
Added: Work offline
mode
-
Core + Popup UI + Toolbar button + Icons + Documentation:
-
Implemented
Work offline
mode, options, their popup UI, shortcuts, and icons. This mode does the same thing as
File > Work Offline
checkbox of Firefox, except it supports per-tab/per-other-origin operation, not just the whole-browser one.
Also, enabling any of these options will not break requests that are still in flight, and the requests they do cancel can be logged. That is, enabling
Work offline
in a tab will start canceling all new requests that tab generates, and the resulting canceled
reqres will get logged if Track new requests
option is enabled in the same tab.
Similarly for background tasks and other origins. This can be generally useful for debugging your own websites with dynamic responsive
CSS
, or if you just want to prevent a tab from accessing the network for some reason. However, the main reason this exists is that the files generated by
hoardy-web export mirror
do not get scrub
bed absolutely correctly at the moment, and the resulting pages can end up with some references to remote resources (in cases when an exported page uses some rare HTML
and CSS
tag combinations, or lazy-loads images via JavaScript
, but still).
With Work offline
options enabled in a tab, you can now be sure that opening pages generated by hoardy-web export mirror
won't send any requests to the network. In fact, from now on, by default,
Hoardy-Web
will enable Work offline
in all tabs pointing to file:
URLs.
This can be disabled in the settings. -
Documented it in more detail on the
Help
page. -
Added a new
offline
toolbar icon to display the above state.
-
Added: Reloading with state preserved
-
Core + Popup UI + Documentation:
-
Implemented
reloadSelf
action that reloads the add-on while preserving its state. This action is different from similar
Reload
buttons in browser's own UI in that trigge...
-
tool-v0.16.0
See CHANGELOG.md
.