[Feature Request] Add ArchiveBox as an archival endpoint #380

Open
ctag opened this issue Dec 22, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@ctag

ctag commented Dec 22, 2022

ArchiveBox is a self-hosted web archival tool that stores pages in a variety of formats. While it appears there isn't a stable REST API yet, there is a pretty simple CLI that should be workable with linkding.

Would a pull request adding ArchiveBox support be considered?

@sissbruecker
Owner

Integrating this might be an option, but I'd prefer using a REST API rather than bundling a CLI tool into the Docker image and then trying to drive it from the Django app.

Apart from that, the expectations should be clarified. Creating the snapshot seems pretty clear. What do you expect to happen after the snapshot is created?

  • Should the snapshot link surface in the bookmark list?
  • Should this be an alternative to the Wayback Machine integration, and replace the snapshot URL created by it?

An alternative could be to expose the web_archive_snapshot_url field in linkding's REST API, which would be fairly trivial to do, and then write a 3rd party app that:

  • Checks linkding for bookmarks that don't have a snapshot URL yet
  • Creates an ArchiveBox snapshot for bookmarks that don't have one yet
  • Updates the bookmark's snapshot URL in linkding
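
The three steps above could be sketched roughly as follows. This is a minimal sketch, not linkding or ArchiveBox code: `sync_snapshots`, `create_snapshot`, and `update_bookmark` are hypothetical names, and it assumes each bookmark is a dict with `id`, `url`, and the `web_archive_snapshot_url` field mentioned above. The archiver and the linkding client are passed in as callables so that no particular ArchiveBox or linkding API is assumed.

```python
from typing import Callable, Iterable, List, Optional


def sync_snapshots(
    bookmarks: Iterable[dict],
    create_snapshot: Callable[[str], Optional[str]],
    update_bookmark: Callable[[int, str], None],
) -> List[int]:
    """Archive bookmarks that lack a snapshot URL and record the result.

    Returns the IDs of the bookmarks that were updated.
    """
    updated = []
    for bookmark in bookmarks:
        # Step 1: skip bookmarks that already have a snapshot URL.
        if bookmark.get("web_archive_snapshot_url"):
            continue
        # Step 2: ask the archiver (hypothetical callable) for a snapshot URL.
        snapshot_url = create_snapshot(bookmark["url"])
        if not snapshot_url:
            continue  # archiving failed; the bookmark is retried on the next run
        # Step 3: write the snapshot URL back to linkding.
        update_bookmark(bookmark["id"], snapshot_url)
        updated.append(bookmark["id"])
    return updated
```

In practice the bookmarks would be read from linkding's REST API and `update_bookmark` would write the field back, assuming the API exposes it as writable. `create_snapshot` could wrap either the ArchiveBox CLI or its REST API.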

@sissbruecker sissbruecker added the enhancement New feature or request label Jan 6, 2023
@pratio

pratio commented Aug 29, 2023

@sissbruecker It would be great if the snapshot link replaced the Internet Archive link. It would be enough if we could manually trigger it from the admin panel for the bookmarks that haven't been archived.

@webysther

The REST API has arrived: ArchiveBox/ArchiveBox#1397 (comment)

@huyz

huyz commented Sep 14, 2024

How would this be different from having ArchiveBox pull an RSS feed from LinkDing?

@webysther

How would this be different from having ArchiveBox pull an RSS feed from LinkDing?

I have no idea. How do I do that?

@ctag
Author

ctag commented Sep 14, 2024

That sounds like a good idea to me. From what I can tell, linkding provides RSS feeds under settings -> integrations, and ArchiveBox can pull RSS feeds.

I haven't tried it yet, but I imagine the big difference with that solution is that it's not as integrated. It would be nice to browse the bookmark within linkding and be able to select the at-home-archival link, without having to switch over to archivebox and then go hunting for it.

@Kos

Kos commented Oct 10, 2024

I did my first ArchiveBox backup of my Linkding feed yesterday. It was very easy: I grabbed the RSS link and added it in the ArchiveBox UI with depth=1 and my archive methods of choice (wget + mercury + title + favicon).

I understand that Archivebox also comes with a scheduler that can refresh the backups regularly, so once my Linkding RSS is registered, I expect that it will keep my archive updated.

(This is good enough for me.)
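
For anyone who prefers the command line to the UI, the same import can probably be driven through the ArchiveBox CLI. The sketch below only builds the commands. It assumes the `archivebox add --depth=1` and `archivebox schedule --every=day --depth=1` forms shown in the ArchiveBox README. The helper names and the feed URL are hypothetical, and the archive-method selection made in the UI is not reproduced here.

```python
import subprocess
from typing import List


def archivebox_add_command(feed_url: str, depth: int = 1) -> List[str]:
    """Build the argv for a one-off import of a feed into ArchiveBox."""
    return ["archivebox", "add", f"--depth={depth}", feed_url]


def archivebox_schedule_command(feed_url: str, every: str = "day", depth: int = 1) -> List[str]:
    """Build the argv for re-importing the feed on a schedule."""
    return ["archivebox", "schedule", f"--every={every}", f"--depth={depth}", feed_url]


def import_feed(feed_url: str, data_dir: str) -> None:
    """Run the one-off import from inside the ArchiveBox data directory."""
    # ArchiveBox commands are expected to run inside its data directory.
    subprocess.run(archivebox_add_command(feed_url), cwd=data_dir, check=True)
```

The feed URL to pass in is the one copied from linkding's integrations page, for example `import_feed("https://linkding.example/feed", "/path/to/archivebox/data")` with both values replaced by real ones.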

Maybe the crux of this request is for Linkding to do its own orchestration of backups? E.g. whenever there's a new bookmark created, Linkding would enqueue a Huey task to archive it.

Opinion: If we'd like to go this way, my preferred solution would be for Linkding's background task to actually perform the backup using the tool of choice (e.g. wget), so that archivebox wouldn't be required in the first place. Archivebox itself is just an orchestrator of backups, so it seems redundant for one orchestrator to require another.
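
A rough idea of what such a background task might run, assuming `wget` is installed in the environment. `wget_archive_command` and `archive_bookmark` are hypothetical helpers, not part of Linkding. The flags are standard wget options for saving a page together with its assets.

```python
import subprocess
from pathlib import Path
from typing import List


def wget_archive_command(url: str, output_dir: Path, timeout_s: int = 60) -> List[str]:
    """Build a wget argv that saves a page plus the assets it needs."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # add .html/.css suffixes where missing
        "--span-hosts",        # allow assets hosted on other domains
        f"--timeout={timeout_s}",
        f"--directory-prefix={output_dir}",
        url,
    ]


def archive_bookmark(url: str, output_dir: Path) -> bool:
    """Archive one URL; intended to be called from a background task."""
    output_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(wget_archive_command(url, output_dir), capture_output=True)
    # wget exits non-zero if any request failed, so this is a strict check.
    return result.returncode == 0
```

A task queue such as Huey could call `archive_bookmark` whenever a bookmark is created. How the result is stored and linked from the bookmark is left open here.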

@webysther

I started testing the snapshot option in linkding and it works great! This removes the need for ArchiveBox. I have tested a few websites, and SingleFile does a great job of cloning each page into a single file.

image

The option to freely upload other assets is great too. I still need to spend more time figuring out how to get past Cloudflare on some websites...

@ctag
Author

ctag commented Oct 11, 2024

Opinion: If we'd like to go this way, my preferred solution would be for Linkding's background task to actually perform the backup using the tool of choice (e.g. wget), so that archivebox wouldn't be required in the first place. Archivebox itself is just an orchestrator of backups, so it seems redundant for one orchestrator to require another.

I see the value of ArchiveBox in the multiple archival tools it provides, which makes it easy to get broad coverage. No single tool that I've tried has "just worked"(tm) on all webpages. For some pages, only screenshots will actually save the content the way it was displayed.

@webysther That feature doesn't appear to have made it to the Docker image. How does it fare against http://acid.matkelly.com/ ?

Edit: OK, I pulled the latest-plus image, and the snapshot feature does much better on the acid test than I anticipated.

image

I'm still leery of the pitfalls of using a single tool to archive, but given that this feature is available in linkding and works quite well, I think it'd be a mistake to ask for yet another feature just to integrate ArchiveBox. For page-doesn't-archive edge cases, I think the RSS feed to ArchiveBox will suffice just fine.

I don't want to rugpull any further discussion, but as for my original ask I think this feature request can be closed.


6 participants