Send new article & reply URLs to Wayback machine #136

Closed
MrOrz opened this issue Oct 24, 2019 · 5 comments · Fixed by #344

Comments

@MrOrz
Member

MrOrz commented Oct 24, 2019

When a user submits an article or a reply, we can assume that the URLs it contains can be published to the Wayback Machine.

We should send these documents to the Internet Archive so that anyone who wants a backup in the future has a trustworthy third-party archive page to go to.

Send an archive: http://web.archive.org/save/${URL}
Get snapshot API: https://archive.org/help/wayback_api.php
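
A minimal sketch of how the two endpoints above could be used together, in Python for illustration only (rumors-api itself may use a different language and HTTP client). The save URL pattern comes from this issue; the `/wayback/available` lookup and its `archived_snapshots.closest` response shape follow the linked Wayback API help page. The function names and the `User-Agent` string are hypothetical, and the save endpoint may rate-limit or reject requests, so error handling would be needed in practice.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SAVE_ENDPOINT = "http://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
# Hypothetical identifier; replace with whatever the service should send.
USER_AGENT = "cofacts-archiver-sketch"


def build_save_url(url: str) -> str:
    """Build the save URL using the pattern quoted in this issue."""
    return SAVE_ENDPOINT + url


def build_availability_url(url: str) -> str:
    """Build the snapshot lookup URL with the target URL as a query parameter."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a lookup response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def request_archive(url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to save `url`; returns the HTTP status code."""
    request = urllib.request.Request(
        build_save_url(url), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def lookup_snapshot(url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the snapshot API and return the closest snapshot URL, if any."""
    request = urllib.request.Request(
        build_availability_url(url), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

The URL builders and the response parser are kept separate from the network calls so that they can be tested without hitting archive.org.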

@MrOrz
Member Author

MrOrz commented Oct 24, 2019

If archiving requires a headless browser, we can implement the archiving function in url-resolver instead.
https://help.archive.org/hc/en-us/articles/360001513491-Save-Pages-in-the-Wayback-Machine

@MrOrz
Member Author

MrOrz commented Oct 26, 2019

@MrOrz
Member Author

MrOrz commented Jan 9, 2020

@MrOrz
Member Author

MrOrz commented May 21, 2020

Also, here is a tool that can send to multiple archivers:
https://github.com/oduwsdl/archivenow

It has a server mode, so it seems we could dockerize the server directly and have rumors-api invoke it whenever it gets a URL to archive.

@MrOrz
Member Author

MrOrz commented Nov 16, 2023

Another promising archiver is https://github.com/ArchiveBox/ArchiveBox
It will:

  • produce a single-file HTML snapshot
  • generate a screenshot
  • extract text using readability and mercury
  • push to the Internet Archive

We can also consider not plugging these tools directly into the API. Instead, we could archive URLs in batches, using the Cofacts API to fetch them.
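
A rough sketch of what such a batch job might look like, in Python for illustration. It assumes the article and reply texts have already been fetched (the Cofacts API query itself is not shown), and `extract_urls`, `batch_archive`, the `save` callback, and the delay value are all hypothetical names and choices rather than anything that exists in the project. The `save` callback could wrap any of the archivers mentioned in this thread.

```python
import re
import time
from typing import Callable, Iterable, List

# Simple pattern for http(s) URLs in free text; real-world text may need
# a more careful extractor.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> List[str]:
    """Find http(s) URLs in free text, stripping trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


def batch_archive(
    texts: Iterable[str],
    save: Callable[[str], object],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Archive each distinct URL found in `texts` once, pausing between calls.

    `save` is whatever function submits one URL to an archiver. The delay is
    an arbitrary placeholder meant to avoid flooding the archiver.
    """
    seen = set()
    archived: List[str] = []
    for text in texts:
        for url in extract_urls(text):
            if url in seen:
                continue  # the same URL often appears in several articles
            seen.add(url)
            if archived:
                sleep(delay_seconds)  # throttle every call after the first
            save(url)
            archived.append(url)
    return archived
```

Deduplicating before submitting keeps the job from re-archiving a URL that appears in many articles, and injecting `save` and `sleep` lets the loop be tested without network access or real waiting.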
