All content needs to be rehosted #32

Open
zimmertr opened this issue Jun 2, 2020 · 24 comments
Labels
Engineering Changes our tools and data pipeline

Comments

@zimmertr

zimmertr commented Jun 2, 2020

These videos, images, and other information should be downloaded and rehosted elsewhere, in addition to linking the original source. Otherwise the content is at risk of being removed.

The content should also be easily available for mass download. This will prevent the loss of the content in the event that this repository is removed. Perhaps using something like BitTorrent or a self-hosted peer-to-peer synchronization program akin to Google Drive/Dropbox.

I'm happy to donate towards hosting fees if necessary.

@2020PB
Collaborator

2020PB commented Jun 2, 2020

We could use some CI rules to auto archive the linked footage. We currently have a repo with most of the videos archived (link on the README) but we should definitely automate it.

@zimmertr
Author

zimmertr commented Jun 2, 2020

Thanks for pointing out the other repo in the Readme. I admit I missed this when glossing over it. However, I am not sure this is a long term solution given that the contents are not hosted and published by a third party. Right now they are at the mercy of GitHub/Microsoft which has not necessarily acted favorably towards less legal repos in the past.

Not to indicate this is currently less than legal. But I also don't trust that illegality is a requirement for action anymore. And I think that Microsoft would take a stance against the public if under enough pressure.

@2020PB
Collaborator

2020PB commented Jun 2, 2020

That's correct; however, that repo is also archived on IPFS, so if it goes down it will not be gone. Do you know if there is a way to have an external API verify that it is being called by the CI scripts on a github repo? I have an API we can upload files to for IPFS reupload, but I don't want it to get spammed when I add the token.

I will look into this tonight, but if you know of a good way to do this please lmk and I will try to set something up.

@zimmertr
Author

zimmertr commented Jun 2, 2020

I'm sorry I don't. However, I do have infrastructure development skills if you need assistance building out a server/cloud infrastructure for this project.

@2020PB
Collaborator

2020PB commented Jun 2, 2020

Awesome, I will let you know! I don't think it should be necessary because we already have a good resource for hosting stuff, but you never know what we'll need later!

@hunterwilliams

hunterwilliams commented Jun 3, 2020

@2020PB "Do you know if there is a way to have an external API verify that it is being called by the CI scripts on a github repo? I have an API we can upload files to for IPFS reupload, but I don't want it to get spammed when I add the token."

This should be a non-issue. You can store the token as a secret on the CI server so that no one can read it back or see it, assuming no one merges a Pull Request that somehow makes it visible in the CI logs. Note: I assume the host header could be checked as well.
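A minimal sketch of one way the API could check that a request really comes from the CI job, assuming both sides share a secret that is stored as an encrypted CI secret and exposed to the job as an environment variable. The CI side signs each request body with HMAC-SHA256, and the API recomputes the signature before accepting the upload. The names `sign_payload`, `verify_payload`, and `UPLOAD_TOKEN` are hypothetical and not part of any existing tooling here.

```python
import hashlib
import hmac
import os


def sign_payload(secret: bytes, payload: bytes) -> str:
    """CI side: return a hex HMAC-SHA256 signature of the request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_payload(secret: bytes, payload: bytes, signature: str) -> bool:
    """API side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # UPLOAD_TOKEN is a hypothetical variable name; the fallback is for a local demo only.
    secret = os.environ.get("UPLOAD_TOKEN", "demo-secret").encode()
    body = b'{"url": "https://example.com/video.mp4"}'
    signature = sign_payload(secret, body)

    # An untouched body verifies; a tampered body does not.
    print(verify_payload(secret, body, signature))
    print(verify_payload(secret, body + b"x", signature))
```

The secret itself is never sent over the wire or printed, so it only leaks if a change to the CI configuration explicitly echoes it.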

@2020PB I've created this -> https://github.com/hunterwilliams/link-archival which can assist with archiving everything in an automated fashion. If there is a file structure etc. you want, that would be good to know. It needs a bit more work before it can simply be dropped in, and I'm willing to assist. It can already be used to download or screenshot some items (I haven't gotten through automating all of the video downloads, just Twitter).

@zimmertr zimmertr changed the title from "All content need to be rehosted" to "All content needs to be rehosted" Jun 3, 2020
@mjmaurer

mjmaurer commented Jun 3, 2020

Love the work being done here! To me, the ideal infra would be uploading images/vids to ipfs and using a P2P solution (gunjs, orbitdb) for storing structured JSON that links out to IPFS and includes additional metadata.
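As a rough sketch of the kind of structured record described here, assuming each archived item gets one JSON object that points at its IPFS copy and carries a checksum. The field names, `ArchiveRecord`, `build_record`, and the placeholder CID are all hypothetical; a real CID would come from the IPFS node after the file is added.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ArchiveRecord:
    source_url: str   # where the media was originally posted
    ipfs_cid: str     # content identifier returned by the IPFS node
    sha256: str       # checksum so mirrors can verify their copy
    tags: List[str] = field(default_factory=list)


def build_record(source_url: str, content: bytes, ipfs_cid: str,
                 tags: Optional[List[str]] = None) -> ArchiveRecord:
    """Create a metadata record for one archived file."""
    digest = hashlib.sha256(content).hexdigest()
    return ArchiveRecord(source_url, ipfs_cid, digest, list(tags or []))


if __name__ == "__main__":
    record = build_record(
        "https://example.com/clip.mp4",
        b"fake video bytes",
        "<cid-from-ipfs-add>",  # placeholder, not a real CID
        ["example"],
    )
    print(json.dumps(asdict(record), indent=2))
```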

Edit: I think IPFS rehosting is more realistic for this repo as it exists

@mjmaurer

mjmaurer commented Jun 3, 2020

I'd be happy to build in IPFS hosting, but I'd like #163 to be addressed. Rehosting on P2P makes it much more likely to always exist. I personally wouldn't want an image of me widely circulated without my consent.

@bonedaddy
Collaborator

bonedaddy commented Jun 4, 2020

I've been archiving data onto IPFS, and have the following archived media:

Hosting data on IPFS may be a bit tricky if you want to be anonymous or do not want to be publicly known as backing up the data. With IPFS it is trivial to find and trace the people hosting content. If you want anonymity, or do not want to be identified as someone backing up the data, IPFS is not a good idea.

Latest archive

@mjmaurer

mjmaurer commented Jun 4, 2020

As it is, someone gives up anonymity in the form of a PR

@ubershmekel
Collaborator

@mjmaurer if you need anonymity - you can message the mods on reddit. Is that good enough?

@valadect

valadect commented Jun 4, 2020

I mentioned this in the other issue, but I made a script that downloads all the videos and also screenshots the webpages for posterity. Hopefully that helps with the ephemeral nature of the internet. It is downloading all the links now, so it is still very much a WIP, but feel free to play around with it: https://github.com/valadect/pbbackup
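For anyone curious what the first step of such a backup might look like, here is a minimal sketch (not the linked script) that pulls every unique URL out of Markdown text so each one can be handed to a downloader or screenshot tool. The function name `extract_links` and the regular expression are assumptions made for illustration; the pattern is deliberately simple and will miss unusual URLs.

```python
import re

# Match http(s) URLs up to whitespace or a closing bracket/paren,
# which is where Markdown link syntax ends the target.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def extract_links(markdown_text: str) -> list:
    """Return unique URLs in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.findall(markdown_text):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    sample = (
        "* [clip](https://example.com/a.mp4)\n"
        "* https://example.com/b, and https://example.com/a.mp4 again\n"
    )
    for link in extract_links(sample):
        print(link)
```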

@nathanfranke

Would it be illegal to post media files in the repo itself so that people can back them up simply by cloning the repo? Of course this could just be a secondary backup method. (And bear with me since I am new to this community).

@valadect

valadect commented Jun 5, 2020

@nathanfranke Considering no one is profiting from it and also the fact that it is for educational use it should be fine for the most part.

@tgalopin

tgalopin commented Jun 5, 2020

I'd be happy to provide a backup server in France if that's useful, to avoid US law and companies.

@ubershmekel
Collaborator

@nathanfranke there's this repo which I think is dedicated to files: https://github.com/pb-files/pb-videos

Though we don't have a good way to link between the two repos yet.

@bonedaddy
Collaborator

If you want to be able to mirror media locally, and optionally upload it to an IPFS node, check out the downloader tool in the tools folder.

@modelmat

modelmat commented Jun 6, 2020

Perhaps reposting on LBRY might be useful?

@ubershmekel ubershmekel added the Engineering Changes our tools and data pipeline label Jun 6, 2020
@krmax44

krmax44 commented Jun 6, 2020

Maybe looking into Archive.org for hosting might be worth a shot. They offer an S3-like API to upload files: https://github.com/vmbrasseur/IAS3API#internet-archive-s3-api-documentation
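A hedged sketch of what an upload request could look like, based on my reading of the linked documentation: a plain HTTP PUT to `s3.us.archive.org/<item>/<filename>` with a `LOW accesskey:secret` authorization header. Please check the endpoint and headers against the docs before relying on them. The helper name `build_upload_request` and the example item name are hypothetical, and the sketch only builds the request without sending it.

```python
import urllib.parse
import urllib.request

# Endpoint as I understand it from the IAS3 documentation; treat as an assumption.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(item: str, filename: str, data: bytes,
                         access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that uploads one file to an item."""
    url = "{}/{}/{}".format(
        IA_S3_ENDPOINT,
        urllib.parse.quote(item),
        urllib.parse.quote(filename),
    )
    headers = {
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask the service to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


if __name__ == "__main__":
    request = build_upload_request(
        "example-archive-item", "clip 01.mp4", b"fake bytes", "ACCESS", "SECRET"
    )
    print(request.get_method(), request.full_url)
    # To actually upload: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to test in CI without touching the network or exposing real keys.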

@DavidVorick

DavidVorick commented Jun 6, 2020

Hey guys, just learning about this project, happy to host everything on Skynet. Skynet is a platform similar to IPFS, except instead of seeding the files yourself, a decentralized platform called Sia (similar to what Filecoin is meant to be) seeds the files for you. You get uptime + decentralization without having to host anything yourself.

How can I get started?

Is there a chatroom somewhere? Some of this might be easier to do in real time. I've got questions like:

  • What's the legality of these? Who owns the copyright? Can we get the uploaders to add CC0 licenses so that there are no legal concerns?
  • Where are we sourcing these from? Is this list complete or are there other places we can look?
  • Should we keep snapshots of the repo over time? Or should we upload every file to Skynet individually and just keep a growing list? Do we want to add the Skylinks to the repo here?

@ubershmekel
Collaborator

I'm not a lawyer, but I hope this data would fall under "fair use" such as in a documentary: https://en.wikipedia.org/wiki/Fair_use#Documentary_films

@nathanfranke

I would hope it is considered criticism or documentary since getting permission from all filmers would be effectively impossible.

@xloem

xloem commented Jun 7, 2020

Please link to the mirrors inside the repository so they can be found.

I have made a barebones sia-skynet remote for git-annex in https://github.com/xloem/gitlakepy (EDIT: fixed link), which should help skynet interoperate with git or datalad a little. I have also made a barebones bsv remote for git at https://github.com/xloem/git-remote-bsv, which allows lightweight git repositories to be stored on a different storage-oriented blockchain than skynet. Unlike skynet, bsv content cannot eventually be lost on the network.

@karan

karan commented Jun 9, 2020

Absolutely need to do this.

What about using the Internet Archive? If needed, I have a few TB of storage on my NAS that I can donate on a temporary basis as well.
