NIP96 - Adding pubkeys to file urls to allow data migration between media servers. #1097

Closed

@quentintaranpino quentintaranpino commented Mar 2, 2024

Background:

Thanks to NIP96 we have standardised the way most (some 😅) nostr clients upload files to media servers, allowing users to decide freely where to upload their files.

Problem:

Once a user has started using a server and then decides to switch, their data ends up scattered across multiple servers, which makes administration (deletion, organization, etc.) difficult.

Solution:

Include the pubkey that uploaded the file in the URLs published by the servers. This standardizes the paths so that they identify not only the file but also the pubkey that uploaded it.

Example:

https://nostrcheck.me/media/89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c/1d7e174198e61331bee68c07c9df6d66ab3a904dc47704803abad9deebd635f0.webp

As you can see, the path contains the pubkey and the original_hash of the file, plus its extension.

/89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c/1d7e174198e61331bee68c07c9df6d66ab3a904dc47704803abad9deebd635f0.webp
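A minimal sketch of how such a URL could be assembled, assuming the pubkey and original_hash are 64-character lowercase hex strings. The helper name `build_media_url` is hypothetical and not part of NIP-96 or nostrcheck-server.

```python
import re

# Assumption: both the pubkey and the original hash are 64 lowercase hex characters.
HEX64 = re.compile(r"[0-9a-f]{64}")


def build_media_url(base_url: str, pubkey: str, original_hash: str, ext: str = "") -> str:
    """Join a server's download prefix with /<pubkey>/<original_hash>[.ext]."""
    if not HEX64.fullmatch(pubkey) or not HEX64.fullmatch(original_hash):
        raise ValueError("pubkey and original_hash must be 64 lowercase hex characters")
    suffix = f".{ext.lstrip('.')}" if ext else ""
    return f"{base_url.rstrip('/')}/{pubkey}/{original_hash}{suffix}"


url = build_media_url(
    "https://nostrcheck.me/media",
    "89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c",
    "1d7e174198e61331bee68c07c9df6d66ab3a904dc47704803abad9deebd635f0",
    "webp",
)
print(url)
```

Only the `base_url` argument would differ between servers; the rest of the path is derived from the uploader and the file.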

If nostr clients drop the server part and instead use the list of NIP96 servers that the user has configured (example below of how I implemented it in Coracle), we add resilience to the availability of the files over time, and we also let the user decide where to keep these files.

[image: screenshot of the Coracle implementation]

Servers would not even need a data migration feature, since the path of the files uploaded by the user to a new server should be the same (same pubkey + same original hash).

Finally, my server (nostrcheck-server) and all implementations using it, will have the option to download all user data in a compressed file to be imported / exported on demand wherever you want.

nostrcheck-server as of version 0.5.0 already implements this.

@quentintaranpino
Author

This change does not break backward compatibility, as everything keeps working the same way.

The only consequence is that a server that does not implement it may be at a disadvantage compared to the others.

@quentintaranpino
Author

Hello @fiatjaf @arthurfranca What do you think about this? The goal is to take weight off the servers and be able to re-generate the same urls deterministically on any server. This would make it easy to migrate data and reduce Nostr's dependency on media hosting services.

@arthurfranca
Contributor

If I got it right, the problem you want to solve is that a media url on a note may have been uploaded by someone other than the note author. And your idea is to know who the uploader was, so the client can look up their kind:10096 list and fall back to the other media servers listed there when the media download fails.

If that's the case, I think adding an extra nip96u tag to NIP-94, meaning nip96 uploader's pubkey, would be better. This extra info can be added to the url (e.g. using NIP-54 proposal) along with ox to enable the above mentioned fallback when the uploader isn't the same as the note author.

@quentintaranpino
Author

Hello!

I don't think I made myself clear. Basically, what I want is to normalise the URLs of all the media servers:

https://server.name/pubkey/original_hash

This way, users have control over what they have uploaded to the servers and can eventually create an account or contact the server to request its removal (or even make requests to the server's API to delete the files themselves).

The current alternative does not define this and therefore servers can generate or organise files in their own way.

If we make this change, every server will generate the same URL path for the same file. For example:

https://nostrcheck.me/media/89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c/3a2714bcf073b7a5c7a5d3bfebb717b15f624a1fccfb0a276af3c95d16a7220e.webp

This file, uploaded to another server, should have the same URL with only the first part changed, since both its hash and the pubkey are the same.

This gives the user control over what has been uploaded and makes it possible to re-upload the file to several servers at once. It also paves the way for nostr clients to migrate data from one server to another (NIP 96 requires servers to have a delete endpoint). Imagine a button in your favourite client that says "migrate data" or "delete all your media files". With NIP96 and this patch this would be possible.

Of course the nostr clients can go through their list of NIP96 servers in case a URL doesn't work, as fallback.
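A rough sketch of that fallback, assuming every server follows the proposed /<pubkey>/<file> layout and keeps the same file extension (neither is guaranteed by the current spec). The function name `fallback_urls` and the second server prefix in the usage example are made up for illustration.

```python
from urllib.parse import urlsplit


def fallback_urls(original_url: str, server_prefixes: list[str]) -> list[str]:
    """Rebuild the same /<pubkey>/<file> path on each configured server prefix."""
    parts = [p for p in urlsplit(original_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected a path ending in /<pubkey>/<file>")
    pubkey, filename = parts[-2], parts[-1]
    candidates = [f"{prefix.rstrip('/')}/{pubkey}/{filename}" for prefix in server_prefixes]
    # Skip the URL that already failed.
    return [c for c in candidates if c != original_url]


# Hypothetical usage: the prefixes would come from the user's configured NIP96 servers.
broken = "https://nostrcheck.me/media/" + "a" * 64 + "/" + "b" * 64 + ".webp"
for candidate in fallback_urls(broken, ["https://nostrcheck.me/media", "https://media.example/files/"]):
    print(candidate)
```

A client would try each candidate in order until one of them returns the file.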

I think it's a small change, that doesn't affect current implementations (whoever doesn't implement it will lose competitive advantage) and really only brings more sovereignty to the user.

@arthurfranca
Contributor

it paves the way for nostr clients to migrate data from one server to another (NIP 96 requires servers to have a delete endpoint). Imagine a button in your favourite client that says "migrate data" or "delete all your media files".

How would clients know all the files a user uploaded if your PR gets merged? How would the file migration flow differ from today?

Currently, clients can only rely on kind:1063 events (NIP-94) for files they care to keep track of. But there is currently no standard way to get all media urls that were inlined on kind:1 events, for example, and I don't see how a /<pubkey>/<hash> path on download urls would change that (we could, however, standardize a one-letter/indexable tag as a flag meaning "this note has one or more media urls", so clients can download all such notes and extract the urls).

See on line 164 that nip94_event.tags.*.url "Could also be any url to download the file"? Would that still be the case?
Initially the NIP-96 PR had just the fixed api paths, but people also asked for alternative custom download urls.
So currently, the server MUST have a standard /<hash> download url but may additionally use a /any-path/any-file-name and set it as the nip94_event.tags.*.url value, as a custom download url.

Why did you change just the download section while the other routes still use /<hash>?


The way I understand it, the /<pubkey>/<hash> path you want (instead of the current spec with just /<hash>) is an implementation detail of how the server keeps track of who uploaded each file, and it would lead to file duplication (two different users can upload the same file with the same hash => /<alice-pubkey>/<same-hash> and /<bob-pubkey>/<same-hash>).

An alternative implementation without a database is, for example, adding a /<hash>/owners.txt with the pubkeys of everyone who uploaded the same file, and removing an entry when a user asks for media file deletion. It could also have a /<bob-pubkey>-files.txt with all of Bob's file hashes; that list would be updated whenever Bob uploads or deletes a media file.
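The bookkeeping described above could look roughly like this. It is an in-memory stand-in for the owners.txt and <pubkey>-files.txt files, and the class and method names are hypothetical.

```python
from collections import defaultdict


class OwnershipIndex:
    """Tracks which pubkeys uploaded which file hash, storing each file only once."""

    def __init__(self) -> None:
        self.owners_by_hash: dict[str, set[str]] = defaultdict(set)   # like /<hash>/owners.txt
        self.hashes_by_owner: dict[str, set[str]] = defaultdict(set)  # like /<pubkey>-files.txt

    def record_upload(self, pubkey: str, file_hash: str) -> None:
        self.owners_by_hash[file_hash].add(pubkey)
        self.hashes_by_owner[pubkey].add(file_hash)

    def record_delete(self, pubkey: str, file_hash: str) -> bool:
        """Remove one owner; return True only when no owner is left and the file can go."""
        owners = self.owners_by_hash.get(file_hash)
        if owners is None or pubkey not in owners:
            return False  # this pubkey never uploaded the file
        owners.discard(pubkey)
        self.hashes_by_owner[pubkey].discard(file_hash)
        if owners:
            return False  # someone else still owns the same file
        del self.owners_by_hash[file_hash]
        return True


index = OwnershipIndex()
index.record_upload("alice-pubkey", "same-hash")
index.record_upload("bob-pubkey", "same-hash")
print(index.record_delete("alice-pubkey", "same-hash"))  # Bob still owns it
print(index.record_delete("bob-pubkey", "same-hash"))    # last owner gone
```

With this layout the file itself is stored once under its hash, and the per-pubkey set doubles as the list of a user's uploads.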


Maybe what we need is a server route that returns a .txt with all hashes of files uploaded by the user?

@quentintaranpino
Author

In the end, it all comes down to the same goal that was sought by putting the hash in the URL: from the file's URL alone you can easily know the hash of the original file and (now, with this PR) also who uploaded it.

Internally, all servers (at least the ones running my software) already track who uploaded which file, regardless of how the URL is displayed, so that is not the problem this PR solves. I think optimising disk space for the same file uploaded by different people is a server-side concern.

See on line 164 that nip94_event.tags.*.url "Could also be any url to download the file"? Would it still be the case?
Cause initially the NIP-96 PR had just the fixed api paths but people also asked to have alternative custom download urls.
So currently, the server MUST have a standard /<hash> download url but may also additionally use a /any-path/any-file-name and use it to set the nip94_event.tags.*.url value as a custom download url.

The goal is to unify the urls from /pubkey/hash onwards, while still allowing each server to use its own custom paths (as long as they are specified in the nip96.json file). That is, we are not forcing the server to have https://media/must-be-this-url/pubkey/hash, it is simply https://custom-server/custom-url/pubkey/hash.

What am I looking for with this (and I have already implemented it in nostrcheck and other servers with the same engine)?

Before, when a file was uploaded by a pubkey that did not have an account on the server (in my case accounts are free, but in the case of nostr.build, for example, they can be paid), the upload was placed in a "public" folder without specifying which pubkey uploaded it (because it was not registered). I think nostr.build does something similar.

This has now changed in my software. What do we gain? If a pubkey that never had a relationship with nostrcheck.me uploaded a file there one day, and their nostr client has adopted NIP96, they will be able to delete this file even though they never had an account on the server. I am convinced that this is an improvement over the current situation, because the alternative is to have a signed note with your pubkey, with a URL pointing to an image that you no longer want to show to the world.

How would clients know all the files a user uploaded if your PR gets merged? How would the file migration flow differ from today?

Maybe what we need is a server route that returns a .txt with all hashes of files uploaded by the user?

Yes, I think that would be an interesting thing to require in NIP96. I will work on it and push changes to complement this PR. Apart from the DELETE endpoint, I think we should urge servers to have an endpoint that returns the original_hash of all files uploaded by a pubkey (protected by NIP98, of course).

Finally, I insist. This change would not break the current compatibility with NIP-96, because if a server does not want to adopt these measures it will only lose competitiveness (apart from not complying with the standard), but everything done so far could be migrated gradually in a smooth way over time.

@arthurfranca
Contributor

Before, when a file was uploaded by a pubkey that did not have an account on the server (in my case they are free, but in the case of nostr.build for example they can be paid). The upload was placed in a "public" folder without specifying which pubkey uploaded it (because it was not registered). I think they do something similar in the case of nostr.build.

This has changed now in my software, what do we gain? If a pubkey that never had a relationship with nostrcheck.me, uploaded a file there one day, if his nostr client has adopted NIP96 he will be able to delete this file even if he never had an account with the server. I am convinced that this is an improvement over the current situation, because the alternative is to have a signed note with your pubkey, with a URL pointing to an image that you no longer want to show to the world.

I'm feeling dumb because I still don't get it 😱. Isn't the NIP-98 Authorization header already enough to know the uploader's pubkey so that the user can delete the file later? Your nip96.json has "is_nip98_required":true, so your server will always know who is uploading a file. Your server just needs to keep track of all the different pubkeys that uploaded a file with a specific original hash.

Explain Like I'm Five? ^^

@quentintaranpino
Author

Before, when a file was uploaded by a pubkey that did not have an account on the server (in my case they are free, but in the case of nostr.build for example they can be paid). The upload was placed in a "public" folder without specifying which pubkey uploaded it (because it was not registered). I think they do something similar in the case of nostr.build.

This has changed now in my software, what do we gain? If a pubkey that never had a relationship with nostrcheck.me, uploaded a file there one day, if his nostr client has adopted NIP96 he will be able to delete this file even if he never had an account with the server. I am convinced that this is an improvement over the current situation, because the alternative is to have a signed note with your pubkey, with a URL pointing to an image that you no longer want to show to the world.

I'm feeling dumb because I still don't get it 😱. Isn't the NIP-98 Authorization header already enough to know the uploader's pubkey so that the user can delete the file later? Your nip96.json has "is_nip98_required":true, so your server will always know who is uploading a file. Your server just needs to keep track of all the different pubkeys that uploaded a file with a specific original hash.

Explain Like I'm Five? ^^

My server knows which pubkey uploaded the file (it has always known), but the nostr client does not know whether that URL points to a file uploaded by the same user who pasted it into a kind 1 note.

The same argument can be made for the hash. My server already knows the original hash, but of course we assume that we want a system that does not require trust in a third party, so the hash is published.
In this context, the servers (if they are configured this way) obviously already know the user's pubkey, but a third party does not.

Example:

Bob uploads a file to a media server via NIP96, using his favourite nostr client.

Alice sees this note, copies the URL and pastes it into a note of her own.

With the current system, you can't establish a URL<->Pubkey relationship unless you ask the servers, so a third party won't know if that URL was generated by Bob or Alice.

With the new system, anyone can see that that original_hash was uploaded by that pubkey. Among other things, this lets a client check that information up front, before requesting a DELETE of a file that might not be theirs.
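As a sketch of that client-side check, assuming the proposed path layout with 64-character lowercase hex values. The names `uploader_of` and `may_request_delete` are hypothetical. The pubkey read this way is only a hint taken from the path, so the server still has to authorise the actual DELETE.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the path ends in /<pubkey>/<original_hash> with an optional extension.
PATH_RE = re.compile(r"/([0-9a-f]{64})/([0-9a-f]{64})(?:\.[A-Za-z0-9]+)?$")


def uploader_of(url: str) -> Optional[str]:
    """Return the pubkey segment of the URL, or None if the URL has another layout."""
    match = PATH_RE.search(urlsplit(url).path)
    return match.group(1) if match else None


def may_request_delete(url: str, my_pubkey: str) -> bool:
    """Only offer a DELETE when the URL claims to have been uploaded by this pubkey."""
    return uploader_of(url) == my_pubkey


bob = "89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c"
url = f"https://nostrcheck.me/media/{bob}/3a2714bcf073b7a5c7a5d3bfebb717b15f624a1fccfb0a276af3c95d16a7220e.webp"
print(may_request_delete(url, bob))       # Bob pasted his own upload
print(may_request_delete(url, "f" * 64))  # Alice pasted Bob's URL
```

In the Bob and Alice example, Alice's client could skip the DELETE request entirely because the pubkey in the path is not hers.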

In a future where NIP96 is widely used and where there are thousands of media servers, this will facilitate the "human" reading of a URL, as well as establishing an easy to interpret fallback mechanism for nostr clients:

servers-url-published-nip96/pubkey/hash

This could improve the resilience of the data.

I feel a bit dumb too, because I'm not able to express correctly what I'm trying to improve with this. I apologise in advance for the likelihood that you have wasted your time with this.

(Thinking out loud, as I haven't thought about it enough: if the length of the final URL is a problem, we could use a hash that combines the pubkey + original_hash, getting "short" URLs as now but with the uploader's pubkey baked in. This would require rethinking some parts of the PR, but it would be a pretty elegant solution in my opinion.)
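One possible reading of that idea, sketched with SHA-256 over the concatenated raw bytes. The construction and the name `combined_id` are assumptions, not something the PR defines. Because a hash is one-way, the pubkey could not be read back from such a URL; a client could only confirm a pubkey it already suspects.

```python
import hashlib


def combined_id(pubkey: str, original_hash: str) -> str:
    """Hash the raw pubkey bytes followed by the raw file-hash bytes (assumed scheme)."""
    return hashlib.sha256(bytes.fromhex(pubkey) + bytes.fromhex(original_hash)).hexdigest()


def was_uploaded_by(url_id: str, candidate_pubkey: str, original_hash: str) -> bool:
    """Check a candidate pubkey against the identifier found in the URL."""
    return combined_id(candidate_pubkey, original_hash) == url_id


pubkey = "89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c"
file_hash = "1d7e174198e61331bee68c07c9df6d66ab3a904dc47704803abad9deebd635f0"
short_id = combined_id(pubkey, file_hash)
print(len(short_id), was_uploaded_by(short_id, pubkey, file_hash))
```

The resulting identifier is the same length as a plain hash, but verifying it requires knowing both the original hash and the candidate pubkey in advance.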

@arthurfranca
Contributor

the nostr client does not know if that URL is a file uploaded by the same user who has pasted it in a type 1 note, or if it is not.

Ohhhhhh now I get it. But I kinda addressed it in my first reply. We should introduce nip96u metadata (uploader's pubkey); it seems more correct than adding the pubkey to the url, especially because the current spec says that nip94_event.tags.*.url can be any path, even without the hash as filename. That is separate from the path rules for the api routes.

With the new system, anyone knows that that original_hash was uploaded by that pubkey, allowing among other things to "advance" that information to a client before requesting a DELETE of a file that eventually might not be theirs.

Yeah, though the client should not auto-delete media along with the note (it may ask, though), because the same media may have been reused in other notes the user owns.

Currently the nip94_event.tags.*.url can be https://any.thing/custom-path/custom-filename. But clients can add ox and nip96u.

With #521 it would look like https://any.thing/custom-path/custom-filename#ox=<hash>&nip96u=<pubkey> while with NIP-92 it would be ["imeta", "url https://any.thing/custom-path/custom-filename", "ox <hash>", "nip96u <pubkey>"].
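Both shapes can be produced from the same three values. This is a sketch only: nip96u is a tag proposed in this thread rather than an adopted one, and the helper names are made up.

```python
from urllib.parse import urlencode


def with_fragment(url: str, ox: str, uploader: str) -> str:
    """Append ox and the proposed nip96u value as URL fragment parameters."""
    return f"{url}#{urlencode({'ox': ox, 'nip96u': uploader})}"


def imeta_tag(url: str, ox: str, uploader: str) -> list[str]:
    """Build the equivalent imeta tag with space-separated key/value entries."""
    return ["imeta", f"url {url}", f"ox {ox}", f"nip96u {uploader}"]


url = "https://any.thing/custom-path/custom-filename"
ox = "719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b"
uploader = "89e14be49ed0073da83b678279cd29ba5ad86cf000b6a3d1a4c3dc4aa4fdd02c"
print(with_fragment(url, ox, uploader))
print(imeta_tag(url, ox, uploader))
```

Either way the download path itself stays free-form, which is the point of carrying the uploader as metadata instead of as a path segment.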

@@ -164,7 +164,7 @@ The upload response is a json object as follows:
 // Could also be any url to download the file
 // (using or not using the /.well-known/nostr/nip96.json's "download_url" prefix),
 // for load balancing purposes for example.
-["url", "https://your-file-server.example/custom-api-path/719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b.png"],
+["url", "https://your-file-server.example/custom-api-path/pubkey/719171db19525d9d08dd69cb716a18158a249b7b3b3ec4bbdec5698dca104b7b.png"],

Not a fan of changing the URL scheme now. Also, I do not see the point of this or how it will help anything. There might be a use case for this, but since we do not (or should not) allow directory browsing, having npub in the URL does not really help.

Contributor

Yes, especially when #1236 would allow the user to list their files and easily migrate them.

Author

The goal is to make it easy to tell which pubkey uploaded a file. Clearly an API call can provide that, but it will not be easy for a user.

I think this is in line with what blossom promotes, standard URLs; in this case the original pubkey is (in my opinion) very relevant.

I am not sure if you are considering that maintaining additional information comes at a cost, compute or otherwise. Adding the npub to a URL requires the server to somehow know that this npub has this hash under it, and that some other npub might have the same hash. The less info you need to resolve what to serve, the cheaper, faster and more efficiently you can do it.

I see it all the time on nostr: most people think in terms of single servers and single DBs, and not about how and when you can scale. If you run a single server, you are good. If you run a highly distributed infra across the globe, are trying to shave nanoseconds off each request because you serve thousands a second, and are trying to remove any possible call to anything when serving a file, then you are screwed with all of this. It is nice to have things that nobody needs or cares about.

Contributor

@quentintaranpino I think a better and simpler way to solve the issue you mention on this pull request is the following:

  1. Obtain a list of the servers that were used to upload media.
  2. Query those servers for all files.
  3. Upload them to the new server.
  4. Request deletion from the original server.

This does not require having the npub in the url.
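The four steps above can be sketched as a loop. The list, download, upload and delete operations are passed in as callables because the thread does not fix how they are performed (listing in particular depends on an endpoint that is still being proposed), so every name here is a placeholder.

```python
from typing import Callable, Iterable


def migrate(
    old_servers: Iterable[str],                    # step 1: servers used so far
    new_server: str,
    list_files: Callable[[str], Iterable[str]],    # step 2: hashes of the user's files
    download: Callable[[str, str], bytes],
    upload: Callable[[str, bytes], None],          # step 3: store on the new server
    delete: Callable[[str, str], None],            # step 4: remove from the old one
) -> list[str]:
    """Move every file from the old servers to the new one; return the migrated hashes."""
    migrated: list[str] = []
    for server in old_servers:
        # Copy the listing first so deletions cannot disturb the iteration.
        for file_hash in list(list_files(server)):
            data = download(server, file_hash)
            upload(new_server, data)
            # Delete only after the upload call returned without raising.
            delete(server, file_hash)
            migrated.append(file_hash)
    return migrated
```

Deleting only after a successful upload means a failure partway through leaves the remaining files on the original server.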

Author

The goal was to not rely on API calls to know if an image was uploaded by a pubkey. In case nobody sees the point, I give up 😂

@arthurfranca @v0l @fishcakeday @sant0s12

I am going to close this PR. If you think there is something interesting in it, you can add it to the other open PRs.

Contributor

The spirit of this PR will live on in https://link.to/<image-sha256>.png#ox=<hash>&nip96u=<pubkey> kind of urls if the nip96u tag is added heheh
