NIP-95 - Storage and Shared File #345
@Egge7 |
BSON cannot exist inside JSON, as JSON is text and BSON is binary. BSON is a binary representation of a JSON object, which enables it to hold raw data keys (this makes it great for storage as you do not lose space due to encoding). Websockets can transmit blobs as well as text, but everything on nostr is designed to be text, which is why I proposed a sub-protocol for binary transmissions. |
Is there a need for the |
I don't think so, since the id should contain the data and the signature will verify nothing changed. It would be pretty neat for large files to chunk them out across relays though, and then you could reduce load off one relay and query many for different chunks of the file. In that case you would probably want the hash for verification when you reassemble everything. |
Yep, breaking the file down into multiple events could be interesting to avoid connection hiccups requiring apps to restart the download from scratch. It could be very simple with a new "File Header" kind:
{
"id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
"pubkey": <32-bytes lowercase hex-encoded public key of the event creator>,
"created_at": <unix timestamp in seconds>,
"kind": 30065,
"tags": [
["d", <string with name of file>],
["decrypt",<algorithm>,<Decryption Params>],
["p", <32-bytes hex of a pubkey>, <recommended relay URL>],
["hash",< SHA256 hexencoded string of the complete raw data>],
["format",<"Base64" or "BSON">],
["e", <part1>, <relay>],
["e", <part2>, <relay>],
["e", <part...>, <relay>],
["e", <partn>, <relay>],
],
"content": "",
"sig": <64-bytes hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
} |
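A minimal sketch of how a client might produce such a header and its parts. The id computation follows the NIP-01 serialization for ordinary content; `PART_KIND`, `make_event`, and `build_chunked_file` are hypothetical names chosen for illustration, the parts carry no tags here, and signing is left out entirely.

```python
import base64
import hashlib
import json

HEADER_KIND = 30065  # kind used in the draft header above
PART_KIND = 1064     # assumption: kind number for each chunk event


def event_id(pubkey, created_at, kind, tags, content):
    """NIP-01 id: sha256 of the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def make_event(pubkey, created_at, kind, tags, content):
    """Build an unsigned event dict (the "sig" field is left out of this sketch)."""
    return {
        "id": event_id(pubkey, created_at, kind, tags, content),
        "pubkey": pubkey,
        "created_at": created_at,
        "kind": kind,
        "tags": tags,
        "content": content,
    }


def build_chunked_file(data, name, pubkey, created_at, chunk_size, relay=""):
    """Split data into base64 part events plus a header event listing them in order."""
    parts = [
        make_event(pubkey, created_at, PART_KIND, [],
                   base64.b64encode(data[i:i + chunk_size]).decode("ascii"))
        for i in range(0, len(data), chunk_size)
    ]
    tags = [
        ["d", name],
        ["hash", hashlib.sha256(data).hexdigest()],  # hash of the complete raw data
        ["format", "Base64"],
    ]
    tags += [["e", part["id"], relay] for part in parts]
    header = make_event(pubkey, created_at, HEADER_KIND, tags, "")
    return header, parts
```

Because the header lists the part ids in tag order, a client that receives the parts in any order can still place each one correctly.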
Here is a link to the second layer protocol I proposed a couple of weeks back: https://github.com/nostr-ing/nostr-ing-protocol It is pretty much the same approach, but moved to a second layer that operates on raw data, so it's much more efficient than doing all of this directly on nostr. It uses chunks by default in order to facilitate streams of live data too, if applicable. I have to say that I have not spent much time on this since then and never implemented it fully, but I still believe that this is a good approach to handling data transmission on top of nostr. |
If you use gzip to compress the entire JSON automatically when connecting with relays, Base64 is not that bad: https://lemire.me/blog/2019/01/30/what-is-the-space-overhead-of-base64-encoding/ |
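A rough way to check this claim locally, as a sketch only: Python's `gzip` module stands in for whatever compression the connection would actually negotiate, and seeded pseudo-random bytes stand in for already-compressed media such as a JPEG. `wire_sizes` is a hypothetical helper name.

```python
import base64
import gzip
import random


def wire_sizes(raw: bytes) -> dict:
    """Compare the raw size, the base64 size, and the gzip-compressed base64 size."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64_gzip": len(gzip.compress(encoded)),
    }


rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = bytes(rng.getrandbits(8) for _ in range(30_000))
sizes = wire_sizes(sample)
print(sizes)
```

For incompressible input the compressed base64 size should land between the raw size and the plain base64 size; how close it gets to the raw size is what a real benchmark would have to measure.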
It would be interesting to see a performance comparison between @Egge7's protocol and just keeping it in nostr with base64 chunks when performing encoding/decoding + reassembling + hash validation on the client / relay side |
As I said, I only started spec'ing and never actually implemented anything. You can definitely do so and then do some benchmarking. However, even without benchmarking we know that base64 will take up about 33% more space than raw binary data, because each base64 character carries only 6 bits of data. |
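The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters, with padding rounding the last group up. A short sketch of that arithmetic (`base64_len` is a hypothetical helper name):

```python
import base64
import math


def base64_len(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    return 4 * math.ceil(n_bytes / 3)


# Cross-check the formula against the standard library for small inputs.
for n in range(50):
    assert len(base64.b64encode(bytes(n))) == base64_len(n)

overhead = base64_len(300_000) / 300_000  # 4/3, i.e. about 33% larger
```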
The "hash" tag really makes no sense. It came in through copy/paste from NIP-94. I will remove it. I also changed it to be a regular event, as being replaceable could cause more trouble than it saves. With that, the kind changed to 1064 |
This suggestion of dividing the data into several parts is very good. But perhaps the use of the e tag is not the ideal way, because I need to know the event id of each part and its position in the order of the data. For example, I have a file divided into 3 parts. I may receive any event from the list first, and from there I have to be able to reconstruct all of them and know at which position to place the first one I received. I think the tag idea is good, but the tags could be put on the NIP-94 event. That way there would be only 1 event that represents the whole file, and it would hold the list of parts to be downloaded |
The issue with that idea is that every part cites every other part. They become heavier objects to parse. Instead, my proposal was to do just one header kind that cites all parts, has the hash, decrypting info, etc. Each part only has its contents and cites the header. |
That's why I suggested the NIP-94 (#337), which is a file sharing header. |
That's different. NIP-94 is just a hashed URL. There are no file "parts" baked into it. This one is for stored-in-relay files. |
It can be used in the same way. As I put in the description, the NIP-94 must be used to disclose the NIP-95 file by referencing the event, because the NIP-95 is not returned in broad searches, only when the event is explicitly requested. I can reference the event by the e tag, and use multiple tags for multiple parts. You don't need to create another kind for this. It is perfectly possible to use the NIP-94 |
I would separate them. It's too confusing to make them the same event and there might be reasons to separate a search for files in url from a search for files inside relays. Clients might want to support one but not the other, etc. There is no shortage of integers for each kind. |
I see no problem in creating another specific event. But the operation would be the same as the NIP-94 and would only remove the url tag. I'll put the definition of this new event here in the description. |
Hmm, NIP-94 shouldn't describe file parts. This new one should describe how to reassemble and test hashes. |
I think I get @frbitten's idea. The relay is expected to save the file data in one complete disk file (or as a blob/BSON in the db). Then, it will serve it as an event (also, he says it could optionally serve from an https url, but this wouldn't be nostr). For this to work, the relay would recreate the event dynamically (with the same created_at and all other keys), upon request, before serving it to a client.
People later suggested it would be better to split the file into chunks. Now it won't be only one disk file/db record. NIP-94 could have (when referencing a NIP-95 event instead of a simple url tag) tags pointing to the parts, like @vitorpamplona said:
{
// ...,
tags: [
// ...,
["e", /* NIP-95 event id */, /* recommended relay url */], // first part
["e", /* NIP-95 event id */, /* recommended relay url */] // second part
]
}
While a NIP-95 event would be:
{
id: 'an id',
pubkey: 'a pubkey',
tags: [
["e", /* NIP-94 event id */, /* recommended relay url */] // file header,
],
content: "string with sliced base64 data",
created_at: 11111111
}
So NIP-95 wouldn't be stored as an event. It would be a file with created_at as file metadata, compressed base64 data, and the filename taken from the event.id. I think the e tag can't be filesystem metadata like created_at can, so maybe it should be inside an auxiliary event-id.json. Well, so it's better to put all metadata inside the json file, including info about how the relay stored the main file, like the 🤔 All this to save like 28% of space in comparison to storing the uncompressed base64 event. Worth it! |
I am not sure why you would do all that. You can simply receive the event, take the base64 contents, convert it back to binary, save it on disk with the event ID as filename, and store the JSON without the Easy. |
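A sketch of that storage scheme, assuming the field left out of the stored JSON is the base64 `content`. `store_event` and `load_event` are hypothetical names, and the on-disk layout (blob named by event id, metadata in `<id>.json`) is only one possible choice. The sketch rejects content that does not re-encode to the same string, since the relay must be able to rebuild the exact event for the id and signature to stay valid.

```python
import base64
import json
import re
import tempfile
from pathlib import Path

HEX_ID = re.compile(r"[0-9a-f]{64}")  # guards against path tricks in the id


def store_event(event: dict, root: Path) -> None:
    """Write the decoded blob to root/<id> and the remaining fields to root/<id>.json."""
    event_id = event["id"]
    if not HEX_ID.fullmatch(event_id):
        raise ValueError("event id must be 64 lowercase hex characters")
    raw = base64.b64decode(event["content"], validate=True)
    if base64.b64encode(raw).decode("ascii") != event["content"]:
        raise ValueError("content is not canonical base64, cannot be rebuilt exactly")
    root.mkdir(parents=True, exist_ok=True)
    (root / event_id).write_bytes(raw)
    meta = {key: value for key, value in event.items() if key != "content"}
    (root / f"{event_id}.json").write_text(json.dumps(meta), encoding="utf-8")


def load_event(event_id: str, root: Path) -> dict:
    """Rebuild the original event by re-encoding the stored blob."""
    if not HEX_ID.fullmatch(event_id):
        raise ValueError("event id must be 64 lowercase hex characters")
    event = json.loads((root / f"{event_id}.json").read_text(encoding="utf-8"))
    event["content"] = base64.b64encode((root / event_id).read_bytes()).decode("ascii")
    return event
```

A relay could call `store_event` on receipt and `load_event` only when the event is requested by id, which keeps the blob out of its regular event index.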
In fact, relays should do that for all events whose |
They are all valid options. The issue is that the way to store it is up to the relay. NOSTR defines the form of communication. How the data is stored is up to the relay to define the best format. I see no problem spending a little more on the transfer for simplicity. As I mentioned above, I can't imagine anyone wanting to store gigabytes that way. The trend will be small files. The Relay can also define maximum limits. |
I see this NIP being used for profile pictures, images, and short videos inside posts. Large files, but not THAT large. |
I was trying to figure out what @frbitten suggested when he also said "the file name being the event id. So it can be easily found and searched. And because it is not in the database, it does not interfere with the indexing of common events." If the event isn't stored, relay wouldn't be indexing the event.id, created_at and e tag (although
Easier indeed. |
That's exactly the idea. This event is outside of all indexing, searching, and nostr relay processing. It will only be returned if requested by its own ID. This avoids a lot of processing and sending unwanted data |
I've been referring to this NIP as I do some brainstorming for an experimental implementation of similar functionality. Since this is still an open PR, I'll share thoughts here in case you want to consider any of them for this NIP.
Things I'm planning on changing for my experiment:
The "meta" event (kind 1065):
- should have the lower ID, because it will be both searched/fetched by users first, and sent to the server first.
- MUST include metadata for mime type and (final, deserialized, binary) file size, in bytes, so that clients can choose whether they want to fetch it.
- MUST include an "x" tag with the sha256 of the file.
- Can support large(r) files by referring to multiple other event IDs, the binary contents of which can be concatenated together, in tag-order, to reconstruct the original file.
- MUST specify, when multiple "content" events are included, a tag block_size. The "block_size" is the binary size, in bytes, of the data of each block before the last block (which may be smaller, but not zero). This allows relays and clients to only fetch certain parts of a file, ex: when streaming video, or resuming downloads.
Clients:
Because of the overhead of signing/encoding/fetching/sending each message, clients should choose larger block sizes for multi-event file uploads when possible, but they may choose smaller block sizes when it makes sense for content, or to meet relays' max_message_length limitations/policies. Clients SHOULD prefer block sizes that are multiples of 3, because those encode into base64 efficiently (3 bytes : 4 chars).
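To illustrate why a fixed block_size allows partial fetches, here is a small sketch of the index arithmetic a client could use. All function names are hypothetical; only the block_size rule (every block full except a smaller, non-empty last one) comes from the proposal above.

```python
import base64


def block_count(size: int, block_size: int) -> int:
    """Number of content events needed for a file of `size` bytes."""
    return -(-size // block_size)  # ceiling division


def block_length(index: int, size: int, block_size: int) -> int:
    """Decoded length of block `index`; only the last block may be shorter."""
    count = block_count(size, block_size)
    if not 0 <= index < count:
        raise IndexError("block index out of range")
    if index < count - 1:
        return block_size
    return size - block_size * (count - 1)


def blocks_for_range(offset: int, length: int, size: int, block_size: int) -> list:
    """Indexes of the blocks covering bytes [offset, offset + length)."""
    if length <= 0 or offset < 0 or offset + length > size:
        raise ValueError("byte range outside the file")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def base64_padding(n_bytes: int) -> int:
    """Number of '=' padding characters base64 adds for n_bytes of input."""
    return (3 - n_bytes % 3) % 3
```

For example, resuming a 250000-byte download at byte 150000 with a block_size of 100000 only needs the blocks returned by `blocks_for_range(150000, 100000, 250000, 100000)`, and a block size that is a multiple of 3 gives `base64_padding(block_size) == 0` for every full block.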
Relays:
- MAY choose to reject a "meta" event if it doesn't want to store files, a user is over a quota, the file content type is not acceptable, the file is larger than allowed by that relay, or whatever reasons the relay operators wish. Rejection of the "meta" event serves as a proxy rejection for all of its associated "content" events, too, so clients shouldn't send "content" events in that case.
- Relays SHOULD reject any content events they receive if they have not yet accepted the associated "meta" event.
- Relays MAY(/SHOULD?) enforce a minimum block size for multi-part files, and/or a maximum number of parts, to avoid abuse. (encoding a large file with 1-byte blocks would have ridiculous overhead.)
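The relay rules above could be sketched roughly as follows. The `Policy` limits, the rejection strings, and the `Relay` class are illustrative assumptions; the tag names ("m", "size", "block_size", "e") follow the kind 1065 layout discussed in this thread.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    # All limits are made-up example values, not part of the proposal.
    max_file_size: int = 5_000_000
    min_block_size: int = 3_000
    max_parts: int = 64
    allowed_types: frozenset = frozenset({"image/jpeg", "image/png"})


def part_ids(event: dict) -> list:
    """Ids of the content events listed by a meta event, in tag order."""
    return [t[1] for t in event["tags"] if len(t) >= 2 and t[0] == "e"]


def check_meta(event: dict, policy: Policy) -> Optional[str]:
    """Return a rejection reason, or None if the meta event is acceptable."""
    single = {t[0]: t[1] for t in event["tags"] if len(t) >= 2 and t[0] != "e"}
    parts = part_ids(event)
    if single.get("m") not in policy.allowed_types:
        return "content type not accepted"
    try:
        size = int(single["size"])
    except (KeyError, ValueError):
        return "missing or invalid size"
    if size <= 0 or size > policy.max_file_size:
        return "file size not accepted"
    if not parts:
        return "no content events listed"
    if len(parts) > policy.max_parts:
        return "too many parts"
    if len(parts) > 1:
        try:
            block_size = int(single["block_size"])
        except (KeyError, ValueError):
            return "missing or invalid block_size"
        if block_size < policy.min_block_size:
            return "block_size too small"
        if math.ceil(size / block_size) != len(parts):
            return "part count does not match size"
    return None


class Relay:
    """Accepts content events only after their meta event has been accepted."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.expected_parts = set()

    def on_meta(self, event: dict) -> Optional[str]:
        reason = check_meta(event, self.policy)
        if reason is None:
            self.expected_parts.update(part_ids(event))
        return reason

    def on_content(self, event: dict) -> bool:
        return event["id"] in self.expected_parts
```

Rejecting at the meta stage means the relay never has to receive the larger content events it would turn away anyway.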
Regarding the overhead of Base64 -- it's unfortunate for the wire protocol but short of changing that, the best workaround is that servers can just pull the file contents out and store them in binary (either in a file or as a BLOB). They may want to store the binary in a content-addressable store, and requiring a sha hash of the file helps with that.
I'm https://njump.me/nfnitloop.com if anyone wants to brainstorm more about this. I'm new to Nostr, so forgive me if I missed anything that should be obvious. 😅
95.md (outdated)
Another solution is for Relays that want to implement this functionality and use a No-SQL database with mongodb that already supports large documents without harming performance.
The relay can allow access to this data via URL, having its own URL pattern for these cases. And if you receive a `NIP-94` referring to a `NIP-95` you can include the URL in the proper `NIP-94` tag
Including URLs in events can lead to linkrot. (Though they're a nice fallback for clients that haven't implemented this NIP for pulling contents out of events.)
Maybe a better alternative would be to document a way for relays to advertise with NIP-11 what the URL pattern is to fetch these files from it via HTTP(S).
The NIP-94 can be generated either by the client when creating the NIP-95 note (in this case you will need a definition of the URL format), or it can be generated by the relay itself to list your saved files, because in NIP-94 the author of the note is not as important as the disclosure of the URL.
"kind": 1064,
"tags": [
["decrypt",<algorithm>,<Decryption Params>],
["p", <32-bytes hex of a pubkey>, <recommended relay URL>],
Is `p` redundant with `pubkey` here? Is this just so that you can recommend a relay URL for that pubkey?
Do you want to include an example `type` tag here? Actually, metadata like that should probably be moved to the 1065 kind, right? Would be useful for clients to know mime type, size, dimensions, etc, before choosing whether to fetch the file.
Yes. Just to reference other users, following the NIP-01 standard. It is not mandatory.
fixed descriptions
The idea was for each NIP-95 event to have a NIP-94 to describe it. NIP-94 already supports all the description fields you suggested. This prevents clients and servers from having to transmit large volumes of data just so the user knows the basic characteristics of the file. Therefore, the NIP-95 event is only returned by relays in a direct request for its ID; it will not be returned in broad searches. Ex: Imagine that I have a 1MB image sent in a NIP-95 event. If I just want to know its size, I would need to receive the entire event of more than 1MB to have this information. With NIP-94 I have all the "meta" information without having to transfer 1MB over the network. |
FYI, Amethyst's implementation still uses kind |
Vitor well remembered. |
It was mostly to make sure NIP-94 clients don't get bogged down having to support and manage NIP-95 blobs. Separating the headers allows clients to filter what they can support (url-based files or base64 blobs). They can easily filter for both if they support both. Since the treatment of NIP-95 events is different (saving the blob to disk, etc), I think it is a good option to separate the header as well. In that way, clients can design filters inside specific models to correctly process their returns. |
I was distracted and no longer remembered all the details. I adjusted it according to @vitorpamplona's warning. @NfNitLoop, please see if it meets your needs now. |
@frbitten I'll have a look a bit later today. But I didn't see you comment on my ideas for multi-part files. (breaking a large file up into smaller chunks so that each chunk fits within message size limits.) Too far for this NIP? It would probably break existing implementations, so it's probably best to make it its own kind, so that could be a separate NIP. 🤔 Can probably re-use the "content" event message and make a new "meta" kind, as with the NIP-94/1065 split. |
(OK, I was curious so I looked now. 😊)
One of the features from my earlier brain-dump is that clients SHOULD send the meta (1065) event to relays before the content (1064) event(s). Then relays MAY reject the 1065 event, and thus spare themselves the bandwidth of receiving 1064 events if
- They don't support them at all
- The user/pubkey isn't allowed to post such events
- The file is too big
- The file (hash) has been banned
- etc, etc.
Nostr event
------------------
This NIP specifies the use of the `1064` event type, having in `content` the binary data that you want to be stored in Base64 format.
* `type` a string indicating the data type of the file. The MIME types format must be used (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types)
Not sure if you missed it in my other comment but IMO `type` probably belongs on kind 1065?
Ah, yes, it's already over there as an `m` tag.
If kind 1065 is optional, `type` is needed. What do you think?
I think it's important to separate the metadata (kind 1065) and the real data (kind 1064). To quote another part of this doc:
Another defined event is the 1065 which is used as a header for the data contained in an event 1064. This way the data can be disclosed without overloading the communications when sending a large amount of data.
I don't need a "type" here if the "m" tag is present on 1065 -- it'll be redundant. And I'm not sure 1065 events should be considered optional. Clients should have a smaller 1065 metadata event available so that they can decide whether they want to download the (likely) larger 1064 event. (Or, in the case of multi-part attachments, multiple events.)
This must be defined in some way. I think we never did it because none of us had that need at the time. But breaking the file into many events and recombining them must be deterministic. |
Yep! See my definition of how to do it here.
So a multi-part content event would look like:
{
"id": "…",
"pubkey": "…",
"sig": "…",
"created_at": 1234,
"kind": 1065,
"content": "(description)",
"tags": [
["m", <MIME type>],
["x",<Hash SHA-256>],
// (etc)
["size", "123456768"],
["block_size", "100000"],
["e", "event1"], // (base64-decoded content) must be block_size
["e", "event2"], // same as above
// … etc.
["e", "event_last"] // may be smaller, but non-zero. The remaining bytes.
]
}
One cool feature of this approach is that since the kind 1065 event lists all the event IDs of the parts, clients could optionally fetch different parts from different relays as a way to spread the bandwidth costs, or repair files that have missing/corrupted blocks. |
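A client-side sketch of reassembling such a file, under the rules stated in the event above: parts are concatenated in tag order, every block except the last must decode to exactly block_size bytes, and the result is checked against the "size" and "x" tags. `reassemble` and the `fetch` callback are hypothetical; `fetch` stands for however the client retrieves a kind 1064 event by id, possibly from different relays.

```python
import base64
import hashlib
from typing import Callable


def reassemble(meta: dict, fetch: Callable[[str], dict]) -> bytes:
    """Rebuild the file described by a kind 1065 event, verifying sizes and hash."""
    tags = meta["tags"]
    single = {t[0]: t[1] for t in tags if t[0] != "e"}
    part_ids = [t[1] for t in tags if t[0] == "e"]  # JSON array order is the part order
    size = int(single["size"])
    block_size = int(single["block_size"])

    chunks = []
    for index, part_id in enumerate(part_ids):
        chunk = base64.b64decode(fetch(part_id)["content"], validate=True)
        is_last = index == len(part_ids) - 1
        if not is_last and len(chunk) != block_size:
            raise ValueError(f"part {index} is not exactly block_size bytes")
        if is_last and not 0 < len(chunk) <= block_size:
            raise ValueError("last part is empty or larger than block_size")
        chunks.append(chunk)

    data = b"".join(chunks)
    if len(data) != size:
        raise ValueError("reassembled size does not match the size tag")
    if hashlib.sha256(data).hexdigest() != single["x"]:
        raise ValueError("sha256 of reassembled data does not match the x tag")
    return data
```

Since each part is checked as it arrives, a client could retry a single failed or corrupted block from another relay instead of restarting the whole download.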
Nice! Do you have a demo somewhere? |
@vitorpamplona Not yet. I'm working toward implementing a demo, which is why I shared my thinking here. I'll have a lot more time to spend on it in a couple weeks. I'm taking the long/scenic route, writing my own relay (and probably web client) to test it out in. Wasn't sure if I should shove multiple events into a kind 1065 or grab a new kind # to experiment with. (Is there an atomic way to grab a kind # or nip # for this kind of experimentation? Or find one that's not yet in use?) |
If you are designing your own relays, just connect your tests to your relay and then delete when you finish the test. Otherwise, yes, just pick a number you have not seen yet in the major relays. |
If you need to break it into several events, I imagine you are thinking about transmitting very large files. I don't know if Nostr is the best option for this. One option is for event 1065 to have several |
It's not necessarily "very large" files. It's just more the idea of having a general-purpose solution. The current state of NIP-95 only supports files up to the (encoded) message size that relays will accept. I think I've seen that's usually something like 128KB? Even a simple JPEG can be larger than that.
This is just what I'm proposing above! 😊
Adding an index is unnecessary. Tags are in a JSON array, and JSON arrays (unlike objects/maps) are ordered. (Which I assume is precisely the reason that the tags are in an array. It gives them a deterministic order for the event ID hash function.) I've seen other nips warn about relying on ordering tags for giving extra meaning to them (ex: this But IMO in kind 1065, the ONLY |
The size limit of the relays is precisely because we don't want data that is not conversations. If this nip is merged they will prohibit these types. So dividing into several just to get around the limit is not a good option. Your suggestion is good but you have to pay attention to what the use would be. |
This is why we need a demo with lots of testing. I agree that breaking the size down just to avoid limits in the current relays is not a good solution. Breaking into multiple payloads should be about parallelizing downloads, avoiding websocket payload limits, or some other reason. |
I thought that's what I'd said. Maybe I communicated it poorly. 😅 Agreed, I'm not trying to work around web servers' content policies. In fact I explicitly made some suggestions to allow them to enforce policies:
i.e.: A server rejecting the kind 1065 event is also saying: don't send me any of its associated 1064 events either, they're not welcome here. |
I think we are all in agreement. Let's just make sure we test this file chunking thing very well. |
Working on my experimental implementation for multi-part files, and had some questions come up for kind 1065:
|
Summary is used as if it were the file name. The concept of file name is not valid in this case because we do not have a file but a data buffer. That's why summary was used instead of filename. It is a very short description to easily identify the data. For downloading, I think it's best to use the rule in NIP-96 that was created after this text. This maintains a standard already used. For the relay to inform you where the download path is, we have two options:
I would particularly prefer there to be a specific kind to access the operating properties of a relay, to keep everything within the nostr protocol and not have to use the HTTP protocol. But I'm not familiar with this format, and someone would need to propose how it works. |
@vitorpamplona Hello, Vitor. I think that if a NIP-95 image is quoted by other notes, it should display like any other normal image, not as a quoted note. Maybe NIP-95 is better suited for backup and recovery. I am planning to add support for it. |
This is a possibility. Amethyst only displays them as quotes because we couldn't finish the new design to move the actions you can do to the image itself out of the usual quote interface, like react, boost, report, zap etc. |
Initially this NIP was together with the PR of NIP-94 but I thought it better to separate it because it will require a greater discussion and it is not necessary to link the approval of one NIP with the other.
I suggest you read #337 before starting the discussion here.
@Egge7 we can continue the discussion here.