
NIP-95 - Storage and Shared File #345

Open: wants to merge 8 commits into base: master


frbitten
Contributor

Initially this NIP was part of the NIP-94 PR, but I thought it better to separate it, because it will require more discussion and there is no need to tie the approval of one NIP to the other.

I suggest you read #337 before starting the discussion here.

@Egge7 we can continue the discussion here.

@frbitten mentioned this pull request Mar 10, 2023
@frbitten
Contributor Author

@Egge7
Is BSON not supported by NOSTR just because no relay or client has implemented support for it? Or is it due to some BSON data parser issue where it is not expecting BSON to exist in the JSON string?
I've never worked with BSON and I don't know what implications its use in the event would have.

@Egge21M
Contributor

Egge21M commented Mar 10, 2023

> @Egge7 Is BSON not supported by NOSTR just because no relay or client has implemented support for it? Or is it due to some BSON data parser issue where it is not expecting BSON to exist in the JSON string? I've never worked with BSON and I don't know what implications its use in the event would have.

BSON cannot exist inside JSON, because JSON is text and BSON is binary. BSON is a binary representation of a JSON-like document, which lets it hold raw binary values (this makes it great for storage, as you do not lose space to encoding). Websockets can transmit blobs as well as text, but everything on nostr is designed to be text, which is why I proposed a sub-protocol for binary transmissions.

@vitorpamplona
Collaborator

Is there a need for the hash tag now that the contents are in the json itself?

@nschairer

I don't think so, since the id is a hash over the event data (content included), and the signature verifies that nothing changed.

It would be pretty neat to chunk large files across relays, though; then you could take load off a single relay and query many relays for different chunks of the file. In that case you would probably want the hash for verification when you reassemble everything.
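Why the id already protects the content can be shown directly. Per NIP-01, the event id is the SHA-256 of the JSON array `[0, pubkey, created_at, kind, tags, content]` serialized without extra whitespace. The sketch below assumes Python's `json.dumps` with compact separators and `ensure_ascii=False` matches NIP-01's serialization for ordinary content (the escaping rules can differ for rare control characters); the pubkey, timestamp and kind are placeholders.

```python
import hashlib
import json


def event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """NIP-01 event id: sha256 of the compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # non-ASCII stays as raw UTF-8
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


PUBKEY = "ab" * 32  # placeholder pubkey, not a real key
original = event_id(PUBKEY, 1678406400, 1064, [], "aGVsbG8=")  # base64 of b"hello"
tampered = event_id(PUBKEY, 1678406400, 1064, [], "aGVsbG8h")  # base64 of b"hello!"

# Changing the content changes the id, and the signature commits to the id,
# so a single-event file needs no separate hash tag for integrity.
print(original != tampered)  # True
```

A hash tag only earns its keep once a file spans several events, since then no single id covers the whole file.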

@vitorpamplona
Collaborator

vitorpamplona commented Mar 16, 2023

Yep, breaking the file down into multiple events could be interesting to avoid connection hiccups requiring apps to restart the download from scratch.

It could be very simple with a new "File Header" kind:

{
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "pubkey": <32-bytes lowercase hex-encoded public key of the event creator>,
  "created_at": <unix timestamp in seconds>,
  "kind": 30065,
  "tags": [
    ["d", <string with name of file>],
    ["decrypt", <algorithm>, <Decryption Params>],
    ["p", <32-bytes hex of a pubkey>, <recommended relay URL>],
    ["hash", <SHA256 hex-encoded string of the complete raw data>],
    ["format", <"Base64" or "BSON">],
    ["e", <part1>, <relay>],
    ["e", <part2>, <relay>],
    ["e", <part...>, <relay>],
    ["e", <partn>, <relay>]
  ],
  "content": "",
  "sig": <64-bytes hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}
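A client reading such a header would walk the `e` tags in order, decode each fetched part, concatenate the bytes, and check the result against the `hash` tag. A minimal sketch, assuming each part's `content` is a complete, independently padded base64 string and that parts have already been fetched into a dict keyed by event id; `reassemble`, the part ids and the relay URL are hypothetical.

```python
import base64
import hashlib


def reassemble(header_tags: list, parts: dict) -> bytes:
    """Rebuild a file from a header's ordered "e" tags.

    header_tags: the header event's tags.
    parts: fetched part events' content, keyed by event id.
    """
    part_ids = [tag[1] for tag in header_tags if len(tag) > 1 and tag[0] == "e"]
    missing = [pid for pid in part_ids if pid not in parts]
    if missing:
        raise KeyError(f"parts not fetched yet: {missing}")

    # Each part is a complete base64 string, so decode parts individually.
    data = b"".join(base64.b64decode(parts[pid], validate=True) for pid in part_ids)

    expected = next((tag[1] for tag in header_tags if len(tag) > 1 and tag[0] == "hash"), None)
    if expected is not None and hashlib.sha256(data).hexdigest() != expected:
        raise ValueError("hash mismatch: file is corrupted or incomplete")
    return data


# Usage with placeholder ids and relay URL:
blob = b"0123456789" * 10  # 100 bytes of sample data
chunks = [blob[i:i + 48] for i in range(0, len(blob), 48)]
parts = {f"part{n}": base64.b64encode(c).decode("ascii") for n, c in enumerate(chunks)}
header_tags = [["d", "sample.bin"], ["hash", hashlib.sha256(blob).hexdigest()]]
header_tags += [["e", pid, "wss://relay.example"] for pid in parts]
assert reassemble(header_tags, parts) == blob
```

Relying on tag order keeps the header compact; a missing part is reported by id, so a client could retry just that part on another relay.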

@Egge21M
Contributor

Egge21M commented Mar 16, 2023

Here is a link to the second layer protocol I proposed a couple of weeks back: https://github.com/nostr-ing/nostr-ing-protocol

It is pretty much the same approach, but moved to a second layer that operates on raw data, so it's much more efficient than doing all of this directly on nostr. It uses chunks by default to facilitate streams of live data too, if applicable.

Now I have to say that I have not spent much time on this since then and never implemented it fully, but I still believe that this is a good approach to handling data transmission on top of nostr.

@vitorpamplona
Collaborator

If you use gzip to compress the entire JSON automatically when connecting with relays, Base64 is not that bad: https://lemire.me/blog/2019/01/30/what-is-the-space-overhead-of-base64-encoding/
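The effect is easy to check locally. A rough sketch, assuming pseudo-random bytes as a stand-in for already-compressed media (JPEG, MP4) and Python's `gzip` module as a proxy for connection-level compression; in practice websocket compression is the permessage-deflate extension (RFC 7692), and whether a given relay and client negotiate it is an assumption here.

```python
import base64
import gzip
import random

# Pseudo-random bytes stand in for already-compressed media such as a JPEG.
raw = random.Random(42).randbytes(100_000)

encoded = base64.b64encode(raw)
compressed = gzip.compress(encoded)

print(f"raw:          {len(raw):>7} bytes")
print(f"base64:       {len(encoded):>7} bytes ({len(encoded) / len(raw):.3f}x)")
print(f"gzip(base64): {len(compressed):>7} bytes ({len(compressed) / len(raw):.3f}x)")
```

Because base64 uses only 64 symbols, entropy coding wins back most of the expansion; exact numbers depend on the data and the compressor.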

@nschairer

nschairer commented Mar 17, 2023

Would be interesting to see a performance comparison between @Egge7's protocol and just keeping it in nostr with base64 chunks when performing encoding/decoding + reassembling + hash validation client / relay side

@Egge21M
Contributor

Egge21M commented Mar 17, 2023

> Would be interesting to see a performance comparison between @Egge7's protocol and just keeping it in nostr with base64 chunks when performing encoding/decoding + reassembling + hash validation client / relay side

As I said, I only started spec'ing and never actually implemented anything. You can definitely do so and then do some benchmarking. However, even without benchmarking we know that base64 takes up about 33% more space than raw binary data, because each base64 character carries only 6 bits: every 3 bytes become 4 characters.
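The exact size follows from that 3-to-4 mapping: padded standard base64 turns n bytes into 4·⌈n/3⌉ characters, so the overhead approaches 4/3 for large files while tiny inputs pay more because of padding. A small check of the formula against Python's encoder (`base64_len` is just an illustrative helper):

```python
import base64


def base64_len(n: int) -> int:
    """Characters in the padded standard base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)  # every 3 bytes (or final partial group) -> 4 chars


for n in (1, 2, 3, 300, 1_000_000):
    actual = len(base64.b64encode(bytes(n)))
    assert actual == base64_len(n)
    print(f"{n:>9} bytes -> {actual:>9} chars ({actual / n:.3f}x)")
```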

@frbitten
Contributor Author

The "hash" tag really makes no sense here. It ended up in this NIP through a copy/paste from NIP-94. I will remove it.

I also changed it to be a regular event, since being replaceable could cause more trouble than it's worth. With that, the kind changed to 1064.

@frbitten
Contributor Author

This suggestion of dividing the data into several parts is very good. But perhaps using the e tag is not the ideal way,

because I need to know the event id of each part and its position in the order of the data.

For example, say I have a file divided into 3 parts.
Event 0 would have tags for events 1 and 2.
Event 1 would have tags for events 0 and 2.
Event 2 would have tags for events 0 and 1.

I could receive any event from the list first, and from there I have to be able to reconstruct all of them, and know in which position to place the first one I received.

I think the tag idea is good, but the tags could be put on the NIP-94 event instead. That way there would be only one event representing the whole file, and it would hold the list of parts to be downloaded.

@vitorpamplona
Collaborator

The issue with that idea is that every part cites every other part. They become heavier objects to parse. Instead, my proposal was to do just one header kind that cites all parts, has the hash, decrypting info, etc. Each part only has its contents and cites the header.
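The cost difference can be put in numbers: if every part cites every other part, n parts carry n(n-1) e tags in total, while one header citing n parts, with each part citing the header back, carries 2n. A quick comparison (function names are illustrative):

```python
def mesh_tag_count(n: int) -> int:
    """Total e tags if each of n parts cites all the other n - 1 parts."""
    return n * (n - 1)


def header_tag_count(n: int) -> int:
    """Total e tags if one header cites n parts and each part cites the header."""
    return n + n


for n in (3, 10, 100):
    print(f"{n:>3} parts: mesh={mesh_tag_count(n):>5}  header={header_tag_count(n):>3}")
```

The two designs tie at three parts; beyond that the mesh grows quadratically, and every part event has to repeat the whole list.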

@frbitten
Contributor Author

> The issue with that idea is that every part cites every other part. They become heavier objects to parse. Instead, my proposal was to do just one header kind that cites all parts, has the hash, decrypting info, etc. Each part only has its contents and cites the header.

That's why I suggested NIP-94 (#337), which is a file-sharing header.

@vitorpamplona
Collaborator

vitorpamplona commented Mar 17, 2023

That's different. NIP-94 is just a hashed URL. There are no file "parts" baked into it. This one is for files stored in relays.

@frbitten
Contributor Author

It can be used in the same way.

As I put in the description, NIP-94 must be used to announce the NIP-95 file by referencing its event, because a NIP-95 event is not returned in broad searches, only when that event is explicitly requested.

I can reference the event with an e tag, and use multiple e tags for multiple parts.
The URL tag can also be used in this case if the relay offers the option to download NIP-95 event data via HTTP.

You don't need to create another kind for this. It is perfectly possible to use NIP-94.

@vitorpamplona
Collaborator

I would separate them. It's too confusing to make them the same event and there might be reasons to separate a search for files in url from a search for files inside relays. Clients might want to support one but not the other, etc.

There is no shortage of integers for each kind.

@frbitten
Contributor Author

> I would separate them. It's too confusing to make them the same event and there might be reasons to separate a search for files in url from a search for files inside relays. Clients might want to support one but not the other, etc.
>
> There is no shortage of integers for each kind.

I see no problem in creating another specific event. But it would work the same way as NIP-94, only without the url tag.

I'll put the definition of this new event here in the description.

@vitorpamplona
Collaborator

Hmm, NIP-94 shouldn't describe file parts. This new one should describe how to reassemble the parts and test the hashes.

@arthurfranca
Contributor

arthurfranca commented Mar 17, 2023

> [...] a possible solution is for this NIP not to be recorded in the database, but on disk, the file name being the event id [...]

I think I get @frbitten's idea. The relay is expected to save the file data in one complete disk file (or a blob/BSON in the DB). Then it will serve it as an event (he also says it could optionally serve it from an https url, but that wouldn't be nostr).

For this to work, the relay would recreate the event dynamically (with the same created_at and all other keys) on request, before serving it to a client. So maybe the uploader won't be the owner of the NIP-95 event (otherwise the relay wouldn't be able to recreate the event with the same id). The private key would be one the relay owns.

Better yet, the private key used to sign the event when recreating it dynamically should be a shared one, so that other relays could use it to sign and recreate the event themselves; I think the NIP-94 event id could be used as a private key.

People later suggested it would be better to split the file into chunks. Then it won't be only one disk file/db record.

NIP-94 could have (when referencing a NIP-95 event instead of a simple url tag) tags pointing to the parts like @vitorpamplona said:

{
  // ...,
  tags: [
    // ...,
    ["e", /* NIP-95 event id */, /* recommended relay url */], // first part
    ["e", /* NIP-95 event id */, /* recommended relay url */] // second part
  ]
}

While a NIP-95 event would be:

{
  id: 'an id',
  pubkey: 'a pubkey',
  tags: [
    ["e", /* NIP-94 event id */, /* recommended relay url */] // file header,
  ],
  content: "string with sliced base64 data",
  created_at: 11111111
}

So NIP-95 wouldn't be stored as an event. It would be a file, with created_at kept as file metadata, holding the compressed base64 data, and with its filename taken from event.id. I don't think the e tag can be filesystem metadata the way created_at can, so maybe it should go inside an auxiliary event-id.json. Then it's better to put all the metadata inside that JSON file: info about how the relay stored the main file (like the file extension, e.g. .gzip, for the compression used), created_at, and the stringified tags array.

🤔 All this to save like 28% of space in comparison to storing the uncompressed base64 event. Worth it!

@vitorpamplona
Collaborator

I am not sure why you would do all that. You can simply receive the event, take the base64 contents, convert it back to binary, save it on disk with the event ID as filename, and store the JSON without the .content field in the database. When a client requests it, pull the event from the database, use the Id to fetch the .content from the disk, recreate the base64 version, and send it signed by the original author as if nothing has happened.

Easy.
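The relay-side flow just described can be sketched as follows. This is a minimal sketch under stated assumptions: a temporary directory stands in for blob storage, a dict stands in for the database, and `BlobOffloadingStore` is a hypothetical name. One caveat worth checking: the trick only works if the client's base64 is canonical (standard alphabet, padded, no line breaks). Otherwise re-encoding will not reproduce the signed `content` string and the served event will fail id/signature verification, so the sketch refuses to offload in that case.

```python
import base64
import re
import tempfile
from pathlib import Path

HEX_ID = re.compile(r"[0-9a-f]{64}")


class BlobOffloadingStore:
    """Keeps event metadata in a dict (stand-in for the DB) and content bytes on disk."""

    def __init__(self, blob_dir: Path):
        self.blob_dir = blob_dir
        self.db: dict[str, dict] = {}  # event id -> event without "content"

    def save(self, event: dict) -> None:
        event_id = event["id"]
        if not HEX_ID.fullmatch(event_id):
            raise ValueError("unexpected id format; refusing to use it as a filename")
        content = event["content"]
        raw = base64.b64decode(content, validate=True)
        # Offload only if re-encoding reproduces the exact signed string;
        # otherwise the served event would no longer match its id/sig.
        if base64.b64encode(raw).decode("ascii") != content:
            raise ValueError("non-canonical base64; keep the original string instead")
        (self.blob_dir / event_id).write_bytes(raw)
        self.db[event_id] = {k: v for k, v in event.items() if k != "content"}

    def load(self, event_id: str) -> dict:
        event = dict(self.db[event_id])
        raw = (self.blob_dir / event_id).read_bytes()
        event["content"] = base64.b64encode(raw).decode("ascii")
        return event  # same fields and values as received, original sig intact


# Usage with placeholder values (id and sig are not real):
with tempfile.TemporaryDirectory() as tmp:
    store = BlobOffloadingStore(Path(tmp))
    event = {
        "id": "ab" * 32, "pubkey": "cd" * 32, "created_at": 1678406400,
        "kind": 1064, "tags": [], "sig": "ef" * 64,
        "content": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
    }
    store.save(event)
    assert store.load(event["id"]) == event
```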

@vitorpamplona
Collaborator

vitorpamplona commented Mar 17, 2023

In fact, relays should do that for all events whose .content is bigger than, say, 1MB, unless their DB solution handles large binary content on disk really well.

@frbitten
Contributor Author

They are all valid options. The point is that how to store it is up to the relay.
Another option is to use a NoSQL database such as MongoDB, which already supports large documents (up to 16 MB), and put everything in the database. That would also make it easy to replicate the database and files across several clustered instances to meet heavy client demand.

NOSTR defines the form of communication. How the data is stored is up to each relay, which can choose the best format. I see no problem spending a little more on the transfer for the sake of simplicity. As I mentioned above, I can't imagine anyone wanting to store gigabytes this way. The trend will be small files. The relay can also define maximum limits.

@vitorpamplona
Collaborator

I see this NIP being used for profile pictures, images, and short videos inside posts. Large files, but not THAT large.

@arthurfranca
Contributor

> I am not sure why you would do all that.

I was trying to figure out what @frbitten suggested when he also said "the file name being the event id. So it can be easily found and searched. And because it is not in the database, it does not interfere with the indexing of common events."

If the event isn't stored, the relay wouldn't be indexing the event.id, created_at and e tag (although the e tag could be an ee tag instead, just to avoid indexing, if NIP-95 events are expected to be requested by event id only). So it would save a bit of space.

> I am not sure why you would do all that. You can simply receive the event, take the base64 contents, convert it back to binary, save it on disk with the event ID as filename, and store the JSON without the .content field in the database. When a client requests it, pull the event from the database, use the Id to fetch the .content from the disk, recreate the base64 version, and send it signed by the original author as if nothing has happened.

Easier indeed.

@frbitten
Contributor Author

That's exactly the idea. This event is kept out of all indexing, searching, and nostr relay processing. It will only be returned when requested by its own ID. This avoids a lot of processing and avoids sending unwanted data.

Contributor

@NfNitLoop left a comment

I've been referring to this NIP as I do some brainstorming for an experimental implementation of similar functionality. Since this is still an open PR, I'll share thoughts here in case you want to consider any of them for this NIP.

Things I'm planning on changing for my experiment:

The "meta" event (kind 1065):

  • should have the lower ID, because it will be both searched/fetched by users first, and sent to the server first.
  • MUST include metadata for mime type and (final, deserialized, binary) file size, in bytes, so that clients can choose whether they want to fetch it.
  • MUST include an "x" tag with the sha256 of the file.
  • Can support large(r) files by referring to multiple other event IDs, the binary contents of which can be concatenated together, in tag-order, to reconstruct the original file.
  • MUST specify, when multiple "content" events are included, a tag block_size. The "block_size" is the binary size, in bytes, of the data of each block before the last block (which may be smaller, but not zero). This allows relays and clients to only fetch certain parts of a file, ex: when streaming video, or resuming downloads.

Clients:
Because of the overhead of signing/encoding/fetching/sending each message, clients should choose larger block sizes for multi-event file uploads when possible, but they may choose smaller block sizes when it makes sense for the content, or to meet relays' max_message_length limitations/policies. Clients SHOULD prefer block sizes that are multiples of 3, because those encode into base64 efficiently (3 bytes : 4 chars).
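The multiple-of-3 advice can be checked directly: when every block except the last is a multiple of 3 bytes, no inner block needs `=` padding, so the per-block base64 strings splice together into exactly the base64 of the whole file. A minimal sketch; the block sizes 300 and 256 are arbitrary examples.

```python
import base64


def encode_blocks(data: bytes, block_size: int) -> list[str]:
    """Split data into block_size chunks and base64-encode each one."""
    return [
        base64.b64encode(data[i:i + block_size]).decode("ascii")
        for i in range(0, len(data), block_size)
    ]


data = bytes(range(256)) * 4  # 1024 bytes of sample data
whole = base64.b64encode(data).decode("ascii")

aligned = encode_blocks(data, 300)    # 300 is a multiple of 3
unaligned = encode_blocks(data, 256)  # 256 is not

print("".join(aligned) == whole)    # True: blocks splice together cleanly
print("".join(unaligned) == whole)  # False: inner blocks end with "==" padding
```

Besides the clean splice, aligned blocks avoid wasting up to two padding characters per inner block.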

Relays:

  • MAY choose to reject a "meta" event if it doesn't want to store files, a user is over a quota, the file content type is not acceptable, the file is larger than allowed by that relay, or whatever reasons the relay operators wish. Rejection of the "meta" event serves as a proxy rejection for all of its associated "content" events, too, so clients shouldn't send "content" events in that case.
  • Relays SHOULD reject any content events they receive if they have not yet accepted the associated "meta" event.
  • Relays MAY(/SHOULD?) enforce a minimum block size for multi-part files, and/or a maximum number of parts, to avoid abuse. (encoding a large file with 1-byte blocks would have ridiculous overhead.)
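The relay rules above reduce to a small admission check. A minimal sketch, assuming the meta event lists its content events as `e` tags and carries a `size` tag (as in the multi-part proposal in this thread); `MAX_FILE_SIZE` and `FileAdmission` are hypothetical, and signature verification is out of scope. The returned `(accepted, message)` pairs mirror the fields of a NIP-01 `OK` reply, using its `invalid:` and `blocked:` prefixes.

```python
MAX_FILE_SIZE = 10 * 1024 * 1024  # hypothetical relay policy: 10 MiB


class FileAdmission:
    """Gates content (1064) events on a previously accepted meta (1065) event."""

    def __init__(self) -> None:
        self.expected_parts: dict[str, str] = {}  # content event id -> meta event id

    def on_meta(self, event: dict) -> tuple[bool, str]:
        """Return (accepted, message) as they would appear in a NIP-01 OK reply."""
        size_tag = next((t for t in event["tags"] if len(t) > 1 and t[0] == "size"), None)
        if size_tag is None or not size_tag[1].isdigit():
            return False, "invalid: meta event needs a numeric size tag"
        if int(size_tag[1]) > MAX_FILE_SIZE:
            return False, "blocked: file larger than this relay allows"
        for tag in event["tags"]:
            if len(tag) > 1 and tag[0] == "e":
                self.expected_parts[tag[1]] = event["id"]
        return True, ""

    def on_content(self, event: dict) -> tuple[bool, str]:
        if event["id"] not in self.expected_parts:
            return False, "blocked: no accepted meta event lists this part"
        return True, ""


gate = FileAdmission()
meta = {"id": "meta1", "kind": 1065,
        "tags": [["size", "2048"], ["e", "part1"], ["e", "part2"]]}
print(gate.on_meta(meta))
print(gate.on_content({"id": "part1", "kind": 1064}))
print(gate.on_content({"id": "stray", "kind": 1064}))
```

A real relay would also cap the number of parts and check the block size, per the abuse concern above.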

Regarding the overhead of Base64 -- it's unfortunate for the wire protocol but short of changing that, the best workaround is that servers can just pull the file contents out and store them in binary (either in a file or as a BLOB). They may want to store the binary in a content-addressable store, and requiring a sha hash of the file helps with that.
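A content-addressable store keyed by the file's SHA-256 (the hash the meta event is required to carry) deduplicates identical uploads and lets the relay verify integrity on write. A minimal in-memory sketch; `ContentStore` is a hypothetical name, and a real relay would back it with files or BLOB rows.

```python
import hashlib
from typing import Optional


class ContentStore:
    """In-memory content-addressable store: blobs keyed by their sha256 hex digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes, expected_sha256: Optional[str] = None) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if expected_sha256 is not None and digest != expected_sha256.lower():
            raise ValueError("content does not match the advertised sha256")
        self._blobs.setdefault(digest, data)  # identical uploads are stored once
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
key = store.put(b"same bytes")
store.put(b"same bytes", expected_sha256=key)  # a second upload of the same file
print(len(store))  # 1
```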

I'm https://njump.me/nfnitloop.com if anyone wants to brainstorm more about this. I'm new to Nostr, so forgive me if I missed anything that should be obvious. 😅

95.md (outdated)

Another solution is for relays that want to implement this functionality to use a NoSQL database such as MongoDB, which already supports large documents without harming performance.

The relay can allow access to this data via URL, with its own URL pattern for these cases. And if you receive a `NIP-94` referring to a `NIP-95`, you can include the URL in the proper `NIP-94` tag.
Contributor

@NfNitLoop Mar 27, 2024

Including URLs in events can lead to linkrot. (Though they're a nice fallback for clients that haven't implemented this NIP for pulling contents out of events.)

Maybe a better alternative would be to document a way for relays to advertise with NIP-11 what the URL pattern is to fetch these files from it via HTTP(S).

Contributor Author

The NIP-94 event can be generated either by the client when creating the NIP-95 note (in which case a definition of the URL format is needed) or by the relay itself to list the files it has saved. In NIP-94, the author of the note matters less than the disclosure of the URL.

95.md (outdated)
"kind": 1064,
"tags": [
["decrypt",<algorithm>,<Decryption Params>],
["p", <32-bytes hex of a pubkey>, <recommended relay URL>],
Contributor

Is p redundant with pubkey here? Is this just so that you can recommend a relay URL for that pubkey?

Do you want to include an example type tag here? Actually, metadata like that should probably be moved to the 1065 kind, right? Would be useful for clients to know mime type, size, dimensions, etc, before choosing whether to fetch the file.

Contributor Author

Yes. Just to reference other users, following the NIP-01 standard. It is not mandatory.

@frbitten
Contributor Author

The idea was for each NIP-95 event to have a NIP-94 event describing it. NIP-94 already supports all the description fields you suggested. This spares clients and servers from transmitting large volumes of data just so the user can learn the basic characteristics of the file.

That is why relays return a NIP-95 event only on a direct request for its ID; it is not returned in broad searches.

Example: imagine I have a 1MB image sent in a NIP-95 event. If I just want to know its size, I would have to receive the entire event, more than 1MB, to get that information. With NIP-94, I have all the "meta" information without transferring 1MB over the network.

@vitorpamplona
Collaborator

FYI, Amethyst's implementation still uses kind 1065 as a header for kind 1064 blobs. Kind 1065 is just a copy of NIP-94's kind 1063 file header without the URL-based tags.

@frbitten
Contributor Author

frbitten commented Mar 28, 2024

> FYI, Amethyst's implementation still uses kind 1065 as a header for kind 1064 blobs. Kind 1065 is just a copy of NIP-94's kind 1063 file header without the URL-based tags.

Good reminder, Vitor.
I no longer remember the reason for proposing a different kind instead of using kind 1063 from NIP-94. Do you remember?
After all, the url tag in NIP-94 is not mandatory.

@vitorpamplona
Collaborator

vitorpamplona commented Mar 28, 2024

It was mostly to make sure NIP-94 clients don't get bogged down having to support and manage NIP-95 blobs. Separating the headers allows clients to filter what they can support (url-based files or base64 blobs). They can easily filter for both if they support both.

Since NIP-95 events are handled differently (saving the blob to disk, etc.), I think it is a good option to separate the header as well. That way, clients can design filters inside specific models to correctly process what they return.
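On the client side, the split shows up directly in subscription filters. A minimal sketch using the NIP-01 `REQ` message shape (`["REQ", <subscription id>, <filter>, ...]`): headers are discovered through a `kinds` filter, while the 1064 blob is fetched only by its id, matching the rule that NIP-95 events are returned only on direct request. The subscription ids, limit, and event id are placeholders.

```python
import json


def req(subscription_id: str, *filters: dict) -> str:
    """Serialize a NIP-01 REQ message: ["REQ", <subscription id>, <filter>, ...]."""
    return json.dumps(["REQ", subscription_id, *filters])


# Discover file headers of both styles (URL-based 1063 and relay-stored 1065).
discover = req("file-headers", {"kinds": [1063, 1065], "limit": 20})

# Only after the user opts in, fetch the 1064 blob by id.
blob_event_id = "ab" * 32  # placeholder for the id referenced by a 1065 header
fetch_blob = req("file-blob", {"ids": [blob_event_id], "kinds": [1064]})

print(discover)
print(fetch_blob)
```

A client that supports only one style simply drops the other kind from the first filter.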

@frbitten
Contributor Author

I was distracted and no longer remembered all the details. I adjusted it according to @vitorpamplona's note. @NfNitLoop, please see if it meets your needs now.

@NfNitLoop
Contributor

@frbitten I'll have a look a bit later today. But I didn't see you comment on my ideas for multi-part files. (breaking a large file up into smaller chunks so that each chunk fits within message size limits.)

Too far for this NIP? It would probably break existing implementations, so it's probably best to make it its own kind, so that could be a separate NIP. 🤔 Can probably re-use the "content" event message and make a new "meta" kind, as with the NIP-94/1065 split.

Contributor

@NfNitLoop left a comment

(OK, I was curious so I looked now. 😊)

One of the features from my earlier brain-dump is that clients SHOULD send the meta (1065) event to relays before the content (1064) event(s). Then relays MAY reject the 1065 event, and thus spare themselves the bandwidth of receiving 1064 events if

  • They don't support them at all
  • The user/pubkey isn't allowed to post such events
  • The file is too big
  • The file (hash) has been banned
  • etc, etc.

Nostr event
------------------
This NIP specifies the use of the `1064` event type, having in `content` the binary data that you want to be stored in Base64 format.
* `type` a string indicating the data type of the file. The MIME types format must be used (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types)
Contributor

Not sure if you missed it in my other comment but IMO type probably belongs on kind 1065?

Ah, yes, it's already over there as an m tag.

Collaborator

If kind 1065 is optional, type is needed. What do you think?

Contributor

I think it's important to separate the metadata (kind 1065) and the real data (kind 1064). To quote another part of this doc:

> Another defined event is the 1065 which is used as a header for the data contained in an event 1064. This way the data can be disclosed without overloading the communications when sending a large amount of data.

I don't need a "type" here if the "m" tag is present on 1065 -- it'll be redundant. And I'm not sure 1065 events should be considered optional. Clients should have a smaller 1065 metadata event available so that they can decide whether they want to download the (likely) larger 1064 event. (Or, in the case of multi-part attachments, multiple events.)

@vitorpamplona
Collaborator

> I didn't see you comment on my ideas for multi-part files.

This must be defined in some way. I think we never did it because none of us had that need at the time. But breaking the file into many events and recombining them must be deterministic.

@NfNitLoop
Contributor

NfNitLoop commented Mar 28, 2024

@vitorpamplona

> But breaking the file into many events and recombining them must be deterministic.

Yep! See my definition of how to do it here.

The "meta" (kind 1065) event:

  • [...]
  • Can support large(r) files by referring to multiple other (kind 1064) "content" event IDs, the binary contents of which can be concatenated together, in tag-order, to reconstruct the original file.
  • MUST specify, when multiple "content" events are included, a tag block_size. The "block_size" is the binary size, in bytes, of the data of each block before the last block (which may be smaller, but not zero). This allows relays and clients to only fetch certain parts of a file, ex: when streaming video, or resuming downloads.

So the meta event for a multi-part file would look like:

{
  "id": "…",
  "pubkey": "…",
  "sig": "…",
  "created_at": 1234,
  "kind": 1065,
  "content": "(description)",
  "tags": [
    ["m", <MIME type>],
    ["x", <Hash SHA-256>],
    // (etc)
    ["size", "123456768"],
    ["block_size", "100000"],
    ["e", "event1"], // (base64-decoded content) must be block_size bytes
    ["e", "event2"], // same as above
    // … etc.
    ["e", "event_last"] // may be smaller, but non-zero: the remaining bytes
  ]
}

One cool feature of this approach is that since the kind 1065 event lists all the event IDs of the parts, clients could optionally fetch different parts from different relays as a way to spread the bandwidth costs, or repair files that have missing/corrupted blocks.
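The `size` and `block_size` tags in the example above are enough to locate any byte without downloading everything. A minimal sketch with hypothetical helper names: it computes how many `e` tags the meta event must list, maps a byte offset to (part index, offset within part) for resuming or seeking, and derives the length of the final block.

```python
def expected_parts(size: int, block_size: int) -> int:
    """Number of content events implied by the size and block_size tags."""
    if size <= 0 or block_size <= 0:
        raise ValueError("size and block_size must be positive")
    return -(-size // block_size)  # ceiling division


def locate(offset: int, size: int, block_size: int) -> tuple[int, int]:
    """Return (part index, offset within that part) for a byte offset."""
    if not 0 <= offset < size:
        raise IndexError("offset outside the file")
    return divmod(offset, block_size)


def last_block_len(size: int, block_size: int) -> int:
    """Length of the final block: the remaining bytes, never zero."""
    return size - (expected_parts(size, block_size) - 1) * block_size


size, block_size = 123_456_768, 100_000  # the values from the example above
print(expected_parts(size, block_size))      # 1235
print(locate(50_000_000, size, block_size))  # (500, 0)
print(last_block_len(size, block_size))      # 56768
```

A relay or client can reject a meta event whose count of `e` tags differs from `expected_parts`, since the file could not be reassembled.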

@vitorpamplona
Collaborator

Nice! Do you have a demo somewhere?

@NfNitLoop
Contributor

> Do you have a demo somewhere?

@vitorpamplona Not yet. I'm working toward implementing a demo, which is why I shared my thinking here. I'll have a lot more time to spend on it in a couple weeks. I'm taking the long/scenic route, writing my own relay (and probably web client) to test it out in.

Wasn't sure if I should shove multiple events into a kind 1065 or grab a new kind # to experiment with. (Is there an atomic way to grab a kind # or nip # for this kind of experimentation? Or find one that's not yet in use?)

@vitorpamplona
Collaborator

> Is there an atomic way to grab a kind # or nip # for this kind of experimentation? Or find one that's not yet in use?

If you are designing your own relays, just connect your tests to your relay and then delete when you finish the test. Otherwise, yes, just pick a number you have not seen yet in the major relays.

@frbitten
Contributor Author

If you need to break a file into several events, I imagine you are thinking about transmitting very large files. I don't know if Nostr is the best option for that.
Relays impose a size limit per event. This is how they limit the volume of data.

One option is for event 1065 to have several e tags, each with its part id and probably a second attribute, an index that defines the concatenation order. I don't know if relying on the order in JSON is a good option.

@NfNitLoop
Contributor

NfNitLoop commented Mar 28, 2024

> To need to break into several events, I imagine you are thinking about transmitting very large files. I don't know if Nostr is the best option for this.

It's not necessarily "very large" files. It's just more the idea of having a general-purpose solution. The current state of NIP-95 only supports files up to the (encoded) message size that relays will accept. I think I've seen that's usually something like 128KB? Even a simple JPEG can be larger than that.
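To size blocks for a relay's `max_message_length` (which NIP-11 lets a relay advertise under its `limitation` object), a client has to budget for base64 expansion plus the JSON envelope. A rough sketch, assuming a hypothetical fixed 1024-byte allowance for everything other than the content (ids, pubkey, sig, tags, the `["EVENT", ...]` wrapper) and using the 128KB figure above purely as an example, not as a known relay setting.

```python
import base64


def max_block_size(max_message_length: int, envelope_overhead: int = 1024) -> int:
    """Largest block, in bytes, whose base64 encoding fits in one message.

    envelope_overhead is a hypothetical allowance for everything in the
    ["EVENT", {...}] message other than the base64 content itself.
    The result is a multiple of 3, so blocks encode without padding.
    """
    budget_chars = max_message_length - envelope_overhead
    if budget_chars < 4:
        raise ValueError("message limit too small to carry any content")
    return (budget_chars // 4) * 3  # every 4 base64 chars carry 3 bytes


limit = 128 * 1024  # example figure from this thread
block = max_block_size(limit)
assert len(base64.b64encode(bytes(block))) + 1024 <= limit
print(block)  # 97536
```

So under these assumptions a 128KB message carries at most about 95KB of file data, which is why even a modest JPEG may need several parts.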

> One option is for event 1065 to have several e tags.

This is just what I'm proposing above! 😊

> and probably a second attribute that would be the index to define the concatenation order. I don't know if relying on the order in JSON is a good option.

Adding an index is unnecessary. Tags are in a JSON array, and JSON arrays (unlike objects/maps) are ordered. (Which I assume is precisely the reason that the tags are in an array. It gives them a deterministic order for the event ID hash function.)

I've seen other NIPs warn against relying on tag order to give tags extra meaning (e.g., this e tag is what I'm replying to, this e tag is the "root" of the thread).

But IMO in kind 1065, the ONLY e tags that should be supported/allowed would be to the 1064 events that make up the file contents, so this isn't an issue there.

@frbitten
Contributor Author

Relays have size limits precisely because we don't want data that isn't conversation. If this NIP is merged, they will prohibit these kinds. So splitting a file into several events just to get around the limit is not a good option.
The right option would be to use relays that support this functionality.
For example, if you use MongoDB, the size limit for each document in the database is 16MB.

Your suggestion is good but you have to pay attention to what the use would be.

@vitorpamplona
Collaborator

Your suggestion is good but you have to pay attention to what the use would be.

This is why we need a demo with lots of testing.

I agree that breaking the size down just to avoid limits in the current relays is not a good solution. Breaking into multiple payloads should be about parallelizing downloads, avoiding websocket payload limits, or some other reason.
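For the parallel-download case, a sketch using asyncio: chunks are fetched concurrently with a cap on in-flight requests, and `asyncio.gather` returns results in the order the ids were passed, so the file is assembled in e-tag order regardless of which download finishes first. `fetch_chunk` is a hypothetical stand-in for a relay REQ with an `{"ids": [...]}` filter.

```python
import asyncio

async def fetch_chunk(event_id: str) -> bytes:
    # Hypothetical stand-in for a REQ to a relay filtered by {"ids": [event_id]}.
    await asyncio.sleep(0.01)
    return f"<{event_id}>".encode()

async def download_all(chunk_ids: list, max_parallel: int = 4) -> bytes:
    sem = asyncio.Semaphore(max_parallel)  # limit concurrent requests

    async def limited(cid: str) -> bytes:
        async with sem:
            return await fetch_chunk(cid)

    # gather() returns results in the order the awaitables were passed,
    # regardless of which download finishes first.
    parts = await asyncio.gather(*(limited(cid) for cid in chunk_ids))
    return b"".join(parts)

print(asyncio.run(download_all(["a", "b", "c"])))  # b'<a><b><c>'
```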

@NfNitLoop
Contributor

NfNitLoop commented Mar 28, 2024

Breaking into multiple payloads should be about parallelizing downloads, avoiding websocket payload limits,

I thought that's what I'd said. Maybe I communicated it poorly. 😅

Agreed, I'm not trying to work around web servers' content policies. In fact I explicitly made some suggestions to allow them to enforce policies:

One of the features from my earlier brain-dump is that clients SHOULD send the meta (1065) event to relays before the content (1064) event(s). Then relays MAY reject the 1065 event, and thus spare themselves the bandwidth of receiving 1064 events if:

  • They don't support them at all
  • The user/pubkey isn't allowed to post such events
  • The file is too big
  • The file (hash) has been banned
  • etc, etc.

i.e.: A server rejecting the kind 1065 event is also saying: don't send me any of its associated 1064 events either, they're not welcome here.
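A relay-side sketch of that flow, assuming kind 1065 carries NIP-94-style `size` and `x` tags and lists its chunks in e tags. Every policy field is a hypothetical relay setting; the reason strings reuse the `blocked:`, `restricted:` and `invalid:` prefixes that NIP-01 defines for OK messages.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

def first_tag(event: dict, name: str) -> Optional[str]:
    return next((t[1] for t in event["tags"] if len(t) >= 2 and t[0] == name), None)

@dataclass
class FilePolicy:
    # Every field here is a hypothetical relay setting, not part of any NIP.
    accepts_files: bool = True
    allowed_pubkeys: set = field(default_factory=set)  # empty set = anyone may post
    max_file_size: int = 16 * 1024 * 1024
    banned_hashes: set = field(default_factory=set)
    expected_chunks: set = field(default_factory=set)  # chunk ids named by accepted 1065s

    def on_meta(self, event: dict) -> Tuple[bool, str]:
        """Decide on a kind 1065 event before any of its 1064 chunks arrive."""
        if not self.accepts_files:
            return False, "blocked: file events are not supported"
        if self.allowed_pubkeys and event["pubkey"] not in self.allowed_pubkeys:
            return False, "restricted: pubkey may not post files"
        size = first_tag(event, "size")  # assumed NIP-94-style byte count
        if size is None or not size.isdigit() or int(size) > self.max_file_size:
            return False, "invalid: file size missing or too large"
        if first_tag(event, "x") in self.banned_hashes:
            return False, "blocked: file hash is banned"
        # Remember which chunk events this accepted metadata announces.
        self.expected_chunks.update(t[1] for t in event["tags"] if len(t) >= 2 and t[0] == "e")
        return True, ""

    def on_chunk(self, event: dict) -> Tuple[bool, str]:
        """Accept a kind 1064 event only if an accepted 1065 announced it."""
        if event["id"] not in self.expected_chunks:
            return False, "blocked: no accepted kind 1065 references this chunk"
        return True, ""

policy = FilePolicy(banned_hashes={"deadbeef"})
meta = {"pubkey": "pk", "tags": [["e", "c1"], ["e", "c2"], ["size", "1000"], ["x", "cafe"]]}
print(policy.on_meta(meta))                 # (True, '')
print(policy.on_chunk({"id": "c9"})[0])     # False
```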

@arthurfranca arthurfranca mentioned this pull request Mar 28, 2024
@vitorpamplona
Collaborator

I think we are all in agreement. Let's just make sure we test this file chunking thing very well.

@NfNitLoop
Contributor

NfNitLoop commented Apr 1, 2024

Working on my experimental implementation for multi-part files, and had some questions come up for kind 1065:

  • What's the difference between the summary tag, and the "description" which goes into the event's content? Might be worth calling that out. (👍 for explicitly documenting alt is for accessibility.)
  • Might be handy to have a name (or: fileName, file_name, file?) tag, to assist when folks want to download the file. (I'm going to use name.)

@frbitten
Copy link
Contributor Author

frbitten commented Apr 1, 2024

Working on my experimental implementation for multi-part files, and had some questions come up for kind 1065:

  • What's the difference between the summary tag, and the "description" which goes into the event's content? Might be worth calling that out. (👍 for explicitly documenting alt is for accessibility.)
  • Might be handy to have a name (or: fileName, file_name, file?) tag, to assist when folks want to download the file. (I'm going to use name.)

Summary is used as if it were the file name. The concept of a file name doesn't apply here because we don't have a file, just a data buffer. That's why summary was used instead of filename: it is a very short description that makes the data easy to identify.

For downloading, I think it's best to follow the rule in NIP-96, which was written after this text. That keeps us on a standard that is already in use.

For the relay to inform you where the download path is, we have two options:

  • Use the NIP-96 approach, where a settings file is served at /.well-known/nostr/nip95.json.
  • NIP-11 has configuration options and limits for each relay feature, but there is no provision for this today.
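Both options start from the relay's websocket URL. A sketch that derives the HTTP requests without sending them: the NIP-11 request targets the relay URL itself with the `application/nostr+json` Accept header, and `max_content_length` is an existing optional field of NIP-11's `limitation` object. The `nip95.json` path is only the hypothetical location from the first option, mirroring NIP-96's `nip96.json`.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request

HTTP_SCHEME = {"wss": "https", "ws": "http"}

def _http_parts(relay_url: str):
    parts = urlsplit(relay_url)
    if parts.scheme not in HTTP_SCHEME:
        raise ValueError(f"not a websocket URL: {relay_url}")
    return HTTP_SCHEME[parts.scheme], parts

def nip11_request(relay_url: str) -> Request:
    # NIP-11: an HTTP GET to the relay URL with this Accept header returns
    # the relay information document (JSON).
    scheme, parts = _http_parts(relay_url)
    url = urlunsplit((scheme, parts.netloc, parts.path or "/", "", ""))
    return Request(url, headers={"Accept": "application/nostr+json"})

def nip95_config_url(relay_url: str) -> str:
    # Hypothetical well-known path from the first option, mirroring NIP-96's nip96.json.
    scheme, parts = _http_parts(relay_url)
    return urlunsplit((scheme, parts.netloc, "/.well-known/nostr/nip95.json", "", ""))

def max_content_length(info: dict) -> Optional[int]:
    # NIP-11's optional "limitation" object may advertise a content size limit.
    return info.get("limitation", {}).get("max_content_length")

print(nip95_config_url("wss://relay.example.com"))
# https://relay.example.com/.well-known/nostr/nip95.json
```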

Personally, I would prefer a specific kind for reading a relay's operating properties, so that everything stays within the Nostr protocol instead of going through HTTP. But I'm not familiar enough with that format to propose how it would work.

@haorendashu
Copy link
Contributor

@vitorpamplona Hello, Vitor. I think that if a NIP-95 image is quoted by other notes, it should be displayed like any other normal image, not as a quoted note.

Maybe NIP-95 is better suited for backup and recovery. I am planning to add support for it.

@vitorpamplona
Copy link
Collaborator

@vitorpamplona Hello, Vitor. I think that if a NIP-95 image is quoted by other notes, it should be displayed like any other normal image, not as a quoted note.

This is a possibility. Amethyst only displays them as quotes because we haven't finished the new design that moves the actions you can take on the image (react, boost, report, zap, etc.) out of the usual quote interface and onto the image itself.
