-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for inlinling via the id-hash #5281
Conversation
56a41e4
to
de8b51f
Compare
This comment has been minimized.
This comment has been minimized.
de8b51f
to
d3422a2
Compare
This comment has been minimized.
This comment has been minimized.
0da66f9
to
dd9e76b
Compare
dd9e76b
to
d1a2ddc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! That's really clean and elegant. However, we'll probably want to move that helper somewhere else (e.g., go-cid). I can do the gx stuff if you make the PR.
We are putting to much stuff in |
I kind of agree but that won't help. If we put a bunch of useful stuff in go-cid-util, we'll run into the same problem (depend on it everywhere). Really, we just need to make updating packages in gx less of a pain. |
I do not agree with this assessment. For one thing just to use the See ipfs/go-cid#68 |
Hm. Yeah, you're probably right. |
d1a2ddc
to
4761272
Compare
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
4761272
to
08c96ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UI nits. I'd rather not bother explaining "identity hashing" to the user. They can just know that we're inlining the data into the CID.
core/commands/add.go
Outdated
@@ -120,6 +123,8 @@ You can now check what blocks have been created by: | |||
cmdkit.BoolOption(fstoreCacheOptionName, "Check the filestore for pre-existing blocks. (experimental)"), | |||
cmdkit.IntOption(cidVersionOptionName, "CID version. Defaults to 0 unless an option that depends on CIDv1 is passed. (experimental)"), | |||
cmdkit.StringOption(hashOptionName, "Hash function to use. Implies CIDv1 if not sha2-256. (experimental)").WithDefault("sha2-256"), | |||
cmdkit.BoolOption(inlineOptionName, "Inline small objects using identity hash. (experimental)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just say "inline small blocks (or objects?) into CIDs". Users won't really care about how we do this.
core/commands/add.go
Outdated
@@ -44,6 +45,8 @@ const ( | |||
fstoreCacheOptionName = "fscache" | |||
cidVersionOptionName = "cid-version" | |||
hashOptionName = "hash" | |||
inlineOptionName = "inline" | |||
idHashLimitOptionName = "id-hash-limit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--inline-limit
.
core/commands/add.go
Outdated
@@ -120,6 +123,8 @@ You can now check what blocks have been created by: | |||
cmdkit.BoolOption(fstoreCacheOptionName, "Check the filestore for pre-existing blocks. (experimental)"), | |||
cmdkit.IntOption(cidVersionOptionName, "CID version. Defaults to 0 unless an option that depends on CIDv1 is passed. (experimental)"), | |||
cmdkit.StringOption(hashOptionName, "Hash function to use. Implies CIDv1 if not sha2-256. (experimental)").WithDefault("sha2-256"), | |||
cmdkit.BoolOption(inlineOptionName, "Inline small objects using identity hash. (experimental)"), | |||
cmdkit.IntOption(idHashLimitOptionName, "Identity hash maxium size. (experimental)").WithDefault(64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maximum inline block size" (or something like that).
Also, should this imply --inline
?
core/commands/add.go
Outdated
if inline { | ||
fileAdder.CidBuilder = cidutil.InlineBuilder{ | ||
Builder: fileAdder.CidBuilder, | ||
Limit: idHashLimit} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💢
Limit: idHashLimit,
}
go fmt
is usually pretty good about being pedantic about things like this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran go format and it was happy. Very well I will change it.
core/commands/add.go
Outdated
@@ -123,7 +123,7 @@ You can now check what blocks have been created by: | |||
cmdkit.BoolOption(fstoreCacheOptionName, "Check the filestore for pre-existing blocks. (experimental)"), | |||
cmdkit.IntOption(cidVersionOptionName, "CID version. Defaults to 0 unless an option that depends on CIDv1 is passed. (experimental)"), | |||
cmdkit.StringOption(hashOptionName, "Hash function to use. Implies CIDv1 if not sha2-256. (experimental)").WithDefault("sha2-256"), | |||
cmdkit.BoolOption(inlineOptionName, "Inline small blcoks into CID. (experimental)"), | |||
cmdkit.BoolOption(inlineOptionName, "Inline small blocks into CIDs. (experimental)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops sorry. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Np. I didn't want to bother you with it (and then made more work for myself by forgetting about GitCop 🙄).
c1438f3
to
fcd7ea2
Compare
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
fcd7ea2
to
b0f90e3
Compare
Fixed the failed test. It was not obvious what was going on see: #5398 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final comment from @whyrusleeping. On further thought, I'd like to get the IPFS in browsers working group to comment on this.
core/commands/add.go
Outdated
@@ -120,6 +123,8 @@ You can now check what blocks have been created by: | |||
cmdkit.BoolOption(fstoreCacheOptionName, "Check the filestore for pre-existing blocks. (experimental)"), | |||
cmdkit.IntOption(cidVersionOptionName, "CID version. Defaults to 0 unless an option that depends on CIDv1 is passed. (experimental)"), | |||
cmdkit.StringOption(hashOptionName, "Hash function to use. Implies CIDv1 if not sha2-256. (experimental)").WithDefault("sha2-256"), | |||
cmdkit.BoolOption(inlineOptionName, "Inline small blocks into CIDs. (experimental)"), | |||
cmdkit.IntOption(inlineLimitOptionName, "Maximum block size to inline. (experimental)").WithDefault(64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping says: don't add this right now, just make it 64. If users complain, we can revisit it later; this is the most conservative approach and should be good enough for pretty much all cases. That fine with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to know the reasons for this. Is it the change in the API string because of the use of WithDefault() or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasoning is simply "fewer knobs" (or "be conservative"). That is, we don't really need this knob so we might as well add it later when needed (when a user presents a use-case for changing the default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily agree with that philosophy but won't object strongly.
I can see a case of wanting either 32 or 64 and don't think there is a clear answer. I am okay with limiting it to either 32 or 64 if you think that will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User here looking forward to this feature, I'm mirroring a large-ish dataset using filestore to avoid disk duplication, in this case there is a few files being updated often which is causing issues with missing data, these are 41-42 bytes in size. example: http://distfiles.gentoo.org/experimental/timestamp.x
So minimal size here is 64 bytes - but I don't see any good reason to not add the option for the user to change this. Not adding this option could also result in assumptions being made about the actual max of ident hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we may actually need to define a limit on the maximum CID size and 64 bytes is really long (see my comment below). However, having to define a new block for a ~40 byte file really does suck.
Let's wait for the browser working group to comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 64 bytes is long, but if it is part of a directory entry, the user is ever unlikely to need to actual CID and it won't show up as part of the URL. All the more reason to keep this option.
I could change the default to 32 though so it won't create a problem. When someone knows what they are doing they can increase it to suit there needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless a user just adds a file directly. E.g., /ipfs/z3Fxd5vkFxAu4YbpXLBj8EvUCEeXLhNRGaYpEL43EHt99kaB82LfGbEVBqUJPaSZsHtJANxV8EGDZ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless a user just adds a file directly.
Correct. So make the default 32
to avoid increasing the size of hashes by default. When a user has a directory with lots of small files they want inline they can increase the limit .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to talk to the browser people. I know there were talks about supporting https://CID.dweb.link
and that won't work for long CIDs. The current plan is ipfs://CID
and there may be length restrictions.
(nvm, didn't see the response from lidel)
This change allows us to inline small blocks into CIDs. For performance, we'd like to be able to inline up to N bytes of data into the CID itself. If
If we instead, e.g., limit the length to 32, we'd get:
The benefits are:
The drawbacks are:
So, the question is: in the short term, how important is it to fit CIDs into a single subdomain? This discussion is a continuation of multiformats/cid#21 which we had never really finished. |
@Stebalien It is not important. Don't worry about cid-in-subdomain, pick the best inline size for the job. Longer answer:
|
@Stebalien based on your @lidel and @NiKiZe comments I believe even more strongly that we should keep the If we are going to consider enabling I am okay with limiting the value to a small set of fixed values (32, 64, maybe 128), but I see that as an arbitrary limitation. Bases on your comment at multiformats/cid#21 (comment), values in between 32 and 64 might be useful. I am also okay with instead setting a maxim value. 64 might be okay, but 128 would increase the utility of this feature. Please do note, that we don't limit the size of CIDs anywhere else and I have in fact performed (informal) test with large CIDs and didn't ran into any problems, including (I believe) bitswap. When CIDs start getting extremely large we might run into filesystem limitations if they are ever stored, but 128 will not hit this limit. As the code is written now users can still create a Cid with the identity hash by using If we still decide not to provide this option we should stick with |
The issue is that this also applies to directories (it applies to all blocks). For example, if I create a directory and put a single
Will this cause a problem? The last time I discussed this with someone we discussed multiple sub-domains (break it up with dots every N bytes) but I'm not sure if that's a reasonable solution.
My primary concern is security. When coding, we tend to assume that CIDs are short so I worry that an attacker could cause problems by introducing large CIDs. If we limit the size of the CIDs we create, we can add hard limits later on without breaking everything. However, I think the performance limit here will be more like 1KiB (and we can just say "don't be absurd"). You've convinced me to keep the size option. Personally, I'd default to either 32 or 45 (it doesn't have to be a multiple of 2). The latter makes base32 encoded CIDs 80 characters long which I consider to be a reasonable upper limit. 45 byte max: /ipfs/bafkqallbmfqwcylbmfqwcylbmfqwcylbmfqwcylbmfqwcylbmfqwcylbmfqwcylbmfqwcylbmfqwccq |
I am leaning towards 32, just because at that size CIDs are guaranteed not to increase. I can be convinced otherwise and don't have a strong objection to some other number. |
It adds an ugly conversion step, but it is much better than not supporting long cids at all. If we have to convert, we could look into using encoding similar to one described in Human Friendly Names and putting every word as a separate DNS label (but that could hit 255 characters fast..). Added the idea to subdomain notes at ipfs/in-web-browsers#89 (comment)
Ouch, I did not think about inlining directories. In that context setting default
If we leave an option to change the default via (Still, I don't have strong feelings about this, we could work around issues introduced by |
Agreed with @lidel. Not a big deal, do what you want to here. The important thing for future browser support is cidv1b32 with go-ipfs. Once we can use the address bar, I believe the hashes can be very long and it won't cause any issues. The subdomain stuff is a short-term hack for security origins on the HTTP gateway, ideally we'll never even have to use it. |
As long as the option to change it is there, the default size won't be an issue.
|
I think the consensuses is to for now, go with a conservative default of 32. This is not a very useful size, but if you have a use for the feature you can increase the default size. At some point we should impose length limits on CIDs to keep things from getting out of hand, but we can do that later. For now I advice anyone using this feature to avoid setting the default too high to avoid running into this limit in the future. I can not see us imposing a limit smaller then 64 so that is very safe. Slightly larger values such as 128 may also be okay. From what @Stebalien told be going up to 1 KiB may even be okay, but I see that as really pushing it if the actual CID needs to be displayed. |
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
420125e
to
7e3265a
Compare
The issue here is the gateway (which is, itself, a stop-gap). The only way to put different IPFS "websites" in different origins is to use sub-domains. @kevina SGTM. |
Add support for inlining via the id-hash to the
ipfs add
command using the new Builder interface.