-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(gateway): Gateway.FastDirIndexThreshold #8853
Conversation
With the current interfaces it seems browsing through the directory to get its number of entries will always be blocking on the entry node fetch. At this time asserting that a directory has, say, 5000 entries to stop its listing implies fetching 5000 nodes (defeating the purpose of the cutoff), which even if parallelizing will quickly be halted on the missing/slow entries. Ideally I would need to extend the Directory interface to be able to get its number of entries (not the entries themselves) or maybe provide another version of its iterator that advances through the entries without fetching them (unless some property like Size is requested). |
On a second-third thought: the best I can do to deliver something useful here (which this is not it) without breaking interfaces is to make an independent Eventually we will need to make the directory iterator (linked above) parallel or switch entirely to the I'm removing review requests and marking this as 'In Progress' again but will at least need a confirmatory 👍 here to proceed, and also will be prioritizing other issues next week (unless explicitly told otherwise, your call @lidel; this should still be an easy patch that won't take too much time). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schomatis refactoring generated dir listing to use UnixfsAPI.Ls()
is a good plan 👍
If you are ok with tacking this before CIDv1, that would be the best (I'd like to include this fix in go-ipfs 0.13, if possible) 🙏
817a8bb
to
1ae30bd
Compare
@lidel Implemented a usable limit using |
This is alternative take on the way we limit the HTML listing output. Instead of a hard cut-off, we list up to HTMLDirListingLimit. When a directory has more items than HTMLDirListingLimit we show additional header and footer informing user that only $HTMLDirListingLimit items are listed. This is a better UX.
1ae30bd
to
11d364b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @schomatis
I took it for a spin and worked as expected, but realized the UX was not the best – it was a hard cut-off, and it is better to show some items to users. Played a bit with different ways of communicating this and ended up with bit simpler solution:
- we list up to
Gateway.HTMLDirListingLimit
items - if limit is reached or surpassed, we dont fetch any remaining blocks, only show an actionable message in header and footer
- rebased this on top of refactor(gw/dir-index-html): remove gobindata #8872 (which removed the bindata from assets, making our code way simpler)
Lmk your thoughts on this.
Demo
$ ipfs config --json Gateway.HTMLDirListingLimit 2
Opening /ipfs/bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y
Confirmed we only fetch small number of blocks by opening dir listing on an empty repo, and then shutting down the daemon. Then, in offline mode, you can see only parent and the first few blocks were fetched:
$ ipfs ls -s --size=false --resolve-type=false bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y | wc -l
1864
$ ipfs ls -s bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y | wc -l
3
$ ipfs ls -s bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y
QmbQDovX7wRe9ek7u6QXe9zgCXkTzoUSsTFJEkrYV1HrVR - 1 - Barrel - Part 1/
QmdC5Hav9zdn2iS75reafXBq1PH4EnqUmoxwoxkS5QtuME - 10 - Pi Equals/
QmcyyLvDzCrduuvGVUQEh1DzFvM7UWGfc9sUg87PjjYCw7 - 100 - Family Circus/
Error: ipld: could not find Qmd8NDeJhzf614FSBxZwu4QD2Az14tQtJhQXJf8h4fqiSx
Note from Stewards sync:
|
From Discord thread: we're partially moving back to the
doesn't apply for the HAMT implementation (normally used for directories with many entries as in this use case). |
see explainer in docs/config.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the refactor and pushed the updated version:
- Switched to using inexpensive dir listing via
Ls()
withResolveChildren(false)
. - Sizes from child nodes are read (fetched) only if the directory is smaller than
Gateway.FastDirIndexThreshold
- Size column is not present in directories over
Gateway.FastDirIndexThreshold
. - Setting
Gateway.FastDirIndexThreshold
to0
will always produce fast responses without fetching any child nodes, allowing gateway operators to decrease load even further.- Personally, I would set that as the default, and always return fast inexpensive result, but was worried people will complain about missing item sizes.
- I removed footer and header because they just felt like noise.
- Size column is not present in directories over
Demo
You can test with bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm
(has 10k items, loads fast)
Feedback appreciated.
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>
2022-04-21: per verbal, @Jorropo is going to do a quick check and approve/merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* fix(core/gateway): option to limit directory size listing * feat(gw): HTMLDirListingLimit This is alternative take on the way we limit the HTML listing output. Instead of a hard cut-off, we list up to HTMLDirListingLimit. When a directory has more items than HTMLDirListingLimit we show additional header and footer informing user that only $HTMLDirListingLimit items are listed. This is a better UX. * fix: 0 disables Gateway.HTMLDirListingLimit * refactor: Gateway.FastDirIndexThreshold see explainer in docs/config.md * refactor: prealoc slices * docs: Gateway.FastDirIndexThreshold * refactor: core/corehttp/gateway_handler.go ipfs/kubo#8853 (comment) * docs: apply suggestions from code review Co-authored-by: Alan Shaw <alan.shaw@protocol.ai> Co-authored-by: Marcin Rataj <lidel@lidel.org> Co-authored-by: Alan Shaw <alan.shaw@protocol.ai> This commit was moved from ipfs/kubo@25cc85f
TLDR: this PR skips costly "size" column if directory is bigger or equal
Gateway.FastDirIndexThreshold
.Details in https://github.com/ipfs/go-ipfs/pull/8853/files#pullrequestreview-942930258
Demo
You can test with
bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm
(has 10k items, loads fast)Click to expand old notes by schomatis
[re] "band-aid approach" from #8178 (comment)
Take notice that there may be some caching mechanism that won't apply this limit if the directory has already been served. Take it for a spin and see if this works for your use case. It's the basic implementation of the stopgap suggested but we might need to add either a timeout or fetch entries metadata in parallel (right now this is sequential and a missing/slow entry consumes time from the total gateway timeout).