feat(gateway): Gateway.FastDirIndexThreshold #8853

Merged
merged 8 commits into from
Apr 28, 2022

Conversation

schomatis
Contributor

@schomatis schomatis commented Apr 6, 2022

Toward #8178,

TLDR: this PR skips the costly "size" column if the directory size is greater than or equal to Gateway.FastDirIndexThreshold.

Details in https://github.com/ipfs/go-ipfs/pull/8853/files#pullrequestreview-942930258

Demo

You can test with bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm (has 10k items, loads fast)

Screen Shot 2022-04-14 at 23 54 55


Old notes by schomatis:

[re] "band-aid approach" from #8178 (comment)

ipfs config --json Gateway.FastDirIndexThreshold 3
ipfs daemon --offline

mkdir big-dir
touch big-dir/file{1..5}
BIG_DIR=$(ipfs add big-dir -r -Q)
echo "http://localhost:8080/ipfs/$BIG_DIR"
# http://localhost:8080/ipfs/QmZsx9Brmf8KGgr7o8P895WPJ7DSqTY8h2fWkvHDSnRJkM

rm big-dir/file{3..5}
BIG_DIR=$(ipfs add big-dir -r -Q)
echo "http://localhost:8080/ipfs/$BIG_DIR"
# http://localhost:8080/ipfs/QmTb1wneppbtKYuR6EwJdnieGVyrtXz3VebYU2WG3GQ1Pz

Note that some caching mechanism may prevent this limit from applying if the directory has already been served. Take it for a spin and see if this works for your use case. This is the basic implementation of the suggested stopgap, but we might need to either add a timeout or fetch entry metadata in parallel (right now fetching is sequential, and a missing or slow entry consumes time from the total gateway timeout).

@schomatis schomatis requested review from alanshaw and lidel April 6, 2022 20:36
@schomatis schomatis self-assigned this Apr 6, 2022
@schomatis
Contributor Author

With the current interfaces, it seems that iterating through the directory to get its number of entries will always block on fetching each entry node. At this time, asserting that a directory has, say, 5000 entries in order to stop its listing implies fetching 5000 nodes (defeating the purpose of the cutoff), and even a parallelized fetch will quickly stall on missing or slow entries.

Ideally I would need to extend the Directory interface to be able to get its number of entries (not the entries themselves) or maybe provide another version of its iterator that advances through the entries without fetching them (unless some property like Size is requested).

@schomatis
Contributor Author

On second (or third) thought: the best I can do to deliver something useful here (which the current patch is not) without breaking interfaces is to make an independent UnixfsAPI.Ls() call to get the directory entries as fast as possible, and then decide whether to go ahead and sequentially fetch entry metadata (with the N limit of the band-aid approach). UnixfsAPI.Ls() is what ipfs ls uses, so we can basically do an ipfs ls --size=false --resolve-type=false internally to get the parallel entry count.

Eventually we will need to make the directory iterator (linked above) parallel or switch entirely to the UnixfsAPI.Ls() API (which sounds like the right thing to do; we probably shouldn't have been using UnixfsAPI.Get() in the first place).
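
As a rough illustration of the cheap entry count described in this comment, the shell sketch below wraps the `ipfs ls --size=false --resolve-type=false` invocation mentioned above. The function name `count_dir_entries` is hypothetical and not part of go-ipfs; the sketch assumes an `ipfs` binary on `PATH` and one output line per directory entry.

```shell
# Hypothetical helper (not part of go-ipfs): count the direct entries of a
# UnixFS directory without resolving each child's size or type.
# Assumption: `ipfs ls` prints exactly one line per entry.
count_dir_entries() {
  # $1 is the CID or path of the directory to inspect.
  # `tr` strips the padding some `wc` implementations add around the number.
  ipfs ls --size=false --resolve-type=false "$1" | wc -l | tr -d '[:space:]'
}
```

Calling `count_dir_entries "$BIG_DIR"` prints the number of entries, which could then be compared against a limit before deciding whether to fetch per-entry metadata.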

I'm removing review requests and marking this as 'In Progress' again but will at least need a confirmatory 👍 here to proceed, and also will be prioritizing other issues next week (unless explicitly told otherwise, your call @lidel; this should still be an easy patch that won't take too much time).

@schomatis schomatis removed request for alanshaw and lidel April 8, 2022 13:06
Member

@lidel lidel left a comment


@schomatis refactoring generated dir listing to use UnixfsAPI.Ls() is a good plan 👍
If you are ok with tackling this before CIDv1, that would be the best (I'd like to include this fix in go-ipfs 0.13, if possible) 🙏

@schomatis schomatis force-pushed the schomatis/fix/gw/restrict-listing branch from 817a8bb to 1ae30bd Compare April 11, 2022 23:28
@schomatis schomatis requested a review from lidel April 11, 2022 23:31
@schomatis
Copy link
Contributor Author

@lidel Implemented a usable limit using UnixfsAPI.Ls().

schomatis and others added 2 commits April 12, 2022 04:42
This is alternative take on the way we limit the HTML listing output.
Instead of a hard cut-off, we list up to HTMLDirListingLimit.
When a directory has more items than HTMLDirListingLimit we show
additional header and footer informing user that only $HTMLDirListingLimit
items are listed. This is a better UX.
@lidel lidel force-pushed the schomatis/fix/gw/restrict-listing branch from 1ae30bd to 11d364b Compare April 12, 2022 02:43
Member

@lidel lidel left a comment


Thank you @schomatis

I took it for a spin and it worked as expected, but I realized the UX was not the best: it was a hard cut-off, and it is better to show some items to users. I played a bit with different ways of communicating this and ended up with a somewhat simpler solution:

  • we list up to Gateway.HTMLDirListingLimit items
  • if the limit is reached or surpassed, we don't fetch any remaining blocks and only show an actionable message in the header and footer
  • rebased this on top of refactor(gw/dir-index-html): remove gobindata #8872 (which removed the bindata from assets, making our code way simpler)

Lmk your thoughts on this.

Demo

$ ipfs config --json Gateway.HTMLDirListingLimit 2

Opening /ipfs/bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y

Screen Shot 2022-04-12 at 04 13 04

Confirmed that we only fetch a small number of blocks by opening the dir listing on an empty repo and then shutting down the daemon. In offline mode, you can then see that only the parent and the first few blocks were fetched:

$ ipfs ls -s --size=false --resolve-type=false bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y | wc -l
1864
$ ipfs ls -s bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y | wc -l
3
$ ipfs ls -s bafybeihfg3d7rdltd43u3tfvncx7n5loqofbsobojcadtmokrljfthuc7y
QmbQDovX7wRe9ek7u6QXe9zgCXkTzoUSsTFJEkrYV1HrVR -         1 - Barrel - Part 1/
QmdC5Hav9zdn2iS75reafXBq1PH4EnqUmoxwoxkS5QtuME -         10 - Pi Equals/
QmcyyLvDzCrduuvGVUQEh1DzFvM7UWGfc9sUg87PjjYCw7 -         100 - Family Circus/
Error: ipld: could not find Qmd8NDeJhzf614FSBxZwu4QD2Az14tQtJhQXJf8h4fqiSx

@lidel lidel changed the title fix(core/gateway): option to limit directory size listing feat(gateway): Gateway.HTMLDirListingLimit Apr 12, 2022
@lidel lidel requested a review from alanshaw April 12, 2022 03:17
@lidel lidel added this to the go-ipfs 0.13 milestone Apr 12, 2022
@lidel
Member

lidel commented Apr 12, 2022

Note from Stewards sync:

  • change this to always list all items (no cut-off) because we already have the root block
    • back to Ls() code we had before
    • lower limit to 100 or 1000
  • when item list is bigger than the limit, skip size / type check + display header/footer informing the dir was too big to fetch details for all children

@lidel lidel assigned lidel and unassigned schomatis Apr 12, 2022
@schomatis
Copy link
Contributor Author

From Discord thread: we're partially moving back to the Ls approach since

change this to always list all items (no cut-off) because we already have the root block

doesn't apply for the HAMT implementation (normally used for directories with many entries as in this use case).

see explainer in docs/config.md
@lidel lidel changed the title feat(gateway): Gateway.HTMLDirListingLimit feat(gateway): Gateway.FastDirIndexThreshold Apr 14, 2022
Member

@lidel lidel left a comment


I did the refactor and pushed the updated version:

  • Switched to using inexpensive dir listing via Ls() with ResolveChildren(false).
  • Sizes from child nodes are read (fetched) only if the directory is smaller than Gateway.FastDirIndexThreshold
    • Size column is not present in directories over Gateway.FastDirIndexThreshold.
    • Setting Gateway.FastDirIndexThreshold to 0 will always produce fast responses without fetching any child nodes, allowing gateway operators to decrease load even further.
      • Personally, I would set that as the default and always return a fast, inexpensive result, but I was worried people would complain about missing item sizes.
    • I removed the footer and header because they just felt like noise.
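
The threshold rule in the list above can be sketched as a small shell function. This is only an illustration of the decision as described in this comment, not the gateway's actual code: the name `dir_index_mode` and the labels `with-sizes` and `fast` are made up for the example.

```shell
# Hypothetical decision helper mirroring the rule described above.
# $1 = number of entries in the directory
# $2 = value of Gateway.FastDirIndexThreshold
# Prints "with-sizes" when child nodes would be fetched to fill the size
# column, or "fast" when the size column is skipped.
dir_index_mode() {
  if [ "$1" -lt "$2" ]; then
    echo "with-sizes"   # directory is smaller than the threshold
  else
    echo "fast"         # at or over the threshold: no child nodes fetched
  fi
}
```

Under this rule, `dir_index_mode 10000 100` prints `fast`, and a threshold of 0 (as set with `ipfs config --json Gateway.FastDirIndexThreshold 0`) yields `fast` for every directory, because no entry count is smaller than 0.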

Demo

Screen Shot 2022-04-14 at 23 54 55

You can test with bafybeiggvykl7skb2ndlmacg2k5modvudocffxjesexlod2pfvg5yhwrqm (has 10k items, loads fast)

Feedback appreciated.

@Jorropo Jorropo self-requested a review April 15, 2022 18:45
@lidel lidel requested a review from Jorropo April 15, 2022 19:04
@BigLep BigLep linked an issue Apr 18, 2022 that may be closed by this pull request
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>
@BigLep
Contributor

BigLep commented Apr 21, 2022

2022-04-21: per verbal, @Jorropo is going to do a quick check and approve/merge.

@BigLep BigLep mentioned this pull request Apr 26, 2022
65 tasks
Contributor

@Jorropo Jorropo left a comment


LGTM

@Jorropo Jorropo merged commit 25cc85f into master Apr 28, 2022
@Jorropo Jorropo deleted the schomatis/fix/gw/restrict-listing branch April 28, 2022 17:36
hacdias pushed a commit to ipfs/boxo that referenced this pull request Jan 27, 2023
* fix(core/gateway): option to limit directory size listing

* feat(gw): HTMLDirListingLimit

This is alternative take on the way we limit the HTML listing output.
Instead of a hard cut-off, we list up to HTMLDirListingLimit.
When a directory has more items than HTMLDirListingLimit we show
additional header and footer informing user that only $HTMLDirListingLimit
items are listed. This is a better UX.

* fix: 0 disables Gateway.HTMLDirListingLimit

* refactor: Gateway.FastDirIndexThreshold

see explainer in docs/config.md

* refactor: prealoc slices

* docs: Gateway.FastDirIndexThreshold

* refactor: core/corehttp/gateway_handler.go

ipfs/kubo#8853 (comment)

* docs: apply suggestions from code review

Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>

Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>

This commit was moved from ipfs/kubo@25cc85f

Successfully merging this pull request may close these issues.

Faster gateway directory listings
6 participants