
Optimize retrieval from Filecoin #404

Merged
merged 19 commits into from
Oct 27, 2023

Conversation

gammazero
Collaborator

@gammazero gammazero commented Oct 25, 2023

Optimize retrieval so that when requested retrieval ranges do not align with singularity file ranges, only the minimal number of retrieval requests are made.

This is accomplished by creating a separate reader for each singularity file range. For reads that are larger than a range, multiple ranges are read until the read request is satisfied or until all data is read. For reads smaller than the amount of data remaining in the range, the range reader is maintained so that it can continue to be read from by subsequent reads.

This approach associates a reader with each Singularity file range, not with the ranges requested via the API (in the HTTP range header). This avoids needing to parse the range header in order to create readers that each read some number of Singularity ranges. Rather, as arbitrary requested ranges are read, an existing reader for the corresponding singularity range(s) is reused if the requested range falls on a singularity range from a previous read. This also means that there is only a single retrieval for each singularity range, whereas if readers were associated with requested ranges, then multiple readers could overlap the same singularity range and require multiple retrievals of the same range.

Fixes #366
Fixes filecoin-project/motion#143

As an optimization, only one singularity range reader is maintained at a time. This works because once a requested read moves into a new singularity range, it is highly unlikely that a subsequent read request will fall on a singularity range earlier than the new one.

Additional changes:

  • The filecoinReader implementation supports the io.WriterTo interface to allow direct copying to an io.Writer.
  • Local and non-local readers can handle forward seeks within the current range without additional retrieval.
  • The FilecoinRetriever interface supports the RetrieveReader function, which returns an io.ReadCloser to read data from.
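The caching scheme described above can be sketched in miniature. This is not the PR's actual code: rangeReader, openRange, and readAt are hypothetical stand-ins, with an in-memory string playing the role of the stored file and a tiny 4-byte range size. The point it illustrates is that a reader is kept per singularity range and reused as long as reads continue sequentially, so a new retrieval happens only when a read starts a new range.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// rangeReader is a hypothetical sketch of the caching idea: one reader is
// kept for the current singularity range and reused while subsequent reads
// continue where the previous read left off.
type rangeReader struct {
	offset    int64         // absolute offset of the next byte this reader yields
	remaining int64         // bytes left in the current singularity range
	rc        io.ReadCloser // underlying retrieval stream (e.g. a lassie pipe)
}

// openRange stands in for a real retrieval request; here it just serves a
// slice of an in-memory string so the sketch is runnable. Assumes the file
// length is a multiple of the range size.
func openRange(data string, start, length int64) *rangeReader {
	return &rangeReader{
		offset:    start,
		remaining: length,
		rc:        io.NopCloser(strings.NewReader(data[start : start+length])),
	}
}

// readAt reuses cached if the requested offset continues the cached stream;
// otherwise it closes it and opens a reader for the containing range.
func readAt(cached *rangeReader, data string, offset int64, buf []byte) (*rangeReader, int, error) {
	const rangeSize = 4 // tiny range size for the sketch
	if cached == nil || cached.offset != offset || cached.remaining == 0 {
		if cached != nil {
			cached.rc.Close()
		}
		start := (offset / rangeSize) * rangeSize
		cached = openRange(data, start, rangeSize)
		// Skip forward within the range to reach the requested offset.
		io.CopyN(io.Discard, cached.rc, offset-start)
		cached.remaining -= offset - start
		cached.offset = offset
	}
	n, err := cached.rc.Read(buf)
	cached.offset += int64(n)
	cached.remaining -= int64(n)
	return cached, n, err
}

func main() {
	data := "abcdefghijkl"
	var rr *rangeReader
	buf := make([]byte, 2)
	out := ""
	for off := int64(0); off < int64(len(data)); {
		var n int
		rr, n, _ = readAt(rr, data, off, buf)
		out += string(buf[:n])
		off += int64(n)
	}
	fmt.Println(out) // prints "abcdefghijkl"
}
```

Because the 2-byte reads are sequential, each 4-byte range is opened exactly once, mirroring how the PR collapses many small reads into one retrieval per singularity range.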

Benchmark

Added a benchmark of filecoin retrievals to compare before and after optimization.

The benchmark uses a file composed of 4 sections, each 16 MiB in size. The entire file is retrieved by requesting 1 MiB chunks. The 1 MiB reads are done through io.CopyN, which copies the data through a 32 KiB buffer.

The non-optimized version does a retrieval for each buffer copy to copy the file data. The optimized version only does as many retrievals as there are independently retrievable sections of the file.
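The benchmark numbers below follow directly from that arithmetic. This small calculation (not part of the PR) shows where the 2048 and 4 retrieval counts come from:

```go
package main

import "fmt"

func main() {
	const (
		sections    = 4
		sectionSize = 16 << 20 // 16 MiB per section
		bufSize     = 32 << 10 // io.Copy's default 32 KiB buffer
	)
	fileSize := sections * sectionSize // 64 MiB total

	// Non-optimized: one retrieval per 32 KiB buffer copy.
	fmt.Println(fileSize / bufSize) // prints 2048

	// Optimized: one retrieval per independently retrievable section.
	fmt.Println(sections) // prints 4
}
```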

Before optimization:

goos: darwin
goarch: arm64
pkg: github.com/data-preservation-programs/singularity/handler/file
BenchmarkFilecoinRetrieve
    retrieve_test.go:403: Number of retrieve requests: 2048
    retrieve_test.go:403: Number of retrieve requests: 2048
    retrieve_test.go:403: Number of retrieve requests: 2048
BenchmarkFilecoinRetrieve-10                   4         328260781 ns/op

After optimization (PR #404):

goos: darwin
goarch: arm64
pkg: github.com/data-preservation-programs/singularity/handler/file
BenchmarkFilecoinRetrieve
    retrieve_test.go:692: Number of retrieve requests: 4
    retrieve_test.go:692: Number of retrieve requests: 4
    retrieve_test.go:692: Number of retrieve requests: 4
BenchmarkFilecoinRetrieve-10                  27          37292077 ns/op

@gammazero gammazero requested review from rvagg and xinaxu October 25, 2023 19:04
@codecov

codecov bot commented Oct 25, 2023

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (5742fa8) 73.94% compared to head (0bd67a0) 73.83%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
- Coverage   73.94%   73.83%   -0.12%     
==========================================
  Files         148      149       +1     
  Lines        9669     9813     +144     
==========================================
+ Hits         7150     7245      +95     
- Misses       1776     1815      +39     
- Partials      743      753      +10     
Files Coverage Δ
handler/file/interface.go 100.00% <ø> (ø)
handler/file/range_reader.go 78.37% <78.37%> (ø)
storagesystem/util.go 79.22% <61.29%> (+0.27%) ⬆️
retriever/retriever.go 55.71% <0.00%> (-12.71%) ⬇️
handler/file/retrieve.go 62.68% <68.23%> (+7.21%) ⬆️

... and 2 files with indirect coverage changes


@rvagg
Contributor

rvagg commented Oct 26, 2023

I think this is OK and solves the bulk of the issues at the singularity level, but unless I'm not looking at it right, you're going to end up having to do basically the same style of thing in motion. Over there, the singularity blob reader still only implements a basic io.ReadSeeker and motion uses http.ServeContent so it'll still end up doing multiple small reads through io.CopyN for each range requested. That'll end up here, but each of those reads will instantiate a new version of all of this and we're back to lots of smaller reads and no ability to take advantage of WriteTo.

So what to do about that? I think you're going to have to do something persistent in motion that hangs on to a RetrieveFile reference and is able to do a WriteTo over there as well. That could be possible, SingularityReader is held on to for the entirety of the operation so if it gets a WriteTo and can do this same juggling with a handle into singularity then we might solve this all.

But then after all that I guess you have the complexity question—is it better to solve this through the abstractions that let us continue to use http.ServeContent (and many other utilities and operations that assume standard io interfaces), or would it be simpler to parse the ranges and pass them down, returning a single reader that does it all? 🤷

gammazero and others added 2 commits October 26, 2023 09:46
Co-authored-by: Rod Vagg <rod@vagg.org>
Also add test to check the retrieve goroutines have exited.
Contributor

@xinaxu xinaxu left a comment


Good PR for improving the retrieval performance, except for a few things that still need to be addressed.

  1. I don't see io.WriterTo being used, so I assume you're still serving an io.ReadSeekCloser to http.ServeContent, so that writeToN is called repeatedly for each io.Read()
  2. If so, a major concern I'm seeing is that there is too much logic inside io.Read() -> writeToN().

My understanding of io.Reader is that each Read() call should be very lightweight, with all the heavyweight SQL queries inside io.Seek() or the object constructor.

Below are some facts that I'm aware of that I would leverage

  1. http.ServeContent seeks to the beginning and end to figure out the file size. After which, it will seek to the range start and only do Read() (see the code @rvagg refers to)
  2. Singularity Files are split into FileRanges of 1 GiB each
  3. Read() is usually called repeatedly with a fixed buffer size of 32K (see io.Copy)

My recommendation of how the ReadSeekCloser can be implemented is

  1. still leverage http.ServeContent which does all the parsing of range header
  2. The constructor of filecoinReader should look up all file ranges and their corresponding storage providers, so that if there are no SPs serving a certain range of that file, we can return the error status early rather than aborting the HTTP response when it reaches that range. This also eliminates all database calls inside Read()
  3. Seeking to the rangeStart will close all underlying io.Closer, e.g. lassie pipe. This looks inefficient, but given http.ServeContent will never seek again once Read() starts, this path will never be reached
  4. Read() will open the underlying lassie pipe for that offset if it is nil. We have a little overhead for the first Read(), but after that it will be all sequential reads. We don't need to optimize forward seeking once reading happens, as http.ServeContent never seeks again after starting to read.
  5. Avoid loop inside Read(), if a single Read() crosses the end boundary of a file range, simply set the underlying lassie pipe to nil, increment the file range index so the next Read() will create a new lassie pipe with the new file range.

@gammazero
Collaborator Author

gammazero commented Oct 26, 2023

I'm against having an actual read happening inside Seek(). If we really need to add this optimization, I'd recommend adding a field seekMode which allows the caller to choose the behavior, so that ...

I agree, as that seems like the more correct/expected behavior. I have removed that handling of a seek within a read, as well as the corresponding handling here. Now the code does not continue reading from the same stream (pipe reader) following a Seek.

Contributor

@xinaxu xinaxu left a comment


Discussed offline in a meeting. Looks good now. Waiting for some great benchmark result.

@rvagg
Contributor

rvagg commented Oct 27, 2023

Worth clarifying this point because I think it's kind of important to what's being achieved here:

This looks inefficient, but given http.ServeContent will never seek again once Read() starts, this path will never be reached

This is unfortunately not true because as long as we accept HTTP spec semantics of Range headers then we can get stacked ranges in the same query, and http.ServeContent will faithfully skip around the ranges as requested, seeking each time: https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/net/http/fs.go;l=325-337

I hate this about the spec and would love to know what utility it has for general use, and specifically whether it's even going to be useful to Motion users. Perhaps we can't know this ahead of time and we just have to accept it. My inclination would be to ditch the Go Range handling entirely and just accept a singular range, rejecting requests with commas in the range request and then fix it later if we find a tool/user of motion that needs something more sophisticated.

But, we can also just go along with it, and I think the solution @gammazero has here gets at this problem.

@gammazero gammazero merged commit 70c9147 into main Oct 27, 2023
@gammazero gammazero deleted the optimize-retrieve branch October 27, 2023 17:32