Layer some new abstractions on top of, and alongside, the existing Streaming storage APIs.
Delta/Parquet streaming uses these changes. I am breaking them out into their own focused PR here for everyone's reviewing sanity.
The general aim here is to move some of the complex low-level file management out of StreamingDataset and into the storage APIs, simplifying the core StreamingDataset logic where we can.
I also wrapped some behavior that looked suspect for my purposes, though I do not pretend to know all the ways the storage APIs will be used, now and in the future.
Also, wait_for_file_to_exist has a new home in this PR.
If we are able to "stream"-line or otherwise better organize this code later, all the better.
For now, let's keep going toward Delta...
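For reviewers unfamiliar with the helper mentioned above, here is a minimal sketch of what a function like wait_for_file_to_exist might do, assuming it simply polls the local filesystem until a path appears or a timeout elapses. The signature, parameter names, and the choice of RuntimeError are assumptions for illustration, not necessarily what this PR moves.

```python
import os
import time


def wait_for_file_to_exist(filename: str, poll_interval: float, timeout: float,
                           err_msg: str) -> None:
    """Block until ``filename`` exists, checking every ``poll_interval`` seconds.

    Hypothetical signature for illustration only. Raises RuntimeError with
    ``err_msg`` if the file has not appeared after ``timeout`` seconds.
    """
    start = time.monotonic()
    while not os.path.exists(filename):
        # Give up once we have waited longer than the allowed timeout.
        if time.monotonic() - start > timeout:
            raise RuntimeError(f'{err_msg} (waited {timeout:.1f}s for {filename})')
        time.sleep(poll_interval)
```

A caller that depends on another process to download a shard could invoke this before opening the file, so the wait-and-retry logic lives in one place instead of inside the dataset loop.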