Add file modes for binary/text #107

Merged
merged 1 commit into from
Jul 22, 2024

dberenbaum
Contributor

Added methods as suggested in #102 (comment).

I didn't change any of the existing API. I'm not sure we can drop classes like TextFile and ImageFile, because some operations need a default read mode. For example, if a multimodal dataset joins an image file and a text file in each row and we want to pass it to PyTorch, each file object needs to know whether to pass its contents to PyTorch as text or as an image.
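
To illustrate the default read mode point, here is a minimal sketch. It is not the code in this PR: the `File` base class, the `read_bytes`/`read_text`/`read` method names, the in-memory `data` field, and the `load_row` helper are all assumptions made for the example. Only the `TextFile` and `ImageFile` class names come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for a file object (not the library's class)."""

    name: str
    data: bytes  # contents held in memory to keep the sketch self-contained

    def read_bytes(self) -> bytes:
        # Explicit binary mode: return the raw contents.
        return self.data

    def read_text(self, encoding: str = "utf-8") -> str:
        # Explicit text mode: decode the raw contents.
        return self.data.decode(encoding)

    def read(self):
        # Default mode for a generic file is assumed to be binary.
        return self.read_bytes()


class TextFile(File):
    def read(self) -> str:
        # A text file defaults to text mode.
        return self.read_text()


class ImageFile(File):
    def read(self) -> bytes:
        # A real class would likely decode into an image object here;
        # the sketch returns raw bytes to avoid an imaging dependency.
        return self.read_bytes()


def load_row(row):
    # Hypothetical loader step: it does not know the column types,
    # so it relies on each file object's own default read mode.
    return tuple(f.read() for f in row)


row = (ImageFile("cat.png", b"\x89PNG"), TextFile("cat.txt", b"a cat"))
image, caption = load_row(row)
```

The explicit `read_bytes` and `read_text` methods cover callers who know which mode they want, while the per-class `read` is what a generic consumer such as a data loader would fall back on.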


cloudflare-workers-and-pages bot commented Jul 20, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: c22e781
Status: ✅  Deploy successful!
Preview URL: https://2cba755f.datachain-documentation.pages.dev
Branch Preview URL: https://file-modes.datachain-documentation.pages.dev


source = IndexedFile(file=file, index=index)
yield [source, *record.values()]
index += 1
with tqdm(desc="Parsed by pyarrow", unit=" rows") as pbar:
Contributor Author

Unrelated, but I added a progress bar here so that parsing tables shows some progress.

@dberenbaum dberenbaum requested a review from skshetry July 20, 2024 15:08

codecov bot commented Jul 20, 2024

The author of this PR, dberenbaum, is not an activated member of this organization on Codecov.
Please activate this user on Codecov to display this PR comment.
Coverage data is still being uploaded to Codecov.io for purposes of overall coverage calculations.
Please don't hesitate to email us at support@codecov.io with any questions.

@dberenbaum dberenbaum merged commit c44c3da into main Jul 22, 2024
19 checks passed
@dberenbaum dberenbaum deleted the file_modes branch July 22, 2024 12:10
3 participants