Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new feature: Automatically add Content-Type based on the extension in the path (or provide an API for automatic setting) #4902

Closed
1 task done
czy-29 opened this issue Jul 16, 2024 · 9 comments · Fixed by #4912
Labels
enhancement New feature or request

Comments

@czy-29
Copy link
Contributor

czy-29 commented Jul 16, 2024

Feature Description

If I understand correctly, in the current version of OpenDAL, no matter what file we upload to the object storage, its default Content-Type will always be application/octet-stream. If I want to modify this default behavior, I have to manually set the Content-Type myself using something like Operator::write_with. This is not very convenient, and it should not be the expected behavior. The expected Content-Type should automatically match the path extension. Of course, I can easily implement this capability at the application level using mime_guess myself, but I think it would be more convenient to sink this behavior into OpenDAL. Either change the default behavior of Operator::write directly to this, or add a method similar to content_type_guessed to FutureWrite. Of course, the specific API design can be discussed further, but I believe that no matter how we choose, adding this feature will make uploading files more convenient and intuitive.

Problem and Solution

Allow Operator::write to automatically set the corresponding Content-Type based on the file extension in the path, or add a method to FutureWrite to automatically set this.

Additional Context

No response

Are you willing to contribute to the development of this feature?

  • Yes, I am willing to contribute to the development of this feature.
@czy-29 czy-29 added the enhancement New feature or request label Jul 16, 2024
@Xuanwo
Copy link
Member

Xuanwo commented Jul 16, 2024

Thank you for submitting this pull request. I understand the feature request, but I don't think auto-detection is a good approach.

Perhaps we could implement a MimeGuessLayer that detects and sets the content_type when it's not specified by the user. In the MimeGuessLayer, we could detect the file extension or even read the magic bits to determine it based on user needs. This would be much more flexible and useful.

Users can use this layer like this:

let op = op.layer(MimeGuessLayer::default());

op.write("x.pdf", bs).await?; <-- content-type will be `pdf`.

What do you think?

@Xuanwo
Copy link
Member

Xuanwo commented Jul 16, 2024

This layer could do more: detect the content_type if storage service doesn't provide it. For example, local fs doesn't store the content_type for file. We can use this layer to detect it when user call stat or list:

let op = op.layer(MimeGuessLayer::default().with_abc()); <-- API name need design, but it's optional.

let meta = op.stat("x.pdf").await?; <-- `meta.content_type()` will be `pdf`.

let entries = op.list("path/to/dir").await?; <-- all entries will have correct content_type.

@czy-29
Copy link
Contributor Author

czy-29 commented Jul 16, 2024

Wow, that's a great idea! I think it's much better than my previous design, and it's indeed more useful and flexible! This design perfectly meets my expectations and needs, so I fully approve of it.

@Xuanwo
Copy link
Member

Xuanwo commented Jul 16, 2024

I see you've indicated a willingness to contribute. Given the recent design changes, are you still interested in implementing it?

@czy-29
Copy link
Contributor Author

czy-29 commented Jul 16, 2024

I am indeed willing to contribute, but I am still a novice in contributing to large open source projects and unsure if I can handle it well. Moreover, if we follow our newly designed concept, this feature would appear to require a significant amount of work, especially if we include the reading of file format magic headers (I'm not sure if there are any existing crates in the Rust community that can retrieve MIME information based on magic headers, and if not, we may need to create a large table).

@Xuanwo
Copy link
Member

Xuanwo commented Jul 16, 2024

especially if we include the reading of file format magic headers

This is optional. We can implement the basic mime_guess::from_path() first.

@czy-29
Copy link
Contributor Author

czy-29 commented Jul 16, 2024

Let me take some time to read the source code first, in order to better understand the overall architecture of Layer‘s implementation.

@czy-29
Copy link
Contributor Author

czy-29 commented Jul 17, 2024

I have finished reading the code related to Layer today, and I am not sure if I understand it correctly. Before implementing it, I would like to confirm with you that the following approach is correct:

Firstly, MimeGuessLayer should implement Layer<A> and return an internal type that implements LayeredAccess (such as MimeGuessAccessor<A>).
Then, in the LayeredAccess implementation of MimeGuessAccessor<A>, each associated type can directly use the associated type corresponding to Inner (actually type A).
Then, for all required methods except for write, simply call the corresponding method of self.inner() to implement it.
For write, first judge whether args.content_type (args is OpWrite type) is None. If yes, call args.with_content_type, and then call self.inner().write with the new args.
Among the remaining methods that provide default implementations, only stat needs to be implemented. The implementation of stat is to first call self.inner().stat, and then for the returned RpStat, call map_metadata. Then, the same process is used to determine whether the content type is None. If it is, call with_content_type and add the new content type and return.
Finally, there should be no need to add new logic to list, as I have read in the document that when calling list, if the user requests content type, it will automatically call stat internally to retrieve it. Therefore, as long as we implement this logic on stat, users will automatically have the ability to obtain the content type when calling list.

I don't know if my understanding above is correct?

@Xuanwo
Copy link
Member

Xuanwo commented Jul 18, 2024

Looks great, let's go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants