Feature request: support tree, suffix, and parameters (RFC 6838/6839 and others) #67

Open
maxlinc opened this issue Sep 10, 2014 · 8 comments

maxlinc commented Sep 10, 2014

mime-types was "built to conform to the MIME types of RFCs 2045 and 2231". RFC 2045 is itself composed of many other RFCs, some of which have been obsoleted or updated. For example, it refers to RFC 2048 - Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures (which defined the vendor tree). That RFC was obsoleted by RFC 4288 and 4289. RFC 4288 in turn was obsoleted by RFC 6838.

In short, too many RFCs to keep track of, but Wikipedia's summary is pretty good.

These newer RFCs have introduced or standardized three important concepts - tree, suffix, and parameters. The structure of a mime-type name is:
top-level type name / [ tree. ] subtype name [ +suffix ] [ ; parameters ]
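
A minimal sketch of how that structure could be pulled apart, assuming nothing from mime-types itself. `split_media_type`, `MediaTypeParts`, and `TREES` are hypothetical names made up for illustration, and the parameter handling is deliberately naive (it does not cope with `;` inside quoted values):

```ruby
# Hypothetical helper, not part of mime-types: splits a media type string
# into top-level type, tree, subtype, suffix, and parameters.
MediaTypeParts = Struct.new(:type, :tree, :subtype, :suffix, :parameters)

# Facet prefixes of the vendor, personal, and unregistered trees (RFC 6838).
TREES = %w[vnd prs x].freeze

def split_media_type(string)
  essence, *raw_params = string.split(";").map(&:strip).reject(&:empty?)
  type, rest = essence.to_s.downcase.split("/", 2)
  if type.to_s.empty? || rest.to_s.empty?
    raise ArgumentError, "not a media type: #{string.inspect}"
  end

  # The suffix is whatever follows the last "+" in the subtype, if any.
  head, plus, tail = rest.rpartition("+")
  name, suffix = plus.empty? ? [tail, nil] : [head, tail]

  # The tree is the leading facet, but only when it is a known tree prefix.
  facet, remainder = name.split(".", 2)
  if TREES.include?(facet) && remainder
    tree, subtype = facet, remainder
  else
    tree, subtype = nil, name
  end

  # Naive parameter parsing: key=value pairs with optional surrounding quotes.
  parameters = raw_params.to_h do |pair|
    key, value = pair.split("=", 2)
    [key.strip.downcase, value.to_s.strip.delete_prefix('"').delete_suffix('"')]
  end

  MediaTypeParts.new(type, tree, subtype, suffix, parameters)
end

parts = split_media_type("application/vnd.github.v3.raw+json")
parts.tree    # => "vnd"
parts.subtype # => "github.v3.raw"
parts.suffix  # => "json"
```

Types in the standards tree, such as image/svg+xml, simply come back with a nil tree.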

These concepts are commonly used by modern applications. Parameters are often used to define charsets or codecs for videos:

text/plain; charset=utf-8
video/mp4; codecs="avc1.640028"

Suffix is used to indicate an underlying structure or container format. The following suffixes are registered: +xml, +json, +ber, +der, +fastinfoset, +wbxml, +zip, and +cbor. Some examples include SVG images or Atom feeds, which are their own registered format but use XML as the underlying structure:

image/svg+xml
application/atom+xml

The vendor tree is commonly used by RESTful services, especially in combination with a +suffix. GitHub APIs, for example, return the following mime-types:

application/json
application/vnd.github+json
application/vnd.github.v3+json
application/vnd.github.v3.raw+json
application/vnd.github.v3.text+json
application/vnd.github.v3.html+json
application/vnd.github.v3.full+json
application/vnd.github.v3.diff
application/vnd.github.v3.patch

It would be nice if mime-types supported these concepts. I'm not sending a PR yet because exactly what "support" looks like probably requires some discussion. I think the tree is simple and would just be an (optional) part, like sub_type or media_type. The suffix concept may be similar, though I think it'd be useful if it were used for inheritable default values (e.g. if MIME::Types['application/vnd.github+json'] returned an unregistered type based on application/json, rather than returning nothing). The parameters concept is probably the one that needs the most thought, because right now they're ignored during lookup but not while creating types:

MIME::Type.new('text/plain; charset=utf-8') == MIME::Type.new('text/plain; charset=ascii')
# => false
MIME::Types['text/plain; charset=utf-8'] == MIME::Types['text/plain; charset=ascii']
# => true
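
To make the "inheritable defaults" idea concrete, here is a rough sketch of a suffix fallback. It does not use the real MIME::Types registry: `REGISTERED`, `SUFFIX_BASES`, and `lookup_with_suffix_fallback` are hypothetical stand-ins, and the returned hash shape is only an assumption about what such a lookup might expose:

```ruby
# Hypothetical stand-in for the registry; the real MIME::Types data is not used.
REGISTERED = {
  "application/json" => { extensions: %w[json] },
  "application/xml"  => { extensions: %w[xml] }
}.freeze

# Assumed mapping from a structured-syntax suffix to the type it builds on.
SUFFIX_BASES = {
  "json" => "application/json",
  "xml"  => "application/xml"
}.freeze

def lookup_with_suffix_fallback(name)
  # Parameters are dropped before lookup, mirroring the current behaviour.
  essence = name.split(";", 2).first.to_s.strip.downcase
  if REGISTERED.key?(essence)
    return REGISTERED[essence].merge(name: essence, registered: true)
  end

  # No exact match: fall back to the base type implied by the +suffix, if any.
  suffix = essence.rpartition("+").last
  base = essence.include?("+") ? SUFFIX_BASES[suffix] : nil
  return nil unless base

  REGISTERED[base].merge(name: essence, registered: false, based_on: base)
end

lookup_with_suffix_fallback("application/vnd.github+json")
# => {:extensions=>["json"], :name=>"application/vnd.github+json",
#     :registered=>false, :based_on=>"application/json"}
```

A type with no suffix and no exact match, such as application/vnd.github.v3.diff, still returns nil in this sketch.
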
@halostatue
Member

I'll need to think about this some, and I agree with the concepts.

@maxlinc
Author

maxlinc commented Sep 16, 2014

Perfect. I just wanted to get someone thinking about it.

I don't have many specific use-cases in mind, mostly because I'm not sure how mime-types is used by most projects. I do have one use-case in mind, though: selecting a parser, serializer or formatter for a MIME::Type.

Grape, for example, uses the mime type to select a serializer or a parser. A similar selection mechanism could be used in middleware for things like formatting (e.g. pretty-printing JSON) or linting (e.g. JSONLint).

(Note: I started thinking about this while working on a code generator from http://swagger.io/ to Grape, not on Grape itself)

Typically it isn't necessary to distinguish between "application/vnd.github.v3.text+json" and "application/vnd.github.v3.html+json" for this use-case. The service itself may need to know the difference (to return a different object or use a different query), but in most cases it's only necessary to know the "underlying structure or container format" - json - so you can use appropriate methods like to_json or JSON.parse. In that case it's enough to know that either the sub_type or the suffix is "json".
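
A small sketch of that selection, assuming a hand-rolled lookup table rather than anything Grape or mime-types provides. `PARSERS`, `structure_of`, and `parse_body` are hypothetical names used only for illustration:

```ruby
require "json"

# Hypothetical table mapping an underlying structure to a parser.
PARSERS = {
  "json" => ->(body) { JSON.parse(body) }
}.freeze

# Returns the suffix when there is one, otherwise the whole subtype,
# so "application/json" and "application/vnd.github.v3.text+json" both give "json".
def structure_of(content_type)
  essence = content_type.split(";", 2).first.to_s.strip.downcase
  subtype = essence.split("/", 2).last.to_s
  subtype.rpartition("+").last
end

def parse_body(content_type, body)
  parser = PARSERS.fetch(structure_of(content_type)) do
    raise ArgumentError, "no parser for #{content_type}"
  end
  parser.call(body)
end

parse_body("application/vnd.github.v3.text+json", '{"id": 1}')
# => {"id"=>1}
```

Types without a matching structure, such as application/vnd.github.v3.diff, raise here, so the caller decides how to handle them.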

@halostatue
Member

Deferring this to post-3.0; I think we have the features we need to be able to support this, but I don’t know what the API is going to look like.

halostatue modified the milestones: 3.0, Future Nov 21, 2015
@bf4

bf4 commented Apr 5, 2016

@halostatue (hi!) Came across this issue looking for suffix support for examples related to json-api/json-api#1020 :)

@halostatue
Member

I use HAL personally, but I still want to support this in the future.

@ioquatix

ioquatix commented Oct 9, 2016

If you are interested, there is a media type parser here: https://github.com/ioquatix/http-accept/blob/master/lib/http/accept/media_types.rb which conforms to RFC 7231.

I don't know if this is really appropriate for this library. Let's face it, there are hundreds of ways to compare content types and media ranges. It's not something that can be easily standardised in a way that works for everyone. It might be best just to provide a library (or use the one above, for example) to do the parsing and implement application-specific logic where it makes sense.

@Nakilon

Nakilon commented Aug 19, 2022

> there is a media type parser here

That's exactly what I was looking for. I want to figure out the input html charset to then parse it again but with proper encoding.

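require "oga"
require "http/accept"
# Read the charset declared in the <meta http-equiv="Content-Type"> tag.
encoding = HTTP::Accept::MediaTypes.parse(
  Oga.parse_html(input.encode("utf-8", undef: :replace)).
      at_css("[http-equiv='Content-Type']")["content"]
)[0].parameters.fetch("charset")
# => "windows-1251"
# Reinterpret the raw bytes with the declared encoding, then re-parse as UTF-8.
input.force_encoding(Encoding.find(encoding))
html = Oga.parse_html(input.encode("utf-8"))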

@ioquatix

Awesome! That code was written a long time ago, I'm glad it's still useful.
