-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Associative list implementation for headers #747
Conversation
Many thanks for this PR, and in particular for all the cleanups to header.mli along with your changes. A few quick questions:
|
I would like to re-contextualize this part of CoHTTP when several recent PRs are already merged:
So I follow @avsm about the migration process when we currently don't have a good test suit to check if we break something or not (as highlighted by #701). |
Very good point. One option would be to move the existing implementation to Cohttp.Header.V1, add Cohttp.Header.V2, alias Cohttp.Header to Cohttp.Header.V1 and then let users use the new V2 implementation and gradually migrate. In a few releases, we can add a deprecation warning to Cohttp.Header.V1 and remove that code. |
The option of having V1 and V2 sounds like a good idea. Gives time to people to migrate and we can do it gradually without worrying about subtle semantics changes. I think we should try to aim to get this in the next release |
Instead of complexing the API (with V1 and V2) can't we just make sure we get the "correct" API? :-) The changes seems like a good improvement to me and they come with proper coverage testing. I guess the only missing thing is a CHANGE entry describing the semantic changes (and if these changes are too big, maybe we can try to rename some functions to avoid breakages). |
Thanks for all your comments. I will add notes in the CHANGE entry and comments about related RFCs. I'm working on adding fuzzing tests as it seems quite appropriate for this module and would enable to get a proper test suite to describe and maintain the important invariants (especially the ones linked to RFCs). I am also writing a short guide to explain the differences with the previous implementation and how to use the new functions (for example, I don't really have a strong opinion about the V1/V2 idea. If a major release is expected soon, maybe we can just wait for it ? |
(I just made a force-pushed because I did not used rebase the first time. Sorry !) |
Is |
@lyrm a big +1 for fuzz testing! I agree with @samoht that for the moment, if you can add functions that do not change the semantics, we can migrate much more easily to a new major version. The sequence could be:
With this scheme, we shouldn't need a major revision bump in the short term, since new users can switch to V2 in their own time. |
If we are extra careful to not change the semantics of existing functions I don't think we need to add the V1/V2 complexity. If that's not possible, I agree that we should make sure users are aware of that semantic changes and breaking compilation of their program is the best move :-) |
The initial idea was to make the header module more predictable (no unrequested changes in header order for example) which is mainly done by changing the header type from a map to an associative list. However, it seemed also very appropriate to change the
So the issue I have with the previous There is an other issue related to the original implementation of let h = add (init ()) "accept" "text/*" in
let h = add h "accept" "application/xml in
get h "accept" was previously returning "application/xml, text/*". You get the same result if you create headers from parsing this:
Also, RFC 7230#section-3.2.2 says that order matters. It was corrected for the particular case of the As it is the There are no other important semantic changes except the (and there are other particular cases, like And obviously, some tests have been changed to match the fact that the previous implementation did not preserve headers order. So I am not sure on how to do the V1/V2 idea properly. |
I have two suggestions, I'd like to hear @lyrm, @avsm and @samoht opinion on this (and of any other interested person, maybe @hannesm?). It seems to me that the semantics change is there to fix an issue of the api and that the change of some types will also likely break old code. I am no longer sure it is worth going light and keeping two copies of the code and tests for V1 and V2, and try to adjust the code to support both. Since we will be releasing 3.1, the first new (available) major version. Why don't we just clearly emphasize the breaking changes in the release, leaving ample space to the issue of types and semantics in the headers module (we could even have a post-install message in this release for double safety), and ship only this new version, asking the users of the libraries to update code? If we want to wait to have the fuzzing before actually releasing this, why don't we release EDIT: made clearer that I am suggesting to ship only this new version |
Since I got mentioned here, my opinion is that versioning is to be done on the opam package level, not on the module / API level. I do not think that two (multiple) API V1/V2 solve anything or are convenient. In contrast, they imply more code and are rather confusing (when will V1 be removed (will it ever be)) - leading to more maintenance overhead and more complex code someone has to read and review. If there's no "versioned API", this means for existing clients of this library: they can move forward and adapt version 3, including the new API. For new clients: they are not confused about which version to use (documentation is easier to read and understand). |
I agree with Hannes on this one :-) V1/V2 could makes sense for long-lived APIs where we want clients to use both. Here I feel this is not needed if the changes are copiously documented. |
Thanks for all the extra discussion. The reason I pushed the V1/V2 suggestion is that it helps with the situation of having a version upgrade of cohttp silently (i.e. no build failure) changing the semantics of a program. Consider if (for example) something adding a security-critical CORS header changed in some way because of this API change that caused it to not emit it. With V1/V2, there is a decision from the application author to move their application to the new behaviour that is independent of the cohttp version. It is indeed more heavyweight, but for good reason -- HTTP header handling is a pretty important of any non-trivial webserver. I'd be happy to approve this PR without V1/V2 if we could get a clear CHANGES entry in this PR that distills down @lyrm's comment above into a programmer-facing guide for what they need to check for in their code when migrating to this new interface. Then, we can run this CHANGES entry past (for example) @talex5 and the https://github.com/ocurrent/overview stack and just check that there are no surprises with upgrading. If it's the case that upgrading can be done cleanly by following the CHANGES entry (and I do now believe this to be the case thanks to @lyrm's clarifications above), I am fine to move ahead with the improvements in this PR and get them into a release soon. |
To avoid "silently changing semantics underneath", another option is to rename all the functions or the module (plus the mentioned CHANGES entry), so nobody will accidentally use the new semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OCurrent does very little with headers, so probably isn't much of a test.
Looking at the diff, it's a bit hard to see exactly what changed here. It seems concerning that the unit-tests had to be changed so much!
I wonder if we should just expose the list type directly and let users manage the headers using OCaml's List
module. e.g. it's not clear from the name whether add
adds to the end of the headers (which sounds like it would be slow), or prepends (which is fast, but maybe unexpected). But it would be clear with something like:
let headers = Header.[
v "content-type" "text/plain";
v "accept" "foo";
]
…ction to get the same effect than previous [get] function (on maps) and correct all get_[some header] functions accordingly.
…s to match the new implementation.
… by upgrade/upgrade_all functions.
I finally pushed the fuzzing tests as well as the change log and some improvements in the header module doc. |
Thanks @lyrm ! By looking at the API again, I am wondering if that could work if we turn headers into an heterogeneous list with typed keys to know if an entry is a list of string of just a string. Are the header entries typed in the RFCs? Would that make sense to ensure it in the header module? The goal would be to only have one val content_range : int64 key
val media_type : string
val connection_close : bool
val get: t -> 'a key -> 'a No idea if this works and if this is a good idea but if we change the Header API maybe we could consider adding more types :-) |
Fantastic! @lyrm if you run |
This sounds like a very nice idea and it likely also make it nicer, later on, to address #726 EDIT: Keep in mind, though, that the header fields are practically infinitely extensible:
So if we go in that direction, which in principle seems really nice, we will have to take that into account and make sure to have a In principle we could try to auto-generate most of them from the list at https://www.iana.org/assignments/message-headers/message-headers.xhtml and if we have a nice library for parsing the various types it could become type-safe headers all the way, but seems like a massive work. |
For a concrete example of such thing, If you want to go to this way, it requires more (orthogonal) works:
It requires more work about the parser of fields and may be it's the good time to:
And post-process values with associated parser (the |
update_test (); | ||
update_all_test (); | ||
clean_dup_test (); | ||
() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module is fantastic. I am going to point here people that want to get an idea on how to use crowbar!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
cohttp/test/unitary_test_header.ml
Outdated
] | ||
H.(to_list (map (fun _k v -> v ^ a) prebuilt)) | ||
|
||
let fold_tests () = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create an issue tracking the implementation of these two tests? Do you have in mind something specific already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added these missing tests.
Co-authored-by: Marcello Seri <mseri@users.noreply.github.com>
Manually merged on |
…wt-unix, cohttp-lwt-jsoo and cohttp-async (5.0.0) CHANGES: - Cohttp.Header: new implementation (lyrm mirage/ocaml-cohttp#747) + New implementation of Header modules using an associative list instead of a map, with one major semantic change (function ```get```, see below), and some new functions (```clean_dup```, ```get_multi_concat```) + More Alcotest tests as well as fuzzing tests for this particular module. ### Purpose The new header implementation uses an associative list instead of a map to represent headers and is focused on predictability and intuitivity: except for some specific and documented functions, the headers are always kept in transmission order, which makes debugging easier and is also important for [RFC7230§3.2.2](https://tools.ietf.org/html/rfc7230#section-3.2.2) that states that multiple values of a header must be kept in order. Also, to get an intuitive function behaviour, no extra work to enforce RFCs is done by the basic functions. For example, RFC7230§3.2.2 requires that a sender does not send multiple values for a non list-value header. This particular rule could require the ```Header.add``` function to remove previous values of non-list-value headers, which means some changes of the headers would be out of control of the user. With the current implementation, an user has to actively call dedicated functions to enforce such RFCs (here ```Header.clean_dup```). ### Semantic changes Two functions have a semantic change : ```get``` and ```update```. #### get ```get``` was previously doing more than just returns the value associated to a key; it was also checking if the searched header could have multiple values: if not, the last value associated to the header was returned; otherwise, all the associated values were concatenated and returned. This semantics does not match the global idea behind the new header implementation, and would also be very inefficient. + The new ```get``` function only returns the last value associated to the searched header. + ```get_multi_concat``` function has been added to get a result similar to the previous ```get``` function. #### update ```update``` is a pretty new function (mirage/ocaml-cohttp#703) and changes are minor and related to ```get``` semantic changes. + ```update h k f``` is now modifying only the last occurrences of the header ```k``` instead of all its occurrences. + a new function ```update_all``` function has been added and work on all the occurrences of the updated header. ### New functions : + ```clean_dup``` enables the user to clean headers that follows the {{:https://tools.ietf.org/html/rfc7230#section-3.2.2} RFC7230§3.2.2} (no duplicate, except ```set-cookie```) + ```get_multi_concat``` has been added to get a result similar to the previous ```get``` function. - Cohttp.Header: performance improvement (mseri, anuragsoni mirage/ocaml-cohttp#778) **Breaking** the headers are no-longer lowercased when parsed, the headers key comparison is case insensitive instead. - cohttp-lwt-unix: Adopt ocaml-conduit 5.0.0 (smorimoto mirage/ocaml-cohttp#787) **Breaking** `Conduit_lwt_unix.connect`'s `ctx` param type chaged from `ctx` to `ctx Lazy.t` - cohttp-mirage: fix deprecated fmt usage (tmcgilchrist mirage/ocaml-cohttp#783) - lwt_jsoo: Use logs for the warnings and document it (mseri mirage/ocaml-cohttp#776) - lwt: Use logs to warn users about leaked bodies and document it (mseri mirage/ocaml-cohttp#771) - lwt, lwt_unix: Improve use of logs and the documentation, fix bug in the Debug.enable_debug function (mseri mirage/ocaml-cohttp#772) - lwt_jsoo: Fix exception on connection errors in chrome (mefyl mirage/ocaml-cohttp#761) - lwt_jsoo: Fix `Lwt.wakeup_exn` `Invalid_arg` exception when a js stack overflow happens in the XHR completion handler (mefyl mirage/ocaml-cohttp#762). - lwt_jsoo: Add test suite (mefyl mirage/ocaml-cohttp#764).
…wt-unix, cohttp-lwt-jsoo and cohttp-async (5.0.0) CHANGES: - Cohttp.Header: new implementation (lyrm mirage/ocaml-cohttp#747) + New implementation of Header modules using an associative list instead of a map, with one major semantic change (function ```get```, see below), and some new functions (```clean_dup```, ```get_multi_concat```) + More Alcotest tests as well as fuzzing tests for this particular module. ### Purpose The new header implementation uses an associative list instead of a map to represent headers and is focused on predictability and intuitivity: except for some specific and documented functions, the headers are always kept in transmission order, which makes debugging easier and is also important for [RFC7230§3.2.2](https://tools.ietf.org/html/rfc7230#section-3.2.2) that states that multiple values of a header must be kept in order. Also, to get an intuitive function behaviour, no extra work to enforce RFCs is done by the basic functions. For example, RFC7230§3.2.2 requires that a sender does not send multiple values for a non list-value header. This particular rule could require the ```Header.add``` function to remove previous values of non-list-value headers, which means some changes of the headers would be out of control of the user. With the current implementation, an user has to actively call dedicated functions to enforce such RFCs (here ```Header.clean_dup```). ### Semantic changes Two functions have a semantic change : ```get``` and ```update```. #### get ```get``` was previously doing more than just returns the value associated to a key; it was also checking if the searched header could have multiple values: if not, the last value associated to the header was returned; otherwise, all the associated values were concatenated and returned. This semantics does not match the global idea behind the new header implementation, and would also be very inefficient. + The new ```get``` function only returns the last value associated to the searched header. + ```get_multi_concat``` function has been added to get a result similar to the previous ```get``` function. #### update ```update``` is a pretty new function (mirage/ocaml-cohttp#703) and changes are minor and related to ```get``` semantic changes. + ```update h k f``` is now modifying only the last occurrences of the header ```k``` instead of all its occurrences. + a new function ```update_all``` function has been added and work on all the occurrences of the updated header. ### New functions : + ```clean_dup``` enables the user to clean headers that follows the {{:https://tools.ietf.org/html/rfc7230#section-3.2.2} RFC7230§3.2.2} (no duplicate, except ```set-cookie```) + ```get_multi_concat``` has been added to get a result similar to the previous ```get``` function. - Cohttp.Header: performance improvement (mseri, anuragsoni mirage/ocaml-cohttp#778) **Breaking** the headers are no-longer lowercased when parsed, the headers key comparison is case insensitive instead. - cohttp-lwt-unix: Adopt ocaml-conduit 5.0.0 (smorimoto mirage/ocaml-cohttp#787) **Breaking** `Conduit_lwt_unix.connect`'s `ctx` param type chaged from `ctx` to `ctx Lazy.t` - cohttp-mirage: fix deprecated fmt usage (tmcgilchrist mirage/ocaml-cohttp#783) - lwt_jsoo: Use logs for the warnings and document it (mseri mirage/ocaml-cohttp#776) - lwt: Use logs to warn users about leaked bodies and document it (mseri mirage/ocaml-cohttp#771) - lwt, lwt_unix: Improve use of logs and the documentation, fix bug in the Debug.enable_debug function (mseri mirage/ocaml-cohttp#772) - lwt_jsoo: Fix exception on connection errors in chrome (mefyl mirage/ocaml-cohttp#761) - lwt_jsoo: Fix `Lwt.wakeup_exn` `Invalid_arg` exception when a js stack overflow happens in the XHR completion handler (mefyl mirage/ocaml-cohttp#762). - lwt_jsoo: Add test suite (mefyl mirage/ocaml-cohttp#764).
…wt-unix, cohttp-lwt-jsoo and cohttp-async (5.0.0) CHANGES: - Cohttp.Header: new implementation (lyrm mirage/ocaml-cohttp#747) + New implementation of Header modules using an associative list instead of a map, with one major semantic change (function ```get```, see below), and some new functions (```clean_dup```, ```get_multi_concat```) + More Alcotest tests as well as fuzzing tests for this particular module. ### Purpose The new header implementation uses an associative list instead of a map to represent headers and is focused on predictability and intuitivity: except for some specific and documented functions, the headers are always kept in transmission order, which makes debugging easier and is also important for [RFC7230§3.2.2](https://tools.ietf.org/html/rfc7230#section-3.2.2) that states that multiple values of a header must be kept in order. Also, to get an intuitive function behaviour, no extra work to enforce RFCs is done by the basic functions. For example, RFC7230§3.2.2 requires that a sender does not send multiple values for a non list-value header. This particular rule could require the ```Header.add``` function to remove previous values of non-list-value headers, which means some changes of the headers would be out of control of the user. With the current implementation, an user has to actively call dedicated functions to enforce such RFCs (here ```Header.clean_dup```). ### Semantic changes Two functions have a semantic change : ```get``` and ```update```. #### get ```get``` was previously doing more than just returns the value associated to a key; it was also checking if the searched header could have multiple values: if not, the last value associated to the header was returned; otherwise, all the associated values were concatenated and returned. This semantics does not match the global idea behind the new header implementation, and would also be very inefficient. + The new ```get``` function only returns the last value associated to the searched header. + ```get_multi_concat``` function has been added to get a result similar to the previous ```get``` function. #### update ```update``` is a pretty new function (mirage/ocaml-cohttp#703) and changes are minor and related to ```get``` semantic changes. + ```update h k f``` is now modifying only the last occurrences of the header ```k``` instead of all its occurrences. + a new function ```update_all``` function has been added and work on all the occurrences of the updated header. ### New functions : + ```clean_dup``` enables the user to clean headers that follows the {{:https://tools.ietf.org/html/rfc7230#section-3.2.2} RFC7230§3.2.2} (no duplicate, except ```set-cookie```) + ```get_multi_concat``` has been added to get a result similar to the previous ```get``` function. - Cohttp.Header: performance improvement (mseri, anuragsoni mirage/ocaml-cohttp#778) **Breaking** the headers are no-longer lowercased when parsed, the headers key comparison is case insensitive instead. - cohttp-lwt-unix: Adopt ocaml-conduit 5.0.0 (smorimoto mirage/ocaml-cohttp#787) **Breaking** `Conduit_lwt_unix.connect`'s `ctx` param type chaged from `ctx` to `ctx Lazy.t` - cohttp-mirage: fix deprecated fmt usage (tmcgilchrist mirage/ocaml-cohttp#783) - lwt_jsoo: Use logs for the warnings and document it (mseri mirage/ocaml-cohttp#776) - lwt: Use logs to warn users about leaked bodies and document it (mseri mirage/ocaml-cohttp#771) - lwt, lwt_unix: Improve use of logs and the documentation, fix bug in the Debug.enable_debug function (mseri mirage/ocaml-cohttp#772) - lwt_jsoo: Fix exception on connection errors in chrome (mefyl mirage/ocaml-cohttp#761) - lwt_jsoo: Fix `Lwt.wakeup_exn` `Invalid_arg` exception when a js stack overflow happens in the XHR completion handler (mefyl mirage/ocaml-cohttp#762). - lwt_jsoo: Add test suite (mefyl mirage/ocaml-cohttp#764).
Purpose
Right now, headers are represented by a map, which is good to automatically insure certain RFCs properties but is also quite counterintuitive when debugging as it changes the headers order and enforces RFCs without enabling the user to control it.
The purpose of this PR is:
Header as an associative list
Main changes
Implementing the header with an associative list basically enables to keep the order in which the header are transmitted and so to maintain the invariant
to_list (of_list h) = h
.That means:
add
function does neither remove nor replace any headers;get
function only retrieves the last transmitted value. To get all values associated to a key, the functionsget_multi
orget_multi_concat
must be used;map
anditer
types have changed to match the new header type;Update function
There are two functions
update
andupdate_last
now. Basicallyupdate h k f
callsf
on the concatenated values associated withk
and affects all occurrences ofk
(meaning, for example, all are removed if the value returns byf
isNone
).In opposition,
update_last
works with the last value associated tok
and affects only this value.New functions
get_multi_concat
concatenates all values associated to a key. The~list_value_only
optional argument enables to get the last value only if the searched header is not a list-value header and all the values otherwise. Though,get ~list_value_only:true
enables to get the same result that the previousget
function.clean_dup
enables to remove unauthorized duplicates and concatenated values of list-value headers. In other word, it does the same kind of clean up that a map automatically does but need an active call by the user and preserved order of transmission.Related changes
Cohttp/test/header_test.ml
andCohttp_lwt_unix/test/test_parser.ml
: some tests have been added and previous ones have been correcting accordingly to the new implementation.Cohttp_lwt_jsoo.ml
andCohttp_curl_async.ml
: the few changes are due toiter
andmap
new types.Removing and factoring header modifications
WIP