Define several tricky cases for encoding based on spec. #456

Closed
wants to merge 18 commits into from
Changes from 2 commits
40 changes: 40 additions & 0 deletions testcases/README.md
@@ -0,0 +1,40 @@
# Test cases for CloudEvents encoding

These cases provide a set of "difficult" or edge-case encodings of valid
CloudEvents, for use in testing various CloudEvents implementations. The cases
cover transformation between a [canonical JSON object](../json-format.md) and a
transport-specific output. Test cases are organized by common prefix with the
following suffixes denoting different transports:

| Suffix | Encoding |
| ----------- | ------------------------------------------------------- |
| `json` | [JSON event](../json-format.md) |
| `http` | [HTTP binary request](../http-transport-binding.md) |
| `http-json` | [HTTP structured request](../http-transport-binding.md) |
| `mqtt` | [MQTT binary publish](../mqtt-transport-binding.md) |
| `mqtt-json` | [MQTT structured publish](../mqtt-transport-binding.md) |
| `amqp` | [AMQP message](../amqp-transport-binding.md) |

If multiple files exist with the same input prefix, they all represent the same
CloudEvent rendered across the different transports. For example, given the
files:

- `unicode-input.json`
- `unicode-input.http`
- `unicode-input.mqtt`
- `binary-date.json`
- `binary-date.amqp`

These form two sets of tests: the `unicode-input` case is defined for JSON,
HTTP, and MQTT, and the `binary-date` case is defined for JSON and AMQP.
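
One way to read this convention is to group files by the part of the name before the suffix. The sketch below is only an illustration: `group_cases` is a hypothetical helper, not part of any CloudEvents SDK, and it assumes the suffix is everything after the last dot in the file name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_cases(filenames):
    """Group test files by common prefix; values are the sorted suffixes."""
    cases = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        # "unicode-input.http" -> prefix "unicode-input", suffix "http"
        cases[path.stem].add(path.suffix.lstrip("."))
    return {prefix: sorted(suffixes) for prefix, suffixes in cases.items()}


files = [
    "unicode-input.json",
    "unicode-input.http",
    "unicode-input.mqtt",
    "binary-date.json",
    "binary-date.amqp",
]
print(group_cases(files))
# {'unicode-input': ['http', 'json', 'mqtt'], 'binary-date': ['amqp', 'json']}
```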

Test cases that describe particularly unexpected formats should include
comments in the JSON document using JavaScript comment syntax (`//` or
`/* ... */`).
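
Standard JSON parsers reject comments, so a test harness would presumably have to strip them before parsing. The following is a minimal sketch of one way to do that, not a prescribed approach: `strip_js_comments` is a hypothetical helper, and it tracks string literals so that the `//` inside a URL value is left alone.

```python
import json


def strip_js_comments(text):
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = """{
  "source": "https://cloudevents.io/contenttype", // a URL containing //
  /* block comment */
  "id": "123"
}"""
event = json.loads(strip_js_comments(sample))
print(event["source"])  # https://cloudevents.io/contenttype
```

This sketch does not remove trailing commas, which a strict JSON parser also rejects.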

<!-- TODO: translation from batch to multiple individual requests.

What are the semantics if a single message in a JSON batch is incorrect?
- Missing required fields in one array element?
- Incorrect field types / values in another element?
-->
11 changes: 11 additions & 0 deletions testcases/contenttype.http
@@ -0,0 +1,11 @@
POST / HTTP/1.1
Host: handler.example.com
Content-Type: application/xml
Content-Length: 24
CE-specversion: 0.3
evankanderson marked this conversation as resolved.
CE-id: 123
CE-source: https://cloudevents.io/contenttype
CE-type: io.cloudevents.contenttype-test
evankanderson marked this conversation as resolved.
CE-datacontentencoding: Base64

PG11Y2ggd293PSJ4bWwiLz4=
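
As a quick check of this case, the body can be decoded by hand. The sketch below uses only the Python standard library and the payload shown above. It confirms that the Base64 text is 24 characters long, matching `Content-Length`, and that it decodes to the XML noted in the comment in `contenttype.json`.

```python
import base64

payload = "PG11Y2ggd293PSJ4bWwiLz4="

# The Content-Length header counts the encoded body, not the decoded XML.
print(len(payload))  # 24

decoded = base64.b64decode(payload).decode("utf-8")
print(decoded)  # <much wow="xml"/>
```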
12 changes: 12 additions & 0 deletions testcases/contenttype.json
@@ -0,0 +1,12 @@
{
"specversion": "0.3",
"id": "123",

"source": "https://cloudevents.io/contenttype",
"type": "io.cloudevents.contenttype-test",

"datacontentencoding": "Base64",
"datacontenttype": "application/xml",

"data": "PG11Y2ggd293PSJ4bWwiLz4=", // <much wow="xml"/>
Collaborator

remove trailing ,

Contributor Author

done.

}
12 changes: 12 additions & 0 deletions testcases/contenttype.mqtt
@@ -0,0 +1,12 @@
------------------ PUBLISH ---------------
Topic Name: mytopic
Content Type: application/xml
--------------- User Properties ----------
specversion: 0.3
id: 123
source: https://cloudevents.io/contenttype
type: io.cloudevents.contenttype-test
datacontentencoding: Base64
Collaborator

@clemensv is there an MQTT equivalent property for datacontentencoding?

Contributor Author

Searching for "encoding" in https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html produces 25 hits, all of which cover UTF-8 or property encoding, rather than payload encoding.

Contributor

@duglin No, there is no equivalent in MQTT. For MQTT 3.1.x, you even need to know the datacontenttype a priori, as a convention on the topic. The absence of this particular field is not an issue, though, because the body is always a byte sequence.

------------------ payload ---------------
PG11Y2ggd293PSJ4bWwiLz4=
------------------------------------------
11 changes: 11 additions & 0 deletions testcases/extensions-map.http
@@ -0,0 +1,11 @@
POST / HTTP/1.1
Host: handler.example.com
Content-Length: 0
CE-specversion: 0.3
CE-id: 123
CE-source: https://cloudevents.io/extensions-map
CE-type: io.cloudevents.extensions-map-test
CE-this0is0ok-\: ok
CE-this0is0ok-": {"\u005c": 17}
Content-Type: application/xml

17 changes: 17 additions & 0 deletions testcases/extensions-map.json
@@ -0,0 +1,17 @@
{
"specversion": "0.3",
"id": "123",

"source": "https://cloudevents.io/extensions-map",
"type": "io.cloudevents.extensions-map-test",

/***
* EXTENSIONS, with a map value
*/
"this0is0ok": {
"\\": "ok",
"\"": {
"\u005c": "17"
Collaborator

you show 17 as a string here but as a number in testcases/extensions-map.http - should they match?

Contributor Author

Integers and Strings have this problem for unknown extension schemas, I believe.

I.e. given an HTTP header:

CE-Foobar: 17

Does this render as:

{
  ...
  "foobar": 17
}

or

{
  ...
  "foobar": "17"
}

Collaborator

yes that's true - I think it'll show up as a string though since I think unknown attributes are kept as string. But, was that the point of the test? I only commented on it because I thought you were trying to be consistent - if not then never mind.

}
}
}
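
The string-versus-number question raised in the review thread above can be made concrete. HTTP header values arrive as text, so a receiver with no schema for an extension has nothing to tell it that `17` was meant as a number. The sketch below assumes the behavior the collaborator suggests (unknown attributes are kept as strings); `headers_to_event` is a hypothetical helper, not an SDK function.

```python
import json


def headers_to_event(headers):
    """Map CE-* headers to event attributes, keeping every value as a string."""
    event = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith("ce-"):
            # Assumption: with no known schema, the value is not parsed further.
            event[lowered[len("ce-"):]] = value
    return event


event = headers_to_event({"CE-Foobar": "17", "Host": "handler.example.com"})
print(json.dumps(event))  # {"foobar": "17"}
```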
11 changes: 11 additions & 0 deletions testcases/extensions-map.mqtt
@@ -0,0 +1,11 @@
------------------ PUBLISH ---------------
Topic Name: mytopic
Content Type:
--------------- User Properties ----------
specversion: 0.3
id: 123
source: https://cloudevents.io/extensions-map
type: io.cloudevents.extensions-map-test
this0is0ok: {"\\":"ok","\"":{"\\": "17"}}
------------------ payload ---------------
------------------------------------------
9 changes: 9 additions & 0 deletions testcases/unicode-strings.http
@@ -0,0 +1,9 @@
POST / HTTP/1.1
Host: handler.example.com
Content-Length: 0
CE-specversion: 0.3
CE-id: %22%F0%9F%98%89%CC%BC.is%2BFine%E7%81%AB%22
CE-source: https://cloudevents.io/unicode-strings
CE-type: %22%F0%9F%98%89%CC%BC.is%2BFine%E7%81%AB%22
CE-subject: %22%F0%9F%98%89%CC%BC.is%2BFine%E7%81%AB%22
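
For reference, the header value can be derived from the JSON string by encoding it as UTF-8 and percent-encoding each octet. The sketch below uses Python's `urllib.parse.quote` with an empty `safe` set, which is one possible encoder and an assumption on my part. It also escapes `"` and `+`, which the HTTP binding does not strictly require. Note that U+033C encodes to the two octets `CC BC` in UTF-8.

```python
from urllib.parse import quote, unquote

# The same string as in unicode-strings.json: " U+1F609 U+033C .is+Fine U+706B "
value = "\"\U0001F609\u033c.is+Fine\u706b\""

encoded = quote(value, safe="")
print(encoded)  # %22%F0%9F%98%89%CC%BC.is%2BFine%E7%81%AB%22

# A single round of percent-decoding recovers the original string.
print(unquote(encoded) == value)  # True
```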

22 changes: 22 additions & 0 deletions testcases/unicode-strings.http-json
@@ -0,0 +1,22 @@
POST / HTTP/1.1
Host: handler.example.com
Content-Length: 517
Content-Type: application/cloudevents+json; charset=utf-8

{
"specversion": "0.3",
"id": "\"😉\u033c.is+Fine火\u0022", // \u22\u1F609\u033c\u2e\u69\u73\u2b\u46\u69\u6e\u65\u706b\u22
"source": "", // relative-ref -> relative-part -> path-empty
"type": "\"😉\u033c.is+Fine火\u0022",
"datacontentencoding": "Base64",
"datacontenttype": "application/xml",
"schemaurl": "%2f%2f%anonymous@example.com/a&b;?x'=%2f//#//",
"subject": "\"😉\u033c.is+Fine火\u0022",
Collaborator

remove trailing ,

"data": "PG11Y2ggd293PSJ4bWwiLz4=",
evankanderson marked this conversation as resolved.
"this0is0ok": {
"\\": "ok",
"\"": {
"\u005c": "good,too"
}
}
}
16 changes: 16 additions & 0 deletions testcases/unicode-strings.json
@@ -0,0 +1,16 @@
{
"specversion": "0.3",
// "Printable" might or might not include " ", but probably includes
// characters that produce at least some visible output
// Go defines this as:
// > Such characters include letters, marks, numbers, punctuation, symbols,
// > and spaces, from categories L, M, N, P, S, Zs.
// https://golang.org/pkg/unicode/#IsGraphic
"id": "\"😉\u033c.is+Fine火\u0022", // \u22\u1F609\u033c\u2e\u69\u73\u2b\u46\u69\u6e\u65\u706b\u22

"source": "https://cloudevents.io/unicode-strings",
"type": "\"😉\u033c.is+Fine火\u0022",

// OPTIONAL field
"subject": "\"😉\u033c.is+Fine火\u0022",
evankanderson marked this conversation as resolved.
}
11 changes: 11 additions & 0 deletions testcases/unicode-strings.mqtt
@@ -0,0 +1,11 @@
------------------ PUBLISH ---------------
Topic Name: mytopic
Content Type:
--------------- User Properties ----------
specversion: 0.3
id: "😉̼.is+Fine火"
source: https://cloudevents.io/unicode-strings
type: "😉̼.is+Fine火"
subject: "😉̼.is+Fine火"
------------------ payload ---------------
------------------------------------------
9 changes: 9 additions & 0 deletions testcases/urls.http
@@ -0,0 +1,9 @@
POST / HTTP/1.1
Host: handler.example.com
Content-Length: 0
CE-specversion: 0.3
CE-id: 123
CE-source:
CE-type: io.cloudevents.url-test
CE-schemaurl: %2f%2f%anonymous@example.com/a&b;?x'=%2f//#//

16 changes: 16 additions & 0 deletions testcases/urls.json
@@ -0,0 +1,16 @@
{
"specversion": "0.3",

"id": "123",

// Source is a URI-reference, https://tools.ietf.org/html/rfc3986#appendix-A
"source": "", // relative-ref -> relative-part -> path-empty
Collaborator

If #478 goes thru then this will need to be changed

"type": "io.cloudevents.url-test",

// URI-reference: https://tools.ietf.org/html/rfc3986#appendix-A
// Verify that % decoding is done properly. This is a 'path-noscheme'
// that percent-decodes to a '"//" authority path-abempty'.
"schemaurl": "%2f%2f%anonymous@example.com/a&b;?x'=%2f//#//",
Collaborator

remove trailing ,

Collaborator

Do we need to say something in our spec about doing a percent-decoding?
I want to say 'no', but some kind of "heads-up" might be good, not from a strcmp perspective, but rather to remind people that per RFC3986 someone might URL-encode them even if they're not meant to be dereferenceable. Perhaps something in the primer?

what do people think?

Contributor Author

This is a tricky case, because it is a valid URL, but if you percent-decode it, it is not a valid URL, but instead a hier-part, which requires a scheme ":" prefix to be a valid URI production.

Clarified the comment.

Contributor Author

Hmm, it looks like I didn't look at this quite closely enough (but may perhaps need to clarify the HTTP spec as well):

https://github.com/cloudevents/spec/blob/master/http-transport-binding.md#3132-http-header-values

Non-printable ASCII characters and non-ASCII characters MUST first be encoded according to UTF-8, and then each octet of the corresponding UTF-8 sequence MUST be percent-encoded to be represented as HTTP header characters, in compliance with RFC7230, sections 3, 3.2, 3.2.6. The rules for encoding of the percent character ('%') apply as defined in RFC 3986 Section 2.4.

However, the characters I encoded are not in the set (non-printable ASCII characters and non-ASCII characters), so it's not clear whether it is allowed for them to be percent-encoded or not.

Thinking about this from a software-implementer point of view, it would be a nightmare to scan through each percent-encoded string and determine whether or not the character was from the (non-printable ASCII or non-ASCII) character set before deciding whether or not to process the escape sequence.

I'll clarify the HTTP transport bindings to make it clear that all HTTP Header Values should be decoded through a single round of percent-decoding, and update this PR when that lands.


"data": "",
evankanderson marked this conversation as resolved.
}
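
The "single round of percent-decoding" proposed in the last review comment above can be sketched as follows. This illustrates the idea under my own assumptions and is not the binding's wording: `decode_header_value` is a hypothetical name, and Python's `unquote` stands in for whatever decoder an implementation uses. It leaves the malformed `%an` sequence in this test value unchanged.

```python
from urllib.parse import unquote, urlsplit


def decode_header_value(value):
    """Apply exactly one round of percent-decoding to a header value."""
    return unquote(value)


raw = "%2f%2f%anonymous@example.com/a&b;?x'=%2f//#//"

# Before decoding, the value is a relative reference with no authority.
print(urlsplit(raw).netloc == "")  # True

decoded = decode_header_value(raw)
print(decoded)  # //%anonymous@example.com/a&b;?x'=///#//

# After decoding, the leading "//" makes a parser see an authority component.
print(bool(urlsplit(decoded).netloc))  # True

# One round versus two rounds gives different results for an escaped percent.
print(decode_header_value("%252f"))  # %2f
print(decode_header_value(decode_header_value("%252f")))  # /
```

The last two lines show why the number of decoding rounds has to be fixed: a second round turns an escaped `%2f` into a path separator.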